Dialogue Evaluation 2024



Term identification in Russian-language scientific articles


Description and task

The shared task offers three tracks:

  1. Identification of terms;
  2. Identification and classification of terms into 3 classes (specific_term, common_term, nomen);
  3. Domain transfer: identification and classification of terms into the same 3 classes (specific_term, common_term, nomen) in texts from other domains.

By term identification we mean finding the text fragments that are terms in the broad sense.
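This section does not describe the data format or the official metric. The sketch below therefore makes two assumptions: a term is given as a character span [start, end) with an optional class label, and systems are scored by exact-match micro-averaged precision, recall and F1. Under these assumptions, a Track 1 style score compares only span boundaries, and a Track 2/3 style score also requires the class to match. `Span`, `span_f1` and the example labels are hypothetical illustrations, not the organizers' evaluation script or gold annotations.

```python
from typing import Iterable, NamedTuple, Optional


class Span(NamedTuple):
    """A term occurrence: character offsets [start, end) plus an optional class.

    Hypothetical representation; the real CL-RuTerm3 format may differ.
    """
    start: int
    end: int
    label: Optional[str] = None  # "specific_term", "common_term", "nomen", or None


def span_f1(gold: Iterable[Span], pred: Iterable[Span], use_labels: bool) -> dict:
    """Exact-match micro precision/recall/F1 over term spans (an assumed metric).

    use_labels=False compares only (start, end), i.e. identification only;
    use_labels=True also requires the class label to match.
    """
    def key(s: Span):
        return (s.start, s.end, s.label) if use_labels else (s.start, s.end)

    gold_set = {key(s) for s in gold}
    pred_set = {key(s) for s in pred}
    tp = len(gold_set & pred_set)  # spans predicted exactly right
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative sentence and labels (not taken from the dataset).
text = "Модель BERT применяется для извлечения терминов."


def at(fragment: str, label: Optional[str] = None) -> Span:
    """Build a Span for the first occurrence of `fragment` in `text`."""
    start = text.find(fragment)
    return Span(start, start + len(fragment), label)


gold = [at("BERT", "nomen"), at("извлечения терминов", "specific_term")]
pred = [at("BERT", "nomen"), at("извлечения терминов", "common_term")]

print(span_f1(gold, pred, use_labels=False))  # boundaries only: f1 = 1.0
print(span_f1(gold, pred, use_labels=True))   # one class is wrong: f1 = 0.5
```

Matching on exact boundaries is the strictest choice. A partial-overlap variant would give credit for near misses, but whether the task uses such a variant is not stated here.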

Term classes:

  • specific_term – terms that are both domain-specific and lexically specific;
  • common_term – terms that are only domain-specific (they may be known to and used by non-specialists);
  • nomen – names of unique objects belonging to a specific domain.
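A common way to model identification and classification together is sequence labeling with BIO tags, one tag set per class (B-nomen, I-nomen, and so on). The organizers do not prescribe this approach. The sketch below is a hypothetical preprocessing step, assuming character-offset annotations of the form (start, end, class) and a simple regex tokenizer. `to_bio` and `span_of` are illustrative names, and the example labels are not taken from the dataset.

```python
import re
from typing import List, Tuple

# Hypothetical annotation format: (start, end, class), end exclusive.
Annotation = Tuple[int, int, str]

# Words (Unicode-aware, so Cyrillic works) or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def to_bio(text: str, spans: List[Annotation]) -> List[Tuple[str, str]]:
    """Tag each token with B-<class>, I-<class> or O.

    A token belongs to a span if it lies entirely inside [start, end).
    The first token of a span gets B-, later ones get I-. Flat BIO cannot
    encode nested terms: when spans overlap, the first listed span wins.
    """
    tagged = []
    opened = set()  # indices of spans whose first token has been tagged
    for match in TOKEN_RE.finditer(text):
        tag = "O"
        for i, (start, end, label) in enumerate(spans):
            if start <= match.start() and match.end() <= end:
                tag = ("I-" if i in opened else "B-") + label
                opened.add(i)
                break
        tagged.append((match.group(), tag))
    return tagged


text = "Модель BERT применяется для извлечения терминов."


def span_of(fragment: str, label: str) -> Annotation:
    """Annotation for the first occurrence of `fragment` in `text`."""
    start = text.find(fragment)
    return (start, start + len(fragment), label)


spans = [span_of("BERT", "nomen"), span_of("извлечения терминов", "specific_term")]
for token, tag in to_bio(text, spans):
    print(token, tag)
```

With this labeling, identification alone can be recovered by collapsing every B-/I- tag to a single class.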


A manually labeled dataset, CL-RuTerm3, was prepared specifically for the competition from Russian-language abstracts of papers presented at the «Dialogue» conference in 2000–2023. The training set consists of 850 abstracts from the computational linguistics domain. In addition to abstracts, the test set for Tracks 1 and 2 includes full-text articles from the same domain. The test set for Track 3 consists of abstracts from other domains.

Important: the training set contains neither full-text articles nor abstracts from other domains. These appear only in the test set, where they are used to assess the quality of the models.


01.07.2024: publication of the training data;

01.01.2025: publication of the test data;

10.01.2025: end of the shared task;

01.02.2025: deadline for paper submissions.
