Description and task
Three tracks are suggested:
- Identification of terms;
- Identification and classification of terms into 3 classes (specific_term, common_term, nomen);
- Transfer experiments on other domains: identification and classification of terms into the same 3 classes (specific_term, common_term, nomen).
By term identification we mean identifying text fragments that are terms in a broad sense.
Term classes:
- specific term (specific_term): terms that are both domain-specific and lexically specific;
- common term (common_term): terms that are only domain-specific (they can be known and used by non-specialists);
- nomen (nomen): names of unique objects belonging to a specific domain.
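To make the difference between identification and classification concrete, here is a minimal Python sketch of one way labeled term fragments could be represented. It rests on several assumptions: the character-offset representation, the `TermSpan` type, and the helpers `find_span` and `to_identification` are hypothetical and do not describe the official data or submission format. The example sentence and its labels are invented for illustration and are not taken from the dataset. Only the three class names come from the task description.

```python
from dataclasses import dataclass

# Class labels as named in the task description.
TERM_CLASSES = ("specific_term", "common_term", "nomen")


@dataclass(frozen=True)
class TermSpan:
    """A labeled text fragment (hypothetical representation).

    Offsets are character positions in the text; `end` is exclusive.
    """
    start: int
    end: int
    label: str

    def __post_init__(self):
        if self.label not in TERM_CLASSES:
            raise ValueError(f"unknown term class: {self.label!r}")
        if not 0 <= self.start < self.end:
            raise ValueError("span must satisfy 0 <= start < end")


def find_span(text, fragment, label):
    """Hypothetical helper: label the first occurrence of `fragment` in `text`."""
    start = text.index(fragment)
    return TermSpan(start, start + len(fragment), label)


def to_identification(spans):
    """Drop the class labels and keep only sorted, unique (start, end) pairs."""
    return sorted({(s.start, s.end) for s in spans})


# Invented example; the labels are illustrative guesses, not gold annotations.
text = ("The dictionary was used for lemmatization of texts "
        "from the Russian National Corpus.")
spans = [
    find_span(text, "lemmatization", "specific_term"),
    find_span(text, "dictionary", "common_term"),
    find_span(text, "Russian National Corpus", "nomen"),
]

# Identification-only view: the same fragments without their classes.
identified = to_identification(spans)
for start, end in identified:
    print(start, end, text[start:end])
```

Under these assumptions, the identification track needs only the `(start, end)` pairs, while the classification tracks also need the label attached to each fragment.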
Data
A manually labeled dataset, CL-RuTerm3, was prepared specifically for the competition. It is based on Russian-language abstracts of articles from the «Dialogue» conference for the period 2000-2023. The training set consists of 850 abstracts from the computational linguistics domain. The test set for Track 1 and Track 2 includes full-text articles from the same domain in addition to abstracts. The test set for Track 3 consists of abstracts from other domains.
Important: the training set contains neither full-text articles nor abstracts from other domains; these are included only in the test set to assess the quality of the models.
Timeline
01.07.2024 — publication of the train data;
01.01.2025 — publication of the test data;
10.01.2025 — Shared Task ends;
01.02.2025 — deadline for paper submissions.