Abstract
The Sense Complexity Dataset (SeCoDa) provides a corpus that is annotated jointly for complexity and word senses. It is thus a valuable resource for both word sense disambiguation and the task of complex word identification. The intention is that this dataset will be used to identify complexity at the level of word senses rather than word tokens. For word sense annotation, SeCoDa uses a hierarchical scheme that is based on information available in the Cambridge Advanced Learner’s Dictionary. This way, we can offer more coarse-grained senses than those directly available in WordNet.
Original language | English |
---|---|
Title of host publication | Proceedings of the 12th Conference on Language Resources and Evaluation |
Subtitle of host publication | LREC 2020 |
Publisher | European Language Resources Association (ELRA) |
Pages | 5962-5967 |
Number of pages | 6 |
Publication status | Published - 11 May 2020 |
Event | LREC 2020: Proceedings of the 12th Conference on Language Resources and Evaluation - Virtual, Marseille, France. Duration: 11 May 2020 → 16 May 2020. https://lrec2020.lrec-conf.org/en/ |
Conference
Conference | LREC 2020 |
---|---|
Abbreviated title | LREC 2020 |
Country/Territory | France |
City | Marseille |
Period | 11/05/20 → 16/05/20 |
Internet address | https://lrec2020.lrec-conf.org/en/ |