SeCoDa: Sense Complexity Dataset

David Strohmaier, Sian Gooding, Shiva Taslimipoor, Ekaterina Kochmar

Research output: Chapter or section in a book/report/conference proceeding › Chapter in a published conference proceeding


The Sense Complexity Dataset (SeCoDa) provides a corpus annotated jointly for complexity and word senses. It is therefore a valuable resource for both word sense disambiguation and complex word identification. The intention is that this dataset will be used to identify complexity at the level of word senses rather than word tokens. For word sense annotation, SeCoDa uses a hierarchical scheme based on information available in the Cambridge Advanced Learner’s Dictionary. This way, we can offer more coarse-grained senses than are directly available in WordNet.
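To illustrate the difference between token-level and sense-level complexity, the sketch below averages complexity scores once per word and once per (word, sense) pair. It is a minimal sketch under stated assumptions: the record layout, the sense ids such as `bank.finance`, the numeric scores, and the function names `complexity_by_token` and `complexity_by_sense` are all hypothetical and do not reproduce SeCoDa's actual file format or annotations.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical annotated tokens: (lemma, coarse sense id, complexity score).
# Invented for illustration only; NOT SeCoDa's real format or data.
ANNOTATIONS = [
    ("bank", "bank.finance", 0.1),
    ("bank", "bank.finance", 0.2),
    ("bank", "bank.river", 0.6),
    ("bank", "bank.river", 0.8),
]


def complexity_by_token(records):
    """Average complexity per word form, ignoring which sense was used."""
    scores = defaultdict(list)
    for lemma, _sense, score in records:
        scores[lemma].append(score)
    return {lemma: mean(vals) for lemma, vals in scores.items()}


def complexity_by_sense(records):
    """Average complexity per (word, sense) pair, keeping senses apart."""
    scores = defaultdict(list)
    for lemma, sense, score in records:
        scores[(lemma, sense)].append(score)
    return {key: mean(vals) for key, vals in scores.items()}


if __name__ == "__main__":
    print(complexity_by_token(ANNOTATIONS))
    print(complexity_by_sense(ANNOTATIONS))
```

With these toy records, the token-level view assigns "bank" a single middling score, whereas the sense-level view separates a less complex financial sense from a more complex riverside sense, which is the distinction the sense-level annotation is meant to capture.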
Original language: English
Title of host publication: Proceedings of the 12th Conference on Language Resources and Evaluation
Subtitle of host publication: LREC 2020
Publisher: European Language Resources Association (ELRA)
Number of pages: 6
Publication status: Published - 11 May 2021
Event: LREC 2020: Proceedings of the 12th Conference on Language Resources and Evaluation - Virtual, Marseille, France
Duration: 11 May 2020 to 16 May 2020


Conference: LREC 2020
Abbreviated title: LREC 2020