SeCoDa: Sense Complexity Dataset

David Strohmaier, Sian Gooding, Shiva Taslimipoor, Ekaterina Kochmar

Research output: Contribution to conference › Paper › peer-review

Abstract

The Sense Complexity Dataset (SeCoDa) provides a corpus that is annotated jointly for complexity and word senses. It thus provides a valuable resource for both word sense disambiguation and the task of complex word identification. The intention is that this dataset will be used to identify complexity at the level of word senses rather than word tokens. For word sense annotation, SeCoDa uses a hierarchical scheme that is based on information available in the Cambridge Advanced Learner’s Dictionary. This way, we can offer more coarse-grained senses than are directly available in WordNet.
Original language: English
Pages: 5962-5967
Publication status: Published - 11 May 2021
Event: LREC 2020: Proceedings of the 12th Conference on Language Resources and Evaluation - Virtual, Marseille, France
Duration: 11 May 2020 → 16 May 2020
https://lrec2020.lrec-conf.org/en/

Conference

Conference: LREC 2020
Abbreviated title: LREC 2020
Country: France
City: Marseille
Period: 11/05/20 → 16/05/20