Abstract
We introduce Universal NER (UNER), an open, community-driven project to develop gold-standard NER benchmarks in many languages. The overarching goal of UNER is to provide high-quality, cross-lingually consistent annotations to facilitate and standardize multilingual NER research. UNER v1 contains 19 datasets annotated with named entities in a cross-lingually consistent schema across 13 diverse languages. In this paper, we detail the dataset creation and composition of UNER; we also provide initial modeling baselines on both in-language and cross-lingual learning settings. We will release the data, code, and fitted models to the public.
Original language | English |
---|---|
Title of host publication | Long Papers |
Editors | Kevin Duh, Helena Gomez, Steven Bethard |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 4322-4337 |
Number of pages | 16 |
ISBN (Electronic) | 9798891761148 |
Publication status | Published - 21 Jun 2024 |
Event | 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024 - Hybrid, Mexico City, Mexico. Duration: 16 Jun 2024 → 21 Jun 2024 |
Publication series
Name | Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024 |
---|---|
Volume | 1 |
Conference
Conference | 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024 |
---|---|
Country/Territory | Mexico |
City | Hybrid, Mexico City |
Period | 16/06/24 → 21/06/24 |
Bibliographical note
Publisher Copyright: © 2024 Association for Computational Linguistics.
Funding
This project could not have happened without the enthusiastic response and hard work of many annotators in the NLP community, and for that we are extremely grateful. Annotators additional to authors are: Elyanah Aco, Ekaterina Artemova, Vuk Batanović, Jay Rhald Caballes Padilla, Chunyuan Deng, Ivo-Pavao Jazbec, Juliane Karlsson, Jozef Kubík, Peter Krantz, Myron Darrel Montefalcon, Stefan Schweter, Sif Sonniks, Emil Stenström, Miriam Šuppová. We would like to thank Joakim Nivre, Dan Zeman, Matthew Honnibal, Željko Agić, Constantine Lignos, and Amir Zeldes for early discussion and helpful ideas at the very beginning of this project. JMI is funded by National University Philippines and the UKRI Centre for Doctoral Training in Accountable, Responsible and Transparent AI [EP/S023437/1] of the University of Bath. Arij Riabi is funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 101021607. Marek Šuppa was partially supported by the grant APVV-21-0114.
Funders | Funder number |
---|---|
National University, Philippines | |
University of Bath | |
UKRI CDT in Accountable, Responsible and Transparent AI | EP/S023437/1 |
Horizon 2020 Framework Programme | 101021607 |
Slovak Research and Development Agency (APVV) | APVV-21-0114 |
Keywords
- cs.CL
ASJC Scopus subject areas
- Computer Networks and Communications
- Hardware and Architecture
- Information Systems
- Software