TY - GEN
T1 - Adapting Cross-lingual Model To Improve Vietnamese Dependency Parsing
AU - Do Tran, Anh Duc
AU - Dinh, Dien
AU - Luong, An-Vinh
AU - Do, Thao
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022/5/19
Y1 - 2022/5/19
AB - Dependency parsing is the task of analyzing the syntax of a sentence into a direct binary relational graph. Many languages have had the state-of-the-art model in this task and used it as a knowledge base to solve more complex problems. However, to achieve high accuracy in a dependency parsing model, it takes significant time and labor to build a large amount of annotated treebanks. For languages with little or no annotated treebanks, some approaches have been studied to induce a dependency parser from treebanks of high-resource languages to solve this problem. In this paper, we propose an approach to building a cross-lingual model to parse Vietnamese as a low-resource target language. The model uses English as a supportive high-resource source language to induce a Vietnamese parser. To remove the differences in syntaxes and lexicons of English and Vietnamese when training the model, the approach uses a filtering algorithm to choose English sentences having syntaxes as same as Vietnamese sentences based on Euclidean distance. The result shows that the proposed model significantly improves accuracy compared with models using only supervised mono-lingual treebanks.
KW - Cross-lingual method
KW - Deep biaffine attention
KW - Dependency parsing
KW - Low-resource language
KW - Transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85130903256&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-97610-1_9
DO - 10.1007/978-3-030-97610-1_9
M3 - Chapter in a published conference proceeding
SN - 9783030976095
T3 - Lecture Notes on Data Engineering and Communications Technologies
SP - 97
EP - 108
BT - Artificial Intelligence in Data and Big Data Processing
A2 - Thanh Dang, Ngoc Hoang
A2 - Zhang, Yu-Dong
A2 - Tavares, João Manuel R. S.
A2 - Chen, Bo-Hao
PB - Springer
CY - Cham, Switzerland
ER -