TY - GEN
T1 - UniMorph 3.0
T2 - 12th International Conference on Language Resources and Evaluation, LREC 2020
AU - McCarthy, Arya D.
AU - Kirov, Christo
AU - Grella, Matteo
AU - Nidhi, Amrit
AU - Xia, Patrick
AU - Gorman, Kyle
AU - Vylomova, Ekaterina
AU - Mielke, Sabrina J.
AU - Nicolai, Garrett
AU - Silfverberg, Miikka
AU - Arkhangelskij, Timofey
AU - Krizhanovsky, Natalya
AU - Krizhanovsky, Andrew
AU - Klyachko, Elena
AU - Sorokin, Alexey
AU - Mansfield, John
AU - Ernštreits, Valts
AU - Pinter, Yuval
AU - Jacobs, Cassandra L.
AU - Cotterell, Ryan
AU - Hulden, Mans
AU - Yarowsky, David
N1 - Funding Information:
We thank Djamé Seddah for important references on syntactic blend annotation. We thank Elizabeth Salesky for help with converting data. We thank Marcel Bollmann, Yova Kementchedjhieva, Matt Post, Richard Sproat, Matthew Wiesner, and Winston Wu for discussions that shaped the direction of the work. Y.P. is a Bloomberg Data Science PhD Fellow.
Publisher Copyright:
© European Language Resources Association (ELRA), licensed under CC-BY-NC
PY - 2020/1/1
Y1 - 2020/1/1
N2 - The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological paradigms for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. We have implemented several improvements to the extraction pipeline which creates most of our data, so that it is both more complete and more correct. We have added 66 new languages, as well as new parts of speech for 12 languages. We have also amended the schema in several ways. Finally, we present three new community tools: two to validate data for resource creators, and one to make morphological data available from the command line. UniMorph is based at the Center for Language and Speech Processing (CLSP) at Johns Hopkins University in Baltimore, Maryland. This paper details advances made to the schema, tooling, and dissemination of project resources since the UniMorph 2.0 release described at LREC 2018.
KW - Lexical database
KW - Morphology
KW - Multilinguality
UR - http://www.scopus.com/inward/record.url?scp=85096511235&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85096511235
T3 - LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings
SP - 3922
EP - 3931
BT - LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings
A2 - Calzolari, Nicoletta
A2 - Bechet, Frederic
A2 - Blache, Philippe
A2 - Choukri, Khalid
A2 - Cieri, Christopher
A2 - Declerck, Thierry
A2 - Goggi, Sara
A2 - Isahara, Hitoshi
A2 - Maegaard, Bente
A2 - Mariani, Joseph
A2 - Mazo, Helene
A2 - Moreno, Asuncion
A2 - Odijk, Jan
A2 - Piperidis, Stelios
PB - European Language Resources Association (ELRA)
Y2 - 11 May 2020 through 16 May 2020
ER -