TY - GEN
T1 - Multilingual Instruction Tuning With Just a Pinch of Multilinguality
AU - Shaham, Uri
AU - Herzig, Jonathan
AU - Aharoni, Roee
AU - Szpektor, Idan
AU - Tsarfaty, Reut
AU - Eyal, Matan
N1 - Publisher Copyright:
© 2024 Association for Computational Linguistics.
PY - 2024/1/1
Y1 - 2024/1/1
AB - As instruction-tuned large language models (LLMs) gain global adoption, their ability to follow instructions in multiple languages becomes increasingly crucial. In this work, we investigate how multilinguality during instruction tuning of a multilingual LLM affects instruction-following across languages from the pre-training corpus. We first show that many languages transfer some instruction-following capabilities to other languages from even monolingual tuning. Furthermore, we find that only 40 multilingual examples integrated in an English tuning set substantially improve multilingual instruction-following, both in seen and unseen languages during tuning. In general, we observe that models tuned on multilingual mixtures exhibit comparable or superior performance in multiple languages compared to monolingually tuned models, despite training on 10x fewer examples in those languages. Finally, we find that diversifying the instruction tuning set with even just 2-4 languages significantly improves cross-lingual generalization. Our results suggest that building massively multilingual instruction-tuned models can be done with only a very small set of multilingual instruction-responses.
UR - http://www.scopus.com/inward/record.url?scp=85204006269&partnerID=8YFLogxK
U2 - 10.18653/v1/2024.findings-acl.136
DO - 10.18653/v1/2024.findings-acl.136
M3 - Conference contribution
AN - SCOPUS:85204006269
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 2304
EP - 2317
BT - Findings of the Association for Computational Linguistics: ACL 2024
A2 - Ku, Lun-Wei
A2 - Martins, André
A2 - Srikumar, Vivek
PB - Association for Computational Linguistics (ACL)
T2 - Findings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Y2 - 11 August 2024 through 16 August 2024
ER -