TY - GEN
T1 - Few-Shot Tabular Data Enrichment Using Fine-Tuned Transformer Architectures
AU - Harari, Asaf
AU - Katz, Gilad
N1 - Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022/1/1
Y1 - 2022/1/1
N2 - The enrichment of tabular datasets using external sources has gained significant attention in recent years. Existing solutions, however, either ignore external unstructured data completely or devise dataset-specific solutions. In this study, we proposed Few-Shot Transformer based Enrichment (FeSTE), a generic and robust framework for the enrichment of tabular datasets using unstructured data. By training over multiple datasets, our approach is able to develop generic models that can be applied to additional datasets with minimal training (i.e., few-shot). Our approach is based on an adaptation of BERT, for which we present a novel fine-tuning approach that reformulates the tuples of the datasets as sentences. Our evaluation, conducted on 17 datasets, shows that FeSTE is able to generate high quality features and significantly outperform existing fine-tuning solutions.
AB - The enrichment of tabular datasets using external sources has gained significant attention in recent years. Existing solutions, however, either ignore external unstructured data completely or devise dataset-specific solutions. In this study, we proposed Few-Shot Transformer based Enrichment (FeSTE), a generic and robust framework for the enrichment of tabular datasets using unstructured data. By training over multiple datasets, our approach is able to develop generic models that can be applied to additional datasets with minimal training (i.e., few-shot). Our approach is based on an adaptation of BERT, for which we present a novel fine-tuning approach that reformulates the tuples of the datasets as sentences. Our evaluation, conducted on 17 datasets, shows that FeSTE is able to generate high quality features and significantly outperform existing fine-tuning solutions.
UR - http://www.scopus.com/inward/record.url?scp=85141646156&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85141646156
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 1577
EP - 1591
BT - ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
A2 - Muresan, Smaranda
A2 - Nakov, Preslav
A2 - Villavicencio, Aline
PB - Association for Computational Linguistics (ACL)
T2 - 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022
Y2 - 22 May 2022 through 27 May 2022
ER -