Ad-hoc document retrieval using weak-supervision with BERT and GPT2

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

15 Scopus citations

Abstract

We describe a weakly supervised method for training deep learning models for ad-hoc document retrieval. Our method is based on generative and discriminative models trained with weak supervision derived solely from the documents in the corpus. We present an end-to-end retrieval system that starts with traditional information retrieval methods, followed by two deep learning re-rankers. We evaluate our method on three datasets: a COVID-19 scientific literature dataset and two news datasets. We show that our method outperforms state-of-the-art methods without the expensive process of manually labeling data.
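
The pipeline shape named in the abstract (a traditional first-stage retriever followed by two re-rankers) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the term-overlap scorer merely stands in for a traditional information retrieval method, and `coverage` and `brevity` are hypothetical placeholder scorers occupying the slots where the paper uses trained deep learning models. All function names and cutoffs are assumptions made for the example.

```python
from collections import Counter
from typing import Callable, List, Tuple

# A scorer maps (query, document) to a relevance score; higher is better.
Scorer = Callable[[str, str], float]


def term_overlap(query: str, doc: str) -> float:
    """Stand-in for a traditional IR scorer: counts query-term occurrences."""
    doc_terms = Counter(doc.lower().split())
    return float(sum(doc_terms[t] for t in set(query.lower().split())))


def coverage(query: str, doc: str) -> float:
    """Hypothetical re-ranker 1: fraction of distinct query terms in the doc."""
    terms = set(query.lower().split())
    return len(terms & set(doc.lower().split())) / len(terms) if terms else 0.0


def brevity(query: str, doc: str) -> float:
    """Hypothetical re-ranker 2: prefers shorter documents."""
    return -float(len(doc.split()))


def top_k(query: str, docs: List[str], scorer: Scorer, k: int) -> List[str]:
    """Sort docs by descending score and keep the first k."""
    return sorted(docs, key=lambda d: scorer(query, d), reverse=True)[:k]


def retrieve(query: str, corpus: List[str], k_first: int,
             rerankers: List[Tuple[Scorer, int]]) -> List[str]:
    """First-stage retrieval, then each re-ranker narrows the candidate list."""
    candidates = top_k(query, corpus, term_overlap, k_first)
    for scorer, k in rerankers:
        candidates = top_k(query, candidates, scorer, k)
    return candidates


corpus = [
    "covid vaccine trial results",
    "covid covid covid unrelated noise about sports and weather",
    "stock market news",
    "vaccine trial for covid shows strong results in adults",
]
result = retrieve("covid vaccine", corpus, k_first=3,
                  rerankers=[(coverage, 2), (brevity, 1)])
```

The point of the cascade is cost: the cheap first stage prunes the corpus so that each later, more expensive scorer only sees a short candidate list.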

Original language: English
Title of host publication: EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 4191-4197
Number of pages: 7
ISBN (Electronic): 9781952148606
State: Published - 1 Jan 2020
Externally published: Yes
Event: 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020 - Virtual, Online
Duration: 16 Nov 2020 to 20 Nov 2020

Publication series

Name: EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

Conference

Conference: 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020
City: Virtual, Online
Period: 16/11/20 to 20/11/20

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
