Summarization of financial documents with TF-IDF weighting of multi-word terms

Sophie Krimberg, Natalia Vanetik, Marina Litvak

Research output: Contribution to conference › Paper › peer-review

5 Scopus citations

Abstract

Financial documents, such as corporate annual reports, are usually very long and may consist of more than 100 pages. Every report is divided into thematic sections or statements that have an inner structure and include special financial terms and numbers. This paper describes an approach for summarizing financial documents based on a Bag-of-Words (BOW) document representation. The suggested solution first calculates the Term Frequency-Inverse Document Frequency (TF-IDF) weights for all single-word and multi-word expressions in the corpus, then finds the sequence of words with the maximum total weight in each document. The solution is designed to meet the requirements of the Financial Narrative Summarization (FNS 2021) shared task and has been tested on the FNS 2021 shared-task dataset.
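The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the tokenization, the TF-IDF variant, the maximum length of multi-word terms, or how the maximum-weight word sequence is searched for. The sketch assumes pre-tokenized documents, terms of one or two words, a plain `tf * log(N / df)` weighting, and a fixed-length contiguous window as the "sequence of words". All function names and parameters (`ngrams`, `tfidf_weights`, `best_window`, `max_n`, `window`) are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, max_n=2):
    """Return all contiguous n-grams (as tuples) for n = 1..max_n."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf_weights(docs, max_n=2):
    """Return one dict per document mapping each n-gram to its TF-IDF weight.

    Assumption: tf is the relative frequency of the term in the document and
    idf is log(N / df); the paper may use a different variant.
    """
    term_lists = [ngrams(doc, max_n) for doc in docs]
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for terms in term_lists for term in set(terms))
    n_docs = len(docs)
    weights = []
    for terms in term_lists:
        tf = Counter(terms)
        total = len(terms) or 1  # guard against empty documents
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


def best_window(tokens, weights, window=5, max_n=2):
    """Return the contiguous window of `window` tokens with maximum total weight.

    Assumption: the score of a window is the sum of the weights of all
    single-word and multi-word terms it contains; ties keep the earliest window.
    """
    if len(tokens) <= window:
        return list(tokens)
    best_start, best_score = 0, float("-inf")
    for start in range(len(tokens) - window + 1):
        chunk = tokens[start:start + window]
        score = sum(weights.get(t, 0.0) for t in ngrams(chunk, max_n))
        if score > best_score:
            best_start, best_score = start, score
    return tokens[best_start:best_start + window]


# Toy corpus: "the" occurs in both documents, so its weight is zero.
docs = [
    ["the", "net", "income", "rose"],
    ["the", "board", "met", "today"],
]
weights = tfidf_weights(docs)
print(best_window(docs[0], weights[0], window=2))  # prints ['net', 'income']
```

In this toy corpus the window "net income" wins because every term it contains is specific to the first document, whereas any window that includes "the" picks up a zero-weight term. A real summarizer for 100-page reports would need a far larger window and a more efficient search than this brute-force scan.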

Original language: English
Pages: 75-80
Number of pages: 6
State: Published - 1 Jan 2021
Externally published: Yes
Event: 3rd Financial Narrative Processing Workshop, FNP 2021 - Lancaster, United Kingdom
Duration: 15 Sep 2021 - 16 Sep 2021

Conference

Conference: 3rd Financial Narrative Processing Workshop, FNP 2021
Country/Territory: United Kingdom
City: Lancaster
Period: 15/09/21 - 16/09/21

ASJC Scopus subject areas

  • Artificial Intelligence
  • Business, Management and Accounting (miscellaneous)
  • Finance
