Automatic generation of composite image descriptions

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Automatic generation of natural language descriptions for images has recently become an important research topic. In this paper, we propose a frame-based algorithm for generating a composite natural language description for a given image. The goal of this algorithm is to describe not only the objects appearing in the image but also the main activities happening in the image and the objects participating in those activities. The algorithm builds upon a pre-trained CRF (Conditional Random Field)-based structured prediction model, which generates a set of alternative frames for a given image. We use imSitu, a situation recognition dataset with 126,102 images, 504 activities, 11,538 objects, and 1,788 roles, as a test bed for our algorithm. We ask human evaluators to rate the quality of the descriptions for 20 images from the imSitu dataset. The results demonstrate that our composite description contains on average 16% more visual elements than the baseline method and receives a significantly higher accuracy score from the human evaluators.
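To make the idea of combining alternative frames more concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes each frame is an activity (verb) with role-to-object fillers and a confidence score, and it merges the highest-scoring frames into one description. The names `Frame`, `describe_frame`, `composite_description`, `score`, and `top_k`, as well as the example role names, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One hypothetical situation frame: an activity plus role -> object fillers."""
    verb: str
    roles: dict = field(default_factory=dict)
    score: float = 0.0  # assumed model confidence; higher is better


def describe_frame(frame):
    """Render one frame as a clause such as 'riding (agent: man, vehicle: horse)'."""
    if not frame.roles:
        return frame.verb
    fillers = ", ".join(f"{role}: {obj}" for role, obj in frame.roles.items())
    return f"{frame.verb} ({fillers})"


def composite_description(frames, top_k=2):
    """Combine the top_k highest-scoring frames into a single description."""
    ranked = sorted(frames, key=lambda f: f.score, reverse=True)[:top_k]
    # Collect distinct objects in first-seen order across the selected frames.
    objects = []
    for frame in ranked:
        for obj in frame.roles.values():
            if obj not in objects:
                objects.append(obj)
    activities = "; ".join(describe_frame(f) for f in ranked)
    return f"Objects: {', '.join(objects)}. Activities: {activities}."


if __name__ == "__main__":
    # Illustrative frames only; these are not taken from imSitu.
    frames = [
        Frame("riding", {"agent": "man", "vehicle": "horse"}, 0.9),
        Frame("jumping", {"agent": "horse", "obstacle": "fence"}, 0.7),
        Frame("sleeping", {"agent": "dog"}, 0.1),
    ]
    print(composite_description(frames))
```

In this sketch the composite output lists both the objects and the activities they take part in, which mirrors the stated goal of the description; how the actual method selects and verbalizes frames is not specified in the abstract.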

Original language: English
Title of host publication: ICNC-FSKD 2017 - 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery
Editors: Liang Zhao, Lipo Wang, Guoyong Cai, Kenli Li, Yong Liu, Guoqing Xiao
Publisher: Institute of Electrical and Electronics Engineers
Pages: 2612-2618
Number of pages: 7
ISBN (Electronic): 9781538621653
State: Published - 21 Jun 2018
Event: 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery, ICNC-FSKD 2017 - Guilin, Guangxi, China
Duration: 29 Jul 2017 to 31 Jul 2017

Publication series

Name: ICNC-FSKD 2017 - 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery

Conference

Conference: 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery, ICNC-FSKD 2017
Country/Territory: China
City: Guilin, Guangxi
Period: 29/07/17 to 31/07/17

Keywords

  • composite image descriptions
  • frames
  • natural language processing

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Information Systems
  • Information Systems and Management
  • Logic
  • Modeling and Simulation
  • Statistics and Probability
