
REFVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation

  • Aviv Slobodkin
  • Hagai Taitelbaum
  • Yonatan Bitton
  • Brian Gordon
  • Michal Sokolik
  • Nitzan Bitton Guetta
  • Almog Gueta
  • Royi Rassin
  • Dani Lischinski
  • Idan Szpektor

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    Abstract

    Subject-driven text-to-image (T2I) generation aims to produce images that align with a given textual description, while preserving the visual identity from a referenced subject image. Despite its broad downstream applicability (ranging from enhanced personalization in image generation to consistent character representation in video rendering), progress in this field is limited by the lack of reliable automatic evaluation. Existing methods either assess only one aspect of the task (i.e., textual alignment or subject preservation), misalign with human judgments, or rely on costly API-based evaluation. To address this gap, we introduce REFVNLI, a cost-effective metric that evaluates both textual alignment and subject preservation in a single run. Trained on a large-scale dataset derived from video-reasoning benchmarks and image perturbations, REFVNLI outperforms or statistically matches existing baselines across multiple benchmarks and subject categories (e.g., Animal, Object), achieving up to 6.4-point gains in textual alignment and 5.9-point gains in subject preservation.

    Original language: English
    Title of host publication: EMNLP 2025 - 2025 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2025
    Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 8420-8438
    Number of pages: 19
    ISBN (Electronic): 9798891763357
    DOIs
    State: Published - 1 Jan 2025
    Event: 30th Conference on Empirical Methods in Natural Language Processing, EMNLP 2025 - Suzhou, China
    Duration: 4 Nov 2025 - 9 Nov 2025

    Publication series

    Name: EMNLP 2025 - 2025 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2025

    Conference

    Conference: 30th Conference on Empirical Methods in Natural Language Processing, EMNLP 2025
    Country/Territory: China
    City: Suzhou
    Period: 4/11/25 - 9/11/25

    ASJC Scopus subject areas

    • Computational Theory and Mathematics
    • Computer Science Applications
    • Information Systems
    • Linguistics and Language
