Visual Editing with LLM-based Tool Chaining: An Efficient Distillation Approach for Real-Time Applications

Oren Sultan, Alex Khasin, Guy Shiran, Asnat Greenstein-Messica, Dafna Shahaf

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

We present a practical distillation approach to fine-tune LLMs for invoking tools in real-time applications. We focus on visual editing tasks; specifically, we modify images and videos by interpreting user stylistic requests, specified in natural language (“golden hour”), using an LLM to select the appropriate tools and their parameters to achieve the desired visual effect. We found that proprietary LLMs such as GPT-3.5-Turbo show potential in this task, but their high cost and latency make them unsuitable for real-time applications. In our approach, we fine-tune a (smaller) student LLM with guidance from a (larger) teacher LLM and behavioral signals. We introduce offline metrics to evaluate student LLMs. Both online and offline experiments show that our student models succeeded in matching the performance of our teacher model (GPT-3.5-Turbo), significantly reducing costs and latency. Lastly, we show that fine-tuning was improved by 25% in low-data regimes using augmentation.

Original language: English
Title of host publication: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Industry Track
Editors: Franck Dernoncourt, Daniel Preotiuc-Pietro, Anastasia Shimorina
Publisher: Association for Computational Linguistics (ACL)
Pages: 1286-1304
Number of pages: 19
ISBN (Electronic): 9798891761667
State: Published - 1 Jan 2024
Externally published: Yes
Event: 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, EMNLP 2024 - Miami, United States
Duration: 12 Nov 2024 to 16 Nov 2024

Publication series

Name: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Industry Track

Conference

Conference: 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, EMNLP 2024
Country/Territory: United States
City: Miami
Period: 12/11/24 to 16/11/24

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems
  • Linguistics and Language
