Invited Paper: Common Public Knowledge for Enhancing Machine Learning Data Sets

Shlomi Dolev, Arnon Ilani

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Scopus citation

Abstract

In this study, we show the advantages of incorporating multi-source knowledge from publicly available sources, such as ChatGPT and Wikipedia, into existing datasets to enhance the performance of machine learning models on routine tasks such as classification. Specifically, we propose supplementing datasets with data from external sources and demonstrate the utility of widely accessible knowledge in the context of the Forest Cover Type Prediction task launched by the Roosevelt National Forest of Northern Colorado. We also show an improvement in classification accuracy on the Isolated Letter Speech Recognition dataset when information on regional accents is incorporated into the prediction of spoken English letter names.
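The general idea of supplementing a dataset with external knowledge can be sketched as a lookup join: each record carries a categorical attribute, and a table compiled from a public source contributes extra feature columns keyed by that attribute. The sketch below is only an illustration under stated assumptions, not the authors' method. The function `enrich`, the tables `EXTERNAL_KNOWLEDGE` and `DEFAULT_KNOWLEDGE`, the attribute names, and all values are hypothetical placeholders rather than data from the paper.

```python
# Hypothetical illustration: every name and value here is an invented placeholder,
# not data or code from the paper.

# Externally sourced knowledge keyed by a categorical attribute already in the dataset.
# In practice such a table might be compiled from a public source such as Wikipedia.
EXTERNAL_KNOWLEDGE = {
    "soil_a": {"drainage": 0.8, "rocky": 1.0},
    "soil_b": {"drainage": 0.3, "rocky": 0.0},
}

# Fallback values for keys the external source does not cover (an assumption).
DEFAULT_KNOWLEDGE = {"drainage": 0.5, "rocky": 0.0}


def enrich(rows, key, knowledge, default):
    """Return new rows with external-knowledge columns appended to each row."""
    enriched = []
    for row in rows:
        extra = knowledge.get(row[key], default)
        # Prefix the new columns so they cannot collide with original features.
        prefixed = {f"ext_{name}": value for name, value in extra.items()}
        enriched.append({**row, **prefixed})
    return enriched


rows = [
    {"elevation": 2800, "soil": "soil_a"},
    {"elevation": 3100, "soil": "soil_b"},
    {"elevation": 2950, "soil": "soil_c"},  # not covered by the external table
]

enriched_rows = enrich(rows, "soil", EXTERNAL_KNOWLEDGE, DEFAULT_KNOWLEDGE)
for r in enriched_rows:
    print(r)
```

The enriched rows could then be passed to any classifier in place of the original rows. The original records are left unmodified, so a model trained on each version can be compared directly.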

Original language: English
Title of host publication: Proceedings of the 5th Workshop on Advanced Tools, Programming Languages, and PLatforms for Implementing and Evaluating Algorithms for Distributed Systems, ApPLIED 2023
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9798400701283
State: Published - 19 Jun 2023
Event: 5th Workshop on Advanced Tools, Programming Languages, and PLatforms for Implementing and Evaluating Algorithms for Distributed Systems, ApPLIED 2023 - Orlando, United States
Duration: 19 Jun 2023 → …

Publication series

Name: Proceedings of the 5th Workshop on Advanced Tools, Programming Languages, and PLatforms for Implementing and Evaluating Algorithms for Distributed Systems, ApPLIED 2023

Conference

Conference: 5th Workshop on Advanced Tools, Programming Languages, and PLatforms for Implementing and Evaluating Algorithms for Distributed Systems, ApPLIED 2023
Country/Territory: United States
City: Orlando
Period: 19/06/23 → …

Keywords

  • ChatGPT
  • Feature engineering
  • Forest management
  • Isolated letter
  • Machine learning
  • Ontology
  • Random forests
  • Speech recognition
  • Tree cover type
  • World knowledge

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computational Theory and Mathematics
  • Theoretical Computer Science
