Optimizing Cloud Data Lakes Queries

Grisha Weintraub

Research output: Contribution to journal › Conference article › peer-review

Abstract

Cloud data lakes have emerged as an inexpensive solution for storing very large amounts of data. The main idea is the separation of the compute and storage layers: cheap cloud storage holds the data, while compute engines run analytics on it in “on-demand” mode. However, to perform any computation in this architecture, the data must be moved from the storage layer to the compute layer over the network for each calculation, which hurts performance and requires huge network bandwidth. Our research focuses on three related topics: (1) identifying the key challenges to improving query performance in cloud data lakes, (2) providing a theoretical model that formally defines the problem of poor query performance in cloud data lakes, and (3) designing a practical solution to the problem and demonstrating its efficiency via a large-scale experimental evaluation.
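
As a rough illustration of the per-query data movement cost described in the abstract, the following is a minimal back-of-the-envelope sketch. It is not the model proposed in the paper. The function names, the 1 TB dataset size, and the 10 Gbit/s link speed are hypothetical values chosen only for the arithmetic, and the sketch assumes every query transfers the full dataset at a constant bandwidth.

```python
# Back-of-the-envelope estimate of network transfer time when every query
# has to pull its input from the storage layer to the compute layer.
# All names and numbers here are illustrative assumptions, not from the paper.

def transfer_seconds(dataset_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to move dataset_bytes over a link with the given bandwidth."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return dataset_bytes / bandwidth_bytes_per_s


def total_transfer_seconds(dataset_bytes: float,
                           bandwidth_bytes_per_s: float,
                           num_queries: int) -> float:
    """Total transfer time if each query moves the full dataset again."""
    return num_queries * transfer_seconds(dataset_bytes, bandwidth_bytes_per_s)


TB = 10**12                   # hypothetical dataset size: 1 TB
LINK = 1.25e9                 # hypothetical link: 10 Gbit/s = 1.25e9 bytes/s

per_query = transfer_seconds(TB, LINK)               # 800 seconds per query
ten_queries = total_transfer_seconds(TB, LINK, 10)   # 8000 seconds in total
```

Under these assumptions the transfer alone takes over 13 minutes per query, before any computation starts, and the cost grows linearly with the number of queries.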

Original language: English
Pages (from-to): 13-16
Number of pages: 4
Journal: CEUR Workshop Proceedings
Volume: 3452
State: Published - 1 Jan 2023
Event: 49th International Conference on Very Large Data Bases PhD Workshop, VLDB-PhD Workshop 2023 - Vancouver, Canada
Duration: 28 Aug 2023 → …

Keywords

  • cloud storage
  • data lakes
  • query optimization

ASJC Scopus subject areas

  • Computer Science (all)
