Indexing cloud data lakes within the lakes

Grisha Weintraub, Ehud Gudes, Shlomi Dolev

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Cloud data lakes are a modern approach to storing large amounts of data conveniently and inexpensively. The main idea is the separation of the compute and storage layers. However, to perform analytics on the data in this architecture, the data must be moved from the storage layer to the compute layer over the network for each calculation. This hurts calculation performance and requires substantial network bandwidth. We are exploring different approaches for adding indexing to cloud data lakes, with the goal of reducing the amount of data read from storage and, as a result, improving query execution time.
Original language: English
Title of host publication: SYSTOR '21: The 14th ACM International Systems and Storage Conference, Haifa, Israel, June 14-16, 2021
Editors: Bruno Wassermann, Michal Malka, Vijay Chidambaram, Danny Raz
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450383981
State: Published - 2021
Event: 14th ACM International Conference on Systems and Storage, SYSTOR 2021 - Virtual, Online, Israel
Duration: 14 Jun 2021 → 16 Jun 2021

Conference

Conference: 14th ACM International Conference on Systems and Storage, SYSTOR 2021
Country/Territory: Israel
City: Virtual, Online
Period: 14/06/21 → 16/06/21

Keywords

  • engineering
  • Electrical and Electronic Engineering
  • Computer Science Applications
  • Hardware and Architecture
  • Software
