Indexing cloud data lakes within the lakes

Grisha Weintraub, Ehud Gudes, Shlomi Dolev

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

2 Scopus citations

Abstract

Cloud data lakes are a modern approach for storing large amounts of data in a convenient and inexpensive way. The main idea is the separation of compute and storage layers. However, to perform analytics on the data in this architecture, the data should be moved from the storage layer to the compute layer over the network for each calculation. Obviously, that hurts calculation performance and requires huge network bandwidth. We are exploring different approaches for adding indexing to the cloud data lakes with the goal of reducing the amounts of data read from the storage, and as a result, improving query execution time.

Original language: English
Title of host publication: SYSTOR 2021 - Proceedings of the 14th ACM International Conference on Systems and Storage
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450383981
State: Published - 14 Jun 2021
Event: 14th ACM International Conference on Systems and Storage, SYSTOR 2021 - Virtual, Online, Israel
Duration: 14 Jun 2021 to 16 Jun 2021

Publication series

Name: SYSTOR 2021 - Proceedings of the 14th ACM International Conference on Systems and Storage

Conference

Conference: 14th ACM International Conference on Systems and Storage, SYSTOR 2021
Country/Territory: Israel
City: Virtual, Online
Period: 14/06/21 to 16/06/21

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Computer Science Applications
  • Hardware and Architecture
  • Software
