TY - GEN
T1 - Indexing cloud data lakes within the lakes
AU - Weintraub, Grisha
AU - Gudes, Ehud
AU - Dolev, Shlomi
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/6/14
Y1 - 2021/6/14
AB - Cloud data lakes are a modern approach for storing large amounts of data in a convenient and inexpensive way. The main idea is the separation of compute and storage layers. However, to perform analytics on the data in this architecture, the data should be moved from the storage layer to the compute layer over the network for each calculation. Obviously, that hurts calculation performance and requires huge network bandwidth. We are exploring different approaches for adding indexing to the cloud data lakes with the goal of reducing the amounts of data read from the storage, and as a result, improving query execution time.
UR - http://www.scopus.com/inward/record.url?scp=85108411706&partnerID=8YFLogxK
U2 - 10.1145/3456727.3463828
DO - 10.1145/3456727.3463828
M3 - Conference contribution
AN - SCOPUS:85108411706
T3 - SYSTOR 2021 - Proceedings of the 14th ACM International Conference on Systems and Storage
BT - SYSTOR 2021 - Proceedings of the 14th ACM International Conference on Systems and Storage
PB - Association for Computing Machinery, Inc
T2 - 14th ACM International Conference on Systems and Storage, SYSTOR 2021
Y2 - 14 June 2021 through 16 June 2021
ER -