TY - GEN
T1 - Meta-X
T2 - 5th International Symposium on Cyber Security Cryptography and Machine Learning, CSCML 2021
AU - Afrati, Foto
AU - Dolev, Shlomi
AU - Sharma, Shantanu
AU - Ullman, Jeffrey D.
N1 - Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021/1/1
Y1 - 2021/1/1
AB - Computations such as syndromic surveillance and e-commerce are executed over datasets collected from different geographical locations. Modern data processing systems, such as MapReduce/Hadoop and Spark, require the data from these locations to be collected at a single global location before an application is executed, and thus incur a significant communication cost. While MapReduce/Hadoop and Spark have proven to be the most useful paradigms in the distributed computing revolution, federating cloud and big-data activities remains a challenge: data processing should be modified to avoid migrating (big) data across remote (cloud) sites. This is exactly the scope of our work, in which only the data essential for obtaining the final result is transmitted, reducing communication and processing and preserving data privacy as much as possible. In this work, we propose Meta-X, an algorithmic technique for geographically distributed computations that decreases the communication cost by processing and moving metadata, instead of entire datasets, among different locations. We illustrate the usefulness of Meta-X through MapReduce computations for operations such as equijoin, k-nearest-neighbor finding, and shortest-path finding.
KW - Hadoop
KW - MapReduce
KW - Spark
UR - http://www.scopus.com/inward/record.url?scp=85111975950&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-78086-9_34
DO - 10.1007/978-3-030-78086-9_34
M3 - Conference contribution
AN - SCOPUS:85111975950
SN - 9783030780852
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 467
EP - 486
BT - Cyber Security Cryptography and Machine Learning - 5th International Symposium, CSCML 2021, Proceedings
A2 - Dolev, Shlomi
A2 - Margalit, Oded
A2 - Pinkas, Benny
A2 - Schwarzmann, Alexander
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 8 July 2021 through 9 July 2021
ER -