TY - GEN
T1 - Approximating Aggregated SQL Queries with LSTM Networks
AU - Regev, Nir
AU - Rokach, Lior
AU - Shabtai, Asaf
N1 - Funding Information:
We want to thank Sisense Ltd. for hosting and supporting this research and providing all of the required resources.
Publisher Copyright:
© 2021 IEEE.
PY - 2021/9/20
Y1 - 2021/9/20
N2 - Despite continuous investments in data technologies, the latency of querying data still poses a significant challenge. Modern analytic solutions require near real-time responsiveness both to make them interactive and to support automated processing. Current technologies (Hadoop, Spark, Dataflow) scan the dataset to execute queries and focus on providing scalable data storage and in-memory concurrent data processing to maximize task execution speed. We argue that these solutions fail to offer an adequate level of interactivity, since they depend on continual access to data. In this paper, we present a method for query approximation, also known as approximate query processing (AQP), that reduces the need to scan data during inference (query calculation), thus enabling a rapid query processing tool. We use an LSTM network to learn the relationship between queries and their results, and to provide a rapid inference layer for the prediction of query results. Our method (referred to as 'Hunch') produces a lightweight LSTM network which provides high query throughput. We evaluated our method using 12 datasets and compared it to state-of-the-art AQP engines (VerdictDB, BlinkDB) in terms of the query latency, model weight, and accuracy. The results show that our method predicted query results with a normalized root mean squared error (NRMSE) ranging from approximately 1% to 4%, which, for the majority of our datasets, was better than the results of the benchmarks. Moreover, our method was able to predict up to 120,000 queries in a second (streamed together) and with a single query latency of no more than 2 ms.
AB - Despite continuous investments in data technologies, the latency of querying data still poses a significant challenge. Modern analytic solutions require near real-time responsiveness both to make them interactive and to support automated processing. Current technologies (Hadoop, Spark, Dataflow) scan the dataset to execute queries and focus on providing scalable data storage and in-memory concurrent data processing to maximize task execution speed. We argue that these solutions fail to offer an adequate level of interactivity, since they depend on continual access to data. In this paper, we present a method for query approximation, also known as approximate query processing (AQP), that reduces the need to scan data during inference (query calculation), thus enabling a rapid query processing tool. We use an LSTM network to learn the relationship between queries and their results, and to provide a rapid inference layer for the prediction of query results. Our method (referred to as 'Hunch') produces a lightweight LSTM network which provides high query throughput. We evaluated our method using 12 datasets and compared it to state-of-the-art AQP engines (VerdictDB, BlinkDB) in terms of the query latency, model weight, and accuracy. The results show that our method predicted query results with a normalized root mean squared error (NRMSE) ranging from approximately 1% to 4%, which, for the majority of our datasets, was better than the results of the benchmarks. Moreover, our method was able to predict up to 120,000 queries in a second (streamed together) and with a single query latency of no more than 2 ms.
KW - Approximate query processing (AQP)
KW - LSTM
KW - SQL
KW - Supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85116434846&partnerID=8YFLogxK
U2 - 10.1109/IJCNN52387.2021.9533974
DO - 10.1109/IJCNN52387.2021.9533974
M3 - Conference contribution
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - IJCNN 2021 - International Joint Conference on Neural Networks, Proceedings
PB - Institute of Electrical and Electronics Engineers
T2 - 2021 International Joint Conference on Neural Networks, IJCNN 2021
Y2 - 18 July 2021 through 22 July 2021
ER -