TY - GEN
T1 - Improving constrained search results by data melioration
AU - Guy, Ido
AU - Milo, Tova
AU - Novgorodov, Slava
AU - Youngmann, Brit
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/4/1
Y1 - 2021/4/1
AB - The problem of finding an item-set of maximal aggregated utility that satisfies a set of constraints is a cornerstone of many search applications. Its classical definition assumes that all the information needed to verify the constraints is explicitly given. However, in real-world databases, the data available on items is often partial. Hence, adequately answering constrained search queries requires the completion of this missing information. A common approach to completing missing data is to employ Machine Learning (ML)-based inference. However, such methods are naturally error-prone. More accurate data can be obtained by asking humans to complete missing information, but as the number of items in the repository is vast, limiting human effort is crucial. To this end, we introduce the Probabilistic Constrained Search (PCS) problem, which identifies a bounded-size item-set whose data completion is likely to be highly beneficial, as these items are expected to belong to the result set of the constrained search queries in question. We prove PCS to be hard to approximate, and consequently propose a best-effort PTIME heuristic to solve it. We demonstrate the effectiveness and efficiency of our algorithm over real-world datasets and scenarios, showing that our algorithm significantly improves the result sets of constrained search queries, in terms of both utility and constraint satisfaction probability.
UR - http://www.scopus.com/inward/record.url?scp=85112867708&partnerID=8YFLogxK
U2 - 10.1109/ICDE51399.2021.00147
DO - 10.1109/ICDE51399.2021.00147
M3 - Conference contribution
AN - SCOPUS:85112867708
T3 - Proceedings - International Conference on Data Engineering
SP - 1667
EP - 1678
BT - Proceedings - 2021 IEEE 37th International Conference on Data Engineering, ICDE 2021
PB - Institute of Electrical and Electronics Engineers
T2 - 37th IEEE International Conference on Data Engineering, ICDE 2021
Y2 - 19 April 2021 through 22 April 2021
ER -