TY - GEN
T1 - Imputation of Missing Boarding Stop Information in Smart Card Data with Machine Learning Methods
AU - Shalit, Nadav
AU - Fire, Michael
AU - Ben-Elia, Eran
N1 - Publisher Copyright: © 2020, Springer Nature Switzerland AG.
PY - 2020/1/1
Y1 - 2020/1/1
AB - With the increase in population densities and environmental awareness, public transport has become an important aspect of urban life. Consequently, large quantities of transportation data are generated, and mining data from smart card use has become a standard method for understanding the travel habits of passengers. The increase in available data and computational power demands more sophisticated methods to analyze big data. Public transport datasets, however, often lack data integrity. Boarding stop information may be missing due to either imperfect acquisition processes or inadequate reporting. As a result, large quantities of observations, and even complete sections of cities, might be absent from the smart card database. We have developed a supervised machine learning method to impute missing boarding stops based on ordinal classification. In addition, we present a new metric, Pareto Accuracy, to evaluate algorithms where classes have an ordinal nature. Results are based on a case study in the city of Beer Sheva utilizing one month of data. We show that our proposed method significantly outperforms schedule-based imputation methods and can improve the accuracy and usefulness of large-scale transportation data. The implications for data imputation of smart card information are further discussed.
KW - Boarding stop imputation
KW - Machine learning
KW - Smart card
UR - http://www.scopus.com/inward/record.url?scp=85097428137&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-62362-3_3
DO - 10.1007/978-3-030-62362-3_3
M3 - Conference contribution
AN - SCOPUS:85097428137
SN - 9783030623616
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 17
EP - 27
BT - Intelligent Data Engineering and Automated Learning – IDEAL 2020 - 21st International Conference, 2020, Proceedings
A2 - Analide, Cesar
A2 - Novais, Paulo
A2 - Camacho, David
A2 - Yin, Hujun
PB - Springer Science and Business Media Deutschland GmbH
T2 - 21st International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2020
Y2 - 4 November 2020 through 6 November 2020
ER -