TY - JOUR
T1 - Predicting venue popularity using crowd-sourced and passive sensor data
AU - Timokhin, Stanislav
AU - Sadrani, Mohammad
AU - Antoniou, Constantinos
N1 - Publisher Copyright:
© 2020 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2020/9
Y1 - 2020/9
N2 - Efficient and reliable mobility pattern identification is essential for transport planning research. In order to infer mobility patterns, however, a large amount of spatiotemporal data is needed, which is not always available. Hence, location-based social networks (LBSNs) have received considerable attention as a potential data provider. The aim of this study is to investigate the possibility of using several different auxiliary information sources for venue popularity modeling and to provide an alternative approach to measuring venue popularity. Initially, data from widely used services, such as Google Maps, Yelp and OpenStreetMap (OSM), are used to model venue popularity. To estimate hourly venue occupancy, two classes of model are used: linear regression with lasso regularization and gradient boosted regression (GBR). The predictions are based on venue-related parameters (e.g., rating, comments) and locational properties (e.g., stores, hotels, attractions). Results show that prediction improves when GBR is combined with a logarithmic transformation of the dependent variables. To assess the quality of the social media-based models against WiFi-based ground truth data, a microcontroller setup is developed that measures the actual number of people attending venues using WiFi presence detection; the similarity between the WiFi data collection results and Google “Popular Times” is relatively promising.
AB - Efficient and reliable mobility pattern identification is essential for transport planning research. In order to infer mobility patterns, however, a large amount of spatiotemporal data is needed, which is not always available. Hence, location-based social networks (LBSNs) have received considerable attention as a potential data provider. The aim of this study is to investigate the possibility of using several different auxiliary information sources for venue popularity modeling and to provide an alternative approach to measuring venue popularity. Initially, data from widely used services, such as Google Maps, Yelp and OpenStreetMap (OSM), are used to model venue popularity. To estimate hourly venue occupancy, two classes of model are used: linear regression with lasso regularization and gradient boosted regression (GBR). The predictions are based on venue-related parameters (e.g., rating, comments) and locational properties (e.g., stores, hotels, attractions). Results show that prediction improves when GBR is combined with a logarithmic transformation of the dependent variables. To assess the quality of the social media-based models against WiFi-based ground truth data, a microcontroller setup is developed that measures the actual number of people attending venues using WiFi presence detection; the similarity between the WiFi data collection results and Google “Popular Times” is relatively promising.
KW - Big data
KW - Google popular times
KW - Mobility pattern
KW - Venue popularity
KW - WiFi data collection
UR - http://www.scopus.com/inward/record.url?scp=85104249499&partnerID=8YFLogxK
U2 - 10.3390/smartcities3030042
DO - 10.3390/smartcities3030042
M3 - Article
AN - SCOPUS:85104249499
SN - 2624-6511
VL - 3
SP - 818
EP - 841
JO - Smart Cities
JF - Smart Cities
IS - 3
ER -