Title
From Raw Footprints to Personal Interests: Bridging the Semantic Gap via Trip Intention Aggregation
Abstract
User-generated trajectories (UGT), such as GPS footprints from wearable devices or travel records from bus companies, capture rich information about human mobility and urban dynamics in the offline world. In this paper, our objective is to enrich these raw footprints and discover users' personal interests by utilizing the semantic information contained in the spatial- and temporal-aware user-generated content (STUGC) published in the online world. We design a novel probabilistic framework named CO² that connects the offline world with the online world in order to discover users' interests directly from their raw footprints in UGT. In particular, we first propose a latent probabilistic generative model named STLDA to infer the intention attached to each trip, and then aggregate the extracted trip intentions to discover the users' personal interests. To tackle the inherent sparsity and noise of the tags in STUGC, STLDA considers the inner correlation between tags (i.e., semantic, spatial and temporal correlation) at the topic level. To evaluate the effectiveness of CO², we use a dataset of 5.3 billion bus records spanning three months and a Twitter dataset of 1.5 million tweets published over six months in Singapore as a case study. Experimental results on these two real-world datasets show that CO² is effective in discovering user interests and improves the precision of the state-of-the-art method by 280%. In addition, we conduct a questionnaire survey in Singapore to evaluate the effectiveness of CO². The results further validate the superiority of CO².
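The second stage described in the abstract, aggregating per-trip intentions into a per-user interest profile, can be illustrated with a minimal sketch. It assumes each trip has already been assigned a topic distribution (for example by a topic model such as STLDA) and uses a plain per-user average as the aggregation rule. The function name, the input layout, and the averaging rule are all assumptions made for illustration; the abstract does not specify how CO² performs the aggregation.

```python
def aggregate_trip_intentions(trips):
    """Average per-trip topic distributions into one profile per user.

    trips: iterable of (user_id, topic_distribution) pairs, where each
    topic_distribution is a list of probabilities over the same K topics.
    Hypothetical helper: simple averaging is an assumed aggregation rule,
    not the method from the paper.
    """
    sums = {}    # user_id -> running sum of topic probabilities
    counts = {}  # user_id -> number of trips seen for that user
    for user_id, dist in trips:
        if user_id not in sums:
            sums[user_id] = [0.0] * len(dist)
            counts[user_id] = 0
        if len(dist) != len(sums[user_id]):
            raise ValueError("all topic distributions must have the same length")
        for k, p in enumerate(dist):
            sums[user_id][k] += p
        counts[user_id] += 1
    # Divide each summed vector by the user's trip count to get the mean.
    return {u: [s / counts[u] for s in sums[u]] for u in sums}


if __name__ == "__main__":
    # Toy input: two trips for user "u1", one trip for user "u2".
    example = [
        ("u1", [1.0, 0.0]),
        ("u1", [0.0, 1.0]),
        ("u2", [0.25, 0.75]),
    ]
    print(aggregate_trip_intentions(example))
```

Averaging keeps each profile a valid probability distribution whenever the inputs are; a weighted scheme (for example by trip recency) would be an equally plausible choice under the same assumptions.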
Year: 2017
DOI: 10.1109/ICDE.2017.55
Venue: 2017 IEEE 33rd International Conference on Data Engineering (ICDE)
Keywords: trip intention aggregation, users personal interests, semantic information, spatial-and temporal-aware user-generated contents, connect the offline world with the online world, CO2, UGT, latent probabilistic generative model, STLDA, STUGC, Twitter, Singapore, user-generated trajectories, spatial-and temporal-aware LDA-based model, GPS footprints, time 6 month
Field: Data mining, World Wide Web, Computer science, Bridging (networking), Semantic gap, Probabilistic generative model, Global Positioning System, Probabilistic logic, Questionnaire, Wearable technology, Database, Semantics
DocType: Conference
ISSN: 1084-4627
ISBN: 978-1-5090-6544-8
Citations: 2
PageRank: 0.53
References: 4
Authors: 5
Name             Order   Citations   PageRank
Long Guo         1       65          4.17
Dongxiang Zhang  2       743         43.89
Huayu Wu         3       184         22.70
Bin Cui          4       1843        124.59
Kian-Lee Tan     5       6962        776.65