Title
Interpreting Social Media-Based Substance Use Prediction Models with Knowledge Distillation
Abstract
People nowadays spend a significant amount of time on social media such as Twitter, Facebook, and Instagram. As a result, social media data capture rich behavioral evidence that can help us understand people's thoughts, behavior, and decision-making processes. Social media data, however, are mostly unstructured (e.g., text and images) and may involve a large number of raw features (e.g., millions of raw text and image features). Moreover, ground truth data about human behavior and decision making can be difficult to obtain at a large scale. As a result, most state-of-the-art social media-based human behavior models employ sophisticated unsupervised feature learning to leverage large amounts of unlabeled data. Unfortunately, these advanced models often rely on latent features that are hard to explain. Since understanding the knowledge captured in these models is important for behavior scientists, public health providers, and policymakers, this research focuses on employing a knowledge distillation framework to build machine learning models that offer not only state-of-the-art predictive performance but also interpretable results. We evaluate the effectiveness of the proposed framework in explaining Substance Use Disorder (SUD) prediction models. Our best models achieved 87% ROC AUC for predicting tobacco use, 84% for alcohol use, and 93% for drug use, which are comparable to existing state-of-the-art SUD prediction models. Since these models are also interpretable (e.g., a logistic regression model and a gradient boosting tree model), we combine the results from these models to gain insight into the relationship between a user's social media behavior (e.g., social media likes and word usage) and substance use.
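The distillation idea in the abstract can be illustrated with a small sketch: an opaque "teacher" produces soft probabilities, and an interpretable "student" (here, logistic regression) is trained to reproduce them, so that its weights can be read directly. This is not the authors' implementation. The teacher function, the two synthetic features, and all names and hyperparameters below are hypothetical stand-ins chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def teacher_predict(x):
    """Hypothetical opaque teacher: a nonlinear scorer standing in for a
    model built on latent features (an assumption, not the paper's model)."""
    return sigmoid(2.0 * x[0] - 1.5 * x[1] + 0.5 * x[0] * x[1])


def distill_logistic(X, soft_labels, lr=0.5, epochs=500):
    """Fit a logistic regression student to the teacher's soft labels
    using full-batch gradient descent on the cross-entropy loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, t in zip(X, soft_labels):
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            err = p - t  # gradient of cross-entropy w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Synthetic, unlabeled inputs: the student never sees ground-truth labels,
# only the teacher's predicted probabilities.
random.seed(0)
X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
soft = [teacher_predict(x) for x in X]

w, b = distill_logistic(X, soft)


def student_predict(x):
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


# Fidelity: mean absolute gap between student and teacher probabilities.
fidelity_gap = sum(abs(student_predict(x) - t) for x, t in zip(X, soft)) / len(X)

print("student weights:", w, "bias:", b)
print("mean |student - teacher|:", fidelity_gap)
```

In this sketch, the sign and size of each student weight indicate how the corresponding feature moves the teacher's prediction, and the fidelity gap indicates how far that linear reading can be trusted.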
Year
2018
DOI
10.1109/ICTAI.2018.00100
Venue
2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI)
Keywords
human behavior, social media, substance use disorders, explainable AI, interpretable AI
Field
Data modeling, Word usage, Social media, Computer science, Decision tree model, Artificial intelligence, Predictive modelling, Feature learning, Decision-making, Machine learning, Gradient boosting
DocType
Conference
ISSN
1082-3409
ISBN
978-1-5386-7450-5
Citations
0
PageRank
0.34
References
6
Authors
4
Name              Order  Citations  PageRank
Tao Ding          1      15         8.48
Fatema Hasan      2      0          0.34
Warren K. Bickel  3      0          0.34
Shimei Pan        4      684        64.41