Title
Theory-guided Data Science: A New Paradigm for Scientific Discovery.
Abstract
Data science models, although successful in a number of commercial domains, have had limited applicability in scientific problems involving complex physical phenomena. Theory-guided data science (TGDS) is an emerging paradigm that aims to leverage the wealth of scientific knowledge for improving the effectiveness of data science models in enabling scientific discovery. The overarching vision of TGDS is to introduce scientific consistency as an essential component for learning generalizable models. Further, by producing scientifically interpretable models, TGDS aims to advance our scientific understanding by discovering novel domain insights. Indeed, the paradigm of TGDS has started to gain prominence in a number of scientific disciplines such as turbulence modeling, material discovery, quantum chemistry, biomedical science, biomarker discovery, climate science, and hydrology. In this paper, we formally conceptualize the paradigm of TGDS and present a taxonomy of research themes in TGDS. We describe several approaches for integrating domain knowledge in different research themes using illustrative examples from different disciplines. We also highlight some of the promising avenues of novel research for realizing the full potential of theory-guided data science.
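The idea of making scientific consistency part of learning can be illustrated with a combined objective: a data-fit term plus a weighted penalty on physically inconsistent predictions. The sketch below is illustrative only and is not the method from the paper. The monotonicity constraint (the "theory" that the output must not decrease as the input grows), the weight `gamma`, and the functions `mse`, `monotonic_violation`, and `tgds_objective` are all hypothetical assumptions chosen to keep the example small.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error: the usual data-fit term."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def monotonic_violation(x: Sequence[float], y_pred: Sequence[float]) -> float:
    """Hypothetical physics term (an assumption for illustration):
    theory says y must be non-decreasing in x, so every drop between
    consecutive inputs (sorted by x) is penalized quadratically."""
    pairs = sorted(zip(x, y_pred))
    return sum(max(0.0, a[1] - b[1]) ** 2 for a, b in zip(pairs, pairs[1:]))


def tgds_objective(x: Sequence[float], y_true: Sequence[float],
                   y_pred: Sequence[float], gamma: float = 1.0) -> float:
    """Data loss plus gamma times the physical-inconsistency penalty."""
    return mse(y_true, y_pred) + gamma * monotonic_violation(x, y_pred)


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0]
    y_obs = [0.0, 1.1, 0.9, 3.0]   # noisy data with a non-physical dip
    overfit = list(y_obs)          # fits the data exactly, dip included
    smooth = [0.0, 1.0, 2.0, 3.0]  # monotone, slightly worse data fit
    for gamma in (0.0, 10.0):
        a = tgds_objective(x, y_obs, overfit, gamma)
        b = tgds_objective(x, y_obs, smooth, gamma)
        print(gamma, "smooth preferred" if b < a else "overfit preferred")
```

With `gamma = 0.0` the objective reduces to plain data fit and prefers the model that reproduces the noisy dip; with `gamma = 10.0` the consistency penalty outweighs the small data-fit loss, and the monotone model is preferred. In this sketch, `gamma` is the knob that trades accuracy on the observed data against agreement with the assumed physical constraint.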
Year: 2016
Venue: arXiv: Learning
Field: Data science, Scientific discovery, Computer science, Management science
DocType: Journal
Volume: abs/1612.08544
Citations: 0
PageRank: 0.34
References: 0
Authors: 9
Name
Order
Citations
PageRank
Anuj Karpatne110916.77
Gowtham Atluri200.34
James H. Faghmous300.34
Michael Steinbach400.68
Arindam Banerjee54716233.98
Auroop R. Ganguly628629.53
Shashi Shekhar743521098.43
Nagiza F. Samatova886174.04
Vipin Kumar9205.66