Title
Towards Reliable Data Collection and Annotation to Extract Pulmonary Digital Biomarkers Using Mobile Sensors
Abstract
The proliferation of sensors embedded in smartphones and smartwatches makes it possible to capture rich datasets from which machine learning algorithms can extract meaningful digital biomarkers on consumer devices for monitoring disease progression and treatment response. However, the development and validation of machine learning algorithms depend on gathering high-fidelity sensor data and reliable ground truth. We conduct a study, called mLungStudy, with 131 subjects with varying pulmonary conditions to collect mobile sensor data, including audio, accelerometer, and gyroscope signals, using a smartphone and a smartwatch, in order to extract pulmonary biomarkers such as breathing, coughs, spirometry, and breathlessness. Our study shows that commonly used breathing ground-truth data from a chestband may not always be reliable as a gold standard. Our analysis shows that breathlessness biomarkers such as pause time and pause frequency extracted from 2.15 minutes of audio can be as reliable as those extracted from 5 minutes' worth of speech data. This finding can help future studies trade off the reliability of breathlessness data against patient comfort in generating continuous speech data. Furthermore, we use crowdsourcing techniques to annotate pulmonary sound events for developing signal processing and machine learning algorithms. In this paper, we highlight several practical challenges in collecting and annotating physiological data and acoustic symptoms from chronic pulmonary patients, and ways to improve data quality. We show that waveform visualization of the audio signal improves annotation quality, which leads to a 6.59% increase in cough classification accuracy and a 6% increase in spirometry event classification accuracy. Findings from this study inform future studies focusing on developing explainable machine learning models to extract pulmonary digital biomarkers using mobile sensors.
Year
2019
DOI
10.1145/3329189.3329204
Venue
Proceedings of the 13th EAI International Conference on Pervasive Computing Technologies for Healthcare
Keywords
Breathing, Breathlessness, Cough, Crowdsourced Annotation, Data Quality, Digital Biomarkers, mHealth
Field
Data collection, Annotation, Information retrieval, Computer science, Distributed computing
DocType
Conference
ISSN
2153-1633
ISBN
978-1-4503-6126-2
Citations
2
PageRank
0.42
References
0
Authors
6
Name                   Order  Citations  PageRank
Md. Mahmudur Rahman    1      17         16.00
Viswam Nathan          2      50         14.09
Ebrahim Nemati         3      84         15.30
Korosh Vatanparvar     4      134        16.20
Mohsin Ahmed           5      2          0.42
Jilong Kuang           6      38         17.00