Title
Distributed Detection of Cancer Cells in High-Throughput Cellular Spike Streams
Abstract
Detection and identification of important biological targets, such as DNA, proteins, and diseased human cells, is crucial for early disease diagnosis and prognosis. The key to differentiating healthy cells from diseased cells lies in their biophysical properties, which differ significantly. Micro- and nanosystems, such as solid-state micropores and nanopores, can measure these properties of human cells and DNA and translate them into electrical spikes that can be decoded into useful biological insights. However, such approaches produce large data streams that are often plagued by inherent noise and baseline wander. Moreover, existing detection approaches are tedious, time-consuming, and error-prone, and no error-resilient software can analyze large datasets instantly. Processing larger datasets and detecting biological targets in them effectively requires automated, accelerated data processing strategies built on state-of-the-art distributed computing systems. To this end, we propose a distributed detection framework that collects the raw data stream on a server node, which then splits the data into segments and distributes them across the worker nodes. In a Single Program Multiple Data (SPMD) style, each node reduces noise in its assigned data segment using moving-average filtering and detects the electrical spikes by comparing them against a statistical threshold based on the mean and standard deviation of the data. Our proposed framework detects cancer cells with an accuracy of 63% in a mixture of cancer cells, Red Blood Cells (RBCs), and White Blood Cells (WBCs), and achieves a maximum speedup of 6x over a single-node machine by processing 10 gigabytes of raw data on an 8-node cluster in less than a minute.
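The pipeline described in the abstract (segment the stream, smooth each segment with a moving average, flag samples against a mean and standard deviation threshold) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the window size, the multiplier k, the use of absolute deviation from the segment mean, and the use of local threads in place of cluster worker nodes are all assumptions made for the example. Spikes that straddle a segment boundary are not handled.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev


def split_segments(signal, n_segments):
    """Split the signal into contiguous, near-equal chunks.

    Returns (offset, chunk) pairs so that indices found in a chunk
    can be mapped back to positions in the full stream.
    """
    size, extra = divmod(len(signal), n_segments)
    segments, start = [], 0
    for i in range(n_segments):
        end = start + size + (1 if i < extra else 0)
        segments.append((start, signal[start:end]))
        start = end
    return segments


def moving_average(samples, window):
    """Trailing moving average; the window is shorter for the first samples."""
    smoothed, running = [], 0.0
    for i, x in enumerate(samples):
        running += x
        if i >= window:
            running -= samples[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


def detect_spikes(segment, window=3, k=3.0):
    """Return global indices whose smoothed value deviates from the
    segment mean by more than k standard deviations (k is an assumed
    parameter; the abstract only says mean and standard deviation)."""
    offset, samples = segment
    if not samples:
        return []
    smoothed = moving_average(samples, window)
    mu, sigma = mean(smoothed), pstdev(smoothed)
    return [offset + i for i, x in enumerate(smoothed) if abs(x - mu) > k * sigma]


def detect_distributed(signal, n_workers=4, window=3, k=3.0):
    """Run the same detection routine on every segment (SPMD style).

    Threads stand in for worker nodes here; on a real cluster each
    segment would be shipped to a separate machine.
    """
    segments = split_segments(signal, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        per_segment = pool.map(lambda s: detect_spikes(s, window, k), segments)
        return [idx for hits in per_segment for idx in hits]
```

Calling detect_distributed on a list of samples returns the stream positions flagged as spike samples, in stream order. Because each segment computes its own mean and standard deviation, the threshold adapts to local baseline drift, at the cost of behaving differently from a single global threshold.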
Year
2014
DOI
10.1109/CCGrid.2014.108
Venue
Cluster, Cloud and Grid Computing
Keywords
bioMEMS,bioelectric potentials,biomedical equipment,blood,cancer,cellular biophysics,data analysis,filtering theory,medical signal processing,microsensors,molecular biophysics,nanomedicine,nanoporous materials,nanosensors,porosity,proteins,signal denoising,statistical analysis,8-node cluster,DNA,accelerated data processing strategies,assigned data segment,automated data processing strategies,baseline wanders,biological insights,biological target detection,biological target identification,biological targets,biophysical properties,data streams,dataset analysis,disease diagnosis,disease prognosis,diseased human cells,distributed cancer cell detection,distributed detection framework,electric spikes,electrical spikes,extant detection approaches,healthy cells,high-throughput cellular spike streams,inherit noise,microsystems,moving-average filtering,nanosystems,noise reduction,proteins,raw data stream collection,red blood cells,server node,single program multiple data style,single-node machine,solid-state micropores,solid-state nanopores,standard deviation,state-of-the-art distributed computing systems,statistical threshold,white blood cells,Distributed computing,accelerated-diagnosis,automated cancer cell detection,solid-state micropores
Field
Data segment,SPMD,Data processing,Data stream mining,Computer science,Node (networking),Real-time computing,Artificial intelligence,Throughput,Speedup,Pattern recognition,Filter (signal processing),Bioinformatics
DocType
Conference
ISSN
2376-4414
Citations
0
PageRank
0.34
References
9
Authors
3
Name | Order | Citations | PageRank
Abdul Hafeez | 1 | 5 | 1.63
M. Mustafa Rafique | 2 | 157 | 15.49
Ali R. Butt | 3 | 651 | 47.51