Title
Scalable clustering algorithms for continuous environmental flow cytometry.
Abstract
Motivation: Recent technological innovations in flow cytometry allow oceanographers to collect high-frequency data on particles in aquatic environments at a scale far beyond that of conventional flow cytometers. The SeaFlow cytometer continuously profiles microbial phytoplankton populations across thousands of kilometers of the surface ocean. The data streams produced by instruments such as SeaFlow challenge the traditional sample-by-sample approach to cytometric analysis and highlight the need for scalable clustering algorithms that can extract population information from large-scale, high-frequency flow cytometry data.
Results: We explore how available algorithms commonly used for medical applications perform at classifying such large-scale environmental flow cytometry data. We apply large-scale Gaussian mixture models to massive datasets using Hadoop. This approach outperforms current state-of-the-art cytometry classification algorithms in accuracy and can be coupled with manual or automatic partitioning of the data into homogeneous sections for further classification gains. We propose the Gaussian mixture model with partitioning approach for classification of large-scale, high-frequency flow cytometry data.
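To make the "Gaussian mixture model with partitioning" idea concrete, the following is a minimal sketch, not the authors' Hadoop implementation. It assumes a single measurement channel (1-D data), a plain EM fit written with the Python standard library only, and synthetic data. The function names (fit_gmm_1d, classify, fit_by_partition) and the partition labels (segment_a, segment_b) are hypothetical and chosen only for illustration.

```python
import math
import random


def fit_gmm_1d(values, k, iterations=50):
    """Fit a k-component 1-D Gaussian mixture to `values` with plain EM."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: initial means at evenly spaced quantiles.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    spread = (ordered[-1] - ordered[0]) / (2 * k) or 1.0
    variances = [spread ** 2] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in values:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsed component
    return weights, means, variances


def classify(x, model):
    """Return the index of the most probable component for point x."""
    weights, means, variances = model
    scores = [
        math.log(w) - 0.5 * math.log(v) - (x - m) ** 2 / (2 * v)
        for w, m, v in zip(weights, means, variances)
    ]
    return scores.index(max(scores))


def fit_by_partition(partitions, k):
    """Fit one mixture per partition instead of one pooled model."""
    return {name: fit_gmm_1d(values, k) for name, values in partitions.items()}


# Synthetic stand-in data: two "populations" per partition, with the
# population positions shifted between partitions (an assumption made
# here to mimic sections of data that are homogeneous only locally).
rng = random.Random(0)
partitions = {
    "segment_a": [rng.gauss(1.0, 0.3) for _ in range(200)]
    + [rng.gauss(5.0, 0.3) for _ in range(200)],
    "segment_b": [rng.gauss(2.5, 0.3) for _ in range(200)]
    + [rng.gauss(6.5, 0.3) for _ in range(200)],
}
models = fit_by_partition(partitions, k=2)
for name, (weights, means, variances) in models.items():
    print(name, [round(m, 2) for m in sorted(means)])
```

Fitting one model per partition keeps each mixture tied to a locally homogeneous section of the data, which is the intuition behind the partitioning step described in the abstract. Real cytometry data are multi-dimensional, so a practical version would use full covariance matrices and a distributed fit.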
Year
2016
DOI
10.1093/bioinformatics/btv594
Venue
BIOINFORMATICS
Field
Population, Data mining, Data stream mining, Computer science, Source code, Software, Cluster analysis, Statistical classification, Mixture model, Scalability
DocType
Journal
Volume
32
Issue
3
ISSN
1367-4803
Citations
0
PageRank
0.34
References
11
Authors
6
Name                 Order  Citations  PageRank
Jeremy Hyrkas        1      7          1.87
Sophie Clayton       2      0          0.34
Francois Ribalet     3      9          1.92
Daniel Halperin      4      1419       87.19
E Virginia Armbrust  5      39         4.87
Bill Howe            6      1520       94.44