Title
Visualizing the Protein Sequence Universe
Abstract
Modern biology is experiencing a rapid increase in data volumes that challenges our analytical skills and existing cyberinfrastructure. The exponential expansion of the protein sequence universe (PSU), also called the protein sequence space, together with the cost and complexity of manual curation, creates a major bottleneck in life sciences research. Existing resources lack the scalable visualization tools that are instrumental for functional annotation. Here, we describe a new visualization tool that uses multidimensional scaling to create a 3D embedding of the protein space. The advantages of the proposed PSU method include the ability to scale to large numbers of sequences, to integrate different similarity measures with other functional and experimental data, and to facilitate protein annotation. We applied the method to visualize the prokaryotic PSU using sequence alignment scores. As an annotation example, we used an interpolation approach to map the set of annotated archaeal proteins into the prokaryotic PSU. Transdisciplinary approaches akin to the one described in this paper are urgently needed to translate the influx of new data quickly and efficiently into tangible innovations and groundbreaking discoveries. Copyright © 2013 John Wiley & Sons, Ltd.
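The abstract names multidimensional scaling (MDS) for the 3D embedding and an interpolation step for placing new proteins, but gives no algorithmic detail. As a rough illustration only, the sketch below uses plain unweighted SMACOF (stress majorization, a standard MDS algorithm) plus a single-point majorization update for out-of-sample placement. It assumes a symmetric dissimilarity matrix has already been derived from pairwise alignment scores (how scores are converted is not specified here). The helper names `smacof`, `interpolate`, and `random_config` are hypothetical, the code is serial and small-scale, and it is not presented as the authors' implementation.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two points stored as lists of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def stress(X, delta):
    """Raw (unweighted) stress: sum over pairs i < j of (delta_ij - d_ij(X))^2."""
    n = len(X)
    return sum((delta[i][j] - dist(X[i], X[j])) ** 2
               for i in range(n) for j in range(i + 1, n))

def random_config(n, dim=3, seed=0):
    """Hypothetical helper: random starting configuration in [-1, 1]^dim."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]

def smacof(delta, X0, iters=300):
    """Unweighted SMACOF: repeat the Guttman transform X <- B(X) X / n.

    Each step does not increase the raw stress (majorization guarantee)."""
    n, dim = len(X0), len(X0[0])
    X = [row[:] for row in X0]
    for _ in range(iters):
        # B(X): off-diagonal -delta_ij / d_ij (0 when d_ij is ~0).
        B = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i != j:
                    d = dist(X[i], X[j])
                    B[i][j] = -delta[i][j] / d if d > 1e-12 else 0.0
            # B[i][i] is still 0.0 here, so this negates the off-diagonal row sum.
            B[i][i] = -sum(B[i])
        # Guttman transform for unit weights: X_new = (1/n) * B(X) @ X.
        X = [[sum(B[i][k] * X[k][c] for k in range(n)) / n for c in range(dim)]
             for i in range(n)]
    return X

def interpolate(refs, deltas, iters=500):
    """Place one new point given its dissimilarities to fixed reference points.

    Single-point majorization: x <- mean_j [p_j + delta_j * (x - p_j) / ||x - p_j||]."""
    k, dim = len(refs), len(refs[0])
    # Start from the centroid of the reference points.
    x = [sum(p[c] for p in refs) / k for c in range(dim)]
    for _ in range(iters):
        new = [0.0] * dim
        for p, dlt in zip(refs, deltas):
            d = dist(x, p)
            for c in range(dim):
                # If x coincides with p the direction is undefined; use p alone.
                new[c] += p[c] + (dlt * (x[c] - p[c]) / d if d > 1e-12 else 0.0)
        x = [v / k for v in new]
    return x
```

In the interpolation step only the k reference points are used and they stay fixed, so placing a new sequence costs O(k) distance evaluations per iteration instead of a full re-embedding; that is the usual motivation for interpolating rather than rerunning MDS on the enlarged set.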
Year
2014
DOI
10.1002/cpe.3072
Venue
Concurrency and Computation: Practice & Experience
Keywords
prokaryotic PSU, protein sequence space, functional annotation, protein sequence universe, data volume, large number, experimental data, existing cyberinfrastructure, protein annotation, different similarity measure, new data, COG, EM, MPI, PSU, Twister, Needleman–Wunsch, multidimensional scaling, data visualization, UniProt, BLAST
Field
Data science, Bottleneck, Data visualization, Multidimensional scaling, UniProt, Visualization, Computer science, Cyberinfrastructure, Needleman–Wunsch algorithm, Protein Annotation
Volume
26
Issue
6
ISSN
1532-0626
Citations
3
PageRank
0.47
References
30
Authors
11
Name | Order | Citations | PageRank
Larissa Stanberry | 1 | 29 | 5.14
Roger Higdon | 2 | 43 | 6.96
Winston Haynes | 3 | 26 | 3.99
Natali Kolker | 4 | 29 | 4.46
William Broomall | 5 | 29 | 4.13
Saliya Ekanayake | 6 | 90 | 9.34
Adam Hughes | 7 | 81 | 3.90
Yang Ruan | 8 | 112 | 6.26
Judy Qiu | 9 | 743 | 43.25
Eugene Kolker | 10 | 54 | 10.90
Geoffrey Fox | 11 | 4070 | 575.38