Abstract |
---|
UniqueProt is a practical, easy-to-use web service designed to create representative, unbiased data sets of protein sequences. The largest possible representative sets are found through a simple greedy algorithm that uses the HSSP-value to establish sequence similarity. UniqueProt is not a real clustering program, in the sense that the 'representatives' are not at the centres of well-defined clusters, since the definition of such clusters is problem-specific. Overall, UniqueProt is a reasonably fast solution to bias in data sets. The service is accessible at http://cubic.bioc.columbia.edu/services/uniqueprot; a command-line version for Linux is downloadable from this web site. |
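The abstract names two ingredients, the HSSP-value and a greedy selection, without spelling either out. The Python sketch below illustrates both under stated assumptions, and is not UniqueProt's actual code. `hssp_threshold` uses the length-dependent identity curve with the constants we recall from Rost (1999); verify them against that paper before relying on them. The cut-off `min_hssp=0.0` is a hypothetical default. The greedy rule (keep the sequence with the fewest remaining similar neighbours, discard those neighbours, repeat) is a standard heuristic for a maximum independent set, not necessarily the exact rule UniqueProt applies. All function names and the alignment record layout are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical input record: (seq_id_a, seq_id_b, percent_identity, aligned_length)
Alignment = Tuple[str, str, float, int]


def hssp_threshold(length: int) -> float:
    """Length-dependent identity cut-off (assumed HSSP-curve constants, Rost 1999)."""
    if length <= 11:
        return 100.0
    if length <= 450:
        return 480.0 * length ** (-0.32 * (1.0 + math.exp(-length / 1000.0)))
    return 19.5


def hssp_value(percent_identity: float, aligned_length: int) -> float:
    """Distance above (>0) or below (<0) the curve; larger means more similar."""
    return percent_identity - hssp_threshold(aligned_length)


def similar_pairs(alignments: Iterable[Alignment],
                  min_hssp: float = 0.0) -> List[Tuple[str, str]]:
    """Keep pairs whose HSSP-value reaches the (hypothetical) cut-off min_hssp."""
    return [(a, b) for a, b, pid, length in alignments
            if hssp_value(pid, length) >= min_hssp]


def greedy_representatives(ids: Iterable[str],
                           pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Greedy independent-set heuristic: no two kept sequences are similar."""
    neighbours: Dict[str, Set[str]] = {i: set() for i in ids}
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    remaining = set(neighbours)
    kept: List[str] = []
    while remaining:
        # Keep the sequence with the fewest similar sequences still in play
        # (ties broken by id so the result is deterministic) ...
        best = min(remaining, key=lambda s: (len(neighbours[s] & remaining), s))
        kept.append(best)
        # ... then drop it and everything similar to it.
        remaining -= neighbours[best] | {best}
    return kept


if __name__ == "__main__":
    alignments = [("A", "B", 60.0, 200), ("B", "C", 45.0, 300),
                  ("A", "C", 10.0, 250), ("C", "D", 15.0, 100)]
    print(greedy_representatives(["A", "B", "C", "D"], similar_pairs(alignments)))
    # prints ['D', 'A', 'C']
```

Picking the least-connected sequence first tends to keep more sequences than an arbitrary order, but like any greedy heuristic it does not guarantee a maximum independent set on every similarity graph.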
Year | DOI | Venue |
---|---|---|
2003 | 10.1093/nar/gkg620 | Nucleic Acids Research |
Keywords | Field | DocType |
---|---|---|
proteins, web service, protein sequence, internet, algorithms, greedy algorithm, sequence alignment | Data set, Alignment-free sequence analysis, Biology, Theoretical computer science, Greedy algorithm, Software, Genetics, Web service, Cluster analysis, Multiple sequence alignment, The Internet | Journal |
Volume | Issue | ISSN |
---|---|---|
31 | 13 | 0305-1048 |
Citations | PageRank | References |
---|---|---|
41 | 4.32 | 5 |
Authors |
---|
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Sven Mika | 1 | 106 | 8.59 |
Burkhard Rost | 2 | 795 | 88.14 |