Title
KDD-Cup 2004: protein homology task
Abstract
In this paper we describe the winning model for the performance measure "lowest ranked homologous sequence" (RKL). This was a subtask of the Protein Homology Prediction task of the KDD Cup 2004. The goal was to predict protein homology for different performance metrics. The given data was organized in blocks, each of which corresponds to a specific native sequence. The two metrics average precision (APR) and RKL explicitly make use of this block structure. Our solution consists of two parts. The first one is a global classification SVM not aware of the block structure. The second part is a k-NearestNeighbor scheme for block similarity, used to train ranking SVMs on the fly. Furthermore, we sketch our approach to optimize the root-mean-squared-error and report some alternative solutions that turned out to be suboptimal.
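To illustrate the two block-aware metrics named in the abstract, here is a minimal Python sketch. It assumes the commonly cited KDD Cup 2004 definitions: RKL is the rank of the lowest-ranked homolog within a block, APR is the average precision within a block, and both are averaged over blocks. The function names, the tie handling (input order), and the skipping of blocks without homologs are assumptions made for illustration, not the authors' code.

```python
from collections import defaultdict


def _ranking(scores):
    """Indices sorted by descending score (ties keep input order; an assumption)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def rank_of_last_homolog(scores, labels):
    """Rank (1 = top) of the lowest-ranked positive example in one block."""
    ranks = [r for r, i in enumerate(_ranking(scores), start=1) if labels[i] == 1]
    return max(ranks)


def average_precision(scores, labels):
    """Mean of precision values taken at the rank of each positive in one block."""
    hits, total = 0, 0.0
    for r, i in enumerate(_ranking(scores), start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / r
    return total / hits


def blockwise_mean(metric, block_ids, scores, labels):
    """Average a per-block metric over all blocks that contain a positive."""
    blocks = defaultdict(lambda: ([], []))
    for b, s, y in zip(block_ids, scores, labels):
        blocks[b][0].append(s)
        blocks[b][1].append(y)
    # Assumption: blocks without any homolog are skipped rather than scored.
    values = [metric(s, y) for s, y in blocks.values() if any(y)]
    return sum(values) / len(values)


# Toy data: two blocks of three candidate sequences each.
block_ids = [1, 1, 1, 2, 2, 2]
scores = [0.9, 0.2, 0.5, 0.8, 0.1, 0.3]
labels = [1, 0, 1, 0, 1, 0]

rkl = blockwise_mean(rank_of_last_homolog, block_ids, scores, labels)
apr = blockwise_mean(average_precision, block_ids, scores, labels)
```

Lower RKL is better and higher APR is better, so a model tuned for one metric is not necessarily the best choice for the other.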
Year: 2004
DOI: 10.1145/1046456.1046477
Venue: SIGKDD Explorations
Keywords: kdd cup, different performance metrics, block structure, protein homology prediction task, global classification, alternative solution, specific native sequence, block similarity, performance measure, metrics average precision, protein homology task, root mean square error
Field: Data mining, Block structure, Ranking, Computer science, Support vector machine, On the fly, Artificial intelligence, Homology (biology), Artificial neural network, Machine learning, Sketch
DocType: Journal
Volume: 6
Issue: 2
Citations: 1
PageRank: 0.36
References: 3
Authors: 3
Name: Christophe Foussette; Order: 1; Citations: 5; PageRank: 1.10
Name: Daniel Hakenjos; Order: 2; Citations: 1; PageRank: 0.36
Name: Martin Scholz; Order: 3; Citations: 544; PageRank: 45.31