Title |
---|
A new efficient probabilistic model for mining labeled ordered trees applied to glycobiology |
Abstract |
---|
Mining frequent patterns from large datasets is an important issue in data mining. Recently, complex and unstructured (or semi-structured) datasets have become targets of major data mining applications, including text mining, web mining and bioinformatics. Our work focuses on labeled ordered trees, a typical form of semi-structured data. In bioinformatics, carbohydrate sugar chains, or glycans, can be modeled as labeled ordered trees. Glycans are the third major class of biomolecules and play important roles in signaling and recognition. For mining labeled ordered trees, we propose a new probabilistic model and an efficient learning scheme for it, which significantly improve on the time and space complexity of an existing probabilistic model for labeled ordered trees. We evaluated the performance of the proposed model against other probabilistic models, using synthetic datasets as well as real datasets from glycobiology. Experimental results showed that the proposed model drastically reduced computation time relative to the competing model while maintaining predictive power and avoiding overfitting to the training data. Finally, we assessed our results on real data from a variety of biological viewpoints, verifying known facts in glycobiology. |
Year | DOI | Venue |
---|---|---|
2008 | 10.1145/1342320.1342326 | TKDD |
Keywords | Field | DocType |
---|---|---|
data mining, competing model, new efficient probabilistic model, new probabilistic model, large datasets, web mining, probabilistic model, maximum likelihood, major data mining application, expectation-maximization, labeled ordered trees, text mining, probabilistic models, existing probabilistic model, expectation maximization, space complexity | Data mining, Text mining, Web mining, Glycobiology, Computer science, Expectation–maximization algorithm, Statistical model, Artificial intelligence, Probabilistic logic, Overfitting, Machine learning, Computation | Journal |
Volume | Issue | Citations |
---|---|---|
2 | 1 | 5 |
PageRank | References | Authors |
---|---|---|
0.47 | 13 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Kosuke Hashimoto | 1 | 30 | 8.07 |
Kiyoko Flora Aoki-Kinoshita | 2 | 5 | 0.47 |
Nobuhisa Ueda | 3 | 369 | 20.78 |
Minoru Kanehisa | 4 | 4429 | 707.80 |
Hiroshi Mamitsuka | 5 | 973 | 91.71 |