Title
Evaluation of BIC and cross validation for model selection on sequence segmentations.
Abstract
Segmentation is a general data mining technique for summarising and analysing sequential data. It can be applied, for example, when studying large-scale genomic structures such as isochores. Choosing the number of segments remains a challenging question. We present extensive experimental studies of two model selection techniques, the Bayesian Information Criterion (BIC) and cross validation (CV). We successfully identify segments with different means or variances, and demonstrate the effect of linear trends and outliers, which occur frequently in real data. Results are given for real DNA sequences with respect to changes in their codon, G + C, and bigram frequencies, and for copy-number variation in CGH data.
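As an illustration of the kind of model selection the abstract refers to, the following is a minimal sketch of choosing the number of segments with BIC. It is not the authors' implementation. It assumes a piecewise-constant Gaussian model with a single shared variance, so it covers changes in mean only. It also assumes a parameter count of 2k for k segments (k means, k - 1 boundaries, one variance). The function names `segment_costs`, `best_sse_per_k` and `bic_scores` are hypothetical.

```python
import numpy as np


def segment_costs(x):
    """Return sse(i, j): sum of squared errors of x[i:j] around its own mean."""
    c1 = np.concatenate(([0.0], np.cumsum(x)))
    c2 = np.concatenate(([0.0], np.cumsum(x * x)))

    def sse(i, j):
        n = j - i
        s = c1[j] - c1[i]
        # Clamp at zero to absorb small negative values from rounding.
        return max(c2[j] - c2[i] - s * s / n, 0.0)

    return sse


def best_sse_per_k(x, max_k):
    """Dynamic programming: minimal total SSE for k = 1..max_k segments."""
    n = len(x)
    sse = segment_costs(x)
    # cost[k, j] = minimal SSE when x[:j] is split into exactly k segments.
    cost = np.full((max_k + 1, n + 1), np.inf)
    cost[0, 0] = 0.0
    for k in range(1, max_k + 1):
        for j in range(k, n + 1):
            cost[k, j] = min(cost[k - 1, i] + sse(i, j) for i in range(k - 1, j))
    return cost[1:, n]


def bic_scores(x, max_k):
    """BIC for each k = 1..max_k; lower is better."""
    n = len(x)
    total = best_sse_per_k(x, max_k)
    ks = np.arange(1, max_k + 1)
    # Assumed parameter count: k means + (k - 1) boundaries + 1 variance = 2k.
    n_params = 2 * ks
    # Maximum-likelihood estimate of the shared variance, floored to avoid log(0).
    sigma2 = np.maximum(total / n, 1e-12)
    neg2_loglik = n * (np.log(2 * np.pi * sigma2) + 1)
    return neg2_loglik + n_params * np.log(n)


# Usage: three segments with means 0, 5, 0 and a small deterministic wobble.
x = np.concatenate([np.zeros(50), np.full(50, 5.0), np.zeros(50)])
x = x + 0.5 * np.where(np.arange(150) % 2 == 0, 1.0, -1.0)
scores = bic_scores(x, max_k=6)
best_k = int(np.argmin(scores)) + 1
```

The dynamic program is quadratic in the sequence length for each k, which is adequate for short sequences. Segments that differ in variance, which the abstract also mentions, would need a per-segment variance term in the likelihood and a correspondingly larger parameter count.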
Year
2010
DOI
10.1504/IJDMB.2010.037547
Venue
IJDMB
Keywords
challenging question,bigram frequency,model selection,general data mining technique,cgh data,real dna sequence,copy-number variation,bayesian information criterion,cross validation,analysing sequential data,sequence segmentation
Field
Sequence segmentation,Sequential data,Bayesian information criterion,Pattern recognition,Computer science,Categorical variable,Segmentation,Model selection,Artificial intelligence,Cross-validation,Machine learning,Binary number
DocType
Journal
Volume
4
Issue
6
ISSN
1748-5673
Citations
0
PageRank
0.34
References
12
Authors
2
Name, Order, Citations, PageRank
Niina Haiminen, 1, 81, 9.71
Heikki Mannila, 2, 6595, 1495.69