Paper Info

Title
Semi-supervised expert metadata extraction based on co-training style

Abstract
Supervised approaches to expert metadata extraction require large amounts of labeled training data. To address this problem, a semi-supervised expert metadata extraction method based on co-training style is proposed. First, according to the characteristics of expert metadata, we select expert metadata features and label a certain number of metadata samples, then train two classifiers, one with maximum entropy and one with conditional random fields. Second, the two classifiers are used to label metadata items in unlabeled expert home pages. When the classification results for a metadata type in an expert page satisfy the confidence requirement, the differences between the labels that the two classifiers assign to each metadata type are analyzed; for metadata satisfying the difference requirement, the classifier that performs better on that metadata type is selected to label it, and the resulting labeled expert home page is taken as a new labeled sample. Finally, these labeled expert home pages are used to extend the training samples and retrain the two classifiers, and the process iterates until both classifiers converge. In the experiment, we collected 2000 expert home pages; the results indicate that the proposed method outperforms a number of supervised methods while effectively reducing the amount of manual labeling work.
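
The iterative procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions. The two classifiers are toy stand-ins (a nearest-centroid scorer and a nearest-neighbour scorer over one-dimensional features) rather than maximum entropy and conditional random field models. The abstract's "difference requirement" and "better performing classifier" rule are simplified to "accept the label from whichever classifier is more confident". All names (`co_train`, `conf_threshold`, `max_rounds`, the classifier classes) are hypothetical.

```python
import math
from collections import defaultdict


def _pick(distances):
    """Turn per-label distances into (best_label, confidence) via a softmax."""
    weights = {lab: math.exp(-d) for lab, d in distances.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())


class CentroidClassifier:
    """Toy stand-in for the first classifier (maximum entropy in the abstract)."""

    def fit(self, X, y):
        sums = defaultdict(lambda: [0.0, 0])
        for x, lab in zip(X, y):
            sums[lab][0] += x
            sums[lab][1] += 1
        self.centroids = {lab: s / n for lab, (s, n) in sums.items()}
        return self

    def predict(self, x):
        return _pick({lab: abs(x - c) for lab, c in self.centroids.items()})


class NearestNeighbourClassifier:
    """Toy stand-in for the second classifier (CRF in the abstract)."""

    def fit(self, X, y):
        self.data = list(zip(X, y))
        return self

    def predict(self, x):
        nearest = {}
        for xi, lab in self.data:
            d = abs(x - xi)
            if lab not in nearest or d < nearest[lab]:
                nearest[lab] = d
        return _pick(nearest)


def co_train(labeled, unlabeled, conf_threshold=0.7, max_rounds=10):
    """Grow the training set with confidently labeled samples, then retrain."""
    X = [x for x, _ in labeled]
    y = [lab for _, lab in labeled]
    pool = list(unlabeled)
    for _ in range(max_rounds):
        a = CentroidClassifier().fit(X, y)
        b = NearestNeighbourClassifier().fit(X, y)
        remaining, added = [], 0
        for x in pool:
            (label_a, conf_a), (label_b, conf_b) = a.predict(x), b.predict(x)
            if max(conf_a, conf_b) >= conf_threshold:
                # Assumption: trust the more confident classifier for this sample.
                X.append(x)
                y.append(label_a if conf_a >= conf_b else label_b)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:  # nothing new was labeled: treat as converged
            break
    # Retrain both classifiers on the final, extended training set.
    a = CentroidClassifier().fit(X, y)
    b = NearestNeighbourClassifier().fit(X, y)
    return a, b, X, y


seed = [(0.0, "name"), (1.0, "name"), (10.0, "email"), (11.0, "email")]
a, b, X, y = co_train(seed, [0.5, 10.5, 5.5])
```

In this toy run the two unambiguous samples are absorbed into the training set, while the sample equidistant from both classes never reaches the confidence threshold and stays unlabeled. In a real setting the features would come from the expert home pages and the threshold would need tuning.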

Year	DOI	Venue
2012	10.1109/FSKD.2012.6234139	FSKD
Keywords	Field	DocType
co-training learning,supervised learning,labeled training data,semisupervised expert metadata extraction,learning (artificial intelligence),labeled expert homepage,pattern classification,semi-supervised,unlabeled expert home pages,expert metadata extraction,classifiers,cotraining style,meta data,entropy,maximum entropy,learning artificial intelligence,accuracy,feature extraction,labeling,satisfiability,data mining,classification algorithms,organizations	Training set,Metadata,Pattern recognition,Computer science,Co-training,Supervised learning,Artificial intelligence,Principle of maximum entropy,Classifier (linguistics),Machine learning	Conference
Volume	Issue	ISBN
null	null	978-1-4673-0025-4
Citations	PageRank	References
0	0.34	6
Authors
5

Authors (5 rows)

Name	Order	Citations	PageRank
Youmin M. Zhang	1	1267	128.81
Zhengtao Yu	2	460	69.08
Li Liu	3	5	1.45
Jianyi Guo	4	20	10.99
Cunli Mao	5	51	11.54