The Diseases Clustering for Multi-source Medical Sets

Paper Info

Title
The Diseases Clustering for Multi-source Medical Sets

Abstract
Medical databases have been built to some extent, but data redundancy among many medical sets strongly hinders searching across different sets. In this paper, three major domestic medical sets are first taken as the foundation of the research, and Natural Language Processing techniques are applied to segment the disease descriptions into words. Then, TF-IDF is used to calculate the weight of each feature word in a disease description, and a disease feature vector is established. Based on these vectors, the similarity between diseases is measured with cosine similarity. Finally, the effects of the k-means and k-center clustering algorithms on aligning the disease texts are compared. The experimental results show that the k-center clustering algorithm performs better than k-means, and that the clustering results are reasonable to some extent.
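The pipeline described in the abstract (segmented text, then TF-IDF weights, then cosine similarity, then clustering) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the word-segmentation step is assumed already done, the toy `DOCS` records are hypothetical, the smoothed IDF variant and the deterministic farthest-first seeding are choices made here for reproducibility, and "k-center" is assumed to mean a medoid-based algorithm (k-medoids), in which each cluster center is an actual disease record rather than a mean vector. If the authors meant the minimax k-center objective instead, the center-update step would differ.

```python
import math
from collections import Counter

# Hypothetical toy input: disease descriptions from three sources, already
# word-segmented (the paper segments text with NLP tools; assumed done here).
DOCS = [
    ["fever", "cough", "lung", "infection"],      # source A, pneumonia-like
    ["lung", "infection", "cough", "sputum"],     # source B, pneumonia-like
    ["blood", "sugar", "insulin", "thirst"],      # source A, diabetes-like
    ["insulin", "blood", "sugar", "urination"],   # source C, diabetes-like
]


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (dict word -> weight) per segmented document."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # TF = count / length; smoothed IDF = log((1 + N) / (1 + df)) + 1 (an assumed variant).
        vectors.append({w: (c / len(doc)) * (math.log((1 + n) / (1 + df[w])) + 1)
                        for w, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all-zero."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def farthest_first(vectors, k):
    """Deterministic seeding (assumed): item 0, then the item least similar to all seeds."""
    seeds = [0]
    while len(seeds) < k:
        rest = [i for i in range(len(vectors)) if i not in seeds]
        seeds.append(min(rest, key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds)))
    return seeds


def mean_vector(vecs):
    """Component-wise mean of sparse vectors (the k-means centroid); {} if vecs is empty."""
    total = {}
    for v in vecs:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vecs) for t, w in total.items()}


def k_means(vectors, k, max_iter=20):
    """k-means using cosine similarity for assignment and mean vectors as centers."""
    centers = [vectors[i] for i in farthest_first(vectors, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda c: cosine(v, centers[c])) for v in vectors]
        if new_labels == labels:
            break
        labels = new_labels
        # Keep the old center if a cluster ends up empty.
        centers = [mean_vector([v for v, lab in zip(vectors, labels) if lab == c]) or centers[c]
                   for c in range(k)]
    return labels


def k_medoids(vectors, k, max_iter=20):
    """Medoid-based clustering (assumed reading of 'k-center'): centers are real items."""
    n = len(vectors)
    dist = [[1.0 - cosine(a, b) for b in vectors] for a in vectors]  # cosine distance
    medoids = farthest_first(vectors, k)

    def assign(meds):
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c] or [medoids[c]]
            # New medoid: the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(medoids), medoids


if __name__ == "__main__":
    vecs = tf_idf_vectors(DOCS)
    print("cos(doc0, doc1) =", round(cosine(vecs[0], vecs[1]), 3))
    print("k-means labels:  ", k_means(vecs, 2))         # prints [0, 0, 1, 1]
    print("k-medoids labels:", k_medoids(vecs, 2)[0])    # prints [0, 0, 1, 1]
```

Precomputing the full pairwise distance matrix in `k_medoids` costs O(n²) memory, which is fine for a sketch but would need a sparse or approximate approach for medical sets with many thousands of diseases.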

Year: 2016
DOI: 10.1109/IIKI.2016.37
Venue: 2016 International Conference on Identification, Information and Knowledge in the Internet of Things (IIKI)
Keywords: Multi-source medical sets, Align diseases, text clustering, Natural Language Processing
Field: Ontology (information science), Feature vector, Cosine similarity, Pattern recognition, Segmentation, Computer science, Data acquisition, Computer network, Data redundancy, Artificial intelligence, Cluster analysis, Multi-source
DocType: Conference
ISBN: 978-1-5090-5953-9
Citations: 0
PageRank: 0.34
References: 9
Authors: 4

Authors

Name	Order	Citations	PageRank
Liangchi Li	1	0	1.01
Shuaijing Xu	2	7	2.40
Shenling Wang	3	0	1.01
Xianlin Ma	4	0	1.01
