Title
The Diseases Clustering for Multi-source Medical Sets
Abstract
Medical databases have been built out to a considerable degree, but data redundancy between the many medical sets greatly hampers searching across them. In this paper, we take three major domestic medical sets as the basis of our research. We first apply natural language processing techniques to segment the disease descriptions. We then use TF-IDF to compute the weights of the feature words in each disease description and build a disease feature vector. The similarity between disease feature vectors is measured with cosine similarity. Finally, we compare the k-means and k-center clustering algorithms on the task of aligning disease texts. The experimental results show that the k-center clustering algorithm performs better than k-means, and that the clustering results are reasonable to some extent.
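The pipeline the abstract describes (TF-IDF weighting, cosine similarity, then clustering) can be illustrated with a small sketch. This is not the authors' implementation: the toy token lists stand in for already-segmented disease descriptions, the function names `tfidf_vectors`, `cosine`, and `k_medoids` are hypothetical, and the sketch assumes that "k-center" refers to a k-medoids style algorithm in which each cluster center is an actual document.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn token lists into sparse {term: tf * idf} dictionaries."""
    n = len(docs)
    # Document frequency: in how many descriptions each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def k_medoids(vectors, k, max_iter=20):
    """Alternating k-medoids on cosine distance (assumed reading of "k-center").

    The first k items seed the medoids, purely to keep the sketch deterministic.
    """
    n = len(vectors)
    dist = [[1.0 - cosine(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
    medoids = list(range(k))

    def assign():
        # Label each item with the index of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][medoids[c]])
                for i in range(n)]

    for _ in range(max_iter):
        labels = assign()
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                updated.append(medoids[c])  # keep the old medoid if empty
                continue
            # New medoid: the member with the smallest total distance to the rest.
            updated.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if updated == medoids:
            break
        medoids = updated
    return assign(), medoids


# Toy, already-segmented descriptions (illustrative only, not data from the paper).
docs = [
    ["fever", "cough", "sore", "throat"],
    ["cough", "fever", "runny", "nose"],
    ["chest", "pain", "shortness", "breath"],
    ["chest", "pain", "palpitations"],
]
vectors = tfidf_vectors(docs)
labels, medoids = k_medoids(vectors, k=2)
```

On this toy input the two respiratory descriptions end up in one cluster and the two chest-pain descriptions in the other. Using medoids rather than means keeps every cluster center a real disease description, which is one plausible reason such a method could suit text alignment, though the abstract does not state the authors' reasoning.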
Year
2016
DOI
10.1109/IIKI.2016.37
Venue
2016 International Conference on Identification, Information and Knowledge in the Internet of Things (IIKI)
Keywords
Multi-source medical sets, Align diseases, Text clustering, Natural Language Processing
Field
Ontology (information science), Feature vector, Cosine similarity, Pattern recognition, Segmentation, Computer science, Data acquisition, Computer network, Data redundancy, Artificial intelligence, Cluster analysis, Multi-source
DocType
Conference
ISBN
978-1-5090-5953-9
Citations
0
PageRank
0.34
References
9
Authors
4
Name            Order   Citations   PageRank
Liangchi Li     1       0           1.01
Shuaijing Xu    2       7           2.40
Shenling Wang   3       0           1.01
Xianlin Ma      4       0           1.01