Title
A Study of Damp-Heat Syndrome Classification Using Word2vec and TF-IDF
Abstract
With growing public concern about health, assessing a person's health from medical records is becoming a potential demand. Most previous disease analysis studies were conducted on structured datasets, which usually ignore the relationships between different symptoms and are expensive to obtain. In this paper, we propose a novel model based on Word2vec and Term Frequency-Inverse Document Frequency (TF-IDF) that can detect damp-heat syndrome directly from unstructured records. First, we use the ICTCLAS system, combined with a corpus collected in the field of Traditional Chinese Medicine (TCM), to segment the clinical records into words. Second, the Word2vec tool is used to train word vectors. Then, we construct the record representation vector from the word vectors and TF-IDF. We name this record representation method Word2vec+TF-IDF. To verify the effectiveness of the proposed method, we compare it with other text representation methods under four different classifiers. The experiments were conducted on a dataset collected from over 10 Chinese medicine hospitals. The experimental results show that our model performs better than state-of-the-art methods such as LSA and Doc2vec.
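The record representation step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state how word vectors and TF-IDF are combined, so the TF-IDF-weighted sum used here is an assumption. The function names (`tfidf_weights`, `record_vector`), the English tokens, and the 2-dimensional word vectors are hypothetical stand-ins for ICTCLAS-segmented records and trained Word2vec vectors.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {word: tf-idf} dict per tokenized record."""
    n_docs = len(docs)
    # Document frequency: number of records that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        })
    return weights


def record_vector(doc_weights, word_vectors, dim):
    """Sum word vectors scaled by TF-IDF (assumed combination rule)."""
    vec = [0.0] * dim
    for word, weight in doc_weights.items():
        if word not in word_vectors:
            continue  # skip words without a trained vector
        for i, x in enumerate(word_vectors[word]):
            vec[i] += weight * x
    return vec


# Toy data: already-segmented records and hand-written word vectors.
records = [["fever", "thirst", "fatigue"], ["fever", "cough"]]
word_vectors = {
    "fever": [1.0, 0.0],
    "thirst": [0.0, 1.0],
    "fatigue": [1.0, 1.0],
    "cough": [0.5, -0.5],
}

weights = tfidf_weights(records)
vectors = [record_vector(w, word_vectors, 2) for w in weights]
```

In this toy setup, a word that occurs in every record (here "fever") gets an inverse document frequency of zero and so contributes nothing to the record vector. The resulting vectors could then be passed to any standard classifier.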
Year
2016
Venue
2016 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM)
Keywords
Clinical record analysis, Word2vec, TF-IDF, TCM, Damp-heat syndrome classification
Field
tf–idf, Computer science, Medical record, Artificial intelligence, Word2vec, Machine learning
DocType
Conference
ISSN
2156-1125
Citations
3
PageRank
0.45
References
12
Authors
5
Name           Order  Citations  PageRank
Wei Zhu        1      3          0.45
Wei Zhang      2      3          0.45
Guo-Zheng Li   3      368        42.62
Chong He       4      3          0.45
Lei Zhang      5      3          0.45