Abstract | ||
---|---|---|
We present doc2matrix, an effective word and document matrix representation architecture based on a linear operation, to learn representations for document-level classification. It uses a matrix to represent each word or document, which differs from the traditional vector representation. Doc2matrix defines appropriate subwindows as the scale of the text. Word and document matrices are generated by stacking the information from these subwindows. The resulting document matrix not only contains more fine-grained semantic and syntactic information than the original representation but also introduces abundant two-dimensional features. Experiments on four document-level classification tasks demonstrate that the proposed architecture generates higher-quality word and document representations and outperforms previous models based on linear operations. Compared with other classifiers, a convolution-based classifier is more suitable for our document matrix. Furthermore, we show through both theoretical and experimental analysis that the convolution operation better captures the two-dimensional features of the proposed document matrix. |
Year | DOI | Venue |
---|---|---|
2020 | 10.1007/s00521-019-04541-x | Neural Computing and Applications |
Keywords | DocType | Volume |
Document-level classification, Word matrix, Document matrix, Subwindows | Journal | 32 |
Issue | ISSN | Citations |
14 | 0941-0643 | 0 |
PageRank | References | Authors |
0.34 | 0 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Shun Guo | 1 | 0 | 0.68 |
Nianmin Yao | 2 | 159 | 21.57 |