Abstract | ||
---|---|---|
This paper is related to a project that aims to discover weak signals in different streams of information, possibly sent by whistleblowers. The study tackles the particular problem of clustering topics at multiple levels across multiple documents, and then extracting meaningful descriptors, such as weighted lists of words, to represent documents in a multidimensional space. In this context, we present a novel idea that combines Latent Dirichlet Allocation and Word2vec (the latter providing a consistency metric for the partitioned topics) as a potential method for limiting the need to set a priori the number of clusters K, as classical partitioning approaches usually require. We propose two implementations of this idea, respectively able to (1) find the best K for LDA in terms of topic consistency and (2) gather the optimal clusters from different levels of clustering. We also propose a non-traditional visualization approach based on a multi-agent system that combines dimension reduction and interactivity. |
Year | DOI | Venue |
---|---|---|
2019 | 10.1109/ICDAR.2019.00024 | 2019 International Conference on Document Analysis and Recognition (ICDAR) |
Keywords | Field | DocType |
weak signal,clustering topics,word embedding,multi-agent system,visualization | Latent Dirichlet allocation,Dimensionality reduction,Information retrieval,Pattern recognition,Computer science,Visualization,Multi-agent system,Information extraction,Artificial intelligence,Word2vec,Word embedding,Cluster analysis | Conference |
ISSN | ISBN | Citations |
1520-5363 | 978-1-7281-3015-6 | 0 |
PageRank | References | Authors |
0.34 | 0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Julien Maitre | 1 | 0 | 0.34 |
Michel Ménard | 2 | 0 | 0.34 |
Guillaume Chiron | 3 | 8 | 2.27 |
Alain Bouju | 4 | 93 | 15.32 |
Nicolas Sidere | 5 | 24 | 7.00 |
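
The abstract's first implementation, choosing the K for LDA that gives the most consistent topics, can be illustrated with a minimal sketch. This is not the authors' code: the consistency metric is assumed here to be the mean pairwise cosine similarity between the embeddings of a topic's top words, and the function names (`topic_consistency`, `best_k`), the toy vectors, and the toy topics are all hypothetical. In practice the vectors would come from a trained Word2vec model and the topics from LDA runs at each candidate K.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def topic_consistency(words, vectors):
    """Assumed metric: mean pairwise cosine similarity of a topic's top words.

    Words without an embedding are skipped; a topic with fewer than two
    known words scores 0.0.
    """
    known = [vectors[w] for w in words if w in vectors]
    pairs = list(combinations(known, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def best_k(topics_by_k, vectors):
    """Return the K whose topics have the highest average consistency."""
    def score(k):
        topics = topics_by_k[k]
        if not topics:
            return 0.0
        return sum(topic_consistency(t, vectors) for t in topics) / len(topics)

    return max(topics_by_k, key=score)


# Hypothetical 2-D embeddings standing in for Word2vec vectors.
vectors = {
    "leak": (1.0, 0.0),
    "memo": (0.9, 0.1),
    "fraud": (0.0, 1.0),
    "audit": (0.1, 0.9),
}

# Hypothetical top words per topic, as LDA might return for K=1 and K=2.
topics_by_k = {
    1: [["leak", "memo", "fraud", "audit"]],
    2: [["leak", "memo"], ["fraud", "audit"]],
}

print(best_k(topics_by_k, vectors))
```

With these toy inputs, splitting the four words into two topics groups similar embeddings together, so K=2 scores higher than K=1. A plain average like this tends to favor small, tight topics, so a real selection procedure would likely need a bound on the candidate range of K.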