Abstract |
---|
The telecommunications industry makes extensive use of data extracted from logs, alarms, traces, diagnostics, and other monitoring devices. Analyzing the generated data requires that it be parsed, restructured, and reformatted. Developing a custom parser for each input format is labor-intensive and requires domain knowledge. In this paper, we describe a novel unsupervised text-processing pipeline that automatically detects and labels relevant data and eliminates noise using Levenshtein similarity and agglomerative clustering. We experiment with different similarity and clustering algorithms on a selection of common data formats to verify the accuracy of the proposed technique. The results suggest that the proposed methodology achieves higher accuracy than the other similarity and clustering combinations tested. |
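
The abstract names the two core ingredients, Levenshtein similarity and agglomerative clustering, without giving their details. The sketch below illustrates the general idea under assumptions of our own and is not the authors' implementation: similarity is normalised as 1 - d(a, b) / max(|a|, |b|), clusters are merged by average linkage until no pair reaches a hypothetical threshold of 0.6, and the "relevant data" in a cluster is taken to be the whitespace-separated tokens that vary between its members. The function names and the alarm-like `SAMPLE_LOGS` lines are illustrative only.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if chars match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Levenshtein similarity in [0, 1]; this normalisation is an assumption."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def agglomerative(lines, threshold=0.6):
    """Average-linkage agglomerative clustering of lines.

    Starts with one cluster per line and repeatedly merges the most similar
    pair; stops when the best average similarity falls below the
    (hypothetical) threshold.
    """
    sim = {(i, j): similarity(lines[i], lines[j])
           for i, j in combinations(range(len(lines)), 2)}
    clusters = [[i] for i in range(len(lines))]

    def linkage(c1, c2):
        total = sum(sim[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x, y in combinations(range(len(clusters)), 2):
            score = linkage(clusters[x], clusters[y])
            if score > best:
                best, pair = score, (x, y)
        if best < threshold:
            break
        x, y = pair  # x < y, so deleting index y leaves index x valid
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [[lines[i] for i in sorted(c)] for c in clusters]


def template(cluster):
    """Replace tokens that vary across a cluster with <*> (treated as data)."""
    token_lists = [line.split() for line in cluster]
    if len({len(t) for t in token_lists}) != 1:
        return None  # unequal token counts are left unlabelled in this sketch
    return " ".join(
        column[0] if len(set(column)) == 1 else "<*>"
        for column in zip(*token_lists)
    )


# Hypothetical sample input, not taken from the paper's data sets.
SAMPLE_LOGS = [
    "ALARM link down on port 3",
    "ALARM link down on port 17",
    "ALARM link down on port 5",
    "CPU load 91 percent on card 2",
    "CPU load 45 percent on card 7",
]

if __name__ == "__main__":
    for cluster in agglomerative(SAMPLE_LOGS):
        print(template(cluster))
```

Running the module groups the three port alarms and the two CPU-load lines into separate clusters and prints `ALARM link down on port <*>` followed by `CPU load <*> percent on card <*>`. Average linkage is used here only as a common default; single or complete linkage, or a different threshold, may behave quite differently on real telecom formats.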

Year | DOI | Venue |
---|---|---|
2020 | 10.23919/CNSM50824.2020.9269096 | 2020 16th International Conference on Network and Service Management (CNSM) |

Keywords | DocType | ISSN |
---|---|---|
Unsupervised Data Mining, Information Extraction, Clustering, Similarity | Conference | 2165-9605 |

ISBN | Citations | PageRank |
---|---|---|
978-1-6654-1547-7 | 0 | 0.34 |

References | Authors |
---|---|
9 | 3 |

Name | Order | Citations | PageRank |
---|---|---|---|
Shubham Jain | 1 | 0 | 0.68 |
Amy de Buitléir | 2 | 0 | 0.34 |
Enda Fallon | 3 | 0 | 0.68 |