Title
Unsupervised Noise Detection in Unstructured data for Automatic Parsing
Abstract
The telecommunications industry makes extensive use of data extracted from logs, alarms, traces, diagnostics, and other monitoring devices. Before this data can be analyzed, it must be parsed, restructured, and reformatted. Developing a custom parser for each input format is labor-intensive and requires domain knowledge. In this paper, we describe a novel unsupervised text processing pipeline that automatically detects and labels relevant data and eliminates noise using Levenshtein similarity and agglomerative clustering. We experiment with different similarity and clustering algorithms on a selection of common data formats to verify the accuracy of the proposed technique. The results suggest that the proposed methodology achieves higher accuracy than the other similarity and clustering algorithms tested.
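The abstract names the two building blocks, Levenshtein similarity and agglomerative clustering, but not how they are combined. The following is a minimal sketch of one way such a pipeline could look, not the authors' implementation. Everything beyond those two building blocks is an assumption made for illustration: the average-linkage rule, the `min_similarity` cut-off of 0.6, the `min_cluster_size` rule that treats small clusters as noise, and all function names.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Levenshtein similarity scaled to [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def agglomerative(lines, min_similarity=0.6):
    """Average-linkage agglomerative clustering with a similarity cut-off.

    Assumption: merging stops once the best pair of clusters falls below
    `min_similarity`. Returns clusters as lists of indices into `lines`.
    """
    clusters = [[i] for i in range(len(lines))]
    sim = {(i, j): similarity(lines[i], lines[j])
           for i, j in combinations(range(len(lines)), 2)}

    def linkage(c1, c2):
        pairs = [(min(i, j), max(i, j)) for i in c1 for j in c2]
        return sum(sim[p] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        (x, y), best = max(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1])
        if best < min_similarity:
            break
        clusters[x] = clusters[x] + clusters[y]  # x < y, so deleting y is safe
        del clusters[y]
    return clusters


def label_noise(lines, min_similarity=0.6, min_cluster_size=2):
    """Label each line 'data' or 'noise'.

    Assumption: lines that end up in clusters smaller than
    `min_cluster_size` are treated as noise.
    """
    labels = ["noise"] * len(lines)
    for cluster in agglomerative(lines, min_similarity):
        if len(cluster) >= min_cluster_size:
            for i in cluster:
                labels[i] = "data"
    return labels


if __name__ == "__main__":
    # Hypothetical log excerpt: three structured records and one banner line.
    sample = [
        "2020-11-02 10:00:01 INFO link up port=1",
        "2020-11-02 10:00:02 INFO link up port=2",
        "2020-11-02 10:00:03 INFO link up port=3",
        "==== end of report ====",
    ]
    for label, line in zip(label_noise(sample), sample):
        print(label, line)
```

In this sketch the three structured records differ by only two characters, so they merge into one cluster, while the banner line stays alone and is labeled as noise. The pairwise comparison is quadratic in the number of lines, so a real pipeline would likely need sampling or a library implementation for large inputs.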
Year
2020
DOI
10.23919/CNSM50824.2020.9269096
Venue
2020 16th International Conference on Network and Service Management (CNSM)
Keywords
Unsupervised Data Mining, Information Extraction, Clustering, Similarity
DocType
Conference
ISSN
2165-9605
ISBN
978-1-6654-1547-7
Citations
0
PageRank
0.34
References
9
Authors
3
Name              Order   Citations   PageRank
Shubham Jain      1       0           0.68
Amy de Buitléir   2       0           0.34
Enda Fallon       3       0           0.68