Unsupervised Noise Detection in Unstructured data for Automatic Parsing

Paper Info

Title
Unsupervised Noise Detection in Unstructured data for Automatic Parsing

Abstract
The telecommunications industry makes extensive use of data extracted from logs, alarms, traces, diagnostics, and other monitoring devices. Analyzing the generated data requires that the data be parsed, re-structured, and re-formatted. Developing custom parsers for each input format is labor-intensive and requires domain knowledge. In this paper, we describe a novel unsupervised text processing pipeline to automatically detect and label relevant data and eliminate noise using Levenshtein similarity and Agglomerative clustering. We experiment with different similarity and clustering algorithms on a selection of common data formats to verify the accuracy of the proposed technique. The results suggest that the proposed methodology has higher accuracy.
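The abstract names two building blocks, Levenshtein similarity and agglomerative clustering, without giving implementation details. The following is a minimal sketch of how they might be combined, not the authors' pipeline. Everything specific in it is an assumption made for illustration: the function names, the length-normalised similarity, average linkage, the 0.5 merge threshold, and the rule that lines left in clusters smaller than `min_size` are labelled as noise.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def agglomerative(lines, threshold=0.5):
    """Average-linkage agglomerative clustering over line indices.

    Starts with one cluster per line and repeatedly merges the most similar
    pair of clusters until no pair reaches the (assumed) threshold.
    """
    sim = {(i, j): similarity(lines[i], lines[j])
           for i, j in combinations(range(len(lines)), 2)}

    def linkage(c1, c2):
        total = sum(sim[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [[i] for i in range(len(lines))]
    while len(clusters) > 1:
        (x, y), best = max(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]  # x < y, so deleting y is safe
        del clusters[y]
    return clusters


def label_noise(lines, threshold=0.5, min_size=2):
    """Hypothetical labelling rule: lines in small clusters count as noise."""
    labels = ["noise"] * len(lines)
    for cluster in agglomerative(lines, threshold):
        if len(cluster) >= min_size:
            for index in cluster:
                labels[index] = "data"
    return labels
```

With this rule, log lines that share a template merge into one cluster early, while a one-off banner or separator line stays on its own and is labelled as noise. The pairwise similarity table grows quadratically with the number of lines, so the sketch is suited only to small inputs.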

Year	2020
DOI	10.23919/CNSM50824.2020.9269096
Venue	2020 16th International Conference on Network and Service Management (CNSM)
Keywords	Unsupervised Data Mining, Information Extraction, Clustering, Similarity
DocType	Conference
ISSN	2165-9605
ISBN	978-1-6654-1547-7
Citations	0
PageRank	0.34
References	9
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Shubham Jain	1	0	0.68
Amy de Buitléir	2	0	0.34
Enda Fallon	3	0	0.68