Title
Approximate matching of persistent LExicon using search-engines for classifying Mobile app traffic
Abstract
We present AMPLES, Approximate Matching of Persistent LExicon using Search-Engines, to address the Mobile-Application-Identification (MApId) problem in network traffic at a per-flow granularity. We transform MApId into an information-retrieval problem where lexical similarity of short-text-documents is used as a metric for classification tasks. Specifically, a network-flow, observed at an intercept-point, is treated as a semi-structured-text-document and modified into a flow-query. This query is then run against a corpus of documents pre-indexed in a search-engine. Each index-document represents an application, and consists of distinguishable identifiers from the metadata-file and URL-strings found in the application's executable-archive. The search-engine acts as a kernel function, generating a score distribution vis-à-vis the index-documents, to determine a match. This extends the scope of MApId to fuzzy classification, mapping a flow to a family of apps when the score distribution is spread out. Through experiments over an emulator-generated test-dataset (400 K applications and 13.5 million flows), we obtain over 80% flow coverage and about 85% application coverage with low false-positives (4%) and nearly no false-negatives. We also validate our methodology over a real network trace. Most importantly, our methodology is platform agnostic, and subsumes previous studies, most of which focus solely on the application coverage.
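The pipeline described in the abstract (index one document per app, turn a flow into a query, score it against every index-document, and fall back to a family of apps when scores are spread out) can be illustrated with a minimal sketch. This is not the authors' implementation: a hand-rolled TF-IDF cosine score stands in for the search-engine, and the app names, identifier strings, function names, and the `min_score` and `spread` thresholds are all hypothetical values chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on any run of non-alphanumeric characters.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(app_docs):
    """Build TF-IDF vectors for a dict of app name -> identifier text."""
    counts = {app: Counter(tokenize(doc)) for app, doc in app_docs.items()}
    n_docs = len(counts)
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())
    # Smoothed inverse document frequency (an assumption, not from the paper).
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in doc_freq.items()}
    vectors = {
        app: {t: tf * idf[t] for t, tf in c.items()} for app, c in counts.items()
    }
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_flow(flow_text, vectors, idf, min_score=0.1, spread=0.75):
    """Return the apps whose score is within `spread` of the best score.

    One app means a sharp match, several apps mean a fuzzy (family) match,
    and an empty list means no index-document scored above `min_score`.
    """
    tf = Counter(tokenize(flow_text))
    query = {t: c * idf[t] for t, c in tf.items() if t in idf}
    scores = sorted(
        ((cosine(query, vec), app) for app, vec in vectors.items()), reverse=True
    )
    if not scores or scores[0][0] < min_score:
        return []
    best = scores[0][0]
    return [app for score, app in scores if score >= spread * best]


# Hypothetical index-documents: package identifiers plus URL strings.
apps = {
    "com.example.weather": "com example weather api.weather.example.com forecast",
    "com.example.weatherpro": "com example weatherpro api.weather.example.com forecast radar",
    "com.sample.chat": "com sample chat msg.sample.net send",
}
vectors, idf = build_index(apps)

sharp = classify_flow("GET /send HTTP/1.1 Host: msg.sample.net", vectors, idf)
fuzzy = classify_flow("GET /forecast HTTP/1.1 Host: api.weather.example.com", vectors, idf)
unknown = classify_flow("GET /x HTTP/1.1 Host: unknown.invalid", vectors, idf)
```

In this toy setup, `sharp` holds a single app, `fuzzy` holds the two apps that share the same host strings, and `unknown` is empty. A production system would replace the scoring function with a real search-engine's ranking and tune the thresholds on labeled traffic.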
Year
2016
DOI
10.1109/INFOCOM.2016.7524386
Venue
IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications
Keywords
approximate matching of persistent lexicon using search-engines, AMPLES, mobile-application-identification, MApId classification, information retrieval, network flow, semistructured text document, flow query, metadata file, URL string, fuzzy classification mapping
Field
Lexical similarity, Data mining, Mobile app, Search engine, Identifier, Computer science, Lexicon, Approximate matching, Granularity, Kernel (statistics)
DocType
Conference
ISSN
0743-166X
ISBN
978-1-4673-9954-8
Citations
6
PageRank
0.50
References
12
Authors
3
Name              Order  Citations  PageRank
Gyan Ranjan       1      35         2.26
Alok Tongaonkar   2      241        14.88
Ruben Torres      3      36         3.07