Title
On-Device Language Detection and Classification of Extreme Short Text from Calendar Titles Across Languages
Abstract
Smartphones have become an indispensable part of daily life. These devices provide rapid access to digital calendars, enabling users to schedule their personal and professional activities with short titles referred to as event titles. Event titles provide valuable information for personalizing various services. However, because event titles are typically only a few words long, identifying their language and the exact event the user is scheduling is challenging. Deploying robust machine learning pipelines that continuously learn from data on the server side is not feasible, as event titles are private user data and raise significant privacy concerns. To tackle this challenge, we propose a privacy-preserving on-device solution, the Calendar Event Classifier (CEC), which uses the fastText library to classify calendar titles into 22 event types grouped into 3 categories. Our language detection models achieve 96% accuracy, outperforming existing language detection tools by 20%, and our event classifiers achieve 92%, 94%, 87%, and 90% accuracy for English, Korean, German, and French, respectively. The currently tested CEC module architecture delivers the fastest predictions (4 ms/event) with a memory footprint under 8 MB and serves multiple personalization services. Taken together, our results highlight the need to customize machine learning models for language detection and information extraction from extremely short text such as calendar titles.
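The abstract describes a two-stage flow: detect the language of a very short title, then route the title to a classifier trained for that language. The sketch below illustrates that routing under stated assumptions. It swaps in a multinomial Naive Bayes over fastText-style character n-grams (each word wrapped in `<` and `>` boundary markers) instead of the fastText library itself, and every class name, function name, event label, and toy title is hypothetical rather than taken from CEC or its 22 event types.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=2, n_max=4):
    """fastText-style subword features: wrap each word in '<' and '>'
    boundary markers and emit every character n-gram of length n_min..n_max.
    (fastText's own defaults are minn=3, maxn=6; smaller values suit toy data.)"""
    grams = []
    for word in text.lower().split():
        token = f"<{word}>"
        for n in range(n_min, n_max + 1):
            grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams


class NGramNaiveBayes:
    """Hypothetical stand-in for a supervised fastText model (NOT fastText):
    multinomial Naive Bayes with add-one smoothing over character n-grams."""

    def fit(self, texts, labels):
        self.doc_counts = Counter(labels)
        self.gram_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.gram_counts[label].update(char_ngrams(text))
        self.vocab_size = len(set().union(*self.gram_counts.values()))
        self.totals = {lbl: sum(c.values()) for lbl, c in self.gram_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.gram_counts.items():
            # Log prior plus the smoothed log likelihood of every n-gram.
            score = math.log(self.doc_counts[label] / n_docs)
            denom = self.totals[label] + self.vocab_size
            for gram in char_ngrams(text):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class EventPipeline:
    """Hypothetical two-stage router: detect the title's language first,
    then classify the event with the model trained for that language."""

    def __init__(self, lang_detector, event_models):
        self.lang_detector = lang_detector
        self.event_models = event_models  # language code -> classifier

    def classify(self, title):
        lang = self.lang_detector.predict(title)
        return lang, self.event_models[lang].predict(title)


# Hypothetical toy titles and labels (not the paper's data or event types).
EN = [("team meeting", "work"), ("project review", "work"),
      ("dentist appointment", "health"), ("gym workout", "health"),
      ("birthday party", "social"), ("dinner with friends", "social")]
DE = [("besprechung mit kollegen", "work"), ("projektplanung", "work"),
      ("zahnarzt termin", "health"), ("training im fitnessstudio", "health"),
      ("geburtstag feier", "social"), ("abendessen mit freunden", "social")]


def train(data):
    texts, labels = zip(*data)
    return NGramNaiveBayes().fit(texts, labels)


lang_detector = train([(t, "en") for t, _ in EN] + [(t, "de") for t, _ in DE])
pipeline = EventPipeline(lang_detector, {"en": train(EN), "de": train(DE)})

if __name__ == "__main__":
    print(pipeline.classify("geburtstag feier"))
```

Subword n-grams are useful here because a two-word title offers few whole-word cues, while fragments such as `<zah` or `ng>` still carry language signal. The paper itself uses fastText; this stand-in only illustrates the detect-then-route design.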
Year
2022
DOI
10.1007/978-3-031-08473-7_5
Venue
Natural Language Processing and Information Systems (NLDB 2022)
Keywords
Language detection, Short text classification, Event classification, fastText
DocType
Conference
Volume
13286
ISSN
0302-9743
Citations
0
PageRank
0.34
References
0
Authors
6