Title
Evolving fuzzy grammar for crime texts categorization
Abstract
This paper introduces the evolving fuzzy grammar (EFG) method for crime text categorization. Text mining refers to the activity of identifying useful information in natural language text, and automated text categorization is one of its core tasks. Machine learning (ML) based methods are the popular solution to this problem, but the resulting models typically offer low expressivity and lack a human-understandable representation. Despite being highly efficient, ML methods are established in a train-test setting: when the existing model is found insufficient, the whole process must be redone (train-test-retrain), which is typically time consuming. Furthermore, retraining the model is usually neither a practical nor a feasible option when the data change continuously.
In EFG, the learning model is built from a set of selected text fragments, which are transformed into their underlying structures, called fuzzy grammars. The fuzzy notion is used because grammar matching, parsing, and derivation involve uncertainty. A fuzzy union operator combines and transforms individual text fragment grammars into more general representations of the learned fragments. The set of learned fuzzy grammars is influenced by the evolution of the patterns seen: the learned model is changed slightly (incrementally) as adaptation, without requiring conventional redevelopment. In contrast, ML methods are generally statistically founded and require the conventional train-test-retrain cycle whenever a new pattern is found; this research hypothesizes that this makes the overall time required by ML higher.
The main strength of this paper in comparison with previous related work is that it describes completely the steps involved in developing EFG, a method for learning text structure at the text fragment level Refs. [1,2]. Ref. [2] presented the general steps for grammar-grammar combination, Ref. [3] highlighted the step for combining grammars, and Ref. [4] emphasized permutation-free generation of fuzzy grammars. This paper integrates and refines those papers by providing more detail, such as the algorithm for grammar combination, and demonstrates it on data describing armed attack and bombing events. It also tackles a new text classification problem, compared with the smaller-scale data used in the EFG case study in Ref. [2] and the text extraction tasks in Refs. [1,6].
The performance of EFG in crime text categorization is evaluated against expert-tagged summaries of real incidents and compared with C4.5, support vector machine, naïve Bayes, boosting, and k-nearest neighbour methods, in terms of precision, recall, F-measure, and extension learning. A significance test is performed to measure the differences in mean precision, recall, and F-measure. Results show that the EFG algorithm performs close to the other ML methods while being highly interpretable, easily integrated into a more comprehensive grammar system, and requiring lower model retraining (adaptation) time.
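The grammar combination described in the abstract can be sketched as a fuzzy union over production rules carrying membership degrees, where the standard fuzzy union keeps the maximum degree per rule. The representation below (a grammar as a dict from rule strings to degrees) and the function names are illustrative assumptions for this sketch, not the paper's actual data structures or algorithm.

```python
# Sketch only: a fuzzy grammar is assumed to be a dict mapping
# production-rule strings to membership degrees in [0, 1].

def fuzzy_union(g1, g2):
    """Standard fuzzy union of two fuzzy grammars: for each production
    rule, keep the maximum membership degree across both grammars."""
    rules = set(g1) | set(g2)
    return {r: max(g1.get(r, 0.0), g2.get(r, 0.0)) for r in rules}

def evolve(model, fragment_grammar):
    """Incremental adaptation: fold a new text fragment's grammar into
    the learned model instead of retraining from scratch."""
    return fuzzy_union(model, fragment_grammar)

# Hypothetical fragment grammars for illustration.
model = {"S -> ATTACK LOC": 0.8, "ATTACK -> 'bombing'": 0.6}
fragment = {"ATTACK -> 'bombing'": 0.9, "ATTACK -> 'shooting'": 0.7}
model = evolve(model, fragment)
# The shared rule keeps its higher degree; the new rule is absorbed.
```

The max-based union is one conventional choice of fuzzy union operator; it makes the combined grammar accept everything either fragment grammar accepted, at no lower a degree, which matches the abstract's idea of generalizing over learned fragments.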
Year
DOI
Venue
2015
10.1016/j.asoc.2014.11.038
Applied Soft Computing
Keywords
Field
DocType
machine learning, soft computing
Rule-based machine translation, Categorization, Naive Bayes classifier, Computer science, Fuzzy logic, Grammar, Artificial intelligence, Boosting (machine learning), Parsing, Soft computing, Machine learning
Journal
Volume
Issue
ISSN
28
C
1568-4946
Citations 
PageRank 
References 
3
0.37
46
Authors
2
Name
Order
Citations
PageRank
Nurfadhlina Mohd Sharef16011.72
Trevor P. Martin213426.98