Abstract |
---|
A story is defined as “an actor(s) taking action(s) that culminates in a resolution(s).” In this paper, we investigate the utility of standard keyword-based features, statistical features derived from shallow parsing (such as the density of POS tags and named entities), and a new set of semantic features for developing a story classifier. The classifier is trained to label a paragraph as a “story” if the paragraph consists mostly of story content. The training data are a collection of expert-coded story and non-story paragraphs drawn from the RSS feeds of a list of extremist websites. Our proposed semantic features are based on suitable aggregation and generalization of ⟨Subject, Verb, Object⟩ triplets that can be extracted with a parser. Experimental results show that a model combining statistical features with memory-based semantic linguistic features achieves the best accuracy when used with a Support Vector Machine (SVM) classifier. |
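
The abstract names two feature families that feed the SVM: shallow statistical features (POS-tag and named-entity densities) and semantic features built by generalizing and aggregating ⟨Subject, Verb, Object⟩ triplets. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the paragraph has already been POS-tagged, NE-tagged, and parsed into triplets; the `(token, tag, ne_label)` input format, the `GENERALIZE` lexicon, and all feature names are hypothetical illustrations.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical generalization lexicon (not from the paper): maps surface words
# to coarser classes so that different triplets can share features.
GENERALIZE = {
    "soldiers": "ACTOR", "fighters": "ACTOR", "he": "ACTOR",
    "raided": "ATTACK", "attacked": "ATTACK",
    "convoy": "TARGET", "base": "TARGET",
}


def statistical_features(sentences: list[list[tuple[str, str, str]]]) -> dict[str, float]:
    """POS-tag and named-entity densities for one paragraph.

    Each sentence is a list of (token, POS tag, NE label) tuples, an assumed
    tagger output format; the NE label "O" means "not part of an entity".
    """
    tokens = [tok for sent in sentences for tok in sent]
    n = len(tokens) or 1  # avoid division by zero on empty paragraphs
    pos_counts = Counter(pos for _, pos, _ in tokens)
    feats = {f"pos_density:{pos}": c / n for pos, c in pos_counts.items()}
    feats["ne_density"] = sum(1 for _, _, ne in tokens if ne != "O") / n
    return feats


def triplet_features(triplets: list[tuple[str, str, str]]) -> dict[str, int]:
    """Generalize each <Subject, Verb, Object> triplet, then aggregate counts."""
    feats = Counter()
    for subj, verb, obj in triplets:
        g = [GENERALIZE.get(w.lower(), "OTHER") for w in (subj, verb, obj)]
        feats["svo:" + "|".join(g)] += 1     # full generalized triplet
        feats["sv:" + "|".join(g[:2])] += 1  # backoff: subject and verb only
    return dict(feats)


def paragraph_features(sentences, triplets) -> dict[str, float]:
    """Merge both feature families; their key prefixes never collide."""
    feats = statistical_features(sentences)
    feats.update(triplet_features(triplets))
    return feats


if __name__ == "__main__":
    para = [[("Soldiers", "NNS", "O"), ("raided", "VBD", "O"), ("a", "DT", "O"),
             ("convoy", "NN", "O"), ("near", "IN", "O"), ("Basra", "NNP", "LOCATION")]]
    feats = paragraph_features(para, [("Soldiers", "raided", "convoy")])
    print(feats["svo:ACTOR|ATTACK|TARGET"], round(feats["ne_density"], 3))  # 1 0.167
```

In a full pipeline, such feature dictionaries could be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to a linear SVM such as `LinearSVC`. The paper's keyword features and its exact aggregation and generalization scheme are not reproduced here.
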
Year | DOI | Venue |
---|---|---|
2012 | 10.1109/ASONAM.2012.97 | Advances in Social Networks Analysis and Mining |
Keywords | Field | DocType |
---|---|---|
support vector machine, semantic triplet, pos tag, proposed semantic feature, best accuracy, expert-coded story, semantic feature, non-story paragraph, memory-based semantic linguistic feature, statistical feature, story classifier, feature extraction, accuracy, linguistics, organizations, support vector machines, semantics, statistical analysis, literature, artificial intelligence, grammars | Rule-based machine translation, Data mining, Computer science, Paragraph, Artificial intelligence, Natural language processing, Classifier (linguistics), Support vector machine, Feature extraction, Parsing, RSS, Machine learning, Semantics | Conference |
ISBN | Citations | PageRank |
---|---|---|
978-1-4673-2497-7 | 5 | 0.45 |
References | Authors |
---|---|
21 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Betul Ceran | 1 | 42 | 2.65 |
Ravi Karad | 2 | 5 | 0.45 |
Ajay Mandvekar | 3 | 5 | 0.45 |
Steven R. Corman | 4 | 97 | 8.72 |
Hasan Davulcu | 5 | 584 | 86.85 |