Abstract | ||
---|---|---|
Pattern mining arose from the need to discover hidden knowledge in very large amounts of data, regardless of the form in which the data is presented. Natural Language Processing (NLP), in turn, arose from the human need to be understood by computers. In this paper we present an exploratory approach that aims to bring together the best of both worlds. Our goal is to discover patterns in linguistically processed texts by combining state-of-the-art NLP tools with traditional pattern mining algorithms. Articles from a Portuguese newspaper are the input to the series of tests described in this paper. First, the articles are processed by an NLP chain, which performs a deep linguistic analysis of the text; afterwards, the pattern mining algorithms Apriori and GenPrefixSpan are applied. The results show that sequential pattern mining techniques are applicable to structured textual data, and they also provide evidence about the structure of the language. |
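The abstract names Apriori as one of the two mining algorithms but this record does not describe how it was configured. As a rough illustration only, the following is a minimal sketch of textbook Apriori in Python, assuming each sentence has already been reduced to the set of part-of-speech tags it contains. The toy sentences, the tag names, and the `min_support` value are hypothetical and are not taken from the paper. GenPrefixSpan, which mines ordered sequences, is not covered here.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count how many transactions contain each candidate, keep frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {frozenset([i]) for t in transactions for i in t}
    current = count(items)
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy input: each sentence as the set of POS tags it contains.
sentences = [
    {"DET", "NOUN", "VERB"},
    {"DET", "NOUN", "ADJ"},
    {"NOUN", "VERB"},
    {"DET", "NOUN", "VERB", "ADJ"},
]
frequent = apriori(sentences, min_support=3)
for itemset, support in sorted(
    frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))
):
    print(sorted(itemset), support)
```

Treating a sentence as an unordered set discards word order, which is why a sequential miner is needed alongside Apriori when the goal is to learn about the structure of the language.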
Year | DOI | Venue |
---|---|---|
2009 | 10.1007/978-3-642-03070-3_20 | MLDM |

Keywords | Field | DocType |
---|---|---|
nlp chain, sequential pattern mining technique, nlp state-of-the-art tool, linguistically processed text, textual structured data, portuguese newspaper, traditional pattern mining algorithm, pattern mining, exploratory approach, natural language processing, deep linguistic analysis, structured data | Text mining, Parse tree, Of the form, Computer science, Portuguese, A priori and a posteriori, Newspaper, Association rule learning, Artificial intelligence, Natural language processing, Data model, Machine learning | Conference |

Volume | ISSN | Citations |
---|---|---|
5632 | 0302-9743 | 1 |

PageRank | References | Authors |
---|---|---|
0.36 | 12 | 2 |

Name | Order | Citations | PageRank |
---|---|---|---|
Ana Cristina Mendes | 1 | 93 | 11.59 |
Cláudia Antunes | 2 | 161 | 16.57 |