Identifying Products in Online Cybercrime Marketplaces: A Dataset for Fine-grained Domain Adaptation. - Citegraph

Paper Info

Title
Identifying Products in Online Cybercrime Marketplaces: A Dataset for Fine-grained Domain Adaptation.

Abstract
One weakness of machine-learned NLP models is that they typically perform poorly on out-of-domain data. In this work, we study the task of identifying products being bought and sold in online cybercrime forums, which exhibits particularly challenging cross-domain effects. We formulate a task that represents a hybrid of slot-filling information extraction and named entity recognition and annotate data from four different forums. Each of these forums constitutes its own fine-grained domain in that the forums cover different market sectors with different properties, even though all forums are in the broad domain of cybercrime. We characterize these domain differences in the context of a learning-based system: supervised models see decreased accuracy when applied to new forums, and standard techniques for semi-supervised learning and domain adaptation have limited effectiveness on this data, which suggests the need to improve these techniques. We release a dataset of 1,938 annotated posts from across the four forums.

Year	DOI	Venue
2017	10.18653/v1/d17-1275	Empirical Methods in Natural Language Processing
DocType	Volume	ISSN
Journal	abs/1708.09609	EMNLP (2017) 2598-2607
Citations	PageRank	References
2	0.35	15
Authors
8

Authors (8 rows)

Name	Order	Citations	PageRank
Greg Durrett	1	341	26.94
Jonathan K. Kummerfeld	2	93	16.19
Taylor Berg-Kirkpatrick	3	554	35.93
Rebecca S. Portnoff	4	50	3.20
Sadia Afroz	5	274	18.85
Damon McCoy	6	2073	125.49
Kirill Levchenko	7	1235	83.12
Vern Paxson	8	14031	2130.20
