Abstract |
---|
Query-based triggers play a crucial role in modern search systems, e.g., in deciding when to display direct answers on result pages. We address a common scenario in designing such triggers for real-world settings where positives are rare and search providers possess only a small seed set of positive examples to learn query classification models. We choose the critical domain of self-harm intent detection to demonstrate how such small seed sets can be expanded to create meaningful training data with a sizable fraction of positive examples. Our results show that with our method, substantially more positive queries can be found compared to plain random sampling. Additionally, we explored the effectiveness of traditional active learning approaches on classification performance and found that maximum uncertainty performs the best among several other techniques that we considered. |
Year | DOI | Venue |
---|---|---|
2015 | 10.1145/2806416.2806594 | ACM International Conference on Information and Knowledge Management |
Field | DocType | Citations |
---|---|---|
Query optimization, Training set, Data mining, Active learning, Query expansion, Information retrieval, Computer science, Harm, Web query classification, Artificial intelligence, Sampling (statistics), Machine learning | Conference | 2 |
PageRank | References | Authors |
---|---|---|
0.42 | 12 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ashiqur R. KhudaBukhsh | 1 | 82 | 9.30 |
Paul N. Bennett | 2 | 1500 | 87.93 |
Ryen White | 3 | 4546 | 222.75 |
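
The abstract reports that maximum uncertainty performed best among the active learning strategies the authors considered. As a rough illustration of that general idea, and not the authors' implementation, the sketch below picks the unlabeled queries whose predicted positive-class probability lies closest to 0.5. The function names, the scores, and the placeholder query strings are all assumptions made for this example; in practice the scores would come from whatever classifier is being trained.

```python
from typing import List, Tuple


def uncertainty(p_positive: float) -> float:
    """Score how uncertain a binary prediction is.

    Returns 1.0 when the probability is exactly 0.5 (most uncertain)
    and 0.0 when it is 0.0 or 1.0 (most confident).
    """
    return 1.0 - 2.0 * abs(p_positive - 0.5)


def select_most_uncertain(
    scored_queries: List[Tuple[str, float]], batch_size: int
) -> List[str]:
    """Return the batch_size queries the classifier is least sure about.

    scored_queries holds (query, predicted probability of the positive
    class) pairs; both the structure and the name are hypothetical.
    """
    ranked = sorted(
        scored_queries, key=lambda pair: uncertainty(pair[1]), reverse=True
    )
    return [query for query, _ in ranked[:batch_size]]


# Hypothetical unlabeled pool with made-up classifier scores.
pool = [
    ("placeholder query a", 0.97),
    ("placeholder query b", 0.52),
    ("placeholder query c", 0.08),
    ("placeholder query d", 0.41),
]

# These two queries would be sent to annotators in the next labeling round.
batch = select_most_uncertain(pool, batch_size=2)
print(batch)
```

In a full active learning loop, the selected batch would be labeled, added to the training set, and the classifier retrained before the pool is scored again.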