Abstract | ||
---|---|---|
We introduce TechQA, a domain-adaptation question answering dataset for the technical support domain. The TechQA corpus highlights two real-world issues from the automated customer support domain. First, it contains actual questions posed by users on a technical forum, rather than questions generated specifically for a competition or a task. Second, it has a real-world size (600 training, 310 dev, and 490 evaluation question/answer pairs), reflecting the cost of creating large labeled datasets with actual data. Consequently, TechQA is meant to stimulate research in domain adaptation rather than to serve as a resource for building QA systems from scratch. The dataset was obtained by crawling the IBM Developer and IBM DeveloperWorks forums for questions with accepted answers that appear in a published IBM Technote, a technical document that addresses a specific technical issue. We also release a collection of the 801,998 publicly available Technotes as of April 4, 2019 as a companion resource that can be used for pretraining to learn representations of the IT domain language. |
Year | Venue | DocType |
---|---|---|
2020 | ACL | Conference |
Citations | PageRank | References |
---|---|---|
0 | 0.34 | 0 |
Authors | ||
21 |
Name | Order | Citations | PageRank |
---|---|---|---|
Vittorio Castelli | 1 | 928 | 129.71 |
Rishav Chakravarti | 2 | 0 | 0.34 |
Saswati Dana | 3 | 0 | 0.34 |
Anthony Ferritto | 4 | 0 | 1.69 |
Radu Florian | 5 | 924 | 91.44 |
Martin Franz | 6 | 483 | 53.56 |
Dinesh Garg | 7 | 0 | 1.69 |
Dinesh Khandelwal | 8 | 0 | 2.03 |
J. Scott McCarley | 9 | 214 | 21.36 |
Mike McCawley | 10 | 0 | 0.34 |
Mohamed Nasr | 11 | 0 | 0.34 |
Lin Pan | 12 | 1 | 0.69 |
Cezar Pendus | 13 | 0 | 0.34 |
John F. Pitrelli | 14 | 493 | 81.16 |
Saurabh Pujar | 15 | 0 | 0.34 |
Salim Roukos | 16 | 6248 | 845.50 |
Andrzej Sakrajda | 17 | 0 | 0.34 |
Avirup Sil | 18 | 131 | 13.85 |
Rosario Uceda-Sosa | 19 | 0 | 0.34 |
Todd Ward | 20 | 0 | 1.01 |
Rong Zhang | 21 | 0 | 0.34 |