Abstract | ||
---|---|---|
Text-based knowledge extraction methods for populating knowledge bases have focused on binary facts: relationships between two entities. However, in advanced domains such as health, it is often crucial to consider ternary and higher-arity relations. An example is to capture which drug is used for which disease at which dosage (e.g., 2.5 mg/day) for which kinds of patients (e.g., children vs. adults). In this work, we present an approach to harvest higher-arity facts from textual sources. Our method is distantly supervised by seed facts, and uses the fact-pattern duality principle to gather fact candidates with high recall. For high precision, we devise a constraint-based reasoning method to eliminate false candidates. A major novelty is in coping with the difficulty that higher-arity facts are often expressed only partially in texts and strewn across multiple sources. For example, one sentence may refer to a drug, a disease and a group of patients, whereas another sentence talks about the drug, its dosage and the target group without mentioning the disease. Our methods cope well with such partially observed facts, at both the pattern-learning and constraint-reasoning stages. Experiments with health-related documents and with news articles demonstrate the viability of our method. |
Year | DOI | Venue |
---|---|---|
2018 | 10.1145/3178876.3186000 | WWW '18: The Web Conference 2018, Lyon, France, April 2018 |
Field | DocType | ISBN |
Knowledge graph,Arity,Computer science,Coping (psychology),Natural language processing,Artificial intelligence,Knowledge extraction,Novelty,Recall,Sentence,Machine learning | Conference | 978-1-4503-5639-8 |
Citations | PageRank | References |
4 | 0.40 | 40 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Patrick Ernst | 1 | 70 | 6.51 |
Amy Siu | 2 | 8 | 2.83 |
Gerhard Weikum | 3 | 12710 | 2146.01 |