Title
Automatic Meaning Discovery Using Google
Abstract
We present a new theory of relative semantics between objects, based on information distance and Kolmogorov complexity. This theory is then applied to construct a method to automatically extract the meaning of words and phrases from the world-wide-web using Google page counts. The approach is novel in its unrestricted problem domain, simplicity of implementation, and manifestly ontological underpinnings. The world-wide-web is the largest database on earth, and the latent semantic context information entered by millions of independent users averages out to provide automatic meaning of useful quality. We give examples that distinguish between colors and numbers, cluster names of paintings by 17th-century Dutch masters and names of books by English novelists, and show the ability to understand emergencies and primes, and we demonstrate the ability to do a simple automatic English-Spanish translation. Finally, we use the WordNet database as an objective baseline against which to judge the performance of our method. We conduct a massive randomized trial in binary classification using support vector machines to learn categories based on our Google distance, resulting in a mean agreement of 87% with the expert-crafted WordNet categories.
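The abstract does not spell out the distance itself. As a minimal sketch, assuming the commonly cited form of the normalized Google distance, NGD(x, y) = (max{log f(x), log f(y)} - log f(x, y)) / (log N - min{log f(x), log f(y)}), the computation from page counts could look like the following. The function name `normalized_google_distance`, its parameter names, and the page counts in the usage lines are illustrative assumptions, not live query results and not part of this record.

```python
import math


def normalized_google_distance(fx: float, fy: float, fxy: float, n: float) -> float:
    """Sketch of the normalized Google distance from page counts.

    fx, fy : page counts for each term alone (assumed > 0)
    fxy    : page count for pages containing both terms
    n      : total number of indexed pages (assumed larger than fx and fy)
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("each term must have a positive page count")
    if fxy <= 0:
        # The terms never co-occur, so they are treated as maximally distant.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    numerator = max(log_fx, log_fy) - log_fxy
    denominator = math.log(n) - min(log_fx, log_fy)
    return numerator / denominator


# Illustrative counts only (assumed values, not fetched from a search engine).
distance = normalized_google_distance(
    fx=46_700_000, fy=12_200_000, fxy=2_630_000, n=8_058_044_651
)
print(round(distance, 3))
```

Smaller values indicate terms that tend to appear on the same pages. The ratio is independent of the logarithm base, so the natural logarithm is used here for convenience.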
Year: 2006
Venue: Kolmogorov Complexity and Applications
Keywords: support vector machine, binary classification, world wide web, natural language, randomized trial
Field: Normalized Google distance, Ontology, Problem domain, Binary classification, Information retrieval, Kolmogorov complexity, Computer science, Information distance, WordNet, Semantics
DocType: Conference
Citations: 16
PageRank: 1.47
References: 13
Authors: 2
Name | Order | Citations | PageRank
Rudi Cilibrasi | 1 | 128 | 13.21
Paul Vitányi | 2 | 2130 | 287.76