Abstract |
---|
We have built a database that provides term vector information for large numbers of pages (hundreds of millions). The basic operation of the database is to take URLs and return term vectors. Compared to computing vectors by downloading pages via HTTP, the Term Vector Database is several orders of magnitude faster, enabling a large class of applications that would be impractical without such a database. This paper describes the Term Vector Database in detail. It also reports on two applications built on top of the database. The first application is an optimization of connectivity-based topic distillation. The second application is a Web page classifier used to annotate results returned by a Web search engine. |
Year | DOI | Venue |
---|---|---|
2000 | 10.1016/S1389-1286(00)00046-3 | Computer Networks |
Keywords | Field | DocType |
Page classification, Term vectors, Topic distillation, Web connectivity, Web search | Web search engine, Static web page, World Wide Web, Information retrieval, Web page, Web mapping, Computer science, Search engine indexing, Data Web, Database schema, Rewrite engine, Database | Journal |
Volume | Issue | ISSN |
33 | 1-6 | 1389-1286 |
Citations | PageRank | References |
13 | 2.80 | 4 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Raymie Stata | 1 | 1968 | 245.65 |
Krishna A. Bharat | 2 | 1211 | 252.86 |
Farzin Maghoul | 3 | 1198 | 173.90 |