Title
Deep Semantic Text Hashing with Weak Supervision
Abstract
With an ever-increasing amount of data available on the web, fast similarity search has become a critical component of large-scale information retrieval systems. One solution is semantic hashing, which designs compact binary codes to accelerate similarity search. Recently, deep learning has been successfully applied to the semantic hashing problem and produces higher-quality compact binary codes than traditional methods. However, most state-of-the-art semantic hashing approaches require large amounts of hand-labeled training data, which are often expensive and time-consuming to collect. The cost of obtaining labeled data is the key bottleneck in deploying these hashing methods. Motivated by the recent success of machine learning with weak supervision, we employ unsupervised ranking methods such as BM25 to extract weak signals from training data. We further introduce two deep generative semantic hashing models that leverage these weak signals for text hashing. Experimental results on four public datasets show that our models can generate high-quality binary codes without hand-labeled training data and significantly outperform competitive unsupervised semantic hashing baselines.
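To make the weak-supervision step concrete, below is a minimal Python sketch (not the authors' released code) of how BM25 could be used to extract weak signals: for every training document, its top-k BM25-ranked neighbors are collected as pseudo-relevant pairs that a hashing model can then be trained to place close together in Hamming space. The tokenization, the helper name bm25_neighbors, and the parameter defaults (k1=1.2, b=0.75, top_k=5) are illustrative assumptions, not values reported in the paper.

# Sketch: extract weak supervision pairs with Okapi BM25 (illustrative, not the paper's code)
import math
from collections import Counter

def bm25_neighbors(corpus, top_k=5, k1=1.2, b=0.75):
    """For each document, return the indices of its top_k BM25-ranked neighbors."""
    docs = [doc.lower().split() for doc in corpus]   # naive whitespace tokenization
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    tfs = [Counter(d) for d in docs]
    df = Counter(term for tf in tfs for term in tf)  # document frequency per term
    idf = {t: math.log(1 + (N - n + 0.5) / (n + 0.5)) for t, n in df.items()}

    def score(query_tf, doc_tf, doc_len):
        # standard Okapi BM25 score of a document against a (document-as-)query
        s = 0.0
        for term in query_tf:
            f = doc_tf.get(term, 0)
            if f:
                s += idf[term] * f * (k1 + 1) / (f + k1 * (1 - b + b * doc_len / avgdl))
        return s

    neighbors = []
    for i in range(N):
        scored = [(score(tfs[i], tfs[j], len(docs[j])), j) for j in range(N) if j != i]
        scored.sort(reverse=True)
        neighbors.append([j for _, j in scored[:top_k]])
    return neighbors

if __name__ == "__main__":
    corpus = [
        "fast similarity search for large scale retrieval",
        "semantic hashing maps documents to binary codes",
        "binary codes accelerate nearest neighbor search",
        "weak supervision replaces hand labeled training data",
    ]
    print(bm25_neighbors(corpus, top_k=2))

For a large corpus, the brute-force pairwise loop above would normally be replaced by an inverted index so that only documents sharing terms with the query are scored.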
Year
2018
DOI
10.1145/3209978.3210090
Venue
SIGIR
Keywords
Semantic Hashing, Weak Supervision, Variational Autoencoder
Field
Data mining, Bottleneck, Ranking, Computer science, Binary code, Hash function, Artificial intelligence, Labeled data, Deep learning, Semantic hashing, Nearest neighbor search
DocType
Conference
ISBN
978-1-4503-5657-2
Citations
2
PageRank
0.37
References
9
Authors
3
Name                 Order   Citations   PageRank
Suthee Chaidaroon    1       18          2.31
Travis Ebesu         2       66          2.72
Yi Fang              3       379         32.01