Title: Compressing Yahoo Mail
Abstract
Yahoo mail servers have received an enormous number of messages each day for the past 17 years. About 90% of today's messages are machine-generated, built from boilerplate with a small number of per-recipient changes. We show that the popular Zlib compression to the gzip format fails to fully exploit the high similarity between these machine-generated messages. In this paper we analyze the data redundancy in Yahoo mail and present methods that reduce its space requirements while still using the standard Zlib library. Our results show that the compressed data size can be reduced further by a factor of almost 2.5 compared to traditional gzip compression.
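The abstract does not say which methods the authors use. As an illustration only, the sketch below shows one way the standard zlib library can exploit boilerplate shared between messages: priming the compressor with a preset dictionary. The template text, the function names, and the choice of a preset dictionary are all assumptions made for this example, not the paper's technique. The sketch writes zlib-format streams rather than gzip-format ones, since zlib accepts preset dictionaries only for zlib and raw deflate streams.

```python
import zlib

# Hypothetical boilerplate shared by machine-generated messages (illustrative only).
TEMPLATE = (
    "Dear {name},\n"
    "Your order {order_id} has shipped and should arrive within 5 business days.\n"
    "Track your package at any time from your account page.\n"
    "Thank you for shopping with us!\n"
)


def compress_plain(message: bytes) -> bytes:
    """Compress one message on its own, with no knowledge of the boilerplate."""
    return zlib.compress(message, 9)


def compress_with_dictionary(message: bytes, dictionary: bytes) -> bytes:
    """Compress one message with the boilerplate supplied as a preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=dictionary)
    return comp.compress(message) + comp.flush()


def decompress_with_dictionary(blob: bytes, dictionary: bytes) -> bytes:
    """Decompress a message that was compressed with the same preset dictionary."""
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()


# One per-recipient instance of the boilerplate.
message = TEMPLATE.format(name="Alice", order_id="A-1001").encode("utf-8")
dictionary = TEMPLATE.encode("utf-8")

plain = compress_plain(message)
primed = compress_with_dictionary(message, dictionary)

print(len(message), len(plain), len(primed))
```

With the dictionary in place, most of the message is encoded as back-references into the shared boilerplate, so only the per-recipient fields cost literal bytes. The same dictionary has to be available at decompression time, which is the main operational cost of this approach.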
Year: 2015
DOI: 10.1109/DCC.2015.15
Venue: DCC '15 Proceedings of the 2015 Data Compression Conference
Keywords: data compression, electronic mail, reliability, software libraries, Yahoo mail servers, Zlib compression, Zlib library, boilerplate, data redundancy, gzip compression, gzip format, machine-generated messages, specific per-recipient changes, Compression, Deflate, Mail, Yahoo, Zlib, gzip
Field: Space requirements, World Wide Web, Computer science, Boilerplate text, Server, Data redundancy, Redundancy (engineering), DEFLATE
DocType: Conference
ISSN: 1068-0314
Citations: 1
PageRank: 0.38
References: 10
Authors: 2
Name | Order | Citations | PageRank
Aran Bergman | 1 | 47 | 3.93
Eyal Zohar | 2 | 1 | 0.38