Dynamic Unbounded Integer Compression
Mulpuri, Naga Sai Gowtham
One of the important applications of data compression is inverted indexes compression, which is a lossless compression. Presently, most of the internet search engines such as Google and Wikipedia process a large amount of documents and employ information retrieval policies through inverted indexes for fast search or query. This might necessitate compressing the data in order to store it in physical memory and enable computationally effective retrieval of information. Nevertheless, their currently used compression methods implement static compression procedures and do not fully exploit dynamic integer compression capabilities. In this research, we have exploited dynamic Huffman compression method referred to as δ-Huffman coding for compressing integers and improved the attainable compression ratios. To achieve this, we have explored a combination of static and dynamic compression algorithms. To the best of our knowledge this is the first work that combines unbounded integer compression methods with Huffman coding and comparatively evaluates their performance. We have applied compression algorithms on the data-sets drawn from Geometric probability distribution function, Poisson distributions, realistic inverted lists, and page-rank lists obtained from Wikipedia. Our results show that the bit-rate of δ-Huffman coding is the best of all of the tested coding methods including Comma code, Elias Gamma code, Elias Delta code, Elias Omega code, Fibonacci code, and Golomb code. Additionally, it is very close to the estimated entropy of the test sequences.
Mulpuri, N. S. G. (2016). <i>Dynamic unbounded integer compression</i> (Unpublished thesis). Texas State University, San Marcos, Texas.