Preprocessing text to make it more compressible
The Endeavour 2024-10-15
Summary:
Repetitive text compresses efficiently. Text like the lyrics to Jingle Bells ought to compress more efficiently than ordinary prose, assuming the compression algorithm can exploit the repetition. The idea of the Burrows-Wheeler transform is to permute text before compressing it. The hope is that the permutation will make the repetition in the text easier […]
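To make the idea of permuting text concrete, here is a minimal sketch of the textbook Burrows-Wheeler transform in Python. It is not taken from the post itself. The function names `bwt` and `inverse_bwt` and the choice of `$` as an end-of-text marker are assumptions made for illustration, and the naive approach of sorting all rotations is only practical for short strings.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column.

    The sentinel (an assumed end-of-text marker) must not occur in the text.
    """
    if sentinel in text:
        raise ValueError("sentinel must not appear in the input text")
    s = text + sentinel
    # Build every rotation of s and sort them lexicographically.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last character of each sorted rotation.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed: str, sentinel: str = "$") -> str:
    """Invert the transform by rebuilding the sorted rotation table column by column."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        # Prepend the transformed column to each row, then re-sort.
        table = sorted(c + row for c, row in zip(transformed, table))
    # The original text is the row that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    print(bwt("banana"))               # annb$aa
    print(inverse_bwt(bwt("banana")))  # banana
```

In the output for "banana", the two n's end up adjacent and two of the a's end up adjacent. The transform tends to group identical characters into runs like these, and the sentinel is what makes the permutation reversible.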
The post Preprocessing text to make it more compressible first appeared on John D. Cook.