mardi 8 novembre 2016

Zip-Ada v.52 - featuring LZMA compression

In case you missed it, there is a new version of Zip-Ada @ .

Actually, these are two successive versions: v.51 with a basic LZMA encoder, v.52 with a more advanced one.
The shift from v.50 to v.51 ensured 52 steps up in the Squeeze Chart benchmark, although the LZ part remained identical and the new "MA" part is a simple, straightforward encoder which comes in replacement of our sophisticated Taillaule algorithm for the Deflate format. This shows just how much the LZMA format is superior to Deflate.
Then, from v.51 to v.52, there were 45 more steps upward. This is due to a combination of a better-suited LZ algorithm, and a refinement of the "MA" algorithm - details below.

* Changes in '51', 27-Aug-2016:
  - LZMA.Encoding has been added; it is a standalone compressor,
      see lzma_enc.adb for an example of use.
  - Zip.Compress provides now LZMA_1, LZMA_2 methods. In other words, you
      can use the LZMA compression with Zip.Create.
  - Zip.Compress has also a "Preselection" method that selects
      a compression method depending on hints like the uncompressed size.
  - Zip.Compress.Deflate: Deflate_1 .. Deflate_3 compression is
      slightly better.

The LZMA format, new in Zip-Ada on the encoding side, is especially good for compressing database data - be it in binary or text forms. Don't be surprised if the resulting archive represent only a few percents of the original data...
The new piece of code, LZMA.Encoding, has been written from scratch. This simple version, fully functional, holds in only 399 lines, after going through J-P. Rosen's Normalize tool.
It can be interesting for those who are curious about how the "MA" part of that compression algorithm is working.
The code can be browsed here.

* Changes in '52', 08-Oct-2016:
  - UnZip.Streams: all procedures have an additional (optional)
      Ignore_Directory parameter.
  - Zip.Compress has the following new methods with improved compression:
      LZMA_3, Preselection_1 (replaces Preselection), Preselection_2.
      Preselection methods use now entry name extension and size for
      improving compression, while remaining 1-pass methods.

For those interested about what's happening "under the hood", LZMA.Encoding now computes (with floating-point numbers, something unusual in compression code!) an estimation of the predicted probabilities of some alternative encodings, and chooses the most probable one - it gives an immediate better local compression. Sometimes the repetition of such a repeated short-run improvement has a long-run positive effect, but sometimes not - that's where it's beginning to be fun...

Aucun commentaire:

Enregistrer un commentaire

The new floor

The Fed has now admitted the obvious: the "temporary" measures of the last decade are actually permanent: zero (or less) real inte...