Sunday, November 27, 2016

The Trump effect

With the passage of time and experience, fewer and fewer events in the wonderful world of finance come as a surprise to us. We note in passing that the notion of a "black swan" depends on the observer's level of information and memory, and of course on their capacity for reasoning (I am thinking of the nice expression "connect the dots").
The surprise here, then, is the sharp and immediate effect of Donald Trump's election on interest rates around the globe. His promises of additional deficits sent US Treasury bond yields jumping across all maturities (from 2 to 30 years). At the short end, markets seem to think that the Fed will now have fewer qualms about raising its rates.
What is surprising (to us) is how quickly this jump (a "bond" in French, as in Treasury bond 😏) spread to other markets, in other currencies. For example, here are the mortgage rates of a Swiss regional bank:




Astonishing, isn't it?
It remains to be seen whether the "Trump effect" is the start of something lasting and marks a real turning point.
We will come back to this in a future post.

Sunday, November 13, 2016

GID release #06 - with Recurve, a chart data recovery tool

GID stands for Generic Image Decoder, a free, open-source library that can be found here.

The latest release features a couple of new application examples, among them a tool called Recurve for retrieving data from an image with plotted curves. Typically, you come across a chart on a web site and would like to get the corresponding data to rework it in Excel - perhaps you want to spot specific values, compare two curves that were not originally on the same chart, or use the data for further calculations. Sometimes the data is not available from the web site - and even less so if the chart comes from a PDF or a scanned newspaper page.

Fortunately, Recurve will do the painful job of retrieving the data points for you. It will detect gridlines and filter them out, then track the curves by matching their respective colours.

An example here:
Mortgage rates in Switzerland - 2004 to now. Chart is from the Comparis web site
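
To give an idea of the colour-matching step described above, here is a minimal sketch in Python. It is not Recurve's actual code: the function names, the tolerance value and the tiny 5 x 5 test image are all made up for the illustration, and gridline filtering is reduced to the fact that the grey gridline simply does not match the curve's colour.

```python
def track_curve(pixels, target, tolerance=30):
    """Return one row position (or None) per image column for the curve of colour `target`.

    pixels: list of rows, each row being a list of (r, g, b) tuples.
    A pixel matches when the sum of absolute channel differences is <= tolerance.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    track = []
    for x in range(width):
        rows = [
            y for y in range(height)
            if sum(abs(a - b) for a, b in zip(pixels[y][x], target)) <= tolerance
        ]
        # Average the matching rows; None marks a gap in the curve.
        track.append(sum(rows) / len(rows) if rows else None)
    return track


def row_to_value(row, height, y_min, y_max):
    """Map a pixel row (0 = top of the image) linearly onto the chart's value axis."""
    return y_max - (y_max - y_min) * row / (height - 1)


# Hypothetical test image: white background, one grey gridline, one rising red curve.
WHITE, GREY, RED = (255, 255, 255), (200, 200, 200), (220, 30, 30)
image = [[WHITE] * 5 for _ in range(5)]
image[2] = [GREY] * 5
for x, y in enumerate([4, 3, 2, 1, 0]):
    image[y][x] = RED

rows = track_curve(image, RED)
values = [row_to_value(r, 5, 0.0, 4.0) for r in rows]
```

In a real chart the axis range (`y_min`, `y_max` here) has to be supplied by the user or read from the axis labels; the sketch just assumes it is known.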

Tuesday, November 8, 2016

Zip-Ada v.52 - featuring LZMA compression

In case you missed it, there is a new version of Zip-Ada @ http://unzip-ada.sf.net .

Actually, these are two successive versions: v.51 with a basic LZMA encoder, v.52 with a more advanced one.
The shift from v.50 to v.51 moved Zip-Ada 52 steps up in the Squeeze Chart benchmark, although the LZ part remained identical and the new "MA" part is a simple, straightforward encoder that replaces our sophisticated Taillaule algorithm for the Deflate format. This shows just how much the LZMA format is superior to Deflate.
Then, from v.51 to v.52, came 45 more steps upward, thanks to a combination of a better-suited LZ algorithm and a refinement of the "MA" algorithm - details below.

* Changes in '51', 27-Aug-2016:
  - LZMA.Encoding has been added; it is a standalone compressor,
      see lzma_enc.adb for an example of use.
  - Zip.Compress provides now LZMA_1, LZMA_2 methods. In other words, you
      can use the LZMA compression with Zip.Create.
  - Zip.Compress has also a "Preselection" method that selects
      a compression method depending on hints like the uncompressed size.
  - Zip.Compress.Deflate: Deflate_1 .. Deflate_3 compression is
      slightly better.

The LZMA format, new in Zip-Ada on the encoding side, is especially good for compressing database data - be it in binary or text form. Don't be surprised if the resulting archive represents only a few percent of the original data...
The new piece of code, LZMA.Encoding, has been written from scratch. This simple version, fully functional, fits in only 399 lines, after going through J-P. Rosen's Normalize tool.
It may be interesting for those who are curious about how the "MA" part of that compression algorithm works.
The code can be browsed here.

* Changes in '52', 08-Oct-2016:
  - UnZip.Streams: all procedures have an additional (optional)
      Ignore_Directory parameter.
  - Zip.Compress has the following new methods with improved compression:
      LZMA_3, Preselection_1 (replaces Preselection), Preselection_2.
      Preselection methods use now entry name extension and size for
      improving compression, while remaining 1-pass methods.
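
As an illustration of what a 1-pass "preselection" can look like, here is a small Python sketch. It is not Zip-Ada's logic: the extension list, the size threshold and the method labels are invented for the example. The only point is that the choice is made from the entry's name and size alone, before any compression takes place.

```python
import os

# Hypothetical rule set: none of these extensions or thresholds come from Zip-Ada.
ALREADY_COMPRESSED = {".jpg", ".png", ".mp3", ".zip"}
SMALL_ENTRY = 4096  # bytes


def preselect(entry_name, uncompressed_size):
    """Pick a method label from the entry's extension and size, without reading the data."""
    extension = os.path.splitext(entry_name)[1].lower()
    if extension in ALREADY_COMPRESSED:
        return "store"      # recompressing is unlikely to pay off
    if uncompressed_size < SMALL_ENTRY:
        return "deflate"    # assumed cheaper choice for small entries
    return "lzma"           # assumed best choice for large entries


choices = [
    preselect("photo.JPG", 2_000_000),
    preselect("readme.txt", 900),
    preselect("customers.db", 50_000_000),
]
```

Because nothing is compressed twice, such a selection stays a single pass over the data, unlike a recompression tool that tries several methods and keeps the best.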

For those interested in what's happening "under the hood": LZMA.Encoding now computes (with floating-point numbers, something unusual in compression code!) an estimate of the predicted probabilities of some alternative encodings, and chooses the most probable one, which immediately gives better local compression. Sometimes the repetition of such short-run improvements has a positive long-run effect, but sometimes not - that's where it starts to be fun...
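
The idea of comparing alternative encodings by their predicted probability can be sketched as follows in Python. This is only an illustration with made-up numbers, not the code of LZMA.Encoding: each candidate is described by the probabilities the model currently gives to the successive decisions it needs, the probability of the whole candidate is their product, and the cost in bits is minus the base-2 logarithm of that product.

```python
import math


def encoding_probability(step_probabilities):
    """Probability the model assigns to a whole encoding: the product of its steps."""
    return math.prod(step_probabilities)


def cost_in_bits(probability):
    """Ideal output size of an event of the given probability through an entropy coder."""
    return -math.log2(probability)


def pick_most_probable(candidates):
    """candidates: dict mapping a label to the list of its per-step probabilities."""
    return max(candidates, key=lambda name: encoding_probability(candidates[name]))


# Hypothetical figures: two ways of encoding the same two bytes.
candidates = {
    "two literals": [0.6, 0.5, 0.6, 0.5],   # product 0.09
    "short match": [0.4, 0.25],             # product 0.10
}
best = pick_most_probable(candidates)
costs = {name: cost_in_bits(encoding_probability(p)) for name, p in candidates.items()}
```

Here the short match wins by a small margin (about 3.32 bits against 3.47). The choice is purely local: it says nothing about how the probability model will evolve afterwards.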

Tuesday, November 1, 2016

AZip 2.0

Version 2.0 of AZip is out!

    URL: http://azip.sf.net/

AZip is a Zip archive manager.

The latest addition is an archive recompression tool.

AZip's recompression tool's results

Some features:

    - Multi-document (should be familiar to MS Office users)
    - Flat view / Tree view
    - Simple to use (at least I hope so ;-) )
    - Useful tools:
        - Text search function through an archive, without having to extract files
        - Archive updater
        - Integrity check
        - Archive recompression (new), using an algorithm-picking approach for improving a zip archive's compression.
    - Encryption
    - Methods supported: Reduce, Shrink, Implode, Deflate, Deflate64, BZip2, LZMA
    - Free, open-source
    - Portable (no installation needed, no DLL, no configuration file)

"Under the hood" features:

    - AZip is from A to Z in Ada :-)
    - Uses the highly portable Zip-Ada library
    - Portability to various platforms: currently it's fully implemented with GWindows (for Windows), and there is a GtkAda draft, but anyway the key parts of the UI and user persistence are generic, platform-independent

Enjoy!

Monday, October 24, 2016

The SNB doubles down again!

Some more fresh data, this time from the Swiss National Bank (SNB).

The number of billions of francs in circulation has just passed another power of two, as it has done four times in sixteen years in this fine start to the millennium:
  • 2^9 = 512 billion now, in 2016
  • 2^8 = 256 in 2012
  • 2^7 = 128 in 2010
  • 2^6 = 64 in 2009
There were only a little over 32 billion in 2000.
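
For the record, the arithmetic behind these figures can be checked in a few lines of Python. The amounts are the rounded powers of two quoted above, so the result is only an order of magnitude.

```python
import math

start, end = 32, 512          # billions of francs, rounded: 2000 and 2016
years = 2016 - 2000

doublings = math.log2(end / start)                 # number of doublings
annual_growth = (end / start) ** (1 / years) - 1   # average compound growth per year
doubling_time = years / doublings                  # average years per doubling
```

This gives 4 doublings in 16 years, that is, an average growth of roughly 19% per year, although the list above shows that the doublings were far from evenly spaced.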

Readers of this blog know how fond we are of powers of two (cf. our activities in the field of data compression). So this logarithmic milestone had to be celebrated properly with an updated chart, which you will find here:

For those who might, intuitively, find this chart worrying: rest assured!
The growth of the SNB's money creation is much more restrained than, for example, that of its Argentine counterpart:


Sleep soundly, good people!

Tuesday, October 4, 2016

Economy, September 2016

Switzerland...
CHF mortgage rates

CHF minimal-risk rates

...World:
Avindex - financial "risk appetite"

Baltic Dry Index

Tuesday, September 20, 2016

LZMA parametrization

One fascinating property of the LZMA data compression format is that it is actually a family of formats with three numeric parameters that can be set:

  • The “Literal context bits” (lc) parameter sets the number of bits of the previous literal (a byte) that are used to index the probability model. With 0, the previous literal is ignored; with 8, you have a full 256 x 256 Markov chain matrix, giving the probability of getting literal j when the previous one was i.
  • The “Literal position bits” (lp) parameter takes into account the position of each literal in the uncompressed data, modulo 2^lp. For instance, lp=1 is a better fit for 16-bit data.
  • The “Position bits” (pb) parameter plays the same role in a more general context where repetitions occur.
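
To make the role of lc and lp more concrete, here is a Python sketch of how a literal's context index can be derived from them. It follows the formula of the LZMA specification's reference decoder as I understand it; the function names are mine.

```python
def literal_context(position, previous_byte, lc, lp):
    """Index of the probability set used to code the literal located at `position`."""
    position_part = position & ((1 << lp) - 1)   # position modulo 2**lp
    previous_part = previous_byte >> (8 - lc)    # the lc most significant bits
    return (position_part << lc) + previous_part


def context_count(lc, lp):
    """Number of distinct literal contexts for given lc and lp."""
    return 1 << (lc + lp)


# lc = 8, lp = 0: the context is the whole previous byte (the 256 x 256 case).
full_markov = literal_context(5, 0xAB, lc=8, lp=0)    # 0xAB
# lc = 0, lp = 1: only the parity of the position matters (16-bit data).
odd_position = literal_context(5, 0xAB, lc=0, lp=1)   # 1
even_position = literal_context(4, 0xAB, lc=0, lp=1)  # 0
```

With (lc, lp) = (0, 1) there are just two contexts, one for the low bytes and one for the high bytes of 16-bit samples, assuming the data starts on an even offset.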

For instance when (lc, lp, pb) = (8, 0, 0) you have a simple Markov model similar to the one used by the old "Reduce" format for Zip archives. Of course the encoding of this Markov-compressed data is much smarter with LZMA than with "Reduce".
Additionally, you have a non-numeric parameter which is the choice of the LZ77 algorithm – the first stage of LZMA.

The stunning thing is how much changes in these parameters affect the compression quality. Let’s take a format that is difficult to compress losslessly as binary data: raw audio files (.wav), 16-bit PCM.
By running Zip-Ada's lzma_enc with the -b (benchmark) parameter, all combinations are tried: 900 different combinations of parameters in total! For many .wav files (but not all), the combination leading to the smallest .lzma archive is (0, 1, 0) – see the list at the bottom [1].
It means that the previous byte is useless for predicting the next one, and that the compression has an affinity with 16-bit alignment, which seems to make sense. The data seems pretty random, but the magic of LZMA manages to squeeze 15% off the raw data, without loss. Fortuitous repetitions are not helpful: the weakest LZ77 implementation gives the best result! Pushing this logic further, I have implemented for this purpose a “0-level” LZ77 [2] that doesn’t do any LZ compression at all. It gives the best output for most raw sound data. Amazing, isn’t it? It seems that repetitions are so rare that each one costs a very large code through the range encoder, while also slightly and temporarily weakening the probability of outputting a literal - see the probability evolution curves in the second article, “LZMA compression - a few charts”.
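
A similar experiment can be tried without Zip-Ada, using Python's standard lzma module. The sketch below is only an approximation of the benchmark described above: the test signal is synthetic (a noisy sine wave, not a real .wav file), the LZ77 stage is whatever liblzma uses by default, and liblzma only accepts combinations with lc + lp <= 4, so the (8, 0, 0) case cannot be tested this way.

```python
import itertools
import lzma
import math
import random
import struct


def pcm16_test_signal(samples=2000, seed=1):
    """Synthetic 16-bit little-endian PCM data: a sine wave plus Gaussian noise."""
    rng = random.Random(seed)
    values = (int(8000 * math.sin(i / 20) + rng.gauss(0, 300)) for i in range(samples))
    return b"".join(struct.pack("<h", v) for v in values)


def raw_lzma_size(data, lc, lp, pb):
    """Size of a raw LZMA1 stream produced with the given (lc, lp, pb)."""
    filters = [{"id": lzma.FILTER_LZMA1, "lc": lc, "lp": lp, "pb": pb}]
    return len(lzma.compress(data, format=lzma.FORMAT_RAW, filters=filters))


def benchmark(data):
    """Try every (lc, lp, pb) accepted by liblzma; return them sorted by output size."""
    sizes = {}
    for lc, lp, pb in itertools.product(range(5), repeat=3):
        if lc + lp <= 4:  # restriction of liblzma, not of the LZMA format itself
            sizes[(lc, lp, pb)] = raw_lzma_size(data, lc, lp, pb)
    return sorted(sizes.items(), key=lambda item: item[1])


ranking = benchmark(pcm16_test_signal())
best_params, best_size = ranking[0]
print("best (lc, lp, pb):", best_params, "->", best_size, "bytes")
```

Which combination comes out on top depends on the data; replacing the synthetic signal with the sample bytes of a real recording is the interesting part of the exercise.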
Graphically, the ordered compressed sizes look like this:



and the various parameters look like this:

The 900 parameter combinations

The best 100 combinations

Many thanks to Stephan Busch, who maintains the only public data compression corpus, to my knowledge, with enough size and variety to be really meaningful for the “real life” usage of data compression. You can find the benchmark at http://www.squeezechart.com/ . Stephan is always keen to share his knowledge about compression methods.
Previous articles:
____
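[1] Here is the directory listing, sorted by descending size (the original file is a2.wav). The names of the .lzma files apparently encode the three parameters (lc, lp, pb) as digits, followed by the LZ77 level.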

37'960 a2.wav
37'739 w_844_l0.lzma
37'715 w_843_l0.lzma
37'702 w_842_l0.lzma
37'696 w_841_l0.lzma
37'693 w_840_l0.lzma
37'547 w_844_l2.lzma
...
32'733 w_020_l0.lzma
32'717 w_010_l1.lzma
32'717 w_010_l2.lzma
32'707 w_011_l1.lzma
32'707 w_011_l2.lzma
32'614 w_014_l0.lzma
32'590 w_013_l0.lzma
32'577 w_012_l0.lzma
32'570 w_011_l0.lzma
32'568 w_010_l0.lzma

[2] In the package LZMA.Encoding you will find the very sophisticated "Level 0" algorithm:

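    if level = Level_0 then
      --  "Level 0": no LZ77 matching at all, every byte is sent as a literal.
      while More_bytes loop
        LZ77_emits_literal_byte(Read_byte);
      end loop;
    else
      --  Other levels: an actual LZ77 stage is run.
      My_LZ77;
    end if;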

Hope you appreciate it ;-)