sample large weave

Martin Pool martinpool at gmail.com
Wed Aug 17 21:20:44 BST 2005


I imported the changelog and Makefile.in from gcc into bzrlib weaves,
one weave per file.  (Not a full tree yet, just these files.)  These are
real examples of files that are large and have long histories.

Both compress very well:

% weave stats gcc-Makefile.in.weave
versions                276
weave file          2979815 bytes
total contents    109245151 bytes
compression ratio     36.66x
average size         395815 bytes
relative size          7.53x

% weave stats =(zcat gcc-changelog.weave.gz)
versions               1112
weave file           507721 bytes
total contents    317646022 bytes
compression ratio    625.63x
average size         285652 bytes
relative size          1.78x
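The report does not say how its derived rows are computed.  Working back
from the numbers, they appear to be: compression ratio = total contents /
weave file size; average size = total contents / versions, truncated to
whole bytes; relative size = weave file size / average size.  The sketch
below encodes that reading.  The WeaveStats class is hypothetical and is not
bzrlib's own stats code.

```python
from dataclasses import dataclass


@dataclass
class WeaveStats:
    """Hypothetical reconstruction of the derived `weave stats` rows,
    inferred from the figures in the report, not from bzrlib's source."""
    versions: int     # number of stored versions
    weave_bytes: int  # size of the weave file on disk
    total_bytes: int  # sum of the sizes of every stored version

    @property
    def compression_ratio(self) -> float:
        # How much smaller the weave is than storing every version in full.
        return self.total_bytes / self.weave_bytes

    @property
    def average_size(self) -> int:
        # Mean size of one version, truncated to whole bytes.
        return self.total_bytes // self.versions

    @property
    def relative_size(self) -> float:
        # Weave size measured in units of one average version.
        return self.weave_bytes / self.average_size


if __name__ == "__main__":
    for name, s in [("gcc-Makefile.in", WeaveStats(276, 2979815, 109245151)),
                    ("gcc-changelog", WeaveStats(1112, 507721, 317646022))]:
        print(f"{name}: ratio {s.compression_ratio:.2f}x, "
              f"average {s.average_size} bytes, relative {s.relative_size:.2f}x")
```

Under this reading the figures reproduce both reports: 36.66x, 395815
bytes and 7.53x for Makefile.in, and 625.63x, 285652 bytes and 1.78x for
the changelog.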

-r--rw-r--  1  1837342 2005-08-17 18:51 ChangeLog,v
-r--r--r--  1  1077955 2005-08-17 20:00 Makefile.in
-r--rw-r--  1 18510053 2005-08-17 18:51 Makefile.in,v
-rw-r--r--  1   202088 2005-08-17 20:00 gcc-Makefile.in.weave.gz
-rw-r--r--  1   151812 2005-08-17 22:42 gcc-changelog.weave.gz

The gzipped Makefile.in weave, storing all 276 versions, is actually
less than a fifth the size of the current Makefile.in text alone
(202088 versus 1077955 bytes).

The performance is, I would say, reasonably good.  Adding a new version
of Makefile.in can take nearly a second, which I suspect is because the
pure-Python difflib diff is too slow on large files.  That should be
fixable.
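For a sense of where that time goes: adding a version presumably means
diffing the new text line by line against the existing text, and the
standard library's difflib.SequenceMatcher does that in pure Python.  A
minimal sketch for timing such a diff, assuming a synthetic 20000-line
file; line_diff and synthetic_versions are hypothetical names and this is
not bzrlib's add code.  Timings depend heavily on the machine and on how
much the versions differ.

```python
import difflib
import time


def line_diff(old: list[str], new: list[str]) -> list[tuple[str, int, int, int, int]]:
    # Opcodes describing how to turn `old` into `new`, line by line.
    # difflib.SequenceMatcher is written in pure Python.
    return difflib.SequenceMatcher(None, old, new).get_opcodes()


def synthetic_versions(n_lines: int = 20000) -> tuple[list[str], list[str]]:
    # Hypothetical stand-in for two successive revisions of a large file.
    old = [f"rule {i}: $(CC) -c file{i}.c\n" for i in range(n_lines)]
    new = old[:]
    new[100:105] = ["changed rule\n"] * 3  # replace five lines with three
    new.append("appended line\n")
    return old, new


if __name__ == "__main__":
    old, new = synthetic_versions()
    start = time.perf_counter()
    ops = line_diff(old, new)
    elapsed = time.perf_counter() - start
    changed = [op for op in ops if op[0] != "equal"]
    print(f"{len(old)} lines, {len(changed)} changed regions, {elapsed:.3f}s")
```

Swapping in a faster matcher only requires replacing the body of
line_diff, since the rest of the code consumes plain opcode tuples.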

Extraction is quite fast: annotating a 12038-line version of the
changelog takes 0.2 user seconds.
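One reason extraction can be this cheap is that a weave keeps every line
of every version in a single interleaved sequence, so extracting or
annotating any one version is one linear scan.  The ToyWeave class below
is a hypothetical, simplified model of that idea, not bzrlib's Weave class
or file format: each line records the version that inserted it and the
set of versions that deleted it, and add() diffs the new text against
what is visible through its parents using difflib.SequenceMatcher.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class WeaveLine:
    text: str
    inserted_by: int                                   # version that introduced the line
    deleted_by: set[int] = field(default_factory=set)  # versions that removed it


class ToyWeave:
    """Hypothetical, simplified weave: all lines of all versions in one list.
    Not bzrlib's Weave class or on-disk format."""

    def __init__(self) -> None:
        self.lines: list[WeaveLine] = []
        self.parents: dict[int, list[int]] = {}

    def _ancestry(self, version: int) -> set[int]:
        # The version itself plus all of its transitive parents.
        seen: set[int] = set()
        stack = [version]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(self.parents[v])
        return seen

    def _live(self, anc: set[int]) -> list[int]:
        # Indices of weave lines visible in a version whose ancestry is `anc`.
        return [i for i, ln in enumerate(self.lines)
                if ln.inserted_by in anc and not (ln.deleted_by & anc)]

    def add(self, version: int, parents: list[int], text: list[str]) -> None:
        # Diff against the text visible through the parents, then tag
        # deletions and splice insertions into the single line list.
        self.parents[version] = list(parents)
        live = self._live(self._ancestry(version) - {version})
        old = [self.lines[i].text for i in live]
        ops = SequenceMatcher(None, old, text).get_opcodes()
        for tag, i1, i2, j1, j2 in reversed(ops):  # back to front keeps indices valid
            if tag in ("delete", "replace"):
                for k in range(i1, i2):
                    self.lines[live[k]].deleted_by.add(version)
            if tag in ("insert", "replace"):
                pos = live[i1] if i1 < len(live) else len(self.lines)
                self.lines[pos:pos] = [WeaveLine(t, version) for t in text[j1:j2]]

    def annotate(self, version: int) -> list[tuple[int, str]]:
        # One linear scan yields each live line together with its origin.
        anc = self._ancestry(version)
        return [(self.lines[i].inserted_by, self.lines[i].text) for i in self._live(anc)]

    def extract(self, version: int) -> list[str]:
        return [text for _, text in self.annotate(version)]
```

For example, after adding ["a\n", "b\n"] as version 0 and ["a\n", "b\n",
"c\n"] as version 1, annotate(1) returns [(0, "a\n"), (0, "b\n"), (1,
"c\n"]) without reconstructing any intermediate version.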

The files are in http://bazaar-ng.org/tmp/ in case anyone wants to
experiment with them.

-- 
Martin




More information about the bazaar mailing list