[RFC] gzip bundle data

Goffredo Baroncelli kreijack at tiscalinet.it
Mon Nov 13 19:16:15 GMT 2006


Hi all,

I wrote a small patch that adds compression to the base64-encoded data inside
the bundle format: the data is gzipped and then base64-encoded, and the data
part is labeled "encoding:base64+gz" instead of "encoding:base64".
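A minimal Python sketch of what the new label stands for, assuming (as the
"gzip+bundle" legend entry says) that the data is gzipped first and the result
is then base64-encoded. encode_base64_gz() and decode_base64_gz() are
hypothetical names and compresslevel=9 is an assumption; this is not the
patch's code.

```python
import base64
import gzip


def encode_base64_gz(data: bytes) -> bytes:
    """Hypothetical helper: gzip the raw data, then base64-encode it.

    base64.encodebytes() splits its output into lines of at most 76
    characters, each ending in a newline, so the result is plain text.
    """
    return base64.encodebytes(gzip.compress(data, compresslevel=9))


def decode_base64_gz(text: bytes) -> bytes:
    """Hypothetical inverse: base64-decode (newlines are ignored), then gunzip."""
    return gzip.decompress(base64.decodebytes(text))


if __name__ == "__main__":
    payload = b"some file contents\n" * 100
    encoded = encode_base64_gz(payload)
    # Every gzip stream starts with the bytes 1f 8b 08, which base64 renders as "H4sI".
    print(encoded[:4])  # b'H4sI'
    assert decode_base64_gz(encoded) == payload
```

The "H4sI" prefix is also how the data section in the example bundle further
down can be recognized as gzip data.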

The gains are shown below:

            bundle size in bytes (% of v0.8 size) vs. number of revisions

                             v0.8   gzip(bundle-v0.8)  gzip+bundle     b64enc(gzip(v0.8))
bzr bundle -r -80..-1    21901342   5793884 (26%)      8933882 (40%)   7853978 (35%)
bzr bundle -r -40..-1     8467581   2366873 (27%)      3382841 (39%)   3208475 (37%)
bzr bundle -r -20..-1     3591438   1062965 (29%)      1359907 (37%)   1440955 (40%)
bzr bundle -r -10..-1     1237068    394742 (31%)       482154 (38%)    535142 (43%)
bzr bundle -r -5..-1       682035    217855 (31%)       261169 (38%)    295362 (43%)

where:
* v0.8                standard bundle format v0.8
* gzip(bundle-v0.8)   standard v0.8 bundle, gzipped
                      (bzr bundle ... | gzip -9)
* gzip+bundle         bundle produced with this patch: data sections gzipped,
                      then base64-encoded
* b64enc(gzip(v0.8))  standard v0.8 bundle, gzipped, then base64-encoded
                      (bzr bundle ... | gzip -9 | uuencode -m ...)
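To reproduce the gzip(bundle-v0.8) and b64enc(gzip(v0.8)) columns for another
bundle, a rough Python sketch like this could be used. It is only a stand-in:
Python's gzip module replaces `gzip -9` and base64.encodebytes() replaces
`uuencode -m`, so the byte counts will differ slightly from the command-line
tools, and the gzip+bundle column needs the patched bzr, so it is not
reproduced. The percentages in the table appear to be truncated rather than
rounded (8933882 / 21901342 is about 40.8% and is listed as 40%), so
percent_of() truncates too. Both function names are hypothetical.

```python
import base64
import gzip


def percent_of(size: int, base: int) -> int:
    """Size as a percentage of base, truncated like the table's figures."""
    if base <= 0:
        raise ValueError("base size must be positive")
    return 100 * size // base


def compare_sizes(bundle: bytes) -> dict:
    """Approximate the v0.8, gzip(bundle-v0.8) and b64enc(gzip(v0.8)) columns.

    gzip.compress(level 9) stands in for `gzip -9` and base64.encodebytes()
    (76-character lines) stands in for `uuencode -m`, so the numbers are
    close to, but not byte-identical with, the command-line results.
    """
    gz = gzip.compress(bundle, compresslevel=9)
    b64 = base64.encodebytes(gz)
    sizes = {
        "v0.8": len(bundle),
        "gzip(bundle-v0.8)": len(gz),
        "b64enc(gzip(v0.8))": len(b64),
    }
    # Map each column to (size in bytes, truncated % of the v0.8 size).
    return {name: (n, percent_of(n, len(bundle))) for name, n in sizes.items()}
```

For example, after `bzr bundle -r -5..-1 > last5.bundle` (a hypothetical file
name), `compare_sizes(open("last5.bundle", "rb").read())` returns the three
(size, percent) pairs.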


The table above shows that it is better to gzip the whole bundle. But in that
case we have to send a binary file. If we base64-encode the gzipped bundle so
that it can be sent as text, it grows by about 35%, and for bundles of 20
revisions or fewer it ends up bigger than the bundle with only its data
sections gzipped (for 40 and 80 revisions it is still slightly smaller).
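The reason the encoded gzip loses for small bundles is the fixed base64
overhead: every 3 input bytes become 4 characters, plus one newline per output
line. The Python sketch below estimates b64enc(gzip(v0.8)) from the
gzip(bundle-v0.8) column and compares it with gzip+bundle. The 60-character
line length is an assumption about `uuencode -m` output; with it, the estimate
for the 80-revision bundle is 7853933 bytes, within 45 bytes of the table's
7853978 (the begin/end lines that uuencode adds are not counted).
base64_size() and smaller_format() are hypothetical names.

```python
def base64_size(n: int, line_len: int = 60) -> int:
    """Bytes of base64 text for n input bytes, one newline per output line.

    line_len=60 is an assumption about `uuencode -m` output; the
    begin/end lines that uuencode adds are not counted.
    """
    chars = 4 * ((n + 2) // 3)  # every 3 input bytes -> 4 output characters
    lines = (chars + line_len - 1) // line_len
    return chars + lines


# (gzip(bundle-v0.8), gzip+bundle) sizes in bytes, copied from the table.
TABLE = {
    "-80..-1": (5793884, 8933882),
    "-40..-1": (2366873, 3382841),
    "-20..-1": (1062965, 1359907),
    "-10..-1": (394742, 482154),
    "-5..-1": (217855, 261169),
}


def smaller_format(gz_whole: int, gz_data_only: int) -> str:
    """Which text-safe option is smaller for one row of the table."""
    if base64_size(gz_whole) < gz_data_only:
        return "b64enc(gzip(v0.8))"
    return "gzip+bundle"


for revs, (gz_whole, gz_data_only) in TABLE.items():
    print(f"-r {revs:8} estimated b64enc(gzip): {base64_size(gz_whole):8d}  "
          f"gzip+bundle: {gz_data_only:8d}  "
          f"smaller: {smaller_format(gz_whole, gz_data_only)}")
```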


Below is an excerpt of a bundle in the new format:

[...] 
# message:
#   Oops, fix the message up
# committer: Daniel Silverstone <dsilvers at digital-scurf.org>
# date: Sat 2006-11-11 16:27:08.082000017 +0000

=== modified file bzrlib/tests/blackbox/test_remove_tree.py // encoding:base64+
... gz
H4sIAA25WEUC/5TQwWrDMAwG4HueQvSSFscJbIeyQyFQ9gg7jDGC7SqNiWsV2Vm7Pf2chm0NrIzp
ILAtf/xISgn6g53VVcQQQ6WdMr2m8+XYMB7oDZvIiOXxPRNC/GO6rkGuH4o1iEuv6wy+KqBry1ZZ
9+QdhvB4tklb5pqVN91d1RLlq59pCqXpdpa/B2aPQzwOETaTyYNvUsIGmYmXmYSrellslfcUYcoJ
J+Le+j2MeaFlOoCz+y6ecOxgOjR9whevRSbmzDMNYGZU7H7j1E3w2oN8QuT4MS8SGQ3tcHO/+nNf
N/dUluk++wQAAP//AwAZdi1V3QEAAA==
[...]
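Reading such a header back takes two steps: rejoining the wrapped "===" line
and splitting off its properties. A Python sketch, assuming from the example
that a line starting with "... " continues the previous header line (here the
header is cut right after "base64+" and continues with "... gz"), and that
properties follow the action text after " // " as key:value pairs.
parse_action_header() is a hypothetical name, not a bzrlib function.

```python
def parse_action_header(lines: list[str]) -> tuple[str, dict[str, str]]:
    """Join a wrapped '=== ' header and split it into action and properties.

    Assumes a line beginning with '... ' continues the previous header
    line, and that properties are separated from the action by ' // '.
    """
    if not lines or not lines[0].startswith("=== "):
        raise ValueError("not an action header")
    text = lines[0][len("=== "):]
    for line in lines[1:]:
        if not line.startswith("... "):
            break  # first non-continuation line: the data section starts here
        text += line[len("... "):]  # the wrap may fall mid-token, so no separator
    action, *props = text.split(" // ")
    properties = {}
    for prop in props:
        key, _, value = prop.partition(":")
        properties[key] = value
    return action, properties


header = [
    "=== modified file bzrlib/tests/blackbox/test_remove_tree.py // encoding:base64+",
    "... gz",
    "H4sIAA25WEUC/5TQwWrDMAwG4HueQvSSFscJbIeyQyFQ9gg7jDGC7SqNiWsV2Vm7Pf2chm0NrIzp",
]
print(parse_action_header(header))
```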


My first question is whether we can update format 0.8 in place, or whether we
have to introduce a new bundle format (v0.10 or v0.9?).

The patch is only an RFC. The current code does the base64 encoding in the
serializer but the base64 decoding in the main code, which IMHO is not the
"right thing to do" (C) :-).
If there is interest in the idea, I will move both the encoding and the
decoding into the serializer, and then add the compression layer on top.
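One way the serializer could own both directions, with compression as just
another layer, is to look up each part of the "encoding:" label in a table of
codecs. This is a hypothetical Python sketch, not the patch: the LAYERS
registry, encode() and decode() are illustrative names, and reading
"base64+gz" as "base64 of gz(data)" is an assumption that matches the
gzip-then-base64 order in the table legend.

```python
import base64
import gzip
from typing import Callable, Dict, Tuple

# Hypothetical registry: each layer named in an "encoding:" label maps to
# an (encode, decode) pair. "base64+gz" is read as "base64 of gz(data)".
Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]
LAYERS: Dict[str, Codec] = {
    "base64": (base64.encodebytes, base64.decodebytes),
    "gz": (lambda b: gzip.compress(b, compresslevel=9), gzip.decompress),
}


def encode(label: str, data: bytes) -> bytes:
    """Apply the label's layers innermost-first: 'gz' before 'base64'."""
    for name in reversed(label.split("+")):
        data = LAYERS[name][0](data)
    return data


def decode(label: str, data: bytes) -> bytes:
    """Undo the layers outermost-first: 'base64' before 'gz'."""
    for name in label.split("+"):
        data = LAYERS[name][1](data)
    return data
```

With this layout the plain v0.8 "encoding:base64" is simply the one-layer
case, and an unknown layer name fails with a KeyError instead of producing
garbage.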

Goffredo

-- 
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijack at inwind.it>
Key fingerprint = CE3C 7E01 6782 30A3 5B87  87C0 BB86 505C 6B2A CFF9
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bundle-gzip.diff
Type: text/x-diff
Size: 3707 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20061113/4e3bc9b1/attachment.bin 


More information about the bazaar mailing list