[RFC] gzip bundle data
Goffredo Baroncelli
kreijack at tiscalinet.it
Mon Nov 13 19:16:15 GMT 2006
Hi all,
I wrote a small patch that gzips the data carried base64-encoded inside the
bundle format. The data part is labeled "encoding:base64+gz" instead of
"encoding:base64".
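A minimal sketch of what the "base64+gz" label implies (this is not the code
from the attached patch): gzip the raw data, base64-encode the result, and
reverse both steps on read. It uses the Python 3 standard library, and the
function names are hypothetical.

```python
import base64
import gzip


def encode_base64_gz(data: bytes) -> bytes:
    """Gzip the raw bytes, then base64-encode the compressed stream."""
    compressed = gzip.compress(data, compresslevel=9)
    # encodebytes() emits lines of at most 76 characters, each ending in "\n".
    return base64.encodebytes(compressed)


def decode_base64_gz(text: bytes) -> bytes:
    """Reverse encode_base64_gz(): base64-decode, then gunzip."""
    return gzip.decompress(base64.decodebytes(text))


if __name__ == "__main__":
    original = b"def test_remove_tree():\n    pass\n" * 100
    encoded = encode_base64_gz(original)
    assert decode_base64_gz(encoded) == original
    print(len(original), "->", len(encoded))
```

Every gzip stream starts with the bytes 1f 8b 08, so the encoded text always
begins with "H4sI".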
The gains are shown below:
Number of revisions vs. size of bundle (percentages are relative to v0.8):

command                 v0.8      gzip(bundle-v0.8)  gzip+bundle    b64enc(gzip(v0.8))
bzr bundle -r -80..-1   21901342  5793884 (26%)      8933882 (40%)  7853978 (35%)
bzr bundle -r -40..-1    8467581  2366873 (27%)      3382841 (39%)  3208475 (37%)
bzr bundle -r -20..-1    3591438  1062965 (29%)      1359907 (37%)  1440955 (40%)
bzr bundle -r -10..-1    1237068   394742 (31%)       482154 (38%)   535142 (43%)
bzr bundle -r -5..-1      682035   217855 (31%)       261169 (38%)   295362 (43%)
where:
* v0.8                 standard bundle format v0.8
* gzip(bundle-v0.8)    standard v0.8 bundle, gzipped
                       (bzr bundle ... | gzip -9)
* gzip+bundle          bundle with the data section gzipped, then
                       base64-encoded
* b64enc(gzip(v0.8))   standard v0.8 bundle, gzipped, then base64-encoded
                       (bzr bundle ... | gzip -9 | uuencode -m ...)
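The four variants can be reproduced on synthetic input with a sketch like the
one below. It models a bundle as one text header plus one data section, which
is a simplification and not how bzr builds bundles; `bundle_sizes` and the
sample data are hypothetical, and the numbers it prints are not comparable to
the table.

```python
import base64
import gzip


def bundle_sizes(header: bytes, payload: bytes) -> dict:
    """Return the size of each variant for one header plus one data section."""
    # v0.8: the data section is stored base64-encoded, uncompressed.
    v08 = header + base64.encodebytes(payload)
    # gzip(bundle-v0.8): the whole v0.8 bundle piped through gzip -9.
    gz_whole = gzip.compress(v08, compresslevel=9)
    # gzip+bundle: only the data section is gzipped, then base64-encoded.
    gz_data = header + base64.encodebytes(gzip.compress(payload, compresslevel=9))
    return {
        "v0.8": len(v08),
        "gzip(bundle-v0.8)": len(gz_whole),
        "gzip+bundle": len(gz_data),
        # b64enc(gzip(v0.8)): the gzipped bundle made mail-safe again.
        "b64enc(gzip(v0.8))": len(base64.encodebytes(gz_whole)),
    }


if __name__ == "__main__":
    header = b"# message:\n#   example\n"
    payload = b"def test():\n    pass\n" * 2000
    for name, size in bundle_sizes(header, payload).items():
        print(f"{name:20} {size}")
```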
The table above shows that gzipping the whole bundle gives the smallest
result. But in that case we have the problem of sending a binary file. Once
the gzipped file is base64-encoded, the picture changes: for the bundles of
20 revisions or fewer it is bigger than the bundle with only the data section
gzipped, while for the 40- and 80-revision bundles it is still smaller.
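The cost of making the gzipped file mail-safe can be estimated with a little
arithmetic: base64 turns every 3 bytes into 4 characters and adds a newline
per line. A sketch, where `base64_body_size` is a hypothetical helper and the
45-bytes-per-line figure for `uuencode -m` is an assumption rather than
something stated in this mail:

```python
import base64


def base64_body_size(n: int, bytes_per_line: int) -> int:
    """Size of n bytes once base64-encoded with a newline after every line."""
    full_lines, rest = divmod(n, bytes_per_line)
    # Each group of 3 input bytes becomes 4 characters; +1 for the newline.
    size = full_lines * (4 * ((bytes_per_line + 2) // 3) + 1)
    if rest:
        size += 4 * ((rest + 2) // 3) + 1
    return size


if __name__ == "__main__":
    # Python's base64.encodebytes() uses 57 input bytes per line.
    assert base64_body_size(1000, 57) == len(base64.encodebytes(bytes(1000)))
    # Assumed line length for "uuencode -m": 45 input bytes per line.
    print(base64_body_size(5793884, 45))
```

Under that assumption the 5793884-byte gzipped bundle comes out at 7853933
bytes, 45 bytes short of the 7853978 in the table; the remainder would be the
framing lines that uuencode adds around the body. That is an overhead of
about 35.6% on top of the gzipped size.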
Below is an example of the bundle format:
[...]
# message:
# Oops, fix the message up
# committer: Daniel Silverstone <dsilvers at digital-scurf.org>
# date: Sat 2006-11-11 16:27:08.082000017 +0000
=== modified file bzrlib/tests/blackbox/test_remove_tree.py // encoding:base64+
... gz
H4sIAA25WEUC/5TQwWrDMAwG4HueQvSSFscJbIeyQyFQ9gg7jDGC7SqNiWsV2Vm7Pf2chm0NrIzp
ILAtf/xISgn6g53VVcQQQ6WdMr2m8+XYMB7oDZvIiOXxPRNC/GO6rkGuH4o1iEuv6wy+KqBry1ZZ
9+QdhvB4tklb5pqVN91d1RLlq59pCqXpdpa/B2aPQzwOETaTyYNvUsIGmYmXmYSrellslfcUYcoJ
J+Le+j2MeaFlOoCz+y6ecOxgOjR9whevRSbmzDMNYGZU7H7j1E3w2oN8QuT4MS8SGQ3tcHO/+nNf
N/dUluk++wQAAP//AwAZdi1V3QEAAA==
[...]
My first concern is whether we can update format 0.8 or have to introduce a
new bundle format (v0.10 or v0.9?).
The patch is for RFC only. The current code encodes (base64) in the
serializer and decodes (base64) in the main code, which IMHO is not the "right
thing to do" (C) :-).
If we are interested in the idea, I want to move the encoding/decoding into
the serializer. Then I will apply the compression layer.
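To illustrate the idea of keeping both directions in one place, here is a
rough sketch in which the encoding label selects a matching encode/decode
pair. The `CODECS` table and the two helper functions are hypothetical and do
not correspond to existing bzrlib names.

```python
import base64
import gzip

# Hypothetical registry: encoding label -> (encode, decode).
CODECS = {
    "base64": (base64.encodebytes, base64.decodebytes),
    "base64+gz": (
        lambda raw: base64.encodebytes(gzip.compress(raw, compresslevel=9)),
        lambda text: gzip.decompress(base64.decodebytes(text)),
    ),
}


def write_data(label: str, raw: bytes) -> bytes:
    """Encode a data section according to its encoding label."""
    encode, _ = CODECS[label]
    return encode(raw)


def read_data(label: str, text: bytes) -> bytes:
    """Decode a data section according to its encoding label."""
    _, decode = CODECS[label]
    return decode(text)


if __name__ == "__main__":
    raw = b"some file content\n" * 50
    for label in CODECS:
        assert read_data(label, write_data(label, raw)) == raw
```

With this layout, adding the compression layer means adding one entry to the
table rather than touching the reader and the writer separately.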
Goffredo
--
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijack at inwind.it>
Key fingerprint = CE3C 7E01 6782 30A3 5B87 87C0 BB86 505C 6B2A CFF9
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bundle-gzip.diff
Type: text/x-diff
Size: 3707 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20061113/4e3bc9b1/attachment.bin