Rev 34: Another disk-format bump. in http://bazaar.launchpad.net/%7Ebzr/bzr-groupcompress/trunk
John Arbash Meinel
john at arbash-meinel.com
Thu Mar 5 17:21:06 GMT 2009
At http://bazaar.launchpad.net/%7Ebzr/bzr-groupcompress/trunk
------------------------------------------------------------
revno: 34
revision-id: john at arbash-meinel.com-20090305172017-mefnbegtuk4vt99i
parent: john at arbash-meinel.com-20090304223810-agw3duzy5tul01da
parent: john at arbash-meinel.com-20090305165238-o5be2o7v8wzewnlk
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: trunk
timestamp: Thu 2009-03-05 11:20:17 -0600
message:
Another disk-format bump.
Move the labels/sha1 information into a pre-header. This also makes it
easier to decide to enable/disable the headers, as we can support
both with the same deserialising code (at least until we remove
the extra info from the indexes.)
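A minimal sketch of why a separate pre-header lets one deserialiser handle blocks both with and without labels. It assumes a hypothetical layout that is not the actual on-disk format of this revision: a decimal header length and a newline, then a header with one key line and one sha1 line per text, then the content. With labels disabled, the header is empty. The function names are illustrative, and keys are assumed not to contain newlines. The sketch is written for Python 3, although bzr targeted Python 2 at the time.

```python
def serialise_block(entries, content, with_labels=True):
    """Hypothetical layout: b'<header len>\n' + header + content.

    entries is a list of (key, sha1) byte-string pairs for the texts in content.
    """
    if with_labels:
        header = b''.join(key + b'\n' + sha1 + b'\n' for key, sha1 in entries)
    else:
        header = b''  # no-labels mode: the pre-header is simply empty
    return b'%d\n' % len(header) + header + content


def deserialise_block(data):
    """Return (labels, content); the same code handles an empty pre-header."""
    header_len_str, rest = data.split(b'\n', 1)
    header_len = int(header_len_str)
    header, content = rest[:header_len], rest[header_len:]
    # Drop the empty string left after the final newline.
    lines = header.split(b'\n')[:-1] if header else []
    labels = list(zip(lines[0::2], lines[1::2]))
    return labels, content
```

Because the reader always consumes a length-prefixed header, switching labels on or off changes only what the writer emits.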
This also makes a fulltext record stream start with 'f' and a delta
record stream start with 'd', which makes them more self describing.
The next step would probably be to write the base128 length of the
encoded bytes, which would make them fully independent, though
you wouldn't know what content they refer to.
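A minimal sketch of the self-describing prefix: the first byte says whether the rest of the record is a fulltext (`f`) or a delta (`d`). The function names are hypothetical. As the message notes, a record without a length prefix still depends on its surroundings to know where it ends. In this sketch the payload simply runs to the end of the buffer.

```python
def make_record(kind, payload):
    """Prefix payload with a one-byte kind marker: b'f' fulltext, b'd' delta."""
    if kind not in (b'f', b'd'):
        raise ValueError('unknown record kind: %r' % (kind,))
    return kind + payload


def split_record(record):
    """Return (kind name, payload), using the leading byte to identify the type."""
    kind, payload = record[:1], record[1:]
    if kind == b'f':
        return 'fulltext', payload
    if kind == b'd':
        return 'delta', payload
    raise ValueError('unrecognised record kind byte: %r' % (kind,))
```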
This also brings in an update to .compress() that lets us detect that we
overflowed our group, roll back, and start a new one.
This seems to give better compression in a 'more stable' manner.
Still open to tweaking, though.
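The overflow-and-rollback idea can be sketched as follows. This is a hypothetical simplification: `GroupBuilder` and `max_size` are illustrative names, and the sketch measures raw text length against the budget. The real `.compress()` works with the group compressor's delta output and must also undo the compressor's internal state when it rolls back.

```python
class GroupBuilder:
    """Accumulate texts into groups; when an insert overflows max_size,
    roll it back, close the current group, and start a new group with it."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.groups = []
        self._current = []
        self._current_size = 0

    def add(self, text):
        self._current.append(text)
        self._current_size += len(text)
        # A lone oversized text is kept; otherwise an overflow triggers rollback.
        if self._current_size > self.max_size and len(self._current) > 1:
            self._current.pop()            # roll back the overflowing insert
            self._flush()                  # close the group without it
            self._current = [text]         # the new group starts with that text
            self._current_size = len(text)

    def _flush(self):
        if self._current:
            self.groups.append(self._current)
        self._current = []
        self._current_size = 0

    def finish(self):
        self._flush()
        return self.groups
```

Without the rollback, the large text would stay poorly compressed at the end of the old group, and the next group would begin with yet another fulltext.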
Also introduce the 'gcc-chk255-big' format, which uses 64k leaf pages
rather than 4k leaf pages. Initial results show a smaller compressed
size at the cost of a small (10%) increase in uncompressed size. It also
reduces the tree depth by a full level.
No-labels decreases the inv size by approx. 300kB, and big pages decrease
it by another 300kB, not to mention the 116kB decrease in the
.cix index, just from not having the extra pages.
Having both no-labels and big inv pages takes the repo from
11023kB down to 9847kB (1176kB savings, or roughly 10% overall).
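A quick check of the quoted repository sizes (in kB, taken from the message) confirms the 1176kB figure and shows that "10%" is rounded down from about 10.7%:

```python
before_kb = 11023   # repo size with labels and 4k inv pages
after_kb = 9847     # repo size with no-labels and big inv pages

savings_kb = before_kb - after_kb
fraction = savings_kb / float(before_kb)

print('%d kB saved (%.1f%%)' % (savings_kb, fraction * 100))
# prints: 1176 kB saved (10.7%)
```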
For now, leave the default with labels, but consider changing it.
removed:
equivalence_table.py equivalence_table.py-20080723225607-fk4rlr7rm1wln8w4-1
modified:
__init__.py __init__.py-20080705181503-ccbxd6xuy1bdnrpu-6
errors.py errors.py-20080705181503-ccbxd6xuy1bdnrpu-7
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
repofmt.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
tests/test_groupcompress.py test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
------------------------------------------------------------
revno: 32.1.16
revision-id: john at arbash-meinel.com-20090305165238-o5be2o7v8wzewnlk
parent: john at arbash-meinel.com-20090305154227-41elarat0xs75c1p
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: internal_index
timestamp: Thu 2009-03-05 10:52:38 -0600
message:
Make sure we don't inter-pack to GCCHKBig repos.
Change the code so that we can branch from a source that has no labels
even if we don't have _NO_LABELS set locally.
Restore labels and slow as the default.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
repofmt.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 32.1.15
revision-id: john at arbash-meinel.com-20090305154227-41elarat0xs75c1p
parent: john at arbash-meinel.com-20090305132400-k1i3iw0vz53oywy0
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: internal_index
timestamp: Thu 2009-03-05 09:42:27 -0600
message:
Implement a 'bigpage' version of the chk serializer, which uses 64kB pages for leaf nodes. (This is approx. 255 leaf entries, similar to the internal fan-out.)
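A rough model of why larger leaf pages can remove a whole level from the tree. The assumptions here are not taken from the code: each leaf entry is about 257 bytes (derived from the figure of roughly 255 entries per 64kB page), internal nodes fan out 255 ways, and the tree is perfectly packed. The real CHK map is a hash-prefix trie whose pages are not uniformly full, so treat this only as an illustration.

```python
ENTRY_BYTES = 64 * 1024 // 255   # ~257 bytes per leaf entry (assumed)


def leaf_capacity(page_bytes, entry_bytes=ENTRY_BYTES):
    """Entries that fit in one leaf page of the given size."""
    return page_bytes // entry_bytes


def tree_depth(n_entries, per_leaf, fan_out=255):
    """Levels in a perfectly packed tree, counting the leaf level as 1."""
    nodes = -(-n_entries // per_leaf)   # ceil division: number of leaf pages
    depth = 1
    while nodes > 1:
        nodes = -(-nodes // fan_out)    # parents needed for the level below
        depth += 1
    return depth


small = leaf_capacity(4 * 1024)    # 15 entries per 4kB leaf
big = leaf_capacity(64 * 1024)     # 255 entries per 64kB leaf
print(tree_depth(50000, small), tree_depth(50000, big))
# prints: 3 2
```

With 50,000 entries, the 4kB layout needs two internal levels above its 3,334 leaves. The 64kB layout fits its 197 leaves under a single root.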
modified:
__init__.py __init__.py-20080705181503-ccbxd6xuy1bdnrpu-6
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
repofmt.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
------------------------------------------------------------
revno: 32.1.14
revision-id: john at arbash-meinel.com-20090305132400-k1i3iw0vz53oywy0
parent: john at arbash-meinel.com-20090305042604-9d9sl2idrw3lvlqu
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: internal_index
timestamp: Thu 2009-03-05 07:24:00 -0600
message:
Fix a bug in 'FAST' handling.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 32.1.13
revision-id: john at arbash-meinel.com-20090305042604-9d9sl2idrw3lvlqu
parent: john at arbash-meinel.com-20090305040549-1egrt0x9kqzl3d7j
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: internal_index
timestamp: Wed 2009-03-04 22:26:04 -0600
message:
Bring back the code that handles _NO_LABELS.
Basically, we omit the header and just hold the content.
This drops the chk from 1.5MB => 1.1MB, and the texts from 8.1MB => 7.7MB.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 32.1.12
revision-id: john at arbash-meinel.com-20090305040549-1egrt0x9kqzl3d7j
parent: john at arbash-meinel.com-20090305034657-t3qbsogy187yul4z
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: internal_index
timestamp: Wed 2009-03-04 22:05:49 -0600
message:
Add a single byte to indicate whether the following text is a fulltext
or a delta.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 32.1.11
revision-id: john at arbash-meinel.com-20090305034657-t3qbsogy187yul4z
parent: john at arbash-meinel.com-20090305032949-ffww56phklv1vhbj
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: internal_index
timestamp: Wed 2009-03-04 21:46:57 -0600
message:
Slightly different handling of large texts.
We should only use 2*max_fulltext as a minimum size if we are still working
on the same file. That lets us avoid packing every following text into the
same group after a very large file (e.g. an ISO image).
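A minimal sketch of the heuristic described above. `should_start_new_group`, `base_limit`, and the file-id parameters are hypothetical names. The point is that the 2 * max_fulltext minimum applies only while the next text belongs to the same file, so one huge text does not raise the threshold for every unrelated text that follows.

```python
def should_start_new_group(group_size, max_fulltext, base_limit,
                           prev_file_id, next_file_id):
    """Return True if the current group is big enough to be closed.

    Sizes share one unit (e.g. MB). The limit is raised to 2 * max_fulltext
    only when we are still adding texts for the same file.
    """
    limit = base_limit
    if next_file_id == prev_file_id:
        limit = max(limit, 2 * max_fulltext)
    return group_size >= limit
```

For example, after a 600MB ISO with a 4MB base limit, more ISO revisions keep extending the group, but the next unrelated file starts a fresh one.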
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 32.1.10
revision-id: john at arbash-meinel.com-20090305032949-ffww56phklv1vhbj
parent: john at arbash-meinel.com-20090304223243-xrg48jyhczvpkjxc
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: internal_index
timestamp: Wed 2009-03-04 21:29:49 -0600
message:
Play around with detecting compression breaks.
Trying to get tricky with whether the last insert was a fulltext or delta
did not pay off well (yet).
However, using similar logic actually shows some of the best results yet.
The main difference is probably that we now detect overflow and roll back.
In the past, if we got a big fulltext that pushed us over the line,
we would leave it alone (poorly compressed in the last group)
and start a new group, which would start off with a new fulltext.
Now we roll back that insert and start the new group with it instead.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
tests/test_groupcompress.py test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
------------------------------------------------------------
revno: 32.1.9
revision-id: john at arbash-meinel.com-20090304223243-xrg48jyhczvpkjxc
parent: john at arbash-meinel.com-20090304214211-rg22q09z8queeer0
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress
timestamp: Wed 2009-03-04 16:32:43 -0600
message:
Add some benchmark results for various flush sizes.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 32.1.8
revision-id: john at arbash-meinel.com-20090304214211-rg22q09z8queeer0
parent: john at arbash-meinel.com-20090304212250-xcvwt1yx4zt76pev
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress
timestamp: Wed 2009-03-04 15:42:11 -0600
message:
Fix up the tests. Mostly it was just changing things to
no longer include the labels.
It also means we get a positive compression ratio :).
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
tests/test_groupcompress.py test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
------------------------------------------------------------
revno: 32.1.7
revision-id: john at arbash-meinel.com-20090304212250-xcvwt1yx4zt76pev
parent: john at arbash-meinel.com-20090304210622-ur7wz2dz0w4lhzn3
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress
timestamp: Wed 2009-03-04 15:22:50 -0600
message:
Have the GroupCompressBlock decide how to compress the header and content.
It can now decide whether they should be compressed together or not.
As long as we make the to_bytes() function match the from_bytes() one, we should be fine.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
------------------------------------------------------------
revno: 32.1.6
revision-id: john at arbash-meinel.com-20090304210622-ur7wz2dz0w4lhzn3
parent: john at arbash-meinel.com-20090304183131-p433dz5coqrmv8pw
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress
timestamp: Wed 2009-03-04 15:06:22 -0600
message:
(tests broken) implement the basic ability to have a separate header
This puts the labels/sha1/etc together, and then has the actual content deltas
combined later on.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
repofmt.py repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
tests/test_groupcompress.py test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
------------------------------------------------------------
revno: 32.1.5
revision-id: john at arbash-meinel.com-20090304183131-p433dz5coqrmv8pw
parent: john at arbash-meinel.com-20090304182042-yo1m7n2i2bpdldfl
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress
timestamp: Wed 2009-03-04 12:31:31 -0600
message:
Now using a zlib compressed format.
We encode the length of the compressed and uncompressed content,
and then compress the actual content.
Need to do some testing with real data to see if this is efficient
or if another structure would be better.
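A minimal sketch of a block that stores both lengths before the zlib-compressed content. The exact header layout (decimal lengths, each followed by a newline) and the function names are assumptions for illustration, not this revision's actual format. Storing both lengths lets the reader find where the compressed bytes end and verify the decompressed size. The two functions also have to mirror each other, which is the same constraint noted for to_bytes() and from_bytes().

```python
import zlib


def block_to_bytes(content):
    """Hypothetical layout: b'<compressed len>\n<uncompressed len>\n' + zlib data."""
    z_content = zlib.compress(content)
    return b'%d\n%d\n' % (len(z_content), len(content)) + z_content


def block_from_bytes(data):
    """Inverse of block_to_bytes; checks both recorded lengths."""
    z_len_str, length_str, rest = data.split(b'\n', 2)
    z_len, length = int(z_len_str), int(length_str)
    z_content = rest[:z_len]
    if len(z_content) != z_len:
        raise ValueError('truncated block: expected %d compressed bytes' % z_len)
    content = zlib.decompress(z_content)
    if len(content) != length:
        raise ValueError('length mismatch after decompression')
    return content
```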
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
tests/test_groupcompress.py test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
------------------------------------------------------------
revno: 32.1.4
revision-id: john at arbash-meinel.com-20090304182042-yo1m7n2i2bpdldfl
parent: john at arbash-meinel.com-20090304180240-xbl3a604h819an7y
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress
timestamp: Wed 2009-03-04 12:20:42 -0600
message:
We at least have the rudimentary ability to encode and decode values.
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
tests/test_groupcompress.py test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
------------------------------------------------------------
revno: 32.1.3
revision-id: john at arbash-meinel.com-20090304180240-xbl3a604h819an7y
parent: john at arbash-meinel.com-20090304170218-c3thty7hh2yfrnye
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: groupcompress
timestamp: Wed 2009-03-04 12:02:40 -0600
message:
Add encode/decode base128 functions.
Not entirely sure if I'll use them, but they may come in handy.
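A minimal sketch of base128 (varint) integer encoding. It assumes the common little-endian layout: 7 bits of the value per byte, least significant group first, and the high bit set on every byte except the last. The message does not show the byte order or the function names, so both are assumptions here.

```python
def encode_base128_int(value):
    """Encode a non-negative int, 7 bits per byte, low groups first."""
    if value < 0:
        raise ValueError('value must be non-negative')
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # high bit: more bytes follow
        value >>= 7
    out.append(value)                      # final byte has the high bit clear
    return bytes(out)


def decode_base128_int(data, offset=0):
    """Return (value, bytes_consumed) for the varint starting at offset."""
    value = 0
    shift = 0
    pos = offset
    while True:
        byte = data[pos]                   # IndexError if the input is truncated
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos - offset
        shift += 7
```

Small values cost one byte (300 encodes as b'\xac\x02'), which suits the length prefixes discussed for the record format.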
modified:
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
tests/test_groupcompress.py test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
------------------------------------------------------------
revno: 32.1.2
revision-id: john at arbash-meinel.com-20090304170218-c3thty7hh2yfrnye
parent: john at arbash-meinel.com-20090304165605-zbap3q69laok4o6p
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: internal_index
timestamp: Wed 2009-03-04 11:02:18 -0600
message:
First cut at meta-info as text form.
modified:
errors.py errors.py-20080705181503-ccbxd6xuy1bdnrpu-7
groupcompress.py groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
tests/test_groupcompress.py test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
------------------------------------------------------------
revno: 32.1.1
revision-id: john at arbash-meinel.com-20090304165605-zbap3q69laok4o6p
parent: john at arbash-meinel.com-20090304161119-wjb6l5idp2k9niwq
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: internal_index
timestamp: Wed 2009-03-04 10:56:05 -0600
message:
fully remove the eq table for now.
removed:
equivalence_table.py equivalence_table.py-20080723225607-fk4rlr7rm1wln8w4-1
-------------- next part --------------
Diff too large for email (1081 lines, the limit is 1000).
More information about the bazaar-commits mailing list