Initial results using 'hash trie'

John Arbash Meinel john at arbash-meinel.com
Thu Dec 25 16:58:07 GMT 2008



...
> 
> So it seems that adding delta compression does a lot to "homogenize" the
> results. The 255-way version had been using 16MB compressed, and now it
> drops to 4MB, while the 16-way was 7.8MB and only drops to 4.5MB.
> 
> Also worth noting, though, is that the time to convert with delta
> compression increases a lot: from about 2min to 3m30s (for the 16-way
> it is 2m40s). It was still bad even when I tried to add code to use any
> cached parent texts. So it seems to be the time to compute the delta,
> rather than the time to extract the previous text.

So I posted one patch which fixed up the 3m30s by properly using the
page cache during _check_remap.
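
For reference, the shape of that fix is roughly the following (a minimal
sketch, not the actual bzrlib code; 'page_cache' here is a plain dict
and the store's read() method is an assumed stand-in): satisfy as many
keys as possible from the page cache, and only go to the store for the
misses.

def read_pages(store, page_cache, keys):
    """Return {key: bytes}, going to the store only for cache misses."""
    found = {}
    missing = []
    for key in keys:
        data = page_cache.get(key)
        if data is not None:
            found[key] = data
        else:
            missing.append(key)
    if missing:
        # Only the keys not already cached cause an actual read.
        for key, data in store.read(missing):
            page_cache[key] = data
            found[key] = data
    return found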

I've worked with the code some more, with the longer conversion, and
I'll have some more _check_remap patches coming. There are 2
possibilities I'm exploring:

1) page_cache thrashing. It was set to 2MB, which isn't enough to hold
the 255-way internal nodes (they take approximately 3MB).

2) During _check_remap we need to start paging in unread nodes, to find
out whether we could fit everything into a single leaf node. Originally
we would read all nodes and then work it out; I changed it to read just
10 nodes at a time (see the sketch after this list). This should also
help with (1), because we aren't reading a lot of extra data that we may
not need.
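
A minimal sketch of idea (2), assuming duck-typed node and store objects
rather than the real chk_map internals: add up the sizes of the children
we already have in memory, then page in the unread ones in batches of 10
and stop as soon as the running total rules out a single leaf.

BATCH_SIZE = 10

def could_fit_single_leaf(children, store, maximum_size):
    """Return True if all children's content would fit in one leaf."""
    total = 0
    unread_keys = []
    for child in children:
        if child.is_loaded():              # assumed helper on the node
            total += child.byte_size()     # assumed helper on the node
            if total > maximum_size:
                return False               # already too big, read nothing
        else:
            unread_keys.append(child.key())
    # Page in the rest a few nodes at a time instead of all at once.
    for start in range(0, len(unread_keys), BATCH_SIZE):
        batch = unread_keys[start:start + BATCH_SIZE]
        for node in store.read_nodes(batch):   # assumed store method
            total += node.byte_size()
            if total > maximum_size:
                return False                   # stop reading early
    return True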

In general, though, I've found that the overhead is actually in the
index layer. Part of the fix is changing the code to reduce how often we
have to hit the index layer. I also think that when you have knit
deltas, you spin a bit more determining the build chains, etc.

The way the deltas are generated, they are over the 'history' of a given
leaf/internal node, which means there isn't any shared info between the
pages. So when reading 200 leaf nodes, we probably have to hit
200*100 = 20,000 knit hunks. (For one batch of 100 revisions we call
btree.iter_entries 495,000 times, and 900,000 times for a different 100
revisions.)
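
To make that cost concrete, extracting one page looks roughly like the
following (a sketch; the index and store methods are assumed names, not
the real knit API). Each page walks its own private chain back to a
fulltext, so nothing is shared across the 200 pages, and with chains
around 100 hunks deep that is where the 20,000 figure comes from.

def rebuild_page(store, index, key):
    """Rebuild one page's text by walking its private delta chain."""
    chain = [key]
    while True:
        basis = index.delta_basis(chain[-1])  # assumed: None for fulltext
        if basis is None:
            break
        chain.append(basis)
    # Start from the fulltext and apply each delta hunk forward again.
    text = store.get_fulltext(chain[-1])      # assumed store method
    for k in reversed(chain[:-1]):
        text = store.apply_delta(text, k)     # assumed store method
    return text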


Even if we reduce that cost, I'm also noticing that the chk index is
starting to take up a sizable portion of the repository.

du packs indices
338M	packs
40M	indices

So now the indices are 10.5% of the total repository size. Compare that
with a 1.9 format repo:
591M    packs
16M     indices
2.6% for indices

The bulk of that is the chk index:
du indices/*.cix
27M

27/40 = 67.5% of all index size, or 7.1% of the total repository size.
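
For the record, those percentages fall straight out of the du figures
(du rounds to whole megabytes, so the computed values agree with the
numbers above to within about a tenth of a percent):

packs, indices, chk_index = 338.0, 40.0, 27.0    # MB, from du above
packs_19, indices_19 = 591.0, 16.0               # MB, 1.9 format repo
print("indices / repo:     %.1f%%" % (100 * indices / (packs + indices)))
print("1.9 indices / repo: %.1f%%"
      % (100 * indices_19 / (packs_19 + indices_19)))
print("cix / indices:      %.1f%%" % (100 * chk_index / indices))
print("cix / repo:         %.1f%%" % (100 * chk_index / (packs + indices)))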

Even further, SHA-1 hashes are randomly distributed, so you don't have
any locality to exploit.

Oh, and I also found that when building up the deltas, using the direct
parents saves both disk space and computation time. I process the
revisions in batches of 100, compare each inventory against all of the
parents that are present, and pick whichever would make the smallest
delta.

My guess is that the time for "_make_delta" is rather short once the
texts are already in memory, and using a better delta means we don't
have to page in as many extra nodes, especially ones that would end up
being redundant after applying the delta. (Interestingly, the left-hand
parent is the better choice about 7 times in 10.)
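
The parent-selection step is roughly the following (a sketch under the
assumption of a generic make_delta(old, new) callable that returns the
serialized delta; the real code goes through the chk inventory
machinery): try every parent we have in memory and keep the smallest
result.

def choose_delta_basis(new_inventory, parent_inventories, make_delta):
    """Pick the parent whose delta against new_inventory is smallest.

    parent_inventories maps parent key -> parent inventory (already in
    memory).  Returns (best_key, best_delta), or (None, None) when no
    parents are available.
    """
    best_key = None
    best_delta = None
    for parent_key, parent_inv in parent_inventories.items():
        delta = make_delta(parent_inv, new_inventory)
        if best_delta is None or len(delta) < len(best_delta):
            best_key = parent_key
            best_delta = delta
    return best_key, best_delta

Given that the left-hand parent wins about 7 times in 10, a cheap
variation would be to try it first and only compare the others when its
delta comes out unusually large.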


On the plus side, chk-inventories with the 255-way split and delta
compression, using the 'optimal' parent when possible, give:

Commits: 26100
                      Raw    %    Compressed    %  Objects
Inventories:   569920 KiB   6%     86605 KiB  38%   321977

However, this figure doesn't include the 27700KiB for the .cix index
itself, which is roughly one third of the compressed size of the actual
inventory texts.

38% is pretty good, given that the 1.9 format is at 35%, and you don't
get any compression between the various delta hunks in a split inventory
(I'm thinking mostly of fields like 'revision', which should be nearly
identical for all the new nodes).

I would guess that the index would be a bit smaller if we used a
different delta compression, as we could store less in the index.
However, I *am* still concerned about the fragmentation overhead.

On the other hand, this is something we can easily tune with our
existing code: we can change the fan-out and the LeafNode size. My
concern is that the code is still designed so that when a LeafNode
splits, it pretty much splits down to 1 entry per node.

At the moment, 4k pages fit at most about 20 entries, but an internal
node will tend to split to point at 16 or 255 LeafNodes. So even if the
"max" size is 4k, the average node actually holds only a few entries,
which is about 400 bytes. (Or about 1/10th the size we actually want.)
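
The rough arithmetic behind that (illustrative figures only):

max_leaf_bytes = 4096
max_entries = 20                              # entries in a full 4k leaf
entry_bytes = max_leaf_bytes // max_entries   # ~205 bytes per entry
after_split = max_entries / 16.0              # 16-way split: ~1.25/child
typical_node = 2 * entry_bytes                # "a few entries" ~= 400 bytes
print("%0.2f entries/child, %d bytes, %0.2f of max size" % (
    after_split, typical_node, float(typical_node) / max_leaf_bytes))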

Next I might try just bumping up the max leaf node size, and see what
happens. It will probably cause the raw size to go up, but I would guess
the compressed size wouldn't change much.

John
=:->