[MERGE] Updates to btree index

John Arbash Meinel john at arbash-meinel.com
Sat Aug 23 02:02:31 BST 2008


Attached is a couple tweaks and updates to the new btree index code.

1) The big change is in the "prediction" logic. Specifically, we used to use:

if ((total_bytes_seen + new_bytes) * estimated_min_compression
    < output_size):
  compress new bytes without sync
else:
  compress and sync, and check for size

This works, but has a few issues:
  a) You have to get 'estimated_min_compression' right. There is a temptation
to set it high, because then you rarely have to evaluate, but you must keep it
lower than any ratio actually achieved, or you will add bytes that won't fit.

  b) Once you have seen more than X bytes, all future bytes will require a
sync.

So instead, I changed the logic to:

remaining_size = output_size - len_out_so_far
if new_bytes < remaining_size:
  compress new bytes without sync
else:
  compress and sync, check for size

The big wins are:

  a) After a compress, we usually free up some space, so the next bytes can be
     added without having to check. Typical compression is about 5:1, but the
     estimated minimum has to be set to 2:1 or less to be safe. Here we
     effectively assume only 1:1, but get to take advantage of knowing that
     the first N bytes are already compressed.

  b) This ends up both faster and more tightly packed, because we call
     flush(SYNC) less often.
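The remaining-size logic above can be sketched in Python with zlib. The class and method names here are illustrative, not bzr's actual API, and a real page writer would repack rather than simply reporting failure when the sync check overflows:

```python
import zlib

class PageWriterSketch:
    """Illustrative sketch of the remaining-size check described above."""

    def __init__(self, page_size=4096):
        self.page_size = page_size
        self.compressor = zlib.compressobj()
        self.bytes_out = b''  # compressed output emitted so far

    def add_bytes(self, new_bytes):
        """Try to add new_bytes to the page; return False if it is full."""
        remaining_size = self.page_size - len(self.bytes_out)
        if len(new_bytes) < remaining_size:
            # Even at a 1:1 ratio the bytes must fit, so no sync is needed.
            # (compress() may buffer, so bytes_out can lag behind the input.)
            self.bytes_out += self.compressor.compress(new_bytes)
            return True
        # Near the limit: sync-flush so the true output size is known.
        out = self.compressor.compress(new_bytes)
        out += self.compressor.flush(zlib.Z_SYNC_FLUSH)
        if len(self.bytes_out) + len(out) <= self.page_size:
            self.bytes_out += out
            return True
        return False  # page full; the real code would repack here
```

Note how the cheap branch gets taken again right after a sync frees up space, which is where the speed win comes from.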

2) With the improved compression, I ran into something weird. For whatever
reason, when I do flush(SYNC) a few times, the stream compresses *better* than
compressing all of the input data at once. I assume zlib is confused by the
structure in the data.
It is a bit artificial, but my workaround is to trigger a 'repack' earlier.
Because we already do 2 repacks, this doesn't hurt real-world performance, and
it safely handles this case.
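The effect is easy to measure with plain zlib. Whether the synced stream actually comes out smaller is entirely data-dependent, but the two modes can be compared like this (the function names are mine, not bzr's):

```python
import zlib

def compress_with_syncs(chunks):
    """Compress chunks with a Z_SYNC_FLUSH after each one."""
    c = zlib.compressobj()
    out = b''.join(c.compress(chunk) + c.flush(zlib.Z_SYNC_FLUSH)
                   for chunk in chunks)
    return out + c.flush()  # terminate the stream

def compress_all_at_once(chunks):
    """Compress the same data as one block, with no intermediate syncs."""
    return zlib.compress(b''.join(chunks))
```

Both produce valid zlib streams that decompress to the same data; comparing len() of the two outputs on real index pages is how the surprising result shows up.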

3) It adds some minor stats tracking, like we have for the btree index. I'm
not stuck on leaving it in, but it was kind of nice for figuring out what
specifically was going on.

4) It simplifies some tests, partially by compacting the key strings. For
example:

-
-            )
+            ) + ("635" * 40) + "\n"

The 503 became 635, but it is much easier to update that in the
multiplicative form than in the expanded string form.

This also shows the improved compression: we went from fitting 503 keys to
fitting 635 in a single page. (This is a bit overstated, because our test keys
are horrendously simple and compress very well.)

5) I cheat for the "three_level" and "iter_all" tests that need enough pages
to create 3 levels of btree index. Basically, I allow those tests to set the
page size down to 2kB instead of 4kB, which lets us build a deeper tree with
20k nodes instead of needing 100k. The test takes ~5s, which I think is
reasonable (down from about 30s with the new packing and 4k pages).
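The arithmetic behind this cheat can be sketched as follows (leaf_capacity, fanout, and the helper are illustrative parameters of a generic btree, not bzr's real per-page counts):

```python
def min_keys_for_depth(leaf_capacity, fanout, depth):
    """Smallest key count that forces a btree to the given depth.

    A tree of depth d holds at most leaf_capacity * fanout**(d - 1)
    keys, so one key more than the (d - 1)-deep maximum forces depth d.
    """
    return leaf_capacity * fanout ** (depth - 2) + 1

# Halving the page size roughly halves both the leaf capacity and the
# internal fanout, so the 3-level threshold drops by about 4x -- which
# is why 2kB pages can reach three levels with 20k nodes instead of 100k.
```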

The rest of the test changes are just because more stuff is being packed into
a page, so the page sizes are shrinking.

6) I included some benchmarks with real world data. With the default setting,
repacking my bzr.dev repo went from:

9.6s 4.4MB  down to  8.4s 4.2MB

And packing my mysql repo went from:

55.8s 14.1MB  down to  54.4s 13.6MB

So, about a 5-10% gain, but the really nice thing is that it is a gain in
*both* speed and space.

7) What do you think about changing the algorithm now to favor speed over
compactness, and setting "_max_repack = 1"? The indexes don't bloat as much as
they used to. And as long as we have "bzr pack" set _max_repack to something
higher (20 is more than high enough), we can still get the maximally-compact
indexes at the appropriate time.

Also, we could do a bit better if we didn't have to worry about the
"safety_margin". I don't think we'll ever trigger the problem with real-world
data; I only added it because the testing data is atypical.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: btree2.patch
Type: text/x-diff
Size: 33601 bytes
Desc: not available
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080822/cc3a6c0b/attachment-0001.bin 
