[MERGE] Set allow_optimize=False when spilling btree content to disk

John Arbash Meinel john at arbash-meinel.com
Mon Mar 23 19:49:05 GMT 2009



Robert Collins wrote:
> Robert Collins has voted tweak.
> Status is now: Conditionally approved
> Comment:
> Hmm, I may need to digest this. Re: the change to the builder - no
> problems there. It would be ideal though, if you could arrange it, to
> have a hint to the builder. pack and autopack definitely won't be
> reading back, commit however may be - so let's look at:
>  - being able to never combine backing store

I went ahead and made it another flag on the existing "set_optimize()",
and then set things up so that Packer.open_pack() makes the appropriate
set_optimize() call.
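
The shape of the change can be sketched with a toy builder. This is an
illustrative sketch only - the class, the tiny spill threshold, and the
exact merge policy are made up for the example, not bzrlib's actual
BTreeBuilder API:

```python
# Hypothetical sketch of a builder that spills in-memory keys to
# "backing indices" and, unless a set_optimize() hint says otherwise,
# combines same-sized backing indices as it goes.

class SpillingBuilder:
    SPILL_AT = 4  # tiny threshold for illustration; the real one is ~100k

    def __init__(self):
        self._keys = []
        self._backing_indices = []  # each entry: a sorted list of keys
        self._combine_backing_indices = True

    def set_optimize(self, combine_backing_indices=True):
        # Packers that never read the index back (pack/autopack) can
        # pass False to skip the repeated combine work during spills.
        self._combine_backing_indices = combine_backing_indices

    def add_key(self, key):
        self._keys.append(key)
        if len(self._keys) >= self.SPILL_AT:
            self._spill()

    def _spill(self):
        spilled = sorted(self._keys)
        self._keys = []
        if self._combine_backing_indices:
            # Merge same-sized runs, doubling like a merge sort, so
            # readers see few, large indices instead of many small ones.
            while (self._backing_indices and
                   len(self._backing_indices[-1]) <= len(spilled)):
                spilled = sorted(self._backing_indices.pop() + spilled)
        self._backing_indices.append(spilled)
```

With combining disabled, 16 keys leave four small backing indices; with
it enabled they collapse into one.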

>  - maintaining the backing store with a count cap that is a function of
> nodes-so-far. There is no point combining two 100K node indices to avoid
> searches. But there is a point in combining 10 - one 1M node index is
> what 3 levels vs 2 levels, so 20 queries vs 3.

This is certainly something to look at, though it isn't something we
have much benchmarking for yet. You need >100k nodes before any spilling
happens at all, and then 200k before anything would be combined. CHK
repositories trigger this more often, because we end up with more nodes
(especially with something like --gc-chk16).
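
Your query-count arithmetic can be sketched as back-of-envelope Python.
The fan-out of 400 entries per page is an assumed figure chosen so the
numbers line up with your "2 levels vs 3 levels" example, not a measured
bzrlib branching factor:

```python
# Back-of-envelope sketch of the "20 queries vs 3" argument: ten
# separate 100k-node indices vs one combined 1M-node index.
import math

def btree_levels(num_entries, fanout=400):
    """Levels in a B+tree holding num_entries entries (assumed fan-out)."""
    if num_entries <= fanout:
        return 1
    return math.ceil(math.log(num_entries, fanout))

# Ten separate 100k-node indices: a lookup may descend all of them.
queries_separate = 10 * btree_levels(100_000)   # 10 indices * 2 levels
# One combined 1M-node index: a single descent.
queries_combined = btree_levels(1_000_000)      # 3 levels
```

So combining the ten 100k indices trades 20 probes for 3, while merging
two already-large indices buys comparatively little.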

> 
> As for random_id, it doesn't matter if it would collide with a different
> pack - that won't be picked up today anyway because we have a new-pack
> vf. As long as we won't emit the same id twice ourselves it will always
> be fine.
> 

Thanks for confirming that. I'm a little concerned about our tendency to
build huge sets containing a large number of keys, as I'm noticing that
things like "bzr pack" use a lot more memory than I think they should.
Some of that is the refcycle problem, which I'm already working on a fix
for (and which also prompted some of the LRUCache changes).

I also wonder about having fetch/pack write the final index down to disk
once that phase is done. (We don't need to keep the whole xxx.rix in
memory once we are working on chk nodes/text nodes.) I was seeing
something like 10MB saved if I did:

  for index in source_vf._index._graph_index._indices:
      index.leaf_node_cache.clear()
  target_vf._index._graph_index._spill_mem_keys_to_disk()

But 10MB doesn't really explain why I'm seeing 500MB during 'bzr pack'...

John
=:->
