MySQL in chk-inventory

John Arbash Meinel john at arbash-meinel.com
Wed Dec 10 22:05:01 GMT 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Collins wrote:
> Could you toss up the source repo details too please?
> 
> Its nice that this is looking as-expected. I'm a bit concerned about it
> hitting 1GB of memory - do you have any guesses? I'd guess at inventory
> sizes of the xml inventories..
> 
> -Rob

As for the memory consumption... I'm not really sure.

Doing "for rt in repo.revision_trees(last_100):" seems to consume 320MB.

It does seem to be the combined size of all of the inventory texts.
Specifically, just before we call "_get_content_maps" we are
using 12MB of RAM.

Right after _get_content_maps, we have read all the records, but they are
lists of lines. Memory has gone up to 34MB.

We are careful in get_record_stream() to only buffer one text at a time,
but _iter_inventory_xmls() does a get_bytes_as('fulltext') on every
record and buffers all of the results.

That causes our carefully shared lines to be thrown out and replaced by
a buffer that holds every inventory as an individual byte string.

(Another case where 'fulltext' is less optimal than something like
'chunked'.)

I can change the _iter_inventory_xmls so that it only buffers what it
needs to, in order to return the texts in requested order.

Oddly enough 'unordered' returns a very poor ordering (something like 90
of the 100 texts get buffered). Some of that is because I'm explicitly
requesting them in topological ordering.
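A rough sketch of "only buffer what is needed to return the texts in
requested order", in plain Python rather than the real bzrlib code.
iter_in_requested_order is a hypothetical name, and the stream is
assumed to be any iterable of (key, text) pairs arriving in arbitrary
order.

```python
def iter_in_requested_order(requested_keys, unordered_stream):
    """Yield (key, text) in requested order.

    Only records that arrive before their turn are held in `pending`;
    each one is dropped from the buffer as soon as it is yielded.
    """
    pending = {}
    stream = iter(unordered_stream)
    for key in requested_keys:
        while key not in pending:
            try:
                got_key, text = next(stream)
            except StopIteration:
                # The stream ended without ever producing this key.
                raise KeyError(key) from None
            pending[got_key] = text
        yield key, pending.pop(key)


stream = [("rev-c", b"C"), ("rev-a", b"A"), ("rev-b", b"B")]
for key, text in iter_in_requested_order(["rev-a", "rev-b", "rev-c"], stream):
    print(key, text)
```

How much gets buffered still depends on how far the stream order is
from the requested order, so a poor 'unordered' ordering would still
hold most of the texts at the peak.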

It is even easier to change the _iter_inventory_xmls code to release
the texts as it yields them. When doing that, you can see the memory
spike, and then gradually decrease as the texts are yielded.

Still, I think it would be better to have a "get_record_stream()" that
can return 'chunked' texts, which then get cast up to a simple string
only at the last second (for the sake of the xml parsing code).
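Something along these lines is what I mean by joining at the last
second. This is only a sketch: parse_inventory_chunks is a hypothetical
name, the XML is a made-up stand-in for a real inventory, and it uses
the standard library's ElementTree rather than the bzrlib serializer.

```python
import xml.etree.ElementTree as ET


def parse_inventory_chunks(chunks):
    """Parse one inventory held as a list of byte chunks.

    The chunks stay separate (and possibly shared with other texts)
    until this point; the joined string lives only for the parse call.
    """
    return ET.fromstring(b"".join(chunks))


chunks = [b"<inventory>", b"<file name='a'/>", b"</inventory>"]
root = parse_inventory_chunks(chunks)
print(root.tag, len(root))
```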

Anyway, you are right that a lot of the memory consumption is because we
expand all 100 inventories into separate byte strings, buffer all of
them, and hang onto them until all revisions have been processed.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAklAPQ0ACgkQJdeBCYSNAAOsDgCfdbgKW+LUwaXPB3KU4gHsbMxt
WqkAnA/X+Pdtn5Rp0jvSXmc8QzVTJN7k
=Re0y
-----END PGP SIGNATURE-----


