[MERGE] intern() various file_id/revision_id

John Arbash Meinel john at arbash-meinel.com
Wed Mar 4 04:01:54 GMT 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This changes the BTreeIndex parser and the xml deserializer to intern()
file_ids and revision_ids that are processed. This is generally done in
the hope that it will decrease memory consumption.
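
As a rough illustration of the idea (not the code from the patch), the sketch below interns ids as they come out of a parser. It assumes Python 3, where the builtin intern() has moved to sys.intern() and only accepts str; the helper intern_ids and the sample data are made up for the example.

```python
import sys


def intern_ids(raw_ids):
    """Return the ids with equal strings collapsed to one shared object."""
    return [sys.intern(s) for s in raw_ids]


# Hypothetical index content: the same file_id appears on two rows.
data = "file-id-1 rev-id-1\nfile-id-1 rev-id-2\n"

# Splitting builds each field as a fresh string, much as a parser would.
parsed = [field for row in data.splitlines() for field in row.split(" ")]

interned = intern_ids(parsed)

# Both occurrences of "file-id-1" now refer to the same string object,
# so only one copy needs to stay in memory.
same_object = interned[0] is interned[2]
```

Interning does not change the values, only how many copies of each value are kept alive.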

For xml, we used to use cache_utf8.get_cached_ascii(), which maps the
string to a Unicode representation, and then back again. But we never
use the Unicode representation of file_ids or revision_ids, so keeping
it around wastes a decent amount of memory. (I believe the default on
Linux is a UCS-4 unicode char, so the Unicode copy costs 4 bytes of
memory per string byte.)
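
The 4-bytes-per-character arithmetic can be checked with a small sketch. This assumes Python 3 and uses a UTF-32 encoding as a stand-in for a UCS-4 unicode object's character data (object headers are ignored); the revision_id value is invented for the example.

```python
# Hypothetical revision_id, pure ASCII like real ones usually are.
revision_id = "joe@example.com-20090304040154-abcdefgh12345678"

# Cost of the character data as a plain byte string: 1 byte per character.
byte_cost = len(revision_id.encode("ascii"))

# Cost of the same characters at 4 bytes each ("utf-32-le" adds no BOM).
ucs4_cost = len(revision_id.encode("utf-32-le"))

# The wide copy is exactly four times the size of the byte string.
ratio = ucs4_cost // byte_cost
```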

After enabling this, I didn't see a huge memory difference. I'm not
really sure why. Perhaps the memory used by the strings is actually
dwarfed by the memory used by some other object (like a large intern
dict?).

Anyway, I did the work for it, and I think it is the route we want to
go, so I figured I might as well submit it for approval.

I also wanted to look at "interning" some tuples, since we now use
'keys' everywhere. However, if you put a tuple in a dict, it will live
until the cache is cleared. (Interned strings avoid this because the
string deallocator knows explicitly about the interned dict.) Further,
you can't 'weakref()' a tuple. So there isn't an easy structure for
caching them long enough without caching them forever.
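
A minimal sketch of the problem, assuming Python 3: the cache _key_cache and the helpers intern_key and can_weakref are hypothetical names for illustration, not anything in bzrlib.

```python
import weakref

# Hypothetical dict-based "intern" cache for key tuples.
_key_cache = {}


def intern_key(key):
    """Return the cached tuple equal to key, storing key if it is new."""
    return _key_cache.setdefault(key, key)


def can_weakref(obj):
    """Report whether a weak reference to obj can be created."""
    try:
        weakref.ref(obj)
    except TypeError:
        return False
    return True


# Two equal keys built separately collapse to one shared tuple.
k1 = intern_key(tuple(["file-id", "rev-id"]))
k2 = intern_key(tuple(["file-id", "rev-id"]))
shared = k1 is k2

# Tuples cannot be weakly referenced, so a weak-value cache is not an option.
tuple_weakref_ok = can_weakref(k1)

# The plain dict holds a strong reference until it is cleared explicitly.
entries_before_clear = len(_key_cache)
_key_cache.clear()
entries_after_clear = len(_key_cache)
```

The dict de-duplicates the tuples, but nothing removes an entry when the last outside user of a key goes away, which is the "caching them forever" problem.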

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkmt/TEACgkQJdeBCYSNAAMjSACgq/XD0BYz8dfJ/b/hW4OVszs7
lt0AoIvNnR8o+Xo5sVMwIFPqXbEtejlS
=xt5a
-----END PGP SIGNATURE-----
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: intern_keys.patch
Url: https://lists.ubuntu.com/archives/bazaar/attachments/20090303/b7374f96/attachment.diff 

