[RFC] Change revision_id caching

Sun Mar 30 10:36:31 BST 2008

At the moment, when we read revision ids from various locations, we cache them
in utf8 and unicode dictionaries. We do this because revision_ids are officially
utf8 strings, and we have slowly moved the internals over to dealing with them
only as utf8 (never as unicode directly, except when writing them to the XML form).

Anyway, get_cached_utf8() is nice because it gives us a single, unique in-memory
copy of each revision_id string, but since the dictionaries hold on to every id
we cache, they can probably get a bit bloated. It also pays the overhead of a
.decode('UTF8') for every revision_id.

If we trust most of the internals not to need the unicode form, then I think it
would be reasonable to switch to an LRUCache of plain 8-bit (utf8) strings and
not worry about the Unicode side at all. If we want to be safe, we can add checks
where users might be inputting data. (So we check at Branch.set_last_revision_info(),
but we don't have to check on every Branch.last_revision_info() call.)
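The "check at the boundary, not on every read" idea might look roughly like this. The Branch class here is a hypothetical toy, not bzrlib's Branch; only the two method names come from the text above, and the initial placeholder id is an assumption.

```python
# Hypothetical toy Branch: validate revision ids once, where user data
# enters, instead of on every read. Not bzrlib's real Branch class.
class Branch:
    def __init__(self):
        self._last_revno = 0
        self._last_revision_id = b'null:'  # placeholder initial id

    def set_last_revision_info(self, revno, revision_id):
        # Boundary check: ids must be utf8 bytes, never unicode.
        if not isinstance(revision_id, bytes):
            raise TypeError('revision_id must be utf8 bytes, not %s'
                            % type(revision_id).__name__)
        # Raises UnicodeDecodeError if the bytes are not valid utf8.
        revision_id.decode('utf-8')
        self._last_revno = revno
        self._last_revision_id = revision_id

    def last_revision_info(self):
        # No check here: the value was validated when it was set.
        return self._last_revno, self._last_revision_id
```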

I would like to move to an LRUCache anyway, since it will keep memory
consumption bounded. It will add some runtime overhead, though...
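A bounded LRU interning cache of plain bytes could be sketched as below. It uses collections.OrderedDict rather than any particular LRUCache class, and the class name and default size are hypothetical. Note that there is no decode at all; the extra runtime cost is the reordering on each hit.

```python
from collections import OrderedDict


class LRUInternCache:
    """Hypothetical sketch: intern revision ids as bytes in a bounded LRU."""

    def __init__(self, max_size=1000):
        self._max_size = max_size
        self._cache = OrderedDict()

    def intern(self, revision_id):
        existing = self._cache.get(revision_id)
        if existing is not None:
            # Hit: mark as most recently used (this is the extra runtime cost).
            self._cache.move_to_end(revision_id)
            return existing
        self._cache[revision_id] = revision_id
        if len(self._cache) > self._max_size:
            # Evict the least recently used id to cap memory use.
            self._cache.popitem(last=False)
        return revision_id

    def __contains__(self, revision_id):
        return revision_id in self._cache

    def __len__(self):
        return len(self._cache)
```

One trade-off of this sketch: once an id is evicted, a later read creates a fresh object, so identity is only shared while the id stays in the cache. Equality comparisons are unaffected.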

Thoughts?

John
=:->