Rethinking conversions to rich-root data

Sat Mar 21 03:32:22 GMT 2009

I've been looking at our index code (and related code), and I noticed
that we have a surprisingly large number of records that are associated
with no content. I then realized that this is at least partially because
*every* revision in a conversion now generates a new root node.

As a specific example, python.org's repository has 256k entries in the
per-file graph. Of those, 141k refer to no content. That is more than
half (55%). Some of those are because of various directories, renames,
etc. It is also a bzr-svn conversion, which may affect things. But
Launchpad shows a similar ratio: 140k of its 240k entries in the .tix
refer to no actual changes. Given that lp has 54k revs, a large portion
of those are just the root being noted on every revision.

At a minimum, that fake update accounts for 54k of the 240k total
changes (about 22%). It also causes at least 54k of the 350k chk nodes
(about 15%) to be updated, and bloats the .tix.
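
For anyone who wants to check the ratios, here is a rough sketch of the
arithmetic using only the figures quoted above (in thousands). The
variable and function names are made up for illustration and do not
correspond to anything in bzrlib.

```python
# Figures quoted in this message, in thousands of entries.
# All names here are illustrative, not bzrlib APIs.
python_org = {"entries": 256, "no_content": 141}
launchpad = {"entries": 240, "no_content": 140,
             "revisions": 54, "chk_nodes": 350}


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


shares = {
    # Per-file graph entries that refer to no content.
    "python.org empty entries": share(python_org["no_content"],
                                      python_org["entries"]),
    "launchpad empty entries": share(launchpad["no_content"],
                                     launchpad["entries"]),
    # Lower bound: one faked root entry per revision.
    "root entries / all .tix entries": share(launchpad["revisions"],
                                             launchpad["entries"]),
    "root entries / empty entries": share(launchpad["revisions"],
                                          launchpad["no_content"]),
    "root updates / chk nodes": share(launchpad["revisions"],
                                      launchpad["chk_nodes"]),
}

for label, pct in shares.items():
    print("%-34s %5.1f%%" % (label, pct))
```

So the per-revision root alone would be roughly a fifth of the
Launchpad .tix entries, assuming exactly one root entry per revision.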

Do we *really* need to fake all of these root changes? I know there were
something like 2 branches out there that actually had a root id for
about 3 revisions. Couldn't we just force the root to be fixed for all
revisions coming from non-rr trees, and avoid preserving this bloat for
the rest of history?

John
=:->