Rethinking conversions to rich-root data
John Arbash Meinel
john at arbash-meinel.com
Sat Mar 21 03:32:22 GMT 2009
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
I've been looking at our index code, etc., and I noticed that we have a
surprisingly large number of records that are associated with no
content. I then realized that this was at least partially because
*every* revision in a conversion now generates a new root node.
As a specific example, python.org's repository has 256k entries in the
per-file graph. Of those, 141k refer to no content, which is more than
half (55%). Some of those come from directories, renames, etc. It is
also a bzr-svn conversion, which may affect things.
Launchpad shows a similar pattern: 140k of the 240k entries in its .tix
refer to no actual changes. Given that Launchpad has 54k revisions, a
large portion of those are just the root being recorded on every revision.
At a minimum, that fake root update accounts for 54k of the 240k total
changes, which forces at least 54k of the 350k chk nodes to be updated
and bloats the .tix.
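The ratios quoted for python.org and Launchpad can be rechecked with a few lines of Python. This only redoes the arithmetic on the figures given in this message (in thousands). The dictionary labels are paraphrases, not names taken from any tool.

```python
# Figures quoted in this message, in thousands: (part, whole).
FIGURES = {
    "python.org: no-content entries / per-file graph entries": (141, 256),
    "launchpad: no-change entries / .tix entries": (140, 240),
    "launchpad: root-only updates / total changes": (54, 240),
    "launchpad: root-driven chk updates / chk nodes": (54, 350),
}


def share(part: float, whole: float) -> float:
    """Percentage of `whole` made up by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    for label, (part, whole) in FIGURES.items():
        # Prints 55.1%, 58.3%, 22.5% and 15.4% respectively.
        print(f"{label}: {share(part, whole)}%")
```

The root-only updates alone come to roughly a fifth of all Launchpad changes and about one chk node in seven.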
Do we *really* need to fake all of these root changes? I know there were
about two branches out there that actually had a root id for about 3
revisions. Couldn't we just require that all revision roots from
non-rich-root (non-rr) trees be fixed, and avoid preserving this bloat
for the rest of history?
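To make the question concrete, here is a minimal Python sketch under simplified, hypothetical assumptions. A per-file graph is modelled as a dict from (file_id, revision_id) keys to parent keys, and `TREE_ROOT` is an assumed root id. Revisions arrive in topological order with no ghost parents. The function `root_graph_every_revision` mimics recording a root text on every converted revision. The function `root_graph_fixed` is an illustrative alternative: it records the root once per parentless revision and lets later revisions reuse that key. Neither function is bzrlib's actual conversion code.

```python
from typing import Dict, List, Tuple

# Hypothetical model, not bzrlib's API: a per-file graph maps
# (file_id, revision_id) keys to a tuple of parent keys.
Key = Tuple[str, str]
Graph = Dict[Key, Tuple[Key, ...]]
Revisions = List[Tuple[str, List[str]]]  # (revision_id, parent_ids), topologically sorted

ROOT_ID = "TREE_ROOT"  # assumed root file id for non-rich-root trees


def root_graph_every_revision(revisions: Revisions) -> Graph:
    """Record a new root text for every converted revision; its parents
    are the root texts of the revision's parents."""
    graph: Graph = {}
    for rev_id, parent_ids in revisions:
        graph[(ROOT_ID, rev_id)] = tuple((ROOT_ID, p) for p in parent_ids)
    return graph


def root_graph_fixed(revisions: Revisions) -> Tuple[Dict[str, Key], Graph]:
    """Treat the root as never changing: only parentless revisions get a
    root text, and every other revision reuses its left-hand parent's key.
    Assumes all parents appear earlier in `revisions` (no ghosts)."""
    root_for: Dict[str, Key] = {}
    graph: Graph = {}
    for rev_id, parent_ids in revisions:
        if parent_ids:
            root_for[rev_id] = root_for[parent_ids[0]]
        else:
            key = (ROOT_ID, rev_id)
            graph[key] = ()
            root_for[rev_id] = key
    return root_for, graph


if __name__ == "__main__":
    # A small history with one merge: r1 -> r2, r1 -> r3, r4 merges r2 and r3.
    history = [("r1", []), ("r2", ["r1"]), ("r3", ["r1"]), ("r4", ["r2", "r3"])]
    print(len(root_graph_every_revision(history)))  # prints 4
    root_for, graph = root_graph_fixed(history)
    print(len(graph), root_for["r4"])  # prints 1 ('TREE_ROOT', 'r1')
```

In this model, the first approach adds one root entry per revision, so it grows with history length. The second grows only with the number of parentless starting points. Applied to a history like Launchpad's, that difference is roughly the 54k entries discussed above.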
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAknEX8YACgkQJdeBCYSNAAN3CwCgtr91Y//djmuYq2BjYrjCJO/O
7ycAoIx+w6HnuFf+nXNONHdwQoyTlD37
=gNB6
-----END PGP SIGNATURE-----