Rethinking conversions to rich-root data

Sat Mar 21 03:43:49 GMT 2009

John Arbash Meinel wrote:
> I've been looking at our index code, etc., and I noticed that we have a
> surprisingly large number of records that are associated with no
> content. I then realized that this was at least partially because
> *every* revision for a conversion is now generating a new root node.
>
> As a specific example, python.org's repository has 256k entries in the
> per-file graph. Of those, 141k of them refer to no content. That is more
> than half (55%). Some of those are because of various directories,
> renames, etc. It is also a bzr-svn conversion, which may affect things.
> But launchpad has a similar ratio: 140k of the 240k entries in the .tix
> refer to no actual changes. Given that lp has 54k revs, a large
> portion of those are just the root being noted on every revision.
>
Is this Python conversion a bzr-svn 0.4.x or 0.5.x conversion? The 
latter should create (significantly?) fewer changed directories fwiw.
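
For reference, the proportions quoted above can be checked with a few
lines of Python. This is only a rough sketch: the counts are the rounded
figures from John's message (in thousands of entries), and the helper
name empty_fraction is made up for illustration, not anything in bzrlib.

```python
def empty_fraction(no_content, total):
    """Return the share of per-file graph entries that refer to no content."""
    if total <= 0:
        raise ValueError("total must be positive")
    return no_content / total


# Rounded counts quoted in the message (thousands of entries).
python_org = empty_fraction(141, 256)
launchpad = empty_fraction(140, 240)

print(f"python.org: {python_org:.0%} of entries refer to no content")
print(f"launchpad:  {launchpad:.0%} of entries refer to no content")
```

Both repositories come out at more than half, so the two conversions
look broadly similar on this measure.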

> Do we *really* need to fake all of these root changes? I know there were
> like 2 branches out there that actually had a root id for about 3
> revisions.  Couldn't we just force that all revision roots from non-rr
> trees are fixed, and avoid preserving this bloat for the rest of history?
Not sure I follow this. Are you suggesting that upgrading a non-rich-root
revision to a rich-root revision should *not* mark the root entry as
changed? This would break consistency with existing upgrades that were
done independently.
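
A toy illustration of the consistency concern, in Python. This is not
bzrlib code: the function convert, its flag, and the revision ids are
all hypothetical, and "fixed" is assumed here to mean pinning the root
to the first converted revision. It only shows that two converters
using different policies record different root data for the same
revision ids.

```python
def convert(revision_ids, root_changes_every_revision):
    """Map each revision id to the revision recorded for its root entry.

    Hypothetical model: either the root is marked as changed in every
    revision, or it is pinned to the first revision of the history.
    """
    root_revision = {}
    for rev_id in revision_ids:
        if root_changes_every_revision:
            root_revision[rev_id] = rev_id
        else:
            root_revision[rev_id] = revision_ids[0]
    return root_revision


history = ["rev-1", "rev-2", "rev-3"]

# One repository upgraded with the existing behaviour, another with the
# proposed "fixed root" behaviour.
existing = convert(history, root_changes_every_revision=True)
proposed = convert(history, root_changes_every_revision=False)

# Revisions whose recorded root data would differ between the two.
mismatched = [r for r in history if existing[r] != proposed[r]]
print(mismatched)
```

In this model every revision after the first disagrees between the two
upgraded repositories, even though the revision ids are identical.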

Cheers,

Jelmer