No subject

Thu Jun 12 23:09:36 BST 2008

A bit of effort will be needed to instrument the code or write
adequate scripts (measuring time, repository size, or any
more precise metric we can think of) against the existing
repositories so that we can measure improvements.
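
Something along these lines could be a starting point for such a
script. It is only a rough sketch: directory_size and measure are
hypothetical names (nothing from bzrlib), and it assumes the
repository lives in a plain directory on disk and that the operation
to measure can be wrapped in a Python callable.

```python
import os
import time


def directory_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so a link's target is not counted.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def measure(operation, repo_path):
    """Run operation() and report wall-clock time and size change.

    operation is any zero-argument callable (hypothetical: for
    instance a function that replays a batch of commits).
    """
    size_before = directory_size(repo_path)
    start = time.perf_counter()
    operation()
    elapsed = time.perf_counter() - start
    size_after = directory_size(repo_path)
    return {
        "seconds": elapsed,
        "bytes_before": size_before,
        "bytes_after": size_after,
        "bytes_delta": size_after - size_before,
    }
```

Running measure() before and after a change to the code, on the same
repository and the same operation, gives two sets of numbers to compare.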

Do we have some knowledge about how long it takes to replay all
commits starting from scratch on the bzr, openoffice and mysql
repositories? If this is too long, can we find a satisfactory
number of revisions to start with (increasing it as
performance improves)?

Also, regarding tree representation, it's worth noting that the
choice of file-ids or paths as keys may not be relevant. Hashing will
spread differences anyway, and different hashes will spread them
differently. This is really a key point about which I'm pretty
sure we can't draw conclusions without experiments (and I really
mean it, I had a lot of surprises in my previous experiments).
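
To make the kind of experiment I mean concrete, here is a small
sketch. Everything in it is an assumption for illustration: the
bucket functions, the fan-out of 16 and the sample paths are
hypothetical, not a proposal for the actual format. It simply counts
how many buckets a set of changed keys lands in under two different
hashes.

```python
import hashlib
import zlib


def bucket_sha1(key, fanout):
    """Map key to a bucket using the first 4 bytes of its SHA-1 digest."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % fanout


def bucket_crc32(key, fanout):
    """Map key to a bucket using its CRC-32 checksum."""
    return zlib.crc32(key.encode("utf-8")) % fanout


def touched_buckets(changed_keys, bucket_func, fanout=16):
    """Return the set of buckets that hold at least one changed key."""
    return {bucket_func(key, fanout) for key in changed_keys}


# Hypothetical commit touching three files in the same directory.
changed = ["bzrlib/repository.py", "bzrlib/inventory.py", "bzrlib/tree.py"]
for name, func in (("sha1", bucket_sha1), ("crc32", bucket_crc32)):
    buckets = touched_buckets(changed, func)
    print(name, sorted(buckets))
```

The same loop can be fed file-ids instead of paths, or other hash
functions, which is exactly the comparison that needs real data
rather than guesses.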

In other words, we must first reach the work/right point and then
experiment with various hashes (and other details :) to solve the
fast part.

That's why, IMHO, no, IMVHO, it may be more important to
highlight the explanations you gave above than to decide whether hash
tries or radix trees are more appropriate here.

       Vincent

P.S.: If it wasn't clear enough, I'd be more than happy to work
on that subject in any way :D