Telling if two trees are different
Stephen J. Turnbull
stephen at xemacs.org
Fri Oct 24 15:17:01 BST 2008
Robert Collins writes:
> I'm hazy here - where does the name hint used by 'git-pack-objects' get
> stored;
I don't know the internals of git packs that well, sorry. For my
purposes packs are effectively as transparent as an LBA hard drive.
> I had previously thought it was stored in the header for a file
> blob; but having read the code I can see there is no room for it.
> Assuming that it is calculated from the tree objects that seems like it
> would make repacking a heavier operation than it needs to be.
In git, repacking is a heavy operation. Why does that matter? It's
something one does on a monthly basis or so.
> So we'd be assured of equality, but we still can't do a completely
> mapped git-style auxiliary index without extracting file texts all
> along the way (or using the actual content sha1's for files).
I'm not sure what you're trying to do; AIUI the OP would be perfectly
happy with a comparison of the top tree SHA1s.
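
To make "comparison of the top tree SHA1s" concrete, here is a rough
sketch in Python. The blob id follows git's scheme (SHA-1 over a
"blob <size>\0" header plus the content), but the tree encoding is a
simplified stand-in of my own, not git's real tree format, and the names
blob_id, tree_id and trees_differ are made up for illustration.

```python
import hashlib

def blob_id(data: bytes) -> str:
    # git-style blob id: SHA-1 over "blob <size>\0" followed by the content.
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()

def tree_id(tree: dict) -> str:
    # Simplified tree id (an assumption, NOT git's exact tree encoding):
    # hash the sorted entries, each contributing its kind, name and child id.
    h = hashlib.sha1()
    for name in sorted(tree):
        entry = tree[name]
        if isinstance(entry, dict):
            kind, child = b"tree", tree_id(entry)
        else:
            kind, child = b"blob", blob_id(entry)
        h.update(kind + b" " + name.encode() + b"\x00" + child.encode())
    return h.hexdigest()

def trees_differ(a: dict, b: dict) -> bool:
    # Any change below a tree changes that tree's id, so comparing the
    # two top-level ids is enough to answer "are these trees different?".
    return tree_id(a) != tree_id(b)

a = {"README": b"hi\n", "src": {"main.c": b"int main(void){return 0;}\n"}}
b = {"README": b"hi\n", "src": {"main.c": b"int main(void){return 1;}\n"}}
print(trees_differ(a, a), trees_differ(a, b))
```

In a real repository the ids are already stored, so the comparison is
two lookups and a string compare rather than a rehash.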
> > In practice this is not going to matter. Humans like balanced trees,
> > too.
>
> Agreed that it probably won't matter for many users; I know of live
> trees using bzr today that will benefit in both corner cases (narrow and
> broad trees).
Sure, but do they have millions (or at least many thousands) of internal
nodes (subtrees)? If not, I can't see how it's going to make a
difference unless you have an algorithm that needs to do a lot of
full-tree compares. git-diff is awf'ly fast from a human's perspective.
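
For what it's worth, the reason a hash-based diff stays cheap can be
sketched as follows: any subtree whose id matches on both sides is
skipped without being opened, so the work tracks the number of changed
paths rather than the size of the tree. This is a toy model in Python,
not git-diff's actual code; the Node class, its hashing scheme and the
diff function are all assumptions for illustration.

```python
import hashlib

class Node:
    """A file (data is bytes) or a directory (children maps name -> Node)."""
    def __init__(self, data=None, children=None):
        self.children = children
        h = hashlib.sha1()
        if children is None:
            h.update(b"blob\x00" + data)
        else:
            for name in sorted(children):
                h.update(b"entry " + name.encode() + b"\x00"
                         + children[name].id.encode())
        # The id is computed once, when the node is built.
        self.id = h.hexdigest()

def diff(old, new, path=""):
    """Return the paths that differ between two trees."""
    if old is not None and new is not None and old.id == new.id:
        return []      # identical subtree: prune, never look inside
    if (old is None or new is None
            or old.children is None or new.children is None):
        return [path]  # added, removed, or changed at this path
    changed = []
    for name in sorted(set(old.children) | set(new.children)):
        changed += diff(old.children.get(name), new.children.get(name),
                        path + "/" + name)
    return changed

old = Node(children={"a": Node(children={"x": Node(b"1"), "y": Node(b"2")}),
                     "b": Node(b"3")})
new = Node(children={"a": Node(children={"x": Node(b"1"), "y": Node(b"changed")}),
                     "b": Node(b"3"),
                     "c": Node(b"4")})
print(diff(old, new))
```

With many thousands of internal nodes the pruning matters more, but the
walk still only descends into directories that actually changed.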