Telling if two trees are different

Stephen J. Turnbull stephen at xemacs.org
Fri Oct 24 15:17:01 BST 2008


Robert Collins writes:

 > I'm hazy here - where does the name hint used by 'git-pack-objects' get
 > stored;

I don't know the internals of git packs that well, sorry.  For my
purposes packs are effectively as transparent as an LBA hard drive.

 > I had previously thought it was stored in the header for a file
 > blob; but having read the code I can see there is no room for it.
 > Assuming that it is calculated from the tree objects, that seems like it
 > would make repacking a heavier operation than it needs to be.

In git, repacking is a heavy operation.  Why does that matter?  It's
something one does on a monthly basis or so.

 > So we'd be assured of equality, but we still can't do a completely
 > mapped git-style auxiliary index without extracting file texts all
 > along the way (or using the actual content sha1's for files).

I'm not sure what you're trying to do; AIUI the OP would be perfectly
happy with a comparison of the top tree SHA1s.
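
Comparing top tree SHA1s is sufficient because a git tree's hash covers the names, modes and hashes of every entry, recursively. So two trees have the same root hash exactly when their contents are identical, barring SHA1 collisions. The sketch below is a minimal Python illustration under stated assumptions, not git's code. It uses a hypothetical in-memory representation (nested dicts whose leaves are file contents as bytes) and models only regular files (mode 100644) and directories (mode 40000). It follows git's "<type> <size>\0" object header and its entry-sorting rule, and leaves out symlinks, executable bits and submodules. The names `git_object_sha1`, `tree_sha1` and `trees_differ` are illustrative only.

```python
import hashlib


def git_object_sha1(kind: str, body: bytes) -> bytes:
    """Hash an object the way git does: sha1(b"<kind> <len>\\0" + body)."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).digest()


def tree_sha1(tree: dict) -> bytes:
    """Git-style tree hash for a hypothetical nested-dict tree.

    Keys are entry names; values are bytes (file contents) or dicts
    (subdirectories). Only modes 100644 and 40000 are modelled.
    """
    entries = []
    for name, value in tree.items():
        raw_name = name.encode()
        if isinstance(value, dict):
            mode, sha = b"40000", tree_sha1(value)
            sort_key = raw_name + b"/"  # git sorts directories as "name/"
        else:
            mode, sha = b"100644", git_object_sha1("blob", value)
            sort_key = raw_name
        # Each entry is "<mode> <name>\0" followed by the raw 20-byte SHA1.
        entries.append((sort_key, mode + b" " + raw_name + b"\0" + sha))
    entries.sort(key=lambda e: e[0])
    return git_object_sha1("tree", b"".join(e for _, e in entries))


def trees_differ(a: dict, b: dict) -> bool:
    # In git the root tree hashes are already stored in the commit objects,
    # so this is a single 20-byte comparison; here they are computed from
    # scratch only for the sake of the illustration.
    return tree_sha1(a) != tree_sha1(b)
```

For example, `tree_sha1({}).hex()` gives git's well-known empty-tree hash `4b825dc642cb6eb9a060e54bf8d69288fbee4904`. `trees_differ({"a": b"x"}, {"a": b"y"})` is `True`.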

 > > In practice this is not going to matter.  Humans like balanced trees,
 > > too.
 > 
 > Agreed that it probably won't matter for many users; I know of live
 > trees using bzr today that will benefit in both corner cases (narrow and
 > broad trees).

Sure, but do they have millions (at least many thousands) of internal
nodes (subtrees)?  If not, I can't see how it's going to make a
difference unless you have an algorithm that needs to do a lot of
full-tree compares.  git-diff is awf'ly fast from a human's perspective.
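
To make the cost argument concrete, here is a rough sketch of a hash-pruned recursive compare, the general idea behind why a tree diff stays fast. Any subtree whose cached hash matches on both sides is skipped without descending, so the work grows with the number of changed paths (and their siblings), not with the total number of nodes. This is a simplified Merkle-tree scheme written for illustration. It is not git's object format or git-diff's actual implementation, and `Node`, `build` and `diff` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    sha: bytes                                   # cached hash of this subtree
    children: Optional[Dict[str, "Node"]] = None  # None means a file


def build(value) -> Node:
    """Build a hashed tree from nested dicts (dirs) and bytes (files)."""
    if isinstance(value, bytes):
        return Node(hashlib.sha1(b"file\0" + value).digest())
    children = {name: build(v) for name, v in value.items()}
    h = hashlib.sha1(b"dir\0")
    for name in sorted(children):
        h.update(name.encode() + b"\0" + children[name].sha)
    return Node(h.digest(), children)


def diff(a: Node, b: Node, path: str = "", stats: Optional[dict] = None) -> List[str]:
    """Return paths that differ, never descending into identical subtrees."""
    if stats is not None:
        stats["visited"] += 1
    if a.sha == b.sha:
        return []  # identical subtree: prune here
    if a.children is None or b.children is None:
        return [path or "/"]  # changed file, or file replaced by a directory
    changed: List[str] = []
    for name in sorted(a.children.keys() | b.children.keys()):
        child_path = f"{path}/{name}"
        if name not in a.children or name not in b.children:
            changed.append(child_path)  # added or removed entry
        else:
            changed.extend(diff(a.children[name], b.children[name], child_path, stats))
    return changed
```

With 100 directories of 10 files each (1101 nodes in all) and a single edited file, the compare visits only 111 nodes: the root, its 100 directories, and the 10 files in the one changed directory.

```python
old = {f"d{i}": {f"f{j}": b"x" for j in range(10)} for i in range(100)}
new = {k: dict(v) for k, v in old.items()}
new["d42"]["f7"] = b"y"
stats = {"visited": 0}
print(diff(build(old), build(new), stats=stats), stats["visited"])  # ['/d42/f7'] 111
```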
