Telling if two trees are different
Robert Collins
robertc at robertcollins.net
Fri Oct 24 13:18:21 BST 2008
On Fri, 2008-10-24 at 18:34 +0900, Stephen J. Turnbull wrote:
> Robert Collins writes:
>
> > A few points here - in git the content blobs are the encoded blobs, not
> > the content-only.
> >
> > That is there is a header + content for anything in the database.
>
> True. The header contains the object type and its size.
I'm hazy here: where does the name hint used by 'git-pack-objects' get
stored? I had previously thought it was stored in the header for a file
blob, but having read the code I can see there is no room for it.
Assuming that it is calculated from the tree objects, that seems like it
would make repacking a heavier operation than it needs to be. So we'd be
assured of equality, but we still can't build a completely mapped
git-style auxiliary index without extracting file texts all along the way
(or using the actual content sha1s for files). (This is one of the
tradeoffs I've been examining in my split-inventory work.)
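To illustrate the assumption above (that any path-based hint has to come from the tree objects rather than from the blob), here is a minimal sketch over a hypothetical in-memory model of trees. It is not git's code: `Trees` and `blobs_with_paths` are illustrative names only. The point it shows is that a blob's path is only recoverable by walking down to it from a root tree.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical in-memory model: tree id -> list of (name, kind, object id),
# where kind is "tree" for a subdirectory and "blob" for file content.
Trees = Dict[str, List[Tuple[str, str, str]]]


def blobs_with_paths(trees: Trees, root: str, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (blob_id, path) pairs by walking tree objects from root.

    The blob itself carries no name; its path is only known because we
    reached it through its parent trees.
    """
    for name, kind, obj_id in trees[root]:
        path = prefix + name
        if kind == "tree":
            yield from blobs_with_paths(trees, obj_id, path + "/")
        else:
            yield obj_id, path


trees: Trees = {
    "root": [("README", "blob", "b1"), ("src", "tree", "t1")],
    "t1": [("main.c", "blob", "b2")],
}
print(list(blobs_with_paths(trees, "root")))
# [('b1', 'README'), ('b2', 'src/main.c')]
```

Under this model a repack that wants per-path hints must pay for the full tree traversal, which is the extra weight described above.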
> > So for a file X, it can be in the database many times, with different
> > content pointers - the object pointer will say 'not equal' - even though
> > the file content is identical.
>
> I don't think so. Two objects have the same SHA1 "name" if and only
> if they have the same type and content. That's the fundamental
> invariant of git, and the source of its speed. The type is constant
> for files ("blob") and the size will be the same for identical
> content.
Working with the assertion that it's type+content only, then yes, files
with the same content will have the same chk; that's a trivial
conclusion. I was working with the assumption that it was a header
rather than just a type string + content (which I was clearly wrong
about).
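The invariant Stephen describes can be checked directly: git names a loose object by the SHA-1 of a short "<type> <size>\0" prefix followed by the raw content, so the name depends only on type and content. A minimal Python sketch (the function name `git_object_name` is hypothetical):

```python
import hashlib


def git_object_name(obj_type: str, content: bytes) -> str:
    """Compute a git object name: SHA-1 over "<type> <size>\\0" + content."""
    header = f"{obj_type} {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


# Identical content of the same type always yields the same name.
assert git_object_name("blob", b"same bytes\n") == git_object_name("blob", b"same bytes\n")
print(git_object_name("blob", b""))
# e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the well-known empty-blob id)
```

Because the size is derived from the content, the "header" adds no information beyond the type, which is why equal file texts always collapse to one blob.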
> > Secondly, the cost of translating a bzr inventory into any other
> > representation is _always_ much greater than that of doing
> > revtree1.iter_changes(revtree2).next()
>
> True, but I'm suggesting keeping an auxiliary index, not calculating
> it on the fly.
Ah right, I missed that.
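The two approaches discussed above (a precomputed auxiliary index versus asking the trees for their first change) can be combined, as in this sketch. Everything here is hypothetical: `FlatTree`, `root_index` and this `iter_changes` are stand-ins for illustration, not bzr's API. The index path assumes each stored hash fully determines the tree content; the fallback stops at the first difference instead of computing a full diff.

```python
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical flat tree model: file id -> content hash.
FlatTree = Dict[str, str]


def iter_changes(old: FlatTree, new: FlatTree) -> Iterator[Tuple[str, Optional[str], Optional[str]]]:
    """Lazily yield (file_id, old_hash, new_hash) for each differing entry."""
    for file_id in old.keys() | new.keys():
        if old.get(file_id) != new.get(file_id):
            yield file_id, old.get(file_id), new.get(file_id)


def trees_differ(rev1: str, rev2: str, trees: Dict[str, FlatTree],
                 root_index: Dict[str, str]) -> bool:
    # Fast path: an auxiliary index of per-revision root hashes answers
    # the question without touching the trees at all.
    if rev1 in root_index and rev2 in root_index:
        return root_index[rev1] != root_index[rev2]
    # Fallback: stop at the first change rather than computing the full diff.
    return next(iter_changes(trees[rev1], trees[rev2]), None) is not None


trees = {"r1": {"f1": "h1"}, "r2": {"f1": "h2"}, "r3": {"f1": "h1"}}
print(trees_differ("r1", "r2", trees, {}))  # True
```

The index trades storage and maintenance cost for an O(1) answer; the lazy fallback still avoids translating either tree into another representation.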
> > I haven't yet written the iter_changes optimiser for these trees, but it
> > will have similar work to do as git does to compare two trees (though I
> > hope it will have a better scaling factor due to balancing its own tree
> > rather than following directory boundaries.)
>
> In practice this is not going to matter. Humans like balanced trees,
> too.
Agreed that it probably won't matter for many users, but I know of live
trees using bzr today that will benefit in both corner cases (narrow and
broad trees).
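For reference, the kind of work a hash-tree comparison does can be sketched as a Merkle-tree walk that never descends into a subtree whose hash is unchanged. This is a generic illustration of the technique, not git's or bzr's implementation; `Node`, `leaf`, `directory` and `changed_paths` are hypothetical names. Here the tree shape follows directories, which is exactly what makes very narrow or very broad trees the expensive cases; a self-balancing layout would change the shape, not the pruning idea.

```python
import hashlib
from typing import Dict, Iterator, NamedTuple, Optional


class Node(NamedTuple):
    digest: str
    children: Optional[Dict[str, "Node"]]  # None for a file (leaf)


def leaf(content: bytes) -> Node:
    return Node(hashlib.sha1(b"blob\0" + content).hexdigest(), None)


def directory(children: Dict[str, Node]) -> Node:
    # A directory's hash covers its entries' names and hashes (Merkle tree).
    h = hashlib.sha1(b"tree\0")
    for name in sorted(children):
        h.update(name.encode() + b"\0" + children[name].digest.encode())
    return Node(h.hexdigest(), children)


def changed_paths(old: Optional[Node], new: Optional[Node], path: str = "") -> Iterator[str]:
    """Yield paths of files added, removed or modified (empty dirs ignored)."""
    if old is not None and new is not None and old.digest == new.digest:
        return  # identical subtree: prune without descending
    old_kids = old.children if old is not None and old.children is not None else {}
    new_kids = new.children if new is not None and new.children is not None else {}
    # A file on either side at this path means this path itself changed.
    if (old is not None and old.children is None) or (new is not None and new.children is None):
        yield path
    for name in sorted(old_kids.keys() | new_kids.keys()):
        yield from changed_paths(old_kids.get(name), new_kids.get(name), f"{path}/{name}")


lib = directory({"x": leaf(b"x"), "y": leaf(b"y")})
old = directory({"a": leaf(b"1"), "lib": lib})
new = directory({"a": leaf(b"2"), "lib": lib})
print(list(changed_paths(old, new)))  # ['/a']; "lib" is skipped by hash
```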
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.