Making diff fast (was Re: Some notes on distributed SCM)

Aaron Bentley aaron.bentley at utoronto.ca
Sun Apr 10 23:07:16 BST 2005


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> Any time somebody clones a repository, all the fileid's up to the point of the 
> clone will match exactly.  Past the clone point, things get interesting.  Two 
> identical repositories (i.e., just after a clone) might each pull the same 
> changeset from a third repository.  Everything still matches exactly, but we 
> need fancier bookkeeping to know that.  A slightly improved fileid does the 
> trick:
> 
>    fileid = (repository-number:file-number)

Personally, I prefer the Arch approach, which is essentially to assign a
uuid to each file.  (Of course, Tom had to go reinvent uuids first.)

> where the repository part is also just a counter, which counts all the foreign 
> repositories we have ever pulled from.  These repository numbers are strictly 
> internal.  We map an internal repository number to/from somebody's "public" 
> repository uuid with a table.  This way, we can always establish an exact 
> taxonomy of all objects that anybody ever imported from each other.

With uuids, you get this correspondence automatically.  If the file came
from the same ultimate source, it's treated the same in every tree that
contains it.
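
To make the uuid point concrete, here is a rough sketch (not Arch's actual
code).  It assumes each tree can be viewed as a mapping from path to file
id; the names new_file_id and corresponding_paths are made up for
illustration.

```python
import uuid


def new_file_id():
    """Mint an id for a newly added file (hypothetical helper)."""
    return uuid.uuid4().hex


def corresponding_paths(tree_a, tree_b):
    """Pair up files in two trees that share the same file id.

    Each tree is assumed to be a dict mapping path -> file id.
    Returns a dict mapping path in tree_a -> path in tree_b.
    """
    # Invert tree_b so we can look up a path by its file id.
    path_by_id_b = {file_id: path for path, file_id in tree_b.items()}
    return {
        path_a: path_by_id_b[file_id]
        for path_a, file_id in tree_a.items()
        if file_id in path_by_id_b
    }
```

The ids travel with the file, so a rename in one tree does not break the
match, and no per-repository numbering table is needed.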

> When two sibling repositories each import the same third-party tarball, things 
> get more interesting.  In this case we have to guess a little, but almost all 
> the time, we ought to still be able to come up with an exact correspondence 
> between objects in the two sibling repositories.  

In this case, the Arch model requires tables similar to the ones you
described earlier, to map one uuid to another.
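
Such a table might look roughly like the sketch below.  This is only an
illustration under my own assumptions (Arch does not ship this); the class
name IdAliases and its methods are hypothetical.

```python
class IdAliases:
    """Records that two independently minted file ids name the same file."""

    def __init__(self):
        # Maps a non-canonical id to the id it was merged into.
        self._parent = {}

    def canonical(self, file_id):
        """Follow alias links until reaching an id with no alias."""
        while file_id in self._parent:
            file_id = self._parent[file_id]
        return file_id

    def declare_same(self, id_a, id_b):
        """Record that id_a and id_b refer to the same logical file."""
        root_a, root_b = self.canonical(id_a), self.canonical(id_b)
        if root_a != root_b:
            # Pick the lexically smaller id as canonical so every
            # repository that learns the same alias picks the same winner.
            self._parent[max(root_a, root_b)] = min(root_a, root_b)
```

Comparing canonical(id) instead of the raw id then lets two separate
imports of the same tarball be treated as one file history.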

> We can leave this as
> "further work".

As has been done so far in Arch :-)  This problem is only likely to
occur when multiple people import the same well-known project.
Canonical is fighting this by providing imports of many well-known
projects in the baz format (which is Arch-derived).

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCWaOT0F+nu1YWqI0RAnMyAJ95KZeXBvJwYc/ydJRsiouYZ3bqQACeKgqN
H5BUIlY4R8oC2iZRKjizNqg=
=jT/5
-----END PGP SIGNATURE-----



