optimising diff (text_id vs sha1, and inventory_dir_id)
Martin Pool
mbp at sourcefrog.net
Thu Apr 14 23:55:47 BST 2005
On Thu, 2005-04-14 at 21:56 +1000, Robert Collins wrote:
> Using inode sigs to tune diff gives you a mapping : inode signature ->
> prior data about that inode.
More accurately, it tells you over which part of its domain a function
from filename -> prior data is still valid.
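A minimal sketch of that idea, assuming a hypothetical `HashCache` class keyed by filename. The stat fields used as the "inode signature" here (size, mtime, ctime, inode, device, mode) are an assumed choice. A cached SHA-1 is reused only while the file's current signature still matches the one stored with it; otherwise the file is re-read and re-hashed.

```python
import hashlib
import os
from typing import Dict, Tuple

# Assumed "inode signature": stat fields that normally change when contents do.
Fingerprint = Tuple[int, int, int, int, int, int]


def fingerprint(path: str) -> Fingerprint:
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns, st.st_ctime_ns,
            st.st_ino, st.st_dev, st.st_mode)


def sha1_of_file(path: str) -> str:
    # Hash the file in chunks so large files need not fit in memory.
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


class HashCache:
    """Hypothetical cache: filename -> (fingerprint, sha1).

    An entry is valid only while the file's fingerprint is unchanged.
    """

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[Fingerprint, str]] = {}
        self.hits = 0
        self.misses = 0

    def get_sha1(self, path: str) -> str:
        fp = fingerprint(path)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == fp:
            # Signature matches: trust the prior data without reading the file.
            self.hits += 1
            return cached[1]
        # Signature differs or no entry: read the file and refresh the entry.
        self.misses += 1
        digest = sha1_of_file(path)
        self._entries[path] = (fp, digest)
        return digest
```

Note that the cache only ever says "the SHA-1 is still X"; it never produces anything you could not recompute from the working file.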
> Currently we have both text_id and text_sha1 in the prior data about the
> inode.
We have it in the inventory; we don't have it in the working directory.
> When we diff against a different revision, if the sha1 is the
> same, then we /probably/ have an unaltered file. If the text_id is the
> same and we trust our working copy, then it's definitely an unaltered
> file.
The problem is that you can determine the SHA-1 directly from a
working file, but you cannot work out what text id it has.
You could compare it to the text present in the previous revision and,
if they're the same, reuse that text id, but that will cause at least
twice as much IO for no apparent benefit.
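To make the IO argument concrete, here is a small sketch that counts bytes read, using in-memory `bytes` as stand-ins for the working file and the stored prior text. Both function names are hypothetical. Hashing needs only the working text; inferring a text id by comparison must also read the prior text, so an unchanged file costs twice the reads.

```python
import hashlib
from typing import Optional, Tuple


def sha1_with_io(working: bytes) -> Tuple[str, int]:
    """Return (sha1, bytes read): only the working text is read."""
    return hashlib.sha1(working).hexdigest(), len(working)


def infer_text_id_with_io(working: bytes, prior: bytes, prior_text_id: str,
                          chunk: int = 4096) -> Tuple[Optional[str], int]:
    """Reuse prior_text_id if the texts are identical; return (id or None, bytes read).

    Both texts are read chunk by chunk, stopping at the first difference.
    """
    read = 0
    for i in range(0, max(len(working), len(prior)), chunk):
        a = working[i:i + chunk]
        b = prior[i:i + chunk]
        read += len(a) + len(b)
        if a != b:
            return None, read
    return prior_text_id, read
```

For an unaltered 10,000-byte file the first function reads 10,000 bytes and the second 20,000; for a changed file the comparison still costs reads and yields no id, so the new text has to be hashed or stored anyway.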
> I think that we should use the text_id for diff optimisation and not the
> sha1 - making the hash a useful optimisation and not the core data
> assists with the ability to upgrade it later when we need to.
That is much slower, harder to write, and I don't think it gains
anything. If you mistrust SHA-1, you can just turn the optimization off.
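A sketch of what "turning the optimization off" could look like, assuming a hypothetical `diff_file` helper with a `use_sha1_shortcut` flag and UTF-8 text files. With the flag on, a SHA-1 match with the prior entry skips the line diff; with it off, the texts are always compared. For identical texts both paths give the same (empty) result, so the flag trades only speed against trust in SHA-1.

```python
import difflib
import hashlib
from typing import List


def diff_file(old_text: bytes, new_text: bytes, old_sha1: str,
              use_sha1_shortcut: bool = True) -> List[str]:
    """Return unified-diff lines between old and new text; empty if unchanged."""
    if use_sha1_shortcut and hashlib.sha1(new_text).hexdigest() == old_sha1:
        # Same SHA-1 as the prior version: treat as unaltered, skip the diff.
        return []
    # Shortcut disabled or hashes differ: do the full line-by-line comparison.
    return list(difflib.unified_diff(
        old_text.decode("utf-8").splitlines(keepends=True),
        new_text.decode("utf-8").splitlines(keepends=True),
        fromfile="old", tofile="new"))
```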
In any case, inode signatures have a practical chance of failure (with
low-resolution clocks or clock steps) that is far more worrying than the
chance of a SHA-1 collision.
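One common way to limit that failure mode, shown here as an assumed mitigation rather than anything proposed in this thread, is to refuse to cache a signature for a file changed too recently: a later write within the same clock tick could leave the signature unchanged. The hypothetical `cutoff_ns` value below is arbitrary. A timestamp newer than "now" (the clock stepped backwards) is also treated as unsafe.

```python
import os
import time


def is_safe_to_cache(mtime_ns: int, ctime_ns: int, now_ns: int,
                     cutoff_ns: int = 3_000_000_000) -> bool:
    """Trust a signature only if the file last changed at least cutoff_ns before now.

    A negative age (timestamp in the future, e.g. after a clock step)
    is always below the cutoff, so it is rejected too.
    """
    newest = max(mtime_ns, ctime_ns)
    return now_ns - newest >= cutoff_ns


def should_cache(path: str, cutoff_ns: int = 3_000_000_000) -> bool:
    # Hypothetical wrapper applying the check to a real file.
    st = os.stat(path)
    return is_safe_to_cache(st.st_mtime_ns, st.st_ctime_ns,
                            time.time_ns(), cutoff_ns)
```

Files rejected this way are simply re-hashed on the next run, so the cost is some extra IO rather than a wrong answer.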
> Likewise, when we create aggregate sha1s for directories, we should
> assign an inventory_dir_id with the same algorithm as the sha1, which
> only changes when the dir or some of its children do.
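A minimal sketch of such an aggregate directory hash, built like a Merkle tree: each directory's SHA-1 covers its sorted children's names, kinds, and hashes. The `File`/`Dir` types and the serialisation format are assumptions for illustration. It shows the property quoted above: a directory's hash changes only when the directory or something beneath it changes, so an untouched sibling subtree keeps its hash.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class File:
    text: bytes


@dataclass
class Dir:
    children: Dict[str, "Node"] = field(default_factory=dict)


Node = Union[File, Dir]


def node_sha1(node: Node) -> str:
    """SHA-1 of a file's text, or an aggregate SHA-1 over a directory's children."""
    if isinstance(node, File):
        return hashlib.sha1(node.text).hexdigest()
    h = hashlib.sha1()
    # Sorted names make the hash independent of insertion order.
    for name in sorted(node.children):
        child = node.children[name]
        kind = "file" if isinstance(child, File) else "directory"
        # Assumed serialisation: kind NUL name NUL child-hash newline.
        h.update(f"{kind}\0{name}\0{node_sha1(child)}\n".encode("utf-8"))
    return h.hexdigest()
```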
--
Martin