Thoughts on file ids
Eric Siegerman
lists08-bzr at davor.org
Thu May 5 15:53:06 UTC 2011
On Thu, 2011-05-05 at 17:07 +0200, Jelmer Vernooij wrote:
> The main problem I see with something like this is that it doesn't help
> with the problem of texts with the same content having a different file
> id/revision and being stored multiple times in the repository.
One way to solve that (if my understanding of the data model
isn't totally off the mark :-/) would be to index texts, not by
(file-id, rev-id), but by a new text-id. To maximize
deduplication, the text-id of a deltatext should be the hash of
the fully reconstituted text, not that of the delta itself.
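To make the idea concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions: the names text_id and TextStore are hypothetical, SHA-1 is an arbitrary choice of hash, and none of this is meant to reflect bzr's actual repository code. Texts are keyed by the hash of their full content, and (file-id, rev-id) becomes a separate index pointing at that key.

```python
import hashlib


def text_id(fulltext: bytes) -> str:
    """Hypothetical text-id: hash of the fully reconstituted text,
    never of a delta against some other text."""
    return hashlib.sha1(fulltext).hexdigest()


class TextStore:
    """Toy store: texts keyed by text-id, with (file-id, rev-id)
    kept as a separate index onto those keys."""

    def __init__(self):
        self.texts = {}  # text-id -> fulltext (a real repo would store deltas)
        self.index = {}  # (file-id, rev-id) -> text-id

    def add(self, file_id, rev_id, fulltext):
        tid = text_id(fulltext)
        # Store the content only if this text-id has not been seen before.
        self.texts.setdefault(tid, fulltext)
        self.index[(file_id, rev_id)] = tid
        return tid

    def get(self, file_id, rev_id):
        return self.texts[self.index[(file_id, rev_id)]]


store = TextStore()
a = store.add("file-a", "rev-1", b"hello\n")
b = store.add("file-b", "rev-7", b"hello\n")
```

Here the two adds have different file ids and revisions but identical content, so they end up sharing a single stored text. Hashing the delta instead would not give that, since the same fulltext can be expressed as different deltas depending on its basis.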
Of course such a change entails a new repo format, but a decision
to do it -- not now, but some day (e.g. when the format needs to
be changed anyway for other reasons) -- factors deduplication out
into a separate, orthogonal problem that can stop clouding the
current discussion.
- Eric