Rule-based preferences - format marker RFC, etc.

Wed Jun 25 05:32:09 BST 2008

John Arbash Meinel wrote:
> Ian Clatworthy wrote:
>> John Arbash Meinel wrote:
>>> John Arbash Meinel has voted tweak.

> I'm not sure what importers would be writing specially to this file, but if
> they were, then they would need to be converting the rules specially, and thus
> would need to special case it anyway. (Such as converting .cvsignore =>
> .bzrignore, or .gitattributes => .bzrrules.) Any conversion that touches on
> these specially is going to need to know how to handle it anyway.

I partially agree. Some people do round-tripping, or messier things such as
using one tree with multiple tools, so my point was more that every
converter would need to assign 'bzr:rules' as the file-id for the '.bzrrules'
pathname. I guess the path->file-id mapping is controlled by the repo, so
we'd need to ensure that bzr-svn (say) special-cased that mapping, wouldn't we?
And might that cause problems in cases where the file was also stored
in the svn repo?

> The reason I proposed it, is because of how our files are indexed. If I want
> the rules in a given revision, the lookup key is (file_id, revision_id). Not
> (path, revision_id). And to look up something via path you have to do
> (inventory(revision_id).path2id(path), revision_id)
> And having to open up the whole inventory, just to get 1 path out of it to
> look up the file_id has traditionally been one of our performance thorns. Also
> bad is that we have a flat-file for our whole inventory, but the keys are
> stored such that you have to recursively walk them. You can't just bisect to
> the full-path and be done with it, you have to walk all the parent_ids.
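
To make the quoted point concrete, here is a toy model of the two lookups. It
is only a sketch: Entry, the dict-based inventory and the texts store are
made up for illustration and are not bzrlib's real classes or storage format.
The only things taken from the discussion are the (file_id, revision_id) key
and the fact that resolving a path means walking parent ids.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    """Hypothetical inventory entry; not bzrlib's InventoryEntry."""
    file_id: str
    parent_id: Optional[str]  # None for the tree root
    name: str


def path2id(entries: Dict[str, Entry], path: str) -> Optional[str]:
    """Resolve a path by walking down from the root, one component at a time."""
    root = next(e for e in entries.values() if e.parent_id is None)
    current = root.file_id
    for part in [p for p in path.split("/") if p]:
        # Entries are keyed by file_id, so finding a child means scanning.
        for e in entries.values():
            if e.parent_id == current and e.name == part:
                current = e.file_id
                break
        else:
            return None  # no child with that name under the current directory
    return current


# Toy data: one revision, one rules file in the tree root.
inventory = {
    "root-id": Entry("root-id", None, ""),
    "rules-id": Entry("rules-id", "root-id", ".bzrrules"),
}
texts: Dict[Tuple[str, str], bytes] = {
    ("rules-id", "rev-1"): b"rules content\n",
}

# By path: the whole inventory has to be loaded and walked first.
by_path = texts[(path2id(inventory, ".bzrrules"), "rev-1")]

# By a known file-id: the text key can be built without the inventory.
by_id = texts[("rules-id", "rev-1")]
```

With a fixed, well-known file-id the second form is all that is needed, which
is the saving being argued for.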

Thanks. I can now see where you're coming from and I agree it's an
interesting idea worth further discussion.

So, one option would be to generalise the mapping so that .bzrXXX in the
root of a tree always got mapped to the file-id bzr:XXX. Another option
is to enhance our API so that the file-ids used for magic files could be
found more directly. Later (or now if performance requires), we could
extend the relevant internal formats to store cached values for these.
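
As a rough sketch of the first option only, the generalised mapping could
look something like the following. The function name and constants are
hypothetical and nothing here is existing bzrlib API. I've assumed that a
bare ".bzr" (the control directory) is excluded and that only paths in the
tree root qualify.

```python
from typing import Optional

MAGIC_NAME_PREFIX = ".bzr"  # root-level files named .bzrXXX
MAGIC_ID_PREFIX = "bzr:"    # ... map to the file-id bzr:XXX


def magic_file_id(path: str) -> Optional[str]:
    """Return the well-known file-id for a root-level .bzrXXX path, else None.

    Hypothetical helper illustrating the proposed mapping; not bzrlib API.
    """
    if "/" in path:
        return None  # only files in the root of the tree qualify
    if not path.startswith(MAGIC_NAME_PREFIX):
        return None
    suffix = path[len(MAGIC_NAME_PREFIX):]
    if not suffix:
        return None  # plain ".bzr" is the control directory, not a magic file
    return MAGIC_ID_PREFIX + suffix
```

Under that assumption '.bzrrules' maps to 'bzr:rules' and '.bzrignore' to
'bzr:ignore', while 'sub/.bzrrules' and ordinary files get no special id. A
converter such as bzr-svn would still have to call something like this when
assigning file-ids, which is the special-casing concern raised above.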

Any thoughts on the pros and cons of those two alternatives?
Are there others we should consider now as well before this lands?

Ian C.