Rule-based preferences - format marker RFC, etc.

Tue Jun 24 15:44:41 BST 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Clatworthy wrote:
> John Arbash Meinel wrote:
>> John Arbash Meinel has voted tweak.
> 
>> I don't really prefer putting the file in as a versioned file in the
>> user's workspace, but I understand the tradeoffs and I'm willing to concede
>> on this one. I would just make a strong request for a required format
>> marker at the top of the file. Having one for .bzrignore would have
>> meant we could have retained proper compatibility across versions.
> 

...

> In terms of regular expressions, I'm thinking ^#\s*format\s*(\d+)\s*$.
> I considered putting the word "rules" in there as well but I think that
> increases the chance of someone typing it incorrectly for no noticeable
> benefit. (It's inside a file called rules or .bzrrules so I don't think
> repeating that information buys much.)
> 
> Another option is to use a bzr version # instead of a simple integer.
> My leaning is against this but I could easily change my mind if someone
> made a strong case for it.

I generally prefer a simple version number. I'm also a big fan of
self-describing data. So *I* would like "# bzr rules format 1", but I understand
your point about trivial typos messing things up.
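
A minimal sketch of the marker check under discussion, using the regular
expression quoted above as-is. The function name read_rules_format and the
choice to look only at the first line are assumptions for illustration, not
anything from bzrlib.

```python
import re

# Pattern quoted from the message above: "#", the word "format", an integer.
FORMAT_MARKER = re.compile(r"^#\s*format\s*(\d+)\s*$")


def read_rules_format(text):
    """Return the format number found on the first line, or None.

    Hypothetical helper: only the first line is inspected, on the
    assumption that the marker is required at the top of the file.
    """
    first_line = text.split("\n", 1)[0]
    match = FORMAT_MARKER.match(first_line)
    if match is None:
        return None
    return int(match.group(1))
```

Note that with this pattern the self-describing variant "# bzr rules format 1"
is rejected, as is a typo such as "# fromat 1"; a caller would have to decide
whether a missing marker is an error or means "oldest format".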

> 
> I guess it would be nice if the pattern chosen was adopted (in the future)
> for .bzrignore as well?
> 

Well, we upgraded the ignore rules in a slightly incompatible way, and nobody
complained. It actually took us quite a bit of discussion before we just said
"do it".

...

>> I also wonder if we want '.bzrrules' to have an explicit file id. Like
>> ('bzr:rules', etc). Something to think about.
> 
> I'm against this idea but I'd like to hear what others think.
> 
> At a minimum, it probably complicates every importer and the add code.
> I'm not sure how it would impact performance. I can't think of a reason
> why it would make things quicker, while increased complexity of code
> paths to look for it could well slow things down. I'm also against it
> because we might decide at some stage in the future to support a
> .bzrrules file per directory (ala .gitattributes) instead of per tree,
> so a special file-id doesn't scale. I suppose we could always go with
> a family of special ids (rules:*) if that were the only issue though.
> 
> Any particular reason you suggested this? I'm probably missing some
> obvious advantages?
> 
> Ian C.
> 
> 

I'm not sure what importers would be writing specifically to this file, but if
they were, they would need to convert the rules themselves, and thus
would need to special-case it anyway. (Such as converting .cvsignore =>
.bzrignore, or .gitattributes => .bzrrules.) Any conversion that touches
these files is going to need to know how to handle them regardless.

The reason I proposed it is how our files are indexed. If I want
the rules in a given revision, the lookup key is (file_id, revision_id), not
(path, revision_id). To look something up via path you have to do
(inventory(revision_id).path2id(path), revision_id)
Having to open up the whole inventory, just to get one path out of it to
look up the file_id, has traditionally been one of our performance thorns. Also
bad is that we have a flat file for our whole inventory, but the keys are
stored such that you have to walk them recursively. You can't just bisect to
the full path and be done with it; you have to walk all the parent_ids.
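
To make the difference concrete, here is a toy model of the two lookups. It is
only a sketch: ToyInventory, the texts dict, both helper functions and the
'bzr:rules' id are hypothetical stand-ins, not bzrlib's real classes or
storage, and it assumes text keys are exactly (file_id, revision_id) as
described above.

```python
class ToyInventory:
    """Toy inventory: entries map file_id -> (parent_id, name)."""

    def __init__(self, entries):
        # Index children by parent so a path can be resolved one
        # component at a time. The root entry has parent_id None.
        self._children = {}
        for file_id, (parent_id, name) in entries.items():
            self._children.setdefault(parent_id, {})[name] = file_id

    def path2id(self, path):
        """Resolve a path to a file_id by walking down from the root."""
        current = self._children.get(None, {}).get("")  # root file_id
        for name in path.split("/"):
            if not name:
                continue
            current = self._children.get(current, {}).get(name)
            if current is None:
                return None
        return current


def rules_text_by_path(inventories, texts, path, revision_id):
    # The whole inventory for the revision must be loaded first,
    # only to turn one path into a file_id.
    file_id = inventories[revision_id].path2id(path)
    if file_id is None:
        return None
    return texts.get((file_id, revision_id))


def rules_text_by_fixed_id(texts, revision_id, file_id="bzr:rules"):
    # With a well-known file_id the key is known up front,
    # so no inventory is needed at all.
    return texts.get((file_id, revision_id))
```

With a fixed id the second function is a single keyed lookup, whereas the
first pays for building the inventory and walking parent ids before it can
even form the key.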

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIYQhZJdeBCYSNAAMRAoHKAKCe9PQQDSmtWIQuYhs3Xk9YgpmL0wCcCUVf
fitfTo6Ie9S83XBpF7MtSAU=
=/+Gb
-----END PGP SIGNATURE-----