Illegal Filesystem characters in revision names
John Yates
jyates at netezza.com
Mon Dec 5 16:42:16 GMT 2005
This seems like a prime example of a project policy.
If I am initiating a project and expect developers
to contribute while working on Windows boxes I should
be able to record a global project intent that all
filenames need to satisfy Windows constraints. This
might be a hard coded bit of logic or simply a set of
regular expressions. Similarly, if I want to ensure
that my *nix developers do not create problems for my
Windows developers I want to record my desire that no
two names within a directory differ only in alphabetic
case. I can obviously capture such policies in a web
page or some other project document. But ideally I
would like my VCS to understand and enforce these
intentions.
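As an illustration of the case-collision policy described above, here is a minimal sketch of a predicate a VCS could run over the names in one directory. The function name `case_collisions` is hypothetical and not part of any existing tool.

```python
from collections import defaultdict


def case_collisions(names):
    """Return groups of names that differ only in alphabetic case.

    `names` is the list of entries in a single directory.
    Hypothetical helper, not an existing VCS API.
    """
    groups = defaultdict(list)
    for name in names:
        # casefold() gives a case-insensitive comparison key.
        groups[name.casefold()].append(name)
    # Only groups with more than one member are policy violations.
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

A commit-time check could reject the change whenever this returns a non-empty list for any directory in the tree.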
Policies could be implemented as hooks. Such optional
hook functions would be versioned objects stored at a
well-known position in any tree.
Simply within the realm of pathnames I can envision
a number of additional policy options:
. only lowercase letters
. enforce 8.3 restriction
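The two pathname policies listed above could be sketched as simple predicates. This assumes "only lowercase letters" means "no uppercase letters", and it checks only the length shape of an 8.3 name (up to 8 characters, optionally a dot and up to 3 more), not the full DOS character rules. Both function names are hypothetical.

```python
import re

# Shape of an 8.3 name: 1-8 characters, optionally a dot and 1-3 characters.
# Simplification: real DOS rules restrict the character set further.
_EIGHT_DOT_THREE = re.compile(r"[^.\s]{1,8}(\.[^.\s]{1,3})?")


def is_lowercase_only(name):
    """True if the name contains no uppercase letters."""
    return name == name.lower()


def is_8_3(name):
    """True if the name fits the 8.3 length pattern."""
    return _EIGHT_DOT_THREE.fullmatch(name) is not None
```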
Beyond pathnames I would like to enforce canonical
representation of whitespace, irrespective of any
havoc that a developer's editor may have wrought.
Such a hook would be a filter rather than a predicate.
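A filter hook of this kind might look like the sketch below. The message does not define what the canonical form is, so the choices here (LF line endings, tabs expanded, no trailing whitespace, exactly one final newline) are assumptions, as is the name `canonicalize_whitespace`.

```python
def canonicalize_whitespace(text, tab_width=8):
    """Return `text` with whitespace normalized.

    Assumed canonical form (not specified in the original message):
    LF line endings, tabs expanded to spaces, no trailing whitespace
    on any line, and exactly one newline at the end of the file.
    """
    # Normalize CRLF and bare CR line endings to LF.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    cleaned = [line.expandtabs(tab_width).rstrip() for line in lines]
    # Drop blank lines at the end of the file.
    while cleaned and cleaned[-1] == "":
        cleaned.pop()
    if not cleaned:
        return ""
    return "\n".join(cleaned) + "\n"
```

Unlike the pathname predicates, this returns transformed content rather than a yes/no answer, and applying it twice gives the same result as applying it once.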
This whitespace use case makes me think that add/commit
actions apply project hooks while operations that
materialize new files within a developer's workspace
should call hooks private to that developer, potentially
independent of any tree.
/john
-----Original Message-----
From: bazaar-ng-bounces at lists.canonical.com
[mailto:bazaar-ng-bounces at lists.canonical.com]On Behalf Of John A Meinel
Sent: Saturday, December 03, 2005 1:40 PM
To: Bazaar-NG
Subject: Illegal Filesystem characters in revision names
I'm trying to figure out the best way to handle illegal characters in
revision names.
On Windows, the forbidden characters in a filename are:
\ / : * ? " < > |
There are also some forbidden strings, specifically you can never name a
file:
prn
com1, com2, com3, etc
lpt1, lpt2, etc
I'm guessing there are others, but I haven't been able to find the
Microsoft Knowledge Base article.
The specific thing I'm worried about is that we are trying to use
"Arch:" as a prefix for arch imported archives, and this prevents them
from being checked out on Windows.
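A check for the Windows constraints listed above could be sketched as follows. The reserved names CON, AUX and NUL are additions beyond those named in the message, the list is not guaranteed to be exhaustive, and `windows_name_problems` is a hypothetical name. The revision name "Arch:foo" in the usage note is made up for illustration.

```python
# The nine forbidden characters listed in the message.
WINDOWS_FORBIDDEN_CHARS = set('\\/:*?"<>|')

# Reserved device names. CON, AUX and NUL are not in the message;
# this set may still be incomplete.
WINDOWS_RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {"COM%d" % i for i in range(1, 10)}
    | {"LPT%d" % i for i in range(1, 10)}
)


def windows_name_problems(name):
    """Return a list of reasons `name` is not a legal Windows filename."""
    problems = []
    bad = sorted(set(name) & WINDOWS_FORBIDDEN_CHARS)
    if bad:
        problems.append("forbidden characters: " + " ".join(bad))
    # Device names are reserved regardless of case or extension.
    stem = name.split(".", 1)[0].upper()
    if stem in WINDOWS_RESERVED_NAMES:
        problems.append("reserved device name: " + stem)
    return problems
```

For example, `windows_name_problems("Arch:foo")` reports the colon, which is exactly the problem with the "Arch:" prefix.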
I can think of a few solutions:
1) Forbid illegal characters in revision names. People could always
switch to using Arch-some at archive%foo--baz--1.0--patch-2. But using a
colon to define a namespace is really nice.
2) URL encode illegal characters. This means that they would always have
a legal filesystem name. The question becomes, though, when do you turn
this on. Do you use it on all platforms, or just on Windows? When you
request it, it actually has to be doubly encoded, since http would
decode it. If we don't use it everywhere, we still have to be able to
handle it, since some people will publish using an IIS server, and
others would use Apache.
A branch format bump could make this cleaner, and everyone would
simply start using the new format.
3) Just switch to using a revision.weave file. Since revisions are no
longer saved directly by their name, we can break ourselves from being
limited to filesystem constraints. This would also require a branch
format bump. I'm hesitant to do this until we have knits. We already
have 1 file (inventory.weave) which scales by the number of commits, and I
wouldn't really want to introduce a second one. (Though it would scale
better, since the number of lines per revision is much smaller).
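Option 2 above, including the double encoding needed when the name is requested over HTTP, can be sketched like this. The function names and the revision name "Arch:foo" are hypothetical; this is not bzr's actual storage code.

```python
from urllib.parse import quote, unquote


def revision_id_to_filename(revision_id):
    """Percent-encode everything except letters, digits and '_.-~'.

    The result contains none of the characters Windows forbids.
    """
    return quote(revision_id, safe="")


def revision_id_to_url_segment(revision_id):
    """Encode twice, so that one round of decoding by the HTTP server
    yields the on-disk (singly encoded) filename."""
    return quote(revision_id_to_filename(revision_id), safe="")


filename = revision_id_to_filename("Arch:foo")
segment = revision_id_to_url_segment("Arch:foo")
```

Here `filename` is `Arch%3Afoo` and `segment` is `Arch%253Afoo`; decoding the segment once gives back the filename.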
Does anyone have more ideas? I'm leaning towards the third option, since
we kind of want to do it anyway. And I guess we can just say "arch
conversions aren't supported on Windows until the next branch format".
Though it might be nice to get it sooner than that, since I've started
pointing my coworkers to bzr.
John
=:->