[RFC] sha1 of fileid naming for knit files

Tue Oct 31 23:27:38 GMT 2006

John Arbash Meinel wrote:
> Robert Collins wrote:
>> To help out a user trying bzr-svn [which creates loong file ids because
>> it needs a mapping back into svn], I have whipped up this plugin -
>> http://bazaar.launchpad.net/~lifeless/+junk/sha1repo
>>
>> It sha1's the fileid to get the 40-byte hex digest and uses that for the
>> filename.
>>
>> I'm wondering what people think of this as the escaping format to fix
>> our problems with respect to apache etc.
>>
>> -Rob
> 
> I think the idea is interesting, and it certainly gives us a way to
> restrict ourselves to "safe" characters.
> 
> However, it is kind of nice to have the back reference. So if it says
> "failed to do X on file foobar-2uaothnthauo.knit" you have an idea what
> file is involved. It isn't a perfect mapping, but there is some
> meaningful information there.
> 
> John
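
A minimal sketch of the mapping Rob describes, assuming the file id is UTF-8 encoded before hashing and that the store appends a ".knit" suffix. The helper name `sha1_knit_name` is hypothetical and is not taken from the sha1repo plugin:

```python
import hashlib


def sha1_knit_name(file_id: str) -> str:
    """Map a file id to a knit filename via its SHA-1 hex digest (hypothetical helper)."""
    # hexdigest() always yields 40 lowercase hex characters, so the name is
    # filesystem- and URL-safe no matter what characters the file id contains.
    digest = hashlib.sha1(file_id.encode("utf-8")).hexdigest()
    return digest + ".knit"


print(sha1_knit_name("foobar-2uaothnthauo"))
```

Every name has the same fixed length regardless of how long the file id is. The cost is the one John points out: the name no longer tells you which file it belongs to.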

If the problem is 'bad' characters, one solution is to remove the bad
characters and add the sha1 hash at the end. I've used systems where all
non-ASCII is mapped to ASCII, spaces map to _ and most punctuation gets
omitted, so you get mappings from Renée's pie chart.jpg to
Renees_pie_chart.jpg which is pretty close to the original, but might
not be unique. Appending (part of) a sha1 of the original name makes it
unique in practice: Renees_pie_chart-a7b64b8f2.jpg. But this is even
longer.
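
A minimal sketch of that kind of scheme. It assumes NFKD normalization to strip accents, "_" for whitespace, removal of anything outside [A-Za-z0-9_-], and the first 9 hex digits of the SHA-1 of the original UTF-8 name as the suffix. That matches the suffix length in the example above, though the actual digits will differ. `safe_name` and `hash_len` are hypothetical names:

```python
import hashlib
import os.path
import re
import unicodedata


def safe_name(name: str, hash_len: int = 9) -> str:
    """Readable ASCII-only filename with a SHA-1 suffix (hypothetical helper)."""
    stem, ext = os.path.splitext(name)
    # Decompose accented letters (é -> e + combining accent), then drop non-ASCII.
    ascii_stem = (
        unicodedata.normalize("NFKD", stem).encode("ascii", "ignore").decode("ascii")
    )
    # Whitespace runs become "_"; remaining punctuation is omitted.
    ascii_stem = re.sub(r"\s+", "_", ascii_stem)
    ascii_stem = re.sub(r"[^A-Za-z0-9_-]", "", ascii_stem)
    ext = re.sub(r"[^A-Za-z0-9.]", "", ext)
    # Hash the *original* name so names that sanitize identically still differ.
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()[:hash_len]
    return f"{ascii_stem}-{digest}{ext}"


print(safe_name("Renée's pie chart.jpg"))  # Renees_pie_chart-<9 hex digits>.jpg
```

Truncating the digest keeps names shorter, but it also means collisions are unlikely rather than impossible. Using all 40 digits makes them vanishingly rare.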

If the problem is long names, this may not help at all.

I don't fully understand the problem; can someone explain it? I *think*
you might be talking about the double escaping problem on some web
server setups?
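
For reference, "double escaping" presumably means that a name which is already percent-encoded gets encoded a second time somewhere between client and server, so the "%" itself turns into "%25". Here is a small illustration using Python's `urllib.parse.quote`, assuming a file id that contains a space:

```python
from urllib.parse import quote, unquote

# A name containing a character that must be escaped in a URL.
file_id = "pie chart.knit"

once = quote(file_id)  # 'pie%20chart.knit'
twice = quote(once)    # 'pie%2520chart.knit': the '%' itself was escaped

# A server that decodes only once sees 'pie%20chart.knit', not the real name.
print(unquote(twice))
```

A 40-character hex digest contains only unreserved characters, so `quote()` leaves it unchanged however many times it is applied. That may be part of why the sha1 naming sidesteps the problem.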

Loki