[Preview/RFC] win32 fake symlinks

John Arbash Meinel john at arbash-meinel.com
Sat Nov 3 14:32:30 GMT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Alexander Belchenko wrote:
> Hi,
> 

...

> 3) To allow bzr using fake symlinks on win32 I monkeypatching os module. I'm hardly convinced that
> monkeypatching here is right thing, because no-one Linux developer will care to use
> osutils.[symlink|readlink|lstat] instead of os.[symlink|readlink|lstat]. And I don't want forever go
> behind you guys as street-cleaner an fix all places where you write win32-incompatible code.
> 
> About speed.
> 

One thing I noticed:

+def check_fake_symlink(path):
+    """Return True if file is fake symlink"""
+    if GetFileAttributes(path) & FILE_ATTRIBUTE_SYSTEM:
+        f = file(path, 'rb')
+        data = f.read()
+        f.close()
+        if data.startswith('!<symlink>') and data.endswith('\0'):
+            return True
+    return False

a) You should still use try/finally. It isn't a big deal when there is only one
statement, but it is good to be in the habit.

b) You read the entire content of every file that you stat. So if there is a
10MB file, you read 10MB just to find out it isn't a symlink. I realize you
don't really want an arbitrary limit on the symlink path. Because most systems
read in pages anyway, you could probably make this a 4k read. 2k or 1k would
still be longer than Windows really supports anyway.

c) Is there an encoding for these paths? How do you set a link to a Unicode
path? I would probably recommend UTF-8, though on Win32 MBCS might be
reasonable (since that is the path encoding anyway). But that puts '\0'
characters in your strings, and you are using that as the EOF marker.
I'm guessing your current implementation ends up using the OEM encoding, but
that only works for the subset of characters in your code page.

d) You could use the size of the file to filter out files that could not
possibly be symlinks. (Too big or too small.)

e) You could instead just read for "!<symlink>" at the beginning, which is a
fixed (and short) length. However, reading the content of files usually
means making the HD heads move to a different location on disk, because the
metadata is usually stored separately from the actual content. I'm curious if
you'd see big differences between FAT32 and NTFS for this, just because they
lay things out differently on disk.

f) I would probably prefer it if this functionality was optional (provided by
a plugin). It doesn't make a lot of sense to penalize all the normal Win32
users by checking every file to see whether it is a symlink, especially
considering that most Win32 projects won't/can't use them.
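
Points (a) through (d) could be combined roughly as follows. This is only a
sketch, not the patch's code: the names looks_like_fake_symlink,
encode_fake_symlink and MAX_LINK_SIZE are made up for illustration, the 4k cap
is the figure suggested in (b), and the on-disk layout (magic, target, trailing
NUL) is inferred from the check quoted above. It is written for Python 3 and
leaves the FILE_ATTRIBUTE_SYSTEM test to the caller.

```python
import os

SYMLINK_MAGIC = b'!<symlink>'   # marker taken from the quoted patch
MAX_LINK_SIZE = 4096            # assumed cap: a single 4k read, per (b)


def encode_fake_symlink(target):
    """Build link file content with the target stored as UTF-8, per (c).

    UTF-8 only emits a 0 byte for U+0000 itself, so the trailing NUL
    terminator stays unambiguous.
    """
    return SYMLINK_MAGIC + target.encode('utf-8') + b'\0'


def looks_like_fake_symlink(path):
    """Return True if the file content matches the fake-symlink layout.

    The attribute test from the patch is assumed to have been done by
    the caller; only the content check is sketched here.
    """
    size = os.stat(path).st_size
    # (d) Use the size to rule out files that cannot be links.
    if size <= len(SYMLINK_MAGIC) or size > MAX_LINK_SIZE:
        return False
    f = open(path, 'rb')
    try:
        # (b) Bounded read: never more than MAX_LINK_SIZE bytes.
        data = f.read(MAX_LINK_SIZE)
    finally:
        # (a) Close the file even if read() raises.
        f.close()
    return data.startswith(SYMLINK_MAGIC) and data.endswith(b'\0')
```

The size check uses only metadata, so a 10MB file is rejected without being
opened at all.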

I would actually prefer if our TreeTransform code just knew whether it could
create symlinks, and if it got them, just have it either fail with a clean
message, delete them automatically (a bit ugly), or have a way to set those
files as "hidden": considered part of the tree, but not present on disk.

The last is my favorite, but we need a way to store it, which means a format bump.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHLIZ1JdeBCYSNAAMRAmWIAKCTjL5irLaBJFsTsesJk/3tF6MxtwCfciWd
G4Ovefj9fCu0S9SvdUkr2rg=
=Lvqe
-----END PGP SIGNATURE-----


