non-recursive status of a directory?

John Arbash Meinel john at arbash-meinel.com
Sat Jun 7 07:01:03 BST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Arbash Meinel wrote:
| Talden wrote:

...

| |
| | I agree that this special-case is a justification for the feature.
| | Our project of 22,000 files in nearly 3,600 folders definitely benefits
| | from tsvn's approach - waiting for a full status would be
| | excruciating.  At no time is a significant proportion of the folders
| | or the files in an explorer view - being able to short-circuit the
| | scan, though more work overall to establish status of everything, is
| | beneficial when only needing the status of the small sampling of the
| | source tree.
| |
| | There is also no question that the complete status operation of bzr
| | performs well in comparison to svn, but given that our developers
| | often turn off the recursive folder status in tsvn to make explorer
| | more responsive I would hope the faster status in bzr would translate
| | to faster status in explorer as well.  Being just as good as tsvn
| | isn't itself a worthy goal if the opportunity to beat it is sitting
| | right there in front of us.
| |
| | --
| | Talden
| |
| |
|
| On a Mozilla tree with 55,000 files 'bzr status' was taking 2 seconds. I
| don't consider that excruciating.
|
| John
| =:->

I should mention that this was a clean tree, on decent hardware with a hot
cache. The important thing about 'iter_changes()' is that it figures out that
nothing has changed *quickly*. That is why the code is a bit ugly, but it is all
about culling the unchanged files as fast as possible. (Don't decode their
filenames, don't do more than stat, etc, etc.)
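As a rough illustration of that culling idea (this is not bzr's actual code; the
names 'stat_fingerprint', 'snapshot' and 'possibly_changed' are made up for the
example, and a plain dict stands in for the dirstate), a stat-only pass might
look something like:

```python
import os


def stat_fingerprint(path):
    """Return a cheap (size, mtime) fingerprint without reading the file."""
    st = os.lstat(path)
    return (st.st_size, st.st_mtime_ns)


def snapshot(root):
    """Record a fingerprint for every file under root.

    Hypothetical stand-in for the recorded state; maps relative path
    to fingerprint.
    """
    recorded = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            recorded[os.path.relpath(full, root)] = stat_fingerprint(full)
    return recorded


def possibly_changed(root, recorded):
    """Yield relative paths that cannot be culled by stat alone.

    A file is culled when its fingerprint matches the recorded one.
    New, modified and missing files are yielded for closer inspection.
    """
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            seen.add(rel)
            if recorded.get(rel) != stat_fingerprint(full):
                yield rel
    # Anything recorded but no longer on disk has also changed.
    for rel in recorded:
        if rel not in seen:
            yield rel
```

On a clean tree the generator yields nothing, so the cost is one lstat per file
and no file contents are ever read.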

In any tree of significant size, the vast majority of files are going to be
unchanged. Humans don't generally touch all the files, and if they do, it
doesn't matter as much if it is a bit slower. We spent a lot of time making
'iter_changes()' fast for certain use cases.

Now.... we probably have a Unicode encoding performance issue on Windows.
Specifically, when we 'os.listdir()' we get back Unicode names. Which is
probably slower than it could be (I haven't benchmarked it on win32), and more
importantly, dirstate entries are in utf-8, so we have to encode those Unicode
paths into UTF-8.

I don't have a strong feeling for what to do here, other than that we could
list directories as plain strings as long as they don't have any non-ASCII
characters, and fall back to the Unicode APIs when we have to. That would
require writing a special-cased "osutils._walkdirs_win32()" instead of the
current _walkdirs_unicode_to_utf8.
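
To make the shape of that concrete, here is a minimal sketch, assuming a modern
Python where 'os.listdir()' on a str path returns str names and 'str.isascii()'
exists (3.7+). The helpers 'encode_name' and 'list_dir_utf8' are hypothetical
and are not the osutils functions named above. Both branches give the same
bytes here; the split only marks where an ASCII fast path would sit.

```python
import os


def encode_name(name):
    """Return the UTF-8 bytes for one directory entry name.

    Pure-ASCII names are byte-for-byte identical in ASCII and UTF-8,
    so they need no real transcoding. Only other names take the
    general Unicode path.
    """
    if name.isascii():
        return name.encode("ascii")
    return name.encode("utf-8")


def list_dir_utf8(path):
    """List one directory, returning sorted UTF-8 encoded entry names."""
    return sorted(encode_name(name) for name in os.listdir(path))
```

Whether a split like this actually helps would need benchmarking on win32.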

So.... this might all be moot. I believe SVN's recursive status hasn't been
tuned as much as bzr's. If it takes SVN 2s to give you the status of 1 directory
with 100 files, and bzr can do 55,000 files in 5,000 directories in the same
time, we might as well just recurse the whole tree.

Also, 'iter_changes()' is certainly capable of doing just subdirectories, so if
you are only viewing a 5,000 file subtree of the 55,000, you're still good.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkhKJB8ACgkQJdeBCYSNAAOeNQCfbISjex0DRWMqZhGCMSh5tycY
lGgAnioCxn4/D8vtw9WdhUntTfDzMrPs
=zV6o
-----END PGP SIGNATURE-----


