[MERGE] fast read-dir support for unix too
Robert Collins
robertc at robertcollins.net
Tue Sep 23 00:04:26 BST 2008
On Mon, 2008-09-22 at 17:56 -0400, John Arbash Meinel wrote:
>
> I *do* wonder if doing:
>
> property st_XXX:
> def __get__(self):
> return self._st.st_XXX
>
> Is faster/slower than:
>
>
> cdef class _Stat:
>
> cdef readonly int st_XXX
>
>
> I know the former is translated into an explicit function, while the
> latter is
> just put into the 'members' struct.
> It might be worth benchmarking.
>
> On the flip side, having an explicit wrapping of a 'struct stat' may
> be
> a great
> way to have other pyrex code access the raw 'struct stat' directly.
The property is likely slightly slower; I have other work to remove the
use of stat itself throughout the higher layers and instead have an
opaque fingerprint; Thats down the track though. For now I'll carefully
setup those types according to the C definition. I use the properties
because of the C macros that cause utter chaos - doing cdef readonly int
st_XXX fails. Also using the stat struct as I do is less copying in the
inner loop.
> ...
>
> + cdef _kind_from_mode(self, int mode):
> + # in order of frequency:
> + if S_ISREG(mode):
> + return self._file
> + if S_ISDIR(mode):
> + return self._directory
> + if S_ISCHR(mode):
> + return self._chardev
> + if S_ISBLK(mode):
> + return self._block
> + if S_ISLNK(mode):
> + return self._symlink
> + if S_ISFIFO(mode):
> + return self._fifo
> + if S_ISSOCK(mode):
> + return self._socket
> + return self._unknown
>
> ^- I would assume symlinks are more common in working trees than CHR
> and
> BLK
> devices. Not that it is a huge issue after you've handled files and
> directories.
Yeah, how about I just say 'files and dirs are respectively most
common'.
> I notice that you pushed "lstat" into _read_dir(). Did you decide
> that
> stat-ing
> in inode order was not a sufficient win to continue using it? Does
> that
> point
> us to simplifying the _read_dir api?
It wasn't a sufficient win on hot-cache to make me cry losing it; I
intend to reinstate it; I'm happy to do so in this branch, or a later
one; I'd prefer a later one because having smaller deltas is easier for
me to work with.
> I'll note that the docstring is now incorrect:
> cdef _read_dir(path):
> """Like os.listdir, this reads the contents of a directory.
>
> :param path: the directory to list.
> :return: a list of (sort_key, basename) tuples.
> """
>
> ^- as _read_dir is now returning:
> PyList_Append(result, (entry.d_ino, entry.d_name, None,
> statvalue, None))
Thanks.
> I'll note that it is a little bit ugly to create a tuple and modify it
> *in-place*, but as long as it is contained in the C code, and you
> haven't
> called hash(tuple), etc, it is probably fine to do. Perhaps we should
> make it a
> bit clearer that we are doing so?
I disagree on 'ugly' - its what the C layer _does_ to create tuples.
(You make it, and the populate it). The ugliness here is IMO being
half-python and half-CPython.
> + # We have inode, name, None, statvalue, None
> + # inode -> path_from_top
> + # direct concat - faster than operator +.
>
> # We have (inode, name, None, statvalue, None), which we will replace
> with
> # (prefix_path, name, kind, statvalue, top_path)
I'll rephrase it because I think you're saying it's not clear at first
read. But we're not replacing one tuple with another, we're editing the
tuple we're creating - and I don't want confusion on that score!
> Otherwise the read_dir() function looks like it is properly
> streamlined
> on the
> standard path. We use direct C access for all of the common path code.
Theres quite a bit more we can do on that path I think, but its quite
tedious.
> I'll note, though, that you no longer return entries in sorted order.
> I
> guess
> that is okay, since you use a sorted() call in _walkdirs_utf8()
>
> However, you need to change the DirReader api documentation:
> def read_dir(self, prefix, top):
> """Read a specific dir.
>
> :param prefix: A utf8 prefix to be preprended to the path
> basenames.
> :param top: A natively encoded path to read.
> :return: A sorted list of the directories contents. Each item
> contains:
> (utf8_relpath, utf8_name, kind, lstatvalue, native_abspath)
> """
> raise NotImplementedError(self.read_dir)
>
> I'll also note that we can get rid of the sorted() in
> UnicodeDirReader.read_dir() and the same code in walkdirs_win32.pyx
Thanks, will do.
> Also, things are quite broken for win32, I'm getting infinite
> recursion
> in
> "file_kind_from_stat_mode_thunk". You are missing this line:
> + file_kind_from_stat_mode = _file_kind_from_stat_mode
>
> After you import the correct version from the _readdir_py.py module.
>
> So we should probably should have test that it can fallback correctly.
Oh hmm. I'll look at a unit test for this.
> Oh, and you broke the _walkdirs_win32 code, because it was expecting
> to
> grab
> "formats" from osutils.py. This fixes it for me:
> === modified file 'bzrlib/_walkdirs_win32.pyx'
> ...
> I also think it would be reasonable to rename the
> "_walkdirs_win32.pyx"
> file to
> something like "_readdir_win32_pyx.pyx" to fit the other naming.
will do.
Thanks for the review.
-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20080923/adc6e8e7/attachment-0001.pgp
More information about the bazaar
mailing list