[MERGE] faster iter_changes
John Arbash Meinel
john at arbash-meinel.com
Tue Sep 23 00:21:01 BST 2008
John Arbash Meinel has voted comment.
Status is now: Semi-approved
Comment:
I think this is going to take a few rounds of review of the different
pieces. I think I like where it is going, but it isn't ready to land for
'tweak' (doesn't work on win32). But I don't really want it out of the
queue via 'resubmit'. So for now I'm just voting 'comment'.
=== modified file 'bzrlib/_dirstate_helpers_c.pyx'
On win32 we don't have:
+cdef extern from "arpa/inet.h":
+ unsigned long htonl(unsigned long)
+
+
or
+cdef extern from 'sys/stat.h':
+ int S_ISDIR(int mode)
+ int S_ISREG(int mode)
+ int S_ISLNK(int mode)
+ int S_IXUSR
This may depend on the compiler tools. I'm currently using mingw32, and
it
seems to be fine with "S_ISDIR and S_ISREG" but fails trying to find
S_ISLNK.
More exploration is needed to know what VS 7/8/9 would have available.
(They
tend to put POSIX stuff into _foo, like _open and _stat sort of things.)
Perhaps we just need a compatibility header, that on win32 S_ISLINK is
defined
as: "#define S_ISLNK(mode) (0)"
Though I think at one point Alexander had done some interesting trickery
to get
"symlink" support on win32. By setting certain flags, and then using
more
invasive checks when those flags were set. So I think it would be good
to be a
little bit flexible here, but we can put the onus of supporting that
onto
someone who is actually going to be implementing symlink support on
win32,
rather than someone who doesn't have a good win32 platform for testing.
:)
And I don't think you meant to keep:
+cdef extern from "stdio.h":
+ void printf(char *format, ...)
+
+from bzrlib import cache_utf8, errors, osutils
+from bzrlib.dirstate import DirState, pack_stat
+from bzrlib.osutils import pathjoin, splitpath
^- you don't actually use 'pack_stat' here. Also, I think it would be
better to
do "pathjoin = osutils.pathjoin" at the appropriate times, rather than
importing them again here.
+#cdef object _encode
+_encode = binascii.b2a_base64
+
^- I don't think cdef object gets you anything here, but commenting it
out gets
you even less.
...
+
+from struct import pack
+cdef _pack_stat(stat_value):
^- This seems like a place, that if you could determine you had
_walkdirs_win32._Win32Stat or _readdir_pyx._Stat you could access the
values as
C types, rather than converting into python and back down to C.
I would also be *really* curious to poke a sprintf("%8X.%8X.%8X") in
there, instead of
the base64.encode() and see what we get.
...
+cdef _update_entry(self, entry, abspath, stat_value):
+ """Update the entry based on what is actually on disk.
+
+ :param entry: This is the dirblock entry for the file in question.
+ :param abspath: The path on disk for this file.
+ :param stat_value: (optional) if we already have done a stat on the
+ file, re-use it.
+
+ :return: The sha1 hexdigest of the file (40 bytes) or link target
of a
+ symlink.
+ """
^- 'stat_value' isn't optional anymore. We moved that code up the stack,
and
just forgot to update the docstring. We need to update the "def
update_entry"
docstring as well.
+ # TODO - require pyrex 0.8, then use a pyd file to define access to
the _st
+ # mode of the compiled stat objects.
^- Do you really mean pyrex 0.8? That seems like a perfectly reasonable
request. If you mean 0.9.8 that might be more difficult.
+ cdef int minikind, saved_minikind
+ cdef void * details
+ # pyrex 0.9.7 would allow cdef list details_list, and direct access
rather
+ # than PyList_GetItem_void_void below
I didn't realize this. Is that pyrex 0.9.7, or cython?
Regardless, 'details' is a Tuple, not a List object.
Also "cdef _update_entry(self, ...)" since _update_entry is no longer a
member
of DirState, I think it might be more obvious to use
"_update_entry(state,
...)".
+
+cdef char _minikind_from_string(object string):
+ """Convert a python string to a char."""
+ return PyString_AsString(string)[0]
+
^- This is a little unsafe in the case that 'string' is an empty string.
Perhaps a quick "if PyString_Size(string) > 0" ?
...
+cdef object _minikind_to_kind(char minikind):
+ """Create a string kind for minikind."""
+ cdef char _minikind[1]
+ if minikind == c'f':
+ return _kind_file
+ elif minikind == c'd':
+ return _kind_directory
+ elif minikind == c'a':
+ return _kind_absent
+ elif minikind == c'r':
+ return _kind_relocated
+ elif minikind == c'l':
+ return _kind_symlink
+ elif minikind == c't':
+ return _kind_tree_reference
+ _minikind[0] = minikind
+ raise KeyError(PyString_FromStringAndSize(_minikind, 1))
^- Why do you use the "_minikind[0]" workaround, rather than &minikind?
I'm skipping the big ProcessEntryC code block for now, I'll try to get
back to
it later.
=== modified file 'bzrlib/delta.py'
--- bzrlib/delta.py 2008-04-24 07:22:53 +0000
+++ bzrlib/delta.py 2008-09-15 03:23:53 +0000
@@ -241,7 +241,7 @@
(executable[0] != executable[1])))
elif kind[0] != kind[1]:
delta.kind_changed.append((path[1], file_id, kind[0],
kind[1]))
- elif content_change is True or executable[0] != executable[1]:
+ elif content_change or executable[0] != executable[1]:
delta.modified.append((path[1], file_id, kind[1],
content_change,
(executable[0] != executable[1])))
^- any comment on why you had to do this?
...
=== modified file 'bzrlib/dirstate.py'
--- bzrlib/dirstate.py 2008-07-30 08:55:10 +0000
+++ bzrlib/dirstate.py 2008-09-18 06:36:28 +0000
@@ -211,6 +211,7 @@
import time
import zlib
+import bzrlib
from bzrlib import (
^- As near as I can tell, you do the import here, so that later you can
do:
def iter_changes(self):
"""Iterate over the changes."""
utf8_decode = cache_utf8._utf8_decode
==> cmp_by_dirs = bzrlib.dirstate.cmp_by_dirs
_process_entry = self._process_entry
It seems like it would be a lot cleaner to just use a different variable
name.
Like "_cmp_by_dirs = cmp_by_dirs".
...
+from bzrlib.osutils import pathjoin, splitpath
Similarly, I find it better to use "pathjoin = osutils.pathjoin" when
you want
it as a simple function rather than a module lookup.
...
@@ -2774,6 +2738,709 @@
raise errors.ObjectNotLocked(self)
+def py_update_entry(self, entry, abspath, stat_value,
+ _stat_to_minikind=DirState._stat_to_minikind,
+ _pack_stat=pack_stat):
^- Shouldn't this be moved into _dirstate_helpers_py.py ?
Similarly for:
+class ProcessEntryPython(object):
=== modified file 'bzrlib/tests/test__dirstate_helpers.py'
...
+
+class TestUpdateEntry(test_dirstate.TestCaseWithDirState):
I'm assuming this is mostly just a Cut & Paste from the test_dirstate
tests?
I'll also note that you don't seem to have direct tests for the other
helpers
like pack_stat and ProcessEntry[C/Py]. I don't know that we have to, but
I
usually feel better having implementation tests to make sure that the
Pyrex code performs identically to the Python code.
I suppose you are testing it indirectly with iter_changes and:
+ if optimiser is bzrlib.workingtree_4.InterDirStateTree:
+ # Its a little ugly to be conditional here, but less so
than having
+ # the optimiser listed twice.
+ # Add once, compiled version
+ test_intertree_permutations.append(
+ (optimiser.__name__ + "(C)",
+ optimiser,
+ optimiser._matching_from_tree_format,
+ optimiser._matching_to_tree_format,
+ optimiser.make_source_parent_tree_compiled_dirstate))
+ # python version
+ test_intertree_permutations.append(
+ (optimiser.__name__ + "(PY)",
+ optimiser,
+ optimiser._matching_from_tree_format,
+ optimiser._matching_to_tree_format,
+ optimiser.make_source_parent_tree_python_dirstate)
As mentioned, I haven't gone over the ProcessEntryX classes yet, but I
figured
I could at least start the discussion.
Overall, I think the basic refactoring you've done is good. We still
need to
figure out how to get everything to compile on all platforms, etc.
For details, see:
http://bundlebuggy.aaronbentley.com/project/bzr/request/%3C1221720286.9592.27.camel%40lifeless-64%3E
Project: Bazaar
More information about the bazaar
mailing list