Rev 2643: (John Arbash Meinel) Implement DirState._read_dirblocks() in pyrex in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Fri Jul 20 20:48:25 BST 2007


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 2643
revision-id: pqm at pqm.ubuntu.com-20070720194822-smqttk05w6efypf0
parent: pqm at pqm.ubuntu.com-20070720172520-i2ezksmrduaonojd
parent: john at arbash-meinel.com-20070720182620-948wu6weli9aupkq
committer: Canonical.com Patch Queue Manager <pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Fri 2007-07-20 20:48:22 +0100
message:
  (John Arbash Meinel) Implement DirState._read_dirblocks() in pyrex
added:
  bzrlib/_dirstate_helpers_c.pyx dirstate_helpers.pyx-20070503201057-u425eni465q4idwn-3
  bzrlib/_dirstate_helpers_py.py _dirstate_helpers_py-20070710145033-90nz6cqglsk150jy-1
  bzrlib/benchmarks/bench_dirstate.py bench_dirstate.py-20070503203500-gs0pz6zkvjpq9l2x-1
  bzrlib/tests/test__dirstate_helpers.py test_dirstate_helper-20070504035751-jsbn00xodv0y1eve-2
modified:
  .bzrignore                     bzrignore-20050311232317-81f7b71efa2db11a
  NEWS                           NEWS-20050323055033-4e00b5db738777ff
  bzrlib/benchmarks/__init__.py  __init__.py-20060516064526-eb0d37c78e86065d
  bzrlib/dirstate.py             dirstate.py-20060728012006-d6mvoihjb3je9peu-1
  bzrlib/tests/__init__.py       selftest.py-20050531073622-8d0e3c8845c97a64
  bzrlib/tests/test_dirstate.py  test_dirstate.py-20060728012006-d6mvoihjb3je9peu-2
  bzrlib/workingtree_4.py        workingtree_4.py-20070208044105-5fgpc5j3ljlh5q6c-1
  setup.py                       setup.py-20050314065409-02f8a0a6e3f9bc70
    ------------------------------------------------------------
    revno: 2474.1.74
    merged: john at arbash-meinel.com-20070720182620-948wu6weli9aupkq
    parent: john at arbash-meinel.com-20070720173448-cn7og836bl8dovwv
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-07-20 13:26:20 -0500
    message:
      Revert the accidental removal of the Unicode normalization check code.
      It was done to profile how much it was costing us, but it wasn't meant to be removed.
    ------------------------------------------------------------
    revno: 2474.1.73
    merged: john at arbash-meinel.com-20070720173448-cn7og836bl8dovwv
    parent: john at arbash-meinel.com-20070720170136-pa6kb99lxxmekyji
    parent: pqm at pqm.ubuntu.com-20070720161548-nppg3mvd38gbuaid
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-07-20 12:34:48 -0500
    message:
      [merge] bzr.dev 2641
    ------------------------------------------------------------
    revno: 2474.1.72
    merged: john at arbash-meinel.com-20070720170136-pa6kb99lxxmekyji
    parent: john at arbash-meinel.com-20070718204238-5gi11fx04q7zt72d
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-07-20 12:01:36 -0500
    message:
      Document a bit more what is going on in _dirstate_helpers_c.pyx, from Martin's comments
    ------------------------------------------------------------
    revno: 2474.1.71
    merged: john at arbash-meinel.com-20070718204238-5gi11fx04q7zt72d
    parent: john at arbash-meinel.com-20070718203014-u8gpbqn5z9ftx1tu
    parent: pqm at pqm.ubuntu.com-20070717180333-5smmeduk2q3sbzvw
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Wed 2007-07-18 15:42:38 -0500
    message:
      [merge] bzr.dev 2625
    ------------------------------------------------------------
    revno: 2474.1.70
    merged: john at arbash-meinel.com-20070718203014-u8gpbqn5z9ftx1tu
    parent: john at arbash-meinel.com-20070713212835-m330r85zq4xwgipi
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Wed 2007-07-18 15:30:14 -0500
    message:
      Lots of fixes from Martin's comments.
      Fix signed/unsigned character issues
      Add lots of comments to help understand the code
      Add tests for proper Unicode handling (we should abort if we get a Unicode string,
      and we should correctly handle utf-8 strings)
    ------------------------------------------------------------
    revno: 2474.1.69
    merged: john at arbash-meinel.com-20070713212835-m330r85zq4xwgipi
    parent: john at arbash-meinel.com-20070713175009-sylhp1kst6145v0f
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-07-13 16:28:35 -0500
    message:
      Thanks to Jan 'RedBully' Seiffert, some review cleanups
      changes size_t to unsigned.
      Check alignment on strings before using integer loops.
      Just use a simple backwards checking loop for _memrchr
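
A minimal Python sketch of the "simple backwards checking loop" idea behind _memrchr. The real helper is Pyrex/C code operating on a char pointer; the signature below (a bytes object, a byte value, and a length) is an assumption made for illustration only.

```python
def memrchr(buf: bytes, c: int, n: int) -> int:
    """Return the index of the last byte equal to c in buf[:n], or -1."""
    # Start at the last byte inside the window and walk toward index 0.
    pos = min(n, len(buf)) - 1
    while pos >= 0:
        if buf[pos] == c:
            return pos
        pos -= 1
    # The byte does not occur in the window.
    return -1
```

For a path such as b"a/b/c", memrchr(path, ord("/"), len(path)) gives the offset of the final separator, which is where the dirname ends and the basename begins.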
    ------------------------------------------------------------
    revno: 2474.1.68
    merged: john at arbash-meinel.com-20070713175009-sylhp1kst6145v0f
    parent: john at arbash-meinel.com-20070712181059-xnomv3tzzsb2hpx5
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-07-13 12:50:09 -0500
    message:
      Review feedback from Martin, mostly documentation updates.
    ------------------------------------------------------------
    revno: 2474.1.67
    merged: john at arbash-meinel.com-20070712181059-xnomv3tzzsb2hpx5
    parent: john at arbash-meinel.com-20070712163402-lp91q157w5etslrj
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Thu 2007-07-12 13:10:59 -0500
    message:
      Add NEWS entries for performance improvements.
    ------------------------------------------------------------
    revno: 2474.1.66
    merged: john at arbash-meinel.com-20070712163402-lp91q157w5etslrj
    parent: john at arbash-meinel.com-20070712052601-n0bcu3r5nlu1skj4
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Thu 2007-07-12 11:34:02 -0500
    message:
      Some restructuring.
      Move bisect_path_* to private functions
      Move cmp_path_by_dirblock to a private function,
      since it is only used by the bisect_path functions.
      Add tests that the compiled versions are actually used.
      This catches cases when the import fails for the wrong reason.
      Move some code around to make it closer to sorted by name.
    ------------------------------------------------------------
    revno: 2474.1.65
    merged: john at arbash-meinel.com-20070712052601-n0bcu3r5nlu1skj4
    parent: john at arbash-meinel.com-20070712051503-ntboo0z3prcrcg3t
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Thu 2007-07-12 00:26:01 -0500
    message:
      Found an import dependency bug if the compiled version is not available.
      Basically, we need a constant from dirstate.py, but we can't import the module directly
      because before the module finishes loading, it imports _dirstate_helper*.
      but bzrlib.dirstate.DirState *has* been defined at that point,
      so we can import it.
      But now the tests pass with and without running 'make' first.
    ------------------------------------------------------------
    revno: 2474.1.64
    merged: john at arbash-meinel.com-20070712051503-ntboo0z3prcrcg3t
    parent: john at arbash-meinel.com-20070712051426-u9auufylv5cba940
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Thu 2007-07-12 00:15:03 -0500
    message:
      Fix dirstate benchmarks for new layout.
    ------------------------------------------------------------
    revno: 2474.1.63
    merged: john at arbash-meinel.com-20070712051426-u9auufylv5cba940
    parent: john at arbash-meinel.com-20070711234520-do3h7zw8skbathpz
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Thu 2007-07-12 00:14:26 -0500
    message:
      Found a small bug in the python version of _read_dirblocks.
      This reveals that the code is not as directly tested as it should be.
      Consider refactoring all test_dirstate to use both implementations.
      Or at least add more direct tests.
    ------------------------------------------------------------
    revno: 2474.1.62
    merged: john at arbash-meinel.com-20070711234520-do3h7zw8skbathpz
    parent: john at arbash-meinel.com-20070711225935-llcal92udviwxfp4
    parent: pqm at pqm.ubuntu.com-20070711162842-8fx9cc0c3ogyxudl
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Wed 2007-07-11 18:45:20 -0500
    message:
      [merge] bzr.dev 2601
    ------------------------------------------------------------
    revno: 2474.1.61
    merged: john at arbash-meinel.com-20070711225935-llcal92udviwxfp4
    parent: john at arbash-meinel.com-20070711215705-x6l2fdioh050zxzp
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Wed 2007-07-11 17:59:35 -0500
    message:
      Finish fixing DirState._bisect and the bisect tests
    ------------------------------------------------------------
    revno: 2474.1.60
    merged: john at arbash-meinel.com-20070711215705-x6l2fdioh050zxzp
    parent: john at arbash-meinel.com-20070711214905-e2cxwnuoxr9r1o9r
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Wed 2007-07-11 16:57:05 -0500
    message:
      Get rid of strchr in favor of memchr
    ------------------------------------------------------------
    revno: 2474.1.59
    merged: john at arbash-meinel.com-20070711214905-e2cxwnuoxr9r1o9r
    parent: john at arbash-meinel.com-20070711000154-4et8yf8si3jgxmgc
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Wed 2007-07-11 16:49:05 -0500
    message:
      Make sure to set basename_len. With that patch, the tests pass.
    ------------------------------------------------------------
    revno: 2474.1.58
    merged: john at arbash-meinel.com-20070711000154-4et8yf8si3jgxmgc
    parent: john at arbash-meinel.com-20070710145123-jv3wcj10qdvkgmt8
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Tue 2007-07-10 19:01:54 -0500
    message:
      (broken) Try to properly implement DirState._bisect*
      Involves rewriting some helper functions.
      Currently something is wrong.
    ------------------------------------------------------------
    revno: 2474.1.57
    merged: john at arbash-meinel.com-20070710145123-jv3wcj10qdvkgmt8
    parent: john at arbash-meinel.com-20070509152850-spj91ozbgzpgxmw7
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Tue 2007-07-10 09:51:23 -0500
    message:
      Move code around to refactor according to our pyrex extension design.
      This creates a _dirstate_helpers_py.py next to _dirstate_helpers_c.pyx
      Rather than having a 'bzrlib.compiled.*' directory.
    ------------------------------------------------------------
    revno: 2474.1.56
    merged: john at arbash-meinel.com-20070509152850-spj91ozbgzpgxmw7
    parent: john at arbash-meinel.com-20070507231309-mtyzwjrascrg5tiq
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Wed 2007-05-09 10:28:50 -0500
    message:
      Remove a lot of unused definitions.
    ------------------------------------------------------------
    revno: 2474.1.55
    merged: john at arbash-meinel.com-20070507231309-mtyzwjrascrg5tiq
    parent: john at arbash-meinel.com-20070507230047-53ozoz7og6n2j24i
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Mon 2007-05-07 18:13:09 -0500
    message:
      Remove an unused (and ugly) pyrex function.
    ------------------------------------------------------------
    revno: 2474.1.54
    merged: john at arbash-meinel.com-20070507230047-53ozoz7og6n2j24i
    parent: john at arbash-meinel.com-20070507221117-l6pjpggfs9p2dtwy
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Mon 2007-05-07 18:00:47 -0500
    message:
      Optimize the simple case that the strings are the same object.
      Add some TODO statements that we might consider.
    ------------------------------------------------------------
    revno: 2474.1.53
    merged: john at arbash-meinel.com-20070507221117-l6pjpggfs9p2dtwy
    parent: john at arbash-meinel.com-20070507214233-czz6gaimsje4qka6
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Mon 2007-05-07 17:11:17 -0500
    message:
      Changing Reader.get_next_str (which returns a Python String)
      into a c function saves a lot of time.
      Specifically it avoids a GetAttr call, and a PyObject_CallObject
      This drops the times down to:
      ...test__read_dirblocks_20k_tree_0_parents_c    OK      122ms/    2561ms
      ...test__read_dirblocks_20k_tree_0_parents_py   OK      235ms/    2606ms
      ...test__read_dirblocks_20k_tree_1_parent_c     OK      175ms/    2797ms
      ...test__read_dirblocks_20k_tree_1_parent_py    OK      358ms/    3014ms
      ...test__read_dirblocks_20k_tree_2_parents_c    OK      259ms/    2992ms
      ...test__read_dirblocks_20k_tree_2_parents_py   OK      498ms/    3232ms
      
      We are close to being 2x faster than the python implementation.
    ------------------------------------------------------------
    revno: 2474.1.52
    merged: john at arbash-meinel.com-20070507214233-czz6gaimsje4qka6
    parent: john at arbash-meinel.com-20070507213645-le9y48efqghhes86
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Mon 2007-05-07 16:42:33 -0500
    message:
      Add a benchmark timing how long it takes to add ~20k entries to a DirState object.
    ------------------------------------------------------------
    revno: 2474.1.51
    merged: john at arbash-meinel.com-20070507213645-le9y48efqghhes86
    parent: john at arbash-meinel.com-20070507213102-i2nuwkr0vfj8u98u
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Mon 2007-05-07 16:36:45 -0500
    message:
      Fix one benchmark so it is actually writing data instead of a null block.
    ------------------------------------------------------------
    revno: 2474.1.50
    merged: john at arbash-meinel.com-20070507213102-i2nuwkr0vfj8u98u
    parent: john at arbash-meinel.com-20070507211832-430v0s9bvjud3jeg
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Mon 2007-05-07 16:31:02 -0500
    message:
      Refactor a bit to make benchmark setup time faster.
    ------------------------------------------------------------
    revno: 2474.1.49
    merged: john at arbash-meinel.com-20070507211832-430v0s9bvjud3jeg
    parent: john at arbash-meinel.com-20070507204345-plq5j2u2hfwm1q8v
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Mon 2007-05-07 16:18:32 -0500
    message:
      Add DirState.save() benchmarks.
      At this point it doesn't seem a huge overhead
      (857ms for 20k entries with 2 parents on a slow machine)
      But something we might look into in the future
    ------------------------------------------------------------
    revno: 2474.1.48
    merged: john at arbash-meinel.com-20070507204345-plq5j2u2hfwm1q8v
    parent: john at arbash-meinel.com-20070507203816-0zk28og5dadjdj4l
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Mon 2007-05-07 15:43:45 -0500
    message:
      Just recording a benchmark on my fast machine
      _read_dirblocks_20k_tree_0_parents_c    OK      158ms/    2632ms
      _read_dirblocks_20k_tree_0_parents_py   OK      247ms/    2648ms
      _read_dirblocks_20k_tree_1_parent_c     OK      224ms/    5493ms
      _read_dirblocks_20k_tree_1_parent_py    OK      324ms/    5558ms
      _read_dirblocks_20k_tree_2_parents_c    OK      279ms/    6675ms
      _read_dirblocks_20k_tree_2_parents_py   OK      435ms/    6847ms
    ------------------------------------------------------------
    revno: 2474.1.47
    merged: john at arbash-meinel.com-20070507203816-0zk28og5dadjdj4l
    parent: john at arbash-meinel.com-20070507202804-5w45ajlfp3xoc3kl
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Mon 2007-05-07 15:38:16 -0500
    message:
      Change the names of the functions from c_foo and py_foo to foo_c and foo_py
      This makes it easier to search for 'def foo*' and means that benchmark results
      are next to each other, rather than far apart.
    ------------------------------------------------------------
    revno: 2474.1.46
    merged: john at arbash-meinel.com-20070507202804-5w45ajlfp3xoc3kl
    parent: john at arbash-meinel.com-20070507191244-ywyxg0ftlh6n297f
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Mon 2007-05-07 15:28:04 -0500
    message:
      Finish implementing _c_read_dirblocks for any number of parents.
      bench_dirstate.BenchmarkDirState.test__c_read_dirblocks_20k_tree_0_parents    OK      367ms/    4353ms
      bench_dirstate.BenchmarkDirState.test__c_read_dirblocks_20k_tree_1_parent     OK      594ms/    8958ms
      bench_dirstate.BenchmarkDirState.test__c_read_dirblocks_20k_tree_2_parents    OK      842ms/   10490ms
      bench_dirstate.BenchmarkDirState.test__py_read_dirblocks_20k_tree_0_parents   OK      560ms/    4298ms
      bench_dirstate.BenchmarkDirState.test__py_read_dirblocks_20k_tree_1_parent    OK      692ms/    8658ms
      bench_dirstate.BenchmarkDirState.test__py_read_dirblocks_20k_tree_2_parents   OK     1006ms/   10710ms
      
      So overall the performance benefit is about 15-30%
    ------------------------------------------------------------
    revno: 2474.1.45
    merged: john at arbash-meinel.com-20070507191244-ywyxg0ftlh6n297f
    parent: john at arbash-meinel.com-20070507183155-fzs5z1516gyf5lth
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Mon 2007-05-07 14:12:44 -0500
    message:
      Add benchmarks to see how reading the dirstate changes when you have parents.
      Currently, the C implementation is slower than python, but partially that is
      because it is not optimized (at all).
    ------------------------------------------------------------
    revno: 2474.1.44
    merged: john at arbash-meinel.com-20070507183155-fzs5z1516gyf5lth
    parent: john at arbash-meinel.com-20070507182449-mm860vvdw9keyfx5
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Mon 2007-05-07 13:31:55 -0500
    message:
      Use cmp_by_dirs in _iter_changes, it saves a bit of time.
      When I initially wrote it, I thought they wouldn't be called often,
      but I realize now they are evaluated when we have unknown/ignored files
      on disk.
    ------------------------------------------------------------
    revno: 2474.1.43
    merged: john at arbash-meinel.com-20070507182449-mm860vvdw9keyfx5
    parent: john at arbash-meinel.com-20070507180840-e0r1jomaos7an93j
    parent: pqm at pqm.ubuntu.com-20070507175017-mvwcdqzq0w4z36lr
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Mon 2007-05-07 13:24:49 -0500
    message:
      [merge] bzr.dev 2483
    ------------------------------------------------------------
    revno: 2474.1.42
    merged: john at arbash-meinel.com-20070507180840-e0r1jomaos7an93j
    parent: john at arbash-meinel.com-20070507175701-b8c87exjybq31evq
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Mon 2007-05-07 13:08:40 -0500
    message:
      fix benchmark names, refactor to avoid 'create_path_names' overhead.
    ------------------------------------------------------------
    revno: 2474.1.41
    merged: john at arbash-meinel.com-20070507175701-b8c87exjybq31evq
    parent: john at arbash-meinel.com-20070505132458-0fe0g2jfdoyg95mn
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Mon 2007-05-07 12:57:01 -0500
    message:
      Change the name of cmp_dirblock_strings to cmp_by_dirs
      And refactor the test cases so that we test both the python version and the
      C version. Also, add benchmarks for both.
      It shows that the C version is approx 10x faster.
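
A rough Python sketch of what a comparison "by dirs" could look like: paths are compared component by component instead of byte by byte. The function name follows the commit message, but the body is an assumed reconstruction, not the code from this branch.

```python
def cmp_by_dirs(path1: bytes, path2: bytes) -> int:
    """Compare two paths by their directory components.

    Returns -1, 0 or 1, in the style of the old cmp() builtin.
    """
    parts1 = path1.split(b"/")
    parts2 = path2.split(b"/")
    # Lists compare element by element, so b"a/b" is ordered by
    # [b"a", b"b"] rather than by the raw bytes of the whole path.
    return (parts1 > parts2) - (parts1 < parts2)
```

The ordering differs from a plain bytes comparison whenever a character sorts below "/": b"a/b" sorts before b"a-b" by directories, although b"a/b" > b"a-b" as raw bytes.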
    ------------------------------------------------------------
    revno: 2474.1.40
    merged: john at arbash-meinel.com-20070505132458-0fe0g2jfdoyg95mn
    parent: john at arbash-meinel.com-20070505050202-hmi7l9smckjrf2pa
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Sat 2007-05-05 08:24:58 -0500
    message:
      (python-only) Shave a bit of time off by calling binascii.b2a_base64
      I should have looked closer, base64.encodestring() is a Legacy api, which
      just wraps binascii.b2a_base64.
      On 21k pack_stat calls, it drops us from around 784ms to 281ms
    ------------------------------------------------------------
    revno: 2474.1.39
    merged: john at arbash-meinel.com-20070505050202-hmi7l9smckjrf2pa
    parent: john at arbash-meinel.com-20070505045753-1fwhap6q0jyb18vt
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Sat 2007-05-05 00:02:02 -0500
    message:
      Clean up and remove unused functions.
    ------------------------------------------------------------
    revno: 2474.1.38
    merged: john at arbash-meinel.com-20070505045753-1fwhap6q0jyb18vt
    parent: john at arbash-meinel.com-20070505043606-lw7bjxwzcnjbls9v
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 23:57:53 -0500
    message:
      Finally, faster than text.split() (156ms)
      By iterating over the fields directly, we don't have to create Python strings
      for the dirname field (only when it changes), or for the size field or is_executable
      fields.
      A lot fewer python objects means faster parsing.
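
The idea can be sketched in Python as below. The record layout (dirname, basename, size, each terminated by a NUL byte) and both function names are simplifications assumed for this example; the real dirstate format has more fields.

```python
def iter_fields(text: bytes, sep: bytes = b"\0"):
    """Yield (offset, length) for each sep-terminated field in text."""
    pos = 0
    end = len(text)
    while pos < end:
        nxt = text.find(sep, pos)
        if nxt == -1:
            nxt = end
        yield pos, nxt - pos
        pos = nxt + 1


def parse_entries(text: bytes):
    """Parse (dirname, basename, size) records without an intermediate list."""
    entries = []
    cur_dirname = None
    fields = iter_fields(text)
    for d_off, d_len in fields:
        n_off, n_len = next(fields)
        s_off, s_len = next(fields)
        # Build a new dirname object only when the directory changes;
        # startswith() compares in place without slicing.
        if (cur_dirname is None or d_len != len(cur_dirname)
                or not text.startswith(cur_dirname, d_off)):
            cur_dirname = text[d_off:d_off + d_len]
        name = text[n_off:n_off + n_len]
        size = int(text[s_off:s_off + s_len])
        entries.append((cur_dirname, name, size))
    return entries
```

Entries in the same directory end up sharing one dirname object, so a directory with N files holds one copy of its name rather than N.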
    ------------------------------------------------------------
    revno: 2474.1.37
    merged: john at arbash-meinel.com-20070505043606-lw7bjxwzcnjbls9v
    parent: john at arbash-meinel.com-20070505015422-9dfed0e9uza2g7n9
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 23:36:06 -0500
    message:
      get_next() returns the length of the string,
      preparing for having a _get_entry... which parses rather than
      extracting to a list first
    ------------------------------------------------------------
    revno: 2474.1.36
    merged: john at arbash-meinel.com-20070505015422-9dfed0e9uza2g7n9
    parent: john at arbash-meinel.com-20070504223428-d7vwvp3f7ypn9ivv
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 20:54:22 -0500
    message:
      Move functions into member functions on reader() class.
      Drops time down to 212ms
    ------------------------------------------------------------
    revno: 2474.1.35
    merged: john at arbash-meinel.com-20070504223428-d7vwvp3f7ypn9ivv
    parent: john at arbash-meinel.com-20070504222904-6f6i8yxr9qpf8lpw
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 17:34:28 -0500
    message:
      Read the entries one at a time, rather than all at the beginning.
    ------------------------------------------------------------
    revno: 2474.1.34
    merged: john at arbash-meinel.com-20070504222904-6f6i8yxr9qpf8lpw
    parent: john at arbash-meinel.com-20070504221204-d9mjz2nl8fd5maxp
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 17:29:04 -0500
    message:
      Delay reading fields until in parse loop
    ------------------------------------------------------------
    revno: 2474.1.33
    merged: john at arbash-meinel.com-20070504221204-d9mjz2nl8fd5maxp
    parent: john at arbash-meinel.com-20070504220621-iwla6gmrtx7iy37s
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 17:12:04 -0500
    message:
      Using text.split() is down to 174ms
      We'll need some work to get the Reader version faster.
    ------------------------------------------------------------
    revno: 2474.1.32
    merged: john at arbash-meinel.com-20070504220621-iwla6gmrtx7iy37s
    parent: john at arbash-meinel.com-20070504214853-iqaht2z8963hdlr3
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 17:06:21 -0500
    message:
      Skip past the first entry while reading,
      rather than while processing.
    ------------------------------------------------------------
    revno: 2474.1.31
    merged: john at arbash-meinel.com-20070504214853-iqaht2z8963hdlr3
    parent: john at arbash-meinel.com-20070504214147-ckrxzu7bepvcs4ct
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 16:48:53 -0500
    message:
      Avoiding the string format unless there is actually a problem
      saves us almost 50ms (down to 242ms)
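
A small sketch of the pattern, with a hypothetical check and message: the error string is formatted only on the failure path, so the common case costs a single integer comparison.

```python
def check_field_count(actual: int, expected: int) -> None:
    """Raise if the field count is wrong; do no formatting otherwise."""
    if actual != expected:
        # The message is built only when there is actually a problem.
        raise AssertionError(
            "wrong number of fields: %d != %d" % (actual, expected))
```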
    ------------------------------------------------------------
    revno: 2474.1.30
    merged: john at arbash-meinel.com-20070504214147-ckrxzu7bepvcs4ct
    parent: john at arbash-meinel.com-20070504210438-cvtzgzh4xbad7kww
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 16:41:47 -0500
    message:
      Start working towards a parser which uses a Reader (producer)
      rather than working on a list of fields. Currently slower than text.split('\0'),
      but should be possible to avoid the intermediate list entirely.
    ------------------------------------------------------------
    revno: 2474.1.29
    merged: john at arbash-meinel.com-20070504210438-cvtzgzh4xbad7kww
    parent: john at arbash-meinel.com-20070504200015-yli1te8jfhk3xpjc
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 16:04:38 -0500
    message:
      Refactor, so that the inner _fields_to_entries function is the
      one doing the path comparison, and it will re-use the dirname object,
      rather than copying a new string each time.
      This should have equivalent performance, but have a rather large
      memory savings, because we don't maintain N copies of the dirname
      for N files in that directory.
      It (theoretically) will speed up some comparisons, too,
      because the string hash, etc, will be properly cached.
    ------------------------------------------------------------
    revno: 2474.1.28
    merged: john at arbash-meinel.com-20070504200015-yli1te8jfhk3xpjc
    parent: john at arbash-meinel.com-20070504194612-ryl2chfi4dd53c2h
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 15:00:15 -0500
    message:
      Ask the field converter to determine the current directory
      rather than parsing it out of the returned entry.
    ------------------------------------------------------------
    revno: 2474.1.27
    merged: john at arbash-meinel.com-20070504194612-ryl2chfi4dd53c2h
    parent: john at arbash-meinel.com-20070504192637-1tzys0ugbgy21fw9
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 14:46:12 -0500
    message:
      Switching to direct access of members of the list drops us down to 305ms
    ------------------------------------------------------------
    revno: 2474.1.26
    merged: john at arbash-meinel.com-20070504192637-1tzys0ugbgy21fw9
    parent: john at arbash-meinel.com-20070504192326-5f9kzev4v57if01r
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 14:26:37 -0500
    message:
      Switch to using an offset rather than doing a list splice
    ------------------------------------------------------------
    revno: 2474.1.25
    merged: john at arbash-meinel.com-20070504192326-5f9kzev4v57if01r
    parent: john at arbash-meinel.com-20070504190500-tq5wvnhmmd30m21y
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 14:23:26 -0500
    message:
      Refactor into a helper function to make implementation clearer
      This also improves performance to 319ms
    ------------------------------------------------------------
    revno: 2474.1.24
    merged: john at arbash-meinel.com-20070504190500-tq5wvnhmmd30m21y
    parent: john at arbash-meinel.com-20070504185936-1mjdoqmtz74xe5mg
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 14:05:00 -0500
    message:
      Unrolling into a direct loop drops us to 326ms
    ------------------------------------------------------------
    revno: 2474.1.23
    merged: john at arbash-meinel.com-20070504185936-1mjdoqmtz74xe5mg
    parent: john at arbash-meinel.com-20070504181128-422svqlutnl3v43d
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 13:59:36 -0500
    message:
      A C implementation of _fields_to_entry_0_parents drops the time from 400ms to 330ms for a 21k-entry tree
    ------------------------------------------------------------
    revno: 2474.1.22
    merged: john at arbash-meinel.com-20070504181128-422svqlutnl3v43d
    parent: john at arbash-meinel.com-20070504180557-iaitatth56jygggl
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 13:11:28 -0500
    message:
      Do the same renaming => py_ and c_ for _read_dirblocks
    ------------------------------------------------------------
    revno: 2474.1.21
    merged: john at arbash-meinel.com-20070504180557-iaitatth56jygggl
    parent: john at arbash-meinel.com-20070504174616-4kdi7zi32h7ev4f9
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 13:05:57 -0500
    message:
      Cleanup the multiple testing.
      Change the function names from both being 'bisect_dirblocks' to being
      py_bisect_dirblocks and c_bisect_dirblocks.
      And enable using the compiled form when it is available.
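
A sketch of how a pure-Python bisect and an optional compiled replacement might be wired together. The module name _hypothetical_helpers_c is invented for this example, and the bisect body is an assumed reconstruction of a search over (dirname, entries) blocks ordered by path components.

```python
def py_bisect_dirblock(dirblocks, dirname, lo=0, hi=None):
    """Return the index where a block for dirname is, or would be inserted."""
    if hi is None:
        hi = len(dirblocks)
    key = dirname.split(b"/")
    while lo < hi:
        mid = (lo + hi) // 2
        # Blocks are ordered by path components, not by raw bytes.
        if dirblocks[mid][0].split(b"/") < key:
            lo = mid + 1
        else:
            hi = mid
    return lo


try:
    # Hypothetical compiled module; absent unless the extension was built.
    from _hypothetical_helpers_c import c_bisect_dirblock as bisect_dirblock
except ImportError:
    bisect_dirblock = py_bisect_dirblock
```

Callers use bisect_dirblock without caring which implementation is bound. A test that checks which function the name is bound to can catch an import that fails for the wrong reason.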
    ------------------------------------------------------------
    revno: 2474.1.20
    merged: john at arbash-meinel.com-20070504174616-4kdi7zi32h7ev4f9
    parent: john at arbash-meinel.com-20070504173600-5reyrpo013nk17sr
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 12:46:16 -0500
    message:
      Apply all of the tests for DirState.bisect_dirblock to the compiled function.
    ------------------------------------------------------------
    revno: 2474.1.19
    merged: john at arbash-meinel.com-20070504173600-5reyrpo013nk17sr
    parent: john at arbash-meinel.com-20070504163523-69dypgt24ipo26p2
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 12:36:00 -0500
    message:
      Clean up _cmp_dirblock_strings_alt to make it the default.
      This improves bisect_dirblock_compiled by another 2x.
      So far the improvement is now 800ms => 100ms => 50ms with the current
      function.
    ------------------------------------------------------------
    revno: 2474.1.18
    merged: john at arbash-meinel.com-20070504163523-69dypgt24ipo26p2
    parent: john at arbash-meinel.com-20070504161941-7n3we92jhxnczl5a
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 11:35:23 -0500
    message:
      Add an integer-size comparison loop at the beginning, and
      update the test suite to make sure we are properly exercising it.
    ------------------------------------------------------------
    revno: 2474.1.17
    merged: john at arbash-meinel.com-20070504161941-7n3we92jhxnczl5a
    parent: john at arbash-meinel.com-20070504161120-wyplkl21ctqbq2ka
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 11:19:41 -0500
    message:
      Using a custom loop seems to be the same speed, but is probably
      easier to understand.
    ------------------------------------------------------------
    revno: 2474.1.16
    merged: john at arbash-meinel.com-20070504161120-wyplkl21ctqbq2ka
    parent: john at arbash-meinel.com-20070504160330-jai9q6h8ts1ddb2i
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 11:11:20 -0500
    message:
      Shave off maybe 10% by using the PyString_* macros instead of functions.
    ------------------------------------------------------------
    revno: 2474.1.15
    merged: john at arbash-meinel.com-20070504160330-jai9q6h8ts1ddb2i
    parent: john at arbash-meinel.com-20070504160216-v19b36wj16g0awwi
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 11:03:30 -0500
    message:
      No need to benchmark bisect_dirblock_compiled_cached
      The cache isn't used in the compiled form.
    ------------------------------------------------------------
    revno: 2474.1.14
    merged: john at arbash-meinel.com-20070504160216-v19b36wj16g0awwi
    parent: john at arbash-meinel.com-20070504155015-l31mrfviixrrf277
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 11:02:16 -0500
    message:
      Switch bisect_dirblocks to remove the extra .split('/').
      This is a massive improvement (approx 8x),
      since we avoid all the temporary lists, dictionary lookups, etc.
      Now we just have a custom string comparison, which is quite fast.
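
The idea in the message above, comparing two paths in directory order without building temporary lists, can be sketched in plain Python. This is a hedged illustration rather than the Pyrex code from the branch: the function name `cmp_paths_by_dirs` is hypothetical, and it assumes the intended ordering is the one given by comparing `path.split('/')` lists.

```python
def cmp_paths_by_dirs(path1, path2):
    """Compare two paths as if each were split on '/', without splitting.

    Returns -1, 0, or 1. At the first differing character, a '/' sorts
    before any other character, because it ends the current path
    component and a shorter component sorts first.
    """
    for c1, c2 in zip(path1, path2):
        if c1 != c2:
            if c1 == '/':
                return -1
            if c2 == '/':
                return 1
            return -1 if c1 < c2 else 1
    # One path is a prefix of the other (or they are equal):
    # the shorter one sorts first.
    return (len(path1) > len(path2)) - (len(path1) < len(path2))
```

For example, `'a/b'` sorts before `'a-b'` under this ordering even though `'-'` is less than `'/'` in ASCII, which is why a plain string comparison is not enough.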
    ------------------------------------------------------------
    revno: 2474.1.13
    merged: john at arbash-meinel.com-20070504155015-l31mrfviixrrf277
    parent: john at arbash-meinel.com-20070504154346-fgz2nrtwtd8u9w6a
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 10:50:15 -0500
    message:
      Now that we have bisect_dirblock working again, bring back cmp_dirblock_strings.
    ------------------------------------------------------------
    revno: 2474.1.12
    merged: john at arbash-meinel.com-20070504154346-fgz2nrtwtd8u9w6a
    parent: john at arbash-meinel.com-20070504044714-xgbrxg27p83yis89
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Fri 2007-05-04 10:43:46 -0500
    message:
      Clean up bisect_dirstate to not use temporary variables.
    ------------------------------------------------------------
    revno: 2474.1.11
    merged: john at arbash-meinel.com-20070504044714-xgbrxg27p83yis89
    parent: john at arbash-meinel.com-20070504043751-5unx865kqw9scyyu
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Thu 2007-05-03 23:47:14 -0500
    message:
      Avoid a Py_INCREF by using a void *
    ------------------------------------------------------------
    revno: 2474.1.10
    merged: john at arbash-meinel.com-20070504043751-5unx865kqw9scyyu
    parent: john at arbash-meinel.com-20070504041902-r5vxd4xpkduhbd0b
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Thu 2007-05-03 23:37:51 -0500
    message:
      Explicitly calling Py_INCREF makes things happier again.
    ------------------------------------------------------------
    revno: 2474.1.9
    merged: john at arbash-meinel.com-20070504041902-r5vxd4xpkduhbd0b
    parent: john at arbash-meinel.com-20070504041242-lnhinwkv7wvsejg0
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Thu 2007-05-03 23:19:02 -0500
    message:
      Revert the pyrex implementation to its most basic form.
      The fancier ones were causing segfaults.
    ------------------------------------------------------------
    revno: 2474.1.8
    merged: john at arbash-meinel.com-20070504041242-lnhinwkv7wvsejg0
    parent: john at arbash-meinel.com-20070504035829-orbif7nnkim9md1t
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Thu 2007-05-03 23:12:42 -0500
    message:
      Fix the benchmarks to test what I thought I was testing earlier
    ------------------------------------------------------------
    revno: 2474.1.7
    merged: john at arbash-meinel.com-20070504035829-orbif7nnkim9md1t
    parent: john at arbash-meinel.com-20070503234531-xt0tpuxwqgjn10l8
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Thu 2007-05-03 22:58:29 -0500
    message:
      Add some tests for a helper function that lets us
      compare 2 paths in 'dirblock' mode, without splitting the strings.
    ------------------------------------------------------------
    revno: 2474.1.6
    merged: john at arbash-meinel.com-20070503234531-xt0tpuxwqgjn10l8
    parent: john at arbash-meinel.com-20070503234105-xwv4fcxn26d97d6u
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Thu 2007-05-03 18:45:31 -0500
    message:
      Use 10x the directories so the timing falls around the 1s mark
    ------------------------------------------------------------
    revno: 2474.1.5
    merged: john at arbash-meinel.com-20070503234105-xwv4fcxn26d97d6u
    parent: john at arbash-meinel.com-20070503233549-445n015iomhc8ppm
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Thu 2007-05-03 18:41:05 -0500
    message:
      Implement explicit handling of the no-cache version, which is even faster.
    ------------------------------------------------------------
    revno: 2474.1.4
    merged: john at arbash-meinel.com-20070503233549-445n015iomhc8ppm
    parent: john at arbash-meinel.com-20070503233314-btj1vbd2qtod34kq
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Thu 2007-05-03 18:35:49 -0500
    message:
      Add benchmarks for dirstate.bisect_dirblocks, and implement bisect_dirblocks in pyrex.
      Shows about a 2x performance improvement from being compiled to C.
      Also, at least on my Mac, it is faster without extra caching.
    ------------------------------------------------------------
    revno: 2474.1.3
    merged: john at arbash-meinel.com-20070503233314-btj1vbd2qtod34kq
    parent: john at arbash-meinel.com-20070503211741-b51wshh2i5ecw50i
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Thu 2007-05-03 18:33:14 -0500
    message:
      remove the .c file for now, so it doesn't clutter things
    ------------------------------------------------------------
    revno: 2474.1.2
    merged: john at arbash-meinel.com-20070503211741-b51wshh2i5ecw50i
    parent: john at arbash-meinel.com-20070503201137-qiijh6rvjo9p14wy
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Thu 2007-05-03 16:17:41 -0500
    message:
      Add benchmark tests for a couple DirState functions.
    ------------------------------------------------------------
    revno: 2474.1.1
    merged: john at arbash-meinel.com-20070503201137-qiijh6rvjo9p14wy
    parent: pqm at pqm.ubuntu.com-20070430223205-x4uyrteryh0230fp
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: dirstate_pyrex
    timestamp: Thu 2007-05-03 15:11:37 -0500
    message:
      Create a Pyrex extension for reading the dirstate file.

Diff too large for email (2930 lines, the limit is 1000).


