Rev 31: Bring in the 'rabin' experiment. in http://bazaar.launchpad.net/%7Ebzr/bzr-groupcompress/trunk

John Arbash Meinel john at arbash-meinel.com
Wed Mar 4 16:05:47 GMT 2009


At http://bazaar.launchpad.net/%7Ebzr/bzr-groupcompress/trunk

------------------------------------------------------------
revno: 31
revision-id: john at arbash-meinel.com-20090304160155-66iy2jorb5h39n6d
parent: robertc at robertcollins.net-20090302205544-kmcaa6d3stdbddda
parent: john at arbash-meinel.com-20090304153824-86p8mekizpx70bkr
committer: John Arbash Meinel <john at arbash-meinel.com>
branch nick: trunk
timestamp: Wed 2009-03-04 10:01:55 -0600
message:
  Bring in the 'rabin' experiment.
  Change the names and disk-strings for the various repository formats.
  Make the CHK format repositories all 'rich-root'; we can introduce non-rich-root later.
  Make a couple other small tweaks, like copyright statements, etc.
  Remove patch-delta.c. At this point it was only a reference implementation,
  as we have fully integrated the patching into pyrex, to allow nicer exception
  handling.
added:
  delta.h                        delta.h-20090227173129-qsu3u43vowf1q3ay-1
  diff-delta.c                   diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
renamed:
  _groupcompress_c.pyx => _groupcompress_pyx.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
  tests/test__groupcompress_c.py => tests/test__groupcompress_pyx.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
modified:
  .bzrignore                     bzrignore-20080724041812-7jbgn9euewwtns1u-1
  TODO                           todo-20080705181503-ccbxd6xuy1bdnrpu-5
  __init__.py                    __init__.py-20080705181503-ccbxd6xuy1bdnrpu-6
  groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
  repofmt.py                     repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
  setup.py                       setup.py-20080705181503-ccbxd6xuy1bdnrpu-9
  tests/__init__.py              __init__.py-20080705181503-ccbxd6xuy1bdnrpu-11
  tests/test_groupcompress.py    test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
  _groupcompress_pyx.pyx         _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
  tests/test__groupcompress_pyx.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
    ------------------------------------------------------------
    revno: 28.4.59
    revision-id: john at arbash-meinel.com-20090304153824-86p8mekizpx70bkr
    parent: john at arbash-meinel.com-20090304152748-iqp4zqlzvnq5pm23
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: rabin
    timestamp: Wed 2009-03-04 09:38:24 -0600
    message:
      TODO entry.
    modified:
      TODO                           todo-20080705181503-ccbxd6xuy1bdnrpu-5
    ------------------------------------------------------------
    revno: 28.4.58
    revision-id: john at arbash-meinel.com-20090304152748-iqp4zqlzvnq5pm23
    parent: john at arbash-meinel.com-20090304150015-b6o2fru8grx5ubpm
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: rabin
    timestamp: Wed 2009-03-04 09:27:48 -0600
    message:
      fix up the failing tests.
      
      The new delta code needs a 16-byte window to match, so to *know* that there will
      be a match, you need ~32 bytes in common (which guarantees that some 16-byte
      window within that 32-byte range will match).
      Also, when setting 'max_delta', it is possible that we run out of bytes before
      we actually find the last match, the one that would make things compress better.
      This is rare in practice, because texts are longer than 40 bytes. But it happens
      in testing.
    modified:
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
      tests/test_groupcompress.py    test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
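
The "~32 bytes" figure in the message above can be checked with a small
sketch. It assumes (this is not stated in the log) that the source is indexed
only at non-overlapping 16-byte blocks starting at multiples of 16, so a
common region is only guaranteed to match if it fully covers one such block.
The function names are hypothetical.

```python
WINDOW = 16  # window size named in the commit message


def contains_aligned_window(start, length, window=WINDOW):
    """True if [start, start + length) fully covers a block that
    begins at a multiple of `window` (assumed indexing scheme)."""
    first_aligned = -(-start // window) * window  # round start up
    return first_aligned + window <= start + length


def min_guaranteed_length(window=WINDOW):
    """Smallest common-region length that covers an aligned block
    no matter where the region starts."""
    length = window
    while not all(contains_aligned_window(s, length, window)
                  for s in range(window)):
        length += 1
    return length
```

Under that assumption the worst case is a region starting one byte past a
block boundary, which is why roughly twice the window size is needed.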
    ------------------------------------------------------------
    revno: 28.4.57
    revision-id: john at arbash-meinel.com-20090304150015-b6o2fru8grx5ubpm
    parent: john at arbash-meinel.com-20090304042506-zaf29b1u9jnajp2u
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: rabin
    timestamp: Wed 2009-03-04 09:00:15 -0600
    message:
      Change the formatting, replace \t with spaces to be consistent with bzr coding.
    modified:
      delta.h                        delta.h-20090227173129-qsu3u43vowf1q3ay-1
      diff-delta.c                   diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
    ------------------------------------------------------------
    revno: 28.4.56
    revision-id: john at arbash-meinel.com-20090304042506-zaf29b1u9jnajp2u
    parent: john at arbash-meinel.com-20090303225027-dd26kj3xasgfi7bv
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: rabin
    timestamp: Tue 2009-03-03 22:25:06 -0600
    message:
      update TODO a little bit.
    modified:
      TODO                           todo-20080705181503-ccbxd6xuy1bdnrpu-5
    ------------------------------------------------------------
    revno: 28.4.55
    revision-id: john at arbash-meinel.com-20090303225027-dd26kj3xasgfi7bv
    parent: john at arbash-meinel.com-20090303222649-n917r5v7ti7szu5r
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: rabin
    timestamp: Tue 2009-03-03 16:50:27 -0600
    message:
      Make sure the default is _FAST=False for now.
    modified:
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
    ------------------------------------------------------------
    revno: 28.4.54
    revision-id: john at arbash-meinel.com-20090303222649-n917r5v7ti7szu5r
    parent: john at arbash-meinel.com-20090303221259-ghe53xhqu8igvz03
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: rabin
    timestamp: Tue 2009-03-03 16:26:49 -0600
    message:
      'bzr pack' _FAST during compress() now is 32s versus 25s.
      However, I'm extending _FAST to also stop checking the sha1 sums,
      with that change, _FAST is 20s versus 32s.
      It is a bit dangerous without the sha1 checking, but it is nice
      to see as a 'how fast can we make it', once we are sure about
      correctness.
    modified:
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
    ------------------------------------------------------------
    revno: 28.4.53
    revision-id: john at arbash-meinel.com-20090303221259-ghe53xhqu8igvz03
    parent: john at arbash-meinel.com-20090303220215-1luhz4zfr9vrdmud
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: rabin
    timestamp: Tue 2009-03-03 16:12:59 -0600
    message:
      Remove the temporary adjustment for handling multiple formats of labels.
      Update the maximum size source array.
      I was hitting 16k sources in a single group, and I didn't want to write the code
      that resizes sources and then adjusts the existing index pointers.
      That should be done, though.
    modified:
      _groupcompress_pyx.pyx         _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
    ------------------------------------------------------------
    revno: 28.4.52
    revision-id: john at arbash-meinel.com-20090303220215-1luhz4zfr9vrdmud
    parent: john at arbash-meinel.com-20090303214221-ea1e84bkmi22yfgk
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: rabin
    timestamp: Tue 2009-03-03 16:02:15 -0600
    message:
      Use the max_delta flag.
      Prefer to extract and compress bytes rather than chunks/lines.
      This has a fairly positive impact on the 'bzr pack' times.
      We still do a ''.join([bytes]), but we know that doesn't have
      to do any memory copying.
    modified:
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
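
A minimal sketch of the "no memory copying" remark above. It relies on a
CPython implementation detail rather than a language guarantee: joining a
one-element list of an exact bytes object hands back that same object. The
log refers to Python 2 strings; bytes on Python 3 is used here as a stand-in,
and `join_single` is a hypothetical helper.

```python
def join_single(chunk):
    """Join a one-element list, as the ''.join([bytes]) call does."""
    return b"".join([chunk])


data = b"x" * 1024
# On CPython the joined result is the very same object, so no copy is made.
same_object = join_single(data) is data
```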
    ------------------------------------------------------------
    revno: 28.4.51
    revision-id: john at arbash-meinel.com-20090303214221-ea1e84bkmi22yfgk
    parent: john at arbash-meinel.com-20090303212302-lemyfgzfyq0l7ojl
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: rabin
    timestamp: Tue 2009-03-03 15:42:21 -0600
    message:
      Remove the debug printing.
    modified:
      diff-delta.c                   diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
    ------------------------------------------------------------
    revno: 28.4.50
    revision-id: john at arbash-meinel.com-20090303212302-lemyfgzfyq0l7ojl
    parent: john at arbash-meinel.com-20090303210721-m25wehoeo3jxsz11
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Tue 2009-03-03 15:23:02 -0600
    message:
      Change the code to do the copies in bigger chunks.
      
      We should be able to get a small number of memcopies, rather than having to copy
      each record individually, or copy each hash range individually.
    modified:
      diff-delta.c                   diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
    ------------------------------------------------------------
    revno: 28.4.49
    revision-id: john at arbash-meinel.com-20090303210721-m25wehoeo3jxsz11
    parent: john at arbash-meinel.com-20090303203526-o9xw0n70j2g622e0
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Tue 2009-03-03 15:07:21 -0600
    message:
      When adding new entries to the delta index, use memcpy
      rather than copying them one by one.
    modified:
      _groupcompress_pyx.pyx         _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
      diff-delta.c                   diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
    ------------------------------------------------------------
    revno: 28.4.48
    revision-id: john at arbash-meinel.com-20090303203526-o9xw0n70j2g622e0
    parent: john at arbash-meinel.com-20090303200908-hjdzbzj0cs6zua2v
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Tue 2009-03-03 14:35:26 -0600
    message:
      Remove bogus line.
    modified:
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
    ------------------------------------------------------------
    revno: 28.4.47
    revision-id: john at arbash-meinel.com-20090303200908-hjdzbzj0cs6zua2v
    parent: john at arbash-meinel.com-20090303200711-qc4qoqyrnpyla6iz
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Tue 2009-03-03 14:09:08 -0600
    message:
      Use the new add_delta_source.
      
      It shaves off a small amount of time, and improves the compression slightly.
      Next step is to work on optimizing the code.
      It feels like the include_entries_from_index is wasting a lot of time
      double copying all of the previous matches.
    modified:
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
    ------------------------------------------------------------
    revno: 28.4.46
    revision-id: john at arbash-meinel.com-20090303200711-qc4qoqyrnpyla6iz
    parent: john at arbash-meinel.com-20090303195329-epc5tn11m2jmo7rm
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Tue 2009-03-03 14:07:11 -0600
    message:
      Fix a bug in create_delta_index_from_delta when inserting into an already filled hash location.
    modified:
      diff-delta.c                   diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
    ------------------------------------------------------------
    revno: 28.4.45
    revision-id: john at arbash-meinel.com-20090303195329-epc5tn11m2jmo7rm
    parent: john at arbash-meinel.com-20090303181057-i1239vipqi27fxbs
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Tue 2009-03-03 13:53:29 -0600
    message:
      Add a function that updates the index for delta bytes.
      This avoids indexing control bytes, and helps to align the actual index pointers
      to the real data.
    modified:
      _groupcompress_pyx.pyx         _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
      delta.h                        delta.h-20090227173129-qsu3u43vowf1q3ay-1
      diff-delta.c                   diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
      tests/test__groupcompress_pyx.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
    ------------------------------------------------------------
    revno: 28.4.44
    revision-id: john at arbash-meinel.com-20090303181057-i1239vipqi27fxbs
    parent: john at arbash-meinel.com-20090303180544-mfgw9jsndwiwj047
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: rabin
    timestamp: Tue 2009-03-03 12:10:57 -0600
    message:
      Remove the multi-index handling now that we have index combining instead.
    modified:
      _groupcompress_pyx.pyx         _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
      delta.h                        delta.h-20090227173129-qsu3u43vowf1q3ay-1
      diff-delta.c                   diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
    ------------------------------------------------------------
    revno: 28.4.43
    revision-id: john at arbash-meinel.com-20090303180544-mfgw9jsndwiwj047
    parent: john at arbash-meinel.com-20090303163107-l4j0114btw2efmjp
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: rabin
    timestamp: Tue 2009-03-03 12:05:44 -0600
    message:
      Change the internals to allow delta indexes to be expanded with new source data.
      Now when adding a new source, the old index entries are included in the new structure.
      This generally seems to be better than having multiple indexes, as it improves the
      efficiency of the internal hash map, and avoids extra iterating.
      Bring back the _FAST flag. At the moment, with _FAST=True, doing bzr pack is about
      37s rather than 1min, and gives 9.7MB texts, rather than 8.2MB or so.
      So at the moment, it is still a useful flag to have.
    modified:
      _groupcompress_pyx.pyx         _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
      delta.h                        delta.h-20090227173129-qsu3u43vowf1q3ay-1
      diff-delta.c                   diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
      tests/test__groupcompress_pyx.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
    ------------------------------------------------------------
    revno: 28.4.42
    revision-id: john at arbash-meinel.com-20090303163107-l4j0114btw2efmjp
    parent: john at arbash-meinel.com-20090303160222-4bkou2s65s60h75a
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: rabin
    timestamp: Tue 2009-03-03 10:31:07 -0600
    message:
      Change the code around again.
      
      This time, the information about sources is maintained in the DeltaIndex object.
      And we pass that info down into create_delta_index, et al.
      
      Next step is to actually combine the delta indexes.
    modified:
      _groupcompress_pyx.pyx         _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
      delta.h                        delta.h-20090227173129-qsu3u43vowf1q3ay-1
      diff-delta.c                   diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
    ------------------------------------------------------------
    revno: 28.4.41
    revision-id: john at arbash-meinel.com-20090303160222-4bkou2s65s60h75a
    parent: john at arbash-meinel.com-20090303150939-93yexh0v5hmvkwdo
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: rabin
    timestamp: Tue 2009-03-03 10:02:22 -0600
    message:
      Start moving the information about source buffers into the actual index_entry.
      
      This leads the way for combining indexes for multiple sources together.
    modified:
      diff-delta.c                   diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
    ------------------------------------------------------------
    revno: 28.4.40
    revision-id: john at arbash-meinel.com-20090303150939-93yexh0v5hmvkwdo
    parent: john at arbash-meinel.com-20090303150400-3il0kyvau1ho5vww
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: rabin
    timestamp: Tue 2009-03-03 09:09:39 -0600
    message:
      Add a comment why we aren't using the list type for _sources
    modified:
      _groupcompress_pyx.pyx         _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
    ------------------------------------------------------------
    revno: 28.4.39
    revision-id: john at arbash-meinel.com-20090303150400-3il0kyvau1ho5vww
    parent: john at arbash-meinel.com-20090303145931-5ahrrw6hycii49xj
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: rabin
    timestamp: Tue 2009-03-03 09:04:00 -0600
    message:
      Merge the setup.py changes so that it actually fails if an extension fails to build.
    modified:
      setup.py                       setup.py-20080705181503-ccbxd6xuy1bdnrpu-9
    ------------------------------------------------------------
    revno: 28.4.38
    revision-id: john at arbash-meinel.com-20090303145931-5ahrrw6hycii49xj
    parent: john at arbash-meinel.com-20090303144815-zdo0ak0vjclvx6y3
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: rabin
    timestamp: Tue 2009-03-03 08:59:31 -0600
    message:
      fix the local offset problem in a slightly different way.
      Leave moff in local offsets until encoding, and then convert.
      This allows us to skip the extra local variable, and just looks a bit cleaner, IMO.
    modified:
      diff-delta.c                   diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
    ------------------------------------------------------------
    revno: 28.4.37
    revision-id: john at arbash-meinel.com-20090303144815-zdo0ak0vjclvx6y3
    parent: john at arbash-meinel.com-20090303141551-qhokyhnloc1qsznh
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: rabin
    timestamp: Tue 2009-03-03 08:48:15 -0600
    message:
      If you are going to join the bytes anyway, use sha_string instead of sha_strings.
    modified:
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
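
The change above works because hashing the joined bytes once yields the same
digest as feeding the chunks one at a time. A short sketch using only
hashlib; the two helpers are hypothetical stand-ins, not bzrlib's sha_string
and sha_strings.

```python
import hashlib


def sha_of_chunks(chunks):
    """Hash a sequence of byte chunks incrementally."""
    h = hashlib.sha1()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def sha_of_bytes(data):
    """Hash one already-joined byte string in a single call."""
    return hashlib.sha1(data).hexdigest()


chunks = [b"first line\n", b"second line\n"]
joined = b"".join(chunks)
```

If the joined bytes are needed anyway, the single-call form avoids walking
the chunk list a second time.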
    ------------------------------------------------------------
    revno: 28.4.36
    revision-id: john at arbash-meinel.com-20090303141551-qhokyhnloc1qsznh
    parent: john at arbash-meinel.com-20090303021815-dlqfgperty1bwnv1
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: rabin
    timestamp: Tue 2009-03-03 08:15:51 -0600
    message:
      Track down a memory leak in the refactored diff-delta.c code.
      
      We weren't deallocating the unpacked hash array in all code paths.
    modified:
      diff-delta.c                   diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
    ------------------------------------------------------------
    revno: 28.4.35
    revision-id: john at arbash-meinel.com-20090303021815-dlqfgperty1bwnv1
    parent: john at arbash-meinel.com-20090303021638-20p6dywzjesch07v
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: rabin
    timestamp: Mon 2009-03-02 20:18:15 -0600
    message:
      Add a rich-root compatible gcr+chk255+rich-root format.
    modified:
      __init__.py                    __init__.py-20080705181503-ccbxd6xuy1bdnrpu-6
      repofmt.py                     repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
    ------------------------------------------------------------
    revno: 28.4.34
    revision-id: john at arbash-meinel.com-20090303021638-20p6dywzjesch07v
    parent: john at arbash-meinel.com-20090302223828-hyb4crn4w28sgvmc
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: rabin
    timestamp: Mon 2009-03-02 20:16:38 -0600
    message:
      Update groupcompress to allow it to read older conversions.
      This will be removed, but I needed it for testing.
    modified:
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
    ------------------------------------------------------------
    revno: 28.4.33
    revision-id: john at arbash-meinel.com-20090302223828-hyb4crn4w28sgvmc
    parent: john at arbash-meinel.com-20090302210223-9ixutqay7sx8c1n3
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: rabin
    timestamp: Mon 2009-03-02 16:38:28 -0600
    message:
      Fix a bug when handling multiple large-range copies.
      
      We were adjusting moff multiple times, without adjusting it back.
    modified:
      _groupcompress_pyx.pyx         _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
      diff-delta.c                   diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
    ------------------------------------------------------------
    revno: 28.4.32
    revision-id: john at arbash-meinel.com-20090302210223-9ixutqay7sx8c1n3
    parent: john at arbash-meinel.com-20090302202718-c7ojzhft35boi1kn
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: rabin
    timestamp: Mon 2009-03-02 15:02:23 -0600
    message:
      Refactor the code a bit, so that I can re-use bits for a create_delta_index_from_delta.
    modified:
      _groupcompress_pyx.pyx         _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
      diff-delta.c                   diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
    ------------------------------------------------------------
    revno: 28.4.31
    revision-id: john at arbash-meinel.com-20090302202718-c7ojzhft35boi1kn
    parent: john at arbash-meinel.com-20090302201609-k275n1rspptl2ve3
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: rabin
    timestamp: Mon 2009-03-02 14:27:18 -0600
    message:
      Add a bit of comments about things to do.
    modified:
      diff-delta.c                   diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
    ------------------------------------------------------------
    revno: 28.4.30
    revision-id: john at arbash-meinel.com-20090302201609-k275n1rspptl2ve3
    parent: john at arbash-meinel.com-20090302200018-si0py093o7esxzyd
    parent: john at arbash-meinel.com-20090302200837-l2v96rd0e6u68479
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Mon 2009-03-02 14:16:09 -0600
    message:
      Merge in Ian's groupcompress trunk updates
    modified:
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
      repofmt.py                     repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
    ------------------------------------------------------------
    revno: 28.4.29
    revision-id: john at arbash-meinel.com-20090302200018-si0py093o7esxzyd
    parent: john at arbash-meinel.com-20090302195421-5j3s3xzr2r8y80bw
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Mon 2009-03-02 14:00:18 -0600
    message:
      Forgot to add the delta bytes to the index objects.
      Also add an assertion to make sure things like that don't get missed.
    modified:
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
    ------------------------------------------------------------
    revno: 28.4.28
    revision-id: john at arbash-meinel.com-20090302195421-5j3s3xzr2r8y80bw
    parent: john at arbash-meinel.com-20090302194337-f0x1quasnm4p7x9m
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Mon 2009-03-02 13:54:21 -0600
    message:
      Gotta import 'trace' if you want to use trace.mutter()
    modified:
      repofmt.py                     repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
    ------------------------------------------------------------
    revno: 28.4.27
    revision-id: john at arbash-meinel.com-20090302194337-f0x1quasnm4p7x9m
    parent: john at arbash-meinel.com-20090302193629-51hqsvh1rhh71gku
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Mon 2009-03-02 13:43:37 -0600
    message:
      Fix up some failing tests.
    modified:
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
      tests/test_groupcompress.py    test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
    ------------------------------------------------------------
    revno: 28.4.26
    revision-id: john at arbash-meinel.com-20090302193629-51hqsvh1rhh71gku
    parent: john at arbash-meinel.com-20090302191537-7mvjwk2042fvj9gg
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Mon 2009-03-02 13:36:29 -0600
    message:
      We now start to make use of the ability to extend the delta index
      with new sources. Next step is to understand the delta encoding, so as to
      avoid linking up with lines in the deltas.
    modified:
      _groupcompress_pyx.pyx         _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
      diff-delta.c                   diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
      tests/test__groupcompress_pyx.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
    ------------------------------------------------------------
    revno: 28.4.25
    revision-id: john at arbash-meinel.com-20090302191537-7mvjwk2042fvj9gg
    parent: john at arbash-meinel.com-20090302185236-gm5ckgaic13q6vvs
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Mon 2009-03-02 13:15:37 -0600
    message:
      We are now able to add multiple sources to the delta generator.
    modified:
      _groupcompress_pyx.pyx         _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
      tests/test__groupcompress_pyx.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
    ------------------------------------------------------------
    revno: 28.4.24
    revision-id: john at arbash-meinel.com-20090302185236-gm5ckgaic13q6vvs
    parent: john at arbash-meinel.com-20090302180420-8m229eh99p2bp2r5
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Mon 2009-03-02 12:52:36 -0600
    message:
      Change the code so that we can pass in multiple sources to match against.
      At the moment, we only use a single source, but that will soon change.
    modified:
      _groupcompress_pyx.pyx         _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
      delta.h                        delta.h-20090227173129-qsu3u43vowf1q3ay-1
      diff-delta.c                   diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
    ------------------------------------------------------------
    revno: 28.4.23
    revision-id: john at arbash-meinel.com-20090302180420-8m229eh99p2bp2r5
    parent: john at arbash-meinel.com-20090302180323-cx4qz36qnmd0dnki
    parent: john at arbash-meinel.com-20090302160108-9pl56rebxcd23w35
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Mon 2009-03-02 12:04:20 -0600
    message:
      Merge the gc for pyrex 0.9.6.4 updates
    modified:
      _groupcompress_pyx.pyx         _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
        ------------------------------------------------------------
        revno: 28.5.1
        revision-id: john at arbash-meinel.com-20090302160108-9pl56rebxcd23w35
        parent: john at arbash-meinel.com-20090228050444-38soix727ge8yhvn
        committer: John Arbash Meinel <john at arbash-meinel.com>
        branch nick: groupcompress_rabin
        timestamp: Mon 2009-03-02 10:01:08 -0600
        message:
          Make the groupcompress pyrex extension compatible with pyrex 0.9.6.4
          Also fix a bug in processing the offsets.
        modified:
          _groupcompress_c.pyx           _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
    ------------------------------------------------------------
    revno: 28.4.22
    revision-id: john at arbash-meinel.com-20090302180323-cx4qz36qnmd0dnki
    parent: john at arbash-meinel.com-20090302170533-v13igzvtt0hf7y2z
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Mon 2009-03-02 12:03:23 -0600
    message:
      Add a mutter() while repacking, so that we log progress as we go along.
    modified:
      repofmt.py                     repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
    ------------------------------------------------------------
    revno: 28.4.21
    revision-id: john at arbash-meinel.com-20090302170533-v13igzvtt0hf7y2z
    parent: john at arbash-meinel.com-20090228050444-38soix727ge8yhvn
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Mon 2009-03-02 11:05:33 -0600
    message:
      Rename the extension to _pyx, since Robert prefers that form
    renamed:
      _groupcompress_c.pyx => _groupcompress_pyx.pyx _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
      tests/test__groupcompress_c.py => tests/test__groupcompress_pyx.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
    modified:
      .bzrignore                     bzrignore-20080724041812-7jbgn9euewwtns1u-1
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
      setup.py                       setup.py-20080705181503-ccbxd6xuy1bdnrpu-9
      tests/__init__.py              __init__.py-20080705181503-ccbxd6xuy1bdnrpu-11
      tests/test__groupcompress_pyx.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
    ------------------------------------------------------------
    revno: 28.4.20
    revision-id: john at arbash-meinel.com-20090228050444-38soix727ge8yhvn
    parent: john at arbash-meinel.com-20090228050349-5b5fljgovy1ylokx
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Fri 2009-02-27 23:04:44 -0600
    message:
      For now, use _FAST=True
      
      This could be a reasonable 'autopack' configuration, if DeltaIndex.extend()
      ends up being too difficult to implement.
    modified:
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
    ------------------------------------------------------------
    revno: 28.4.19
    revision-id: john at arbash-meinel.com-20090228050349-5b5fljgovy1ylokx
    parent: john at arbash-meinel.com-20090228044639-zhrn3p7ykngc0zs4
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Fri 2009-02-27 23:03:49 -0600
    message:
      Implement a 'FAST' mode.
      
      If we insert a text and get a 'decent' delta, then we just keep using
      that delta_index until we get a bad insert. (delta > 1/2 size).
      In this mode 'bzr pack' drops from 2m41s => 53s. Inventory pages
      are barely affected in size, while Text pages go from 8.2MB => 9.6MB.
    modified:
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
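
The 'FAST' rule described above can be sketched as a simple threshold check:
keep reusing the existing delta index until an insert produces a delta larger
than half the text. The function names and the (text_len, delta_len) input
shape are assumptions made for illustration, not the code in groupcompress.py.

```python
def is_bad_insert(text_len, delta_len):
    """A delta counts as 'bad' when it exceeds half the text size."""
    return delta_len * 2 > text_len


def plan_index_rebuilds(inserts):
    """Given (text_len, delta_len) pairs in insertion order, return the
    positions after which the delta index would be rebuilt."""
    return [i for i, (text_len, delta_len) in enumerate(inserts)
            if is_bad_insert(text_len, delta_len)]
```

For example, with inserts of (100, 10), (100, 60) and (80, 40), only the
second one crosses the half-size threshold.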
    ------------------------------------------------------------
    revno: 28.4.18
    revision-id: john at arbash-meinel.com-20090228044639-zhrn3p7ykngc0zs4
    parent: john at arbash-meinel.com-20090228044347-vjb5fzj5s9cd8a7c
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Fri 2009-02-27 22:46:39 -0600
    message:
      Add some profiling comments.
    modified:
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
    ------------------------------------------------------------
    revno: 28.4.17
    revision-id: john at arbash-meinel.com-20090228044347-vjb5fzj5s9cd8a7c
    parent: john at arbash-meinel.com-20090228042933-zdoupq6lka7lyvg9
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Fri 2009-02-27 22:43:47 -0600
    message:
      Create a wrapper function, so that lsprof will properly attribute time spent.
    modified:
      _groupcompress_c.pyx           _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
    ------------------------------------------------------------
    revno: 28.4.16
    revision-id: john at arbash-meinel.com-20090228042933-zdoupq6lka7lyvg9
    parent: john at arbash-meinel.com-20090228042802-joang5uih4qcf45p
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Fri 2009-02-27 22:29:33 -0600
    message:
      Properly restore the label functionality.
    modified:
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
    ------------------------------------------------------------
    revno: 28.4.15
    revision-id: john at arbash-meinel.com-20090228042802-joang5uih4qcf45p
    parent: john at arbash-meinel.com-20090228042448-nfhhzpjuqic78bfr
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Fri 2009-02-27 22:28:02 -0600
    message:
      Handle when self._index is NULL, mostly because the source text was the empty string.
      Start using DeltaIndex as part of the standard compressing.
    modified:
      _groupcompress_c.pyx           _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
    ------------------------------------------------------------
    revno: 28.4.14
    revision-id: john at arbash-meinel.com-20090228042448-nfhhzpjuqic78bfr
    parent: john at arbash-meinel.com-20090228040012-lbkwky6vtdmhjepx
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Fri 2009-02-27 22:24:48 -0600
    message:
      Implement a DeltaIndex wrapper.
      
      This splits out the create_delta_index from the create_delta code.
      Which should also help for profiling purposes.
    modified:
      _groupcompress_c.pyx           _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
      tests/test__groupcompress_c.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
    ------------------------------------------------------------
    revno: 28.4.13
    revision-id: john at arbash-meinel.com-20090228040012-lbkwky6vtdmhjepx
    parent: john at arbash-meinel.com-20090228032304-13o0os3ho1nqq4ze
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Fri 2009-02-27 22:00:12 -0600
    message:
      Factor out the ability to have/not have labels.
      
      It turns out that labels now cost overall 10% increase in repo size. A rather
      large 40% increase for inventory pages.
      Perhaps since label == sha1 we could get away with doing something different.
      Note also that repository-details doesn't take into account the indexes.
      The .cix index for a conversion is approx 380kB, which starts to be an
      important factor when you consider the total content for all chk pages
      is less than 1.5MB.
    modified:
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
    ------------------------------------------------------------
    revno: 28.4.12
    revision-id: john at arbash-meinel.com-20090228032304-13o0os3ho1nqq4ze
    parent: john at arbash-meinel.com-20090227204002-fdzk52zc3frd4ddi
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Fri 2009-02-27 21:23:04 -0600
    message:
      Add a 'len:' field to the data.
      
      With this field, we can now fully populate an index from expanding
      the group-compress pages.
      There might be an issue with expanding the zlib pages, though if
      we switched to using gzip pages that would certainly go away.
      (perhaps zlib would have a 'trailing bytes', though, that would
      make it ok.)
      Checking to see how much this impacts final compressed size.
      Next step is to try removing all labels, and see what that
      final size becomes.
    modified:
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
    ------------------------------------------------------------
    revno: 28.4.11
    revision-id: john at arbash-meinel.com-20090227204002-fdzk52zc3frd4ddi
    parent: john at arbash-meinel.com-20090227201847-181ruulj0worz3ra
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Fri 2009-02-27 14:40:02 -0600
    message:
      Insert a fulltext if the delta is more than half the total size.
      Also, gcr deltas are more pithy, they probably are approx the same after
      compression, but decrease the range limits since the copy instructions are
      effectively pre-compressed.
    modified:
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
      setup.py                       setup.py-20080705181503-ccbxd6xuy1bdnrpu-9
    ------------------------------------------------------------
    revno: 28.4.10
    revision-id: john at arbash-meinel.com-20090227201847-181ruulj0worz3ra
    parent: john at arbash-meinel.com-20090227195427-5rw3pjlgkssido0d
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Fri 2009-02-27 14:18:47 -0600
    message:
      Allowing the source bytes to be longer than expected.
      This makes a huge difference for extraction speed.
      10s versus 45s. Versus 17s for the original groupcompress code.
      
      
      Also, the compiled version in _groupcompress_c seems ~ the same speed as
      the patch-delta.c version.
      At the very least, the extra memory copy overhead negates any benefit.
    modified:
      _groupcompress_c.pyx           _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
      patch-delta.c                  patchdelta.c-20090226042143-l9wzxynyuxnb5hus-2
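
A minimal sketch of what a relaxed source-length check could look like, assuming the delta uses the same layout as git's patch-delta.c: two base-128 size headers, then copy and insert opcodes. The function names are hypothetical, and this is plain Python, not the Pyrex code in _groupcompress_c.pyx. The relaxation is the single `<` comparison, where a strict decoder would require the lengths to be equal.

```python
def _read_varint(data, pos):
    """Read a little-endian base-128 integer; return (value, new_pos)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def apply_delta(source, delta):
    """Apply a git-style delta to source, which may be longer than recorded."""
    expected_src_len, pos = _read_varint(delta, 0)
    target_len, pos = _read_varint(delta, pos)
    # Relaxed check: extra trailing source bytes are allowed.
    if len(source) < expected_src_len:
        raise ValueError("source is shorter than the delta expects")
    out = bytearray()
    while pos < len(delta):
        cmd = delta[pos]
        pos += 1
        if cmd & 0x80:
            # Copy opcode: low bits flag which offset/size bytes follow.
            offset = 0
            size = 0
            for bit, shift in ((0x01, 0), (0x02, 8), (0x04, 16), (0x08, 24)):
                if cmd & bit:
                    offset |= delta[pos] << shift
                    pos += 1
            for bit, shift in ((0x10, 0), (0x20, 8), (0x40, 16)):
                if cmd & bit:
                    size |= delta[pos] << shift
                    pos += 1
            if size == 0:
                size = 0x10000
            if offset + size > len(source):
                raise ValueError("copy reads past the end of the source")
            out += source[offset:offset + size]
        elif cmd:
            # Insert opcode: the next `cmd` bytes are literal data.
            out += delta[pos:pos + cmd]
            pos += cmd
        else:
            raise ValueError("delta opcode 0 is reserved")
    if len(out) != target_len:
        raise ValueError("delta produced the wrong number of bytes")
    return bytes(out)


# Delta against an 11-byte source: copy 5 bytes from offset 0, insert b" there".
delta = bytes([0x0B, 0x0B, 0x90, 0x05, 0x06]) + b" there"
exact = apply_delta(b"hello world", delta)
longer = apply_delta(b"hello world plus trailing bytes", delta)
```

With the check relaxed, a caller can hand over a larger buffer as-is instead of first slicing out an exact-length copy. That is presumably where the extraction speedup comes from, though the message does not say so.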
    ------------------------------------------------------------
    revno: 28.4.9
    revision-id: john at arbash-meinel.com-20090227195427-5rw3pjlgkssido0d
    parent: john at arbash-meinel.com-20090227184307-h8zgtnf217omdw1h
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Fri 2009-02-27 13:54:27 -0600
    message:
      We now basically have full support for using diff-delta as the compressor.
      
      Will still need some tuning/tweaking to see how we want to proceed.
    modified:
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
      tests/test_groupcompress.py    test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
    ------------------------------------------------------------
    revno: 28.4.8
    revision-id: john at arbash-meinel.com-20090227184307-h8zgtnf217omdw1h
    parent: john at arbash-meinel.com-20090227182104-ogr8fu5548ewpzx3
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Fri 2009-02-27 12:43:07 -0600
    message:
      Add another test text.
    modified:
      tests/test__groupcompress_c.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
    ------------------------------------------------------------
    revno: 28.4.7
    revision-id: john at arbash-meinel.com-20090227182104-ogr8fu5548ewpzx3
    parent: john at arbash-meinel.com-20090227173623-wbwvxgznqacu6u48
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Fri 2009-02-27 12:21:04 -0600
    message:
      Add an apply_delta2 function, just in case it matters.
    modified:
      _groupcompress_c.pyx           _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
    ------------------------------------------------------------
    revno: 28.4.6
    revision-id: john at arbash-meinel.com-20090227173623-wbwvxgznqacu6u48
    parent: john at arbash-meinel.com-20090227173204-ce7djs6xbflluut1
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Fri 2009-02-27 11:36:23 -0600
    message:
      Start stripping out the actual GroupCompressor
      in preparation for using the diff-delta code.
      Add some tests that we can generate and apply diff deltas.
      
      We need to start adding some exceptions, and consider moving the
      core of the patch-delta loop back into a pure C function, as the
      generated code is very messy.
    modified:
      .bzrignore                     bzrignore-20080724041812-7jbgn9euewwtns1u-1
      _groupcompress_c.pyx           _groupcompress_c.pyx-20080724041824-yelg6ii7c7zxt4z0-1
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
      setup.py                       setup.py-20080705181503-ccbxd6xuy1bdnrpu-9
      tests/__init__.py              __init__.py-20080705181503-ccbxd6xuy1bdnrpu-11
      tests/test__groupcompress_c.py test__groupcompress_-20080724145854-koifwb7749cfzrvj-1
    ------------------------------------------------------------
    revno: 28.4.5
    revision-id: john at arbash-meinel.com-20090227173204-ce7djs6xbflluut1
    parent: john at arbash-meinel.com-20090227160746-1gt1m20vqk7i273c
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Fri 2009-02-27 11:32:04 -0600
    message:
      Minor changes to get diff-delta.c and patch-delta.c to compile.
      This includes bringing in 'delta.h'
    added:
      delta.h                        delta.h-20090227173129-qsu3u43vowf1q3ay-1
    modified:
      diff-delta.c                   diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
      patch-delta.c                  patchdelta.c-20090226042143-l9wzxynyuxnb5hus-2
    ------------------------------------------------------------
    revno: 28.4.4
    revision-id: john at arbash-meinel.com-20090227160746-1gt1m20vqk7i273c
    parent: john at arbash-meinel.com-20090227160650-iv1rpvxsqejydxj7
    parent: john at arbash-meinel.com-20090227051839-841q6ss4z8zm1353
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Fri 2009-02-27 10:07:46 -0600
    message:
      Merge in the latest updates to the gc trunk.
    modified:
      groupcompress.py               groupcompress.py-20080705181503-ccbxd6xuy1bdnrpu-8
      repofmt.py                     repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
      tests/test_groupcompress.py    test_groupcompress.p-20080705181503-ccbxd6xuy1bdnrpu-13
    ------------------------------------------------------------
    revno: 28.4.3
    revision-id: john at arbash-meinel.com-20090227160650-iv1rpvxsqejydxj7
    parent: john at arbash-meinel.com-20090226042229-qk6u230fwyxbmhd7
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Fri 2009-02-27 10:06:50 -0600
    message:
      Fix a couple more locations.
    modified:
      __init__.py                    __init__.py-20080705181503-ccbxd6xuy1bdnrpu-6
    ------------------------------------------------------------
    revno: 28.4.2
    revision-id: john at arbash-meinel.com-20090226042229-qk6u230fwyxbmhd7
    parent: john at arbash-meinel.com-20090226041719-oi3d5putp8s2r233
    author: Nicolas Pitre <nico at cam.org>
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Wed 2009-02-25 22:22:29 -0600
    message:
      Add the diff-delta.c and patch-delta.c files.
    added:
      diff-delta.c                   diffdelta.c-20090226042143-l9wzxynyuxnb5hus-1
      patch-delta.c                  patchdelta.c-20090226042143-l9wzxynyuxnb5hus-2
    ------------------------------------------------------------
    revno: 28.4.1
    revision-id: john at arbash-meinel.com-20090226041719-oi3d5putp8s2r233
    parent: john at arbash-meinel.com-20090225230422-4oigw03k7fq62eyb
    committer: John Arbash Meinel <john at arbash-meinel.com>
    branch nick: groupcompress_rabin
    timestamp: Wed 2009-02-25 22:17:19 -0600
    message:
      Start a quick experiment with a different 'diff' algorithm.
    modified:
      __init__.py                    __init__.py-20080705181503-ccbxd6xuy1bdnrpu-6
      repofmt.py                     repofmt.py-20080715094215-wp1qfvoo7093c8qr-1
-------------- next part --------------

Diff too large for email (3617 lines, the limit is 1000).

