[Bug 737234] Re: too much data transferred making a new stacked branch

Launchpad Bug Tracker 737234 at bugs.launchpad.net
Mon Aug 1 01:09:47 UTC 2011


This bug was fixed in the package bzr - 2.3.4-0ubuntu1

---------------
bzr (2.3.4-0ubuntu1) natty-proposed; urgency=low

  * New upstream release.
   + Fix bzr version number in deprecation warnings. LP: #794960
   + Prevent write attempts on remote branch during "bzr up". LP: #786980
   + Fix conflict handling when two trees involved in a merge have different
     root ids. LP: #805809

bzr (2.3.3-0ubuntu1) natty-proposed; urgency=low

  * New upstream release.
   + Fixes deprecation warning on newer versions of Python. LP: #760435
   + Stops 'bzr push' from copying entire repository if a .bzr directory is
     present without a branch. LP: #465517
   + Fixes undefined local variable error when waiting for lock. LP: #733136
   + Fixes lock contention issues pushing to a bound branch. LP: #733350
   + Transfers less data creating a new stacked branch. LP: #737234
   + Several fixes to the test suite, making it more robust. LP: #654733,
      LP: #751824
   + 'bzr merge --pull --preview' now shows a preview instead of
     performing the merge. LP: #760152
   + bzr smart server now supports UTF-8 user names. LP: #659763
   + user identity can now be set based on username and /etc/mailname, not
     requiring it to be set manually. LP: #616878
   + stacking is now fully transitive. LP: #715000
   + makes in-terminal crash report of plugins much shorter. LP: #716389
 -- Jelmer Vernooij <jelmer at debian.org>   Thu, 14 Jul 2011 21:12:58 +0200

** Changed in: bzr (Ubuntu Natty)
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to bzr in Ubuntu.
https://bugs.launchpad.net/bugs/737234

Title:
  too much data transferred making a new stacked branch

Status in Bazaar Version Control System:
  Fix Released
Status in Bazaar 2.3 series:
  Fix Released
Status in “bzr” package in Ubuntu:
  Fix Released
Status in “bzr” source package in Natty:
  Fix Released

Bug description:
  In thread "Linaro bzr feedback" John writes:

  Note, I just did 'bzr branch lp:gcc-linaro', and it transferred about
  500MB, about 457MB on disk. (Not bad considering lp:emacs transferred
  400-500MB and was only 200MB on disk.)

  I then ran 'bzr serve' and 'bzr branch --stacked bzr://localhost:...'.
  What was scary was:

  8141442kB 24128kB/s / Finding Revisions
  ...
  > Grepping the .bzr.log file in question, I do, indeed see about 8.1GB of
  > data transferred before we read the first .tix.
  > If my grep fu is strong, then we only read 30MB of .cix data. Which
  > leaves us with 8GB of .pack content, or actual CHK page content.

  This is a change which drops the 8GB down to 150MB:

  === modified file 'bzrlib/inventory.py'
  --- bzrlib/inventory.py 2010-09-14 13:12:20 +0000
  +++ bzrlib/inventory.py 2011-03-17 15:38:40 +0000
  @@ -736,6 +736,13 @@
              specific_file_ids = set(specific_file_ids)
          # TODO? Perhaps this should return the from_dir so that the root is
          # yielded? or maybe an option?
  +        if from_dir is None and specific_file_ids is None:
  +            # They are iterating from the root, assume they are iterating
  +            # everything and preload all file_ids into the
  +            # _fileid_to_entry_cache. This doesn't build things into .children
  +            # for each directory, but that will happen later.
  +            for _ in self.iter_just_entries():
  +                continue
          if from_dir is None:
              if self.root is None:
                  return

  
  Basically, iter_entries_by_dir goes in a specific order which doesn't
  match the order in the repository. 'iter_just_entries' loads everything
  in repository order, and puts it into the
  CHKInventory._fileid_to_entry_cache, and then the rest of the requests are
  fed from there.
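
  The effect described above can be illustrated with a toy model. This is
  only a sketch, not bzrlib code: PagedStore, walk, and the page layout are
  hypothetical names invented for the illustration. It assumes entries are
  stored in pages, that the page cache is too small to hold every page, and
  that the caller asks for entries in an order that does not match the
  storage order.

```python
from collections import OrderedDict


class PagedStore:
    """Toy stand-in for a repository (hypothetical, not a bzrlib class)."""

    def __init__(self, pages, cache_size):
        self.pages = pages            # list of dicts: file_id -> entry
        self.cache_size = cache_size  # number of pages the page cache can hold
        self.page_reads = 0           # simulated reads from the repository
        self._page_cache = OrderedDict()
        # Map each file_id to the page that stores it.
        self._index = {fid: n for n, page in enumerate(pages) for fid in page}

    def _read_page(self, n):
        if n in self._page_cache:
            self._page_cache.move_to_end(n)  # mark as most recently used
            return self._page_cache[n]
        self.page_reads += 1
        self._page_cache[n] = self.pages[n]
        if len(self._page_cache) > self.cache_size:
            self._page_cache.popitem(last=False)  # evict least recently used
        return self.pages[n]

    def lookup(self, file_id):
        """Fetch one entry, reading its page if it is not cached."""
        return self._read_page(self._index[file_id])[file_id]

    def iter_in_storage_order(self):
        """Yield (file_id, entry) pairs page by page, as stored."""
        for n in range(len(self.pages)):
            yield from self._read_page(n).items()


def walk(store, order, preload):
    """Return entries in the requested order, optionally preloading first."""
    entry_cache = {}
    if preload:
        # One sequential pass fills the per-entry cache in storage order.
        entry_cache.update(store.iter_in_storage_order())
    result = []
    for fid in order:
        if fid not in entry_cache:
            entry_cache[fid] = store.lookup(fid)
        result.append(entry_cache[fid])
    return result
```

  With four pages of four entries each, a one-page cache, and a request
  order that touches a different page on every step, the walk without
  preloading reads a page 16 times, while the preloading walk reads each
  page once, 4 reads in total, and returns the same entries.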

  We don't usually notice this effect, because of the
  chk_map._thread_caches.page_cache and the GCCHKRepository block cache.
  Once the inventory is large enough to not be in the bytes cache, we have
  to load it from the repository again.

  I just checked, and this also has a large effect for local
  repositories.

  'time list(rev_tree.inventory.iter_entries_by_dir())'
  drops from 4m30s down to 13s with the patch.

  So we certainly should think about other ramifications, but short term
  it looks quite good.

To manage notifications about this bug go to:
https://bugs.launchpad.net/bzr/+bug/737234/+subscriptions



