[Bug 737234] Re: too much data transferred making a new stacked branch
Jelmer Vernooij
737234 at bugs.launchpad.net
Fri Jun 10 16:11:36 UTC 2011
** Changed in: bzr (Ubuntu Natty)
Importance: Undecided => High
** Changed in: bzr (Ubuntu Natty)
Assignee: (unassigned) => Jelmer Vernooij (jelmer)
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to bzr in Ubuntu.
https://bugs.launchpad.net/bugs/737234
Title:
too much data transferred making a new stacked branch
Status in Bazaar Version Control System:
Fix Released
Status in Bazaar 2.3 series:
Fix Released
Status in “bzr” package in Ubuntu:
Fix Released
Status in “bzr” source package in Natty:
In Progress
Bug description:
In thread "Linaro bzr feedback" John writes:
Note, I just did 'bzr branch lp:gcc-linaro', and it transferred about
500MB, about 457MB on disk. (Not bad considering lp:emacs transferred
400-500MB and was only 200MB on disk.)
I then ran 'bzr serve' and 'bzr branch --stacked bzr://localhost:...'.
What was scary was:
8141442kB 24128kB/s / Finding Revisions
...
> Grepping the .bzr.log file in question, I do, indeed see about 8.1GB of
> data transferred before we read the first .tix.
> If my grep fu is strong, then we only read 30MB of .cix data. Which
> leaves us with 8GB of .pack content, or actual CHK page content.
This is a change which drops the 8GB down to 150MB:
=== modified file 'bzrlib/inventory.py'
--- bzrlib/inventory.py 2010-09-14 13:12:20 +0000
+++ bzrlib/inventory.py 2011-03-17 15:38:40 +0000
@@ -736,6 +736,13 @@
specific_file_ids = set(specific_file_ids)
# TODO? Perhaps this should return the from_dir so that the root is
# yielded? or maybe an option?
+ if from_dir is None and specific_file_ids is None:
+ # They are iterating from the root, assume they are iterating
+ # everything and preload all file_ids into the
+ # _fileid_to_entry_cache. This doesn't build things into .children
+ # for each directory, but that will happen later.
+ for _ in self.iter_just_entries():
+ continue
if from_dir is None:
if self.root is None:
return
Basically, iter_entries_by_dir goes in a specific order which doesn't
match the order in the repository. 'iter_just_entries' loads everything
in repository order, and puts it into the
CHKInventory._fileid_to_entry_cache, and then the rest of the requests are
fed from there.
We don't usually notice this effect, because the
chk_map._thread_caches.page_cache and the GCCHKRepository block cache
absorb the out-of-order reads. Once the inventory is too large to fit in
the bytes cache, we have to load the same content from the repository
again.
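To illustrate the effect described above, here is a toy model in plain Python 3. It is only a sketch and not bzrlib code: PageStore, ToyInventory, the page layout and the two-page LRU cache are all hypothetical stand-ins. Only the names iter_just_entries, iter_entries_by_dir and _fileid_to_entry_cache are borrowed from the patch. It counts how many page reads hit the "repository" when entries are requested in an order that does not match storage order, with and without the preload.

```python
from collections import OrderedDict


class PageStore:
    """Hypothetical stand-in for a repository: pages of entries, counting reads."""

    def __init__(self, pages):
        self.pages = pages  # page_id -> {file_id: entry}
        self.reads = 0

    def read(self, page_id):
        self.reads += 1
        return self.pages[page_id]


class ToyInventory:
    """Hypothetical inventory with a small LRU page cache and an entry cache."""

    def __init__(self, store, locate, page_cache_size):
        self.store = store
        self.locate = locate  # file_id -> page_id
        self.page_cache = OrderedDict()
        self.page_cache_size = page_cache_size
        self._fileid_to_entry_cache = {}

    def _get_page(self, page_id):
        if page_id in self.page_cache:
            self.page_cache.move_to_end(page_id)
            return self.page_cache[page_id]
        page = self.store.read(page_id)
        self.page_cache[page_id] = page
        if len(self.page_cache) > self.page_cache_size:
            self.page_cache.popitem(last=False)  # evict least recently used
        return page

    def get_entry(self, file_id):
        entry = self._fileid_to_entry_cache.get(file_id)
        if entry is None:
            entry = self._get_page(self.locate[file_id])[file_id]
            self._fileid_to_entry_cache[file_id] = entry
        return entry

    def iter_just_entries(self):
        # Storage order: each page is read exactly once.
        for page_id in sorted(self.store.pages):
            for file_id, entry in self._get_page(page_id).items():
                self._fileid_to_entry_cache[file_id] = entry
                yield entry

    def iter_entries_by_dir(self, order, preload=False):
        if preload:
            # Same shape as the patch: walk everything once to fill the cache.
            for _ in self.iter_just_entries():
                continue
        for file_id in order:
            yield self.get_entry(file_id)


def count_reads(preload):
    # 16 entries spread over 4 pages so that consecutive file_ids
    # always live on different pages (assumed worst-case layout).
    pages = {p: {f: "entry-%d" % f for f in range(16) if f % 4 == p}
             for p in range(4)}
    locate = {f: f % 4 for f in range(16)}
    store = PageStore(pages)
    inv = ToyInventory(store, locate, page_cache_size=2)
    entries = list(inv.iter_entries_by_dir(range(16), preload=preload))
    assert len(entries) == 16
    return store.reads


print("without preload:", count_reads(False))
print("with preload:   ", count_reads(True))
```

In this toy layout every request without the preload misses the two-page cache, so each of the 16 entries costs a page read, while the preload reads each of the 4 pages once. The real ratio depends on the actual page layout and cache sizes.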
I just checked, and this also has a large effect for local
repositories.
'time list(rev_tree.inventory.iter_entries_by_dir())'
drops from 4m30s down to 13s with the patch.
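For anyone wanting to repeat this kind of measurement from a script, a minimal sketch follows. time_iteration is a hypothetical helper, not part of bzrlib, and the range() workload is only a placeholder; with a real revision tree one would pass a callable wrapping rev_tree.inventory.iter_entries_by_dir() instead.

```python
import time


def time_iteration(make_iterable):
    """Exhaust the iterable returned by make_iterable.

    Returns (item_count, elapsed_seconds).
    """
    start = time.perf_counter()
    items = list(make_iterable())
    elapsed = time.perf_counter() - start
    return len(items), elapsed


# Placeholder workload. With bzrlib this would be something like:
#   time_iteration(lambda: rev_tree.inventory.iter_entries_by_dir())
count, seconds = time_iteration(lambda: iter(range(1000)))
print("%d entries in %.3fs" % (count, seconds))
```

Note that a second run in the same process may be served from warm caches, so before/after timings are best taken from fresh processes.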
So we certainly should think about other ramifications, but short term
it looks quite good.
To manage notifications about this bug go to:
https://bugs.launchpad.net/bzr/+bug/737234/+subscriptions