[RFC] Out of memory during BzrDir.find_branches

Mon Apr 21 14:12:52 BST 2008

Aaron Bentley wrote:
> Jonathan Lange wrote:
> > Taking a peek at the implementation, I notice that it builds all of
> > the open branches into a big list. This seems bound to cause problems.
> > Would it be a good thing to make BzrDir.find_branches an iterator?
> 
> I don't agree that opening a bunch of branches should be expected to
> cause problems.  Branches themselves have very little data; they take up
> ~48k on ext3.  Their repositories take more, but none of that should be
> loaded when a branch is initially opened.
> 
> When I run "bzr branches" in my Bazaar repository, memory increases by
> 27M.  That is for 186 branches, so each branch is taking about 145k.
> That's several orders of magnitude too big, but still, you'd need more
> than 20 000 branches to OOM my machine.

Yes, there's a bad interaction with some combinations of bzr-svn and
python-subversion, including, it seems, the current hardy versions.  Every time
bzr-svn attempts to open a branch, memory is leaked.

> I think that switching to an iterator is just sidestepping a bug.

My experimentation shows that it doesn't really sidestep the bug effectively.
You still get OOM before the command finishes; you just get partial results
first, rather than none.

> > I can imagine that there are other, deeper memory consumption issues
> > at play here. Still, making find_branches an iterable means that
> > bzr-removable could use it.
> 
> Well, I do want us to have an API that is usable, even if I may disagree
> that removing branches from a repository is a good idea.  Are you really
> using something like 20 000 branches, though?

A lazy iterator is better regardless of the memory bug, because it starts giving
results much faster.
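
To illustrate the difference, here is a minimal sketch in plain Python.  It does
not use bzrlib: open_branch, find_branches_list and iter_branches are
hypothetical stand-ins, and the "branches" are just dicts.  The point is only
that the lazy version has opened one branch by the time the caller sees its
first result, while the list version has opened all of them.

```python
opened = []  # records every path that has been "opened" so far


def open_branch(path):
    # Hypothetical stand-in for opening a branch; not a bzrlib call.
    opened.append(path)
    return {"path": path}


def find_branches_list(paths):
    # Eager: every branch is opened before the caller sees any result.
    return [open_branch(p) for p in paths]


def iter_branches(paths):
    # Lazy: each branch is opened only when the caller asks for it.
    for p in paths:
        yield open_branch(p)


paths = ["branch-%d" % i for i in range(5)]

lazy = iter_branches(paths)
first = next(lazy)
opened_after_first_lazy = len(opened)  # only one branch opened so far

del opened[:]  # reset the record before trying the eager version
eager = find_branches_list(paths)
opened_after_eager = len(opened)  # all branches opened up front
```

Note that this only changes when the work happens.  If each open leaks memory,
iterating to the end still leaks the same total amount.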

-Andrew.