bzr/LP issues from work discussed at UDS

Thu Dec 3 20:46:45 GMT 2009

John Arbash Meinel wrote:

>> 3) Importing a lot more branches
>>
>> We want to import a lot more branches this cycle, all of those used
>> for maintaining packages in Debian. I don't have a definite number
>> that we want to import, but
>>
>>   http://upsilon.cc/~zack/stuff/vcs-usage/
>>
>> declares that there are 6881 source packages using a VCS. Therefore,
>> what would happen if tomorrow we increased the number of vcs-imports
>> by 5000? (What is the current number?)
> 
> I think we currently have 8k or so, with some fraction of that failing.

No.  We have 2921 code imports[1], of which 1672 are currently active[2].

[1] https://code.edge.launchpad.net/+code-imports
[2] https://code.edge.launchpad.net/+code-import-list

> At least, I thought I remembered about 1-2k failures, and a 25% failure
> rate. So 2/.25 = 8k.

>> It may be that the answer here is just “deal with the failures,” but
>> maybe there needs to be infrastructure work done before this. jml
>> says that it may just be a case of throwing more machines at it,
>> as the system is already built to be scalable.
> 
> Well, I would assume that growing from 1 puller to 2 pullers would be a
> significant growing pain. But growing from there to N pullers would be
> mostly a matter of throwing hardware at the problem.

It's lucky we already have three machines then!

https://code.edge.launchpad.net/+code-imports/+machines

> And while the system is probably designed to support >1 pullers, until
> you actually have 2 running concurrently, I don't think you can claim
> anything :).

Well yes.

For 5000 new imports we will definitely need some more hardware.
Without code changes I think we actually need a lot more hardware, about
10 machines total (because each machine can only start two jobs per
minute and each import is updated 4 times a day).  There are some easy
changes to scheduling that will help here, and using a process pool
rather than starting a new process for each job will probably help too
(the majority of code imports don't find any revisions to import, so are
actually processed very quickly).
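
As a rough sanity check of the "about 10 machines" estimate above, here
is the arithmetic as a small sketch.  It assumes the 5000 new imports
are added to the 1672 currently active ones, that every import is
scheduled 4 times a day, and that the two-job-starts-per-minute limit is
the only bottleneck.  The function name and its parameters are made up
for illustration; they are not part of the code import system.

```python
import math


def machines_needed(imports, updates_per_day=4, starts_per_minute=2):
    """Estimate how many import machines are needed (hypothetical helper).

    Assumes the only limit is how many jobs a machine can start per
    minute, and that jobs are spread evenly over the day.
    """
    jobs_per_day = imports * updates_per_day
    # 2 starts/minute * 60 minutes * 24 hours = 2880 job starts per day.
    capacity_per_machine = starts_per_minute * 60 * 24
    return math.ceil(jobs_per_day / capacity_per_machine)


# 1672 active imports today plus 5000 new ones (assumption: inactive
# imports are never scheduled).
print(machines_needed(1672 + 5000))  # prints 10
```

Any change that reduces the jobs started per day (smarter scheduling) or
raises the start rate per machine (a process pool) lowers this number
directly.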

There is also the issue of the load the import branches place on the
branch puller, thanks to this bug:

    https://bugs.edge.launchpad.net/launchpad-code/+bug/487357

("the code import system calls requestMirror even if no revisions were
imported").

Cheers,
mwh