Moving udd away from sqlite
Vincent Ladeuil
vila+udd at canonical.com
Fri Jun 15 10:34:12 UTC 2012
>>>>> Jonathan Lange <jml at canonical.com> writes:
<snip/>
>> There is work in progress regarding the jubany package importer
>> deployment, let's try not to step on each other's feet ;)
>>
> What's the work in progress?
Setting up a quantal lxc container that can be used for both test and
production with minimal differences between the two (see the 'Upgrading
pristine-xz on jubany' thread for rationale and details).
> ...
>> > After lots of head-scratching I believe I've worked out why
>> > changing to storm made this happen when the underlying db was the
>> > same. Storm forces sqlite to operate at a higher isolation level,
>> > so udd was taking locks more frequently or holding them for
>> > longer, leading to more contention and eventually the deadlock.
>>
>> But we already know we can have deadlocks with sqlite
>> (https://bugs.launchpad.net/udd/+bug/724893) so I'm not convinced
>> changing the db will magically fix the issues.
>>
> It's not magic. It's moving from a database that's not designed for
> concurrent use to one that is designed for concurrent use.
Despite not being designed for concurrent use, it *is* used this way, and
lock contention has already been encountered, which leads me to believe
that the current *design* needs to be fixed. The fact that changing the db
triggers more contention is a symptom of a deeper issue.
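For what it's worth, the mechanism James describes can be reproduced in
isolation. Below is a minimal Python sketch, not udd or Storm code. It
assumes that "a higher isolation level" amounts to running reads inside
explicit transactions, so the reader keeps its SHARED lock until it
commits. The helpers open_conn/attempt/scenario, the file demo.db and the
table t are all hypothetical, made up for the demo. In sqlite's
rollback-journal mode, a reader holding such a transaction blocks the
writer's COMMIT, and the writer's RESERVED lock blocks the reader's own
write: a deadlock that only ends when one side gives up.

```python
import os
import sqlite3
import tempfile


def open_conn(path):
    # Hypothetical helper. isolation_level=None stops the sqlite3 module
    # from issuing implicit BEGINs, so transactions start only where we
    # say so; timeout=0 reports lock contention at once instead of waiting.
    conn = sqlite3.connect(path, timeout=0, isolation_level=None)
    # Pin the rollback journal: in WAL mode readers don't block commits.
    conn.execute("PRAGMA journal_mode=DELETE").fetchall()
    return conn


def attempt(conn, sql):
    """Run one statement and return 'ok' or SQLite's error message."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.OperationalError as exc:
        return str(exc)


def scenario(reader_holds_transaction):
    """One reader (b) and one writer (a) on the same database file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        setup = open_conn(path)
        setup.execute("CREATE TABLE t (x INTEGER)")
        setup.close()

        a, b = open_conn(path), open_conn(path)
        if reader_holds_transaction:
            # Stricter isolation: the read runs inside an explicit
            # transaction, so b keeps its SHARED lock after the SELECT.
            b.execute("BEGIN")
        cur = b.execute("SELECT count(*) FROM t")
        cur.fetchall()
        cur.close()  # in autocommit mode this releases b's SHARED lock

        log = []
        a.execute("BEGIN")
        log.append(attempt(a, "INSERT INTO t VALUES (1)"))  # takes RESERVED
        log.append(attempt(a, "COMMIT"))  # needs every other SHARED lock gone
        log.append(attempt(b, "INSERT INTO t VALUES (2)"))  # needs RESERVED
        # Break the cycle: the reader gives up, then the writer retries.
        if b.in_transaction:
            b.execute("ROLLBACK")
        if a.in_transaction:
            log.append(attempt(a, "COMMIT"))
        a.close()
        b.close()
        return log


print(scenario(False))  # reads in autocommit mode
print(scenario(True))   # reads inside explicit transactions
```

scenario(False) should give ['ok', 'ok', 'ok'] while scenario(True) should
give ['ok', 'database is locked', 'database is locked', 'ok']: same db, same
statements, only the transaction boundaries differ.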
<snip/>
> Even once the correctness and safety of the code change are
> demonstrated, you'll still have to pick something from Options 1 - 3.
Well, once correctness and safety are demonstrated, the context (and
hence my own answer) will probably be different, but until then I just
can't say.
> And I'm very reluctant to fork without an actual plan for merging
> back: how to know when it's safe & how to actually achieve it.
And I have no idea how (nor the time right now) to debug the fallout of
such a change, which the current package importer doesn't need. Hence my
feeling that demonstrating the validity of this change should come
first.
Keep in mind that I'm trying to keep the package importer working no
worse than it does today, and hope to bring it back to ~400 failures
instead of the current 782. Given the previous attempts to migrate to
storm, I'm not overly enthusiastic :-/
Would there be a script to migrate from sqlite to PG?
Can the package importer be restarted with empty dbs and catch up (how
long would that take? Days? Weeks?). Could this trigger bugs because the
importer doesn't remember what it pushed to lp?
Or do you expect us to see another peak like
http://webnumbr.com/ubuntu-package-import-failures.from%282012-01-24%29 ?
> When: We won't be running on sqlite, so our own experience in
> production won't be valid.
So basically you have no idea whether it will blow up on jubany or not;
that's my concern.
> AIUI, the package importer doesn't have a staging server, doesn't
> have anything in its automated test suite that demonstrates the
> locking problem that James saw and doesn't have a test plan.
Yes, that's why we're not in a position to safely accept such a change!
And all the time spent integrating these changes is time not spent on
making it possible to accept them under good conditions.
I thought I made this clear back when we started this discussion, and I
was (unfortunately) proved right when we had to roll back :-/
> How: We'll still need to pick something from Options 1 - 3. Degraded
> performance? Storm hack? Postgres?
>> > Are there any others?
>>
>> Set up a test environment with a launchpad test instance where you can
>> demonstrate that the importer won't break when this change is deployed?
>>
> James has already done this in an EC2 environment. I don't know
> how closely it mirrors the production environment,
A known pain point that I'd *really* like to see fixed. It's absurdly
hard to debug import failures today (and there is not even a way to test
operations that update lp or the branches pushed there). I've tried to
improve the test suite, but there is still a loooong way to go.
Without a proper test suite, we're all blind when it comes to deploying
stuff into production, so at least a testing environment (as close as
possible to the production one) should be available, and that's what
I'm targeting.
> but it demonstrated the locking problem and is how he came up with
> those options.
Then the test improvements would certainly be worth backporting to
lp:udd, or is there nothing reusable from the EC2 experiment?
> We have had a lot of experience recently working with Canonical IS
> to get new servers and new staging servers deployed. If you want a
> staging server, we'd be happy to help you and would gladly
> advocate for you in their priority queue.
Great to hear :) I'll come back to this topic soon; just a quick
question though: are the new servers running lucid or precise?
Vincent