juju is slow to do anything

David Cheney david.cheney at canonical.com
Mon Sep 2 01:19:41 UTC 2013


Hi Kapil,

I cannot reproduce your results. Can you please post the output of
juju status -v so we can see the timestamps?

I agree 100% that this is a problem to be fixed; I am trying to
determine whether it is (yet another) EC2 region-specific SNAFU.

Cheers

Dave

On Mon, Sep 2, 2013 at 11:12 AM, Kapil Thangavelu
<kapil.thangavelu at canonical.com> wrote:
> Hi Folks,
>
> Just to follow up after some more investigation. For me, on an EC2 environment
> with 1-2 services, juju status was taking about 35s. I instrumented the code
> base a little and found that roughly 25s of that is endpoint lookup (both the
> S3 and EC2 APIs). That overhead was constant across all of the juju commands. So
> #3 would be the biggest win; I went ahead and filed issue 1219441 for
> it. It seems like low-hanging fruit that delivers a big win for the UX.
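>
> (Just to illustrate the kind of measurement, not the actual instrumentation
> I added; lookupEndpoints here is a made-up stand-in for the provider's
> S3/EC2 endpoint discovery.)
>
>     package main
>
>     import (
>         "fmt"
>         "time"
>     )
>
>     // lookupEndpoints stands in for the slow S3/EC2 endpoint discovery.
>     func lookupEndpoints() {
>         time.Sleep(2 * time.Second) // placeholder for the real network calls
>     }
>
>     func main() {
>         start := time.Now()
>         lookupEndpoints()
>         fmt.Printf("endpoint lookup: %v\n", time.Since(start))
>     }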
>
> I went ahead and put together a juju CLI plugin that does a status
> approximation using endpoint caching and the API, and was able to drop that
> time from 35s down to 5s in my environment, and down to 2.5s in Peter's.
> It's attached, with inline docs and install instructions.
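>
> Roughly, the caching idea looks like this (a sketch only, not the attached
> plugin; discoverEndpoints, the Endpoints fields, and the cache path are
> made-up names):
>
>     package main
>
>     import (
>         "encoding/json"
>         "fmt"
>         "os"
>         "path/filepath"
>     )
>
>     type Endpoints struct {
>         EC2 string `json:"ec2"`
>         S3  string `json:"s3"`
>     }
>
>     // discoverEndpoints stands in for the slow S3/EC2 endpoint lookup.
>     func discoverEndpoints() (Endpoints, error) {
>         return Endpoints{
>             EC2: "https://ec2.us-east-1.amazonaws.com",
>             S3:  "https://s3.amazonaws.com",
>         }, nil
>     }
>
>     // cachedEndpoints returns endpoints from a local cache file if present,
>     // otherwise runs discovery once and writes the cache for next time.
>     func cachedEndpoints(path string) (Endpoints, error) {
>         var eps Endpoints
>         if data, err := os.ReadFile(path); err == nil {
>             if json.Unmarshal(data, &eps) == nil {
>                 return eps, nil
>             }
>         }
>         eps, err := discoverEndpoints()
>         if err != nil {
>             return eps, err
>         }
>         data, err := json.MarshalIndent(eps, "", "  ")
>         if err != nil {
>             return eps, err
>         }
>         if err := os.MkdirAll(filepath.Dir(path), 0700); err != nil {
>             return eps, err
>         }
>         return eps, os.WriteFile(path, data, 0600)
>     }
>
>     func main() {
>         cache := filepath.Join(os.Getenv("HOME"), ".juju", "endpoint-cache.json")
>         eps, err := cachedEndpoints(cache)
>         if err != nil {
>             fmt.Println("error:", err)
>             return
>         }
>         fmt.Println("ec2:", eps.EC2, "s3:", eps.S3)
>     }
>
> The first run pays the full discovery cost; every later run just reads the file.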
>
> Also, regarding the dead machines stuck in the pending state, there's an
> outstanding bug for that: https://bugs.launchpad.net/juju-core/+bug/1217781
>
> cheers,
>
> Kapil
>
>
>
>
> On Fri, Aug 30, 2013 at 2:15 PM, John Arbash Meinel <john at arbash-meinel.com>
> wrote:
>>
>> On 2013-08-30 14:28, Peter Waller wrote:
>> > For the record, I sent the link privately. The run took about 22s
>> > but I have measured 30s to 1m.
>>
>> Some thoughts, nothing that I can give absolute confirmation on.
>>
>> 1) Next week we have a group sprinting on moving a lot of the command
>> line operations from being evaluated by the client to being evaluated
>> by the API server (running in the cloud), with the results returned to
>> the client. The benefits I've seen in other commands are pretty good.
>>
>> 'juju status' is a command that should see a rather large improvement,
>> because it does round-trip queries for a lot of things (what machines
>> are there, what are the details of each machine, what units are
>> running on each one, etc.).
>>
>> I've prototyped doing those queries in parallel, or trying to do bulk
>> ops, which actually helped a lot in testing (this was for hundreds of
>> units/machines).
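>>
>> Roughly, the parallel version fans the per-machine round trips out to
>> goroutines, something like this (a sketch with placeholder names, not the
>> actual juju-core code; fetchMachineInfo is made up):
>>
>>     package main
>>
>>     import (
>>         "fmt"
>>         "sync"
>>         "time"
>>     )
>>
>>     // fetchMachineInfo stands in for one round trip per machine.
>>     func fetchMachineInfo(id string) string {
>>         time.Sleep(200 * time.Millisecond) // simulated network round trip
>>         return "machine " + id + ": started"
>>     }
>>
>>     func main() {
>>         machines := []string{"0", "1", "2", "3"}
>>         results := make([]string, len(machines))
>>
>>         var wg sync.WaitGroup
>>         for i, id := range machines {
>>             wg.Add(1)
>>             go func(i int, id string) {
>>                 defer wg.Done()
>>                 results[i] = fetchMachineInfo(id) // queries run concurrently
>>             }(i, id)
>>         }
>>         wg.Wait()
>>
>>         for _, r := range results {
>>             fmt.Println(r)
>>         }
>>     }
>>
>> With four machines that takes roughly one round-trip time instead of four.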
>>
>> Doing it on the API server means any round trips are "local" rather
>> than from your machine out to Amazon.
>>
>> 2) From here, 'time juju status' with a single instance running on EC2
>> is 10s. That breaks down as roughly 4s to look up the IP address, 2s to
>> establish the state connection, and 4s to "finish up" (the resolution
>> here is 1s granularity).
>>
>> Similarly, "time juju-1.13.2 get not-service" takes 8.5s to run: 4s to
>> look up the address, 2s to connect, and 3s to give the final 'not
>> found' result.
>>
>> With trunk, "time ./juju get not-service" is 4.6s: 2s to look up the IP
>> address, 2s to connect, and the not-found result is instantaneous.
>>
>> So I would expect the 10s of a generic "juju status" to easily drop to
>> under 5s, regardless of any round-trip issues.
>>
>> 3) We are also looking to cache the IP address of the API server, to
>> shave off another ~2-4s for the common case that the address hasn't
>> changed. (We'll fall back to our current discovery mechanism.)
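>>
>> The shape of that would be roughly as follows (a sketch only; dialAPI,
>> discoverAPIAddress, and the cache file are hypothetical names, not the
>> real juju-core API):
>>
>>     package main
>>
>>     import (
>>         "errors"
>>         "fmt"
>>         "os"
>>         "path/filepath"
>>         "strings"
>>     )
>>
>>     // dialAPI stands in for opening a connection to the API/state server.
>>     func dialAPI(addr string) error {
>>         if addr == "" {
>>             return errors.New("no address")
>>         }
>>         return nil // pretend the connection succeeded
>>     }
>>
>>     // discoverAPIAddress stands in for the slow provider-based lookup.
>>     func discoverAPIAddress() (string, error) {
>>         return "ec2-203-0-113-10.compute-1.amazonaws.com:17070", nil
>>     }
>>
>>     // connect tries the cached address first and only falls back to full
>>     // discovery (refreshing the cache) when the cached address fails.
>>     func connect(cachePath string) error {
>>         if data, err := os.ReadFile(cachePath); err == nil {
>>             if addr := strings.TrimSpace(string(data)); dialAPI(addr) == nil {
>>                 return nil
>>             }
>>         }
>>         addr, err := discoverAPIAddress()
>>         if err != nil {
>>             return err
>>         }
>>         if err := dialAPI(addr); err != nil {
>>             return err
>>         }
>>         return os.WriteFile(cachePath, []byte(addr+"\n"), 0600)
>>     }
>>
>>     func main() {
>>         cache := filepath.Join(os.Getenv("HOME"), ".juju", "api-address")
>>         if err := connect(cache); err != nil {
>>             fmt.Println("connect failed:", err)
>>             return
>>         }
>>         fmt.Println("connected")
>>     }
>>
>> The cached address is only rewritten when discovery has to run again.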
>>
>> 4) There seems to be something odd about the timing of your screencast.
>> It does report 22s, which matches the times in the debug output, but
>> the total length of the video is 20s, including typing. Is it just
>> playing at 2:1 speed?
>>
>> You can see from the debug output that it takes 7s to look up the
>> address to connect to, and then about 1s to connect. The rest is time
>> spent gathering the information.
>>
>> I expect it to get a whole lot faster in a couple more weeks, but I'm
>> not going to guarantee that until we've finished the work.
>>
>> 5) If I counted correctly, you have about 23 "machines" being
>> considered, a bunch of them down/pending/errored.
>>
>> I would think that for the errored ones you could do some sort of
>> "juju destroy-machine". It might make things better (less time spent
>> checking on machines you don't care about).
>>
>> What happens when you try it? (There may be other issues where we
>> think we are still waiting for something to happen on a machine, and
>> so we refuse to destroy it.)
>>
>>
>> Anyway, in summary: this should be getting better, but I won't have
>> explicit numbers until the work is done.
>>
>> John
>> =:->


