juju is slow to do anything

Kapil Thangavelu kapil.thangavelu at canonical.com
Mon Sep 2 01:22:59 UTC 2013


Hi Dave,

I attached some status --debug logs to the bug for us-east-1 (about 30s
there, versus 35s in us-west-2).

cheers,

Kapil


On Sun, Sep 1, 2013 at 9:19 PM, David Cheney <david.cheney at canonical.com> wrote:

> Hi Kapil,
>
> I cannot reproduce your results. Can you please post the output of
> juju status -v so we can see the timestamps?
>
> I 100% agree that this is a problem to be fixed; I am trying to
> determine whether it is (yet another) ec2 region-specific SNAFU.
>
> Cheers
>
> Dave
>
> On Mon, Sep 2, 2013 at 11:12 AM, Kapil Thangavelu
> <kapil.thangavelu at canonical.com> wrote:
> > Hi Folks,
> >
> > Just to follow up after some more investigation: for me, on an ec2
> > environment with 1-2 services, juju status was taking about 35s. I
> > instrumented the code base a little and found that roughly 25s of that is
> > endpoint lookup (both the s3 and ec2 APIs). That overhead was constant
> > across all of the juju commands, so item 3 below would be the biggest win;
> > I went ahead and filed issue 1219441 for it. It seems like low-hanging
> > fruit that delivers a big win for the UX.
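
For illustration, a minimal sketch of that kind of per-phase instrumentation;
the phase names and stubbed bodies below are hypothetical, not the actual
juju-core code:

    // phase_timing.go: a toy example of attributing a command's wall-clock
    // time to named phases, so a dominant cost (here, endpoint lookup)
    // stands out in the total.
    package main

    import (
        "fmt"
        "time"
    )

    // timed runs fn and reports how long it took.
    func timed(name string, fn func()) {
        start := time.Now()
        fn()
        fmt.Printf("%-16s %v\n", name, time.Since(start))
    }

    func main() {
        total := time.Now()
        timed("endpoint lookup", func() { /* ec2/s3 endpoint discovery */ })
        timed("connect", func() { /* dial the state/API server */ })
        timed("gather status", func() { /* machines, units, relations */ })
        fmt.Printf("%-16s %v\n", "total", time.Since(total))
    }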
> >
> > I went ahead and put together a juju CLI plugin that does a status
> > approximation using endpoint caching and the API, and was able to drop
> > that time down to 5s (from 35s on my env) and down to 2.5s on Peter's
> > environment. It's attached, with inline docs and install instructions.
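
For reference, a rough sketch of what such an endpoint-caching plugin could
look like. This is not the attached plugin; the file name, cache path,
environment name, and the discoverEndpoint/fetchStatus stubs are all
hypothetical stand-ins for the real provider lookup and API status call:

    // juju-faststatus.go: cache the API endpoint from a previous run so
    // later invocations can skip the slow provider lookup entirely.
    package main

    import (
        "fmt"
        "os"
        "path/filepath"
        "strings"
    )

    func cachePath(env string) string {
        home, _ := os.UserHomeDir()
        return filepath.Join(home, ".juju", "cache", env+".endpoint")
    }

    // cachedEndpoint returns the previously seen API address, or "" if none.
    func cachedEndpoint(env string) string {
        b, err := os.ReadFile(cachePath(env))
        if err != nil {
            return ""
        }
        return strings.TrimSpace(string(b))
    }

    // rememberEndpoint stores the address for the next invocation.
    func rememberEndpoint(env, addr string) error {
        if err := os.MkdirAll(filepath.Dir(cachePath(env)), 0700); err != nil {
            return err
        }
        return os.WriteFile(cachePath(env), []byte(addr+"\n"), 0600)
    }

    // Stubs standing in for the real provider lookup and API status call.
    func discoverEndpoint(env string) string { return "example.compute.amazonaws.com:17070" }
    func fetchStatus(addr string) string     { return "status from " + addr }

    func main() {
        env := "amazon" // illustrative environment name
        addr := cachedEndpoint(env)
        if addr == "" {
            addr = discoverEndpoint(env) // slow path: ec2/s3 lookup
            if err := rememberEndpoint(env, addr); err != nil {
                fmt.Fprintln(os.Stderr, "could not cache endpoint:", err)
            }
        }
        fmt.Println(fetchStatus(addr)) // fast path: talk to the API directly
    }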
> >
> > Also, regarding the dead machines stuck in the pending state, there's an
> > outstanding bug for that: https://bugs.launchpad.net/juju-core/+bug/1217781
> >
> > cheers,
> >
> > Kapil
> >
> >
> >
> >
> > On Fri, Aug 30, 2013 at 2:15 PM, John Arbash Meinel
> > <john at arbash-meinel.com> wrote:
> >>
> >> On 2013-08-30 14:28, Peter Waller wrote:
> >> > For the record, I sent the link privately. The run took about 22s
> >> > but I have measured 30s to 1m.
> >>
> >> Some thoughts, nothing that I can give absolute confirmation on.
> >>
> >> 1) Next week we have a group sprinting on moving a lot of the
> >> command-line operations from being evaluated by the client to being
> >> evaluated by the API server (running in the cloud), with the results
> >> returned to the client. The benefits I've seen in other commands are
> >> pretty good.
> >>
> >> 'juju status' should see a rather large improvement, because it does
> >> round-trip queries for a lot of things (what machines are there, what
> >> are the details of each machine, what units are running on each one,
> >> etc.).
> >>
> >> I've prototyped doing those queries in parallel, or trying to do bulk
> >> ops, which actually helped a lot in testing (this was for hundreds of
> >> units/machines).
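
A minimal sketch of that fan-out idea, with an illustrative machineInfo type
and a stubbed per-machine fetch standing in for the real round trips (this is
not the actual prototype):

    // parallel_status.go: fan the per-machine round trips out concurrently
    // so total latency is roughly one round trip rather than one per machine.
    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    type machineInfo struct {
        ID     string
        Status string
    }

    // fetchMachine stands in for one round trip per machine (instance
    // details, agent state, units, ...).
    func fetchMachine(id string) machineInfo {
        time.Sleep(200 * time.Millisecond) // pretend network latency
        return machineInfo{ID: id, Status: "started"}
    }

    func main() {
        ids := []string{"0", "1", "2", "3"}
        results := make([]machineInfo, len(ids))

        var wg sync.WaitGroup
        for i, id := range ids {
            wg.Add(1)
            go func(i int, id string) {
                defer wg.Done()
                results[i] = fetchMachine(id) // each index written by one goroutine
            }(i, id)
        }
        wg.Wait()

        for _, m := range results {
            fmt.Printf("machine %s: %s\n", m.ID, m.Status)
        }
    }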
> >>
> >> Doing it on the API server means any round trips are "local" rather
> >> than from your machine out to Amazon.
> >>
> >> 2) From here, 'time juju status' with a single instance running on ec2
> >> is 10s. That breaks down roughly as 4s to look up the IP address, 2s to
> >> establish the state connection, and 4s to "finish up" (the resolution
> >> here is 1s granularity).
> >>
> >> Similarly, "time juju-1.13.2 get not-service" takes 8.5s to run: 4s to
> >> look up the address, 2s to connect, and 3s to give the final 'not
> >> found' result.
> >>
> >> With trunk, "time ./juju get not-service" is 4.6s: 2s to look up the IP
> >> address, 2s to connect, and the not-found result is instantaneous.
> >>
> >> So I would expect the 10s of a generic "juju status" to easily drop
> >> down to sub-5s, regardless of any round-trip issues.
> >>
> >> 3) We are also looking to cache the IP address of the API server, to
> >> shave off another ~2-4s for the common case that the address hasn't
> >> changed. (We'll fall back to our current discovery mechanism.)
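
A minimal sketch of that cache-with-fallback connection logic; dialAPI and the
discovery callback here are hypothetical stand-ins for the real connection and
discovery mechanisms, not juju-core functions:

    // connect_cached.go: try the cached API address first and only fall back
    // to the slow discovery mechanism if it no longer answers.
    package main

    import (
        "fmt"
        "net"
        "time"
    )

    // dialAPI stands in for opening the API connection; here it only checks
    // that something is listening at the address.
    func dialAPI(addr string) (net.Conn, error) {
        return net.DialTimeout("tcp", addr, 5*time.Second)
    }

    // connect returns a connection plus the address that worked, so the
    // caller can refresh its cache when discovery was needed.
    func connect(cached string, discover func() string) (net.Conn, string, error) {
        if cached != "" {
            if conn, err := dialAPI(cached); err == nil {
                return conn, cached, nil // cached address still valid
            }
        }
        addr := discover() // cached address missing or stale
        conn, err := dialAPI(addr)
        return conn, addr, err
    }

    func main() {
        conn, addr, err := connect("10.0.0.5:17070", func() string {
            return "example.compute.amazonaws.com:17070" // illustrative only
        })
        if err != nil {
            fmt.Println("connect failed:", err)
            return
        }
        defer conn.Close()
        fmt.Println("connected to", addr)
    }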
> >>
> >> 4) There is something odd about the timing of your screencast. It does
> >> report 22s, which matches the times in the debug output, but the total
> >> length of the video is 20s, including the typing. Is it just running at
> >> 2:1 speed?
> >>
> >> You can see from the debug output that it takes 7s to look up the
> >> address to connect to, and then about 1s to connect. The rest is time
> >> spent gathering the information.
> >>
> >> I expect it to get a whole lot faster in a couple more weeks, but I'm
> >> not going to guarantee that until we've finished the work.
> >>
> >> 5) If I counted correctly, you have about 23 "machines" that are being
> >> considered, a bunch of them down, pending, or errored.
> >>
> >> I would think that for the errored ones you could do some sort of "juju
> >> destroy-machine". That might make things better (less time spent
> >> checking on machines you don't care about).
> >>
> >> What happens when you try it? (There may be other issues where we think
> >> we are still waiting for something to happen with a machine and so
> >> refuse to destroy it.)
> >>
> >>
> >> Anyway, in summary: this should be getting better, but I won't have
> >> explicit numbers until the work is done.
> >>
> >> John
> >> =:->