feedback about juju after using it for a few months

Tim Penhey tim.penhey at canonical.com
Wed Dec 17 22:47:02 UTC 2014


On 18/12/14 11:24, Caio Begotti wrote:
> Folks, I just wanted to share my experience with Juju during the last
> few months using it for real at work. I know it's pretty long but stay
> with me as I wanted to see if some of these points are bugs, design
> decisions or if we could simply talk about them :-)
> 
> General:
> 
> 1. Seems that if you happen to have more than... say, 30 machines, Juju
> starts behaving weirdly until you remove unused machines. One of the
> weird things is that new deploys all stay stuck with a pending status.
> That happened at least 4 times, so now I always destroy-environment when
> testing things just in case. Has anyone else seen this behaviour? Can
> this be because of LXC with Juju local? I do a lot of Juju testing so it's
> not unusual for me to have a couple hundred machines after a month, by
> the way.

I'll answer this one now.  This is due to "not enough file handles".  It
seems that the LXC containers that get created inherit the handles of
the parent process, which is the machine agent.  After a certain number
of machines, and it may be around 30, the new machines start failing to
recognise the new upstart script because inotify isn't working properly.
This means the agents don't start, and don't tell the state server they
are running, which means the machines stay pending even though LXC says
"yep you're all good".
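
One way to check whether a process is close to its file handle limit is
to compare its open descriptor count with the "Max open files" row of
/proc/<pid>/limits. The following is only a rough sketch under stated
assumptions: it is Linux-only, the helper names are made up for
illustration and are not part of Juju, and you would have to look up the
machine agent's pid yourself and have permission to read its /proc
entries.

```python
import os


def parse_nofile_limits(limits_text):
    """Return (soft, hard) from the 'Max open files' row of a
    /proc/<pid>/limits dump. None stands for 'unlimited'."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: Max open files <soft> <hard> files
            soft, hard = line.split()[3:5]

            def to_int(value):
                return None if value == "unlimited" else int(value)

            return to_int(soft), to_int(hard)
    raise ValueError("no 'Max open files' row found")


def open_fd_count(pid):
    """Count descriptors currently open in a process (needs /proc)."""
    return len(os.listdir("/proc/%d/fd" % pid))


def report(pid):
    """Summarise descriptor usage against the limits for one process."""
    with open("/proc/%d/limits" % pid) as handle:
        soft, hard = parse_nofile_limits(handle.read())
    return "pid %d: %d open, soft limit %s, hard limit %s" % (
        pid, open_fd_count(pid), soft, hard)
```

Calling report() with the agent's pid before and after adding a few
containers should show whether the open count is creeping towards the
soft limit.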

I'm not sure how big we can make the "limit nofile" in the agent upstart
script without it causing problems elsewhere.
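
For context, upstart's "limit nofile <soft> <hard>" stanza sets the same
per-process limit that Python exposes as RLIMIT_NOFILE. The sketch below
is an illustration only, not what the agent does: it shows a process
raising its own soft limit, which an unprivileged process can do only up
to the hard limit. The function name and the target value are
assumptions, and it does not say how high a value is safe.

```python
import resource


def raise_soft_nofile(target):
    """Raise this process's soft RLIMIT_NOFILE towards target.

    The soft limit is never lowered and never pushed past the hard
    limit. Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    if soft == unlimited:
        # Nothing to raise.
        return soft
    wanted = target if hard == unlimited else min(target, hard)
    if wanted > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

The change applies to the calling process and to children started after
the call, which is the same inheritance that lets a limit set in an
upstart job reach whatever the job launches.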

Tim



