feedback about juju after using it for a few months

Caio Begotti caio1982 at gmail.com
Wed Dec 17 22:24:56 UTC 2014


Folks, I just wanted to share my experience with Juju during the last few
months using it for real at work. I know it's pretty long but stay with me
as I wanted to see if some of these points are bugs, design decisions or if
we could simply talk about them :-)

General:

1. Seems that if you happen to have more than... say, 30 machines, Juju
starts behaving weirdly until you remove unused machines. One of the weird
things is that new deploys all stay stuck with a pending status. That
happened at least 4 times, so now I always destroy-environment when testing
things just in case. Has anyone else seen this behaviour? Can this be because
of LXC with Juju local? I do a lot of Juju testing so it's not unusual for me
to have a couple hundred machines after a month, by the way.

2. It's not reliable to use Juju on laptops, which I can understand of
course, but just in case... if the system is suspended Juju will not recover
itself like the rest of the system services. It loses its connection to
its API apparently? Hooks fail too (resuming always seems to call
hooks/config-changed)? Is this just with me?

3. The docs recommend writing charms in Python versus shell script.
Compared to Python charms, shell script ones are subpar enough that I'd
recommend saying they are not officially supported then. It's quite common
to have race
conditions in charms written in shell script. You have to keep polling the
status of things because if you just call deploys and set relations in a
row they will fail, because Juju won't queue the commands in a logical
sequence, it'll just run them dumbly and developers are left in the wild to
control it. I'm assuming a Python charm does not have this problem at all?
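
To make the polling workaround concrete, here is a minimal Python sketch of
what a deploy script ends up doing between commands. It is only an
illustration, not something Juju provides. The status layout (services,
units, agent-state) is what I'd expect from "juju status --format=json" on
Juju 1.x and should be treated as an assumption; fetch_status and
wait_for_service are hypothetical names.

```python
import time


def all_units_started(status, service):
    """Return True when every unit of `service` reports agent-state 'started'."""
    units = status.get("services", {}).get(service, {}).get("units", {})
    # A service with no units yet is not considered started.
    return bool(units) and all(
        unit.get("agent-state") == "started" for unit in units.values()
    )


def wait_for_service(fetch_status, service, timeout=600, interval=5,
                     sleep=time.sleep):
    """Poll fetch_status() until `service` is started or `timeout` elapses.

    fetch_status is a hypothetical callable returning the parsed status
    dict (assumed layout: services -> units -> agent-state).
    """
    waited = 0
    while waited <= timeout:
        if all_units_started(fetch_status(), service):
            return True
        sleep(interval)
        waited += interval
    return False
```

A script would call wait_for_service for each service before adding a
relation to it, with fetch_status wrapping whatever command returns the
status as JSON.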

4. It's not very clear to me how many times hooks/config-changed runs, I'd
just guess many :-) so you have to pay attention to it and write extra
checks to avoid multiple harmful runs of this hook. I'd say the sequence
and number of hooks called by a new deploy is not very clear based on the
documentation because of this. Hmm perhaps I could print debug it and count
the hits...
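
One way to write the "extra checks" mentioned above is to remember a
fingerprint of the last configuration that was applied and skip the work
when nothing changed. The sketch below is an assumption about how a charm
could do this, not an official pattern; run_if_changed, state_path and
apply_config are hypothetical names, and in a real hook the config dict
would come from something like charmhelpers' hookenv.config().

```python
import hashlib
import json
import os


def config_fingerprint(config):
    """Stable hash of a config dict, independent of key order."""
    blob = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def run_if_changed(config, state_path, apply_config):
    """Call apply_config(config) only if config differs from the last run.

    Returns True when apply_config was called, False when it was skipped.
    """
    new = config_fingerprint(config)
    old = None
    if os.path.exists(state_path):
        with open(state_path) as f:
            old = f.read().strip()
    if new == old:
        return False  # same config as the last successful run
    apply_config(config)
    # Record the fingerprint only after apply_config returned without raising.
    with open(state_path, "w") as f:
        f.write(new)
    return True
```

With this guard the hook can run any number of times and the expensive or
harmful part only executes when the configuration really differs.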

5. Juju should queue multiple deployments in order not to hurt performance,
both of disk and network IO. More than 3 deployments in parallel on my
machine makes it all really slow. I just leave Juju for a while and go get
some coffee because the system goes crazy. Or I have to break up the
deployments manually, while Juju could have just queued it all and the CLI could
simply display it as "queued" instead. I know it would need to analyse the
machine's hardware to guess a number different from 3 but think about it if
your deployments have about 10 different services... things that take 20
minutes can easily take over 1 hour.
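
Until something like that exists, the queueing can be approximated on the
client side. This is a rough Python sketch under stated assumptions:
deploy_all and the deploy callable are hypothetical, and the limit of 3 is
just the number from my machine. For the limit to mean anything, deploy has
to block until the service is actually up, since as far as I can tell the
deploy command itself returns long before the unit is ready.

```python
from concurrent.futures import ThreadPoolExecutor


def deploy_all(charms, deploy, max_parallel=3):
    """Run deploy(charm) for every charm, at most max_parallel at a time.

    `deploy` is a hypothetical callable that should block until the
    deployment has finished. Returns a dict mapping charm -> result.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map keeps results in the same order as `charms`.
        results = list(pool.map(deploy, charms))
    return dict(zip(charms, results))
```

Everything beyond the first max_parallel charms simply waits in the
executor's queue, which is the "queued" state I'd like the CLI to show.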

6. There is no way to know if a relation exists and if it's active or not,
so you need to write dummy conditionals in your hooks to work around that.
IMHO it's hackish to check variables that are only non-empty during a
relation because they will vanish anyway. A command to list the currently
set relations would be awesome to have, both inside the hooks and in the
CLI. Perhaps charmhelpers.core.services.helpers.RelationContext could be
used for this but I'm not totally sure as you only get the relation data
and you need to know the relation name in advance anyway, right?
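
For what it's worth, the closest thing I can sketch with the existing hook
tools is below. It assumes the relation-ids and relation-list tools accept
--format=json and are only usable inside a hook context; list_relations is
a hypothetical helper, and it still needs the relation names up front (for
instance copied from the charm's metadata.yaml), which is exactly the
limitation described above.

```python
import json
import subprocess


def _run_json(cmd):
    """Run a hook tool and decode its JSON output."""
    return json.loads(subprocess.check_output(cmd).decode("utf-8"))


def list_relations(relation_names, run=_run_json):
    """Map each relation name to {relation id: [remote unit names]}.

    `run` is injectable so the function can be exercised outside a hook.
    """
    found = {}
    for name in relation_names:
        # Assumed tool usage: relation-ids --format=json <name>
        ids = run(["relation-ids", "--format=json", name]) or []
        found[name] = {
            # Assumed tool usage: relation-list --format=json -r <id>
            rid: run(["relation-list", "--format=json", "-r", rid]) or []
            for rid in ids
        }
    return found
```

A relation name that maps to an empty dict has no established relation, and
a relation id with an empty unit list has no remote units joined yet.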

7. When a hook fails (most often while relations are being set) I have to
manually run resolved unit/0 multiple times. It's not enough to call it
once and wait for Juju to get it straight. I have to babysit the unit and
keep running resolved unit/0, while I imagined this should be automatic
because I wanted it resolved for real anyway. If the failed hook was the
first in a chain, you'll have to re-run this for every other hook in the
sequence. Once for a relation, another for config-changed, then perhaps
another for the stop hook and another one for start hook, depending on your
setup.
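
The babysitting described above boils down to a loop like the following
Python sketch. It is only an illustration of the manual workaround:
get_state and resolve are hypothetical callables (resolve would wrap
something like running resolved unit/0), and the "error" state name and the
idea of waiting for a few consecutive clean polls are assumptions.

```python
import time


def resolve_until_clean(get_state, resolve, max_attempts=10, stable_polls=3,
                        interval=5, sleep=time.sleep):
    """Keep resolving a unit until it stays out of the error state.

    Calls resolve() every time get_state() returns "error". Finishes once
    `stable_polls` consecutive polls are clean, because the next hook in
    the chain may fail shortly after the previous one was resolved.
    Returns the number of resolve() calls made.
    """
    attempts = 0
    clean = 0
    while clean < stable_polls:
        if get_state() == "error":
            if attempts >= max_attempts:
                raise RuntimeError(
                    "unit still failing after %d attempts" % attempts)
            resolve()
            attempts += 1
            clean = 0  # start counting clean polls again
        else:
            clean += 1
        sleep(interval)
    return attempts
```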

8. Do we have to monitor and wait for a relation variable to be set? I've
noticed that sometimes I want to get its value right away in the relation
hook but it's not assigned yet by the other service. So I'm finding myself
adding sleep commands when it happens, and that's quite hackish I think?
IMHO the command to get a variable from a relation should be blocking until
a value is returned so the charm doesn't have any timing issues. I see that
happening with rabbitmq-server's charm all the time, for instance.
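
An alternative to sleeping, as far as I understand the hook model, is to
exit the hook quietly when the data is incomplete and rely on
relation-changed firing again once the other side sets its values. Below is
a minimal Python sketch of that guard; on_relation_changed, missing_keys
and configure are hypothetical names, and in a real hook relation_data
would come from relation-get (or charmhelpers' hookenv.relation_get()).

```python
def missing_keys(relation_data, required):
    """Return the required keys that are absent or empty in relation_data."""
    return [key for key in required if not relation_data.get(key)]


def on_relation_changed(relation_data, required, configure):
    """Configure the service only once the remote side published everything.

    Returns False (without raising) when data is still incomplete, so the
    hook can exit successfully and be run again on the next change.
    """
    if missing_keys(relation_data, required):
        # Not an error: the remote unit has not set all values yet.
        return False
    configure({key: relation_data[key] for key in required})
    return True
```

This keeps the timing problem out of the charm at the cost of the hook
running more than once, which ties back to point 4.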

9. If you want to cancel a deployment that just started you need to keep
running remove-service forever. Juju will simply ignore you if it's still
running some special bits of the charm or if you have previously asked it
to cancel the deployment while it was setting up. No errors, no other
messages are printed. You need to actually open its log to see that it's
still stuck in a long apt-get installation and you have to wait until the
right moment to remove-service again. And if your connection is slow, that
takes time, you'll have to babysit Juju here because it doesn't really
control its services as I imagined. Somehow apt-get gets what it wants :-)

10. I think there's something weird about relation-set and relation-get
between services when you add and remove relations multiple times. For
example, the first time I set a relation to a Postgres charm I get a
database back and my desired roles configured, but if I remove the relation
and then add it back I only get the database settings. The roles are not
set up again, so I don't have the right permissions in the DB the
second time I set the relation. Has anyone seen this too with other charms?

Juju GUI:

11. Juju's GUI's bells and whistles are nice, but I think there's a bug
with it because its statuses are inaccurate. If you set a relation, Juju
says the relation is green and active immediately, which is not true if you
keep tailing the log file and you know things can still fail because
scripts are still running.

12. Cancelling actions on Juju's GUI does not make much sense since you
need to click on commit, then click on clear, then confirm it. Why not
simply have a different cancel button instead? It's like shutting down
Windows from the start menu. The cancel button should cancel the action,
and the actual X button should simply dismiss it. That clear button seems
useless UX-wise?

13. Juju's GUI's panel with charmstore stays open all the time wasting
window space (so I have to zoom out virtually all my deployments because of
the amount of wasted space, every time). There could be a way to hide that
panel, because honestly it's useless locally since it never lists my local
charms even if I export JUJU_REPOSITORY correctly. I'd rather have my local
charms listed there too or just hide the panel instead.

13. Juju's GUI shows new relations info incorrectly. If I set up a DB
relation to my service it simply says in the confirmation window that "db
relation added between postgresql and postgresql". I've noticed sometimes
this changes to "between myservice and myservice" so perhaps it has to do
with the order of the relation, from what service to the other? Anyway,
both cases seem to show it wrong?

14. Juju's GUI always shows the service panel even if the service unit has
been destroyed, just because I opened it once. Also, it says "1 dying
units" (sic) forever until I close it manually.

15. Why don't subordinate charms have a color bar beneath their icons too?
Because if it fails then it will appear in red right? Why not always
display it to indicate it's been correctly deployed or set up?

16. Juju's GUI lists all my machines. Like, all of them, really. In the
added services part of the panel it lists even inactive machines, which
does not make much sense I'd say because it makes it seem only deployed
machines are listed. I think that count is wrong.

That's it, thank you to those who made it to the end :-D