<div dir="ltr">Oops, I only responded to William rather than the whole group. <br><br><div class="gmail_quote">---------- Forwarded message ----------<br>From: <b class="gmail_sendername">Mark Canonical Ramm-Christensen</b> <span dir="ltr"><<a href="mailto:mark.ramm-christensen@canonical.com">mark.ramm-christensen@canonical.com</a>></span><br> Date: Sat, Nov 9, 2013 at 2:02 AM<br>Subject: Re: High Availability command line interface - future plans.<br>To: William Reade <<a href="mailto:william.reade@canonical.com">william.reade@canonical.com</a>><br><br><br> <div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote"><div class="im">On Sat, Nov 9, 2013 at 12:00 AM, William Reade <span dir="ltr"><<a href="mailto:william.reade@canonical.com" target="_blank">william.reade@canonical.com</a>></span> wrote:<br> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">I'm concerned that we're (1) rehashing decisions made during the sprint and (2) deviating from requirements in doing so.<div> <br></div></div></blockquote></div><div>I'm sure that this is a valid concern. </div><div><br></div><div>My concern was that we were wandering back into territory oft discussed in the past (juju deploy juju) because the "ensure-ha" spelling was both not-expressive enough, and too expressive of an idea that we aren't making true (that we will fix HA for you even in the future). 
</div> <div class="im"> <div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div></div><div>In particular, abstracting HA away into "management" manipulations -- as roger notes, pretty much isomorphic to the "jobs" proposal -- doesn't give users HA so much as it gives them a limited toolkit with which they can more-or-less construct their own HA; in particular, allowing people to use an even number of state servers is strictly a bad thing [0], and I'm extremely suspicious of any proposal that opens that door.</div> </div></blockquote><div><br></div></div><div>I'd be fine with: </div><div><br></div><div>juju make-me-ha</div><div><br></div><div>as a better (officially approved) way of saying: </div><div><br></div><div><span style="font-family:arial,sans-serif;font-size:13px">juju add-state-server -n 2</span></div> <div><span style="font-family:arial,sans-serif;font-size:13px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:13px">Heck, I'd even be fine with implementing make-me-ha as being the "one true way" to do HA with us not even implementing add-state-server yet. </span></div> <div><span style="font-family:arial,sans-serif;font-size:13px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:13px">I do however think the idea of being able to place state-servers as you go is pretty useful, and folks setting up HA can be expected to read documentation, and when they don't, to be reminded that they have done something bad when they type the command, and again every time they run status (and potentially every time they open up the GUI). 
</span></div> <div class="im"> <div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div>Of course, some will argue that mongo should be able to scale separately from the api servers and other management tasks, and this is a worthy goal; but in this context it sucks us down into the morass of exposing different types of management on different machines, and ends up approaching the jobs proposal still closer, in that it requires users to assimilate a whole load of extra terminology in order to perform a conceptually simple function.</div> </div></blockquote></div><div><br>Yea, that complexity in the --jobs proposal is one thing I want to avoid -- but even more so it is the fact that the complexity is hidden in a very odd place (behind add-machine) which is quite unintuitive. <br> </div><div class="im"><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"> <div>Conversely, "ensure-ha" (with possible optional --redundancy=N flag, defaulting to 1) is a simple model that can be simply explained: the command's sole purpose is to ensure that juju management cannot fail as a result to the simultaneous failure of <=N machines. It's a *user-level* construct that will always be applicable even in the context of a more sophisticated future language (no matter what's going on with this complicated management/jobs business, you can run that and be assured you'll end up with at least enough manager machines to fulfil the requirement you clearly stated in the command line).<br> </div></div></blockquote><div><br></div></div><div>I think the problem here is that it is felt to be "too magical" and to express something we can't really promise -- that HA is ensured. 
But just changing ensure to set makes that second point go away. </div> <div><br></div><div>Kapil brought up the point that a complicated syntax for HA was bad, but that it would be "<span style="font-family:arial,sans-serif;font-size:13px">even worse" to "assume it always just works in a blackbox."</span></div> <div class="im"> <div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div>I haven't seen anything that makes me think that redesigning from scratch is in any way superior to refining what we already agreed upon; and it's distracting us from the questions of reporting and correcting manager failure when it occurs. I assert the following series of arguments:</div> <div><br></div><div>* users may discover at any time that they need to make an existing environment HA, so ensure-ha is *always* a reasonable user action</div></div></blockquote><div><br></div></div><div>But there are different ways of going about it given a desire for density...</div> <div class="im"> <div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"> <div>* similarly, allowing users to *directly* destroy management machines enables exciting new failure modes that don't really need to exist</div></div></blockquote><div><br></div></div><div>Well, if you are at 4 nodes of state-servers there are two ways to get into a "good state": adding another mongo or removing one ;) Both allow failure modes. 
</div> <div class="im"> <div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div><br></div><div>* the notion of HA is somewhat limited in worth when there's no way to make a vulnerable environment robust again</div> </div></blockquote><div><br></div></div><div>I think all the proposals offer a way to bring back HA, so I think on this at least everybody is in agreement. </div><div class="im"><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> <div dir="ltr"> * the more complexity we shovel onto the user's plate, the less likely she is to resolve the situation correctly under stress</div></blockquote><div><br></div></div><div>Agreed. And make-ha is perhaps a good spelling for that where it makes sure you have exactly 3 machines. 
</div> <div class="im"> <div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div>* the most obvious, and foolproof, command for repairing HA would be "ensure-ha" itself, which could very reasonably take it upon itself to replace manager nodes detected as "down" -- assuming a robust presence implementation, which we need anyway, this (1) works trivially for machines that die unexpectedly and (2) allows a backdoor for resolution of "weird" situations: the user can manually shutdown a misbehaving manager out-of-band, and run ensure-ha to cause a new one to be spun up in its place; once HA is restored, the old machine will no longer be a manager, no longer be indestructible, and can be cleaned up at leisure</div> </div></blockquote><div><br></div></div><div>I think this can all be done manually for now -- we don't automate it for services, or have any of the primitives for HA that we would need. 
</div><div class="im"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> <div dir="ltr"><div><br></div><div>I don't believe that any of this precludes more sophisticated management of juju's internal functions *when* the need becomes pressing -- whether via jobs, or namespaced pseudo-services, or whatever -- but at this stage I think it is far better to expose the policies we're capable of supporting, and thus allow ourselves wiggle room to allow the mechanism to evolve, than to define a user-facing model that is, at best, a woolly reflection of an internal model that's likely to change as we explore the solution space in practice.</div> </div></blockquote><div><br></div></div><div>Cool, I think that make-ha sounds both idempotent and transient (it can be safely run multiple times, and doesn't imply that it is *watching and updating things dynamically*) so I'm all for it. </div> <div><br></div><div>And I think the need for add-state-server is predicated on a desire to make the manual bits easier to handle -- and I'd rather do that than add-machines --jobs -- but doing the high level thing that just does the right thing is better yet in my opinion. </div> <div class="im"> <div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div>Long-term, FWIW, I would be happiest to expose fine control over HA, scaling, etc by presenting juju's internal functionality as a namespaced group of services that *can* be configured and manipulated (as much as possible) like normal services, because... y'know... 
services/units is actually a pretty good user model; but I think we're all in agreement that we shouldn't go down that rabbit hole today.</div> </div></blockquote><div><br></div></div><div>I think there is broad agreement about the niceness of exposing it this way even if it is implemented very differently under the hood. I do think that it is important not to go down that road until internal functionality actually acts like units in enough ways that it is unlikely that they will then immediately move on to ask why it behaves differently from all other units ;) </div> <div><br></div><div>That said, after a bit more thinking this evening on the train to Hong Kong I'm not sure the list of issues Roger raised is actually likely to cause user problems:</div><div class="im"><div><br></div> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> <div dir="ltr"><div><span style="font-family:arial,sans-serif;font-size:13px">The difficulty I have with 1) is that there's a significant mismatch between</span><br style="font-family:arial,sans-serif;font-size:13px"> <span style="font-family:arial,sans-serif;font-size:13px">the user's view of things and what's going on underneath.</span><br style="font-family:arial,sans-serif;font-size:13px"><span style="font-family:arial,sans-serif;font-size:13px">For instance, with a built-in service, can I:</span><br style="font-family:arial,sans-serif;font-size:13px"> <br style="font-family:arial,sans-serif;font-size:13px"><span style="font-family:arial,sans-serif;font-size:13px">- add a subordinate service to it?</span><br style="font-family:arial,sans-serif;font-size:13px"><span style="font-family:arial,sans-serif;font-size:13px">- see the relevant log file in the usual place for a unit?</span><br style="font-family:arial,sans-serif;font-size:13px"> <span style="font-family:arial,sans-serif;font-size:13px">- see its charm 
metadata?</span><br style="font-family:arial,sans-serif;font-size:13px"><span style="font-family:arial,sans-serif;font-size:13px">- join to its juju-info relation?</span><br style="font-family:arial,sans-serif;font-size:13px"> <br style="font-family:arial,sans-serif;font-size:13px"><span style="font-family:arial,sans-serif;font-size:13px">If it's a single service, how can its units span different series?</span><br style="font-family:arial,sans-serif;font-size:13px"> <span style="font-family:arial,sans-serif;font-size:13px">(presumably it has got a charm URL, which includes the series)</span><br></div><div><br></div></div></blockquote></div><div>I don't think users are generally going to have much of any expectations about the above five points... but I don't think they matter -- still something feels icky and I'm not completely sold on doing the "add-unit state-server --to=lxc5/1". </div> <div><br></div><div>The more I thought about it all evening the less convinced I was that calling this "add-unit state-server" (even though that is not AT ALL how it is implemented) was a terrible idea. The --to and --constraints syntax apply in both cases, and I'm not sure that "normal" users have a crisp idea of what a "unit" is and probably have no expectations about the 5 mismatches listed above -- at least not until they were pretty far down the rabbit hole. </div> <span class="HOEnZb"><font color="#888888"> <div><br></div><div>--Mark Ramm</div><div><br></div> <div><br></div><div><br></div><div> </div></font></span></div></div></div> </div><br></div>