<div dir="ltr">From <a href="https://www.kernel.org/doc/Documentation/vm/overcommit-accounting">https://www.kernel.org/doc/Documentation/vm/overcommit-accounting</a>:<div><br></div><div><span style="color:rgb(0,0,0);white-space:pre-wrap">The Linux kernel supports the following overcommit handling modes</span><pre style="color:rgb(0,0,0);word-wrap:break-word;white-space:pre-wrap">0 - Heuristic overcommit handling. Obvious overcommits of
address space are refused. Used for a typical system. It
ensures a seriously wild allocation fails while allowing
overcommit to reduce swap usage. root is allowed to
allocate slightly more memory in this mode. This is the
default.
1 - Always overcommit. Appropriate for some scientific
applications. Classic example is code using sparse arrays
and just relying on the virtual memory consisting almost
entirely of zero pages.
2 - Don't overcommit. The total address space commit
for the system is not permitted to exceed swap + a
configurable amount (default is 50%) of physical RAM.
Depending on the amount you use, in most situations
this means a process will not be killed while accessing
pages but will receive errors on memory allocation as
appropriate.
Useful for applications that want to guarantee their
memory allocations will be available in the future
without having to initialize every page.</pre></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jun 3, 2015 at 7:40 AM, John Meinel <span dir="ltr"><<a href="mailto:john@arbash-meinel.com" target="_blank">john@arbash-meinel.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">So interestingly we are already fairly heavily overcommitted. We have 4GB of RAM and 4GB of swap available. And cat /proc/meminfo is saying:<div><font face="monospace, monospace">CommitLimit: 6214344 kB</font></div><div><font face="monospace, monospace">Committed_AS: 9764580 kB</font></div><div><font face="monospace, monospace"><br></font></div><div><font face="monospace, monospace">John</font></div><div><font face="monospace, monospace">=:-></font></div><div><div class="h5"><div><font face="monospace, monospace"><br></font></div><div><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jun 3, 2015 at 9:28 AM, Gustavo Niemeyer <span dir="ltr"><<a href="mailto:gustavo@niemeyer.net" target="_blank">gustavo@niemeyer.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><p dir="ltr">Ah, and you can also suggest increasing the swap. It would not actually be used, but the system would be able to commit to the amount of memory required, if it really had to.<br>
</p><div><div>
<div class="gmail_quote">On Jun 3, 2015 1:24 AM, "Gustavo Niemeyer" <<a href="mailto:gustavo@niemeyer.net" target="_blank">gustavo@niemeyer.net</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><p dir="ltr">Hey John,</p>
<p dir="ltr">It's probably an overcommit issue. Even if the memory isn't actually in use, cloning it means the new process would have a chance to change that memory and thus require real memory pages, which the system obviously cannot give it. You can work around that by explicitly enabling overcommit, which carries the potential to crash late in strange places in the bad case, but would be totally okay for the exec situation.</p>
<div style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">So we're running into this failure mode again at one of our sites.<div><br></div><div>Specifically, the system is running with a reasonable number of nodes (~100) and has been running for a while. It appears that it wanted to restart itself (I don't think it restarted jujud, but I do think it at least restarted a lot of the workers.)</div><div>Anyway, we have a fair number of things that we "exec" during startup (kvm-ok, restart rsyslog, etc).</div><div>But when we get into this situation (whatever it actually is) then we can't exec anything and we start getting failures.</div><div><br></div><div>Now, this *might* be a golang bug.</div><div><br></div><div>When I was trying to debug it in the past, I created a small program that just allocated big slices of memory (10MB strings, IIRC) and then tried to run "echo hello" until it started failing.</div><div>IIRC the failure point was when I wasn't using swap and the allocated memory was 50% of total available memory. (I have 8GB of RAM, and it would start failing once we had allocated 4GB of strings.)</div><div>When I tried digging into the golang code, it looked like they use clone(2) as the "create a new process for exec" function. And it seemed it wasn't playing nicely with copy-on-write. At least, it appeared that instead of doing a simple copy-on-write clone without allocating any new memory and then exec'ing into a new process, it actually required enough RAM to be available for the new process.</div><div><br></div><div>On the customer site, though, jujud has a RES size of only 1GB, and they have 4GB of available RAM and swap is enabled (2GB of 4GB swap currently in use).</div><div><br></div><div>The only workaround I can think of is for us to create a "forker" process right away at startup, to which we send RPC requests to run a command for us and return the results. 
ATM I don't think we fork and run anything interactively, so we shouldn't need the stdin/stdout file handles inside our process.</div><div><br></div><div>I'd rather just have golang fork() work even when the current process is using a large amount of RAM.</div><div><br></div><div>Any of the golang folks know what is going on?</div><div><br>John</div><div>=:-></div><div><br></div></div>
<br>--<br>
Juju-dev mailing list<br>
<a href="mailto:Juju-dev@lists.ubuntu.com" target="_blank">Juju-dev@lists.ubuntu.com</a><br>
Modify settings or unsubscribe at: <a href="https://lists.ubuntu.com/mailman/listinfo/juju-dev" target="_blank">https://lists.ubuntu.com/mailman/listinfo/juju-dev</a><br>
<br></div>
</blockquote></div>
</div></div></blockquote></div><br></div></div></div></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><br>gustavo @ <a href="http://niemeyer.net" target="_blank">http://niemeyer.net</a></div>
</div>