<div dir="ltr">I tested your branch in the same configuration. First I had to change "logging-config: &lt;root&gt;=DEBUG" again. I thought that running "juju bootstrap --debug" always left the DEBUG level on.<div><br></div><div>Anyway, something doesn't quite seem right, as I saw:</div><div><div>-rw------- 1 syslog adm 500M Sep 16 07:25 all-machines.log</div><div>...</div><div>-rw------- 1 syslog syslog 300M Sep 16 07:21 machine-0-2014-09-16T07-21-02.193.log<br></div><div>-rw------- 1 syslog syslog 300M Sep 16 07:25 machine-0-2014-09-16T07-25-14.840.log</div><div>-rw------- 1 syslog syslog 31M Sep 16 07:25 machine-0.log</div><div></div></div><div><br></div><div><br></div><div>That means machine-0.log has received more than 600MB of data for sure (I saw it rotate at least once above this), yet there is only a single all-machines.log, and it is only 500MB.</div><div><br></div><div>As I sat there watching, I did manage to see:</div><div><div>-rw------- 1 syslog adm 7.0M Sep 16 07:26 all-machines.log</div><div>-rw------- 1 syslog adm 513M Sep 16 07:26 all-machines.log.1</div><div></div></div><div>...</div><div><br></div><div>That looks correct, but it seems that something (load?) was preventing it from receiving all of the machine-0 messages.</div><div>Maybe it is getting rate limited?</div><div>I can watch the machine-0.log file fill at about 1MB per second, and then after about 5s I see all-machines.log go up by only a couple of MB. 
(machine-0 log seemed to go up by about 12MB, while all-machines.log only went up by 3MB.)</div><div><br></div><div>Now, I'm also seeing a ridiculous amount of logging, and lots of lines like:</div><div><a href="http://paste.ubuntu.com/8356019/">http://paste.ubuntu.com/8356019/</a><br></div><div><br></div><div>This seems to indicate that every 3 seconds the Rsyslog Watcher is dying, causing everything to restart, reconnect, and ask for the new credentials.</div><div><br></div><div>I'm also a bit surprised that we're telling them to connect to a DNS name rather than an IP address, but the resetting is the important bit. I filed <a href="https://bugs.launchpad.net/juju-core/+bug/1369900">https://bugs.launchpad.net/juju-core/+bug/1369900</a> to track that bug.</div><div><br></div><div>Does anyone care to try to figure out why I'm seeing 3-4x as much data in machine-0.log as seems to end up in all-machines.log?</div><div><br></div><div>John</div><div>=:-></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Sep 15, 2014 at 7:13 PM, Wayne Witzel <span dir="ltr"><<a href="mailto:wayne.witzel@canonical.com" target="_blank">wayne.witzel@canonical.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Here is a branch that addresses the concerns and the bug. I've tested it locally and on DigitalOcean. I would love for it to be tested under the scaling scenario where you first encountered the issues.<div> <div> <a href="https://github.com/wwitzel3/juju/tree/ww3-rsyslogd-logrotate" target="_blank">https://github.com/wwitzel3/juju/tree/ww3-rsyslogd-logrotate</a></div><div><br></div><div>I've changed the size in logrotate to an arbitrarily small value, since the rotation is driven by rsyslog. I've also updated the "style" of the rotation to create a new log file and use postrotate to HUP rsyslog.</div><div><br></div><div>Also I removed compression. 
I don't think we need it, given that we only keep one rotation around and it is size-restricted. Without it, it is also easier to get at information across both log files.</div><span class="HOEnZb"><font color="#888888"><div><br></div><div>Wayne</div></font></span></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Sep 15, 2014 at 10:30 AM, Nate Finch <span dir="ltr"><<a href="mailto:nate.finch@canonical.com" target="_blank">nate.finch@canonical.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">tl;dr: rsyslog sees 512,000,000 bytes and tells logrotate "time to rotate!", logrotate sees less than 512MB and says "nah, not big enough", and rsyslog never writes the logs anymore because the file is too big.</div><div><div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Sep 15, 2014 at 10:07 AM, Wayne Witzel <span dir="ltr"><<a href="mailto:wayne.witzel@canonical.com" target="_blank">wayne.witzel@canonical.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">During implementation and review, the size of the rotated files was adjusted. I very thoroughly tested the rotation script itself and also the triggering of the script, enough that I was happy landing the changes.<div><br></div><div>When the sizes were adjusted, the same diligence was not applied in testing. This is what introduced the bug.</div><div><br></div><div>The size mismatch is the problem. 
rsyslogd stops logging to the file and attempts to execute the outchannel script every time it gets a new log message; that script runs logrotate, which does nothing because the file isn't big enough.</div><div><br></div><div class="gmail_extra">Here are the rsyslog docs on implementing size-based rotation.</div><div class="gmail_extra"> <a href="http://www.rsyslog.com/doc/log_rotation_fix_size.html" target="_blank">http://www.rsyslog.com/doc/log_rotation_fix_size.html</a></div><div class="gmail_extra"><br></div><div class="gmail_extra">I first attempted to use the mv command, but with our rsyslogd configuration, after a mv rsyslogd would stop logging to the file until I actually reloaded the process. That approach also didn't easily support things like compression, or keeping more archives if we decided to keep more than one rotation around. So we chose logrotate.</div><div class="gmail_extra"><br></div><div class="gmail_extra">As for actually reloading rsyslog: you can use copytruncate to avoid it altogether, or you can use postrotate to reload rsyslogd. In both cases there is a small window of possible data loss, though with copytruncate it is more likely to happen when the system is under load. So changing to a postrotate that reloads rsyslogd is probably a good idea.</div><div class="gmail_extra"><br></div><div class="gmail_extra">The rotate setting is just the number of rotated files to keep around. After the size update, I asked some people whether 1 rotation, a total of 1GB, was enough to keep around. 
Most people thought it was fine, since the original purpose of the size-based log rotation was to address a ticket complaining about the juju log folder taking up 3GB+ of space.</div><div class="gmail_extra"><br></div><div class="gmail_extra">I will get a ticket created to address the size mismatch issue and to switch from the copytruncate approach to a postrotate that reloads rsyslogd.</div><div class="gmail_extra"><br></div><div class="gmail_extra">Wayne</div><div class="gmail_extra"><br><div class="gmail_quote"><div><div>On Mon, Sep 15, 2014 at 5:30 AM, John Meinel <span dir="ltr"><<a href="mailto:john@arbash-meinel.com" target="_blank">john@arbash-meinel.com</a>></span> wrote:<br></div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div><div><div dir="ltr">So we are using rsyslog.conf to have it figure out when rotation needs to be done, with:<div><div># Maximum size for the log on this outchannel is 512MB</div><div># The command to execute when an outchannel has reached its size limit cannot accept any arguments</div><div># that is why we have created the helper script for executing logrotate.</div><span><div>$outchannel logRotation,{{logDir}}/all-machines.log,512000000,{{logrotateHelperPath}}</div><div><br></div></span><div>I would think that, since rsyslog runs the script at our request, this would not also require a SIGHUP.</div><div><br></div><div>John</div><div>=:-></div><span><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Sep 15, 2014 at 12:13 PM, Stuart Bishop <span dir="ltr"><<a href="mailto:stuart.bishop@canonical.com" target="_blank">stuart.bishop@canonical.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span>On 15 September 2014 12:38, John Meinel <<a href="mailto:john@arbash-meinel.com" 
target="_blank">john@arbash-meinel.com</a>> wrote:<br>
<br>
> 7) "copytruncate" seems the wrong setting for interacting with rsyslog. I<br>
> believe rsyslog is already aware that the file needs to be rotated, and thus<br>
<br>
</span>It is only aware if you send it a HUP signal.<br>
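<div>Stuart's point, and the mv behaviour described earlier in the thread, come down to how open file descriptors work. Below is a minimal Python sketch, assuming POSIX rename semantics: a long-lived append handle (standing in for rsyslogd) keeps writing into the renamed file after a mv-style rotation, and only starts using the new file once it reopens the path, which is what the HUP triggers. The file names and the demo_rename_without_reopen helper are hypothetical; this models descriptor behaviour, not rsyslog itself.</div>
<pre>
```python
import os
import tempfile


def demo_rename_without_reopen():
    """Model a rename-based rotation against a long-lived append handle.

    Returns (rotated_contents, new_file_before_reopen, new_file_after_reopen).
    All names are hypothetical; assumes POSIX rename semantics.
    """
    with tempfile.TemporaryDirectory() as directory:
        log_path = os.path.join(directory, "all-machines.log")
        rotated_path = log_path + ".1"

        # Long-lived handle, standing in for rsyslogd's open log file.
        writer = open(log_path, "a")
        writer.write("before rotation\n")
        writer.flush()

        # Rotate by renaming, then create an empty replacement file
        # (roughly what a plain mv plus logrotate's "create" mode does).
        os.rename(log_path, rotated_path)
        open(log_path, "a").close()

        # The old descriptor still refers to the renamed file.
        writer.write("after rotation\n")
        writer.flush()
        with open(log_path) as f:
            new_before_reopen = f.read()

        # Simulated HUP: close the old handle and reopen by path.
        writer.close()
        writer = open(log_path, "a")
        writer.write("after reopen\n")
        writer.close()

        with open(rotated_path) as f:
            rotated = f.read()
        with open(log_path) as f:
            new_after_reopen = f.read()

    return rotated, new_before_reopen, new_after_reopen
```
</pre>
<div>Until the reopen, the freshly created all-machines.log stays empty while every new line lands in the rotated file, which matches rsyslogd appearing to "stop logging to the file" after a bare mv.</div>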
<div><div><br>
> it shouldn't be trying to write to the same file handle (and thus we don't<br>
> need to truncate in place). I'm not 100% sure on the interactions here, but<br>
> "copytruncate" seems to have an inherent likelihood of dropping data (while<br>
> you are copying, if any data gets written then you'll miss those last few<br>
> bytes when you go to truncate, right?)<br>
<br>
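<div>The data-loss window described in the quoted point 7 can be reproduced with a minimal Python sketch of the copytruncate sequence (copy the live file aside, then truncate it in place). It assumes a writer opened in append mode, which Python's "a" mode maps to O_APPEND on POSIX, so writes after the truncate land at offset 0. A line written between the copy and the truncate ends up in neither file. The helper name and log lines are illustrative only; real logrotate and rsyslogd timing will differ.</div>
<pre>
```python
import os
import shutil
import tempfile


def copytruncate_with_late_write():
    """Simulate copytruncate with one write landing inside the race window.

    Returns (rotated_contents, live_contents). Names are hypothetical.
    """
    with tempfile.TemporaryDirectory() as directory:
        log_path = os.path.join(directory, "all-machines.log")
        rotated_path = log_path + ".1"

        # Append-mode writer, standing in for rsyslogd.
        writer = open(log_path, "a")
        writer.write("line 1\n")
        writer.flush()

        # Step 1 of copytruncate: copy the live file aside.
        shutil.copyfile(log_path, rotated_path)

        # A message arrives after the copy but before the truncate.
        writer.write("late line\n")
        writer.flush()

        # Step 2: truncate the live file in place; the writer keeps its handle.
        os.truncate(log_path, 0)

        # With O_APPEND the next write goes to the new end of file (offset 0).
        writer.write("line 3\n")
        writer.close()

        with open(rotated_path) as f:
            rotated = f.read()
        with open(log_path) as f:
            live = f.read()

    return rotated, live
```
</pre>
<div>The window is only as wide as the time between the copy finishing and the truncate, which is why it tends to bite under load, when more messages arrive per unit of time.</div>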
</div></div><span><font color="#888888">--<br>
Stuart Bishop <<a href="mailto:stuart.bishop@canonical.com" target="_blank">stuart.bishop@canonical.com</a>><br>
</font></span></blockquote></div><br></div></span></div></div>
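<div>The interaction between the $outchannel limit above and the logrotate helper can be modelled with a small Python simulation. It assumes, following Wayne's description in this thread, that once a write would push all-machines.log past the outchannel limit, rsyslogd runs the helper, and that the message is dropped if the file is still too big afterwards; it also assumes logrotate rotates only once the file has reached its own size threshold. This is a toy model with a hypothetical function name, not rsyslog's or logrotate's actual code.</div>
<pre>
```python
def simulate_outchannel(message_sizes, outchannel_limit, logrotate_threshold):
    """Toy model of rsyslog's $outchannel size limit driving logrotate.

    Sizes are in bytes. Returns (helper_runs, rotations, dropped_messages).
    """
    size = 0
    helper_runs = 0
    rotations = 0
    dropped = 0
    for msg in message_sizes:
        if size + msg > outchannel_limit:
            # The outchannel action runs the logrotate helper script.
            helper_runs += 1
            # logrotate's size check, modelled here as size >= threshold.
            if size >= logrotate_threshold:
                rotations += 1
                size = 0  # rsyslogd now writes to a fresh, empty file
        if size + msg > outchannel_limit:
            # Still over the limit after the helper ran: the message is lost.
            dropped += 1
            continue
        size += msg
    return helper_runs, rotations, dropped
```
</pre>
<div>With 20 ten-byte messages and an outchannel limit of 100 bytes, a logrotate threshold above the limit (150) yields (10, 0, 10): every message after the tenth runs the helper and is dropped. A threshold below the limit (1, in the spirit of Wayne's "arbitrarily small size") yields (1, 1, 0).</div>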
<br></div></div><span>--<br>
Juju-dev mailing list<br>
<a href="mailto:Juju-dev@lists.ubuntu.com" target="_blank">Juju-dev@lists.ubuntu.com</a><br>
Modify settings or unsubscribe at: <a href="https://lists.ubuntu.com/mailman/listinfo/juju-dev" target="_blank">https://lists.ubuntu.com/mailman/listinfo/juju-dev</a><br>
<br></span></blockquote></div><span><font color="#888888"><br><br clear="all"><div><br></div>-- <br><div dir="ltr"><div>Wayne Witzel III</div><div><a href="mailto:wayne.witzel@canonical.com" target="_blank">wayne.witzel@canonical.com</a></div></div>
</font></span></div></div>
<br></blockquote></div><br></div>
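<div>Nate's summary comes down to units. The $outchannel limit quoted in this thread is a plain byte count (512000000), while logrotate treats an M suffix as 1024 * 1024 bytes. Assuming the logrotate stanza used a threshold of 512M (the thread does not quote it), this Python sketch computes the range of file sizes in which rsyslog asks for a rotation but logrotate declines. parse_logrotate_size and stuck_window are hypothetical helpers, not part of either tool.</div>
<pre>
```python
# Binary multipliers for logrotate-style size suffixes (k, M, G).
LOGROTATE_MULTIPLIERS = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

# Byte count from the $outchannel line quoted in this thread.
OUTCHANNEL_LIMIT = 512000000


def parse_logrotate_size(value):
    """Convert a logrotate-style size such as "512M" to bytes (hypothetical helper)."""
    suffix = value[-1]
    if suffix in LOGROTATE_MULTIPLIERS:
        return int(value[:-1]) * LOGROTATE_MULTIPLIERS[suffix]
    return int(value)


def stuck_window(logrotate_size):
    """Return (low, high): file sizes in [low, high) make rsyslog request a
    rotation that logrotate then declines. Empty when low >= high.

    Assumes rsyslog triggers at OUTCHANNEL_LIMIT and logrotate rotates once
    the file reaches the parsed threshold.
    """
    return OUTCHANNEL_LIMIT, parse_logrotate_size(logrotate_size)
```
</pre>
<div>For "512M" the window runs from 512,000,000 to 536,870,912 bytes, a gap of 24,870,912 bytes (about 23.7 MiB) in which all-machines.log is too big for rsyslog but too small for logrotate. Setting logrotate's size below the outchannel limit, as Wayne's branch does, makes the window empty.</div>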
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr"><div>Wayne Witzel III</div><div><a href="mailto:wayne.witzel@canonical.com" target="_blank">wayne.witzel@canonical.com</a></div></div>
</div>
</div></div></blockquote></div><br></div>