logrotate configuration seems wrong
John Meinel
john at arbash-meinel.com
Tue Sep 16 07:39:34 UTC 2014
I tested your branch in the same configuration. First I had to change
"logging-config: <root>=DEBUG" again. I thought that if you did "juju
bootstrap --debug" that it always left DEBUG level on.
Anyway, something doesn't quite seem right, as I saw:
-rw------- 1 syslog adm 500M Sep 16 07:25 all-machines.log
...
-rw------- 1 syslog syslog 300M Sep 16 07:21
machine-0-2014-09-16T07-21-02.193.log
-rw------- 1 syslog syslog 300M Sep 16 07:25
machine-0-2014-09-16T07-25-14.840.log
-rw------- 1 syslog syslog 31M Sep 16 07:25 machine-0.log
Which means that machine-0.log has gotten more than 600MB of data for sure
(I saw it rotate at least once above this). But there is only a single
all-machines.log of only 500MB.
As I sat there watching, I did manage to see:
-rw------- 1 syslog adm 7.0M Sep 16 07:26 all-machines.log
-rw------- 1 syslog adm 513M Sep 16 07:26 all-machines.log.1
...
Which looks correct. But it would seem that something (load?) was causing
it to not get all of the machine-0 messages.
Maybe it is getting rate limited?
I can watch the machine-0.log file fill at about 1MB per second. And then
after about 5s I see all-machines.log only go up by a couple of MB.
(machine-0 log seemed to go up by about 12MB, and then all-machines.log
only went up by 3MB.)
Now, I'm also seeing a rediculous amount of logging, and lots of lines like:
http://paste.ubuntu.com/8356019/
Which seems to indicate that every 3 seconds the Rsyslog Watcher is dying
and causing everything to restart and reconnect and ask for the new
credentials.
I'm also a bit surprised that we're telling them to connect to a DNS name,
and not an IP address, but the resetting is the important bit. I filed
https://bugs.launchpad.net/juju-core/+bug/1369900 to track that bug.
Anyone care to try to figure out why I'm seeing 3-4x the amount of data in
machine-0.log than seems to be ending up in all-machines.log ?
John
=:->
On Mon, Sep 15, 2014 at 7:13 PM, Wayne Witzel <wayne.witzel at canonical.com>
wrote:
> Here is a branch that addresses the concerns and bug. I've tested it
> locally and with digitalocean. I would love for it to be tested under the
> scaling scenario where you first encountered the issues.
>
> https://github.com/wwitzel3/juju/tree/ww3-rsyslogd-logrotate
>
> I've changed the size in logrotate to an arbitrarily small size, since the
> rotation is driven by rsyslog. I've also updated the "style" of the
> rotation to create a new log file and use postrotate to HUP rsyslog.
>
> Also I removed compression. I don't think we need it given that we are
> only keeping one rotation around and it is size restricted. This will just
> make it easier to get at information across both of the log files.
>
> Wayne
>
> On Mon, Sep 15, 2014 at 10:30 AM, Nate Finch <nate.finch at canonical.com>
> wrote:
>
>> tl;dr: rsyslog sees 512,000,000 bytes and tells logrotate "time to
>> rotate!" and logrotate sees less than 512MB and say "nah, not big enough"
>> and rsyslog never writes the logs anymore because the file is too big.
>>
>> On Mon, Sep 15, 2014 at 10:07 AM, Wayne Witzel <
>> wayne.witzel at canonical.com> wrote:
>>
>>> When this was being implemented and during the review process the actual
>>> size of the rotated files got adjusted. I tested very thoroughly the actual
>>> rotation script and also tested the triggering of the rotation script.
>>> Enough that I was happy landing the changes.
>>>
>>> When the sizes were adjusted the same amount of diligence was not
>>> applied in testing. This is what introduced the bug.
>>>
>>> The size mismatch is problem. rsyslogd will stop logging to the file and
>>> attempted to execute the script in the outchannel every time it gets a new
>>> log message, which runs logrotate, which does nothing because the file
>>> isn't big enough.
>>>
>>> Here is the rsyslog docs for implementation of sized based rotation.
>>> http://www.rsyslog.com/doc/log_rotation_fix_size.html
>>>
>>> I first attempted use the mv command, but with our rsyslogd
>>> configuration, when I performed a mv, rsyslogd would stop logging to the
>>> file until I actually did a reload on the process. Also this didn't easily
>>> support things like compression or adding more archives if we decided to
>>> keep more than one rotation around. So the choice to use logrotate is was
>>> made.
>>>
>>> As for the actual act of reloading rsyslog. You can use copytruncate to
>>> avoid it all together or you can use postrotate to reload rsyslogd. In both
>>> cases there is a small window of possible data loss. Though with
>>> copyandtruncate it is more likley to happen when the system is under load.
>>> So changing to a postrotate that reloads rsyslogd is probably a good idea.
>>>
>>> The rotate setting is just the number of files to keep around. I asked
>>> some people after the size update if 1 rotation, a total of 1GB was enough
>>> to keep around. Most people thought it was fine, since the original purpose
>>> of the sized based long rotation was to address a ticket that was
>>> complaining about the juju log folder taking up 3GB+ of space.
>>>
>>> I will get a ticket created to address the size mismatch issue and
>>> switch the copyandtruncate approach to a postrotate of rsyslogd.
>>>
>>> Wayne
>>>
>>> On Mon, Sep 15, 2014 at 5:30 AM, John Meinel <john at arbash-meinel.com>
>>> wrote:
>>>
>>>> So we are using rsyslog.conf to have it figure out when rotation needs
>>>> to be done with:
>>>> # Maximum size for the log on this outchannel is 512MB
>>>> # The command to execute when an outchannel as reached its size limit
>>>> cannot accept any arguments
>>>> # that is why we have created the helper script for executing logrotate.
>>>> $outchannel
>>>> logRotation,{{logDir}}/all-machines.log,512000000,{{logrotateHelperPath}}
>>>>
>>>> I would think that would not require SIGHUP along with having it run
>>>> the script at our request.
>>>>
>>>> John
>>>> =:->
>>>>
>>>> On Mon, Sep 15, 2014 at 12:13 PM, Stuart Bishop <
>>>> stuart.bishop at canonical.com> wrote:
>>>>
>>>>> On 15 September 2014 12:38, John Meinel <john at arbash-meinel.com>
>>>>> wrote:
>>>>>
>>>>> > 7) "copytruncate" seems the wrong setting for interactive with
>>>>> rsyslog. I
>>>>> > believe rsyslog is already aware that the file needs to be rotated,
>>>>> and thus
>>>>>
>>>>> It is only aware if you sent it a HUP signal.
>>>>>
>>>>> > it shouldn't be trying to write to the same file handle (and thus we
>>>>> don't
>>>>> > need to truncate in place). I'm not 100% sure on the interactions
>>>>> here, but
>>>>> > "copytruncate" seems to have an inherent likelyhood of dropping data
>>>>> (while
>>>>> > you are copying, if any data gets written then you'll miss those
>>>>> last few
>>>>> > bytes when you go to truncate, right?)
>>>>>
>>>>> --
>>>>> Stuart Bishop <stuart.bishop at canonical.com>
>>>>>
>>>>
>>>>
>>>> --
>>>> Juju-dev mailing list
>>>> Juju-dev at lists.ubuntu.com
>>>> Modify settings or unsubscribe at:
>>>> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>>>>
>>>>
>>>
>>>
>>> --
>>> Wayne Witzel III
>>> wayne.witzel at canonical.com
>>>
>>> --
>>> Juju-dev mailing list
>>> Juju-dev at lists.ubuntu.com
>>> Modify settings or unsubscribe at:
>>> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>>>
>>>
>>
>
>
> --
> Wayne Witzel III
> wayne.witzel at canonical.com
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.ubuntu.com/archives/juju-dev/attachments/20140916/9d2c73fd/attachment-0001.html>
More information about the Juju-dev
mailing list