[Bug 1429427] Re: Unexplainable time jumps in CRON

G.J. gjmoed at gmail.com
Tue Jun 9 08:49:43 UTC 2015


Gave it another 10 mins, this time checking the 'receiving' end, being
rsyslog in my case:

[pid   583] <... select resumed> )      = 1 (in [0])
[pid   583] recvmsg(0, {msg_name(0)=NULL, msg_iov(1)=[{"<86>Jun  9 10:01:01 CRON[4860]: pam_unix(cron:session): session opened for user sogo by (uid=0)", 8096}], msg_controllen=64, {cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, MSG_DONTWAIT) = 95
[pid   583] futex(0x73de74, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x73de70, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
[pid   585] <... futex resumed> )       = 0
[pid   583] <... futex resumed> )       = 1
[pid   585] futex(0x73e060, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid   583] futex(0x73e060, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid   585] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
[pid   583] <... futex resumed> )       = 0
[pid   585] futex(0x73e060, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid   583] select(4, [0 3], NULL, NULL, NULL <unfinished ...>
[pid   585] <... futex resumed> )       = 0
[pid   585] write(5, "Jun  9 10:01:01 mail1 CRON[4860]: pam_unix(cron:session): session opened for user sogo by (uid=0)\n", 98) = 98
[pid   585] write(1, "May 30 19:38:01 mail1 CRON[4860]: message repeated 3 times: [ (sogo) CMD (/usr/sbin/sogo-ealarms-notify && /usr/sbin/sogo-tool expire-sessions 60)]\n", 148) = 148
[pid   585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid   585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid   585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid   585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid   585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid   585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid   585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid   585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid   585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid   585] open("/dev/xconsole", O_RDWR|O_NONBLOCK|O_CLOEXEC <unfinished ...>
[pid   583] <... select resumed> )      = 1 (in [0])
[pid   585] <... open resumed> )        = -1 ENOENT (No such file or directory)
[pid   583] recvmsg(0,  <unfinished ...>
[pid   585] futex(0x73de74, FUTEX_WAIT_PRIVATE, 2668217, NULL <unfinished ...>
[pid   583] <... recvmsg resumed> {msg_name(0)=NULL, msg_iov(1)=[{"<78>Jun  9 10:01:01 CRON[4861]: (sogo) CMD (/usr/sbin/sogo-ealarms-notify && /usr/sbin/sogo-tool expire-sessions 60)", 8096}], msg_controllen=64, {cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, MSG_DONTWAIT) = 116
[pid   585] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
[pid   583] futex(0x73de74, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x73de70, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
[pid   585] futex(0x73e060, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid   583] <... futex resumed> )       = 0
[pid   585] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
[pid   583] futex(0x73e060, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid   585] futex(0x73e060, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid   583] <... futex resumed> )       = 0
[pid   585] <... futex resumed> )       = 0
[pid   583] select(4, [0 3], NULL, NULL, NULL <unfinished ...>
[pid   585] write(1, "Jun  9 10:01:01 mail1 CRON[4861]: (sogo) CMD (/usr/sbin/sogo-ealarms-notify && /usr/sbin/sogo-tool expire-sessions 60)\n", 119) = 119
[pid   585] futex(0x73de74, FUTEX_WAIT_PRIVATE, 2668219, NULL <unfinished ...>
[pid   583] <... select resumed> )      = 1 (in [0])
[pid   583] recvmsg(0, {msg_name(0)=NULL, msg_iov(1)=[{"<86>Jun  9 10:01:01 CRON[4860]: pam_unix(cron:session): session closed for user sogo", 8096}], msg_controllen=64, {cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, MSG_DONTWAIT) = 84
[pid   583] futex(0x73de74, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x73de70, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
[pid   585] <... futex resumed> )       = 0
[pid   583] <... futex resumed> )       = 1
[pid   585] futex(0x73e060, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid   583] select(4, [0 3], NULL, NULL, NULL <unfinished ...>
[pid   585] <... futex resumed> )       = 0
[pid   585] write(5, "Jun  9 10:01:01 mail1 CRON[4860]: pam_unix(cron:session): session closed for user sogo\n", 87) = 87
[pid   585] futex(0x73de74, FUTEX_WAIT_PRIVATE, 2668221, NULL <unfinished ...>

>From this I can see I need to comment out this darn xconsole in
/etc/rsyslog.d/50-default.conf since this is a vm. But the more
interesting thing is of course:

recvmsg -> pam session open -> write the same to syslog files
recvmsg -> cron cmd -> write the same to syslog
recvmsg -> pam session closed -> write to syslog

But in this instance where things mess up, I'm seeing:

recvmsg -> pam session open -> write to syslog (still all good)
(no matching recvmsg)   cron cmd -> write to syslog (wrong date/time, same cron pid from session open, repeated 3 times ???)
recvmsg -> cron cmd (sane pid again) -> write to syslog (all good)
recvmsg -> pam session closed -> write to syslog (all good)

So I doubt this has anything to do with cron, pretty much convinced it's in rsyslog.
And then the question is, is this some rsyslog bug or misconfiguration, which imho should still not cause anything like this to happen. With misconfig I mean the xconsole maybe causing some race but I'd have to monitor a bit further then for this to happen at the right time. Right time as in, most likely around the same time this particular cron starts.

And the reporter of this bug mentioned restarting cron did not help,
only rebooting helps. That would make sense since everything points at
rsyslog and not cron. So I will also try restarting just rsyslog and see
what that brings before things go nuts again ;-)

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to cron in Ubuntu.
https://bugs.launchpad.net/bugs/1429427

Title:
  Unexplainable time jumps in CRON

Status in rsyslog package in Ubuntu:
  Confirmed

Bug description:
  On my main server I see unexplainable time jumps backwards in the syslog. Those jumps affect CRON.
  Example:

  Feb 10 06:48:01 nostromo CRON[20351]: (root) CMD (    /storage/exec/checkinternet.sh 2>/dev/null 1>/dev/null)
  Feb 10 06:49:01 nostromo CRON[20364]: (root) CMD (    /storage/exec/checkinternet.sh 2>/dev/null 1>/dev/null)
  Feb 10 06:50:01 nostromo CRON[20386]: (root) CMD (    /storage/exec/status-nostromo.sh >/dev/null 2>&1)
  Feb  7 05:40:01 nostromo CRON[20389]: (root) CMD (    /storage/exec/checkinternet.sh 2>/dev/null 1>/dev/null)
  Feb 10 06:50:01 nostromo CRON[20390]: (root) CMD (    /storage/exec/checkinternet.sh 2>/dev/null 1>/dev/null)
  Feb 10 06:50:01 nostromo CRON[20391]: (root) CMD (    /storage/exec/checkip.sh 2>/dev/null 1>/dev/null)

  For debugging I did the following:
  Start xclock and watch xclock and tail -f /var/log/syslog in parallel. When CRON logged a wrong time, xclock did NOT show any time jump but seemed to freeze for a fraction of a second.
  Open a screen and start a script that will once per second read the time (in unix seconds) and compare the read time with the time read a second ago. If the current time was smaller, the script would send an email with a process list from before and after the jump. The script also never detected any time jump.

  In summary, my current impression is that there might be a bug in CRON because no other programm seems to be able to see the "wrong" time. The server in question is syslog server for 4 servers and 3 network devices. The time jumps exclusively show in syslog entries from the local CRON instance. Not in any remote syslog entry and not in any other local syslog entry, e.g. from DHCPD, bind, tftpd, etc. etc.
  Also, after a reboot, things work ok for several days upto about 2 or 3 weeks. Then the "time jumps" start to occur with increasing frequency.

  I don't use user crontabs but maintain all jobs in /etc/crontab. I
  have number of jobs which are triggered every minute and another
  number of jobs which are triggered every 5 minutes (maybe some CRON
  internal counter overflow problem?).

  Hardware:
  Asus P9D-V
  Intel Xeon E3-1240L V3
  16GB ECC RAM
  128GB SSD System
  3x3TB ZFS RaidZ2 storage
  1x3TB Misc. data

  CMOS battery already changed and board inspected.

  nostromo:~ # lsb_release -rd
  Description:    Ubuntu 14.04.2 LTS
  Release:        14.04

  nostromo:~ # apt-cache policy cron
  cron:
    Installed: 3.0pl1-124ubuntu2
    Candidate: 3.0pl1-124ubuntu2
    Version table:
   *** 3.0pl1-124ubuntu2 0
          500 http://us.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages
          100 /var/lib/dpkg/status

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/rsyslog/+bug/1429427/+subscriptions



More information about the foundations-bugs mailing list