REISUB -- now what?

Kevin O'Gorman kogorman at gmail.com
Thu Feb 16 16:55:17 UTC 2017


The problem is it's kind of overwhelming.  At 4MB the log is over 1000x the
normal size.  The previous day's log is 10x normal. The failure happened
unattended, so I'm not even sure when it happened.
It would sure help if I knew in advance what the first stages of a reboot
looks like.

Just making some guesses, I grepped for "lockup" on unzipped logs: "grep -i
lockup saved-*" and got 662 lines
The first looked the most severe:
saved.syslog.1:Feb 15 16:48:51 plato-x kernel: [1377271.765054] NMI
watchdog: Watchdog detected hard LOCKUP on cpu 1
Oddly, it was followed by the rest of those 662 that looked more  less like
the next one and the last one:
saved.syslog.1:Feb 15 16:55:57 plato-x kernel: [1377749.441524] NMI
watchdog: BUG: soft lockup - CPU#16 stuck for 23s! [systemd-timesyn:1338]
saved.syslog.1:Feb 15 18:18:05 plato-x kernel: [1382677.646422] NMI
watchdog: BUG: soft lockup - CPU#17 stuck for 22s! [chromium-browse:19520]
with a considerable variety of daemons implicated, none of them the 32
processes that are the real workload, and which run at near 100% CPU
according to top(1)

But then, I think this may have happened during the bootup, not the
failure.  That's because after silence from 18:18:05 to 18:25:51 there's
Feb 15 18:25:51 plato-x rsyslogd: [origin software="rsyslogd"
swVersion="8.16.0" x-pid="1409" x-info="http://www.rsyslog.com"] start
and very soon there's a log entry that looks very much like startup:
Feb 15 18:25:51 plato-x systemd[1]: Activating swap Swap Partition...
It's surrounded by a lot of stuff I don't understand, including many
occurrences of the phrase "failed with exit code" which really makes me
wonder

Anyway. before the first lockup line, things looked very quiet, as you
might expect for an unattended system, except for some oddities that went
on for a long time but ended about 10 hours earlier, here's the complete
sequence from the last few of 09:33:59 up to the first lockup message:
Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
Message: Investigating webapp: unity-webapps-livemail
Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
Message: Investigating webapp: unity-webapps-mail-ru
Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
Message: Investigating webapp: unity-webapps-newsblur
Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
Message: Investigating webapp: unity-webapps-pandora
Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
Message: Investigating webapp: unity-webapps-qml-autopilot
Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
(process:22931): WARNING **: Found newline before all of met
adata, skipping app
Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
Message: Investigating webapp: unity-webapps-qq-mail
Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
Message: Investigating webapp: unity-webapps-reddit
Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
Message: Investigating webapp: unity-webapps-tumblr
Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
Message: Investigating webapp: unity-webapps-twitter
Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
Message: Investigating webapp: unity-webapps-vkcom
Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
Message: Investigating webapp: unity-webapps-wordpress
Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
Message: Investigating webapp: unity-webapps-wordpress-com
Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
Message: Investigating webapp: unity-webapps-yahoomail
Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
Message: Investigating webapp: unity-webapps-yahoonews
Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
Message: Investigating webapp: unity-webapps-yandex-music
Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
Message: Investigating webapp: unity-webapps-yandexmail
Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
Message: Investigating webapp: unity-webapps-yandexmusic
Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
Message: Investigating webapp: unity-webapps-yandexnews
Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
Message: Investigating webapp: unity-webapps-youtube
Feb 15 09:43:34 plato-x systemd[1]: snapd.refresh.timer: Adding 56min
56.771453s random time.
Feb 15 10:10:17 plato-x dhclient[30629]: DHCPREQUEST of 192.168.1.172 on
enp6s0 to 192.168.1.1 port 67 (xid=0x6d95b0a3)
Feb 15 10:10:17 plato-x dhclient[30629]: DHCPACK of 192.168.1.172 from
192.168.1.1
Feb 15 10:10:17 plato-x NetworkManager[1493]: <info>  [1487182217.6309]
address 192.168.1.172
Feb 15 10:10:17 plato-x NetworkManager[1493]: <info>  [1487182217.6310]
plen 24 (255.255.255.0)
Feb 15 10:10:17 plato-x NetworkManager[1493]: <info>  [1487182217.6310]
gateway 192.168.1.1
Feb 15 10:10:17 plato-x NetworkManager[1493]: <info>  [1487182217.6310]
server identifier 192.168.1.1
Feb 15 10:10:17 plato-x dbus[1459]: [system] Activating via systemd:
service name='org.freedesktop.nm_dispatcher' unit='dbus-org.f
reedesktop.nm-dispatcher.service'
Feb 15 10:10:17 plato-x NetworkManager[1493]: <info>  [1487182217.6310]
lease time 86400
Feb 15 10:10:17 plato-x NetworkManager[1493]: <info>  [1487182217.6310]
hostname 'plato-x'
Feb 15 10:10:17 plato-x NetworkManager[1493]: <info>  [1487182217.6311]
nameserver '192.168.1.1'
Feb 15 10:10:17 plato-x NetworkManager[1493]: <info>  [1487182217.6311]
dhcp4 (enp6s0): state changed bound -> bound
Feb 15 10:10:17 plato-x dhclient[30629]: bound to 192.168.1.172 -- renewal
in 29017 seconds.
Feb 15 10:10:17 plato-x systemd[1]: Starting Network Manager Script
Dispatcher Service...
Feb 15 10:10:17 plato-x dbus[1459]: [system] Successfully activated service
'org.freedesktop.nm_dispatcher'
Feb 15 10:10:17 plato-x nm-dispatcher: req:1 'dhcp4-change' [enp6s0]: new
request (1 scripts)
Feb 15 10:10:17 plato-x nm-dispatcher: req:1 'dhcp4-change' [enp6s0]: start
running ordered scripts...
Feb 15 10:10:17 plato-x systemd[1]: Started Network Manager Script
Dispatcher Service.
Feb 15 10:17:01 plato-x CRON[24916]: (root) CMD (   cd / && run-parts
--report /etc/cron.hourly)
Feb 15 11:17:01 plato-x CRON[27608]: (root) CMD (   cd / && run-parts
--report /etc/cron.hourly)
Feb 15 11:54:34 plato-x systemd[1]: Starting Daily apt activities...
Feb 15 11:56:56 plato-x systemd[1]: snapd.refresh.timer: Adding 5h 21min
51.717429s random time.
Feb 15 11:57:31 plato-x systemd[1]: Started Daily apt activities.
Feb 15 11:57:31 plato-x systemd[1]: apt-daily.timer: Adding 6h 3min
22.557945s random time.
Feb 15 11:57:31 plato-x systemd[1]: apt-daily.timer: Adding 9h 28min
33.356115s random time.
Feb 15 12:17:01 plato-x CRON[30798]: (root) CMD (   cd / && run-parts
--report /etc/cron.hourly)
Feb 15 13:17:01 plato-x CRON[1056]: (root) CMD (   cd / && run-parts
--report /etc/cron.hourly)
Feb 15 13:30:44 plato-x systemd-timesyncd[1338]: Timed out waiting for
reply from 91.189.89.199:123 (ntp.ubuntu.com).
Feb 15 13:30:44 plato-x systemd-timesyncd[1338]: Synchronized to time
server 91.189.91.157:123 (ntp.ubuntu.com).
Feb 15 14:17:01 plato-x CRON[4111]: (root) CMD (   cd / && run-parts
--report /etc/cron.hourly)
Feb 15 15:17:01 plato-x CRON[6860]: (root) CMD (   cd / && run-parts
--report /etc/cron.hourly)
Feb 15 16:17:01 plato-x CRON[9648]: (root) CMD (   cd / && run-parts
--report /etc/cron.hourly)
Feb 15 16:48:51 plato-x kernel: [1377271.765054] NMI watchdog: Watchdog
detected hard LOCKUP on cpu 1
So could these com.canonical.Unity.Webapps.Service lines be related to a
failure of the user interface, which then went quiet until I got home and
around 4 PM noticed the machine was frozen, and all the CPU lockups are
attempts to respond to my REISUB?

If so is there anything I can do about this?





On Thu, Feb 16, 2017 at 12:39 AM, Colin Law <clanlaw at gmail.com> wrote:

> On 16 February 2017 at 02:48, Kevin O'Gorman <kogorman at gmail.com> wrote:
>
>> I've got a big machine running Ubuntu that's I'd prefer kept running all
>> the time.  I don't always get what I want, of course...
>>
>> Today, for the second time in about three weeks, it completely froze.
>> Fortunately, I kept a printed copy of http://blog.kember.net/article
>> s/reisub-the-gentle-linux-restart/ in a folder with assorted HOWTO-type
>> articles, and was able to restart it that way.  Which, by the way, proves
>> it was still alive, at least sort of.
>>
>> Simply put, I held down Alt and SysRq and slowly typed R E I S U B.
>> Several times, because there's no feedback and a machine with 256GB or RAM
>> can take a while to dump buffers, but I could not actually tell if it was
>> doing anything at all until the boot sequence started.
>>
>> My question is this: is there any way to investigate what caused this?
>> I'll try looking in logs, but I am unfamiliar with them and don't really
>> know what to look for.
>>
>
> The first thing is to look in /var/log/syslog and see what is there at the
> point of failure.
>
> Colin
>
>
> --
> ubuntu-users mailing list
> ubuntu-users at lists.ubuntu.com
> Modify settings or unsubscribe at: https://lists.ubuntu.com/
> mailman/listinfo/ubuntu-users
>
>


-- 
Kevin O'Gorman
#define QUESTION ((bb) || (!bb))   /* Shakespeare */

Please consider the environment before printing this email.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.ubuntu.com/archives/ubuntu-users/attachments/20170216/4d5dd1a7/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/gif
Size: 441 bytes
Desc: not available
URL: <https://lists.ubuntu.com/archives/ubuntu-users/attachments/20170216/4d5dd1a7/attachment-0001.gif>


More information about the ubuntu-users mailing list