REISUB -- now what?
Kevin O'Gorman
kogorman at gmail.com
Thu Feb 16 17:34:49 UTC 2017
Hmm. Answering my own question: no, the webapps stuff occurs also on other
days that do not have a system failure. I still haven't found anything
that sticks out as happening at the start of a freeze-up. And the first
lockup message is so close to the restart that I doubt it's coincidence.
Very hard to say, because when the system has failed, all the screens are
blank, and won't come alive until reboot.
About all I can see to do is to stop the screens from blanking and hope
there's some clue on the workspace I happen to be on when it fails.
On Thu, Feb 16, 2017 at 8:55 AM, Kevin O'Gorman <kogorman at gmail.com> wrote:
> The problem is it's kind of overwhelming. At 4MB the log is over 1000x
> the normal size. The previous day's log is 10x normal. The failure
> happened unattended, so I'm not even sure when it happened.
> It would sure help if I knew in advance what the first stages of a reboot
> looks like.
>
> Just making some guesses, I grepped for "lockup" on unzipped logs: "grep
> -i lockup saved-*" and got 662 lines
> The first looked the most severe:
> saved.syslog.1:Feb 15 16:48:51 plato-x kernel: [1377271.765054] NMI
> watchdog: Watchdog detected hard LOCKUP on cpu 1
> Oddly, it was followed by the rest of those 662 that looked more less
> like the next one and the last one:
> saved.syslog.1:Feb 15 16:55:57 plato-x kernel: [1377749.441524] NMI
> watchdog: BUG: soft lockup - CPU#16 stuck for 23s! [systemd-timesyn:1338]
> saved.syslog.1:Feb 15 18:18:05 plato-x kernel: [1382677.646422] NMI
> watchdog: BUG: soft lockup - CPU#17 stuck for 22s! [chromium-browse:19520]
> with a considerable variety of daemons implicated, none of them the 32
> processes that are the real workload, and which run at near 100% CPU
> according to top(1)
>
> But then, I think this may have happened during the bootup, not the
> failure. That's because after silence from 18:18:05 to 18:25:51 there's
> Feb 15 18:25:51 plato-x rsyslogd: [origin software="rsyslogd"
> swVersion="8.16.0" x-pid="1409" x-info="http://www.rsyslog.com"] start
> and very soon there's a log entry that looks very much like startup:
> Feb 15 18:25:51 plato-x systemd[1]: Activating swap Swap Partition...
> It's surrounded by a lot of stuff I don't understand, including many
> occurrences of the phrase "failed with exit code" which really makes me
> wonder
>
> Anyway. before the first lockup line, things looked very quiet, as you
> might expect for an unattended system, except for some oddities that went
> on for a long time but ended about 10 hours earlier, here's the complete
> sequence from the last few of 09:33:59 up to the first lockup message:
> Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
> Message: Investigating webapp: unity-webapps-livemail
> Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
> Message: Investigating webapp: unity-webapps-mail-ru
> Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
> Message: Investigating webapp: unity-webapps-newsblur
> Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
> Message: Investigating webapp: unity-webapps-pandora
> Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
> Message: Investigating webapp: unity-webapps-qml-autopilot
> Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
> (process:22931): WARNING **: Found newline before all of met
> adata, skipping app
> Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
> Message: Investigating webapp: unity-webapps-qq-mail
> Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
> Message: Investigating webapp: unity-webapps-reddit
> Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
> Message: Investigating webapp: unity-webapps-tumblr
> Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
> Message: Investigating webapp: unity-webapps-twitter
> Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
> Message: Investigating webapp: unity-webapps-vkcom
> Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
> Message: Investigating webapp: unity-webapps-wordpress
> Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
> Message: Investigating webapp: unity-webapps-wordpress-com
> Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
> Message: Investigating webapp: unity-webapps-yahoomail
> Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
> Message: Investigating webapp: unity-webapps-yahoonews
> Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
> Message: Investigating webapp: unity-webapps-yandex-music
> Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
> Message: Investigating webapp: unity-webapps-yandexmail
> Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
> Message: Investigating webapp: unity-webapps-yandexmusic
> Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
> Message: Investigating webapp: unity-webapps-yandexnews
> Feb 15 09:33:59 plato-x com.canonical.Unity.Webapps.Service[2265]: **
> Message: Investigating webapp: unity-webapps-youtube
> Feb 15 09:43:34 plato-x systemd[1]: snapd.refresh.timer: Adding 56min
> 56.771453s random time.
> Feb 15 10:10:17 plato-x dhclient[30629]: DHCPREQUEST of 192.168.1.172 on
> enp6s0 to 192.168.1.1 port 67 (xid=0x6d95b0a3)
> Feb 15 10:10:17 plato-x dhclient[30629]: DHCPACK of 192.168.1.172 from
> 192.168.1.1
> Feb 15 10:10:17 plato-x NetworkManager[1493]: <info> [1487182217.6309]
> address 192.168.1.172
> Feb 15 10:10:17 plato-x NetworkManager[1493]: <info> [1487182217.6310]
> plen 24 (255.255.255.0)
> Feb 15 10:10:17 plato-x NetworkManager[1493]: <info> [1487182217.6310]
> gateway 192.168.1.1
> Feb 15 10:10:17 plato-x NetworkManager[1493]: <info> [1487182217.6310]
> server identifier 192.168.1.1
> Feb 15 10:10:17 plato-x dbus[1459]: [system] Activating via systemd:
> service name='org.freedesktop.nm_dispatcher' unit='dbus-org.f
> reedesktop.nm-dispatcher.service'
> Feb 15 10:10:17 plato-x NetworkManager[1493]: <info> [1487182217.6310]
> lease time 86400
> Feb 15 10:10:17 plato-x NetworkManager[1493]: <info> [1487182217.6310]
> hostname 'plato-x'
> Feb 15 10:10:17 plato-x NetworkManager[1493]: <info> [1487182217.6311]
> nameserver '192.168.1.1'
> Feb 15 10:10:17 plato-x NetworkManager[1493]: <info> [1487182217.6311]
> dhcp4 (enp6s0): state changed bound -> bound
> Feb 15 10:10:17 plato-x dhclient[30629]: bound to 192.168.1.172 -- renewal
> in 29017 seconds.
> Feb 15 10:10:17 plato-x systemd[1]: Starting Network Manager Script
> Dispatcher Service...
> Feb 15 10:10:17 plato-x dbus[1459]: [system] Successfully activated
> service 'org.freedesktop.nm_dispatcher'
> Feb 15 10:10:17 plato-x nm-dispatcher: req:1 'dhcp4-change' [enp6s0]: new
> request (1 scripts)
> Feb 15 10:10:17 plato-x nm-dispatcher: req:1 'dhcp4-change' [enp6s0]:
> start running ordered scripts...
> Feb 15 10:10:17 plato-x systemd[1]: Started Network Manager Script
> Dispatcher Service.
> Feb 15 10:17:01 plato-x CRON[24916]: (root) CMD ( cd / && run-parts
> --report /etc/cron.hourly)
> Feb 15 11:17:01 plato-x CRON[27608]: (root) CMD ( cd / && run-parts
> --report /etc/cron.hourly)
> Feb 15 11:54:34 plato-x systemd[1]: Starting Daily apt activities...
> Feb 15 11:56:56 plato-x systemd[1]: snapd.refresh.timer: Adding 5h 21min
> 51.717429s random time.
> Feb 15 11:57:31 plato-x systemd[1]: Started Daily apt activities.
> Feb 15 11:57:31 plato-x systemd[1]: apt-daily.timer: Adding 6h 3min
> 22.557945s random time.
> Feb 15 11:57:31 plato-x systemd[1]: apt-daily.timer: Adding 9h 28min
> 33.356115s random time.
> Feb 15 12:17:01 plato-x CRON[30798]: (root) CMD ( cd / && run-parts
> --report /etc/cron.hourly)
> Feb 15 13:17:01 plato-x CRON[1056]: (root) CMD ( cd / && run-parts
> --report /etc/cron.hourly)
> Feb 15 13:30:44 plato-x systemd-timesyncd[1338]: Timed out waiting for
> reply from 91.189.89.199:123 (ntp.ubuntu.com).
> Feb 15 13:30:44 plato-x systemd-timesyncd[1338]: Synchronized to time
> server 91.189.91.157:123 (ntp.ubuntu.com).
> Feb 15 14:17:01 plato-x CRON[4111]: (root) CMD ( cd / && run-parts
> --report /etc/cron.hourly)
> Feb 15 15:17:01 plato-x CRON[6860]: (root) CMD ( cd / && run-parts
> --report /etc/cron.hourly)
> Feb 15 16:17:01 plato-x CRON[9648]: (root) CMD ( cd / && run-parts
> --report /etc/cron.hourly)
> Feb 15 16:48:51 plato-x kernel: [1377271.765054] NMI watchdog: Watchdog
> detected hard LOCKUP on cpu 1
> So could these com.canonical.Unity.Webapps.Service lines be related to a
> failure of the user interface, which then went quiet until I got home and
> around 4 PM noticed the machine was frozen, and all the CPU lockups are
> attempts to respond to my REISUB?
>
> If so is there anything I can do about this?
>
>
>
>
>
> On Thu, Feb 16, 2017 at 12:39 AM, Colin Law <clanlaw at gmail.com> wrote:
>
>> On 16 February 2017 at 02:48, Kevin O'Gorman <kogorman at gmail.com> wrote:
>>
>>> I've got a big machine running Ubuntu that's I'd prefer kept running all
>>> the time. I don't always get what I want, of course...
>>>
>>> Today, for the second time in about three weeks, it completely froze.
>>> Fortunately, I kept a printed copy of http://blog.kember.net/article
>>> s/reisub-the-gentle-linux-restart/ in a folder with assorted HOWTO-type
>>> articles, and was able to restart it that way. Which, by the way, proves
>>> it was still alive, at least sort of.
>>>
>>> Simply put, I held down Alt and SysRq and slowly typed R E I S U B.
>>> Several times, because there's no feedback and a machine with 256GB or RAM
>>> can take a while to dump buffers, but I could not actually tell if it was
>>> doing anything at all until the boot sequence started.
>>>
>>> My question is this: is there any way to investigate what caused this?
>>> I'll try looking in logs, but I am unfamiliar with them and don't really
>>> know what to look for.
>>>
>>
>> The first thing is to look in /var/log/syslog and see what is there at
>> the point of failure.
>>
>> Colin
>>
>>
>> --
>> ubuntu-users mailing list
>> ubuntu-users at lists.ubuntu.com
>> Modify settings or unsubscribe at: https://lists.ubuntu.com/mailm
>> an/listinfo/ubuntu-users
>>
>>
>
>
> --
> Kevin O'Gorman
> #define QUESTION ((bb) || (!bb)) /* Shakespeare */
>
> Please consider the environment before printing this email.
>
--
Kevin O'Gorman
#define QUESTION ((bb) || (!bb)) /* Shakespeare */
Please consider the environment before printing this email.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.ubuntu.com/archives/ubuntu-users/attachments/20170216/7305022e/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/gif
Size: 441 bytes
Desc: not available
URL: <https://lists.ubuntu.com/archives/ubuntu-users/attachments/20170216/7305022e/attachment.gif>
More information about the ubuntu-users
mailing list