system freezes

Wed Nov 8 17:28:02 UTC 2017

Ralf Mardorf wrote on 08-11-2017 16:42:
> On Wed, 8 Nov 2017 07:06:24 -0700, compdoc wrote:
>> What you describe is often what happens when a power supply or hard
>> drive begins to fail.
> 
> Hi,
> 
> when I experienced issues as described by the OP, it often was either
> CMOS related or an HDD at the end of its life cycle.
> 
> Sometimes just clearing the CMOS (BIOS) thingy solved issues, sometimes
> it was required to replace the battery. It's similar for HDDs,
> sometimes disconnecting and then connecting the SATA cables again
> solved issues, sometimes a new HDD was needed.
> 
> An important note: SMART and co. do not necessarily report any HDD
> issue, nor does the BIOS claim that something is fishy or that the
> battery is empty.

Well there is another rather weird thing.

The time runs slower in Linux.

However the CMOS battery is quite new.

Windows generally doesn't update the time from the internet very often,
so since Windows does not, or did not, seem to suffer from this problem,
I wonder what could be going on.

KDE just froze, I checked my logfile, there was nothing fishy in it.

Coincidentally, the script always logs all processes in "D"
(uninterruptible sleep) state.

There weren't any from plasma.

Only from time to time jbd2 and kworker.
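
For reference, the "D" state check described above can be sketched roughly as follows. This is not the actual script, only a minimal Python illustration that assumes a Linux /proc filesystem; the function names parse_stat and d_state_processes are made up for this example.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)
```

Calling d_state_processes() periodically and logging the result would show entries such as jbd2 or kworker whenever they happen to be waiting on I/O at that moment.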

So now I know the time runs behind.

It didn't do that before to my recollection but this system is very new.

That is, the Linux installation is very new and I hadn't really used it
yet.

So thank you for the hint about the CMOS, but how can that cause hard
disk or KDE failure?

The background system still runs fine.

Ehm...

There is this CUPS message in dmesg:

[   18.934182] audit: type=1400 audit(1510139614.277:21): apparmor="DENIED" operation="capable" profile="/usr/sbin/cupsd" pid=1928 comm="serial" capability=21  capname="sys_admin"

But printing works fine.

Then there is weirdness about the clock:

[ 3639.891181] clocksource: timekeeping watchdog on CPU2: Marking clocksource 'tsc' as unstable because the skew is too large:
[ 3639.891202] clocksource:                       'hpet' wd_now: 301bc391 wd_last: 90a8ad86 mask: ffffffff
[ 3639.891206] clocksource:                       'tsc' cs_now: 9cb4474a36c cs_last: 98264db91aa mask: ffffffffffffffff
[ 3639.896980] clocksource: Switched to clocksource hpet
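
To make the numbers in that message a bit more readable: as far as I understand it, the watchdog reads both counters, takes the masked difference since the previous read, converts each to time and compares them. A minimal Python sketch of that arithmetic, using the values from the log; the function names are made up here, and the counter frequencies are not in the log, so they are left as parameters.

```python
def masked_delta(now, last, mask):
    """Cycles elapsed between two counter reads, allowing for one wrap-around."""
    return (now - last) & mask


def skew_seconds(cs_cycles, cs_hz, wd_cycles, wd_hz):
    """Absolute difference between the two elapsed times, in seconds."""
    return abs(cs_cycles / cs_hz - wd_cycles / wd_hz)


# Values copied from the dmesg lines above.
HPET_MASK = 0xffffffff
TSC_MASK = 0xffffffffffffffff

wd_delta = masked_delta(0x301bc391, 0x90a8ad86, HPET_MASK)      # 'hpet'
cs_delta = masked_delta(0x9cb4474a36c, 0x98264db91aa, TSC_MASK)  # 'tsc'

print("hpet cycles since last check:", wd_delta)
print("tsc cycles since last check: ", cs_delta)

# ASSUMPTION: to get an actual skew, pass the real frequencies of both
# counters to skew_seconds(); they are not part of the messages above.
```

Note that wd_now is smaller than wd_last, so the 32-bit hpet counter wrapped at least once between the two reads. The masked difference cannot tell one wrap from several, so a long gap between watchdog runs could by itself make the comparison look bad.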

Then there are multi-core issues in dmesg:

[10193.796843] INFO: rcu_sched self-detected stall on CPU
[10193.796850] INFO: rcu_sched self-detected stall on CPU
[10193.796870]  3-...: (3 GPs behind) idle=61f/1/0 softirq=304876/304876 fqs=0
[10193.796874] INFO: rcu_sched self-detected stall on CPU
[10193.796879]
[10193.796887]  (t=15260 jiffies g=199741 c=199740 q=6)
[10193.796895]  2-...: (1 GPs behind) idle=a75/1/0 softirq=308395/308395 fqs=0
[10193.796903] rcu_sched kthread starved for 15260 jiffies! g199741 c199740 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
[10193.796904]
[10193.796909] rcu_sched       S
[10193.796913]  (t=15260 jiffies g=199741 c=199740 q=6)
[10193.796918]     0     7      2 0x00000000
[10193.796924] rcu_sched kthread starved for 15260 jiffies! g199741 c199740 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1

A call trace follows.

[18816.691512] INFO: rcu_sched self-detected stall on CPU
[18816.691530]  0-...: (1 GPs behind) idle=853/1/0 softirq=415479/415480 fqs=0
[18816.691533]   (t=18086 jiffies g=292162 c=292161 q=103)
[18816.691545] rcu_sched kthread starved for 18086 jiffies! g292162 c292161 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1
[18816.691549] rcu_sched       S    0     7      2 0x00000000
[18816.691554] Call Trace:
[18816.691566]  __schedule+0x232/0x700
[18816.691571]  schedule+0x36/0x80
[18816.691575]  schedule_timeout+0x1ea/0x3f0
[18816.691580]  ? del_timer_sync+0x50/0x50
[18816.691586]  rcu_gp_kthread+0x551/0x910
[18816.691592]  kthread+0x109/0x140
[18816.691596]  ? rcu_note_context_switch+0x100/0x100
[18816.691601]  ? kthread_create_on_node+0x60/0x60
[18816.691606]  ret_from_fork+0x2c/0x40
[18816.691624] Task dump for CPU 0:
[18816.691626] swapper/0       R  running task        0     0      0 0x00000008
[18816.691630] Call Trace:
[18816.691632]  <IRQ>
[18816.691637]  sched_show_task+0xcd/0x130
[18816.691641]  dump_cpu_task+0x37/0x40
[18816.691646]  rcu_dump_cpu_stacks+0x94/0xba
[18816.691651]  rcu_check_callbacks+0x747/0x890
[18816.691656]  ? update_wall_time+0x483/0x7a0
[18816.691661]  ? tick_sched_handle.isra.15+0x60/0x60
[18816.691665]  update_process_times+0x2f/0x60
[18816.691670]  tick_sched_handle.isra.15+0x25/0x60
[18816.691674]  tick_sched_timer+0x3d/0x70
[18816.691678]  __hrtimer_run_queues+0xf3/0x280
[18816.691683]  hrtimer_interrupt+0xb1/0x200
[18816.691689]  local_apic_timer_interrupt+0x38/0x60
[18816.691693]  smp_apic_timer_interrupt+0x38/0x50
[18816.691698]  apic_timer_interrupt+0x89/0x90
[18816.691702] RIP: 0010:native_safe_halt+0x6/0x10
[18816.691705] RSP: 0018:ffffffffbd403de8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
[18816.691709] RAX: 0000000000000000 RBX: ffffffffbd410500 RCX: 0000000000000003
[18816.691711] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffbd89b79c
[18816.691713] RBP: ffffffffbd403de8 R08: 0100000000000000 R09: 00000000ffffffe0
[18816.691715] R10: 0000000000000000 R11: 000000000001d1c8 R12: 0000000000000000
[18816.691717] R13: ffffffffbd410500 R14: 0000000000000000 R15: 0000000000000000
[18816.691719]  </IRQ>
[18816.691725]  default_idle+0x1e/0xd0
[18816.691730]  amd_e400_idle+0x23/0x50
[18816.691733]  arch_cpu_idle+0xf/0x20
[18816.691737]  default_idle_call+0x23/0x30
[18816.691742]  do_idle+0x165/0x1f0
[18816.691746]  cpu_startup_entry+0x71/0x80
[18816.691751]  rest_init+0x77/0x80
[18816.691756]  start_kernel+0x482/0x4a3
[18816.691761]  ? early_idt_handler_array+0x120/0x120
[18816.691765]  x86_64_start_reservations+0x24/0x26
[18816.691769]  x86_64_start_kernel+0x143/0x166
[18816.691774]  start_cpu+0x14/0x14
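
The "t=" and "starved for" values in the RCU messages above are given in jiffies, so how long they are in seconds depends on the kernel's tick rate (CONFIG_HZ), which I have not looked up for this kernel. A small Python sketch that simply tries the usual x86 choices; which one applies here is an assumption until the kernel config is checked.

```python
def jiffies_to_seconds(jiffies, hz):
    """Convert a jiffies count to seconds for a given tick rate."""
    return jiffies / hz


# Tick rates selectable as CONFIG_HZ on x86; the one actually in use on
# this system is NOT known from the log.
CANDIDATE_HZ = (100, 250, 300, 1000)

# Stall durations copied from the two RCU reports above.
for t in (15260, 18086):
    for hz in CANDIDATE_HZ:
        print(f"t={t} jiffies at HZ={hz}: {jiffies_to_seconds(t, hz):.1f} s")
```

Whichever tick rate applies, both stalls lasted many seconds, not a brief hiccup.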