Mystery logoff

Sat Feb 5 02:09:30 UTC 2022

On 2022-02-04 19:26, MR ZenWiz wrote:
> Problem is, I don't have a clue how to read the logs, although it
> looks fairly clear that Xorg crashed, possibly because of an error in
> nouveau.
> 
> I really need help deciphering what these logs show.  The files are at:
> 
> kern.log -https://drive.google.com/file/d/1kSRBHGP1RNKncp_zWyaGzMI3Ln3dXKZI/view?usp=sharing
> 
> Xorg.0.log.old -
> https://drive.google.com/file/d/1XgXHeHpe1UpXwVLpa0FH2Cbf8K13dLjD/view?usp=sharing
> 
> apport.log -https://drive.google.com/file/d/1kSRBHGP1RNKncp_zWyaGzMI3Ln3dXKZI/view?usp=sharing
> 
> If anyone has a clue, please let me know.

I'm not familiar with apport, but the download link for that log is the 
same as the one for kern.log. Maybe you pasted the same link twice.

It looks like the CPU is entering an idle state at the same time there 
is a fault in nouveau (apparently the open source driver for Nvidia), 
while the nouveau driver is handling an interrupt.

The X log also has a backtrace from the time of the crash. I'll look 
into the stack trace to see what is going on, but it looks like a fatal 
assertion in the drm_nouveau userspace shared library.

The X issue in the log seems to happen roughly one minute (about 58 
seconds) after the issue in the kernel log (time 254141.962 vs 
254084.282705).
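
For anyone who wants to check that gap, here is a minimal Python sketch 
of the arithmetic. It assumes both logs count seconds from the same 
clock, which I have not verified, and the function name gap_seconds is 
just something I made up for this mail.

```python
def gap_seconds(xorg_ts: float, kernel_ts: float) -> float:
    """Return how many seconds the Xorg timestamp is after the kernel one.

    Assumption: both timestamps are seconds on the same clock.
    """
    return xorg_ts - kernel_ts


# Timestamps quoted in this mail.
XORG_TS = 254141.962       # first "(EE)" line of the Xorg backtrace
KERNEL_TS = 254084.282705  # kernel log timestamp quoted above

if __name__ == "__main__":
    print(f"gap: {gap_seconds(XORG_TS, KERNEL_TS):.3f} s")
```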

The following are the functions called before it goes into nouveau:
[254141.970] (EE) 17: /usr/lib/xorg/modules/libglamoregl.so 
(glamor_create_gc+0xdcb9) [0x7f1258463ee9]
[254141.970] (EE) 18: /usr/lib/xorg/Xorg (DamageRegionAppend+0x1a40) 
[0x55f83e90e010]
[254141.970] (EE) 19: /usr/lib/xorg/Xorg (miPaintWindow+0x251) 
[0x55f83e96ddb1]
[254141.971] (EE) 20: /usr/lib/xorg/Xorg (miClearToBackground+0x113) 
[0x55f83e981f73]
[254141.971] (EE) 21: /usr/lib/xorg/Xorg (dixDestroyPixmap+0x795) 
[0x55f83e822255]
[254141.971] (EE) 22: /usr/lib/xorg/Xorg (SendErrorToClient+0x35e) 
[0x55f83e826a7e]
[254141.971] (EE) 23: /usr/lib/xorg/Xorg (InitFonts+0x3b5) [0x55f83e82aad5]
[254141.971] (EE) 24: /lib/x86_64-linux-gnu/libc.so.6 
(__libc_init_first+0x90) [0x7f1258c09fd0]
[254141.972] (EE) 25: /lib/x86_64-linux-gnu/libc.so.6 
(__libc_start_main+0x7d) [0x7f1258c0a07d]

Anyway, the relevant section from the kernel log is:
Feb  3 14:26:20 marbase kernel: [254084.282545] nouveau 0000:03:00.0: 
fifo: fault 00 [READ] at 000000ffe6190000 engine 06 [HOST0] client 07 
[HUB/HOST_CPU] reason 00 [PDE] on channel 2 [007fa09000 Xorg[1550]]
Feb  3 14:26:20 marbase kernel: [254084.282563] nouveau 0000:03:00.0: 
fifo: channel 2: killed
Feb  3 14:26:20 marbase kernel: [254084.282567] nouveau 0000:03:00.0: 
fifo: runlist 0: scheduled for recovery
Feb  3 14:26:20 marbase kernel: [254084.282584] ------------[ cut here 
]------------
Feb  3 14:26:20 marbase kernel: [254084.282585] WARNING: CPU: 2 PID: 0 
at drivers/gpu/drm/nouveau/nvkm/engine/fifo/gk104.c:284 
gk104_fifo_engine_id+0x37/0x50 [nouveau]

This happens at the same moment the kernel reports the CPU in 
cpuidle_enter_state:
Feb  3 14:26:20 marbase kernel: [254084.282989] RIP: 
0010:cpuidle_enter_state+0xcc/0x360

And the IRQ/interrupt details:
Feb  3 14:26:20 marbase kernel: [254084.282714]  <IRQ>
Feb  3 14:26:20 marbase kernel: [254084.282716] 
gk104_fifo_fault+0x10c/0x230 [nouveau]
Feb  3 14:26:20 marbase kernel: [254084.282742] 
nvkm_fifo_fault+0x15/0x20 [nouveau]
Feb  3 14:26:20 marbase kernel: [254084.282768] 
gp100_fifo_intr_fault+0xe0/0x110 [nouveau]
Feb  3 14:26:20 marbase kernel: [254084.282794] 
gk104_fifo_intr+0x299/0x3a0 [nouveau]
Feb  3 14:26:20 marbase kernel: [254084.282818] 
nvkm_fifo_intr+0x1d/0x20 [nouveau]
Feb  3 14:26:20 marbase kernel: [254084.282842] 
nvkm_engine_intr+0x1f/0x30 [nouveau]
Feb  3 14:26:20 marbase kernel: [254084.282865] 
nvkm_subdev_intr+0x1a/0x20 [nouveau]
Feb  3 14:26:20 marbase kernel: [254084.282889] 
nvkm_mc_intr+0x138/0x180 [nouveau]
Feb  3 14:26:20 marbase kernel: [254084.282920]  ? 
gp100_mc_intr_unarm+0x3a/0x50 [nouveau]
Feb  3 14:26:20 marbase kernel: [254084.282947]  nvkm_pci_intr+0x51/0xa0 
[nouveau]
Feb  3 14:26:20 marbase kernel: [254084.282976] 
__handle_irq_event_percpu+0x45/0x170
Feb  3 14:26:20 marbase kernel: [254084.282979]  handle_irq_event+0x59/0xc0
Feb  3 14:26:20 marbase kernel: [254084.282980]  handle_edge_irq+0x8c/0x220
Feb  3 14:26:20 marbase kernel: [254084.282982] 
__common_interrupt+0x43/0xa0
Feb  3 14:26:20 marbase kernel: [254084.282984]  common_interrupt+0x85/0xa0
Feb  3 14:26:20 marbase kernel: [254084.282987]  </IRQ>
Feb  3 14:26:20 marbase kernel: [254084.282987]  <TASK>
Feb  3 14:26:20 marbase kernel: [254084.282988] 
asm_common_interrupt+0x1e/0x40

And:
Feb  3 14:26:20 marbase kernel: [254084.282998]  cpuidle_enter+0x2e/0x40
Feb  3 14:26:20 marbase kernel: [254084.282999] 
cpuidle_idle_call+0x132/0x1d0
Feb  3 14:26:20 marbase kernel: [254084.283001]  do_idle+0x83/0xf0
Feb  3 14:26:20 marbase kernel: [254084.283001]  cpu_startup_entry+0x20/0x30
Feb  3 14:26:20 marbase kernel: [254084.283002]  start_secondary+0x11f/0x160
Feb  3 14:26:20 marbase kernel: [254084.283004] 
secondary_startup_64_no_verify+0xc2/0xcb
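
In case it helps with reading the nouveau fault line at the top of the 
kernel excerpt above, here is a rough Python sketch that splits it into 
named fields. The regular expression and the field names (access, 
engine, client, reason and so on) are my own guesses from this one line, 
not an official nouveau format, so treat it as an illustration only. It 
also expects the line unwrapped, as it would be in kern.log itself.

```python
import re

# Pattern guessed from the single fault line in this log (an assumption,
# not a documented nouveau message format).
FAULT_RE = re.compile(
    r"\[\s*(?P<timestamp>\d+\.\d+)\] nouveau (?P<pci>\S+): fifo: fault \w+ "
    r"\[(?P<access>\w+)\] at (?P<address>[0-9a-f]+) "
    r"engine \w+ \[(?P<engine>[^\]]+)\] "
    r"client \w+ \[(?P<client>[^\]]+)\] "
    r"reason \w+ \[(?P<reason>[^\]]+)\] "
    r"on channel (?P<channel>\d+) "
    r"\[\w+ (?P<process>[^\[\]]+)\[(?P<pid>\d+)\]\]"
)

# The fault line from the kernel excerpt, joined back onto one line.
SAMPLE = (
    "Feb  3 14:26:20 marbase kernel: [254084.282545] nouveau 0000:03:00.0: "
    "fifo: fault 00 [READ] at 000000ffe6190000 engine 06 [HOST0] client 07 "
    "[HUB/HOST_CPU] reason 00 [PDE] on channel 2 [007fa09000 Xorg[1550]]"
)


def parse_fault(line):
    """Return the fault fields as a dict of strings, or None if no match."""
    match = FAULT_RE.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name, value in parse_fault(SAMPLE).items():
        print(f"{name}: {value}")
```

On the line from this log it gives access READ, reason PDE, and process 
Xorg with PID 1550, which is the same PID the Xorg log names in 
"Received signal 6 sent by process 1550".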

And the full Xorg log section:
[254141.962] (EE)
[254141.962] (EE) Backtrace:
[254141.964] (EE) 0: /usr/lib/xorg/Xorg (OsLookupColor+0x139) 
[0x55f83e98dcf9]
[254141.964] (EE) 1: /lib/x86_64-linux-gnu/libc.so.6 (__sigaction+0x50) 
[0x7f1258c22520]
[254141.964] (EE) 2: /lib/x86_64-linux-gnu/libc.so.6 (pthread_kill+0xf8) 
[0x7f1258c76808]
[254141.965] (EE) 3: /lib/x86_64-linux-gnu/libc.so.6 (raise+0x16) 
[0x7f1258c22476]
[254141.965] (EE) 4: /lib/x86_64-linux-gnu/libc.so.6 (abort+0xd7) 
[0x7f1258c087b7]
[254141.965] (EE) unw_get_proc_name failed: no unwind info found [-10]
[254141.966] (EE) 5: /lib/x86_64-linux-gnu/libc.so.6 (?+0x0) 
[0x7f1258c086db]
[254141.966] (EE) 6: /lib/x86_64-linux-gnu/libc.so.6 
(__assert_fail+0x46) [0x7f1258c19e26]
[254141.966] (EE) 7: /lib/x86_64-linux-gnu/libdrm_nouveau.so.2 
(nouveau_pushbuf_data+0x107) [0x7f1250f58de7]
[254141.966] (EE) 8: /lib/x86_64-linux-gnu/libdrm_nouveau.so.2 
(nouveau_pushbuf_data+0x67) [0x7f1250f58d47]
[254141.966] (EE) 9: /lib/x86_64-linux-gnu/libdrm_nouveau.so.2 
(nouveau_pushbuf_data+0x182) [0x7f1250f58e62]
[254141.967] (EE) 10: /lib/x86_64-linux-gnu/libdrm_nouveau.so.2 
(nouveau_pushbuf_data+0x3e7) [0x7f1250f590c7]
[254141.967] (EE) 11: /lib/x86_64-linux-gnu/libdrm_nouveau.so.2 
(nouveau_pushbuf_space+0x359) [0x7f1250f59be9]
[254141.968] (EE) 12: /usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so 
(nouveau_drm_screen_create+0x73c07a) [0x7f125791c3aa]
[254141.969] (EE) 13: /usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so 
(nouveau_drm_screen_create+0xcb027) [0x7f12572ab357]
[254141.969] (EE) 14: /usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so 
(nouveau_drm_screen_create+0xd47de) [0x7f12572b4b0e]
[254141.969] (EE) 15: /usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so 
(nouveau_drm_screen_create+0xd5529) [0x7f12572b5859]
[254141.970] (EE) 16: /usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so 
(__driDriverGetExtensions_d3d12+0x13ca9a) [0x7f1256b0c58a]
[254141.970] (EE) 17: /usr/lib/xorg/modules/libglamoregl.so 
(glamor_create_gc+0xdcb9) [0x7f1258463ee9]
[254141.970] (EE) 18: /usr/lib/xorg/Xorg (DamageRegionAppend+0x1a40) 
[0x55f83e90e010]
[254141.970] (EE) 19: /usr/lib/xorg/Xorg (miPaintWindow+0x251) 
[0x55f83e96ddb1]
[254141.971] (EE) 20: /usr/lib/xorg/Xorg (miClearToBackground+0x113) 
[0x55f83e981f73]
[254141.971] (EE) 21: /usr/lib/xorg/Xorg (dixDestroyPixmap+0x795) 
[0x55f83e822255]
[254141.971] (EE) 22: /usr/lib/xorg/Xorg (SendErrorToClient+0x35e) 
[0x55f83e826a7e]
[254141.971] (EE) 23: /usr/lib/xorg/Xorg (InitFonts+0x3b5) [0x55f83e82aad5]
[254141.971] (EE) 24: /lib/x86_64-linux-gnu/libc.so.6 
(__libc_init_first+0x90) [0x7f1258c09fd0]
[254141.972] (EE) 25: /lib/x86_64-linux-gnu/libc.so.6 
(__libc_start_main+0x7d) [0x7f1258c0a07d]
[254141.972] (EE) 26: /usr/lib/xorg/Xorg (_start+0x25) [0x55f83e813f25]
[254141.972] (EE)
[254141.972] (EE) Received signal 6 sent by process 1550, uid 0
[254141.972] (EE)
Fatal server error:
[254141.972] (EE) Caught signal 6 (Aborted). Server aborting
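
To see which libraries the backtrace passes through, something like the 
following Python sketch could be run over Xorg.0.log.old. It assumes 
each frame sits on one line (they are wrapped in this mail), and the 
regular expression is my own reading of the frame format, not something 
taken from the X server documentation. The sample holds a handful of 
frames copied from the log above.

```python
import os
import re

# Frame pattern guessed from the backtrace in this log (an assumption).
FRAME_RE = re.compile(
    r"\(EE\) (?P<index>\d+): (?P<module>\S+) "
    r"\((?P<symbol>[^)]*)\+0x(?P<offset>[0-9a-f]+)\) "
    r"\[0x(?P<address>[0-9a-f]+)\]"
)

# A few frames from the backtrace above, each joined onto one line.
SAMPLE = """\
[254141.965] (EE) 4: /lib/x86_64-linux-gnu/libc.so.6 (abort+0xd7) [0x7f1258c087b7]
[254141.966] (EE) 6: /lib/x86_64-linux-gnu/libc.so.6 (__assert_fail+0x46) [0x7f1258c19e26]
[254141.966] (EE) 7: /lib/x86_64-linux-gnu/libdrm_nouveau.so.2 (nouveau_pushbuf_data+0x107) [0x7f1250f58de7]
[254141.967] (EE) 11: /lib/x86_64-linux-gnu/libdrm_nouveau.so.2 (nouveau_pushbuf_space+0x359) [0x7f1250f59be9]
[254141.968] (EE) 12: /usr/lib/x86_64-linux-gnu/dri/nouveau_dri.so (nouveau_drm_screen_create+0x73c07a) [0x7f125791c3aa]
[254141.970] (EE) 17: /usr/lib/xorg/modules/libglamoregl.so (glamor_create_gc+0xdcb9) [0x7f1258463ee9]
[254141.970] (EE) 18: /usr/lib/xorg/Xorg (DamageRegionAppend+0x1a40) [0x55f83e90e010]
"""


def parse_frames(log_text):
    """Return (frame index, module file name, symbol) for each frame line."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            module = os.path.basename(match["module"])
            frames.append((int(match["index"]), module, match["symbol"]))
    return frames


def modules_in_order(frames):
    """Return each module once, in the order it first appears."""
    seen = []
    for _, module, _ in frames:
        if module not in seen:
            seen.append(module)
    return seen


if __name__ == "__main__":
    print(" <- ".join(modules_in_order(parse_frames(SAMPLE))))
```

For these sample frames the order comes out as libc.so.6, 
libdrm_nouveau.so.2, nouveau_dri.so, libglamoregl.so, Xorg. In other 
words the abort in libc is reached from libdrm_nouveau, which is in turn 
called from nouveau_dri, glamor and the X server.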