[Bug 1782984] Re: Assertion `!xcb_xlib_threads_sequence_lost' failed with multiple applications
Chris Halse Rogers
1782984 at bugs.launchpad.net
Tue Oct 19 23:22:32 UTC 2021
Has anyone tried PCManFM or Inkscape on focal? It seems that this update
doesn't fix GNOME Shell on focal, but *maybe* that's because Shell is
hitting a different bug?
If this can be verified to fix PCManFM, Inkscape, or something on focal
we can release it and open a new bug for GNOME Shell on Focal.
If this *doesn't* fix something on focal, we need to work out why -
maybe it needs more changes backported, like bionic did?
If PCManFM & Inkscape don't hang on focal *without* this update then I
think we can release the bionic update and close the focal task.
--
You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is subscribed to a duplicate bug report (1754084).
https://bugs.launchpad.net/bugs/1782984
Title:
Assertion `!xcb_xlib_threads_sequence_lost' failed with multiple
applications
Status in libx11 package in Ubuntu:
Fix Released
Status in libx11 source package in Bionic:
Fix Committed
Status in libx11 source package in Focal:
Fix Committed
Status in libx11 source package in Groovy:
Won't Fix
Bug description:
[Impact]
There is a race in libx11 causing applications to randomly abort. It's
not trivial to reproduce, but there are enough duplicates that this
deserves an SRU to bionic & focal.
[Fix]
Backport a commit from upstream:
From dbb55e1a5e82870466b095097d9e46046680ec25 Mon Sep 17 00:00:00 2001
From: Frediano Ziglio <fziglio at redhat.com>
Date: Wed, 29 Jan 2020 09:06:54 +0000
Subject: [PATCH] Fix poll_for_response race condition
In poll_for_response it is possible that event replies are skipped
and a more up-to-date message reply is returned.
This will cause the next poll_for_event call to fail, aborting the program.
This was proved using some slow ssh tunnel or using some program
to slow down server replies (I used a combination of xtrace and strace).
How the race happens:
- program enters into poll_for_response;
- poll_for_event is called but the server has not yet sent the reply;
- pending_requests is not NULL because we sent a request (see call
to append_pending_request in _XSend);
- xcb_poll_for_reply64 is called from poll_for_response;
- xcb_poll_for_reply64 will read from the server; at this point the
server replies with an event (say sequence N) and the reply to our
last request (say sequence N+1);
- xcb_poll_for_reply64 returns the reply for the request we asked;
- last_request_read is set to N+1 sequence in poll_for_response;
- poll_for_response returns the response to the request;
- poll_for_event is called (for instance from another poll_for_response);
- event with sequence N is retrieved;
- the sequence N is widened; however, as the "new" number computed from
last_request_read is less than N, the number is widened to N + 2^32
(assuming last_request_read still fits in 32 bits);
- poll_for_event enters the nested if statement as req is NULL;
- we compare the widened N (which now does not fit into 32 bits) with
request (which fits into 32 bits), hitting the throw_thread_fail_assert.
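The widening step in the list above can be sketched as a small model. This
is an illustration only, not the libx11 source: the function names
widen_sequence and widen_event_after_reply are hypothetical, and the model
assumes that widening keeps the high 32 bits of the last known sequence,
splices in the 32-bit value, and treats any backwards move as a wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of 32-to-64-bit sequence widening (not libx11 code).
 * Keep the high 32 bits of *wide, splice in narrow, and assume the 32-bit
 * counter wrapped if the result would be smaller than the old value. */
static void widen_sequence(uint64_t *wide, uint32_t narrow)
{
    uint64_t candidate = (*wide & ~(uint64_t)0xFFFFFFFFu) | narrow;
    if (candidate < *wide)
        candidate += (uint64_t)1 << 32;
    *wide = candidate;
}

/* Widen an event sequence after last_request_read has already been
 * advanced to reply_seq, i.e. the out-of-order case from the list above. */
static uint64_t widen_event_after_reply(uint32_t event_seq, uint32_t reply_seq)
{
    uint64_t last_request_read = 0;
    widen_sequence(&last_request_read, reply_seq); /* reply handled first */

    uint64_t event_wide = last_request_read;
    widen_sequence(&event_wide, event_seq);        /* event handled second */

    /* Widening never moves backwards, so a stale sequence jumps by 2^32. */
    assert(event_wide >= last_request_read);
    return event_wide;
}
```

With an event at sequence 100 and a reply at sequence 101, this model
returns 100 + 2^32 for the event, a value that no longer fits in 32 bits,
which corresponds to the comparison that trips the assertion.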
To avoid the race condition and to keep the sequence from going
backwards, I check again for new events after getting the response.
If an event is present, I return that event first and save the reply
to return it later.
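The ordering rule described in this paragraph can be sketched with a toy
model. It is not the upstream patch: the types and functions below
(toy_display, toy_deliver, toy_next) are hypothetical and only show the
idea of handing out a pending event before a newer reply and keeping the
reply for the next call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy state; none of these names come from libx11. */
typedef struct {
    bool     has_event;        /* an event is waiting to be delivered */
    uint32_t event_seq;
    bool     has_saved_reply;  /* a reply was stashed for a later call */
    uint32_t saved_reply_seq;
} toy_display;

typedef enum { TOY_NONE, TOY_EVENT, TOY_REPLY } toy_kind;

typedef struct {
    toy_kind kind;
    uint32_t seq;
} toy_result;

/* Called once a reply with sequence reply_seq has been read. */
static toy_result toy_deliver(toy_display *dpy, uint32_t reply_seq)
{
    toy_result r;
    assert(!dpy->has_saved_reply); /* this toy stashes at most one reply */
    if (dpy->has_event) {
        /* Re-check found an event: return it first, save the reply. */
        dpy->has_saved_reply = true;
        dpy->saved_reply_seq = reply_seq;
        dpy->has_event = false;
        r.kind = TOY_EVENT;
        r.seq = dpy->event_seq;
        return r;
    }
    r.kind = TOY_REPLY;
    r.seq = reply_seq;
    return r;
}

/* Returns the stashed reply, if any, on a later call. */
static toy_result toy_next(toy_display *dpy)
{
    toy_result r = { TOY_NONE, 0 };
    if (dpy->has_saved_reply) {
        dpy->has_saved_reply = false;
        r.kind = TOY_REPLY;
        r.seq = dpy->saved_reply_seq;
    }
    return r;
}
```

In this model the caller always sees sequence N before N+1, so the last
processed sequence never moves backwards.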
To test the race and the fix it's helpful to add a delay (I used a
"usleep(5000)") before calling xcb_poll_for_reply64.
Original patch written by Frediano Ziglio, see
https://gitlab.freedesktop.org/xorg/lib/libx11/-/merge_requests/34
Reworked primarily for readability by Peter Hutterer, see
https://gitlab.freedesktop.org/xorg/lib/libx11/-/merge_requests/53
Signed-off-by: Peter Hutterer <peter.hutterer at who-t.net>
bionic needs another commit so that the real fix applies.
[Test case]
This is a race condition; the SRU sponsor (tjaalton) does not have a test
case for it, but the bug subscribers seem to.
[Where things could go wrong]
In theory there might be a case where a race still happens, but since
this has been upstream for a year now with no follow-up commits, it's
safe to assume that there are no regressions.
--
STEPS TO REPRODUCE
==================
The bug seems to occur when clicking on a file or folder. It is random and difficult to provide clear steps to reproduce. It is, however, a common situation.
EXPECTED RESULTS
================
pcmanfm works without problem.
ACTUAL RESULTS
==============
All pcmanfm windows become unresponsive, though background processes (e.g. copying) may continue without problem. The following error message appears in ~/.cache/lxsession/LXDE/run.log:
[xcb] Unknown sequence number while processing queue
[xcb] Most likely this is a multi-threaded client and XInitThreads has not been called
[xcb] Aborting, sorry about that.
pcmanfm: xcb_io.c:259: poll_for_event: Assertion `!xcb_xlib_threads_sequence_lost' failed.
** Message: 19:58:49.267: app.vala:130: pcmanfm exit with this type of exit: 6
** Message: 19:58:49.268: app.vala:148: Exit not normal, try to reload
(note the timestamp on the message will vary)
AFFECTED VERSIONS
=================
1.2.5-3ubuntu1
NOT 1.2.4-1ubuntu0.1
UPSTREAM BUG
============
https://sourceforge.net/p/pcmanfm/bugs/1089/
ADDITIONAL NOTES
================
Other GTK2 file managers (e.g. Thunar) and applications (e.g. GIMP, Leafpad) seem to have the same problems. This is probably at least rooted in a GTK2 bug:
https://bugs.launchpad.net/ubuntu/+source/gtk+2.0/+bug/1808710
To further support this, note that the SpaceFM file manager is
available in both GTK2 and GTK3 versions. The GTK2 version displays the
behavior; the GTK3 version does not. The same is true of LibreOffice.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/libx11/+bug/1782984/+subscriptions