[Bug 1880193] Re: autofs: Assertion 'set_remove(iterator->links, link) == link' failed at src/shared/userdb.c:314, function userdb_on_query_reply(). Aborting.
Rafael David Tinoco
1880193 at bugs.launchpad.net
Thu Sep 10 03:36:57 UTC 2020
From systemd 245 release notes (https://lwn.net/Articles/814068/):
----
* A new component "userdb" has been added, along with a small daemon
"systemd-userdb.service" and a client tool "userdbctl". The framework
allows defining rich user and group records in a JSON format,
extending on the classic "struct passwd" and "struct group"
structures. Various components in systemd have been updated to
process records in this format, including systemd-logind and
pam-systemd. The user records are intended to be extensible, and
allow setting various resource management, security and runtime
parameters that shall be applied to processes and sessions of the
user as they log in. This facility is intended to allow associating
such metadata directly with user/group records so that they can be
produced, extended and consumed in unified form. We hope that
eventually frameworks such as sssd will generate records this way, so
that for the first time resource management and various other
per-user settings can be configured in LDAP directories and then
provided to systemd (specifically to systemd-logind and pam-system)
to apply on login. For further details see:
https://systemd.io/USER_RECORD
https://systemd.io/GROUP_RECORD
https://systemd.io/USER_GROUP_API
----
and yet we have neither the userdbctl tool nor the daemon:
https://www.freedesktop.org/software/systemd/man/userdbctl.html
This looks like an ongoing effort to unify the user/group information that flows from
pam-systemd into the logind management scheme within systemd.
I believe that making all information passed from pam-systemd to logind available
through this varlink interface is what is causing the issue, and that is where the problem
lies.
----
Nevertheless...
The error comes from the userdb code, from the assertion:
assert_se(set_remove(iterator->links, link) == link);
when userdb code is being called by the varlink protocol.
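
To make the assertion concrete, here is a minimal sketch of the pattern it relies on. It assumes that set_remove() returns the removed entry, or NULL when the entry is not in the set, so the check only holds if the link is still registered when the callback runs. ToySet, toy_set_put() and toy_set_remove() are hypothetical stand-ins written for illustration, not systemd code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pointer set; not systemd's Set type. */
#define TOY_SET_MAX 8

typedef struct ToySet {
    void *items[TOY_SET_MAX];
    size_t n;
} ToySet;

/* Add a pointer to the set. Returns 0 on success, -1 if the set is full. */
static int toy_set_put(ToySet *s, void *p) {
    if (s->n >= TOY_SET_MAX)
        return -1;
    s->items[s->n++] = p;
    return 0;
}

/* Remove a pointer from the set.
 * Returns the removed pointer, or NULL if it was not a member. */
static void *toy_set_remove(ToySet *s, void *p) {
    for (size_t i = 0; i < s->n; i++)
        if (s->items[i] == p) {
            /* Fill the hole with the last element and shrink the set. */
            s->items[i] = s->items[--s->n];
            return p;
        }
    return NULL;
}
```

With these semantics, a first removal of a registered link returns the link itself, while a second removal of the same link returns NULL. An assertion of the form "remove(set, link) == link" would therefore abort if the callback were ever reached for a link that is no longer (or never was) in the set.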
Many subsystems within systemd now embed a varlink server to provide
IPC through a simple JSON protocol. The journal daemon is one example: it creates
its own varlink server through systemd-journald -> server_init -> server_open_varlink() ->
varlink_server_listen_fd().
The execution path for this error comes from either:
(1)
process_connection() -> varlink_process() -> varlink_dispatch_reply() ->
reply_callback()
and the reply_callback is a pointer to userdb_on_query_reply(), since
this callback is set with varlink_bind_reply().
    if (IN_SET(v->state, VARLINK_AWAITING_REPLY, VARLINK_AWAITING_REPLY_MORE)) {
        varlink_set_state(v, VARLINK_PROCESSING_REPLY);
        if (v->reply_callback)
            r = v->reply_callback(v, parameters, error, flags, v->userdata);
OR
(2) from an error coming from:
varlink_dispatch_disconnect()
varlink_dispatch_method()
varlink_dispatch_reply()
varlink_dispatch_timeout()
all of them calling varlink_dispatch_local_error().
These errors come from the main logic of varlink_process(), which processes the
varlink protocol:
- A connection timeout triggers varlink_dispatch_local_error().
- A protocol error while dispatching a reply with an "invalid" JSON object triggers varlink_dispatch_local_error().
- A protocol error when asked to dispatch a method. For example:
  - org.varlink.service.GetInfo
  - org.varlink.service.GetInterface
  - org.varlink.service.*
  are not implemented and would cause a call to varlink_dispatch_local_error().
- A disconnect also causes a call to varlink_dispatch_local_error().
varlink_dispatch_local_error():
r = v->reply_callback(v, NULL, error,
VARLINK_REPLY_ERROR|VARLINK_REPLY_LOCAL, v->userdata);
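
The two paths above can be modelled with a small, self-contained sketch. It is only an illustration of how a regular reply and a locally generated error both end up in the same bound callback with the same userdata. Every name in it (ToyConn, toy_dispatch_reply(), toy_dispatch_local_error(), the TOY_* constants and the "example.TimedOut" error string) is hypothetical, and the state handling is simplified compared to the real varlink code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical stand-ins, not systemd's real API. */
enum {
    TOY_REPLY_ERROR = 1 << 0,
    TOY_REPLY_LOCAL = 1 << 1,
};

typedef enum ToyState {
    TOY_IDLE,
    TOY_AWAITING_REPLY,
    TOY_AWAITING_REPLY_MORE,
    TOY_PROCESSING_REPLY,
} ToyState;

typedef struct ToyConn ToyConn;
typedef int (*toy_reply_fn)(ToyConn *c, const char *parameters,
                            const char *error_id, unsigned flags,
                            void *userdata);

struct ToyConn {
    ToyState state;
    toy_reply_fn reply_callback;
    void *userdata;
};

/* Path (1): a reply arrived from the peer while we were waiting for one. */
static int toy_dispatch_reply(ToyConn *c, const char *parameters,
                              const char *error_id, unsigned flags) {
    if (c->state != TOY_AWAITING_REPLY && c->state != TOY_AWAITING_REPLY_MORE)
        return 0; /* not waiting for a reply: nothing to dispatch */
    c->state = TOY_PROCESSING_REPLY;
    if (!c->reply_callback)
        return 0;
    return c->reply_callback(c, parameters, error_id, flags, c->userdata);
}

/* Path (2): a locally generated error. The callback gets no parameters
 * and sees both the ERROR and LOCAL flags. */
static int toy_dispatch_local_error(ToyConn *c, const char *error_id) {
    if (!c->reply_callback)
        return 0;
    return c->reply_callback(c, NULL, error_id,
                             TOY_REPLY_ERROR | TOY_REPLY_LOCAL, c->userdata);
}

/* What the example callback has observed so far. */
typedef struct ToySeen {
    int calls;
    int local_errors;
    const char *last_error;
} ToySeen;

/* Example callback: records each invocation in the ToySeen passed as userdata. */
static int toy_record_reply(ToyConn *c, const char *parameters,
                            const char *error_id, unsigned flags,
                            void *userdata) {
    ToySeen *seen = userdata;
    (void) c;
    (void) parameters;
    seen->calls++;
    if (flags & TOY_REPLY_LOCAL)
        seen->local_errors++;
    seen->last_error = error_id;
    return 1;
}
```

In this toy model the callback can run once for a regular reply and again for a local error on the same connection, each time with the same userdata. Whether that sequence actually occurs in the real code, and whether it is what trips the assertion, is not established here.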
-----------------
Commits related to varlink that are not merged:
$ git log --no-merges v246..HEAD --oneline --grep varlink
8d91b2206c varlink: properly allocate connection event source
77472d06a4 varlink: do not parse invalid messages twice
0c73f4f075 nss-resolve: port over to new varlink interface
9581bb8424 resolved: add minimal varlink api for resolving hostnames/addresses
65a01e8242 resolved: move query bus tracking to resolved-bus.c
c9de4e0f5b resolved: rename request → bus_request
7466e94f13 varlink: add helper for generating errno errors
and it looks like they're adding features to nss-resolve so it can resolve hostnames
using systemd-resolved... but none of them fixes anything related to this issue.
--
https://bugs.launchpad.net/bugs/1880193
Title:
autofs: Assertion 'set_remove(iterator->links, link) == link' failed
at src/shared/userdb.c:314, function userdb_on_query_reply().
Aborting.
Status in autofs package in Ubuntu:
Triaged
Status in autofs source package in Focal:
New
Bug description:
autofs periodically fails when mounting shares in Ubuntu 20.04 (it happens about 1 time out of 5):
"Assertion 'set_remove(iterator->links, link) == link' failed at src/shared/userdb.c:314, function userdb_on_query_reply(). Aborting.
Aborted (core dumped)"
Restarting `autofs.service` (or the `automount` app) fixes this
issue. However, if some home directories (like `Desktop` or `Documents`)
are mounted by `autofs`, the user can't log in to the Ubuntu Desktop
Environment (the PC freezes on login with a black screen). Since this error
can prevent users from logging in, it might be considered a critical bug.
It happens both with the `autofs` systemd service and when running the
`automount` app directly (`automount -f -d` command).
Maybe it's an underlying error in the `systemd` library (I found the
line mentioned in the error in its source code).
This issue occurs in Ubuntu 20.04 (it works correctly in Ubuntu
18.04):
> lsb_release -rd
Description: Ubuntu 20.04 LTS
Release: 20.04
Package versions:
> apt-cache policy autofs systemd
autofs:
Installed: 5.1.6-2
Candidate: 5.1.6-2
Version table:
*** 5.1.6-2 500
500 http://ru.archive.ubuntu.com/ubuntu focal/main amd64 Packages
100 /var/lib/dpkg/status
systemd:
Installed: 245.4-4ubuntu3
Candidate: 245.4-4ubuntu3
Version table:
*** 245.4-4ubuntu3 500
500 http://ru.archive.ubuntu.com/ubuntu focal/main amd64 Packages
100 /var/lib/dpkg/status
Steps to reproduce:
1. Ubuntu 20.04 clean install
2. `apt install realmd sssd sssd-tools libnss-sss libpam-sss adcli samba-common-bin`
3. `realm join DOMAIN.NAME`
4. Enable makehomedir with the command: `pam-auth-update`
5. `apt install cifs-utils`
6. `apt install autofs`
7. Add the following line to the [domain/DOMAIN.EXT] section of /etc/sssd/sssd.conf: `krb5_ccname_template = FILE:%d/krb5cc_%U`
8. Reboot
9. Log in as a domain user and try to open a directory mounted by `autofs` (in my configuration, shares are provided by AD).
10. `autofs.service` stops with the error above about 1 time out of 5 (not always).
Workaround found:
Add `Restart=always` to the `[Service]` section of the `/lib/systemd/system/autofs.service` file (in other words, configure auto-restart on failure for the autofs service).
Attachments:
1. Full log of `automount -f -d` command.