[Bug 2003851] Re: occasional hanging 'apt-get update' from daily cronjob since Jammy 22.04
Ken Sharp
2003851 at bugs.launchpad.net
Mon Apr 24 16:06:08 UTC 2023
** Tags added: jammy
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to apt in Ubuntu.
https://bugs.launchpad.net/bugs/2003851
Title:
occasional hanging 'apt-get update' from daily cronjob since Jammy
22.04
Status in apt package in Ubuntu:
Confirmed
Bug description:
Hi!
Yesterday I spotted several machines of ours where a period `apt-get
update` was stalled. The `http` children were hanging in `WaitFd`
(waiting for parent instructions/queue). The parent was looping in
`AcquireUpdate` every 500ms.
We have a cronjob that runs every few hours which calls `apt-get update` and does some post-processing. We noticed that several of them had stalled at some point in time. Killing the parent (apt-get) got it unstuck, removing the locks.
Example:
```
# apt-get update
Reading package lists... Done
E: Could not get lock /var/lib/apt/lists/lock. It is held by process 154026 (apt-get)
N: Be aware that removing the lock file is not a solution and may break your system.
E: Unable to lock directory /var/lib/apt/lists/
```
Task listing:
```
root 153929 \_ /usr/sbin/CRON -f -P
root 153942 \_ /bin/sh -c [ -x /etc/zabbix/scripts/dpkg.updates ] && /etc/zabbix/scripts/dpkg.updates --cron
root 153943 \_ /bin/sh /etc/zabbix/scripts/dpkg.updates --cron
root 154026 \_ apt-get update
_apt 154029 \_ /usr/lib/apt/methods/http
_apt 154030 \_ /usr/lib/apt/methods/http
_apt 154031 \_ /usr/lib/apt/methods/http
_apt 154033 \_ /usr/lib/apt/methods/gpgv
```
Open (TCP) sockets. All have 1 item in the Recv-Q (probably a FIN or RST?):
```
# netstat -apn | grep -E '154026|154029|154030|154031|154033'
tcp 1 0 10.x.x.x:60868 217.x.x.x:80 CLOSE_WAIT 154030/http
tcp 1 0 10.x.x.x:40756 178.x.x.x:80 CLOSE_WAIT 154029/http
tcp 1 0 10.x.x.x:56818 185.x.x.x:80 CLOSE_WAIT 154031/http
```
All children (including gpgv) were waiting using pselect6(1, [0], NULL, NULL, NULL, NULL).
The parent (apt-get) was waiting using pselect6(10, [5 6 7 9], [],
NULL, {tv_sec=0, tv_nsec=500000000}, NULL).
The http sockets in the children were at fd=3.
Parent lsof:
```
# lsof -p 154026 +E
...
apt-get 154026 root 4uW REG 8,1 0 262281 /var/lib/apt/lists/lock
apt-get 154026 root 5r FIFO 0,13 0t0 4015176 pipe 154029,http,1w
apt-get 154026 root 6r FIFO 0,13 0t0 4012448 pipe 154030,http,1w
apt-get 154026 root 7r FIFO 0,13 0t0 4015192 pipe 154031,http,1w
apt-get 154026 root 8w FIFO 0,13 0t0 4015177 pipe 154029,http,0r
apt-get 154026 root 9r FIFO 0,13 0t0 4015233 pipe 154033,gpgv,1w
apt-get 154026 root 10w FIFO 0,13 0t0 4012449 pipe 154030,http,0r
apt-get 154026 root 12w FIFO 0,13 0t0 4015193 pipe 154031,http,0r
apt-get 154026 root 14w FIFO 0,13 0t0 4015234 pipe 154033,gpgv,0r
http 154029 _apt 0r FIFO 0,13 0t0 4015177 pipe 154026,apt-get,8w
http 154029 _apt 1w FIFO 0,13 0t0 4015176 pipe 154026,apt-get,5r
http 154030 _apt 0r FIFO 0,13 0t0 4012449 pipe 154026,apt-get,10w
http 154030 _apt 1w FIFO 0,13 0t0 4012448 pipe 154026,apt-get,6r
http 154031 _apt 0r FIFO 0,13 0t0 4015193 pipe 154026,apt-get,12w
http 154031 _apt 1w FIFO 0,13 0t0 4015192 pipe 154026,apt-get,7r
gpgv 154033 _apt 0r FIFO 0,13 0t0 4015234 pipe 154026,apt-get,14w
gpgv 154033 _apt 1w FIFO 0,13 0t0 4015233 pipe 154026,apt-get,9r
```
So:
- apt-get is waiting for any data written by any of its four children (at fd 5/6/7/9)
- http and gpgv are waiting for any data written by their parent (at their respective fd 0)
Parent backtrace:
```
#0 0x00007f420116a74d in select ()
from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f420153fb5d in pkgAcquire::Run(int) ()
from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
#2 0x00007f420161d535 in AcquireUpdate(pkgAcquire&, int, bool, bool) ()
from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
#3 0x00007f420161d986 in ListUpdate(pkgAcquireStatus&, pkgSourceList&, int) ()
from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
#4 0x00007f42016d127b in DoUpdate (CmdL=...)
at ./apt-private/private-update.cc:73
#5 0x00007f420156d73f in CommandLine::DispatchArg(CommandLine::Dispatch const*, bool) ()
from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
#6 0x00007f420169fa97 in DispatchCommandLine (CmdL=...,
Cmds=std::vector of length 27, capacity 32 = {...})
at ./apt-private/private-cmndline.cc:588
#7 0x0000561fc06bafbd in main (argc=2, argv=0x7ffc0e14aa38)
at ./cmdline/apt-get.cc:447
```
Child backtrace:
```
#0 0x00007f58b1c9b74d in select ()
from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f58b237cb43 in WaitFd(int, bool, unsigned long) ()
from /lib/x86_64-linux-gnu/libapt-pkg.so.6.0
#2 0x00005643e37bc7b3 in BaseHttpMethod::Loop (
this=0x7ffdeaed56c0) at ./methods/basehttp.cc:611
#3 main (argv=<optimized out>) at ./methods/http.cc:1052
```
Where this is:
```
(gdb) print *this
$1 = {<aptAuthConfMethod> = {<aptMethod> = {<pkgAcqMethod> = {<No data fields>}, <aptConfigWrapperForMethods> = {
methodNames = std::vector of length 2, capacity 2 = {
"http", "http::XXXXXX.nl"}}, Binary = "http",
SeccompFlags = 6},
authconfs = std::vector of length 1, capacity 1 = {
std::unique_ptr<FileFd> = {get() = 0x5643e5343100}}},
Server = std::unique_ptr<ServerState> = {
get() = 0x5643e534fba0}, NextURI = "",
AllowRedirect = true, Debug = false, PipelineDepth = 10,
static FailFile = {static npos = 18446744073709551615,
_M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
_M_p = 0x5643e534d650 "/var/lib/apt/lists/partial/apt.osso.nl_ubuntu-security_dists_jammy-security_InRelease"},
_M_string_length = 85, {
_M_local_buf = "\226", '\000' <repeats 14 times>,
_M_allocated_capacity = 150}}, static FailFd = -1,
static FailTime = 1674071760}
```
Relevant code in parent:
```
pkgAcquire::RunResult pkgAcquire::Run(int PulseInterval)
{
...
// Run till all things have been acquired
struct timeval tv = SteadyDurationToTimeVal(std::chrono::microseconds(PulseInterval));
while (ToFetch > 0)
...
int Res;
do
{
Res = select(Highest+1,&RFds,&WFds,0,&tv);
}
while (Res < 0 && errno == EINTR);
```
Relevant code in child:
```
int BaseHttpMethod::Loop()
{
signal(SIGTERM,SigTerm);
signal(SIGINT,SigTerm);
Server = 0;
int FailCounter = 0;
while (1)
{
// We have no commands, wait for some to arrive
if (Queue == 0)
{
if (WaitFd(STDIN_FILENO) == false)
return 0;
```
Versions:
```
Ubuntu 22.04.1 LTS (Jammy Jellyfish)
apt 2.4.8
```
Preliminiary conclusion/notes:
- I don't know why it stalls, but it does. And this was not a single
incident. All the stalling apt processes had stopped at a different
time/day.
- Apparently there is a situation possible where there are no commands
queued from the parent, and the parent isn't planning on sending any
either.
- Our apt-get is invoked by a cron job which calls either `update-
notifier-common` or `apt-get update`:
https://github.com/ossobv/zabbix-agent-
osso/blob/c61aee6087c0d03e66d03013c61acd9f65d0eaab/scripts/dpkg.updates#L14-L31
-- on the machines where apt-get was hanging, there was no `update-
notifier-common`.
- I don't know if this is 2.4.x related (on Focal we have apt 2.0.x),
but it sure looks like it. I did check a few other Focal machines that
lacked `update-notifier-common`, but they were not having a stalled
apt-get (although _not_ seeing a hung process proves nothing)
Let me know if there's anything I can get you. Should this be filed elsewhere? (Debian Salsa?)
Cheers,
Walter Doekes
OSSO B.V.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/apt/+bug/2003851/+subscriptions
More information about the foundations-bugs
mailing list