Ongoing autopkgtest-cloud armhf maintenance
Julian Andres Klode
julian.klode at canonical.com
Thu Jan 13 11:19:56 UTC 2022
We are now operating at full capacity again. It turns out we also
did not have 11 workers, but 12, so wherever I said 33, the actual
number is 36 :)
Some items remain TBD, but the rest is done and got us back
on our feet again:
On Wed, Jan 12, 2022 at 05:48:04PM +0100, Julian Andres Klode wrote:
>
> # Pending work
>
> - Move /var/snap/lxd/common out of /srv (where lxd storage pool lives);
> this will likely require slightly increasing the '/' disk size.
>
> - Investigate further where the 30s timeout in lxd comes from and how
> to prevent that (or just ignore it, but next item)
Both of these remain TBD.
>
> - Investigate where the stuck instances came from and why they were not
> cleaned up. Is it possible for us to check which instances should be
> running and then remove all other ones from the workers? Right now
> we just do a basic time check
There were no errors logged. I saw mentions of exit code -15, but
nothing concrete.
We now have a new cleanup step that keeps only as many containers as
needed and deletes everything else older than 1 hour.
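The selection logic of that cleanup can be sketched roughly as follows. This is a minimal illustration, not the actual autopkgtest-cloud code; the container names and the helper function are hypothetical, and in practice the candidate list would come from querying lxd (e.g. `lxc list`) rather than being passed in:

```python
from datetime import datetime, timedelta, timezone

def containers_to_delete(containers, needed, max_age=timedelta(hours=1),
                         now=None):
    """Return names of containers that should be cleaned up.

    containers: list of (name, created_at) tuples, created_at tz-aware.
    needed: set of container names that should keep running.
    A container is deleted only if it is not needed AND is older
    than max_age, so freshly created but not-yet-registered
    containers are left alone.
    """
    now = now or datetime.now(timezone.utc)
    return [name for name, created_at in containers
            if name not in needed and now - created_at > max_age]

# Demo with a fixed clock (names are made up for illustration):
now = datetime(2022, 1, 13, 12, 0, tzinfo=timezone.utc)
containers = [
    ("adt-jammy-armhf-1", now - timedelta(hours=3)),    # stale, unneeded
    ("adt-jammy-armhf-2", now - timedelta(minutes=10)), # too young to touch
    ("adt-jammy-armhf-3", now - timedelta(hours=2)),    # still in use
]
print(containers_to_delete(containers, needed={"adt-jammy-armhf-3"}, now=now))
# -> ['adt-jammy-armhf-1']
```

The 1-hour grace period is what keeps this safe against racing with the workers: a container that was just created but not yet recorded as "needed" is never removed.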
>
> - The node lxd-armhf10 needs to finish its redeployment once the
> lxd images exist again
>
> - The node lxd-armhf9 needs to be redeployed to solve the disk I/O
> issue
>
> - Both lxd-armhf10 and lxd-armhf9 will have to be re-enabled with
> the new IPs in the mojo service bundle
All three of these redeployments have happened.
>
> - We should really redeploy all the lxd workers to have clean workers
> again
TBD; we still need to figure out the partitioning for
/var/snap/lxd/common, but it does not seem urgent right now.
--
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer i speak de, en