[Bug 1891473] Re: cryptsetup ftbfs in focal
Guilherme G. Piccoli
1891473 at bugs.launchpad.net
Fri Sep 4 22:19:31 UTC 2020
After some investigation, it seems we were able to narrow down the issue.
* tl;dr: After the upgrade of the PPA builder to Bionic, the memlock limit (ulimit -l) was bumped from a ridiculously low value (64) to something bigger (16M). It happens that cryptsetup then succeeds in its call to mlockall(), so all allocations become subject to that limit, which is still a bit low and ends up causing allocation failures.
When the limit is very low (like in Xenial), the lock procedure fails, and cryptsetup allocations are not subject to this restriction, so everything just works.
See the "Conclusion" section for alternatives on how to fix this.
* Details:
I managed to reproduce that by collecting the luks2-validation images in a local environment, running a Bionic VM + LXD (a Focal container). By collecting the strace of luksDump in both environments, we got the following:
### LXD - NOT working
...
openat(AT_FDCWD, "./luks2-metadata-size-4m.img", O_RDONLY|O_DIRECT) = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=16777216, ...}) = 0
fstat(6, {st_mode=S_IFREG|0644, st_size=16777216, ...}) = 0
lseek(5, 0, SEEK_SET) = 0
read(5, "LUKS\272\276\0\2\0\0\0\0\0@\0\0\0\0\0\0\0\0\0\n\0\0\0\0\0\0\0\0"..., 4096) = 4096
mmap(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 EAGAIN (Resource temporarily unavailable)
brk(0x55e789c38000) = 0x55e78982d000
mmap(NULL, 4325376, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 EAGAIN (Resource temporarily unavailable)
lseek(5, 16384, SEEK_SET) = 16384
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(5, 32768, SEEK_SET) = 32768
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(5, 65536, SEEK_SET) = 65536
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
...
### VM - working
...
openat(AT_FDCWD, "./luks2-metadata-size-4m.img", O_RDONLY|O_DIRECT) = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=16777216, ...}) = 0
fstat(6, {st_mode=S_IFREG|0644, st_size=16777216, ...}) = 0
lseek(5, 0, SEEK_SET) = 0
read(5, "LUKS\272\276\0\2\0\0\0\0\0@\0\0\0\0\0\0\0\0\0\n\0\0\0\0\0\0\0\0"..., 4096) = 4096
mmap(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6b06031000
lseek(5, 4096, SEEK_SET) = 4096
mmap(NULL, 4198400, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6b05c30000
So: as mmap fails, lseeks start to be attempted at wrong offsets of
2^N KiB, where N=4,5,... (16384, 32768, 65536 in the trace above).
In the cryptsetup code, in luks2_disk_metadata.c, function
LUKS2_disk_hdr_read(), we fail and then try all known offsets, as per
the code below:
[...]
* No header size, check all known offsets.
*/
for (r = -EINVAL,i = 0; r < 0 && i < ARRAY_SIZE(hdr2_offsets); i++)
[...]
This explains why we see that many lseeks in the LXD failing case, with
multiple offsets.
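The fallback loop quoted above can be sketched in isolation. This is only an illustration, not cryptsetup code: the table below holds just the three offsets visible in the strace (the full hdr2_offsets table is not shown in this message), and hdr_probe_fn, probe_observed_offsets() and the two example callbacks are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: 0 if a header is accepted at `offset`,
 * a negative errno value otherwise. */
typedef int (*hdr_probe_fn)(uint64_t offset, void *ctx);

/* Only the offsets seen in the failing strace (16 KiB, 32 KiB, 64 KiB). */
static const uint64_t observed_offsets[] = { 16384, 32768, 65536 };
#define N_OBSERVED (sizeof(observed_offsets) / sizeof(observed_offsets[0]))

/* Same loop shape as the quoted excerpt: start from -EINVAL and stop at
 * the first offset where the probe succeeds. */
int probe_observed_offsets(hdr_probe_fn probe, void *ctx, uint64_t *found)
{
    int r = -EINVAL;
    size_t i;

    assert(probe != NULL);
    for (i = 0; r < 0 && i < N_OBSERVED; i++) {
        r = probe(observed_offsets[i], ctx);
        if (r == 0 && found)
            *found = observed_offsets[i];
    }
    return r;
}

/* Example probe that rejects every offset and counts how often it ran. */
int probe_never_matches(uint64_t offset, void *ctx)
{
    (void)offset;
    if (ctx)
        ++*(size_t *)ctx;
    return -EINVAL;
}

/* Example probe that accepts exactly the offset stored in *ctx. */
int probe_matches_at(uint64_t offset, void *ctx)
{
    return offset == *(const uint64_t *)ctx ? 0 : -EINVAL;
}
```

With probe_never_matches() every offset is visited once and the loop returns the last error, which is the pattern of repeated lseek/read pairs seen in the LXD trace.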
But then, why do we fail? In the failing case, in function
LUKS2_disk_hdr_read(), we fail right at the first header read, as per
the code in lib/luks2/luks2_disk_metadata.c:
[...]
* Read primary LUKS2 header (offset 0).
*/
state_hdr1 = HDR_FAIL;
r = hdr_read_disk(cd, device, &hdr_disk1, &json_area1, 0, 0);
[...]
The failure comes in a malloc(), specifically in hdr_read_disk():
[...]
r = hdr_disk_sanity_check_pre(cd, hdr_disk, &hdr_json_size, secondary, offset);
if (r < 0) {
    return r;
}
/*
 * Allocate and read JSON area. Always the whole area must be read.
 */
*json_area = malloc(hdr_json_size);
if (!*json_area) {
    return -ENOMEM;
}
[...]
Without the json_area allocated we end up looping in search of the
proper header size, and the test fails. This malloc is the one
generating the following entry in the strace:
mmap(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 EAGAIN (Resource temporarily unavailable)
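The EAGAIN above matches what mmap(2) documents for the case where too much memory has been locked. A minimal sketch of that mechanism, not taken from cryptsetup: a child process caps its soft memlock limit, calls mlockall(MCL_FUTURE) and then attempts an anonymous mapping. The names probe_locked_alloc() and enum probe_result are hypothetical. MCL_FUTURE alone is an assumption made here so that the lock call does not depend on how much memory is already mapped; the flags cryptsetup passes are not shown in this message. A process holding CAP_IPC_LOCK is not bound by the limit, so the mapping would succeed there.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Outcomes reported by the child through its exit status. */
enum probe_result {
    PROBE_ALLOC_OK = 0,     /* mapping succeeded */
    PROBE_ALLOC_EAGAIN = 1, /* mapping failed with EAGAIN, as in the strace */
    PROBE_LOCK_FAILED = 2,  /* mlockall() itself failed */
    PROBE_OTHER_ERROR = 3   /* any other failure (rlimit, fork, wait, ...) */
};
static_assert(PROBE_OTHER_ERROR <= 255, "results must fit in an exit status");

static int probe_child(rlim_t limit_bytes, size_t request_bytes)
{
    struct rlimit rl;
    void *p;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return PROBE_OTHER_ERROR;
    /* The soft limit may not exceed the hard limit, so clamp to it. */
    rl.rlim_cur = (limit_bytes < rl.rlim_max) ? limit_bytes : rl.rlim_max;
    if (setrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return PROBE_OTHER_ERROR;

    /* From here on, new mappings count against RLIMIT_MEMLOCK. */
    if (mlockall(MCL_FUTURE) != 0)
        return PROBE_LOCK_FAILED;

    p = mmap(NULL, request_bytes, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return (errno == EAGAIN) ? PROBE_ALLOC_EAGAIN : PROBE_OTHER_ERROR;
    munmap(p, request_bytes);
    return PROBE_ALLOC_OK;
}

/* Run the probe in a child so the caller's limits and locks stay untouched. */
int probe_locked_alloc(rlim_t limit_bytes, size_t request_bytes)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return PROBE_OTHER_ERROR;
    if (pid == 0)
        _exit(probe_child(limit_bytes, request_bytes));
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return PROBE_OTHER_ERROR;
    return WEXITSTATUS(status);
}
```

Under those assumptions, probe_locked_alloc(1 << 20, 4 << 20) in an unprivileged process should report PROBE_ALLOC_EAGAIN, since a 4 MiB mapping cannot be locked under a 1 MiB limit.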
* Conclusion: we have 2 avenues for fixing that; I personally consider (a) [below] the more correct one.
(a) We could increase the builders memlock limit to 64M - Focal has that
as a default now. This seems to me the proper approach, given that in
real life cryptsetup is performing the memory lock, so we should
exercise it like that during the build tests.
(b) It's possible to fall back to the same scenario as the Xenial
builder by _reducing_ the memlock limit, so that cryptsetup does not set
the memory lock at all during the build. The bonus of this approach is
its simplicity - we can decrease the limit from the package itself - but
at the same time we don't exercise the real-life usage anymore during
the build tests.
By following approach (b) above, I've managed to make the build work:
https://launchpad.net/~gpiccoli/+archive/ubuntu/crypt-groovy/+build/19913720
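How the limit was lowered in that build is not shown in this message. As a rough sketch of approach (b), assuming the tests are started from a process that can adjust its own limits, the soft memlock limit can be reduced with setrlimit(); lower_memlock_soft_limit() is a hypothetical helper, and 64 KiB corresponds to the old value of 64 reported by ulimit -l, which counts in 1024-byte units.

```c
#include <assert.h>
#include <sys/resource.h>

/* Lower the soft RLIMIT_MEMLOCK to at most `target_bytes`.
 * Never raises the limit. Returns 0 on success, -1 on error (errno set). */
int lower_memlock_soft_limit(rlim_t target_bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    if (rl.rlim_cur <= target_bytes)
        return 0; /* already at or below the target */

    rl.rlim_cur = target_bytes;
    /* The new soft limit is below the old one, so it cannot exceed the
     * hard limit. */
    assert(rl.rlim_cur <= rl.rlim_max);
    return setrlimit(RLIMIT_MEMLOCK, &rl);
}
```

Calling lower_memlock_soft_limit(64 * 1024) before spawning the test programs would make them inherit the reduced limit; lowering a soft limit needs no privileges. From a shell, ulimit -l plays the same role.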
I'll spin a mailing-list discussion on top of Colin's PPA builder update message to discuss the possibility of approach (a).
Cheers,
Guilherme
--
https://bugs.launchpad.net/bugs/1891473
Title:
cryptsetup ftbfs in focal
Status in cryptsetup package in Ubuntu:
In Progress
Status in cryptsetup source package in Focal:
In Progress
Status in cryptsetup source package in Groovy:
In Progress
Bug description:
seen in a focal test rebuild:
https://launchpad.net/ubuntu/+archive/test-rebuild-20200810-focal/+build/19793722