<div dir="ltr"><div dir="ltr"> </div> <div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 11 Mar 2022 at 02:39, Dave Jones &lt;<a href="mailto:dave.jones@canonical.com">dave.jones@canonical.com</a>&gt; wrote: </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Thu, Mar 10, 2022 at 12:10:39PM +0100, Julian Andres Klode wrote:<br>
>On Wed, Mar 09, 2022 at 01:24:55PM +0000, Dave Jones wrote:<br>
[snip]<br>
>> Firstly I actually think lz4 -2 is probably the ideal level for that<br>
>> compressor. There's a large difference in compression performance<br>
>> between lz4 -1 and lz4 -2 across all platforms tested, but no<br>
>> difference in memory usage, and only a minimal increase in<br>
>> compression & decompression time. However, lz4 is currently<br>
>> configured to use level -9 which takes a considerable amount of extra<br>
>> time for little to no gain in compression performance (at least with<br>
>> our initramfs inputs anyway).<br>
>><br>
>> On machines with more generous RAM allowances, zstd -T0 -1 does<br>
>> appear to be the ideal. The incremental gains in compression at<br>
>> higher levels are outweighed by the extra time spent compressing<br>
>> (i.e. for our initramfs inputs at least, the extra time spent on the<br>
>> compression is not gained back on reading the compressed data at I/O<br>
>> speeds typical for their respective platforms).<br>
>><br>
>> [snipped some data]<br>
>><br>
>> At this point, if you want some data to play with I'd highly<br>
>> recommend cloning the following repo and following the instructions<br>
>> in the README:<br>
>><br>
>> <a href="https://github.com/waveform80/compression" rel="noreferrer" target="_blank">https://github.com/waveform80/compression</a><br>
><br>
>I'm still not convinced by the data as it does not align at all with what I<br>
>see on my laptop, or does it? It certainly does not _feel_ like it, as I<br>
>was arguing that -12 makes most sense.<br>
<br>
I've included the gather.py script in the compression repo if you want to run it against your laptop? 
The entries can be keyed by an arbitrary label (like "Julian's laptop") so they wouldn't clobber the existing PC-based results.<br>
<br>
>I did some remeasurements<br>
><br>
>Compression levels:<br>
><br>
>uncompressed 157MB<br>
>lz4 -2 75MB (42%)<br>
>lz4 -9 63MB (40%)<br>
>zstd -T0 -1 56MB (36%)<br>
>zstd -T0 -2 52MB (33%)<br>
>zstd -T0 -3 47MB (30%)<br>
>zstd -T0 -6 45MB (29%)<br>
>zstd -T0 -12 40MB (22%)<br>
><br>
>I don't know where 19 is, but a switch to lz4 -2 would roughly double<br>
>the size, that's for sure, so how would this affect /boot size?<br>
<br>
Sorry -- I should have clarified: I'm definitely *not* suggesting we switch PC users from zstd to lz4. As you point out, that would balloon the size of the compressed initrd. My one concern there was that even at level -1, zstd still uses a *teensy* bit too much RAM for my comfort on the extremely limited Pi Zero 2 and hence that we should be (and indeed, are) using lz4 instead there.<br>
<br>
When lz4 is in use (as on the Pi images on jammy), it is currently hard-coded to use level -9 which (as you observe below) is quite silly in spending a great deal more time achieving effectively no extra compression. In other words, my desire for lz4 -2 applies to the Pi images on Jammy alone and nothing else (it's a change that, if made, should *not* be back-ported to Focal for the reasons you've noted).<br>
<br>
>Looking at size, zstd clearly is the correct choice, if we reverted to<br>
>lz4 -2, sizes would even grow relative to older lz4 -9 choice, meaning<br>
>those users upgrading from focal run out of boot space.<br>
><br>
>Ignoring non-LTS users for a moment, we essentially need to find a<br>
>compressor that accommodates the size increase in kernel initramfs due<br>
>to new code and stuff, and I think zstd -1 does that reasonably well.<br>
<br>
Agreed.<br>
<br>
>Times spent (compressor/total update-initramfs)<br>
> user system total<br>
>lz4 -2 0.3/ 6.2s 0.1/ 2.6s 0.3/ 8.2s (3% of update-initramfs time)<br>
>lz4 -9 4.8/10.8s 0.1/ 2.6s 4.9/12.9s <- this is totally silly<br>
>zstd -T0 -1 0.7/ 5.6s 0.1/ 1.7s 0.2/ 6.2s (um, faster than lz4?)<br>
>zstd -T0 -1 0.7/ 7.1s 0.1/ 3.5s 0.2/ 9.3s (um, much slower in 2nd run)<br>
>zstd -T0 -2 0.9/ 7.1s 0.1/ 3.0s 0.3/ 8.8s (more noise than difference)<br>
>zstd -T0 -3 1.6/ 7.8s 0.1/ 2.9s 0.5/ 8.8s<br>
>zstd -3 0.9/ 7.2s 0.1/ 3.1s 0.8/ 9.5s<br>
>zstd -3 0.9/ 7.7s 0.1/ 3.8s 0.8/10.7s (noise, lots of noise)<br>
>zstd -T0 -6 6.2/12.8s 0.1/ 3.9s 1.7/11.4s<br>
>zstd -T0 -12 13.1/19.7s 0.2/ 3.4s 4.0/13.0s<br>
><br>
>It shows us that looking at the compressor does not tell us the whole<br>
>story; for low-level zstd and lz4 values, you will absolutely not<br>
>notice the time spent compressing; in fact, there is more noise from<br>
>I/O or whatever despite the laptop essentially idling.<br>
<br>
Indeed; this is one of the reasons I stuck to the pure (de)compression times in my analysis and ignored the rest of update-initramfs as there's also *huge* variety across the architectures there (Pi I/O time is vastly different to a PC, and of course the inputs vary in size as well as the Pi has a much smaller default initrd after Juerg split the kernel modules into an -extras package).<br>
<br>
>There's no way I can figure out if zstd -3 performs worse than zstd -T0<br>
>-1, as its runtime varies by 50%.<br>
><br>
>We also need to consider initrds we prebuild on images and like<br>
>combined kernel.efi binaries: They are built once and used<br>
>hundreds of thousands of times, they need *special* configuration.<br>
<br>
Agreed. In fact, looking at the analysis figures, the LZMA compressors (xz, lzip, etc.) consistently beat zstd at producing smaller output at higher levels. For pre-built images it *may* be worth re-considering those algorithms, but we'd need to measure the decompression performance. (For the record, it's definitely not worth considering these algorithms for non-pre-built images; they may compress really well, but they're also reaaaaaallllly sloooooow!) There are some decompression figures for these in the analysis db but they were gathered using the userspace applications and I've no idea if those figures would be pertinent to kernel initrd decompression.<br>
<br>
>But my conclusion now is that I think zstd -1 or zstd -2 or whatever is<br>
>probably a safe choice for users coming from focal in that it does not<br>
>grow their initrds, so it's probably a good default.<br>
<br>
Yup, sounds good to me! </blockquote><div> So where is the debdiff? :-) If no one else has time I can probably work on this, but if someone else has done this already, even better.</div><div> </div><div>Cheers,</div><div>mwh</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> >One thing we should work on is performing the compression in parallel<br>
>to the CPIO building, this should reduce I/O wait times and offer more<br>
>meaningful parallelization. But not sure how feasible that is - I don't<br>
>just mean cpio | compressor, but also running the scripts, and copying<br>
>them to the output, more like scripts | cpio files from stdin |<br>
>compress.<br>
<br>
Sounds like a good plan.</blockquote><div> </div></div></div>
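The trade-off discussed in this thread (extra time spent compressing versus time gained back when reading a smaller initrd) can be put into rough numbers. The sketch below is only an illustration: the function names and the 100 MB/s read speed are assumptions, not measurements from the thread, and it ignores any difference in decompression time between levels. The sizes and update-initramfs totals are the zstd -T0 -1 and zstd -T0 -12 figures quoted above.

```python
def read_time_saved(size_fast_mb, size_slow_mb, read_mb_per_s):
    """Seconds of read time saved per boot by the smaller image.

    size_fast_mb: output size of the quick, low compression level
    size_slow_mb: output size of the slow, high compression level
    """
    return (size_fast_mb - size_slow_mb) / read_mb_per_s


def boots_to_break_even(extra_compress_s, size_fast_mb, size_slow_mb, read_mb_per_s):
    """Boots needed before the extra compression time is repaid by faster reads."""
    saved = read_time_saved(size_fast_mb, size_slow_mb, read_mb_per_s)
    if saved <= 0:
        return float("inf")  # the higher level never pays for itself
    return extra_compress_s / saved


if __name__ == "__main__":
    # Figures quoted in the thread: zstd -T0 -1 gives 56MB in 6.2s total,
    # zstd -T0 -12 gives 40MB in 13.0s total.
    extra_s = 13.0 - 6.2
    # ASSUMPTION: 100 MB/s sequential read; not a figure from the thread.
    boots = boots_to_break_even(extra_s, 56, 40, 100.0)
    print(f"break-even after about {boots:.1f} boots")
```

With these inputs the smaller image saves 0.16s of read time per boot, so it takes about 42.5 boots to repay the extra 6.8s of build time. A slower boot medium lowers that number, which is one reason the answer differs between a Pi and a PC.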