NACK: [SRU][L, K, J, F][M, U][PATCH 0/1] UBUNTU: [Packaging] Expose base (parent) abi for derivatives

Thadeu Lima de Souza Cascardo cascardo at canonical.com
Wed Jul 5 10:23:34 UTC 2023


On Wed, Jul 05, 2023 at 11:49:35AM +0200, Roxana Nicolescu wrote:
> 
> On 05-07-2023 11:17, Thadeu Lima de Souza Cascardo wrote:
> > On Wed, Jul 05, 2023 at 11:01:35AM +0200, Roxana Nicolescu wrote:
> > > On 06-06-2023 12:38, Thadeu Lima de Souza Cascardo wrote:
> > > 
> > > > On Tue, Jun 06, 2023 at 09:43:36AM +0200, Roxana Nicolescu wrote:
> > > > > BugLink: https://bugs.launchpad.net/bugs/2017006
> > > > > 
> > > > > SRU Justification
> > > > > 
> > > > > [ Impact ]
> > > > > 
> > > > > Upstream stable added a change in the format of jbd2 in 5.15.87 that
> > > > > was cherry-picked to kinetic:linux.
> > > > > This is incompatible with the current changes in the lttng-module
> > > > > kinetic.
> > > > > 
> > > > > We previously encountered this issue for focal #lp2004644
> > > > > and we proactively released a new version for kinetic, but the fix did
> > > > > not work as expected.
> > > > > 
> > > > > As shown in the buglink, the build error is triggered because some
> > > > > function declaration does not match the kernel's.
> > > > > In `include/instrumentation/events/jbd2.h` in lttng-modules, there is
> > > > > a conditional declaration based on the kernel version.
> > > > > But for 5.19 kernels, the upstream kernel version has remained the
> > > > > same.
> > > > > 
> > > > > [ Fix ]
> > > > > Versioning based on linux upstream version does not work in this case.
> > > > > We expose the abi version of the kernel, but it alone will not solve
> > > > > this issue because we deliver many derivatives, each with its own
> > > > > version. Hence a base main abi version needs to be exposed in the
> > > > > headers.
> > > > I don't like this approach. lttng-modules and every other out-of-tree module
> > > > out there are doing the wrong thing. I worked with openvswitch in the past and
> > > > they kept out-of-tree modules even though the module was already in the tree,
> > > > and they kept it updated with the changing kernel ABI. What they did is that
> > > > they had their build look into the kernel headers for the function signatures
> > > > that were relevant and created a config file stating which signature each
> > > > relevant function had and then checking for that in their source code.
> > > > 
> > > > A couple of years ago, we had the same issue with wireguard and we attempted a
> > > > similar solution. There was a bug where it was not looking into the right
> > > > headers, but aside from that, it worked just fine. I believe the same solution
> > > > should be looked after here, and be considered for upstream too.
> > > > 
> > > > Cascardo.
> > > Hi Thadeau,
> > > 
> > > I went ahead and suggested this change to lttng-modules upstream.
> > > 
> > > They think maintaining a configure script will require a lot of effort and
> > > it's not adding any value
> > > 
> > > to their existing way of keeping up with kernel changes. And I agree with
> > > them.
> > Their module cannot build on some kernels because they are testing for version
> > numbers. And here we are introducing another version number to actually mean
> > (this other function signature is actually the one from this other major
> > version number here).
> > 
> > And are they the ones who are going to maintain that, or are we going to add
> > that to our copy of lttng? Then, of course, the burden of maintainership is not
> > falling into them.
> 
> We have an in-house versioning system so probably no, the change
> will be added to upstream, but the testing is actually done by us, so
> we will be the ones adding the version there.
> 
> > 
> > They don't want to migrate to a model that is so much better to maintain
> > because the technical debt is too much at this point.
> > 
> > 
> > > Therefore I am kindly asking to approve my initial suggestion. I need this
> > > at least for kinetic only because
> > > 
> > > kinetic is gonna reach eol very soon and it would be nice to ship a working
> > > module before that.
> > > 
> > One more reason not to add technical debt on our side just to fix something
> > that has been broken for this long on a dead-end series. I would go even
> > further and ask that we revisit what has been done in Jammy.
> Not sure what you mean here, nothing has been done in Jammy.
> Jammy was automatically solved by their fix in upstream because in jammy
> the kernel version(5.15.x) keeps changing.
> So what do you suggest, leave this broken anyway for kinetic?

Sorry, you said that you found this previously in focal, my mind recorded
it as jammy.

As for kinetic, yes, it should be left as broken. If I understand
correctly, in order to fix it, you would have to release a new kernel
exposing this macro in its headers, and then release a new lttng using that
macro.

That means that this new lttng will still fail on already released kernels
that do not expose the macro. Users on kinetic will only benefit if they
upgrade and reboot their kernels. And all of that must happen 15 days
before that Ubuntu release stops getting support. So by the time users
reboot their kernels to be able to use lttng (after how many weeks of not
being able to use it?), they need to upgrade to Lunar.

What is the story on Lunar here? Does lttng work there with 6.2 kernels?
What about Jammy? Does the lttng on Jammy work with both 5.15 and 6.2
kernels?

> > 
> > Honestly, if we really care about lttng, we should work with upstream to
> > migrate everything to the openvswitch model.
> 
> I personally think both sides have good reasons and we're kinda passing the
> ball from one to the other.

That is not my suggestion here. In order to stop pushing work to one
another, if we want to see this work done, we must do it ourselves in
collaboration with upstream. It is one thing to ask upstream to do work, a
second thing to introduce another bandaid on our kernels, a third thing to
do the upstream work that we want to get done. Of course, if there is
opposition that cannot be conciliated, it might not be worth it. But I
haven't seen evidence that we are at this stage.

> I am happy to try to convince them again, but it's not gonna be easy.
> And I do believe no matter our differences in thinking, we need to deliver
> functioning
> modules to our users and I think this solution is not gonna add too much
> technical depth
> and it's a good compromise to deliver a quick fix very soon.

Quick fixes are not always the best. In your cover letter, you mentioned
this could be used by other third-party modules. And there lies the
problem. Once we ship this, and third-party modules start relying on it, we
cannot remove it anymore. And since I don't think it is the right solution,
I see the difficult in removing it as a big problem.

I already mentioned one of the issues with this solution: it doesn't work
for older kernels that don't have the macro. And if you think of that,
since you are not going to make lttng start building correctly for the
older kernels, why not just say: if 5.19 and Ubuntu, just use this new
thing here. That will make it work with the most recent kernels and might
only break for the old 5.19 kernels that did not have it, and only for
users who are going to install lttng anew while running an old kernel when
the new ones have been available for a while.

You want to still be able to build for all those kernels we released? This
solution doesn't help with it.

If we won't do the entire work for upstream right now, and we still want to
fix this, put such a configure, header and ifdef in our downstream code.
Use it to demonstrate the power of that construct in comparison with the
other non-solution. That build the foundation, demonstrates that it works,
shows our goodwill in trying to get the work done and making it work with
all of our kernels.

Cascardo.

> That's why I am asking to accept this for kinetic.
> I will upload lttng-module for kinetic only because the change is minimal,
> see it in the buglink
> and there are higher changes to get it approved.
> 
> Roxana
> 
> > Cascardo.
> > 
> > > Thanks,
> > > Roxana
> > > 
> > > > > The actual fix for lttng-modules requires 2 steps:
> > > > > 1. Expose UTS_UBUNTU_BASE_RELEASE_ABI variable equal with the parent
> > > > > abi in <generated/utsrelease.h>. For backports with multiple
> > > > > inheritance (like jammy:linux-aws-5.19) the version from debian.master
> > > > > (kinetic:linux) will be used. In case there is no inheritance, the
> > > > > actual kernel abi will be used. This is needed for jammy, kinetic and
> > > > > lunar but it may be needed in the future for similar use cases,
> > > > > so it makes sense to push this to all releases.
> > > > > This patch will cover the first part.
> > > > > 2. Use this variable in the lttng-module to construct
> > > > > LTTNG_UBUNTU_BASE_VERSION_CODE and then use it for versioning:
> > > > > LTTNG_UBUNTU_BASE_VERSION_CODE >= LTTNG_UBUNTU_KERNEL_VERSION(5,19,0,42)
> > > > > The lttng-module fix will be done only for lunar, kinetic, and jammy.
> > > > > 
> > > > > [ Where problems would occur ]
> > > > > This patch alone should not have a big impact on our kernels.
> > > > > It will only expose a variable in the headers.
> > > > > In terms of the lttng-modules solution:
> > > > > We may notice new failures in ubuntu_lttng_smoke_test once this
> > > > > compiles and run. But it was tested locally and the results were good,
> > > > > so the probability is very very low.
> > > > > 
> > > > > [ Test Plan ]
> > > > > Since we hit backward issues in the past, multiple kernel versions
> > > > > were tested out to make sure this is the proper fix.
> > > > > General test instructions:
> > > > > 1. Install kinetic (desired version)
> > > > > 2. sudo apt install lttng-modules-dkms(or dpkg -i)
> > > > > 3. Depending on the version it may work or not
> > > > > 4. Additionally lttng-modules regression test should be triggered to
> > > > > confirm everything works fine.
> > > > > 
> > > > > My testing:
> > > > > I build fixed lttng-modules with the fixes described in launchpad.
> > > > > A. Kinetic kernel where I expose UTS_UBUNTU_BASE_RELEASE_ABI.
> > > > > The module built and rts passed:
> > > > > 09:32:05 INFO | PASSED (simple system call tracing with babeltrace)
> > > > > 09:32:05 INFO | Summary: 7 passed, 0 failed
> > > > > B. The break commit was introduced in 5.19.0.42. Versions between 42
> > > > > and 44 (depending on where this fix will land) will break.
> > > > > C. Tested with a kernel version that does not have the break commit:
> > > > > 5.19.0-41 to see if it's backwards compatible. Results were good.
> > > > > D. Tested with jammy-aws-5.19 that has this patch. It works fine.
> > > > > E. Tested with jammy-aws-5.19.0-1024 that does not have the break
> > > > > commit to see backwards compatibility. It succeeded.
> > > > > 
> > > > > Roxana Nicolescu (1):
> > > > >     UBUNTU: [Packaging] Expose base (parent) abi for derivatives
> > > > > 
> > > > >    debian/rules.d/0-common-vars.mk | 9 +++++++++
> > > > >    debian/rules.d/2-binary-arch.mk | 4 ++++
> > > > >    2 files changed, 13 insertions(+)
> > > > > 
> > > > > -- 
> > > > > 2.34.1
> > > > > 
> > > > > 
> > > > > -- 
> > > > > kernel-team mailing list
> > > > > kernel-team at lists.ubuntu.com
> > > > > https://lists.ubuntu.com/mailman/listinfo/kernel-team



More information about the kernel-team mailing list