NACK/Cmnt: [SRU][F:linux-bluefield][PATCH 0/9] CT offload fixes

Stefan Bader stefan.bader at canonical.com
Mon Apr 12 07:28:38 UTC 2021


On 11.04.21 12:20, Roi Dayan wrote:
> 
> 
> On 2021-04-09 4:05 PM, Stefan Bader wrote:
>> On 06.04.21 19:52, Roi Dayan wrote:
>>> BugLink: https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922682
>>> BugLink: https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922678
>>> BugLink: https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922672
>>>
>>>
>> Hi Roi,
>>
>> since the 3 issues appear to be independent of each other, this should be 3 
>> individual submissions to the mailing list. For multiple reasons: They can 
>> then be treated as units of their own. That makes each piece smaller (to look 
>> at) and also allows to potentially handle in parallel. But also prevents one 
>> part that maybe rises questions blocking the other parts.
>>
>> The SRU justification has to be in each bug report (best in the description 
>> because that place is prominently at the top and also can be adjusted / 
>> corrected at any time). The content of each section is best not too technical. 
>> It is basically the place where program managers, developers, and also testers 
>> look at. The "regression potential" part (here "what (it) could break" is 
>> something that should be written with user / tester experience in mind. So 
>> which areas or activities might be affected. The current statements actually 
>> look to be good that way. I just mention it again since that is something that 
>> is misunderstood often.
>>
>> With the SRU justification in the bug report, the cover email does not 
>> necessarily have to repeat that (but it can). The cover email can be used to 
>> pass on helpful information for developers looking at the patches. That part 
>> is usually read only by technical people.
>>
>> A final note on the BugLink URLs: We prefer the short version of
>>
>> https://bugs.launchpad.net/bugs/<bugnumber>
>>
>> because this is the form that will always be valid, even if the 
>> project/package name changes. Not that this is likely for this kernel but then 
>> at least things look the same across all projects. I heard that this might be 
>> a mail servier config issue. So if that is the only thing that is not quite 
>> right, we would adjust that part when applying the patches. But we would 
>> appreciate not to have to for sure. :)
>>
>> So once the bug reports are updated, please send the individual pieces again.
>> Thanks!
>>
>> -Stefan
> 
> great. thanks for the comments.  i'll resubmit.
> is a cover letter still needed if its a single patch?
> could the SRU justification comments be inside the commit msg after the
> patch separator? (---) ?

For single patches it would be ok the put additional info between separator and 
the patch as that goes away when applying. Funnily it seems by splitting the 
submission into 3, the patches magically multiplied in numbers...

-Stefan

> 
> 
>>> SRU Justification:
>>>
>>> 1. The first 2 patches are fixing a race, potentially crashing the kernel.
>>> 2. The next 2 patches are fixing a possible memory hog and aging active
>>>     ct conns.
>>> 3. The last patches are adding offload support for ct_state invalid and
>>>     ct_state reply.
>>>
>>> * Explain the bug(s)
>>>
>>> 1. The kernel crash can happen on stress tcp traffic opening and closing
>>>     the conns fast.
>>>
>>> 2. The memory hog and aging active ct conns can happen from any stress test
>>>     as we have a single workqueue for handling the ct offload conns
>>>     for add/del/stats.
>>>
>>> * brief explanation of fixes
>>>
>>> The fix for #1 is setting the offload timeout early and not relying on gc.
>>>
>>> The fix for #2 is splitting add/del/stats for diff workqueue and also
>>> we set a limit for add work entries.
>>>
>>> * How to test
>>>
>>> Testing #1 was done with stress http traffic opening conns, short data, close 
>>> conns.
>>> different 5-tuple each time.
>>>
>>> Testing #2 was done with just stress traffic with lots of conns different 
>>> 5-tuple.
>>>
>>> * What it could break.
>>>
>>> Issue #1 could potentially crash the kernel.
>>>
>>> Issue #2 can take a lot of memory for a long time and also causing active 
>>> conns to
>>> age out when not necessary.
>>>
>>>
>>>   include/linux/skbuff.h                |  5 ++-
>>>   include/net/flow_offload.h            |  1 +
>>>   include/net/netfilter/nf_conntrack.h  | 12 ++++++
>>>   include/net/sch_generic.h             |  1 +
>>>   include/uapi/linux/pkt_cls.h          |  2 +
>>>   net/core/dev.c                        |  2 +
>>>   net/core/flow_dissector.c             | 13 +++++--
>>>   net/netfilter/nf_conntrack_core.c     | 12 ------
>>>   net/netfilter/nf_flow_table_core.c    |  2 +
>>>   net/netfilter/nf_flow_table_offload.c | 56 +++++++++++++++++++++++----
>>>   net/openvswitch/conntrack.c           |  8 ++--
>>>   net/openvswitch/conntrack.h           |  6 ++-
>>>   net/openvswitch/flow.c                |  4 +-
>>>   net/sched/act_ct.c                    |  6 ++-
>>>   net/sched/cls_api.c                   |  1 +
>>>   net/sched/cls_flower.c                | 10 +++--
>>>   16 files changed, 105 insertions(+), 36 deletions(-)
>>>
>>
>>


-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <https://lists.ubuntu.com/archives/kernel-team/attachments/20210412/eff4750c/attachment.sig>


More information about the kernel-team mailing list