NACK/Cmnt: [SRU][F:linux-bluefield][PATCH 0/9] CT offload fixes

Roi Dayan roid at nvidia.com
Mon Apr 12 08:16:58 UTC 2021



On 2021-04-12 10:28 AM, Stefan Bader wrote:
> On 11.04.21 12:20, Roi Dayan wrote:
>>
>>
>> On 2021-04-09 4:05 PM, Stefan Bader wrote:
>>> On 06.04.21 19:52, Roi Dayan wrote:
>>>> BugLink: 
>>>> https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922682
>>>> BugLink: 
>>>> https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922678
>>>> BugLink: 
>>>> https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1922672
>>>>
>>>>
>>> Hi Roi,
>>>
>>> since the 3 issues appear to be independent of each other, this 
>>> should be 3 individual submissions to the mailing list. For multiple 
>>> reasons: They can then be treated as units of their own. That makes 
>>> each piece smaller (to look at) and also allows to potentially handle 
>>> in parallel. But also prevents one part that maybe rises questions 
>>> blocking the other parts.
>>>
>>> The SRU justification has to be in each bug report (best in the 
>>> description because that place is prominently at the top and also can 
>>> be adjusted / corrected at any time). The content of each section is 
>>> best not too technical. It is basically the place where program 
>>> managers, developers, and also testers look at. The "regression 
>>> potential" part (here "what (it) could break" is something that 
>>> should be written with user / tester experience in mind. So which 
>>> areas or activities might be affected. The current statements 
>>> actually look to be good that way. I just mention it again since that 
>>> is something that is misunderstood often.
>>>
>>> With the SRU justification in the bug report, the cover email does 
>>> not necessarily have to repeat that (but it can). The cover email can 
>>> be used to pass on helpful information for developers looking at the 
>>> patches. That part is usually read only by technical people.
>>>
>>> A final note on the BugLink URLs: We prefer the short version of
>>>
>>> https://bugs.launchpad.net/bugs/<bugnumber>
>>>
>>> because this is the form that will always be valid, even if the 
>>> project/package name changes. Not that this is likely for this kernel 
>>> but then at least things look the same across all projects. I heard 
>>> that this might be a mail servier config issue. So if that is the 
>>> only thing that is not quite right, we would adjust that part when 
>>> applying the patches. But we would appreciate not to have to for 
>>> sure. :)
>>>
>>> So once the bug reports are updated, please send the individual 
>>> pieces again.
>>> Thanks!
>>>
>>> -Stefan
>>
>> great. thanks for the comments.  i'll resubmit.
>> is a cover letter still needed if its a single patch?
>> could the SRU justification comments be inside the commit msg after the
>> patch separator? (---) ?
> 
> For single patches it would be ok the put additional info between 
> separator and the patch as that goes away when applying. Funnily it 
> seems by splitting the submission into 3, the patches magically 
> multiplied in numbers...
> 
> -Stefan
> 

what do you mean magically multiplied? just 2 more cover letters.
the single series had 9 patches + a cover letter.
now series1 is 2 patches, series2 is 2 patches, series3 is 5 patches.
it's still the original 9 patches.

>>
>>
>>>> SRU Justification:
>>>>
>>>> 1. The first 2 patches are fixing a race, potentially crashing the 
>>>> kernel.
>>>> 2. The next 2 patches are fixing a possible memory hog and aging active
>>>>     ct conns.
>>>> 3. The last patches are adding offload support for ct_state invalid and
>>>>     ct_state reply.
>>>>
>>>> * Explain the bug(s)
>>>>
>>>> 1. The kernel crash can happen on stress tcp traffic opening and 
>>>> closing
>>>>     the conns fast.
>>>>
>>>> 2. The memory hog and aging active ct conns can happen from any 
>>>> stress test
>>>>     as we have a single workqueue for handling the ct offload conns
>>>>     for add/del/stats.
>>>>
>>>> * brief explanation of fixes
>>>>
>>>> The fix for #1 is setting the offload timeout early and not relying 
>>>> on gc.
>>>>
>>>> The fix for #2 is splitting add/del/stats for diff workqueue and also
>>>> we set a limit for add work entries.
>>>>
>>>> * How to test
>>>>
>>>> Testing #1 was done with stress http traffic opening conns, short 
>>>> data, close conns.
>>>> different 5-tuple each time.
>>>>
>>>> Testing #2 was done with just stress traffic with lots of conns 
>>>> different 5-tuple.
>>>>
>>>> * What it could break.
>>>>
>>>> Issue #1 could potentially crash the kernel.
>>>>
>>>> Issue #2 can take a lot of memory for a long time and also causing 
>>>> active conns to
>>>> age out when not necessary.
>>>>
>>>>
>>>>   include/linux/skbuff.h                |  5 ++-
>>>>   include/net/flow_offload.h            |  1 +
>>>>   include/net/netfilter/nf_conntrack.h  | 12 ++++++
>>>>   include/net/sch_generic.h             |  1 +
>>>>   include/uapi/linux/pkt_cls.h          |  2 +
>>>>   net/core/dev.c                        |  2 +
>>>>   net/core/flow_dissector.c             | 13 +++++--
>>>>   net/netfilter/nf_conntrack_core.c     | 12 ------
>>>>   net/netfilter/nf_flow_table_core.c    |  2 +
>>>>   net/netfilter/nf_flow_table_offload.c | 56 
>>>> +++++++++++++++++++++++----
>>>>   net/openvswitch/conntrack.c           |  8 ++--
>>>>   net/openvswitch/conntrack.h           |  6 ++-
>>>>   net/openvswitch/flow.c                |  4 +-
>>>>   net/sched/act_ct.c                    |  6 ++-
>>>>   net/sched/cls_api.c                   |  1 +
>>>>   net/sched/cls_flower.c                | 10 +++--
>>>>   16 files changed, 105 insertions(+), 36 deletions(-)
>>>>
>>>
>>>
> 
> 



More information about the kernel-team mailing list