[Precise] [PATCH] SRU: zerocopy LSO segmenting fixed

Tim Gardner tim.gardner at canonical.com
Wed Jun 18 14:31:51 UTC 2014


You should stay focused on solutions for the Precise kernel (3.2) and
the Trusty LTS kernel (3.13). All others are EOL by early August with
the release of 14.04.1 and 12.04.5.

rtg

On 06/18/2014 08:23 AM, Anton Nayshtut wrote:
> Hi Tim,
> 
> At the moment, I'm testing a backport of 1fd819ec to 3.11 and it looks
> good so far.
> 
> I'll send it to this list once all the tests pass.
> 
> Best Regards,
> Anton
> 
> On Wed, Jun 18, 2014 at 5:18 PM, Tim Gardner <tim.gardner at canonical.com> wrote:
>> Perhaps this works for your particular workload, but there is no way I'm
>> applying some random crack to core networking code without it having
>> been vetted by upstream. Maybe there is a reason zero-copy is disabled
>> by 1fd819ec ?
>>
>> rtg
>>
>> On 06/16/2014 08:04 AM, Anton Nayshtut wrote:
>>> Hi Tim,
>>>
>>> Certainly, backporting of 1fd819ec is an option.
>>>
>>> The reason behind sending our patch is that 1fd819ec fixes the issue
>>> by eliminating the zero-copy advantages for this case.
>>>
>>> According to our tests, our patch performs ~24% better than 1fd819ec
>>> in this scenario. See https://lkml.org/lkml/2014/5/25/22.
>>>
>>> Best Regards.
>>> Anton
>>>
>>> On Mon, Jun 16, 2014 at 3:50 PM, Tim Gardner <tim.gardner at canonical.com> wrote:
>>>> On 06/15/2014 03:15 AM, Anton Nayshtut wrote:
>>>>>
>>>>> BugLink: https://bugs.launchpad.net/bugs/1328973
>>>>>
>>>>> [SRU Justification]
>>>>>
>>>>> [Setup]
>>>>>   - 2 or more QEMU Guest VMs sharing the the same host
>>>>>   - the Guests are communicating using virtio-net-pci devices
>>>>>   - vhost-net is enabled
>>>>>
>>>>> [Explanation]
>>>>> If one Guest VM sends GSO packets to another while GRO is disabled for
>>>>> receiver,
>>>>> so these packets are segmented by net/core.
>>>>> In this case, if zero-copy is enabled in vhost-net, the GSO packets TX
>>>>> completion is reported to userspace as before the TX is actually done.
>>>>> The vhost-net's zero-copy mechanism is enabled by default since v3.8-rc1
>>>>> (f9611c43).
>>>>>
>>>>> [Impact]
>>>>> Incorrect/junk data sent in case the transmitting Guest OS re-uses/frees
>>>>> the TX
>>>>> buffer immediately upon TX completion.
>>>>>
>>>>> [Test Case]
>>>>> Windows 2008R2 Guest VMs running MS HCK Offload LSO test.
>>>>> NOTE1: GRO is always disabled in this case because it's not supported by
>>>>> Windows
>>>>> Guest virtio-net-pci drivers.
>>>>> NOTE2: MS HCK re-uses the GSO (LSO) buffers, so it reproduces the issue
>>>>> every
>>>>> time.
>>>>>
>>>>> [Note]
>>>>> This bug has been fixed in v3.14-rc7 by 1fd819ec.
>>>>> The fix actually disables zero copy for this case since it forces
>>>>> unconditional
>>>>> fragments copying, but it resolves the issue mentioned.
>>>>>
>>>>> OriginalAuthor: Igor Royzis <igorr at swortex.com>
>>>>> Signed-off-by: Igor Royzis <igorr at swortex.com>
>>>>> Signed-off-by: Anton Nayshtut <anton at swortex.com>
>>>>> ---
>>>>>   include/linux/skbuff.h |    1 +
>>>>>   net/core/skbuff.c      |   13 +++++++++++++
>>>>>   2 files changed, 14 insertions(+)
>>>>>
>>>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>>>> index e27c8f6..4848180 100644
>>>>> --- a/include/linux/skbuff.h
>>>>> +++ b/include/linux/skbuff.h
>>>>> @@ -267,6 +267,7 @@ struct skb_shared_info {
>>>>>         struct sk_buff  *frag_list;
>>>>>         struct skb_shared_hwtstamps hwtstamps;
>>>>>         __be32          ip6_frag_id;
>>>>> +       struct sk_buff  *zcopy_src;
>>>>>
>>>>>         /*
>>>>>          * Warning : all fields before dataref are cleared in
>>>>> __alloc_skb()
>>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>>>> index b2cd9a4..7023979 100644
>>>>> --- a/net/core/skbuff.c
>>>>> +++ b/net/core/skbuff.c
>>>>> @@ -625,14 +625,18 @@ EXPORT_SYMBOL(__kfree_skb);
>>>>>    */
>>>>>   void kfree_skb(struct sk_buff *skb)
>>>>>   {
>>>>> +       struct sk_buff *zcopy_src;
>>>>>         if (unlikely(!skb))
>>>>>                 return;
>>>>>         if (likely(atomic_read(&skb->users) == 1))
>>>>>                 smp_rmb();
>>>>>         else if (likely(!atomic_dec_and_test(&skb->users)))
>>>>>                 return;
>>>>> +       zcopy_src = skb_shinfo(skb)->zcopy_src;
>>>>>         trace_kfree_skb(skb, __builtin_return_address(0));
>>>>>         __kfree_skb(skb);
>>>>> +       if (unlikely(zcopy_src))
>>>>> +               kfree_skb(zcopy_src);
>>>>>   }
>>>>>   EXPORT_SYMBOL(kfree_skb);
>>>>>
>>>>> @@ -677,14 +681,18 @@ EXPORT_SYMBOL(skb_tx_error);
>>>>>    */
>>>>>   void consume_skb(struct sk_buff *skb)
>>>>>   {
>>>>> +       struct sk_buff *zcopy_src;
>>>>>         if (unlikely(!skb))
>>>>>                 return;
>>>>>         if (likely(atomic_read(&skb->users) == 1))
>>>>>                 smp_rmb();
>>>>>         else if (likely(!atomic_dec_and_test(&skb->users)))
>>>>>                 return;
>>>>> +       zcopy_src = skb_shinfo(skb)->zcopy_src;
>>>>>         trace_consume_skb(skb);
>>>>>         __kfree_skb(skb);
>>>>> +       if (unlikely(zcopy_src))
>>>>> +               kfree_skb(zcopy_src);
>>>>>   }
>>>>>   EXPORT_SYMBOL(consume_skb);
>>>>>
>>>>> @@ -2847,6 +2855,11 @@ struct sk_buff *skb_segment(struct sk_buff *skb,
>>>>> netdev_features_t features)
>>>>>                                                  nskb->data - tnl_hlen,
>>>>>                                                  doffset + tnl_hlen);
>>>>>
>>>>> +               if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
>>>>> +                       skb_shinfo(nskb)->zcopy_src = skb;
>>>>> +                       atomic_inc(&skb->users);
>>>>> +               }
>>>>> +
>>>>>                 if (fskb != skb_shinfo(skb)->frag_list)
>>>>>                         goto perform_csum_check;
>>>>>
>>>>>
>>>>
>>>> If 1fd819ec fixes the problem, then why not do a proper backport and run the
>>>> resulting patch by the stable maintainers for all affected kernels ?
>>>>
>>>> rtg
>>>> --
>>>> Tim Gardner tim.gardner at canonical.com
>>
>>
>> --
>> Tim Gardner tim.gardner at canonical.com


-- 
Tim Gardner tim.gardner at canonical.com




More information about the kernel-team mailing list