[Precise] [PATCH] SRU: zerocopy LSO segmenting fixed

Dave Chiluk chiluk at canonical.com
Wed Jun 18 17:16:12 UTC 2014


Anton, your patch may well be good and valid.  However, for now we need
a clean cherry-pick/backport of the upstream commit only.

After that has been accomplished, you are welcome to submit your patch to
the upstream netdev and linux-kernel mailing lists (list available at
http://vger.kernel.org/vger-lists.html).  Additionally, you should run
linux/scripts/get_maintainer.pl on your patch and CC the people it
reports when you submit your performant patch upstream.
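
For example, from the top of the kernel tree (the patch filename below
is just a placeholder):

  $ ./scripts/get_maintainer.pl 0001-your-fix.patch

This prints the maintainers and mailing lists you should CC.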

Once the performant patch is accepted upstream, resubmit it to us.

Let us know if you have any more questions about the process.

Thanks,
Dave.

On 06/16/2014 09:04 AM, Anton Nayshtut wrote:
> Hi Tim,
> 
> Certainly, backporting of 1fd819ec is an option.
> 
> The reason behind sending our patch is that 1fd819ec fixes the issue
> by eliminating the zero-copy advantages for this case.
> 
> According to our tests, our patch performs ~24% better than 1fd819ec
> in this scenario. See https://lkml.org/lkml/2014/5/25/22.
> 
> Best Regards.
> Anton
> 
> On Mon, Jun 16, 2014 at 3:50 PM, Tim Gardner <tim.gardner at canonical.com> wrote:
>> On 06/15/2014 03:15 AM, Anton Nayshtut wrote:
>>>
>>> BugLink: https://bugs.launchpad.net/bugs/1328973
>>>
>>> [SRU Justification]
>>>
>>> [Setup]
>>>   - 2 or more QEMU Guest VMs sharing the same host
>>>   - the Guests are communicating using virtio-net-pci devices
>>>   - vhost-net is enabled (see the example invocation below)
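>>>
>>> For illustration, such a guest could be launched with something like
>>> the following (a minimal sketch; the tap interface name and MAC
>>> address are placeholders):
>>>
>>>   qemu-system-x86_64 ... \
>>>       -netdev tap,id=net0,ifname=tap0,vhost=on \
>>>       -device virtio-net-pci,netdev=net0,mac=52:54:00:12:34:56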
>>>
>>> [Explanation]
>>> If one Guest VM sends GSO packets to another while GRO is disabled
>>> for the receiver, these packets are segmented by net/core.
>>> In this case, if zero-copy is enabled in vhost-net, the GSO packets'
>>> TX completion is reported to userspace before the TX is actually done.
>>> vhost-net's zero-copy mechanism is enabled by default since v3.8-rc1
>>> (f9611c43).
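>>>
>>> To make the ordering concrete, here is a rough sketch of the failing
>>> sequence condensed into one function (illustrative pseudocode built
>>> on the real SKBTX_DEV_ZEROCOPY/ubuf_info machinery, not actual
>>> kernel code):
>>>
>>> static void zerocopy_race_sketch(struct sk_buff *gso_skb,
>>>                                  netdev_features_t features)
>>> {
>>>         /* gso_skb came from vhost-net with SKBTX_DEV_ZEROCOPY set:
>>>          * its frags point straight at guest pages, and
>>>          * destructor_arg holds the ubuf_info whose callback reports
>>>          * TX completion to the guest. */
>>>
>>>         /* 1. GRO is off on the receiver, so net/core segments the
>>>          *    skb. The segments share gso_skb's frag pages but do
>>>          *    not inherit the zero-copy flag or the callback. */
>>>         struct sk_buff *segs = skb_segment(gso_skb, features);
>>>
>>>         /* 2. Freeing the original skb fires the zero-copy callback,
>>>          *    so the guest sees "TX done" and may immediately reuse
>>>          *    or free its buffer... */
>>>         consume_skb(gso_skb);
>>>
>>>         /* 3. ...but the segments still reference those same guest
>>>          *    pages and have not hit the wire yet, so whatever the
>>>          *    guest writes next goes out as junk. */
>>>         (void)segs;
>>> }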
>>>
>>> [Impact]
>>> Incorrect/junk data is sent if the transmitting Guest OS re-uses or
>>> frees the TX buffer immediately upon TX completion.
>>>
>>> [Test Case]
>>> Windows 2008R2 Guest VMs running the MS HCK Offload LSO test.
>>> NOTE1: GRO is always disabled in this case because it's not supported
>>> by the Windows Guest virtio-net-pci drivers.
>>> NOTE2: MS HCK re-uses the GSO (LSO) buffers, so it reproduces the
>>> issue every time.
>>>
>>> [Note]
>>> This bug has been fixed upstream in v3.14-rc7 by 1fd819ec.
>>> The fix effectively disables zero copy for this case, since it forces
>>> unconditional copying of the fragments, but it does resolve the issue.
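>>>
>>> For reference, the heart of the upstream fix is to orphan the
>>> userspace fragments inside skb_segment() before they are shared into
>>> the new segments: skb_orphan_frags() substitutes kernel copies for
>>> the guest pages and signals completion up front (a paraphrased
>>> sketch, not the verbatim commit):
>>>
>>>         /* In skb_segment(), before the zero-copy skb's frags are
>>>          * attached to a segment: copy userspace-backed pages into
>>>          * kernel memory (firing the completion callback), so the
>>>          * segments never reference guest memory at all. */
>>>         if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC)))
>>>                 goto err;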
>>>
>>> OriginalAuthor: Igor Royzis <igorr at swortex.com>
>>> Signed-off-by: Igor Royzis <igorr at swortex.com>
>>> Signed-off-by: Anton Nayshtut <anton at swortex.com>
>>> ---
>>>   include/linux/skbuff.h |    1 +
>>>   net/core/skbuff.c      |   13 +++++++++++++
>>>   2 files changed, 14 insertions(+)
>>>
>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>> index e27c8f6..4848180 100644
>>> --- a/include/linux/skbuff.h
>>> +++ b/include/linux/skbuff.h
>>> @@ -267,6 +267,7 @@ struct skb_shared_info {
>>>         struct sk_buff  *frag_list;
>>>         struct skb_shared_hwtstamps hwtstamps;
>>>         __be32          ip6_frag_id;
>>> +       struct sk_buff  *zcopy_src;
>>>
>>>         /*
>>>          * Warning : all fields before dataref are cleared in __alloc_skb()
>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>> index b2cd9a4..7023979 100644
>>> --- a/net/core/skbuff.c
>>> +++ b/net/core/skbuff.c
>>> @@ -625,14 +625,18 @@ EXPORT_SYMBOL(__kfree_skb);
>>>    */
>>>   void kfree_skb(struct sk_buff *skb)
>>>   {
>>> +       struct sk_buff *zcopy_src;
>>>         if (unlikely(!skb))
>>>                 return;
>>>         if (likely(atomic_read(&skb->users) == 1))
>>>                 smp_rmb();
>>>         else if (likely(!atomic_dec_and_test(&skb->users)))
>>>                 return;
>>> +       zcopy_src = skb_shinfo(skb)->zcopy_src;
>>>         trace_kfree_skb(skb, __builtin_return_address(0));
>>>         __kfree_skb(skb);
>>> +       if (unlikely(zcopy_src))
>>> +               kfree_skb(zcopy_src);
>>>   }
>>>   EXPORT_SYMBOL(kfree_skb);
>>>
>>> @@ -677,14 +681,18 @@ EXPORT_SYMBOL(skb_tx_error);
>>>    */
>>>   void consume_skb(struct sk_buff *skb)
>>>   {
>>> +       struct sk_buff *zcopy_src;
>>>         if (unlikely(!skb))
>>>                 return;
>>>         if (likely(atomic_read(&skb->users) == 1))
>>>                 smp_rmb();
>>>         else if (likely(!atomic_dec_and_test(&skb->users)))
>>>                 return;
>>> +       zcopy_src = skb_shinfo(skb)->zcopy_src;
>>>         trace_consume_skb(skb);
>>>         __kfree_skb(skb);
>>> +       if (unlikely(zcopy_src))
>>> +               kfree_skb(zcopy_src);
>>>   }
>>>   EXPORT_SYMBOL(consume_skb);
>>>
>>> @@ -2847,6 +2855,11 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features)
>>>                                                  nskb->data - tnl_hlen,
>>>                                                  doffset + tnl_hlen);
>>>
>>> +               if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
>>> +                       skb_shinfo(nskb)->zcopy_src = skb;
>>> +                       atomic_inc(&skb->users);
>>> +               }
>>> +
>>>                 if (fskb != skb_shinfo(skb)->frag_list)
>>>                         goto perform_csum_check;
>>>
>>>
>>
>> If 1fd819ec fixes the problem, then why not do a proper backport and run the
>> resulting patch by the stable maintainers for all affected kernels ?
>>
>> rtg
>> --
>> Tim Gardner tim.gardner at canonical.com
> 
