[Precise] [PATCH] SRU: zerocopy LSO segmenting fixed
Anton Nayshtut
anton at swortex.com
Mon Jun 16 14:04:25 UTC 2014
Hi Tim,
Certainly, backporting of 1fd819ec is an option.
The reason we sent our patch is that 1fd819ec fixes the issue
by eliminating the zero-copy advantage for this case.
According to our tests, our patch performs ~24% better than 1fd819ec
in this scenario. See https://lkml.org/lkml/2014/5/25/22.
Best Regards.
Anton
On Mon, Jun 16, 2014 at 3:50 PM, Tim Gardner <tim.gardner at canonical.com> wrote:
> On 06/15/2014 03:15 AM, Anton Nayshtut wrote:
>>
>> BugLink: https://bugs.launchpad.net/bugs/1328973
>>
>> [SRU Justification]
>>
>> [Setup]
>> - 2 or more QEMU Guest VMs sharing the same host
>> - the Guests are communicating using virtio-net-pci devices
>> - vhost-net is enabled
>>
>> [Explanation]
>> When one Guest VM sends GSO packets to another while GRO is disabled for
>> the receiver, these packets are segmented by net/core.
>> In this case, if zero-copy is enabled in vhost-net, the GSO packets' TX
>> completion is reported to userspace before the TX is actually done.
>> The vhost-net's zero-copy mechanism is enabled by default since v3.8-rc1
>> (f9611c43).
>>
>> [Impact]
>> Incorrect/junk data is sent if the transmitting Guest OS re-uses or frees
>> the TX buffer immediately upon TX completion.
>>
>> [Test Case]
>> Windows 2008R2 Guest VMs running MS HCK Offload LSO test.
>> NOTE1: GRO is always disabled in this case because it's not supported by
>> Windows
>> Guest virtio-net-pci drivers.
>> NOTE2: MS HCK re-uses the GSO (LSO) buffers, so it reproduces the issue
>> every
>> time.
>>
>> [Note]
>> This bug has been fixed in v3.14-rc7 by 1fd819ec.
>> The fix effectively disables zero copy for this case, since it forces
>> unconditional copying of the fragments, but it resolves the issue.
>>
>> OriginalAuthor: Igor Royzis <igorr at swortex.com>
>> Signed-off-by: Igor Royzis <igorr at swortex.com>
>> Signed-off-by: Anton Nayshtut <anton at swortex.com>
>> ---
>> include/linux/skbuff.h | 1 +
>> net/core/skbuff.c | 13 +++++++++++++
>> 2 files changed, 14 insertions(+)
>>
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index e27c8f6..4848180 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -267,6 +267,7 @@ struct skb_shared_info {
>> struct sk_buff *frag_list;
>> struct skb_shared_hwtstamps hwtstamps;
>> __be32 ip6_frag_id;
>> + struct sk_buff *zcopy_src;
>>
>> /*
>> * Warning : all fields before dataref are cleared in
>> __alloc_skb()
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index b2cd9a4..7023979 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -625,14 +625,18 @@ EXPORT_SYMBOL(__kfree_skb);
>> */
>> void kfree_skb(struct sk_buff *skb)
>> {
>> + struct sk_buff *zcopy_src;
>> if (unlikely(!skb))
>> return;
>> if (likely(atomic_read(&skb->users) == 1))
>> smp_rmb();
>> else if (likely(!atomic_dec_and_test(&skb->users)))
>> return;
>> + zcopy_src = skb_shinfo(skb)->zcopy_src;
>> trace_kfree_skb(skb, __builtin_return_address(0));
>> __kfree_skb(skb);
>> + if (unlikely(zcopy_src))
>> + kfree_skb(zcopy_src);
>> }
>> EXPORT_SYMBOL(kfree_skb);
>>
>> @@ -677,14 +681,18 @@ EXPORT_SYMBOL(skb_tx_error);
>> */
>> void consume_skb(struct sk_buff *skb)
>> {
>> + struct sk_buff *zcopy_src;
>> if (unlikely(!skb))
>> return;
>> if (likely(atomic_read(&skb->users) == 1))
>> smp_rmb();
>> else if (likely(!atomic_dec_and_test(&skb->users)))
>> return;
>> + zcopy_src = skb_shinfo(skb)->zcopy_src;
>> trace_consume_skb(skb);
>> __kfree_skb(skb);
>> + if (unlikely(zcopy_src))
>> + kfree_skb(zcopy_src);
>> }
>> EXPORT_SYMBOL(consume_skb);
>>
>> @@ -2847,6 +2855,11 @@ struct sk_buff *skb_segment(struct sk_buff *skb,
>> netdev_features_t features)
>> nskb->data - tnl_hlen,
>> doffset + tnl_hlen);
>>
>> + if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) {
>> + skb_shinfo(nskb)->zcopy_src = skb;
>> + atomic_inc(&skb->users);
>> + }
>> +
>> if (fskb != skb_shinfo(skb)->frag_list)
>> goto perform_csum_check;
>>
>>
>
> If 1fd819ec fixes the problem, then why not do a proper backport and run the
> resulting patch by the stable maintainers for all affected kernels ?
>
> rtg
> --
> Tim Gardner tim.gardner at canonical.com