Trusty SRU - Mellanox refresh

Narinder Gupta narinder.gupta at canonical.com
Sun Jul 27 13:02:37 UTC 2014


Thanks Eyal,
For McDivitt, HP tested the patches from the lomond PPA that Dann Frazier built, and according to them these fix the bonding issue. HP successfully tested bonding modes 5 (balance-tlb) and 6 (balance-alb); both work, which unblocks the HP CSI test team. The following patches were requested from Ming Lei for McDivitt:

4 days ago	Amir Vadai	net/mlx4_en: Disable blueflame using ethtool private...
4 days ago	Eyal Perry	net/mlx4_en: current_mac isn't updated in port up
4 days ago	Noa Osherovich	net/mlx4_en: Fix mac_hash database inconsistency
4 days ago	Shani Michaelli	net/mlx4_en: Protect MAC address modification with...
4 days ago	Shani Michaelli	net/mlx4_en: Fix errors in MAC address changing when...

Please let me know if you need any comments from the tester. I can ask him to post an update by email or in the bug report.

On Jul 27, 2014, at 7:51 AM, Eyal Perry wrote:

> Hi all,
> Regarding the list of patches I’ve sent: it is mostly bug fixes pulled from upstream, plus 4 fixes that were made on demand for urgent HP issues.
> More specific details on the purpose, and mainly the testing, of these patches:
> ·         2 patches that are required for HP certification and are relevant for x86 Red Hat, so integrating them into Ubuntu is less urgent at the moment. Sorry for the bother.
>       UBUNTU: SAUCE: (no-up) net/mlx4_core: Use low memory profile on kdump kernel
>       UBUNTU: SAUCE: (no-up) net/mlx4_en: Reduce memory consumption on kdump kernel
> ·         This patch exposes the ability to disable blueflame. It helps improve bi-directional IP forwarding test performance on McDivitt (HP request).
>       UBUNTU: SAUCE: (no-up) net/mlx4_en: Disable blueflame using ethtool private flags
> ·         4 patches that fix MAC modification handling issues. They fix an issue with the bonding alb/tlb modes (HP request).
> A simple test would be to set up a tlb bond over 2 ports of a Mellanox NIC and “make” the bonding driver switch the active-interface role between the 2 interfaces by setting the ports up/down.
>       UBUNTU: SAUCE: (no-up) net/mlx4_en: current_mac isn't updated in port up
>       UBUNTU: SAUCE: (linux-next) net/mlx4_en: Fix mac_hash database inconsistency
>       net/mlx4_en: Protect MAC address modification with the state_lock mutex
>       net/mlx4_en: Fix errors in MAC address changing when port is down
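
Inline note for whoever runs the bonding test above: the role switch can be confirmed by reading /proc/net/bonding/<bond> before and after taking a port down. Below is a minimal Python sketch of a parser for that file. The function name parse_bond_status and the interface names in the usage note are illustrative assumptions, not part of the patches or of any Mellanox tooling.

```python
def parse_bond_status(text):
    """Parse the text of /proc/net/bonding/<bond>.

    Returns (active_slave, slaves), where slaves maps each slave
    interface name to its per-slave "MII Status" value.
    """
    active = None
    slaves = {}
    current = None  # slave section currently being read, if any
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or line without "key: value" shape
        key, value = key.strip(), value.strip()
        if key == "Currently Active Slave":
            active = value
        elif key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            # The bond-level "MII Status" appears before any slave
            # section and is skipped because current is still None.
            slaves[current] = value
    return active, slaves
```

For example, calling parse_bond_status(open("/proc/net/bonding/bond0").read()) before and after `ip link set eth8 down` (bond0 and eth8 are assumed names) should show the active slave moving to the other port.
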
> ·         Without this patch, if probe_vf (an mlx4_core module parameter) is used (usually with a large number, >= 8),
> you will see messages such as: “localhost systemd-udevd: worker [14567] /devices/pci0000:00/0000:00:02.0/0000:03:00.1 timeout; kill it”
>       net/mlx4_core: Defer VF initialization till PF is fully initialized
> ·         Without this patch, when loading mlx4_core with probed VFs you will see these messages: “PCIe link width is x0, device supports x8”, and on the VM: “Unable to determine PCI device chain minimum BW”
>       net/mlx4_core: Don't issue PCIe speed/width checks for VFs
> ·         Without this patch, when you attach a single-port VF of “port 2” to a VM and unload/load the Mellanox driver, you will observe the following error message:
> “mlx4_core 0000:04:00.0: vhcr command QP_ATTACH (0xf0b) slave:7 in_param 0x66b5a000 in_mod=0x1000086a, op_mod=0x0 failed with error:0, status -22”
>       net/mlx4_core: Adjust port number in qp_attach wrapper when detaching
> ·         Way to reproduce:
> 1) Start VM and assign it VFx
> 2) Configure bonding between 2 ports of the VF
> 3) Assign IP to the bond
> 4) Shut down this VM
> 5) Start a new VM and assign it VFy
> 6) Repeat steps 2-3 for this VM
> 7) Try running rping between this VM and hypervisor (at this point rping does not work)
>       net/mlx4_core: Reset RoCE VF gids when guest driver goes down
> ·         Configure: load mlx4_core with “options mlx4_core port_type_array=2,2 debug_level=1 num_vfs=1,1,2 probe_vf=0,1,1 log_num_mgm_entry_size=-1”
> Run the rping client on the probed VF of port 2 (rping -dvca 192.168.30.1 -C 1). Without this patch it fails with the following error: “cma event RDMA_CM_EVENT_UNREACHABLE, error -110”.
>       net/mlx4_core: Fix slave id computation for single port VF
> ·         1) Run VPI on the Hypervisor, with opensm and ipoib running on the IB port
> 2) Bring up the guest/VF
> 3) Configure the guest
> 4) Rping from guest as client – works.
> 5) On the guest, unload ONLY the low level driver (mlx4_ib/mlx4_en) and bring it back up – bringing up mlx4_ib FIRST, then mlx4_en.
> 6) Re-configure the guest interfaces
> 7) Rping from guest as client. DOES NOT WORK
>       net/mlx4_core: Add UPDATE_QP SRIOV wrapper support
> ·         Set up a bonding interface over a VXLAN encapsulating device with ConnectX3-Pro HW, send traffic, and check with tcpdump that GSO is functioning.
>       bonding: Advertize vxlan offload features when supported
> ·         Single-port VFs are currently supported only when all HCA ports are set to Ethernet. Probing with any other configuration should fail, but without this patch it returns success (0).
>       net/mlx4_core: Fix the error flow when probing with invalid VF configuration
> ·         Load mlx4_core with the module parameter “log_num_mgm_entry_size=-1” to enable vxlan offloads – no traffic will get to the RX side (i.e. as seen with tcpdump).
>       net/mlx4_en: Don't configure the HW vxlan parser when vxlan offloading isn't set
> ·         Without this patch, network interface names for port 2 of 2-port Mellanox devices are inconsistent. HP requested this patch, but I don’t think the need is unique to them.
> This can easily be checked with a command such as: $ grep . /sys/bus/pci/drivers/mlx4_core/0000\:24\:00.0/net/*/dev_id
> It should return:
> /sys/bus/pci/drivers/mlx4_core/0000:24:00.0/net/eth8/dev_id:0x0
> /sys/bus/pci/drivers/mlx4_core/0000:24:00.0/net/eth9/dev_id:0x1
> Instead of this buggy output without the patch:
> /sys/bus/pci/drivers/mlx4_core/0000:24:00.0/net/p5p1/dev_id:0x0
> /sys/bus/pci/drivers/mlx4_core/0000:24:00.0/net/rename13/dev_id:0x0
>       Revert "net/mlx4_en: Fix bad use of dev_id"
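
Inline note on the dev_id check above: the same comparison can be scripted so the tester does not have to eyeball the grep output. This is a minimal Python sketch. read_dev_ids and duplicate_dev_ids are hypothetical helper names, and the PCI address in the usage note is the one from Eyal's example and will differ per system.

```python
import glob
import os


def read_dev_ids(pci_dir):
    """Map interface name -> dev_id string for netdevs under one PCI function.

    pci_dir is a sysfs directory such as
    /sys/bus/pci/drivers/mlx4_core/<pci-address>.
    """
    result = {}
    for path in glob.glob(os.path.join(pci_dir, "net", "*", "dev_id")):
        iface = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            result[iface] = f.read().strip()
    return result


def duplicate_dev_ids(dev_ids):
    """Return {dev_id: [interfaces]} for every dev_id shared by 2+ interfaces.

    An empty result means each port reports its own dev_id, which is the
    expected output quoted above (0x0 and 0x1).
    """
    by_id = {}
    for iface, dev_id in dev_ids.items():
        by_id.setdefault(int(dev_id, 16), []).append(iface)
    return {k: sorted(v) for k, v in by_id.items() if len(v) > 1}
```

For example, duplicate_dev_ids(read_dev_ids("/sys/bus/pci/drivers/mlx4_core/0000:24:00.0")) returning a non-empty dict would correspond to the buggy output, where both ports report 0x0.
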
> ·         Not sure about testing these three:
>       net/mlx4_core: Load the Eth driver first
>       net/mlx4_core: Keep only one driver entry release mlx4_priv
>       net/mlx4_core: Preserve pci_dev_data after __mlx4_remove_one()
>  
>  
>  
> Best Regards,
> Eyal.
>  
> From: Brian Fromme [mailto:brian.fromme at canonical.com] 
> Sent: Saturday, July 26, 2014 12:38 AM
> To: Narinder Gupta; Rafael Tinoco
> Cc: Michael Miller; Dann Frazier; Raghuram Kota; Tim Gardner; Ming Lei; Eyal Perry; kernel-team
> Subject: Re: Trusty SRU - Mellanox refresh
>  
> That's an excellent question, Narinder.  Eyal, Tim, Rafael, etc.  Can you help us to understand how to test these patches?  We can request that HP gets involved in the testing, but only if we can explain what these changes are and how to test them.
>  
>  thanks,
>  Brian
>  
>  
> 
> On Fri, Jul 25, 2014 at 3:06 PM, Narinder Gupta <narinder.gupta at canonical.com> wrote:
> Brian,
> Will you please brief me on the changes we are supposed to test? I can ask HP to test and submit the results.
> 
> Thanks and Regards,
> Narinder Gupta (PMP)                   narinder.gupta at canonical.com
> Canonical, Ltd.                    narindergupta [irc.freenode.net]
> +1.281.736.5150                            narindergupta2007[skype]
>  
> Ubuntu- Linux for human beings | www.ubuntu.com | www.canonical.com
>  
> 
> On Fri, Jul 25, 2014 at 3:56 PM, Brian Fromme <brian.fromme at canonical.com> wrote:
> Oops, Narinder is the PM for McDivitt.  Adding him to this thread.
>  
>  cheers,
>  Brian
>  
>  
> 
> On Fri, Jul 25, 2014 at 2:35 PM, Michael Miller <michael.miller at canonical.com> wrote:
> I'm thinking it would be Perry Hoffman and Scott Hinchley. I hope I spelled their names correctly.
>  
> 
> On Fri, Jul 25, 2014 at 3:31 PM, Brian Fromme <brian.fromme at canonical.com> wrote:
> Yup.  Adding Dann Frazier and Raghu.  Can you guys help us to figure out who can integrate and test these on our McDivitt cartridge?
>  
>  thanks,
>  Brian
>  
>  
> 
> On Fri, Jul 25, 2014 at 1:10 PM, Michael Miller <michael.miller at canonical.com> wrote:
> Brian,
> Shouldn't this also go to the HP folks working the McDivitt issues? I don't have access to a McDivitt.
>  
> -- mikem
>  
> 
> On Fri, Jul 25, 2014 at 1:50 PM, Tim Gardner <tim.gardner at canonical.com> wrote:
> Gents - I'd like some positive testing confirmation before I apply this to Trusty.
> 
> rtg
> -- 
> Tim Gardner tim.gardner at canonical.com
>  
>  
>  
>  
>  
