Trusty SRU - Mellanox refresh

Eyal Perry eyalpe at mellanox.com
Sun Jul 27 12:51:22 UTC 2014


Hi all,
Regarding the list of patches I’ve sent: they are mostly bug fixes pulled from upstream, plus 4 fixes made on demand for urgent HP issues.
Below are more specific details on the purpose of these patches and, mainly, on how to test them:

·         2 patches that are required for HP certification. They are relevant for x86 Red Hat, so integrating them into Ubuntu is less urgent at the moment; sorry for the bother.
UBUNTU: SAUCE: (no-up) net/mlx4_core: Use low memory profile on kdump kernel
UBUNTU: SAUCE: (no-up) net/mlx4_en: Reduce memory consumption on kdump kernel

·         This patch exposes the ability to disable BlueFlame. It helps improve bi-directional IP forwarding test performance on McDivitt (HP request).
      UBUNTU: SAUCE: (no-up) net/mlx4_en: Disable blueflame using ethtool private flags
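A minimal shell sketch of that toggle, assuming mlx4_en exposes the private flag under the name "blueflame" (confirm with `ethtool --show-priv-flags <iface>` on the patched kernel). The set_blueflame helper and the RUN=echo dry-run switch are illustrative only, not part of the patch.

```shell
#!/bin/sh
# Sketch: enable/disable BlueFlame on an mlx4_en interface through ethtool
# private flags. Assumes the flag is named "blueflame" (confirm with
# --show-priv-flags). Set RUN=echo to print the commands instead of running them.
set_blueflame() {
    iface="$1"
    state="$2"
    case "$state" in
        on|off) ;;
        *) echo "set_blueflame: state must be 'on' or 'off'" >&2; return 1 ;;
    esac
    # Change the flag, then list all private flags to confirm the new state.
    ${RUN:-} ethtool --set-priv-flags "$iface" blueflame "$state" &&
        ${RUN:-} ethtool --show-priv-flags "$iface"
}
```

For example, run `set_blueflame eth2 off` before the forwarding test and `set_blueflame eth2 on` afterwards to re-enable it; eth2 is a placeholder for the Mellanox port under test.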

·         4 patches that fix MAC address modification handling; they fix an issue with the bonding alb/tlb modes (HP request).

A simple test is to set up a tlb bond over the 2 ports of a Mellanox NIC and force the bonding driver to switch the active role between the 2 interfaces by setting the ports down and up.
      UBUNTU: SAUCE: (no-up) net/mlx4_en: current_mac isn't updated in port up
      UBUNTU: SAUCE: (linux-next) net/mlx4_en: Fix mac_hash database inconsistency
      net/mlx4_en: Protect MAC address modification with the state_lock mutex
      net/mlx4_en: Fix errors in MAC address changing when port is down
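The failover test above could be scripted roughly as follows. This is only a sketch: the interface names are placeholders, the helper names are hypothetical, and it assumes an iproute2 recent enough to accept bond options in `ip link add ... type bond mode ...` (with older userspace, the same setup can be done through the bonding sysfs files or ifenslave).

```shell
#!/bin/sh
# Sketch of the tlb failover test. Interface names are placeholders.
# Older iproute2 may not accept bond options; use bonding sysfs or ifenslave then.
# Set RUN=echo to print the commands instead of running them.
setup_tlb_bond() {
    bond="$1"; p1="$2"; p2="$3"
    ${RUN:-} ip link add "$bond" type bond mode balance-tlb || return 1
    for port in "$p1" "$p2"; do
        # Bring the port down before enslaving it; the bonding driver
        # may refuse an interface that is up.
        ${RUN:-} ip link set "$port" down || return 1
        ${RUN:-} ip link set "$port" master "$bond" || return 1
    done
    ${RUN:-} ip link set "$bond" up
}

# Flap each port in turn so the bonding driver has to move the active role
# (and the MAC address handling that goes with it) between the two ports.
flap_ports() {
    p1="$1"; p2="$2"; rounds="${3:-3}"; pause="${4:-2}"
    i=0
    while [ "$i" -lt "$rounds" ]; do
        ${RUN:-} ip link set "$p1" down   # bond should move traffic to p2
        ${RUN:-} sleep "$pause"
        ${RUN:-} ip link set "$p1" up
        ${RUN:-} sleep "$pause"
        ${RUN:-} ip link set "$p2" down   # bond should move traffic to p1
        ${RUN:-} sleep "$pause"
        ${RUN:-} ip link set "$p2" up
        ${RUN:-} sleep "$pause"
        i=$((i + 1))
    done
}
```

For example: `setup_tlb_bond bond0 eth2 eth3`, assign an IP address to bond0, keep a ping running through it, then run `flap_ports eth2 eth3`. With the patches applied, traffic should keep flowing across each switch, and /sys/class/net/bond0/bonding/active_slave should follow the port that is up.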

·         Without this patch, if the mlx4_core module parameter probe_vf is used (usually with a large value, >= 8), you’ll see prints such as:

“localhost systemd-udevd: worker [14567] /devices/pci0000:00/0000:00:02.0/0000:03:00.1 timeout; kill it”
      net/mlx4_core: Defer VF initialization till PF is fully initialized

·         Without this patch, when loading mlx4_core with a probed VF, you’ll see this print: “PCIe link width is x0, device supports x8”, and on the VM: “Unable to determine PCI device chain minimum BW”
      net/mlx4_core: Don't issue PCIe speed/width checks for VFs

·         Without this patch, when you attach a single-port VF of port 2 to a VM and then unload/reload the Mellanox driver, you’ll observe the following error message:

“mlx4_core 0000:04:00.0: vhcr command QP_ATTACH (0xf0b) slave:7 in_param 0x66b5a000 in_mod=0x1000086a, op_mod=0x0 failed with error:0, status -22”
      net/mlx4_core: Adjust port number in qp_attach wrapper when detaching

·         Way to reproduce:
1) Start a VM and assign it VFx
2) Configure bonding between the 2 ports of the VF
3) Assign an IP address to the bond
4) Shut down this VM
5) Start a new VM and assign it VFy
6) Repeat steps 2-3 for this VM
7) Try running rping between this VM and the hypervisor (without this patch, rping does not work at this point)
      net/mlx4_core: Reset RoCE VF gids when guest driver goes down

·         Configure: Load mlx4_core with “options mlx4_core port_type_array=2,2 debug_level=1 num_vfs=1,1,2 probe_vf=0,1,1 log_num_mgm_entry_size=-1”

Run the rping client on the probed VF of port 2 (rping -dvca 192.168.30.1 -C 1). Without this patch it fails with the following error: “cma event RDMA_CM_EVENT_UNREACHABLE, error -110”.
      net/mlx4_core: Fix slave id computation for single port VF
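A sketch of the configure-and-run steps for this test, reusing the module options and the rping command quoted above. The helper names and the RUN=echo dry-run switch are hypothetical, and 192.168.30.1 is the server address from the example, which has to be reachable from the probed port 2 VF.

```shell
#!/bin/sh
# Sketch: reload mlx4_core with the SR-IOV options from this test, then run
# the rping client once. Set RUN=echo to print instead of run.
MLX4_PARAMS="port_type_array=2,2 debug_level=1 num_vfs=1,1,2 probe_vf=0,1,1 log_num_mgm_entry_size=-1"

reload_mlx4_core() {
    # Unload the upper drivers first; ignore errors for modules not loaded.
    for mod in mlx4_en mlx4_ib mlx4_core; do
        ${RUN:-} modprobe -r "$mod" || true
    done
    # $MLX4_PARAMS is intentionally unquoted so each option is its own word.
    ${RUN:-} modprobe mlx4_core $MLX4_PARAMS
}

# Run the rping client against server IP $1 with the options from the report
# (-d debug, -v verbose, -c client, -a server address, -C 1 single iteration).
rping_client_once() {
    ${RUN:-} rping -dvca "$1" -C 1
}
```

Usage: `reload_mlx4_core`, configure the IP address on the probed port 2 VF, then `rping_client_once 192.168.30.1`.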

·         1) Run VPI on the hypervisor, with opensm and IPoIB running on the IB port
2) Bring up the guest/VF
3) Configure the guest
4) Run rping from the guest as a client; it works.
5) On the guest, unload ONLY the low-level drivers (mlx4_ib/mlx4_en) and bring them back up, loading mlx4_ib FIRST, then mlx4_en.
6) Re-configure the guest interfaces
7) Run rping from the guest as a client. Without this patch, it DOES NOT WORK.
      net/mlx4_core: Add UPDATE_QP SRIOV wrapper support
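Step 5 above could look like the following sketch. It uses rmmod rather than `modprobe -r`, because `modprobe -r` may also try to remove dependencies that become unused (here mlx4_core), while the test asks to reload ONLY mlx4_ib/mlx4_en. The helper name is hypothetical.

```shell
#!/bin/sh
# Sketch of step 5 on the guest: reload only mlx4_en/mlx4_ib, leaving
# mlx4_core loaded, and bring mlx4_ib up before mlx4_en.
# Set RUN=echo to print the commands instead of running them.
reload_guest_upper_drivers() {
    ${RUN:-} rmmod mlx4_en || return 1
    ${RUN:-} rmmod mlx4_ib || return 1
    ${RUN:-} modprobe mlx4_ib || return 1   # mlx4_ib FIRST
    ${RUN:-} modprobe mlx4_en
}
```

After it returns, re-configure the guest interfaces and repeat the rping client run (steps 6 and 7).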

·         Set up a bonding interface over a VXLAN-encapsulating device with ConnectX-3 Pro HW, send traffic, and check with tcpdump that GSO is functioning.
      bonding: Advertize vxlan offload features when supported

·         Single-port VFs are currently supported only when all HCA ports are set to Ethernet, so probing them with any other port configuration should fail; without this patch, the probe returns success (0).
      net/mlx4_core: Fix the error flow when probing with invalid VF configuration

·         Load mlx4_core with the module parameter “log_num_mgm_entry_size=-1” to enable VXLAN offloads. Without this patch, no traffic gets to the RX side (as seen with tcpdump).
      net/mlx4_en: Don't configure the HW vxlan parser when vxlan offloading isn't set

·         Without this patch, network interface names for port 2 of 2-port Mellanox devices are inconsistent. HP requested this patch, but I don’t think the need is unique to them.

This can easily be checked with a command such as:

$ grep . /sys/bus/pci/drivers/mlx4_core/0000\:24\:00.0/net/*/dev_id

It should return:

/sys/bus/pci/drivers/mlx4_core/0000:24:00.0/net/eth8/dev_id:0x0
/sys/bus/pci/drivers/mlx4_core/0000:24:00.0/net/eth9/dev_id:0x1

instead of this buggy output without the patch:

/sys/bus/pci/drivers/mlx4_core/0000:24:00.0/net/p5p1/dev_id:0x0
/sys/bus/pci/drivers/mlx4_core/0000:24:00.0/net/rename13/dev_id:0x0
      Revert "net/mlx4_en: Fix bad use of dev_id"
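A small sketch that automates this check: it prints the same view as the grep command and fails if two interfaces of the same PCI function report the same dev_id, which is the symptom in the buggy output. The check_dev_ids name is illustrative, and the PCI address is whatever device is under test.

```shell
#!/bin/sh
# Sketch: check that the netdevs of one mlx4 PCI function report distinct
# dev_id values. $1 is the PCI device directory, for example
# /sys/bus/pci/drivers/mlx4_core/0000:24:00.0 (placeholder address).
check_dev_ids() {
    netdir="$1/net"
    if [ ! -d "$netdir" ]; then
        echo "no net/ directory under $1" >&2
        return 2
    fi
    # Same "<path>:<dev_id>" view as the grep command in the report.
    grep . "$netdir"/*/dev_id
    # A dev_id value that appears more than once means two ports share it.
    dups=$(cat "$netdir"/*/dev_id | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "FAIL: duplicate dev_id value(s): $dups" >&2
        return 1
    fi
    echo "OK: dev_id values are unique"
}
```

Usage: `check_dev_ids /sys/bus/pci/drivers/mlx4_core/0000:24:00.0` returns 0 on a patched kernel and 1 when both ports report 0x0.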

·         I’m not sure how to test these three:
      net/mlx4_core: Load the Eth driver first
      net/mlx4_core: Keep only one driver entry release mlx4_priv
      net/mlx4_core: Preserve pci_dev_data after __mlx4_remove_one()



Best Regards,
Eyal.

From: Brian Fromme [mailto:brian.fromme at canonical.com]
Sent: Saturday, July 26, 2014 12:38 AM
To: Narinder Gupta; Rafael Tinoco
Cc: Michael Miller; Dann Frazier; Raghuram Kota; Tim Gardner; Ming Lei; Eyal Perry; kernel-team
Subject: Re: Trusty SRU - Mellanox refresh

That's an excellent question, Narinder. Eyal, Tim, Rafael, etc.: can you help us understand how to test these patches? We can request that HP get involved in the testing, but only if we can explain what these changes are and how to test them.

 thanks,
 Brian


On Fri, Jul 25, 2014 at 3:06 PM, Narinder Gupta <narinder.gupta at canonical.com> wrote:
Brian,
Will you please brief me on the changes we are supposed to test? I can ask HP to test them and submit the results.


Thanks and Regards,

Narinder Gupta (PMP)                   narinder.gupta at canonical.com

Canonical, Ltd.                    narindergupta [irc.freenode.net]

+1.281.736.5150                            narindergupta2007[skype]



Ubuntu- Linux for human beings | www.ubuntu.com | www.canonical.com

On Fri, Jul 25, 2014 at 3:56 PM, Brian Fromme <brian.fromme at canonical.com> wrote:
Oops, Narinder is the PM for McDivitt.  Adding him to this thread.

 cheers,
 Brian


On Fri, Jul 25, 2014 at 2:35 PM, Michael Miller <michael.miller at canonical.com> wrote:
I'm thinking it would be Perry Hoffman and Scott Hinchley. I hope I spelled their names correctly.

On Fri, Jul 25, 2014 at 3:31 PM, Brian Fromme <brian.fromme at canonical.com> wrote:
Yup.  Adding Dann Frazier and Raghu.  Can you guys help us to figure out who can integrate and test these on our McDivitt cartridge?

 thanks,
 Brian


On Fri, Jul 25, 2014 at 1:10 PM, Michael Miller <michael.miller at canonical.com> wrote:
Brian,
Shouldn't this also go to the HP folks working the McDivitt issues? I don't have access to a McDivitt.

-- mikem

On Fri, Jul 25, 2014 at 1:50 PM, Tim Gardner <tim.gardner at canonical.com> wrote:
Gents - I'd like some positive testing confirmation before I apply this to Trusty.

rtg
--
Tim Gardner tim.gardner at canonical.com





