Performance, Memory (Hugepages) and NUMA challenges with Xenial, Open vSwitch and DPDK

Christian Ehrhardt christian.ehrhardt at canonical.com
Sun May 1 09:34:31 UTC 2016


Hi Thiago,
thanks for sharing.

I'm lacking anything like an IXIA traffic generator so far, so I can't really
speak for cases 1&2 that you had.
I want to look into MoonGen as an alternative sometime soon to cover that as
well.

But I have covered what you described as tests 3&4.
There I've had quite some success: between 30% and 100% improvement for small
packets.
Again, I don't test raw packet forwarding like you set it up with the L2
bridge (again, lacking the IXIA), but "classic" benchmarks.

Naturally, for streaming workloads like iperf it becomes a matter of
efficiency, and there the kernel implementation wins. But transactional
workloads like netperf and uperf show some clear wins.

That is even more pronounced for virtual workloads, where line speed is not
the bottleneck: streaming workloads are still worse with DPDK than with kernel
networking, but the transactional loads improve there as well.

Since I can't reproduce your IXIA-based cases yet, I'd like to ask you to run
some classic benchmarks on your setup, so that we can compare whether it
reproduces my positive results.

Take a look at this to see how I set up the config and benchmarks.
https://git.launchpad.net/~ubuntu-server/ubuntu/+source/dpdk-testing
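
If you have netserver running on a machine behind each port, the kind of runs
I mean look roughly like this (a minimal sketch, the target address is just a
placeholder):

# streaming / throughput-bound - this is where the kernel path wins for me
netperf -H 192.168.1.2 -t TCP_STREAM -l 30
# transactional / request-response - this is where DPDK showed the clear wins
netperf -H 192.168.1.2 -t TCP_RR -l 30
# the same with small messages, closer to your small-packet cases
netperf -H 192.168.1.2 -t TCP_STREAM -l 30 -- -m 64
netperf -H 192.168.1.2 -t TCP_RR -l 30 -- -r 64,64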

In there you can also find some ideas on tuning via CPU masks and queue
changes, but so far that is the area where I had no success - my tuning made
things worse. Then again, my system is smaller (only one NUMA node) and might
simply lack the resources.
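
For reference, the knobs I mean look roughly like this (just a sketch; on the
OVS 2.5 in Xenial the rx-queue count is, as far as I know, a global option
rather than a per-port one):

# pin the PMD threads to dedicated cores via a bitmask (here cores 1 and 2)
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6
# let the DPDK ports use two rx queues each
ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=2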

Also, in your traffic generator, can you try different packet
sizes/characteristics to see whether you get something comparable to my
streaming vs. transactional findings?

I hope that helps; let me know what you get and let's work on it together.

Kind Regards,
Christian


Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

On Sun, May 1, 2016 at 10:05 AM, Martinx - ジェームズ <thiagocmartinsc at gmail.com>
wrote:

> Hey guys,
>
>  I want to share my test results with Open vSwitch and DPDK on Xenial.
>
>  So far, I'm a little bit frustrated... At first look, OVS with DPDK is
> worse than just plain OVS. However, it is probably just a matter of tuning
> it - at least, I hope so...
>
>
>  I am using the following reference docs:
>
>  * https://github.com/openvswitch/ovs/blob/master/INSTALL.DPDK.md
>
>  * https://help.ubuntu.com/16.04/serverguide/DPDK.html
>
>  * http://wiki.qemu.org/Features/vhost-user-ovs-dpdk
>
>
>  I have a Dell server with:
>
>  - 16 CPU cores on 2 sockets, as reported by cpu_layout.py (32 CPUs in
> /proc/cpuinfo - 16 hyperthreads on each NUMA node)
>
>    Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
>
>  - 2 NUMA Nodes
>  - 128G of RAM
>  - 2 x 1G NICs for management / service
>  - 2 x 10G NICs (ixgbe) available for DPDK on Numa node 0
>  - Plenty of storage
>
>
>  Here are the tests that I want to do, on top of those 2 x 10G NICs:
>
>
>  1- "Regular OVS" L2 Bridge on bare-metal (no DPDK) - test executed
>
>  2- "OVS + DPDK" L2 Bridge on bare-metal (powered by DPDK) - test executed
>
>  3- "Regular OVS" on bare-metal plus a KVM guest running another "Regular
> OVS" (no DPDK) - future test
>
>  4- "Regular OVS" on bare-metal plus a KVM guest running OVS+DPDK (DPDK
> only inside of a KVM VM) - future test
>
>  5- "OVS + DPDK" on bare-metal plus a KVM guest running "Regular OVS"
> (DPDK only at the host) - future test - looks buggy today, I'm about to
> file another bug report
>
>  6- "OVS + DPDK" on bare-metal plus a KVM guest running another "OVS +
> DPDK" (DPDK on both host and guest) - future test - blocked by BUG
> https://bugs.launchpad.net/bugs/1577088
>
>
>  At a glance, the test that I want to do is very simple: create an
> OVS+DPDK L2 bridge between the 2 x 10G NICs on bare metal (test 2), with
> no KVM guests involved. Later, I'll bring virtualization to the table
> (tests 3-6).
>
>  Later, I'll try a more advanced use case, which is to move this
> bare-metal OVS+DPDK L2 bridge (of test 2) into a KVM virtual machine
> (test 6).
>
>  I have an IXIA traffic generator sending 10G of data in both directions.
>
>  I also have a proprietary L2 bridge DPDK application (similar to
> OVS+DPDK) that, after tuning (isolcpus, CPU pinning, NUMA placement), can
> handle 19.9 Gb/s without ANY packet drop. This proprietary DPDK app was
> tested on the very same hardware on which I'm now testing Ubuntu, OVS and
> DPDK.
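>
> (For reference, the kind of host tuning I mean - isolating the cores the
> DPDK app polls on via the kernel command line; the core list here is only
> an example:)
>
> isolcpus=2,4,6,8   # added to the same kernel command line as the hugepage options below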
>
>  So, I want to do the same with Xenial+OVS+DPDK (19.x Gb/s, no packet
> loss), but I am unable to: it is slow and hard to tune. Below are the
> instructions for reproducing the tests that I am doing.
>
>
> *** Test 1 - Regular OVS on bare-metal:
>
>
> apt install openvswitch-switch
>
>
> ip l set dev p1p1 up
> ip l set dev p1p2 up
>
> ovs-vsctl add-br ovsbr
> ovs-vsctl add-port ovsbr p1p1
> ovs-vsctl add-port ovsbr p1p2
>
> ip l set dev ovsbr up
>
> * Rate:
>
> bwm-ng -I ovsbr
>
> Total: ~2.05 GB/s (good, 10 Gigabit in each direction)
>
>
> * CPU consumption:
>
> The kernel "ksoftirqd/*" processes are consuming many CPU cores, as follows:
>
> Screenshot: http://i.imgur.com/pAKtrQa.png
>
>
>
> *** Test 2 - OVS with DPDK on bare-metal:
>
>
> apt install openvswitch-switch-dpdk
>
> service openvswitch-switch stop
>
> update-alternatives --set ovs-vswitchd
> /usr/lib/openvswitch-switch-dpdk/ovs-vswitchd-dpdk
>
>
> ---
> PCI IDs and NUMA node of p1p1 and p1p2:
>
> PCI - /etc/dpdk/interfaces:
>
> -
> pci 0000:06:00.0 uio_pci_generic
> pci 0000:06:00.1 uio_pci_generic
> -
>
> NUMA Node of dual 10G NIC cards:
>
> cat /sys/class/net/p1p1/device/numa_node
> 0
> ---
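>
> (Since the ports get bound to uio_pci_generic, their net device entries may
> disappear; a quick sketch of checking both ports via the PCI path instead:)
>
> cat /sys/bus/pci/devices/0000:06:00.0/numa_node
> cat /sys/bus/pci/devices/0000:06:00.1/numa_node
> # both should report 0, matching the socket-memory split used further down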
>
> File /etc/default/grub has:
>
> -
> iommu=pt intel_iommu=on default_hugepagesz=1GB hugepagesz=1G hugepages=8
> -
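>
> (After the reboot further down, a way to verify that the 1G pages were
> really allocated, and on which NUMA node - paths as on a stock Xenial
> kernel:)
>
> grep Huge /proc/meminfo
> cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/nr_hugepages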
>
> File /etc/dpdk/dpdk.conf has:
>
> -
> NR_1G_PAGES=4
> -
>
> File /etc/default/openvswitch-switch has:
>
> -
> DPDK_OPTS='--dpdk -c 0x1 -n 4 -m 2048,0'
> -
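>
> (If the "-m 2048,0" form gives trouble, an alternative I may try is the
> per-socket EAL option - assuming everything after "--dpdk" is handed to EAL
> unchanged:)
>
> DPDK_OPTS='--dpdk -c 0x1 -n 4 --socket-mem 2048,0'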
>
> After installing and reconfiguring, I am rebooting the server...
>
> * The OVS + DPDK magic:
>
> ovs-vsctl add-br ovsbr -- set bridge ovsbr datapath_type=netdev
>
> ovs-vsctl add-port ovsbr dpdk0 -- set Interface dpdk0 type=dpdk
> ovs-vsctl add-port ovsbr dpdk1 -- set Interface dpdk1 type=dpdk
>
> ip link set dev ovsbr up
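>
> (A quick sanity check before measuring - assuming the standard log location
> of the Xenial packages:)
>
> ovs-vsctl show
> tail -n 50 /var/log/openvswitch/ovs-vswitchd.log
> # dpdk0/dpdk1 should be listed without errors, and the log should show the
> # ports being added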
>
> bwm-ng -I ovsbr
>
>
> Total: 756.4 MB/s
>
>
> WTF!!! OVS powered by DPDK is more than 2 times slower than "Regular
> OVS"???
>
> Looks like OVS+DPDK sucks (but I'll bet I am just doing it wrong)...
> Let's keep trying...
>
>
> * CPU consumption:
>
> Process ovs-vswitchd is consuming 100% of Core 0 / NUMA 0. In fact, it is
> consuming less CPU than "Regular OVS"... Mmmm... Let's give more CPU cores
> to this guy...
>
>
> After tuning OVS PMD:
>
> ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=F
>
> Log:
> ---
> dpif_netdev|INFO|Created 2 pmd threads on numa node 0
> dpif_netdev(pmd37)|INFO|Core 2 processing port 'dpdk1'
> dpif_netdev(pmd38)|INFO|Core 0 processing port 'dpdk0'
> ---
>
> Bingo!
>
> ovs-vswitchd now consumes 200% of CPU (top -d3)
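>
> (To see what those PMD threads actually spend their cycles on - assuming
> this appctl command is available in the Xenial OVS build:)
>
> ovs-appctl dpif-netdev/pmd-stats-show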
>
> "bwm-ng -I ovsbr" now shows:
>
> Total: 1.18 GB/s
>
> Much better! But not good enough; "Regular OVS" reaches ~2 GB/s... Let's
> try to add more cores for the PMD...
>
>
> ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=FF
>
> Log:
> ---
> dpif_netdev|INFO|Created 4 pmd threads on numa node 0
> dpif_netdev(pmd40)|INFO|Core 0 processing port 'dpdk0'
> dpif_netdev(pmd41)|INFO|Core 2 processing port 'dpdk1'
> ---
>
> Bad news...
>
> ovs-vswitchd now consumes 400% of CPU (top -d3)
>
>
> "bwm-ng -I ovsbr" now shows:
>
>
> Total: ~1.05 GB/s
>
>
> It is worse now! It is consuming twice the CPU resources while the
> throughput is basically the same - in fact, it is slower!
>
> 1 PMD thread (default) = very bad performance (~750 MB/s)
>
> 2 PMD threads = good, but not even close to regular OVS without DPDK
> (~1.18 GB/s)
>
> 4 PMD threads = very bad, slower than with only 2 PMDs while consuming
> twice the resources (~1.05 GB/s)
>
>
> So, here is my question:
>
>
> *** How do I make OVS + DPDK hit the ~2 GB/s mark (what Regular OVS does
> "naturally")?
>
>
>
>  So far, for this e-mail, I have only executed tests 1 and 2; the other
> tests I'll leave for subsequent e-mails. I think we have a good stopping
> point here: once I see OVS + DPDK at full speed (similar to what Regular
> OVS can do), I'll proceed with more tests and messages.
>
> Thoughts?
>
> Cheers!
> Thiago
>
>