Performance, Memory (Hugepages) and NUMA challenges with Xenial, Open vSwitch and DPDK

Martinx - ジェームズ thiagocmartinsc at gmail.com
Sun May 1 08:05:03 UTC 2016


Hey guys,

 I want to share my test results with Open vSwitch and DPDK on Xenial.

 So far, I'm a little bit frustrated... At first look, OVS with DPDK is
worse than just plain OVS. However, I hope it is just a matter of tuning
it...


 I am using the following reference docs:

 * https://github.com/openvswitch/ovs/blob/master/INSTALL.DPDK.md

 * https://help.ubuntu.com/16.04/serverguide/DPDK.html

 * http://wiki.qemu.org/Features/vhost-user-ovs-dpdk


 I have a Dell server with:

 - 16 CPU cores on 2 sockets, reported by cpu_layout.py (32 CPUs in
/proc/cpuinfo - 16 hyper-threads on each NUMA node)

   Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz

 - 2 NUMA Nodes
 - 128G of RAM
 - 2 x 1G NICs for management / service
 - 2 x 10G NICs (ixgbe) available for DPDK on NUMA node 0
 - Plenty of storage
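
 For reference, the topology above can be double-checked with standard
tools (numactl may need to be installed):

lscpu | grep -i numa
numactl --hardware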


 Here are the tests that I want to do, on top of those 2 x 10G NICs:


 1- "Regular OVS" L2 Bridge on bare-metal (no DPDK) - test executed

 2- "OVS + DPDK" L2 Bridge on bare-metal (powered by DPDK) - test executed

 3- "Regular OVS" on bare-metal plus a KVM guest running another "Regular
OVS" (no DPDK) - future test

 4- "Regular OVS" on bare-metal plus a KVM guest running OVS+DPDK (DPDK
only inside of a KVM VM) - future test

 5- "OVS + DPDK" on bare-metal plus a KVM guest running "Regular OVS" (DPDK
only at the host) - future test - looks buggy today, I'm about to fill
another bug report

 6- "OVS + DPDK" on bare-metal plus a KVM guest running another "OVS +
DPDK" (DPDK on both host and guest) - future test - blocked by BUG
https://bugs.launchpad.net/bugs/1577088


 At a glance, the test that I want to do is very simple: create an
OVS+DPDK L2 bridge between the 2 x 10G NICs on bare-metal (test 2), with no
KVM guests involved. Later, I'll bring virtualization to the table
(tests 3-6).

 Later on, I'll try a more advanced use case, which is to move this
bare-metal OVS+DPDK L2 bridge (of test 2) into a KVM virtual machine
(test 6).

 I have an IXIA traffic generator sending 10G of traffic in each direction.

 I also have a proprietary L2 bridge DPDK application (similar to
OVS+DPDK) that, after tuning (isolcpus, CPU pinning, NUMA placement), can
handle 19.9G/s without ANY packet drop. This proprietary DPDK app was
tested on the very same hardware on which I am now testing Ubuntu, OVS
and DPDK.
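
 Just to make that tuning concrete (the core list and the binary name below
are placeholders; the real values come from cpu_layout.py and from the app
itself), the idea is to remove the polling cores from the Linux scheduler
via GRUB and pin the app plus its memory to the NUMA node where the NICs
live:

-
# /etc/default/grub (then update-grub and reboot)
GRUB_CMDLINE_LINUX_DEFAULT="... isolcpus=2,4,6,8"

# run the DPDK app pinned to those cores, with memory bound to NUMA node 0
numactl --membind=0 taskset -c 2,4,6,8 ./dpdk-l2-bridge ...
-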

 So, I want to do the same with Xenial+OVS+DPDK (19.X G/s, no packet loss),
but I am unable to: it is slow and hard to tune. I'll share the
instructions for reproducing the tests that I am doing.


*** Test 1 - Regular OVS on bare-metal:


apt install openvswitch-switch


ip l set dev p1p1 up
ip l set dev p1p2 up

ovs-vsctl add-br ovsbr
ovs-vsctl add-port ovsbr p1p1
ovs-vsctl add-port ovsbr p1p2

ip l set dev ovsbr up
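
Just to confirm the bridge looks right before sending traffic:

ovs-vsctl show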

* Rate:

bwm-ng -I ovsbr

Total: ~2.05 GB/s (good: 10 Gigabit in each direction)


* CPU consumption:

The kernel threads "ksoftirqd/*" are consuming many CPU cores, as follows:

Screenshot: http://i.imgur.com/pAKtrQa.png
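
Without the screenshot, the same per-core softirq load can be watched live
with mpstat (from the sysstat package) while traffic is running, or by
pressing "1" inside top:

mpstat -P ALL 2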



*** Test 2 - OVS with DPDK on bare-metal:


apt install openvswitch-switch-dpdk

service openvswitch-switch stop

update-alternatives --set ovs-vswitchd
/usr/lib/openvswitch-switch-dpdk/ovs-vswitchd-dpdk
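
A quick way to confirm that the DPDK build of the switch is actually the
one selected:

update-alternatives --display ovs-vswitchd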


---
PCI IDs and NUMA node of p1p1 and p1p2:

PCI - /etc/dpdk/interfaces:

-
pci 0000:06:00.0 uio_pci_generic
pci 0000:06:00.1 uio_pci_generic
-

NUMA Node of dual 10G NIC cards:

cat /sys/class/net/p1p1/device/numa_node
0
---
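
To confirm after boot that both ports really got bound, the two PCI
addresses should show up under the uio_pci_generic driver in sysfs:

ls /sys/bus/pci/drivers/uio_pci_generic/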

File /etc/default/grub has:

-
iommu=pt intel_iommu=on default_hugepagesz=1GB hugepagesz=1G hugepages=8
-
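
These are kernel command line options (GRUB_CMDLINE_LINUX_DEFAULT), so they
only take effect after "update-grub" and a reboot. Once the box is back up,
the reserved 1G pages can be checked with:

grep Huge /proc/meminfo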

File /etc/dpdk/dpdk.conf has:

-
NR_1G_PAGES=4
-

File /etc/default/openvswitch-switch has:

-
DPDK_OPTS='--dpdk -c 0x1 -n 4 -m 2048,0'
-

After installing and reconfiguring, I am rebooting the server...

* The OVS + DPDK magic:

ovs-vsctl add-br ovsbr -- set bridge ovsbr datapath_type=netdev

ovs-vsctl add-port ovsbr dpdk0 -- set Interface dpdk0 type=dpdk
ovs-vsctl add-port ovsbr dpdk1 -- set Interface dpdk1 type=dpdk

ip link set dev ovsbr up

bwm-ng -I ovsbr


Total: 756.4 MB/s


WTF!!! OVS powered by DPDK is more than 2 times slower than "Regular OVS"???

It looks like OVS+DPDK sucks (but I bet I am doing it wrong)...
Let's keep trying...
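
One sanity check at this point, before touching any knobs, is whether the
dpdk ports are dropping packets on receive (which would point at the single
PMD thread not keeping up with 2 x 10G):

ovs-ofctl dump-ports ovsbr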


* CPU consumption:

The ovs-vswitchd process is consuming 100% of Core 0 / NUMA 0. In fact, it
is consuming less CPU than "Regular OVS"... Mmmm... Let's give this guy
more CPU cores...


After tuning OVS PMD:

ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=F

Log:
---
dpif_netdev|INFO|Created 2 pmd threads on numa node 0
dpif_netdev(pmd37)|INFO|Core 2 processing port 'dpdk1'
dpif_netdev(pmd38)|INFO|Core 0 processing port 'dpdk0'
---

Bingo!

ovs-vswitchd now consumes 200% of CPU (top -d3)

"bwm-ng -I ovsbr" now shows:

Total: 1.18 GB/s

Much better! But not good enough; "Regular OVS" reaches ~2 GB/s... Let's
try to add more cores for the PMD threads...
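
A note on the mask, as I understand it: pmd-cpu-mask is a bitmask of
logical CPU IDs, so:

0x0F = 0000 1111 -> logical CPUs 0-3
0xFF = 1111 1111 -> logical CPUs 0-7

OVS only creates PMD threads on the NUMA node(s) that have DPDK ports
attached, which I assume is why the 4-bit mask above produced only 2 PMD
threads: only two of those four logical CPUs sit on NUMA node 0, where
both NICs are.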


ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=FF

Log:
---
dpif_netdev|INFO|Created 4 pmd threads on numa node 0
dpif_netdev(pmd40)|INFO|Core 0 processing port 'dpdk0'
dpif_netdev(pmd41)|INFO|Core 2 processing port 'dpdk1'
---

Bad news...

ovs-vswitchd now consumes 400% of CPU (top -d3)


"bwm-ng -I ovsbr" now shows:


Total: ~1.05 GB/s


It is worse now! It consumes twice the CPU resources while the throughput
is basically the same; in fact, it is slightly slower.

1 PMD thread (default) = very bad performance (~750 MB/s)

2 PMD threads = good, but still not even close to regular OVS without DPDK
(~1.18 GB/s)

4 PMD threads = very bad, slower than with only 2 PMD threads while
consuming twice the resources (~1.05 GB/s)
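
Before throwing even more cores at it, it probably makes sense to look at
what each PMD thread is actually doing (processing vs. idle cycles). If I
am reading the OVS docs right, that is exposed via ovs-appctl:

ovs-appctl dpif-netdev/pmd-stats-show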


So, here is my question:


*** How to make OVS + DPDK hit the "~2 GB/s" mark (what Regular OVS can do
"naturally") ?



So far, in this e-mail, I have only executed tests 1 and 2; the other tests
I'll leave for subsequent e-mails. I think we have a good stopping point
here: I want to see OVS + DPDK at full speed (similar to what Regular OVS
can do), and then I'll proceed with more tests and messages.

Thoughts?

Cheers!
Thiago