Performance, Memory (Hugepages) and NUMA challenges with Xenial, Open vSwitch and DPDK

Martinx - ジェームズ thiagocmartinsc at
Sun May 1 08:05:03 UTC 2016

Hey guys,

 I want to share my test results with Open vSwitch and DPDK on Xenial.

 So far, I'm a little bit frustrated... At first look, OVS with DPDK is
worse than just plain OVS. However, it might be just a matter of tuning it;
at least, I hope so...

 I am using the following reference docs:




 I have a Dell server with:

 - 16 CPU cores on 2 sockets (32 CPUs in /proc/cpuinfo - 16 hardware
threads on each NUMA node)

   Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz

 - 2 NUMA Nodes
 - 128G of RAM
 - 2 x 1G NICs for management / service
 - 2 x 10G NICs (ixgbe) available for DPDK on Numa node 0
 - Plenty of storage
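
 To double-check which cores and which NICs live on which NUMA node (so
that the DPDK cores, hugepages and NICs can all stay on node 0), I run
something like this (numactl comes from the "numactl" package):

# CPU-to-NUMA-node mapping
lscpu | grep -i numa

# memory and CPU layout per NUMA node
numactl --hardware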

 Here are the tests that I want to do, on top of those 2 x 10G NICs:

 1- "Regular OVS" L2 Bridge on bare-metal (no DPDK) - test executed

 2- "OVS + DPDK" L2 Bridge on bare-metal (powered by DPDK) - test executed

 3- "Regular OVS" on bare-metal plus a KVM guest running another "Regular
OVS" (no DPDK) - future test

 4- "Regular OVS" on bare-metal plus a KVM guest running OVS+DPDK (DPDK
only inside of a KVM VM) - future test

 5- "OVS + DPDK" on bare-metal plus a KVM guest running "Regular OVS" (DPDK
only at the host) - future test - looks buggy today, I'm about to fill
another bug report

 6- "OVS + DPDK" on bare-metal plus a KVM guest running another "OVS +
DPDK" (DPDK on both host and guest) - future test - blocked by BUG

 At a glance, the test I want to do is very simple: create an OVS+DPDK L2
bridge between the 2 x 10G NICs on bare-metal (test 2), with no KVM guests
involved. Later, I'll bring virtualization to the table (tests 3-6).

 Later, I'll try a more advanced use case, which will be to move this
bare-metal OVS+DPDK L2 bridge (of test 2) into a KVM virtual machine (by
doing test 6).

 I have an IXIA traffic generator sending 10G of data in both directions.

 I also have a proprietary L2 bridge DPDK application (similar to
OVS+DPDK) that, after tuning (isolcpus, CPU pinning, NUMA placement), can
handle 19.9 Gb/s without ANY packet drop. This proprietary DPDK app was
tested on the very same hardware on which I'm now testing Ubuntu, OVS and
DPDK.

 So, I want to do the same with Xenial+OVS+DPDK (19.X Gb/s, no packet
loss), but so far I'm unable to: it is slow and hard to tune. I'll share
the instructions for reproducing the tests that I am doing.

*** Test 1 - Regular OVS on bare-metal:

apt install openvswitch-switch

ip l set dev p1p1 up
ip l set dev p1p2 up

ovs-vsctl add-br ovsbr
ovs-vsctl add-port ovsbr p1p1
ovs-vsctl add-port ovsbr p1p2

ip l set dev ovsbr up
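
 Just to sanity-check the bridge before sending traffic from the IXIA,
something like:

# confirm both 10G ports are attached to the bridge
ovs-vsctl show

# the default NORMAL flow should be doing all the forwarding
ovs-ofctl dump-flows ovsbr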

* Rate:

bwm-ng -I ovsbr

Total: ~2.05 GB/s (good, 10-Gigabit in each direction)

* CPU consumption:

The kernel threads "ksoftirqd/*" are consuming many CPU cores!

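 One way to see the per-core softirq load is mpstat, from the "sysstat"
package (watch the %soft column), for example:

# per-CPU statistics, 1-second samples
mpstat -P ALL 1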

*** Test 2 - OVS with DPDK on bare-metal:

apt install openvswitch-switch-dpdk

service openvswitch-switch stop

update-alternatives --set ovs-vswitchd

PCI IDs and NUMA node of p1p1 and p1p2:

PCI - /etc/dpdk/interfaces:

pci 0000:06:00.0 uio_pci_generic
pci 0000:06:00.1 uio_pci_generic

NUMA Node of dual 10G NIC cards:

cat /sys/class/net/p1p1/device/numa_node
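
 To confirm the PCI addresses and that the second port is also on NUMA
node 0, something like:

# PCI bus address of each port (should match /etc/dpdk/interfaces)
ethtool -i p1p1 | grep bus-info
ethtool -i p1p2 | grep bus-info

# same numa_node check for the second port
cat /sys/class/net/p1p2/device/numa_node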

File /etc/default/grub has:

iommu=pt intel_iommu=on default_hugepagesz=1GB hugepagesz=1G hugepages=8

File /etc/dpdk/dpdk.conf has:


File /etc/default/openvswitch-switch has:

DPDK_OPTS='--dpdk -c 0x1 -n 4 -m 2048,0'
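
 One thing I'm not sure about is the memory option above; a variant I
intend to try, assuming the Xenial packaging passes these options straight
to the DPDK EAL (which accepts a per-socket form), is:

# 2048 MB of hugepage memory on NUMA 0, none on NUMA 1
DPDK_OPTS='--dpdk -c 0x1 -n 4 --socket-mem 2048,0'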

 After installing and reconfiguring, I reboot the server...
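
 Once the server is back up, I verify that the 1G hugepages were actually
allocated, and on which NUMA node, roughly like this:

# total 1G hugepages reserved by the kernel
grep HugePages_ /proc/meminfo

# per-NUMA-node allocation of the 1G pages
cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/nr_hugepages

# hugetlbfs mount point that OVS/DPDK will use
mount | grep -i huge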

* The OVS + DPDK magic:

ovs-vsctl add-br ovsbr -- set bridge ovsbr datapath_type=netdev

ovs-vsctl add-port ovsbr dpdk0 -- set Interface dpdk0 type=dpdk
ovs-vsctl add-port ovsbr dpdk1 -- set Interface dpdk1 type=dpdk

ip link set dev ovsbr up
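
 Before measuring, I check that the dpdk ports really attached and that
there are no errors in the OVS log, something like:

# both dpdk0 and dpdk1 should appear under the bridge, with no errors
ovs-vsctl show

# DPDK/EAL initialization messages end up here on Ubuntu
tail -n 50 /var/log/openvswitch/ovs-vswitchd.log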

bwm-ng -I ovsbr

Total: 756.4 MB/s

WTF!!! OVS powered by DPDK is more than 2 times slower than "Regular OVS"???

Looks like OVS+DPDK sucks (but I'll bet that I am doing it wrong)...
Let's keep trying...

* CPU consumption:

The ovs-vswitchd process is consuming 100% of Core 0 / NUMA 0. In fact, it
is consuming less CPU than "Regular OVS"... Mmmm... Let's give more CPU
cores to this guy...

After tuning OVS PMD:

ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=F

dpif_netdev|INFO|Created 2 pmd threads on numa node 0
dpif_netdev(pmd37)|INFO|Core 2 processing port 'dpdk1'
dpif_netdev(pmd38)|INFO|Core 0 processing port 'dpdk0'
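
 To see what each PMD thread is actually doing, I'm looking at the PMD
statistics (I believe OVS 2.5 already has this appctl command):

# per-PMD-thread polling vs. processing cycles
ovs-appctl dpif-netdev/pmd-stats-show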


ovs-vswitchd now consumes 200% of CPU (top -d3)

"bwm-ng -I ovsbr" now shows:

Total: 1.18 GB/s

Much better! But not good enough; "Regular OVS" reaches ~2 GB/s... Let's
try adding more cores for PMD...

ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=FF

dpif_netdev|INFO|Created 4 pmd threads on numa node 0
dpif_netdev(pmd40)|INFO|Core 0 processing port 'dpdk0'
dpif_netdev(pmd41)|INFO|Core 2 processing port 'dpdk1'

Bad news...

ovs-vswitchd now consumes 400% of CPU (top -d3)

"bwm-ng -I ovsbr" now shows:

Total: ~1.05 GB/s

It is worse now! It consumes twice the CPU resources while the throughput
is basically the same; in fact, it is slower!

1 PMD thread (default) = very bad performance (~750 MB/s)

2 PMD threads = better, but not even close to Regular OVS without DPDK
(~1.18 GB/s)

4 PMD threads = very bad, slower than with only 2 PMDs while consuming
twice the resources (~1.05 GB/s)
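
 With only 2 physical ports and a single RX queue per port, the extra PMD
threads have nothing to poll, so just adding cores cannot help by itself.
The knobs I still plan to try next (not sure all of them apply to the
Xenial OVS 2.5 packages, so treat this as a guess):

# isolate the PMD cores from the Linux scheduler, in /etc/default/grub,
# next to the hugepage options (then update-grub and reboot); assuming the
# even-numbered cores are the ones on NUMA 0, as the PMD log above suggests
isolcpus=2,4,6,8

# pin the PMDs to those isolated physical cores only (mask for cores 2,4,6,8)
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x154

# more than one RX queue per DPDK port, so several PMDs can share the load
ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=2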

So, here is my question:

*** How to make OVS + DPDK hit the "~2 GB/s" mark (what Regular OVS can do
"naturally") ?

 So far, for this e-mail, I have only executed tests 1 and 2; the other
tests I'll leave for subsequent e-mails. I think we have a good stopping
point here: I want to see OVS + DPDK at full speed (similar to what Regular
OVS can do), and then I'll proceed with more tests.

