[Bug 1827264] Re: ovs-vswitchd thread consuming 100% CPU
Jeremy Phillips
1827264@bugs.launchpad.net
Fri May 22 02:29:31 UTC 2020
I know this has been open for a year now, but for what it's worth, we're
seeing the same issue on two of our three Neutron nodes. There are a huge
number of entries in /var/log/openvswitch/ovs-vswitchd.log like the following:
2020-05-22T00:19:25.581Z|18410|poll_loop(handler74)|INFO|Dropped 2973816 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2020-05-22T00:19:25.581Z|18411|poll_loop(handler74)|INFO|wakeup due to [POLLIN] on fd 35 (unknown anon_inode:[eventpoll]) at ../lib/dpif-netlink.c:2786 (98% CPU usage)
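In case it helps with triage, a rough way to confirm which handler thread is pegged and how often the message repeats (the PID is just whatever pidof returns on your system):
# top -b -H -n 1 -p $(pidof ovs-vswitchd) | head -20
# grep -c 'wakeup due to \[POLLIN\]' /var/log/openvswitch/ovs-vswitchd.log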
We're running Neutron (OpenStack Rocky) on Ubuntu 18.04:
# uname -r
4.15.0-99-generic
# dpkg -l | grep openvswitch-
ii neutron-openvswitch-agent 2:13.0.6-0ubuntu1~cloud0 all Neutron is a virtual network service for Openstack - Open vSwitch plugin agent
ii openvswitch-common 2.10.0-0ubuntu2~cloud0 amd64 Open vSwitch common components
ii openvswitch-switch 2.10.0-0ubuntu2~cloud0 amd64 Open vSwitch switch implementations
Looking at the ovs-vswitchd process itself:
# timeout 10 strace -c -p 1227
strace: Process 1227 attached
strace: Process 1227 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 96.28    9.059391       28852       314           poll
  3.03    0.284833      284833         1           restart_syscall
  0.28    0.026164          13      1968           sendmsg
  0.27    0.025726          67       385       208 recvfrom
  0.07    0.006302           3      2220       252 recvmsg
  0.02    0.002327           7       315        69 read
  0.02    0.002157           2       930       171 futex
  0.02    0.002075           5       378       378 accept
  0.01    0.000532           2       316           getrusage
  0.00    0.000238          14        17           sendto
  0.00    0.000034          11         3           write
------ ----------- ----------- --------- --------- ----------------
100.00    9.409779                  6847      1078 total
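Note that without -f, strace -p only traces the single task it attaches to, so the spinning handler thread may not be represented above. Attaching to all threads, or directly to the handler's thread ID (the SPID/LWP column in ps -eLf), should capture it as well; something like:
# timeout 10 strace -c -f -p 1227
# timeout 10 strace -c -p <handler-thread-tid>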
https://bugs.launchpad.net/bugs/1827264
Title:
ovs-vswitchd thread consuming 100% CPU
Status in openvswitch package in Ubuntu:
Confirmed
Bug description:
I have an ovs-vswitchd process consuming 100% CPU in a very lightly
used OpenStack Rocky cloud running on Bionic. The version in question
is openvswitch-switch 2.10.0-0ubuntu2~cloud0.
ovs-vswitchd is running alongside various Neutron processes
(lbaasv2-agent, metadata-agent, l3-agent, dhcp-agent, openvswitch-agent)
inside an LXC container on a physical host. There is a single
Neutron router, and total traffic across br-tun, br-ex, and br-int
barely exceeds 200 KiB/s TX/RX combined.
The issue appears to have arisen on its own after the host was hard
rebooted. On restart, ovs-vswitchd came up with high CPU load, and it has
not diminished since.
The thread that is consuming CPU looks like this in ps and shows up with the name 'handler89':
UID PID SPID PPID C STIME TTY TIME CMD
root 7267 7454 1 99 Apr23 ? 8-03:38:14 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
Logs in /var/log/openvswitch/ovs-vswitchd.log show ~14000 entries like this per day, roughly one every six seconds:
2019-05-01T18:34:30.110Z|237220|poll_loop(handler89)|INFO|Dropped 1092844 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2019-05-01T18:34:30.110Z|237221|poll_loop(handler89)|INFO|wakeup due to [POLLIN] on fd 42 (unknown anon_inode:[eventpoll]) at ../lib/dpif-netlink.c:2786 (99% CPU usage)
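Since the wakeups are on an epoll fd, it may also be worth checking what fd 42 actually is and which descriptors are registered on it (the PID and fd number are taken from the ps and log output here; they will differ elsewhere):
# ls -l /proc/7267/fd/42
# cat /proc/7267/fdinfo/42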
Logs in /var/log/neutron/neutron-openvswitch-agent.log show entries
like this, again ~14000 sets per day:
2019-05-01 18:35:31.174 13621 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-7e354ade-55a3-46ba-adf4-db49a51e5218 - - - - -] Agent rpc_loop - iteration:355070 started rpc_loop /usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:2086
2019-05-01 18:35:31.177 13621 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch [req-7e354ade-55a3-46ba-adf4-db49a51e5218 - - - - -] ofctl request version=0x4,msg_type=0x12,msg_len=0x38,xid=0x81826d95,OFPFlowStatsRequest(cookie=0,cookie_mask=0,flags=0,match=OFPMatch(oxm_fields={}),out_group=4294967295,out_port=4294967295,table_id=23,type=1) result [OFPFlowStatsReply(body=[OFPFlowStats(byte_count=0,cookie=13408676230369642694,duration_nsec=724000000,duration_sec=710697,flags=0,hard_timeout=0,idle_timeout=0,instructions=[],length=56,match=OFPMatch(oxm_fields={}),packet_count=0,priority=0,table_id=23)],flags=0,type=1)] _send_msg /usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.py:114
2019-05-01 18:35:31.179 13621 DEBUG neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-7e354ade-55a3-46ba-adf4-db49a51e5218 - - - - -] Agent rpc_loop - iteration:355070 completed. Processed ports statistics: {'regular': {'added': 0, 'updated': 0, 'removed': 0}}. Elapsed:0.005 loop_count_and_wait /usr/lib/python3/dist-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py:1875
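As a rough confirmation of the agent loop rate, the iterations can simply be counted from the same log (just a grep over the DEBUG lines above):
# grep -c 'Agent rpc_loop - iteration' /var/log/neutron/neutron-openvswitch-agent.log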
I'm not sure what other information to add here, but I'm happy to gather more diagnostic data to try to pin this down. I did come across https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047558.html, which concerns the same version and appears to describe the same symptoms.
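For example, assuming they are relevant here, the following should show OVS's internal event counters and the upcall handler/flow state:
# ovs-appctl coverage/show
# ovs-appctl upcall/show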
Below are various ovs-dpctl/ofctl show reports:
root@juju-df624b-4-lxd-10:~# ovs-dpctl show
system@ovs-system:
lookups: hit:223561120 missed:5768546 lost:798
flows: 131
masks: hit:2284286371 total:15 hit/pkt:9.96
port 0: ovs-system (internal)
port 1: br-ex (internal)
port 2: eth1
port 3: gre_sys (gre: packet_type=ptap)
port 4: br-tun (internal)
port 5: br-int (internal)
port 6: tapa062d6f1-40
port 7: tapf90e3ab6-13
port 8: tap45ba891c-4c
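If the actual datapath flows are of interest (the 131 flows reported above), they can be dumped as well:
root@juju-df624b-4-lxd-10:~# ovs-dpctl dump-flows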
root@juju-df624b-4-lxd-10:~# ovs-ofctl show br-int
OFPT_FEATURES_REPLY (xid=0x2): dpid:00003643edb09542
n_tables:254, n_buffers:0
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
1(int-br-ex): addr:4a:b4:cc:dd:aa:ac
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
2(patch-tun): addr:7a:36:76:1f:47:6e
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
3(tapa062d6f1-40): addr:92:d4:6c:26:55:0a
config: 0
state: 0
current: 10GB-FD COPPER
speed: 10000 Mbps now, 0 Mbps max
4(tapf90e3ab6-13): addr:9e:8b:4f:ae:8f:ba
config: 0
state: 0
current: 10GB-FD COPPER
speed: 10000 Mbps now, 0 Mbps max
5(tap45ba891c-4c): addr:76:45:39:c8:d7:e0
config: 0
state: 0
current: 10GB-FD COPPER
speed: 10000 Mbps now, 0 Mbps max
LOCAL(br-int): addr:36:43:ed:b0:95:42
config: PORT_DOWN
state: LINK_DOWN
speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
root@juju-df624b-4-lxd-10:~# ovs-ofctl show br-tun
OFPT_FEATURES_REPLY (xid=0x2): dpid:000092c6c8f4a545
n_tables:254, n_buffers:0
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
1(patch-int): addr:02:1d:bc:ec:0c:3b
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
2(gre-0a30029b): addr:fe:09:61:11:92:8b
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
3(gre-0a3002a2): addr:a2:64:93:48:e2:82
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
4(gre-0a30029d): addr:12:b1:73:32:19:d4
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
5(gre-0a3002ce): addr:ca:10:51:f2:0f:05
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
6(gre-0a3002a1): addr:de:fe:95:33:d9:67
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
7(gre-0a30029e): addr:5a:fa:ce:15:7b:5c
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
8(gre-0a3002c3): addr:b6:87:66:09:fc:04
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
10(gre-0a3002a0): addr:ce:79:f2:bf:a8:94
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
14(gre-0a30029f): addr:b6:b0:84:7d:9f:aa
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
18(gre-0a30029c): addr:da:14:12:54:1c:ec
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
LOCAL(br-tun): addr:92:c6:c8:f4:a5:45
config: PORT_DOWN
state: LINK_DOWN
speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
root@juju-df624b-4-lxd-10:~# ovs-ofctl show br-ex
OFPT_FEATURES_REPLY (xid=0x2): dpid:000000163e9a47a2
n_tables:254, n_buffers:0
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
1(eth1): addr:00:16:3e:9a:47:a2
config: 0
state: 0
current: 10GB-FD COPPER
speed: 10000 Mbps now, 0 Mbps max
2(phy-br-ex): addr:ce:a8:6a:25:55:0e
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
LOCAL(br-ex): addr:00:16:3e:9a:47:a2
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
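I can also attach the OpenFlow flow tables if they would help, e.g.:
root@juju-df624b-4-lxd-10:~# ovs-ofctl dump-flows br-int
root@juju-df624b-4-lxd-10:~# ovs-ofctl dump-flows br-tun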