[Bug 1844455] Re: Memory leak of struct _virPCIDeviceAddress on libvirt

Guilherme G. Piccoli 1844455 at bugs.launchpad.net
Fri Feb 28 15:26:44 UTC 2020


The function that allocates the structure _virPCIDeviceAddress is:
virPCIGetDeviceAddressFromSysfsLink().

This function is called in the following path, specifically in
virPCIGetPhysicalFunction(), which is itself called from
nodeDeviceSysfsGetPCISRIOVCaps().

[0]             [1]            [2]
-----------------------------------
                 |
                 |
nodeDeviceSysfsGetPCIRelatedDevCaps()
  nodeDeviceSysfsGetPCISRIOVCaps()

Since the tree is a bit large, I've split in 3 parts: [0], [1] and [2].
In that terminology, the bottom function in each stack calls nodeDeviceSysfsGetPCIRelatedDevCaps().


[0] src/node_device/node_device_driver.c

virNodeDeviceGetXMLDesc() [libvirt-nodedev.c]
  virNodeDeviceDriver node_device_driver callback <.nodeDeviceGetXMLDesc(), in udev/hal/remote>
    nodeDeviceGetXMLDesc()
      update_caps()


[1] src/node_device/node_device_udev.c
   virStateDriver udevStateDriver [callback register, stateInitialize()]
                               |
            --------------------------------------
            |                                    |
            |                           nodeStateInitialize()
            |                                    |
   nodeStateInitialize()              udevEnumerateDevices()
               |                                 |
udevEventHandleCallback()   ||   udevProcessDeviceListEntry
  udevAddOneDevice()
    udevGetDeviceDetails()
      udevProcessPCI()


[2] src/node_device/node_device_hal.c << deprecated >>
virStateDriver halStateDriver callbacks register [stateInitialize(), stateInitialize()]
                      |           libhal_ctx_set_device_added()     |-> multiple calls in this file 
         -----------------------                 |                  |
         |                     |                 |                  |
nodeStateInitialize() || nodeStateReload || device_added() || dev_refresh()
         |
      dev_create()      libhal_ctx_set_device_new_capability() <HAL callback>
         |                       |
gather_capabilities() || device_cap_added()
  gather_capability()
    caps_tbl[VIR_NODE_DEV_CAP_PCI_DEV]() <gather_fn ptr>
      gather_pci_cap()


We can skip the path [2] given that HAL is deprecated and only udev is available on Ubuntu. Trying to follow the path [1] led us to exercise the allocation of the structure responsible for the leak in question, but at the same time, udevEventHandleCallback() is able to deallocate the variable.
The trigger is the following:

while true; do udevadm trigger; done

It goes to the aforementioned described udev path ([1]), and allocates
one instance of the PCI _virPCIDeviceAddress structure per device; it is
deallocated though according to the following backtrace collected with
gdb:


#0  virNodeDevCapsDefFree (caps=0x558d824d6ba0) at ../../../src/conf/node_device_conf.c:1709
#1  0x00007f7f1244db6c in virNodeDeviceDefFree (def=0x558d824c29c0) at ../../../src/conf/node_device_conf.c:146
#2  0x00007f7f1244fe99 in virNodeDeviceAssignDef (devs=0x7f7ee00e68b8, def=0x558d824cde50)
    at ../../../src/conf/node_device_conf.c:182
#3  0x00007f7eebe1efae in udevAddOneDevice (device=device at entry=0x558d824d85e0)
    at ../../../src/node_device/node_device_udev.c:1402
#4  0x00007f7eebe202f8 in udevEventHandleCallback (watch=watch at entry=7, fd=<optimized out>, events=events at entry=1,
    data=data at entry=0x0) at ../../../src/node_device/node_device_udev.c:1546
#5  0x00007f7f12395cf0 in virEventPollDispatchHandles (fds=<optimized out>, nfds=<optimized out>)
    at ../../../src/util/vireventpoll.c:509
#6  virEventPollRunOnce () at ../../../src/util/vireventpoll.c:658
#7  0x00007f7f12394332 in virEventRunDefaultImpl () at ../../../src/util/virevent.c:314
#8  0x00007f7f1250789d in virNetDaemonRun (dmn=0x558d8249f340) at ../../../src/rpc/virnetdaemon.c:701


Remains to us reproduce the issue with path [0]. To exercise that path, we can run the following command:

while true; do virsh nodedev-dumpxml pci_0000_08_12_0 >/dev/null; done

It worked, I was able to observe the following stack on Valgrind when
exercising the leak path:

16,752 bytes in 1,047 blocks are definitely lost in loss record 580 of 586
at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x50A6283: virAlloc (viralloc.c:144)
by 0x50F1282: virPCIGetDeviceAddressFromSysfsLink (virpci.c:2453)
by 0x50F1417: virPCIGetPhysicalFunction (virpci.c:2486)
by 0x2BD50141: nodeDeviceSysfsGetPCISRIOVCaps (node_device_linux_sysfs.c:157)
by 0x2BD50141: nodeDeviceSysfsGetPCIRelatedDevCaps (node_device_linux_sysfs.c:225)
by 0x2BD4F1D0: update_caps (node_device_driver.c:66)
by 0x2BD4F1D0: nodeDeviceGetXMLDesc (node_device_driver.c:346)
by 0x51C838E: virNodeDeviceGetXMLDesc (libvirt-nodedev.c:292)
by 0x138D6E: remoteDispatchNodeDeviceGetXMLDesc (remote_dispatch.h:12746)
by 0x138D6E: remoteDispatchNodeDeviceGetXMLDescHelper (remote_dispatch.h:12720)
by 0x5213FE8: virNetServerProgramDispatchCall (virnetserverprogram.c:437)
by 0x5213FE8: virNetServerProgramDispatch (virnetserverprogram.c:307)
by 0x520F4F7: virNetServerProcessMsg (virnetserver.c:135)
by 0x520F4F7: virNetServerHandleJob (virnetserver.c:156)
by 0x5106565: virThreadPoolWorker (virthreadpool.c:145)
by 0x5105AE7: virThreadHelper (virthread.c:206)


This is for my libvirt 1.3.1 (Xenial) testing, but I've noticied also in Bionic/Eoan testing. Also, 0000:08:12.0 is a Virtual Function, so the reproducer requires a SR-IOV capable PCI device.
Not only the virsh nodedev command is capable of exercising the leak path; Openstack/Nova composes XMLs for PCI functions and can run libvirt API and so cause the same effect.

-- 
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1844455

Title:
  Memory leak of struct _virPCIDeviceAddress on libvirt

Status in Ubuntu Cloud Archive:
  Confirmed
Status in Ubuntu Cloud Archive mitaka series:
  Confirmed
Status in libvirt package in Ubuntu:
  Fix Released
Status in libvirt source package in Xenial:
  Confirmed
Status in libvirt source package in Bionic:
  Confirmed
Status in libvirt source package in Eoan:
  Confirmed
Status in libvirt source package in Focal:
  Fix Released

Bug description:
  It was reported that libvirt 1.3.1 running on Trusty (through
  UCA/Mitaka) is getting OOM'ed after a while - in our reports took 2
  years for the leak to trigger an out-of-memory situation, but this may
  change according to the user available memory.

  Valgrind was executed in a similar environment, we were able to
  collect information about the "definitely lost" memory of libvirt
  process (attached) below.

  The leaks are detailed in next comments.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1844455/+subscriptions



More information about the Ubuntu-openstack-bugs mailing list