[Bug 1827159] [NEW] check_all_disks includes squashfs /snap/* which are 100%
Launchpad Bug Tracker
1827159 at bugs.launchpad.net
Wed Jun 29 09:51:23 UTC 2022
You have been subscribed to a public bug by Hua Zhang (zhhuabj):
[Impact]
Monitoring tools generate false-positive reports when synthetic filesystems are mounted: these filesystems show 100% disk utilization and so add unnecessary (but dire-sounding) "DISK CRITICAL" noise.
[Test Case]
$ lxc launch ubuntu-daily:19.10/amd64 lp1827159
$ lxc exec lp1827159 bash
# apt-get -y update
# apt-get install monitoring-plugins
# snap install gnome-calculator
[...]
# /usr/lib/nagios/plugins/check_disk -w 10 -c 10
DISK CRITICAL - free space: / 1903 MB (1% inode=78%); /dev 0 MB (100% inode=99%); /dev/full 16018 MB (100% inode=99%); /dev/null 16018 MB (100% inode=99%); /dev/random 16018 MB (100% inode=99%); /dev/tty 16018 MB (100% inode=99%); /dev/urandom 16018 MB (100% inode=99%); /dev/zero 16018 MB (100% inode=99%); /dev/fuse 16018 MB (100% inode=99%); /dev/net/tun 16018 MB (100% inode=99%); /dev/lxd 0 MB (100% inode=99%); /dev/.lxd-mounts 0 MB (100% inode=99%); /dev/shm 16041 MB (100% inode=99%); /run 3208 MB (99% inode=99%); /run/lock 5 MB (100% inode=99%); /sys/fs/cgroup 16041 MB (100% inode=99%); /snap 1903 MB (1% inode=78%); /run/snapd/ns 3208 MB (99% inode=99%);| /=111171MB;119160;119160;0;119170 /dev=0MB;-10;-10;0;0 /dev/full=0MB;16008;16008;0;16018 /dev/null=0MB;16008;16008;0;16018 /dev/random=0MB;16008;16008;0;16018 /dev/tty=0MB;16008;16008;0;16018 /dev/urandom=0MB;16008;16008;0;16018 /dev/zero=0MB;16008;16008;0;16018 /dev/fuse=0MB;16008;16008;0;16018 /dev/net/tun=0MB;16008;16008;0;16018 /dev/lxd=0MB;-10;-10;0;0 /dev/.lxd-mounts=0MB;-10;-10;0;0 /dev/shm=0MB;16031;16031;0;16041 /run=0MB;3198;3198;0;3208 /run/lock=0MB;-5;-5;0;5 /sys/fs/cgroup=0MB;16031;16031;0;16041 /snap=111171MB;119160;119160;0;119170 /run/snapd/ns=0MB;3198;3198;0;3208
# /usr/lib/nagios/plugins/check_disk -w 10 -c 10 -e -X squashfs
DISK CRITICAL - free space: /dev 0 MB (100% inode=99%); /dev/lxd 0 MB (100% inode=99%); /dev/.lxd-mounts 0 MB (100% inode=99%); /run/lock 5 MB (100% inode=99%);| /=111392MB;119160;119160;0;119170 /dev=0MB;-10;-10;0;0 /dev/full=0MB;16008;16008;0;16018 /dev/null=0MB;16008;16008;0;16018 /dev/random=0MB;16008;16008;0;16018 /dev/tty=0MB;16008;16008;0;16018 /dev/urandom=0MB;16008;16008;0;16018 /dev/zero=0MB;16008;16008;0;16018 /dev/fuse=0MB;16008;16008;0;16018 /dev/net/tun=0MB;16008;16008;0;16018 /dev/lxd=0MB;-10;-10;0;0 /dev/.lxd-mounts=0MB;-10;-10;0;0 /dev/shm=0MB;16031;16031;0;16041 /run=0MB;3198;3198;0;3208 /run/lock=0MB;-5;-5;0;5 /sys/fs/cgroup=0MB;16031;16031;0;16041 /snap=111392MB;119160;119160;0;119170 /run/snapd/ns=0MB;3198;3198;0;3208
# /usr/lib/nagios/plugins/check_disk -w 10 -c 10 -e -X tmpfs
DISK OK| /=111171MB;119160;119160;0;119170 /dev/full=0MB;16008;16008;0;16018 /dev/null=0MB;16008;16008;0;16018 /dev/random=0MB;16008;16008;0;16018 /dev/tty=0MB;16008;16008;0;16018 /dev/urandom=0MB;16008;16008;0;16018 /dev/zero=0MB;16008;16008;0;16018 /dev/fuse=0MB;16008;16008;0;16018 /dev/net/tun=0MB;16008;16008;0;16018 /snap=111171MB;119160;119160;0;119170
[Regression Potential]
As this alters the logic of how out-of-space checks are handled, the regressions to watch for are filesystem checks that report improperly. These tools underlie a few different front-ends, so regression bugs may be filed in a few different places. They will tend to display error messages involving check_disk, nagios, and one of squashfs, tmpfs, or tracefs.
Note that there are likely other synthetic filesystems beyond tmpfs and
tracefs (e.g. udev, usbfs, devtmpfs, fuse.*, ...) which might also cause
similar false positives; these should be handled as separate bugs,
although they can likely be fixed the same way.
[Fix]
monitoring-plugins is modified to exclude the unwanted filesystems by default, in check_disk.c (see patch).
[Discussion]
Several bug reports have been filed about false positives with different synthetic filesystems (see Dupes), including tracefs, squashfs, and tmpfs. The commonly discussed workaround is to exclude these when running the tools (e.g. using the '-X <fs>' parameter for check_all_disks). Because the underlying tools are typically run through wrappers, a string of -X... parameters can be added there.
However, a cleaner solution is possible. monitoring-plugins'
check_disk.c maintains an internal exclusion list, fs_exclude_list,
which already excludes iso9660, and can be modified to add other
filesystems to exclude by default.
In other words, check_disk.c is modified as follows:
np_add_name(&fs_exclude_list, "iso9660");
np_add_name(&fs_exclude_list, "squashfs");
np_add_name(&fs_exclude_list, "tmpfs");
np_add_name(&fs_exclude_list, "tracefs");
This code is added before the command line parsing logic, and as such
simply sets the default behavior. It does not preclude adding or
removing further filesystems via the -X and -N parameters. Indeed,
anyone who wants tmpfs checked can add it back manually via
"-N tmpfs".
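
The "defaults first, command line second" ordering described above can be sketched as a small standalone C program fragment. This is only an illustration, not the code from check_disk.c: the struct name_node type and the add_name, find_name, remove_name, and build_exclude_list helpers are hypothetical stand-ins, and treating -N as "remove from the default list" is an assumption made for the sketch; the real plugin may implement -N differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the plugin's name list (not the real np_* API). */
struct name_node {
    char *name;
    struct name_node *next;
};

/* Prepend a copy of `name` to the list. */
void add_name(struct name_node **list, const char *name)
{
    assert(list != NULL && name != NULL);
    struct name_node *node = malloc(sizeof *node);
    if (node == NULL)
        abort();
    node->name = malloc(strlen(name) + 1);
    if (node->name == NULL)
        abort();
    strcpy(node->name, name);
    node->next = *list;
    *list = node;
}

/* Return true if `name` is present in the list. */
bool find_name(const struct name_node *list, const char *name)
{
    for (; list != NULL; list = list->next)
        if (strcmp(list->name, name) == 0)
            return true;
    return false;
}

/* Remove every entry equal to `name` from the list. */
void remove_name(struct name_node **list, const char *name)
{
    while (*list != NULL) {
        if (strcmp((*list)->name, name) == 0) {
            struct name_node *dead = *list;
            *list = dead->next;
            free(dead->name);
            free(dead);
        } else {
            list = &(*list)->next;
        }
    }
}

/* Seed the defaults first, then let command-line options adjust them. */
struct name_node *build_exclude_list(int argc, char *const argv[])
{
    struct name_node *exclude = NULL;

    /* Defaults, set before any option is looked at. */
    add_name(&exclude, "iso9660");
    add_name(&exclude, "squashfs");
    add_name(&exclude, "tmpfs");
    add_name(&exclude, "tracefs");

    /* Assumption for this sketch: -X adds a type, -N removes one. */
    for (int i = 0; i + 1 < argc; i++) {
        if (strcmp(argv[i], "-X") == 0)
            add_name(&exclude, argv[++i]);
        else if (strcmp(argv[i], "-N") == 0)
            remove_name(&exclude, argv[++i]);
    }
    return exclude;
}
```

With the options "-X udev -N tmpfs", this sketch would leave iso9660, squashfs, tracefs, and udev excluded while tmpfs is checked again. Because the defaults are seeded before the option loop, the user's options always get the final say.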
[Original Report]
When using nagios to monitor the Nagios host itself, if the host is not a container, the template for checking the disk space on the Nagios host does not exclude any snap filesystems. This means we get a Critical report if any snap is installed.
This can be changed by adding to the check_all_disks command a '-X
squashfs', but that command is defined in the nagios plugins package.
(Or, perhaps '-X tmpfs'? -- bryce)
** Affects: charm-nagios
Importance: Undecided
Status: Fix Released
** Affects: coreutils (Ubuntu)
Importance: Undecided
Status: Fix Released
** Affects: monitoring-plugins (Ubuntu)
Importance: Low
Assignee: Bryce Harrington (bryce)
Status: Fix Released
** Tags: canonical-bootstack patch server-next
--
check_all_disks includes squashfs /snap/* which are 100%
https://bugs.launchpad.net/bugs/1827159
You received this bug notification because you are a member of Ubuntu Sponsors Team, which is subscribed to the bug report.