root filesystem full and having trouble clearing it!!!

Gavin McCullagh gmccullagh at gmail.com
Fri Mar 16 13:55:51 UTC 2007


Hi,

[ I've found a fix, but I'd like to send this anyway, to ask what the
  correct solution is ]

We're running Edgy for a number of thin clients (about 20 just now).

I'm having a nasty problem and I can't spot the reason.  The root
filesystem is full so users can't login.  However, I can't pinpoint where
the data is to free it up.

gavinmc at medlycott:~$ df -h 
Filesystem            Size  Used Avail Use% Mounted on
/dev/md0              9.2G  8.8G  416K 100% /
varrun                2.0G  168K  2.0G   1% /var/run
varlock               2.0G     0  2.0G   0% /var/lock
procbususb             10M  108K  9.9M   2% /proc/bus/usb
udev                   10M  108K  9.9M   2% /dev
devshm                2.0G     0  2.0G   0% /dev/shm
/dev/md2              115G  5.0G  104G   5% /backups
/dev/md1              9.2G  335M  8.4G   4% /var
brooks:/home          130G  5.5G  118G   5% /home
brooks:/shared        130G  5.5G  118G   5% /shared
ltspfs                125M   16K  125M   1% /tmp/.jmcclean-ltspfs/floppy0
ltspfs                 38G  820M   37G   3% /tmp/.jmcclean-ltspfs/atadisk-hda1

Looking at the disk usage (I've snipped out /dev and /proc), I can't see
where all the space has gone.

gavinmc at medlycott:~$ sudo du -hs /*
4.8G    /backups
3.5M    /bin
36M     /boot
0       /cdrom
13M     /etc
2.3G    /home
4.0K    /initrd
0       /initrd.img
0       /initrd.img.old
259M    /lib
48K     /lost+found
44K     /media
4.0K    /mnt
350M    /opt
905M    /proc
180K    /root
5.6M    /sbin
2.4M    /shared
4.0K    /srv
0       /sys
du: cannot access `/tmp/.jmcclean-ltspfs/floppy0': Permission denied
du: cannot access `/tmp/.jmcclean-ltspfs/atadisk-hda1': Permission denied
86M     /tmp
2.3G    /usr
207M    /var
0       /vmlinuz
0       /vmlinuz.old
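
One thing to keep in mind when reading the listing above: du descends into
mount points, so the figures for /backups, /home, /shared and /var belong to
the other filesystems in the df output, not to /dev/md0. A rough sketch that
stays on a single filesystem, assuming GNU du (the helper name usage_on_fs is
made up for illustration):

```shell
# usage_on_fs is a hypothetical helper name, not an existing tool.
#   -x  stay on the filesystem that the starting directory lives on
#   -k  report sizes in kilobytes so that sort -n orders them correctly
usage_on_fs() {
    du -xk --max-depth=1 "${1:-/}" 2>/dev/null | sort -n
}

# For the root filesystem the du part has to run as root, e.g.:
#   sudo du -xk --max-depth=1 / | sort -n
```

If the per-directory totals still fall well short of what df reports as used,
the remainder is probably space held by files that have been deleted but are
still open.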

/tmp was very full, so I removed a large number of files named
/tmp/fileXXXXXXX, each of which was 32MB in size (I gather they're nbd swap
files?).  However, removing them has not freed up any space.  I presume this
is because some process (nbd-server) still has them open?
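
A quick way to check that presumption on Linux is to look for file
descriptors whose target has been unlinked; the kernel marks these with a
" (deleted)" suffix under /proc/PID/fd. The following is only a sketch, and
the function name list_deleted_open is made up for illustration:

```shell
# list_deleted_open is a hypothetical helper name.
# Prints every open file descriptor that points at a deleted file.
# Run as root to see descriptors of other users' processes; unreadable
# entries are skipped silently.
list_deleted_open() {
    for fd in /proc/[0-9]*/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            *' (deleted)') printf '%s -> %s\n' "$fd" "$target" ;;
        esac
    done
}

# Where lsof is installed, "lsof +L1" gives a similar list
# (open files with a link count below 1).
```

The space is only returned to the filesystem once the last process holding
such a descriptor closes it or exits.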

[My Solution]

There were tonnes of old processes running under the nbdserver user each
of which looked like:

nobody    6776  0.0  0.0   1656   468 ?        S    Mar15   0:00 /bin/sh /usr/sbin/nbdswapd
nobody    6779  0.0  0.0   3248   740 ?        S    Mar15   0:00 /bin/nbd-server 0 /tmp/fileHiJv50 

Many of them were still there from February.  I can't see why that would be, but it
looks like they never stopped when the thin client went down.  So, I used
this to kill the February ones:

	ps aux |grep nbd |grep Feb | awk '{ print $2}' | xargs sudo kill
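
A slightly more careful variant of the same idea, as a sketch only: it matches
the month against the START column of "ps aux" output (field 9) instead of
anywhere on the line, so a "Feb" elsewhere in a command line cannot trigger a
kill. The function name old_nbd_pids is made up for illustration, and
"xargs -r" is a GNU extension.

```shell
# old_nbd_pids is a hypothetical helper name.
# Reads "ps aux" style lines on stdin and prints the PID (field 2) of
# every line that mentions nbd and whose START column (field 9) begins
# with the given month abbreviation, e.g. Feb.
old_nbd_pids() {
    awk -v month="$1" '($9 ~ ("^" month)) && /nbd/ { print $2 }'
}

# Dry run first, then kill:
#   ps aux | old_nbd_pids Feb
#   ps aux | old_nbd_pids Feb | xargs -r sudo kill
```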

I have now freed up 2.6GB of space and all is going back to normal for now.
However, it appears this is going to bite us again in a week or two.  There
are currently 103 such processes, which is about five times our total number
of thin clients.

Can someone explain why these processes are hanging around using up so much
disk space?  Is it a bug or something we've done wrong?

Can I and should I put the network swap files on a different partition?
Should we just turn off network swap?

Gavin





More information about the edubuntu-devel mailing list