Help! 10.04 LTSP configuration using only one nic

Gavin McCullagh gmccullagh at gmail.com
Wed Sep 7 16:18:57 UTC 2011


Hi,

On Tue, 06 Sep 2011, Jim Christiansen wrote:

> My old Centos LTSP server for our Library died near the end of June.  My
> students had been playing with a new 10.04 64 setup and had it serving 32
> bit fat clients, but really slowly.  One of the students altered something
> in iptables to make it function and I wonder if this could be the problem.
>  Grepping my history for iptables shows:
> 
>  49  sudo iptables --table nat --append POSTROUTING --jump MASQUERADE --source 192.168.1.0/24
>  50  sudo sh -c 'iptables-save > /etc/ltsp/nat'

The first command enables network address translation (masquerading) for
routed traffic which passes through the LTSP server and which comes from the
192.168.1.0/24 subnet.  The second command saves the rules so that they get
enabled again after a reboot.  If you open /etc/ltsp/nat you should see the
full firewall config.
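
If the file is long, a small helper like the one below might make the NAT
part easier to read.  It's only a sketch: show_nat_rules is a name I've made
up, and it assumes the file is in the usual iptables-save layout, where each
table starts with a "*name" line and ends with "COMMIT".

```shell
# Print only the nat table from an iptables-save dump.
# Usage: show_nat_rules /etc/ltsp/nat
show_nat_rules() {
    # Keep the lines from "*nat" up to the next "COMMIT", inclusive.
    sed -n '/^\*nat$/,/^COMMIT$/p' "$1"
}
```

A line containing "-j MASQUERADE" in that output would be the rule added by
the first command in your history.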

> This was done, apparently, to allow the system to function with one nic.

If your LTSP server has a single interface and your router is a separate
unit, also on 192.168.1.0/24 (which seems likely), then this is probably not
what you want to do.  With one interface, I'd ordinarily suggest configuring
DHCP to point the clients at the router as their default gateway and letting
the router do the NAT.
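
To see which gateway your DHCP server currently hands out, something like
this might help.  It's a sketch which assumes ISC dhcpd syntax ("option
routers <address>;"); dhcp_routers is a made-up name, and the path in the
usage line is my guess at where an LTSP install keeps its dhcpd config, so
adjust it to wherever yours lives.

```shell
# Print the default gateway(s) that a dhcpd.conf hands to clients.
# Usage: dhcp_routers /etc/ltsp/dhcpd.conf    (path is an assumption)
dhcp_routers() {
    # Drop commented-out lines, then extract the value of "option routers".
    grep -v '^[[:space:]]*#' "$1" |
        sed -n 's/^[[:space:]]*option routers[[:space:]]*\(.*\);.*$/\1/p'
}
```

If that prints the LTSP server's own address rather than the router's, the
clients are sending their traffic through the server, which would explain
why the masquerade rule was added.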

To fix this you could just comment out the NAT rule above in /etc/ltsp/nat
and reboot.
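
If you'd rather not edit the file by hand, a sed one-liner along these lines
should do it.  This is a sketch only: comment_out_masquerade is a made-up
name, and it assumes the rule appears in iptables-save form, i.e. a line
starting with "-A POSTROUTING" and containing "-j MASQUERADE".  Lines
starting with "#" are treated as comments by iptables-restore.

```shell
# Print an iptables-save dump with any MASQUERADE rule commented out.
# Usage: comment_out_masquerade /etc/ltsp/nat > /tmp/nat.new
comment_out_masquerade() {
    # "&" stands for the whole matched line, so this just prefixes it with "#".
    sed 's/^-A POSTROUTING .*-j MASQUERADE.*$/#&/' "$1"
}
```

It writes to standard output rather than changing the file, so you can
compare the result with the original (and keep a backup) before copying it
into place.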

In fact, I'm not sure it's wise to run iptables at all on a 1-interface
LTSP server so you might want to comment out that entire file and reboot.

> The system is sitting on a 100 megabit network with 26 clients.  Only 1/3 to
> 2/3 of the clients will boot right off.  The others will linger with four
> little streaming dots in the middle of the screen for minutes until the log
> in screen appears or they fail with errors:
> 
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> INFO: task modprobe:436 blocked for more than 120 seconds.
> "echo 0 >... same as 1st line
> INFO: task udev-configure-:936 blocked for more than 120 seconds.
> "echo 0 > ...same as 1st line
> INFO: task hdparm:1020 blocked for more than 120 seconds.
> "echo 0 > ...same as 1st line
> INFO: task S32ltsp-client-:1027 blocked for more than 120 seconds.

So it appears that:

1. The initial PXE DHCP works.
2. The kernel loads over the network and a root filesystem (possibly
   initramfs or an NFS root) is mounted.
3. The running kernel is trying to load a module (modprobe) which is
   timing out.  At a guess, perhaps it's not getting a response from the
   remote filesystem.

If you can work out the IP address of this client and run

	 sudo tcpdump -n -i eth0 host <ip_of_client>

you may be able to see the packets being sent to the server from the client
and get an idea what's going wrong.  You might see that the firewall is
blocking incoming NFS mount attempts.
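
One way to work out the client's IP address is to pull the leases out of the
DHCP server's log.  The sketch below assumes ISC dhcpd, which logs each lease
as "DHCPACK on <ip> to <mac> ... via <iface>"; acked_clients is a made-up
name, and the log file in the usage line is an assumption about where your
syslog ends up.

```shell
# Read a syslog on stdin and print "ip mac" for every lease dhcpd has acked.
# Usage: acked_clients < /var/log/syslog    (log path is an assumption)
acked_clients() {
    # Extract the address and MAC from each DHCPACK line, then de-duplicate.
    sed -n 's/.*DHCPACK on \([0-9.]*\) to \([0-9a-fA-F:]*\).*/\1 \2/p' | sort -u
}
```

Match the MAC address of a stuck client against that list and use the
corresponding IP in the tcpdump command above.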

Also look at the logs on the server for anything relevant.

> It doesn't seem any better if I boot fewer clients or more... They just
> don't all start up reliably.

Hmm.  Unreliable suggests either that you're hitting a limit (i.e. clients
work until you reach that limit), or that something only works with a bit of
luck.  One possibility is that some NFS connections are getting through the
firewall rules and some aren't.  NFS uses a range of ports, so disabling
iptables is worth a go.
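
For a quick test without touching any files, you could open the firewall up
by hand along these lines.  It's a sketch: open_firewall and the IPT variable
are names I've made up, and it assumes a Bourne-style shell (sh or bash).
The change only lasts until the rules are next reloaded, e.g. at reboot.

```shell
# IPT can be overridden (e.g. IPT=echo) to preview the commands
# without running them.
IPT=${IPT:-"sudo iptables"}

# Set every built-in filter chain to ACCEPT and flush the filter and nat rules.
open_firewall() {
    for chain in INPUT FORWARD OUTPUT; do
        $IPT -P "$chain" ACCEPT
    done
    $IPT -F           # flush the filter table
    $IPT -t nat -F    # flush the nat table (removes the MASQUERADE rule)
}
```

Running "sudo iptables-save" first and keeping the output somewhere safe
gives you a copy of the current rules to go back to.  Then call
open_firewall and try booting the clients again.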

Let us know how you get on.

Gavin

More information about the edubuntu-users mailing list