[Bug 1317705] Re: Commissioning x86_64 node never completes, sitting at grub prompt, pserv py tbs
Andres Rodriguez
andreserl at ubuntu-pe.org
Mon Jul 20 09:46:24 UTC 2015
** Description changed:
+ [Impact]
+ When TFTP booting with UEFI, the TFTP server would stack trace when terminating the transfer. This would lead to some UEFI boot issues.
+
+ [Test Case]
+ 1. Install MAAS
+ 2. Setup UEFI on machine to PXE boot from MAAS
+ 3. UEFI boot machine, it will fail as tftp crashes.
+
+ 4. With fix, UEFI boot machine, it will succeed as tftp doesn't crash.
+
+ [Regression Potential]
+ Minimal. This has been tested and QA'd, and proven to be working as expected.
+
ubuntu 14.04LTS + MaaS 1.5 on x86_64
Controller:
esxi vm xeon + vmnet3/ixgbe
- Nodes:
+ Nodes:
supermicro twinblades
Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
128GB RAM
2@ ige
2@ ixgbe <<< used for PXE booting
Trying to add physical nodes configured for Trusty Tahr amd64. IPMI
powerctl cycles the node, tftp's two boot files, then commissioning goes
out to lunch:
15:12:11.465976 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 00:25:90:e5:5a:56 (oui Unknown), length 359
15:12:11.468982 IP 172.30.193.38.bootps > 255.255.255.255.bootpc: BOOTP/DHCP, Reply, length 300
15:12:11.475270 IP 172.30.255.101.1294 > 172.30.193.38.tftp: 41 RRQ "bootx64.efi" octet tsize 0 blksize 1468
15:12:11.535326 IP 172.30.255.101.1295 > 172.30.193.38.tftp: 33 RRQ "bootx64.efi" octet blksize 1468
15:12:12.024716 IP 172.30.255.101.1296 > 172.30.193.38.tftp: 33 RRQ "/grubx64.efi" octet blksize 512
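For readers decoding the capture above, here is a minimal sketch of how a TFTP read request (RRQ) is laid out on the wire: a 2-byte opcode, then NUL-terminated filename and mode strings (RFC 1350), then optional NUL-terminated name/value option pairs (RFC 2347). The function name `parse_rrq` and the sample packet are illustrative assumptions, not code from python-tx-tftp or MAAS, and the sketch is written for Python 3 rather than the Python 2.7 shown in the tracebacks. A packet built this way from the first request comes to 41 bytes, which appears to match the length tcpdump prints before `RRQ`.

```python
import struct

OP_RRQ = 1  # read request opcode, per RFC 1350


def parse_rrq(packet):
    """Split a raw TFTP RRQ into (filename, mode, options).

    Hypothetical helper for illustration only.
    """
    if len(packet) < 2:
        raise ValueError("packet too short")
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode != OP_RRQ:
        raise ValueError("not an RRQ: opcode %d" % opcode)
    # Everything after the opcode is a run of NUL-terminated strings.
    fields = packet[2:].split(b"\x00")
    if fields[-1] != b"":
        raise ValueError("RRQ is not NUL-terminated")
    fields.pop()  # drop the empty piece after the final NUL
    # Filename and mode are mandatory; options must come in pairs.
    if len(fields) < 2 or len(fields) % 2:
        raise ValueError("malformed RRQ")
    filename = fields[0].decode("ascii")
    mode = fields[1].decode("ascii").lower()
    raw = fields[2:]
    options = {
        raw[i].decode("ascii").lower(): raw[i + 1].decode("ascii")
        for i in range(0, len(raw), 2)
    }
    return filename, mode, options


# Sample packet reconstructed from the first request in the capture.
SAMPLE = b"\x00\x01bootx64.efi\x00octet\x00tsize\x000\x00blksize\x001468\x00"

if __name__ == "__main__":
    print(len(SAMPLE), parse_rrq(SAMPLE))
```
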
These tb's coincide with above traffic and node sitting at the grub
prompt indefinitely:
2014-05-08 15:12:11-0700 [-] Starting protocol <tftp.bootstrap.RemoteOriginReadSession instance at 0x7fbbe469a098>
2014-05-08 15:12:11-0700 [RemoteOriginReadSession (UDP)] Unhandled Error
- Traceback (most recent call last):
- File "/usr/lib/python2.7/dist-packages/twisted/python/log.py", line 73, in callWithContext
- return context.call({ILogContext: newCtx}, func, *args, **kw)
- File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 118, in callWithContext
- return self.currentContext().callWithContext(ctx, func, *args, **kw)
- File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 81, in callWithContext
- return func(*args,**kw)
- File "/usr/lib/python2.7/dist-packages/twisted/internet/posixbase.py", line 614, in _doReadOrWrite
- why = selectable.doRead()
- --- <exception caught here> ---
- File "/usr/lib/python2.7/dist-packages/twisted/internet/udp.py", line 234, in doRead
- self.protocol.datagramReceived(data, addr)
- File "/usr/lib/python2.7/dist-packages/tftp/bootstrap.py", line 171, in datagramReceived
- datagram = TFTPDatagramFactory(*split_opcode(datagram))
- File "/usr/lib/python2.7/dist-packages/tftp/datagram.py", line 394, in __call__
- return datagram_class.from_wire(payload)
- File "/usr/lib/python2.7/dist-packages/tftp/datagram.py", line 323, in from_wire
- raise InvalidErrorcodeError(errorcode)
- tftp.errors.InvalidErrorcodeError: Unknown error code: 8
-
+ Traceback (most recent call last):
+ File "/usr/lib/python2.7/dist-packages/twisted/python/log.py", line 73, in callWithContext
+ return context.call({ILogContext: newCtx}, func, *args, **kw)
+ File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 118, in callWithContext
+ return self.currentContext().callWithContext(ctx, func, *args, **kw)
+ File "/usr/lib/python2.7/dist-packages/twisted/python/context.py", line 81, in callWithContext
+ return func(*args,**kw)
+ File "/usr/lib/python2.7/dist-packages/twisted/internet/posixbase.py", line 614, in _doReadOrWrite
+ why = selectable.doRead()
+ --- <exception caught here> ---
+ File "/usr/lib/python2.7/dist-packages/twisted/internet/udp.py", line 234, in doRead
+ self.protocol.datagramReceived(data, addr)
+ File "/usr/lib/python2.7/dist-packages/tftp/bootstrap.py", line 171, in datagramReceived
+ datagram = TFTPDatagramFactory(*split_opcode(datagram))
+ File "/usr/lib/python2.7/dist-packages/tftp/datagram.py", line 394, in __call__
+ return datagram_class.from_wire(payload)
+ File "/usr/lib/python2.7/dist-packages/tftp/datagram.py", line 323, in from_wire
+ raise InvalidErrorcodeError(errorcode)
+ tftp.errors.InvalidErrorcodeError: Unknown error code: 8
+
2014-05-08 15:12:11-0700 [RemoteOriginReadSession (UDP)] Logged OOPS id OOPS-20c0e9854c8b0ef29998d4a27454fc6a: InvalidErrorcodeError: Unknown error code: 8
2014-05-08 15:12:11-0700 [TFTP (UDP)] Datagram received from ('172.30.255.101', 1295): <RRQDatagram(filename=bootx64.efi, mode=octet, options={'blksize': '1468'})>
2014-05-08 15:12:11-0700 [TFTP (UDP)] Datagram received from ('172.30.255.101', 1295): <RRQDatagram(filename=bootx64.efi, mode=octet, options={'blksize': '1468'})>
2014-05-08 15:12:11-0700 [-] RemoteOriginReadSession starting on 43143
2014-05-08 15:12:11-0700 [-] RemoteOriginReadSession starting on 43143
2014-05-08 15:12:11-0700 [-] Starting protocol <tftp.bootstrap.RemoteOriginReadSession instance at 0x7fbbe469aea8>
2014-05-08 15:12:11-0700 [-] Starting protocol <tftp.bootstrap.RemoteOriginReadSession instance at 0x7fbbe469aea8>
2014-05-08 15:12:12-0700 [RemoteOriginReadSession (UDP)] Final ACK received, transfer successful
2014-05-08 15:12:12-0700 [RemoteOriginReadSession (UDP)] Final ACK received, transfer successful
2014-05-08 15:12:12-0700 [-] (UDP Port 43143 Closed)
2014-05-08 15:12:12-0700 [-] (UDP Port 43143 Closed)
2014-05-08 15:12:12-0700 [-] Stopping protocol <tftp.bootstrap.RemoteOriginReadSession instance at 0x7fbbe469aea8>
2014-05-08 15:12:12-0700 [-] Stopping protocol <tftp.bootstrap.RemoteOriginReadSession instance at 0x7fbbe469aea8>
2014-05-08 15:12:12-0700 [TFTP (UDP)] Datagram received from ('172.30.255.101', 1296): <RRQDatagram(filename=/grubx64.efi, mode=octet, options={'blksize': '512'})>
2014-05-08 15:12:12-0700 [TFTP (UDP)] Datagram received from ('172.30.255.101', 1296): <RRQDatagram(filename=/grubx64.efi, mode=octet, options={'blksize': '512'})>
2014-05-08 15:12:12-0700 [-] RemoteOriginReadSession starting on 56400
2014-05-08 15:12:12-0700 [-] RemoteOriginReadSession starting on 56400
2014-05-08 15:12:12-0700 [-] Starting protocol <tftp.bootstrap.RemoteOriginReadSession instance at 0x7fbbe469a440>
2014-05-08 15:12:12-0700 [-] Starting protocol <tftp.bootstrap.RemoteOriginReadSession instance at 0x7fbbe469a440>
2014-05-08 15:12:12-0700 [RemoteOriginReadSession (UDP)] (UDP Port 41252 Closed)
2014-05-08 15:12:12-0700 [RemoteOriginReadSession (UDP)] (UDP Port 41252 Closed)
2014-05-08 15:12:12-0700 [RemoteOriginReadSession (UDP)] Stopping protocol <tftp.bootstrap.RemoteOriginReadSession instance at 0x7fbbe469a098>
2014-05-08 15:12:12-0700 [RemoteOriginReadSession (UDP)] Stopping protocol <tftp.bootstrap.RemoteOriginReadSession instance at 0x7fbbe469a098>
2014-05-08 15:12:12-0700 [RemoteOriginReadSession (UDP)] Final ACK received, transfer successful
2014-05-08 15:12:12-0700 [RemoteOriginReadSession (UDP)] Final ACK received, transfer successful
2014-05-08 15:12:12-0700 [-] (UDP Port 56400 Closed)
2014-05-08 15:12:12-0700 [-] (UDP Port 56400 Closed)
2014-05-08 15:12:12-0700 [-] Stopping protocol <tftp.bootstrap.RemoteOriginReadSession instance at 0x7fbbe469a440>
2014-05-08 15:12:12-0700 [-] Stopping protocol <tftp.bootstrap.RemoteOriginReadSession instance at 0x7fbbe469a440>
2014-05-08 15:12:13-0700 [-] Unhandled Error
- Traceback (most recent call last):
- File "/usr/lib/python2.7/dist-packages/twisted/application/app.py", line 392, in startReactor
- self.config, oldstdout, oldstderr, self.profiler, reactor)
- File "/usr/lib/python2.7/dist-packages/twisted/application/app.py", line 313, in runReactorWithLogging
- reactor.run()
- File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1192, in run
- self.mainLoop()
- File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1201, in mainLoop
- self.runUntilCurrent()
- --- <exception caught here> ---
- File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 824, in runUntilCurrent
- call.func(*call.args, **call.kw)
- File "/usr/lib/python2.7/dist-packages/tftp/util.py", line 80, in _call_and_schedule
- self.callable(*self.callable_args, **self.callable_kwargs)
- File "/usr/lib/python2.7/dist-packages/twisted/internet/udp.py", line 254, in write
- return self.socket.send(datagram)
- exceptions.AttributeError: 'Port' object has no attribute 'socket'
-
- 2014-05-08 15:12:13-0700 [-] Logged OOPS id OOPS-4ad4c1419556eb88cc72311fd54f737b: AttributeError: 'Port' object has no attribute 'socket'
+ Traceback (most recent call last):
+ File "/usr/lib/python2.7/dist-packages/twisted/application/app.py", line 392, in startReactor
+ self.config, oldstdout, oldstderr, self.profiler, reactor)
+ File "/usr/lib/python2.7/dist-packages/twisted/application/app.py", line 313, in runReactorWithLogging
+ reactor.run()
+ File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1192, in run
+ self.mainLoop()
+ File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1201, in mainLoop
+ self.runUntilCurrent()
+ --- <exception caught here> ---
+ File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 824, in runUntilCurrent
+ call.func(*call.args, **call.kw)
+ File "/usr/lib/python2.7/dist-packages/tftp/util.py", line 80, in _call_and_schedule
+ self.callable(*self.callable_args, **self.callable_kwargs)
+ File "/usr/lib/python2.7/dist-packages/twisted/internet/udp.py", line 254, in write
+ return self.socket.send(datagram)
+ exceptions.AttributeError: 'Port' object has no attribute 'socket'
+ 2014-05-08 15:12:13-0700 [-] Logged OOPS id OOPS-
+ 4ad4c1419556eb88cc72311fd54f737b: AttributeError: 'Port' object has no
+ attribute 'socket'
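The first traceback above ends in `InvalidErrorcodeError: Unknown error code: 8`. RFC 1350 defines TFTP error codes 0 through 7; RFC 2347 adds code 8, which a client sends to terminate a transfer after option negotiation. That is consistent with the first request above, which carries `tsize 0` and is immediately followed by a fresh request for the same file. The sketch below shows one way a parser could accept code 8 instead of raising. It is an assumption-laden illustration, not the fix that was applied to python-tx-tftp: the names `parse_error` and `ERROR_CODES` are hypothetical, the description for code 8 is a paraphrase, and the sample message text is invented.

```python
import struct

OP_ERROR = 5  # ERROR opcode, per RFC 1350

# Codes 0-7 are listed in RFC 1350. Code 8 comes from RFC 2347; its
# description here is a paraphrase, not quoted text.
ERROR_CODES = {
    0: "Not defined, see error message",
    1: "File not found",
    2: "Access violation",
    3: "Disk full or allocation exceeded",
    4: "Illegal TFTP operation",
    5: "Unknown transfer ID",
    6: "File already exists",
    7: "No such user",
    8: "Option negotiation terminated",
}


def parse_error(packet):
    """Decode a raw TFTP ERROR datagram into (code, meaning, message).

    Hypothetical helper: unknown codes are reported, not raised, so a
    client abort cannot crash the session handler.
    """
    if len(packet) < 4:
        raise ValueError("packet too short")
    opcode, code = struct.unpack("!HH", packet[:4])
    if opcode != OP_ERROR:
        raise ValueError("not an ERROR datagram: opcode %d" % opcode)
    # The message is a NUL-terminated string after the 4-byte header.
    message = packet[4:].split(b"\x00", 1)[0].decode("ascii", "replace")
    meaning = ERROR_CODES.get(code, "Unknown error code")
    return code, meaning, message


if __name__ == "__main__":
    # Invented sample: a client aborting after option negotiation.
    print(parse_error(b"\x00\x05\x00\x08User aborted the transfer\x00"))
```
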
- Nodes and controller are on the same untagged subnet but there is an lldp'd link between the bladeserver's onboard xgb switches and the controller's connected xgb Arista.
+ Nodes and controller are on the same untagged subnet but there is an
+ lldp'd link between the bladeserver's onboard xgb switches and the
+ controller's connected xgb Arista.
root at pre-maas-ctrl:/var/log/maas/oops/2014-05-08# dpkg -l '*maas*' | cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===================================-=============================-============-===============================================================================
ii maas 1.5+bzr2252-0ubuntu1 all MAAS server all-in-one metapackage
ii maas-cli 1.5+bzr2252-0ubuntu1 all MAAS command line API tool
ii maas-cluster-controller 1.5+bzr2252-0ubuntu1 all MAAS server cluster controller
ii maas-common 1.5+bzr2252-0ubuntu1 all MAAS server common files
ii maas-dhcp 1.5+bzr2252-0ubuntu1 all MAAS DHCP server
ii maas-dns 1.5+bzr2252-0ubuntu1 all MAAS DNS server
ii maas-region-controller 1.5+bzr2252-0ubuntu1 all MAAS server complete region controller
ii maas-region-controller-min 1.5+bzr2252-0ubuntu1 all MAAS Server minimum region controller
ii python-django-maas 1.5+bzr2252-0ubuntu1 all MAAS server Django web framework
ii python-maas-client 1.5+bzr2252-0ubuntu1 all MAAS python API client
ii python-maas-provisioningserver 1.5+bzr2252-0ubuntu1 all MAAS server provisioning libraries
-
Repro:
This is a pretty standard initial configuration afaict, following the provided instructions. I notice there are no grub.cfg-* anywhere, only the grub.cfg template. Could that be why none of the nodes are doing anything once they're in the grub shell?
root at pre-maas-ctrl:~# cat /var/lib/maas/boot-resources/current/grub/grub.cfg
# MAAS GRUB2 pre-loader configuration file
# Load based on MAC address first.
configfile (pxe)/grub/grub.cfg-${net_default_mac}
# Failed to load based on MAC address.
# Load amd64 by default, UEFI only supported by 64-bit
configfile (pxe)/grub/grub.cfg-default-amd64
root at pre-maas-ctrl:~# ls -l /var/lib/maas/boot-resources/current/grub/
total 4
-rw-r--r-- 1 root root 270 May 6 18:23 grub.cfg
root at pre-maas-ctrl:~# locate grub.cfg
/boot/grub/grub.cfg
/usr/share/doc/grub-common/examples/grub.cfg
/var/lib/maas/boot-resources/snapshot-20140506-172255/grub/grub.cfg
-
- Controller VM is connected to unrouted internal private network and external lab, which is not used by MaaS. Nodes are only connected to the private n/w. Controller is managing tftp, dhcp and dns and ip helper pointed to its private IP.
+ Controller VM is connected to unrouted internal private network and
+ external lab, which is not used by MaaS. Nodes are only connected to
+ the private n/w. Controller is managing tftp, dhcp and dns and ip
+ helper pointed to its private IP.
Nodes are configured for 'Default Ubuntu Release' Trusty Tahr. Boot
images:
4 trusty amd64 generic commissioning release May 6, 2014, 6:23 p.m.
7 trusty amd64 generic install release May 6, 2014, 6:23 p.m.
3 trusty amd64 generic xinstall release May 6, 2014, 6:23 p.m.
5 trusty i386 generic commissioning release May 6, 2014, 6:23 p.m.
12 trusty i386 generic install release May 6, 2014, 6:23 p.m.
9 trusty i386 generic xinstall release May 6, 2014, 6:23 p.m.
6 precise amd64 generic commissioning release May 6, 2014, 6:23 p.m.
11 precise amd64 generic install release May 6, 2014, 6:23 p.m.
10 precise amd64 generic xinstall release May 6, 2014, 6:23 p.m.
2 precise i386 generic commissioning release May 6, 2014, 6:23 p.m.
8 precise i386 generic install release May 6, 2014, 6:23 p.m.
1 precise i386 generic xinstall release May 6, 2014, 6:23 p.m.
--
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to python-tx-tftp in Ubuntu.
https://bugs.launchpad.net/bugs/1317705
Title:
Commissioning x86_64 node never completes, sitting at grub prompt,
pserv py tbs
To manage notifications about this bug go to:
https://bugs.launchpad.net/maas/+bug/1317705/+subscriptions