[Bug 1353008] Re: MAAS Provider: LXC did not get DHCP address, stuck in "pending"

Scott Moser smoser at ubuntu.com
Wed Sep 17 14:44:33 UTC 2014


** Description changed:

+ === Begin SRU Information ===
+ This bug causes lxc containers created by the ubuntu-cloud template (lxc-create -t ubuntu-cloud) to sometimes not obtain an IP address, and thus not correctly b
+ oot to completion.
+ 
+ The bug is in an assumption by cloud-init that /run is mounted before the cloud-init-local job is run.  The fix is very simply to guarantee that it is via modif
+ ication to its upstart 'start on'.
+ 
+ When booting with an initramfs /run will be mounted before /, so the race condition is not possible.  Thus, the failure case is only either in non-initramfs boot (which is very unlikely) or in lxc boot.  The lxc case seems only to occur ver
+ y rarely, somewhere well under one percent of the time.
+ 
+ [Test Case]
+ A test case is written at [1] that launches many instances in an attempt brute f
+ orce find the error.  However, I've not been able to make it fail.
+ 
+ The original bug reporter has been running with the 'start on' change
+ and has seen no errors since.
+ 
+ We will request the original bug reporter to apply the uploaded changes and run
+ through their battery.
+ 
+ [Regression Potential] 
+ The possibility for regression here is in the second boot of an instance.  The following scenario is a change of behavior:
+  * the user boots an instance with NoCloud or ConfigDrive in ds=local mode
+  * user changes /etc/network/interfaces in a way that would cause
+    static-networking to not be emitted on subsequent boot
+  * user reboots
+ Now, instead of a quick boot, the user may see cloud-init-nonet blocking on network coming up.
+ 
+ This would be a uncommon scenario, and the broken-etc-network-interfaces scenari
+ o is already one that causes timeouts on boot.
+ === End  SRU Information ===
+ 
  Note, that after I went onto the system, it *did* have an IP address.
  
        0/lxc/3:
          agent-state: pending
          instance-id: juju-machine-0-lxc-3
          series: trusty
          hardware: arch=amd64
  
  cloud-init-output.log snip:
  
  Cloud-init v. 0.7.5 running 'init' at Mon, 04 Aug 2014 23:57:12 +0000. Up 572.29 seconds.
  ci-info: +++++++++++++++++++++++Net device info+++++++++++++++++++++++
  ci-info: +--------+------+-----------+-----------+-------------------+
  ci-info: | Device |  Up  |  Address  |    Mask   |     Hw-Address    |
  ci-info: +--------+------+-----------+-----------+-------------------+
  ci-info: |   lo   | True | 127.0.0.1 | 255.0.0.0 |         .         |
  ci-info: |  eth0  | True |     .     |     .     | 00:16:3e:34:aa:57 |
  ci-info: +--------+------+-----------+-----------+-------------------+
  ci-info: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!Route info failed!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  Cloud-init v. 0.7.5 running 'modules:config' at Mon, 04 Aug 2014 23:57:12 +0000. Up 572.99 seconds.
  Cloud-init v. 0.7.5 running 'modules:final' at Mon, 04 Aug 2014 23:57:14 +0000. Up 574.42 seconds.
  Cloud-init v. 0.7.5 finished at Mon, 04 Aug 2014 23:57:14 +0000. Datasource DataSourceNoCloudNet [seed=/var/lib/cloud/seed/nocloud-net][dsmode=net].  Up 574.54 seconds
  
  syslog on system, showing DHCPACK 1 second later:
  
  root at juju-machine-0-lxc-3:/home/ubuntu# grep DHCP /var/log/syslog
  Aug  4 23:57:13 juju-machine-0-lxc-3 dhclient: DHCPREQUEST of 10.96.3.173 on eth0 to 255.255.255.255 port 67 (xid=0x1687c544)
  Aug  4 23:57:13 juju-machine-0-lxc-3 dhclient: DHCPOFFER of 10.96.3.173 from 10.96.0.10
  Aug  4 23:57:13 juju-machine-0-lxc-3 dhclient: DHCPACK of 10.96.3.173 from 10.96.0.10
  Aug  5 05:28:15 juju-machine-0-lxc-3 dhclient: DHCPREQUEST of 10.96.3.173 on eth0 to 10.96.0.10 port 67 (xid=0x1687c544)
  Aug  5 05:28:15 juju-machine-0-lxc-3 dhclient: DHCPACK of 10.96.3.173 from 10.96.0.10
  Aug  5 11:15:00 juju-machine-0-lxc-3 dhclient: DHCPREQUEST of 10.96.3.173 on eth0 to 10.96.0.10 port 67 (xid=0x1687c544)
  Aug  5 11:15:00 juju-machine-0-lxc-3 dhclient: DHCPACK of 10.96.3.173 from 10.96.0.10
  
  It appears in every case, cloud-init init-local has failed very early
  visible in juju logs /var/lib/juju/containers/<container>/console.log:
  
  Traceback (most recent call last):
-   File "/usr/bin/cloud-init", line 618, in <module>
-     sys.exit(main())
-   File "/usr/bin/cloud-init", line 614, in main
-     get_uptime=True, func=functor, args=(name, args))
-   File "/usr/lib/python2.7/dist-packages/cloudinit/util.py", line 1875, in log_time
-     ret = func(*args, **kwargs)
-   File "/usr/bin/cloud-init", line 491, in status_wrapper
-     force=True)
-   File "/usr/lib/python2.7/dist-packages/cloudinit/util.py", line 1402, in sym_link
-     os.symlink(source, link)
+   File "/usr/bin/cloud-init", line 618, in <module>
+     sys.exit(main())
+   File "/usr/bin/cloud-init", line 614, in main
+     get_uptime=True, func=functor, args=(name, args))
+   File "/usr/lib/python2.7/dist-packages/cloudinit/util.py", line 1875, in log_time
+     ret = func(*args, **kwargs)
+   File "/usr/bin/cloud-init", line 491, in status_wrapper
+     force=True)
+   File "/usr/lib/python2.7/dist-packages/cloudinit/util.py", line 1402, in sym_link
+     os.symlink(source, link)
  OSError: [Errno 2] No such file or directory

** Description changed:

  === Begin SRU Information ===
- This bug causes lxc containers created by the ubuntu-cloud template (lxc-create -t ubuntu-cloud) to sometimes not obtain an IP address, and thus not correctly b
- oot to completion.
+ This bug causes lxc containers created by the ubuntu-cloud template (lxc-create -t ubuntu-cloud) to sometimes not obtain an IP address, and thus not correctly boot to completion.
  
- The bug is in an assumption by cloud-init that /run is mounted before the cloud-init-local job is run.  The fix is very simply to guarantee that it is via modif
- ication to its upstart 'start on'.
+ The bug is in an assumption by cloud-init that /run is mounted before
+ the cloud-init-local job is run.  The fix is simply to guarantee that
+ /run is mounted, via a modification to the job's upstart 'start on'.
  
- When booting with an initramfs /run will be mounted before /, so the race condition is not possible.  Thus, the failure case is only either in non-initramfs boot (which is very unlikely) or in lxc boot.  The lxc case seems only to occur ver
- y rarely, somewhere well under one percent of the time.
+ When booting with an initramfs, /run will be mounted before /, so the
+ race condition is not possible.  Thus, the failure can only occur in a
+ non-initramfs boot (which is very unlikely) or in an lxc boot.  The lxc
+ case seems to occur only very rarely, somewhere well under one percent
+ of the time.
  
  [Test Case]
- A test case is written at [1] that launches many instances in an attempt brute f
- orce find the error.  However, I've not been able to make it fail.
+ A test case is written at [1] that launches many instances in an attempt to brute-force the error.  However, I've not been able to make it fail.
  
  The original bug reporter has been running with the 'start on' change
  and has seen no errors since.
  
- We will request the original bug reporter to apply the uploaded changes and run
- through their battery.
+ We will request the original bug reporter to apply the uploaded changes
+ and run through their battery.
  
- [Regression Potential] 
+ [Regression Potential]
  The possibility for regression here is in the second boot of an instance.  The following scenario is a change of behavior:
-  * the user boots an instance with NoCloud or ConfigDrive in ds=local mode
-  * user changes /etc/network/interfaces in a way that would cause
-    static-networking to not be emitted on subsequent boot
-  * user reboots
+  * the user boots an instance with NoCloud or ConfigDrive in ds=local mode
+  * user changes /etc/network/interfaces in a way that would cause
+    static-networking to not be emitted on subsequent boot
+  * user reboots
  Now, instead of a quick boot, the user may see cloud-init-nonet blocking on network coming up.
  
- This would be a uncommon scenario, and the broken-etc-network-interfaces scenari
- o is already one that causes timeouts on boot.
+ This would be an uncommon scenario, and the broken-etc-network-interfaces
+ scenario is already one that causes timeouts on boot.
+ 
+ --
+ [1] http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/cloud-init-test/view/head:/tests/lxc-test-new-instance
+ 
  === End  SRU Information ===
  
  Note, that after I went onto the system, it *did* have an IP address.
  
        0/lxc/3:
          agent-state: pending
          instance-id: juju-machine-0-lxc-3
          series: trusty
          hardware: arch=amd64
  
  cloud-init-output.log snip:
  
  Cloud-init v. 0.7.5 running 'init' at Mon, 04 Aug 2014 23:57:12 +0000. Up 572.29 seconds.
  ci-info: +++++++++++++++++++++++Net device info+++++++++++++++++++++++
  ci-info: +--------+------+-----------+-----------+-------------------+
  ci-info: | Device |  Up  |  Address  |    Mask   |     Hw-Address    |
  ci-info: +--------+------+-----------+-----------+-------------------+
  ci-info: |   lo   | True | 127.0.0.1 | 255.0.0.0 |         .         |
  ci-info: |  eth0  | True |     .     |     .     | 00:16:3e:34:aa:57 |
  ci-info: +--------+------+-----------+-----------+-------------------+
  ci-info: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!Route info failed!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  Cloud-init v. 0.7.5 running 'modules:config' at Mon, 04 Aug 2014 23:57:12 +0000. Up 572.99 seconds.
  Cloud-init v. 0.7.5 running 'modules:final' at Mon, 04 Aug 2014 23:57:14 +0000. Up 574.42 seconds.
  Cloud-init v. 0.7.5 finished at Mon, 04 Aug 2014 23:57:14 +0000. Datasource DataSourceNoCloudNet [seed=/var/lib/cloud/seed/nocloud-net][dsmode=net].  Up 574.54 seconds
  
  syslog on system, showing DHCPACK 1 second later:
  
  root at juju-machine-0-lxc-3:/home/ubuntu# grep DHCP /var/log/syslog
  Aug  4 23:57:13 juju-machine-0-lxc-3 dhclient: DHCPREQUEST of 10.96.3.173 on eth0 to 255.255.255.255 port 67 (xid=0x1687c544)
  Aug  4 23:57:13 juju-machine-0-lxc-3 dhclient: DHCPOFFER of 10.96.3.173 from 10.96.0.10
  Aug  4 23:57:13 juju-machine-0-lxc-3 dhclient: DHCPACK of 10.96.3.173 from 10.96.0.10
  Aug  5 05:28:15 juju-machine-0-lxc-3 dhclient: DHCPREQUEST of 10.96.3.173 on eth0 to 10.96.0.10 port 67 (xid=0x1687c544)
  Aug  5 05:28:15 juju-machine-0-lxc-3 dhclient: DHCPACK of 10.96.3.173 from 10.96.0.10
  Aug  5 11:15:00 juju-machine-0-lxc-3 dhclient: DHCPREQUEST of 10.96.3.173 on eth0 to 10.96.0.10 port 67 (xid=0x1687c544)
  Aug  5 11:15:00 juju-machine-0-lxc-3 dhclient: DHCPACK of 10.96.3.173 from 10.96.0.10
  
  It appears in every case, cloud-init init-local has failed very early
  visible in juju logs /var/lib/juju/containers/<container>/console.log:
  
  Traceback (most recent call last):
    File "/usr/bin/cloud-init", line 618, in <module>
      sys.exit(main())
    File "/usr/bin/cloud-init", line 614, in main
      get_uptime=True, func=functor, args=(name, args))
    File "/usr/lib/python2.7/dist-packages/cloudinit/util.py", line 1875, in log_time
      ret = func(*args, **kwargs)
    File "/usr/bin/cloud-init", line 491, in status_wrapper
      force=True)
    File "/usr/lib/python2.7/dist-packages/cloudinit/util.py", line 1402, in sym_link
      os.symlink(source, link)
  OSError: [Errno 2] No such file or directory
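
The traceback above ends in os.symlink raising ENOENT. A minimal sketch of one way that can happen, assuming the directory that should hold the link is not there at the moment of the call (as it would not be if /run is not yet mounted and populated). The helper name try_symlink and the temporary paths are hypothetical stand-ins for illustration; they are not cloud-init's actual code or paths.

```python
import errno
import os
import tempfile


def try_symlink(source, link):
    """Create link -> source. Return None on success, or the errno on failure."""
    try:
        os.symlink(source, link)
    except OSError as exc:
        return exc.errno
    return None


with tempfile.TemporaryDirectory() as base:
    # Hypothetical stand-in for a directory under /run that does not exist yet.
    link_dir = os.path.join(base, "run", "cloud-init")
    link = os.path.join(link_dir, "status.json")

    # Parent directory of the link is missing: os.symlink fails with ENOENT,
    # the same "[Errno 2] No such file or directory" seen in the traceback.
    early = try_symlink("../data/status.json", link)

    # Once the directory exists, the identical call succeeds. The link target
    # does not need to exist for a symlink to be created.
    os.makedirs(link_dir)
    late = try_symlink("../data/status.json", link)

print("before directory exists:", errno.errorcode.get(early, early))
print("after directory exists: ", late)
```

This only illustrates the error class. It does not reproduce the race itself, which depends on when /run is mounted relative to the cloud-init-local job.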

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to cloud-init in Ubuntu.
https://bugs.launchpad.net/bugs/1353008

Title:
  MAAS Provider: LXC did not get DHCP address, stuck in "pending"

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1353008/+subscriptions


