[Bug 1337873] Re: Precise, Trusty, Utopic - ifupdown initialization problems caused by race condition

Rafael David Tinoco rafael.tinoco at canonical.com
Tue Sep 9 13:04:26 UTC 2014


** Description changed:

  It was brought to my attention (by others) that ifupdown runs into race
  conditions on some specific cases.
  
  [Impact]
  
  When trying to deploy many servers at once (higher chances of happening)
  or from time-to-time, like any other intermittent race-condition.
  Interfaces are not brought up like they should and this has a big impact
  for servers that cannot rely on network start scripts.
  
  The problem is caused by a race condition when init(upstart) starts up
  network interfaces in parallel.
  
  [Test Case]
  
  Use attached script to reproduce the error (it might take some hours, in
  a single virtual machine, for the error to occur).
  
- (example 1)
+ * please consider my bonding examples are using eth1 and eth2 as slave
+  interfaces.
  
- *** sequence to trigger race-condition ***
+ Some ifupdown race conditions are explained below:
  
- (a) ifup eth0                     (b) ifup -a for eth0
+ !!!!
+ case 1)
+ (a) ifup eth0 (b) ifup -a for eth0
  -----------------------------------------------------------------
  1-1. Lock ifstate.lock file.
-                                   1-1. Wait for locking ifstate.lock
-                                       file.
+                                   1-1. Wait for locking ifstate.lock
+                                       file.
  1-2. Read ifstate file to check
-      the target NIC.
+      the target NIC.
  1-3. close(=release) ifstate.lock
-      file.
+      file.
  1-4. Judge that the target NIC
-      isn't processed.
-                                   1-2. Read ifstate file to check
-                                        the target NIC.
-                                   1-3. close(=release) ifstate.lock
-                                        file.
-                                   1-4. Judge that the target NIC
-                                        isn't processed.
+      isn't processed.
+                                   1-2. Read ifstate file to check
+                                        the target NIC.
+                                   1-3. close(=release) ifstate.lock
+                                        file.
+                                   1-4. Judge that the target NIC
+                                        isn't processed.
  2. Lock and update ifstate file.
-    Release the lock.
-                                   2. Lock and update ifstate file.
-                                      Release the lock.
+    Release the lock.
+                                   2. Lock and update ifstate file.
+                                      Release the lock.
+ !!!
  
- (example 2)
  
- Bonding device using eth0.
- ifenslave for eth0 is also executed in parallel, eth0 remains down.
- 
- *** sequence to trigger race-condition ***
- 
- (a) ifenslave of eth0             (b) ifenslave of eth0
+ !!!
+ case 2)
+ (a) ifenslave of eth0 (b) ifenslave of eth0
  ------------------------------------------------------------------
- 3. Execute ifenslave of eth0.      3. Execute ifenslave of eth0.
+ 3. Execute ifenslave of eth0.     3. Execute ifenslave of eth0.
  4. Link down the target NIC.
  5. Write NIC id to
-    /sys/class/net/bond0/bonding
+    /sys/class/net/bond0/bonding
     /slaves then NIC gets up
-                                   4. Link down the target NIC.
-                                   5. Fails to write NIC id to
-                                      /sys/class/net/bond0/bonding/
+                                   4. Link down the target NIC.
+                                   5. Fails to write NIC id to
+                                      /sys/class/net/bond0/bonding/
                                       slaves it is already written.
- 
- (example 3)
- 
- bonding is not set to active-backup as defined in config file: When the
- init(upstart) executes "if-pre-up.d/ifenslave" script and "if-pre-
- up.d/vlan" script for bond0 device in parallel, the "if-pre-
- up.d/ifenslave" script fails to change the bonding mode with a error
- message, "bonding: unable to update mode of bond0 because interface is
- up.".
- 
- *** sequence to trigger race-condition ***
- 
- (a)ifup bond0                     (b)ifup -a
- -----------------------------------------------------------------------
- 1. Update statefile about bond0.
-                                   1. Does nothing about bond0
-                                      because statefile is already
-                                      updated about it.
- 2. ifenslave::setup_master()
-    sysfs_change_down mode 1
-    and link down bond0.
-                                   2. Link up bond0 by the vlan
-                                      script on the processing
-                                      for linking up bond0.201(*1).
- 3. "echo 1 > .../mode" fails.
- 
- [ /etc/network/if-pre-up.d/vlan ]
- 
- 46 if [ -n "$IF_VLAN_RAW_DEVICE" ] && [ ! -d /sys/class/net/$IFACE ]; then
- 47     if [ ! -x /sbin/vconfig ]; then
- 48         exit 0
- 49     fi
- 50     if ! ip link show dev "$IF_VLAN_RAW_DEVICE" > /dev/null; then
- 51         echo "$IF_VLAN_RAW_DEVICE does not exist, unable to create $IFACE"
- 52         exit 1
- 53     fi
- 54     ip link set up dev $IF_VLAN_RAW_DEVICE     <-- (*1).
- 55     vconfig add $IF_VLAN_RAW_DEVICE $VLANID
- 56 fi
- 
- 
- [Regression Potential]
- 
-  * Attaching proposed patch (for upstream as well) and describing
- potential later on today.
- 
- [Other Info]
- 
- Example: [ /etc/network/interfaces ]
- 
- auto lo
- iface lo inet loopback
- 
- auto eth0
- iface eth0 inet manual
-  bond-master bond0
- 
- auto eth1
- iface eth1 inet manual
-  bond-master bond0
- 
- auto bond0
- iface bond0 inet dhcp
-  bond-slaves eth0 eth1
-  hwaddress 11:22:33:44:55:66
-  bond-primary eth0
-  bond-mode 1
-  bond-miimon 100
-  bond-updelay 200
-  bond-downdelay 200
- 
- auto bond0.201
- iface bond0.201 inet dhcp
-  hwaddress 11:22:33:44:55:66
-  vlan-raw-device bond0
- ...
- 
- auto bond0.205
- iface bond0.205 inet dhcp
-  hwaddress 11:22:33:44:55:66
-  vlan-raw-device bond0
+ !!!

** Description changed:

  It was brought to my attention (by others) that ifupdown runs into race
  conditions on some specific cases.
  
  [Impact]
  
  When trying to deploy many servers at once (higher chances of happening)
  or from time-to-time, like any other intermittent race-condition.
  Interfaces are not brought up like they should and this has a big impact
  for servers that cannot rely on network start scripts.
  
  The problem is caused by a race condition when init(upstart) starts up
  network interfaces in parallel.
  
  [Test Case]
  
  Use attached script to reproduce the error (it might take some hours, in
  a single virtual machine, for the error to occur).
  
  * please consider my bonding examples are using eth1 and eth2 as slave
-  interfaces.
+  interfaces.
  
  Some ifupdown race conditions are explained below:
  
  !!!!
  case 1)
  (a) ifup eth0 (b) ifup -a for eth0
  -----------------------------------------------------------------
  1-1. Lock ifstate.lock file.
-                                   1-1. Wait for locking ifstate.lock
-                                       file.
+                                   1-1. Wait for locking ifstate.lock
+                                       file.
  1-2. Read ifstate file to check
-      the target NIC.
+      the target NIC.
  1-3. close(=release) ifstate.lock
-      file.
+      file.
  1-4. Judge that the target NIC
-      isn't processed.
-                                   1-2. Read ifstate file to check
-                                        the target NIC.
-                                   1-3. close(=release) ifstate.lock
-                                        file.
-                                   1-4. Judge that the target NIC
-                                        isn't processed.
+      isn't processed.
+                                   1-2. Read ifstate file to check
+                                        the target NIC.
+                                   1-3. close(=release) ifstate.lock
+                                        file.
+                                   1-4. Judge that the target NIC
+                                        isn't processed.
  2. Lock and update ifstate file.
-    Release the lock.
-                                   2. Lock and update ifstate file.
-                                      Release the lock.
+    Release the lock.
+                                   2. Lock and update ifstate file.
+                                      Release the lock.
  !!!
- 
  
  !!!
  case 2)
  (a) ifenslave of eth0 (b) ifenslave of eth0
  ------------------------------------------------------------------
  3. Execute ifenslave of eth0.     3. Execute ifenslave of eth0.
  4. Link down the target NIC.
  5. Write NIC id to
-    /sys/class/net/bond0/bonding
-    /slaves then NIC gets up
-                                   4. Link down the target NIC.
-                                   5. Fails to write NIC id to
-                                      /sys/class/net/bond0/bonding/
-                                      slaves it is already written.
+    /sys/class/net/bond0/bonding
+    /slaves then NIC gets up
+                                   4. Link down the target NIC.
+                                   5. Fails to write NIC id to
+                                      /sys/class/net/bond0/bonding/
+                                      slaves it is already written.
  !!!

-- 
You received this bug notification because you are a member of Ubuntu
Sponsors Team, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/1337873

Title:
  Precise, Trusty, Utopic - ifupdown initialization problems caused by
  race condition

Status in “ifupdown” package in Ubuntu:
  In Progress
Status in “ifupdown” package in Debian:
  New

Bug description:
  It was brought to my attention (by others) that ifupdown runs into
  race conditions in some specific cases.

  [Impact]

  The problem appears intermittently, like any other race condition, and
  is more likely when deploying many servers at once. Interfaces are not
  brought up as they should be, which has a big impact on servers that
  cannot rely on network start scripts.

  The problem is caused by a race condition when init(upstart) starts up
  network interfaces in parallel.

  [Test Case]

  Use attached script to reproduce the error (it might take some hours,
  in a single virtual machine, for the error to occur).

  * Please note that my bonding examples use eth1 and eth2 as slave
    interfaces.

  Some ifupdown race conditions are explained below:

  !!!!
  case 1)
  (a) ifup eth0 (b) ifup -a for eth0
  -----------------------------------------------------------------
  1-1. Lock ifstate.lock file.
                                    1-1. Wait for the lock on the
                                         ifstate.lock file.
  1-2. Read ifstate file to check
       the target NIC.
  1-3. close(=release) ifstate.lock
       file.
  1-4. Judge that the target NIC
       isn't processed.
                                    1-2. Read ifstate file to check
                                         the target NIC.
                                    1-3. close(=release) ifstate.lock
                                         file.
                                    1-4. Judge that the target NIC
                                         isn't processed.
  2. Lock and update ifstate file.
     Release the lock.
                                    2. Lock and update ifstate file.
                                       Release the lock.
  !!!
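
The interleaving in case 1 is a check-then-act race: the check (steps 1-1 to 1-4) and the update (step 2) run in two separate lock sections, so both callers can pass the check before either one records the interface. The sketch below is only a simulation of that pattern using Python threads and an in-memory set. It is not the ifupdown code, and every name in it (StateFile, racy_ifup, atomic_ifup, run) is hypothetical. The barrier exists purely to force the bad interleaving deterministically.

```python
import threading


class StateFile:
    """In-memory stand-in for the ifstate file and its ifstate.lock (hypothetical)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.configured = set()


def racy_ifup(state, iface, barrier, results):
    # Steps 1-1 to 1-4: take the lock, read the state, release, then decide.
    with state.lock:
        already_processed = iface in state.configured
    # Force both callers past the check before either one updates the state.
    barrier.wait()
    if already_processed:
        results.append("skipped")
        return
    # Step 2: lock again and update. The earlier decision may be stale by now.
    with state.lock:
        state.configured.add(iface)
    results.append("configured")


def atomic_ifup(state, iface, barrier, results):
    barrier.wait()
    # Check and update inside one critical section, so only one caller wins.
    with state.lock:
        if iface in state.configured:
            results.append("skipped")
            return
        state.configured.add(iface)
    results.append("configured")


def run(worker):
    """Run two concurrent callers for eth0 and return their sorted outcomes."""
    state = StateFile()
    barrier = threading.Barrier(2)
    results = []
    threads = [
        threading.Thread(target=worker, args=(state, "eth0", barrier, results))
        for _ in range(2)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


if __name__ == "__main__":
    print("separate lock sections:", run(racy_ifup))
    print("single lock section:   ", run(atomic_ifup))
```

With separate lock sections both callers go on to configure eth0. With a single lock section one caller configures it and the other skips it. Whether the proposed patch takes this exact approach is not stated here.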

  !!!
  case 2)
  (a) ifenslave of eth0 (b) ifenslave of eth0
  ------------------------------------------------------------------
  3. Execute ifenslave of eth0.     3. Execute ifenslave of eth0.
  4. Link down the target NIC.
  5. Write NIC id to
     /sys/class/net/bond0/bonding/
     slaves, then NIC gets up.
                                    4. Link down the target NIC.
                                    5. Fails to write NIC id to
                                       /sys/class/net/bond0/bonding/
                                       slaves because it is already
                                       written; the NIC stays down.
  !!!
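
Case 2 can be replayed without threads, because the harmful order is simply (a) finishing steps 3 to 5 before (b) runs steps 4 and 5. The sketch below models the bond with a small Python class. FakeBond, naive_enslave and guarded_enslave are hypothetical names, the OSError is a stand-in for whatever error the real sysfs write returns, and none of this is the ifenslave script itself.

```python
class FakeBond:
    """Toy model of bond0 and its bonding/slaves file (hypothetical)."""

    def __init__(self):
        self.slaves = []
        self.link_up = {}

    def write_slave(self, nic):
        # Model of step 5: the write fails if the NIC is already enslaved.
        if nic in self.slaves:
            raise OSError(f"{nic} is already enslaved")
        self.slaves.append(nic)
        self.link_up[nic] = True  # a successful write brings the NIC up


def naive_enslave(bond, nic):
    bond.link_up[nic] = False  # step 4: link down the target NIC
    try:
        bond.write_slave(nic)  # step 5: write the NIC id to the slaves file
    except OSError:
        pass  # the write failed and nothing brings the NIC back up


def guarded_enslave(bond, nic):
    # Assumed mitigation: do not touch a NIC that is already enslaved.
    if nic in bond.slaves:
        return
    bond.link_up[nic] = False
    bond.write_slave(nic)


if __name__ == "__main__":
    for enslave in (naive_enslave, guarded_enslave):
        bond = FakeBond()
        enslave(bond, "eth0")  # caller (a)
        enslave(bond, "eth0")  # caller (b)
        print(enslave.__name__, "-> eth0 up:", bond.link_up["eth0"])
```

In a truly concurrent run the membership check in guarded_enslave would itself need to be serialized with the write, for example under the same lock as in case 1. Here it only shows why the second, redundant enslave attempt is what leaves the NIC down.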

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/ifupdown/+bug/1337873/+subscriptions


