[Bug 1188642] Re: Init aborts with the message: init: log.c:813: Assertion failed in log_clear_unflushed: ! log->io

regmka regmka at gmail.com
Fri Jun 7 16:25:15 UTC 2013


** Also affects: ubuntu
   Importance: Undecided
       Status: New

** Package changed: ubuntu => upstart (Ubuntu)

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to upstart in Ubuntu.
https://bugs.launchpad.net/bugs/1188642

Title:
  Init aborts with the message: init: log.c:813: Assertion failed in
  log_clear_unflushed: ! log->io

Status in Upstart:
  New
Status in “upstart” package in Ubuntu:
  New

Bug description:
  init (in Ubuntu 13.04) accidentally aborts with the following messages
  in the kernel log:

  [    4.981969] init: log.c:813: Assertion failed in log_clear_unflushed: ! log->io
  [    4.991031] init: Caught abort, core dumped
  [    4.995571] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000600

  I've figured out that the probable cause is a race between the line 699 of the file init/log.c
  of the function log_read_watch() that sets the field log->remote_closed to 1 (the caller clears log->io)
  and the routine log_clear_unflushed() that causes the abort when log->remote_closed == 1 but log->io != NULL:
  log_clear_unflushed() may preempt log_read_watch() when log->remote_closed is set to 1 but
  before log->io is cleared for the same log.

  I applied the following patch to prove this assertion:

  @@ -264,6 +266,8 @@
   			 */
   			log->io->error_handler = NULL;
   
  +			nih_fatal("log %p: clearing log->io\n", log);
  +
   			nih_free (log->io);
   			log->io = NULL;
   		}
  @@ -384,6 +388,8 @@
   
   	nih_free (err);
   
  +	nih_fatal("log %p: clearing log->io\n", log);
  +
   	/* Ensure the NihIo is closed */
   	nih_free (log->io);
   	log->io = NULL;
  @@ -696,7 +702,11 @@
   			 *
   			 */
   			if (saved && saved != EAGAIN && saved != EWOULDBLOCK)
  +			{
  +				nih_fatal("log %p: setting log->remote_closed\n", log);
   				log->remote_closed = 1;
  +			}
   
   			close (log->fd);
   			log->fd = -1;
  @@ -806,6 +816,8 @@
   		nih_assert (log->open_errno);
   
   		if (log->remote_closed) {
  +			nih_fatal("log %p: checking log->io\n", log);
  +
   			/* Parent job has ended and unflushed data
   			 * exists.
   			 */

  
  I got the  following kernel log before the abort:

  [    4.635146] init: log 0xb882a660: clearing log->io
  [    4.640487] init: log 0xb88035e0: setting log->remote_closed
  [    4.654816] init: log 0xb882ce88: clearing log->io
  [    4.655263] init: log 0xb882a1d0: setting log->remote_closed
  [    4.706758] init: log 0xb8831370: setting log->remote_closed
  [    6.714543] init: log 0xb882a600: clearing log->io
  [    6.733596] init: log 0xb882c6a0: clearing log->io
  [    6.746124] init: log 0xb882cbd8: setting log->remote_closed
  [    6.765075] init: log 0xb882b1d0: clearing log->io
  [    6.787772] init: log 0xb8842b50: clearing log->io
  [    6.801163] init: log 0xb882cb78: clearing log->io
  [    6.810717] init: log 0xb882df58: clearing log->io
  [    6.864869] init: log 0xb883e418: clearing log->io
  [    6.871033] init: log 0xb882d2a8: clearing log->io
  [    6.899113] init: log 0xb882d268: clearing log->io
  [    6.927477] init: log 0xb884eb28: clearing log->io
  [    6.942432] init: log 0xb88519a8: clearing log->io
  [    6.997231] init: log 0xb8850c40: clearing log->io
  [    7.031685] init: log 0xb8850650: clearing log->io
  [    7.042562] init: log 0xb8856d10: clearing log->io
  [    7.042811] init: log 0xb8857cf8: clearing log->io
  [    7.063461] init: log 0xb885a850: setting log->remote_closed
  [    7.063996] init: log 0xb883e478: setting log->remote_closed
  [    7.095505] init: log 0xb8854960: setting log->remote_closed
  [    7.135581] init: log 0xb883e478: checking log->io
  [    7.135756] init: log.c:825: Assertion failed in log_clear_unflushed: ! log->io
  [    7.140537] init: Caught abort, core dumped
  [    7.143211] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000600

  So the assertion has been proved for the log at 0xb883e478 which has the field remote_closed updated
  but the field io dirty.

To manage notifications about this bug go to:
https://bugs.launchpad.net/upstart/+bug/1188642/+subscriptions




More information about the foundations-bugs mailing list