[Lucid-ec2] SRU: Prevent divide by zero crashes

Stefan Bader stefan.bader at canonical.com
Tue Jan 18 15:34:21 UTC 2011


SRU Justification:

Impact: When the scheduler searches for the busiest group, there are rare
cases (which seem to be more likely on EC2) in which cpu_power is zero at the
point where the code divides by it, causing a divide-by-zero crash.
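
To make the failure concrete, below is a minimal userspace C sketch of the kind of calculation involved. It is modeled loosely on the avg_load calculation in the scheduler of this era, where a group's load is scaled by SCHED_LOAD_SCALE and divided by the group's cpu_power. The value 1024 for SCHED_LOAD_SCALE is an assumption, and struct sched_group_model and group_avg_load() are hypothetical names for illustration, not the actual kernel code.

```c
/* Assumed value for this sketch: SCHED_LOAD_SCALE taken as 1 << 10. */
#define SCHED_LOAD_SCALE 1024UL

/* Hypothetical, simplified model of a scheduler group: only the two
 * fields the division needs. */
struct sched_group_model {
    unsigned long group_load; /* summed load of the group's CPUs */
    unsigned long cpu_power;  /* relative compute capacity of the group */
};

/* Unguarded computation: scale the group's load by its capacity.
 * If cpu_power is 0 this is an integer division by zero, which is
 * undefined behaviour in C; in the kernel on x86 it raises a divide
 * error and the machine oopses. */
static unsigned long group_avg_load(const struct sched_group_model *sg)
{
    return (sg->group_load * SCHED_LOAD_SCALE) / sg->cpu_power;
}
```

For example, a group with group_load 2048 and cpu_power 1024 yields 2048. Calling it with cpu_power set to 0 is undefined behaviour and, on x86 Linux, typically kills the process with SIGFPE, the userspace counterpart of the divide error seen in the kernel.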

Fix: There is no real fix yet (which is why neither patch is upstream), but
users have tested the first patch, which works around the issue by skipping
the division whenever cpu_power is zero.
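
The workaround amounts to a guard in front of the division. The following is a sketch under the same assumptions as above (hypothetical names, SCHED_LOAD_SCALE assumed to be 1024). Returning 0 when the division is skipped is an assumption of this example; the actual patch may handle that case differently.

```c
/* Assumed value for this sketch: SCHED_LOAD_SCALE taken as 1 << 10. */
#define SCHED_LOAD_SCALE 1024UL

/* Hypothetical, simplified model of a scheduler group. */
struct sched_group_model {
    unsigned long group_load; /* summed load of the group's CPUs */
    unsigned long cpu_power;  /* relative compute capacity of the group */
};

/* Guarded computation: never divide when cpu_power is zero.
 * Reporting an average load of 0 in that case is an assumption made
 * for this sketch. */
static unsigned long group_avg_load_safe(const struct sched_group_model *sg)
{
    if (sg->cpu_power == 0)
        return 0;
    return (sg->group_load * SCHED_LOAD_SCALE) / sg->cpu_power;
}
```

With cpu_power 0 the function returns 0 instead of faulting. With group_load 3072 and cpu_power 2048 it returns 1536, the same as the unguarded calculation. Because the busiest group is the one with the highest average load, a zero result makes the affected group an unlikely candidate, which seems a reasonable default for a workaround.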
The second patch is an optional companion to the first which will hopefully
warn when cpu_power is accidentally set to zero. Although it is neither a bug
fix nor strictly needed, I would like to include it as well. That way we could
potentially catch the real bug in production use (which seems to be the only
way to trigger it, and only after an extended period of uptime) and then revert
both changes once a proper fix exists.
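
The idea behind the second patch is to check where cpu_power is written rather than where it is read, so that a warning points at whichever code path stores the zero. Below is a minimal userspace sketch, assuming all updates go through a single setter. The names set_group_cpu_power() and zero_power_warnings are hypothetical. In the kernel the report would more likely come from something like WARN_ON(), which also prints a stack trace.

```c
#include <stdio.h>

/* Hypothetical, simplified model of a scheduler group. */
struct sched_group_model {
    unsigned long cpu_power; /* relative compute capacity of the group */
};

/* Counts how often the check fired, so callers (and tests) can observe it. */
static unsigned int zero_power_warnings;

/* Hypothetical setter: report a zero cpu_power at the moment it is stored.
 * The value is still stored unchanged, so the check only observes and
 * does not alter behaviour. */
static void set_group_cpu_power(struct sched_group_model *sg, unsigned long power)
{
    if (power == 0) {
        zero_power_warnings++;
        fprintf(stderr, "warning: cpu_power set to 0 for group %p\n", (void *)sg);
    }
    sg->cpu_power = power;
}
```

Because the warning only reports and never changes the stored value, it can ship alongside the guard from the first patch: the system keeps running, but the log records which update introduced the zero.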

Testcase: This could not be reproduced in testing, but it has been reported
to happen after about a week of uptime on production servers.




More information about the kernel-team mailing list