cpufreqd as standard install?
John Moser
john.r.moser at gmail.com
Sat Mar 3 08:16:39 UTC 2012
On 03/03/2012 12:13 AM, Phillip Susi wrote:
> On 02/29/2012 04:40 PM, John Moser wrote:
>> At full load (encoding a video), it eventually reaches 80C and the
>> system shuts down.
>
> It sounds like you have some broken hardware. The stock heatsink and
> fan are designed to keep the cpu from overheating under full load at
> the design frequency and voltage. You might want to verify that your
> motherboard is driving the cpu at the correct frequency and voltage.
>
Possibly.
The only other use case I can think of is when the ambient temperature
is high. Remember that server rooms use air conditioning; I did find
that for a while my machine would quickly overheat if the room
temperature was above 79F, and so I kept the room at 75F. The heat sink
was completely clogged with dust at the time, though, which is why I
recently cleaned and inspected it and checked all the fan speed monitors
and motherboard settings to make sure everything was running as expected.
In any case, if the A/C goes down in a server room, it would be nice to
have CPU frequency scaling kick in and take the clock speed down before
the chip overheats. Modern servers--for example, the revisions of the
Dell PowerEdge II and III that were new 4 or 5 years ago--lean on their
low-power capabilities, and modern data centers use a centralized DC
converter and high voltage (220V) DC mains to reduce power waste because
of the high cost of electricity. It's extremely likely that such servers
could drop to a clock speed low enough to avoid overheating when the air
conditioning fails, which is an emergency situation.
Of course, the side benefit of not overheating desktops with inadequate
cooling or faulty motherboard behavior is simply a bonus. Still, I
believe in fault tolerance.
>> I currently have cpufreqd configured to clock to 1.8GHz at 73C, and move
>> to the ondemand governor at 70C.
>
> This need for manual configuring is a good reason why it is not a
> candidate for standard install.
>
I've attached a configuration that generically uses sensors (i.e. if the
program 'sensors' gives useful output, this works). It only covers one
core, though (a multi-core system reads the same temperature for all of
them, as the sensor is per-CPU); you could easily generate this
automatically.
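A minimal sketch of that automatic generation, in Python. It assumes the
sensor name (temp1) and the temperature bands from the attached
configuration; make_rule and make_rules are hypothetical helper names
and are not part of cpufreqd.

```python
def make_rule(name, sensor, low, high, profile):
    """Return one cpufreqd [Rule] section as text."""
    return "\n".join([
        "[Rule]",
        f"name={name}",
        f"sensor={sensor}:{low}-{high}",
        f"profile={profile}",
        "[/Rule]",
    ])


def make_rules(sensor="temp1", cool_at=70, hot_at=73, too_hot_at=76, ceiling=100):
    """Build the three temperature rules used in the attached config.

    The defaults mirror the attachment; the 70-73 gap between the
    Normal and CPU Hot bands is kept on purpose.
    """
    bands = [
        ("Normal", 0, cool_at, "Standard"),
        ("CPU Hot", hot_at, too_hot_at, "Hot"),
        ("CPU Too Hot", too_hot_at, ceiling, "Overheating"),
    ]
    return "\n".join(
        make_rule(name, sensor, low, high, profile)
        for name, low, high, profile in bands
    )


if __name__ == "__main__":
    print(make_rules())
```

The thresholds are plain parameters, so a generator could fill them in
per machine; how to pick them safely is a separate question.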
Mind you, on the topic of automatic generation, 80C is a hard limit. It
just is. My machine reports (through sensors) +95.0C as "Critical", but
my BIOS shuts down the system at +80.0C immediately. Silicon physically
does not tolerate temperatures above 80.0C well at all; if a chip claims
it can run at 95.0C it's lying. Even SOI-CMOS doesn't tolerate those
temperatures.
As well, again, you could write some generic profiles that detect when
the system is running on battery (UPS, laptop) and make appropriate
adjustments based on how much battery life is left.
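A rough sketch of such generic battery rules, generated in Python. It
assumes the ac= and battery_interval= rule directives behave as in
cpufreqd's sample configuration (check CPUFREQD.CONF(5) before relying
on them). The battery bands and the "Powersave" profile names are
made-up placeholders that would need their own [Profile] sections.

```python
# Hypothetical battery bands: (rule name, low %, high %, profile name).
# Only "Standard" exists in the attached config; the others are placeholders.
BATTERY_BANDS = [
    ("Battery High", 50, 100, "Standard"),
    ("Battery Medium", 20, 50, "Powersave Medium"),
    ("Battery Low", 0, 20, "Powersave Low"),
]


def battery_rule(name, low, high, profile):
    """Return one [Rule] section that applies only when off AC power."""
    return "\n".join([
        "[Rule]",
        f"name={name}",
        "ac=off",
        f"battery_interval={low}-{high}",
        f"profile={profile}",
        "[/Rule]",
    ])


def battery_rules(bands=BATTERY_BANDS):
    """Join one rule per battery band into a config fragment."""
    return "\n".join(battery_rule(*band) for band in bands)


if __name__ == "__main__":
    print(battery_rules())
```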
>> At 73C, the system switches from 1.9GHz to 1.8GHz. Ten seconds later,
>> it's at 70C and switches back to 1.9GHz. 41 seconds after that, it
>> reaches 73C again and switches to 1.8GHz.
>>
>> That means at stock frequency (1.9GHz) with stock cooling equipment, the
>> CPU overheats under full load. Clocked 0.1GHz slower than its rated
>> speed, it rapidly cools. Which is ridiculous; who designed this thing?
>
> This sounds like your motherboard is overvolting the cpu in that 1.9
> GHz stepping.
>
Possibly, but the settings are all default, with nothing set to
overclock (the board has jumper-free overclocking configuration, but
"Standard" is the default option for the clock rate and voltage
settings, which I assume means the CPU supplies them).
Basically the argument here is between "Supply fault tolerance" and
"Well your motherboard is [old|poorly designed] so buy a new one."
That's an excellent argument for hard drives (I have, in fact, suggested
in the past that Ubuntu monitor hard disks for behavior indicative of
dying drives--SMART errors, IDE RESET commands because the drive hangs,
etc--and begin annoying the user with messages about the SEVERE risk of
extreme data loss if he doesn't back up his data), but really if my
mobo/CPU is aging and the CPU runs a little hot I'm not going to cry
when the CPU suddenly burns out and my machine shuts down. I'll be
confused, annoyed, but I'll buy a new one--I might buy an entire new
computer, unaware that just my CPU is broken, and shove the hard drive
in there. So there's no harm in allowing the user's hardware to go
ahead and burn itself out if you think that's what's going on here.
Of course, that doesn't mean you can't have a diagnostic center
somewhere that the user can open to review the whole collection.
"Ethernet: Lots of garbage [Possibly: Faulty switch, faulty NIC,
another computer with a chattering NIC spewing packets]." "CPU:
Overheats under high CPU load [Possibly: Dust-clogged CPU heat sink,
failing CPU fan, overclocking, failing CPU, failing motherboard voltage
regulators, buggy motherboard BIOS]." "/!\ Hard drive: Freezes and
needs IDE Resets [Possibly: Dying hard drive/!\, dying IDE controller,
dying RAID controller] /!\WARNING: SEVERE DATA LOSS POSSIBLE". Etc.
Looks like you really need a new computer...
Yes, I have strange ideas about what a computer should and shouldn't do.
But then, you know, people run huge racks of computers that fail
catastrophically if you don't pipe an air conditioning line straight to
the chassis fan intake (take a look under the cabinet, the floor tile
directly under each server rack is perforated--the raised floor has A/C
pumped under it and it vents directly and exclusively into the server
cabinets).
-------------- next part --------------
# this is a comment
# see CPUFREQD.CONF(5) manpage for a complete reference
#
# Note: all Profiles below use the ondemand governor, which
# is not available on every platform.
[General]
pidfile=/var/run/cpufreqd.pid
poll_interval=0.2
verbosity=4
#enable_remote=1
#remote_group=root
[/General]
[Profile]
name=Standard
minfreq=0%
maxfreq=100%
policy=ondemand
[/Profile]
[Profile]
name=Hot
minfreq=50%
maxfreq=95%
policy=ondemand
[/Profile]
[Profile]
name=Overheating
minfreq=0%
maxfreq=10%
policy=ondemand
[/Profile]
##
# Basic states
##
[Rule]
name=Normal
#acpi_temperature=0-70
sensor=temp1:0-70
#cpu_interval=00-100
profile=Standard
[/Rule]
##
# Special Rules
##
# CPU Too hot!
[Rule]
name=CPU Hot
#acpi_temperature=4-5
sensor=temp1:73-76
#cpu_interval=00-100
profile=Hot
[/Rule]
[Rule]
name=CPU Too Hot
#acpi_temperature=50-100
sensor=temp1:76-100
#cpu_interval=00-100
profile=Overheating
[/Rule]
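
The gap between the Normal rule (0-70) and the CPU Hot rule (73-76)
matches the behavior described in the message: clock down at 73C, go
back to the standard profile at 70C. The Python below is a sketch of
that intended hysteresis as a small state machine. It is an illustration
only and does not reproduce how cpufreqd actually scores rules;
next_profile and simulate are hypothetical names.

```python
def next_profile(current, temp_c, cool_at=70, hot_at=73, too_hot_at=76):
    """Pick the profile for one temperature reading, with a dead band.

    Thresholds default to the values in the attached config.
    """
    if temp_c >= too_hot_at:
        return "Overheating"
    if temp_c >= hot_at:
        return "Hot"
    if temp_c <= cool_at:
        return "Standard"
    # Between cool_at and hot_at: keep whatever profile is active,
    # so the clock does not flap on every small temperature change.
    return current


def simulate(temps, start="Standard"):
    """Return the profile chosen after each reading in temps."""
    profile = start
    history = []
    for temp_c in temps:
        profile = next_profile(profile, temp_c)
        history.append(profile)
    return history


if __name__ == "__main__":
    for temp_c, profile in zip([68, 72, 73, 72, 70], simulate([68, 72, 73, 72, 70])):
        print(temp_c, profile)
```

A reading of 72C keeps the Standard profile while the CPU is warming up
and keeps the Hot profile while it is cooling down, which is the point
of leaving the 70-73 band unassigned.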