Looking for resources on system performance monitoring & tuning

Charles Bearden Charles.F.Bearden at uth.tmc.edu
Wed Sep 14 21:02:28 UTC 2011


I'm the de facto admin for a 32-core research server running Ubuntu Server 10.04 
LTS on Dell hardware. 'uname -a' gives

   Linux UTxxxxx 2.6.32-33-server #72-Ubuntu SMP Fri Jul 29 21:21:55 UTC 2011 x86_64 GNU/Linux

An external collaborator is running a long-lived, multi-threaded, CPU-intensive 
process on our machine. They report that it runs fine on all 32 cores for about 6 
days and then drops to about 4 cores. We sometimes renice it a bit, but we 
aren't taking any action to throttle it. The other users of this system have 
occasional processes that run for an hour or two and consume most of one or two 
cores, but nothing like our collaborator's process in either resource 
intensity or duration.
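
One way to see what the threads are doing once usage drops is to tally their scheduler states. The following is only a minimal Python 3 sketch, assuming a Linux /proc filesystem; the function names are hypothetical, and it only reports states (R running, S sleeping, D uninterruptible sleep), not why a thread is in that state.

```python
import collections
import glob


def thread_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/task/<tid>/stat line."""
    # The command name (field 2) is parenthesised and may contain spaces,
    # so split on the last ')' and take the first token after it (field 3).
    return stat_line.rsplit(")", 1)[1].split()[0]


def tally_states(stat_lines):
    """Count how many threads are in each state."""
    return collections.Counter(thread_state(line) for line in stat_lines)


def read_thread_stats(pid):
    """Read the stat line of every thread of the given process."""
    lines = []
    for path in glob.glob("/proc/%d/task/*/stat" % pid):
        try:
            with open(path) as handle:
                lines.append(handle.read())
        except OSError:
            pass  # the thread exited between listing and reading
    return lines
```

Calling tally_states(read_thread_stats(pid)) a few times in a row gives a rough picture: if only about four threads are ever in state R while the rest sit in S or D, the threads are waiting on something rather than being starved of CPU.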

I'm tasked with working on the problem on the server side (checking the 
application is up to them). Can you recommend resources that will bring me up to 
speed on performance monitoring and tuning? I know how to run tools such as dstat, 
and I have a feel for the CPU, memory, and disk I/O numbers, but I'm not sure how 
to interpret the hardware and software interrupt numbers.
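
For reference, the kernel exposes cumulative interrupt, softirq, and context-switch counters in /proc/stat (the "intr", "softirq", and "ctxt" lines, whose first number is the total since boot). A minimal Python 3 sketch of turning two snapshots into per-second rates follows; the function names are hypothetical, and it looks only at the totals, not the per-IRQ breakdown.

```python
def parse_counters(proc_stat_text):
    """Return cumulative totals since boot from the text of /proc/stat."""
    wanted = {
        "intr": "interrupts",
        "softirq": "softirqs",
        "ctxt": "context_switches",
    }
    totals = {}
    for line in proc_stat_text.splitlines():
        fields = line.split()
        # The first number after the keyword is the grand total.
        if fields and fields[0] in wanted:
            totals[wanted[fields[0]]] = int(fields[1])
    return totals


def rates(before, after, interval_seconds):
    """Convert two snapshots of totals into events per second."""
    return {
        name: (after[name] - before[name]) / interval_seconds
        for name in after
        if name in before
    }
```

To use it, read /proc/stat, sleep for a known interval, read it again, and pass both parsed snapshots to rates(). The absolute numbers mean little on their own; comparing the rates from the first days of a run with those after the slowdown is more telling.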

One other oddity that we've noticed with this process in the past is that after 
running for a few days, it disappears from the top of 'top', and its ps output 
shows a ridiculously high %CPU value (it should be around 3200%, i.e. all 32 
cores, but it can be something like 1393215%). This struck us as weird even 
before our partners told us about the performance problem.
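
As background for that number: per its man page, ps reports %CPU as CPU time used divided by the time the process has been running, whereas top reports usage since its last refresh, so the two can legitimately disagree. Below is a minimal Python 3 sketch that recomputes both figures from the raw counters in /proc/<pid>/stat so they can be cross-checked. The function names are hypothetical, and the sketch does not explain an implausible value; it only shows where the inputs come from.

```python
import os


def parse_stat(stat_line):
    """Return (utime, stime, starttime) in clock ticks from /proc/<pid>/stat."""
    # The command name (field 2) is parenthesised and may contain spaces,
    # so split on the last ')' before splitting the rest on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime, stime, starttime are fields 14, 15, 22.
    return int(rest[11]), int(rest[12]), int(rest[19])


def lifetime_cpu_percent(stat_line, uptime_seconds, ticks_per_second):
    """CPU time divided by elapsed time since process start (the ps-style figure)."""
    utime, stime, starttime = parse_stat(stat_line)
    cpu_seconds = (utime + stime) / ticks_per_second
    elapsed_seconds = uptime_seconds - starttime / ticks_per_second
    if elapsed_seconds <= 0:
        return 0.0
    return 100.0 * cpu_seconds / elapsed_seconds


def interval_cpu_percent(stat_before, stat_after, interval_seconds, ticks_per_second):
    """CPU usage between two samples (the top-style figure)."""
    before = sum(parse_stat(stat_before)[:2])
    after = sum(parse_stat(stat_after)[:2])
    return 100.0 * (after - before) / ticks_per_second / interval_seconds


def read_inputs(pid):
    """Read the stat line, system uptime in seconds, and ticks per second."""
    with open("/proc/%d/stat" % pid) as handle:
        stat_line = handle.read()
    with open("/proc/uptime") as handle:
        uptime_seconds = float(handle.read().split()[0])
    return stat_line, uptime_seconds, os.sysconf("SC_CLK_TCK")
```

Passing the values returned by read_inputs(pid) to lifetime_cpu_percent() gives the lifetime average; sampling the stat line twice a few seconds apart and calling interval_cpu_percent() gives the current rate, where roughly 400% would correspond to about 4 busy cores.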

Many thanks in advance. I really appreciate any pointers to good, reliable material.
-- 
Chuck Bearden
Programmer Analyst IV
The University of Texas Health Science Center at Houston
School of Biomedical Informatics
Email: Charles.F.Bearden at uth.tmc.edu
Phone: 713.500.9672




