Monitoring for disk issues

Oliver Marshall oliver.marshall at g2support.com
Fri Oct 19 15:59:41 UTC 2012


Great info guys. Thanks

Don't suppose you know of a way to simulate a drive failure? We don't have
any failing drives to test the monitoring scripts with.

Olly

On 19 October 2012 13:19, Marius Gedminas <marius at pov.lt> wrote:

> On Fri, Oct 19, 2012 at 09:43:56AM +0100, Oliver Marshall wrote:
> > We have an increasing number of ubuntu based machines kicking about, either
> > desktops or basic servers. Most have single disks, or dual disks with a
> > software raid.
> >
> > We want to monitor them for disk issues after a few of the older ones
> > (admittedly very old) died.
>
> sudo apt-get install smartmontools
>
> Then make sure sending email to root@localhost works and is forwarded
> somewhere appropriate (e.g. install postfix or ssmtp, define a root
> alias in /etc/aliases).
>
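The root alias step could be sketched roughly like this, assuming postfix
(which reads /etc/aliases). The function name ensure_root_alias and the
address admin@example.com are placeholders, not anything from this thread.

```shell
# Hypothetical helper: append "root: <address>" to an aliases file
# unless a root alias is already defined there.
ensure_root_alias() {
    address="$1"
    aliases_file="${2:-/etc/aliases}"   # assumed default location
    if grep -q '^root:' "$aliases_file" 2>/dev/null; then
        echo "root alias already present in $aliases_file"
    else
        printf 'root: %s\n' "$address" >> "$aliases_file"
        echo "added root alias for $address"
    fi
}
```

Run as root, something like "ensure_root_alias admin@example.com && newaliases"
should do it, after which a test message to root (for example with the mail
command from mailutils or bsd-mailx) shows whether forwarding works.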
> > There seems to be a mass of places that we
> > might look and script to check but not one place itself.
> >
> > I'm told that SCSI errors should appear in /var/log/syslog and that we
> > might be able to use smartmon to monitor the smart status of the disks.
> > Smart statuses are notoriously unreliable though with disks failing without
> > any warning from the smart chips.
>
> After the disk fails, I think, you ought to get an email from smartd
> about a new entry in the SMART Error Log.  Or maybe about an increment
> in the "# of bad blocks" attribute ("reallocated sector count" or some
> such).
>
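For a script-side check of the reallocated sector count, a minimal sketch
might parse the attribute table printed by "smartctl -A". The helper names
are made up for illustration, and treating any non-zero count as a warning
is an assumption, not a recommendation from this thread.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the raw
# value (10th column) of the Reallocated_Sector_Ct attribute.
reallocated_sectors() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Returns 0 if the count is zero or the attribute is absent, 1 otherwise.
# Assumption: any reallocated sector is worth a warning.
check_reallocated() {
    count=$(reallocated_sectors)
    if [ "${count:-0}" -gt 0 ]; then
        echo "WARNING: $count reallocated sectors"
        return 1
    fi
    echo "OK: no reallocated sectors reported"
}
```

Usage would be along the lines of "sudo smartctl -A /dev/sda | check_reallocated"
(/dev/sda is only an example device). Since both functions read stdin, they
can also be fed smartctl output saved from another machine.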
> > In the windows world we have a certain number of event codes in the event
> > logs we monitor for. Is there a similar thing we can use here? Monitor for
> > a certain string or code in a certain log which all disk errors can be
> > expected to use?
>
> Hm.  I don't really know anything about that.  There's logcheck, but
> it's based on a blacklist of events you don't want to monitor, rather
> than a whitelist of events you want to know about.
>
> > Bear in mind we aren't using any 3rd party raid controllers. It's all
> > software stuff or single disk.
>
> If you use software RAID, you should get daily emails from mdadm about
> failed RAID members.
>
> Although last time a hard disk failed for me I received emails from
> smartd but nothing from mdadm, despite kernel.log containing messages
> like
>
>     # [6627534.944848] raid1:md2: read error corrected (8 sectors at 5491720 on sda6)
>     # [6627534.944864] raid1: sdb6: redirecting sector 5491720 to another mirror
>
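Since mdadm mail did not arrive in that case, two extra checks that do not
depend on it could be sketched as below: one reads /proc/mdstat-style text
for arrays with a missing member, the other greps kernel log text for
messages like the two quoted above. Both function names are hypothetical,
and the grep pattern only covers those two messages; other kernel versions
may word them differently.

```shell
# Hypothetical helper: read /proc/mdstat-style text on stdin and print the
# name of every array whose status field (e.g. [U_]) shows a missing member.
degraded_arrays() {
    awk '
        /^md[^ ]* *:/      { name = $1 }    # remember the current array name
        /\[[U_]*_[U_]*\]/  { print name }   # "_" marks a failed or missing member
    '
}

# Hypothetical helper: read kernel log text on stdin and print lines that
# look like the md RAID1 messages quoted in this thread.
# grep exits non-zero when nothing matches.
raid_log_hits() {
    grep -E 'raid[0-9]+:.*(read error|redirecting sector)'
}
```

Typical use might be "degraded_arrays < /proc/mdstat" and
"raid_log_hits < /var/log/kern.log" (the log path is an assumption) from a
cron job, mailing root if either prints anything. mdadm's monitor mode also
has a --test option that generates a TestMessage alert for each array, which
may help confirm that its mail actually gets delivered.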
> Marius Gedminas
> --
> "In general, it is safe and legal to kill your children and their children"
> POSIX Prg Gt, by Donald Lewine, O'Reilly & Associates, 1991, p.110
> (On process termination)
>         -- http://lambda.weblogs.com/discuss/msgReader$7635?mode=day
>


      