Great info, guys. Thanks.<div><br></div><div>Don't suppose you know of a way to simulate a drive failure? We don't have any failing drives to test the monitoring scripts with.</div><div><br></div><div>Olly<br><br><div class="gmail_quote">
On 19 October 2012 13:19, Marius Gedminas <span dir="ltr"><<a href="mailto:marius@pov.lt" target="_blank">marius@pov.lt</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On Fri, Oct 19, 2012 at 09:43:56AM +0100, Oliver Marshall wrote:<br>
> We have an increasing number of ubuntu based machines kicking about, either<br>
> desktops or basic servers. Most have single disks, or dual disks with a<br>
> software raid.<br>
><br>
> We want to monitor them for disk issues after a few of the older ones<br>
> (admittedly very old) died.<br>
<br>
</div>sudo apt-get install smartmontools<br>
<br>
Then make sure sending email to root@localhost works and forwards<br>
somewhere appropriately (e.g. install postfix or ssmtp, define a root<br>
alias in /etc/aliases).<br>
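<br>
One way to exercise the whole mail path without a failing drive, as a sketch<br>
under assumptions: smartd has a "-M test" directive that sends a single test<br>
email each time the daemon starts. The address below is a placeholder, and<br>
your existing /etc/smartd.conf line may carry other options worth keeping:<br>
<br>
```
# /etc/aliases: forward root's mail somewhere it gets read
# (admin@example.com is a placeholder address)
root: admin@example.com

# /etc/smartd.conf: monitor every disk smartd detects, mail root,
# and send one test message whenever smartd starts
DEVICESCAN -a -m root -M test
```
<br>
With postfix you would run newaliases after editing /etc/aliases, then restart<br>
smartd (on Ubuntu the service is usually called smartmontools) and check that<br>
the test message arrives. Remove "-M test" again once it does.<br>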
<div class="im"><br>
> There seems to be a mass of places that we<br>
> might look and script to check but not one place itself.<br>
><br>
> I'm told that SCSI errors should appear in /var/log/syslog and that we<br>
> might be able to use smartmon to monitor the smart status of the disks.<br>
> Smart statuses are notoriously unreliable though with disks failing without<br>
> any warning from the smart chips.<br>
<br>
</div>After the disk fails, I think, you ought to get an email from smartd<br>
about a new entry in the SMART Error Log. Or maybe about an increment<br>
in the "# of bad blocks" attribute ("reallocated sector count" or some<br>
such).<br>
<div class="im"><br>
> In the windows world we have a certain number of event codes in the event<br>
> logs we monitor for. Is there a similar thing we can use here? Monitor for<br>
> a certain string or code in a certain log which all disk errors can be<br>
> expected to use?<br>
<br>
</div>Hm. I don't really know anything about that. There's logcheck, but<br>
it's based on a blacklist of events you don't want to monitor, rather<br>
than a whitelist of events you want to know about.<br>
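<br>
A whitelist is easy enough to sketch by hand, though. The patterns below are<br>
my assumptions about what disk trouble tends to look like in the kernel log,<br>
not a complete or authoritative list, and disk_error_lines is just a made-up<br>
name:<br>
<br>
```shell
# Print only the log lines that match a whitelist of disk-related patterns.
# Reads log text on standard input; prints nothing if no line matches.
# The pattern list is illustrative and will need tuning per machine.
disk_error_lines() {
    grep -E -i \
        -e 'raid[0-9]+:.*(read error|redirecting sector)' \
        -e 'I/O error' \
        -e 'ata[0-9.]+: (exception|failed command)' \
        -e 'medium error'
}
```
<br>
Run from cron with the kernel log on standard input, this stays silent on a<br>
healthy machine, and cron mails whatever it prints to the job's owner.<br>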
<div class="im"><br>
> Bear in mind we aren't using any 3rd party raid controllers. It's all<br>
> software stuff or single disk.<br>
<br>
</div>If you use software RAID, you should get daily emails from mdadm about<br>
failed RAID members.<br>
<br>
Although the last time a hard disk failed for me, I received emails from<br>
smartd but nothing from mdadm, despite kern.log containing messages<br>
like<br>
<br>
# [6627534.944848] raid1:md2: read error corrected (8 sectors at 5491720 on sda6)<br>
# [6627534.944864] raid1: sdb6: redirecting sector 5491720 to another mirror<br>
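<br>
A check that does not depend on mdadm's mail could read /proc/mdstat directly.<br>
This is a minimal sketch, assuming the usual layout where each array's status<br>
line ends in something like [UU] and an underscore marks a missing member.<br>
The function name is made up. Note that it only catches degraded arrays, not<br>
corrected read errors like the ones above:<br>
<br>
```shell
# Print the name of every md array that has a missing or failed member.
# Reads the contents of /proc/mdstat on standard input, for example:
#   cat /proc/mdstat | degraded_arrays
degraded_arrays() {
    awk '
        # Header line such as "md2 : active raid1 ...": remember the name.
        /^md[^ ]* :/ { name = $1 }
        # Status line such as "4883648 blocks [2/1] [_U]": the last field
        # has one character per member, with "_" for a missing one.
        /\[[0-9]+\/[0-9]+\] \[[U_]+\]/ {
            if ($NF ~ /_/) print name
        }
    '
}
```
<br>
To try the alerts without a dying disk, "sudo mdadm /dev/md0 --fail /dev/sdb1"<br>
marks a member as faulty (the device names are placeholders, and this should<br>
only be done on an array you can afford to rebuild), and<br>
"sudo mdadm --monitor --scan --oneshot --test" should make mdadm send a test<br>
message for each array it finds.<br>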
<span class="HOEnZb"><font color="#888888"><br>
Marius Gedminas<br>
--<br>
"In general, it is safe and legal to kill your children and their children"<br>
POSIX Prg Gt, by Donald Lewine, O'Reilly & Associates, 1991, p.110 (On process<br>
termination)<br>
-- <a href="http://lambda.weblogs.com/discuss/msgReader$7635?mode=day" target="_blank">http://lambda.weblogs.com/discuss/msgReader$7635?mode=day</a><br>
</font></span><br>--<br>
ubuntu-users mailing list<br>
<a href="mailto:ubuntu-users@lists.ubuntu.com">ubuntu-users@lists.ubuntu.com</a><br>
Modify settings or unsubscribe at: <a href="https://lists.ubuntu.com/mailman/listinfo/ubuntu-users" target="_blank">https://lists.ubuntu.com/mailman/listinfo/ubuntu-users</a><br>
<br></blockquote></div><br></div>