[ec2-beta] data corruption

Ben Hendrickson ben at seomoz.org
Tue Apr 14 02:06:55 BST 2009


I use around 16 large instances for data processing tasks.  While using
the beta AMIs, I had around a dozen cases of data corruption.  Before
that, when using the alestic.com images, and again now that I've
switched back to them, I haven't seen any corruption.  Is there any
known issue this could be related to?

I set up each machine by combining its disks into a RAID 0 array and
then creating a ReiserFS filesystem on top of that.  The list of
commands I use to do this is at the bottom of this email.  The workload
of the machines changes somewhat, but generally it maxes out both of
the cores, uses around 15MB/s of disk throughput (split evenly between
reads and writes), and has the disks around 60% full.  Our data is
always compressed on disk
(LZO), and we have checksums every 64KB of uncompressed data.  What I
would see is that at a seemingly random point in a file the checksum
wouldn't match, although the checksums for the rest of the file both
before and after this point would be fine.  I didn't notice anything
unusual in the system logs.
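
To illustrate the kind of per-block check described above, here is a
minimal Python sketch.  It is not our production code: the use of CRC32,
the function names, and holding the data in memory are all assumptions
made for illustration, and the LZO compression step is left out.

```python
import zlib
from typing import List

BLOCK_SIZE = 64 * 1024  # one checksum per 64KB of uncompressed data


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return one CRC32 per block of uncompressed data (assumed checksum type)."""
    return [zlib.crc32(data[i:i + block_size])
            for i in range(0, len(data), block_size)]


def find_bad_blocks(data: bytes, expected: List[int],
                    block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the indices of blocks whose checksum differs from the stored one."""
    actual = block_checksums(data, block_size)
    bad = [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
    # Blocks present on only one side (file grew or was truncated) count as bad.
    bad.extend(range(min(len(actual), len(expected)),
                     max(len(actual), len(expected))))
    return bad
```

With a scheme like this, a single damaged region shows up as one bad
block index while the blocks before and after it still verify, which
matches the pattern I was seeing.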

I had these corruptions when using both the first and second beta
AMIs.  I would usually see about 1-2 corruptions a week.  They didn't
seem to be random: if I didn't replace a machine that had had a
corruption, it was more likely than the other machines to have
another one.  I could read a corrupted file multiple times
without it changing, so the issue seems to be with writing and not
reading.  I do check the return values of my write calls, and they
never fail.
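
For anyone who wants to try reproducing this, a write-then-read-back
probe along the following lines might help.  This is only a sketch: the
function name and the choice of SHA-256 are assumptions, not what our
code does.  Note also that the read-back may be served from the page
cache, so a match does not prove the data on disk is intact.

```python
import hashlib
import os


def write_and_verify(path: str, data: bytes) -> bool:
    """Write data, force it to disk, then re-read it and compare digests."""
    expected = hashlib.sha256(data).hexdigest()
    with open(path, "wb") as f:
        written = f.write(data)
        if written != len(data):
            raise IOError("short write: %d of %d bytes" % (written, len(data)))
        f.flush()
        os.fsync(f.fileno())  # raises OSError if the kernel reports a failure
    with open(path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    return actual == expected
```

Running something like this in a loop on a suspect machine, with files
large enough to exceed RAM, would show whether writes that report
success can still come back changed.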

Anyway, I'm curious if anyone else has had similar issues, or if
anyone knows what this might be caused by.  If nobody else has seen
this, perhaps I have a bug in my code which for some reason doesn't
surface when using the Alestic AMIs.  Let me know if there is any
additional information anyone would find useful.

Thanks
Ben

PS, here are the commands I run to set up a machine.

gpg --keyserver keyserver.ubuntu.com --recv A0749E8C90AE9B49
gpg --export --armor A0749E8C90AE9B49 | apt-key add -
apt-get --force-yes -y update
apt-get --force-yes -y upgrade
export TERM=xterm; export DEBIAN_FRONTEND=noninteractive; apt-get -qq \
  --force-yes -y install g++ gcc-doc libstdc++6 gcc screen \
  libcurl4-openssl-dev gperf autoconf automake libc-ares-dev libc-ares2 \
  glibc-doc make liblzo2-dev liblzo2-2 zlib1g-dbg zlib1g-dev zlib1g-dbg \
  libdb4.6 libdb4.6-dev libdb4.6-dbg libpopt0 libpopt-dev dstat gdb \
  libfcgi-dev libfcgi0ldbl libssl-dev libssh2-1 libssh2-1-dev \
  libmysqlclient15off mdadm
apt-get -qq --force-yes -y remove citadel-server citadel-mta
umount /mnt || true
umount /dev/sdb || true
umount /dev/sdc || true
grep -q 'md' /etc/modules || echo md >> /etc/modules
export TERM=xterm; yes | mdadm /dev/md0 --create -l 0 -n 2 /dev/sdb /dev/sdc
export TERM=xterm; grep -q 'ARRAY' /etc/mdadm/mdadm.conf || echo \
  "ARRAY /dev/md0 devices=/dev/sdb,/dev/sdc" >> /etc/mdadm/mdadm.conf
echo "/dev/md0        /mnt           reiserfs    defaults        0 0" >> /etc/fstab
mkfs.reiserfs -q /dev/md0
mount -a



