Unable to start new processes

Fri Nov 5 00:20:42 UTC 2010

Ok, sorry for the absolutely horrendous delay, but I've finally got a
few test machines set up to hopefully solve this.

See my original post for a description of the problem; it still
applies. I'm seeing this problem on multiple instances of the same
hardware. The machines are all D510MOs with 1GB of RAM and a 4GB USB
flash drive that hosts Ubuntu, previously 10.04 but now 10.10, and
the errors persist. I captured the error (pasted below) over the
serial port. I had seen it once before, and that time it occurred at
a different sector. I've restarted the machine and I'm sure it will
crash again within a day or two. I'm also setting up another machine
with the exact same hardware to see if that fails too.

I'm starting to think this is a systemic hardware fault somewhere, but
if anyone knows their kernel debug-fu I'd be happy to give something a
try at my end to hopefully narrow the focus a bit.

[266929.048995] end_request: I/O error, dev sda, sector 776208
[266929.065740] Buffer I/O error on device sda1, logical block 96770
[266929.084033] Buffer I/O error on device sda1, logical block 96771
[266929.102321] Buffer I/O error on device sda1, logical block 96772
[266929.120692] end_request: I/O error, dev sda, sector 3490352
[266929.137686] Aborting journal on device sda1-8.
[266929.137744] EXT4-fs (sda1): ext4_da_writepages: jbd2_start: 8189 pages, ino 27933; err -30
[266929.137760] EXT4-fs (sda1): ext4_da_writepages: jbd2_start: 8168 pages, ino 27845; err -30
[266929.137770] EXT4-fs (sda1): ext4_da_writepages: jbd2_start: 8168 pages, ino 27951; err -30
[266929.137779] EXT4-fs (sda1): ext4_da_writepages: jbd2_start: 8168 pages, ino 27953; err -30
[266929.137787] EXT4-fs (sda1): ext4_da_writepages: jbd2_start: 8168 pages, ino 27956; err -30
[266929.276439] JBD2: I/O error detected when updating journal superblock for sda1-8.
[266929.276542] EXT4-fs error (device sda1): ext4_journal_start_sb: Detected aborted journal
[266929.276555] EXT4-fs (sda1): Remounting filesystem read-only
[266929.340709] journal commit I/O error
[266930.839068] EXT4-fs error (device sda1): ext4_find_entry: inode #32023: (comm java) reading directory lblock 0
[266933.157820] sd 3:0:0:0: [sdb] Assuming drive cache: write through
[266933.176765] EXT4-fs error (device sda1): ext4_find_entry: inode #312: (comm rsyslogd)
[266933.178520] sd 3:0:0:0: [sdb] Assuming drive cache: write through
[266933.222163] sd 3:0:0:0: [sdb] Assuming drive cache: write through
[266933.223117] EXT4-fs error (device sda1): ext4_find_entry: inode #338: (comm udevd) reading directory lblock 0
[266971.032873] EXT4-fs error (device sda1): ext4_find_entry: inode #32259: (comm postmaster) reading directory lblock 0
[266971.067016] EXT4-fs error (device sda1): ext4_find_entry: inode #32259: (comm postmaster)
[266971.068947] EXT4-fs error (device sda1): ext4_find_entry: inode #312: (comm rsyslogd) reading directory lblock 0
[266971.069077] EXT4-fs error (device sda1): ext4_find_entry: inode #312: (comm rsyslogd) reading directory lblock 0
[266971.156148] EXT4-fs error (device sda1): ext4_find_entry: inode #32018: (comm postmaster)
[266971.157334] EXT4-fs error (device sda1): ext4_find_entry: inode #312: (comm rsyslogd) reading directory lblock 0
[266971.212704] EXT4-fs error (device sda1): ext4_find_entry: inode #32259: (comm postmaster) reading directory lblock 0
[266973.229713] EXT4-fs error (device sda1): ext4_find_entry: inode #2: (comm cron) reading directory lblock 0
[266973.259137] EXT4-fs error (device sda1): ext4_find_entry: inode #6090: (comm cron) reading directory lblock 0
[267024.440346] EXT4-fs error (device sda1): ext4_find_entry: inode #1611: (comm java) reading directory lblock 0
[267024.470643] EXT4-fs error (device sda1): ext4_find_entry: inode #31906: (comm java) reading directory lblock 0
[267084.518307] EXT4-fs error (device sda1): ext4_find_entry: inode #144807: (comm java) reading directory lblock 0
[267084.548999] EXT4-fs error (device sda1): ext4_find_entry: inode #144802: (comm java) reading directory lblock 0
[267084.579665] EXT4-fs error (device sda1): ext4_find_entry: inode #144801: (comm java) reading directory lblock 0
[267525.944731] EXT4-fs error (device sda1): ext4_find_entry: inode #12: (comm ntpd) reading directory lblock 0
[270010.787208] EXT4-fs error (device sda1): ext4_find_entry: inode #12: (comm ntpd) reading directory lblock 0
[270811.644198] EXT4-fs error (device sda1): ext4_find_entry: inode #32245: (comm java) reading directory lblock 0
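
Since the failing sector differs between crashes, it may help to pull
the sectors and affected processes out of each serial capture so the
captures can be compared. The following is only a rough sketch, not a
tested tool: the function names are made up for illustration, and the
sector/block mapping assumes a 4 KiB ext4 block size and a partition
starting at sector 2048. Neither value is confirmed for these
machines, although both are consistent with the first sector/block
pair in the log above.

```python
import re
from collections import Counter

# Patterns match the kernel messages seen in the capture above.
# \s+ is used between fields so wrapped log lines still match.
SECTOR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
COMM_RE = re.compile(
    r"EXT4-fs error \(device (\w+)\):\s+\w+:\s+inode\s+#(\d+):\s+\(comm (\w+)\)"
)


def summarize(log_text):
    """Return (list of (device, sector), Counter of process names)."""
    sectors = [(dev, int(sec)) for dev, sec in SECTOR_RE.findall(log_text)]
    procs = Counter(comm for _dev, _inode, comm in COMM_RE.findall(log_text))
    return sectors, procs


def block_to_sector(block, part_start=2048, sectors_per_block=8):
    """Map a filesystem logical block to a 512-byte disk sector.

    part_start and sectors_per_block are ASSUMED values (partition at
    sector 2048, 4 KiB blocks); check them with fdisk/tune2fs.
    """
    return part_start + block * sectors_per_block


sample = """\
[266929.048995] end_request: I/O error, dev sda, sector 776208
[266929.065740] Buffer I/O error on device sda1, logical block 96770
[266929.120692] end_request: I/O error, dev sda, sector 3490352
[266930.839068] EXT4-fs error (device sda1): ext4_find_entry: inode
#32023: (comm java) reading directory lblock 0
[266933.176765] EXT4-fs error (device sda1): ext4_find_entry: inode
#312: (comm rsyslogd)
"""

sectors, procs = summarize(sample)
print(sectors)
print(dict(procs))
print(block_to_sector(96770))
```

Under those assumptions, logical block 96770 maps to sector 776208,
which is the first sector reported by end_request. If sectors from
later crashes are scattered across the drive rather than clustered,
that would point more towards the drive or USB path as a whole than
towards a single bad region.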

On Tue, Aug 24, 2010 at 5:01 AM, Hakan Koseoglu <hakan at koseoglu.org> wrote:
> Chris,
>
> On 24 August 2010 12:09, Karl Larsen <klarsen1 at gmail.com> wrote:
>> On 08/23/2010 09:31 PM, Chris MacDonald wrote:
> First let's take care of this.
>>         It appears you have a problem with ssh. Please give details on
>> how you have set up ssh. You should have zero problems using Tomcat on
>> the remote machine.
> Wrong, wrong, so wrong it's stupid.
>
>>>  From my machine the problem manifests itself as an inability to
>>> request much in the way of data from the remote machine, for instance,
>>> when I SSH in (ssh -v) it opens a connection, attempts to negotiate a
>>> session (I get a response from the remote machine), but then promptly
>>> closes the connection remotely before I get prompted for a password.
>>> Likewise for the running instance of Tomcat, I'll connect to the http
>>> port, it will accept my connection, but before I get anything back it
>>> closes the connection on me. I can ping the remote machine, it shows
>>> ports as open, I just can't seem to get any data.
> It looks like you cannot spawn any new processes. This can happen
> for a couple of main reasons. The first is ulimits being
> reached. A typical Ubuntu installation does not have any limits on
> the amount of memory and processes a user can consume. You can check
> the limits by executing "ulimit -a". With the information given, this
> sounds like a memory leak where the server is starved and any new
> processes are being killed. Another possibility is hitting the
> maximum number of open files. You can use various tools to check
> these. My favourite is nmon; you can also use sar for checking CPU
> usage stats.
>
> The best action is figuring out what's running on your server and
> how it behaves over time. Nmon's capacity planning will give you
> the necessary overview, although you might like to collect more data.
>
> One other thing to check is whether your applications are consuming
> too many ports. Have a look at your net.ipv4.ip_local_port_range
> setting. This is usually quite a wide range, so if you are running
> out, you have another problem, such as your processes not closing
> their ports after use.
>
> Reducing the amount of memory allocated to Tomcat might be a starting
> point since that's the process most likely ballooning and leaking.
> Also look for OOM killer messages in the log files.
> --
> Hakan (m1fcj) - http://www.hititgunesi.org
>
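
For reference, the checks suggested in the quoted reply (process and
open-file limits, system-wide file handles, local port range) can be
gathered in one go. This is a minimal sketch assuming a Linux machine
with /proc mounted; the helper names are made up for illustration. It
reports the limits inherited by the Python process itself, which
should match "ulimit -a" only when run from the same user and shell
environment.

```python
import resource
from pathlib import Path


def read_ints(path):
    """Return the whitespace-separated integers in a /proc file,
    or None if the file is missing or unreadable."""
    try:
        return [int(tok) for tok in Path(path).read_text().split()]
    except (OSError, ValueError):
        return None


def fmt(limit):
    """Render an rlimit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def snapshot():
    """Collect per-process limits and a few system-wide counters."""
    report = {}
    # Max processes, open files, and address space (soft, hard).
    for name in ("RLIMIT_NPROC", "RLIMIT_NOFILE", "RLIMIT_AS"):
        soft, hard = resource.getrlimit(getattr(resource, name))
        report[name] = (fmt(soft), fmt(hard))
    # file-nr holds: allocated handles, unused handles, system maximum.
    report["file-nr"] = read_ints("/proc/sys/fs/file-nr")
    # Lowest and highest local port used for outgoing connections.
    report["ip_local_port_range"] = read_ints(
        "/proc/sys/net/ipv4/ip_local_port_range"
    )
    return report


if __name__ == "__main__":
    for key, value in snapshot().items():
        print(f"{key}: {value}")
```

Running it from cron every few minutes and keeping the output would
show whether any of these values creep towards their limits before a
crash.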