Unable to start new processes

Chris MacDonald chris at fourthandvine.com
Fri Nov 5 00:25:05 UTC 2010


On Thu, Nov 4, 2010 at 5:20 PM, Chris MacDonald <chris at fourthandvine.com> wrote:
> Ok, sorry for the absolutely horrendous delay, but I've finally got a
> few test machines set up to hopefully solve this.
>
> See my original post for a description of the problem; it still
> applies. I'm seeing it on multiple instances of the same hardware.
> The machines are all D510MOs with 1 GB of RAM and a 4 GB USB flash
> drive hosting Ubuntu, previously 10.04 but now 10.10, and the
> errors persist. I captured the error (pasted below) over the serial
> port; I had seen it once before, and that time it occurred at a
> different sector. I've restarted the machine and I'm sure it will
> crash again within a day or two. I'm also setting up another machine
> with the exact same hardware to see if that fails too.
>
> I'm starting to think this is a systemic hardware fault somewhere, but
> if anyone knows their kernel debug-fu I'd be happy to give something a
> try at my end to hopefully narrow the focus a bit.
>
> [266929.048995] end_request: I/O error, dev sda, sector 776208
> [266929.065740] Buffer I/O error on device sda1, logical block 96770
> [266929.084033] Buffer I/O error on device sda1, logical block 96771
> [266929.102321] Buffer I/O error on device sda1, logical block 96772
> [266929.120692] end_request: I/O error, dev sda, sector 3490352
> [266929.137686] Aborting journal on device sda1-8.
> [266929.137744] EXT4-fs (sda1): ext4_da_writepages: jbd2_start: 8189
> pages, ino 27933; err -30
> [266929.137760] EXT4-fs (sda1): ext4_da_writepages: jbd2_start: 8168
> pages, ino 27845; err -30
> [266929.137770] EXT4-fs (sda1): ext4_da_writepages: jbd2_start: 8168
> pages, ino 27951; err -30
> [266929.137779] EXT4-fs (sda1): ext4_da_writepages: jbd2_start: 8168
> pages, ino 27953; err -30
> [266929.137787] EXT4-fs (sda1): ext4_da_writepages: jbd2_start: 8168
> pages, ino 27956; err -30
> [266929.276439] JBD2: I/O error detected when updating journal
> superblock for sda1-8.
> [266929.276542] EXT4-fs error (device sda1): ext4_journal_start_sb:
> Detected aborted journal
> [266929.276555] EXT4-fs (sda1): Remounting filesystem read-only
> [266929.340709] journal commit I/O error
> [266930.839068] EXT4-fs error (device sda1): ext4_find_entry: inode
> #32023: (comm java) reading directory lblock 0
> [266933.157820] sd 3:0:0:0: [sdb] Assuming drive cache: write through
> [266933.176765] EXT4-fs error (device sda1): ext4_find_entry: inode
> #312: (comm rsyslogd)
> [266933.178520] sd 3:0:0:0: [sdb] Assuming drive cache: write through
> [266933.222163] sd 3:0:0:0: [sdb] Assuming drive cache: write through
> [266933.223117] EXT4-fs error (device sda1): ext4_find_entry: inode
> #338: (comm udevd) reading directory lblock 0
> [266971.032873] EXT4-fs error (device sda1): ext4_find_entry: inode
> #32259: (comm postmaster) reading directory lblock 0
> [266971.067016] EXT4-fs error (device sda1): ext4_find_entry: inode
> #32259: (comm postmaster)
> [266971.068947] EXT4-fs error (device sda1): ext4_find_entry: inode
> #312: (comm rsyslogd) reading directory lblock 0
> [266971.069077] EXT4-fs error (device sda1): ext4_find_entry: inode
> #312: (comm rsyslogd) reading directory lblock 0
> [266971.156148] EXT4-fs error (device sda1): ext4_find_entry: inode
> #32018: (comm postmaster)
> [266971.157334] EXT4-fs error (device sda1): ext4_find_entry: inode
> #312: (comm rsyslogd) reading directory lblock 0
> [266971.212704] EXT4-fs error (device sda1): ext4_find_entry: inode
> #32259: (comm postmaster) reading directory lblock 0
> [266973.229713] EXT4-fs error (device sda1): ext4_find_entry: inode
> #2: (comm cron) reading directory lblock 0
> [266973.259137] EXT4-fs error (device sda1): ext4_find_entry: inode
> #6090: (comm cron) reading directory lblock 0
> [267024.440346] EXT4-fs error (device sda1): ext4_find_entry: inode
> #1611: (comm java) reading directory lblock 0
> [267024.470643] EXT4-fs error (device sda1): ext4_find_entry: inode
> #31906: (comm java) reading directory lblock 0
> [267084.518307] EXT4-fs error (device sda1): ext4_find_entry: inode
> #144807: (comm java) reading directory lblock 0
> [267084.548999] EXT4-fs error (device sda1): ext4_find_entry: inode
> #144802: (comm java) reading directory lblock 0
> [267084.579665] EXT4-fs error (device sda1): ext4_find_entry: inode
> #144801: (comm java) reading directory lblock 0
> [267525.944731] EXT4-fs error (device sda1): ext4_find_entry: inode
> #12: (comm ntpd) reading directory lblock 0
> [270010.787208] EXT4-fs error (device sda1): ext4_find_entry: inode
> #12: (comm ntpd) reading directory lblock 0
> [270811.644198] EXT4-fs error (device sda1): ext4_find_entry: inode
> #32245: (comm java) reading directory lblock 0

Just to clarify: in my original post I referred to devices in the
field. I've downgraded those D945GCLF2 boards to Ubuntu 9.10 and
they're fine. I'm seeing the same symptoms here at my desk with a
slightly different board (the D510MO) but the same flash drive and RAM.
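
Since the failing sector differed between crashes, it may help to pull
the sectors out of each serial capture and compare them. Below is a
minimal Python sketch, assuming the capture is available as text in
the same format as the log above. The function names are made up for
illustration, and the regular expression only covers the "end_request"
and "(comm NAME)" forms that appear in this log.

```python
import re
from collections import Counter

# Matches block-layer errors such as:
#   [266929.048995] end_request: I/O error, dev sda, sector 776208
# search() is used rather than match() so a leading "> " quote prefix is tolerated.
IO_ERROR = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+end_request: I/O error, "
    r"dev (?P<dev>\w+), sector (?P<sector>\d+)"
)


def failing_sectors(log_text):
    """Return a list of (timestamp, device, sector) for each I/O error line."""
    found = []
    for line in log_text.splitlines():
        m = IO_ERROR.search(line)
        if m:
            found.append(
                (float(m.group("ts")), m.group("dev"), int(m.group("sector")))
            )
    return found


def errors_by_command(log_text):
    """Count how often each process name appears in a '(comm NAME)' field."""
    return Counter(re.findall(r"\(comm (\w+)\)", log_text))


# Two lines taken from the capture above, plus one wrapped EXT4 error.
SAMPLE = """\
[266929.048995] end_request: I/O error, dev sda, sector 776208
[266929.120692] end_request: I/O error, dev sda, sector 3490352
[266930.839068] EXT4-fs error (device sda1): ext4_find_entry: inode
#32023: (comm java) reading directory lblock 0
"""

for ts, dev, sector in failing_sectors(SAMPLE):
    print(f"{ts:.6f} {dev} sector {sector}")
```

Running this against captures from several crashes would show whether
the same sectors keep failing (pointing at one bad region of the flash
drive) or whether they move around (pointing more towards the USB
controller, cabling, or power).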

> On Tue, Aug 24, 2010 at 5:01 AM, Hakan Koseoglu <hakan at koseoglu.org> wrote:
>> Chris,
>>
>> On 24 August 2010 12:09, Karl Larsen <klarsen1 at gmail.com> wrote:
>>> On 08/23/2010 09:31 PM, Chris MacDonald wrote:
>> First let's take care of this.
>>>         It appears you have a problem with ssh. Please give details on
>>> how you have set up ssh. You should have zero problems using Tomcat on
>>> the remote machine.
>> Wrong, wrong, so wrong it's stupid.
>>
>>>>  From my machine the problem manifests itself as an inability to
>>>> request much in the way of data from the remote machine, for instance,
>>>> when I SSH in (ssh -v) it opens a connection, attempts to negotiate a
>>>> session (I get a response from the remote machine), but then promptly
>>>> closes the connection remotely before I get prompted for a password.
>>>> Likewise for the running instance of Tomcat, I'll connect to the http
>>>> port, it will accept my connection, but before I get anything back it
>>>> closes the connection on me. I can ping the remote machine, it shows
>>>> ports as open, I just can't seem to get any data.
>> It looks like you cannot spawn any new processes. There are a couple
>> of main reasons this can happen. The first is that a ulimit has been
>> reached, although a typical Ubuntu installation does not limit the
>> amount of memory or the number of processes a user can consume. You
>> can check the limits by executing "ulimit -a". With the information
>> given, this sounds like a memory leak where the server is starved and
>> any new processes are being killed. Another possibility is exceeding
>> the maximum number of open files. You can use various tools to check
>> these. My favourite is nmon; you can also use sar to check CPU usage
>> stats.
>>
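
As a rough scripted counterpart to "ulimit -a", the sketch below uses
Python's standard "resource" module to print the process, open-file,
and address-space limits. It is only a sketch: the labels and function
names are illustrative, and the values are those of the process that
runs the script, so it would need to run as the same user (and from
the same kind of session) as the service being investigated.

```python
import resource

# Limits most relevant to "cannot start new processes" symptoms.
LIMITS = {
    "max user processes": resource.RLIMIT_NPROC,
    "open files": resource.RLIMIT_NOFILE,
    "address space (bytes)": resource.RLIMIT_AS,
}


def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {label: (soft, hard)} for the calling process."""
    return {name: resource.getrlimit(res) for name, res in LIMITS.items()}


def report():
    """Build a small text table of the soft and hard limits."""
    lines = []
    for name, (soft, hard) in current_limits().items():
        lines.append(f"{name:<24} soft={fmt(soft)} hard={fmt(hard)}")
    return "\n".join(lines)


print(report())
```

If every row shows "unlimited" or a very large value, the limits are
unlikely to be the cause, and memory exhaustion becomes the more
plausible explanation.
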
>> The best course of action is to figure out what's running on your
>> server and how it behaves over time. Nmon's capacity planning will
>> give you the necessary overview, although you might like to collect
>> more data.
>>
>> Another thing to check is whether your applications are consuming too
>> many ports. You might like to have a look at your
>> net.ipv4.ip_local_port_range setting. This is usually quite a wide
>> range, though, so if you are running out, you have another problem,
>> such as your processes not closing their ports after use.
>>
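
One way to check the port usage described above is to compare the
configured range with the local ports currently in use. The sketch
below parses text in the format of
/proc/sys/net/ipv4/ip_local_port_range and /proc/net/tcp. The function
names are hypothetical, the sample table is made up for illustration,
and only IPv4 TCP sockets are counted.

```python
def parse_port_range(text):
    """Parse the two integers from ip_local_port_range, e.g. '32768\\t61000'."""
    low, high = (int(part) for part in text.split())
    return low, high


def count_local_ports_in_range(proc_net_tcp_text, low, high):
    """Count distinct local ports within [low, high] in /proc/net/tcp-style text."""
    ports = set()
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 2:
            continue
        # fields[1] is the local address as HEXADDR:HEXPORT, e.g. 0100007F:1F90
        port = int(fields[1].rsplit(":", 1)[1], 16)
        if low <= port <= high:
            ports.add(port)
    return len(ports)


def live_report():
    """Read the real files; only meaningful on a Linux machine."""
    with open("/proc/sys/net/ipv4/ip_local_port_range") as f:
        low, high = parse_port_range(f.read())
    with open("/proc/net/tcp") as f:
        used = count_local_ports_in_range(f.read(), low, high)
    return f"{used} of {high - low + 1} ephemeral ports in use ({low}-{high})"


# Made-up sample: one listener on port 8080 (0x1F90) and one outgoing
# connection from port 32768 (0x8000).
SAMPLE_TCP = """\
  sl  local_address rem_address   st tx_queue rx_queue
   0: 0100007F:1F90 00000000:0000 0A 00000000:00000000
   1: 0100007F:8000 0100007F:1F90 01 00000000:00000000
"""

print(count_local_ports_in_range(SAMPLE_TCP, *parse_port_range("32768\t61000\n")))
```

Calling live_report() on the affected machine while it is still
responsive would show how close the applications are to exhausting the
range.
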
>> Reducing the amount of memory allocated to Tomcat might be a starting
>> point, since that's the process most likely to be ballooning and
>> leaking. Also look for OOM killer entries in the message files.
>> --
>> Hakan (m1fcj) - http://www.hititgunesi.org
>>
>
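
For the suggestion in the quoted reply to look for the OOM killer in
the logs, here is a minimal sketch that filters log lines for it. The
pattern is an assumption: it matches the phrases "oom-killer" and "out
of memory", whose exact wording varies between kernel versions. The
two sample lines are hypothetical, not taken from any of the machines
discussed here, and the log path in the comment is the usual Ubuntu
location rather than something confirmed in this thread.

```python
import re

# Assumed phrases; exact kernel wording differs between versions.
OOM_PATTERN = re.compile(r"oom-killer|out of memory", re.IGNORECASE)


def oom_lines(lines):
    """Return the lines that mention the OOM killer, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


# Hypothetical sample lines for illustration only.
SAMPLE_LOG = [
    "[12345.678901] java invoked oom-killer: gfp_mask=0x201da, order=0\n",
    "[12345.679000] Out of memory: kill process 4321 (java) score 9876\n",
    "[12346.000000] EXT4-fs (sda1): Remounting filesystem read-only\n",
]

for hit in oom_lines(SAMPLE_LOG):
    print(hit)

# On a real machine, something like this could be used instead
# (assumed path):
#     with open("/var/log/kern.log") as f:
#         hits = oom_lines(f)
```

Note that once the root filesystem has been remounted read-only, as in
the log above, nothing further reaches the on-disk logs, so a serial
capture may be the only place later messages show up.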



