The default file descriptor limit (ulimit -n 1024) is too low

Mon Sep 27 21:45:23 BST 2010

On Mon, Sep 20, 2010 at 12:18:45PM -0400, Etienne Goyer wrote:
> On 10-09-20 12:06 PM, Stephan Hermann wrote:
> > On Mon, Sep 20, 2010 at 03:21:29AM -0700, Scott Ritchie wrote:
> >> Would there be any harm in raising this?
> >>
> >> I ask because I've seen a real world application hit this limit.  The
> >> application in question is multithreaded and opens separate threads and
> >> files to work on for each core; on a 4 core machine it stays under the
> >> limit, while on an 8 core machine it hits it and runs into strange errors.
> >>
> >> I feel that, as we get more and more cores on machines applications like
> >> this are going to increasingly be a problem with a ulimit of only 1024.
> > 
> > I saw this as well recently here...but it was more a developer's bug of not
> > closing sockets or files, while trying to code in Java.
> > 
> > In case this needs to be raised, this should be done on a per server decision by the sysadmin.
> > I don't see why it should be raised in general...
> 
> That's the thing: AFAICT, there is no single place where you can raise
> that value system-wide.  Doing so for daemons involves invoking ulimit
> from within their init scripts (a hack at best).  Or perhaps there *is* a
> way to raise it globally that I do not know about, in which case I would
> love to know about it. :)

There are places where you can do it; /etc/security/limits.conf is one of them.
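
Wherever the limit ends up being configured, a process can at least check
what it actually got. A minimal sketch in Python, assuming a Unix-like system
where the standard `resource` module is available; the helper names
`nofile_limits` and `raise_soft_limit` are made up for illustration:

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Raise the soft limit towards `target`, never above the hard limit.

    Returns the soft limit that is in effect afterwards.
    """
    soft, hard = nofile_limits()
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to raise
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # an unprivileged process cannot exceed hard
    if target <= soft:
        return soft  # never lower the limit in this sketch
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


if __name__ == "__main__":
    print("soft/hard open-file limits:", nofile_limits())
```

This only affects the calling process and its children; it does not change
the system default that the thread is arguing about.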
Anyways,

> 
> Also, if you turn the question around, is there a good reason *not* to
> raise that limit?

Yes. Mostly apps running amok that open sockets like hell and do not
close them nicely.

For example, there is a long-standing bug in the JVM that is still not fixed
the way it should be, even though the bug report says it is.
(http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6215050)

So, we have some code that uses a URLDataSource to fetch remote assets from
a web server, e.g. to attach them to a mail as locally referenced attachments.
It looks like the underlying implementation of URLDataSource opens a socket but
does not really free it once it is no longer used. The remote side has already
dropped the connection, but the TCP state on the local side is still CLOSE_WAIT.
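
To illustrate the state described above, here is a small sketch. It is in
Python rather than Java and has nothing to do with URLDataSource itself; it
only shows, on a loopback connection, that a socket whose peer has gone away
keeps occupying a file descriptor until the local side closes it. The
function name `half_closed_client` is hypothetical.

```python
import socket


def half_closed_client():
    """Return a client socket whose peer has already closed the connection."""
    server = socket.socket()
    server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    server.listen(1)
    client = socket.create_connection(server.getsockname())
    conn, _ = server.accept()
    conn.close()    # the "remote" side drops the connection
    server.close()
    return client


if __name__ == "__main__":
    client = half_closed_client()
    # An empty read means the peer closed; the local socket is now in
    # CLOSE_WAIT and stays there until we close it ourselves.
    print("peer closed:", client.recv(1024) == b"")
    print("descriptor still in use:", client.fileno() >= 0)
    client.close()  # this is the step the leaking code never performs
    print("descriptor released:", client.fileno() == -1)
```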

The fun part is the GC of the JVM. On Jaunty, the GC comes and frees memory
like it should, but doesn't clean up the sockets.
On Lucid, the GC comes and frees both the memory and the sockets that are in
CLOSE_WAIT state (and yes, it's the same patch level).

lsof shows the bugger easily, but finding the bugger took time, and
Sun's/Oracle's bug tracker is not as formidable as Launchpad or the BTS.
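
For a quick look without lsof, something like the following sketch can count
the sockets a process holds. It assumes Linux with /proc mounted, and the
helper names `open_fd_targets` and `count_sockets` are made up for this
example; it is nowhere near a replacement for lsof.

```python
import os


def open_fd_targets(pid="self"):
    """Map each open descriptor of `pid` to what it points at (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed while we were looking
    return targets


def count_sockets(pid="self"):
    """Count descriptors that refer to sockets (link text "socket:[inode]")."""
    return sum(1 for target in open_fd_targets(pid).values()
               if target.startswith("socket:"))


if __name__ == "__main__":
    print("open descriptors:", len(open_fd_targets()))
    print("of which sockets:", count_sockets())
```

A count that keeps growing while the app is idle is a good hint that
something is not being closed.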

The developer's first reaction was: "Hey You SysAdmin, Your System Is Crap"
My first reaction was: "Hey You Developer, Your App is Crap"

My second reaction was: "You Developer, try to close those sockets,
because your Java does not do that properly"
And now the developer is in "what to do" mode.
The developer's reaction was "You could raise this limit above 1024", and my
reaction is to say "No, I won't. You fix your app, and that's it".

The problem is not the open-file limit; the problem is mostly the
application doing the wrong things.
There are workarounds to those problems, and there are real fixes. 

The "file open max" problem is as old as our use of NCSA httpd: having 256
open logfiles at the same time wasn't good. Nowadays nobody is opening 1024 or
more logfiles (at least I don't know anyone who is insane enough to do that).

So, instead of raising a sane default without knowing what happens after that,
we should try to fix the application that runs amok.

The solution on our side is to use something else to fetch the remote assets,
and to close the sockets in a nice and clean way, as every app should. :)
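
As a rough idea of what "close it yourself" looks like, here is a sketch in
Python (not the Java replacement we actually use, which is not shown here).
The function name `fetch_asset` and the timeout value are assumptions for the
example; the point is only that the connection is closed deterministically
instead of waiting for a garbage collector.

```python
import urllib.request


def fetch_asset(url, timeout=10):
    """Fetch `url` and return its body as bytes.

    The `with` block closes the response, and with it the underlying
    socket, as soon as the body has been read or an error occurs.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # A data: URL keeps the demo free of network access.
    print(fetch_asset("data:text/plain,hello"))
```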

Regards,

\sh