Open file descriptors (FDs) are an OS resource used for file handles and network connections (sockets). Given the distributed nature of GemFire deployments, GemFire data nodes (servers) often have many socket connections open as they communicate with other servers in the same cluster, with clients, or with other GemFire clusters via WAN gateway connections. In such cases the default OS limit on the number of open file descriptors may not be high enough. The OS defaults can be quite low for today's needs and available resources; as is often the case, they simply have not been adjusted to reflect the new reality.
As a general guideline, we recommend setting the FD limit to 50000 or higher, depending on the estimated use. Nowadays even smartphones have enough RAM to support far higher numbers of open file descriptors; Linux, for example, uses roughly 1 KB of RAM per open file descriptor for bookkeeping, so the memory overhead for 50000 open file descriptors is less than 50 MB. Given such a low overhead, it is better to over-allocate than to risk running out of file descriptors and bringing a production system to a standstill.
Estimating the maximum number of FDs used by GemFire
To estimate the maximum number of open file descriptors that may be in use by GemFire, take into account the following:
- The number of concurrent connections to the JVM (n1)
- The maximum number of worker threads on the server JVM (maxThreads)
- The number of peers this JVM will be connected to. For instance, if the server cluster has 10 peer nodes and conserve-sockets is set to false (for high performance), you will need maxThreads x 10 x 2 additional connections (n2)
- The number of open files the JVM will be working with. This is generally not more than 20 (n3)
The max FD limit per JVM, which we'll call perJVMFDCount, should be greater than: n1 + n2 + n3.
Then take into account the number of GemFire data node (server) JVMs that will be hosted on the physical machine. We'll call this number srvrCount.
The final FD count for the machine will be srvrCount * perJVMFDCount.
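As a hypothetical illustration (all numbers here are assumptions, not recommendations): with 500 concurrent client connections (n1 = 500), maxThreads = 100 and 10 peers with conserve-sockets set to false (n2 = 100 x 10 x 2 = 2000), and 20 open files (n3 = 20), perJVMFDCount should be greater than 500 + 2000 + 20 = 2520. With two server JVMs on the machine (srvrCount = 2), the machine-wide estimate comes to 2 x 2520 = 5040, which the recommended limit of 50000 covers with a wide margin.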
An easy way to determine FD usage in a running environment is to monitor the GemFire/SQLFire FD usage statistic, which you can view using VSD. For more information, see the article File Descriptor Issue.
One additional complication to take into account is that the JVM itself can contribute to the growth of open file descriptors: the Sun JVM uses internal per-thread selectors for blocking network I/O, and each of those selectors uses three file descriptors. The JVM does not clean up these selectors and their file descriptors until garbage collection, so their number can keep growing; if GC does not happen often enough, the FD limit can be reached. The Sun JVM uses sun.nio.ch.Util$SelectorWrapper$Closer objects to clean up these selectors and their file descriptors; if a heap histogram shows a large number of them, there will also be roughly three times as many FDs in use, which can be verified using lsof. Upon GC, the number of Closer objects and file descriptors will go down.
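As a rough way to cross-check this on a running server, the following commands can be used; $PID below is a placeholder for the server JVM's process ID:
# Count the open file descriptors held by the server process:
lsof -p $PID | wc -l
# Look for selector-cleanup objects in a heap histogram; a large count suggests
# roughly three times as many FDs held by not-yet-collected selectors:
jmap -histo $PID | grep Closer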
Controlling GemFire's use of FDs
In light of the above discussion, controlling GemFire's use of file descriptors may involve tuning JVM garbage collection as well as the GemFire settings that control the use of sockets and threads:
1. Garbage collection on the servers: use the CMS garbage collector, with CMSInitiatingOccupancyFraction set to a level that ensures regular and timely GC cycles (a configuration sketch follows this list).
2. If using a GemFire version prior to 7.0.1, increase the gemfire.IDLE_THREAD_TIMEOUT Java system property on the servers (default=15000 ms). GemFire 7.0.1 increased the default to 30 minutes, which should be high enough.
3. Increase the socket-lease-time GemFire property on the servers (default=60000 ms), or set it to 0 to disable the timeout altogether.
4. On the clients, increase the client pool idle-timeout (default=5000 ms), or turn it off altogether by setting it to -1.
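The following is a minimal sketch of where the server-side settings above could go, assuming the server JVM is launched directly with java and that GemFire properties are passed as -D system properties with the gemfire. prefix (they can equally go in gemfire.properties). The heap sizes, occupancy fraction, timeout values, and the main class com.example.MyGemFireServer are illustrative assumptions, not recommendations:
# Hypothetical GemFire server launch; all values are illustrative.
# - CMS GC with an explicit initiating occupancy for regular GC cycles (item 1)
# - IDLE_THREAD_TIMEOUT raised to 30 minutes, only needed before 7.0.1 (item 2)
# - socket-lease-time set to 0 to disable the lease timeout (item 3)
java -Xms4g -Xmx4g \
  -XX:+UseConcMarkSweepGC \
  -XX:+UseCMSInitiatingOccupancyOnly \
  -XX:CMSInitiatingOccupancyFraction=60 \
  -Dgemfire.IDLE_THREAD_TIMEOUT=1800000 \
  -Dgemfire.socket-lease-time=0 \
  com.example.MyGemFireServer
# Item 4 is a client-side setting: the idle-timeout is configured on the client
# pool itself, for example via the pool's idle-timeout attribute in the client
# cache.xml or programmatically with PoolFactory.setIdleTimeout(-1).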
Modifying the Limit of Open File Descriptors on Linux
Limits can be checked using ulimit, like so:
ulimit -n -S
for the soft limit on open files, and:
ulimit -n -H
for the hard limit on open files. ulimit can also be used to raise the soft limit, up to the hard limit. Persistent limits are configured in /etc/security/limits.conf, modifying which requires superuser privileges.
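As a minimal sketch of raising the limit to the recommended 50000 (the account name gemfire is an assumption; substitute the user that runs the GemFire server JVMs):
# Raise the soft limit for the current shell (cannot exceed the hard limit):
ulimit -n 50000
# To make the limit persistent across logins, add entries like these to
# /etc/security/limits.conf (requires root); "gemfire" is a hypothetical
# account that runs the GemFire server JVMs:
#   gemfire  soft  nofile  50000
#   gemfire  hard  nofile  50000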