ZFS Tuning: ARC

A Note from the CTO:

This series of articles provides technical tips on how to construct, tune, and manage ZFS storage systems.

WARP Mechanics offers a wide range of commercially supported, enterprise-class ZFS appliances. These have already been set up and tuned optimally at the factory, so it is rarely necessary for WARP customers to get into this level of detail.

However, WARP believes in giving back to the community, and recognizes that WARP appliances may not be right for everyone. Documents such as this may be useful for those who need a ZFS system other than the types offered by WARP, and therefore need to build their own.

That said, setting up an enterprise ZFS storage operating system is far more complex than can be explained in any short, simple “how to” document. WARP, for example, has been continually tuning the WARPos stack for six years and counting.

In short: no warranty or support can be offered for non-WARP appliances. And customers who do have supported appliances are encouraged to call WARP rather than attempting to change low-level parameters on their own.



By default, ZFS on Linux (ZoL) definitely does the wrong thing with the ZFS ARC.

There are two related problems.

  1. ARC can consume all available memory, thus leaving no RAM for applications
  2. Linux, left to itself, will then pseudo-randomly kill running processes to free up RAM

Point #2 is a surprise to many people. It’s called the “OOM Killer”. It sounds strange when phrased as above, but there’s a legitimate reason Linux does this: by default, Linux overcommits memory, promising applications more RAM than it can actually back, and when those promises come due the kernel has to kill something in order to keep running. Not everybody agrees with the trade-off Linux made back in the day, but it is a deliberate design trade-off nevertheless, and it can be “the right thing to do” for many Linux installations.

However, it’s never the right trade-off to make on a ZFS storage server.

So the first thing to do is stop the OOM killer from acting this way.

In the file /etc/sysctl.conf, you can add things like this:

vm.swappiness = 100
vm.overcommit_memory = 2
vm.overcommit_ratio = 25
vm.dirty_ratio = 15
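
Assuming a typical Linux distribution, the new settings can be loaded without a reboot and then double-checked; a quick sketch:

# Re-read /etc/sysctl.conf (run as root)
sysctl -p

# Confirm the values the kernel is actually using
sysctl vm.swappiness vm.overcommit_memory vm.overcommit_ratio vm.dirty_ratio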


The specific values above might not be right for everybody.

The main item is vm.overcommit_memory. With this set to “2”, the kernel stops overcommitting memory: instead of promising applications more RAM than it can actually back, it refuses any allocation that would push the total past a fixed commit limit (swap space plus vm.overcommit_ratio percent of RAM).
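
If you want to see the resulting ceiling, the kernel reports it in /proc/meminfo; a quick check:

# CommitLimit is the most the kernel will promise (swap + vm.overcommit_ratio percent of RAM);
# Committed_AS is how much has already been promised
grep -E 'CommitLimit|Committed_AS' /proc/meminfo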

This will make some applications run badly, or not at all. But generally, those applications aren’t the types which get executed directly on a storage appliance.

For example, a large database might want to pre-allocate a (very) large amount of memory which it will never actually use… which is fine… but generally you would be running that application on a different server, rather than the storage controller itself.

The settings above should tame the OOM killer. But if you leave the ZFS ARC at its default, it may quickly become impossible to start new processes.

The ZFS “Adaptive Replacement Cache” might allocate all unused RAM, and with the OOM-taming settings above, the system would then be unable to allocate any more RAM to other applications. You might not even be able to log in.

It’s good to give ZFS as much RAM as possible, as this improves ZFS performance and supports advanced features such as deduplication. Just don’t give it so much RAM that the OS itself, and daemons such as sshd, get starved.
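
To see what ARC is doing right now, ZFS on Linux exposes its statistics in /proc/spl/kstat/zfs/arcstats (the arc_summary tool, if your packages include it, prints the same data more readably). A minimal check:

# Current ARC size ("size") and its current ceiling ("c_max"), both in bytes
awk '$1 == "size" || $1 == "c_max" {print $1, $3}' /proc/spl/kstat/zfs/arcstats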

In /etc/modprobe.d, you should have a file called zfs.conf. This should contain an “options” line.

Something like this:

options zfs zfs_arc_max=34359738368


…would tell ZFS not to use more than 32GB for ARC. (The big number above is 32 × 1024³, i.e. 32GiB expressed in bytes.)
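
Note that a modprobe.d option only takes effect when the zfs module is loaded, so it normally requires a reboot (and, on distributions that load ZFS from the initramfs, rebuilding the initramfs first). On reasonably recent ZFS on Linux releases the same parameter can also be read, and adjusted, at runtime through sysfs; a hedged sketch:

# What the loaded module is currently using (0 means "use the built-in default")
cat /sys/module/zfs/parameters/zfs_arc_max

# Change it on the fly, as root; shrinking a large ARC may take a while to be reflected
echo 34359738368 > /sys/module/zfs/parameters/zfs_arc_max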

But how much RAM should you give to ARC? And it is RAM in this case. ZFS is never going to use swap for ARC.

The short answer is: as much as possible without preventing applications from running. ZFS always benefits from the largest ARC you can spare.

In WARP servers, there is typically quite a lot of RAM, so we allocate the majority of it to ZFS. Lower-end WARP systems might have 64GB; higher-end systems have 256GB or 512GB. In such cases, we might reserve only 8GB for the OS plus applications, and give everything else to ZFS.
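
To apply the same rule of thumb to your own hardware, the arithmetic is just total RAM minus whatever you reserve for the OS and applications. A rough shell sketch, using the 8GB reservation from the example above:

# Total RAM in bytes (MemTotal in /proc/meminfo is reported in kB)
total_bytes=$(( $(awk '/^MemTotal:/ {print $2}' /proc/meminfo) * 1024 ))

# Reserve 8GB for the OS and applications; offer the rest to ARC
reserve_bytes=$(( 8 * 1024 * 1024 * 1024 ))
echo "options zfs zfs_arc_max=$(( total_bytes - reserve_bytes ))"

The printed line is what goes into /etc/modprobe.d/zfs.conf.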

But if you are doing a “roll your own” ZFS server, you may have much less RAM. Maybe you only have 4GB total.

In a very small server like this, it is highly unlikely that you will benefit from ZFS’s high performance capabilities anyway. And ARC won’t be able to cache a meaningful percentage of your active data set.

So for “RAM-poor” installations, it’s best to allocate just 1GB to ARC, and leave everything else for the OS and applications.
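
In bytes, that 1GB cap is 1 × 1024³, so the zfs.conf line becomes:

options zfs zfs_arc_max=1073741824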