I recently set up an ownCloud instance on Amazon EC2. I chose to put the data onto a separate EBS volume. Although EBS supports taking snapshots of a volume, you really shouldn't snapshot a live file system that doesn't know it is being snapshotted. For that reason, I decided to use a file system with snapshot capabilities built in. I am a huge fan of ZFS, so I installed ZFS on my EC2 instance and was good to go.
After about a month of usage, I noticed a performance problem. I SSHed into the VM and saw that a process called spl_kmem_cache was taking all of my CPU. After some googling, I discovered that the process was related to ZFS's RAM cache of the disk. The ZFS L1 cache lives in RAM and uses a variant of ARC (adaptive replacement cache) to decide which pages to evict.
The problem is that the ZFS L1 cache does not work well in a low-memory environment. ZFS was designed for servers, and servers usually have lots of RAM. EC2 micro instances barely have more than 580 MB of RAM, though. After consuming most of that RAM, the ZFS L1 cache started thrashing, constantly evicting and refilling pages, which caused the spl_kmem_cache process to use up all my CPU. No RAM and no CPU makes Homer something something. Go crazy? Don't mind if I do!
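If you want to see how much RAM the cache is actually holding, here is a minimal Python sketch. It assumes ZFS on Linux, which exposes ARC statistics as "name type data" rows in /proc/spl/kstat/zfs/arcstats; the path and the field names size and c_max are assumptions about that port, and the helper names are made up for illustration.

```python
from pathlib import Path

# Assumed location of the ARC statistics on ZFS on Linux; other ports differ.
ARCSTATS_PATH = Path("/proc/spl/kstat/zfs/arcstats")


def parse_arcstats(text):
    """Parse kstat-style 'name type data' rows into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly three columns and a numeric value; this
        # skips the kstat header line and the 'name type data' title row.
        if len(fields) == 3 and fields[2].isdigit():
            stats[fields[0]] = int(fields[2])
    return stats


def arc_summary(stats):
    """Return the current ARC size and its configured maximum, in MiB."""
    mib = 1024 * 1024
    return {
        "size_mib": stats["size"] / mib,
        "c_max_mib": stats["c_max"] / mib,
    }


if __name__ == "__main__":
    if ARCSTATS_PATH.exists():
        print(arc_summary(parse_arcstats(ARCSTATS_PATH.read_text())))
    else:
        print(f"{ARCSTATS_PATH} not found; is the ZFS module loaded?")
```

Comparing size_mib against the total RAM on the instance gives a rough idea of how much of the machine the cache has claimed.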
I read about various L1 cache tweaks that you can set through the /proc file system. None of them helped. I had almost given up hope when I decided to look up how to disable the L1 cache. By running the command zfs set primarycache=metadata <poolname>, I disabled L1 caching for file data (metadata is still cached). After making the change, my VM came back to life.
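For anyone who wants to script the change, here is a small Python sketch that wraps the command above and then reads the property back with zfs get. The function names and the pool name "tank" are hypothetical; the only zfs invocations used are zfs set and zfs get -H -o value, and running them needs root.

```python
import subprocess


def primarycache_commands(dataset, mode="metadata"):
    """Build the zfs commands that set primarycache and read it back."""
    # primarycache accepts exactly these three values.
    if mode not in ("all", "none", "metadata"):
        raise ValueError(f"unsupported primarycache value: {mode}")
    return [
        ["zfs", "set", f"primarycache={mode}", dataset],
        # -H drops the header row, -o value prints only the property value.
        ["zfs", "get", "-H", "-o", "value", "primarycache", dataset],
    ]


def apply_primarycache(dataset, mode="metadata"):
    """Run both commands and return the value zfs reports afterwards."""
    set_cmd, get_cmd = primarycache_commands(dataset, mode)
    subprocess.run(set_cmd, check=True)
    result = subprocess.run(get_cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

Calling apply_primarycache("tank") should return "metadata" if the change took effect. Setting the property on the pool name applies it to the pool's root dataset, and child datasets inherit it unless they override it.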