I am getting unexpected behaviour when my machine runs out of memory.
I have an Intel i7-6700 with 32GB of RAM and I’m running Arch Linux with a vanilla 4.14.8 kernel. I have 32GB of swap on an encrypted LVM volume on an SSD.
During normal operation I run a couple of QEMU/KVM guests, along with other stuff (XFCE, Firefox etc.). The normal memory usage is about 20-30%, with almost no swap.
But when I run something memory-intensive (e.g. 7za a -md=29 to compress a large file), the system freezes when memory usage reaches 100%. The keyboard and mouse stop responding completely, the display freezes, disk activity stops, and any TCP connections to the machine hang in the SYN phase. The only way to recover from this situation is to power-cycle the machine.
Just before the hang, virtually no swap space is in use. Of course, swap is enabled, and I am not using any particular sysctl settings related to memory (in particular, my vm.swappiness has the default value of 60).
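For anyone who wants to capture the same numbers outside of htop, here is a minimal Python sketch that reads them from /proc. The helper names parse_meminfo and swap_used_kib are made up for this illustration; the sketch assumes a Linux /proc/meminfo with the usual SwapTotal and SwapFree fields.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip anything that does not look like "Key:   number [unit]".
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def swap_used_kib(info):
    """Swap currently in use, in KiB, derived from SwapTotal and SwapFree."""
    return info["SwapTotal"] - info["SwapFree"]


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print("MemAvailable:", info.get("MemAvailable"), "KiB")
    print("Swap used:", swap_used_kib(info), "of", info["SwapTotal"], "KiB")
    # vm.swappiness is exposed as a plain integer under /proc/sys.
    if os.path.exists("/proc/sys/vm/swappiness"):
        with open("/proc/sys/vm/swappiness") as f:
            print("vm.swappiness:", f.read().strip())
```

Running this in a loop from a second terminal (or over SSH, while the machine still answers) gives a record of how far swap usage got before the freeze.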
What I don’t understand is this:
- Why doesn’t the kernel use the swap space?
- Why does the oom-killer not kick in when memory is exhausted?
I am not a kernel expert, but as I understand it, the system is not supposed to freeze/hang when running out of memory. What I would expect to see is this:
- When there is swap space available, no process should be killed until both memory and swap are consumed (in my case, 64G).
- Even with no swap, oom-killer is supposed to kill 7za when memory runs out.
- Even without both of the above, any process trying to allocate more memory than is available should get an error and fail gracefully.
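One caveat on the third expectation, as far as I understand it: whether an allocation actually returns an error depends on the vm.overcommit_memory policy. With the default value of 0 the kernel overcommits heuristically, so a large allocation usually succeeds and the shortage only shows up later, when the pages are touched. The sketch below just reads and describes the current setting; the function name describe_overcommit and the wording of the descriptions are my own.

```python
import os

# Meanings of vm.overcommit_memory as documented for the Linux kernel.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default): most allocations succeed up front",
    1: "always overcommit: allocations are never refused",
    2: "strict accounting: allocations beyond the commit limit fail",
}


def describe_overcommit(raw):
    """Map the raw contents of /proc/sys/vm/overcommit_memory to a description."""
    mode = int(raw.strip())
    return OVERCOMMIT_MODES.get(mode, "unknown mode %d" % mode)


if __name__ == "__main__":
    path = "/proc/sys/vm/overcommit_memory"
    if os.path.exists(path):
        with open(path) as f:
            print("vm.overcommit_memory:", describe_overcommit(f.read()))
```

This does not explain the freeze by itself; it only shows why "fail gracefully with an error" is not guaranteed under the default settings.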
So there are in fact 3 independent mechanisms to prevent running out of memory, but all of them appear to fail. I realize there might be some subtle issues I don’t know about (e.g. memory ballooning in guest VMs, locked memory etc.), but I really cannot think of anything that would explain the behaviour I am seeing.
Can somebody explain what is going on here and why? Am I just missing something? Can I do something to deterministically prevent hanging?
I ran some differential tests and I’ve found that:
- Encrypted swap on LVM volume => machine freezes.
- Encrypted swap on partition => everything OK (swap gets used as expected, machine does not freeze).
It would appear that the problem is somehow related to LVM. I used the same physical partition in both cases, so it’s not disk-related either. During the tests, I left vm.swappiness at 60 (default).
Just as a side note – during one particular test, I noticed that in htop, one “notch” appeared in the swap bar just before the machine froze. So the kernel actually started to use swap, but it only lasted for about 3 seconds.
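To compare the two setups with numbers rather than htop notches, per-device swap usage can be read from /proc/swaps. This is a rough sketch, assuming the usual five-column layout (Filename, Type, Size, Used, Priority, sizes in KiB); parse_swaps is a name invented for this example.

```python
import os


def parse_swaps(text):
    """Parse /proc/swaps-style text into a list of per-device dicts."""
    entries = []
    for line in text.splitlines()[1:]:  # first line is the column header
        # Split from the right so the four trailing columns are always found.
        parts = line.rsplit(None, 4)
        if len(parts) != 5:
            continue
        name, kind, size, used, priority = parts
        entries.append({
            "device": name.strip(),
            "type": kind,
            "size_kib": int(size),
            "used_kib": int(used),
            "priority": int(priority),
        })
    return entries


if __name__ == "__main__" and os.path.exists("/proc/swaps"):
    with open("/proc/swaps") as f:
        for entry in parse_swaps(f.read()):
            print(entry["device"], entry["used_kib"], "/", entry["size_kib"], "KiB")
```

Note that both the LVM volume and the encrypted partition will typically show up as device-mapper nodes here, so the device name alone does not tell the two test cases apart.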
The problem should be easily reproducible.
For anybody following up on this, I determined that the problem is specific to using swap space on top of LVM (encrypted or not). This was tested on 4.x kernels, and I was not able to avoid these hangs by tweaking sysctl parameters. I have no info about 5.x at the moment. It seems like a kernel bug to me.
I’ve seen a similar result happen – but the problem isn’t lack of memory; it’s a process that eats up space in the root partition/volume.
Commonly this is excessive writing to /tmp, or to another file system under /. The kernel will swap out anything it can (which isn’t much) in an effort to keep the unwritten data in RAM buffers. Fairly quickly this will fail and everything grinds to a halt.
Normally you would get warning messages issued – but you may not see them for an especially storage-greedy process.
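A quick way to rule this case in or out is to look at how full the root and /tmp file systems are while the greedy process runs. Here is a small sketch using Python's shutil.disk_usage; the 90% threshold and the function names percent_used and check_paths are arbitrary choices for this example.

```python
import shutil


def percent_used(total, used):
    """Percentage of a file system in use; 0.0 when the total is zero."""
    return 100.0 * used / total if total else 0.0


def check_paths(paths, threshold=90.0):
    """Return (path, percent) pairs for file systems fuller than threshold."""
    nearly_full = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        pct = percent_used(usage.total, usage.used)
        if pct > threshold:
            nearly_full.append((path, pct))
    return nearly_full


if __name__ == "__main__":
    for path, pct in check_paths(["/", "/tmp"]):
        print("%s is %.1f%% full" % (path, pct))
```

If this prints nothing while memory usage climbs to 100%, the disk-space explanation probably does not apply.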