Currently there is this mega thread: “VM became extremely slow after upgraded to macOS 10.14.6”.
2019/08/26: Apple just released a fix for this problem; see also https://www.macrumors.com/2019/08/26/apple-tvos-watchos-macos-software-updates/. So far the reports in the mega thread are positive. “It’s solved on Apple 10.14.6 supplemental update build 18G85.”
2019/08/09: VMware has pushed out an update that mitigates the issue when your VM is encrypted. More details by VMware employee ksc (at the bottom). Note that this does not yet fix all issues!
2019/08/02: There’s an update from VMware; see the end of the post.
The latest update from Apple for macOS 10.14 (and macOS 10.15) changed something in such a way that virtual machines running under VMware Fusion 11 now suffer serious performance problems. So much so that using those VMs becomes very hard to impossible.
After reading the forum, it appears that the following factors make things worse:
- Using an encrypted VM
- Using an encrypted file system on the host (APFS)
What is happening?
After the update, the VM triggers far more disk reads/writes than before, by a factor of 100 if you are unlucky. Encrypted VMs are hit harder because of lock contention. The best explanation of what happened can be found in reply 85 in the thread.
Here’s a partial quote (more details in that reply):
There is a significant behavioral change in MacOS’s usage of pinned memory. Exact change in behavior is unknown, but the net effect is that we [ed.: VMware] significantly underestimate the MacOS’s memory needs. As a result, we don’t back off memory pressure soon enough, and page-out times spike unusually high. We’re still investigating exactly what changed so we can address it directly.
What can you try in order to get back to a VM that will work again?
- If your VM is encrypted and you are able to remove the encryption, try removing it. Some people replied that removing the encryption from the VM was enough to get back to a usable VM.
- If you are using snapshots, commit the snapshots. If you use the “auto snapshot” feature, disable it for now.
- This one is a bit weird, but it helped some people: in macOS Settings -> Security & Privacy -> Privacy (tab), uncheck VMware and check it again.
- Disable hard disk buffering in the VM’s advanced settings. This helps some people, but not everyone.
The best results so far have come from the following two tweaks (both offered by VMware developer ksc):
- Try lowering the RAM of your VM significantly. Setting it to 2GB appears to work for most people. But if your host has more memory you might be able to use a higher value.
- Reduce total VM memory usage. In the file “~/Library/Preferences/VMware Fusion/preferences”, add these two options:
prefvmx.useRecommendedLockedMemorySize = "FALSE"
prefvmx.allVMMemoryLimit = "4096"
The first prefvmx option enables the second. That second value is the maximum memory to be used across all VMs, in MB. It can be set smaller than actual VM size.
Note: The above 2 options SHOULD BE REMOVED from your preferences file AFTER INSTALLING the VMware Fusion 11.1.1 update; they are no longer needed and are less effective than the update.
For changes to the preferences to take effect, first shut down your VM(s), then quit Fusion, and only then edit the preferences file.
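If you would rather script the edit than do it by hand, here is a minimal Python sketch. It assumes the preferences file is plain text with one key = "value" entry per line, as the two options above suggest. The function names (set_pref_options, remove_pref_options) are made up for this example and are not part of Fusion. Shut down your VMs and Fusion first, and keep a backup of the file.

```python
import re

# The two workaround options quoted above.
WORKAROUND = {
    "prefvmx.useRecommendedLockedMemorySize": "FALSE",
    "prefvmx.allVMMemoryLimit": "4096",  # MB, summed across all VMs
}

# Assumption: each entry is a single line of the form  key = "value"
_KEY_RE = re.compile(r'^\s*([^\s=#]+)\s*=')


def _key_of(line):
    """Return the option name on this line, or None if it has none."""
    match = _KEY_RE.match(line)
    return match.group(1) if match else None


def set_pref_options(text, options):
    """Return text with every option in `options` added or updated."""
    seen = set()
    out = []
    for line in text.splitlines():
        key = _key_of(line)
        if key in options:
            # Rewrite the first occurrence and drop any duplicates.
            if key not in seen:
                out.append(f'{key} = "{options[key]}"')
                seen.add(key)
        else:
            out.append(line)
    # Append the options that were not in the file yet.
    for key, value in options.items():
        if key not in seen:
            out.append(f'{key} = "{value}"')
    return "\n".join(out) + "\n"


def remove_pref_options(text, keys):
    """Return text without the given options (for after the 11.1.1 update)."""
    kept = [line for line in text.splitlines() if _key_of(line) not in keys]
    return "\n".join(kept) + "\n"


# Usage sketch (not executed here):
#   from pathlib import Path
#   prefs = Path.home() / "Library/Preferences/VMware Fusion/preferences"
#   prefs.write_text(set_pref_options(prefs.read_text(), WORKAROUND))
```

Once Fusion 11.1.1 is installed, the same file can be passed through remove_pref_options(text, WORKAROUND) to take the two lines out again.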
Hopefully an update will follow soon so that these tweaks are no longer needed.
Official update… root cause identified and reduced to a small sample program. Still investigating workarounds; until we ship an update, smaller VM sizes or decrypting is going to be the best option. No ETA yet.
(I just reported this root cause to Apple about 15 minutes ago, so I’m almost certain it’s not related to their supplemental update.)
The root cause, if curious, is the behavior of paging memory from an unlinked file. POSIX reference counts file handles independently from the on-disk entry, so it’s possible to create and open a file, make it a certain size, unlink it, then mmap the file handle to a chunk of memory. The operating system will still page to and from the file; when the file handle is closed, the operating system will release the disk blocks too. It’s an old-school UNIX trick to get shared memory that automatically gets cleaned up after the last process using it exits. On 10.14.6, accesses to the mmapped parts of the file get slower … and … slower … over time. If the exact same file is not unlinked, accesses slow down some but are still very usable.
Fusion uses this unlinked-file technique for encrypted VMs. (Normal VMs are memory-mapped to the .vmem file, but that obviously doesn’t work when the file is encrypted).
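To make the unlinked-file trick concrete, here is a small Python sketch of the steps ksc lists: create and open a file, size it, unlink it, then mmap the still-open handle. It only illustrates the general POSIX technique; it is not Fusion’s code, the function name is invented for the example, and it needs a Unix-like system.

```python
import mmap
import os
import tempfile


def unlinked_file_mapping(size=4096):
    """Map memory backed by a file that no longer has a directory entry."""
    fd, path = tempfile.mkstemp()      # create and open a file
    try:
        os.ftruncate(fd, size)         # make it a certain size
        os.unlink(path)                # remove the name; the open fd keeps the data alive
        still_listed = os.path.exists(path)
        mem = mmap.mmap(fd, size)      # shared, read/write mapping of the file handle
        try:
            mem[:5] = b"hello"         # writes go to pages backed by the unlinked file
            data = bytes(mem[:5])
        finally:
            mem.close()
    finally:
        os.close(fd)                   # last reference gone: the OS frees the disk blocks
    return still_listed, data


still_listed, data = unlinked_file_mapping()
print("path still exists:", still_listed)
print("read back:", data)
```

Nothing has to be cleaned up afterwards, which is the appeal of the trick: the backing file disappears by itself once the last handle is closed.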
Long-time VMware users who have used Workstation may remember the “mainMem.useNamedFile” option. On Linux and Windows, we had the opposite problem: a memory-mapped file was slow, but switching to an unlinked file (sometimes) made things much faster, especially when Linux re-tuned the memory manager. This is the first time we’ve seen MacOS re-tune the memory manager, and having the first time be in a dot release was quite a surprise. Alas, that option won’t do any good here: encrypted VMs cannot use a named file so ignore it.
Another official update…
Apple has confirmed reproduction and identified the bug on their end. No timeline for a fix shared yet.
In light of that, we’re also working on a workaround. “Days but not weeks”, can’t be more specific because the workaround is not yet written.
A few more things we know now:
- Affects hosts with >=4GB and <32GB of memory. (Yes, really).
- Triggers after 2GB of modified data is cached, regardless of file. Once triggered, it stays triggered as long as the file is open.
- So why does this affect encrypted VMs more than unencrypted?
- Due to different code paths, this cache bug only affects emulated device I/O. The virtual CPU is not affected.
- Encrypted VMs are much more sensitive to additional I/O costs, and “touch” memory many extra times (due to crypto internals).
- Some benchmarking I did suggests we get ~4MBps touching memory backed by an affected file in normal usage, but crypto’s high-touch access pattern drops this to ~0.2MBps or worse. In comparison, an SSD on random-read workloads is 75-100MBps, a HDD is 1-2MBps. So an unencrypted VM “feels” like HDD speed, and an encrypted VM “feels” 10x worse than an HDD.
- In theory, this should affect everybody who is writing >2GB to a file. In practice, most applications writing that much data to a file do so sequentially, which makes it easy to write out modified data fast enough that the bug never triggers. Virtual machines … have to emulate RAM, which has a very different access pattern.
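To put those throughput numbers in perspective, here is a quick back-of-the-envelope calculation in Python. The rates are the ones quoted above (using the low end of each range); the 1 GB working set is an assumption picked purely for illustration.

```python
# Rates in MB/s, taken from the benchmark figures quoted above.
RATES_MB_PER_S = {
    "SSD, random reads (low end)": 75.0,
    "unencrypted VM, affected file": 4.0,
    "HDD, random reads (low end)": 1.0,
    "encrypted VM, affected file": 0.2,
}

WORKING_SET_MB = 1024  # assumption: touch 1 GB of guest memory


def seconds_to_touch(megabytes, rate_mb_per_s):
    """Time needed to touch `megabytes` of memory at the given rate."""
    return megabytes / rate_mb_per_s


for label, rate in RATES_MB_PER_S.items():
    seconds = seconds_to_touch(WORKING_SET_MB, rate)
    print(f"{label}: {seconds:.0f} s")
```

Touching 1 GB takes about 14 seconds at 75 MBps, 256 seconds at 4 MBps, and 5120 seconds (over 85 minutes) at 0.2 MBps, which is why an encrypted VM stops being usable.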
Really gory details, for the technically inclined … we know where the actual bug is. This is a link to an (older, 10.14.1) Apple source code drop.
This is a function that keeps a hash table of blocks used from a file; the hash table automatically grows once it reaches a certain size. Apple appears to have “fixed” the XXX comment above the function and added a 32GB case (in addition to the 4GB case already present) … but their new case has a bug that re-allocates the entire hash table every time.
Once they release sources, I expect the new fix will be a 1-liner to add a return statement in their new code, just like line 6643. Oops
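To illustrate the kind of bug ksc describes, here is a toy Python model. It is NOT Apple’s code: the class, the size tiers, and the numbers are all invented for this example. It only shows how a single missing early return can turn “grow the table when it fills up” into “re-allocate the whole table on every call” once the largest size has been reached.

```python
class BlockTable:
    """Toy table of file blocks that grows through fixed size tiers."""

    TIERS = (4, 16, 64)  # hypothetical sizes, smallest to largest

    def __init__(self, missing_return):
        self.missing_return = missing_return  # True reproduces the bug
        self.size = self.TIERS[0]
        self.blocks = set()
        self.reallocations = 0

    def _next_size(self):
        for tier in self.TIERS:
            if tier > self.size:
                return tier
        return self.size  # already at the largest tier

    def _maybe_grow(self):
        if len(self.blocks) < self.size:
            return  # still room, nothing to do
        new_size = self._next_size()
        if new_size == self.size and not self.missing_return:
            return  # the one-line fix: cannot grow further, so stop here
        # Without that return we fall through and copy the whole table
        # into a "new" one of the same size, on every single insert.
        self.reallocations += 1
        self.size = new_size
        self.blocks = set(self.blocks)

    def insert(self, block):
        self._maybe_grow()
        self.blocks.add(block)


def count_reallocations(missing_return, inserts=200):
    table = BlockTable(missing_return)
    for block in range(inserts):
        table.insert(block)
    return table.reallocations


print("with the return:   ", count_reallocations(missing_return=False))
print("without the return:", count_reallocations(missing_return=True))
```

With the early return, the toy table is re-allocated only when it moves up a tier. Without it, every insert after the last tier is reached pays for a full copy, which matches the “once triggered, it stays triggered” behavior described above.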
Some good news…
We’ve just released Fusion 11.1.1. It hasn’t hit the auto-update mechanism yet (and it will), but I do see it downloadable through the Fusion download page.
This has a workaround for encrypted VMs; they are usable again across all memory sizes. I’ll call the workaround “good” but not “great” (and since I wrote it, I can describe my work that way…).
First, to do this we had to weaken the encryption properties slightly – there will be a temporary, unencrypted file in $TMPDIR, which gets cleaned up when the VM turns off. This is a slight weakening of the security model in that now the user can peek at their data (whereas before, only ‘root’ could); but we felt that this weakened mode was better than leaving such VMs completely unusable.
Second, there still are some performance issues with MacOS 10.14.6 regardless of encrypted VMs, particularly related to heavy graphics usage. They are much more recoverable – wait tens of seconds and the VM will be usable again – but we still aren’t happy with overall behavior. (And the “set memory to 2GB” workaround still applies). We’re actively looking into other options, but I expect we won’t have any better options in less than several weeks. All the remaining ideas are invasive enough to need much more testing.