Once More, With F(r)ee(BSD)ling
Previously on this blog, I wrote about buying some new NAS hardware and installing TrueNAS. I expressed some mild discontent with the fact that TrueNAS is now based on Linux and not FreeBSD (like it used to be), but part of what I wanted out of a NAS was that it be an appliance I didn't have to manage as intensely as I would if I installed everything myself. Joke's on me.
For several months TrueNAS worked more or less the way I wanted it to. In my last post on this subject I addressed the pain points I encountered when setting it up, but as I used it more I kept running into things that were harder or more annoying than they should have been. I could usually figure out how I needed to misuse the control interfaces to make something work the way I wanted it to, and I ended up with a number of little things configured under the advanced settings interface (like getting my SuperDrive to work in a way that survives system updates, or loading the right module for the hardware watchdog).
But the reason I needed to configure the hardware watchdog at all was that the box was crashing occasionally. After a few crashes I got the hardware watchdog working, and for a few weeks it would restart the box whenever it crashed. It didn't last. Eventually the box started locking up in such a way that the hardware watchdog never restarted it, and no amount of BIOS resetting fixed it. There was nothing useful in the system logs, so I started to suspect that the crashes might be a hardware issue, and I contacted UGREEN.
UGREEN, however, will only provide active support for their own software (UGOS), so I backed up my TrueNAS configuration and installed UGOS, which wiped out the startup drive in the machine. That's fine as far as it goes, since I had a backup, but UGOS doesn't support ZFS. That means that the entire time I had UGOS running I didn't have access to my existing files or services that depended on them (which is, uh, all of them). I ran UGOS for a few days and encountered no issues, but the system wasn't under any kind of load so it wasn't really a great test. After some back and forth with UGREEN and a failed attempt at imaging the startup drive with UGOS installed (so I could easily restore it), I gave up and made an emergency purchase of a new SSD so I could put TrueNAS on that and keep the UGOS SSD around to swap back in when I had more time to test it.
And once again, the box crashed while running TrueNAS, and the watchdog didn't restart it. At least the fact that TrueNAS was installed on a different SSD ruled that individual component out as the root cause, which was more information than I had before. When I had some downtime I put the UGOS SSD back in the box, and once again it ran for several days without any errors. Meanwhile it couldn't seem to make it 48 hours without crashing under TrueNAS.
Why was it crashing? Is there a hardware issue that's only exposed by TrueNAS? Some, maybe even most, of the crashes happened when the system was essentially idle, so it's hard to point the finger at hardware that fails at idle but only with certain software installed. Is it a software issue? Lots of people run TrueNAS, and I couldn't find much evidence that other people were having the same problems. Well, there was one guy, but he never posted a followup with any resolution (argh). Is it a problem with some containerized app I had installed? If so, how do I even figure out which one without evidence in the system logs? I don't really feel like digging through a dozen different containers' log files to try to find a culprit. I don't even like Docker anyway!
So, since I tend to prefer FreeBSD to Linux anyway (previously), last week I made one last backup of my TrueNAS configuration, wiped the (new) SSD, and installed FreeBSD on it. Since then I've been adding back all the services I had under TrueNAS, but mostly* by installing them directly and configuring NGINX by hand. I've rebooted it a bunch of times over the past week as I've made configuration changes, mostly to make sure that services all come back up after a restart, but it hasn't crashed once. Knock on wood.
* Two of the services are containerized, though. I put Nextcloud in a jail because I've dealt with PHP and PostgreSQL package weirdness on FreeBSD before, and a jail seemed the best way to isolate any dependencies or fragile bits and limit the scope of those problems. And Immich is pretty much Docker-only, so I had to install Linux in a virtual machine, then install Docker in the VM, and use the Docker in the VM to install Immich in a container. I might not have bothered to get Immich up if it hadn't been the single hardest thing about setting up TrueNAS, but I didn't want to waste all the effort that went into getting my photo library set up.