So, here's the thing about distributed storage: it's the reason I spent sleepless nights wrestling with Longhorn, watching my infrastructure costs explode, and ultimately migrating 650 Ghost sites to CephFS.
And you know what? I'd do it all over again.
Let me explain why.
The Single Point of Failure Problem
Imagine you're running a managed Ghost hosting service, like me. A creator has just hit publish on a newsletter that goes out on the same day every week. The site is running smoothly. Everything is fine.
Then the server decides to take a nap. Maybe it's planned maintenance, maybe the hardware just decided today was a good day to retire. Who knows? Servers are like that sometimes.
Thankfully, things are set up to run in a cluster, right? One server goes down, another one is there to take over and serve the Ghost site.
But here's where it gets messy.
Without distributed storage, that Ghost site can only ever truly live on one server at a time. Sure, the orchestrator can spin it up on a different server when things go wrong. But the new server is starting from scratch. All the images that were uploaded? Gone. The custom theme tweaks? Vanished. Hope you have a backup.
And yes, there are backups. But here's the ugly truth about restoring from a backup: it's slow. Painfully slow.
The average Ghost site on Magic Pages holds about 2GB of data. That's manageable, right? Now consider the largest site, sitting at a whopping 75GB. Ever tried to restore 75GB from a backup? Even server-to-server within the same data center, you're looking at...well, let's just say you don't want to be in that situation.
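To put rough numbers on that, here is a back-of-the-envelope sketch. The throughput figures are assumptions picked for illustration, not measurements from Magic Pages, and a real restore also pays for things like decompression and lots of small files, so treat the results as a lower bound.

```python
def restore_minutes(size_gb: float, throughput_mb_s: float) -> float:
    """Estimate the minutes needed to copy size_gb at a sustained throughput."""
    size_mb = size_gb * 1000  # decimal units: 1 GB = 1000 MB
    return size_mb / throughput_mb_s / 60


# Hypothetical sustained restore throughputs in MB/s (assumed, not measured).
ASSUMED_THROUGHPUTS = {"slow": 25, "moderate": 100}

for label, mb_s in ASSUMED_THROUGHPUTS.items():
    average_site = restore_minutes(2, mb_s)    # the ~2GB average site
    largest_site = restore_minutes(75, mb_s)   # the 75GB largest site
    print(f"{label} ({mb_s} MB/s): average {average_site:.1f} min, largest {largest_site:.1f} min")
```

Under these assumptions the average site comes back within a minute or two, while the 75GB site stays down for somewhere between roughly 12 and 50 minutes, and that is for the file copy alone.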
The Two Components That Need to Survive
Every Ghost site essentially has two parts that need to stay alive when a server goes down:
- The database - This is where all your content lives. Posts, pages, members, settings, etc.
- The file system - Your themes and every image, video, or other file you've ever uploaded.
Now, databases? Those are the easy part. At Magic Pages, they live in a completely separate cluster that replicates itself continuously and backs itself up regularly. When a server hosting a Ghost site goes down, the "new" server just reconnects to the same database cluster. Your content is there, waiting, like nothing happened.
But the file system? That's where things get...interesting.
When I first encountered this problem, I thought I was clever. "I'll just set up an NFS server!" I told myself. One server, serving files to all Ghost sites. Network-attached storage. What could go wrong?
It worked! For a while. All Ghost sites could access their files from this central NFS server. When a Ghost container moved between servers, it would just mount the same NFS share. Problem solved, right?
I mean...kinda.
What I'd actually done was create the mother of all single points of failure. If that NFS server went down, it wouldn't take out one site. It would take out every single site on the platform. Thankfully, it never came to that worst-case scenario.
But once I realised this, the NFS server became this terrifying monolith in my infrastructure. The one thing that absolutely, positively could not fail.
Distributed Storage
This is when I realised I needed true distributed storage. Not just network-attached storage, but storage that exists in multiple places simultaneously.
There are a few options out there. Ceph is the veteran – battle-tested, robust, but with a reputation for complexity. So when I discovered Longhorn, with its promises of being "Kubernetes-native" and "easy to operate," I thought I'd found the answer.
You can read about that journey here, but the short version is: Longhorn delivered on being easy to set up. It replicated data beautifully across servers. It was indeed Kubernetes-native.
It also consumed memory like crazy, but we already covered that in a previous post 😂

Why Do I Still Believe in Distributed Storage?
After all that pain, you might expect me to say "forget it, distributed storage is overengineered nonsense for a small hosting provider."
But no. Even after the sleepless nights, the exploded costs, and the Longhorn-induced trauma, I believe distributed storage is absolutely essential for what I'm trying to build.
Here's why:
Without distributed storage, you're always one hardware failure away from a customer asking "why is my site down?" With it – when it works – that same hardware failure becomes a non-event. The orchestrator notices a server is gone, schedules the Ghost site on a different server, and that new server connects to the same distributed storage where all the files are safe and sound.
The site is back online in seconds. Not hours while you restore from backup. Seconds.
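The difference can be sketched as a toy model. This is not how any real orchestrator is implemented; the `Server` class, the `reschedule` function, and the node names are all made up for illustration. It only shows why the same rescheduling step ends so differently depending on where the files live.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Server:
    name: str
    healthy: bool = True
    # Sites whose files sit on this server's local disk (hypothetical model).
    local_files: set = field(default_factory=set)


def reschedule(site: str, servers: list, shared_storage: Optional[set]) -> tuple:
    """Pick the first healthy server and report whether the site's files are reachable there."""
    target = next(s for s in servers if s.healthy)
    if shared_storage is not None:
        files_available = site in shared_storage    # every server sees the same storage
    else:
        files_available = site in target.local_files  # only what is on this server's disk
    return target.name, files_available


servers = [Server("node-a", local_files={"blog"}), Server("node-b")]
servers[0].healthy = False  # node-a goes down

print(reschedule("blog", servers, shared_storage=None))      # ('node-b', False)
print(reschedule("blog", servers, shared_storage={"blog"}))  # ('node-b', True)
```

In both cases the site lands on `node-b`. Only in the second case does it find its files there without a restore.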
After the Longhorn disaster, I did what I should have done from the beginning: I bit the bullet and set up Ceph.
Yes, Ceph. The scary one. The complex one. The one everyone warns you about.
Turns out, it took me about 2 hours to set up.

The key point is this: Ceph just works. It's boring in the best possible way. My files are replicated across three dedicated servers. If one fails, the others don't even blink. It's kinda like that original NFS server, but well...replicated.
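A deliberately simplified way to picture the difference from the single NFS server is below. The host names are hypothetical, and the model ignores everything that makes Ceph actually Ceph (placement groups, minimum replica counts for writes, recovery traffic). It only captures the basic idea that data stays reachable while at least one copy sits on a surviving host.

```python
def data_reachable(replica_hosts: set, failed_hosts: set) -> bool:
    """Return True while at least one replica lives on a host that has not failed."""
    return bool(replica_hosts - failed_hosts)


single_nfs = {"nfs-1"}                          # one copy, one server (hypothetical name)
replicated = {"store-1", "store-2", "store-3"}  # three copies, three servers (hypothetical names)

print(data_reachable(single_nfs, {"nfs-1"}))    # False: the only copy is gone
print(data_reachable(replicated, {"store-2"}))  # True: two copies remain
```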
The Philosophy of Over-Engineering
Yes, as a small Ghost hosting provider, distributed storage might seem like overkill. I could run everything simpler, cheaper, with less complexity.
But here's what I've learned: infrastructure isn't about what works on a good day. It's about what survives the bad days.
Every Ghost site on Magic Pages represents someone's passion project, their business, their connection to their audience. When they hit publish, they're trusting me with something important to them. The least I can do is make sure a random hardware failure doesn't break that trust.
It's been a journey, and it has involved sleepless nights. But that's a trade-off I'm willing to make.
(Though seriously, just use Ceph. Don't be like me. Don't try Longhorn first. Learn from my mistakes.)