My data storage & backup strategy
By email, Dylan asked about my data storage and backup strategy. In answering, I thought it'd be useful enough to formalise here.
The 'problem'
Here's what I'm working with. The scope of this is mine & Lucy's personal stuff, and the Johnny.Decimal business.
We each have a laptop. Mine only has 500GB of storage so I'm severely limited. Lucy's has 1TB but this is still smaller than the … checks … ah not quite, 787GB that is the entire D85 Johnny.Decimal business folder.
That folder contains all of the raw footage from the workshop. That's 641GB just there. I probably don't need to keep that now, but whatever. I certainly don't need to keep it all on a laptop.
So we also have a Mac mini. The 'server', even though it just runs plain macOS. The point is that it's always powered on.
Mac minis are great for this. I ran a 2010 version until 2023. It was sold as a server, back when Apple did that. 13 years isn't bad from a ~$1,000 computer.
I replaced it with a refurb M1 mini, lowest spec. Also ~$1,000. Should also last well over a decade.
These machines don't need to do much. No data is stored on them -- we'll get to that. Processor wise, they do next to nothing. They serve files. So don't spend money on anything fancy.
NAS vs DAS
Okay, so where's all the data? The difference between NAS & DAS is both subtle and vitally important. I'll assume you're not a storage expert, so I'll spell it out.
'NAS' stands for 'network-attached storage'. This means that the device itself attaches to your network. It is a tiny server: it has its own smarts. Synology and QNAP are the ones you've heard of.
The advantage of these devices is that you don't need another server. Like a Mac mini. They do it all themselves: you configure them on your network, then you can connect to their storage from any computer. That can be really convenient, and I still have an old Synology in my setup.
The downside is that if you want to configure them to do anything special, you're dealing with a device that is usually underpowered, whose software is not macOS or Windows. It's some specialised Synology or QNAP thing.
So if you want to run, say, Syncthing -- we'll get to all of this later -- then you're depending on that software having a Synology version. Not everything does, and when it does, it's often a cut-down variant. This can be limiting.
For these reasons, I moved away from a NAS when I bought the new mini.
DAS
DAS stands for 'direct-attached storage'. It's storage that is directly attached to a computer. It's basically an external hard drive. And you can just use an external hard drive from your office supplies store.
But if this is your central data store, you probably want something a bit more advanced. Hard drives get hot, so something with a fan is nice. And you can get units that take multiple disks, which can provide redundancy in case one of the drives fails.
(It depends how you set these disks up, and this is as technical as I'll get here. Look up RAID levels if you want to know more.)
So I have a LaCie 2big 16TB DAS that is plugged directly in to the Mac mini via USB-C.
The LaCie doesn't do anything by itself. It requires a server. But now that server is a fully-featured computer, and I can install whatever I want on it.
So where's all the data?
Let's recap. Johnny has a 500GB MacBook Air. Lucy has a 1TB MacBook Pro. And there's an always-on Mac mini whose internal storage is insignificant because it has 16TB of HDD plugged in the back. I'll call the mini 'the server' from now on.
Technically, we could store all of our files on the server and access them over the network. But this would be slow, especially over wifi. Ideally, you want the things you're using all day to be on the machine you're using.
This is where synchronisation software comes in. Dropbox is the one you know: you install it, point it to a folder, and it synchronises all of those files. If you want them on another computer, you just install it there and wait for them to copy over. As a bonus, now they're also in the cloud, and you can log in to a website and access them from anywhere.
This is a great technology but it comes with limits. What if me and Lucy both edit the same file at the same time? This causes a 'conflict', and there's not much you can do about that. Dropbox can't merge our Excel sheets, it's just too hard. So that's just something to be aware of. [1]
Syncthing
The secret sauce is an amazing piece of free software: Syncthing. [2]
It's like Dropbox, but completely configurable. You get to say what synchronises from which computer to which other computers. And you get granular control down to the folder or file level.
You also get to control what happens to the file on this computer after a new version is received from that computer. This is really handy. On the server, I've got it configured to keep versions of each file, which it deletes as they get old. It'll keep … well, let's just quote Syncthing:
The following intervals are used: for the first hour a version is kept every 30 seconds, for the first day a version is kept every hour, for the first 30 days a version is kept every day, until the maximum age a version is kept every week.
So as Lucy is working on the small business system, every time she saves the document, it's synchronised to both my laptop, and the server. The server is then applying the 'retention policy' as described above. This is one form of backup: if Lucy accidentally deletes all the text in the document, we can just grab a previous version from Syncthing. Super handy.
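To make that retention policy concrete, here's a rough Python sketch of the thinning rule quoted above. It is not Syncthing's actual code: the names `min_spacing` and `thin` are made up for illustration, the one-year maximum age is an assumed setting, and keeping the oldest version in each window is a simplification.

```python
from datetime import datetime, timedelta

# (age limit, minimum spacing between kept versions), per the quoted rule.
# The 365-day maximum age is an assumption for this sketch.
MAX_AGE = timedelta(days=365)
INTERVALS = [
    (timedelta(hours=1), timedelta(seconds=30)),
    (timedelta(days=1), timedelta(hours=1)),
    (timedelta(days=30), timedelta(days=1)),
    (MAX_AGE, timedelta(weeks=1)),
]


def min_spacing(age):
    """Return the spacing required between kept versions of this age.

    Returns None when the version is older than the maximum age.
    """
    for limit, spacing in INTERVALS:
        if age <= limit:
            return spacing
    return None


def thin(versions, now):
    """Return the version timestamps to keep, oldest first."""
    kept = []
    for stamp in sorted(versions):
        spacing = min_spacing(now - stamp)
        if spacing is None:
            continue  # older than the maximum age: drop it
        # Keep a version only if it is far enough from the last one kept.
        if not kept or stamp - kept[-1] >= spacing:
            kept.append(stamp)
    return kept


now = datetime(2024, 1, 31, 12, 0, 0)
# Hypothetical example: one save every 10 seconds for the last five minutes.
saves = [now - timedelta(seconds=10 * i) for i in range(30)]
print(len(thin(saves, now)), "of", len(saves), "versions kept")
```

The point is the shape of the policy: recent versions are kept densely, older ones are thinned out, and anything past the maximum age goes.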
Our Syncthing configuration
In a nutshell: the server has everything, and we each have most stuff, minus the massive folder of workshop video files. There's a bit more to it, but that's all you really need to know.
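The sizes quoted earlier show why leaving the workshop footage off the laptops matters. Here's a back-of-envelope check in Python using only those numbers. The variable names are made up for illustration, and the check ignores the operating system and every other folder, so treat it as a rough guide only.

```python
# Sizes in GB, as quoted earlier in this post.
D85_TOTAL = 787
WORKSHOP_FOOTAGE = 641
LAPTOP_CAPACITY = {"Johnny": 500, "Lucy": 1000}

# What is left of the business folder once the raw footage is excluded.
d85_without_footage = D85_TOTAL - WORKSHOP_FOOTAGE
print(f"D85 without footage: {d85_without_footage} GB")

for owner, capacity in LAPTOP_CAPACITY.items():
    fits_all = D85_TOTAL <= capacity
    fits_trimmed = d85_without_footage <= capacity
    print(f"{owner}: whole folder fits={fits_all}, without footage fits={fits_trimmed}")
```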
Start picturing 'blobs' of data
What's important is what this means for our data. When you're planning something like this, you need to have this picture in your mind of:
- What your data is, and
- Where your data is.
Johnny.Decimal makes this easy for me to think about. Each blob of data -- the minimum unit of 'my data' that I think about -- is a Johnny.Decimal system. I have:
- D85 Johnny.Decimal (the business)
- D01 johnnydecimal.com (the website)
- P76 Johnny's personal life
- L77 Learn with Lucy (the Excel course)
- Z99 Archive (some old long-term archives, including some data that isn't mine)
I know that all of this data is on the server. That's really important when it comes to backups, later. It's so important, it's a non-negotiable: all data must always be on the server. Then I know that if both laptops fall in the ocean, nothing is actually lost.
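If it helps to see the 'what and where' picture written down, here's a minimal Python sketch of it, with a check for the non-negotiable rule. The machine names and the laptop placements are hypothetical placeholders, and `missing_from_server` is a made-up helper, not part of any tool mentioned here.

```python
SERVER = "server"

# Which machines hold a COMPLETE copy of each system.
# The server entries follow the rule above. The laptop entries are
# hypothetical placeholders, not the real Syncthing configuration.
locations = {
    "D85 Johnny.Decimal": {SERVER},  # laptops skip the workshop footage
    "D01 johnnydecimal.com": {SERVER, "johnny-laptop", "lucy-laptop"},
    "P76 Johnny's personal life": {SERVER, "johnny-laptop"},
    "L77 Learn with Lucy": {SERVER, "johnny-laptop", "lucy-laptop"},
    "Z99 Archive": {SERVER},
}


def missing_from_server(locations, server=SERVER):
    """Return the systems that have no complete copy on the server."""
    return sorted(name for name, hosts in locations.items() if server not in hosts)


problems = missing_from_server(locations)
if problems:
    print("Not on the server:", ", ".join(problems))
else:
    print("Every system has a complete copy on the server.")
```

Writing the map out like this makes gaps obvious: any system that shows up in `problems` is one laptop accident away from being lost.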
I also know that Syncthing is synchronising the important stuff that we use every day to both laptops. And those laptops synchronise to each other.
This is important because if one of the laptops falls in the ocean, it'd be nice to be able to access our important daily stuff quickly. We can do that from the other laptop. And when we get a replacement machine, the two laptops can talk directly to each other. The server, physically far away, is a last resort.
So that's the day-to-day synchronisation of data. Syncthing is indispensable. It's complex, but worth getting to know. If you need any help, ask.
Backups
Backups? Didn't we just talk about backups? All these copies of your data all over the place on three machines?
We did not. Synchronisation is not a backup.
Read that again. In bold. Synchronisation is NOT a backup.
Because synchronisation -- wait for it -- synchronises everything: including you messing up some file and not realising it. Including you deleting some folder and not realising it. So you MUST also have backups.
Nobody said this was simple. Alright, backups. When you think of backups, think of the event that makes you glad you had one. These events get progressively worse. Let's simplify and say you're always at home, and not about to be globe-trotting like we are.
1: Your laptop falls in the bath
Bath, ocean. Laptop wet, laptop no good. In this scenario, you're in your house, you have a new laptop, and you need to get working quickly. You want a local backup that you can restore from. [3]
(In my situation, I'd try the re-synchronising first; but let's say you don't have that option.)
Your operating system has software built-in: Time Machine for Mac, Windows Backup for the other one, and you Linux nerds can figure it out yourself.
You should probably just use this. Personally I also use Arq but we don't need to go there. Different software, same result.
2: Your backup didn't work
It is not your day. You got your backup drive, tried to restore to the new, dry laptop -- and it failed.
Disk error. Can't read. Backup error code FKU390093-B. Cosmic rays. Whatever: backups also fail.
Lucky you have a second backup on a different disk. This is why I use Arq: it makes it really easy to connect to another machine and to create a backup there. So I have one backup on this little external SSD, one on the server, and another on an old Synology. Multiple backups on multiple storage devices.
But we're not finished.
3: The house is destroyed by a cyclone
So now everything's gone. Laptops, servers, hard drives, the lot. Really really unlikely, but it happens.
This is what the cloud is for. Ironic, as it just wiped us out. Ha ha. I pay for a cloud backup service that I hope never to use. Literally, if I go my entire life and never ever have to restore from the service that costs me about a hundred bucks a year, I'd be happy.
But the day you do need it, you'll be glad you have it. So: use Backblaze. Just do. Go and sign up now.
3-2-1 backup strategy
This is the industry-standard way to do things:
- Three copies of your data.
- On two different media.
- One copy off-site, i.e. cloud.
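As a quick sanity check, the three rules can be sketched in Python. This is only an illustration: the `Copy` class, the `satisfies_3_2_1` helper and the example copies are all made up, and it assumes the original on your laptop counts as one of the three copies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    medium: str    # e.g. "ssd", "hdd", "cloud"
    offsite: bool  # True if it lives outside the building


def satisfies_3_2_1(copies):
    """Check three copies, two different media, one copy off-site."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


# Hypothetical setup, assuming the laptop's own disk counts as copy one.
copies = [
    Copy("laptop internal disk", "ssd", offsite=False),
    Copy("local backup drive", "hdd", offsite=False),
    Copy("cloud backup", "cloud", offsite=True),
]
print(satisfies_3_2_1(copies))
```

Drop the cloud entry from the list and the check fails, which is the cyclone scenario above.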
Backblaze, and NAS vs. DAS
We talked about NAS vs. DAS above for a reason. Backblaze is amazing: unlimited storage for ~$100/year. With a catch: it only includes DAS.
Backblaze will not back up your Synology for $100/year. It's a miracle that they will back up your LaCie 16TB for that. So this is definitely a factor when deciding what to buy.
Oh yeah, that Synology
Because I already had a Synology -- an old DS118 single-drive unit -- I'm using it purely as a backup target. Both laptops and the server back up to it, using Arq.
This is probably overkill. If I didn't already have this, I wouldn't buy one for this role.
Review
Let's review with a little diagram. I have no computer drawing skillz so here's one I did on paper.

Now, yours won't look anything like this. Don't just copy me. But make sure that you have this mental model of your data. What blobs are there? Where are they? Which copies are complete vs. partial? Local vs. cloud? Synchronisation vs. backup?
And if you need any help, ask on the forum.
Tailscale
There's another secret sauce here which I'll mention briefly.
The server was in the cupboard in the kitchen, but since deciding to go on the move, it needed a new home. So it's now at my mate Alex's house in Melbourne. Thalex!
Ordinarily this would have broken all sorts of stuff and required complicated network reconfiguration. But I have Tailscale permanently turned on, on every device, so I had to do: exactly nothing.
I turned the server off, gave it to Alex, he took it and the LaCie and the Synology home with him, he turned them on, and everything just works like it did. Albeit a touch slower, as they're now about 700km away. Only about 40ms of network latency though, which is impressive.
I couldn't recommend it more. Again, too much detail for this post, but let me know if you need help.
Footnotes
1. Most sync services will rename one of the conflicted files, giving it a timestamp and putting the word 'conflict' in the filename. Then it's up to you to merge your conflicting versions. You'll never actually lose data.
2. You should financially support 'free' software that you depend on. Because nothing's really free. As soon as we can afford to, I'll be sponsoring Syncthing.
3. Computer terms. 'Local' = on this network; in this building. 'Remote' = not.