Backups: the good, the bad, and the ugly

If you think something is valuable, then you should back it up.

For over a year now, I’ve been using Kopia to back up my Nextcloud installation and other important homelab files to S3.[1] Kopia is an incredibly powerful and underappreciated tool that deserves a place in the toolbelts of more amateur sysadmins and open-source nerds. To me, Kopia is to backups as WireGuard is to VPNs.

But things were a lot rougher before I smoothed out my process.

The bad and the ugly

None of the programs that I’m about to describe are necessarily poor choices; they’re just incompatible with my specific use case and requirements. However, I suspect that my goals will resonate with other homelab enthusiasts.

Duplicity

I first used Duplicity, but it felt rather dated. Indeed, the program has been around since August 2002, so one of my early obstacles in learning Duplicity was navigating stale, 20-year-old documentation.

The big catch is that you can’t run incremental backups indefinitely without degrading restore performance, so every once in a while you need to run a full backup. You then have to delete old full backups to keep them from eating up storage. This adds unneeded complexity to a script. Multithreading is also not supported, which throttled my upload throughput to roughly 2 Mbps.
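To give a sense of that extra bookkeeping, here is a minimal Python sketch of the decision a wrapper script ends up making on every run. The 30-day interval, the number of full backups kept, and the paths in the usage note are assumptions for illustration, not values from the original setup. The function only builds Duplicity command lines and never executes them.

```python
from datetime import date, timedelta

FULL_EVERY = timedelta(days=30)  # assumed interval between full backups
KEEP_FULL = 2                    # assumed number of full backups to keep


def plan_backup(today, last_full, source, target):
    """Return the duplicity commands to run today, as argument lists."""
    needs_full = last_full is None or today - last_full >= FULL_EVERY
    if not needs_full:
        # A recent full backup exists, so an incremental run is enough.
        return [["duplicity", "incremental", source, target]]
    return [
        ["duplicity", "full", source, target],
        # Prune older full backups so they do not pile up on the remote.
        ["duplicity", "remove-all-but-n-full", str(KEEP_FULL), "--force", target],
    ]
```

For example, `plan_backup(date(2024, 3, 1), date(2024, 1, 15), "/srv/data", "file:///mnt/backup")` returns a full backup followed by a prune step, because the last full backup is more than 30 days old. The script also has to record `last_full` somewhere, which is one more moving part.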

Duplicati

Seeking something better, I then discovered Duplicati, another backup utility that comes with a web UI. Unfortunately, the UI was often unresponsive or loaded an empty dashboard, which was alarming since I feared my data was lost.

Frequent crashes deepened my distrust of the program, since I might one day have to rely on it to restore data. There are some horror stories circulating on various forums about Duplicati corrupting data or simply restoring slowly. Over time I got frustrated with my unreliable experience with Duplicati and decided to move on.

Duplicacy

I briefly used Duplicacy, but I wasn’t comfortable paying licensing fees to solve a problem that ought to have a free and open-source solution. That said, it was straightforward to use, and I enjoyed being able to back up data to Google Drive.[2]

Timeshift

I used Timeshift once to back up a server but was unable to actually restore the data afterwards.[3] This was very sad.

Retrospective

For reasons unknown to me, most backup programs for Linux fall short in at least one crucial way. I had a list of wishes that grew with every deficiency I came across:

  1. no nasty surprises when I had to restore something
  2. no weird user intervention required to maintain backup health
  3. support for backing up to a major cloud provider
  4. solid CLI support for scripting
  5. encryption
  6. compression
  7. deduplication
  8. support for ARM architecture

Of course, I wanted something that was designed to tackle all of these things with ease. There are a ton of programs to choose from, but I suspect that the illusion of choice is in full swing here and that only a few solutions are truly practical for a long-term setup. For this reason, I considered Borg and restic, two very popular backup programs among Linux users, but decided against both once I found myself planning workarounds to meet the needs above.[4]

The good

Kopia describes itself as:

“Encrypted, deduplicated, and compressed data backups using your own cloud storage”

That tagline already fulfills a lot of my goals, so we are off to a good start.

It gets better:

In my anecdotal experience, Kopia is the fastest backup program I have ever used on my ARM servers.[6] Niraj Tolia and Julio Lopez benchmarked Kopia against restic and found that Kopia was up to 700% faster and 20% more space-efficient, and that it continues to pull further ahead as it matures.

On top of all the good things I have already listed about Kopia, one understated strength is how ergonomic it is on the command line. For instance, it uses a policy system similar to .gitignore to exclude certain files or file extensions, which removes a lot of the tedium when, say, writing a Bash script to automate backups. Keep your data safe, and happy homelabbing.
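To make the scripting point concrete, here is a minimal Python sketch of a wrapper that registers ignore patterns on a directory's policy and then takes a snapshot. It assumes a Kopia repository is already connected. The directory and patterns are hypothetical, and the `--add-ignore` flag should be checked against `kopia policy set --help` for your version. By default the script only prints the commands it would run.

```python
import subprocess


def kopia_commands(path, ignore_patterns):
    """Build kopia invocations: one policy update per pattern, then a snapshot."""
    commands = [
        ["kopia", "policy", "set", path, "--add-ignore", pattern]
        for pattern in ignore_patterns
    ]
    commands.append(["kopia", "snapshot", "create", path])
    return commands


def run_backup(path, ignore_patterns, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for command in kopia_commands(path, ignore_patterns):
        print(" ".join(command))
        if not dry_run:
            # check=True stops the script as soon as a command fails.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical data directory and patterns; dry-run mode executes nothing.
    run_backup("/srv/nextcloud/data", ["*.tmp", "cache/"])
```

Policies persist in the repository, so the patterns only need to be set once. Setting them on every run keeps the script self-describing, at the cost of a few redundant calls.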


[1] Backblaze B2 is cheaper, but university credits turn S3 into an offer you can’t refuse. I’ll figure something out when my university credits are depleted or expired.

[2] Google Drive has practically no storage limits if your university pays for it.

[3] Timeshift is often recommended to new users seeking a basic backup solution with a decent GTK user interface. Unfortunately, the maintainer seems to have abandoned active development on the project. The project’s GitHub page is consequently filled with unanswered bug reports and pull requests.

[4] At the time of writing, Borg doesn’t have native support for cloud backends and restic doesn’t offer compression.

[5] The web interface is a super convenient way to retrieve a versioned copy of a particular file, almost like an open-source version of Time Machine on macOS.

[6] ODROID-HC1 boards, which are not super powerful.