Managing personal documents, photos, music, videos, passwords, lists, notes, etc., has always been a fascinating challenge for me. It got deeper when I started to use multiple devices, including computers running different operating systems and even multiple smartphones. Add to that the need to safely back it all up (and effortlessly recover in the event of drive failures) and you've got a real puzzle.
Over the years, I've gotten pretty good at juggling all this.
I started with a system of distributed filesystem replicas kept in sync on demand using the Unison file synchronizer. Each of my devices had a full or partial replica of all of my files, meaning that if one device died, I could just replace it, then copy my files from any (or several) of my other devices, including several backup drives I kept in sync. This worked well, except that I had to remember to manually sync the files, and the backups weren't incremental: once I changed or deleted a file, the original was gone forever. (Kind of. Part of the robustness of the system was that I didn't keep all my devices up to date, so if I needed a file from a few weeks or months ago, I could go digging through out-of-sync devices for it.)
I operated on that system for years, and it actually saved me a number of times. I got to the point where, if a hard drive melted down on me, I could boot off my backup and be up and going again in minutes. I could keep working off the external drive as normal until a replacement drive came, and then I could simply pop in the replacement, install an OS, copy the files off my backup, and be up and running in an hour or two as if nothing had happened.
I did eventually start to want incremental backups, though, and I didn't want to have to think about syncing all the time. It was about that time that Syncthing really started to get good (and also when my dad bought me a Raspberry Pi (raspberrypi.org)), and I decided it was time to build myself a cloud.
Above all else, what I wanted was simplicity. I wanted to be able to access any files from anywhere and do whatever I wanted with them. I didn't want it to depend on me having my computer, or having some certain software installed (minus the basics), or having to mess with some clunky web interface to get at my files. I wanted to be able to boot any old computer into a live Linux off my thumb drive and get at my files.
I also wanted safety. I wanted to be able to mess with stuff without thinking too hard about it, but still be able to recover if I screwed something up. (That's why I needed incremental backups.)
I didn't want my work to get trapped on whatever device I was on. I wanted to make sure that I didn't have to think too hard to get everything synced before shutting everything down each night. But I also didn't want to have three million gigabytes of backups that I couldn't get rid of.
With these goals in mind, and with a lot of prior experience using my old system, I was able to devise a system that has worked splendidly for 6 months now. Here's how it all works.
After trying a p2p system for a little while, I quickly realized that the model wasn't suited to personal file synchronization. I didn't keep my devices powered on everywhere, so by default, when one device was on, the others were off. I tried using my cell phone as a ferrying device for a while, but of course I couldn't fit all of my files onto my cell phone, so the model just didn't have a solution for complete synchronization. Thus, I chose to go back to a hub-and-spoke model.
My previous objections to this were environmental: I detested the idea of having a computer humming away at 40 to 80 W just to give me access to my hard drive. Fortunately, the Raspberry Pi solved that problem. Weighing in at just 3.5 W, it has proven wonderfully efficient and (just barely) powerful enough to handle the load.
I went back and forth about which sync tool to use. The Unison solution actually worked really well. The only problem was that it wasn't a continuous-sync solution. At the same time, I didn't want to have to have Syncthing installed on everything just to get at my files. In the end, I decided that the system would be a combination of the two models. Despite some unresolved problems with Syncthing (battery drainage on my phone and some frustrating constant sync conflicts, despite my hub-and-spoke model), I have it running on all of my devices and keeping everything up to date with my Raspberry Pi. At the same time, I've taken pains to organize my files in a way that allows me to easily download chunks via rsync, and re-upload them the same way when I'm done. Syncthing then picks up those changes and pushes them back out to my devices like normal.
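To make that rsync round trip a bit more concrete, here's a rough Python sketch of how the pull and push commands might be built. It is only an illustration: the hub address, the base path, and the function names are placeholders I made up for the example, and the flags are just one reasonable choice, not necessarily the ones used on the real setup.

```python
from __future__ import annotations

# Hypothetical placeholders: neither the hub address nor the base path
# comes from the setup described above.
HUB = "pi@raspberrypi.local"
HUB_ROOT = "/mnt/files"

# -a preserves permissions and timestamps, --partial keeps interrupted
# transfers, --protect-args stops the remote shell from splitting paths
# that contain spaces (such as "Projects/Real Life").
RSYNC = ["rsync", "-a", "--partial", "--protect-args"]


def _remote(subfolder: str) -> str:
    """Remote location of one subfolder on the hub (trailing slash = contents)."""
    return f"{HUB}:{HUB_ROOT}/{subfolder.strip('/')}/"


def pull_command(subfolder: str, local_dir: str) -> list[str]:
    """Command that downloads one subfolder from the hub into local_dir."""
    return RSYNC + [_remote(subfolder), f"{local_dir.rstrip('/')}/"]


def push_command(subfolder: str, local_dir: str) -> list[str]:
    """Command that uploads local_dir back over the same subfolder on the hub."""
    return RSYNC + [f"{local_dir.rstrip('/')}/", _remote(subfolder)]
```

Either list can be handed to subprocess.run(..., check=True). The sketch deliberately leaves out --delete, so files removed locally stay on the hub until they are removed there too.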
The one detail that's important to note is that I'm tracking each of my top-level folders as a separate Syncthing sync. That is, instead of having one single Syncthing folder "Kael's Files", I have 14 Syncthing folders: "Documents", "Current", "Music", "Pictures", "Videos", etc. This lets me sync lighter-weight subsets of my files on devices that don't have a lot of onboard storage.
Backups were the next challenge. rsync is typically touted as the Linux world's equivalent of Time Machine (the native Mac backup utility), and indeed it's what's powering my backups. However, it wasn't quite as usable as I wanted....
Now, a million people have made wrappers to try to turn rsync into a more viable backup solution. I acknowledge that, and also admit that I didn't dig too deep to try to figure out if any of those million wrapper scripts did what I wanted. But I've been really hooked on this config-file-based approach to the command line lately, and I thought this would be an opportunity to see if I could create a command line backup utility that felt good to use.
What I came up with was the shamelessly named TimeTraveler.
While my ambitions for timetraveler extend beyond the simple creation of backups, for now, it does little more. When called, it runs rsync with some specific flags (including the crucial --link-dest flag, which is what powers the incremental backups) and maintains a pointer, latest, to the most recent backup.
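For anyone who hasn't seen the --link-dest trick before, here's a minimal Python sketch of the general idea. This is not timetraveler's actual code: the function names, the timestamp format, and the choice to implement the latest pointer as a symlink are all assumptions made for the illustration.

```python
from __future__ import annotations

import os
import subprocess
from datetime import datetime
from pathlib import Path


def build_backup_command(source: str, backup_root: str, stamp: str) -> list[str]:
    """Build the rsync call for a new snapshot directory named after stamp."""
    root = Path(backup_root)
    cmd = ["rsync", "-a"]
    latest = root / "latest"
    if latest.exists():
        # Files unchanged since the previous snapshot are hard-linked to it
        # instead of copied, so each snapshot only costs the changed files.
        # An absolute path is used because rsync resolves a relative
        # --link-dest against the destination directory.
        cmd.append(f"--link-dest={latest.resolve()}")
    return cmd + [f"{source.rstrip('/')}/", f"{root / stamp}/"]


def update_latest(backup_root: str, stamp: str) -> None:
    """Point the 'latest' symlink (an assumed implementation) at a snapshot."""
    root = Path(backup_root)
    tmp = root / "latest.tmp"
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    tmp.symlink_to(stamp)  # relative link, so the backup drive can be moved
    os.replace(tmp, root / "latest")  # rename swaps the pointer in one step


def backup(source: str, backup_root: str) -> None:
    """Take one snapshot, then move the pointer only if rsync succeeded."""
    stamp = datetime.now().strftime("%Y-%m-%d_%H%M%S")
    subprocess.run(build_backup_command(source, backup_root, stamp), check=True)
    update_latest(backup_root, stamp)
```

The first run has no latest to link against, so it makes a full copy; every later run links against the previous snapshot. Because each snapshot is a complete directory tree, getting rid of old backups is just a matter of deleting old snapshot folders.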
I have timetraveler set up on my Raspberry Pi to make an incremental backup of my files every two days. I also have timetraveler profiles on my other computers pointing to my Raspberry Pi, allowing me to make the same incremental backups to various external hard drives on demand. This gives me the option to have geographically distributed incremental backups, while also keeping the interface consistent (i.e., on any computer, all I have to do is run timetraveler backup main).
File System Layout
The final key to all this is the filesystem layout. There are actually two elements to this. The first is mobility. File hierarchies can be like roots that dig deep into your operating system and anchor you firmly to it. This makes it hard to recover when your equipment rots (taking the roots with it), or when you want a different OS.
The solution here is to move 100% of the files you touch out to a separate hard drive or partition (including stuff like system and application runtime config). This is kinda hard at first, but well worth it. I put all of my config into a folder appropriately named "Runtime/Config", then symlink it into the right places when I first set up my machine. This has the added benefit of propagating config changes across devices, if that's what you want. (If it's not, then you can just copy on setup instead of link.)
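Here's a small Python sketch of what that first-time setup step could look like. The function name and the mapping format are invented for the example, and the specific config entries in the usage note are placeholders rather than a list of what actually lives in "Runtime/Config".

```python
from __future__ import annotations

from pathlib import Path


def link_config(config_root: Path, home: Path, mapping: dict[str, str]) -> list[Path]:
    """Symlink entries under config_root into home.

    mapping maps a name inside config_root to a path relative to home.
    Returns the links that were newly created.
    """
    created = []
    for name, rel_target in mapping.items():
        source = config_root / name
        link = home / rel_target
        if not source.exists():
            continue  # nothing synced for this entry yet
        if link.exists() or link.is_symlink():
            # Something is already there (possibly our own link from an
            # earlier run); leave it alone rather than overwrite it.
            continue
        link.parent.mkdir(parents=True, exist_ok=True)
        link.symlink_to(source)
        created.append(link)
    return created
```

A call might look like link_config(Path("/data/Runtime/Config"), Path.home(), {"git": ".config/git"}), where both the mount point and the git entry are made up. Running it twice is harmless, since existing links are skipped. To copy instead of link, shutil.copytree could stand in for the symlink_to call.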
The second reason to pay attention to structure is to make access easier. If you have to sync down a terabyte of data every time you want to do anything with your files, then you end up pretty limited in what you can do. As it turns out, it's not impossible to arrange your filesystem in a way that both makes sense and makes it easy to grab small pieces of it on the fly.
So those are my two reasons for investing time in structuring my filesystem. Unfortunately, though, I've been refining this over the years and I haven't actually come to a hard conclusion on best practices. I have learned a few nice tricks, though, which I'll try to list here in brief:
The Old Windows Basics
The old Windows system was a good start. I still have a Documents folder where I keep all of my administrative docs and other things that don't change much (health records, notes, scans of drawings, poems, song ideas, etc.); I still have Pictures and Videos and Music folders... But I've made some essential modifications to these folders.
For example, my use cases for Music and Videos were very specific. I wanted to sync the Music folder with my phone, but I had a bunch of music I wasn't interested in anymore and didn't want on my phone. To handle that, I split Music into subdirectories: "Current" for stuff I might actually want to listen to, "Archive" for old stuff I'm not really interested in anymore (sorry Guns N' Roses....), and "Raw" for any CDs I might rip that I haven't processed yet. (I'm meticulous about organizing and tagging my music files....)
For Videos, I found I often wanted access to the dump of feature films that a friend gave me (about 60 GB), but I didn't often need to see all my old home movies from high school (100+ GB). So I split that up, too: "Feature Films", "Misc Snippets", and "Productions" (for videos I had produced in high school, as opposed to ones where we just grabbed the camera and ran around town).
I didn't do anything special with Pictures. It's sort of an all-or-nothing sync right now. I've run into a few minor problems with that (mainly because I can't dig through my archives to find stuff on computers that don't have that folder synced), but it's not to the point where I'm hurting yet.
Also, I've found it's extremely useful to keep my Documents folder down to a reasonable size. Right now it weighs in around 500 MB (except that I just dropped my lifelong email archive into it at 3.2 GB, but I can ignore that in syncs that don't need it). I consider my Documents folder to be sort of my traditional file cabinet. As I mentioned, it's where I keep all my important records, as well as stuff like recipes, notes, and developing ideas that haven't yet turned into formal projects. It's nice to be able to keep all this on my phone or on any other device for easy access.
A Few New Folders
One folder I've found is ALWAYS MISSING in these pre-built hierarchies is a "Projects" folder. I use this all the time. I'm in the process of breaking it down further, but so far I've found it useful to break it into broad project categories ("Music Projects", "Video Projects", "Graphic Design", etc.). One I've found particularly useful, though, is "Real Life". I put most things I'm actively working on in this folder, and I usually end up linking at least some of it into Google Drive (more on that in a minute).
In addition to "Projects", I've added "Jobs" for managing contract work in a dedicated space away from my personal stuff; "Current", which I use as a sort of staging folder or "scratch pad" when I run into something that I don't know where to put; "Reading", where I keep ebooks and other such things; "Runtime" (further split into "Config" and "Data"), where I keep application configurations and data that I use across various machines; "Repos", which I use as a sort of GitHub mirror; and finally "Misc Syncs", which is the only top-level folder that is not directly synced. (Instead, I have several smaller miscellaneous syncs in there, like "New Cellphone Captures", which is a direct link to my phone's camera, and "Best-of Photos", which allows me to add arbitrary photos from my collection to my phone so I can show them to people.)
I have one more important top-level folder: "GDrive". This is a special folder. Anything I put in there gets synced to Google Drive via Insync, which I have running on my Raspberry Pi hub. It's pretty nice, actually, because I can symlink subdirectories from any other location in my filesystem into it and Insync will sync them as if they were regular folders.
I often use this feature in the projects in my "Projects/Real Life" folder. Some projects are exclusively on Drive, in which case I just link the whole project subfolder over. Others, though, have only certain materials that I share in Drive. For those, I usually create a subfolder within the project folder that I then link into my top-level "GDrive" folder. I can then symlink specific documents from the project into that subfolder for sharing.
Tying It All Together
Ok. Deep breath. Take a break.
Now let's tie this all together.
- My Raspberry Pi makes all my files available to me all the time, without breaking the power bank.
- My filesystem layout allows me to do some really cool things, like grabbing specific subfolders off my Pi via rsync and then pushing them back up when I'm done without disrupting my synchronization. It also allows me to sync important subsets of my files on machines (like my little netbook) that don't have a lot of onboard storage.
- Syncthing does the work of keeping my various devices in sync. (It does a pretty good job, though I have some complaints.)
- My timetraveler backup utility allows me to make lightweight incremental backups of my primary fileset, both locally and remotely, routinely and on-demand.
- Insync running on my Pi allows me to keep a folder synced up with Google Drive, which is double cool because I can drop items into my GDrive folder from any device and they'll get synced up to Google relatively quickly once they propagate to my Pi.
All in all, I'm pretty damn happy with this setup. And, with the exception of Insync, it's all open source! There's definitely some shit goin on in the world these days, but it's always humbling to think that so much of the world's digital infrastructure is now running on stuff people built just because they wanted the world to have it.... (Don't forget to pay for it every once in a while! ;))