Photography, Storage, and Backups

(Originally uploaded to my personal blog @ Mischievous Ramblings II)

 "Angel Wings", by Evan Robinson, Ruby Beach, WA, 2016-07-22

Upon returning from my most recent photo workshop (five days on the Olympic Peninsula with David Muench, Jerry Dodrill, and Grant Ordelheide), I set upon my routine of backing up photos.  Step 1: use Chronosync to move photos from my laptop to my desktop.  I was immediately stopped by the lack of available space on my desktop.

This prompted a re-evaluation of my storage strategy for photos, which prompted a re-jiggering of my backup strategy.  I thought I'd document both for anyone else who was having similar issues, or needed a little help figuring out how to handle the mass of data a modern DSLR (or two) can generate.

I'm currently in possession of over 50K digital images taken over the last 16 years (I lump anything before 2000 in with the 2000s for convenience -- there aren't very many and they're small anyway).  If I include images, movies, export versions, DNG and sidecar files generated by Adobe Photoshop Lightroom, and the folders that hold them, it's 63,187 files and folders occupying 1.28 TB.  More than some, less than others.

Enough to be a pain.
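
For anyone who wants to produce a similar tally for their own library, here is a minimal Python sketch.  The function name `library_stats` is hypothetical, and this is only one way to count; the numbers above did not necessarily come from a script like this.

```python
import os

def library_stats(root):
    """Return (file_count, folder_count, total_bytes) for everything under root."""
    files = folders = total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        folders += len(dirnames)
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so a symlink is counted by its own size, not its target's
                total_bytes += os.lstat(path).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            files += 1
    return files, folders, total_bytes
```

Pointing it at the top-level photo folder gives the file count, folder count, and size in bytes, which is enough to see how close a library is to outgrowing a drive.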

My data usage and backup workflow used to look like this:

  1. Copy photos from camera data cards to Laptop Internal SSD drive (manual)
  2. Separate photos into date folders and rename (manual using scripts)
  3. Import photos into Adobe Lightroom (manual)
  4. Backup from Laptop Internal SSD drive to External SSD drive (manual)
  5. On return: Sync Desktop Internal SSD drive photos with Laptop Internal SSD drive photos (manual using Chronosync)
  6. Backup from Desktop Internal SSD drive to RAID Box 1 (automated daily with Chronosync)
  7. Backup from RAID Box 1 to RAID Box 2 (automated daily with Chronosync)
  8. Sync from RAID Box 2 to Amazon CloudDrive (manual using GoodSync)
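
For readers who want to automate something like step 2, a minimal Python sketch follows.  It is an illustration under stated assumptions, not the actual scripts: it uses each file's modification time as a stand-in for the capture date (a real script would more likely read EXIF data), and the `YYYY/YYYY-MM-DD` folder layout, the filename pattern, and the function name `copy_into_date_folders` are all hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path

def copy_into_date_folders(card_dir, library_dir, dry_run=True):
    """Copy files from card_dir into library_dir/YYYY/YYYY-MM-DD/, renamed by timestamp.

    Returns the list of (source, destination) pairs. With dry_run=True nothing
    is copied, so the plan can be inspected first.
    """
    plan = []
    for src in sorted(Path(card_dir).rglob("*")):
        if not src.is_file():
            continue
        # Assumption: modification time approximates capture time.
        taken = datetime.fromtimestamp(src.stat().st_mtime)
        dest_dir = Path(library_dir) / f"{taken:%Y}" / f"{taken:%Y-%m-%d}"
        dest = dest_dir / f"{taken:%Y%m%d-%H%M%S}_{src.name}"
        plan.append((src, dest))
        if not dry_run:
            dest_dir.mkdir(parents=True, exist_ok=True)
            if dest.exists():
                raise FileExistsError(dest)  # never overwrite an existing photo
            shutil.copy2(src, dest)  # copy2 keeps the original timestamps
    return plan
```

Running it once with the default `dry_run=True` and reading the returned plan before doing the real copy is a cheap way to catch a bad naming pattern while the card is still intact.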

Using this workflow, I have a master library (the external SSD "backup" drive plus the current year or two on the internal drives of my laptop and desktop), two local RAID backups, plus a cloud backup.  If my master fails or I lose a drive, I can redirect Lightroom to point to a RAIDed backup without much effort (each year folder is directed separately).  A notable potential problem is that I may work simultaneously on my laptop and desktop on the same catalog or even images.  I have avoided this problem through discipline, which will eventually fail.

The workflow interruption came initially at step 4 (the external SSD was no longer large enough to hold my entire photo library, which was nonfatal because the photos were also on the laptop drive), then became fatal at step 5 when the desktop internal SSD was not big enough.  Even though the laptop and desktop internal SSD drives were only holding this year's photos, going on a long road trip this spring followed by two workshops this summer had my total image storage for the year approaching half a terabyte.  Because the desktop also has a Parallels image occupying about 50GB, there wasn't enough room on that drive.

I do all my work from SSDs whenever possible.  Backup systems (the three RAID boxes in my office) are all using spinning drives, but each is set up so that one drive can fail without data loss.  With my primary photo library now well over 1 TB, I didn't have an SSD large enough.

Immediate ideas: break my photo library into multiple libraries.  Many luminaries recommend one catalog per year, and I already store images in folders by year, so it wouldn't be a stretch.  But I'm not ready to do that.  I have vaguely imagined moving significant images to a "good images" catalog and storing them separately, but I'm dubious about my judgement in picking out the good ones, so I'm not happy with that idea.  I could get a small RAID box like the ThunderBay Mini 4 (designed for laptop drives) and fill it with SSDs, but that would cost in the neighborhood of $3K for 6 TB of usable storage.  Or I could find a 2 TB external drive and move my primary library onto that.  Which I decided to do.  It cost about $800 including several new USB-C 3.1 cables, and of course Amazon delivered in two days.

It took overnight to copy the entire library (including several shoots I'd held out as distinct) to the external drive and make sure that the corpus matched the files on my backup drives.  The copy took that long because the USB-C cables I bought from Amazon didn't actually implement USB 3, even though they were specified as such.  They're going back for longer cables (I like 6" cables for drives attached to my laptop).  The biggest issue was some wonky file permissions that were different among backup versions, which a few minutes with the Terminal sorted out.
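
Checking that a copied library matches a backup, and spotting permission differences between them, can be scripted along these lines.  This is a minimal Python sketch assuming both trees are mounted locally; the function names are hypothetical, and it is not necessarily how the comparison for this post was done.  Hashing every file is slow on a terabyte-scale library, so expect it to run for hours.

```python
import hashlib
import os
import stat

def _digest(path, chunk_size=1024 * 1024):
    """SHA-256 of a file, read in chunks so large raw files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

def _snapshot(root):
    """Map relative path -> (sha256, permission bits) for every file under root."""
    snap = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            snap[rel] = (_digest(full), stat.S_IMODE(os.stat(full).st_mode))
    return snap

def compare_trees(master, backup):
    """Report files that are missing, extra, or different between two trees."""
    a, b = _snapshot(master), _snapshot(backup)
    common = a.keys() & b.keys()
    return {
        "missing_in_backup": sorted(a.keys() - b.keys()),
        "extra_in_backup": sorted(b.keys() - a.keys()),
        "content_differs": sorted(p for p in common if a[p][0] != b[p][0]),
        "mode_differs": sorted(p for p in common if a[p][1] != b[p][1]),
    }
```

An empty list under every key means the two trees agree on file names, contents, and permission bits.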

Now that I am using the external drive for my primary library, the workflow is simpler:

  1. Copy photos from camera data cards to External Library SSD drive (manual)
  2. Separate photos into date folders and rename (manual using scripts)
  3. Import photos into Adobe Lightroom (manual)
  4. Backup from External Library SSD drive to RAID Box 1 (automated on booting external SSD using Chronosync)
  5. Backup from RAID Box 1 to RAID Box 2 (automated daily using Chronosync)
  6. Sync from RAID Box 2 to Amazon CloudDrive (manual using GoodSync)

I still have my master, two local backups, and a cloud backup.  I've removed two manual steps.  In case of disaster, switching to using a RAID backup instead of my External SSD master is now a matter of relocating one folder in Lightroom instead of one per year.

My projections indicate that this solution will last no longer than the end of 2017, but I have that much time to figure out a catalog schema that will cross multiple drives.  I have also accepted that it's OK to delete the technically unacceptable photos* and will be going back to do that.  By next June I need to have a solution.  In the worst case, I can buy that small external RAID device and fill it with large SSDs, but that's just throwing money at the problem, and it dramatically reduces my ability to work portably, since the external RAID requires a power brick, while the external SSD drive doesn't.
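
A projection of this kind is simple arithmetic: free space divided by how fast the library grows.  The Python sketch below shows the shape of the calculation.  The 2 TB drive and 1.28 TB library size come from this post; the monthly growth rate is purely an illustrative assumption (roughly half a terabyte spread over seven months), and the function name is hypothetical.  Real headroom will be somewhat lower, since a nominal 2 TB drive offers less than 2 TB of usable space.

```python
def months_until_full(capacity_tb, used_tb, growth_tb_per_month):
    """Months of headroom left, assuming the library grows at a constant rate."""
    if growth_tb_per_month <= 0:
        raise ValueError("growth rate must be positive")
    free_tb = max(capacity_tb - used_tb, 0.0)
    return free_tb / growth_tb_per_month

if __name__ == "__main__":
    # Capacity and current size are from the post; 0.07 TB/month is an assumption.
    print(f"{months_until_full(2.0, 1.28, 0.07):.1f} months of headroom")
```

Deleting unusable photos lowers `used_tb`, and a quieter shooting year lowers the growth rate, so both push the deadline out.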

With the exception of the length of time it takes to sync up to Amazon CloudDrive, I'm happy with this solution.  Manual steps are reduced, I have a "working token" in the external Library SSD to make sure that I'm not working simultaneously on the desktop and laptop versions of Lightroom, and the backups are more automated (and will be even more so when I turn on GoodSync automation now that the sync is a single job instead of one job per year).

My friend Chuq von Rospach is only one of the people who notes that there are two kinds of computer users: those who haven't needed their backups yet and those who have.  Those of us who have are more likely to set up systems to make sure our backups are in good shape.  I have tested every portion of this system except a full restore from Amazon CloudDrive (I have restored a relatively small year's worth) and am unaware of any reason why I won't be able to get my photos back in the face of almost any disaster (i.e., my computers can be lost or destroyed or my house can burn down).  I might add periodic backup to a spinning drive to be placed offsite, but so long as Amazon AWS stays in business I have offsite backup covered.  It would take at least two simultaneous disasters for me to lose it all :-).


* I did a sunset shoot working to get seagulls crossing in front of the ball of the sun, using my D500's 10 fps and a long lens.  In that demanding setting, using a brand new camera I'm not familiar with and that I hadn't configured the AF system on, nearly half the images were out of focus.  In a first, I deleted them wholesale.  I'll gradually be going back over my catalog seeing if there are other obviously worthless photos to get rid of.  I doubt it will be a 10% reduction, but at this point that's 100 GB, so it would be worth it.