on photo storage solutions

I am by no means a prolific photographer, but I do own a halfway decent entry-level SLR—an Olympus E-520, for those who care. This has resulted in my having about 100 GB of photos that I need to somehow keep stored in an accessible and reliable way. More worryingly, about 45–50 GB of those photos are from last year alone.

In reality, storing that amount of data is trivial. A 3 TB hard drive can be had for $150 on Amazon right now, and it would easily store all of my photos for the foreseeable future. However, the SSD in my MacBook Air is only a 256 GB drive, and between music, the videos I want to watch over the next while, virtual machines and general stuff, I don’t have 100 GB free for all of my photos.

the easy (almost-)solution

Fine, so I need something to store my photos on that isn’t my laptop. The easiest thing to do is to get a big external hard drive and store all the photos there. I could even get two and keep a second copy. At least in theory.

In practice, there are at least two problems. First, when I put all my photos on an external hard drive, they’re suddenly inaccessible except when I’m at home with the drive plugged into my laptop.

Second, I need some way to make sure the data is replicated on both drives. I can’t simply set them up in a RAID, since then anything I accidentally delete is deleted on both drives. And since the drives themselves are dumb, they can’t run a cron job to back themselves up either, which means I have to manually make sure they’re both in sync.

This is more or less what I do right now, and it sucks. One of the two hard drives sits in a Windows box under my TV, so I can sort of get at my photos by remote-desktoping back into it, but it’s not what anyone would call elegant.

working out a real solution

Fine, both of the previous problems boil down to the same thing: dumb drives aren’t enough. You need some intelligence associated with your data, both to give you access to it even when it isn’t all stored locally on your computer and to maintain replicas for you.

Ideally, it should also meet the following 3 goals:

  1. Keep older versions of files and folders so that accidental deletions don’t actually destroy any data.
  2. Be easily accessible over the network—fast while you’re on the same LAN and at least functional over the Internet.
  3. Allow for some data to be stored off-site either on a remote machine or in the cloud somewhere.

With goal 3 above, you might be thinking that a cloud service would solve all of your problems. Just store it all in Dropbox, SkyDrive or Google Drive and be done with it.

While this seems like a great solution at first, it doesn’t work out so well in practice. The problem isn’t really the cost; in fact, SkyDrive will sell you 100 GB for $50/year, which is more than reasonable.

The biggest problem is that you basically have to store all of the collection or none of it on any given computer. In other words, I can’t easily access my photos from my laptop, where I don’t have enough space for a complete copy. The web interfaces that might otherwise save you aren’t really good enough to use for getting at your files, except in a pinch.

This does kind of point at what I think the real solution is, though. A virtual folder—or file system, however you want to look at it—that gives you the logical view that it contains all the hundreds of gigabytes of your files, but in practice only stores a small cache of the ones you’ve used recently. Then, if you try to load a file that isn’t cached, it goes out and finds a copy to fetch for you, whether from your home PC or the cloud.

In theory, this shouldn’t be too hard to implement using a FUSE filesystem—though that rules out Windows for now. When the user accesses a cached file, it works just the same as opening any other file; when they access an uncached file, the read blocks until the file can be retrieved. This could even be made better for photos in particular, since OSes now store metadata like thumbnails along with photos, and you could conceivably cache all of that, allowing for easy browsing.
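Stripped of the FUSE plumbing, the core read path is just a read-through cache. Here’s a sketch in Python, where `fetch_remote` is a hypothetical callback standing in for whatever actually talks to the replicas:

```python
import os

def read_photo(path, cache_dir, fetch_remote):
    """Return a local path for `path`, filling the cache on a miss.

    `fetch_remote(path, dest)` is a hypothetical callback that blocks
    until it has copied the file from some replica (home PC, cloud, ...)
    into `dest` -- the same "block until retrieved" behaviour a FUSE
    read handler would need.
    """
    cached = os.path.join(cache_dir, path.lstrip("/"))
    if not os.path.exists(cached):        # cache miss
        os.makedirs(os.path.dirname(cached), exist_ok=True)
        fetch_remote(path, cached)        # blocks; may be slow
    return cached
```

A cached file is served at local-disk speed; an uncached one costs a network round trip, which is exactly the trade-off described above.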

Any file dropped into the virtual folder could be sent to any number of replicas in the cloud or elsewhere, and any change to a file could be written as a new version while the old version was carefully squirreled away.
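The versioning half is equally simple in sketch form: before overwriting a file, copy the current contents into a numbered version directory. The `.versions` layout here is made up, just to show the idea:

```python
import os
import shutil

def save_version(path, data, versions_dir=".versions"):
    """Write new content to `path`, squirreling the old copy away first."""
    if os.path.exists(path):
        vdir = os.path.join(os.path.dirname(path) or ".", versions_dir)
        os.makedirs(vdir, exist_ok=True)
        name = os.path.basename(path)
        # Number versions sequentially: img.jpg.v1, img.jpg.v2, ...
        n = len([v for v in os.listdir(vdir) if v.startswith(name + ".v")])
        shutil.copy2(path, os.path.join(vdir, f"{name}.v{n + 1}"))
    with open(path, "wb") as f:
        f.write(data)
```

An accidental delete or overwrite then only ever touches the newest version; the old ones sit safely in `.versions` until you choose to prune them.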

Basically, I get a magic folder on my laptop which takes up only a small fraction of the space of all my photos—my bet is that 10–15 GB would be more than enough. But the magic folder does exactly what I want. When I put stuff in it, it backs it up for me. When I look in it, I see all of my files—albeit maybe a bit slower than if they were actually local.

I’m honestly a little surprised that Dropbox hasn’t already done this.

2 comments

  1. rektide

    git-annex is a tool for using git to track last-seen/last-updated information on files, and sending copies around of those files. it’s meant for exactly these kinds of use cases.
    http://git-annex.branchable.com/.

    there’s a kickstarter project for a management utility, with overtures to being a cloud-y piece of software:
    http://www.kickstarter.com/projects/joeyh/git-annex-assistant-like-dropbox-but-with-your-own

    you can also plug in remote backends to git-annex, such as S3, which may have their own web interfaces. you might also consider running OwnCloud on a seed-box with a lot of storage & decent connectivity.
