Toxic Elephant

Amazon S3 versus rsync

Posted by matijs Thu, 08 Jun 2006 11:15:00 GMT

JungleDisk is a new tool that uses
Amazon’s S3 as a storage device but appears to the user as just another
disk. It is a closed source, open standard application, seen locally as a
WebDAV server, so it interfaces with most desktop file managers.

I can’t get it for my Linux-on-iBook system, but I suppose if JungleDisk
catches on, someone will come along and do a Free reimplementation (source
code for retrieving files can already be downloaded from JungleDisk’s
site
).
The question then becomes: Do I want to use it?

I myself use rsync to backup my files to a friend’s machine on the other
side of the ocean. JungleDisk doesn’t do rsync. It’s just a disk, so you
have to consciously copy your files there. That’s fine if you just want to
store some files on-line. For backups, on the other hand, I want an
automated solution.

The following scheme might work: Store md5 and/or sha-1 hashes of all the
files sent to S3. The files sent there are simply indexed by hash, and we
store a mapping of hashes to directory nodes. This way also, when we move a
file, it doesn’t have to be uploaded again. The mapping has to be uploaded
to S3 as well, of course. For more fine-grained upload control, files can
even be divided into equal-sized blocks, and only changed blocks will have
to be re-uploaded.

As an aside, I wonder if S3 allows you to check md5 or sha-1 hashes of the files
stored there, or if there is some other way to check the files there are
the same as the files here.

Posted in  | Tags  | no comments | no trackbacks

No comments

No trackbacks

Comments are disabled

Toxic Elephant is Matijs van Zuijlen's weblog.

Powered

Categories

Archives