Toxic Elephant

Don't bury it in your back yard!

Amazon S3 versus rsync

Posted by matijs 08/06/2006 at 11h15

JungleDisk is a new tool that uses
Amazon’s S3 as a storage device but appears to the user as just another
disk. It is closed-source but built on open standards: locally it runs as a
WebDAV server, so it works with most desktop file managers.

I can’t get it for my Linux-on-iBook system, but I suppose if JungleDisk
catches on, someone will come along and do a Free reimplementation (source
code for retrieving files can already be downloaded from JungleDisk’s
site).
The question then becomes: Do I want to use it?

I myself use rsync to back up my files to a friend’s machine on the other
side of the ocean. JungleDisk doesn’t do rsync. It’s just a disk, so you
have to consciously copy your files there. That’s fine if you just want to
store some files on-line. For backups, on the other hand, I want an
automated solution.

The following scheme might work: Store MD5 and/or SHA-1 hashes of all the
files sent to S3. The files are simply indexed there by hash, and we
store a mapping of hashes to directory nodes. This also means that when we
move a file, it doesn’t have to be uploaded again. The mapping has to be
uploaded to S3 as well, of course. For more fine-grained upload control,
files can even be divided into equal-sized blocks, so that only changed
blocks have to be re-uploaded.
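A minimal sketch of this scheme in Python, assuming an in-memory dict stands in for the S3 bucket (a real version would PUT each object with its hash as the key). The names `ContentStore`, `backup`, `restore` and `save_manifest` and the 4 MiB block size are hypothetical. SHA-1 serves as the key, and the manifest maps each path to the list of its block hashes.

```python
import hashlib
import json

BLOCK_SIZE = 4 * 1024 * 1024  # hypothetical block size: 4 MiB


class ContentStore:
    """In-memory stand-in for S3: every object is keyed by its SHA-1 hex digest."""

    def __init__(self):
        self.objects = {}  # hash -> bytes (what would live in the bucket)
        self.uploads = 0   # how many objects were actually "uploaded"

    def put_block(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        if key not in self.objects:  # identical content is never sent twice
            self.objects[key] = data
            self.uploads += 1
        return key


def backup(store, files, block_size=BLOCK_SIZE):
    """Split each file into equal-sized blocks and return path -> [block hashes]."""
    manifest = {}
    for path, data in files.items():
        blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
        if not blocks:  # an empty file still gets one (empty) block
            blocks = [b""]
        manifest[path] = [store.put_block(block) for block in blocks]
    return manifest


def save_manifest(store, manifest):
    """The mapping must be uploaded too; here it is serialized as JSON and stored by hash."""
    return store.put_block(json.dumps(manifest, sort_keys=True).encode("utf-8"))


def restore(store, manifest, path):
    """Reassemble a file from its block hashes."""
    return b"".join(store.objects[h] for h in manifest[path])


if __name__ == "__main__":
    store = ContentStore()
    backup(store, {"docs/a.txt": b"hello world"}, block_size=4)
    backup(store, {"archive/a.txt": b"hello world"}, block_size=4)  # a "moved" file
    print(store.uploads)  # 3: the move re-uploads nothing
```

Moving a file only changes the manifest, and editing a file re-uploads only the blocks whose content changed. Equal-sized blocks have one caveat: inserting or deleting bytes shifts every later block boundary, so everything after the edit looks new. rsync avoids this with a rolling checksum that finds matching blocks at any offset.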

As an aside, I wonder if S3 allows you to check MD5 or SHA-1 hashes of the
files stored there, or if there is some other way to check that the files
there are the same as the files here.
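A minimal sketch of one possible check, assuming each object is uploaded with a single plain PUT. For such uploads, S3 returns an ETag that is the hex MD5 of the object's content. This does not hold for multipart uploads or some server-side encryption modes. A PUT can also carry a Content-MD5 header (the base64 of the raw MD5 digest), and S3 rejects the upload if the received data does not match. The function names are hypothetical, and fetching the ETag (for example with a HEAD request) is left out.

```python
import base64
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of the local content."""
    return hashlib.md5(data).hexdigest()


def matches_etag(data: bytes, etag: str) -> bool:
    """Compare local content with an S3 ETag, assuming a single-part upload.

    S3 wraps the ETag in double quotes. Multipart ETags contain a '-' and are
    not a plain MD5 of the content, so they cannot be checked this way.
    """
    etag = etag.strip('"')
    if "-" in etag:
        raise ValueError("multipart ETag: not a plain MD5 of the object")
    return md5_hex(data) == etag.lower()


def content_md5_header(data: bytes) -> str:
    """Value for the Content-MD5 request header: base64 of the raw MD5 digest."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
```

This only covers MD5. To compare SHA-1 hashes, they would have to be stored alongside the objects, for instance in the uploaded mapping.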
