1 Jan, 2011

With my move to Heroku I felt bad about having Heroku’s app servers serve static content for me. It’s not really a problem, but I just like to use the best tool available for the job.

Because Ariejan.net is a Rack app, it has a public directory with all static assets in one place. There are, however, a few problems that need addressing.

These are the problems I want to resolve:

#### Keep my S3 Bucket in sync with my public directory

First and foremost, my S3 bucket must stay in sync with the contents of public. I don’t care about file deletions, but I do care about new and updated files. Those should be synced to S3 with every deployment.

#### Don’t re-upload the entire public directory with every deployment

Over time the size of public has grown. New images are added all the time. I don’t want to re-upload them with every deployment. So, my sync script must be smart enough to not upload unchanged files.
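One way to spot unchanged files is to compare a local MD5 digest with the ETag that S3 reports for an object; for simple, non-multipart uploads that ETag is the hex MD5 of the content. Here is a minimal sketch of that check. `needs_upload?` is a hypothetical helper name, and the remote ETag is assumed to arrive as a plain string (or nil when the object doesn't exist yet):

```ruby
require 'digest/md5'

# Hypothetical helper: decide whether a local file must be uploaded.
# remote_etag is the ETag S3 reports for the object, or nil if the
# object does not exist yet.
def needs_upload?(local_path, remote_etag)
  return true if remote_etag.nil?

  # S3 may wrap the ETag in double quotes; strip them before comparing.
  remote_md5 = remote_etag.delete('"')

  # Digest::MD5.file streams the file instead of loading it into memory.
  Digest::MD5.file(local_path).hexdigest != remote_md5
end
```

Only files for which this returns true need to go over the wire; everything else can be skipped.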

#### Hook the S3 sync into my current deployment rake task

My current rake deploy task should be able to call assets:deploy or something to trigger an asset sync.
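Rake prerequisites are probably the simplest way to get this hook. The sketch below assumes a deploy:production task; both task bodies are placeholders for illustration, not my actual deployment code:

```ruby
require 'rake'

# Stand-in for the real asset sync task.
namespace :assets do
  task :deploy do
    puts "== Syncing assets"
  end
end

namespace :deploy do
  # Listing assets:deploy as a prerequisite makes Rake run it first.
  task :production => "assets:deploy" do
    puts "== Pushing to Heroku" # placeholder for the real push
  end
end
```

With this in place, running rake deploy:production syncs the assets before the push starts.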

#### Minimal configuration

I don’t want to configure anything, if possible.

### The script

Well, this is the rake task I currently use:

    require 's3'
    require 'digest/md5'
    require 'mime/types'

    ## These are some constants to keep track of my S3 credentials and
    ## bucket name. Nothing fancy here.
    AWS_ACCESS_KEY_ID = "xxxxx"
    AWS_SECRET_ACCESS_KEY = "yyyyy"
    AWS_BUCKET = "my_bucket"

    ## This defines the rake task assets:deploy.
    namespace :assets do
      desc "Deploy all assets in public/**/* to S3/Cloudfront"
      task :deploy, :env, :branch do |t, args|

        ## Minify all CSS files

        ## Use the s3 gem to connect to my bucket
        service = S3::Service.new(
          :access_key_id => AWS_ACCESS_KEY_ID,
          :secret_access_key => AWS_SECRET_ACCESS_KEY)
        bucket = service.buckets.find(AWS_BUCKET)

        ## Needed to show progress
        STDOUT.sync = true

        ## Find all files (recursively) in ./public and process them.
        Dir.glob("public/**/*").each do |file|

          ## Only upload files, we're not interested in directories
          if File.file?(file)

            ## Strip the leading 'public/' from the filename for use on S3.
            ## Anchoring the pattern leaves nested 'public/' directories alone.
            remote_file = file.sub(/\Apublic\//, "")

            ## Try to find the remote_file, an error is thrown when no
            ## such file can be found, that's okay.
            begin
              obj = bucket.objects.find_first(remote_file)
            rescue
              obj = nil
            end

            ## If the object does not exist, or if the MD5 Hash / etag of the
            ## file has changed, upload it.
            if !obj || (obj.etag != Digest::MD5.file(file).hexdigest)
              print "U"

              ## Simply create a new object, write the content and set the proper
              ## mime-type. obj.save will upload and store the file to S3.
              obj = bucket.objects.build(remote_file)
              obj.content = File.open(file, "rb")

              ## type_for returns an array of candidates; use the first one.
              obj.content_type = MIME::Types.type_for(file).first.to_s
              obj.save
            else
              print "."
            end
          end
        end
        STDOUT.sync = false # Done with progress output.

        puts
        puts "== Done syncing assets"
      end
    end

This rake task is hooked into my rake deploy:production script and generates the following output (I added a new file just to show what happens):

    $ rake deploy:production
    (in /Users/ariejan/Code/Sites/ariejannet)
    Deploying master to production
    == Minifying CSS
    == Done
    ......................................U.........
    == Done syncing assets

    Updating ariejannet-production with branch master
    Counting objects: 40, done.
    Delta compression using up to 4 threads.
    Compressing objects: 100% (27/27), done.
    Writing objects: 100% (30/30), 4.24 KiB, done.
    Total 30 (delta 17), reused 0 (delta 0)

    -----> Heroku receiving push

### Conclusion

It’s very easy to write your own S3 sync script. My version still has some issues and missing features that I may or may not add later: there’s no support for file deletions, and error handling is very poor. Also, public is still under version control (where I want it) and is pushed to Heroku. This is nonsense, because most of the assets in public are not used there (except robots.txt and favicon.ico).
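Deletion support could start from a plain set difference: any key that exists in the bucket but has no matching file under public is a candidate for removal. This is a minimal sketch, assuming the remote keys have already been fetched as an array of strings. `stale_keys` is a hypothetical helper, and actually deleting the objects is left out:

```ruby
# Hypothetical helper: list remote keys that no longer have a matching
# file under local_dir. Fetching the keys and deleting the objects are
# left to the S3 client in use.
def stale_keys(local_dir, remote_keys)
  prefix = File.join(local_dir, "") # "public" -> "public/"

  local_keys = Dir.glob(File.join(local_dir, "**", "*"))
                  .select { |path| File.file?(path) }
                  .map { |path| path.sub(prefix, "") }

  remote_keys - local_keys
end
```

Calling stale_keys("public", keys) from the project root would return the keys that a later cleanup step could remove.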