Rake task to sync your assets to Amazon S3/Cloudfront
With my move to Heroku I felt bad about having Heroku’s app servers serve static content for me. It’s not really a problem, but I just like to use the best tool available for the job.
Because Ariejan.net is a Rack app, it has a public directory with all static assets in one place. There are, however, a few problems that need addressing.
These are the problems I want to resolve:
Keep my S3 Bucket in sync with my public directory
The first and foremost is to keep my S3 bucket in sync with the contents of public. I don’t care about file deletions, but I do care about new and updated files. Those should be synced to S3 with every deployment.
Don’t re-upload the entire public directory with every deployment
Over time the size of public has grown. New images are added all the time. I don’t want to re-upload them with every deployment. So, my sync script must be smart enough to not upload unchanged files.
Hook the S3 sync into my current deployment rake task
My current rake deploy task should be able to call assets:deploy or something similar to trigger an asset sync.
Minimal configuration
I don’t want to configure anything, if possible.
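The second requirement above comes down to comparing a local checksum with what S3 already stores. Here is a minimal sketch of that check in isolation, assuming the remote ETag is the MD5 hex digest of the object (which holds for simple, non-multipart uploads). The helper name needs_upload? is hypothetical and not part of any library.

```ruby
require 'digest/md5'

# Hypothetical helper: returns true when the local file should be uploaded.
# `remote_etag` is the ETag string reported for the remote object, or nil
# when the object does not exist yet.
def needs_upload?(local_path, remote_etag)
  return true if remote_etag.nil?

  # Some clients hand back the ETag wrapped in double quotes; strip them.
  remote_md5 = remote_etag.delete('"')

  # Digest::MD5.file streams the file instead of loading it all into memory.
  local_md5 = Digest::MD5.file(local_path).hexdigest

  local_md5 != remote_md5
end
```

Because only checksums are compared, an unchanged image costs one metadata lookup instead of a full upload.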
The script
Well, this is the rake task I currently use:
require 's3'
require 'digest/md5'
require 'mime/types'

## These are some constants to keep track of my S3 credentials and
## bucket name. Nothing fancy here.
AWS_ACCESS_KEY_ID = "xxxxx"
AWS_SECRET_ACCESS_KEY = "yyyyy"
AWS_BUCKET = "my_bucket"

## This defines the rake task `assets:deploy`.
namespace :assets do
  desc "Deploy all assets in public/**/* to S3/Cloudfront"
  task :deploy, :env, :branch do |t, args|
    ## Minify all CSS files
    Rake::Task[:minify].execute

    ## Use the `s3` gem to connect to my bucket
    puts "== Uploading assets to S3/Cloudfront"
    service = S3::Service.new(
      :access_key_id => AWS_ACCESS_KEY_ID,
      :secret_access_key => AWS_SECRET_ACCESS_KEY)
    bucket = service.buckets.find(AWS_BUCKET)

    ## Needed to show progress
    STDOUT.sync = true

    ## Find all files (recursively) in ./public and process them.
    Dir.glob("public/**/*").each do |file|
      ## Only upload files, we're not interested in directories
      if File.file?(file)
        ## Strip only the leading 'public/' from the filename for use on S3
        remote_file = file.sub(/\Apublic\//, "")

        ## Try to find the remote_file, an error is thrown when no
        ## such file can be found, that's okay.
        begin
          obj = bucket.objects.find_first(remote_file)
        rescue
          obj = nil
        end

        ## If the object does not exist, or if the MD5 Hash / etag of the
        ## file has changed, upload it.
        if !obj || (obj.etag != Digest::MD5.file(file).hexdigest)
          print "U"

          ## Simply create a new object, write the content and set the proper
          ## mime-type. `obj.save` will upload and store the file to S3.
          obj = bucket.objects.build(remote_file)
          obj.content = File.open(file, "rb")

          ## `type_for` returns an array of candidates; use the first one and
          ## fall back to a generic type for unknown extensions.
          mime_type = MIME::Types.type_for(file).first
          obj.content_type = mime_type ? mime_type.to_s : "application/octet-stream"
          obj.save
        else
          print "."
        end
      end
    end

    STDOUT.sync = false # Done with progress output.
    puts
    puts "== Done syncing assets"
  end
end
This rake task is hooked into my rake deploy:production script and generates the following output (I added a new file just to show you what happens):
$ rake deploy:production
(in /Users/ariejan/Code/Sites/ariejannet)
Deploying master to production
== Minifying CSS
== Done
== Uploading assets to S3/Cloudfront
......................................U.........
== Done syncing assets
Updating ariejannet-production with branch master
Counting objects: 40, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (27/27), done.
Writing objects: 100% (30/30), 4.24 KiB, done.
Total 30 (delta 17), reused 0 (delta 0)
-----> Heroku receiving push
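The hook itself is not shown in this post. One way to wire it up, assuming the deploy task lives in a deploy namespace, is to list assets:deploy as a prerequisite so that Rake runs it first. The task bodies below are placeholders, not my actual deploy script.

```ruby
require 'rake'

# Stand-in for the real assets:deploy task; here it only prints a banner.
namespace :assets do
  task :deploy do
    puts "== Uploading assets to S3/Cloudfront"
  end
end

namespace :deploy do
  desc "Deploy master to production"
  # Declaring assets:deploy as a prerequisite makes Rake run it before
  # the body of deploy:production.
  task :production => "assets:deploy" do
    puts "Updating production with branch master" # placeholder for the git push
  end
end
```

With this in a Rakefile, running rake deploy:production syncs the assets before the push, and rake assets:deploy can still be run on its own.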
Conclusion
It’s very easy to write your own S3 sync script. My version still has some issues and missing features that I may or may not add at some later time. There’s no support for file deletions, and error handling is very poor at this time. Also, public is still under version control (where I want it) and is pushed to Heroku. This is nonsense, because most of the assets in public are not used there (except robots.txt and favicon.ico).
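If file deletions were ever added, the first step would be working out which keys in the bucket no longer have a local counterpart. Here is a rough sketch of that step using only the standard library. The helper name stale_keys is hypothetical, and fetching the key list and deleting the objects is left out because it depends on the S3 client in use.

```ruby
# Hypothetical helper: given the object keys currently in the bucket and the
# local asset directory, return the keys that no longer exist locally.
def stale_keys(remote_keys, local_dir = "public")
  prefix = File.join(local_dir, "") # e.g. "public/"

  # Build the list of keys the local directory would produce on S3.
  local_keys = Dir.glob(File.join(local_dir, "**", "*"))
                  .select { |path| File.file?(path) }
                  .map { |path| path.delete_prefix(prefix) }

  remote_keys - local_keys
end
```

Printing the result before deleting anything would be a sensible precaution, since a wrong local_dir would mark every remote object as stale.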