Rake task to sync your assets to Amazon S3/Cloudfront
1 January 2011
With my move to Heroku I felt bad about having Heroku’s app servers serve static content for me. It’s not really a problem, but I just like to use the best tool available for the job.
Because Ariejan.net is a Rack app, it has a `public` directory with all static assets in one place. There are, however, a few problems that need addressing.
These are the problems I want to resolve:
Keep my S3 Bucket in sync with my public directory
The first and foremost goal is to keep my S3 bucket in sync with the contents of `public`. I don’t care about file deletions, but I do care about new and updated files. Those should be synced to S3 with every deployment.
Don’t re-upload the entire public directory with every deployment
Over time, `public` has grown in size. New images are added all the time, and I don’t want to re-upload them with every deployment. So my sync script must be smart enough to skip unchanged files.
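The check described above boils down to comparing each local file’s MD5 digest with the ETag S3 reports for the same key. Here is a minimal, stdlib-only sketch, assuming you already have the remote ETags in a Hash (key => ETag), for example from a bucket listing. `upload_plan` is a hypothetical helper, not part of the `s3` gem. Keep in mind that an S3 ETag only equals the MD5 digest for objects uploaded in a single PUT.

```ruby
require "digest/md5"
require "tmpdir"

# Classify files under local_dir as :new, :updated or :unchanged by
# comparing each file's MD5 digest with the ETag recorded for its key.
# remote_etags is an assumed Hash of S3 key => ETag (e.g. built from a
# bucket listing). Keys that exist only remotely are ignored, so nothing
# is ever scheduled for deletion.
def upload_plan(local_dir, remote_etags)
  plan = { new: [], updated: [], unchanged: [] }
  Dir.glob(File.join(local_dir, "**", "*")).sort.each do |path|
    next unless File.file?(path)

    key = path.delete_prefix("#{local_dir}/")
    local_md5 = Digest::MD5.file(path).hexdigest
    # S3 reports ETags wrapped in double quotes; strip them if present.
    remote_md5 = remote_etags[key]&.delete('"')

    if remote_md5.nil?
      plan[:new] << key
    elsif remote_md5 != local_md5
      plan[:updated] << key
    else
      plan[:unchanged] << key
    end
  end
  plan
end

# Usage: a throwaway directory standing in for public/.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "site.css"), "body {}")
  File.write(File.join(dir, "app.js"), "alert(1)")
  Dir.mkdir(File.join(dir, "img"))
  File.write(File.join(dir, "img", "logo.png"), "new bytes")

  remote = {
    "site.css"     => %("#{Digest::MD5.hexdigest("body {}")}"),   # same content
    "img/logo.png" => %("#{Digest::MD5.hexdigest("old bytes")}"), # changed
    "gone.txt"     => %("0123"),                                  # remote only: ignored
  }
  p upload_plan(dir, remote)
end
```

Only the `:new` and `:updated` lists would need an upload; everything in `:unchanged` can be skipped.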
Hook the S3 sync into my current deployment rake task
My current `rake deploy` task should be able to call `assets:deploy` (or something similar) to trigger an asset sync.
I don’t want to configure anything, if possible.
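One way to hook the sync in without extra configuration is to make the asset task a prerequisite of the deploy task. Below is a minimal sketch using plain Rake. The task bodies are placeholders that only record their names; the real `deploy:production` would push to Heroku and the real `assets:deploy` would sync `public` to S3.

```ruby
require "rake"
include Rake::DSL

# Placeholder bodies: each task only records that it ran.
RAN = []

namespace :assets do
  desc "Sync public/ to S3 (placeholder)"
  task :deploy do
    RAN << "assets:deploy"
  end
end

namespace :deploy do
  # Listing assets:deploy as a prerequisite makes Rake run it first,
  # without any extra configuration in the deploy task itself.
  desc "Deploy to production (placeholder)"
  task production: "assets:deploy" do
    RAN << "deploy:production"
  end
end

Rake::Task["deploy:production"].invoke
p RAN
```

Because prerequisites are run via `invoke`, Rake runs the asset sync at most once per `rake` run, even if several tasks depend on it. `Rake::Task[...].execute`, by contrast, runs a task’s body unconditionally and skips that task’s own prerequisites.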
Well, this is the rake task I currently use:
```ruby
require 's3'
require 'digest/md5'
require 'mime/types'

## These are some constants to keep track of my S3 credentials and
## bucket name. Nothing fancy here.
AWS_ACCESS_KEY_ID     = "xxxxx"
AWS_SECRET_ACCESS_KEY = "yyyyy"
AWS_BUCKET            = "my_bucket"

## This defines the rake task `assets:deploy`.
namespace :assets do
  desc "Deploy all assets in public/**/* to S3/Cloudfront"
  task :deploy, [:env, :branch] do |t, args|
    ## Minify all CSS files
    Rake::Task[:minify].execute

    ## Use the `s3` gem to connect to my bucket
    puts "== Uploading assets to S3/Cloudfront"

    service = S3::Service.new(
      :access_key_id     => AWS_ACCESS_KEY_ID,
      :secret_access_key => AWS_SECRET_ACCESS_KEY)
    bucket = service.buckets.find(AWS_BUCKET)

    ## Needed to show progress
    STDOUT.sync = true

    ## Find all files (recursively) in ./public and process them.
    Dir.glob("public/**/*").each do |file|
      ## Only upload files, we're not interested in directories
      next unless File.file?(file)

      ## Strip the leading 'public/' from the filename for use on S3
      remote_file = file.sub(%r{\Apublic/}, "")

      ## Try to find the remote_file; an error is raised when no
      ## such file can be found, and that's okay.
      begin
        obj = bucket.objects.find_first(remote_file)
      rescue
        obj = nil
      end

      ## If the object does not exist, or if the MD5 hash / ETag of the
      ## file has changed, upload it.
      if !obj || (obj.etag != Digest::MD5.file(file).hexdigest)
        print "U"

        ## Simply create a new object, write the content and set the proper
        ## MIME type. `obj.save` will upload and store the file on S3.
        obj = bucket.objects.build(remote_file)
        obj.content = File.open(file, "rb") { |f| f.read }
        ## type_for returns an Array; use the first match or a generic type
        obj.content_type = (MIME::Types.type_for(file).first || "application/octet-stream").to_s
        obj.save
      else
        print "."
      end
    end

    STDOUT.sync = false # Done with progress output.
    puts
    puts "== Done syncing assets"
  end
end
```
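Mapping a local path to its S3 key is the one step where a naive string replace can go wrong: `gsub("public/", "")` strips every occurrence of `public/`, not just the leading one. A small sketch of an anchored alternative; `remote_key_for` is a hypothetical helper name, and `String#delete_prefix` needs Ruby 2.5 or later.

```ruby
# Map a local path under public/ to its S3 key by removing only a
# leading "public/". remote_key_for is a hypothetical helper name.
def remote_key_for(path, root = "public/")
  path.delete_prefix(root)
end

path = "public/images/public/logo.png"
p path.gsub("public/", "") # => "images/logo.png" (both occurrences removed)
p remote_key_for(path)     # => "images/public/logo.png"
```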
This rake task is hooked into my `rake deploy:production` script and generates the following output (I added a new file to show what happens):
```
$ rake deploy:production
(in /Users/ariejan/Code/Sites/ariejannet)
Deploying master to production
== Minifying CSS
== Done
== Uploading assets to S3/Cloudfront
......................................U.........
== Done syncing assets
Updating ariejannet-production with branch master
Counting objects: 40, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (27/27), done.
Writing objects: 100% (30/30), 4.24 KiB, done.
Total 30 (delta 17), reused 0 (delta 0)
-----> Heroku receiving push
```
It’s very easy to write your own S3 sync script. My version still has some issues and missing features that I may or may not add at a later time. There’s no support for file deletions, and error handling is very poor at this time. Also, `public` is still under version control (where I want it), and is pushed to Heroku. This is nonsense, because most of the assets in `public` are not used (except