AWS Beanstalk, Docker and Clojure

The JUXT experience of deploying Docker containers through Beanstalk

When developing cloud based applications there's always the question of deployment. At JUXT we've had experience in AWS of using Cloud Formation templates, Puppet, Vagrant, Pallet, and Capistrano, to name a few. Recently we've been evaluating Beanstalk and its recent support for Docker.

PaaS or IaaS

Before choosing a solution, you have to ask yourself whether you need IaaS or PaaS. PaaS (Platform as a Service) is the Heroku/Beanstalk model. The ethos is that you don't need to worry about setting up the internals of a platform yourself, such as auto-scaling and load-balancing. By paying for a PaaS, it's all done for you. IaaS (Infrastructure as a Service), on the other hand, gives you the lower-level constructs so you can build a platform yourself. This is the traditional way of using AWS with its EC2 nodes, AutoScaling groups, Elastic Load Balancers (ELBs), CloudFront etc.

This choice between PaaS and IaaS is largely contextual. For example, if you're building an MVP (minimum viable product), then you may just want a PaaS, whereas if you're building something large and strategic from the outset, then IaaS may be the straightforward choice. That's not to say the choices are mutually exclusive, as you could start with a PaaS, and if your project grows to something bigger with more infrastructural demands such as zero downtime or a rich Blue/Green deployment model (see below), then you could move to IaaS at a time that suits. Any time invested in PaaS should be minimal, so that graduating to IaaS is seen as an evolutionary move.

Docker

Orthogonal to the IaaS vs PaaS decision is the actual deployment unit. When building Clojure applications we typically like to build and deploy uberjars because then we're in full control of the application stack including the servlet container. Docker makes for a nice way of wrapping an uberjar, where you have the option of performing additional config and set up of your container. Docker is also popular right now for its philosophy of providing isolation, making it a developer friendly choice where you can create a container that mirrors your production environment, isolated from your host development environment.

Playing around with Docker and Clojure uberjars is simple enough, see lein-docker and lein-uberjar. My preference is to skip both tools and use Docker directly.

If you haven't already done so, first build your uberjar

cd myproj
lein uberjar

Now create a file with the name Dockerfile in the root of your directory, with the following content

FROM java:8

ADD target/myproj-0.1.0-SNAPSHOT-standalone.jar /srv/myproj-app.jar

EXPOSE 8080

CMD ["java", "-jar", "/srv/myproj-app.jar"]

Now build your Docker image with

docker build -t myproj/latest .

Finally, here's how to run it

docker run -d -P myproj/latest

If you're using OS X then you'll need boot2docker.
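
Because -P publishes the exposed port on a host port that Docker picks for you, it helps to ask Docker which one it chose. The following is a minimal sketch, assuming the image name myproj/latest and container port 8080 used above; the function names host_port and run_and_report are made up for illustration.

```shell
#!/bin/sh
# Extract the host port from a `docker port` mapping such as "0.0.0.0:49153".
host_port() {
  # Strip everything up to and including the last colon.
  printf '%s\n' "${1##*:}"
}

# Start the container in the background, then ask Docker which host port
# was mapped to the container's port 8080.
run_and_report() {
  container_id=$(docker run -d -P myproj/latest) || return 1
  mapping=$(docker port "$container_id" 8080) || return 1
  echo "App is reachable on host port $(host_port "$mapping")"
}
```

Calling run_and_report after the build step prints the port to point your browser or curl at. With boot2docker that port lives on the boot2docker VM rather than on localhost.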

Beanstalk

If you're seeking a PaaS solution, like the idea of Docker and you've a preference for AWS, then it's worth checking out Beanstalk.

Beanstalk layers a PaaS on top of existing AWS constructs, so if you give it a WAR file or a Docker image, then it will create the EC2 nodes and corresponding infrastructure for you, including AutoScaling Groups, Elastic Load Balancers, and Security Groups. It also helps you with minimally provided config (such as key pairs), and can expose environmental vars to the app, thus jiving well with the 12 factor approach.

Beanstalk has 'Applications', 'Environments' and 'Application Versions' as onion layers: Application -> Environment -> Application Version. An application has environments, and you can deploy different application versions to a target environment.
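
To make the layering concrete, here is a rough sketch using the AWS CLI. It assumes the CLI is installed and configured, and that the source bundle has already been uploaded to S3; the function name deploy_version and all the argument values are hypothetical.

```shell
#!/bin/sh
# Register a new application version and deploy it to one environment.
# Usage: deploy_version APP ENVIRONMENT LABEL BUCKET KEY
# Assumes the source bundle already sits at s3://BUCKET/KEY.
deploy_version() {
  app=$1
  environment=$2
  label=$3
  bucket=$4
  key=$5

  # An application version is registered against the application...
  aws elasticbeanstalk create-application-version \
    --application-name "$app" \
    --version-label "$label" \
    --source-bundle "S3Bucket=$bucket,S3Key=$key" || return 1

  # ...and is then deployed to one of that application's environments.
  aws elasticbeanstalk update-environment \
    --environment-name "$environment" \
    --version-label "$label"
}
```

For example, deploy_version myproj myproj-dev v1 my-bucket myproj-v1.zip would register v1 under myproj and roll it out to the myproj-dev environment.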

When you deploy an application version to Beanstalk, it automatically sets up a DNS entry pointing to {your-app-name}-{environment}.elasticbeanstalk.com. It also sets up CloudWatch and various other pieces of the AWS stack that you would normally have to configure manually.

One word of warning is that Beanstalk is well suited to monolithic or broad-grained services. If microservices are your thing then Beanstalk may be too bloated and unwieldy to house lots of moving parts. I got burned when I wanted my single deployed instance to open up a few different ports, so that I could mimic the running of a few different services/applications in the same process, effectively cheating my way out of the dev ops workload during the early days of development. I got stuck because AWS Beanstalk/Docker expects just a single application port, and I fell foul of how Beanstalk ties its Docker service to the Elastic Load Balancer using some reverse proxying, where the docker run command is essentially locked down. I've since learned that it's best to keep your deployment ambitions in check when using PaaS.

No Blue/Green

Blue/Green deployment is the concept of running two application versions side by side in production, and once you're happy with the new 'green' release candidate, you can gracefully retire the existing 'blue' version.

Beanstalk doesn't offer blue/green deployment by default. When deploying a new application version it simply replaces the one that's already there in the same environment, and you don't get a chance to check the candidate for yourself. The approach will also incur some downtime during the deployment transition.

You can however maintain two different environments, i.e. a prod1 and a prod2, and then do a DNS swap between them, to ensure 'zero downtime', as covered here. There is though a big downside to using DNS swapping to achieve blue/green, as DNS entries are cached by clients hitting your service. This means you do not ultimately control the switch, and you can never be sure that some of your clients will not have problems when you retire an environment. I don't think this is a good approach, and the best AWS method I've seen of achieving Blue/Green is to manage a single Elastic Load Balancer and gracefully swap in/out different AutoScaling Groups containing different application versions. You don't get this with Beanstalk.
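
For reference, the DNS swap between two environments can be sketched with the AWS CLI as follows. This assumes the CLI is configured and that both environments belong to the same application; the function name swap_environments is hypothetical.

```shell
#!/bin/sh
# Swap the CNAMEs of two Beanstalk environments, e.g. prod1 and prod2.
# Usage: swap_environments LIVE_ENVIRONMENT CANDIDATE_ENVIRONMENT
swap_environments() {
  aws elasticbeanstalk swap-environment-cnames \
    --source-environment-name "$1" \
    --destination-environment-name "$2"
}
```

After swap_environments prod1 prod2, clients with a cached DNS entry keep hitting the old environment until their cache expires, so the retired environment has to stay up for a while.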

If you need zero downtime but want to avoid any DNS switching, then Beanstalk has another option. AWS quietly introduced 'rolling application version updates' late in 2014. This is subtly different to plain old rolling environment config changes, which have been around for a while.

So now if you have a few nodes running in a Beanstalk environment, you can 'roll in' an application version by upgrading one batch, and then the other. This doesn't give you Blue/Green proper, but it does offer a zero downtime approach. Just make sure you've thoroughly tested the application version in a non-production environment first.
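
One way to ask for batched deployments is through an .ebextensions config file shipped inside the source bundle. The sketch below writes such a file. It assumes the option names of the aws:elasticbeanstalk:command namespace as documented by AWS today, which may differ from what was available when rolling updates first appeared. The file name rolling.config and the function name write_rolling_config are arbitrary.

```shell
#!/bin/sh
# Write an .ebextensions config that asks Beanstalk to deploy new
# application versions in batches rather than to all instances at once.
# Usage: write_rolling_config [BATCH_PERCENT]   (defaults to 50)
write_rolling_config() {
  percent=${1:-50}
  mkdir -p .ebextensions || return 1
  # Unquoted EOF so that $percent is expanded into the file.
  cat > .ebextensions/rolling.config <<EOF
option_settings:
  aws:elasticbeanstalk:command:
    DeploymentPolicy: Rolling
    BatchSizeType: Percentage
    BatchSize: $percent
EOF
}
```

The .ebextensions directory has to sit at the root of the source bundle, next to the Dockerfile. Settings applied directly to the environment may take precedence over those in the file.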

Deploying Docker Containers to Beanstalk using Leiningen

If you are using Clojure then there are a couple of options. At the time of writing the Leiningen plugin lein-beanstalk focuses on building and deploying WAR files to Beanstalk. I have built a very small wrapper, lein-dockerstalk, with which you can simply run lein dockerstalk deploy dev {path-to-zip-file}. You need to build the ZIP file in advance, typically containing just the Dockerfile and the uberjar (you can use lein-zip). lein-dockerstalk may well get factored into lein-beanstalk in the future.
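
If you'd rather not pull in another plugin, the ZIP can also be built by hand. This is a minimal sketch, assuming the zip command-line tool is installed and that you run it from the project root; the function name make_bundle and the default archive name are made up.

```shell
#!/bin/sh
# Bundle the Dockerfile and the uberjar into a ZIP for Beanstalk.
# Usage: make_bundle PATH_TO_UBERJAR [OUTPUT_ZIP]
make_bundle() {
  jar=$1
  out=${2:-myproj-docker.zip}

  [ -f Dockerfile ] || { echo "Dockerfile not found" >&2; return 1; }
  [ -f "$jar" ] || { echo "uberjar not found: $jar" >&2; return 1; }

  # Keep the jar's relative path inside the archive, so that the ADD
  # line in the Dockerfile still resolves when Beanstalk builds the image.
  zip "$out" Dockerfile "$jar"
}
```

Pass the same relative jar path that your Dockerfile's ADD line refers to, then hand the resulting ZIP to lein dockerstalk deploy.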

There is also a plugin for boot for deploying Docker images to Beanstalk.

Conclusion

Beanstalk is certainly a good option for when you want a PaaS in AWS. Given that Beanstalk sits on top of familiar AWS constructs, there is a migration path away from Beanstalk when the time is right.

Thanks to Antonio Terreno and others for pointing the way to Beanstalk and Docker.

References

http://blog.bwhaley.com/evaluating-elastic-beanstalk-an-ops-perspective
http://www.hudku.com/blog/demystified-zero-downtime-with-amazon/
http://www.thoughtworks.com/insights/blog/implementing-blue-green-deployments-aws