Monitoring a Scaling Infrastructure at Aviary

Serving photo editing effects, frames, and stickers, as well as photo editing tools, to millions of users around the globe requires an infrastructure that can grow with demand, and a tightly integrated platform to monitor it. At Aviary, we leverage Amazon’s large suite of tools as well as a number of third-party and custom products to ensure all our environments are operating at peak performance and to catch potential issues before they affect our end users.


At the core of our infrastructure monitoring platform is the well-known tool, Nagios. We run Nagios with a custom frontend known as Opsview, which provides a clean user interface for the management of hundreds of hosts, as well as an API which we use in our autoscaling environments. At its most basic level, Nagios is an infrastructure monitoring platform which can be endlessly configured to perform checks on various aspects of the infrastructure. At Aviary, a majority of these checks are performed against Amazon EC2 hosts and include disk and memory usage, CPU load, network traffic, and, in the case of hosts serving web pages, page response time. We have assorted other checks in place that test other infrastructure components such as the availability of S3 buckets, expiration dates for certificates, and DNS.
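For illustration, a typical Nagios service check definition looks roughly like the following (the host name, service description, and thresholds are invented for this sketch, not taken from our actual configuration):

```
define service {
    use                   generic-service
    host_name             web-01
    service_description   Page response time
    check_command         check_http!-w 2 -c 5
}
```

Definitions like this one, multiplied across hundreds of hosts, are exactly what a frontend such as Opsview helps manage.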


While Nagios is a very configurable system...

Automated Continuous Integration at Aviary

Part I - Infrastructure Setup


Here at Aviary we are growing! Each day we ship more code to our users at an increasing pace. As our product grows and we introduce new services, the number of times we need to deploy code each day grows as well. In our earlier days, deploying code could easily be accomplished manually. However, with scores of applications across a multitude of servers, this process has become arduous, complicated, and, as we’ve learned, quite unnecessary. As a Server Engineer at Aviary, one of my first large projects has been to standardize, secure, and automate the deployment process across our entire infrastructure. In this post, I’ll provide an in-depth look at what we tried, what worked, and what didn’t.


To provide some background, I’ll first take a moment to explain our earlier setup. Almost all of our work was (and remains) centered around GitHub; we use submodules and environment branches heavily. Our primary, public-facing applications run on Amazon’s EC2 cloud and make use of a variety of AWS services, including CloudFormation, Elastic Load Balancing, S3, and DynamoDB. Each application has both production and staging environments that mirror the corresponding GitHub environment branches. Deploying new code had previously involved manually logging into each running server instance and downloading the new code. While this is trivial for small environments, it is quite apparent why we needed to update this methodology for our growing infrastructure.


During the process of moving to what is known as “continuous integrated deployment,” we read countless articles, blog posts, and tutorials on the subject. Ultimately, we discovered that no single process is perfect for every organization; instead we collected some of the best workflows and used them to develop our own. Doing this has allowed us to create a process that works for our team, our applications, and our infrastructure, all while abstracting the most repetitive and time-consuming tasks into automated processes.

AWS OpsWorks

While automating our deployment process, we stumbled across a tool produced by Amazon called “AWS OpsWorks.” OpsWorks is marketed as a product that can easily abstract the deployment process away from developers, allowing them to focus on their applications. It does this by using “Stacks” of projects containing “Layers” of similar resources (such as a load balancer and a group of EC2 instances), and “Apps” of deployable code linked to GitHub. It makes heavy use of the automation tool Chef and includes a number of built-in layers for common project types. While OpsWorks was easy to set up, it had a number of drawbacks that ultimately led us to search for a different solution. Namely, OpsWorks cannot run more than one layer of the same type within a stack, or more than one app of the same type on the servers within a layer (some of our projects require multiple applications on one server running on different ports). Additionally, it was slow due to its configuration method; a new instance could take ten to fifteen minutes to come online, which is unacceptable in a highly trafficked, auto-scaling environment.

CloudFormation and Custom Scripts

After iterating over several custom deployment processes, we ultimately found one that works best for our team. Through a combination of CloudFormation (another AWS product), some custom scripts, and a Jenkins build server, we now have a fully automated, continuous integration build environment that provides flexibility and security.

The core of our process is CloudFormation, an AWS product that allows for the creation of resources through a JSON template. We have developed several templates for various infrastructure designs (a single EC2 instance, an auto-scaling group behind a load balancer, an HTTPS-enabled load balancer, etc.). These templates make use of parameters that allow them to be reused over and over for different projects.

A Sample CloudFormation Template Using Parameters
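A heavily trimmed sketch of such a parameterized template follows (the resource name and default values are illustrative; "BootstrapScripts" and "SecurityGroups" correspond to the parameters discussed below):

```json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Parameters": {
    "InstanceType": {
      "Type": "String",
      "Default": "m1.small"
    },
    "BootstrapScripts": {
      "Type": "CommaDelimitedList",
      "Description": "S3-hosted scripts to run on first boot"
    },
    "SecurityGroups": {
      "Type": "CommaDelimitedList",
      "Description": "Pre-defined security groups to attach"
    }
  },
  "Resources": {
    "WebServer": {
      "Type": "AWS::EC2::Instance",
      "Properties": {
        "InstanceType": { "Ref": "InstanceType" },
        "SecurityGroups": { "Ref": "SecurityGroups" }
      }
    }
  }
}
```

Because everything project-specific lives in the Parameters block, the same template can be launched again and again with different values.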

Bootstrap Scripts

The CloudFormation template uses these parameters to customize the application and environment once the machine starts. One of the most important parameters here is “BootstrapScripts.” Each of the scripts listed there is stored in an S3 bucket and run via CloudFormation. By abstracting various installation tasks into separate scripts, we can configure a virtually unlimited number of machine-type combinations.

Security Groups

A final parameter worth mentioning is “SecurityGroups.” We have created a number of groups that correspond to various use cases (such as _Public HTTP, which opens port 80 to all source IPs). This allows us to maintain control over our resources instead of creating a different group for every instance, which can lead to forgotten rules.

IAM Roles

Within our CloudFormation template, we use another AWS feature called IAM roles. IAM roles are an important security safeguard that allow a resource to access other resources within a restricted environment. Our role follows the principle of least privilege and grants access only to the resources it needs. AWS takes care of automatically rotating the credentials, which lets us avoid hardcoding AWS keys and secrets into the template or our code.
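As a sketch of what least privilege means in practice, a role policy for these instances might grant read access only to the deployment bucket (the bucket name here is illustrative):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::deploy-bucket/*"
    }
  ]
}
```

An instance compromised with a policy like this can read deployment artifacts but cannot touch any other AWS resource.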

First Boot

When CloudFormation first runs on an instance, it installs the AWS Command Line Tools which are then used to download and run the bootstrap scripts. When that finishes, it checks to see whether application parameters were provided, and if so, downloads the latest application code from the S3 bucket. Another script is responsible for configuring the code and starting the server.

Part II - New Deployments


At this point in the process, we have an easily-configured environment that can begin running an application when it boots. The next step is to allow developers to deploy new versions of an application without having to worry about manually updating all of the running servers. Although OpsWorks accomplished this task, we ultimately settled on the tried-and-tested Jenkins server. On our Jenkins installation, we integrated GitHub with webhooks so that changes to a repository and branch could trigger a new build on Jenkins. On the Jenkins server, we created several managed scripts that allow us to deploy code to single EC2 instances, as well as all instances within a load balanced group.

When Jenkins detects a change on GitHub, it downloads the code to its workspace. After running any needed tests on the code and ensuring the build passes, it then runs a script we wrote that zips the code and uploads it to a provided S3 bucket (the same bucket the CloudFormation template used to download the code). Here is a small snippet of that code:

zip -q -r app.zip . -x "*.git*" "*.log";

aws s3 mv "$WORKSPACE/app.zip" "s3://bucket/";

Next, Jenkins logs into each server instance running that application, downloads the latest code, unzips it, stops the previously running server, and starts the new version. Here is a snippet of code that helps accomplish that:

INSTANCES=$(aws elb describe-load-balancers --load-balancer-name "$LBNAME" | grep "INSTANCES" | awk '{print $2}');

for i in $INSTANCES; do
    HOSTNAME=$(aws ec2 describe-instances --instance-ids ${i} | grep INSTANCES | awk '{print $15}');
    if [[ $HOSTNAME == *ec2* ]]; then
        echo "SSH into $HOSTNAME";
        ssh -i "pem.pem" "user@$HOSTNAME" <<EOF
# remote deployment commands (download, unzip, restart) run here
EOF
    fi
done

When this script is complete, the latest version of the application will be running on every instance in the group.


Because Jenkins is integrated with GitHub, we can now update applications across any environment just by pushing to the configured branch. Through a third-party plugin, we have also integrated Jenkins with our chat platform, Slack, allowing us to receive notifications when the builds start and either succeed or fail.

This process is one that has been evolving for quite some time. While there are some smaller details I’ve left out, hopefully it has provided some insight into how Aviary now manages the deployment of applications across a varied and complex infrastructure.

Questions? Feel free to contact the post’s author, Matt Fuller, at


Live Image Processing with getUserMedia() and Web Workers

Demo of our internal tools using Web Workers & getUserMedia() to create image effects.

At Aviary, we are constantly exploring new technologies including the latest (and not-yet-fully supported) HTML5 features. Over the past few months, we’ve begun using Web Workers for a number of our internal tools. Web Workers allow us to perform heavy image processing tasks as a background process to avoid freezing the UI. We hope to be able to roll these benefits into our product in the near future while providing a fallback to unsupported browsers.

getUserMedia() is another exciting new HTML5 feature. It allows web applications to access video and audio from a user’s camera and other media devices. In this demo, I’m piping webcam video data into a canvas element. I’ve also built a UI that allows me to control Aviary’s JavaScript Image Processors in real time.

Since image processing is CPU intensive and can freeze our UI, we are also going to leverage the aforementioned Web Workers to perform the pixel manipulation as a background process. As I mentioned before, this will prevent the UI from freezing while the image is processing. No more need for the setTimeout( …, 0) trick.

Here’s how to get the video stream:

// create video element (attach to DOM if you'd like to
// view the stream, but not necessary here)
var video = document.createElement('video');

// default to vendor-prefixed method for getUserMedia
navigator.getUserMedia = navigator.getUserMedia ||
    navigator.webkitGetUserMedia ||
    navigator.mozGetUserMedia ||
    navigator.msGetUserMedia;

// acquire the video stream
navigator.getUserMedia({video: true}, function(stream){
    video.src = URL.createObjectURL(stream);

    // set up an interval to getImageData every 10ms:
    setInterval(render, 10);
}, function(error){
    console.log('error', error);
});

In our render() method, we first draw the video stream onto a canvas and grab its pixels with getImageData(). Then we pass the pixels to our Web Worker for processing. (See the example code link at the bottom for more detail.)

var render = function(){
    ctx.drawImage(video, 0, 0, w, h);
    var srcData = ctx.getImageData(0, 0, w, h);

    // pass image data to the web worker
    processor.postMessage({ imageData: srcData });
};

Inside the Web Worker, we increase the red channel, then use postMessage() to pass the resulting pixel data back.

// message receiver
onmessage = function(event) {
    var imageData =,
        dst =;

    /* Image Processing will go here */
    for (var i = 0; i < dst.length; i += 4) {
        dst[i] += 70; // increase red channel
    }

    postMessage({
        dstData: imageData // pass result back
    });
};
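The stride-4 loop works because canvas pixel data is a flat RGBA array. Here is a standalone sketch of the same transform, runnable outside the browser; note that a real ImageData buffer is a Uint8ClampedArray, which clamps to 255 automatically, so the explicit Math.min below only mimics that behavior for a plain array:

```javascript
// Standalone version of the worker's red-channel boost. Canvas pixel
// data is a flat array [r, g, b, a, r, g, b, a, ...], so we step by 4
// and touch only the first byte of each pixel.
function boostRed(pixels, amount) {
  for (var i = 0; i < pixels.length; i += 4) {
    pixels[i] = Math.min(255, pixels[i] + amount);
  }
  return pixels;
}

// two pixels: black and a warm gray
console.log(boostRed([0, 0, 0, 255, 200, 50, 50, 255], 70));
// → [70, 0, 0, 255, 255, 50, 50, 255]
```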

Once the Worker is finished we will listen for the result and draw it onto our canvas:

processor.onmessage = function(event){
    ctxEffects.putImageData(, 0, 0);
};
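The snippets above lean on a few pieces of setup that the full example provides. A minimal sketch of that glue follows; the element ids and the worker file name processor.js are illustrative, not necessarily what the demo uses:

```javascript
// two canvases: one for the raw video frame, one for the processed result
// (element ids and dimensions are illustrative)
var canvas = document.getElementById('source');
var effects = document.getElementById('effects');
var ctx = canvas.getContext('2d');
var ctxEffects = effects.getContext('2d');
var w = canvas.width, h = canvas.height;

// the background worker whose onmessage handler does the pixel work
var processor = new Worker('processor.js');
```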

Check out the full example code and a working demo below.

Demo: (you'll need the latest Chrome or Firefox)