April 24, 2013
by Jake Davis

Open Source Tools from Simple

Open source software is a big part of our agenda at Simple. Beyond building a stronger community, it also complements our mission to be as transparent as possible while never compromising on security. We’ve used a variety of open source projects, such as Linux, HAProxy, and Nginx, to build Simple. We’re always looking for ways to give back to the open source community, and we recently open sourced three Chef projects that we use to manage our infrastructure more efficiently. You can find them on our public GitHub page.

Chef-proxy

The first is a Chef cookbook called chef-proxy, designed for proxying traffic via /etc/hosts. By default, the Linux name resolver uses a combination of DNS and /etc/hosts to map hostnames to IP addresses. Entries in /etc/hosts are evaluated before DNS resolution is attempted, and we leverage this to “trick” our instances into sending traffic through a proxy that we control. Previously, we had to manually update our HAProxy templates whenever we wanted to proxy to a new remote service. With chef-proxy, it’s as easy as adding a new Chef data bag:

{
 "id": "example-host",
 "proxy_port": 8000,
 "port": 5000,
 "hosts": ["www.google.com"],
 "ssl": false
}

The first step is to set up the client instance. The client recipe iterates over the hosts in the data bag and feeds each hostname as a parameter into the proxy_host resource, generating a hosts entry. If, for example, the name of the proxy instance resolved to 10.11.12.13, the following would be generated:

10.11.12.13 www.google.com # Aim at our proxy instance

Now any time this instance requests www.google.com, it will send the traffic to the proxy as opposed to the actual host.
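
As a rough illustration of what the client side does, here is a minimal plain-Ruby sketch that turns a data bag item into hosts entries. It is not the cookbook's actual code: hosts_entries is a hypothetical helper, and the data bag item is represented as an ordinary hash.

```ruby
# Hypothetical helper (not part of chef-proxy): build one /etc/hosts line
# per proxied hostname, each pointing at the proxy instance's IP address.
def hosts_entries(item, proxy_ip)
  item.fetch("hosts", []).map do |hostname|
    "#{proxy_ip} #{hostname} # Aim at our proxy instance"
  end
end

# The example data bag from above, written as a plain Ruby hash.
item = {
  "id" => "example-host",
  "proxy_port" => 8000,
  "port" => 5000,
  "hosts" => ["www.google.com"],
  "ssl" => false
}

puts hosts_entries(item, "10.11.12.13")
```

In the real cookbook each hostname goes through the proxy_host resource rather than a string template, so treat this only as a picture of the input and output.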

But it wouldn’t be very useful if traffic just died at the proxy, so the next step is configuring the proxy instance itself via the server recipe. This recipe configures HAProxy on the proxy instance to listen on specific host:port pairs, and automatically forward incoming traffic to an associated endpoint. We initially intended to just listen on any IP address using a wildcard, but ran into cases where multiple service WSDLs had hardcoded host:port pairs, which required us to send traffic to a specific port. To cover this case, you can attach multiple IP addresses to your instance and use the proxy_host key to differentiate between multiple services that send to the same port. If no proxy_host key is specified, HAProxy listens on all addresses for that port.
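
The proxy_host fallback can be sketched in a few lines of plain Ruby. This assumes proxy_port is the port HAProxy listens on; bind_line, the partner-a and partner-b items, and their IP addresses are made up for illustration and do not come from the cookbook.

```ruby
# Hypothetical helper: produce the HAProxy "bind" line for one data bag item.
# With "proxy_host" set, HAProxy listens on that address only; without it,
# the "*" wildcard makes HAProxy listen on all addresses for the port.
def bind_line(item)
  address = item["proxy_host"] || "*"
  "bind #{address}:#{item.fetch('proxy_port')}"
end

# Two illustrative services that share port 443 on different addresses,
# plus the earlier example item, which has no proxy_host key.
services = [
  { "id" => "partner-a", "proxy_host" => "10.11.12.13", "proxy_port" => 443 },
  { "id" => "partner-b", "proxy_host" => "10.11.12.14", "proxy_port" => 443 },
  { "id" => "example-host", "proxy_port" => 8000 }
]

services.each { |item| puts "#{item['id']}: #{bind_line(item)}" }
```

Binding each service to its own address is what lets two services with the same hardcoded port coexist on one proxy instance.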

For ingress (incoming) traffic, we define the proxy configuration in the Chef data bags that we use to model our services:

{
 "id": "our-service",
 "port": 15151,
 "proxy_port": 443,
 "endpoint_port": 443,
 "dev": {
   "hosts": [ "dev-service1" ]
 },
 "prod": {
   "hosts": [ "service1", "service2" ]
 }
}

The endpoint_port key is a final override for when a webserver such as Apache or Nginx sits in front of the service. In that case, the webserver is responsible for forwarding to the port value but is listening elsewhere. endpoint_port is optional, and Chef will fall back on the regular port value when it isn’t set.
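
The fallback amounts to a single lookup with a default. A minimal sketch, where endpoint_port_for is a hypothetical helper name and the data bag item is a plain hash:

```ruby
# Hypothetical helper: prefer "endpoint_port" when the item defines it,
# otherwise fall back on the regular "port" value.
def endpoint_port_for(item)
  item.fetch("endpoint_port") { item.fetch("port") }
end

with_override = { "id" => "our-service", "port" => 15151, "endpoint_port" => 443 }
without_override = { "id" => "our-service", "port" => 15151 }

puts endpoint_port_for(with_override)
puts endpoint_port_for(without_override)
```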

One feature from the example above that we take advantage of is the ability to embed hosts within environment-specific keys. We then use the Chef attribute node[:proxy][:environment] to control which hosts end up in the HAProxy config. This allows us to run proxies in each environment using the same Chef configuration, and makes it easier for new services to quickly begin communicating with partners.
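
The environment lookup might look roughly like the following. hosts_for is a hypothetical helper, and a plain string stands in for the value that would come from node[:proxy][:environment] during a real Chef run.

```ruby
# Hypothetical helper: return the hosts listed under the given environment
# key, or an empty list when the item has no entry for that environment.
def hosts_for(item, environment)
  item.fetch(environment, {}).fetch("hosts", [])
end

# The service data bag from above, written as a plain Ruby hash.
service = {
  "id" => "our-service",
  "port" => 15151,
  "proxy_port" => 443,
  "endpoint_port" => 443,
  "dev" => { "hosts" => ["dev-service1"] },
  "prod" => { "hosts" => ["service1", "service2"] }
}

puts hosts_for(service, "dev").inspect
puts hosts_for(service, "prod").inspect
```

Because only the environment value changes between proxies, the same data bag can drive the HAProxy config in every environment.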

The next two projects are closely related and relatively simple. Chef is a great tool, but one of its weaknesses is the lack of run phases. This means you can easily end up with convoluted notification chains. One of the ways we’ve started trying to address this problem is by adding more advanced functionality to our report handlers, simple scripts which execute at the end of a Chef run.

Chef-handler-sensu

We’ve spent the past month or two deploying Sensu, an awesome monitoring framework. However, keeping the checks directory tidy is tedious: when a check is renamed or no longer needed, its old file stays behind. We wanted a way to automatically clean up Sensu’s workspace after a Chef run so incorrect checks weren’t sent out. As a result, we wrote chef-handler-sensu, which compares the checks directory with the sensu_check resources from the run. If it finds checks on the filesystem which aren’t in the resource collection, it deletes the files. This way, your Sensu server is always up to date.
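
The core comparison can be sketched without Chef at all. The sketch below assumes one "<name>.json" file per check, which may not match your Sensu layout, and it takes a plain list of names in place of the sensu_check resources collected from the run. prune_stale_checks is a hypothetical helper, not the handler's real interface.

```ruby
require "tmpdir"

# Hypothetical helper: delete check files that no longer correspond to a
# check declared during the run, and return the names of the deleted files.
# Assumes one "<name>.json" file per check in checks_dir.
def prune_stale_checks(checks_dir, declared_names)
  stale = Dir.glob(File.join(checks_dir, "*.json")).reject do |path|
    declared_names.include?(File.basename(path, ".json"))
  end
  stale.each { |path| File.delete(path) }
  stale.map { |path| File.basename(path) }.sort
end

# Demo in a throwaway directory: "old_check" is on disk but not declared.
Dir.mktmpdir do |dir|
  %w[disk_usage old_check].each do |name|
    File.write(File.join(dir, "#{name}.json"), "{}")
  end
  removed = prune_stale_checks(dir, ["disk_usage"])
  puts "removed: #{removed.join(', ')}"
end
```

Running the cleanup in a report handler means it sees the complete list of checks declared in the run before anything is deleted.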

Chef-handler-timestamp

The second handler was motivated by recent memory problems with Chef. The background process was leaking a lot of memory and we were having a difficult time pinpointing where Chef was failing to run. We needed a quick way to make sure it was successfully completing on our instances and wanted an easy way to integrate this with Sensu. The result was chef-handler-timestamp, a Chef handler that creates a timestamp file at the end of a successful run. We monitor this file and alert if Chef hasn’t completed within a certain interval.
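
Both halves of this idea fit in a short plain-Ruby sketch: writing the timestamp after a successful run, and checking its age from a monitoring script. The helper names, the file format (epoch seconds), and the one-hour threshold are all assumptions for illustration rather than what chef-handler-timestamp actually does.

```ruby
require "tmpdir"

# Hypothetical helper: record when a successful run finished, as epoch seconds.
def write_timestamp(path, now = Time.now)
  File.write(path, now.to_i.to_s)
end

# Hypothetical check: true when the last recorded run is older than max_age
# seconds, or when no timestamp file exists at all.
def chef_run_stale?(path, max_age, now = Time.now)
  return true unless File.exist?(path)
  now.to_i - File.read(path).to_i > max_age
end

# Demo in a throwaway directory with a one-hour threshold.
Dir.mktmpdir do |dir|
  path = File.join(dir, "chef-last-run")
  finished = Time.now
  write_timestamp(path, finished)
  puts chef_run_stale?(path, 3600, finished + 60)    # one minute later
  puts chef_run_stale?(path, 3600, finished + 7200)  # two hours later
end
```

A monitoring check built on the second helper would alert when it returns true, which also covers the case where Chef has never completed a run.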

This is just the beginning for us and we’re working hard on open-sourcing more of the code that runs Simple. Open source is a central part of our agenda in Ops, and we hope that other people can use these projects, too. If you’re interested in contributing to the open source community as we build the systems that run Simple, check out our careers page.

Enjoy!