The Google Omega Paper has given birth to cloud vNext: cluster schedulers managing containers.
You can make a bunch of nodes appear as one big computer and deploy
anything to your own private cloud; just like Docker, but across any
number of nodes.
Google’s Kubernetes, Flynn, Fleet and Apache Mesos, originally a UC Berkeley project popularized at Twitter, are implementations of Omega’s ideas, with the goal of abstracting away discrete nodes and optimizing compute resources. Each implementation has its own tweak, but they all follow the same basic setup: leaders for coordination and scheduling, a service discovery component, an underlying coordination tool (like Zookeeper), and followers for processing.
In this post we’ll use Ansible to install a multi-node Mesos cluster using packages from Mesosphere. Mesos, as a cluster framework, allows you to run a variety of cluster-enabled software, including Spark, Storm and Hadoop. You can also run Jenkins, schedule tasks with Chronos, and even run ElasticSearch and Cassandra without having to dedicate specific servers to them. We’ll also set up Marathon for running services, with Deimos support for Docker containers.
Mesos, even with Marathon, doesn’t offer the holistic integration of some other tools, namely Kubernetes, but at this point it’s easier to set up on your own servers. Although young, Mesos is one of the oldest projects of the group and allows more of a DIY approach to service composition.
TL;DR
The playbook is on GitHub; just follow the readme! If you simply want to try out Mesos, Marathon, and Docker, Mesosphere has an excellent tutorial to get you started on a single node. This post outlines the creation of a more complex multi-node setup.
System Setup
The system is divided into two parts: a set of masters, which handle
scheduling and task distribution, and a set of slaves providing compute
power. Mesos uses Zookeeper for cluster coordination and leader
election. A key component is service discovery: you don’t know which
host or port will be assigned to a task, which makes, say, accessing a
website running on a slave difficult. The Marathon API allows you to
query task information, and we use this feature to configure HAProxy
frontend/backend resources.
Our masters run:
- Zookeeper
- Mesos-Master
- HAProxy
- Marathon
and our slaves run:
- Mesos-Slave
- Docker
- Deimos, the Mesos -> Docker bridge
Ansible
Ansible works by running a playbook, composed of roles, against a set of hosts, organized into groups. My
Ansible-Mesos-Playbook
on GitHub has an example hosts file with some EC2 instances listed. You
should be able to replace these with your own EC2 instances or your own private servers running Ubuntu 14.04. Ansible allows us to pass node information around so we can properly configure the masters and the zookeeper ensemble, point slaves at the masters, and configure Marathon for high availability.
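Once the hosts file is filled in, the whole cluster is configured with a single playbook run. The file names below are assumptions based on a typical Ansible layout; use whatever the repository’s readme specifies:
# Hypothetical invocation; "hosts" and "site.yml" are assumed names,
# the repository's readme has the authoritative command.
ansible-playbook -i hosts site.yml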
We want at least three servers in our master group for a proper
zookeeper quorum. We use host variables to specify the zookeeper id for
each node.
ec2-54-204-214-172.compute-1.amazonaws.com zoo_id=1
ec2-54-235-59-210.compute-1.amazonaws.com zoo_id=2
ec2-54-83-161-83.compute-1.amazonaws.com zoo_id=3
The mesos-ansible playbook will use nodes in the mesos_masters group for a variety of configuration options. First, /etc/zookeeper/conf/zoo.cfg will list all master nodes, with /etc/zookeeper/conf/myid being set appropriately. It will also set up upstart scripts in /etc/init/mesos-master.conf and /etc/init/mesos-slave.conf, with default configuration files in /etc/defaults/mesos.conf.
Mesos 0.19 supports external containerizers, so we use Deimos to run docker containers. This is only required on slaves, but the configuration options are set in the shared /etc/defaults/mesos.conf file.
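As a rough, hypothetical sketch of what that shared file can carry (the real file is templated by the playbook, and exact contents depend on the Mesosphere packages), Mesos picks up any MESOS_<FLAG> environment variable as the matching command-line flag, so the Zookeeper connection string naming all three masters might look like:
# Hypothetical sketch of /etc/defaults/mesos.conf; the playbook generates the real file.
# Mesos reads MESOS_<FLAG> environment variables as command-line flags, and the
# zk:// URL lists every Zookeeper node plus the /mesos znode path.
MESOS_ZK=zk://master1:2181,master2:2181,master3:2181/mesos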
Marathon and HAProxy
The playbook leverages an
ansible-marathon
role to
install a custom build of Marathon with Deimos support. If Mesos is the OS for the data center, Marathon is the init system. Marathon allows us to HTTP POST new tasks, containing Docker container configurations, which will run on Mesos slaves. With HAProxy we can use the masters as a load-balancing proxy layer, routing traffic from known hosts (the masters) to whatever node/port is running the Marathon task. HAProxy is configured via a cron job running a custom bash script. The script queries the Marathon API and routes to the appropriate backend by matching a host header prefix to the Marathon job name.
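The actual script ships with the playbook; the sketch below only illustrates the idea, assuming Marathon is reachable on localhost:8080 and using its plain-text /v2/tasks listing, which prints one line per app in the form "<app> <service-port> <host1:port1> <host2:port2> ...":
#!/bin/bash
# Illustrative sketch only, not the playbook's script. Assumes Marathon on
# localhost:8080 and its plain-text task list described above.
MARATHON="localhost:8080"

curl -s -H "Accept: text/plain" "http://$MARATHON/v2/tasks" |
while read -r app svcport servers; do
  [ -z "$app" ] && continue
  # One HAProxy backend per app (svcport, the app's service port, is unused in
  # this simplified version); a fuller script would also emit the matching
  # frontend/ACL rules and reload HAProxy when the generated config changes.
  echo "backend ${app}_cluster"
  i=0
  for server in $servers; do
    echo "  server ${app}_${i} ${server} check"
    i=$((i + 1))
  done
done
A cron entry then reruns something like this on a schedule, regenerating and reloading HAProxy so new or moved tasks are picked up shortly after Marathon places them.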
Mesos Followers (Slaves)
The slaves are pretty straightforward: we don’t need any host variables, so just list whatever slave nodes you’d like to configure:
ec2-54-91-78-105.compute-1.amazonaws.com
ec2-54-82-227-223.compute-1.amazonaws.com
Mesos-Slave will be configured with Deimos support.
The Result
With all this in place you can point a wildcard domain name, say *.example.com, at all of your master node IP addresses. If you launch a task named “www” you can visit www.example.com and you’ll hit whatever server is running your application. Let’s try launching a simple web server which returns the docker container’s hostname:
POST to one of our masters:
POST /v2/apps
{
  "container": {
    "image": "docker:///mhamrah/mesos-sample"
  },
  "cpus": ".25",
  "id": "www",
  "instances": 4,
  "mem": 512,
  "ports": [0],
  "uris": []
}
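Assuming the JSON above is saved as www.json, posting it with curl looks like this; any master works, since Marathon proxies requests to the current leader:
# www.json holds the task definition above
curl -X POST -H "Content-Type: application/json" \
  -d @www.json \
  http://ec2-54-204-214-172.compute-1.amazonaws.com:8080/v2/apps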
We run four instances, allocating 25% of a CPU each, with an application name of www. If we hit www.example.com, we’ll get the hostname of the docker container running on whatever slave node is hosting the task. Deimos will inspect whatever ports are EXPOSEd in the docker container and assign a port for Mesos to use. Even though the config script only works on port 80, you can easily reconfigure it for your own needs.
To view Marathon tasks, simply go to one of your master hosts on port 8080; Marathon will proxy to the correct master. To view Mesos tasks, navigate to port 5050 and you’ll be redirected to the appropriate master. You can also inspect the STDOUT and STDERR of Mesos tasks.
Notes
In my testing I noticed, on rare occasions, that the cluster didn’t have a leader or Marathon wasn’t running. You can simply restart Zookeeper, Mesos, or Marathon via Ansible:
# Restart Zookeeper
ansible mesos_masters -a "sudo service zookeeper restart"
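The same ad-hoc pattern covers the other services; the upstart job names match the init scripts above, and mesos_slaves is assumed to be the slave group name from the hosts file:
# Restart the remaining services (mesos_slaves is the assumed slave group name)
ansible mesos_masters -a "sudo service mesos-master restart"
ansible mesos_masters -a "sudo service marathon restart"
ansible mesos_slaves -a "sudo service mesos-slave restart"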
There’s a high probability something won’t work; it took me a while to get things running. Check the logs: grepping /var/log/syslog will help, along with /var/log/upstart/mesos-master.conf, mesos-slave.conf and marathon.conf, and the files under /var/log/mesos/.
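Ansible’s ad-hoc mode helps here too; for example, pulling recent syslog lines from every master in one shot:
# Grab the last 50 syslog lines from each master
ansible mesos_masters -a "tail -n 50 /var/log/syslog"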
What’s Next
Cluster schedulers are an exciting tool for running production applications. It’s never been easier to build, package and deploy services on public clouds, private clouds or bare metal servers. Mesos, with Marathon, offers a compelling combination for running docker containers (and other Mesos-based services) in production.
This Twitter U video highlights how OpenTable uses Mesos for production.
The HAProxy approach, albeit simple, offers a way to route traffic to
the correct container. HAProxy will detect failures and reroute traffic
accordingly.
I didn’t cover inter-container communication (say, a website
requiring a database) but you can use your service-discovery tool of
choice to solve the problem. The Mesos master nodes provide good “anchor points”, known locations to query; you can always hit the Marathon API for service discovery. Ansible provides a way to automate the installation and configuration of Mesos-related tools across multiple nodes, so you can have a serious Mesos-based platform for testing or production use.
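As a concrete sketch of that last point, Marathon exposes per-app task lists; asking any master for the “www” app returns JSON with the host and assigned ports of every running task (the hostname and app id are just the examples from above):
# Query Marathon for the tasks of the "www" app; the JSON response lists
# each task's slave host and its assigned ports.
curl -s http://ec2-54-204-214-172.compute-1.amazonaws.com:8080/v2/apps/www/tasks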