Monday, September 8, 2014

Building Distributed Systems with Mesos

Apache Mesos is a popular open source cluster manager that enables building resource-efficient distributed systems. Mesos provides efficient, dynamic resource isolation and sharing across multiple distributed applications such as Hadoop, Spark, Memcache, MySQL, etc. on a dynamically shared pool of nodes. This means that with Mesos you can run any distributed application that requires clustered resources.

Single compute unit

Mesos allows multiple services to scale and utilise a shared pool of servers more efficiently. The key idea behind Mesos is to turn your data centre into one very large computer. Think of it this way: if you have 5 nodes with 8 CPU cores and 32GB RAM each and 10 nodes with 4 CPU cores and 16GB RAM each, then with Mesos you can run them all as one single large compute unit of 320GB RAM and 80 CPU cores. Rather than running multiple applications on separate clusters, with Mesos you efficiently distribute resources across multiple applications on a single large compute unit.

Who is using Mesos?

Apache Mesos has been used in production by Twitter, Airbnb and many other companies for web-scale computing. Twitter runs several key services on Mesos, including analytics and ads. Similarly, Airbnb uses Mesos to manage its big data infrastructure.
Google runs similar systems: Borg and its successor Omega. Both Omega and Mesos let you run multiple distributed systems atop the same cluster of servers.

Mesos Internals

Mesos leverages features of modern kernels for resource isolation, prioritisation, limiting and accounting. This is normally done by cgroups on Linux and zones on Solaris. Mesos provides resource isolation for CPU, memory, I/O, file system, and so on. It is also possible to use Linux containers, but Mesos's current isolation support for Linux containers is limited to CPU and memory only.

Mesos Architecture

The Mesos architecture[1] consists of a Mesos Master that manages Mesos Slaves, and frameworks that run tasks on those Slaves. The Master is soft state[2]; one or more stand-by Masters, coordinated through ZooKeeper, implement the failover mechanism.
A framework is an application running on top of Mesos. A Mesos framework normally uses the Mesos APIs (C++, Python, Ruby, JVM, Go) exposed by the Mesos kernel. Using these APIs, a framework registers with the Mesos Master and accepts resource allocations on the Mesos Slaves.
A framework consists of two key components:
  • a scheduler, which registers with the Master and accepts resource offers from it, and
  • an executor, a process launched on slave nodes to run the framework’s tasks.
Some frameworks, such as Hadoop and Spark, are distributed in nature. Distributed frameworks normally employ distributed executors.
Mesos Architecture and key components

How Does Mesos Work?

  1. When a Mesos Slave has free resources, it reports their availability to the Mesos Master.
  2. Using its allocation policy, the Mesos Master determines how many resources to offer to each framework.
  3. The Master then sends resource offers, and each framework’s scheduler selects which of the offered resources to accept.
  4. When a framework accepts offered resources, it passes Mesos a description of the tasks it wants to run on them.
  5. In turn, the Mesos Master sends the tasks to the Slave, which allocates the appropriate resources to the framework’s executor.
  6. Finally, the framework’s executor launches the tasks.
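As a rough illustration, here is how steps 3 to 6 look from a framework scheduler's point of view. This is only a sketch, written against the classic mesos / mesos_pb2 Python bindings that appear later in this post, and the echo task it launches is made up.
# A rough sketch of steps 3-6 from a framework scheduler's point of view,
# assuming the classic mesos / mesos_pb2 Python bindings; the echo task is
# only an illustration.
import uuid
import mesos
import mesos_pb2

class HelloScheduler(mesos.Scheduler):
    def resourceOffers(self, driver, offers):
        # Step 3: the Master sends offers; we pick which ones to accept.
        for offer in offers:
            # Step 4: describe a task to run with part of the offered resources.
            task = mesos_pb2.TaskInfo()
            task.task_id.value = str(uuid.uuid4())
            task.slave_id.value = offer.slave_id.value
            task.name = "hello-task"
            task.command.value = "echo hello from Mesos"

            cpus = task.resources.add()
            cpus.name = "cpus"
            cpus.type = mesos_pb2.Value.SCALAR
            cpus.scalar.value = 0.1

            mem = task.resources.add()
            mem.name = "mem"
            mem.type = mesos_pb2.Value.SCALAR
            mem.scalar.value = 32

            # Steps 5-6: the Master forwards the task to the Slave, which hands
            # the resources to our executor, and the executor launches it.
            driver.launchTasks(offer.id, [task])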

Mesos Frameworks

The number of Mesos frameworks is growing rapidly. Here is a list of popular frameworks:
Continuous Integration: Jenkins, GitLab
Big Data: Hadoop, Spark, Storm, Kafka, Cassandra, Hypertable, MPI
Python workloads: DPark, Exelixi
Meta-Frameworks / HA Services: Aurora, Marathon
Distributed Cron: Chronos
Containers: Docker
There are two notable frameworks currently available to support batch processing and long-running services: Chronos (a Unix cron equivalent) and Marathon (a Unix init.d equivalent). Marathon is a meta-framework; it is used to run many other frameworks and applications, including Chronos.
Mesos architecture supports both batch jobs and long-running services/apps. Image Credits Typesafe Blog.
Marathon can run web applications such as Rails[3], Django and Play[4]. Marathon provides a REST API for starting, stopping, and scaling these applications (scale-out and scale-in).
Chronos is a distributed and fault-tolerant cron/job scheduler. Chronos supports custom Mesos executors as well as the default command executor sh. Frameworks and applications communicate with Chronos through a RESTful JSON API over HTTP.
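To give a feel for Marathon's REST API, here is a small sketch in Python using the requests library; the Marathon address and the app id my-web-app are placeholders, not anything Marathon prescribes. It scales an already-running app out to 5 instances:
# Illustrative sketch: scale a (hypothetical) Marathon app to 5 instances
# through Marathon's REST API. Assumes Marathon answers on localhost:8080
# and that the requests library is installed.
import json
import requests

resp = requests.put(
    "http://localhost:8080/v2/apps/my-web-app",   # placeholder app id
    data=json.dumps({"instances": 5}),            # scale-out to 5 instances
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()
print(resp.json())   # the response should reference the resulting deployment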

Writing your own framework

Writing your own framework on top of Mesos is quite easy. First you create your scheduler in C, C++, Java/Scala, or Python by inheriting from the corresponding Scheduler class.
# Example using Python
import mesos

class MyScheduler(mesos.Scheduler):
    # Override Scheduler functions such as resourceOffers, etc.
    pass
import org.apache.mesos.MesosSchedulerDriver;
import org.apache.mesos.Protos;
import org.apache.mesos.Protos.*;
import org.apache.mesos.Protos.TaskID;
import org.apache.mesos.Scheduler;
import org.apache.mesos.SchedulerDriver;

public class MyScheduler implements Scheduler {
   // Override Scheduler Functions like resourceOffers, etc.
}
Then you write your framework executor by inheriting from the Executor class.
# Example using Python
import mesos

class MyExecutor(mesos.Executor):
    # Override Executor functions such as launchTask, etc.
    pass
import org.apache.mesos.Executor;
import org.apache.mesos.ExecutorDriver;
import org.apache.mesos.MesosExecutorDriver;
import org.apache.mesos.Protos.Environment.Variable;
import org.apache.mesos.Protos.*;
import org.apache.mesos.Protos.TaskID;
import org.apache.mesos.Protos.TaskStatus;

// Example using Java
public class MyExecutor implements Executor {
   // Override Executor Functions such as launchTask, etc.
}
Finally, install your framework on the Mesos cluster, which requires placing your framework executor somewhere that all slaves in the cluster can fetch it from. If you are running HDFS, you can put your framework executor into HDFS.
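The last piece is wiring the scheduler up to a running Mesos Master. A minimal sketch, again assuming the classic mesos / mesos_pb2 Python bindings and a made-up ZooKeeper address, looks roughly like this:
# Minimal bootstrap sketch, assuming the classic mesos / mesos_pb2 Python
# bindings; the ZooKeeper address is a placeholder.
import mesos
import mesos_pb2

framework = mesos_pb2.FrameworkInfo()
framework.user = ""              # let Mesos fill in the current user
framework.name = "MyFramework"

# MyScheduler is the scheduler class sketched above.
driver = mesos.MesosSchedulerDriver(
    MyScheduler(), framework, "zk://zk1:2181,zk2:2181,zk3:2181/mesos")
driver.run()                     # blocks until the framework stops or fails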

Running Distributed Systems On Mesos

Apache Mesos can be used as an SDK for building fault-tolerant distributed frameworks on top of it[5]. You can port existing distributed systems, or build new distributed applications, on Mesos with a custom framework wrapper of roughly 100-300 lines (the approach described in the previous section) without writing any networking-related code.
Mesos and Distributed Frameworks as Graph. Image Credits Paco Nathan.
When porting an existing distributed system like Hadoop or Storm, the challenging bit is mapping the distributed-system-specific logic (the distributed system graph) onto the Scheduler and Executor (the Mesos graph). For instance, when running Hadoop on Mesos[6][7]:
  • Hadoop’s JobTracker represents both a scheduler and a long-running process.
  • The Mesos Scheduler is a thin layer on top of the Hadoop scheduler.
  • Hadoop TaskTrackers are executors.
  • A Hadoop Job is a collection of map and reduce tasks.
  • A Hadoop Task is one unit of work for a Job (a map or a reduce).
  • A Hadoop Slot is a task executor (map or reduce).
  • The Hadoop JobTracker launches TaskTrackers for each Job (fixed or variable slot policy).
  • Hadoop task scheduling is left to the underlying scheduler (e.g., the Hadoop FairScheduler).
Similarly, for Storm we have to map Nimbus (Storm’s JobTracker) and the Supervisors (Storm’s TaskTrackers) onto the Scheduler and Executor (the Mesos graph).

  1. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center 
  2. the master can completely reconstruct its internal state from the periodic messages it receives from the slaves and from the framework schedulers 
  3. Run Ruby on Rails on Mesos 
  4. Run Play on Mesos 
  5. Apache Mesos as an SDK for Building Distributed Frameworks 
  6. Learning Apache Mesos 
  7. Hadoop on-mesos 

Saturday, September 6, 2014

Tools to choose and use!


Most probably many of you asked yourselves: To build from scratch or Not To build from scratch?
  1. - To build!
  2. - Not to build!
  3. - To build!
  4. - Not to build!
Hmm…  Hard to decide? Take a look at the diagram and decide!
To build from scratch or not to build from scratch
Is there any “Re-use” in your answer? Then you are one of those lucky bastards who can relax for a while and enjoy a cocktail :)

Re-use in all its beauty

Distributed platforms and frameworks aka Map-Reduce:

  • Apache Hadoop is open source software for reliable, scalable, distributed computing. It allows simple programming models to be used for distributed processing of large data sets across clusters of computers. The main components are HDFS (a distributed file system), Hadoop YARN (a framework for job scheduling and resource management), and Hadoop MapReduce (a system for parallel processing of large data sets).
  • YARN is, in effect, the second generation of MapReduce. The main idea behind its architecture is to split the JobTracker’s resource management and job scheduling/monitoring into separate daemons, so that the system has a global ResourceManager and a per-application ApplicationMaster.
    Architecture:
  • Disco Project is a framework that is based on the MapReduce paradigm. It is an open source Nokia Research Centre project whose purpose is to handle massive amounts of data. According to the documentation, it distributes and replicates your data and schedules your jobs efficiently. Moreover, indexing of large numbers of records is supported, so that real-time querying is possible.
    Architecture:
    • The master adds jobs to the job queue and runs them when nodes are available.
    • Clients submit jobs to the master.
    • Slaves are started on the nodes by the master and spawn and monitor processes.
    • Workers do the jobs and notify the master of the output location.
    • Data locality mechanisms are applied once results are saved to the cluster.
  • Spring Batch is an open source framework for batch processing. It enables the development of robust batch applications and is built on the Spring Framework. It supports logging, transaction management, statistics, restart, skip, and resource management. Some optimization and partitioning techniques can be tuned in the system as well.
    Architecture:
    • The infrastructure level is a low-level tool. It provides the ability to batch operations together and to retry if an error occurs.
    • The execution environment provides robust features for tracing and managing the batch lifecycle.
    • The core module is the batch-focused domain and implementation. It includes statistics, job launch, and restart.
  • Gearman is a framework for delegating tasks to other machines and processes that are better suited for them. Parallel execution of work is supported, as well as load balancing and multi-language function calls.
    Architecture:
    • A client, a worker and a job server are the parts of the system.
    • The client creates jobs and sends them to the job server.
    • The job server forwards tasks to suitable workers.
    • The worker does the work and responds to the client through the job server.
    • Communication with the server is established over TCP sockets.
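To make the client / worker / job server split concrete, here is a small sketch assuming the python-gearman package, a gearmand server on localhost:4730, and a made-up "reverse" function:
# Sketch only: assumes the python-gearman package and a gearmand job server
# on localhost:4730; the "reverse" function name is made up.
import gearman

# Worker process: register a function and wait for jobs from the job server.
def run_worker():
    def reverse(gearman_worker, gearman_job):
        return gearman_job.data[::-1]

    worker = gearman.GearmanWorker(["localhost:4730"])
    worker.register_task("reverse", reverse)
    worker.work()   # blocks, processing jobs forever

# Client process: submit a job; the server forwards it to a suitable worker
# and relays the worker's answer back.
def run_client():
    client = gearman.GearmanClient(["localhost:4730"])
    request = client.submit_job("reverse", "hello gearman")
    print(request.result)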

Distributed platforms and frameworks aka Directed Acyclic Graph aka Stream Processing Systems:

  • Spark is a system that is optimized for data analytics, to make it fast both to run and to write. It is suited for in-memory data processing. APIs are available in Java and Scala. Purpose: machine learning and data mining, though general-purpose use is also possible. It runs on Apache Mesos to share resources with Hadoop and other apps.
  • Dryad is a general-purpose runtime for executing data-parallel apps. A job is modeled as a directed acyclic graph (DAG) that defines the dataflow of the application, with vertices representing the operations to be performed on the data. However, creating such a graph is quite a tricky part; that is why some high-level language compilers were created, one of them being DryadLINQ.
  • S4 and Storm are both for real-time, reliable processing of unbounded streams of data. The main difference between them is that Storm guarantees messages will be processed even when failures occur, while S4 supports state recovery. More in the previous post.

Distributed Data Storage(DSS)

  • Key-Value
  • Column Based
  • SQL
  • NewSQL
  • bla bla bla…
  • Latency
  • Consistency
  • Availability
  • bla bla bla…
The problem of choosing the most suitable distributed storage system is quite tricky and requires some reading in the field. Some information on storage systems, with a deeper review from my previous project on Decentralized Storage Systems, can be found on my wiki. A brief review of some hot systems is also presented in my previous post.

Actor Model Frameworks:

The actor model is a fairly old computational model for concurrent computation in which the universal primitives are actors: on receiving a message, an actor can make local decisions, spawn other actors, send messages, and define the behaviour to use for the next message it receives. However, this model has some issues that should be taken into account if the decision to use it is made: (a) scalability, (b) transparency, (c) inconsistency.
Actor programming languages include Erlang, Scala and others, which is one extra motivation to get to know those languages better.
The most popular Actor Libraries:
  • Akka
    • Language: Java and Scala
    • Purpose: Build highly concurrent, distributed, fault-tolerant event-driven applications on the JVM.
    • Actors: Very lightweight concurrent entities. They process messages asynchronously using an event-driven receive loop.
  • Pykka
    • Language: Python
    • Purpose: Build highly concurrent, distributed, fault-tolerant event-driven applications.
    • Actors: An actor is an execution unit that executes concurrently with other actors. Actors don’t share state with each other; communication happens by sending and receiving messages, and on receiving a message an actor can perform some actions. Only one message is processed at a time. (A tiny Pykka sketch follows after this list.)
  • Theron
    • Language: C++
    • Purpose: Build highly concurrent, distributed event-driven applications.
    • Actors: Specialized objects that execute in parallel natively. Each of them has a unique address. Communication is done by messaging. Each actor’s behaviour is defined in message handlers, which are user-defined private member functions.
  • S4 - more in the Stream Processing Systems section.
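For a tiny taste of the actor style in Python, here is the Pykka sketch mentioned above; the Greeter actor and its message format are made up:
# Tiny actor sketch using Pykka; the Greeter actor and its message format
# are made up.
import pykka

class Greeter(pykka.ThreadingActor):
    def on_receive(self, message):
        # One message is processed at a time; no state is shared with others.
        return "Hi, %s!" % message["name"]

if __name__ == "__main__":
    ref = Greeter.start()               # spawn the actor
    print(ref.ask({"name": "world"}))   # blocking request/reply -> "Hi, world!"
    ref.stop()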

Scheduling and Load Balancing:

  • Luigi Scheduler
    • It is a Python module that helps you build complex pipelines of batch jobs, with built-in support for Hadoop. It is a scheduler that was open-sourced by Spotify and is used within the company.
    • It is still quite immature, and anyone can try their luck contributing to this scheduler, which is written in Python. (A minimal task sketch follows after this list.)
  • Apache Oozie
    • It is a workflow scheduler system to manage Apache Hadoop jobs.
    • Workflows are directed acyclic graphs of actions.
    • It is a scalable, reliable and extensible system.
  • Azkaban
    • It is a batch job scheduler.
    • It helps to control dependencies and scheduling of individual pieces to run.
  • Helix
    • It is a generic cluster management framework for automatic management of partitioned, replicated and distributed resources hosted on a cluster of nodes.
    • Features: (a) automatic assignments of resources to nodes, (b) node failure detection and recovery, (c) dynamic addition of resources and nodes to a cluster, (d) pluggable distributed state machine, (e) automatic load balancing and throttling of transactions
  • Norbert
    • Provides easy cluster management and workload distribution.
    • It is implemented in Scala and wraps ZooKeeper, Netty and Protocol Buffers to make it easier to build applications.
    • Purpose: (a) provide group management, change configuration, add/remove nodes; (b) partition workload by using software load balancing; (c) provide asynchronous client/server RPC and notifications.
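Here is the minimal Luigi sketch promised above; the task, file names and parameter are made up:
# Minimal Luigi sketch: a single task that counts words in an input file.
# The task, file names and parameter are made up.
import luigi

class WordCount(luigi.Task):
    src = luigi.Parameter(default="input.txt")

    def output(self):
        return luigi.LocalTarget("wordcount.txt")

    def run(self):
        with open(self.src) as f:
            n = len(f.read().split())
        with self.output().open("w") as out:
            out.write(str(n))

if __name__ == "__main__":
    # e.g. python wordcount.py WordCount --src input.txt --local-scheduler
    luigi.run()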

Replication:

Consensus and stuff:

Log Collection:

  • Apache Kafka
    A distributed pub/sub messaging system that supports:
    • Persistent messaging with constant-time performance
    • High throughput
    • Explicit support for partitioning messages over servers and distributing consumption over a cluster of consumer machines
    • Support for parallel data load into Hadoop
    It is a viable solution for providing logged data to offline analysis systems like Hadoop, but it might be quite limited for building real-time processing. The system is quite similar to Scribe and Apache Flume, as they all do activity stream processing, even though the architectures are different. (A tiny producer/consumer sketch follows after this list.)
    Architecture:
  • Logstash
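Here is the tiny producer/consumer sketch mentioned above, assuming a reasonably recent kafka-python package and a broker on localhost:9092; the topic name is made up:
# Tiny pub/sub sketch, assuming the kafka-python package and a broker on
# localhost:9092; the topic name is made up.
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("activity-log", b"user 42 viewed item 7")
producer.flush()

consumer = KafkaConsumer("activity-log",
                         bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest")
for record in consumer:        # iterates over the stream (normally forever)
    print(record.value)
    break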

Message Passing Frameworks:

(De)Serialization for sending objects over the network safely:

  • Avro
    • It is a data serialization system.
    • Provides: (a) rich data structures, (b) a compact, fast, binary data format, (c) a container file to store persistent data, (d) integration with dynamic languages. The schemas are defined in JSON. (A small usage sketch follows after this list.)
    • Used by Spotify
  • Protocol Buffer
    • Developed and Used by Google.
    • It encodes structured data in an efficient extensible format.
  • Apache Thrift
    • Purpose: scalable cross-language services deployment.
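Here is the small Avro usage sketch mentioned above, assuming the avro Python package (method names differ slightly between its Python 2 and Python 3 releases); the record schema and file name are made up:
# Small serialization sketch, assuming the avro Python package; the record
# schema and file name are made up.
import json
import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

schema = avro.schema.parse(json.dumps({
    "type": "record", "name": "LogLine",
    "fields": [{"name": "host", "type": "string"},
               {"name": "latency_ms", "type": "int"}],
}))

writer = DataFileWriter(open("lines.avro", "wb"), DatumWriter(), schema)
writer.append({"host": "web-1", "latency_ms": 42})
writer.close()

reader = DataFileReader(open("lines.avro", "rb"), DatumReader())
for record in reader:
    print(record)
reader.close()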

Chasing latency:

Wednesday, September 3, 2014

Hadoop vs. Redshift

Childhood dreams do come true - in 2015 "Batman vs. Superman" will bring the world’s biggest superheroes to battle on-screen, finally settling the eternal debate of who will prevail (I put my Bitcoins on Batman).
The Big Data world has its own share of epic battles. In November 2012 Amazon announced Redshift, their cutting edge data warehouse-as-a-service that scales for only $1,000 per terabyte per year. Apache Hadoop, created in 2005, is not the only big data superhero on the block anymore. Now that we have our own Superman vs. Batman, we gotta ask, how does Hadoop compare with Amazon Redshift? Let’s get them in the ring and find out.




In the left corner wearing a black cape we have Apache Hadoop. Hadoop is an open source framework for distributed processing and storage of Big Data on commodity machines. It uses HDFS, a dedicated file system that cuts data into small chunks and spreads them optimally over a cluster. The data is processed in parallel on the machines via MapReduce (Hadoop 2.0 aka YARN allows for other applications as well).
In the right corner wearing a red cape we have Redshift. Redshift’s data warehouse-as-a-service is based on technology acquired from ParAccel. It is built on an old version of PostgreSQL with 3 major enhancements:
  1. Columnar database - this type of database returns data by columns rather than whole rows. It has better performance for aggregating large sets of data, perfect for analytical querying.

  2. Sharding - Redshift supports data sharding, that is, partitioning the tables across different servers for better performance.

  3. Scalability - With everything running on the cloud, Redshift clusters can be easily up/down sized as needed.

Traditional solutions by companies like Oracle and EMC have been around for a while, though only as $1,000,000 on-premise racks of dedicated machines. Amazon’s innovation, therefore, lies in pricing and capacity. Their pay-as-you-go promise, as low as $1,000/TB/year, makes a powerful data warehouse affordable for small to medium businesses who couldn’t previously manage it. Because Redshift is on the cloud, it shrinks and grows as needed instead of having big dust gathering machines in the office that need maintenance.
Enough said; time to battle. Are you ready? Let’s get ready to rumble!

Round 1 - Scaling

The largest Redshift node comes with 16TB of storage and a maximum of 100 nodes can be created. Therefore, if your Big Data goes beyond 1.6PB, Redshift will not do. Also, when scaling Amazon’s clusters, the data needs to be reshuffled amongst the machines. It could take several days and plenty of CPU power, thus slowing your system for regular operations.
Hadoop scales to as many petabytes as you want, all the more so on the cloud. Scaling Hadoop doesn’t require reshuffling since new data will simply be saved on the new machines. In case you do want to balance the data, there is a rebalancer utility available.
First round goes to Hadoop!

Round 2 - Performance

According to several performance tests made by the Airbnb nerds, a Redshift 16 node cluster performed a lot faster than a Hive/Elastic Mapreduce 44 node cluster. Another Hadoop vs. Amazon Redshift benchmark made by FlyData, a data synchronization solution for Redshift, confirms that Redshift performs faster for terabytes of data.
Nonetheless, there are some constraints to Redshift’s super speed. Certain Redshift maintenance tasks have limited resources, so procedures like deleting old data could take a while. Although Redshift shards data, it doesn’t do it optimally. You might end up joining data across different nodes and miss out on the improved performance.
Hadoop still has some tricks up its utility belt. FlyData’s benchmark concludes that while Redshift performs faster for terabytes, Hadoop performs better for petabytes. Airbnb agree and state that Hadoop does a better job of running big joins over billions of rows. Unlike Redshift, Hadoop doesn’t have hard resource limitations for maintenance tasks. As for spreading data across nodes optimally, saving it in a hierarchical document format should do the trick. It may take extra work, but at least Hadoop has a solution.
We have a tie - Redshift wins for TBs, Hadoop for PBs

Round 3 - Pricing

This is a tricky one. Redshift’s pricing depends on the choice of region, node size, storage type (newly introduced), and whether you work with on-demand or reserved resources. Paying $1000/TB/year only applies for 3 years of a reserved XL Node with 2TB of storage in US East (North Virginia). Working with the same node and the same region on-demand costs $3,723/TB/year, more than triple the price. Choosing the region of Asia Pacific costs even more.
On premise Hadoop is definitely more expensive. According to Accenture’s "Hadoop Deployment Comparison Study", the total cost of ownership of a bare-metal hadoop cluster with 24 nodes and 50 TB of HDFS is more than $21,000 per month. That’s about $5,040/TB/year including maintenance and everything. However, it doesn’t make sense to compare pears with pineapples; let’s compare Redshift with Hadoop as a service.
Pricing for Hadoop as a service isn’t that clear since it depends on how much juice you need. FlyData’s benchmark claims that running Hadoop via Amazon’s Elastic Mapreduce is 10 times more expensive than Redshift. Using Hadoop on Amazon’s EC2 is a different story. Running a relatively low cost m1.xlarge machine with 1.68 TB of storage for 3 years (heavy reserve billing) in the US East region costs about $124 per month, so that’s about $886/TB/year. Working on-demand, using SSD drive machines, or a different region increases prices.
No winner - it depends on your needs

Round 4 - Ease of Use

Redshift has automated tasks for data warehouse administration and automatic backups to Amazon S3. Transitioning to Redshift should be a piece of cake for PostgreSQL developers since they can use the same queries and SQL clients that they’re used to.
Handling Hadoop, whether on the cloud or not, is trickier. Your system administrators will need to learn Hadoop architecture and tools and your developers will need to learn coding in Pig or MapReduce. Heck, you might need to hire new staff with Hadoop expertise. There are Hadoop as a Service solutions which save you from all that trouble (uh hum), however, most data warehouse devs and admins will find it easier to use Redshift.
Redshift takes the round

Round 5 - Data Format

When it comes to data format Redshift is pretty strict. It only accepts flat text files in a fixed format such as CSV. On top of that, Redshift only supports certain data types. The serial data type, arrays, and XML are unsupported at the moment. Even newline characters should be escaped and Redshift doesn’t support multiple NULLs in your data either. This means you’ll need to spend time converting your data before you can use it with Redshift.
Hadoop accepts every data format and data type imaginable.
Winner: Hadoop

Round 6 - Data Storage

Redshift data can only be stored on Amazon S3 or DynamoDB. Not only will you have to use more of Amazon’s services, but you’ll need to spend extra time preparing and uploading the data. Redshift loads data via a single thread by default, so it could take some time to load. Amazon suggests S3 best practices to speed up the process, such as splitting the data into multiple files, compressing them, using a manifest file, etc. Moving the data to DynamoDB is of course a bigger headache, unless it’s already there.
Life is more flexible with Hadoop. You can store data on local drives, in a relational database, or in the cloud (S3 included), and then import them straight into the Hadoop cluster.
Another round for Hadoop

Round 7 - General

Being a columnar DB, Redshift can’t, for instance, do any text analysis. Hadoop is open to all kinds of analysis via MapReduce and even more applications in version 2. Upon failure, say an I/O file error, Redshift moves on to the next piece of data without retrying. Hadoop tries again in that case.
Hadoop wins again




Tonight’s Winner

We have a tie! Huh!? Didn’t Hadoop win most of the rounds? Yes, it did, but Big Data’s superheroes are better off working together as a team rather than fighting. Turn on the Hadoop-Signal when you need relatively cheap data storage, batch processing of petabytes, or processing data in non-relational formats. Call out to red-caped Redshift for analytics, fast performance for terabytes, and an easier transition for your PostgreSQL team. As Airbnb concluded in their benchmark: "We don’t think Redshift is a replacement of the Hadoop family due to its limitations, but rather it is a very good complement to Hadoop for interactive analytics". We Agree.

Sunday, August 31, 2014

Lightweight Virtualization with Linux Containers and Docker

"Lightweight virtualization", also called "OS-level virtualization", is not new. On Linux it evolved from VServer to OpenVZ, and, more recently, to Linux Containers (LXC). It is not Linux-specific; on FreeBSD it's called "Jails", while on Solaris it's "Zones". Some of those have been available for a decade and are widely used to provide VPS (Virtual Private Servers), cheaper alternatives to virtual machines or physical servers. But containers have other purposes and are increasingly popular as the core components of public and private Platform-as-a-Service (PAAS), among others. Just like a virtual machine, a Linux Container can run (almost) anywhere. But containers have many advantages over VMs: they are lightweight and easier to manage. After operating a large-scale PAAS for a few years, dotCloud realized that with those advantages, containers could become the perfect format for software delivery, since that is how dotCloud delivers from their build system to their hosts. To make it happen everywhere, dotCloud open-sourced Docker, the next generation of the containers engine powering its PAAS. Docker has been extremely successful so far, being adopted by many projects in various fields: PAAS, of course, but also continuous integration, testing, and more.


Saturday, August 30, 2014

Setting up a Multi-Node Mesos Cluster running Docker, HAProxy and Marathon with Ansible

The Google Omega paper has given birth to cloud vNext: cluster schedulers managing containers. You can make a bunch of nodes appear as one big computer and deploy anything to your own private cloud, just like Docker, but across any number of nodes. Google’s Kubernetes, Flynn, Fleet and Apache Mesos (born at UC Berkeley and battle-tested at Twitter) are implementations of this idea, with the goal of abstracting away discrete nodes and optimizing compute resources. Each implementation has its own tweak, but they all follow the same basic setup: leaders for coordination and scheduling, some service discovery component, some underlying cluster tool (like ZooKeeper), and followers for processing.

In this post we’ll use Ansible to install a multi-node Mesos cluster using packages from Mesosphere. Mesos, as a cluster framework, allows you to run a variety of cluster-enabled software, including Spark, Storm and Hadoop. You can also run Jenkins, schedule tasks with Chronos, and even run ElasticSearch and Cassandra without having to dedicate specific servers to them. We’ll also set up Marathon for running services, with Deimos support for Docker containers.
Mesos, even with Marathon, doesn’t offer the holistic integration of some other tools, namely Kubernetes, but at this point it’s easier to set up on your own set of servers. Although young, Mesos is one of the oldest projects of the group and allows more of a DIY approach to service composition.

TL;DR

The playbook is on GitHub, just follow the readme! If you want to simply try out Mesos, Marathon, and Docker, Mesosphere has an excellent tutorial to get you started on a single node. This tutorial outlines the creation of a more complex multi-node setup.

System Setup

The system is divided into two parts: a set of masters, which handle scheduling and task distribution, and a set of slaves, which provide compute power. Mesos uses ZooKeeper for cluster coordination and leader election. A key component is service discovery: you don’t know which host or port will be assigned to a task, which makes, say, accessing a website running on a slave difficult. The Marathon API allows you to query task information, and we use this feature to configure HAProxy frontend/backend resources.
Our masters run:
  • Zookeeper
  • Mesos-Master
  • HAProxy
  • Marathon
and our slaves run:
  • Mesos-Slave
  • Docker
  • Deimos, the Mesos -> Docker bridge

Ansible

Ansible works by running a playbook, composed of roles, against a set of hosts, organized into groups. My Ansible-Mesos-Playbook on GitHub has an example hosts file with some EC2 instances listed. You should be able to replace these with your own EC2 instances running Ubuntu 14.04, or your own private instances running Ubuntu 14.04. Ansible allows us to pass node information around so we can configure multiple servers appropriately: set up our masters, configure the ZooKeeper set, point slaves to masters, and configure Marathon for high availability.
We want at least three servers in our master group for a proper zookeeper quorum. We use host variables to specify the zookeeper id for each node.
[mesos_masters]
ec2-54-204-214-172.compute-1.amazonaws.com zoo_id=1
ec2-54-235-59-210.compute-1.amazonaws.com zoo_id=2
ec2-54-83-161-83.compute-1.amazonaws.com zoo_id=3
The mesos-ansible playbook will use nodes in the mesos_masters group for a variety of configuration options. First, /etc/zookeeper/conf/zoo.cfg will list all master nodes, with /etc/zookeeper/conf/myid being set appropriately. It will also set up upstart scripts in /etc/init/mesos-master.conf and /etc/init/mesos-slave.conf, with default configuration files in /etc/defaults/mesos.conf. Mesos 0.19 supports external executors, so we use Deimos to run Docker containers. This is only required on slaves, but the configuration options are set in the shared /etc/defaults/mesos.conf file.

Marathon and HAProxy

The playbook leverages an ansible-marathon role to install a custom build of Marathon with Deimos support. If Mesos is the OS for the data center, Marathon is its init system. Marathon allows us to HTTP POST new tasks, containing Docker container configurations, which will run on the Mesos slaves. With HAProxy we can use the masters as a load-balancing proxy, routing traffic from known hosts (the masters) to whatever node/port is running a Marathon task. HAProxy is configured via a cron job running a custom bash script. The script queries the Marathon API and routes to the appropriate backend by matching a host header prefix to the Marathon job name.
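For a feel of what that script has to do, here is a rough, illustrative Python version of the discovery step against Marathon's v2 API; the Marathon address is a placeholder and the requests library is assumed:
# Rough sketch of the service-discovery step an HAProxy config generator
# needs: ask Marathon where every app's tasks are running. The Marathon
# address is a placeholder; assumes the requests library.
import requests

MARATHON = "http://localhost:8080"

apps = requests.get(MARATHON + "/v2/apps").json()["apps"]
for app in apps:
    app_id = app["id"].lstrip("/")
    tasks = requests.get(MARATHON + "/v2/apps/%s/tasks" % app_id).json()["tasks"]
    backends = ["%s:%d" % (t["host"], t["ports"][0]) for t in tasks if t.get("ports")]
    # Each app becomes (roughly) one HAProxy backend with these servers.
    print(app_id, backends)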

Mesos Followers (Slaves)

The slaves are pretty straightforward. We don’t need any host variables, so we just list whatever slave nodes you’d like to configure:
[mesos_slaves]
ec2-54-91-78-105.compute-1.amazonaws.com
ec2-54-82-227-223.compute-1.amazonaws.com
Mesos-Slave will be configured with Deimos support.

The Result

With all this set up you can point a wildcard domain name, say *.example.com, at all of your master nodes’ IP addresses. If you launch a task named “www” you can visit www.example.com and you’ll hit whatever server is running your application. Let’s try launching a simple web server which returns the docker container’s hostname:
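A request along these lines does the trick (sketched here in Python with the requests library; the image name is made up and the Deimos-era "container" fields vary between Marathon/Deimos versions):
# Illustrative launch request: four instances of an app called "www", each
# allocated 0.25 CPUs. The image name is made up and the Deimos-era
# "container" format depends on your Marathon/Deimos versions; assumes the
# requests library and Marathon reachable on a master at port 8080.
import json
import requests

app = {
    "id": "www",
    "instances": 4,
    "cpus": 0.25,
    "mem": 128,
    "container": {"image": "docker:///example/hostname-echo"},   # hypothetical image
}
resp = requests.post(
    "http://master1.example.com:8080/v2/apps",
    data=json.dumps(app),
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()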
We run four instances, allocating 25% of a CPU each, with an application name of www. If we hit www.example.com, we’ll get the hostname of the Docker container running on whatever slave node is hosting the task. Deimos will inspect whatever ports are EXPOSEd in the Docker container and assign a port for Mesos to use. Even though the config script only works on port 80, you can easily reconfigure it for your own needs.
To view marathon tasks, simply go to one of your master hosts on port 8080. Marathon will proxy to the correct master. To view mesos tasks, navigate to port 5050 and you’ll be redirected to the appropriate master. You can also inspect the STDOUT and STDERR of Mesos tasks.

Notes

In my testing I noticed, on rare occasions, that the cluster didn’t have a leader or Marathon wasn’t running. You can simply restart ZooKeeper, Mesos, or Marathon via Ansible:
There’s a high probability something won’t work. Check the logs; it took me a while to get things working. Grepping /var/log/syslog will help, along with /var/log/upstart/mesos-master.conf, mesos-slave.conf and marathon.conf, and the logs under /var/log/mesos/.

What’s Next

Cluster schedulers are an exciting tool for running production applications. It’s never been easier to build, package and deploy services on public, private clouds or bare metal servers. Mesos, with Marathon, offers a cool combination for running docker containers–and other mesos-based services–in production. This Twitter U video highlights how OpenTable uses Mesos for production. The HAProxy approach, albeit simple, offers a way to route traffic to the correct container. HAProxy will detect failures and reroute traffic accordingly.
I didn’t cover inter-container communication (say, a website requiring a database) but you can use your service-discovery tool of choice to solve the problem. The Mesos-Master nodes provide good “anchor points” for known locations to look up stuff; you can always query the marathon api for service discovery. Ansible provides a way to automate the install and configuration of mesos-related tools across multiple nodes so you can have a serious mesos-based platform for testing or production use.

ANSIBLE IS THE BEST WAY TO MANAGE DOCKER


Docker is an exciting new open source technology that promises to "help developers build and ship higher quality apps faster" and sysadmins "to deploy and run any app on any infrastructure, quickly and reliably" (source: http://www.docker.com/whatisdocker/).
But to truly leverage the power of Docker, you need an orchestration tool that can help you provision, deploy and manage your servers with Docker running on them - and help you build Docker-files themselves, in the simplest way possible.

ANSIBLE+DOCKER RESOURCES

Installing & Building Docker with Ansible - Michael DeHaan, CTO & Founder of Ansible
“To me the interesting part is how to use Ansible to build docker-files, in the simplest way possible. One of the things we've always preferred is to have portable descriptions of automation, and to also get to something more efficient to develop than bash.”
 
“By using an ansible-playbook within a Docker File we can write our complex automation in Ansible, rather than a hodgepodge of docker commands and shell scripts.”
 
“One of the more logical things to do is to use Docker to distribute your containers, which can be done with the docker module in Ansible core.”
 
Read the full article here.


Docker Misconceptions - Matt Jaynes, Founder of DevOpsU
"...you absolutely need an orchestration tool in order to provision, deploy, and manage your servers with Docker running on them.
 
This is where a tool like Ansible really shines. Ansible is primarily an orchestration tool that also happens to be able to do configuration management. That means you can use Ansible for all the necessary steps to provision your host servers, deploy and manage Docker containers, and manage the networking, etc."
 
"So, if you decide you want to use Docker in production, the prerequisite is to learn a tool like Ansible. There are many other orchestration tools (some even specifically for Docker), but none of them come close to Ansible's simplicity, low learning curve, and power."
Read the full article here.

The Why & How of Ansible & Docker - Gerhard Lazu, a contributor to The Changelog
 
"Ansible made me re-discover the joy of managing infrastructures. Docker gives me confidence and stability when dealing with the most important step of application development, the delivery phase. In combination, they are unmatched."
 
Read the full article here.


VIDEO: Ansible + Docker Demonstration - Patrick Galbraith, HP Advanced Technologies

Friday, August 29, 2014

New iPhone 6 leaked pictures

It’s happening. Apple’s invitation to a September 9 event has officially gone out. At least one next-gen iPhone will be unveiled in a few short days. However, new photos have emerged on Chinese site WeiFeng that appear to show components that match up almost identically to previous leaks for a 4.7-inch iPhone.
If these numerous leaks are to be trusted as the real deal, and we are, in fact, looking at parts for the next iPhone, then you’ll notice that this design language is quite a departure from the past few generations of the iPhone. From the iPhone 4 onward (with the slight exception of the iPhone 5c), the iPhone has had a flat face and back, with rounded corners.
These components point to a more iPad mini-like shape, with four strips of plastic seemingly running across the back of the device. As AppleInsider mentions, the strips in these most recent photos seem to be closer to the color of the metal back of the phone, as opposed to previous leaks. This might suggest that we’re seeing a more finalized version.
However, we’ve also heard rumors that claim that the plastic strips are just a preliminary design, and will eventually be replaced by glass, which would look a bit closer to the current iPhone 5s.
The next iPhone is a big deal for Apple. The company is making big changes to react to the smartphone landscape that it had a huge part in creating, and all this without Steve Jobs at the helm.
All that said, with a bigger screen and a more durable design, this next iPhone could be the device of the year.