After reading a recent article about OpenShift, I get the impression that there is still some confusion about what OpenShift is and how it fits into the rest of the cloud landscape. In this post I will give my opinion on the difference between IaaS and PaaS clouds and on how the OpenShift platform addresses the requirements of developers (devs) as well as operations (ops).
Standard disclaimer applies: This blog represents my opinion and not that of Red Hat. I work on OpenShift so my opinions are probably biased.
Let’s start by breaking down what “the cloud” actually is. The term can refer to a few different things, but I would break it up into three categories:
- Software as a service or SaaS
- Platform as a service or PaaS
- Infrastructure as a service or IaaS
Software as a service offerings face the customer directly. Examples of SaaS include Netflix, Gmail, Google and Microsoft’s online office suites, Dropbox, Google Drive, etc. SaaS is pretty easy to understand so I am going to skip it for now. However, the definitions of IaaS and PaaS, and the boundary between them, are often misunderstood.
What is IaaS?
IaaS is the lowest layer of the cloud. The goal of IaaS is to provide compute resources on demand. Using IaaS you can start a virtual machine with the operating system of your choice and load any software on it.
IaaS clouds like OpenStack and EC2 provide a management interface and APIs that allow you to register customized OS images, manage access, and spin up and manage virtual machines.
They make it very easy to start VMs but there are some important usability items to note:
- In essence you are starting many VMs, and the burden of creating OS images and making sure that running VMs are fully patched is left to operations.
- VMs are good for heavy applications that utilize them fully. However, this is rarely the case, so you end up with a lot of unused capacity.
- It is up to ops to monitor application and machine usage and spin up additional VMs when needed.
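To make the last point concrete, here is a minimal sketch of the kind of sizing decision ops ends up making by hand on an IaaS. The function name, the load units and the 70% target utilization are my own assumptions for illustration; this is not an OpenStack or EC2 API.

```python
import math


def vms_needed(total_load, vm_capacity, target_utilization=0.7):
    """Return how many VMs keep average utilization at or below the target.

    total_load and vm_capacity use the same (hypothetical) unit, for
    example requests per second. target_utilization is a fraction in (0, 1].
    """
    if vm_capacity <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("capacity must be positive and target in (0, 1]")
    if total_load <= 0:
        return 0
    # Each VM is only allowed to carry a fraction of its raw capacity,
    # which leaves headroom for spikes.
    usable_per_vm = vm_capacity * target_utilization
    return math.ceil(total_load / usable_per_vm)


if __name__ == "__main__":
    running = 2
    wanted = vms_needed(total_load=10, vm_capacity=4)
    if wanted > running:
        print("spin up", wanted - running, "more VM(s)")
```

Somebody, or some script, has to run this kind of check continuously and then call the IaaS API to start the extra VMs.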
How about PaaS?
Unlike an IaaS, which provides an efficient way to manage VMs, a PaaS is meant to manage a large number of applications. Using a PaaS you can quickly deploy an application on a preconfigured runtime environment.
A PaaS is great for self-service environments where developers need a quick way to host and manage applications on an ops-approved environment. Examples of PaaS providers include OpenShift, Heroku, Cloud Foundry and its various forks, etc.
While PaaS clouds are normally deployed on an IaaS, they do not require it. A PaaS can be run in a traditional datacenter without requiring an IaaS layer.
If you know that your applications are not going to load a system completely and you need to overcommit machines with many varied workloads, PaaS is a great abstraction for that.
PaaS allows you to create a polyglot environment where machines are running many applications each with different runtime requirements. This is accomplished by using a standard set of configurations and isolating applications from each other.
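As a rough illustration of why sharing machines helps, the sketch below packs several small applications onto as few machines as possible. The application names and load figures are made up, and first-fit decreasing is simply a well-known packing heuristic I picked for the example; it is not a description of how OpenShift actually places applications.

```python
def pack_first_fit_decreasing(app_loads, machine_capacity):
    """Place apps on machines, largest first, using the first machine that fits.

    app_loads maps an app name to its load as a percentage of one machine.
    Returns a list of machines, each a dict of {app name: load}.
    """
    machines = []
    by_load = sorted(app_loads.items(), key=lambda kv: kv[1], reverse=True)
    for name, load in by_load:
        if load > machine_capacity:
            raise ValueError(name + " does not fit on a single machine")
        for machine in machines:
            if sum(machine.values()) + load <= machine_capacity:
                machine[name] = load
                break
        else:
            # No existing machine has room, so start a new one.
            machines.append({name: load})
    return machines


if __name__ == "__main__":
    # Hypothetical workloads, each far below the capacity of a full machine.
    loads = {"blog": 10, "api": 45, "wiki": 20, "shop": 60, "ci": 30}
    for index, machine in enumerate(pack_first_fit_decreasing(loads, 100)):
        print("machine", index, machine)
```

With these numbers the five applications fit on two machines, where an application-per-VM setup would need five mostly idle VMs.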
It is important to understand that PaaS is not a silver bullet and does not fit all requirements. For example, if all the applications require different tweaks to low level kernel params, or non-standard runtime configurations, then a PaaS probably will not be a good fit. It is also important to note that PaaS is not meant as a replacement for operations personnel. It is a tool that provides a level of automation and a nice alignment layer between developers and ops expectations.
Developers can provide the runtime, scalability and extensibility requirements of their application in a standard format and ops can match these up with their requirements for a sane and known configuration, multi-tenancy and familiar installation.
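One way to picture this alignment is a small check that compares what an application asks for against what ops has approved. Everything here is hypothetical: the AppRequirements structure, the approved lists and the service names are assumptions for illustration and do not reflect OpenShift's real descriptor format.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Hypothetical lists of what the ops team has signed off on.
APPROVED_RUNTIMES = frozenset({"ruby", "jboss", "php"})
APPROVED_SERVICES = frozenset({"mysql", "postgresql"})


@dataclass(frozen=True)
class AppRequirements:
    """What a developer declares about an application (illustrative only)."""
    name: str
    runtime: str
    services: FrozenSet[str] = frozenset()


def review(app: AppRequirements) -> List[str]:
    """Return a list of problems; an empty list means the app fits the platform."""
    problems = []
    if app.runtime not in APPROVED_RUNTIMES:
        problems.append("runtime not approved: " + app.runtime)
    for service in sorted(app.services - APPROVED_SERVICES):
        problems.append("service not approved: " + service)
    return problems


if __name__ == "__main__":
    print(review(AppRequirements("blog", "php", frozenset({"mysql"}))))
    print(review(AppRequirements("legacy", "cobol", frozenset({"oracle"}))))
```

The point is only that both sides talk about the same declared format, so a mismatch shows up before deployment rather than after.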
OpenShift vs OpenStack
OpenShift and IaaS clouds like OpenStack and EC2 provide two different layers of the cloud. OpenStack is an IaaS platform and makes it easy to start OS images on VMs. OpenShift, on the other hand, is a PaaS platform and provides the ability to manage a large number of applications on those VMs. Thus OpenShift Origin can be one of the images that can be started on OpenStack (or on EC2, Rackspace, etc.).
How does OpenShift make your life easier?
Looking from a developer’s point of view…
In OpenShift the code is managed in a Git repository. This is the same technology used by the Linux kernel maintainers for their distributed source control needs. Every application on OpenShift gets its own private Git repository. This allows you to maintain history and easily perform rollbacks if you accidentally push broken code. It also enables a distributed work environment and provides an easy way to publish code (git push).
The runtime requirements are provided by a set of cartridges with a standard set of configurations. The cartridges provide language runtimes such as Ruby, JBoss and PHP, as well as other services like databases and continuous integration functions. Additional cartridges can easily be added for functionality that does not come with base OpenShift.
OpenShift also allows the developer to ssh into the application and check log files and see exactly what processes are running.
From the ops point of view…
The VMs are being utilized efficiently since they are multi-tenant. Applying patches to the OS is easier since there are fewer VMs to manage. The cartridges that are running have a standard set of configurations that can be pre-approved by the ops team.
OpenShift uses SELinux policies, PAM namespaces and bind mounts to ensure that applications don’t have access to unapproved files or processes. Unlike Unix permissions, the SELinux model prevents access to files and processes unless it is explicitly approved in a policy. If an application is compromised, a good set of SELinux policies will make it extremely difficult to compromise the rest of the system or other applications. In my opinion, this is a very big advantage OpenShift has over other PaaS platforms that use Unix permissions and LXC to isolate applications.
Even if you decide to accept the cost of running, patching and maintaining an application per VM, it might be possible to break out of the VM and compromise the host and other VMs unless SELinux is used properly.
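The default-deny idea behind SELinux can be shown with a toy model. This is not real SELinux policy syntax, and the type names below are invented for the example; it only illustrates that an access succeeds when, and only when, a rule explicitly allows it.

```python
# Each rule is (source type, target type, action). Anything not listed is denied.
# The type names are hypothetical and stand in for per-application labels.
ALLOW_RULES = frozenset({
    ("app_a_t", "app_a_file_t", "read"),
    ("app_a_t", "app_a_file_t", "write"),
    ("app_b_t", "app_b_file_t", "read"),
})


def is_allowed(source_type, target_type, action, rules=ALLOW_RULES):
    """Default deny: return True only if an explicit allow rule matches."""
    return (source_type, target_type, action) in rules


if __name__ == "__main__":
    # Application A can read its own files...
    print(is_allowed("app_a_t", "app_a_file_t", "read"))
    # ...but has no rule for application B's files, so access is denied
    # without anyone having to write a "deny" rule.
    print(is_allowed("app_a_t", "app_b_file_t", "read"))
```

A compromised application in this model gains nothing beyond what its own label was explicitly granted, which is the property I am describing above.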
Application resources are also controlled by cgroups, quotas and other modules. This ensures that one application does not hog all the resources on the VM.
Want to give it a try?