Raspberry Pi Cluster — “Why To” Guide

This will be more of a “why to” guide and not so much of a “how to,” and certainly not a step-by-step on the setup of an individual Pi. A good place to start for setting up an individual Pi is the Raspberry Pi site at: https://www.raspberrypi.org

A single Raspberry Pi V3 has enough compute power to be a decent general-purpose Linux box, run a Linux/Apache/MySQL/PHP (LAMP) stack to serve a website, or be a streaming media server by running OSMC. But what a single Pi can’t do is all of the above at the same time while handling a respectable workload.

Back in the day, techies would run a few Linux boxes in their den or closet, usually re-purposed desktop PCs that were once powerful desktop or gaming systems but had become inadequate for those purposes. While you can still do this today, you have to consider the cost of powering a full desktop or gaming PC 24/7. Add the cost of air conditioning, the noise, and the physical footprint these antiquated PCs take up, and it no longer makes sense.

Also, in the event of an extended power outage — such as what we experienced with Hurricane Sandy — you need your infrastructure to draw as little electrical power as possible. Running your services on low-powered Pis instead of antiquated desktops, you can run longer on UPS; and if you end up living on generator power for weeks, your stack of Pis will place a negligible load on the generator.
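Some back-of-envelope arithmetic makes the point. Every wattage below is an assumed round figure for illustration, not a measurement; plug in your own numbers:

```shell
# Back-of-envelope UPS runtime comparison.
# All figures below are assumptions for illustration, not measurements.
ups_wh=500         # usable watt-hours in a mid-size consumer UPS (assumed)
pi_watts=4         # one Pi under load, roughly 3-5 W (assumed)
desktop_watts=150  # an old re-purposed desktop or gaming PC (assumed)

echo "Four Pis:    $(( ups_wh * 60 / (pi_watts * 4) )) minutes on battery"
echo "One desktop: $(( ups_wh * 60 / desktop_watts )) minutes on battery"
```

With these made-up numbers the stack of four Pis rides out an outage nearly ten times longer than a single old desktop on the same UPS.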

I should clarify that the word “cluster” in the context of multiple Pis doesn’t mean one big Linux box whose workload spans all of the Pis. It is simply a handful of Pis with services distributed across them.

I run three Pis, or now I suppose four as of today, with the workloads split out as follows:

  1. One Pi V3 running OSMC acts as the streaming media server for the living room entertainment center.
  2. One Pi V2 running a LAMP stack plus WordPress; it is serving up this page for you right now.
  3. One Pi V2 running RasPBX, a distribution that specializes in making your Pi run Asterisk and FreePBX so you can have a full enterprise-level phone system, complete with voicemail, conference bridges, etc.
  4. My newest Pi V3 will be a general-purpose box and playpen.

Usually you acquire multiple Pis over time: you buy one, then another, then another. Eventually you end up with a pile of Pis just hanging about and a mess of cables. Or perhaps you bought a case for each one, only to find they don’t stack well and are still a mess that is hard to manage.

To solve that problem, I recommend the Dog Bone case by Geaux Robot, sold at Amazon here: https://www.amazon.com/GeauxRobot-Raspberry-Model-4-layer-Enclosure/dp/B00MYFAAPO

This case will accommodate four (4) Raspberry Pis, model 2 or 3. They also offer 2-Pi and 3-Pi versions. It will keep your Pis stacked neatly as a single unit while providing good airflow for cooling, and it looks nice and techy.

Next, you need to consolidate all those power cords. If you bought a power adapter for each Pi, you’ll find they don’t play nicely with a UPS power strip and take up too many outlets. (You do need a UPS; more on that later.) If you buy a powered USB hub that can provide 2A to each port, you can plug that hub into the UPS and plug the Pis into it with USB cables. The net result is that the entire stack of Pis consumes only one outlet on your UPS.

I used this, which is a multi-port USB charging station designed purely for delivering power, rather than a data hub: https://www.amazon.com/gp/product/B0115MVRO4

I condensed my stack of Pis onto this power source, along with three NetGear switches, so all of those devices consume only one outlet on my UPS.

Why do you need a UPS? Because the Raspberry Pi’s SD card can become corrupt if power is cut without a graceful system shutdown. That means if you lose power, your Pi may not boot, and you’ll have to spend time recovering the image — or starting with a fresh one and reconfiguring your services. Besides, your Pis will become an integral part of your home infrastructure, and you don’t want to lose service if it can be avoided.

Before I move off the topic of power, it is important to point out that stable power is critical. If your power source cannot provide at least 1.5A to each Pi, you run the risk of the Pi locking up, crashing, rebooting, etc. Again, this risks corrupting the Pi’s SD card, because the reboot happens without a graceful shutdown.
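Reasonably recent Raspbian firmware lets you check for exactly this condition: `vcgencmd get_throttled` reports a bit field where the low bits flag live under-voltage/throttling and the high bits flag whether either has occurred since boot. A small decoder sketch (bit positions per the Raspberry Pi firmware documentation; treat the script as illustrative):

```shell
# Decode the hex bit field from `vcgencmd get_throttled`.
# On the Pi itself, run it as:
#   decode_throttled "$(vcgencmd get_throttled | cut -d= -f2)"
decode_throttled() {
    val=$(( $1 ))                                     # accepts 0x-prefixed hex
    [ $(( val & 0x1 ))     -ne 0 ] && echo "under-voltage detected right now"
    [ $(( val & 0x4 ))     -ne 0 ] && echo "currently throttled"
    [ $(( val & 0x10000 )) -ne 0 ] && echo "under-voltage has occurred since boot"
    [ $(( val & 0x40000 )) -ne 0 ] && echo "throttling has occurred since boot"
    return 0
}

decode_throttled 0x50000   # example value: past under-voltage and throttling
```

If this reports under-voltage, fix the power supply before blaming the SD card or the software.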

For the network, you really should get a 5- or 6-port gigabit switch: four ports for the Pis and one uplink to the rest of your network. You can, of course, just plug the Pis into an existing switch if you have enough ports available. But segregating the Pis onto their own small switch is cleaner, because you can strap the dog bone case to the switch and handle and manage the whole thing as a single unit. Use very short patch cords, store-bought or homemade.
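Giving each Pi a static address on that segment keeps the switch-plus-stack unit predictable. On Raspbian Jessie and later the usual place for this is `/etc/dhcpcd.conf`; the addresses below are placeholders, so substitute whatever your LAN actually uses:

```
# /etc/dhcpcd.conf -- example addresses only; substitute your own subnet
interface eth0
static ip_address=192.168.1.21/24
static routers=192.168.1.1
static domain_name_servers=192.168.1.1
```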

A word about wireless: avoid it if you can. You can use it, of course, but each Pi will lose signal every so often, or you will end up oversaturating your wireless network. Over the past couple of years I have re-worked my home infrastructure to put all infrastructure back on wired Gig-E, including the networked cameras, desktop PCs, and gaming rigs. Now the only things consuming wireless are the mobile devices, and everyone is happy and fast.

I recommend using the “Raspbian Lite” image instead of the default image, which includes the X Windows GUI and desktop environment. If you install the default, you will play with it just long enough to realize it isn’t a viable desktop environment for everyday use, and then spend time figuring out how to uninstall all that bloat. Install the Lite version and then add only the services you really want and need on top of it.

Typically the only thing I install, other than the packages required for the planned services (Apache httpd, MySQL, etc.), is Webmin, in order to manage and administer the system.
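For reference, a baseline on a fresh Raspbian Lite image looks roughly like the sketch below. The package names are assumptions based on the Raspbian repositories of the time (newer releases ship MariaDB instead of MySQL and different PHP versions), and Webmin is not in the default repos, so it has to be fetched from the Webmin site separately. The script only prints its plan unless you explicitly opt in:

```shell
#!/bin/sh
# Baseline package plan for a LAMP Pi on Raspbian Lite.
# Package names are era-dependent assumptions; verify against your release.
PKGS="apache2 mysql-server php5 libapache2-mod-php5 php5-mysql"

if [ "${APPLY:-0}" = "1" ]; then
    sudo apt-get update
    sudo apt-get install -y $PKGS
    # Webmin is not in the default Raspbian repos; install the package
    # published on webmin.com separately (steps omitted here).
else
    echo "plan: apt-get install $PKGS"
fi
```

Run it once without `APPLY=1` to sanity-check the list, then again with `APPLY=1` to actually install.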

You may have picked up that I do not run firewall services on any of the Pis. I love pfSense, but as far as I am aware it is still next to impossible to get running on an ARM-based Pi. It might be possible these days, but it is just not something I want to spend time on. I run pfSense on a dedicated Zotac C Series CI323, a mini x86 system that is fanless and draws little power.

None of my Pis are directly on the public Internet, but I do have rules on the pfSense firewall to pass the traffic appropriate to each service. Therefore I do not talk much about security here, but it is very important to consider. I personally would not put a Pi directly onto the public Internet, but that doesn’t mean it can’t or shouldn’t be done. If you go that route, be extremely diligent about managing and applying patches and hotfixes, use strong passwords, and run some form of protection like fail2ban.
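If you do expose a Pi, fail2ban is cheap insurance against SSH brute-forcing. A minimal jail fragment, assuming a fail2ban 0.9+ package where the jail is named `sshd` (older Debian packages call it `ssh`); the numbers are just reasonable starting points, not recommendations:

```ini
# /etc/fail2ban/jail.local -- example values, tune to taste
[sshd]
enabled  = true
maxretry = 5
findtime = 600
bantime  = 3600
```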

Cloud!

It is late in the year 2014, and for the past several years the various vendors have been preparing their cloud solutions. Amazon led the charge and pretty much set the industry standard with EC2; now all the vendors are following suit, and the CIOs and CTOs have to figure out how best to integrate cloud into their IT strategy. Those who have built careers by building out huge data centers, strategically placed across the globe with thousands of purchased servers, are finding it difficult to transform.

The idea of cloud is simple: it’s infrastructure as a service (IaaS). You need a server, you go up to the cloud and provision one, and a few minutes later you receive the details and credentials to connect; soon you are logged in and installing the packages needed to support your application stack. You need ten servers? A hundred? No problem! The cloud provides all you need, plus the supporting services: virtual networks, storage, directory services, etc.
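For a concrete picture, here is roughly what “ten servers, please” becomes with Amazon’s CLI. The flags are real `aws ec2 run-instances` options, but the AMI id is a placeholder, a real call also needs credentials, a key pair, and usually a security group, and the command is shown here as a printed plan rather than executed:

```shell
# Illustrative only: prints the provisioning command instead of running it.
# The AMI id is a placeholder, not a real image.
plan='aws ec2 run-instances \
    --image-id ami-xxxxxxxx \
    --instance-type t2.micro \
    --count 10'
echo "$plan"
```

That one command replaces the entire procurement cycle described below.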

Gone are the days of old, where you had to wait months for hardware procurement, followed by weeks for the various infrastructure silos to rack/stack/blast/config your server and hand it off to your AppDev team. Let’s not forget about the weeks that get tagged onto that provisioning cycle for the various approvals and sign-offs.

No, we don’t do that anymore. Now you can chop months off the time-to-market for your great new business-enabling technology idea… or at least that should be the case. The challenge is that the infrastructure camps have resisted embracing the cloud, because that would require them to reinvent pretty much everything they do; they have spent 20+ years learning how to be a server provisioning/deployment shop.

There are valid reasons to pause and do a lot of research and deep planning on your cloud strategy. To successfully integrate the cloud into your IT business model, you will have a lot of issues to resolve in the regulatory/compliance/legal/HR/risk/ITsec areas. The executives in all of those areas know that their best interests are served by not letting data reside on third-party systems — which means a public cloud service is out of the question. Perhaps you can come out of those deep discussions with agreement that data can be assigned to categories, and certain data can reside in a cloud while some can’t; that is good, but you have now created another thing that needs to be policed and audited. Or you might be able to sell the idea of an encrypted VPN to the cloud, plus the cloud vendor agreeing to completely ring-fence your cloud components, which also would need to be policed and audited.

Yes, the cloud solves a lot of problems, but creates some risks and requires a lot of stuff to be thought about and ironed out. The time to do that, obviously, is before putting out the capital expense. You can’t really even decide what kind of cloud you will have until these discussions have happened. It is either going to be 100% in the public cloud (not likely for most companies), or 0% in a public cloud and 100% in an internally grown cloud, or some combination of the two.

The problem with the latter two scenarios, where some form of internal, home-grown cloud is in the strategy, is that the internal cloud needs to be designed, built, deployed, and operated by the same infrastructure folks who have spent the past two decades convincing themselves that a four-month infrastructure procurement and deployment cycle is perfectly fine… and the same infrastructure organization that at this very moment probably can’t provide transparency into the exact makeup of your allocation charges.

I don’t mean to sound critical of infrastructure organizations, but the scenario I just described exists all over the place in the largest corporations across the globe. As a result, senior managers in infrastructure now have to figure out how to build a cloud that is as mature and well-thought-out as Amazon’s EC2 whilst starting a decade behind.

You are competing with Amazon’s mature, battle-tested cloud solution, because your AppDev teams have been playing with it at home, or they came from jobs where they could leverage Amazon’s public cloud. They know that procuring a Windows or Linux box should take about 8 minutes. They know they can expect to be charged 17 cents an hour, and only while the server is powered up. They know they need to design their apps to scale out, so the app knows how to create more servers in the cloud to accommodate workload peaks and scale back down to reduce cost.
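That scale-out reflex boils down to simple arithmetic that the app (or its orchestration layer) runs continuously. A toy sketch, with made-up capacity numbers:

```shell
# Toy capacity planner: how many servers does the current load require?
# Uses ceiling division; the per-server capacity figure is made up.
desired_servers() {
    load=$1          # current load, e.g. requests/second
    per_server=$2    # load one server can absorb (assumed)
    echo $(( (load + per_server - 1) / per_server ))
}

desired_servers 950 100   # peak traffic: 10 servers
desired_servers 120 100   # overnight:    2 servers
```

The hard part is not the arithmetic; it is having a cloud underneath that can actually honor the answer in minutes.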

All of that means your cloud needs to be able to do all of that too: automated provisioning and decommissioning, fully baked cost allocation models, an API, etc. You have to be able to scale the capacity of the underlying hardware accordingly, and make that transparent to the app layer — because a cloud that is “all filled up” and can’t accommodate new provisioning/growth requests for four months, while more SAN shelves and/or hypervisor hosts are procured, would take the business right back to the bad old days.


Tom C