Friday, December 30, 2016

OpenStack Day 4 - Node Prep (IPMI / PXE / Ironic)

Prepare yourself... it's going to be a long, long day.

Up to this point things have actually gone pretty smoothly. This isn't my first foray into deploying a bit of OpenStack. My first attempt used an Ubuntu/Canonical approach built around a bit of Canonical magic called MAAS: Metal As A Service (https://maas.io). It's a pretty slick idea and I was excited to try it. In some ways it seemed like a solution in search of a problem, but I'm geeky enough to want to do it for its own sake.

I failed. My understanding of IPMI (as well as PXE, BMC, etc.) was lacking and I didn't want to take the time to fill that gap just to be able to use MAAS. It wasn't magic enough. In fairness, it probably would be if you were using hardware new enough to actually support IPMI properly.

So IPMI is at the root of it all. What's IPMI? IPMI stands for Intelligent Platform Management Interface. The specification was led by Intel and first published way back in 1998 (that date is important). It defines an autonomous subsystem that provides management and monitoring capabilities independent of the host's CPU, firmware, and OS, letting administrators manage the system out of band over a connection directly to the hardware. In the early days this was a dedicated port, but later most vendors added the capability to one of the on-board Network Interface Cards (NICs), so IPMI shares a NIC with regular Ethernet traffic (so-called side-band rather than out-of-band).

One of the things that IPMI can manage and monitor is power. To do this the host system has an IPMI subsystem called the baseboard management controller (BMC), which is generally powered up and running even if the system is "off". The BMC is the central processor for all IPMI interactions with the various host components. So a rack full of servers controlled by IPMI can be powered up and down as needed, which allows for power savings and potentially longer-lived systems.
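To make that concrete, the sort of thing an administrator (or a tool like Ironic) does over IPMI looks roughly like this from any Linux box with the ipmitool package installed. This is just a sketch; the address and credentials are placeholders:

# Placeholder BMC address and credentials -- substitute your own
BMC="-I lanplus -H 192.168.0.120 -U root -P changeme"

# Ask the BMC for the host's current power state
ipmitool $BMC chassis power status

# Power the host on, ask the OS for a clean shutdown, or force a power cycle
ipmitool $BMC chassis power on
ipmitool $BMC chassis power soft
ipmitool $BMC chassis power cycle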

IPMI is really only available on server-class hardware. Commodity, home-use, or desktop hardware has little need for this type of technology. Vendors have released various implementations such as the Dell DRAC, HP Integrated Lights-Out (iLO), IBM Remote Supervisor Adapter, etc.

Like most things technology-related, IPMI has multiple versions. v1.0, as mentioned, came in 1998. v1.5 was released in 2001 and added IPMI over LAN. v2.0 arrived in 2004 and added VLAN support and some security features. v2.0 rev 1.1 was published in 2014 and added support for IPv6. Finally, v2.0 rev 1.1 errata 7 was published in 2015 with clarifications and a couple of more secure protocols.

And this versioning is really the rub... or rather, the vendor support (or lack thereof). The beauty of standards is that there are so many to choose from, and they often contain optional pieces or fuzzy areas where vendors just make a call. My pile of hardware is largely composed of older Dell servers ranging from Gen 9s (850, 1950, 2950) up to a couple of Gen 11s (R210, R310, R410 and R515). I'm also using a couple of HP ProLiant DL320e-Gen8 servers. Digging through my pile of unused hardware yielded 16 possible machines. More than enough. The specs on some were not great (particularly in the RAM department, and many also only have a single 250G HD), but my thinking was that if the machines are usable, a few replacement components will be cheap compared to entire systems.

When I was using MAAS it completely failed to recognize anything older than the Gen 11 Dell servers. At the time I only had two of those, so rolling out my five-node proof-of-concept cloud wasn't going to happen. So I ditched MAAS and hand-rolled some OpenStack. Brutal.

This time, however, we're using OpenStack baremetal (Ironic) to manage the physical nodes. Since my goal is to learn the ins and outs of OpenStack, figuring out how to work Ironic is most certainly on the agenda in a way that learning MAAS really wasn't.

With my mind firmly on the results of the MAAS experiment, I chose to start with a Dell PE2950. This is a Generation 9 machine circa 2006 (keep in mind IPMI v2.0 was published in 2004). If this node works then probably everything is going to work. If it's even a little bit shaky then it'll be the hardest to get working and everything else will be easier afterward. Right?

On with the show.

We have our undercloud director up and running. It has an externally accessible IP on interface 2 and an IP address of 192.168.50.1/24 on its first interface. We deliberately chose the first interface (eth0, gig0, g0, whatever) as the Management/PXE interface because a lot (all?) of the Dell DRAC implementations only support IPMI on the first interface.

So I add a management switch to the mix and plug in port 1 from the undercloud director and port 1 of my first node: the Dell PE2950.


To get IPMI working we really only need three pieces of information: the IP address of the node and the login credentials (username/password). Boot up the 2950 and go into Setup. In Setup we want to ensure that virtualization is enabled on the CPU, that we will be PXE booting off the first interface, and that PXE is first in the boot order. Next we go through the boot sequence again and press Ctrl-E when the BMC/IPMI setup prompt appears. I define a static IP address of 192.168.50.7, netmask 255.255.255.0 and gateway of 192.168.50.1. I set the username to 'root' (because it was the default and why confuse it) and the password to 'OpenStack' (to be changed later, but we aren't even sure if this is going to work yet). Save everything and power off the 2950.
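As an aside: if the box already has an operating system on it, most of this BMC configuration can also be done in-band through ipmitool's 'open' interface instead of catching Ctrl-E at boot. A sketch, assuming the ipmitool package is installed and the IPMI kernel modules are loaded; LAN channel 1 and user ID 2 are typical on these Dell BMCs but not guaranteed:

# Static IP settings on LAN channel 1
sudo ipmitool lan set 1 ipsrc static
sudo ipmitool lan set 1 ipaddr 192.168.50.7
sudo ipmitool lan set 1 netmask 255.255.255.0
sudo ipmitool lan set 1 defgw ipaddr 192.168.50.1

# Password for user ID 2 (usually 'root' on a Dell BMC)
sudo ipmitool user set password 2 OpenStack

# Read the settings back to confirm
sudo ipmitool lan print 1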

Note: You should recall from Day 2 (http://www.thefullstacky.com/2016/12/openstack-day-2-deploying-undercloud.html) that I'm under the impression I can use 192.168.50.2 - 192.168.50.79 for IPMI. The .7 choice was a bit random and I like 7s.

SSH into the undercloud director. From the command prompt we should be able to ping the BMC of the new node because the DRAC/BMC should be powered up even if the host is "off".

$ ping 192.168.50.7
PING 192.168.50.7 (192.168.50.7) 56(84) bytes of data.
64 bytes from 192.168.50.7: icmp_seq=1 ttl=64 time=0.928 ms
64 bytes from 192.168.50.7: icmp_seq=2 ttl=64 time=0.489 ms
64 bytes from 192.168.50.7: icmp_seq=3 ttl=64 time=0.440 ms
^C
--- 192.168.50.7 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.440/0.619/0.928/0.219 ms

So far so good.

To register the nodes with baremetal we'll create a JSON file containing the required information. We know the IP, username and password, so we just need to provide a pm_type. The standard type is 'pxe_ipmitool', which uses the ipmitool utility; the TripleO docs recommend that driver for just about everything. There is also 'pxe_ilo', which is recommended for HP Gen 8/9 machines, and 'pxe_drac', which they recommend for Gen 11 and newer Dell systems. The PE2950 is Dell Gen 9 so we'll go with pxe_ipmitool. Our final JSON file looks like this (the 'name' tag is optional; it just makes me happy, and it's easier to relate to the hardware tag than to an IP address or a giant UUID).

$ cat instackenv.json
{
        "nodes": [
                {
                        "pm_type":"pxe_ipmitool",
                        "pm_addr":"192.168.50.2",
                        "pm_user":"root",
                        "pm_password":"OpenStack",
                        "name":"TAG-203"
                }
    ]
}

Run the JSON validator to make sure we haven't made any silly errors:

$ json_verify < instackenv.json
JSON is valid
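If json_verify isn't on the box (it comes with the yajl package we installed back on Day 2), Python's built-in json.tool module makes a serviceable stand-in:

$ python -m json.tool < instackenv.json > /dev/null && echo "JSON is valid"
JSON is valid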

Cross fingers and toes and run the import. First we'll re-source the stackrc file just to make sure we have all of our environment variables set up properly. When we launch the import it actually starts an OpenStack Mistral workflow... pretty cool stuff.

$ . ~/stackrc
$ openstack baremetal import instackenv.json
Started Mistral Workflow. Execution ID: 1d3fe9c5-35e7-4463-a2d4-b17ed7365630
Successfully registered node UUID 1dbb723e-d9a5-4431-b77a-6969588355ff
Started Mistral Workflow. Execution ID: 8716cf03-4fae-45d0-8408-6f2117ab0344
Failed to set nodes to available state:  IronicAction.node.set_provision_state failed: <class 'ironicclient.common.apiclient.exceptions.BadRequest'>: The requested action "provide" can not be performed on node "1dbb723e-d9a5-4431-b77a-6969588355ff" while it is in state "enroll".

*sigh* And so it begins...
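For what it's worth, a node sits in 'enroll' until Ironic can verify its power credentials, and once the underlying power problem is sorted you can drive the state machine by hand with the baremetal node commands (exact spelling varies a bit between client versions, so treat this as a sketch):

$ openstack baremetal node manage 1dbb723e-d9a5-4431-b77a-6969588355ff
$ openstack baremetal node provide 1dbb723e-d9a5-4431-b77a-6969588355ff

Of course, none of that helps while the credential check itself is what's failing, which brings us to ipmitool.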

The 'pxe_ipmitool' driver uses the ipmitool utility... and you can also run that tool manually from the command line, which is pretty interesting.

$ ipmitool -H 192.168.50.7 -U root -P OpenStack -N 5 channel info
Activate Session command failed
Error: Unable to establish LAN session

Now that is pretty frustrating because Dell documentation says that should just work.

The ipmitool supports a few different IPMI Interfaces:
  • open – Linux OpenIPMI interface
  • imb – Intel IMB
  • lan – IPMI v1.5 LAN interface
  • lanplus – IPMI v2.0 RMCP+ LAN interface

I'm not sure what the default is. Running it with -I lan produces the same results. Dell Gen 9 is supposed to support IPMI v2.0, so using -I lanplus should be the trick.

$ ipmitool -I lanplus -H 192.168.50.7 -U root -P OpenStack -N 5 channel info
Error: Unable to establish IPMI v2 / RMCP+ session

Nope. I've seen multiple examples of this working on the Internet so it is doubly confusing.
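If you're fighting the same battle, two ipmitool knobs worth poking at before blaming the hardware outright are verbosity and an explicit cipher suite (older BMCs tend to be picky about which RMCP+ suites they'll speak):

# Crank up the debug output to see exactly where session setup dies
ipmitool -vvv -I lanplus -H 192.168.50.7 -U root -P OpenStack channel info

# Try a specific cipher suite -- 0 through 3 are the common ones on older gear
ipmitool -I lanplus -C 0 -H 192.168.50.7 -U root -P OpenStack channel info
ipmitool -I lanplus -C 3 -H 192.168.50.7 -U root -P OpenStack channel info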

To rule out hardware I replaced the 2950 with a 1950 and configured it in the same fashion. Struck out in exactly the same way. Possibly firmware versions, but some of the articles that show this working are pretty old. Tried a settings reset on the IPMI config, still with no luck. Double-checked to make sure I hadn't set an RMCP+ key. Nope.

Okay then. So Ironic actually provides two other drivers that are considered "testing" drivers:

  • fake_pxe provides stubs instead of real power and management operations. When using this driver, you have to conduct power on and off operations, and set the current boot device, yourself.
  • fake provides stubs for every operation, so that Ironic does not touch hardware at all.
My belief is that by using these I'll get the hardware working even without IPMI. I'm okay with having to push the power button myself when necessary. Just recently a fellow blogger put up a quick article on using fake_pxe in just this fashion: TripleO: Using the fake_pxe driver with Ironic.
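For reference, a fake_pxe entry in instackenv.json drops the power credentials entirely; the main thing Ironic needs is the MAC of the PXE NIC so it can match the node when it boots. Something along these lines (the MAC and sizing values here are made up):

                {
                        "pm_type":"fake_pxe",
                        "mac":["00:21:9b:aa:bb:cc"],
                        "cpu":"4",
                        "memory":"8192",
                        "disk":"250",
                        "arch":"x86_64",
                        "name":"TAG-XXX"
                }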

We'll go down this path in the near future, but the next step is to move on to newer hardware to see what's going to work and what isn't... in other words, to understand the scope of what we'll be faking. Pushing a power button or 3, no worries. Pushing 16... ick. If there's enough working hardware then fake_pxe will be delayed until we experiment with adding additional Compute nodes and/or support for Swift/Cinder/Ceph.

The Dell Gen 10 hardware proved slightly better but still not working. The Gen 11 hardware worked perfectly... with the pxe_ipmitool driver. It didn't work with the pxe_drac driver, which is the recommended driver for Gen 11 and newer. Similarly, the HP Gen 8 worked fine with pxe_ipmitool but not with pxe_ilo. In the end, out of the 16 machines I started with, only 6 made the final cut after ruling out IPMI issues and bad hardware (I pulled this stuff out of a stack... parts are going to be bad).

So my final line up looks like this:
(1) Dell R210
(2) Dell R310
(1) Dell R410
(1) Dell R515
(1) HP ProLiant DL320e-Gen8

I had initially been running the undercloud director on the Dell R210; however, I'll be re-installing the director on one of the Gen 9 or Gen 10 machines since the director doesn't require IPMI, and that'll be one less machine I have to fake. That brings me up to 7 machines in the initial cluster. I'm thinking one Controller, 3 Ceph and 2 Compute... we'll see. But for now that means we delay playing with the fake_pxe driver.

$ cat instackenv.json
{
        "nodes": [
                {
                        "pm_type":"pxe_ipmitool",
                        "pm_addr":"192.168.50.2",
                        "pm_user":"root",
                        "pm_password":"OpenStack",
                        "name":"TAG-203"
                },
                {
                        "pm_type":"pxe_ipmitool",
                        "pm_addr":"192.168.50.3",
                        "pm_user":"root",
                        "pm_password":"OpenStack",
                        "name":"TAG-207"
                },
                {
                        "pm_type":"pxe_ipmitool",
                        "pm_addr":"192.168.50.4",
                        "pm_user":"root",
                        "pm_password":"OpenStack",
                        "name":"TAG-183"
                },
                {
                        "pm_type":"pxe_ipmitool",
                        "pm_addr":"192.168.50.5",
                        "pm_user":"root",
                        "pm_password":"OpenStack",
                        "name":"TAG-206"
                },
                {
                        "pm_type":"pxe_ipmitool",
                        "pm_addr":"192.168.50.6",
                        "pm_user":"root",
                        "pm_password":"OpenStack",
                        "name":"TAG-202"
                }
        ]
}

$ openstack baremetal import instackenv.json
Started Mistral Workflow. Execution ID: 14f7d053-e749-4abc-999a-70f3de2f1de8
Successfully registered node UUID fef86621-9491-48af-b5c6-2104bc88a7fc
Successfully registered node UUID 5769ec4d-181e-4e1d-87dd-a6e3891ecf6d
Successfully registered node UUID 75554bfa-b300-48c3-b6d8-ce3f68c67859
Successfully registered node UUID cde855a9-0188-4912-a2c2-06dc55e582f7
Successfully registered node UUID 93df3756-9f61-47ce-b12b-c5f2b3ab846f
Started Mistral Workflow. Execution ID: 93b3f11c-8e8b-4081-91bf-d6723cd58b81
Successfully set all nodes to available.

$ openstack baremetal node list
+--------------------------------------+---------+---------------+-------------+-----------------+-------------+
| UUID                                 | Name    | Instance UUID | Power State | Provision State | Maintenance |
+--------------------------------------+---------+---------------+-------------+-----------------+-------------+
| fef86621-9491-48af-b5c6-2104bc88a7fc | TAG-203 | None          | power off   | available       | False       |
| 5769ec4d-181e-4e1d-87dd-a6e3891ecf6d | TAG-207 | None          | power off   | available       | False       |
| 75554bfa-b300-48c3-b6d8-ce3f68c67859 | TAG-183 | None          | power off   | available       | False       |
| cde855a9-0188-4912-a2c2-06dc55e582f7 | TAG-206 | None          | power off   | available       | False       |
| 93df3756-9f61-47ce-b12b-c5f2b3ab846f | TAG-202 | None          | power off   | available       | False       |
+--------------------------------------+---------+---------------+-------------+-----------------+-------------+


And that's that. I've started some of the Gen 9 and Gen 10 servers running the big CD-based driver updates for their various models. Might help. Might not. Usually good to be current.

Next up is the introspection phase, where IPMI will tell the machines to power on and they will PXE boot a bit of software that evaluates each node's capabilities. This allows for automatic classification of servers into certain roles, if you choose to go that route. Otherwise it's just good info to see, and it proves that everything is working properly. We'll briefly discuss flavors (part of node classification in this context), and that'll be it for the prep phase; then we'll finally move on to deploying the Overcloud.
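If you want to peek ahead, the Newton-era commands for that phase look roughly like this (we'll walk through them properly next time):

$ . ~/stackrc
$ openstack baremetal introspection bulk start
$ openstack baremetal introspection bulk status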

Have a great New Year, everyone. And if you have thoughts on my ongoing IPMI issues with Dell Gen 9/10 hardware, please let me know.

Saturday, December 24, 2016

Happy Holidays Everyone!

Taking a few days off to spend time with the family and recharge. I also have a half dozen courses queued up on EdX, Udemy and the Linux Foundation. It'll be a great few days off. There'll be at least one more post before the end of the year where we discuss the wonders of IPMI as utilized by OpenStack Baremetal/Ironic and how the fake drivers could be so, so much better (if we care and I'm not sure we do).

Happy Holidays everyone!

Wednesday, December 21, 2016

OpenStack Day 3 - Creating the Boot Images

Today I had intended to create boot images and get the nodes registered and introspected. Sadly, my old friend IPMI had other plans. That's a story I'll share in the next post because it really deserves its own spotlight.

This will be an easy one.

So when your Overcloud machines boot up they will PXE boot from the Undercloud director. In order for that to happen we need to make sure the boot images are available, and this was not done as part of the undercloud deployment.

The TripleO documentation recommends having the latest images, and specifically warns that images for previous OpenStack releases may not function properly. You can download these images from http://buildlogs.centos.org/centos/7/cloud/x86_64/tripleo_images/newton/delorean/ or you can build them yourself. The TripleO docs say "It's recommended to build images on the installed undercloud directly since all the dependencies are already present." English is a slippery language. Does this mean "If you're going to build them, building them on the undercloud is recommended" or does it mean "Building your own is recommended, and when you do it you should do it on the undercloud"?

For me this is at least partially a learning exercise, and building them seems like it will be interesting. The only real options are: "include Ceph?" and "Whole Disk Images?" I'll be using Ceph so that part is pretty easy. Whole Disk Images (http://tripleo.org/advanced_deployment/whole_disk_images.html) I don't have an opinion on yet, so for now I'm going to skip that.

A couple of notes before we get to it:

NOTE#1: Your undercloud machine requires at least 10G of RAM (preferably 16G) before it will use tmpfs for building these images. If you don't have that, it'll build on the physical drives and that will probably take longer. Maybe lots longer.
NOTE#2: The TripleO documentation for the image build command includes two '--config-file' options. That doesn't appear to be an option in the version of Newton I'm using, so I just left those out and went with '--all'. Maybe good. Maybe bad.

# Build configuration (per the TripleO Newton docs)
export STABLE_RELEASE="newton"
export OS_YAML="/usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos7.yaml"

# Point diskimage-builder at the delorean and Ceph Jewel repos on the undercloud
export DIB_YUM_REPO_CONF="/etc/yum.repos.d/delorean*"
export DIB_YUM_REPO_CONF="$DIB_YUM_REPO_CONF /etc/yum.repos.d/CentOS-Ceph-Jewel.repo"

# Build everything, then upload the results into Glance
openstack overcloud image build --all
openstack overcloud image upload

Time passes... and finally:

"Successfully build all request images"

In undercloud.conf I had the image directory as the default "." so it dumped the images in ~stack.
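If you'd rather the images land somewhere other than the stack user's home directory, undercloud.conf has an image_path option for that (this is from memory, so double-check the option name in the sample file):

image_path = /home/stack/images

Either way, here's what landed in my home directory: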

drwxrwxr-x. 3 stack stack 26 Dec 19 19:09 ironic-python-agent.d
-rw-rw-r--. 1 stack stack 354615883 Dec 19 19:11 ironic-python-agent.initramfs
-rwxr-xr-x. 2 stack stack 5393328 Dec 19 19:11 ironic-python-agent.vmlinuz
-rwxr-xr-x. 2 stack stack 5393328 Dec 19 19:11 ironic-python-agent.kernel
-rw-rw-r--. 1 stack stack 544162 Dec 19 19:11 dib-agent-ramdisk.log
drwxrwxr-x. 3 stack stack 26 Dec 19 19:49 overcloud-full.d
-rwxr-xr-x. 1 root root 5393328 Dec 19 19:49 overcloud-full.vmlinuz
-rw-r--r--. 1 root root 43240180 Dec 19 19:49 overcloud-full.initrd
-rw-r--r--. 1 stack stack 1222114816 Dec 19 19:57 overcloud-full.qcow2
-rw-rw-r--. 1 stack stack 2657590 Dec 19 19:57 dib-overcloud-full.log

and a couple of files in /httpboot
$ ls -al /httpboot/

total 351584
drwxr-xr-x.  2 ironic           ironic                  66 Dec 20 14:23 .
dr-xr-xr-x. 20 root             root                  4096 Dec 20 17:36 ..
-rwxr-xr-x.  1 root             root               5393328 Dec 20 14:23 agent.kernel
-rw-r--r--.  1 root             root             354615883 Dec 20 14:23 agent.ramdisk
-rw-r--r--.  1 ironic-inspector ironic-inspector       456 Dec 19 15:17 inspector.ipxe

Next step is to upload the images.

$ . ~/stackrc && openstack overcloud image upload
Image "overcloud-full-vmlinuz" was uploaded.
+--------------------------------------+------------------------+-------------+---------+--------+
| ID                                   | Name                   | Disk Format | Size    | Status |
+--------------------------------------+------------------------+-------------+---------+--------+
| 53dc9ea9-0909-4b68-b042-b3385d0ae186 | overcloud-full-vmlinuz | aki         | 5393328 | active |
+--------------------------------------+------------------------+-------------+---------+--------+
Image "overcloud-full-initrd" was uploaded.
+--------------------------------------+-----------------------+-------------+----------+--------+
| ID                                   | Name                  | Disk Format | Size     | Status |
+--------------------------------------+-----------------------+-------------+----------+--------+
| f7e18615-5b27-4d08-973f-81af3be8dc22 | overcloud-full-initrd | ari         | 43240180 | active |
+--------------------------------------+-----------------------+-------------+----------+--------+
Image "overcloud-full" was uploaded.
+--------------------------------------+----------------+-------------+------------+--------+
| ID                                   | Name           | Disk Format | Size       | Status |
+--------------------------------------+----------------+-------------+------------+--------+
| 16681e9d-348d-4af0-97e4-0fe6a8ae89f6 | overcloud-full | qcow2       | 1222114816 | active |
+--------------------------------------+----------------+-------------+------------+--------+
Image "bm-deploy-kernel" was uploaded.
+--------------------------------------+------------------+-------------+---------+--------+
| ID                                   | Name             | Disk Format | Size    | Status |
+--------------------------------------+------------------+-------------+---------+--------+
| bac48742-2159-470a-b813-19b5ba43936a | bm-deploy-kernel | aki         | 5393328 | active |
+--------------------------------------+------------------+-------------+---------+--------+
Image "bm-deploy-ramdisk" was uploaded.
+--------------------------------------+-------------------+-------------+-----------+--------+
| ID                                   | Name              | Disk Format | Size      | Status |
+--------------------------------------+-------------------+-------------+-----------+--------+
| d3588fa2-1826-4975-9768-bcd674ed0958 | bm-deploy-ramdisk | ari         | 354615883 | active |
+--------------------------------------+-------------------+-------------+-----------+--------+

$ openstack image list
+--------------------------------------+------------------------+--------+
| ID                                   | Name                   | Status |
+--------------------------------------+------------------------+--------+
| d3588fa2-1826-4975-9768-bcd674ed0958 | bm-deploy-ramdisk      | active |
| bac48742-2159-470a-b813-19b5ba43936a | bm-deploy-kernel       | active |
| 16681e9d-348d-4af0-97e4-0fe6a8ae89f6 | overcloud-full         | active |
| f7e18615-5b27-4d08-973f-81af3be8dc22 | overcloud-full-initrd  | active |
| 53dc9ea9-0909-4b68-b042-b3385d0ae186 | overcloud-full-vmlinuz | active |
+--------------------------------------+------------------------+--------+

And with that we are good to go on the image front.

Next up.. the heartbreak of trying to use old hardware and how IPMI doesn't love it.

Monday, December 19, 2016

OpenStack Day 2 - Deploying the Undercloud

Finally ready to start pushing some buttons.

I took a final read through the TripleO documentation (http://tripleo.org/index.html) to make sure I understood the pre-deployment activities, and read all the way through looking for any last-minute gotchas ("But first remove the fuse!" from M*A*S*H, Season 1, Episode 20). I'd recommend those of you following along at home do likewise. Or, if you're pressed for time, at least read through the architecture section (http://tripleo.org/introduction/architecture.html) so the terminology makes sense.

TripleO can deploy into virtual environments (using VMs as targets). I'm targeting physical machines and will stick to those sections of the install document.

After reading the docs I made the following hardware cutoff for sorting through my stack of unused hardware:

  • Multi-core CPU
  • 4 GB memory
  • 60 GB Free Disk Space
    • machines running Ceilometer need a separate partition for MongoDB to avoid running out of space on the root partition...
  • 2 Gigabit NIC
  • Overcloud machines:
    • All Overcloud machines must support IPMI (Ironic supports a number of drivers: pxe_ipmitool (IPMI), pxe_ilo (HP Proliant Gen 8 and Gen 9), pxe_drac (Dell 11G and newer), fake_pxe (stubs - won't do power management), fake (won't do anything - ironic won't touch the hardware)) ... there are others: http://docs.openstack.org/developer/ironic/#driver-references
    • OverCloud: The NIC used for PXE boot should be the only one enabled for network booting and it should be at the top of the boot order (ahead of local disk and CD)
  • Ceph machines should have 1 GB of RAM per 1 TB of storage, and it's recommended that the OS be on a separate drive from the data for performance. So consider 2 drives a minimum.

A number of other questions come up during deployment and so I made some initial decisions:

  • We will not deploy with SSL initially
  • We will not deploy network isolation (the ability to create isolated networks within OpenStack). Network isolation is cool but it also increases complexity for not much gain in my use case. Start simple and then complicate as needed.
  • We will be deploying with Ceph. Arguably a complication but I really like everything I've read about Ceph.
    • How many nodes????
  • We will be deploying with Swift (so glance can store images in swift rather than directly on the controller)
  • We will be deploying only a single controller. This is a Single Point of Failure; however, a High Availability solution requires 3 controllers and we haven't identified a sufficient quantity of hardware. This is one of the first areas that should be addressed during scale-out.
  • Magnum ... I really wanted to use Magnum as well (for containers) but again it's not needed for my initial use case so... later.

I plan to deploy: One Undercloud Director; One Overcloud Controller; At least one compute node but really as many as I can get (we'll know after hardware introspection); Three (3) Ceph nodes (1 monitor, 2 OSD nodes); 2 Networks (the PXE/Management network and the External/Public network). After that we'll discover how easy it is to scale... or not.

Let's get to it.

I downloaded a copy of CentOS 7 (Minimal - 1511 release). The 1611 release has just come out, but I don't want anything too shiny and new because it introduces variables. 1511 was the available release when the TripleO documentation was updated for the OpenStack Newton release, so I'm going that direction.

After booting up the install media on my chosen Undercloud Director hardware (Dell R210) I went through the install quickly. I was rather surprised to find that the minimal install required a mouse but I had a USB mouse on hand and just plugged it in. I think I could have gone through the Boot with Troubleshooting option and bypassed the mouse requirement but it looked even more painful.

Since this is the director node I went with a standard LVM disk layout with a couple of changes:

  1. /home seems to get the bulk of the space, and since we won't be logging in with regular users I reduced this down to 100G. This may end up being a bad move because we'll be storing images in /home/stack... we'll see.
  2. Bumped the / partition up to 500G
  3. Added a /mongo partition for later use with ceilometer and let it claim the remaining space (273G).

For the network setup I left Gigabit 1 (em1) off. I enabled em2 and let it DHCP.

I assigned a root password and created a local user account for myself (with Admin privileges).

When the installation completed I let the system reboot. Once it had rebooted I used putty to SSH into it using my local user account. So far so good.

Here's where I hit my first problem. Some of the software that TripleO wants to install conflicts with a package already installed by the minimal installer. Argh. So I wiped and started over, repeating the above configuration steps and bringing myself back to this point, where we can repeat with the benefit of our future knowledge.

Here are the steps I've taken to deploy my Undercloud. Substitute your hostname for 'myhost.mydomain', and when editing /etc/hosts (I use 'vi'; use whatever you want) make sure to put an entry for that host on your 127.0.0.1 and ::1 lines. The 'sudo yum erase -y mariadb-libs' line clears up that package conflict we mentioned.

# Create the stack user that will own the undercloud
sudo useradd stack
sudo passwd stack  # specify a password
echo "stack ALL=(root) NOPASSWD:ALL" | sudo tee -a /etc/sudoers.d/stack
sudo chmod 0440 /etc/sudoers.d/stack

# Set the hostname and add it to the 127.0.0.1 and ::1 lines in /etc/hosts
sudo hostnamectl set-hostname myhost.mydomain
sudo hostnamectl set-hostname --transient myhost.mydomain
sudo vi /etc/hosts

# Remove the conflicting package, then add the delorean (RDO Newton) and Ceph Jewel repos
sudo yum erase -y mariadb-libs
sudo curl -L -o /etc/yum.repos.d/delorean-newton.repo \
https://trunk.rdoproject.org/centos7-newton/current/delorean.repo
sudo curl -L -o /etc/yum.repos.d/delorean-deps-newton.repo \
http://trunk.rdoproject.org/centos7-newton/delorean-deps.repo
sudo yum -y install --enablerepo=extras centos-release-ceph-jewel
sudo sed -i -e 's%gpgcheck=.*%gpgcheck=0%' /etc/yum.repos.d/CentOS-Ceph-Jewel.repo
sudo yum -y install yum-plugin-priorities

# Install the TripleO client and the yajl JSON tools, then switch to the stack user
sudo yum install -y python-tripleoclient
sudo yum install -y yajl
sudo su -l stack
cp /usr/share/instack-undercloud/undercloud.conf.sample ~/undercloud.conf
vi undercloud.conf



I have reserved 192.168.50.0/24 for use with the Undercloud PXE/Management network. I'm tentatively making the following changes to undercloud.conf (it's possible I could have deployed with just a change to local_interface, but that would have used a weird default IP block).


local_ip = 192.168.50.1/24
local_interface = em1
network_gateway = 192.168.50.1
network_cidr = 192.168.50.0/24
masquerade_network = 192.168.50.0/24
dhcp_start = 192.168.50.100
dhcp_end = 192.168.50.250
inspection_iprange = 192.168.50.80,192.168.50.99


I think this says:
  • The director is 192.168.50.1/24 on em1
  • The assigned network gateway for the management network should be .1
  • The whole network is 192.168.50.0/24
  • The masquerade Network is the same
  • The DHCP range is 192.168.50.100 - 192.168.50.250
  • The Introspection IP range is 192.168.50.80 - 192.168.50.99

I'm interpreting this as meaning the IPs I can assign to my IPMI interfaces are 192.168.50.2 - 192.168.50.79.

Or so I hope. It might be a while in the future before I discover the error of my ways here...
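One sanity check worth filing away for after the install: the ctlplane subnet that gets created should carry these ranges as its allocation pool, and you can read it back with something like this (the subnet ID comes from the network listing):

$ . ~/stackrc
$ openstack network list
$ openstack subnet show <subnet-id-from-ctlplane>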

Make the changes and save ~stack/undercloud.conf

Finally the moment we've been waiting for today:

$ openstack undercloud install

You can make changes and re-run this command right up until you have an Overcloud deployed. After that, it's a badness.

If all goes well you should receive a happy message and a couple of files will get created.


#############################################################################
Undercloud install complete.
The file containing this installation's passwords is at
/home/stack/undercloud-passwords.conf.
There is also a stackrc file at /home/stack/stackrc.
These files are needed to interact with the OpenStack services, and should be
secured.
#############################################################################

At this point our undercloud should be usable. We can run a few tests to find out. I should note that I received a LOT of warnings, mainly dealing with the use of deprecated commands. My early guess is that some of their scripting (probably in Puppet) hasn't been updated for Newton (the latest version of OpenStack at this time) and it's not a high priority since they're only warnings. Again... we'll see.

[stack@ostack-director ~]$ source stackrc
[stack@ostack-director ~]$ openstack network list


+--------------------------------------+----------+--------------------------------------+
| ID                                   | Name     | Subnets                              |
+--------------------------------------+----------+--------------------------------------+
| 5465303c-736e-4c3b-926a-4f791c265b3e | ctlplane | ae23dfb5-cb87-4475-adc9-ec82062db8af |
+--------------------------------------+----------+--------------------------------------+


[stack@ostack-director ~]$ openstack compute service list

+----+----------------+-----------------+----------+---------+-------+----------------------------+
| ID | Binary         | Host            | Zone     | Status  | State | Updated At                 |
+----+----------------+-----------------+----------+---------+-------+----------------------------+
|  1 | nova-cert      | ostack-director | internal | enabled | up    | 2016-12-19T20:50:55.000000 |
|  2 | nova-scheduler | ostack-director | internal | enabled | up    | 2016-12-19T20:51:00.000000 |
|  3 | nova-conductor | ostack-director | internal | enabled | up    | 2016-12-19T20:50:56.000000 |
|  4 | nova-compute   | ostack-director | nova     | enabled | up    | 2016-12-19T20:50:55.000000 |
+----+----------------+-----------------+----------+---------+-------+----------------------------+

$ ifconfig -a
br-ctlplane: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.50.1  netmask 255.255.255.0  broadcast 192.168.50.255


$ cat /etc/sysconfig/network-scripts/ifcfg-em1
# This file is autogenerated by os-net-config
DEVICE=em1
ONBOOT=yes
HOTPLUG=no
NM_CONTROLLED=no
PEERDNS=no
DEVICETYPE=ovs
TYPE=OVSPort
OVS_BRIDGE=br-ctlplane
BOOTPROTO=none
MTU=1500


So there's something interesting in there... DEVICETYPE=ovs. Yum. Lots of discussion on that coming.
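To see what os-net-config actually built, ovs-vsctl (which comes along with the Open vSwitch that the undercloud install pulls in) will show em1 hanging off the br-ctlplane bridge. Roughly:

$ sudo ovs-vsctl show
$ sudo ovs-vsctl list-ports br-ctlplane
# expect to see em1 listed as a port on br-ctlplane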

Next up: Getting the other hardware registered with Ironic.

OpenStack Day 1 - Choosing an Install

According to their website, "OpenStack software controls large pools of compute, storage, and networking resources throughout a datacenter, managed through a dashboard or via the OpenStack API." That's a pretty long sentence, but in the end it means cloud, without resorting to what is almost an overused, and certainly ill-defined, buzzword.

I want to deploy a pool of compute, storage, and networking resources, and OpenStack offers me a way to control and manage this pool. OpenStack is also very widely used by some pretty serious companies (Best Buy, PayPal, and Comcast to name a few). It's actively developed and gets new releases a couple of times a year.

The downside... unsurprisingly, OpenStack is complex. It isn't a single application. It's an entire ecosystem of applications and options. There are a number of open source projects that fall under the OpenStack umbrella. The most popular/useful/mature get pulled into the Core. At present there are 6 core Services: Nova (for Compute), Neutron (for networking), Keystone (for identity), Glance (Image Service), Swift (Object Storage) and Cinder (Block Storage). There are also ~13 Optional services.

There are a lot of ways to install OpenStack ranging from all-in-one installs (where virtual machines or containers are used to create an OpenStack environment on a single machine) to full blown metal installs of hundreds of nodes in a number of geographic locations. My needs fall somewhere in the middle. I want to deploy a small proof of concept network that will allow me to easily replace nodes with better hardware and to just add more nodes when I need additional compute or storage resources. Initially I just want to utilize all of the unused hardware that I have just laying around. As things prove out we'll budget in replacements and boost performance.

I looked at the OpenStack website. I was initially drawn to the Install Guide for Ubuntu. I read through it and it was very hands on. Perhaps too hands on. There was a lot of room for error. A bit of googling about led me to https://www.ubuntu.com/cloud/openstack which offers a faster way to get up and running using their Autopilot software. This method essentially has a MAAS (Metal As A Service) host which will use IPMI and PXE to configure your physical hosts. And it's at this point that I completely struck out. The hardware I'm using has some pretty shaky IPMI implementations and I couldn't find good workarounds for MAAS nor did I want to spend the time learning MAAS when I wanted to be deploying OpenStack.

A little side track on IPMI. IPMI is an acronym for the Intelligent Platform Management Interface. This is a computer interface specification for an autonomous computer subsystem that provides management and monitoring capabilities independently of the host's CPU, firmware and operating system (above definition courtesy of Wikipedia). A BMC (baseboard management controller) is a specialized service processor that is the main controller for IPMI and provides the intelligence and the physical interfaces to other components and sub-systems. There have been 5 versions of the IPMI specification, beginning with v1.0 in 1998. v2.0 was published in 2004 and has had two minor updates since then (2014, which added IPv6 support; and 2015, which added additional security protocols). On server-level hardware IPMI is implemented in the DRAC on Dell hardware and within the iLO on HP hardware. Yep. At the base of your cloud is physical hardware. At the base of all cloud is physical hardware.

Moving on...

A bit more research led me to TripleO (tripleo.org). TripleO (OOO) is OpenStack On OpenStack: we install OpenStack on a single machine (called the Undercloud), including the optional Ironic component which is used to handle iron (bare-metal servers). Like MAAS it's going to use IPMI and PXE, however it includes dummy drivers to get around some broken implementations, and further... learning it is learning OpenStack, since it's a component. Seems a win. The only downside, from my perspective, is having to use CentOS instead of Ubuntu. Another serious bonus for me is that Puppet is used to provision the nodes into their roles (Compute, Object Storage, Controllers, etc...). I use Puppet a lot so this means more visibility under the hood.

TripleO is undergoing a lot of changes (like all things OpenStack). It's entirely possible that future versions will rely much more heavily on containers (and that won't be a bad thing).

TripleO, at least as of the time of this writing, will use Nova, Ironic, Neutron, Heat, Glance and Ceilometer to deploy OpenStack on bare-metal hardware. This deployment is the usable result and is called the Overcloud.

The steps should proceed pretty much like this:

1. Install CentOS 7 on the server that will become the Undercloud director
2. Deploy the Undercloud
3. Configure IPMI on the remaining hardware (hardcode an IPMI/BMC IP address, username and password)
4. Register the hardware to Ironic
5. Allow Ironic to deploy the introspection image. This image gathers additional information about the hardware and performs some light benchmarking. The results of the introspection will make it easier to programmatically decide which hardware is right for which roles.
6. Tag hardware for roles
7. Deploy the OverCloud
8. Observe the monitoring and operations software
9. Backup the director

At least that's the plan... next up: Deploying the Undercloud

Wednesday, December 14, 2016

Hello World!

Hello World!

And that takes care of tradition. Welcome to The Full Stacky. The name of this blog is clearly a play on The Full Monty. The Full Monty is a splendid film and may give you a bit of advance insight into what to expect here. Just like Gaz and the boys, we have a goal and we'll be doing the needful to achieve it. Sometimes it won't be pretty but it'll usually be entertaining. We'll be learning a lot along the way and we'll be letting it all hang out... within the bounds of good taste.

Full Stack in the industry has a special meaning. For developers we refer to various solution stacks (such as the LAMP stack, which consists of Linux, Apache, MySQL, and Perl/PHP/Python). Wikipedia has a good overview of the common solution stacks at https://en.wikipedia.org/wiki/Solution_stack. A Full Stack Developer (capitals definitely required) is an individual who is conversant in every element of the solution stack of choice.

Being a Full Stack Developer requires a good breadth of skills and depending on the complexity and range of the stack they are rare... unicorn rare in some cases. Lots of articles have been written on the necessity, and the myth, of the full stack developer. Go forth and enjoy. Start with this one since it's humorous: http://andyshora.com/full-stack-developers.html

Almost there... I'm not all that developer-y anymore. I did my Computer Science at Purdue and then went out into the real world and discovered that I really enjoyed networking just a little more. Time passes and virtualization becomes the new hotness. With an explosion of servers comes better management and tools so we start getting into configuration management. Tools like Puppet, Chef, Ansible, etc... allow for configuration of servers using a bit of code. Uh oh.. headed back to developer land. In recent days this trend is headed full blown into a little paradigm known as "Infrastructure as Code" where we manage and provision our computing infrastructure (bare-metal servers, virtual servers, networking, etc.) using code. Very, very cool. And that's my stack.

I'll be touching on configuration management tools (or "Continuous Configuration Automation" (CCA) Tools using current buzz-nacular), Cloud computing, security, applicable hardware and, since Infrastructure as Code is a key best practice for DevOps, we'll definitely be delving into DevOps.

I'm about to begin a production implementation of OpenStack (private cloud) so I expect the next several posts will focus entirely in that direction. Stay tuned...