Thursday, June 8, 2017

OpenStack Day 7 - Overcloud Validation, Post-Deployment and Testing


It's alive!

At the end of the overcloud deployment you should have been given the IP address of the controller. You can also see it if you log in to the Undercloud UI (at the bottom):


[Note: even though we'll be destroying and redeploying a number of times I'm making an effort to get into the habit of not disclosing passwords.]

You should be able to access the dashboard of the new controller if you have access to the network on which it's deployed (via routing, a workstation on the subnet, or the squid proxy we set up earlier). See the TripleO-UI Detour for thoughts on getting remote access to that network.

If you point a browser at http://{controller_IP}/ (e.g. http://192.168.50.108/ in the above) it should redirect to the dashboard (i.e. http://192.168.50.108/dashboard/) where you'll be presented with a login. The login credentials from the Undercloud UI should work fine as will the credentials from the overcloudrc file (see below).
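If you want a quick sanity check from the director before opening a browser, something like this should show the redirect (the IP is from my environment; substitute your controller's):

$ curl -sI http://192.168.50.108/ | grep -i '^location'

If the Location header points at /dashboard/ the web front end is at least answering.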

The overcloud deployment created a file called 'overcloudrc'. This will be in the home directory of the 'stack' user (/home/stack) on the OpenStack Director if you've been following along. By sourcing that file we will be able to use the openstack command-line tools from the director to manage the overcloud. Not our end goal, but a good first step along the validation path.

[Note: You can switch back to the undercloud at any time by re-sourcing the 'stackrc' file created during the undercloud deployment.]
[Note2: If you tried to get fancy and deploy from the Undercloud UI this file will not be created... you should create it yourself, as having it will save you some heartache later. The contents of the file are listed below and the only real substitutions will be the IP addresses and OS_PASSWORD.]


$ source ~/overcloudrc
$ cat ~/overcloudrc
export OS_NO_CACHE=True
export OS_CLOUDNAME=overcloud
export OS_AUTH_URL=http://192.168.50.108:5000/v2.0
export NOVA_VERSION=1.1
export COMPUTE_API_VERSION=1.1
export OS_USERNAME=admin
export no_proxy=,192.168.50.108,192.168.50.108
export OS_PASSWORD=_REMOVED_
export PYTHONWARNINGS="ignore:Certificate has no, ignore:A true SSLContext object is not available"
export OS_TENANT_NAME=admin
Now all this does is change some environment variables so that when you run OpenStack commands they check in with the overcloud Keystone instead of the undercloud (Director) Keystone. And, of course, it provides the correct credentials for authenticating with that Keystone.
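As a quick smoke test (these are stock OpenStack CLI commands, nothing TripleO-specific), you can ask the overcloud what it knows about. If these return sensible lists you're talking to the overcloud's Keystone and not the Director's:

$ source ~/overcloudrc
$ openstack service list
$ openstack endpoint list
$ openstack compute service list
$ openstack network agent list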

This presupposes that the deployment worked. If it didn't, then begins a long cycle of figuring out what went wrong, destroying the deployment and re-deploying. As I said, we're going to destroy and re-deploy a few times, so more on that later. As far as debugging the deployment goes, you have the usual resources: logs and experience. At this point you'll find that both are a little lacking. If you have a monitor connected to the overcloud devices you can see, and possibly eliminate, a number of problems such as not PXE booting, not being powered on by IPMI, etc...
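For what it's worth, when a deployment does fail my first pass is Heat on the undercloud; a rough sketch (resource names will vary, and 'openstack stack failures list' needs a reasonably recent client):

$ source ~/stackrc
$ openstack stack list
$ openstack stack failures list overcloud
$ openstack stack resource list --nested-depth 5 overcloud | grep -i failed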

Mine didn't fail this time, fortunately, but it's worth keeping a short list of resources handy for when it does.

But our overcloud deployment worked, so let's move onward and make sure we can do something clever with it. First things first: there are some post-deployment steps that need to be taken, and they require a bit of forethought. This is another area where I think the TripleO documentation falls short (ditto for the Red Hat Director documentation): they talk about these steps but don't explain the ramifications.

Post-Validation

Just like we ran a pre-deployment validation we can also run the post-deployment validation:

$ source stackrc
$ openstack workflow execution create tripleo.validations.v1.run_groups '{"group_names": ["post-deployment"]}'

Just like we did with the pre-deployment validations, we then use 'openstack task execution list' to see the status of the various processes and 'mistral task-get-result {ID}' to look at the details.
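In practice that boils down to something like this (the grep is just to surface the failures; {ID} comes from the task list):

$ openstack task execution list | grep -iE 'error|failed'
$ mistral task-get-result {ID}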

[Note: One of my validations gave me an error stating that the overcloudrc file didn't exist. I "fixed" this by changing the permissions on the /home/stack directory: 'chmod 755 /home/stack'. I'd guess this is because Mistral is running tasks as the 'mistral' user, which didn't have access to /home/stack with its default permissions (0700). OTOH, there are files in this directory that I really shouldn't want other users seeing/using (credentials for the undercloud and overcloud), so I think it's prudent to set the permissions back to 0700 after the validation suite completes.]
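In other words (this is the workaround from my environment, not a recommendation):

$ chmod 755 /home/stack      # let the mistral user read overcloudrc
$ # ... run the post-deployment validations ...
$ chmod 700 /home/stack      # lock it back down afterwards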

My installation also got dinged on: the MySQL open files limit (why was this not set correctly by Puppet?), the HAProxy configuration ('timeout queue' in defaults is 2m, but must be set to 1m... again, why did Puppet not get this?), controller ulimits (set to 1024, should be 2048 or higher), and NTP clock synchronization (ntpstat exited with 'unsynchronised'). I also got dinged on the pacemaker cluster check, but since I only deployed a single controller I'm ignoring that one.

OK. So I've got four to fix and all of them require getting a shell on the affected systems (and they aren't all on the same system). Good times. I can ssh onto the systems or plug in consoles. But in either case I'm going to need some credentials.

Bizarrely it seems you can access the whole system w/out any credentials:

$ ssh heat-admin@192.168.50.101

No password required, and sudo works without a password as well. (The deployment injects the undercloud 'stack' user's ssh key into the 'heat-admin' account on each overcloud node, which is why this works from the Director.)

File this as more reason to protect the OpenStack Director machine.
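Once on a node, checking the flagged items is quick (paths are the stock ones on these images; adjust for your environment):

$ ntpstat                                                  # should report 'synchronised'
$ sudo grep 'timeout' /etc/haproxy/haproxy.cfg             # look for 'timeout queue' in the defaults section
$ sudo grep 'open files' /proc/$(pidof -s mysqld)/limits   # effective MySQL open-files limit (if mysqld is running)
$ ulimit -n                                                # this shell's file limit, for comparison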

If you look into /etc/puppet/modules on one of the nodes you can see the Puppet modules that were used to set up the various files and services. Sadly, they explicitly set things to the values that cause the validations to fail :( 'ulimit' doesn't seem to get set anywhere except within each service, which is actually correct; you can check the limits in the unit files under /etc/systemd/system/. I'm ignoring this validation.
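For example, to see what limit the database service is actually configured with and whether any drop-ins override it (unit name assumed here; adjust if yours differs):

$ sudo systemctl show mariadb | grep -i '^LimitNOFILE'
$ ls /etc/systemd/system/*.service.d/ 2>/dev/null          # any drop-in overrides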

In any case, I fixed the validations and they ran clean. Onward. For reference, the RDO Project has some tweaks: Tunings and Tweaks. These go along with the possible limits in base configurations.


Post-Deployment


The first thing the TripleO docs have you do is set up the overcloud network. They do this without really explaining what the options are or why you'd do it in this fashion. What are flat, vlan, gre and vxlan, for instance? To truly understand this we'd need to go down the software-defined networking rabbit hole, and that's out of scope at the moment. We'll need to go there. Just maybe not yet.

What we need to know is what we want to end up with and what it takes to get there. For my purposes I just want the instances I spin up to be able to connect to the outside world (to pull packages or data), and I want to be able to access them so that I can manage them or let Jenkins manage them. I don't need, at this point, to create more than a single network. So for me a 'flat' network would work just fine. If you want to be able to create isolated networks then you might want to go with 'vlan', which causes neutron to assign a VLAN ID to each network.

'gre' and 'vxlan' are similar enough that we can discuss them together. Both are "overlay" networks. They give you VLAN-like functionality by encapsulating the Layer 2 traffic, creating a tunnel between similarly encapsulated endpoints. The advantage is that you don't need to sync up your "VLAN" IDs with your physical network devices (e.g. switches). The disadvantage is that the encapsulation has overhead, which may cause packets to be split (fragmented) in two and degrade network performance.
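To make that concrete, here is roughly what the create commands look like for each type. The names, physical network label and VLAN ID below are made up for illustration; the physical network label has to match whatever your neutron plugin is actually configured with:

# flat: one untagged network mapped directly onto a physical network
$ openstack network create ext-net --provider-network-type flat \
  --provider-physical-network physnet1
# vlan: each network gets its own VLAN tag on the physical network
$ openstack network create tenant-vlan --provider-network-type vlan \
  --provider-physical-network physnet1 --provider-segment 101
# vxlan: overlay/tunnelled, no physical network mapping required
$ openstack network create tenant-vxlan --provider-network-type vxlan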

Hope you're not lost yet. The next question is one of "floating IP addresses". Normally when you spin up an instance within OpenStack it is given a local IP from its network range. You can set up routers and outbound goodies to allow outbound NAT. However, you have no way of accessing the instance from the outside. That's where a "floating IP" comes in. A floating IP belongs to the public network and has an inbound NAT that allows you to connect to an instance.

It is possible to go super simple and allocate IP space from your public network for your internal provider network on OpenStack. That would give instances public IPs, so they wouldn't require floating IPs. That does simplify things in some situations, but it also means instances on OpenStack would be on your "public" network (which could be an internal or DMZ network; "public" is contextual, as in public from the OpenStack standpoint). Maybe good. Maybe bad. Using an internal network and floating IPs gives you the option of being accessible or not.
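For reference, once a "public" external network exists, the floating IP dance looks something like this (the server name and address are placeholders):

$ openstack floating ip create public
$ openstack server add floating ip my-instance 172.16.23.141
$ openstack floating ip list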

Let's revisit what our physical layout currently looks like.


Ok. So the first thing I'm going to do is to create a "public" network and flag it as external.

$ source overcloudrc
$ openstack network create public --external --provider-network-type flat \
--provider-physical-network datacenter

Then I'll assign an IP range. Since this is my public range I don't want to enable a DHCP server, as it may conflict with the DHCP server already on that network. Which is all great, except that when I run it I get: "HttpException: Bad Request". That's not informative, so we slap a '--debug' on and:

RESP BODY: {"NeutronError": {"message": "Invalid input for operation: physical_network 'datacenter' unknown for flat provider network.", "type": "InvalidInput", "detail": ""}}

Better. The docs did say it was an example command, that it assumes a dedicated interface (or native VLAN), and that it should be modified to match the local environment. OK, I don't have a provider physical network called 'datacenter', so yeah. I do have an open interface on nic2/en02. However, the command won't work specifying either of those interfaces either.
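A useful check here is to see what physical network labels neutron actually knows about, which means looking at the plugin config on the controller (stock file locations; adjust for your install). Whatever shows up in bridge_mappings/flat_networks is what '--provider-physical-network' has to match:

$ ssh heat-admin@192.168.50.101
$ sudo grep -r bridge_mappings /etc/neutron/plugins/ml2/
$ sudo grep -r flat_networks /etc/neutron/plugins/ml2/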

If I go to Horizon and try to create this network as an admin, from the admin context, it also fails. If I do it from a project context it succeeds, but it also defaults to network type vxlan.

$ openstack subnet create --allocation-pool start=172.16.23.140,end=172.16.23.240 \
--network public --gateway 172.16.23.251 --no-dhcp --subnet-range \
172.16.23.128/25 public

This says to create a subnet with a pool starting at 172.16.23.140 and ending at 172.16.23.240, with a gateway of 172.16.23.251 and a subnet range (weird wording) of 172.16.23.128/25. We say not to use DHCP, and we attach this subnet to the "public" network. This absolutely implies that my external network (from the standpoint of the OpenStack deployment) is 172.16.23.128/25.
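A quick sanity check once the network and subnet are in place:

$ openstack network show public
$ openstack subnet show public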

A network labeled "public" is required in order to do the next set of validations so we can more or less leave it here for a minute while we do those.




Comments

  1. Hi,
     So you have created only the public network. I tried the same, and also created a (random) private network.

     Within the private network the VMs are reachable, but they are not reachable from the host, and not even reachable from other hosts using the public IP.

     Can you please help me with this?

    Replies
    1. I wouldn't expect the private network to be accessible from the host network. The private network sets up a separate network and there is no gateway between them. Even your "public" network is reliant on 3rd party gateways to make itself accessible. You can see my frustration with the TripleO documentation above. They say to do things but don't provide the necessary understanding. TripleO, paradoxically, is more useful once you understand OpenStack. That makes it useful as a fast deployment stack for developers. For the rest of us I think it makes more sense to not use such a tool and to build from scratch and understand it; or go to the opposite extreme and use a commercial product where the understanding is moved to better UI and commercial support.

      In short, you need to add routers and gateways to your network to make them accessible and to allow them to interact.
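      Roughly, that looks like this (the names here are placeholders, not from your environment):

      $ openstack router create demo-router
      $ openstack router set demo-router --external-gateway public
      $ openstack router add subnet demo-router private-subnet
      $ openstack floating ip create public
      $ openstack server add floating ip my-vm {floating_IP}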
