Tuesday, June 13, 2017

OpenStack Reboot Part Deux



OpenStack Rebooted... Part Deux

(spoiler alert - this ends badly)

Day 4 - Node Prep

OK. As mentioned, there have been some hardware updates. The biggest changes are the addition of a couple of R510s loaded up with drives to act as Ceph nodes and a couple of R620s to increase our compute node count.

The first time we did Day 4 it became obvious that using the Ironic pxe_drac driver wasn't all that great for Gen11 servers, even though it was the recommended choice. There's a good slide deck from Red Hat on troubleshooting Ironic (http://dtantsur.github.io/talks/fosdem2016/#/6) that has a great quote on this:
Ironic has different drivers. Some hardware is supported by more than one of them.
Make sure to carefully choose a driver to use: vendor-specific drivers (like pxe_drac or pxe_ilo) are usually preferred over more generic ones (like pxe_ipmitool). Unless they don't work :)
So there's that. They are preferred if they work.  Since I'm throwing a few Gen12 nodes into the mix I tried the pxe_drac driver on them and it seems to have worked so far (knock on silicon). Everything else I've left as pxe_ipmitool.
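If a node does get imported with the wrong driver, it can be switched in place afterwards rather than re-importing. A sketch (normally the pm_type field in instackenv.json determines this at import time):

$ openstack baremetal node set {UUID/NAME} --driver pxe_ipmitool
$ openstack baremetal node show {UUID/NAME} -c driver -c driver_info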

The 'openstack baremetal import' command is deprecated now. The new hotness is:

$ openstack overcloud node import instackenv.json
Waiting for messages on queue 'e5a76db8-d9d3-4563-a6d0-e4487cfd60ea' with no timeout.
Successfully registered node UUID d4fb130b-84e2-49de-af8a-70649412d9d3
Successfully registered node UUID e33069b8-e757-44b5-89cc-9b6fd51c2d47
Successfully registered node UUID d625ea11-4f67-4e29-958e-9b7c6e55790e
Successfully registered node UUID c5acfdfa-993b-482e-9f58-a403bf1fc976
Successfully registered node UUID 13d99f7f-567d-496c-8892-57066f23fcc2
Successfully registered node UUID 42dc02a2-ebe7-461d-95d6-a821248b4a33
Successfully registered node UUID 4dbf1ed1-864e-4fbb-886b-38c473d3a371
Successfully registered node UUID 3d9be490-7a86-4bd5-b299-3377b790ef8a
Successfully registered node UUID 62c7b754-5b52-40e8-9656-69c102a273ff

[stack@ostack-director ~]$ openstack baremetal node list

+--------------------------------------+---------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name    | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+---------+---------------+-------------+--------------------+-------------+
| d4fb130b-84e2-49de-af8a-70649412d9d3 | TAG-203 | None          | power off   | manageable         | False       |
| e33069b8-e757-44b5-89cc-9b6fd51c2d47 | TAG-201 | None          | power off   | manageable         | False       |
| d625ea11-4f67-4e29-958e-9b7c6e55790e | TAG-625 | None          | power off   | manageable         | False       |
| c5acfdfa-993b-482e-9f58-a403bf1fc976 | TAG-202 | None          | power off   | manageable         | False       |
| 13d99f7f-567d-496c-8892-57066f23fcc2 | TAG-626 | None          | power off   | manageable         | False       |
| 42dc02a2-ebe7-461d-95d6-a821248b4a33 | TAG-627 | None          | power off   | manageable         | False       |
| 4dbf1ed1-864e-4fbb-886b-38c473d3a371 | TAG-628 | None          | power off   | manageable         | False       |
| 3d9be490-7a86-4bd5-b299-3377b790ef8a | TAG-629 | None          | power off   | manageable         | False       |
| 62c7b754-5b52-40e8-9656-69c102a273ff | TAG-630 | None          | power off   | manageable         | False       |
+--------------------------------------+---------+---------------+-------------+--------------------+-------------+


And just like the first time through... so far so good.

Starting with OpenStack Newton it has been possible to add '--enroll' to the import command so that nodes enter the 'enroll' state rather than the 'manageable' state. This, in turn, lets you selectively move some nodes to the 'manageable' state for introspection. You can also one-shot it during import with '--introspect --provide', which runs introspection and sets the final state to 'available'.

On to Introspection.

Day 5 - Introspection

This actually brings us into 2017 which feels like progress.

This should "just work" but I have a couple of doubts:
1. I didn't wipe the drives on a couple of nodes from the previous setup; will they PXE boot properly?
2. The R510s are all using UEFI instead of BIOS. Is that even an issue?
3. The Ceph nodes have multiple drives. The TripleO docs have a warning: "If you don't specify the root device explicitly, any device may be picked. Also the device chosen automatically is NOT guaranteed to be the same across rebuilds. Make sure to wipe the previous installation before rebuilding in this case." So there's that. That would be an Advanced Deployment (see the sketch just after this list): https://docs.openstack.org/developer/tripleo-docs/advanced_deployment/root_device.html#root-device
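
On point 3, the root device hint itself is just a property on the Ironic node, set from whatever the introspection data reports for that node's disks (so it happens after a first introspection pass). A sketch, with a made-up serial number:

$ openstack baremetal introspection data save {UUID} | jq '.inventory.disks'   # assumes jq is installed
$ openstack baremetal node set {UUID} --property root_device='{"serial": "61866da04f380d001ea4e13c121e4d38"}'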

None of our nodes are in maintenance mode (that column is all false). All are listed as 'manageable'.

Ironic Inspector sets up a DHCP+iPXE server listening for requests from the bare metal nodes.

Also new with Ocata: you can run a suite of pre-introspection validations.

openstack workflow execution create tripleo.validations.v1.run_groups '{"group_names": ["pre-introspection"]}'

Getting results from this is a little more complex. In my opinion the easiest way is to:

$ openstack task execution list | grep RUNNING

When that returns no results, the workflow is finished and we can look for ERRORs:

$ openstack task execution list | grep run_validation | grep ERROR

If there are no errors, you win, move along. If there are errors we can take a closer look.

$ mistral task-get-result {ID}

{ID} is the first column of the task execution list. This should point you in the right direction.
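
If you'd rather not poll by hand, a quick shell loop does the same thing. Something like this (a sketch):

$ while openstack task execution list | grep -q RUNNING; do sleep 10; done
$ for id in $(openstack task execution list | grep run_validation | grep ERROR | awk '{print $2}'); do mistral task-get-result $id; done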

Back to introspection. The 'bulk start' way we did it previously is also gone. We have a couple of options. We can stay with the bulk introspection with:

$ openstack overcloud node introspect --all-manageable

This does exactly what you think. It runs introspection on all nodes in the 'manageable' provisioning state. Optionally we can slap a '--provide' on the end to automatically put nodes in the 'available' state after introspection. (Nodes have to be 'available' before they can be deployed into the overcloud.)

Alternately, we can do them one node at a time, which we'd do if we're paranoid about any particular node succeeding. Still another option is to run the bulk introspection and then re-do some nodes individually. To do individual nodes:

$ openstack baremetal node manage {UUID/NAME}
$ openstack baremetal introspection start {UUID}
$ openstack baremetal introspection status {UUID}
$ openstack baremetal node provide {UUID}
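
And if you end up doing several nodes individually, a quick way to eyeball where everything stands (a sketch):

$ for node in $(openstack baremetal node list -f value -c UUID); do echo "=== $node ==="; openstack baremetal introspection status $node; done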

I'm going to bulk run it and then troubleshoot any failures. This is a Mistral workflow, so you can use a technique similar to the one for monitoring the validation workflow to watch progress. I'd do that in another shell so as not to potentially interrupt anything.

$ openstack overcloud node introspect --all-manageable --provide
Started Mistral Workflow tripleo.baremetal.v1.introspect_manageable_nodes. Execution ID: c9c0b86a-cb3c-49dd-8d80-ec91798b00bb
Waiting for introspection to finish...
Waiting for messages on queue '1ee5d201-012c-4a3a-8332-f63c49b655f3' with no timeout.
.
.

And... everything fails.

OK. Troubleshooting some IPMI? I expected that. Troubleshooting a bit of PXE? Yep. But here the introspection image just keeps cycling and claims it has no network.

*sigh* So... reality time. So far, on OpenStack deployments with TripleO I've spent 95% of the actual frustration time on IPMI, PXE and assorted Ironic/bare metal problems. And here's the thing... none of that has anything to do with my final OpenStack cloud! I really wanted to work it this way because it seems somehow elegant... clever. But it really isn't worth the time and effort of beating down problems in technologies that aren't bringing anything to the final solution.

Decision time: continue on with TripleO using the 'Deployed Server' methodology, or switch to a different deployment method entirely?

Deployed Server essentially means I pre-install CentOS, pre-set the networking (no undercloud providing Neutron-based DHCP), install the package repositories, install the python-heat-agent packages and then invoke 'openstack overcloud deploy'... sort of. The documentation gets a bit sketchy at this point on where we specify the various IP addresses. And in the end... what will the undercloud be bringing to the party?

Alternate install options: OpenStack-Ansible (looks like it prefers Ubuntu and utilizes LXC), Puppet (I know Puppet pretty well), kolla-ansible, or going it manually.

Decisions, decisions... What I do know is that TripleO has too many moving parts and not enough soup-to-nuts walkthroughs. When it works, it just works. When it doesn't, you're left clueless.


Thursday, June 8, 2017

OpenStack Reboot

OpenStack Reboot

And... time passes. Wow. I was getting back into it and noticed that I had left Day 7 as a draft. I just clicked the publish button, but that's more of a historical reminder for me than anything useful. Time has passed. The age of Newton is gone and OpenStack is now (June 2017) all Ocata. I attended the OpenStack Summit in Boston and there are a lot of great plans for Pike, Queens and Rocky already in motion.

I made a lot of undocumented progress on the previous iteration and we are pushing past proof of concept. There are still some unsettled issues but we'll address them as they come. Rather than attempting to upgrade the existing setup I'm going to wipe everything and start clean. I'll be going through the first part pretty fast, just noting where the process has changed from Newton, and going through the same "day" type setup so it's easy to reference backward and forward. This may be a tad lengthy and I'm not going to be format obsessed... just trying to catch up.

Day 1 - Still going with TripleO

My reasoning is unchanged. OpenStack-Ansible is coming along nicely and there's a lot to be said for using Puppet directly as well. However, in the end I'm sticking with TripleO. There are also a couple of new goodies in TripleO that I was unaware of (shocking, right?). These are:
  • Composable Roles - Rather than a single monolithic template, each service is encapsulated in a separate template allowing you to easily select which services are deployed/enabled on a particular role. There's a composable roles tutorial available: http://tripleo-docs.readthedocs.io/en/latest/tht_walkthrough/tht_walkthrough.html
  • Split Stack (Deployed Server) - The basic idea is to split the deployment into two parts: 1) deploying bare metal nodes without any software configuration; 2) applying software configuration to deploy OpenStack on those already-deployed bare metal nodes (using the deployed-server templates, probably). The upshot from my viewpoint is that a lot of the BMC/IPMI issues may be dodgeable, because you can deploy the bare metal manually and then let TripleO pick things up for the second half. At least that's the theory. James Slagle did a deep dive on deployed-server: see https://openstack.nimeyo.com/112516/openstack-tripleo-deep-dive-thursday-1400-deployed-server

Day 2 - Deploying the Undercloud

Pretty much the exact same process on the exact same director hardware. Since this will be a more production-ish deploy some of the decisions have changed. Specifically I'll be deploying 3 controllers, 3 compute nodes and 3 ceph OSD nodes.

Install CentOS 7 on the hardware. Once again I'm leaving Gigabit 1 (eno1) unconfigured in favor of eno2, which has the DHCP client enabled. 'yum update' everything up to the latest and greatest. Unlike the Newton install, the Ocata install had no package conflicts.

sudo yum -y upgrade
sudo shutdown -r now

sudo useradd stack
sudo passwd stack  # specify a password for the stack user
echo "stack ALL=(root) NOPASSWD:ALL" | sudo tee -a /etc/sudoers.d/stack
sudo chmod 0440 /etc/sudoers.d/stack
su -l stack # from here out we'll do things as the stack user

sudo hostnamectl set-hostname myhost.mydomain
sudo hostnamectl set-hostname --transient myhost.mydomain
sudo vi /etc/hosts # add the hostname to the 127.0.0.1 line

sudo curl -L -o /etc/yum.repos.d/delorean-ocata.repo \
https://trunk.rdoproject.org/centos7-ocata/current/delorean.repo

sudo curl -L -o /etc/yum.repos.d/delorean-deps-ocata.repo \
https://trunk.rdoproject.org/centos7-ocata/delorean-deps.repo

sudo yum -y install --enablerepo=extras centos-release-ceph-jewel
sudo sed -i -e 's%gpgcheck=.*%gpgcheck=0%' /etc/yum.repos.d/CentOS-Ceph-Jewel.repo

sudo yum -y install yum-plugin-priorities
sudo yum install -y python-tripleoclient

cp /usr/share/instack-undercloud/undercloud.conf.sample ~/undercloud.conf

The Ocata docs reference an online configuration wizard for undercloud.conf. It's available at http://ucw-bnemec.rhcloud.com/ and is pretty sparse, but it does "generate sane values for a number of the important options". I ended up with this, which I pasted into the bottom of my undercloud.conf file.

# Config generated by undercloud wizard
# Use these values in undercloud.conf
[DEFAULT]
undercloud_hostname = ostack-director.mydomain.com
local_interface = eno1
local_mtu = 1500
network_cidr = 192.168.50.0/24
masquerade_network = 192.168.50.0/24
local_ip = 192.168.50.1/24
network_gateway = 192.168.50.1
undercloud_public_host = 192.168.50.78
undercloud_admin_host = 192.168.50.79
undercloud_service_certificate =
generate_service_certificate = False
scheduler_max_attempts = 20
dhcp_start = 192.168.50.100
dhcp_end = 192.168.50.200
inspection_iprange = 192.168.50.80,192.168.50.99
# Deprecated names for compatibility with older releases
discovery_iprange = 192.168.50.80,192.168.50.99
undercloud_public_vip = 192.168.50.78
undercloud_admin_vip = 192.168.50.79
Finally...
$ openstack undercloud install
.
.
#############################################################################
Undercloud install complete.
The file containing this installation's passwords is at
/home/stack/undercloud-passwords.conf.
There is also a stackrc file at /home/stack/stackrc.
These files are needed to interact with the OpenStack services, and should be
secured.
#############################################################################
And a quick peek around at some of the networking...
[stack@ostack-director ~]$ source stackrc
[stack@ostack-director ~]$ openstack network list
+--------------------------------------+----------+--------------------------------------+
| ID                                   | Name     | Subnets                              |
+--------------------------------------+----------+--------------------------------------+
| fd15b684-de12-4da0-a438-8e9600dace3d | ctlplane | d036ddb7-de9a-46fe-a4c7-e81e6bb08619 |
+--------------------------------------+----------+--------------------------------------+
[stack@ostack-director ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master ovs-system state DOWN qlen 1000
    link/ether 00:1e:c9:2c:b9:88 brd ff:ff:ff:ff:ff:ff
3: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:1e:c9:2c:b9:8a brd ff:ff:ff:ff:ff:ff
    inet 10.92.3.49/18 brd 10.92.63.255 scope global dynamic eno2
       valid_lft 1282290sec preferred_lft 1282290sec
    inet6 fe80::21e:c9ff:fe2c:b98a/64 scope link
       valid_lft forever preferred_lft forever
4: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether d2:73:a4:4d:3d:dd brd ff:ff:ff:ff:ff:ff
5: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether 00:1e:c9:2c:b9:88 brd ff:ff:ff:ff:ff:ff
    inet 192.168.50.1/24 brd 192.168.50.255 scope global br-ctlplane
       valid_lft forever preferred_lft forever
    inet6 fe80::21e:c9ff:fe2c:b988/64 scope link
       valid_lft forever preferred_lft forever
6: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
    link/ether 02:42:d5:da:d2:ff brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
7: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 1a:a7:ef:d9:54:40 brd ff:ff:ff:ff:ff:ff
[stack@ostack-director ~]$ ip netns
qdhcp-fd15b684-de12-4da0-a438-8e9600dace3d
[stack@ostack-director ~]$ sudo ip netns exec qdhcp-fd15b684-de12-4da0-a438-8e9600dace3d ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
8: tap6dde9c03-b9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN qlen 1000
    link/ether fa:16:3e:15:cd:64 brd ff:ff:ff:ff:ff:ff
    inet 192.168.50.100/24 brd 192.168.50.255 scope global tap6dde9c03-b9
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe15:cd64/64 scope link
       valid_lft forever preferred_lft forever


Day 6.5 - TripleO UI

Skipping ahead here a bit, but it's worth it to get more validation that our install is good so far. The Puppet scripts bind port 3000 to the 192.168.50.1 IP address we specified in undercloud.conf. Makes sense, but not what I want. A quick look at the previous Day 6.5 will show you installing and configuring a proxy on the external/DHCP address. We do that again here. We can't just modify the httpd configuration because it will want access to Keystone on that same IP, and that starts us down a path we don't want to follow. Squid will work for testing and later I'll just put in some routing. Either way, the goal here is to get to where you can point your browser at http://ostack-director.mydomain.com:3000
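
For reference, the Squid piece amounts to roughly this (a sketch; the allowed source subnet here is the 10.92.0.0/18 management network that eno2 sits on, so adjust to your environment):

$ sudo yum -y install squid
$ sudo vi /etc/squid/squid.conf     # add, above the 'http_access deny all' line:
                                    #   acl mgmt_net src 10.92.0.0/18
                                    #   http_access allow mgmt_net
$ sudo systemctl enable squid
$ sudo systemctl start squid
$ sudo iptables -I INPUT -p tcp --dport 3128 -j ACCEPT     # squid's default proxy port

With a plain forward proxy like this you point the browser's proxy setting at ostack-director.mydomain.com:3128 and then browse to http://192.168.50.1:3000; hitting http://ostack-director.mydomain.com:3000 directly takes a reverse-proxy or routing arrangement instead, which is where I'm headed later anyway.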

Winner. The username and password can be found in the stackrc file:

OS_USERNAME=admin
OS_PASSWORD=$(sudo hiera admin_password)

$ sudo hiera admin_password
d9b9e6db075ce13d3f2a2d83bc41e8d11039223e

Day 3 - Creating the Boot Images

export DIB_YUM_REPO_CONF="/etc/yum.repos.d/delorean*"
export DIB_YUM_REPO_CONF="$DIB_YUM_REPO_CONF /etc/yum.repos.d/CentOS-Ceph-Jewel.repo"
export STABLE_RELEASE="ocata"
openstack overcloud image build --all
....time passes....
openstack overcloud image upload
Image "overcloud-full-vmlinuz" was uploaded.
Image "overcloud-full-initrd" was uploaded.
Image "overcloud-full" was uploaded.
Image "bm-deploy-kernel" was uploaded.
Image "bm-deploy-ramdisk" was uploaded.

$ openstack image list

| 9fc6c13f-a8a7-4b57-9cb0-5b7bc54d8611 | bm-deploy-kernel       | active |
| 0a4af73e-d026-431c-839b-8bac39980113 | bm-deploy-ramdisk      | active |
| 19b2546a-d2f2-4962-8c0c-72b68bb75311 | overcloud-full         | active |
| 5ca4e749-f0e7-498d-9e6b-c4bbac3492bc | overcloud-full-initrd  | active |
| 7342db7a-569d-4927-8f15-b69a90a80b57 | overcloud-full-vmlinuz | active |

And so ends Day 3 and the easy days...


OpenStack Day 7 - Overcloud Validation, Post-Deployment and Testing


It's alive!

At the end of the overcloud deployment you should have been given the IP address of the controller. You can also see it if you log in to the undercloud UI (at the bottom):


[Note: even though we'll be destroying and redeploying a number of times I'm making an effort to get into the habit of not disclosing passwords.]

You should be able to access the dashboard of the new controller if you have access to the network on which it's deployed (via routing, a workstation on the subnet, or the Squid proxy we set up earlier). See the TripleO-UI detour for thoughts on getting remote access to that network.

If you point a browser at http://{controller_IP}/ (e.g. http://192.168.50.108/ in the above) it should redirect to the dashboard (i.e. http://192.168.50.108/dashboard/) where you'll be presented with a login. The login credentials from the Undercloud UI should work fine as will the credentials from the overcloudrc file (see below).

The overcloud deployment created a file called 'overcloudrc'. This will be in /home/stack on the OpenStack Director if you've been following along. By sourcing that file we will be able to use the OpenStack command-line tools from the director to manage the overcloud. Not our end goal, but a good first step along the validation process.

[Note: You can switch back to the undercloud at any time by re-sourcing the 'stackrc' file created during the undercloud deployment.]
[Note2: If you tried to get fancy and deploy from the Undercloud UI this file will not be created... you should create it yourself, as having it will save you some heartache later. The contents of the file are listed below and the only real substitutions will be the IP addresses and OS_PASSWORD.]


$ source ~/overcloudrc
$ cat ~/overcloudrc
export OS_NO_CACHE=True
export OS_CLOUDNAME=overcloud
export OS_AUTH_URL=http://192.168.50.108:5000/v2.0
export NOVA_VERSION=1.1
export COMPUTE_API_VERSION=1.1
export OS_USERNAME=admin
export no_proxy=,192.168.50.108,192.168.50.108
export OS_PASSWORD=_REMOVED_
export PYTHONWARNINGS="ignore:Certificate has no, ignore:A true SSLContext object is not available"
export OS_TENANT_NAME=admin

Now all this does is change some environment variables so that when you run OpenStack commands they check in with the overcloud Keystone instead of the undercloud (director) Keystone. And, of course, it provides the correct credentials for authenticating with that Keystone.
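
A few quick sanity checks that the overcloud APIs are actually answering (nothing fancy, just proving the plumbing works):

$ source ~/overcloudrc
$ openstack service list
$ openstack compute service list
$ openstack network agent list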

This presupposes that the deployment worked. If it didn't, then begins a long cycle of figuring out what went wrong, destroying the deploy and re-deploying. As I said, we're going to destroy and re-deploy a few times, so more on that later. As far as debugging the deployment goes, you have the usual resources: logs and experience. At this point you'll find that both are a little lacking. If you have a monitor connected to the overcloud devices you can see, and possibly eliminate, a number of problems such as not PXE booting, not being powered on by IPMI, etc.

Mine didn't fail this time fortunately but here's my short list of resources for when it does:

But our overcloud deployment worked, so let's move onward and make sure we can do something clever with it. First things first: there are some post-deployment steps that need to be taken, and they do require a bit of forethought. This is another area where I think the TripleO documentation falls short (ditto for the Red Hat Director documentation): they talk about these steps but don't explain the ramifications.

Post-Validation

Just like we ran a pre-deployment validation we can also run the post-deployment validation:

$ source stackrc
$ openstack workflow execution create tripleo.validations.v1.run_groups '{"group_names": ["post-deployment"]}'

Just like we did with the pre-deployment we will then use 'openstack task execution list' to see the status of the various processes and 'mistral task-get-result {ID}' to look at details.

[Note: One of my validations gave me an error stating that the overcloudrc file didn't exist. I "fixed" this by changing the permissions on the /home/stack directory: 'chmod 755 /home/stack'. I'd guess this is because Mistral runs tasks as the 'mistral' user, which didn't have access to /home/stack with its default permissions (0700). OTOH, there are files in this directory that I really don't want other users seeing or using (credentials for the undercloud and overcloud), so I think it's prudent to set the permissions back to 0700 after the validation suite completes.]

My installation also got dinged on: MySQL open files limit (why was this not set correctly by Puppet?), HAProxy configuration (the timeout queue in defaults is 2m but must be set to 1m... again, why did Puppet not get this?), controller ulimits (set to 1024, should be 2048 or higher), and NTP clock synchronization (ntpstat reported 'unsynchronised'). I also got dinged on the pacemaker cluster, but since I only deployed a single controller I'm ignoring that one.

OK. So I've got four to fix, and all of them require getting a shell on the affected systems (and they aren't all on the same system). Good times. I can SSH onto the systems or plug in consoles, but in either case I'm going to need some credentials.

Bizarrely, it seems you can access the whole system without any credentials:

$ ssh heat-admin@192.168.50.101

No password required and sudo works without a password as well.

File this as more reason to protect the OpenStack Director machine.

If you look into /etc/puppet/modules on one of the nodes you can see the Puppet modules that were used to set up the various files and services. Sadly, they all explicitly set things to the values that cause the validations to fail :( The ulimit doesn't seem to get set anywhere other than per process, which is actually correct; you can check the limits in the unit files under /etc/systemd/system/. I'm ignoring this validation.
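
For reference, the MySQL open files limit is the kind of thing that can be bumped with a systemd drop-in on the controller. A sketch, assuming MariaDB runs under its stock systemd unit (if it's managed by pacemaker/Galera the limit lives with the resource agent instead):

$ sudo mkdir -p /etc/systemd/system/mariadb.service.d
$ cat <<'EOF' | sudo tee /etc/systemd/system/mariadb.service.d/limits.conf
[Service]
LimitNOFILE=16384
EOF
$ sudo systemctl daemon-reload
$ sudo systemctl restart mariadb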

In any case, I fixed the validations and they ran clean. Onward. For reference, the RDO Project has some documented tweaks (Tunings and Tweaks) that go along with possible limits in the base configurations.


Post-Deployment


The first thing the TripleO docs have you do is set up the overcloud networking. They do this without really explaining what the options are or why you'd do it in this fashion. What are flat, vlan, gre and vxlan, for instance? To truly understand this we'd need to go down the software-defined networking rabbit hole, and that's out of scope at the moment. We'll need to go there... just maybe not yet.

What we need to know is what we want to end up with and what it takes to get there. For my purposes I just want the instances I spin up to be able to connect to the outside world (to pull packages or data), and I want to be able to access them so that I can manage them or let Jenkins manage them. I don't need, at this point, to create more than a single network, so for me a 'flat' network would work just fine. If you want to be able to create isolated networks then you might want to go with 'vlan', which has Neutron allocate a VLAN ID for each network. 'gre' and 'vxlan' are similar enough that we can discuss them together. Both are "overlay" networks: they give you VLAN-like functionality by encapsulating the Layer 2 traffic and tunneling it between similarly encapsulated endpoints. The advantage is that you don't need to sync up your VLAN IDs with your physical network devices (e.g. switches). The disadvantage is that the encapsulation has overhead, which may cause packets to be fragmented and degrade network performance.

Hope you're not lost yet. The next question is one of "floating IP addresses". Normally when you spin up an instance within OpenStack it is given a local IP from its network's range. You can set up routers and outbound goodies to allow outbound NAT; however, you have no way of reaching the instance from the outside. That's where a "Floating IP" comes in. A Floating IP belongs to the public network and has an inbound NAT that allows you to connect to an instance. It is possible to go super simple and allocate IP space from your public network for your internal provider network on OpenStack. That would give instances public IPs, so they wouldn't require Floating IPs. That does simplify things in some situations, but it also means instances on OpenStack would sit directly on your "public" network (which could be an internal or DMZ network; "public" just means public from the OpenStack standpoint). Maybe good. Maybe bad. Using an internal network and Floating IPs gives you the option of being accessible or not.
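
For the record, the floating IP flow we'll eventually want looks roughly like this once the external network exists (a sketch; the names and the 192.168.100.0/24 tenant range are illustrative, and 'public' is the external network created in the next step):

$ openstack network create private
$ openstack subnet create --network private --subnet-range 192.168.100.0/24 private-subnet
$ openstack router create ext-router
$ openstack router set --external-gateway public ext-router
$ openstack router add subnet ext-router private-subnet
$ openstack floating ip create public
$ openstack server add floating ip {INSTANCE} {FLOATING_IP}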

Let's revisit what our physical layout currently looks like.


Ok. So the first thing I'm going to do is to create a "public" network and flag it as external.

$ source overcloudrc
$ openstack network create public --external --provider-network-type flat \
--provider-physical-network datacenter

Then I'll assign an IP range. Since this is my public range I don't want to enable a DHCP server, since that may conflict with the DHCP server already on that network. Which is all great, except when I run it I get: "HttpException: Bad Request". That's not informative, so we slap a '--debug' on and get:

RESP BODY: {"NeutronError": {"message": "Invalid input for operation: physical_network 'datacenter' unknown for flat provider network.", "type": "InvalidInput", "detail": ""}}

Better. The docs did say it was an example command, that it assumes a dedicated interface (or native VLAN), and that it should be modified to match the local environment. OK. I don't have a provider physical network called 'datacenter', so yeah. I do have an open interface on nic2/eno2. However, the command won't work when specifying either of those interface names either.

If I go to Horizon and try to create this network as an admin, from the admin context, it also fails. If I do it from a project context it succeeds, but it defaults to network type vxlan.

$ openstack subnet create --allocation-pool start=172.16.23.140,end=172.16.23.240 \
--network public --gateway 172.16.23.251 --no-dhcp --subnet-range \
172.16.23.128/25 public

This says to create a subnet with an allocation pool starting at 172.16.23.140 and ending at 172.16.23.240, with a gateway of 172.16.23.251 and a subnet range (weird wording) of 172.16.23.128/25. We say not to use DHCP and we attach this subnet to the "public" network. This absolutely implies that my external network (from the standpoint of the OpenStack deployment) is 172.16.23.128/25.

A network labeled "public" is required in order to do the next set of validations so we can more or less leave it here for a minute while we do those.