OpenStack for NFV Applications: SR-IOV and PCI Passthrough


NFV

Network Function Virtualisation (NFV) initiatives in the telecommunication industry require specific OpenStack functionality to be enabled.

Without entering into the details of the NFV specifications, the goal in OpenStack is to optimise network, memory and CPU performance on the running instances.

In this article we’ll look at Single Root I/O Virtualisation (SR-IOV) and PCI-Passthrough, which are commonly required by some Virtual Network Functions (VNFs) running as instances on top of OpenStack.

In addition to SR-IOV and PCI-Passthrough, there are other techniques such as DPDK, CPU pinning and the use of NUMA nodes which are also commonly required by VNFs. A future post will cover some of them.

SR-IOV

SR-IOV allows a PCIe network interface offering Physical Functions (PFs) to expose multiple virtual network interfaces, which appear as Virtual Functions (VFs). For example, the network interface p5p1 configured with 5 VFs looks like this from the operating system:

# ip link show p5p1
8: p5p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
 link/ether a0:36:9f:8f:3f:b8 brd ff:ff:ff:ff:ff:ff
 vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
 vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
 vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
 vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
 vf 4 MAC 00:00:00:00:00:00, spoof checking on, link-state auto

The VFs can be used by the OS or exposed to VMs. They look exactly like regular NICs:

# ip link show p5p1_1
18: p5p1_1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
 link/ether 72:1c:ef:b0:a8:d0 brd ff:ff:ff:ff:ff:ff

Only certain NICs support SR-IOV. In this example I’m using Intel X540-AT2 NICs, which use the ixgbe driver.

Linux configuration for SR-IOV

To use SR-IOV in OpenStack, we first need to make sure the operating system is configured to support it. There are two kernel parameters to set:

intel_iommu=on 
ixgbe.max_vfs=5

Note that ixgbe is specific for the Intel X540-AT2 NIC and you might be using another one. You can also use a different number of VFs.

To enable these parameters on RHEL-based systems:

  1. Add the parameters to GRUB_CMDLINE_LINUX in /etc/default/grub (see the example below)
  2. Regenerate the config file with: grub2-mkconfig -o /boot/grub2/grub.cfg
  3. Rebuild the initramfs file with: dracut -f -v
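
For reference, after step 1 the relevant line in /etc/default/grub would look something like this (the pre-existing options vary per system, so treat this as a sketch rather than an exact value):

# grep GRUB_CMDLINE_LINUX /etc/default/grub
GRUB_CMDLINE_LINUX="rhgb quiet intel_iommu=on ixgbe.max_vfs=5"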

We also need to make sure that the admin state of the interface is UP:

# ip link show p5p1
# ip link set p5p1 up

We also set the corresponding network interface configuration file, /etc/sysconfig/network-scripts/ifcfg-p5p1, so that this persists across reboots:

BOOTPROTO=none
DEVICE=p5p1
ONBOOT=yes

OpenStack configuration for SR-IOV

1. Neutron

SR-IOV works with the VLAN type driver in Neutron. We enable it in /etc/neutron/plugin.ini:

[ml2]
type_drivers=vxlan,vlan
tenant_network_types=vxlan,vlan

The mechanism driver is sriovnicswitch, which is configured in the same [ml2] section as follows:

mechanism_drivers=openvswitch,sriovnicswitch

Every time we create a new SR-IOV network in Neutron, it will be configured on a VLAN from a range that we need to specify. The physical network also needs a name. In this example the range is 1010 to 1020 and the physical network for Neutron will be called physnet_sriov:

[ml2_type_vlan]
network_vlan_ranges=physnet_sriov:1010:1020

Now, we configure SR-IOV settings in /etc/neutron/plugins/ml2/ml2_conf_sriov.ini. In the section [ml2_sriov] we need to tell the driver which NIC we will use:

[ml2_sriov]
supported_pci_vendor_devs=8086:1515

The numbers represent the PCI vendor ID (8086) and the product ID of the SR-IOV virtual functions (1515 for the X540). We can check the IDs with lspci -nn; note that the physical function of the X540-AT2 shows up as 8086:1528:

# lspci -nn|grep X540-AT2
06:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 [8086:1528] (rev 01)
06:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 [8086:1528] (rev 01)
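
Once the VFs have been created, lspci -nn lists them too, with the VF device ID. The exact PCI addresses will differ, but the output should look roughly like this:

# lspci -nn | grep "Virtual Function"
06:10.0 Ethernet controller [0200]: Intel Corporation X540 Ethernet Controller Virtual Function [8086:1515] (rev 01)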

By default the neutron-server service does not load the configuration file ml2_conf_sriov.ini, so we need to add it to its systemd service in /usr/lib/systemd/system/neutron-server.service:

[Service]
Type=notify
User=neutron
ExecStart=/usr/bin/neutron-server --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini --config-file /etc/neutron/plugins/ml2/ml2_conf_sriov.ini  --log-file /var/log/neutron/server.log 

And after that restart the service:

# systemctl restart neutron-server

2. Nova scheduler

We need to tell the Nova scheduler about the SR-IOV so that it can schedule instances to compute nodes with SR-IOV support.

In the [DEFAULT] section of /etc/nova/nova.conf, add PciPassthroughFilter to scheduler_default_filters and ensure scheduler_available_filters is set as follows:

[DEFAULT]
scheduler_available_filters=nova.scheduler.filters.all_filters
scheduler_default_filters=RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,CoreFilter,PciPassthroughFilter

And restart Nova scheduler:

# systemctl restart openstack-nova-scheduler

3. Nova compute

Nova compute needs to know which PFs can be used for SR-IOV so that VFs are exposed – actually via PCI-passthrough – to the instances. Also, it needs to know that when we create a network with Neutron specifying the physical network physnet_sriov  – configured before in Neutron with network_vlan_ranges – it will use the SR-IOV NIC.

That’s done by the config flag pci_passthrough_whitelist in /etc/nova/nova.conf:

pci_passthrough_whitelist = {"devname": "p5p1", "physical_network": "physnet_sriov"}

And simply restart Nova compute:

# systemctl restart openstack-nova-compute

4. SR-IOV NIC agent

We can optionally configure the SR-IOV NIC agent to manage the admin state of the NICs. When a VF NIC is used by an instance and then released, sometimes the NIC goes into DOWN state and the admin manually has to bring it back to UP state. There’s an article that describes how to do this in the official Red Hat documentation:

Enable the OpenStack Networking SR-IOV agent

Not all drivers work with the agent; that was the case for the Intel X540-AT2 NIC used here.

Creating OpenStack instances with a SR-IOV port

1. Create the network

We configured the physnet_sriov network in Neutron to use the SR-IOV interface p5p1. Let’s create the network and its subnet in Neutron now:

$ neutron net-create nfv_sriov --shared --provider:network_type vlan --provider:physical_network physnet_sriov
$ neutron subnet-create --name nfv_subnet_sriov --disable-dhcp --allocation-pool start=10.0.0.2,end=10.0.0.100 nfv_sriov 10.0.0.0/24

Remember we configured a VLAN range, so Neutron will choose a VLAN from it; if we want to specify one, we can pass --provider:segmentation_id=1010 when creating the network.
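
To check which VLAN Neutron actually picked, look at the segmentation ID in the network details:

$ neutron net-show nfv_sriov | grep segmentation_id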

2. Create the port

We’ll pass a port to the instance instead of the nfv_sriov network. To create it we do this:

$ neutron port-create nfv_sriov --name sriov-port --binding:vnic_type direct

Save the ID of the port as we’ll need it for creating the instance.
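
If you prefer to capture it in a shell variable for the next step, a simple way is to parse the client’s table output (adjust the parsing if your client’s output format differs):

$ SRIOV_PORT_ID=$(neutron port-show sriov-port | awk '/ id /{print $4}')
$ echo $SRIOV_PORT_ID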

3. Create the instance

We will now create an instance that uses two NICs: one created the standard way, in a private network that already existed in Neutron, and another one with the port created before. Assuming SRIOV_PORT_ID is the ID of the port and PRIVATE_NETWORK_ID is the ID of the pre-existing private network, this is how we create it:

$ openstack server create --flavor m1.small --nic port-id=$SRIOV_PORT_ID --nic net-id=$PRIVATE_NETWORK_ID --image centos7 sr-iov-instance1

If you use key pairs or other options, pass them to the openstack server create command too.

Log in to the instance as usual and you’ll notice two interfaces, eth0 and probably ens5; the latter is the SR-IOV NIC, ready to be used.

Note as well that one of the VFs now has the same MAC address as the Neutron port we created above:

$ ip link show p5p1
8: p5p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether a0:36:9f:8b:cd:80 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    vf 1 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    vf 4 MAC fa:16:3e:e0:3f:be, spoof checking on, link-state auto

PCI-Passthrough

If our VNF (or any virtualised application for that matter) requires direct access to a PCI device in the hypervisor, the PCI-Passthrough functionality in Libvirt/KVM and OpenStack allows us to provide it. This is also common in High Performance Computing (HPC), not only with NICs but also, for example, to share GPUs with the instances.

In this example we’ll pass another NIC in the hypervisor, p5p2, to the instance.

Linux configuration for PCI-Passthrough

First, just like before, make sure the admin state of the interface is UP:

# ip link show p5p2
# ip link set p5p2 up

And in /etc/sysconfig/network-scripts/ifcfg-p5p2:

BOOTPROTO=none
DEVICE=p5p2
ONBOOT=yes

The kernel options are the same ones we used above so nothing else is required at this point.

OpenStack configuration for PCI-Passthrough

Nova scheduler is already configured for PCI-Passthrough so only Nova compute needs to be made aware of the device we want to pass through.

1. Nova compute

We need a second entry in /etc/nova/nova.conf with pci_passthrough_whitelist. This will tell Nova compute that the interface p5p2 can be taken from the Linux OS and passed into an instance:

pci_passthrough_whitelist={ "devname": "p5p2" }

Now, we need to tag this interface with a name that will be used by Nova during the creation of the instance. For example we can call it my_PF. This is also done in the /etc/nova/nova.conf file:

pci_alias={ "vendor_id": "8086", "product_id": "1528", "name": "my_PF"}

Note that the vendor and product IDs match the lspci output shown before, as both NICs are the same model; here we use the ID of the physical function (1528) because we pass through the whole device. Again, you can get your PCI device IDs with lspci -nn.

2. Nova flavor

The way OpenStack has been designed to allow passing PCI devices to instances is via flavors. The tag we used before (my_PF) needs to be associated with a new flavor in this way:

$ openstack flavor create --ram 4096 --disk 100 --vcpus 2 m1.medium.pci_passthrough
$ openstack flavor set --property "pci_passthrough:alias"="my_PF:1" m1.medium.pci_passthrough

3. Create the instance

Now all we need to do is launch an instance using this new flavor and it will automatically be configured by Nova compute, and then by Libvirt, with the PCI device in it.

$ openstack server create --flavor m1.medium.pci_passthrough --nic net-id=$PRIVATE_NETWORK_ID --image centos7 pci-passthrough-instance1

Again, if you need other options, such as key pairs or attaching a floating IP later to access the instance, you can add them too.

After that, the instance will again show an interface ens5, which is the p5p2 interface. In addition, p5p2 will disappear from the hypervisor’s operating system for as long as the instance exists.
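
Inside the instance, lspci should now show the passed-through NIC. A quick check could look like this (the PCI address inside the guest will differ from the one on the hypervisor):

$ lspci -nn | grep X540
00:05.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 [8086:1528] (rev 01)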

OpenStack lab on your laptop with TripleO and director


This setup allows us to experiment with OSP director, play with the existing Heat templates, create new ones and understand how TripleO is used to install OpenStack, all from the comfort of your own laptop.

VMware Fusion Professional is used here, but this will also work in VMware Workstation with virtually no changes, and in vSphere or VirtualBox with an equivalent setup.

This guide uses the official Red Hat documentation, in particular the Director Installation and Usage.

Architecture


Architecture diagram

Standard RHEL OSP 7 architecture with multiple networks, VLANs, bonding and provisioning from the Undercloud / director node via PXE.


Networks and VLANs

No special setup is needed to enable VLAN support in VMware Fusion; we just set up the VLANs and their networks in RHEL as usual.

DHCP and PXE

DHCP and PXE are provided by the Undercloud VM.

NAT

VMware Fusion NAT will be used to provide external access to the Controller and Compute VMs via the provisioning and external networks. The VMware Fusion NAT configuration below sets 10.0.0.2 on your Mac OS X as the default gateway for the VMs, which is also used in the TripleO templates as the default gateway IP.

VMware Fusion Networks

The networks are configured in the VMware Fusion menu in Preferences, then Network.


The provisioning (PXE) network is set up in vmnet9, the rest of the networks in vmnet10.

The above describes the architecture of our laptop lab in VMware Fusion. Now, let’s implement it.

Step 1. Create 3 VMs in VMware Fusion


VM specifications

VM          vCPUs  Memory   Disk   NICs  Boot device
Undercloud  1      3000 MB  20 GB  2     Disk
Controller  2      3000 MB  20 GB  3     1st NIC
Compute     2      3000 MB  20 GB  3     1st NIC

Disk size

You may want to increase the disk size of the controller to be able to test more or larger images, and of the compute node to be able to run more or larger instances. 3 GB of memory is enough if you include a swap partition for the compute and controller.

VMware network driver in .vmx file

Make sure the network driver in the three VMs is vmxnet3 and not e1000 so that RHEL detects all the NICs:

$ grep ethernet[0-9].virtualDev Undercloud.vmwarevm/Undercloud.vmx
ethernet0.virtualDev = "vmxnet3"
ethernet1.virtualDev = "vmxnet3"

ethX vs enoX NIC names

By default, the OSP director images have the kernel boot option net.ifnames=0. This will name the network interfaces as ethX as opposed to enoX. This is why in the Undercloud the interface names are eno16777984 and eno33557248 (default net.ifnames=1) and the Controller and Compute VMs have eth0, eth1 and eth2. This may change in RHEL OSP 7.2.

Undercloud VM Networks

This is the mapping of VMware networks to OS NICs. An OVS bridge, br-ctlplane, will be created automatically by the Undercloud installation.

Networks      VMware Network  RHEL NIC
External      vmnet10         eno33557248
Provisioning  vmnet9          eno16777984 / br-ctlplane

Copy the MAC addresses of the controller and compute VMs

Make a note of the MAC addresses of the first vNIC in the Controller and Compute VMs.


Step 2. Install the Undercloud


Install RHEL 7.1 in your preferred way in the Undercloud VM and then configure it as follows.

Network interfaces

First, set up the network. 192.168.100.10 will be the external IP in eno33557248 and 10.0.0.10 the provisioning IP in eno16777984.

In /etc/sysconfig/network-scripts/ifcfg-eno33557248

TYPE=Ethernet
BOOTPROTO=none
DEFROUTE=yes
NAME=eno33557248
DEVICE=eno33557248
ONBOOT=yes
IPADDR=192.168.100.10
PREFIX=24
GATEWAY=192.168.100.2
DNS1=192.168.100.2

And in /etc/sysconfig/network-scripts/ifcfg-eno16777984

TYPE=Ethernet
BOOTPROTO=none
DEFROUTE=yes
NAME=eno16777984
DEVICE=eno16777984
ONBOOT=yes
IPADDR=10.0.0.10
PREFIX=24

Once the network is set up, ssh from your Mac OS X to 192.168.100.10 and not to 10.0.0.10: the latter will be automatically reconfigured during the Undercloud installation to become the IP of the br-ctlplane bridge, so you would lose access during the reconfiguration.

Undercloud hostname

The Undercloud needs a fully qualified domain name and it also needs to be present in the /etc/hosts file. For example:

# sudo hostnamectl set-hostname undercloud.osp.poc

And in /etc/hosts:

192.168.100.10 undercloud.osp.poc undercloud

Subscribe RHEL and Install the Undercloud Package

Now, subscribe the RHEL OS to Red Hat’s CDN and enable the required repos.
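
As a rough sketch, assuming the OSP 7 repository names from the official documentation (replace POOL_ID with a pool from your subscription and double-check the repo names for your version):

# subscription-manager register
# subscription-manager list --available --all
# subscription-manager attach --pool=POOL_ID
# subscription-manager repos --disable='*'
# subscription-manager repos --enable=rhel-7-server-rpms --enable=rhel-7-server-extras-rpms \
  --enable=rhel-7-server-openstack-7.0-rpms --enable=rhel-7-server-openstack-7.0-director-rpms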

Then, install the OpenStack client plug-in that will allow us to install the Undercloud:

# yum install -y python-rdomanager-oscplugin

Create the user stack

After that, create the stack user, which we will use to do the installation of the Undercloud and later the deployment and management of the Overcloud.
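
The usual sequence, along the lines of the official documentation, looks like this:

# useradd stack
# passwd stack
# echo "stack ALL=(root) NOPASSWD:ALL" | tee -a /etc/sudoers.d/stack
# chmod 0440 /etc/sudoers.d/stack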

Configure the director

The following undercloud.conf file is a working configuration for this guide, which is mostly self-explanatory.

For a reference of the configuration flags, there’s a documented sample in /usr/share/instack-undercloud/undercloud.conf.sample

Become the stack user and create the file in its home directory.

# su - stack
$ vi ~/undercloud.conf
[DEFAULT]
image_path = /home/stack/images
local_ip = 10.0.0.10/24
undercloud_public_vip = 10.0.0.11
undercloud_admin_vip = 10.0.0.12
local_interface = eno16777984
masquerade_network = 10.0.0.0/24
dhcp_start = 10.0.0.50
dhcp_end = 10.0.0.100
network_cidr = 10.0.0.0/24
network_gateway = 10.0.0.10
discovery_iprange = 10.0.0.100,10.0.0.120
undercloud_debug = true
[auth]

The masquerade_network config flag is optional as in VMware Fusion we already have NAT as explained above, but it might be needed if you use VirtualBox.

Finally, get the Undercloud installed

We will run the installation as the stack user we created:

$ openstack undercloud install

Step 3. Set up the Overcloud deployment


Verify the undercloud is working

Load the environment first, then run the service list command:

$ . stackrc
$ openstack service list
+----------------------------------+------------+---------------+
| ID                               | Name       | Type          |
+----------------------------------+------------+---------------+
| 0208564b05b148ed9115f8ab0b04f960 | glance     | image         |
| 0df260095fde40c5ab838affcdbce524 | swift      | object-store  |
| 3b499d3319094de5a409d2c19a725ea8 | heat       | orchestration |
| 44d8d0095adf4f27ac814e1d4a1ef9cd | nova       | compute       |
| 84a1fe11ed464894b7efee7543ecd6d6 | neutron    | network       |
| c092025afc8d43388f67cb9773b1fb27 | keystone   | identity      |
| d1a85475321e4c3fa8796a235fd51773 | nova       | computev3     |
| d5e1ad8cca1549759ad1e936755f703b | ironic     | baremetal     |
| d90cb61c7583494fb1a2cffd590af8e8 | ceilometer | metering      |
| e71d47d820c8476291e60847af89f52f | tuskar     | management    |
+----------------------------------+------------+---------------+

Configure the fake_pxe Ironic driver

Ironic doesn’t have a driver for powering on and off VMware Fusion VMs so we will do it manually. We need to configure the fake_pxe driver for this.

Edit /etc/ironic/ironic.conf and add it:

enabled_drivers = pxe_ipmitool,pxe_ssh,pxe_drac,fake_pxe

Then restart ironic-conductor and verify the driver is loaded:

$ sudo systemctl restart openstack-ironic-conductor
$ ironic driver-list
+---------------------+--------------------+
| Supported driver(s) | Active host(s)     |
+---------------------+--------------------+
| fake_pxe            | undercloud.osp.poc |
| pxe_drac            | undercloud.osp.poc |
| pxe_ipmitool        | undercloud.osp.poc |
| pxe_ssh             | undercloud.osp.poc |
+---------------------+--------------------+

Upload the images into the Undercloud’s Glance

Download the images that will be used to deploy the OpenStack nodes to the directory specified in the image_path in the undercloud.conf file, in our example /home/stack/images. Get the images and untar them as described here. Then upload them into Glance in the Undercloud:

$ openstack overcloud image upload --image-path /home/stack/images/

Define the VMs into the Undercloud’s Ironic

TripleO needs to know about the nodes, in our case the VMware Fusion VMs. We describe them in the file instackenv.json which we’ll create in the home directory of the stack user.

Notice that here is where we use the MAC addresses we took from the two VMs.

{
 "nodes": [
 {
   "arch": "x86_64",
   "cpu": "2",
   "disk": "20",
   "mac": [
   "00:0c:29:8f:1e:7b"
   ],
   "memory": "3000",
   "pm_type": "fake_pxe"
 },
 {
   "arch": "x86_64",
   "cpu": "2",
   "disk": "20",
   "mac": [
   "00:0C:29:41:0F:4E"
   ],
   "memory": "3000",
   "pm_type": "fake_pxe"
 }
 ]
}

Import them to the undercloud:

$ openstack baremetal import --json instackenv.json

The command above adds the nodes to Ironic:

$ ironic node-list
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
| UUID                                 | Name | Instance UUID                        | Power State | Provision State | Maintenance |
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+
| 111cf49a-eb9e-421d-af05-35ab0d74c5d6 | None | 941bbdf9-43c0-442e-8b65-0bd531322509 | power off   | available       | False       |
| e579df9f-528f-4d14-94bc-07b2af4b252f | None | f1bd425b-a4d9-4eca-8bc4-ee31b300e381 | power off   | available       | False       |
+--------------------------------------+------+--------------------------------------+-------------+-----------------+-------------+

To finish the registration of the nodes we run this command:

$ openstack baremetal configure boot

Discover the nodes

At this point we are ready to start discovering the nodes, i.e. having Ironic power them on, boot them with the discovery image uploaded before, and then shut them down once the relevant hardware information has been saved in the node metadata in Ironic. This process is called introspection.

Note that as we use the fake_pxe driver, Ironic won’t power on the VMs, so we do it manually in VMware Fusion. We wait until the output of ironic node-list tells us that the power state is on and then we run this command:

$ openstack baremetal introspection bulk start
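
Introspection takes a few minutes per node. A simple way to keep an eye on the power and provision states from another terminal (with stackrc loaded) is:

$ watch -n 10 ironic node-list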

Assign the roles to the nodes in Ironic

There are two roles in this example, compute and control. We will assign them manually with Ironic.

$ ironic node-update 111cf49a-eb9e-421d-af05-35ab0d74c5d6 add properties/capabilities='profile:compute,boot_option:local'
$ ironic node-update e579df9f-528f-4d14-94bc-07b2af4b252f add properties/capabilities='profile:control,boot_option:local'

Create the flavors and associate them with the roles in Ironic

This consists of creating flavors that match the specs of the VMs and then adding the control and compute profile properties to the corresponding flavors so that they match the Ironic capabilities set in the previous step. A flavor called baremetal is also required.

$ openstack flavor create --id auto --ram 3000 --disk 17 --vcpus 2 --swap 2000 compute
$ openstack flavor create --id auto --ram 3000 --disk 19 --vcpus 2 --swap 1500 control

TripleO also needs a flavor called baremetal (which we won’t use):

$ openstack flavor create --id auto --ram 3000 --disk 19 --vcpus 2 baremetal

Notice the disk size is 1 GB smaller than the VM’s disk. This is a precaution to avoid No valid host found when deploying with Ironic, which sometimes is a bit too sensitive.

Also, notice that I added swap because 3 GB of memory is not enough and the out of memory killer could be triggered otherwise.

Now we make the flavors match with the capabilities we set in the Ironic nodes in the previous step:

$ openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" --property "capabilities:profile"="control" control
$ openstack flavor set --property "cpu_arch"="x86_64" --property "capabilities:boot_option"="local" --property "capabilities:profile"="compute" compute

 

Step 4. Create the TripleO templates


Get the TripleO templates

Copy the TripleO heat templates to the home directory of the stack user.

$ mkdir ~/templates
$ cp -r /usr/share/openstack-tripleo-heat-templates/ ~/templates/

Create the network definitions

These are our network definitions:

Network             Subnet            VLAN
Provisioning        10.0.0.0/24       VMware native
Internal API        172.16.0.0/24     201
Tenant              172.17.0.0/24     204
Storage             172.18.0.0/24     202
Storage Management  172.19.0.0/24     203
External            192.168.100.0/24  VMware native

To allow creating dedicated networks for specific services we describe them in a Heat template that we can call network-environment.yaml.

$ vi ~/templates/network-environment.yaml
resource_registry:
 OS::TripleO::Compute::Net::SoftwareConfig: /home/stack/templates/nic-configs/compute.yaml
 OS::TripleO::Controller::Net::SoftwareConfig: /home/stack/templates/nic-configs/controller.yaml

parameter_defaults:

 # The IP address of the EC2 metadata server. Generally the IP of the Undercloud
 EC2MetadataIp: 10.0.0.10
 # Gateway router for the provisioning network (or Undercloud IP)
 ControlPlaneDefaultRoute: 10.0.0.2
 DnsServers: ["10.0.0.2"]

 InternalApiNetCidr: 172.16.0.0/24
 TenantNetCidr: 172.17.0.0/24
 StorageNetCidr: 172.18.0.0/24
 StorageMgmtNetCidr: 172.19.0.0/24
 ExternalNetCidr: 192.168.100.0/24

 # Leave room for floating IPs in the External allocation pool
 ExternalAllocationPools: [{'start': '192.168.100.100', 'end': '192.168.100.200'}]
 InternalApiAllocationPools: [{'start': '172.16.0.10', 'end': '172.16.0.200'}]
 TenantAllocationPools: [{'start': '172.17.0.10', 'end': '172.17.0.200'}]
 StorageAllocationPools: [{'start': '172.18.0.10', 'end': '172.18.0.200'}]
 StorageMgmtAllocationPools: [{'start': '172.19.0.10', 'end': '172.19.0.200'}]

 InternalApiNetworkVlanID: 201
 StorageNetworkVlanID: 202
 StorageMgmtNetworkVlanID: 203
 TenantNetworkVlanID: 204

 # ExternalNetworkVlanID: 100
 # Set to the router gateway on the external network
 ExternalInterfaceDefaultRoute: 192.168.100.2
 # Set to "br-ex" if using floating IPs on native VLAN on bridge br-ex
 NeutronExternalNetworkBridge: "br-ex"

 # Customize bonding options if required
 BondInterfaceOvsOptions: "bond_mode=active-backup"

More information about this template can be found here.

Configure the NICs of the VMs

We have examples of NIC configurations for multiple networks and bonding in /usr/share/openstack-tripleo-heat-templates/network/config/bond-with-vlans/

We will use them as a template to define the Controller and Compute NIC setup.

$ mkdir ~/templates/nic-configs/
$ cp /usr/share/openstack-tripleo-heat-templates/network/config/bond-with-vlans/* ~/templates/nic-configs/

Notice that they are called from the previous template network-environment.yaml.

Controller NICs

We want this setup in the controller:

Bonded Interface  Bond Slaves  Bond Mode
bond1             eth1, eth2   active-backup

Networks            VMware Network  RHEL NIC
Provisioning        vmnet9          eth0
External            vmnet10         bond1 / br-ex
Internal            vmnet10         bond1 / vlan201
Tenant              vmnet10         bond1 / vlan204
Storage             vmnet10         bond1 / vlan202
Storage Management  vmnet10         bond1 / vlan203

We only need to modify the resources section of the ~/templates/nic-configs/controller.yaml to match the configuration in the table above:

$ vi ~/templates/nic-configs/controller.yaml
[...]
resources:
  OsNetConfigImpl:
    type: OS::Heat::StructuredConfig
    properties:
      group: os-apply-config
      config:
        os_net_config:
          network_config:
            -
              type: interface
              name: nic1
              use_dhcp: false
              addresses:
                -
                  ip_netmask:
                    list_join:
                      - '/'
                      - - {get_param: ControlPlaneIp}
                        - {get_param: ControlPlaneSubnetCidr}
              routes:
                -
                  ip_netmask: 169.254.169.254/32
                  next_hop: {get_param: EC2MetadataIp}
            -
              type: ovs_bridge
              name: {get_input: bridge_name}
              addresses:
                - ip_netmask: {get_param: ExternalIpSubnet}
              routes:
                - ip_netmask: 0.0.0.0/0
                  next_hop: {get_param: ExternalInterfaceDefaultRoute}
              dns_servers: {get_param: DnsServers}
              members:
                -
                  type: ovs_bond
                  name: bond1
                  ovs_options: {get_param: BondInterfaceOvsOptions}
                  members:
                    -
                      type: interface
                      name: nic2
                      primary: true
                    -
                      type: interface
                      name: nic3
                -
                  type: vlan
                  device: bond1
                  vlan_id: {get_param: InternalApiNetworkVlanID}
                  addresses:
                  -
                    ip_netmask: {get_param: InternalApiIpSubnet}
                -
                  type: vlan
                  device: bond1
                  vlan_id: {get_param: StorageNetworkVlanID}
                  addresses:
                  -
                    ip_netmask: {get_param: StorageIpSubnet}
                -
                  type: vlan
                  device: bond1
                  vlan_id: {get_param: StorageMgmtNetworkVlanID}
                  addresses:
                  -
                    ip_netmask: {get_param: StorageMgmtIpSubnet}
                -
                  type: vlan
                  device: bond1
                  vlan_id: {get_param: TenantNetworkVlanID}
                  addresses:
                  -
                    ip_netmask: {get_param: TenantIpSubnet}

outputs:
  OS::stack_id:
    description: The OsNetConfigImpl resource.
    value: {get_resource: OsNetConfigImpl}

Compute NICs

In the compute node we want this setup:

Bonded Interface  Bond Slaves  Bond Mode
bond1             eth1, eth2   active-backup

Networks      VMware Network  RHEL NIC
Provisioning  vmnet9          eth0
Internal      vmnet10         bond1 / vlan201
Tenant        vmnet10         bond1 / vlan204
Storage       vmnet10         bond1 / vlan202

$ vi ~/templates/nic-configs/compute.yaml
[...]
resources:
  OsNetConfigImpl:
    type: OS::Heat::StructuredConfig
    properties:
      group: os-apply-config
      config:
        os_net_config:
          network_config:
            -
              type: interface
              name: nic1
              use_dhcp: false
              dns_servers: {get_param: DnsServers}
              addresses:
                -
                  ip_netmask:
                    list_join:
                      - '/'
                      - - {get_param: ControlPlaneIp}
                        - {get_param: ControlPlaneSubnetCidr}
              routes:
                -
                  ip_netmask: 169.254.169.254/32
                  next_hop: {get_param: EC2MetadataIp}
                -
                  default: true
                  next_hop: {get_param: ControlPlaneDefaultRoute}
            -
              type: ovs_bridge
              name: {get_input: bridge_name}
              members:
                -
                  type: ovs_bond
                  name: bond1
                  ovs_options: {get_param: BondInterfaceOvsOptions}
                  members:
                    -
                      type: interface
                      name: nic2
                      primary: true
                    -
                      type: interface
                      name: nic3
                -
                  type: vlan
                  device: bond1
                  vlan_id: {get_param: InternalApiNetworkVlanID}
                  addresses:
                  -
                    ip_netmask: {get_param: InternalApiIpSubnet}
                -
                  type: vlan
                  device: bond1
                  vlan_id: {get_param: StorageNetworkVlanID}
                  addresses:
                  -
                    ip_netmask: {get_param: StorageIpSubnet}
                -
                  type: vlan
                  device: bond1
                  vlan_id: {get_param: TenantNetworkVlanID}
                  addresses:
                  -
                    ip_netmask: {get_param: TenantIpSubnet}
outputs:
  OS::stack_id:
    description: The OsNetConfigImpl resource.
    value: {get_resource: OsNetConfigImpl}

Enable Swap

Enabling the swap partition is done from within the OS. Ironic only creates the partition as instructed in the flavor. This can be done with the templates that allow running first boot scripts via cloud-init.

First, the environment file that registers the first-boot user data, /home/stack/templates/firstboot/firstboot.yaml:

resource_registry:
 OS::TripleO::NodeUserData: /home/stack/templates/firstboot/userdata.yaml

Then, the template with the actual script that enables swap, /home/stack/templates/firstboot/userdata.yaml:

heat_template_version: 2014-10-16

resources:
  userdata:
    type: OS::Heat::MultipartMime
    properties:
      parts:
      - config: {get_resource: swapon_config}

  swapon_config:
    type: OS::Heat::SoftwareConfig
    properties:
      config: |
        #!/bin/bash
        # Find the swap partition created by Ironic and enable it on every boot
        swap_device=$(sudo fdisk -l | grep swap | awk '{print $1}')
        if [[ -n "$swap_device" ]]; then
          rc_local="/etc/rc.d/rc.local"
          echo "swapon $swap_device" >> $rc_local
          chmod 755 $rc_local
          swapon $swap_device
        fi

outputs:
  OS::stack_id:
    value: {get_resource: userdata}

 

Step 5. Deploy the Overcloud


Summary

We have everything we need to deploy now:

  • The Undercloud configured.
  • Flavors for the compute and controller nodes.
  •  Images for the discovery and deployment of the nodes.
  • Templates defining the networks in OpenStack.
  • Templates defining the nodes’ NICs configuration.
  • A first boot script used to enable swap.

We will use all this information when running the deploy command:

$ openstack overcloud deploy \
--templates templates/openstack-tripleo-heat-templates/ \
-e templates/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e templates/network-environment.yaml \
-e templates/firstboot/firstboot.yaml \
--control-flavor control \
--compute-flavor compute \
--neutron-tunnel-types vxlan --neutron-network-type vxlan \
--ntp-server clock.redhat.com
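
The deployment takes a while. While it runs, progress can be followed from a second terminal with the stackrc environment loaded, for example (the Overcloud stack is called overcloud by default):

$ heat stack-list
$ heat resource-list overcloud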

After a successful deployment you’ll see this:

Deploying templates in the directory /home/stack/templates/openstack-tripleo-heat-templates
[...]
Overcloud Endpoint: http://192.168.100.100:5000/v2.0/
Overcloud Deployed

An overcloudrc file with the environment credentials is created for you to start using the new OpenStack platform deployed on your laptop.

Step 6. Start using the Overcloud


Now we are ready to start testing our newly deployed platform.

$ . overcloudrc
$ openstack service list
+----------------------------------+------------+---------------+
| ID                               | Name       | Type          |
+----------------------------------+------------+---------------+
| 043524ae126b4f23bd3fb7826a557566 | glance     | image         |
| 3d5c8d48d30b41e9853659ce840ae4fe | neutron    | network       |
| 418d4f34abe449aa8f07dac77c078e9c | nova       | computev3     |
| 43480fab74fd4fd480fdefc56eecfe83 | cinderv2   | volumev2      |
| 4e01d978a648474db6d5b160cd0a71e1 | nova       | compute       |
| 6357f4122d6d41b986dab40d6fb471e3 | cinder     | volume        |
| a49119e0fd9f43c0895142e3b3f3394a | keystone   | identity      |
| b808ae83589646e6b7033f2b150e7623 | horizon    | dashboard     |
| d4c9383fa9e94daf8c74419b0b18fd6e | heat       | orchestration |
| db556409857d4d24872cdc1b718eee8f | swift      | object-store  |
| ddc3c82097d24f478edfc89b46310522 | ceilometer | metering      |
+----------------------------------+------------+---------------+

Understanding OpenStack Heat Auto Scaling


OpenStack Heat can deploy and configure multiple instances in one command using resources we have in OpenStack. That’s called a Heat Stack.

Heat will create instances from images using existing flavors and networks. It can configure LBaaS and provide VIPs for our load-balanced instances. It can also use the metadata service to inject files, scripts or variables after instance deployment. It can even use Ceilometer to create alarms based on instance CPU usage and associate actions like spinning up or terminating instances based on CPU load.

All the above is done by Heat to provide autoscaling capabilities to our applications. In this post I explain how to do this in RHEL 7 instances. If you want to reproduce this in another OS it’s as simple as replacing how the example webapp packages are installed.

Steps to set up Heat autoscaling

1. Create a WordPress repo in a RHEL 7 box. Make sure it’s a basic installation so that all the dependencies are downloaded along with WordPress:

# Install EPEL and Remi repos first, then create a repo
yum -y install http://dl.fedoraproject.org/pub/epel/beta/7/x86_64/epel-release-7-0.2.noarch.rpm
yum -y install http://rpms.famillecollet.com/enterprise/remi-release-7.rpm
yum -y --enablerepo=remi install wordpress --downloadonly --downloaddir=/var/www/html/repos/wordpress
createrepo /var/www/html/repos/wordpress

2. Create a repo for rhel-7-server-rpms with something like:

# First register to Red Hat's CDN with subscription-manager register
# Then subscribe to the channels to be synchronised
reposync -p /var/www/html/repos/ rhel-7-server-rpms
createrepo /var/www/html/repos/rhel-7-server-rpms

3. Download the Heat template, which consists of two files: autoscaling.yaml and lb_server.yaml

Note: The template autoscaling.yaml uses lb_server.yaml as a nested stack, and it can’t be deployed from Horizon right now due to a bug. It works fine from the command line as described below.

[Update] Note: I made it work in Horizon by:

  • Publishing the two templates on a web server.
  • Modifying the autoscaling.yaml template published in the web server to call the nested template like this:
type: http://172.16.0.129:81/repos/heat-templates/lb_server.yaml

4. Modify the Heat template so that the first thing the script passed via user_data by Heat does when cloud-init executes it is to configure the local repos.

a. Right before yum -y install httpd wordpress add the repos making it look like this:

 
[...]     
      user_data_format: RAW
      user_data:
        str_replace:
          template: |
            #!/bin/bash -v
            #Add local repos for wordpress and rhel7
            cat << EOF >> /etc/yum.repos.d/rhel.repo
            [rhel-7-server-rpms]
            name=rhel-7-server-rpms
            baseurl=http://172.16.0.129:81/repos/rhel-7-server-rpms
            gpgcheck=0
            enabled=1

            [wordpress]
            name=wordpress
            baseurl=http://172.16.0.129:81/repos/wordpress
            gpgcheck=0
            enabled=1
            EOF

            yum -y install httpd wordpress
[...]

b. And right before yum -y install mariadb mariadb-server do exactly the same.

Note: I’m assuming that your two repos are accessible via http from the instances.

Note: All of these steps are optional. If your instances pull packages directly from the Internet and/or another repository you can skip or adapt this to your environment.

5. Take note of:

  • The glance image you will use: nova image-list
    • Note: I’m using the RHEL 7 image available in the Red Hat Customer Portal  rhel-guest-image-7.0-20140618.1.x86_64.qcow2
  • The ssh key pair you want to use: nova keypair-list
  • The flavor you want to use with them: nova flavor-list
  • The subnet where the instances of the Heat stack will be launched.

6. Create the Heat stack:

heat stack-create AutoscalingWordpress -f autoscaling.yaml \
-P image=rhel7 \
-P key=ramon \
-P flavor=m1.small \
-P database_flavor=m1.small \
-P subnet_id=44908b41-ce16-4f8c-ba6c-9bb4303e6d3f \
-P database_name=wordpress \
-P database_user=wordpress

Note: Here we use all the parameters from the template downloaded before. They are found in the parameters: section of the YAML file. Alternatively, we could add a default: value for each parameter within the template.

Now, what I do right after is run tail -f /var/log/heat/*log on the controller node, where Heat is installed, just to make sure everything is fine with the creation of the Heat stack.
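
The heat client itself is also handy for following the progress of the stack, for example:

$ heat stack-list
$ heat event-list AutoscalingWordpress
$ heat resource-list AutoscalingWordpress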

7. Verify Heat created a LBaaS pool and VIP:

[root@racedo-rhel7-1 heat(keystone_demo)]# neutron lb-pool-list
+--------------------------------------+----------------------------------------+----------+-------------+----------+----------------+--------+
| id                                   | name                                   | provider | lb_method   | protocol | admin_state_up | status |
+--------------------------------------+----------------------------------------+----------+-------------+----------+----------------+--------+
| 78f02e89-aa07-40fd-917b-1481175b43e8 | AutoscalingWordpress-pool-46zb7elgzamo | haproxy  | ROUND_ROBIN | HTTP     | True           | ACTIVE |
+--------------------------------------+----------------------------------------+----------+-------------+----------+----------------+--------+
[root@racedo-rhel7-1 heat(keystone_demo)]# neutron lb-vip-list
+--------------------------------------+----------+-----------+----------+----------------+--------+
| id                                   | name     | address   | protocol | admin_state_up | status |
+--------------------------------------+----------+-----------+----------+----------------+--------+
| 8da663cb-43d7-49af-9343-360431e02655 | pool.vip | 10.1.1.14 | HTTP     | True           | ACTIVE |
+--------------------------------------+----------+-----------+----------+----------------+--------+

8. Associate a floating IP to the VIP: neutron floatingip-associate FLOATING_IP_ID VIP_NEUTRON_PORT_ID. In my case I need a floating IP:

[root@racedo-rhel7-1 heat(keystone_demo)]# neutron lb-vip-show pool.vip | grep port_id
| port_id             | 13c01599-23f1-4e1e-96d9-72f2775e6183 |
[root@racedo-rhel7-1 heat(keystone_demo)]# neutron floatingip-list
+--------------------------------------+------------------+---------------------+--------------------------------------+
| id                                   | fixed_ip_address | floating_ip_address | port_id                              |
+--------------------------------------+------------------+---------------------+--------------------------------------+
| 0525f959-5213-4291-a1f0-a2ea2b40e11c |                  | 172.16.0.53         |                                      |
| 09f1bdc9-228b-4057-a5d1-3327ccc0bfc8 |                  | 172.16.0.54         |                                      |
| 5538961a-3423-46a3-9744-aba699e722c5 |                  | 172.16.0.52         |                                      |
+--------------------------------------+------------------+---------------------+--------------------------------------+
[root@racedo-rhel7-1 heat(keystone_demo)]# neutron floatingip-associate 0525f959-5213-4291-a1f0-a2ea2b40e11c 13c01599-23f1-4e1e-96d9-72f2775e6183

Note: This is optional if your instances are connected to a provider network where you can access directly instead of to a tenant network like in this example.

9. Verify that Heat created the two Ceilometer alarms; one to scale out on high CPU usage and another one to scale down on low CPU:

[root@racedo-rhel7-1 heat(keystone_demo)]# ceilometer alarm-list
+--------------------------------------+--------------------------------------------------+-------+---------+------------+---------------------------------+------------------+
| Alarm ID                             | Name                                             | State | Enabled | Continuous | Alarm condition                 | Time constraints |
+--------------------------------------+--------------------------------------------------+-------+---------+------------+---------------------------------+------------------+
| 1610f404-8df7-46ed-b131-6d3797fc9e4e | AutoscalingWordpress-cpu_alarm_low-vinrbn2rdjpx  | alarm | True    | False      | cpu_util < 15.0 during 1 x 600s | None             |
| 53c124bd-db57-4909-af55-009f5a635937 | AutoscalingWordpress-cpu_alarm_high-42dc5funjeds | ok    | True    | False      | cpu_util > 50.0 during 1 x 60s  | None             |
+--------------------------------------+--------------------------------------------------+-------+---------+------------+---------------------------------+------------------+

10. Verify you can access the WordPress using the VIP:

Wordpress

11. Now ssh into the WordPress web instance (not the DB one) and put some CPU load on it; a couple of dd commands will suffice. Add a floating IP to the instance first if necessary.

[cloud-user@au-g6hl-ye4uglqb5t7r-ylpghgnzyck3-server-nobvg6ftaoe7 ~]$ dd if=/dev/zero of=/dev/null  &  
[1] 908  
[cloud-user@au-g6hl-ye4uglqb5t7r-ylpghgnzyck3-server-nobvg6ftaoe7 ~]$ dd if=/dev/zero of=/dev/null  &  
[2] 909  
[cloud-user@au-g6hl-ye4uglqb5t7r-ylpghgnzyck3-server-nobvg6ftaoe7 ~]$ dd if=/dev/zero of=/dev/null  &  
[3] 910  
[cloud-user@au-g6hl-ye4uglqb5t7r-ylpghgnzyck3-server-nobvg6ftaoe7 ~]$ dd if=/dev/zero of=/dev/null  &  
[4] 911  
[cloud-user@au-g6hl-ye4uglqb5t7r-ylpghgnzyck3-server-nobvg6ftaoe7 ~]$ dd if=/dev/zero of=/dev/null  &  
[5] 912  
[cloud-user@au-g6hl-ye4uglqb5t7r-ylpghgnzyck3-server-nobvg6ftaoe7 ~]$ top  
top - 11:01:05 up 10 min,  1 user,  load average: 6.81, 1.12, 0.71  
Tasks:  90 total,  8 running,  82 sleeping,  0 stopped,  0 zombie  
%Cpu(s): 24.3 us, 75.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st  
KiB Mem:  1018312 total,  235068 used,  783244 free,      688 buffers  
KiB Swap:        0 total,        0 used,        0 free.    95480 cached Mem  
  
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM    TIME+ COMMAND  
  908 cloud-u+  20  0  107920    620    528 R 15.8  0.1  0:09.80 dd  
  909 cloud-u+  20  0  107920    616    528 R 15.5  0.1  0:08.49 dd  
  911 cloud-u+  20  0  107920    620    528 R 15.5  0.1  0:07.88 dd  
  912 cloud-u+  20  0  107920    616    528 R 15.5  0.1  0:07.71 dd  
  910 cloud-u+  20  0  107920    620    528 R 15.2  0.1  0:08.11 dd  

12. Observe how Ceilometer triggers an alarm (State goes to alarm) and how a new instance is launched:

[root@racedo-rhel7-1 heat(keystone_demo)]# ceilometer alarm-list
+--------------------------------------+--------------------------------------------------+-------+---------+------------+---------------------------------+------------------+
| Alarm ID                             | Name                                             | State | Enabled | Continuous | Alarm condition                 | Time constraints |
+--------------------------------------+--------------------------------------------------+-------+---------+------------+---------------------------------+------------------+
| 1610f404-8df7-46ed-b131-6d3797fc9e4e | AutoscalingWordpress-cpu_alarm_low-vinrbn2rdjpx  | ok    | True    | False      | cpu_util < 15.0 during 1 x 600s | None             |
| 53c124bd-db57-4909-af55-009f5a635937 | AutoscalingWordpress-cpu_alarm_high-42dc5funjeds | alarm | True    | False      | cpu_util > 50.0 during 1 x 60s  | None             |
+--------------------------------------+--------------------------------------------------+-------+---------+------------+---------------------------------+------------------+

Wordpress Autoscaling

13. Kill the dd processes (directly from top: press k and kill each of them).

14. Wait about 10 minutes, which is the default duration of the scale-down alarm in our template. The state of the alarm in Ceilometer will go to alarm just like before, but now due to the lack of CPU load. You’ll see how one of the two instances is deleted:

[root@racedo-rhel7-1 heat(keystone_demo)]# ceilometer alarm-list  
+--------------------------------------+--------------------------------------------------+-------+---------+------------+---------------------------------+------------------+  
| Alarm ID                             | Name                                             | State | Enabled | Continuous | Alarm condition                 | Time constraints |  
+--------------------------------------+--------------------------------------------------+-------+---------+------------+---------------------------------+------------------+  
| 1610f404-8df7-46ed-b131-6d3797fc9e4e | AutoscalingWordpress-cpu_alarm_low-vinrbn2rdjpx  | alarm | True    | False      | cpu_util < 15.0 during 1 x 600s | None             |  
| 53c124bd-db57-4909-af55-009f5a635937 | AutoscalingWordpress-cpu_alarm_high-42dc5funjeds | ok    | True    | False      | cpu_util > 50.0 during 1 x 60s  | None             |  
+--------------------------------------+--------------------------------------------------+-------+---------+------------+---------------------------------+------------------+  

That’s all.

Multiple Private Networks with Open vSwitch GRE Tunnels and Libvirt

 

Libvirt and GRE Tunnels

GRE tunnels are extremely useful for many reasons. One use case is to be able to design and test an infrastructure requiring multiple networks on a typical home lab with limited hardware, such as laptops and desktops with only 1 ethernet card.

As an example, to design an OpenStack infrastructure for a production environment with RDO or Red Hat Enterprise Linux OpenStack Platform (RHEL OSP), three separate networks are recommended.

These networks will have services such as DHCP (even multiple DHCP servers if eventually needed) as they will be completely isolated from each other. Testing multiple VLANs or trunking is also possible with this setup.

The diagram above should be almost self-explanatory and describes this setup with Open vSwitch, GRE tunnels and Libvirt.

Step by Step on CentOS 6.5

1. Install CentOS 6.5 choosing the Basic Server option

2. Install the EPEL and RDO repos which provide Open vSwitch and iproute Namespace support:

# yum install http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
# yum install http://repos.fedorapeople.org/repos/openstack/openstack-icehouse/rdo-release-icehouse-3.noarch.rpm

3. Install Libvirt, Open vSwitch and virt-install:

# yum install libvirt openvswitch python-virtinst

4. Create the bridge that will be associated to eth0:

# ovs-vsctl add-br br-eth0

5. Set up your network on the br-eth0 bridge with the configuration you had on eth0 and change the eth0 network settings as follows (with your own network settings):

# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=none
MTU=1546
# cat /etc/sysconfig/network-scripts/ifcfg-br-eth0
DEVICE=br-eth0
TYPE=OVSBridge
ONBOOT=yes
BOOTPROTO=none
IPADDR0=192.168.2.1
PREFIX0=24
DNS1=192.168.2.254

Notice the MTU setting above. This is very important as GRE adds encapsulation bytes. There are two options: increasing the MTU on the hosts, like in this example, or decreasing the MTU in the guests if your NIC doesn’t support MTUs larger than 1500 bytes.

6. Add eth0 to br-eth0 and restart the network to pick up the changes made in the previous step:

# ovs-vsctl add-port br-eth0 eth0 && service network restart

7. Make sure your network still works as it did before the changes above

8. Assuming this host has the IP 192.168.2.1 and you have two other hosts where you will do this same (or compatible) setup with the IPs 192.168.2.2 and 192.168.2.3, create the internal OVS bridge br-int0 and set the GRE tunnel endpoints gre0 and gre1 (note that the diagram above has only two hosts, but you can add more hosts with an identical setup):

# ovs-vsctl add-br br-int0
# ovs-vsctl add-port br-int0 gre0 -- set interface gre0 type=gre options:remote_ip=192.168.2.2
# ovs-vsctl add-port br-int0 gre1 -- set interface gre1 type=gre options:remote_ip=192.168.2.3

Notice there is another way to set up GRE tunnels using /etc/sysconfig/network-scripts/ in CentOS/RHEL but the method explained here works in any Linux distro and is equally persistent. Choose whichever you find appropriate.
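
To verify that the bridges and the GRE ports were created as expected, you can inspect the Open vSwitch configuration:

# ovs-vsctl show
# ovs-vsctl list-ports br-int0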

9. Enable STP (needed for more than 2 hosts):

# ovs-vsctl set bridge br-int0 stp_enable=true

10. Create a file called libvirt-vlans.xml with the definition of the Libvirt network that will use the Open vSwitch bridge br-int0 (and the GRE tunnels) we just created. Check the diagram above for reference:

<network>
  <name>ovs-network</name>
  <forward mode='bridge'/>
  <bridge name='br-int0'/>
  <virtualport type='openvswitch'/>
  <portgroup name='no-vlan' default='yes'>
  </portgroup>
  <portgroup name='vlan-100'>
    <vlan>
      <tag id='100'/>
    </vlan>
  </portgroup>
  <portgroup name='vlan-200'>
    <vlan>
      <tag id='200'/>
    </vlan>
  </portgroup>
</network>

11. Remove (optionally) the default network that Libvirt creates and add (mandatory) the network defined in the previous step:

# virsh net-destroy default
# virsh net-autostart --disable default
# virsh net-undefine default
# virsh net-define libvirt-vlans.xml
# virsh net-autostart ovs-network
# virsh net-start ovs-network

12. Create a Libvirt storage pool where your VMs will be created (needed to use qcow2 disk format). I chose /home/VMs/pool but it can be anywhere you find appropriate:

# virsh pool-define-as --name VMs-pool --type dir --target /home/VMs/pool/
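
The pool then needs to be built, started and marked to start automatically; with the name used above that would be:

# virsh pool-build VMs-pool
# virsh pool-start VMs-pool
# virsh pool-autostart VMs-pool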

13. Assuming you are installing a CentOS VM and that the location of the ISO is /home/VMs/ISOs/CentOS-6.5-x86_64-bin-DVD1.iso, create a VM named foreman (or any name you like) with virt-install:

# virt-install \
--name foreman \
--ram 1024 \
--vcpus=1 \
--disk size=20,format=qcow2,pool=VMs-pool \
--nonetworks \
--cdrom /home/VMs/ISOs/CentOS-6.5-x86_64-bin-DVD1.iso \
--graphics vnc,listen=0.0.0.0,keymap=en_gb --noautoconsole --hvm \
--os-variant rhel6

14. Use a VNC client to access the screen of the VM during the installation. Finish the installation and shut down the VM.

15. Edit the VM with virsh edit foreman (following the name used in the example above) to add the 3 networks created before. At the bottom of the VM definition, just before </devices>, add the following:


<interface type='network'>
  <source network='ovs-network' portgroup='no-vlan'/>
  <model type='virtio'/>
</interface>
<interface type='network'>
  <source network='ovs-network' portgroup='vlan-100'/>
  <model type='virtio'/>
</interface>
<interface type='network'>
  <source network='ovs-network' portgroup='vlan-200'/>
  <model type='virtio'/>
</interface>

Now you can start your VM with virsh start foreman and set up the network on any or all of the three interfaces. Repeat the same process on another host and VM, and you are ready to install something like Foreman and OpenStack without needing more than one physical network interface per host.

Resizing OpenStack Volumes


Resizing a Volume with Cinder in Havana

Cinder in Havana has the extend functionality, which allows volumes to be resized easily. It works as expected at the volume level, but I have found it less reliable at the OS level when running resize2fs on the extended volume. Maybe I haven’t done enough tests yet, but in any case the method below works in both Havana and Grizzly.

Resizing a Volume in Grizzly and Havana

The following method works in both Grizzly and Havana. It can be entirely done by the tenant with the nova command (the cinder client is not needed).

1. Identify the volume to be resized

$ nova volume-list
+--------------------------------------+-----------+--------------+------+-------------+--------------------------------------+
| ID                                   | Status    | Display Name | Size | Volume Type | Attached to                          |
+--------------------------------------+-----------+--------------+------+-------------+--------------------------------------+
| 44bcd404-8a6e-41d8-9d56-2ac4e0c1e97c | in-use    | None         | 10   | None        | 438cdb78-5573-4ab0-9f89-79cad806286c |
| 010ea497-98d5-4ace-a6aa-bdc847628cee | available | None         | 1    | None        |                                      |
+--------------------------------------+-----------+--------------+------+-------------+--------------------------------------+

2. Detach the volume from its instance. It is recommended to ssh into the instance and to unmount it first.

$ nova volume-detach VM1 44bcd404-8a6e-41d8-9d56-2ac4e0c1e97c

3. Create a snapshot of the volume:

$ nova volume-snapshot-create 44bcd404-8a6e-41d8-9d56-2ac4e0c1e97c
+---------------------+--------------------------------------+
| Property            | Value                                |
+---------------------+--------------------------------------+
| status              | creating                             |
| display_name        | None                                 |
| created_at          | 2014-01-16T16:20:17.739982           |
| display_description | None                                 |
| volume_id           | 44bcd404-8a6e-41d8-9d56-2ac4e0c1e97c |
| size                | 10                                   |
| id                  | ea8a1c24-982e-4d63-809f-38f0ad974604 |
| metadata            | {}                                   |
+---------------------+--------------------------------------+

4. Create a new volume from the snapshot of the volume we are resizing, specifying the new desired size:

$ nova volume-create --snapshot-id ea8a1c24-982e-4d63-809f-38f0ad974604 15
+---------------------+--------------------------------------+
| Property            | Value                                |
+---------------------+--------------------------------------+
| status              | creating                             |
| display_name        | None                                 |
| attachments         | []                                   |
| availability_zone   | nova                                 |
| bootable            | false                                |
| created_at          | 2014-01-16T16:22:10.634404           |
| display_description | None                                 |
| volume_type         | None                                 |
| snapshot_id         | ea8a1c24-982e-4d63-809f-38f0ad974604 |
| source_volid        | None                                 |
| size                | 15                                   |
| id                  | 408a9d90-6498-4e87-a26a-43fe506b1b1d |
| metadata            | {}                                   |
+---------------------+--------------------------------------+

5. Wait until the status of the newly created volume is available and attach it to the instance using another device name (if it originally was /dev/vdc then use /dev/vdd for example):

$ nova volume-attach VM1 408a9d90-6498-4e87-a26a-43fe506b1b1d  /dev/vdd
+----------+--------------------------------------+
| Property | Value                                |
+----------+--------------------------------------+
| device   | /dev/vdd                             |
| serverId | 438cdb78-5573-4ab0-9f89-79cad806286c |
| id       | 408a9d90-6498-4e87-a26a-43fe506b1b1d |
| volumeId | 408a9d90-6498-4e87-a26a-43fe506b1b1d |
+----------+--------------------------------------+

Note that the snapshot created in step 3, as well as the original volume, can now be deleted.
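
If you want to clean up right away, both can be removed with the nova client as well; something like the following, using the IDs from this example (delete the snapshot first, then the original volume):

$ nova volume-snapshot-delete ea8a1c24-982e-4d63-809f-38f0ad974604
$ nova volume-delete 44bcd404-8a6e-41d8-9d56-2ac4e0c1e97c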

From within the instance OS, assuming it’s a Linux VM, we need to make the OS aware of the new size.

ubuntu@vm1:/$ sudo e2fsck -f /dev/vdc
e2fsck 1.42 (29-Nov-2011)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/vdc: 13/65536 files (0.0% non-contiguous), 12637/262144 blocks
ubuntu@vm1:/$ sudo resize2fs /dev/vdc
resize2fs 1.42 (29-Nov-2011)
Resizing the filesystem on /dev/vdc to 1310720 (4k) blocks.
The filesystem on /dev/vdc is now 1310720 blocks long.

ubuntu@vm1:/$ sudo mount /dev/vdc /mnt2
ubuntu@vm1:/$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       9.9G  828M  8.6G   9% /
udev            494M  8.0K  494M   1% /dev
tmpfs           200M  220K  199M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            498M     0  498M   0% /run/shm
/dev/vdb         20G  173M   19G   1% /mnt
/dev/vdc       15.0G   34M 14.7G   1% /mnt2

Notes

  • If resize2fs does not work, try rebooting the instance first; a freshly booted kernel will usually pick up the new size. You can also check whether the kernel already sees it, as shown after this list.
  • Make sure you use the new block device name (e.g. /dev/vdd instead of /dev/vdc); the name inside the guest may differ from the one passed to nova volume-attach.
  • Do not use partitions (e.g. /dev/vdc1); resize2fs works on the filesystem, not the partition table. It can still be done, but the partition table has to be rebuilt or grown first. Check the resize2fs documentation for details.
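
A quick way to confirm that the guest kernel already sees the larger device before running resize2fs (a simple check, assuming the new volume shows up as /dev/vdc, as in the output above):

ubuntu@vm1:/$ sudo blockdev --getsize64 /dev/vdc    # prints the device size in bytes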

Set up iSCSI Storage for ESXi Hosts From The Command Line


The esxcli command line tool can be extremely useful for setting up an ESXi host, including its iSCSI storage.

1. Enable iSCSI:

~ # esxcli iscsi software set -e true
Software iSCSI Enabled

2. Check the adapter name, usually vmhba32, vmhba33, vmhba34 and so on.

~ # esxcli iscsi adapter list
Adapter  Driver     State   UID            Description
-------  ---------  ------  -------------  ----------------------
vmhba32  iscsi_vmk  online  iscsi.vmhba32  iSCSI Software Adapter

3. Connect your ESXi iSCSI adapter to your iSCSI target:

~ # esxcli iscsi adapter discovery sendtarget add -A vmhba32 -a 10.230.5.60:3260

~ # esxcli iscsi adapter get -A vmhba32
vmhba32
Name: iqn.1998-01.com.vmware:ch02b03-65834587
Alias:
Vendor: VMware
Model: iSCSI Software Adapter
Description: iSCSI Software Adapter
Serial Number:
Hardware Version:
Asic Version:
Firmware Version:
Option Rom Version:
Driver Name: iscsi_vmk
Driver Version:
TCP Protocol Supported: false
Bidirectional Transfers Supported: false
Maximum Cdb Length: 64
Can Be NIC: false
Is NIC: false
Is Initiator: true
Is Target: false
Using TCP Offload Engine: false
Using ISCSI Offload Engine: false
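
To verify that the target portal was registered, the configured send targets can be listed as well (the exact sub-command layout may vary slightly between esxcli versions):

~ # esxcli iscsi adapter discovery sendtarget list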

4. Now, on your iSCSI server, assign a volume of the SAN to the IQN of your ESXi host. For example, on an HP StorageWorks array:

CLIQ>assignVolume volumeName=racedo-vSphereVolume initiator=iqn.1998-01.com.vmware:ch02b01-01e26a74;iqn.1998-01.com.vmware:ch02b02-20d3e33b;iqn.1998-01.com.vmware:ch02b03-65834587

The above command assigns three IQNs to the volume: the two we already had and the new one we are setting up. This syntax is specific to the HP StorageWorks CLI; other storage arrays work differently.

5. Back on the ESXi host, discover the targets:

~ # esxcli iscsi adapter discovery rediscover -A vmhba32

Finally, check with the df command that the datastore has been added. If not, try rediscovering again.
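
For example, either of the following should list the new VMFS datastore once it is available (the esxcli variant shows a bit more detail):

~ # df -h
~ # esxcli storage filesystem list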

This is the simplest configuration possible from the command line. NIC teaming or other more complex setups can also be done from the command line of the ESXi hosts.

Deploying vSphere Remotely From the Command Line


If all we have is remote access (ssh) to an ESXi host and we still need to install vCenter Server to get vSphere up and running, we can do it with ovftool. The ovftool comes with VMware Workstation and can also be downloaded separately if needed.

The idea is simple: download the vCenter Server OVA appliance to the ESXi host datastore, copy the ovftool to the ESXi host datastore and use it to install the appliance.

This process assumes that the ESXi host has the ssh service enabled.
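
If SSH is not enabled yet, it can usually be turned on from the ESXi console or an existing session with vim-cmd (a quick sketch; the service can also be enabled from the vSphere client):

# vim-cmd hostsvc/enable_ssh
# vim-cmd hostsvc/start_ssh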

1. Download the vCenter Server OVA appliance, for example from a shared Dropbox folder. ESXi comes with wget installed, so we can do:

# cd /vmfs/volumes/datastore1
# wget https://dropbox-url/VMware-vCenter-Server-Appliance-5.1.0.5300-947940_OVF10.ova

2. Copy the ovftool (with its libraries) to the ESXi server from any Linux box with Workstation installed on it. Since ESXi does not ship bash, first replace bash with sh as the interpreter in the ovftool wrapper script:

# vi /usr/lib/vmware-ovftool/ovftool

Replace:

#!/bin/bash

by:

#!/bin/sh

Copy its directory to the ESXi host:

# scp -r /usr/lib/vmware-ovftool/ root@esxi-ip:/vmfs/volumes/datastore1

3. Now, all that’s left is to install the vCenter Server appliance with it.

Find the actual path to the datastore where the appliance and the ovftool were copied:

# ls -l /vmfs/volumes/datastore1
lrwxr-xr-x    1 root     root            35 Nov 26 03:29 /vmfs/volumes/datastore1 -> 52900c65-73c1bf38-6469-001f29e4ff20

Using the full path, run the ovftool to install the vCenter Server appliance:

# /vmfs/volumes/52900c65-73c1bf38-6469-001f29e4ff20/vmware-ovftool/ovftool -dm=thin /vmfs/volumes/52900c65-73c1bf38-6469-001f29e4ff20/VMware-vCenter-Server-Appliance-5.1.0.5300-947940_OVF10.ova "vi://root:password@localhost"

The output should be similar to this:

Opening OVA source: /vmfs/volumes/52900c65-73c1bf38-6469-001f29e4ff20/VMware-vCenter-Server-Appliance-5.1.0.5300-947940_OVF10.ova
The manifest validates
Source is signed and the certificate validates
Accept SSL fingerprint (B3:DC:DF:58:00:68:A3:92:A9:A4:65:41:B2:F6:FF:CF:99:2A:3E:71) for host localhost as target type.
Fingerprint will be added to the known host file
Write 'yes' or 'no'
yes
Opening VI target: vi://root@localhost:443/
Deploying to VI: vi://root@localhost:443/
Transfer Completed
Completed successfully
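
If you prefer a fully unattended run, ovftool also takes flags to skip the interactive prompts. A sketch of the same command with them added (availability of these flags depends on the ovftool version, so verify against yours):

# /vmfs/volumes/52900c65-73c1bf38-6469-001f29e4ff20/vmware-ovftool/ovftool --acceptAllEulas --noSSLVerify -dm=thin /vmfs/volumes/52900c65-73c1bf38-6469-001f29e4ff20/VMware-vCenter-Server-Appliance-5.1.0.5300-947940_OVF10.ova "vi://root:password@localhost"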

When this is finished, the vCenter Server VM is created in the ESXi host.

4. Power on the vCenter Server VM.

First, find out its VM ID:

# vim-cmd vmsvc/getallvms

Assuming the VM ID is 1, power it on with:

# vim-cmd vmsvc/power.on 1

And once it’s up and running, find its IP address by running:

# vim-cmd vmsvc/get.summary 1|grep ipAddress

Point your browser to that IP on port 5480 to start configuring the vCenter Server appliance you just deployed.