Figuring Out OpenStack Multi-Node Error Messages – Part 1

I recently deployed a 3-node, Neutron-based OpenStack environment – Controller, Network and Compute – and ran into many error messages on the way to a working setup. As a determined ITer who gets the job done, I had to research what those error messages mean and possible ways to solve the underlying issues that trigger them (most turn out to be incorrect configuration or timing issues). In this blog post I’ll summarize common error messages and their solutions. I hope you’ll find it helpful.

First things first: to debug an issue from its error or warning message, it helps to understand the flow between the different OpenStack components.

It also helps to know where each service/agent resides in a multi-node environment. Running the following commands will shed some light on how to approach debugging.

Using the nova command below, we can see where all the services reside and what their statuses are.

root@controller:~# nova service-list
+----+------------------+------------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary           | Host       | Zone     | Status  | State | Updated_at                 | Disabled Reason |
+----+------------------+------------+----------+---------+-------+----------------------------+-----------------+
| 1  | nova-cert        | controller | internal | enabled | up    | 2015-03-20T00:14:34.000000 | -               |
| 2  | nova-consoleauth | controller | internal | enabled | up    | 2015-03-20T00:14:29.000000 | -               |
| 3  | nova-scheduler   | controller | internal | enabled | up    | 2015-03-20T00:14:33.000000 | -               |
| 4  | nova-conductor   | controller | internal | enabled | up    | 2015-03-20T00:14:35.000000 | -               |
| 5  | nova-compute     | compute    | nova     | enabled | up    | 2015-03-20T00:14:31.000000 | -               |
+----+------------------+------------+----------+---------+-------+----------------------------+-----------------+

Using the neutron command below, we can see where all the agents reside and what their statuses are.

root@controller:~# neutron agent-list
+--------------------------------------+--------------------+---------+-------+----------------+---------------------------+
| id                                   | agent_type         | host    | alive | admin_state_up | binary                    |
+--------------------------------------+--------------------+---------+-------+----------------+---------------------------+
| 5512fa70-7a6d-4925-8c34-50b43b84c872 | L3 agent           | network | :-)   | True           | neutron-l3-agent          |
| 555d26fb-f435-4191-abf2-9d9508f7ee3f | DHCP agent         | network | :-)   | True           | neutron-dhcp-agent        |
| 9e2a36c9-3b44-4da7-83e9-46cd30134a59 | Metadata agent     | network | :-)   | True           | neutron-metadata-agent    |
| c342753a-9536-4b27-988f-a4b6ce270fbe | Open vSwitch agent | network | :-)   | True           | neutron-openvswitch-agent |
| e8d03257-b231-482e-9f90-0e209d36ecbe | Open vSwitch agent | compute | :-)   | True           | neutron-openvswitch-agent |
+--------------------------------------+--------------------+---------+-------+----------------+---------------------------+

Needless to say, the log files live on the relevant node – change directory to /var/log/neutron when debugging a Neutron agent, for example – and can be inspected with the tail command, as shown below.
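
For example, to follow the Open vSwitch agent log on the network node (assuming the default Ubuntu log location; the exact file names may differ per distribution):

root@network:~# tail -f /var/log/neutron/openvswitch-agent.log
root@network:~# grep -i error /var/log/neutron/*.log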

Common Error Messages

RabbitMQ Messages

After this short introduction, let’s look at some common error messages and their potential solutions.

ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on controller:5672 is unreachable: [Errno 111] ECONNREFUSED.

or

ERROR neutron.openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed.

RabbitMQ is the message broker here, and these messages indicate that the service/agent trying to connect cannot reach the RabbitMQ server.

Solution?

Make sure RabbitMQ server is running by checking its status.

$ service rabbitmq-server status
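
If the service reports as running but the error persists, rabbitmqctl (run on the node hosting RabbitMQ, the controller here) can confirm the broker is actually responding:

root@controller:~# rabbitmqctl status
root@controller:~# rabbitmqctl list_users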

Authentication to the RabbitMQ server may also be the culprit.

Use the following, replacing RABBIT_PASS with the password your services use to authenticate with RabbitMQ:

$ rabbitmqctl change_password guest RABBIT_PASS
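
The same password also has to appear in the messaging options of each service’s configuration file (nova.conf, neutron.conf, etc.). On OpenStack releases from this era the relevant options typically sit under [DEFAULT]; a minimal sketch:

[DEFAULT]
rpc_backend = rabbit
rabbit_host = controller
rabbit_password = RABBIT_PASS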

NTP is very important for proper authentication against the RabbitMQ server. Use the date command to make sure all nodes are in sync time-wise.
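
For example, comparing the output of date on all three nodes (or querying the NTP peers, assuming ntpd is installed) quickly shows any drift:

root@controller:~# date
root@network:~# date
root@compute:~# date
root@controller:~# ntpq -p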

Check that port 5672 is listening on your controller and reachable from all nodes by using

$ lsof -i :5672 

And by using

$ root@controller:/etc/rabbitmq# netstat -lntp | grep 5672 
tcp6       0      0 :::5672                 :::*                    LISTEN      30482/

Make sure /etc/rabbitmq/rabbitmq.config includes the following if your RabbitMQ version is 3.3.0 or later (from 3.3.0 on, the guest user is restricted to localhost by default):

[{rabbit, [{loopback_users, []}]}].
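
The change only takes effect after RabbitMQ is restarted:

root@controller:~# service rabbitmq-server restart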

Misconfiguration of iptables can also be the culprit. Note, however, that running

$ iptables-save | grep 5672 

may return no output, and that still doesn’t guarantee iptables isn’t blocking the port. You can temporarily bring down your iptables rules to check whether this is the source of the problem, as sketched below.
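
A quick, temporary way to rule out the firewall (assuming default ACCEPT policies once the rules are flushed – save the rules first so you can restore them):

root@controller:~# iptables-save > /tmp/iptables.backup
root@controller:~# iptables -F
# re-test the connection from the other nodes, then restore
root@controller:~# iptables-restore < /tmp/iptables.backup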

Another common RabbitMQ-related message:

TRACE nova ImportError: No module named rabbit

Solution?

Based on this bug, it looks like rpc_backend may need to be removed from the configuration on non-controller nodes. I removed it and can attest that after doing so I had a fully functional environment.
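
In practice that means deleting or commenting out the line in the compute node’s /etc/nova/nova.conf and restarting nova-compute – a sketch, assuming the line starts with rpc_backend and the Ubuntu service name:

root@compute:~# sed -i 's/^rpc_backend/# rpc_backend/' /etc/nova/nova.conf
root@compute:~# service nova-compute restart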

Neutron Messages

Seeing this a few times:

root@controller:~# neutron agent-list
Unable to establish connection to http://controller:9696/v2.0/agents.json

Solution?

This can often be resolved by simply restarting the neutron server on both the controller and the network node.

service neutron-server restart

iptables is another suspect here as well – as in the RabbitMQ case above, temporarily bring down your iptables rules to check whether the firewall is the source of the problem.

Nova Messages

Having this nova-related error:

ERROR nova.virt.driver [-] Compute driver option required, but not specified

Solution?

Define compute_driver in /etc/nova/nova.conf, for example:

compute_driver = libvirt.LibvirtDriver
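
The option goes under the [DEFAULT] section on the compute node; restart nova-compute afterwards so the change is picked up (service name assumed to be the Ubuntu default):

root@compute:~# service nova-compute restart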

Having this warning message:

WARNING nova.virt.libvirt.driver [-] Periodic task is updating the host stat, it is trying to get disk  but disk file was removed by concurrent operations such as resize

Solution?

This was solved by adding the following vif_plugging entries to nova.conf under the [DEFAULT] section (on both the controller and the compute node):

vif_plugging_is_fatal = false
vif_plugging_timeout = 0

RPC Messages

If you see this RPC-related message:

Endpoint does not support RPC version 3.33

Solution?

You’ll need to run the same nova or neutron version on all nodes. Run nova --version and/or neutron --version on every node to check whether there’s any inconsistency, as shown below.
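
For example (the package listing assumes a Debian/Ubuntu-based install; adjust the command for your distribution):

root@controller:~# nova --version
root@compute:~# nova --version
root@controller:~# dpkg -l | grep -E 'nova|neutron'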
