Monday, August 31, 2009

Hosting your DNS on vSphere 4 - caveat

For a while now, I was having an issue with my whitebox ESX4.0 server: after rebooting this machine, I was unable to connect to it using the vSphere client. The error I was receiving was a simple "503: Service unavailable". The hostd.log on the host was filled with errors like:

--F637FB90 warning 'Proxysvc Req00002'-- Connection to localhost:8309 failed with error N7Vmacore15SystemExceptionE(Connection refused).

and I noticed that the /var/log/messages contains a lot of vmware-authd start & stop messages. I struggled and managed to find a workaround which consisted of:

  • Logging onto the service console as root

  • Edit the /etc/vmware/hostd/config.xml file and disabling the "proxysvc" component of hostd.

  • Restart the hostd process (service mgmt-vmware restart)

  • Wait for all my autostart VM's to come online

  • Re-enable the "proxysvc" and restart hostd once again

Today, I discovered this thread on the VMware communities which contained the answer I was looking for: the DNS servers I had configured on my ESX box were virtual machines running on the box itself (in my case: a m0n0wall virtual appliance and a Windows 2008 domain controller with DNS). Apparently this disrupts the proxysvc component of hostd (since the virtual DNS servers are not reachable at the time hostd is first started - autostart is yet to kick in), causing it to fail to start properly and preventing vSphere client connections. Furthermore, this prevented the autostart of VM's all together, thus never getting DNS to get up and running at all.

The solution was to clear my /etc/resolv.conf file and now everything works fine immediately after a reboot (no more attempts to connect to a virtual machine that is not yet running)! This completely slashes DNS support (in particular if you are using HA, you'll need to do good /etc/hosts maintenance). Since your typical production environment probably is not running the entire DNS infrastructure as a or several virtual machine(s), you probably are never exposed to this issue anyway.

Thursday, April 9, 2009

Active Directory over SSL in VMware Lifecycle Manager

I recently have been playing around with VMware's Lifecycle Manager appliance, and one of the small "gotcha's" I ran into was how to configure secure communications between the LCM appliance and the Active Directory backend I was authenticating against.


After configuring LCM to use Active Directory and SSL, I was getting the following error message:
Error: Unable to connect to LDAP Server / simple bind failed: dc.pretnet.local:636

In order to get the SSL authentication working for Active Directory (or LDAP in general), you need to be sure that the Certificate Authority that issues your domain controller certificates is trusted by the appliance (you don't need to actually import the domain controller certificate itself, just the issuing CA is sufficient). This is done by going through the following steps:
  1. First, obtain a copy of the issuing certification authority's certificate (without private key obviously). Ensure that it is in the X.509 format, Base64 encrypted or DER encrypted. The appliance doesn't seem to support certificate containers (P7B format), so when you export the certificate using the Certificates MMC, ensure you select one of the first two options as the export format!!


  2. To add the X.509 certificate to the appliance, go to the "Network" tab and select the "SSL Certificate" configuration pane. Here, import the certificate file.


  3. Next, restart the "VMO Configuration Server", which you can find at the bottom of the "Server" tab in the GUI.


    Note: if you get an error message that first you need to fix your LDAP configuration (and "Plugins" section) before you can restart the VMO Configuration Service, go back to the LDAP configuration and disable SSL for a moment.
That's it! Secure Active Directory authentication (which is what we all want) is now working properly! It's a good idea to import the certificate right away, because your other configuration tasks are severily limited when the authentication (either using the built-in OpenLDAP server on the appliance, or using Active Directory) is not working properly.

As a sidenote, I would like to add that, despite VMware recommending to run Lifecycle Manager on a dedicated Windows box (LCM Administration Guide v1.01, p21), the appliance is a really convenient way of running and upgrading this product without too much hassle. Of course, don't forget to offload the configuration database from the appliance (use a dedicated SQL or Oracle server)!