Monday, August 31, 2009

Hosting your DNS on vSphere 4 - caveat

For a while now, I had been having an issue with my whitebox ESX 4.0 server: after rebooting the machine, I was unable to connect to it using the vSphere client. The error I received was a simple "503: Service unavailable". The hostd.log on the host was filled with errors like:

--F637FB90 warning 'Proxysvc Req00002'-- Connection to localhost:8309 failed with error N7Vmacore15SystemExceptionE(Connection refused).

and I noticed that /var/log/messages contained a lot of vmware-authd start and stop messages. After some struggling, I found a workaround, which consisted of:

  • Logging onto the service console as root

  • Editing the /etc/vmware/hostd/config.xml file to disable the "proxysvc" component of hostd

  • Restarting the hostd process (service mgmt-vmware restart)

  • Waiting for all my autostart VMs to come online

  • Re-enabling "proxysvc" and restarting hostd once again

Today, I discovered this thread on the VMware communities which contained the answer I was looking for: the DNS servers I had configured on my ESX box were virtual machines running on the box itself (in my case: a m0n0wall virtual appliance and a Windows 2008 domain controller with DNS). Apparently this disrupts the proxysvc component of hostd (the virtual DNS servers are not reachable when hostd first starts, since autostart has yet to kick in), causing it to fail to start properly and preventing vSphere client connections. Furthermore, this prevented the autostart of VMs altogether, so the DNS servers never came up at all.
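To check whether a host is set up this way, you can compare the nameservers in its resolv.conf against the addresses of the VMs it runs. Below is a minimal Python sketch of that check. The function names, the example resolv.conf contents and the VM address list are all made up for illustration, and the script is meant to be run on a copy of the file from any machine with Python 3, not necessarily on the service console itself.

```python
def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop comments (resolv.conf allows '#' or ';') and surrounding whitespace.
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def self_hosted_nameservers(resolv_conf_text, local_vm_addresses):
    """Return the configured nameservers that are VMs running on this same host."""
    local = set(local_vm_addresses)
    return [ns for ns in parse_nameservers(resolv_conf_text) if ns in local]


# Hypothetical example data: two DNS VMs on the host plus one external resolver.
EXAMPLE_RESOLV_CONF = """\
# example only
search example.lan
nameserver 192.168.1.1
nameserver 192.168.1.10
nameserver 192.168.1.254
"""
EXAMPLE_VM_ADDRESSES = ["192.168.1.1", "192.168.1.10"]

if __name__ == "__main__":
    for ns in self_hosted_nameservers(EXAMPLE_RESOLV_CONF, EXAMPLE_VM_ADDRESSES):
        print("nameserver %s is a VM on this host" % ns)
```

Any address this reports is a DNS server that will not be answering yet when hostd first starts after a reboot.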

The solution was to clear my /etc/resolv.conf file, and now everything works fine immediately after a reboot (no more attempts to connect to a virtual machine that is not yet running)! This completely disables DNS resolution on the host, so you'll need to keep /etc/hosts well maintained, in particular if you are using HA. Since a typical production environment probably does not run its entire DNS infrastructure as one or more virtual machines, you will probably never run into this issue anyway.
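If you end up maintaining /etc/hosts by hand on several hosts, it may help to generate the entries from a single list. Here is a small Python sketch of that idea. The function name, the host names and the addresses are invented for the example, and it only builds the text; copying it into /etc/hosts on each host is left to you.

```python
def render_hosts_entries(hosts):
    """Build /etc/hosts lines (address, FQDN, short name) from an address-to-FQDN mapping."""
    lines = []
    # Sorted as plain strings, only to keep the output stable between runs.
    for address, fqdn in sorted(hosts.items()):
        short_name = fqdn.split(".", 1)[0]
        lines.append("%s\t%s %s" % (address, fqdn, short_name))
    return "\n".join(lines) + "\n"


# Hypothetical hosts; replace with your own ESX hosts and their addresses.
EXAMPLE_HOSTS = {
    "192.168.1.21": "esx01.example.lan",
    "192.168.1.22": "esx02.example.lan",
}

if __name__ == "__main__":
    print(render_hosts_entries(EXAMPLE_HOSTS), end="")
```

Keeping the list in one place means every host gets the same entries, which matters once DNS is no longer there to paper over a typo.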