<b>Tim's Technical Thoughts</b> (Tim Jacobs, http://www.blogger.com/profile/06131387085752434985, feed updated 2024-02-19)<br />
Every now and then you run into a small discovery that you know you will ever need again in your life. This is where I throw together all of mine. Perhaps they come in handy for you too...<br />
<br />
<b>Caveat: multimaster I2C on 32-bit Arduino's</b> (published 2016-12-09, updated 2016-12-09)<br />
For a new project I'm working on, it was necessary to allow for bidirectional communication on a shared I2C bus between a bunch of Arduinos (more specifically, a healthy mix of Arduino MKR1000s and Adafruit Feather M0s). After some quick research, I found a few easy-looking examples on how to use some mixture of master/slave settings in a "multimaster" mode with Arduino's Wire library (the TwoWire/I2C implementation). You can read them <a href="https://michael.bouvy.net/blog/en/2013/05/25/arduino-multi-master-to-master-i2c/">here</a>, <a href="http://kendziorra.nl/arduino/98-esp8266-i-c-with-arduino-multi-masters?showall=1">here</a> and <a href="http://digitalcave.ca/resources/avr/arduino-i2c.jsp">here</a>, and they basically all do the following "trick" during initialization:<br />
<blockquote class="tr_bq">
Wire.begin(MY_ADDRESS);<br />
Wire.onReceive(receiveI2C);</blockquote>
and then for sending data, they use:<br />
<blockquote class="tr_bq">
Wire.beginTransmission(DEST_ADDRESS);<br />
Wire.write(0x30);<br />
Wire.endTransmission();</blockquote>
So what's happening here?<br />
<br />
<ul>
<li>Basically, if you use <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">Wire.begin(MY_ADDRESS)</span> you are initializing the I2C bus in slave mode, and by registering the "onReceive" interrupt service handler, you define what needs to happen when the slave receives data. </li>
<li>Next up, these examples all start sending data as a MASTER (despite being configured as a slave) by initializing the transmission, writing the data and then putting the data on the bus with the <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">endTransmission()</span> call. </li>
</ul>
<div>
This works on 8-bit AVR Arduinos (and a bunch of other devices) because of how the Wire library is implemented on these devices. Just have a look at the following files (from your Arduino IDE installation folder):</div>
<div>
<ul>
<li>hardware\arduino\avr\libraries\Wire\src\Wire.cpp</li>
<li>hardware\arduino\avr\libraries\Wire\src\utility\twi.c</li>
</ul>
<div>
You can see <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">endTransmission()</span> calls the <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">twi_writeTo()</span> function from twi.c (<i>yes, Wire is merely a wrapper around some C implementation of I2C</i>). What does <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">twi_writeTo()</span> do? It attempts to become the master on the I2C bus and then sends data. That's actually <b>a very good way</b> of combining master & slave functionality on a single device on the I2C bus, and most definitely the way to go.</div>
</div>
<div>
<br /></div>
<div>
Now cue the Wire implementation on the 32-bit Atmel processors. These use the hardware SERCOMs of the SAMD21 processors, and directly use the corresponding I2C (well... SERCOM) hardware registers to put the MCU in I2C master or slave mode. Unfortunately, there is no corresponding temporary promotion to master going on in this implementation. Let's have a look at the SAMD21 implementation of Wire, which you can find in your <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">%LOCALAPPDATA%\Arduino15\packages\arduino\hardware\samd\1.6.8\libraries\Wire</span> folder:</div>
<div>
<ul>
<li>The <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">endTransmission() </span>implementation directly calls <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">SERCOM->startTransmissionWIRE()</span>, which comes from the <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">%LOCALAPPDATA%\Arduino15\packages\arduino\hardware\samd\1.6.8\cores\arduino\SERCOM.cpp</span> file.</li>
<li>That <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">startTransmissionWIRE() </span>function first checks if the I2C bus is idle (multimaster collision protection) or whether it already owns the I2C bus using: <br /><br /><span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">while ( !isBusIdleWIRE() && !isBusOwnerWIRE() );</span></li>
</ul>
Unfortunately, both these functions assume the SERCOM was initialized as a master, whereas Wire.begin() was called with a slave address and hence the entire MCU was initialized as a slave. I haven't worked out the details yet, but you'll see that the sketch remains stuck in this while loop because neither condition is ever met.</div>
<div>
<br /></div>
<div>
This is one of the many differences between AVR- and SAMD-based Arduinos that you'll encounter when diving into the details. The solution is to extend your code and do your own proper promotion to I2C master when you need to send data, and demote yourself back to slave when you don't need to send anything... basically mimicking the behaviour of the original <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">twi.c</span> that started it all.</div>
<div>
<br /></div>
<div>
<br /></div>
<b>MKR1000 support in the Arduino IDE</b> (published 2016-09-16, updated 2016-12-09)<br />
I started experimenting with the Arduino/Genuino MKR1000 board that I received today for another IoT project that we're working on at ThingTank. There are a few caveats when trying to access the board that you might run into:
<br />
<br />
<ul>
<li>First of all, be sure to use the Arduino.cc IDE and not the Arduino.org IDE. At the time of writing you can distinguish them by the version number; you'll need <a href="https://www.arduino.cc/en/Main/Software">the 1.6.x releases</a> and not <a href="http://www.arduino.org/downloads">the 1.7.x releases</a>. It might seem like you're using an "older" version but that is not the case; they are just different branches of a very similar IDE.<br /><br />The biggest difference is that the Arduino.org IDE at the moment does not have the simple "Board Manager" user interface to enable boards other than the standard Atmel AVR boards.<br /></li>
<li>Then, use the "Board Manager" to install the "Arduino SAMD" toolchain. When starting the Board Manager, I received a "<span style="color: red;"><b>package_index.json file signature verification failed</b></span>" error message.<br /><br />I'm not sure what caused this (a leftover of my previous 1.7.x Arduino IDE installation?), but there was a stale <span style="font-family: "courier new" , "courier" , monospace;">package_index.json.sig</span> file in the Arduino AppData folder (<span style="font-family: "courier new" , "courier" , monospace;">%LOCALAPPDATA%\Arduino15\</span>). After deleting that file, I was able to search and install in the Board Manager correctly.</li>
</ul>
<div>
After installation of the Arduino SAMD Boards support, you can now select the MKR1000 as one of the boards in the IDE and start fiddling around! Hurray!</div>
<b>VMware Workstation: scripting unity mode</b> (published 2015-10-04, updated 2016-09-16)<br />
To keep a whole bunch of legacy programs running for a case assigned to me, it was necessary to use Windows 7 running in a virtual machine under VMware Workstation (running on Windows 10). To prevent the VMs from staying on after the program execution finished, I wrote a small batch file on the virtualization host to start and suspend the VM:<br />
<br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">@echo off</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">cd C:\Program Files (x86)\VMware\VMware Workstation</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">vmrun start C:\VMs\Win7\Win7.vmx</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">vmrun -T ws -gu &lt;user&gt; -gp &lt;psw&gt; runProgramInGuest C:\VMs\Win7\Win7.vmx -activeWindow -interactive "C:\Program Files\MyLegacyApp\Legacy.exe"</span><br />
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">vmrun suspend C:\VMs\Win7\Win7.vmx</span><br />
<div>
<br /></div>
<div>
This works fine, since the <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">vmrun "runProgramInGuest"</span> runs synchronously, only terminating after the application inside the guest VM has finished -- which then nicely calls a "suspend" to freeze the VM for the next time the batch file is called on the host.</div>
<div>
<br /></div>
<div>
However, to improve the user experience even further, I wanted to use VMware Workstation's Unity feature. That turned out to be more difficult in my setting than I had expected: whether or not Unity was used the last time in the VM, the script above always opens the VMware Workstation GUI and just shows the VM starting. I also read about adding the following lines to the VMX file:</div>
<div>
<br /></div>
<div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">gui.fullScreenAtPowerOn = "FALSE"</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">gui.lastPoweredViewMode = "unity"</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">gui.viewModeAtPowerOn = "unity"</span></div>
</div>
<div>
<br /></div>
<div>
Yet that didn't work either... </div>
<div>
<br /></div>
<div>
So instead, I decided to use a workaround that does both the scripted & synchronous execution of the program I need in the guest, and enables Unity at the same time:</div>
<div>
<ul>
<li>I run a dummy program in "Unity" which puts the (running) VM in Unity view mode -- notice that <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">vmware-unity-helper.exe </span>works asynchronously, hence immediately returns.</li>
<li>Afterwards, I start the Legacy.exe using <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">vmrun.</span></li>
</ul>
<div>
To complicate matters further, multiple users also have to be able to run the program in the same Workstation VM, so some "preconfiguration" for Unity is required for every user.</div>
</div>
<div>
<br /></div>
<div>
In a nutshell:</div>
<div>
<ul>
<li>vmware-unity-helper.exe on Windows seems to only run predefined commands on predefined VMs. The configuration is under <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">%LOCALAPPDATA%\VMware\unity-helper.xml</span>.</li>
<li>This file looks (in my case) as follows:<br /><br /><span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">
&lt;unity_helper version="1"&gt;<br />
&lt;ghilaunch&gt;<br />
&lt;apps nextid="3"&gt;<br />
&lt;app id="1" uri="file:///d:/dummy.lnk" /&gt;<br />
&lt;/apps&gt;<br />
&lt;vms nextid="2"&gt;<br />
&lt;vm id="1" path="C:\VMs\Win7\Win7.vmx" /&gt;<br />
&lt;/vms&gt;<br />
&lt;/ghilaunch&gt;<br />
&lt;/unity_helper&gt;
</span></li>
<li>You can see in the XML file:</li>
<ul>
<li>First, all applications that Unity can start are defined using an App ID and the URI on where the file is located.</li>
<li>Then, a list of all known VMs is provided, with the path to the VMX file.</li>
</ul>
<li>An application is then started with vmware-unity-helper.exe as follows:<br /><br /><span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">vmware-unity-helper.exe -r -G:1 -V:1</span><br /><br />where the "G" parameter specifies the app ID and the V parameter the VM ID.</li>
</ul>
</div>
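<div>
Since this file has to be put in place for every user, it could also be generated instead of copied. The following is only a minimal sketch in Python, assuming the file layout is exactly what the sample above shows. The function name <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">build_unity_helper_xml</span> is hypothetical, and the rule "nextid = number of entries + 1" is an assumption rather than documented VMware behaviour (the sample above has nextid="3" with only one app defined).</div>

```python
import xml.etree.ElementTree as ET


def build_unity_helper_xml(app_uris, vmx_paths):
    """Build a unity-helper.xml document as a string.

    app_uris  -- URIs of the guest applications, numbered 1..n in order
    vmx_paths -- host paths to the VMX files, numbered 1..n in order

    Assumption: nextid is simply "number of entries + 1". The real tool
    may track this differently.
    """
    root = ET.Element("unity_helper", version="1")
    ghilaunch = ET.SubElement(root, "ghilaunch")

    apps = ET.SubElement(ghilaunch, "apps", nextid=str(len(app_uris) + 1))
    for app_id, uri in enumerate(app_uris, start=1):
        ET.SubElement(apps, "app", id=str(app_id), uri=uri)

    vms = ET.SubElement(ghilaunch, "vms", nextid=str(len(vmx_paths) + 1))
    for vm_id, path in enumerate(vmx_paths, start=1):
        ET.SubElement(vms, "vm", id=str(vm_id), path=path)

    return ET.tostring(root, encoding="unicode")


# Same single app and single VM as in the sample file above.
xml_text = build_unity_helper_xml(
    ["file:///d:/dummy.lnk"],
    [r"C:\VMs\Win7\Win7.vmx"],
)
```

<div>
The IDs assigned here are the ones passed to the -G and -V parameters, so with a single app and a single VM both are 1.</div>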
<div>
So in order to have any user utilize Unity, I modified the script above as follows:</div>
<div>
<br /></div>
<div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">@echo off</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">C:</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">cd C:\Program Files (x86)\VMware\VMware Workstation</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">REM Prepare unity</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">copy /Y C:\VMs\unity-helper.xml %LOCALAPPDATA%\VMware\unity-helper.xml >NUL</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">REM Start VM</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">vmrun start C:\VMs\Win7\Win7.vmx nogui</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">REM Now run application </span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">vmware-unity-helper.exe -r -G:1 -V:1</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">vmrun -T ws -gu Adobe -gp adobe runProgramInGuest C:\VMs\Win7\Win7.vmx -activeWindow -interactive "D:\Program Files\MyLegacyApp\Legacy.exe"</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;"><br /></span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">REM Now suspend VM</span></div>
<div>
<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">vmrun suspend C:\VMs\Win7\Win7.vmx</span></div>
</div>
<div>
<br /></div>
<div>
This script:</div>
<div>
<ol>
<li>First copies the predefined XML file to the %LOCALAPPDATA% folder (overwriting anything there -- my users don't use Unity for other purposes, so if you need to support that too you'll need to do some more magic).</li>
<li>Then we start the VM as before</li>
<li>Next step is to run "a dummy application" using Unity, which sets the already running VM in Unity mode. More on this dummy application in a second.</li>
<li>Then we start the actual application using vmrun, wait for it to end, and nicely suspend the VM as before.</li>
</ol>
</div>
<div>
<br />
The dummy application inside the VM is just a batch file "<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">dummy.bat</span>" with contents "<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">@echo off</span>" and "<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">exit</span>". I created a shortcut "<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">dummy.lnk</span>" to this "<span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">dummy.bat</span>" so I can keep the command prompt window minimized (properties of shortcut) -- this prevents a Unity window from briefly popping into view and disappearing again.</div>
<div>
<br /></div>
<div>
This works great and runs the application in Unity, nicely starting & stopping the VM as needed. Obviously the script should be extended to check if the VM is already being run by another user, but that'll be for another time :). The only disadvantage is that you briefly still see the VMware Workstation window -- unfortunately using the "nogui" option with <span style="font-family: "courier new" , "courier" , monospace; font-size: x-small;">vmrun start</span> does not seem to fix this...</div>
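<div>
As a starting point for that "is the VM already running" check, here is a minimal sketch in Python that parses the output of "vmrun list". It assumes the usual output format (a "Total running VMs: N" line followed by one VMX path per line). The function names are hypothetical, and whether "vmrun list" also reports VMs started by a different Windows user is not verified here.</div>

```python
import subprocess


def running_vms(vmrun_list_output):
    """Return the VMX paths found in the text printed by `vmrun list`."""
    paths = []
    for line in vmrun_list_output.splitlines():
        line = line.strip()
        # Skip blank lines and the "Total running VMs: N" summary line.
        if not line or line.lower().startswith("total running vms"):
            continue
        paths.append(line)
    return paths


def is_vm_running(vmx_path, vmrun_list_output):
    """Case-insensitive match, since Windows paths are not case-sensitive."""
    wanted = vmx_path.lower()
    return any(path.lower() == wanted for path in running_vms(vmrun_list_output))


def query_vmrun(vmrun_exe="vmrun"):
    """Run `vmrun -T ws list` on the host and return its stdout."""
    result = subprocess.run(
        [vmrun_exe, "-T", "ws", "list"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Parsing only; query_vmrun() needs VMware Workstation installed on the host.
sample_output = "Total running VMs: 1\nC:\\VMs\\Win7\\Win7.vmx\n"
already_running = is_vm_running(r"C:\VMs\Win7\Win7.vmx", sample_output)
```

<div>
On the host, the result of query_vmrun() would be passed in instead of the sample text, and the "vmrun start" and "vmrun suspend" steps could be skipped when the VM turns out to be in use.</div>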
<div>
<br /></div>
<div>
As a sidenote, to top it off, I:</div>
<div>
<ul>
<li>Enabled "Shared Folders" to make the users' data directories available inside the guest OS as a mapped network drive.</li>
<li>Disabled all unnecessary services in the VM for a very fast startup.</li>
<li>Hid all disk drives (C:, D:, E:) in the guest VM using group policy (<a href="https://support.microsoft.com/kb/231289">https://support.microsoft.com/kb/231289</a>) to ensure the users can only see the "Shared Folders" and never mistakenly save data inside the guest.</li>
<li>Placed the VM on an SSD so that Windows 7 in the guest starts incredibly fast.</li>
</ul>
</div>
<b>Erratic mouse movement in VMware Workstation on a second screen</b> (published 2015-08-18, updated 2015-08-18)<br />
I heavily rely on VMware Workstation for running all flavours of OSes (from Linux to older Windows versions, and there might even be a Mac OS X for iOS development running somewhere, maybe). After upgrading to Windows 10 on my new laptop (with a 1080p screen, unlike the previous device I was using), I noticed that all of my machines were behaving strangely, to the point that basically none of them were usable anymore.<br />
<br />
<b>Symptoms:</b><br />
<br />
<ul>
<li>The mouse pointer "flickers" around the screen; when clicking/dragging, the mouse pointer jumps to the upper left corner.</li>
<li>This only happens when VMware Workstation is running on the second screen attached to the computer, not on the native screen.</li>
<li>It happens on ALL operating systems, Linux, Windows, and perhaps also on Mac OS X.</li>
</ul>
<div>
I looked around a bit to find out what was causing this behaviour, and it turns out that it is a generic problem with Workstation itself.</div>
<div>
<br /></div>
<div>
<b>Resolution:</b></div>
<div>
<br /></div>
<div>
Thank you <a href="https://communities.vmware.com/thread/481919">VMware forums</a>, the solution is to disable "Display scaling on high DPI settings" on the <span style="font-family: Courier New, Courier, monospace;">vmware.exe</span> in the <span style="font-family: Courier New, Courier, monospace;">C:\Program Files (x86)\VMware\VMware Workstation</span> folder. </div>
<div>
<br /></div>
<div>
This is one of those issues that can drive you crazy and have you reinstalling your VMs a dozen times thinking it is you doing something wrong, etc etc... so I hope it helps :).</div>
<b>Removing OneDrive from Windows Explorer in Windows 10 TP</b> (published 2015-02-20, updated 2015-02-20)<br />
I've been using Windows 10 Technical Preview for a few months already, and installed the latest build 9926 a few days ago. Since I hardly use OneDrive (but instead a combination of... Dropbox, Google Drive and my own ownCloud), I prefer not to have it visibly polluting my Explorer windows.<br />
<br />
In Windows 8.1, there was plenty of documentation on what registry keys to modify in order to hide OneDrive from Explorer, see for example <a href="http://www.eightforums.com/tutorials/28027-onedrive-remove-navigation-pane-windows-8-1-a.html">here</a>. Unfortunately, the class ID for OneDrive changed in Windows 10 -- use the following registry location instead:<br />
<br />
<pre>HKEY_CLASSES_ROOT\CLSID\{018D5C66-4533-4307-9B53-224DE2ED1FE6}\ShellFolder</pre>
<br />
As described in the original EightForums article, set the "Attributes" value to <b>f090004d</b>. I've also discovered that in Windows 10 it is not necessary to perform the "64-bit" actions (no similar key exists under the Wow6432Node).<br />
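<br />
For repeated installs, the same change can be captured in a .reg file rather than clicked together in regedit. The following is a minimal sketch in Python that only builds the text of such a file. The function name is hypothetical, and it assumes that "Attributes" is a DWORD value and that the file is imported by hand (with sufficient rights on that key) afterwards.<br />

```python
ONEDRIVE_SHELLFOLDER_KEY = (
    r"HKEY_CLASSES_ROOT\CLSID"
    r"\{018D5C66-4533-4307-9B53-224DE2ED1FE6}\ShellFolder"
)


def build_reg_file(key_path, value_name, dword_hex):
    """Return the text of a .reg file that sets one DWORD value."""
    int(dword_hex, 16)  # raises ValueError if this is not hexadecimal
    if len(dword_hex) > 8:
        raise ValueError("a DWORD has at most 8 hex digits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key_path}]",
        f'"{value_name}"=dword:{dword_hex.lower().zfill(8)}',
        "",
    ]
    # .reg files conventionally use Windows (CRLF) line endings.
    return "\r\n".join(lines)


reg_text = build_reg_file(ONEDRIVE_SHELLFOLDER_KEY, "Attributes", "f090004d")
```

Writing reg_text to a file and importing it on a test machine first is the safer route; the sketch does not touch the registry itself.<br />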
<br />
That already visually removes OneDrive. Next up, make sure it doesn't start again by modifying the "OneDrive" application settings. Just as in the pre-Windows 8.1 era, OneDrive can again be disabled by right-clicking the OneDrive icon in the notification area, going to settings and disabling...<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgmYNjfeR-6uuY2MfyJ66Qx_CidqKnUAP7zyEzpTlg18uyZMt8gIdfxU1a1yzTkS806kDHoTPVgA0gpKoy5P0KgS818PTEoCBCAFeU2ZQvuvP29uOlxo77-un-AmNFwEFu-sejUaZ5vCAzE/s1600/20150220-OneDriveApplication.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgmYNjfeR-6uuY2MfyJ66Qx_CidqKnUAP7zyEzpTlg18uyZMt8gIdfxU1a1yzTkS806kDHoTPVgA0gpKoy5P0KgS818PTEoCBCAFeU2ZQvuvP29uOlxo77-un-AmNFwEFu-sejUaZ5vCAzE/s1600/20150220-OneDriveApplication.PNG" height="320" width="285" /></a></div>
<br />
Finally, don't forget to enable "Save to Computer by default" in the various Office applications to prevent Office from trying to find that OneDrive folder again...<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjh3o3R_f-zPyArrdmU_dhRofEjabaRkZysR6sYT_gEVZzvn2JE8d9shvgaZ06_DNZltTwS_tsq9RcWJ5eSZgyR-gL1w4kFdRcgM_bit0r_0vB1EQmiKvD5Jn_3zJQlcMh2HceBNsTAgJaQ/s1600/20150220-OneDriveWord.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjh3o3R_f-zPyArrdmU_dhRofEjabaRkZysR6sYT_gEVZzvn2JE8d9shvgaZ06_DNZltTwS_tsq9RcWJ5eSZgyR-gL1w4kFdRcgM_bit0r_0vB1EQmiKvD5Jn_3zJQlcMh2HceBNsTAgJaQ/s1600/20150220-OneDriveWord.PNG" height="161" width="400" /></a></div>
<br />
<br />
Et voila, at least in this Technical Preview OneDrive can be circumvented this way. No guarantees whether this will still be the case in future TP's or the RTM version of Windows 10 though...<br />
<br />
Please note that this does have an impact on functionality, as Microsoft still has big plans to use OneDrive to share data & settings across the entire Windows 10 platform - read Paul Thurrott's view on it <a href="http://winsupersite.com/windows-10/heres-whats-really-happening-onedrive-windows-10">here</a>.<br />
<br />
<b>Windows 8 DNS resolution issues and IPv6</b> (published 2012-10-30, updated 2012-10-30)<br />
One small issue that I have faced a few times already is that the Windows TCP/IP stack does not seem to be able to properly resolve a DNS hostname (FQDN) even though <b>nslookup</b> returns a perfectly fine result. The same system was running fine in the same network under Windows 7.<br />
<br />
The solution was to disable IPv6 on the network adapters of the system. This is just <a href="http://thommck.wordpress.com/2011/02/08/offline-files-versus-vpn-a-k-a-the-case-of-the-missing-work-online-button">another</a> <a href="http://blog.solori.net/2009/03/04/sbs-2008-panics-needs-ipv6/">example</a> of strange issues with IPv6 that originate from the fact that the IPv6 code is used <a href="http://blogs.technet.com/b/jlosey/archive/2011/02/02/why-you-should-leave-ipv6-alone.aspx">very intensively throughout the Windows components</a>. That is also the reason why Microsoft <a href="http://blogs.technet.com/b/netro/archive/2010/11/24/arguments-against-disabling-ipv6.aspx">recommends against</a> disabling IPv6. Well... it helped me anyway, and was easier than configuring IPv6 addresses for my DNS server :).<br />
<br />
<b>Upgrading Windows 7 Ultimate to Windows 8 Enterprise</b> (published 2012-09-21, updated 2012-09-21)<br />
Unfortunately, Microsoft <a href="http://technet.microsoft.com/en-us/library/jj203353.aspx">does not support performing an in-place upgrade</a> of a Windows 7 Ultimate installation to a Windows 8 Enterprise edition; Windows 7 Ultimate can only be upgraded to Windows 8 Professional (since Windows 8 does not come with an Ultimate edition). True, there might be <a href="http://en.wikipedia.org/wiki/Windows_8_editions#Comparison_chart">little added value for a home user</a> in Windows 8 Enterprise, but since it was the only version I had ready on a bootable USB stick, I tried to fool the installer into continuing anyway.<br />
<br />
This was surprisingly easy. It suffices to modify the "<b>EditionID</b>" and "<b>ProductName</b>" registry values in the following location:<br />
<br />
<b>HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion</b><br />
<br />
from "Ultimate" and "Windows 7 Ultimate" to "Enterprise" and "Windows 7 Enterprise" respectively, to let the installation proceed.<br />
<br />
<b>A note on Western Digital 2001FASS drives</b> (published 2010-12-22, updated 2010-12-22)<br />
About a year ago, I decided to open my wallet and cough up some serious money for a good NAS solution for home use. With an ESX whitebox, a growing number of pictures and other digital paraphernalia that I like to (permanently) store, I decided that a standalone NAS solution would be more reliable than relying on a single (now aging) RAID controller in my ESX whitebox. After all, a NAS is "system independent" so it can be accessed from any device, as long as there is a network. A few weeks later, I ordered the Thecus N7700 NAS from eBay, together with three Western Digital Caviar Black 2 TB disks (type: WD2001FASS). In the meantime, I upgraded to 5 disks.<div><br /></div><div>Apart from some configurational complexities between ESX and the Thecus (see <a href="http://timjacobs.blogspot.com/2010/11/note-on-esx-4x-and-iscsi-devices.html">my previous blogpost on ESX's iSCSI implementation changes in 4.1</a>), everything has been running just fine... until yesterday.</div><div><br /></div><div>At 17:04 yesterday evening, I received a Gmail notification from the Thecus NAS (yes, it sends mails through Gmail) indicating one of the Caviar Black disks had failed and that my RAID5 array was now degraded. I was a bit surprised and already fearing another "Sea-gate" incident with another series of continuously failing disks (the "gate" prefix being so popular with "cablegate", I decided to introduce another one :) ). </div><div><br /></div><div>I decided to remove the affected disk and run the Western Digital drive diagnostic tools on it (which took a dreadfully long 4 and a half hours). 
Sure enough, a Full Drive test revealed that there were some bad sectors on the drive but that they were successfully remapped to the spare capacity that drives have precisely to compensate for a few bad blocks. Still, the RAID array was degraded and the drive was reported as failed (even though it seems to be very easily fixable), so I decided to dive a little deeper into what happened in an attempt to discover why this is not automatically fixed by the drive when such a bad block is discovered.</div><div><br /></div><div>What I found out seriously pissed me off. Western Digital does support a mechanism to automatically remap bad blocks to the spare capacity on the drive. However, this can take a few moments, so the question arises how the drive should communicate with the RAID controller to report that it is currently busy doing some block remapping. Western Digital has a technology which they refer to as <a href="http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery">TLER - Time Limited Error Recovery</a>, which limits how long the drive spends on error recovery so that the RAID controller does not time out and mark the drive as failed. </div><div><br /></div><div>Fantastic! The only problem is that this software feature is disabled in the 2001FASS drives, simply because it is considered a "consumer" drive. The even more expensive (and trust me, I had to use all my tactics to convince my wife to cough up the money for what I consider a really expensive drive) RE or "RAID edition" drives are in fact almost identical to the 2001FASS drives, with the exception that they have the TLER feature enabled.</div><div><br /></div><div>Basically, this means that the 2001FASS drive is <b>not suitable</b> for RAID arrays. When a drive encounters a bad block, it will immediately be marked as failed even though this is not the case. Talk about a serious bummer! 
<a href="http://www.tomshardware.co.uk/forum/257590-14-wd20ears-safe-raid#t1826604">Some report that TLER is not needed for Linux</a> (which is basically what the Thecus NAS is, a Linux box) but my experience seems to contradict this slightly.</div><div><br /></div><div>For me, this is an important reason not to buy Western Digital anymore -- you need to cough up an additional bucket of money for a feature that should be enabled in <i>any</i> drive -- after all, all motherboards today support basic RAID functionality! Or, if you want to upgrade at a given time from one drive to multiple drives... </div><br />
<b>A note on ESX 4.x and my iSCSI devices</b> (published 2010-11-24, updated 2010-11-24)<br />
A few weeks ago, I decided to extend my iSCSI NAS (Thecus N7700) from 3x 2TB Western Digital Caviar Black disks to 5x 2TB Western Digital Caviar Black disks. <div><br /></div><div>Trouble has been my companion ever since. I have been experiencing some serious performance issues since the RAID extension, and was fearing that the different firmware versions of the new Caviar Blacks were confusing my NAS system; mixing firmwares in RAID systems does not seem to be a best practice. The symptoms were very simple: from the moment a lot of I/O was generated (think: 160 MB/s write speeds to the NAS), ESX would lose the iSCSI link to the NAS, which was choking on all that traffic with 100% CPU usage. 
As you very well know, storage is ESX's Achilles heel, and very shortly after that, the vmkernel logs would be flooded with messages indicating a path failure to the NAS:</div><br /><span class="Apple-style-span" >0:00:41:06.581 cpu1:4261)NMP: nmp_PathDetermineFailure: SCSI cmd RESERVE failed on path vmhba36:C0:T0:L3, reservation state on device t10.E4143500000000000000000040000000AE70000000000100 is unknown.<br />0:00:41:06.581 cpu1:4261)ScsiDeviceIO: 1672: Command 0x16 to device "t10.E4143500000000000000000040000000AE70000000000100" failed H:0x2 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.</span><br /><br /><div>After a multitude of firmware up- and downgrades on the Thecus N7700 and a lot of conversation with Thecus Support (whom, by the way, I want to thank for their patience with a guy like me working in an <a href="http://www.thecus.com/Downloads/HDD_List/N7700_N7700SAS_N8800_N8800SAS_SATA_HDD_list_2010-09-02.pdf">unsupported scenario</a>!), I stumbled across a strange error message that I had not seen before on an ESX host:</div><br /><span class="Apple-style-span" >0:00:41:06.733 cpu0:4113)FS3: 8496: Long VMFS3 rsv time on 'NASStorage04' (held for 3604 msecs). # R: 1, # W: 1 bytesXfer: 2 sectors</span><br /><br /><div>Some googling quickly pointed me to a few <a href="https://forums.openfiler.com/viewtopic.php?pid=19087#p19087">interesting</a> <a href="http://communities.vmware.com/thread/280337">threads</a>, which talked about <a href="http://kb.vmware.com/kb/1002598">VMware KB 1002598</a> discussing performance issues on EMC Clariion systems with iSCSI. It seems that the iSCSI initiator in ESX allows for delayed ACKs, which apparently matters in situations of network congestion. Knowing that the N7700's CPU usage can sometimes peak at 100% and that this can very briefly lock up the network link on the N7700, I decided to disable delayed ACKs, following the procedure in the VMware KB... </div><div><br /></div><div>Great success! 
Performance was rock solid again, and I have not experienced any ESX hangs since!</div><div><br /></div><div>This made me think a bit, and I remembered that I first noticed the performance issues a few weeks after upgrading to ESX 4.0 Update 2 -- I suppose some default setting changed between vanilla ESX 4.0 (which I was running earlier) and ESX 4.0 Update 2, disturbing the good karma that I had going between my ESX host and N7700 NAS. Let it be known to the world that the N7700 with firmwares 2.01.09, 3.00.06 and 3.05.02.2 (the ones I tried) is also subject to the iSCSI symptoms described in <a href="http://kb.vmware.com/kb/1002598">VMware KB 1002598</a>.</div><div><br /></div>Tim Jacobshttp://www.blogger.com/profile/06131387085752434985noreply@blogger.com0tag:blogger.com,1999:blog-4834634390856475978.post-80696208243405502912010-11-05T17:40:00.006+01:002010-11-05T23:33:52.697+01:00The joy of WSUSAfter a rather unpleasant electrical power spike earlier this week had made some of my hard disks go weird (crashing my ESX server with an equally unpleasant PSOD), a quick inspection revealed that no real harm was done -- except for one of the dozen RAID arrays I have, which decided to do an automatic rebuild (no real issue). That finished after a few hours so I was able to go back to my comfortable sofa and enjoy some more quality prime time TV (lol). At least, so I thought...<div><br /></div><div>A few hours later I discovered that my domain controller had not survived the ESX crash and was very unpleasantly complaining about a corrupted registry. Deciding that a bare metal (or virtual metal) Active Directory disaster recovery was not really necessary on my home network (recreating the three user accounts was less effort ;) ), I decided to reinstall my entire domain controller. 
About 30 minutes after that decision, I was again running a new AD domain with the users recreated and the most important servers already rejoined to the domain.</div><div><br /></div><div>So what did I forget to configure in my enthusiasm to just reinstall the entire bunch? Certificate services, DFS namespace, DHCP server, re-ACL of file server, recreation of user profiles and also my own WSUS server (which were all happily running on my domain controller as well -- beat that SBS!).</div><div><br /></div><div>My own WSUS server I hear you say? Well yes, with the very unpleasant (which you will have noticed already is the word of today) bandwidth limitations we have in Belgium, my ISP decides to punish me with some low-bandwidth connection after transferring more than 80 GB of data. That is quite sufficient but I prefer not spending it on downloading all my Windows updates 14 times (which is about the total number of virtual machines, physical laptops and desktops I have running on a frequent basis). </div><div><br /></div><div>Given that my WSUS partition was about 120 GB and 98% filled, the doom scenario of seeing the entire data transfer allowance my ISP grants me for this month consumed by frikkin' Windows updates after reinstalling WSUS & synchronizing for the first time slowly started to set in. An entire month of "small band" in this digital age? The horror... the horror...</div><div><br /></div><div>So I decided to spend a few megabytes of data transfer on very actively googling whether it is possible to prevent WSUS from downloading all the updates from the internet. 
After all, the registry corruption of the domain controller had completely borked its functionality, yet the separate partition (and separate VMDK) which was holding the WSUSContent directory was undamaged.</div><div><br /></div><div>Most fora and blogs I found on recycling WSUSContent when performing a new installation refer to a TechNet page called <a href="http://technet.microsoft.com/en-us/library/cc720512(WS.10).aspx">"Set Up a Disconnected Network (Import and Export Updates)"</a>, which explains how the WSUSContent can be copied from one server to another -- however, they are always exporting & importing the WSUS database as well; unfortunately this database got lost when I -- again -- enthusiastically wiped the entire corrupted OS VMDK. </div><div><br /></div><div>So I just decided to have a go and installed WSUS from scratch, and I pointed the WSUSContent directory to the partition which already contained the updates from the old server. Then I did the following:</div><div><ul><li>Configured the WSUS server exactly as before (with the same products to update)</li><li>Performed the initial synchronization (this took a long time but using the network bandwidth monitoring in the vSphere client I could clearly see that only minimal amounts of data were transferred during this synchronization -- no actual content was downloaded!)</li><li>Approved all the updates that were previously also approved.</li></ul><div>This turns out to work quite nicely; apparently when WSUS detects that the updates are already downloaded to disk, it will recycle the existing content! 
Hurray for WSUS and for not torturing me with small band for an entire month!!</div></div><div><br /></div><div><br /></div>Tim Jacobshttp://www.blogger.com/profile/06131387085752434985noreply@blogger.com0tag:blogger.com,1999:blog-4834634390856475978.post-80673149993367637022010-04-01T18:00:00.004+02:002010-04-01T19:09:49.349+02:00ESX Whitebox & RAID controller failures - an epic struggleThe past few days have been a bit tense. Not only was there a deadline at work (an interesting study at one of our customers that had to be finished before the end of March 2010), but also yesterday, my ESX whitebox decided to die on me. Of course, I took my screwdriver and box of recovery CD's and went to work. A reconstruction of the epic struggle to get everything back to work (yes, ):<br /><div><ul><li><b>March 31, 8:00 AM</b>. The (old & faithful 100 Mbps) 3Com switch that my PC's are currently connected to -- after having moved and being too lazy to install CAT6 cabling in my new house so I don't live between UTP cables, the wife loves it -- has crashed and had a blinking "<i>Alert!</i>" light; after disconnecting the power, the switch got back up again.<br /><br /></li><li><b>March 31, 8:05 AM.</b> No internet connectivity; road works again, like the day before? Nope, turns out my ESX box, which runs a virtual m0n0wall router, has completely frozen and can only be brought back by a hard reset.<br /><br /></li><li><b>March 31, 8:10 AM. </b>Thirdly, I discovered my Dell Perc 5i controller now freezes the computer after the power has been cycled. Interesting. Trying to enter the Perc 5i BIOS for configuration also freezes the computer. Fear kicks in. </li></ul>About a year ago, I had already burned out a Perc 5i controller (including the sizzling, smoke and fireworks) and I decided to buy a second hand controller from eBay again. 
That replacement never fully worked as I would have liked (for example, after resetting the computer, the controller is no longer recognized -- in fact it is only recognized after a power cycle; strange!). A bit pissed off, I blame myself for accepting a half-working controller for hosting all my data (family pictures, personal documents, ...). I'm already fearing that I will have to buy a replacement controller & restore all my data from Amazon S3 & <a href="http://www.jungledisk.com/">JungleDisk</a> (which I subscribed to after the previous controller went up in smoke)... weeks of downtime.<br /><ul><li><b><span class="Apple-style-span" style="font-weight: normal; "><b>March 31, 8:30 AM. <span class="Apple-style-span" style="font-weight: normal;">I remember that shortly after I got the Perc 5i controller, I got a few warnings about ECC errors being discovered in the DIMM that provides the read/write cache. I decide to replace the DIMM, as a BIOS crashing all of a sudden seems a bit unreal. Unfortunately, to no avail.</span><br /><br /></b></span></b></li><li><b>March 31, 8:45 AM. </b>After some fiddling around with the controller, I notice the Perc 5i BIOS is accessible without any drives connected. Puzzling, but after performing a factory reset of the card (erasing the FlashROM) and performing a "foreign array import" of my two RAID arrays, the disks are discovered again & the computer tries to boot up. All this is followed by a little dance of happiness around the computer, thanking the computer gods for resurrecting the RAID array.<br /><br /></li><li><b>March 31, 8:55 AM</b>. Immediately after the import, all volumes seem to report suspicious RAID consistency and a consistency check & background initialization is automatically started. The just recovered peace of mind is disturbed and fear of data corruption kicks in. 
Anyway, the only thing to do is wait several hours for the data consistency checks to complete, so I just boot into ESX.<br /><br /></li><li><b>March 31, 8:57 AM.</b> ESX now freezes somewhere halfway in the boot. Turns out I am running an unpatched vSphere 4.0 which still has an older megaraid_sas driver. I remember <a href="http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1013026">issues</a> were reported with this driver and this is confirmed when inspecting the vmkernel logs. They reveal that the megasas driver is receiving tons of AENs (Asynchronous Event Notifications):<br /><br /><pre>esx01 vmkernel: 0:03:28:31.377 cpu3:4193)<6>megasas_hotplug_work[6]: event code 0x006e<br />esx01 vmkernel: 0:03:28:31.387 cpu3:4193)<6>megasas_hotplug_work[6]: aen registered<br />esx01 vmkernel: 0:03:28:31.518 cpu1:4485)<6>megasas_service_aen[6]: aen received<br />esx01 vmkernel: 0:03:28:31.518 cpu0:4196)<6>megasas_hotplug_work[6]: event code 0x006e<br />esx01 vmkernel: 0:03:28:31.528 cpu0:4196)<6>megasas_hotplug_work[6]: aen registered<br />esx01 vmkernel: 0:03:29:51.334 cpu3:4251)<6>megasas_service_aen[6]: aen received<br />esx01 vmkernel: 0:03:29:51.334 cpu2:4205)<6>megasas_hotplug_work[6]: event code 0x0071<br />esx01 vmkernel: 0:03:29:51.349 cpu2:4205)<6>megasas_hotplug_work[6]: aen registered<br />esx01 vmkernel: 0:03:29:54.318 cpu3:4246)<6>megasas_service_aen[6]: aen received<br />esx01 vmkernel: 0:03:29:54.318 cpu0:4207)<6>megasas_hotplug_work[6]: event code 0x0071<br />esx01 vmkernel: 0:03:29:54.334 cpu0:4207)<6>megasas_hotplug_work[6]: aen registered<br />esx01 vmkernel: 0:03:29:57.405 cpu3:4246)<6>megasas_service_aen[6]: aen received<br />esx01 vmkernel: 0:03:29:57.405 cpu2:4193)<6>megasas_hotplug_work[6]: event code 0x0071<br />esx01 vmkernel: 0:03:29:57.421 cpu2:4193)<6>megasas_hotplug_work[6]: aen registered<br /></pre>For an unknown reason, the ESX server is unable to cope with the massive number of events received and 
slows down dreadfully (in retrospect I noticed it did not actually crash).<br /><br />I decide to boot back into the Perc 5i BIOS and let the consistency check finish. Turns out again everything freezes before I can enter the BIOS so I need to disconnect all drives again, perform a factory reset & re-import my RAID arrays. I let the consistency checks start & hurry to work.</li></ul><ul><li><b>March 31, 21:00 PM. </b>Consistency checks have finished but now ESX refuses to boot up; it no longer finds the service console VMDK & reports:<br /><br /><pre>VSD mount/Bin/SH:cant access TTY job control turned off.</pre><br />Interesting. I discover a <a href="http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1012142">VMware KB</a> that describes this behavior, which explains that sometimes LUN's can be discovered as snapshots when changes are made at the storage array. I conclude that my consistency checks & foreign array importing might have messed up the identifiers such that now ESX can no longer find the Service Console VMDK and goes berserk. After following the steps in the KB (basically resignaturing all VMFS volumes), everything works again. Afterwards, I discover that I had switched the two cables connecting both of my RAID arrays (cable 1 got attached to port 2 and vice versa). Doh!!!<br /><br /></li><li><b><span class="Apple-style-span" style="font-weight: normal; "><b>March 31, 21:30 PM. </b>Time to install ESX 4.0 update 1a; yet again, another issue: not enough disk space to install the patches! After cleaning up /var/cache/esxupdate, sufficient disk space is available.<br /><br /></span></b></li><li><b>March 31, 22:00 PM. </b>After having booted up everything, I again notice very bad ESX performance, and my suspicion is confirmed when I notice the same megaraid_sas AEN events in the vmkernel logs again. 
Strangely enough the error only occurs when I access my fileserver virtual machine, which is the only virtual machine that runs on the second of two RAID arrays... hmmm.<br /><br /></li><li><b>April 1, 13:00 PM.</b> Some time for further analysis. I start a virtual machine running on my first RAID array and see that no AEN events are logged in the vmkernel log. Then I decide to add the VMDK's of my fileserver, all hosted on my second RAID array, one by one. The first VMDK is hot-added to a Windows 2008 virtual machine fine and I can see the data is still intact. Big relief! But indeed, when adding the second and third VMDK, the AEN events are flooding the vmkernel logs again.<br /><br /><b><span class="Apple-style-span" style="font-weight: normal; ">At this time, I am becoming more and more convinced that it is not the Perc 5i controller that is responsible for the issues, but one or more disks in the second RAID array. </span></b><br /><br /></li><li><b>April 1, 14:00 PM. <span class="Apple-style-span" style="font-weight: normal; ">I decide I want to have a look at the Perc 5i controller logs to see if errors are logged at the HBA level. Since the Perc 5i uses an LSI Logic chip, I use the procedure <a href="http://timjacobs.blogspot.com/2008/05/installing-lsi-logic-raid-monitoring.html">I blogged about</a> a while back to install the MegaCLI tool again.<br /><br />At this point, I discover that it is no longer possible to use the LSI MegaCLI tools under vSphere. I guess VMware finally decided that the Service Console has to run as a virtual machine and the Perc 5i card is no longer exposed inside the Service Console. LSI MegaCLI therefore reports that no compatible controllers are present. Bummer! Apparently some people report in <a href="http://communities.vmware.com/thread/228615">the VMware Community forums</a> that LSI MSM (remote management server?) 
seems to work with limited functionality but I decide not to try to install this.<br /><br /></span></b></li><li><b><span class="Apple-style-span" style="font-weight: normal; "><b>April 1, 17:00 PM. </b>Time to think of an alternative way of discovering what is wrong in the second RAID array. It is a RAID5 array of 4 Seagate 1 TB disks (yes, the <b><span class="Apple-style-span" style="font-weight: normal; ">ST31000340AS series that had <a href="http://www.tomshardware.com/news/seagate-500gb-1tb-firmware-update,6867.html">the firmware issues</a>)</span></b>, and my suspicion is now that a single disk has failed, but the failure is not picked up by the Perc 5i controller, or not reported by the disk firmware. That is particularly bad because I don't want to pull the wrong disk out of a RAID5 array with a failed disk -- obviously causing a total data loss, which would be very, very, very, VERY depressing after all the happiness that I still had my data ;).<br /><br />Time to pull out the Seagate selftests and indeed, testing each drive individually revealed that one of the drives had failed. </span></b></li></ul><b><span class="Apple-style-span" style="font-weight: normal; "><div><b><span class="Apple-style-span" style="font-weight: normal; ">So the conclusion is: time for another RMA! I now have had each of my four Seagate 1 TB disks fail on me. In fact, out of the 8 Seagate drives I own, I have already requested 7 RMA's. 
At times like these I remember why I coughed up a massive amount of money to get my hands on the Western Digital Caviar Black edition (which AFAIK is the last consumer disk to provide a 5 year warranty).</span></b></div></span></b></div><div><b><span class="Apple-style-span" style="font-weight: normal; "><br /></span></b></div>Tim Jacobshttp://www.blogger.com/profile/06131387085752434985noreply@blogger.com1tag:blogger.com,1999:blog-4834634390856475978.post-13516941157170263232009-08-31T13:19:00.011+02:002009-08-31T13:48:17.458+02:00Hosting your DNS on vSphere 4 - caveatFor a while now, I had been having an issue with my whitebox ESX4.0 server: after rebooting this machine, I was unable to connect to it using the vSphere client. The error I was receiving was a simple "503: Service unavailable". The hostd.log on the host was filled with errors like:<br /><br />--F637FB90 warning 'Proxysvc Req00002'-- Connection to localhost:8309 failed with error N7Vmacore15SystemExceptionE(Connection refused).<br /><br />and I noticed that /var/log/messages contained a lot of vmware-authd start & stop messages. I struggled and managed to find a workaround which consisted of:<br /><ul><br /><li>Logging onto the service console as root</li><br /><li>Editing the <b>/etc/vmware/hostd/config.xml</b> file and disabling the "proxysvc" component of hostd.</li><br /><li>Restarting the hostd process (service mgmt-vmware restart)</li><br /><li>Waiting for all my autostart VM's to come online</li><br /><li>Re-enabling the "proxysvc" and restarting hostd once again</li><br /></ul>Today, I discovered <a href="http://communities.vmware.com/thread/216408">this thread</a> on the VMware communities which contained the answer I was looking for: the DNS servers I had configured on my ESX box were virtual machines running on the box itself (in my case: a <a href="http://m0n0.ch/wall/">m0n0wall virtual appliance</a> and a Windows 2008 domain controller with DNS). 
Apparently this disrupts the proxysvc component of hostd (since the virtual DNS servers are not reachable at the time hostd is first started - autostart is yet to kick in), causing it to fail to start properly and preventing vSphere client connections. Furthermore, this prevented the autostart of VM's altogether, so DNS never came up at all.<br /><br />The solution was to clear my <b>/etc/resolv.conf</b> file and now everything works fine immediately after a reboot (no more attempts to connect to a virtual machine that is not yet running)! This completely disables DNS resolution on the host (in particular if you are using HA, you'll need to do good /etc/hosts maintenance). Since your typical production environment probably is not running the entire DNS infrastructure as one or more virtual machines, you probably are never exposed to this issue anyway.Tim Jacobshttp://www.blogger.com/profile/06131387085752434985noreply@blogger.com1tag:blogger.com,1999:blog-4834634390856475978.post-64853612836627764282009-04-09T18:45:00.009+02:002009-04-09T19:19:05.694+02:00Active Directory over SSL in VMware Lifecycle ManagerI recently have been playing around with <a href="http://www.vmware.com/products/lcm/">VMware's Lifecycle Manager</a> appliance, and one of the small "gotcha's" I ran into was how to configure secure communications between the LCM appliance and the Active Directory backend I was authenticating against.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqhyTaeY8H1TNPk5A4SDN0rQCJggU784lYHHRXyuH7H_rvwqw_AjQtMJQLGUiNFmF5HGHj6rNWE4_P6OlEA-neSBtobvPXbOqFup8ZMbFvvxMnnJdVgzIOowXbwiFh4mOqwlAn4asW1ZEE/s1600-h/20090409-LCMLDAP.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 176px;" 
src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqhyTaeY8H1TNPk5A4SDN0rQCJggU784lYHHRXyuH7H_rvwqw_AjQtMJQLGUiNFmF5HGHj6rNWE4_P6OlEA-neSBtobvPXbOqFup8ZMbFvvxMnnJdVgzIOowXbwiFh4mOqwlAn4asW1ZEE/s320/20090409-LCMLDAP.png" alt="" id="BLOGGER_PHOTO_ID_5322736623843632354" border="0" /></a><br />After configuring LCM to use Active Directory and SSL, I was getting the following error message:<br /><blockquote>Error: Unable to connect to LDAP Server / simple bind failed: dc.pretnet.local:636<br /></blockquote><br />In order to get the SSL authentication working for Active Directory (or LDAP in general), you need to be sure that the Certificate Authority that issues your domain controller certificates is trusted by the appliance (you don't need to actually import the domain controller certificate itself, the issuing CA's certificate is sufficient). This is done by going through the following steps:<br /><ol><li>First, obtain a copy of the issuing certification authority's certificate (without private key obviously). Ensure that it is in the X.509 format, either Base64 encoded or DER encoded. 
The appliance doesn't seem to support certificate containers (P7B format), so when you export the certificate using the Certificates MMC, ensure you select one of the first two options as the export format!!<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglAS8lri-LL1Htgnp03-AkF_Dn9leeivyJ3aClwf7z1_EdTfKWsmjC6ff-MfAGbNtaypWEViGNo1Ji0tsV805LV5JV6GNVwp5IoCVTFIFHxYKQ2tbfzCP-ZnnqWCVDaphp6pN_PPWP3Gal/s1600-h/20090409-LCMCertFormats.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 191px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglAS8lri-LL1Htgnp03-AkF_Dn9leeivyJ3aClwf7z1_EdTfKWsmjC6ff-MfAGbNtaypWEViGNo1Ji0tsV805LV5JV6GNVwp5IoCVTFIFHxYKQ2tbfzCP-ZnnqWCVDaphp6pN_PPWP3Gal/s320/20090409-LCMCertFormats.png" alt="" id="BLOGGER_PHOTO_ID_5322740461560554994" border="0" /></a><br /></li><li>To add the X.509 certificate to the appliance, go to the "Network" tab and select the "SSL Certificate" configuration pane. 
Here, import the certificate file.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3Rid-5WlOXgHPQCJ_-DkwS3JtOsANaU6pfSJEYt3kKwEcWoTbfBoYpfdNzydprGGViPLnjlG6VuvfqXy9acV48OkAcJXusyhwr8ll7toiiQoI7juGOdLsFarhdW7VYw969TIQ1-RPkoFT/s1600-h/20090409-LCMSSL.png"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 166px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj3Rid-5WlOXgHPQCJ_-DkwS3JtOsANaU6pfSJEYt3kKwEcWoTbfBoYpfdNzydprGGViPLnjlG6VuvfqXy9acV48OkAcJXusyhwr8ll7toiiQoI7juGOdLsFarhdW7VYw969TIQ1-RPkoFT/s320/20090409-LCMSSL.png" alt="" id="BLOGGER_PHOTO_ID_5322740923821807474" border="0" /></a></li><br /><li>Next, restart the "<span style="font-weight: bold;">VMO Configuration Server</span>", which you can find at the bottom of the "Server" tab in the GUI.<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjrs4P1-rd2QLsrnpAp1Kw3G0-7HU2u0pg0P6cqwM7OwBFq8Rae7DE4nfomGvj8rXp8n5dPSrl4aAljmZqUbWOUdcYd_6kc3HJh5Jm_Hi7WF3iqgexUTD63BcKBBYg2Y2K6pEddv_Mi_JH/s1600-h/20090409-LCMConfigRestart.png"><br /><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 193px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjrs4P1-rd2QLsrnpAp1Kw3G0-7HU2u0pg0P6cqwM7OwBFq8Rae7DE4nfomGvj8rXp8n5dPSrl4aAljmZqUbWOUdcYd_6kc3HJh5Jm_Hi7WF3iqgexUTD63BcKBBYg2Y2K6pEddv_Mi_JH/s320/20090409-LCMConfigRestart.png" alt="" id="BLOGGER_PHOTO_ID_5322737743473413586" border="0" /></a><br /><span style="font-weight: bold;">Note:</span> if you get an error message that first you need to fix your LDAP configuration (and "Plugins" section) before you can restart the VMO Configuration Service, go back to the LDAP configuration and disable SSL for a moment.</li></ol>That's it! 
Secure Active Directory authentication (which is what we all want) is now working properly! It's a good idea to import the certificate right away, because your other configuration tasks are severely limited when the authentication (either using the built-in OpenLDAP server on the appliance, or using Active Directory) is not working properly.<br /><br />As a sidenote, I would like to add that, despite VMware recommending running Lifecycle Manager on a dedicated Windows box (<a href="http://www.vmware.com/pdf/lcm1_admin_guide.pdf">LCM Administration Guide</a> v1.01, p21), the appliance is a really convenient way of running and upgrading this product without too much hassle. Of course, don't forget to offload the configuration database from the appliance (use a dedicated SQL or Oracle server)!Tim Jacobshttp://www.blogger.com/profile/06131387085752434985noreply@blogger.com0tag:blogger.com,1999:blog-4834634390856475978.post-24524112155484435212008-12-22T11:09:00.005+01:002008-12-22T11:16:01.956+01:00Counting ESX Server storage pathsAt a customer, we have been hitting one of the built-in storage limits of ESX Server: you can only present up to 1024 storage paths to a single ESX host. Depending on your SAN topology, each LUN that you present over a fiber fabric uses 4, 8 or even 16 storage paths. 
You can check this using the esxcfg-mpath command:<br /><br /><span style="font-size:85%;"><span style="font-family: courier new;">Disk vmhba1:9:2 /dev/sdf (102400MB) has 8 paths and policy of Fixed</span><br /><span style="font-family: courier new;"> FC 13:0.0 10000000c96e8972<->50001fe15009264e vmhba1:9:2 On active preferred</span><br /><span style="font-family: courier new;"> FC 13:0.0 10000000c96e8972<->50001fe15009264a vmhba1:10:2 On</span><br /><span style="font-family: courier new;"> FC 13:0.0 10000000c96e8972<->50001fe15009264c vmhba1:11:2 On</span><br /><span style="font-family: courier new;"> FC 13:0.0 10000000c96e8972<->50001fe150092648 vmhba1:12:2 On</span><br /><span style="font-family: courier new;"> FC 16:0.0 10000000c96e8ccc<->50001fe15009264f vmhba2:12:2 On</span><br /><span style="font-family: courier new;"> FC 16:0.0 10000000c96e8ccc<->50001fe15009264b vmhba2:13:2 On</span><br /><span style="font-family: courier new;"> FC 16:0.0 10000000c96e8ccc<->50001fe15009264d vmhba2:14:2 On</span><br /><span style="font-family: courier new;"> FC 16:0.0 10000000c96e8ccc<->50001fe150092649 vmhba2:15:2 On</span></span><br /><br />To count the total number of paths presented to a single ESX host, you can use the following service console command:<br /><br /><span style="font-size:85%;"><span style="font-family: courier new;">esxcfg-mpath -l | grep paths | awk '{ split($0, array, "has "); split(array[2], array2, " paths"); SUM +=array2[1] } END { print SUM}'</span></span><br /><br />Probably the awk syntax can be greatly shortened but I am no awk/grep/sed expert :). 
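<br /><br />For what it's worth, here is a shorter variant as a rough sketch. It assumes the summary lines keep the "has N paths" format shown in the sample output above, and <b>count_paths</b> is just a hypothetical helper name of mine, not part of any VMware tooling:

```shell
# Sum the "has N paths" counts from `esxcfg-mpath -l` style output on stdin.
# count_paths is a hypothetical helper name used for illustration only.
count_paths() {
    awk '/ has [0-9]+ paths/ {
        # The number of paths is the field right after the word "has".
        for (i = 1; i < NF; i++)
            if ($i == "has") { sum += $(i + 1); break }
    }
    END { print sum + 0 }'   # "+ 0" prints 0 instead of an empty line
}

# On the service console: esxcfg-mpath -l | count_paths
```

Reading from stdin keeps the function testable against saved output, without needing an ESX host at hand.<br /><br />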
Nevertheless, you can script this command into a cron job such that you can receive reports on whether or not you are hitting this limit.Tim Jacobshttp://www.blogger.com/profile/06131387085752434985noreply@blogger.com0tag:blogger.com,1999:blog-4834634390856475978.post-84667895812602377592008-11-30T21:42:00.012+01:002009-02-27T17:53:46.157+01:00App-V 4.5 Certificate Galore<u><span style="font-weight: bold;">1) Setting</span></u><br />This weekend I finally found some time to delve a bit deeper into properly configuring an App-V 4.5 infrastructure for large scale deployments. One of the first things that I investigated was the usage of RTSPS for smoother firewall tunneling: as you know, when using RTSP a series of ports is dynamically chosen, which means that you need to open up entire portranges in your firewall. This is not something your firewall guys will like if you work in a larger environment.<br /><br />Going for RTSPS means you need to use a server public certificate and a corresponding private key in order to let the App-V server sign and encrypt its communications. I have blogged before about <a href="http://timjacobs.blogspot.com/2007/10/configuring-rtsps-rtsp-over-tls-in.html">how to configure this in SoftGrid 4.1/4.2</a> -- luckily the procedure for configuring an SSL certificate got a lot simpler. At least, that is what I thought. Some issues I ran into that might save you some valuable troubleshooting time:<br /><ul><li>As always, when requesting a certificate from your Enterprise PKI, use the Virtual Application Server's FQDN as the subject. It is probably also a good idea to use the hostname as <a href="http://timjacobs.blogspot.com/2008/05/enabling-subject-alternate-name.html">a subject alternate name</a> for those people that still refer to servers by their shortnames.<br /><br /></li><li>After the App-V 4.5 Web Management Service has been installed, don't forget to configure the certificate for the IIS Default Website. 
In IIS7, that requires adding a binding & selecting the proper certificate. It is not clear to me why the App-V installer cannot handle this automatically!?<br /><br /></li><li>App-V 4.5 runs under the NETWORK SERVICE account by default and no longer under the SYSTEM account as SoftGrid 4.1/4.2 used to. This has some consequences when it comes to Windows PKI: you need to grant the NETWORK SERVICE account read permissions on <span style="font-weight: bold;">the private key</span>.<br /></li></ul>This last action is a lot harder than it sounds ;). Read on for more information.<br /><br /><span style="font-weight: bold;">2) Configuring permissions on private keys</span><br />You have three options to get this working:<br /><ul><li>If you are using a Windows 2008 Enterprise CA and are using your own certificate templates, then you can modify the template to automatically grant the NETWORK SERVICE account read permissions on all certificates issued using that template.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzUJ4t0Jv-Xx2jiHmET21R3UasJyGCsXKfCQaP3hyeIpx-JdPbEipRy0WLeg_904NxGy9SpinZg5i-noUT4SeAhu3O3MDxCVOokRl6ID7lzxrELtOB9jF3tsJ5h_rvGP04-8ifvKpyZkwB/s1600-h/20081201-ReadPerm.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 208px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzUJ4t0Jv-Xx2jiHmET21R3UasJyGCsXKfCQaP3hyeIpx-JdPbEipRy0WLeg_904NxGy9SpinZg5i-noUT4SeAhu3O3MDxCVOokRl6ID7lzxrELtOB9jF3tsJ5h_rvGP04-8ifvKpyZkwB/s320/20081201-ReadPerm.jpg" alt="" id="BLOGGER_PHOTO_ID_5274744434358602578" border="0" /></a><br />Since you will typically be creating a new certificate template for server deployment (to enable longer than 2 years validity & exporting of private keys), this is probably the easiest solution if you have a Windows Server 2008 Enterprise CA.<br /><br 
/></li><li>In a pre-Windows 2008 CA world, you will have to use the <a href="http://www.microsoft.com/downloads/details.aspx?familyid=c42e27ac-3409-40e9-8667-c748e422833f&displaylang=en">WinHTTPcertcfg.exe</a> tool, the Windows HTTP Services Certificate Configuration tool. In our situation, we need to modify the ACL of the certificate to grant read access to the service account of the Management Service (which is the NETWORK SERVICE by default).<br /><br /><span style="font-size:85%;"><span style="font-family:courier new;">winhttpcertcfg -g -c LOCAL_MACHINE\My -s (subjectname) -a NetworkService</span></span><br /><br />Verify that everything went ok by listing the permissions:<i><br /><br /></i><span style=";font-family:courier new;font-size:85%;" >winhttpcertcfg -l -c LOCAL_MACHINE\My -s (subjectname)</span><br /><br /></li><li>It is also possible to explicitly set the permissions on the private key file. This information is based on <a href="http://blogs.technet.com/softgrid/archive/2007/11/20/setting-up-an-application-virtualization-in-secure-mode.aspx">information obtained from the App-V blog</a>, with some corrections below.<br /><br /><ul><li>First, obtain the certificate thumbprint. 
You can find this in the details tab of the certificate:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiflbzK5Js5-PROx5IlNt_LviOKecIUMTe_x9IIlO6Zqj7mPC8zUdoEABp16b-sT2q4oMCA-j72jXk58VjNnOYriUyOZJ9P5R7gSkjmaiAya1UPWncH3JxAepR2PsK3MakeJ3dCKf5wjTru/s1600-h/20081201-Thumbprint.JPG"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 237px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiflbzK5Js5-PROx5IlNt_LviOKecIUMTe_x9IIlO6Zqj7mPC8zUdoEABp16b-sT2q4oMCA-j72jXk58VjNnOYriUyOZJ9P5R7gSkjmaiAya1UPWncH3JxAepR2PsK3MakeJ3dCKf5wjTru/s320/20081201-Thumbprint.JPG" alt="" id="BLOGGER_PHOTO_ID_5274741387422913378" border="0" /></a>Copy/paste the thumbprint for the next commandline.<br /><br /></li><li>Next, use the <a href="http://msdn.microsoft.com/en-us/library/ms732026.aspx">FindPrivateKey.exe</a> utility to locate the private key file on disk (<span style="font-style: italic;">compiled version available </span><a style="font-style: italic;" href="http://xneuron.wordpress.com/2007/12/05/x509-certificate-installation/">here</a><span style="font-style: italic;"> -- download & use untrusted executables from the internet at your own risk</span>). Use the following syntax:<br /><br /><span style="font-size:85%;"><span style="font-family:courier new;">FindPrivateKey.exe My LocalMachine -t "your thumbprint"</span></span><br /><br />This will give you the full path. Read the <span style="font-weight: bold;">caveat message</span> below if this path looks awkward.<br /><br /></li><li>Grant the NETWORK SERVICE account read & execute permissions on the private key file.<br /></li></ul></li><br /><span style="font-weight: bold; font-style: italic;">CAVEAT: </span><span style="font-style: italic;">the location of the private key should be in a publicly accessible location. 
For WinXP/Win2K3 the default is:</span> <span style="font-style: italic;font-size:85%;" ><span style="font-family:courier new;">C:\Documents and Settings\All Users\Application Data\Microsoft\Crypto\RSA\MachineKeys </span></span> <span style="font-style: italic;">For W2K8/Vista, this changed to:</span> <span style="font-style: italic;font-size:85%;" ><span style="font-family:courier new;">C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys</span></span> <span style="font-style: italic;">If you have a different location, then take action to move the private key. I requested my certificate through the Web Enrollment pages of Active Directory Certificate Services on Windows 2008. This stores the public & private key in your user account's profile by default. I knew this and dragged & dropped the public certificate from the "Certificates (My User)" to the "Certificates (My Computer)" MMC; if your private key is marked as exportable, this is indeed possible. However, this does not actually move the private key and leaves it in your user profile location (for example: </span><span style="font-style: italic;font-size:85%;" ><span style="font-family:courier new;">C:\Users\Administrator\AppData\Roaming\Microsoft\Crypto\RSA</span></span><span style="font-style: italic;">). I fixed this by explicitly exporting the certificate & private key from my user account and then explicitly importing everything again. So a huge warning for all you regular crypto-users: no more drag 'n dropping of public/private keypairs!</span></ul><span style="font-weight: bold;">4) Conclusion</span><br />A bit messy... yet secure! The move towards the NETWORK SERVICE account for the App-V Management service (... 
and other Microsoft products as well) is obviously a good choice, yet it brings along some difficulties that could probably be streamlined from within the App-V Management Server's installer.<br /><br />PS: You didn't forget to also grant the NETWORK SERVICE account read permissions on your content directory, did you? Otherwise your streaming won't work.<br /><br /><span style="font-weight: bold;">VMware Tools without a reboot?</span> (posted 2008-11-21, updated 2008-11-27)<br />Every now and then, you see blogposts appearing on the "issue" that you need to reboot a guest operating system after you install or update the VMware Tools. Many people have pondered whether a reboot is in fact really necessary and if it can be avoided altogether. Recent posts about this can be read <a href="http://www.ntpro.nl/blog/archives/763-How-to-install-VMware-tools-without-a-reboot.html">here</a> and <a href="http://halr9000.com/article/642">here</a>, referring to <a href="http://communities.vmware.com/thread/168530">this VMware community thread</a> -- the question is still alive in multiple-year spanning threads like <a href="http://communities.vmware.com/thread/15561">this one right here</a>. I usually raise my eyebrows when reading these "no reboot" topics, yet I am interested in keeping up with the advancements in that subject for some of the large customers that I come in contact with professionally.<br /><br />The scripts and methods outlined in these blogposts sound a bit tricky at first if you ask me, and I feared they might not have the outcome you expect. 
I would think VMware Tools really requires a reboot on some operating systems because you update parts of the virtual device drivers and those need to be reloaded by a reboot of the operating system (<span style="font-style: italic;">Note: strictly speaking you don't need a reboot for all types of device drivers, only under a specific set of circumstances </span><a style="font-style: italic;" href="http://www.microsoft.com/whdc/system/pnppwr/pnp/no_reboot.mspx">documented by Microsoft</a><span style="font-style: italic;">. The VMware disk drivers host a boot device so that would fit under the "requires a reboot" category from that document</span>). This means that just running the installer with a "Suppress Reboot" parameter on all your machines will place the new VMware Tools files on your hard disk, but will not actively load all of them... I am not sure if that is a state I would want my production virtual machines in!? And to be very clear: what these scripts do is request that the reboot be postponed; they do not trigger some hidden functionality in VMware Tools that makes the reboot unnecessary!<br /><br />To remove all suspicion, I did a little test on a Windows 2003 virtual machine and upgraded the tools from ESX 3.0.2 to ESX 3.5U2 without rebooting (using the commandline <span style="font-size:85%;"><span style="font-family:courier new;">setup.exe /S /v"REBOOT=R /qb"</span></span> on the VMware Tools ISO). 
This effectively updates the following services and drivers without rebooting:<br /><ul><li>VMware services (bumped from build 63195 to build 110268)</li><li>VMware SVGA II driver, VMware Pointing Device driver</li></ul>It left the following drivers untouched:<br /><ul><li>VMware Virtual disk SCSI Disk Device ("dummy" harddisk driver - Microsoft driver)<br /></li><li>NECVMWar VMware IDE CDR10 (virtual CD-ROM driver)</li><li>Intel Pro/1000 MT Network Connection (vmnet driver - Microsoft driver)</li><li>LSI Logic PCI-X Ultra320 SCSI Host Adapter (storage adapter - Microsoft driver)</li></ul>It turned out that these drivers didn't require updating for my specific virtual machine (even after a reboot). In fact, I wasn't immediately able to find a single machine in the test environment at work that required updating any boot disk device drivers (and some still had 3.0.2 VMware Tools running!).<br /><br />To conclude, I would say that in some circumstances it is safe to postpone the reboot of your virtual machine, as long as the boot disk device drivers are not touched. Postponing the reboot is very convenient in the context of a patch weekend where you want to postpone the restart to one big, single reboot at the end of all your patches.<br /><br /><span style="font-weight: bold;">Update: </span>as Duncan Epping points out in <a href="http://www.yellow-bricks.com/2008/11/27/installing-vmware-tools-without-a-reboot/">a recent blogpost</a>, be advised that updating the network driver effectively drops all network connections. 
This is for all practical purposes maybe just as bad as actually rebooting your server, so beware of the "fake level of safety and comfort" that you might get from postponing a VMware Tools reboot!<br /><br /><span style="font-weight: bold;">Matching LUN's between ESX hosts and a VCB proxy</span> (posted 2008-08-14)<br />One of the problems that I encountered at a customer was discovering which VMFS partitions were presented to a VCB proxy. It turned out to be a bit more complex than I had first expected.<br /><br /><span style="font-weight: bold;">Introduction</span><br />VMware released the VCB framework (<a href="http://www.vmware.com/products/vi/consolidated_backup.html">VMware Consolidated Backup</a>) to make backups of virtual machines. The VCB framework is typically installed on a Windows host (the VCB proxy), and in order to make SAN backups, you need to present both the source LUN, which contains the virtual machines to back up, and the destination LUN, where the backup files are stored, to that VCB proxy.<br /><br />This setup is relatively simple to maintain in smaller environments. However, once you get into a big environment where a dozen teams are involved (separate networking teams, separate SAN teams, separate Windows teams and separate VMware teams), it can become quite challenging to find out which of the 12 LUN's that are presented to a Windows host in fact belong to a specific ESX host.<br /><br /><span style="font-weight: bold;">Finding unique identifiers for a LUN</span><br />The mission is to find a unique identifier (UID) that can be used both on the ESX host and the Windows box. 
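<br /><br />Whatever identifier ends up being usable, the matching step itself is small: collect the identifiers seen on each side, normalize their formatting (the tools do not agree on separators or letter case) and compare the two sets. A minimal Python sketch of that idea follows; the function names and the sample identifiers are made up for illustration and do not come from any VMware or Emulex tool.

```python
def normalize_id(raw):
    """Reduce an identifier to bare lowercase characters without ':' separators."""
    return raw.strip().replace(":", "").lower()


def compare_ids(esx_ids, proxy_ids):
    """Split identifiers into shared, ESX-only and proxy-only groups."""
    esx = {normalize_id(i) for i in esx_ids}
    proxy = {normalize_id(i) for i in proxy_ids}
    return {
        "shared": sorted(esx & proxy),
        "esx_only": sorted(esx - proxy),
        "proxy_only": sorted(proxy - esx),
    }


if __name__ == "__main__":
    # Hypothetical identifiers, written in two different notations on purpose.
    esx_side = ["AA:BB:CC:00:11:22:33:44", "aabbcc0011223355"]
    proxy_side = ["aabbcc0011223344", "AA:BB:CC:00:11:22:33:66"]
    for group, ids in compare_ids(esx_side, proxy_side).items():
        print(group, ids)
```

The comparison is only as good as the identifier fed into it, which is exactly what the rest of this post is about.<br /><br />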
The first two obvious candidates to uniquely identify an ESX-managed LUN on a SAN network are:<br /><ul><li><span style="text-decoration: underline;">The VMFS ID for the partition</span><br />Upon the initialization of a VMFS partition, it is assigned a unique identifier that can be found by looking in the /vmfs/volumes directory on an ESX host, or by using the <span style="font-size:85%;"><span style="font-family:courier new;">esxcfg-vmhbadevs -m</span></span> command on the ESX host. The output looks like this:<br /><br /><span style=";font-family:courier new;font-size:85%;" >vmhba1:0:2:1 /dev/sdb1 48858dc4-f4e218d1-d3a8-001cc497e630<br />vmhba1:4:1:1 /dev/sdc1 483cf914-29b60dc5-dbfd-001cc497e630<br />vmhba1:4:2:1 /dev/sdd1 479da7c1-4494cd90-d327-001cc497e630</span><br /><br />The first disk is the remainder of the locally attached storage, and the two other disks are presented from the SAN. The first column indicates that HBA 1, SCSI target 4 and LUN's 1 and 2 are used (and partition 1 on each LUN); the second column lists the Linux device name under the Service Console and the third column lists the VMFS ID.<br /><br /></li><li><span style="text-decoration: underline;">The WWPN (World Wide Port Name) of the disk on the SAN</span><br />On a fiber-channel SAN network, each port (for example an HBA port or a storage controller port) is assigned a unique identifier called the <a href="http://en.wikipedia.org/wiki/World_Wide_Port_Name">WWPN</a>. The WWPN performs much the same function as a MAC address on an Ethernet network. 
The WWPN's of the disks that are presented to an ESX host can be obtained from the Service Console using the <span style="font-size:85%;"><span style="font-family:courier new;">esxcfg-mpath -l</span></span> command:<br /><br /><span style="font-size:85%;"><span style="font-family:courier new;">Disk vmhba1:4:1 /dev/sdc (256000MB) has 16 paths and policy of Fixed<br />FC 13:0.0 10000000c96e8972<->500507630308060b vmhba1:4:1 On<br />FC 13:0.0 10000000c96e8972<->500507630313060b vmhba1:5:1 On<br />FC 13:0.0 10000000c96e8972<->500507630303060b vmhba1:6:1 On active preferred<br />FC 13:0.0 10000000c96e8972<->500507630303860b vmhba1:7:1 On<br />FC 13:0.0 10000000c96e8972<->500507630308860b vmhba1:8:1 On<br />FC 13:0.0 10000000c96e8972<->500507630313860b vmhba1:9:1 On<br />FC 13:0.0 10000000c96e8972<->500507630318060b vmhba1:10:1 On<br />FC 13:0.0 10000000c96e8972<->500507630318860b vmhba1:11:1 On<br />FC 16:0.0 10000000c96e8ccc<->500507630303460b vmhba2:4:1 On<br />FC 16:0.0 10000000c96e8ccc<->500507630308460b vmhba2:5:1 On<br />FC 16:0.0 10000000c96e8ccc<->500507630313460b vmhba2:6:1 On<br />FC 16:0.0 10000000c96e8ccc<->500507630303c60b vmhba2:7:1 On<br />FC 16:0.0 10000000c96e8ccc<->500507630308c60b vmhba2:8:1 On<br />FC 16:0.0 10000000c96e8ccc<->500507630313c60b vmhba2:9:1 On<br />FC 16:0.0 10000000c96e8ccc<->500507630318460b vmhba2:10:1 On<br />FC 16:0.0 10000000c96e8ccc<->500507630318c60b vmhba2:11:1 On<br /><br />Disk vmhba1:4:2 /dev/sdd (256000MB) has 16 paths and policy of Fixed<br />FC 13:0.0 10000000c96e8972<->500507630308060b vmhba1:4:2 On<br />FC 13:0.0 10000000c96e8972<->500507630313060b vmhba1:5:2 On<br />FC 13:0.0 10000000c96e8972<->500507630303060b vmhba1:6:2 On<br />FC 13:0.0 10000000c96e8972<->500507630303860b vmhba1:7:2 On<br />FC 13:0.0 10000000c96e8972<->500507630308860b vmhba1:8:2 On<br />FC 13:0.0 10000000c96e8972<->500507630313860b vmhba1:9:2 On<br />FC 13:0.0 10000000c96e8972<->500507630318060b vmhba1:10:2 On<br />FC 13:0.0 
10000000c96e8972<->500507630318860b vmhba1:11:2 On<br />FC 16:0.0 10000000c96e8ccc<->500507630303460b vmhba2:4:2 On<br />FC 16:0.0 10000000c96e8ccc<->500507630308460b vmhba2:5:2 On active preferred<br />FC 16:0.0 10000000c96e8ccc<->500507630313460b vmhba2:6:2 On<br />FC 16:0.0 10000000c96e8ccc<->500507630303c60b vmhba2:7:2 On<br />FC 16:0.0 10000000c96e8ccc<->500507630308c60b vmhba2:8:2 On<br />FC 16:0.0 10000000c96e8ccc<->500507630313c60b vmhba2:9:2 On<br />FC 16:0.0 10000000c96e8ccc<->500507630318460b vmhba2:10:2 On<br />FC 16:0.0 10000000c96e8ccc<->500507630318c60b vmhba2:11:2 On<br /><br /></span></span>In this output, you can see two HBA's (that have WWPN's <span style="font-size:85%;"><span style="font-family:courier new;">10000000c96e8972</span></span> and <span style="font-size:85%;"><span style="font-family:courier new;">10000000c96e8ccc</span></span>) that see two LUN's vmhba1:4:<span style="font-weight: bold;">1</span> and vmhba1:4:<span style="font-weight: bold;">2</span> that are presented over 16 paths.<br /><br />On the VCB proxy / Windows box, I used the Emulex HBAnywhere utility to retrieve the WWPN's of the LUN's that were presented. 
The output is shown in the following screenshot:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMgz80NNeYh85AsX_gWtC4bJBTf5pRJTdxw2yrQoQdwaxDpXo2VDuvXFy6AOk8lXMNI_zS_vzsOPP4RFFxcTT-bjFJJzJ4GKmoA9E1eIGKts1hLsWCnzhpSdNzet-PrRAHtayPCr6r_kO5/s1600-h/20080812-HBAnywhere.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMgz80NNeYh85AsX_gWtC4bJBTf5pRJTdxw2yrQoQdwaxDpXo2VDuvXFy6AOk8lXMNI_zS_vzsOPP4RFFxcTT-bjFJJzJ4GKmoA9E1eIGKts1hLsWCnzhpSdNzet-PrRAHtayPCr6r_kO5/s320/20080812-HBAnywhere.jpg" alt="" id="BLOGGER_PHOTO_ID_5234347795692030658" border="0" /></a><br />It is also possible to use the <span style="font-size:85%;"><span style="font-family:courier new;">HbaCmd.exe AllNodeInfo &lt;hba&gt;</span></span> command to retrieve a list of all WWPN's that a certain HBA sees.</li></ul><span style="font-weight: bold;">Looks nice, what's the problem?</span><br />Using the WWPN seemed to be the obvious answer to identifying the LUN's on both the ESX host and the VCB proxy. That is, until I discovered that two different LUN's were presented using the same WWPN (obviously they were on two different SAN's and presented to two different hosts). On one of our ESX hosts, a 256 GB LUN was presented using WWPN 50:05:07:63:03:08:06:0b, and on the VCB proxy, a 500 GB LUN was presented using that same WWPN -- apparently our SAN team recycles the WWPN's on the different fibre channel fabrics.<br /><br />To make matters even worse, I noticed that the same LUN was presented using one WWPN to an ESX host, and with another WWPN to the VCB proxy (I am no SAN expert myself but I assume it is possible to present the same LUN in different SAN zones using different WWPN's). 
I was able to verify this since VCB was able to do a SAN backup of a virtual machine that resides on a LUN with a WWPN on the ESX side that is not presented to the VCB proxy.<br /><br /><span style="font-weight: bold;">The next step: VMFS ID's as a unique identifier</span><br />So, if you cannot rely on the WWPN's to uniquely identify a LUN on a host that is connected to multiple SAN's, then surely VCB must use the VMFS ID to know what LUN to read the virtual machine data from? Right?<br /><br />On the VCB proxy & Windows machine, I tried to discover the VMFS ID's using the <span style="font-weight: bold;">vcbSanDbg.exe</span> tool (included in the VCB framework and available as <a href="http://www.vmware.com/download/eula/vcbsdt_eula.html">a separate download from the VMware website</a> -- careful, the separate download is an older version than the one included in the VCB 1.5 framework). An excerpt from its lengthy output:<br /><br /><span style="font-size:78%;"><span style="font-weight: bold;font-family:courier new;" >C:\Program Files\VCB>vcbSanDbg | findstr "ID: NAA: volume"</span><br /><span style="font-family:courier new;">[info] Found logical volume 48761b97-a4f562bd-6875-0017085d.</span><br /><span style="font-family:courier new;">[info] Found logical volume 48761bc5-3f508baa-2f5d-0017085d.</span><br /><span style="font-family:courier new;">[info] Found logical volume 483cf913-05b4f526-45b5-001cc497.</span><br /><span style="font-family:courier new;">[info] Found logical volume 479da7ac-55fe7dfe-378c-001cc497.</span><br /><span style="font-family:courier new;">[info] Found logical volume 477c2b4a-7db36616-30ea-001cc495.</span><br /><span style="font-family:courier new;">[info] Found logical volume 48843bec-154cf784-871a-001cc495.</span><br /><span style="font-family:courier new;">[info] Found SCSI Device: NAA:600508b10010443953555534314200044c4f47494341</span><br /><span style="font-family:courier new;">[info] Found SCSI Device: 
NAA:60060e801525180000012518000000374f50454e2d56</span><br /><span style="font-family:courier new;">[info] Found SCSI Device: NAA:600508b4000901eb0001100003230000485356323130</span><br /><span style="font-family:courier new;">[info] ID: LVID:48761b97-dacedf9f-ebb9-0017085d0f91/48761b97-a4f562bd-6875-0017085d0f91/1 </span><br /><span style="font-family:courier new;"> Name: 48761b97-a4f562bd-6875-0017085d</span><br /><span style="font-family:courier new;">[info] Found SCSI Device: NAA:600508b4000901eb0001100003260000485356323130</span><br /><span style="font-family:courier new;">[info] ID: LVID:48761bc6-7b4afa63-97d9-0017085d0f91/48761bc5-3f508baa-2f5d-0017085d0f91/1 </span><br /><span style="font-family:courier new;"> Name: 48761bc5-3f508baa-2f5d-0017085d</span><br /><span style="font-family:courier new;">[info] Found SCSI Device: NAA:6005076303ffc60b0000000000001049323130373930</span><br /><span style="font-family:courier new;">[info] ID: LVID:483cf913-458f9fa5-a749-001cc497e630/483cf913-05b4f526-45b5-001cc497e630/1 </span><br /><span style="font-family:courier new;"> Name: 483cf913-05b4f526-45b5-001cc497</span><br /><span style="font-family:courier new;">[info] Found SCSI Device: NAA:6005076303ffc60b000000000000104a323130373930</span><br /><span style="font-family:courier new;">[info] ID: LVID:479da7b6-877867e9-dd06-001cc497e630/479da7ac-55fe7dfe-378c-001cc497e630/1 </span><br /><span style="font-family:courier new;"> Name: 479da7ac-55fe7dfe-378c-001cc497</span><br /><span style="font-family:courier new;">[info] Found SCSI Device: NAA:6005076303ffc403000000000000128d323130373930</span><br /><span style="font-family:courier new;">[info] ID: LVID:477c2b4a-969e01e0-8d49-001cc495fb46/477c2b4a-7db36616-30ea-001cc495fb46/1 </span><br /><span style="font-family:courier new;"> Name: 477c2b4a-7db36616-30ea-001cc495</span><br /><span style="font-family:courier new;">[info] Found SCSI Device: NAA:6005076303ffc403000000000000128e323130373930</span><br /><span 
style="font-family:courier new;">[info] Found SCSI Device: NAA:600508b40006e8890000b000010a0000485356323130</span><br /><span style="font-family:courier new;">[info] Found SCSI Device: NAA:600508b40006e8890000b00003770000485356323130</span><br /><span style="font-family:courier new;">[info] ID: LVID:48843bec-28cc17a4-ca9e-001cc495fb46/48843bec-154cf784-871a-001cc495fb46/1 </span><br /><span style="font-family:courier new;"> Name: 48843bec-154cf784-871a-001cc495</span></span><br /><br />Unfortunately, I was not able to discover the VMFS ID's I saw on the ESX host in this output, even though there are some resemblances:<br /><ul><li>ESX host VMFS ID <span style=";font-family:courier new;font-size:85%;" >483cf914-29b60dc5-dbfd-001cc497e630</span> looks a lot like <span style="font-weight: bold;">vcbSanDbg.exe</span> output's logical volume <span style="font-size:85%;"><span style="font-family:courier new;">483cf913-05b4f526-45b5-001cc497</span></span>.<br /><br /></li><li>ESX host VMFS ID <span style=";font-family:courier new;font-size:85%;" >479da7c1-4494cd90-d327-001cc497e630</span> looks a lot like <span style="font-weight: bold;">vcbSanDbg.exe</span> output's logical volume <span style=";font-family:courier new;font-size:85%;" >479da7ac-55fe7dfe-378c-001cc497</span>.</li></ul>Furthermore, I found out that current versions of VCB do not rely on the VMFS ID to discover virtual machines on a LUN. In Andy Tucker's talk "<a href="http://www.vmware-tsx.com/download.php?asset_id=55">VMware Consolidated Backup: today and tomorrow</a>" at VMworld 2007, it is clearly stated (slide 19) that there...<br /><blockquote>No “VMFS Driver for Windows” on proxy</blockquote><br />And furthermore that the usage of VMFS signatures is on the "todo" list for identifying LUNs on the SAN network (slide 34).<br /><br /><span style="font-weight: bold;">Other ideas?</span><br />So where does one turn when all possible solutions seem to lead to a dead end? 
Right: <a href="http://communities.vmware.com/">the VMware community forums</a>. The answer came in <a href="http://communities.vmware.com/thread/161447">this thread</a> by snapper.<br /><br />What I learned today is that besides the WWPN on a fiber channel network, there is another unique identifier called the NAA (Network Address Authority) to identify devices on the FC fabric. You can obtain the NAA for the LUN's on an ESX host using the esxcfg-mpath command in verbose mode using:<br /><br /><span style="font-size:85%;"><span style="font-family:courier new;">esxcfg-mpath -lv | grep ^Disk | grep -v vmhba0 | awk '{print $3,$5,$2}' | cut -b15-</span></span><br /><br />The output on our ESX host looks much like this:<br /><br /><span style="font-size:85%;"><span style="font-family:courier new;">6005076303ffc60b0000000000001049323130373930 (256000MB) vmhba1:4:1</span><br /><span style="font-family:courier new;">6005076303ffc60b000000000000104a323130373930 (256000MB) vmhba1:4:2</span><br /></span><br />The NAA can be seen in the vcbSanDbg.exe output shown above, and can be filtered as follows:<br /><span style="font-size:85%;"><br /><span style="font-family:courier new;">vcbSanDbg.exe | findstr "NAA:"</span></span><br /><br />The output should look like this:<br /><br /><span style="font-size:85%;"><span style="font-weight: bold;font-family:courier new;" >C:\Program Files\VCB>vcbSanDbg | findstr "NAA:"</span><br /><br /><span style="font-family:courier new;">[info] Found SCSI Device: NAA:600508b10010443953555534314200044c4f47494341</span><br /><span style="font-family:courier new;">[info] Found SCSI Device: NAA:60060e801525180000012518000000374f50454e2d56</span><br /><span style="font-family:courier new;">[info] Found SCSI Device: NAA:600508b4000901eb0001100003230000485356323130</span><br /><span style="font-family:courier new;">[info] Found SCSI Device: NAA:600508b4000901eb0001100003260000485356323130</span><br /><span style="font-family:courier new;">[info] Found 
SCSI Device: NAA:6005076303ffc60b0000000000001049323130373930</span><br /><span style="font-family:courier new;">[info] Found SCSI Device: NAA:6005076303ffc60b000000000000104a323130373930</span><br /><span style="font-family:courier new;">[info] Found SCSI Device: NAA:6005076303ffc403000000000000128d323130373930</span><br /><span style="font-family:courier new;">[info] Found SCSI Device: NAA:6005076303ffc403000000000000128e323130373930</span><br /><span style="font-family:courier new;">[info] Found SCSI Device: NAA:600508b40006e8890000b000010a0000485356323130</span><br /><span style="font-family:courier new;">[info] Found SCSI Device: NAA:600508b40006e8890000b00003770000485356323130</span></span><br /><br />Et voila, now I can start running the esxcfg-mpath command on all our ESX hosts and start matching these NAA's with those in the output of vcbSanDbg to discover what our Windows VCB proxy has access to.<br /><br /><span style="font-weight: bold;">VMware D-Day: 12/08/2008</span> (posted 2008-08-12, updated 2008-08-14)<br />I reckon "<span style="font-style: italic;">12 August 2008</span>" will be long remembered by all VMware enthusiasts out there.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhN0qdzJCxHAKbvE9BPu2Tn4lqSFTrZSw1XjpIb3L5T5BLCE_6ZoegGATsuWZdUM0rPU3qehGS7jUtYxAAR0ie4TTDsog-55vGMqahL8lhl8Q5CEGoWrNErcgFv7rikmnT9DupWC4ozLdGf/s1600-h/20080812-VMware.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhN0qdzJCxHAKbvE9BPu2Tn4lqSFTrZSw1XjpIb3L5T5BLCE_6ZoegGATsuWZdUM0rPU3qehGS7jUtYxAAR0ie4TTDsog-55vGMqahL8lhl8Q5CEGoWrNErcgFv7rikmnT9DupWC4ozLdGf/s400/20080812-VMware.jpg" alt="" id="BLOGGER_PHOTO_ID_5233708872465773506" border="0" /></a>That is the 
day that a major bug caused ESX 3.5 Update 2 to no longer recognise any license, even if the license file at your license server was perfectly valid. There is no need to sketch the horror that follows when your ESX clusters no longer detect a valid license: VMotion fails, DRS fails, HA fails, powering on virtual machines is no longer possible... Ironically, today is also Microsoft's Patch Tuesday of August, which probably means that quite a few system administrators were caught with their pants down (and their VM's powered off during a scheduled maintenance window) when this bug struck.<br /><span style="font-weight: bold;"></span><br />The symptoms and errors that we have been experiencing are the following:<br /><ul><li>Unable to VMotion a host from ESX 3.0.2 to ESX 3.5. The VMotion progresses until 10% and then aborts with error messages such as "operation timed out" or "internal system error".<br /><br /></li><li>HA agent getting completely confused (unable to install, reconfigure for HA does not work).<br /><br /></li><li>Unable to power on new machines:<br /><br /><span style=";font-family:courier new;font-size:85%;" >[2008-08-12 14:11:16.022 'Vmsvc' 121330608 info] Failed to do Power Op: Error: Internal error<br />[2008-08-12 14:11:16.065 'vm:/vmfs/volumes/48858dc4-f4e218d1-d3a8-001cc497e630/HOSTNAME/HOSTNAME.vmx' 121330608 warning] Failed operation<br />[2008-08-12 14:11:16.066 'ha-eventmgr' 121330608 info] Event 15 : Failed to power on HOSTNAME on esx.test.local in ha-datacenter: A general system error occurred</span><br /></li></ul>VMware <a href="http://communities.vmware.com/message/1019685#1019685">is promising a patch tomorrow</a>, but several forum posts (<a href="http://communities.vmware.com/message/1019726#1019726">here</a> and <a href="http://communities.vmware.com/message/1019787#1019787">here</a>) are wondering how this patch will be distributed and -- given <a href="http://communities.vmware.com/message/1019761#1019761">the deep integration 
of the licensing components</a> within ESX -- whether this will require a reboot of the ESX host or not (which can be quite problematic if you cannot VMotion machines away). A possible workaround for this issue is to introduce a 3.0.2 host in the cluster, as I have seen in our environment that VMotioning from 3.5 to 3.0.2 still works.<br /><br /><span style="font-weight: bold; font-style: italic;">Edit (21:20):</span><span style="font-style: italic;"> hopes are up that VMware will be able to release a patch that doesn't require the ESX host to reboot. See what </span><a style="font-style: italic;" href="http://verbeiren.blogspot.com/2008/08/vmware-bug-waiting-for-patch.html">Toni Verbeiren has to say about it on his blog</a><span style="font-style: italic;">.</span><br /><br /><span style="font-weight: bold; font-style: italic;">Edit (9:00 AM 13 AUG)</span><span style="font-style: italic;">: <a href="http://www.vmware.com/landing_pages/esxexpresspatches.html">a patch has been released by VMware</a>. Regarding whether hosts need to be rebooted or not... there is good news and there is bad news: "to apply the patches, no reboot of ESX/ESXi hosts is required. One can VMotion off running VMs, apply the patches and VMotion the VMs back. 
If VMotion capability is not available, VMs need to be powered off before the patches are applied and powered back on afterwards."</span><br /><br />You can follow the developing crisis at the following sources:<br /><ul><li><a href="http://communities.vmware.com/thread/162377">http://communities.vmware.com/thread/162377</a></li><li><a href="http://kb.vmware.com/kb/1006716">http://kb.vmware.com/kb/1006716</a></li><li><a href="http://ictfreak.wordpress.com/2008/08/12/bug-in-esx-35-update-2/">http://ictfreak.wordpress.com/2008/08/12/bug-in-esx-35-update-2/</a><br /></li><li><a href="http://www.vmug.nl/modules.php?name=Forums&file=viewtopic&t=2954">http://www.vmug.nl/modules.php?name=Forums&file=viewtopic&t=2954</a></li><li><a href="http://lraikhman.blogsite.org/?p=111">http://lraikhman.blogsite.org/?p=111</a></li><li><a href="http://www.theregister.co.uk/2008/08/12/vmware_12_august_esx_cockup/">http://www.theregister.co.uk/2008/08/12/vmware_12_august_esx_cockup/</a></li></ul>Even our dear friends at Microsoft write about the problem, see the blogpost <a href="http://blogs.technet.com/jamesone/archive/2008/08/12/it-s-rude-to-laugh-at-other-people-s-misfortunes-even-vmware-s.aspx">"It's rude to laugh at other people's misfortunes - even VMware's"</a>.<br /><br /><span style="font-weight: bold;">WM6 and self-signed certificates</span> (posted 2008-08-08)<br />When playing around with <a href="http://www.miousers.co.uk/viewtopic.php?t=4253">a new (unofficial) WM6.1 ROM</a> for my Mio A701, I bumped into a well-known problem with installing self-signed certificates on (homebrew?) 
WM6 ROMs: installing a new CA certificate fails with the error message "<span style="font-style: italic;">The certificate was not successfully added; please restart your device and try again</span>". Obviously, restarting the device did not fix the problem.<br /><br />A few months ago, I already encountered the problem and I knew you could bypass it by importing the certificate directly into the mobile device's registry. However, the procedures that I read all involved:<br /><ol><li>flashing Windows Mobile 5 (or a WM6 version that was patched to accept any certificate),</li><li>importing the certificate in that temporary ROM,</li><li>exporting the relevant registry data,<br /></li><li>reflashing back to the ROM that has the certificate problem,<br /></li><li>importing the certificate through the registry file you obtained earlier in step 3.</li></ol>As you can imagine, this is quite some work and since I am a lazy person by nature, I did not want to go back to WM5 after just having flashed my Mio to a brand-new and shiny WM6. Therefore, I decided to develop a shorter workaround that doesn't involve reflashing.<br /><br />The tricky part is that you need to create the proper registry file to import. This file looks like:<br /><blockquote style="font-family: courier new;">Windows Registry Editor Version 5.00<br /><br />[HKEY_LOCAL_MACHINE\Comm\Security\SystemCertificates\Root\Certificates\824AF72AB87E17AC777098A4164D7A90C90C0D69]<br />"Blob"=hex:19,00,00,00,01,00,00,00,10,00,00,00,4f,e5,c4,01,4e,7d,89,4a,da,42,\<br />3f,f7,24,0f,7f,a2,19,00,00,00,01,00,00,00,10,00,00,00,cb,bc,40,37,8a,45,2c,\<br />...</blockquote>(please disregard the unintentional wrapping of the registry location; everything between the square brackets should be on one line).<br /><br />The difficult part is converting your self-signed certificate to the proper registry format. 
Here's how I did that:<br /><ul><li>On a regular PC, use Internet Explorer to go to a website with the certificate that you want to install on your mobile device (typically this will be Outlook Web Access or something). Open the certificate and install it on your local PC (let the certificate import wizard automatically place the certificate in whatever store it finds necessary).<br /><br /></li><li>View the certificate (in Internet Explorer or by using the Certificate MMC) and go to the "Details" tab. There you will find the "Thumbprint" of the certificate. You will need to look up this number in a few moments, so be sure to remember the first few digits. In the case of the company I work for, the thumbprint is "<span style="font-size:85%;"><span style="font-family:courier new;">824af72ab8somethingsomething</span></span>".<br /><br /></li><li>Open your registry editor and go to the following location:<br /><br /><span style="font-size:85%;"><span style="font-family:courier new;">HKEY_CURRENT_USER\Software\Microsoft\SystemCertificates\Root\Certificates\</span></span><br /><br />There should be a registry key that has the thumbprint of your certificate as its name:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGVKo4n3CqG1CdYpuBuAbka_Z5nYHasd46j3_JJ6I_BCgcf7DOmLzaGY0s9ONHtuMYVNLgvk6BJj0WPjeCL5iyuLI9TteP6HGtRk1s2q5tAb_zn_Kw7C-OZh8aTAqBFU8jK-CPr4-qxogq/s1600-h/20080808-WM6Registry_001.JPG"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGVKo4n3CqG1CdYpuBuAbka_Z5nYHasd46j3_JJ6I_BCgcf7DOmLzaGY0s9ONHtuMYVNLgvk6BJj0WPjeCL5iyuLI9TteP6HGtRk1s2q5tAb_zn_Kw7C-OZh8aTAqBFU8jK-CPr4-qxogq/s320/20080808-WM6Registry_001.JPG" alt="" id="BLOGGER_PHOTO_ID_5232225172430335282" border="0" /></a><br />Right-click that registry key and click "Export...". 
Choose a location for the exported registry data.<br /><br /></li><li>Next, open the registry export in Notepad. Replace the registry key location (between the square brackets) with <span style="font-size:85%;"><span style="font-family:courier new;">HKEY_LOCAL_MACHINE\Comm\Security\SystemCertificates\Root\Certificates\</span></span> followed by the thumbprint. Next, replace the first 12 bytes in the "Blob" registry value by:<span style="font-size:85%;"><span style="font-family:courier new;"> hex:19,00,00,00,01,00,00,00,10,00,00,00</span></span>.<br /><br /></li><li>Your result should look like this:<br /><span style="font-family:courier new;"></span><blockquote><span style="font-family:courier new;">Windows Registry Editor Version 5.00</span><br /><br /><span style="font-family:courier new;">[<span style="font-weight: bold;">HKEY_LOCAL_MACHINE\Comm\Security\SystemCertificates\Root\Certificates\</span>824AF72AB87E17AC777098A4164D7A90C90C0D69]</span><br /><span style="font-family:courier new;">"Blob"=hex:<span style="font-weight: bold;">19,00,00,00,01,00,00,00,10,00,00,00</span>,4f,e5,c4,01,4e,7d,89,4a,da,42,\</span><br /><span style="font-family:courier new;"> 3f,f7,24,0f,7f,a2,19,00,00,00,01,00,00,00,10,00,00,00,cb,bc,40,37,8a,45,2c,\</span><br /><span style="font-family:courier new;"> ...</span><br /></blockquote>Compare this with your original registry export; the parts that differ are shown in bold.<br /><br /></li><li>Save the registry file, copy it to your mobile device and import it there. Voila! Finished! 
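<br /><br />The two manual edits above (new key location, new first 12 bytes of the "Blob" value) could also be scripted. What follows is only a minimal Python sketch under a few assumptions, not a tested tool: the function names <span style="font-family:courier new;">convert_reg_text</span> and <span style="font-family:courier new;">convert_reg_file</span> are made up for illustration, the export is assumed to sit under HKEY_CURRENT_USER as in the steps above, the first 12 bytes of the blob are assumed to be on the first line of the value, and the export is assumed to be a UTF-16 text file (adjust the encoding if yours differs).<br /><br />

```python
import re

# Target key and 12-byte header, both taken from the steps above.
WM_ROOT = r"HKEY_LOCAL_MACHINE\Comm\Security\SystemCertificates\Root\Certificates"
WM_HEADER = "19,00,00,00,01,00,00,00,10,00,00,00"


def convert_reg_text(text):
    """Return the export text with the key path and Blob header rewritten."""

    def fix_key(match):
        # The thumbprint is the last component of the exported key path.
        thumbprint = match.group(1).rsplit("\\", 1)[-1]
        return "[" + WM_ROOT + "\\" + thumbprint + "]"

    # Step 1: move the key from HKEY_CURRENT_USER to the Windows Mobile root store.
    text, keys = re.subn(
        r"^\[(HKEY_CURRENT_USER\\[^\]\r\n]+)\]", fix_key, text,
        count=1, flags=re.MULTILINE)

    # Step 2: overwrite the first 12 bytes of the "Blob" value.
    text, blobs = re.subn(
        r'("Blob"=hex:)(?:[0-9a-fA-F]{2},){11}[0-9a-fA-F]{2}',
        lambda m: m.group(1) + WM_HEADER, text, count=1)

    if keys != 1 or blobs != 1:
        raise ValueError("export does not look like a certificate key")
    return text


def convert_reg_file(src, dst, encoding="utf-16"):
    # Assumption: the export was written as UTF-16; adjust if yours differs.
    with open(src, encoding=encoding, newline="") as handle:
        text = handle.read()
    with open(dst, "w", encoding=encoding, newline="") as handle:
        handle.write(convert_reg_text(text))
```

<br />A call such as <span style="font-family:courier new;">convert_reg_file("export.reg", "wm6.reg")</span> (both file names hypothetical) would then produce the file to copy to the device. It is still worth comparing the result by eye against the example above before importing it.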
</li></ul>You can use the "Certificates" control panel to verify that your certificate is properly recognized!<br /><br /><span style="font-weight: bold; font-style: italic;">Note:</span><span style="font-style: italic;"> ActiveSync will not immediately recognize the new certificate, so you must either kill the ActiveSync process or restart your device (but first wait at least a few minutes so that Windows Mobile can commit your registry changes to memory!).</span><br /><br />Obviously, this is completely not supported or endorsed by anybody on this planet. Perform these actions at your own risk and be sure you know what to do in case you brick your device!Tim Jacobshttp://www.blogger.com/profile/06131387085752434985noreply@blogger.com6tag:blogger.com,1999:blog-4834634390856475978.post-8674820253288828192008-07-29T09:57:00.026+02:002008-08-12T13:41:01.625+02:00Full backups of virtual machines and Windows VSS<span style="font-weight: bold;">Introduction</span><br />One of the new features that is appearing in backup products that take backups of an entire virtual machine, as opposed to using an agent inside the guest operating system, is the ability to cooperate with <a href="http://en.wikipedia.org/wiki/Shadow_Copy">Windows VSS (Volume Shadow Copy Service)</a> inside the guest. 
For example, the recently released version of <a href="http://www.vmware.com/support/vi3/doc/vi3_vcb15_rel_notes.html">VMWare's Consolidated Backup 1.5</a>, now supports VSS quiescing for Windows 2003, Windows Vista, Windows 2008; vizioncore's <a href="http://www.vizioncore.com/vRangerPro.html">vRanger Pro</a> backup utility has been supporting VSS for Windows 2003 for some versions already.<br /><br />Several opinions exist on whether this is in fact a useful feature or not; for example, not so long ago the developers of <a href="http://www.esxpress.com/">esXpress</a> talked about not including VSS quiescing into their product at that time because it adds additional complexity and does not offer any significant benefits in their opinion (see <a href="http://support.p2v.net/boards/read.php?3,2863,2957#msg-2865">here</a>). This discussion is still alive as you can see for example <a href="http://vmetc.com/2008/02/04/can-you-rely-on-live-backups-of-exchange-and-sql-vms">here</a>, and the big question is indeed: <span style="font-style: italic;">can you rely on live backups of database virtual machines</span>?<br /><br /><span style="font-weight: bold;">The early days of VSS</span><br />The root of the discussion is at the intended use of VSS: on a physical machine that is running a database application such as SQL Server, Exchange or even Active Directory or a DHCP server for that matter, you cannot directly read the database files since they are exclusively locked by the database application. This used to be particularly troublesome because the only way to get a backup of the data inside such a database is to use some sort of export function that had to be programmed into the database application (think of the BACKUP TSQL command or a brick-level backup of an Exchange server).<br /><br />Microsoft tackled this problem by introducing VSS, which presents a fully readable point-in-time snapshot of a filesystem to the (backup) application that initiates the snapshot. 
That way, a backup application can read the database file contents and put it away safely in case it is ever needed.<br /><br />However, there are two problems when reading files from a filesystem that is "frozen" in time:<br /><ul><li>a file can be in the process of being written (i.e. only 400 bytes of a 512-byte block are filled with actual data).<br /></li><li>data can still be in a filesystem cache or buffer in memory and not yet written to the disk (in the filesystem journal).<br /></li></ul>On top of the filesystem issues, there are two problems when reading a database that is still in use but "frozen" purely at a filesystem level:<br /><ul><li>at the time of the snapshot, a transaction could still be in progress. This can be an issue when the transaction is not supposed to be committed to the database at the end: as you know, a database query can initiate thousands of changes and perform a ROLLBACK at the end to reset any changes made since the start of the transaction.<br /><br />A good (fictitious) example here is when you try to withdraw 1000 euros in cash from an ATM: if you change your mind right before clicking the "confirm transaction" button on the ATM screen, then you don't want your 1000 euros to be really gone if at the same time a database snapshot is taken and your final "ROLLBACK" command is not included in the database!<br /><br /></li><li>some data could still be in memory and not written to a logfile or a database file (so-called "dirty pages").<br /></li></ul><span style="font-weight: bold;">Crash consistency versus transactional consistency</span><br />If you don't take these four problems into account, then restoring a snapshot of such a filesystem would be in fact the same as bringing back up the server after you suddenly pulled the power plug. Such a snapshot is said to be in <span style="font-style: italic;">a crash-consistent state</span>, i.e. 
the same state as after a sudden power loss.<br /><br />Modern filesystems have built-in mechanisms (so-called "journalling") to tackle these problems and to ensure that when such a "frozen" filesystem is restored from a backup, the open files are put back in as consistent a state as possible. Obviously, any data that only existed in memory and never was written to a filesystem journal/disk is lost. Databases rely on transaction logging to recover from a crash-consistent state back to a consistent database; this is typically done by simply rolling back all unfinished transactions, effectively ignoring all transactions that were not committed or rolled back.<br /><br />Windows VSS wants to go beyond a crash-consistent snapshot and solves both the filesystem and database problems by not only freezing all I/O to the filesystem but also asking both the filesystem and all applications to flush their dirty data to disk. This allows the creation of a backup that is both filesystem-consistent and application-consistent. VSS has built-in support for several Windows-native technologies such as NTFS filesystems, Active Directory databases, DNS databases, ... to flush their data to disk before the snapshot is presented to the backup application requesting the snapshot. Other programs, such as SQL/Oracle databases or Exchange mailservers, use "VSS Writer" plugins to get notified when a VSS snapshot is taken and when they have to flush their dirty database pages to disk to bring the database into <span style="font-style: italic;">a transactionally consistent state</span>.<br /><br />From <a href="http://technet2.microsoft.com/WindowsServer/en/Library/2b0d2457-b7d8-42c3-b6c9-59c145b7765f1033.mspx?mfr=true">Technet</a>:<br /><p style="margin-left: 40px;"><i></i></p><i></i><blockquote><i>[...] If an application has no writer, the shadow copy will still occur and all of the data, in whatever form it is in at the time of the copy, will be included in the shadow copy. 
This means that there might be inconsistent data that is now contained in the shadow copy. This data inconsistency is caused by incomplete writes, data buffered in the application that is not written, or open files that are in the middle of a write operation. Even though the file system flushes all buffers prior to creating a shadow copy, the data on the disk can only be guaranteed to be crash-consistent if the application has completed all transactions and has written all of the data to the disk. (Data on disk is “crash-consistent” if it is the same as it would be after a system failure or power outage.)</i><i>. [...] </i><i>All files that were open will still exist, but are not guaranteed to be free of incomplete I/O operations or data corruption.</i><br /><i><br />Under this design, the responsibility for data consistency has been shifted from the requestor application to the production application. The advantage of this approach is that application developers — those most knowledgeable about their applications — can ensure, through development of their own writers, the maximum effectiveness of the shadow copy creation process.</i><i><br /></i></blockquote><i></i><p style="margin-left: 40px;"><i></i></p>Conclusions for the physical world: the above makes clear that there is a huge benefit in using VSS when working on physical machines: <span style="font-weight: bold;">VSS is a requirement</span> to be able to back up the entire database files and to ensure that the database is not in an inconsistent state when you want to restore the database and log files and attempt to mount them. 
The main advantage here is that a restored database does not have to go through a series of consistency checks that typically take up many, many hours.<br /><br /><span style="font-weight: bold;">Going to the virtual world</span><br /><span>In the virtual world, there are several different types of backups that can be performed:</span><br /><ul><li>Performing the backup inside the guest OS.</li><li>Performing a backup of the harddisk files (VHD/VMDK) when using a virtualization product that is hosted on another operating system, such as Microsoft Virtual Server or VMWare Workstation/Server.</li><li>Performing a backup of the harddisk files (VHD/VMDK) when using a bare-metal hypervisor based product such as Microsoft Hyper-V or VMWare's ESX/ESXi Server.</li></ul>Obviously, when you perform the backup inside the guest OS, you still encounter the same problems as when attempting to back up a physical host: open files and database files are locked and thus cannot be backed up directly, so you have to resort to using VSS for the reasons discussed above.<br /><br />But what about the other two ways of performing a virtual machine backup, when attempting to back up the entire harddisk file? For starters, it is important to realize that "file locking" now occurs at two levels:<br /><ol><li>The VHD/VMDK harddisk files themselves are opened and locked by the virtualization software (be it the hypervisor for bare-metal virtualization or the executable when using hosted virtualization);</li><li>Files can be opened and locked inside the guest operating system.</li></ol>The first issue of the open VHD/VMDK harddisk files is solved depending on the virtualization product: if you are using host-based virtualization, you can obtain a readable VHD/VMDK file by using VSS on the host operating system and asking to present an application-consistent variant of the VHD/VMDK files. 
If you are using a bare-metal hypervisor, a typical mechanism is to take a snapshot of the virtual machine (which, for example in VMWare ESX, shifts the file lock from the VMDK file to the snapshot delta file, thus releasing the VMDK file for reading).<br /><br /><span style="font-weight: bold;">Open files inside the guest OS</span><br />Ironically, the solution of the first problem of open VHD/VMDK host files introduces the second problem of open files inside the guest OS: once you have your snapshot of the VHD/VMDK files (be it through VSS for host-based virtualization or a VM snapshot for bare-metal hypervisors)... that snapshot is only in a crash-consistent state! After all, it is a point-in-time "freeze" of the entire harddisk and restoring such an image file would be equivalent to restarting the server after a total power loss occurred.<br /><br />VMWare attempted to tackle this problem by introducing a "filesystem sync driver" in their VMTools (which you are supposed to install in every virtual machine running on a VMWare product). This filesystem sync driver mimics VSS in the sense that it requests that the filesystem flushes its buffer to disk, guaranteeing that the snapshot -- and thus corresponding full virtual machine backup -- is in a filesystem consistent state. Obviously, this does not solve the problem for databases which tend to react quite violently to these kinds of non-VSS "freezes" of the filesystem. Prototypical horror stories can be read <a href="http://communities.vmware.com/thread/123564">here (AD)</a> and <a href="http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&externalId=5962168">here (Exchange)</a>.<br /><br />So what are the real solutions for this problem? I can think of two at this moment:<br /><ol><li>After taking a snapshot, <span style="font-weight: bold;">do not only backup the disks but also the memory</span>. Then, when restoring the backup, do not "power on" the virtual machine but instead "resume" it. 
At first, the machine will probably be "shocked" to see that the time has leapt forward and that many TCP/IP connections are suddenly being dropped, but the database server you are running should be able to handle this and properly commit any unsaved data from memory to disk.<br /><br /></li><li>Trigger a <span style="font-weight: bold;">VSS operation inside the guest OS</span> to commit all changes to disk and ensure filesystem- and application-level consistency, and only then take the full virtual machine snapshot.<br /></li></ol>The VSS interaction with the guest operating system was first introduced by <a href="http://www.vizioncore.com/">vizionCore</a> in their <a href="http://www.vizioncore.com/vRangerPro.html">vRanger Pro 3.2.0</a> -- which required the installation of an additional service inside the guest VM, .NET 2.0 and was only officially supported for Windows 2003 SP1+ in 32bit. With the release of <a href="http://www.vmware.com/support/vi3/doc/vi3_vcb15_rel_notes.html">VMWare Consolidated Backup 1.5</a>, VMWare announced the default quiescing of disks on ESX 3.5 Update 2 would now be done using the new VSS driver -- supported on Windows 2003/2008/Vista in both 32 & 64-bit variants. Hurray! Problem solved, right?<br /><br /><span style="font-weight: bold;">So VSS seems nice, but is it necessary?</span><br /><span>Obviously, your gut feeling will tell you that it is "nicer" and "more gentle" to the guest virtual machine when using</span> VSS when taking a snapshot and a backup. The arguments on the difference between crash-consistency, filesystem consistency and application-level consistency (which translates to transactional consistency for databases) give solid grounds to this gut feeling.<br /><br />Personally, I cannot find an argument that states that VSS is also really <span style="font-style: italic;">necessary</span> to create a full virtual machine backup. 
In the physical world, filesystems and databases have been hardened to recover from the crash-consistent state that you obtain when taking a snapshot of a running virtual machine to back up and restore. Hands-on experience about this robustness can be read on several informal channels such as forum posts <a href="http://support.esxpress.com/boards/read.php?3,2863,2863,quote=1#msg-2867">here</a>.<br /><br />However, if you want to be sure that your database is in a consistent state (for a faster recovery) and certainty that those few seconds of data that were not yet committed from memory to disk are in fact included in your snapshot, then VSS is what you need. The next question to answer is: what is the risk of VSS messing up and is this probability larger than not being able to restore a non-VSS-based snapshot?<br /><br /><span style="font-weight: bold;">Conclusion</span><br />Performing live backups of virtual machines seems like an interesting and simple feature of virtualisation at first. However, at a second glance, there are some important decisions to be made regarding the use of VSS/snapshotting technology that can impact your restore strategy and success. Even without any quiescing mechanism, the operating system should be able to handle the crash-consistent backups that are taken by performing live machine backups and should therefore be sufficiently reliable. With the ready availability of VSS in the new VMWare Tools that come with ESX 3.5 Update 2, much more than crash-consistent backups can be guaranteed without the need to install additional agents. 
The increased reliability and faster restore time (no filesystem/database consistency checks) that come with VSS quiesced snapshots make full virtual machine backups now a fully mature solution without the need to worry about possibly inconsistent backups.<br /><br /><span style="font-weight: bold;">Side remarks</span><br />Some additional remarks regarding full virtual machine backup:<br /><ul><li>Full VM backups can be an addition to guest-based file level backups, but they can never be a complete replacement:<br /><br /><ul><li>you might take a full VM based snapshot of your Exchange or SQL database every day, but a filebased/bricklevel backup (which is far more convenient to use for your typical single file/single mailbox restore operations) might be taken several times a day, depending on the SLA that your IT department has with the rest of the company.<br /><br /></li><li>a full VM backup is a good place to start a full server recovery. It is a bad place to start a single-file or a single mailbox restore.<br /><br /></li><li>full VM backups using VSS do not allow the backup of SQL transaction logs (see "what is not supported" in <a href="http://www.microsoft.com/technet/prodtechnol/sql/2005/sqlwriter.mspx">the SQL VSS Writer overview</a>), nor do they commit transaction logs to the database in order to clear up the transaction logs (an absolute necessity for Exchange databases or for several types of SQL databases).<br /><br /></li></ul></li><li>Microsoft does not support any form of snapshotting technology on domain controllers. 
For more information, see <a href="http://support.microsoft.com/kb/888794/en-us">MSKB 888794</a> on "Considerations when hosting Active Directory domain controller in virtual hosting environments".</li></ul><span style="font-weight: bold;">Edit (12 Aug 2008):</span> VeeAm has released <a href="http://www.veeam.com/whitepapers/VMware%20and%20VSS%20-%20Application%20Backup%20and%20Recovery.pdf">a very interesting whitepaper</a> that discusses not only the necessity for VSS awareness during the backup process, but also during the <span style="font-style: italic;">restore</span> process. They give the example of a domain controller that performs <a href="http://support.microsoft.com/kb/875495">USN rollbacks</a> when being backed up using VSS but not restored using a VSS aware software. Another nice example is <a href="http://support.microsoft.com/kb/822896">Exchange 2003</a> that requires VSS aware restore software in order to be supported by Microsoft.<br /><br /><span style="font-style: italic;"><hr />Postscriptum: I started writing this article a few days before VCB 1.5 was released, and the original point I was trying to make at that time was that there were too many disadvantages to the available VSS implementations (yet another service to install, .NET 2.0, very limited OS support) to really profit from the benefits that VSS could offer. Of course, in the meantime, VMWare has taken away most of those objections by including VSS support in their VMTools for a wide range of server operating systems. 
This forced me to reconsider my view on whether VSS would be a good idea or not.</span>Tim Jacobshttp://www.blogger.com/profile/06131387085752434985noreply@blogger.com9tag:blogger.com,1999:blog-4834634390856475978.post-25458818103341542472008-06-19T23:15:00.004+02:002008-06-20T00:28:54.036+02:00SQL Server 2005 Express Edition on Windows 2008 x64While experimenting with the Microsoft App-V 4.5 Release Candidate (more on that soon), I decided to go for a full-blown installation on Windows 2008 x64. Since this is only on my home network, I don't run a dedicated SQL server so I went for the natural choice of installing <a href="http://www.microsoft.com/downloads/details.aspx?FamilyID=31711d5d-725c-4afa-9d65-e4465cdff1e7&displaylang=en">SQL Server 2005 Express Edition SP2</a> on my freshly installed Windows 2008 x64 App-V server.<br /><br />This turned out to be less trivial than I thought. The short answer is: if you want to have a painless install of SQL Server 2005 Express Edition, take the download that includes the “<a href="http://www.microsoft.com/downloads/details.aspx?familyid=5B5528B9-13E1-4DB9-A3FC-82116D598C3D">Advanced Services</a>” and simply don’t install them. The “smaller” download package does not include some necessary files for a successful x64 installation.<br /><br />If you want to go the hard way and patch the setup for easier automated deployment (or just to be ‘1337 and be able to say that you fixed Microsoft’s SQL Server installer for 64-bit systems…), then follow these steps:<br /><br /><ul><li>First of all, you should know that SP2 is the first Vista/Windows 2008 certified edition (think UAC, think session zero hardening, think enhanced security). Secondly, SQL Server 2005 Express Edition SP2 is supported to run under WOW64. That is very comforting to know, and I hadn't expected a true 64-bit edition for free. 
So why does it complain about installing a 32-bit version on a 64-bit machine then?<br /><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgsQ9HCnWeE0uOvI7JqDQQ53zGqpvv_GI2FscsijocUieP7eZ9QebsvChaQKsM_GsmeZTFfp68O6PRnJ8bEZTUMJ5WURaPvBdbRJjze97kGBvh_9aF6eXCpbDTlyC-ZjOO-BbT3ZaZ96h88/s1600-h/20080619-SQLExpress64bit.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgsQ9HCnWeE0uOvI7JqDQQ53zGqpvv_GI2FscsijocUieP7eZ9QebsvChaQKsM_GsmeZTFfp68O6PRnJ8bEZTUMJ5WURaPvBdbRJjze97kGBvh_9aF6eXCpbDTlyC-ZjOO-BbT3ZaZ96h88/s320/20080619-SQLExpress64bit.jpg" alt="" id="BLOGGER_PHOTO_ID_5213721933037999010" border="0" /></a><br />"<i>The installation package has a missing file, or you are running a 32-bit only Setup program on a 64-bit computer</i>"<br /><br />Of course, what you don't see is that SQL is first installing the SQL Native Client in the background (as a prerequisite) and the error message conveniently forgets to mention that this is in fact the installation that is not succeeding. The error message was indeed accurate, but the error was not that I was trying to run a 32-bit installer on a 64-bit machine, but that the 64-bit installer for the SQL Native Client is not included in the package! 
What’s even worse, some other essential x64 packages are also not included in the smallest SQL Express 2005 SP2 download.<br /><br /></li><li>So you have to include the missing files manually:<br /><br /><ol><li>Download the <a href="http://www.microsoft.com/downloads/details.aspx?familyid=5B5528B9-13E1-4DB9-A3FC-82116D598C3D">“SQL Server 2005 Express Edition SP2 with Advanced Services”</a> package.<br /><br /></li><li>Run both the SQL Express installers with the /X switch to extract the setup files (to different directories):<br /><br /><span style="font-size:85%;"><span style="font-family: courier new;">sqlexpr.exe /x</span><br /><span style="font-family: courier new;">sqlexpr_adv.exe /x</span><br /></span><br /></li><li>Next, locate the 64-bit SQL Native Client <a href="http://www.microsoft.com/downloads/details.aspx?familyid=DF0BA5AA-B4BD-4705-AA0A-B477BA72A9CB">(sqlncli_x64.msi)</a> and 64-bit SQL VSS Writer (SqlWriter_x64.msi) from the Advanced Services setup and copy them to the "Setup" directory of the regular SQL Express installation.</li></ol></li></ul>Et voila! The installer works now. One day, we will live in a perfect world of unambiguous error messages...<br /><br />Now off to do some more SoftGri... ehr.. I mean Microsoft Application Vir... ehr... I mean App-V testing!Tim Jacobshttp://www.blogger.com/profile/06131387085752434985noreply@blogger.com5tag:blogger.com,1999:blog-4834634390856475978.post-13405751805567757612008-05-25T14:29:00.009+02:002008-05-25T17:33:57.100+02:00Installing LSI Logic RAID monitoring tools under the ESX service consoleAs I discussed in <a href="http://timjacobs.blogspot.com/2008/03/esx-35-on-whitebox.html">a recent post</a>, I used a Dell Perc 5i SAS controller in my ESX whitebox server. 
One of the nice features of this controller is that it is <a href="http://www.lsi.com/storage_home/products_home/internal_raid/megaraid_sas/megaraid_sas_8408e/index.html">a rebranded LSI Logic controller</a> (with a different board layout!), supported by LSI Logic firmware and the excellent monitoring tools that LSI offers.<br /><br />Of course, it is important to keep track of your RAID array status, so I decided to install the MegaCLI monitoring software under the ESX Server 3.5 Service Console. Here's how I did it and configured the monitoring on my system:<br /><ul><li>The MegaCLI software can be downloaded from <a href="http://www.lsi.com/storage_home/products_home/internal_raid/megaraid_sas/megaraid_sas_8408e/index.html">the LSI Logic website</a>. I used version 1.01.39 for Linux, which comes in an RPM file.<br /><br /></li><li>After uploading the RPM file to the service console, it was a matter of installing it using the "rpm" command:<br /><br /><span style="font-size:85%;"><span style="font-family:courier new;">rpm -i -v MegaCli-1.01.39-0.i386.rpm </span></span><br /><br />This installs the "MegaCli" and "MegaCli64" commands in the <span style="font-size:85%;"><span style="font-family:courier new;">/opt/MegaRAID/MegaCli/</span></span> directory of the service console.</li></ul>That's it, MegaCLI is ready to be used now. Some useful commands are the following:<br /><ul><li><span style="font-weight: bold;">/opt/MegaRAID/MegaCli/MegaCli -AdpAllInfo -aALL</span><br />This lists the adapter information for all LSI Logic adapters found in your system.<br /><br /></li><li><span style="font-weight: bold;">/opt/MegaRAID/MegaCli/MegaCli -LDInfo -LALL -aALL</span><br />This lists the logical drives for all LSI Logic adapters found in your system. 
The "State" should be set to "optimal" in order to have a fully operational array.<br /><br /></li><li><span style="font-weight: bold;">/opt/MegaRAID/MegaCli/MegaCli -PDList -aALL</span><br />This lists all the physical drives for the adapters in your system; the "Firmware state" indicates whether the drive is online or not.<br /></li></ul>The next step is to automate the analysis of the drive status and to alert when things go bad. To do this, I added an hourly cron job that lists the physical drives and then analyzes the output of the MegaCLI command.<br /><ul><li>I created a file called "<span style="font-weight: bold;">analysis.awk</span>" in the <span style="font-weight: bold;">/opt/MegaRAID/MegaCli</span> directory with the following contents:<br /><br /><span style="font-size:85%;"><blockquote><span style="font-family:courier new;"># This is a little AWK program that interprets MegaCLI output<br /><br />/Device Id/ { counter += 1; device[counter] = $3 }<br />/Firmware state/ { state_drive[counter] = $3 }<br />/Inquiry/ { name_drive[counter] = $3 " " $4 " " $5 " " $6 }<br />END {<br /> for (i=1; i<=counter; i+=1) printf ( "Device %02d (%s) status is: %s <br/>\n", device[i], name_drive[i], state_drive[i]); } </span><span style="font-family:courier new;"><br /></span></blockquote></span>This awk program processes the output of MegaCli, as you can test by running the following command:<br /><br /><span><span style="font-size:85%;"><span style="font-family:courier new;">./MegaCli -PDList -aALL | awk -f analysis.awk</span></span></span><br /><br />while in the <span style="font-weight: bold;">/opt/MegaRAID/MegaCli</span> directory.<span style="font-size:85%;"><br /><br /></span></li><li>Then I created the cron job by placing a file called <span style="font-weight: bold;">raidstatus</span> in <span style="font-weight: bold;">/etc/cron.hourly</span>, with the following contents:<br /><br /><span style="font-size:85%;"><span style="font-family:courier 
new;"></span><blockquote><span style="font-family:courier new;">#!/bin/sh</span><br /><span style="font-family:courier new;"> </span><br /><span style="font-family:courier new;">/opt/MegaRAID/MegaCli/MegaCli -PdList -aALL| awk -f /opt/MegaRAID/MegaCli/analysis.awk >/tmp/megarc.raidstatus</span><br /><span style="font-family:courier new;"> </span><br /><span style="font-family:courier new;">if grep -qv ": Online" /tmp/megarc.raidstatus</span><br /><span style="font-family:courier new;">then</span><br /><span style="font-family:courier new;"> /usr/local/bin/smtp_send.pl -t tim@pretnet.local -s "Warning: RAID status no longer optimal" -f esx@pretnet.local -m "`cat /tmp/megarc.raidstatus`" -r exchange.pretnet.local</span><br /><span style="font-family:courier new;">fi</span><br /><span style="font-family:courier new;"> </span><br /><span style="font-family:courier new;">rm -f /tmp/megarc.raidstatus</span><span style="font-family:courier new;"></span><br /><span style="font-family:courier new;">exit 0</span></blockquote></span><br />Don't forget to run a <span style="font-size:85%;"><span style="font-family:courier new;">chmod a+x /etc/cron.hourly/raidstatus</span></span> in order to make the file executable by all users.<br /></li></ul>In order to send an e-mail when things go wrong, I used <a href="http://www.yellow-bricks.com/2008/01/23/howto-sending-html-email-from-the-service-console/">the SMTP_Send Perl script</a> smtp_send.pl that was discussed by Duncan Epping on <a href="http://www.yellow-bricks.com/">his blog</a>.Tim Jacobshttp://www.blogger.com/profile/06131387085752434985noreply@blogger.com9tag:blogger.com,1999:blog-4834634390856475978.post-6814443808007989762008-05-22T23:03:00.002+02:002008-05-22T23:16:32.682+02:00Renaming a VirtualCenter 2.5 serverAfter running my VirtualCenter server on a standalone host for quite some time, I decided to join it to the domain that I am running on my ESX box (in order to let it participate in the automated WSUS 
patching mechanism). This also seemed like a perfect opportunity to change the server's hostname from <span style="font-weight: bold;">W2K3-VC.pretnet.local</span> to <span style="font-weight: bold;">virtualcenter.pretnet.local</span>. However, after the hostname change, the VMware VirtualCenter service would no longer start and logged an Event ID 1000 in the event log.<br /><br />Somehow, this didn't come as a surprise ;). This has been discussed before on the VMware forums (<a href="http://communities.vmware.com/message/686628">here</a> and <a href="http://communities.vmware.com/message/602684">here</a>), but I post it here because I did not immediately find a step-by-step walkthrough.<br /><br />The problem was in fact twofold, the solution rather simple:<br /><ul><li>Renaming SQL servers is a bad idea in general (so it appears). For my small, nonproduction environment, I use the SQL Server 2005 Express edition that comes with the VirtualCenter installation. If you rename a SQL server, you need to update its internal system tables using a set of stored procedures in order to make everything consistent again. 
This is done by installing the <a href="http://www.microsoft.com/downloads/details.aspx?FamilyID=c243a5ae-4bd1-4e3d-94b8-5a0f62bf7796">"SQL Server Management Studio Express"</a> and then executing the following T-SQL statements:<br /><br /><span style="font-size:85%;"><span style="font-family:courier new;">sp_dropserver 'W2K3-VC\SQLEXP_VIM'</span><br /><span style="font-family:courier new;">GO</span><br /><span style="font-family:courier new;">sp_addserver 'VIRTUALCENTER\SQLEXP_VIM', local</span><br /><span style="font-family:courier new;">GO</span><br /><span style="font-family:courier new;">sp_helpserver</span><br /><span style="font-family:courier new;">SELECT @@SERVERNAME, SERVERPROPERTY('ServerName')</span></span><br /><br />The first statement removes the old server instance (replace W2K3-VC with your old server name); the second statement adds the new server instance (replace VIRTUALCENTER with your new server name). The<span style="font-weight: bold;"> sp_helpserver</span> and <span style="font-weight: bold;">SELECT</span> statements query the internal database and variables for the SQL Server instance names that are actually recognized. You need to reboot before the last two statements report the new instance name.<br /><br /></li><li>Secondly, the System ODBC connection that is used by VMware required an update to point to the new SQL Server instance. This was of course done using the familiar "Data Sources (ODBC)" management console.</li></ul>Afterwards, the VMware Virtual Center Server service started just fine again.Tim Jacobshttp://www.blogger.com/profile/06131387085752434985noreply@blogger.com5tag:blogger.com,1999:blog-4834634390856475978.post-59491160180857478752008-05-02T17:39:00.008+02:002008-05-02T18:02:40.562+02:00Enabling Subject Alternate Name certificatesWhen requesting certificates from your freshly installed Certification Authority, it can come in handy to specify multiple DNS names that this certificate should be valid for. 
This principle is known as specifying a list of "subject alternate names" (SANs) that the server is also reachable under.<br /><br />Unfortunately, this mechanism doesn't work out of the box with Windows CA's. On your CA, you first need to enable a setting that allows the usage of SAN attributes. Open a command prompt and run the following two commands (each on a single line):<br /><br /><span style="font-family:courier new;font-size:85%;" >certutil -setreg policy\EditFlags +EDITF_ATTRIBUTESUBJECTALTNAME2</span><br /><br /><span style="font-family:courier new;font-size:85%;">net stop CertSvc & net start CertSvc</span><br /><br />Afterwards, use the <span style="font-weight: bold;">SAN:dns=&lt;fqdn1&gt;&amp;dns=&lt;fqdn2&gt;</span> attribute when requesting certificates to enable multiple DNS names.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKKKMcfCdq43v5oeTx1iLjcu-SF2J0qUCdTiu0jFBCDZ2tJGFXSy3zIR_It1XUinibtAPkPU_eAkQpZ0p7TQ7bdM4YuKPKU6eVRrqDjXGgqqYzELdbfgaEe5a8p19p2ki3E7l1RAA0DTZQ/s1600-h/20080502-SANCA.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjKKKMcfCdq43v5oeTx1iLjcu-SF2J0qUCdTiu0jFBCDZ2tJGFXSy3zIR_It1XUinibtAPkPU_eAkQpZ0p7TQ7bdM4YuKPKU6eVRrqDjXGgqqYzELdbfgaEe5a8p19p2ki3E7l1RAA0DTZQ/s320/20080502-SANCA.jpg" alt="" id="BLOGGER_PHOTO_ID_5195810989239484226" border="0" /></a>Tim Jacobshttp://www.blogger.com/profile/06131387085752434985noreply@blogger.com1tag:blogger.com,1999:blog-4834634390856475978.post-6403523433283492472008-04-30T16:39:00.011+02:002008-05-02T12:03:45.562+02:00Windows 2008 Certificate Authority and Windows 2000/XP/2003 clientsI was experimenting with Windows 2008 Certificate Services the other day in order to create certificates for WSUS 3.0 and for doing SSL tunneling of RDP towards the internet. 
I noticed that several of my clients were unable to automatically install the WSUS client, with vague errors in the event log (Win32HResult=0x00000000):<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjcnoymYh157-oYu4geYY2hBfKuB40hhtaw6urijuujYFOJcH7ApCZFurGcJ3O7YKu-VTSpNf8BBiryXLoKk2PPf8Qbu-YohrPFP5QMH_2fUhRkDqfeqcC6Y7ZY2GW2L-RJbKEBH9UXyp5D/s1600-h/20080430-wsuserror.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjcnoymYh157-oYu4geYY2hBfKuB40hhtaw6urijuujYFOJcH7ApCZFurGcJ3O7YKu-VTSpNf8BBiryXLoKk2PPf8Qbu-YohrPFP5QMH_2fUhRkDqfeqcC6Y7ZY2GW2L-RJbKEBH9UXyp5D/s320/20080430-wsuserror.jpg" alt="" id="BLOGGER_PHOTO_ID_5195054082857942786" border="0" /></a><br />I quickly discovered that the problem was related to the certificate that I had issued for the WSUS IIS server. It turned out that Windows 2008 WSUS clients could connect without any problem to the WSUS webpage, but Windows 2003 and Windows XP clients could not. 
What made it even more puzzling is that on a Windows XP system, connecting to the IIS homepage failed in Internet Explorer but worked perfectly fine in Firefox.<br /><br />Opening the certificate of my WSUS server gave the following result:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbmene6bamoOpZmacSjiorec_IF6a-tS7mrR3hF4-gtDv0P66QCe9_VmwMIrWm6nOhzMaI0rUE2UQWTRhBVZ252Hz_TihmnNtoaskc_MlhJRnKCwOMYVZwAoBzZH3TTzla2KGOvUqjOPmh/s1600-h/20080430-certificate.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbmene6bamoOpZmacSjiorec_IF6a-tS7mrR3hF4-gtDv0P66QCe9_VmwMIrWm6nOhzMaI0rUE2UQWTRhBVZ252Hz_TihmnNtoaskc_MlhJRnKCwOMYVZwAoBzZH3TTzla2KGOvUqjOPmh/s320/20080430-certificate.jpg" alt="" id="BLOGGER_PHOTO_ID_5195054924671532818" border="0" /></a><br />with a "<span style="font-style: italic;">This certificate has an nonvalid digital signature</span>" error in the "Certification Path" details for both the issued certificate and my CA certificate.<br /><br /><span style="font-size:130%;"><span style="font-weight: bold;">Root cause:</span></span><br />The answer is the bleeding obvious: Windows 2008 has <a href="http://blogs.msdn.com/windowsvistasecurity/archive/2007/06/01/pki-enhancements-in-windows-vista-and-windows-server-2008.aspx">several new additions to the cryptography API</a>, called <a href="http://technet2.microsoft.com/windowsserver2008/en/library/532ac164-da33-4369-bef0-8f019d5a18b81033.mspx?mfr=true">Cryptography Next Generation</a> (CNG), that are used in the V3 certificate templates for CA's and web servers in Windows 2008. Amongst those new features is support for new certificate signing algorithms (in my case SHA512, a SHA-2 variant), which are not recognized by older clients. 
<A HREF="http://download.microsoft.com/download/c/d/8/cd8cc719-7d5a-40d3-a802-e4057aa8c631/relnotes.htm">Windows XP SP3 adds support</A> for XP; I suppose a future hotfix will add compatibility for Windows 2003.<br /><br /><span style="font-size:130%;"><span style="font-weight: bold;">Solution:</span></span><br />In the absence of a worldwide XP SP3 deployment and a working hotfix for W2K3, the only option here is to ensure that the Windows 2008 CA certificate is created with a non-CNG cryptographic provider. If you already created a CA certificate using the new CNG features, the only option is to <span style="font-weight: bold;">reinstall your CA and regenerate your CA certificate</span> --- remember how mum always told you to think things over twice before just plainly installing a W2K8 CA... I bet you regret that now (just like I did :D)? Reinstalling your CA could be messy and make your PKI go berserk, so this time do think twice before going down that road!<br /><br /><span style="font-size:130%;"><span style="font-weight: bold;">Step by Step plan of attack (POA)</span></span><br />So you have decided you want to proceed? First verify that you are indeed using a CNG CSP. To do this, open your registry editor and navigate to the following key:<br /><br /><span style="font-size:85%;"><span style="font-family:courier new;">[HKLM\SYSTEM\CurrentControlSet\Services\CertSvc\<br /> Configuration\{CAname}\CSP]</span></span><br /><br />If you find a <span style="font-weight: bold;">CNGHashAlgorithm</span> REG_SZ value, and the <span style="font-weight: bold;">HashAlgorithm</span> DWORD is set to 0xFFFFFFFF, then you are using a CNG CSP. If the HashAlgorithm is set to a value such as 0x00008003, then you are already using a "classic" CSP. 
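<br /><br />As a rough illustration of that rule, here is a minimal Python sketch that classifies the two registry values once you have read them by hand. The function name <span style="font-weight: bold;">classify_csp</span> and its return labels are made up for this example, and treating every HashAlgorithm value other than 0xFFFFFFFF as "classic" is an assumption generalized from the single 0x00008003 example:<br /><br />

```python
# Value that the HashAlgorithm DWORD holds when a CNG provider is in use
# (taken from the registry check described in the text).
CNG_MARKER = 0xFFFFFFFF


def classify_csp(hash_algorithm: int, has_cng_hash_algorithm: bool) -> str:
    """Classify a CA's CSP from the two values found in its CSP registry key.

    hash_algorithm         -- the HashAlgorithm DWORD, e.g. 0x00008003
    has_cng_hash_algorithm -- True if a CNGHashAlgorithm REG_SZ value exists
    """
    if hash_algorithm == CNG_MARKER:
        # Both signs must be present before we call it a CNG CSP.
        return "cng" if has_cng_hash_algorithm else "unclear"
    # Assumption (hypothetical rule): any other value means a classic CSP.
    return "classic"


if __name__ == "__main__":
    print(classify_csp(0xFFFFFFFF, True))    # cng
    print(classify_csp(0x00008003, False))   # classic
```

An "unclear" result simply means the two values contradict each other, in which case it is worth double-checking the key by hand before deciding to reinstall anything.<br /><br />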
You can also use the following commands on the CA to retrieve the CSP:<br /><br /><span style="font-size:85%;"><span style="font-family:courier new;">certutil -getreg ca\csp\HashAlgorithm<br />certutil -getreg ca\csp\Provider</span></span><br /><br />which will return the HashAlgorithm and the name of the CSP. For more information, I refer to the Microsoft whitepaper <a href="http://www.microsoft.com/downloads/details.aspx?FamilyID=9bf17231-d832-4ff9-8fb8-0539ba21ab95&displaylang=en">"Active Directory Certificate Server Enhancements in Windows Server Code Name Longhorn"</a>; you crypto-boys out there will love it.<br /><br />Keep in mind that when you are adding the Certificate Services Role to your Windows 2008 server, you need to specify the proper cryptographic service provider. The image below displays some of the options; what is important to remember here is that all the service providers that contain a hash sign ("#") are CNG providers and thus incompatible with Windows XP SP2/Windows 2003 and earlier clients.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXF1eSzG5-f7SGaDYbWApgnBVVcQxBEKWD_g8NOLBH0F0Wjxlb1VFpQiljYDMpR_9zKA-fiqrpSBDRw2PP3eLugNK7NtdZwoNVjwmG2Ik21HVRsve03lSbhsdqLsEB_ElBVG2l9LtwYnxt/s1600-h/20080430-ca2008.jpg"><img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXF1eSzG5-f7SGaDYbWApgnBVVcQxBEKWD_g8NOLBH0F0Wjxlb1VFpQiljYDMpR_9zKA-fiqrpSBDRw2PP3eLugNK7NtdZwoNVjwmG2Ik21HVRsve03lSbhsdqLsEB_ElBVG2l9LtwYnxt/s320/20080430-ca2008.jpg" alt="" id="BLOGGER_PHOTO_ID_5195121518139457314" border="0" /></a><br />The default cryptographic service provider for Windows 2003 is the "Microsoft Strong Cryptographic Provider", so that is what you want to use. Notice how selecting this provider reduces the number of certificate signing options... 
SHA-2 algorithms are no longer included! Proceed as usual to end up with a CA that produces certificates that can be handled by legacy clients.Tim Jacobshttp://www.blogger.com/profile/06131387085752434985noreply@blogger.com14