Monday, December 22, 2008

Counting ESX Server storage paths

At a customer, we have been hitting one of the built-in storage limits of ESX Server: you can present at most 1024 storage paths to a single ESX host. Depending on your SAN topology, each LUN that you present over a fiber fabric consumes 4, 8 or even 16 storage paths, so at 8 paths per LUN you are capped at 128 LUNs per host. You can check the path count per LUN using the esxcfg-mpath command:

Disk vmhba1:9:2 /dev/sdf (102400MB) has 8 paths and policy of Fixed
FC 13:0.0 10000000c96e8972<->50001fe15009264e vmhba1:9:2 On active preferred
FC 13:0.0 10000000c96e8972<->50001fe15009264a vmhba1:10:2 On
FC 13:0.0 10000000c96e8972<->50001fe15009264c vmhba1:11:2 On
FC 13:0.0 10000000c96e8972<->50001fe150092648 vmhba1:12:2 On
FC 16:0.0 10000000c96e8ccc<->50001fe15009264f vmhba2:12:2 On
FC 16:0.0 10000000c96e8ccc<->50001fe15009264b vmhba2:13:2 On
FC 16:0.0 10000000c96e8ccc<->50001fe15009264d vmhba2:14:2 On
FC 16:0.0 10000000c96e8ccc<->50001fe150092649 vmhba2:15:2 On


To count the total number of paths presented to a single ESX host, you can use the following service console command:

esxcfg-mpath -l | grep paths | awk '{ split($0, array, "has "); split(array[2], array2, " paths"); SUM +=array2[1] } END { print SUM}'

The awk syntax can probably be shortened considerably, but I am no awk/grep/sed expert :). Nevertheless, you can run this command from a cron job so that you receive regular reports on whether or not you are approaching this limit.
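For what it's worth, here is a shorter variant that should produce the same total. This is only a sketch based on the output format shown above (where the path count is the sixth whitespace-separated field), so verify it against your own esxcfg-mpath output before trusting the number:

# total number of storage paths across all LUNs ($6 is the path count in "Disk ... has N paths ...")
esxcfg-mpath -l | awk '/has .* paths/ { total += $6 } END { print total }'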

Sunday, November 30, 2008

App-V 4.5 Certificate Galore

1) Setting
This weekend I finally found some time to delve a bit deeper into properly configuring an App-V 4.5 infrastructure for large-scale deployments. One of the first things that I investigated was the usage of RTSPS for smoother firewall tunneling: as you know, when using RTSP a series of ports is dynamically chosen, which means that you need to open up entire port ranges in your firewall. This is not something your firewall guys will like if you work in a larger environment.

Going for RTSPS means you need to use a server public certificate and a corresponding private key in order to let the App-V server sign and encrypt its communications. I have blogged before about how to configure this in SoftGrid 4.1/4.2 -- luckily the procedure for configuring an SSL certificate got a lot simpler. At least, that is what I thought. Some issues I ran into that might save you some valuable troubleshooting time:
  • As always, when requesting a certificate from your Enterprise PKI, use the Virtual Application Server's FQDN as the subject. It is probably also a good idea to use the hostname as a subject alternate name for those people that still refer to servers by their shortnames.

  • After the App-V 4.5 Web Management Service has been installed, don't forget to configure the certificate for the IIS Default Website. In IIS7, that requires adding a binding & selecting the proper certificate. It is not clear to me why the App-V installer cannot handle this automatically!?

  • App-V 4.5 runs under the NETWORK SERVICE account by default and no longer under the SYSTEM account as SoftGrid 4.1/4.2 used to. This has some consequences when it comes to Windows PKI: you need to grant the NETWORK SERVICE account read permissions on the private key.
This last action is a lot harder than it sounds ;). Read on for more information.

2) Configuring permissions on private keys
You have three options to get this working:
  • If you are using a Windows 2008 Enterprise CA and are using your own certificate templates, then you can modify the template to automatically grant the NETWORK SERVICE account read permissions on all certificates issued using that template.


    Since you will typically be creating a new certificate template for server deployments anyway (to allow a validity period longer than two years & export of private keys), this is probably the easiest solution if you have a Windows Server 2008 Enterprise CA.

  • In a pre-Windows 2008 CA world, you will have to use the WinHTTPcertcfg.exe tool (the Windows HTTP Services Certificate Configuration tool). In our situation, we need to modify the ACL of the certificate's private key to grant read access to the service account of the Management Service (which is NETWORK SERVICE by default).

    winhttpcertcfg -g -c LOCAL_MACHINE\My -s (subjectname) -a NetworkService

    Verify that everything went ok by listing the permissions:

    winhttpcertcfg -l -c LOCAL_MACHINE\My -s (subjectname)

  • It is also possible to explicitly set the permissions on the private key file. This is based on information from the App-V blog, with some corrections below.

    • First, obtain the certificate thumbprint. You can find this in the details tab of the certificate:

      Copy/paste the thumbprint for the next commandline.

    • Next, use the FindPrivateKey.exe utility to locate the private key file on disk (compiled version available here -- download & use untrusted executables from the internet at your own risk). Use the following syntax:

      FindPrivateKey.exe My LocalMachine -t "your thumbprint"

      This will give you the full path. Read the caveat message below if this path looks awkward.

    • Grant the NETWORK SERVICE account read & execute permissions on the private key file (see the example command after this list).

  • CAVEAT: the private key must end up in the machine-wide key store. For WinXP/Win2K3 the default location is: C:\Documents and Settings\All Users\Application Data\Microsoft\Crypto\RSA\MachineKeys. For W2K8/Vista, this changed to: C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys. If FindPrivateKey reports a different location, take action to move the private key there. I requested my certificate through the Web Enrollment pages of Active Directory Certificate Services on Windows 2008, which stores the public & private key in your user account's profile by default. I knew this and dragged & dropped the public certificate from the "Certificates (My User)" MMC to the "Certificates (My Computer)" MMC -- since my private key was marked as exportable, this is indeed possible. However, this does not actually move the private key: it stays in your user profile location (for example: C:\Users\Administrator\AppData\Roaming\Microsoft\Crypto\RSA). I fixed this by explicitly exporting the certificate & private key from my user account and then importing everything again into the computer store. So a huge warning for all you regular crypto-users: no more drag 'n dropping of public/private keypairs!
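As an illustration of the permission step mentioned above: on W2K8/Vista, where the icacls tool is available, granting read and execute access could look like the command below. The key file name is just a placeholder; use the exact path that FindPrivateKey.exe returned on your system.

rem grant NETWORK SERVICE read and execute on the private key file (path and file name are examples)
icacls "C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys\<key file reported by FindPrivateKey>" /grant "NETWORK SERVICE":(RX)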
3) Conclusion
A bit messy... yet secure! The move towards the NETWORK SERVICE account for the App-V Management service (... and other Microsoft products as well) is obviously a good choice, yet it brings along some difficulties that probably can be streamlined from within the App-V Management Server's installer.

PS: Don't forget to also grant the NETWORK SERVICE account read permissions on your content directory; otherwise your streaming won't work.

Friday, November 21, 2008

VMware Tools without a reboot?

Every now and then, you see blogposts appearing on the "issue" that you need to reboot a guest operating system after you install or update the VMware Tools. Many people have pondered whether a reboot is in fact really necessary and if it can be avoided altogether. Recent posts about this can be read here and here, referring to this VMware community thread -- the question is still alive in threads spanning multiple years, like this one right here. I usually raise my eyebrows when reading these "no reboot" topics, yet I am interested in keeping up with the advancements in that subject for some of the large customers that I come in contact with professionally.

The scripts and methods outlined in these blogposts sound a bit tricky at first if you ask me, and I feared they might not have the outcome you expect. I would think the VMware Tools really require a reboot on some operating systems because you update parts of the virtual device drivers and those need to be reloaded by a reboot of the operating system. (Note: strictly speaking you don't need a reboot for all types of device drivers, only under a specific set of circumstances documented by Microsoft. The VMware disk drivers host a boot device, so they fall under the "requires a reboot" category from that document.) This means that just running the installer with a "Suppress Reboot" parameter on all your machines will place the new VMware Tools files on your hard disk, but will not actively load all of them... I am not sure that is a state I would want my production virtual machines in!? And to be very clear: what these scripts do is request an automatic postponement of the reboot, not trigger some hidden functionality in VMware Tools that removes the need for a reboot after all!

To remove all suspicion, I did a little test on a Windows 2003 virtual machine and upgraded the tools from ESX 3.0.2 to ESX 3.5U2 without rebooting (using the commandline setup.exe /S /v"REBOOT=R /qb" on the VMware Tools ISO). This effectively updates the following services and drivers without rebooting:
  • VMware services (bumped from build 63195 to build 110268)
  • VMware SVGA II driver, VMware Pointing Device driver
It left the following drivers untouched:
  • VMware Virtual disk SCSI Disk Device ("dummy" harddisk driver - Microsoft driver)
  • NECVMWar VMware IDE CDR10 (virtual CD-ROM driver)
  • Intel Pro/1000 MT Network Connection (vmnet driver - Microsoft driver)
  • LSI Logic PCI-X Ultra320 SCSI Host Adapter (storage adapter - Microsoft driver)
It turned out that these drivers didn't require updating for my specific virtual machine (even after a reboot). In fact, I wasn't immediately able to find a single machine in the test environment at work that required updating any boot disk device drivers (and some still had 3.0.2 VMware Tools running!).
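If you want to verify this for your own machines, one way is to compare the installed drivers inside the guest before and after the Tools upgrade with Windows' built-in driverquery tool (available on XP/2003 and later); the file names below are just examples:

rem dump the installed drivers (module name, link date, image path) before the Tools upgrade...
driverquery /v /fo csv > c:\drivers-before.csv
rem ...and again afterwards, then compare the two files
driverquery /v /fo csv > c:\drivers-after.csv
fc c:\drivers-before.csv c:\drivers-after.csv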

To conclude, I would say that in some circumstances it is safe to postpone the reboot of your virtual machine, if at minimum the boot disk device drivers are not touched. Postponing the reboot is very convenient if you use it in the context of a patch weekend where you want to postpone the restart to one big, single reboot at the end of all your patches.

Update: as Duncan Epping points out in a recent blogpost, he also advises that updating the network driver effectively drops all network connections. This is for all practical purposes maybe just as bad as actually rebooting your server, so beware of the "fake level of safety and comfort" that you might have by postponing a VMware Tools reboot!

Thursday, August 14, 2008

Matching LUN's between ESX hosts and a VCB proxy

One of the problems that I encountered at a customer was to discover what VMFS partitions were presented to a VCB proxy. It turned out to be a bit more complex than I had first expected.

Introduction
VMware released the VCB framework (VMware Consolidated Backup) to make backups of entire virtual machines. The VCB framework is typically installed on a Windows host (the VCB proxy), and in order to make SAN backups, you need to present both the source LUN, which contains the virtual machines to back up, and the destination LUN, where the backup files are stored, to that VCB proxy.

This setup is relatively simple to maintain in smaller environments. However, once you get into a big environment where a dozen teams are involved (separate networking teams, separate SAN teams, separate Windows teams and separate VMware teams), it can become quite challenging to find out which of the 12 LUN's that are presented to a Windows host in fact belong to a specific ESX host.

Finding unique identifiers for a LUN
The mission is to find a unique identifier (UID) that can be used both on the ESX host and the Windows box. The first two obvious candidates to uniquely identify an ESX-managed LUN on a SAN network are:
  • The VMFS ID for the partition
    Upon the initialization of a VMFS partition, it is assigned a unique identifier that can be found by looking in the /vmfs/volumes directory on an ESX host, or by using the esxcfg-vmhbadevs -m command on the ESX host. The output looks like this:

    vmhba1:0:2:1 /dev/sdb1 48858dc4-f4e218d1-d3a8-001cc497e630
    vmhba1:4:1:1 /dev/sdc1 483cf914-29b60dc5-dbfd-001cc497e630
    vmhba1:4:2:1 /dev/sdd1 479da7c1-4494cd90-d327-001cc497e630


    The first disk is the remainder of the locally attached storage, and the two other disks are presented from the SAN. For those SAN disks, the first column indicates that HBA 1, SCSI target 4 and LUN's 1 and 2 are used (and partition 1 on each LUN); the second column lists the Linux device name under the Service Console and the third column lists the VMFS ID.

  • The WWPN (World Wide Port Name) of the disk on the SAN
    On a fiber-channel SAN network, each device is assigned a unique identifier called the WWPN. You can think of the WWPN as performing the same function as a MAC address on an Ethernet network. The WWPN's of the disks that are presented to an ESX host can be obtained from the Service Console using the esxcfg-mpath -l command:

    Disk vmhba1:4:1 /dev/sdc (256000MB) has 16 paths and policy of Fixed
    FC 13:0.0 10000000c96e8972<->500507630308060b vmhba1:4:1 On
    FC 13:0.0 10000000c96e8972<->500507630313060b vmhba1:5:1 On
    FC 13:0.0 10000000c96e8972<->500507630303060b vmhba1:6:1 On active preferred
    FC 13:0.0 10000000c96e8972<->500507630303860b vmhba1:7:1 On
    FC 13:0.0 10000000c96e8972<->500507630308860b vmhba1:8:1 On
    FC 13:0.0 10000000c96e8972<->500507630313860b vmhba1:9:1 On
    FC 13:0.0 10000000c96e8972<->500507630318060b vmhba1:10:1 On
    FC 13:0.0 10000000c96e8972<->500507630318860b vmhba1:11:1 On
    FC 16:0.0 10000000c96e8ccc<->500507630303460b vmhba2:4:1 On
    FC 16:0.0 10000000c96e8ccc<->500507630308460b vmhba2:5:1 On
    FC 16:0.0 10000000c96e8ccc<->500507630313460b vmhba2:6:1 On
    FC 16:0.0 10000000c96e8ccc<->500507630303c60b vmhba2:7:1 On
    FC 16:0.0 10000000c96e8ccc<->500507630308c60b vmhba2:8:1 On
    FC 16:0.0 10000000c96e8ccc<->500507630313c60b vmhba2:9:1 On
    FC 16:0.0 10000000c96e8ccc<->500507630318460b vmhba2:10:1 On
    FC 16:0.0 10000000c96e8ccc<->500507630318c60b vmhba2:11:1 On

    Disk vmhba1:4:2 /dev/sdd (256000MB) has 16 paths and policy of Fixed
    FC 13:0.0 10000000c96e8972<->500507630308060b vmhba1:4:2 On
    FC 13:0.0 10000000c96e8972<->500507630313060b vmhba1:5:2 On
    FC 13:0.0 10000000c96e8972<->500507630303060b vmhba1:6:2 On
    FC 13:0.0 10000000c96e8972<->500507630303860b vmhba1:7:2 On
    FC 13:0.0 10000000c96e8972<->500507630308860b vmhba1:8:2 On
    FC 13:0.0 10000000c96e8972<->500507630313860b vmhba1:9:2 On
    FC 13:0.0 10000000c96e8972<->500507630318060b vmhba1:10:2 On
    FC 13:0.0 10000000c96e8972<->500507630318860b vmhba1:11:2 On
    FC 16:0.0 10000000c96e8ccc<->500507630303460b vmhba2:4:2 On
    FC 16:0.0 10000000c96e8ccc<->500507630308460b vmhba2:5:2 On active preferred
    FC 16:0.0 10000000c96e8ccc<->500507630313460b vmhba2:6:2 On
    FC 16:0.0 10000000c96e8ccc<->500507630303c60b vmhba2:7:2 On
    FC 16:0.0 10000000c96e8ccc<->500507630308c60b vmhba2:8:2 On
    FC 16:0.0 10000000c96e8ccc<->500507630313c60b vmhba2:9:2 On
    FC 16:0.0 10000000c96e8ccc<->500507630318460b vmhba2:10:2 On
    FC 16:0.0 10000000c96e8ccc<->500507630318c60b vmhba2:11:2 On

    In this output, you can see two HBA's (with WWPN's 10000000c96e8972 and 10000000c96e8ccc) that see two LUN's, vmhba1:4:1 and vmhba1:4:2, each presented over 16 paths.

    On the VCB proxy / Windows box, I used the Emulex HBAnywhere utility to retrieve the WWPN's of the LUN's that were presented. The output is shown in the following screenshot:


    It is also possible to use the HbaCmd.exe AllNodeInfo command to retrieve a list of all WWPN's that a certain HBA sees.
Looks nice, what's the problem?
Using the WWPN seemed to be the obvious answer to identifying the LUN's on both the ESX host and the VCB proxy. Until I discovered that two different LUN's were presented using the same WWPN (obviously they were on two different SAN's and presented to two different hosts). On one of our ESX hosts, a 256 GB LUN was presented using WWPN 50:05:07:63:03:08:06:0b, and on the VCB proxy, a 500 GB LUN was presented using that same WWPN -- apparently our SAN team recycles the WWPN's on the different fibre channel fabrics.

To make matters even worse, I noticed that the same LUN was presented using one WWPN to an ESX host, and with another WWPN to the VCB proxy (I am no SAN expert myself but I assume it is possible to present the same LUN in different SAN zones using different WWPN's). I was able to verify this since VCB was able to do a SAN backup of a virtual machine that resides on a LUN with a WWPN on the ESX side that is not presented to the VCB proxy.

The next step: VMFS ID's as a unique identifier
So, if you cannot rely on the WWPN's to uniquely identify a LUN on a host that is connected to multiple SAN's, then surely VCB must use the VMFS ID to know what LUN to read the virtual machine data from? Right?

On the VCB proxy & Windows machine, I tried to discover the VMFS ID's using the vcbSanDbg.exe tool (included in the VCB framework and available as a separate download from the VMware website -- careful, the separate download is an older version than the one included in the VCB 1.5 framework). An excerpt from its lengthy output:

C:\Program Files\VCB>vcbSanDbg | findstr "ID: NAA: volume"
[info] Found logical volume 48761b97-a4f562bd-6875-0017085d.
[info] Found logical volume 48761bc5-3f508baa-2f5d-0017085d.
[info] Found logical volume 483cf913-05b4f526-45b5-001cc497.
[info] Found logical volume 479da7ac-55fe7dfe-378c-001cc497.
[info] Found logical volume 477c2b4a-7db36616-30ea-001cc495.
[info] Found logical volume 48843bec-154cf784-871a-001cc495.
[info] Found SCSI Device: NAA:600508b10010443953555534314200044c4f47494341
[info] Found SCSI Device: NAA:60060e801525180000012518000000374f50454e2d56
[info] Found SCSI Device: NAA:600508b4000901eb0001100003230000485356323130
[info] ID: LVID:48761b97-dacedf9f-ebb9-0017085d0f91/48761b97-a4f562bd-6875-0017085d0f91/1
Name: 48761b97-a4f562bd-6875-0017085d
[info] Found SCSI Device: NAA:600508b4000901eb0001100003260000485356323130
[info] ID: LVID:48761bc6-7b4afa63-97d9-0017085d0f91/48761bc5-3f508baa-2f5d-0017085d0f91/1
Name: 48761bc5-3f508baa-2f5d-0017085d
[info] Found SCSI Device: NAA:6005076303ffc60b0000000000001049323130373930
[info] ID: LVID:483cf913-458f9fa5-a749-001cc497e630/483cf913-05b4f526-45b5-001cc497e630/1
Name: 483cf913-05b4f526-45b5-001cc497
[info] Found SCSI Device: NAA:6005076303ffc60b000000000000104a323130373930
[info] ID: LVID:479da7b6-877867e9-dd06-001cc497e630/479da7ac-55fe7dfe-378c-001cc497e630/1
Name: 479da7ac-55fe7dfe-378c-001cc497
[info] Found SCSI Device: NAA:6005076303ffc403000000000000128d323130373930
[info] ID: LVID:477c2b4a-969e01e0-8d49-001cc495fb46/477c2b4a-7db36616-30ea-001cc495fb46/1
Name: 477c2b4a-7db36616-30ea-001cc495
[info] Found SCSI Device: NAA:6005076303ffc403000000000000128e323130373930
[info] Found SCSI Device: NAA:600508b40006e8890000b000010a0000485356323130
[info] Found SCSI Device: NAA:600508b40006e8890000b00003770000485356323130
[info] ID: LVID:48843bec-28cc17a4-ca9e-001cc495fb46/48843bec-154cf784-871a-001cc495fb46/1
Name: 48843bec-154cf784-871a-001cc495


Unfortunately, I was not able to discover the VMFS ID's I saw on the ESX host in this output, even though there are some resemblances:
  • ESX host VMFS ID 483cf914-29b60dc5-dbfd-001cc497e630 looks a lot like vcbSanDbg.exe output's logical volume 483cf913-05b4f526-45b5-001cc497.

  • ESX host VMFS ID 479da7c1-4494cd90-d327-001cc497e630 looks a lot like vcbSanDbg.exe output's logical volume 479da7ac-55fe7dfe-378c-001cc497.
Furthermore, I found out that current versions of VCB do not rely on the VMFS ID to discover virtual machines on a LUN. In Andy Tucker's talk "VMware Consolidated Backup: today and tomorrow" at VMworld 2007, it is clearly stated (slide 19) that there is:
No “VMFS Driver for Windows” on proxy

And furthermore that the usage of VMFS signatures is on the "todo" list for identifying LUNs on the SAN network (slide 34).

Other ideas?
So where does one turn when all possible solutions seem to lead to a dead end? Right: the VMware community forums. The answer came in this thread by snapper.

What I learned today is that besides the WWPN on a fiber channel network, there is another unique identifier called the NAA (Network Address Authority) to identify devices on the FC fabric. You can obtain the NAA for the LUN's on an ESX host using the esxcfg-mpath command in verbose mode using:

esxcfg-mpath -lv | grep ^Disk | grep -v vmhba0 | awk '{print $3,$5,$2}' | cut -b15-

The output on our ESX host looks much like this:

6005076303ffc60b0000000000001049323130373930 (256000MB) vmhba1:4:1
6005076303ffc60b000000000000104a323130373930 (256000MB) vmhba1:4:2

The NAA can be seen in the vcbSanDbg.exe output shown above, and can be filtered as follows:

vcbSanDbg.exe | findstr "NAA:"


The output should look like this:

C:\Program Files\VCB>vcbSanDbg | findstr "NAA:"

[info] Found SCSI Device: NAA:600508b10010443953555534314200044c4f47494341
[info] Found SCSI Device: NAA:60060e801525180000012518000000374f50454e2d56
[info] Found SCSI Device: NAA:600508b4000901eb0001100003230000485356323130
[info] Found SCSI Device: NAA:600508b4000901eb0001100003260000485356323130
[info] Found SCSI Device: NAA:6005076303ffc60b0000000000001049323130373930
[info] Found SCSI Device: NAA:6005076303ffc60b000000000000104a323130373930
[info] Found SCSI Device: NAA:6005076303ffc403000000000000128d323130373930
[info] Found SCSI Device: NAA:6005076303ffc403000000000000128e323130373930
[info] Found SCSI Device: NAA:600508b40006e8890000b000010a0000485356323130
[info] Found SCSI Device: NAA:600508b40006e8890000b00003770000485356323130


Et voila, now I can start running the esxcfg-mpath command on all our ESX hosts and start matching these NAA's with those in the output of vcbSanDbg to discover what our Windows VCB proxy has access to.
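For a larger number of hosts you probably don't want to eyeball those lists. A rough matching sketch, assuming you dump the filtered vcbSanDbg output on the proxy into a text file (vcb_naa.txt is just a name I made up) and copy that file to the ESX service console:

# on the VCB proxy:  vcbSanDbg | findstr "NAA:" > vcb_naa.txt   (then copy the file to the ESX host)
# on the ESX host: extract this host's NAA identifiers and see which of them the proxy also sees
esxcfg-mpath -lv | grep ^Disk | grep -v vmhba0 | awk '{print $3,$5,$2}' | cut -b15- | awk '{print $1}' > esx_naa.txt
grep -f esx_naa.txt vcb_naa.txt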

Tuesday, August 12, 2008

VMWare D-Day: 12/08/2008

I reckon "12 August 2008" will be long remembered by all VMWare enthusiasts out there.

That is the day that a major bug caused ESX 3.5 Update 2 to no longer recognise any license, even if the license file at your license server was perfectly valid. There is no need to sketch the horror that follows when your ESX clusters no longer detect a valid license: VMotion fails, DRS fails, HA fails, powering on virtual machines is no longer possible... Ironically, today is also Microsoft's Patch Tuesday of August, which probably means that quite a few system administrators were caught with their pants down (and their VM's powered off during a scheduled maintenance window) when this bug struck.

The symptoms and errors that we have been experiencing are the following:
  • Unable to VMotion a virtual machine from an ESX 3.0.2 host to an ESX 3.5 host. The VMotion progresses until 10% and then aborts with error messages such as "operation timed out" or "internal system error".

  • HA agent getting completely confused (unable to install, reconfigure for HA does not work).

  • Unable to power on new machines:

    [2008-08-12 14:11:16.022 'Vmsvc' 121330608 info] Failed to do Power Op: Error: Internal error
    [2008-08-12 14:11:16.065 'vm:/vmfs/volumes/48858dc4-f4e218d1-d3a8-001cc497e630/HOSTNAME/HOSTNAME.vmx' 121330608 warning] Failed operation
    [2008-08-12 14:11:16.066 'ha-eventmgr' 121330608 info] Event 15 : Failed to power on HOSTNAME on esx.test.local in ha-datacenter: A general system error occurred

VMWare is promising a patch tomorrow, but several forum posts (here and here) are wondering how this patch will be distributed and -- given the deep integration of the licensing components within ESX -- whether this will require a reboot of the ESX host or not (which can be quite problematic if you cannot VMotion machines away). A possible workaround for this issue is to introduce a 3.0.2 host in the cluster as I have seen in our environment that VMotioning from 3.5 to 3.0.2 still works.

Edit (21:20): hopes are up that VMware should be able to release a patch that doesn't require the ESX host to reboot. See what Toni Verbeiren has to say about it on his blog.

Edit (9:00 AM 13 AUG): a patch has been released by VMware. Regarding whether hosts need to be rebooted or not... there is good news and there is bad news: "to apply the patches, no reboot of ESX/ESXi hosts is required. One can VMotion off running VMs, apply the patches and VMotion the VMs back. If VMotion capability is not available, VMs need to be powered off before the patches are applied and powered back on afterwards."

You can follow the developing crisis at the following sources:
Even our dear friends at Microsoft write about the problem, see the blogpost "It's rude to laugh at other people's misfortunes - even VMware's" here.

Friday, August 8, 2008

WM6 and self-signed certificates

When playing around with a new (unofficial) WM6.1 ROM for my Mio A701, I bumped into a well-known problem with installing self-signed certificates on (homebrew?) WM6 ROMs: installing a new CA certificate fails with the error message "The certificate was not successfully added; please restart your device and try again". Obviously, restarting the device did not fix the problem.

A few months ago, I already encountered the problem and I knew you could bypass it by importing the certificate directly into the mobile device's registry. However, the procedures that I read all involved:
  1. flashing Windows Mobile 5 (or a WM6 version that was patched to accept any certificate),
  2. importing the certificate in that temporary ROM,
  3. exporting the relevant registry data,
  4. reflashing back to the rom that has the certificate problem,
  5. importing the certificate through the registry file you obtained earlier in step 3.
As you can imagine, this is quite a lot of work, and since I am a lazy person by nature, I did not want to go back to WM5 after just having flashed my Mio to a brand-new and shiny WM6. Therefore, I decided to develop a shorter workaround that doesn't involve reflashing.

The tricky part is that you need to create the proper registry file to import. This file looks like:
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\Comm\Security\SystemCertificates\Root\Certificates\824AF72AB87E17AC777098A4164D7A90C90C0D69]
"Blob"=hex:19,00,00,00,01,00,00,00,10,00,00,00,4f,e5,c4,01,4e,7d,89,4a,da,42,\
3f,f7,24,0f,7f,a2,19,00,00,00,01,00,00,00,10,00,00,00,cb,bc,40,37,8a,45,2c,\
...
(please disregard the unintentional wrapping of the registry location; everything between the square brackets should be on one line).

The difficult part is converting your self-signed certificate to the proper registry format. Here's how I did that:
  • On a regular PC, use Internet Explorer to go to a website with the certificate that you want to install on your mobile device (typically this will be Outlook Web Access or something). Open the certificate and install it on your local PC (let the certificate import wizard automatically place the certificate in whatever store it finds necessary).

  • View the certificate (in Internet Explorer or by using the Certificates MMC) and go to the "Details" tab. There you will find the certificate's "Thumbprint". You will need to look up this number in a few moments, so be sure to remember the first few digits. In the case of the company I work for, the thumbprint is "824af72ab8somethingsomething".

  • Open your registry editor and go to the following location:

    HKEY_CURRENT_USER\Software\Microsoft\SystemCertificates\Root\Certificates\

    There should be a registry key that has the thumbprint of your certificate as its name:


    Rightclick that registry key and click "Export...". Choose a location for the exported registry data.

  • Next, open the registry export in Notepad. Replace the registry key location (between the square brackets) with HKEY_LOCAL_MACHINE\Comm\Security\SystemCertificates\Root\Certificates\ followed by the thumbprint. Next, replace the first 12 bytes of the "Blob" registry value with: hex:19,00,00,00,01,00,00,00,10,00,00,00.

  • Your result should look like this:
    Windows Registry Editor Version 5.00

    [HKEY_LOCAL_MACHINE\Comm\Security\SystemCertificates\Root\Certificates\824AF72AB87E17AC777098A4164D7A90C90C0D69]
    "Blob"=hex:19,00,00,00,01,00,00,00,10,00,00,00,4f,e5,c4,01,4e,7d,89,4a,da,42,\
    3f,f7,24,0f,7f,a2,19,00,00,00,01,00,00,00,10,00,00,00,cb,bc,40,37,8a,45,2c,\
    ...
    Compare this with the original registry export that I have shown above; the differences are in the registry key location and in the first bytes of the "Blob" value.

  • Save the registry file, copy it to your mobile device and import it there. Voila! Finished!
You can use the "Certificates" control panel to verify that your certificate is properly recognized!

Note: ActiveSync will not immediately recognize the new certificate; you must either kill the ActiveSync process or restart your device (but first wait at least a few minutes so that Windows Mobile can commit your registry changes!).

Obviously, this is completely not supported or endorsed by anybody on this planet. Perform these actions at your own risk and be sure you know what to do in case you brick your device!

Tuesday, July 29, 2008

Full backups of virtual machines and Windows VSS

Introduction
One of the new features that is appearing in backup products that take backups of an entire virtual machine, as opposed to using an agent inside the guest operating system, is the ability to cooperate with Windows VSS (the Volume Shadow Copy Service) inside the guest. For example, the recently released VMWare Consolidated Backup 1.5 now supports VSS quiescing for Windows 2003, Windows Vista and Windows 2008; vizioncore's vRanger Pro backup utility has been supporting VSS for Windows 2003 for some versions already.

Several opinions exist on whether this is in fact a useful feature or not; for example, not so long ago the developers of esXpress talked about not including VSS quiescing into their product at that time because it adds additional complexity and does not offer any significant benefits in their opinion (see here). This discussion is still alive as you can see for example here, and the big question is indeed: can you rely on live backups of database virtual machines?

The early days of VSS
The root of the discussion lies in the intended use of VSS: on a physical machine that is running a database application such as SQL Server, Exchange, or even Active Directory or a DHCP server for that matter, you cannot directly read the database files since they are exclusively locked by the database application. This used to be particularly troublesome because the only way to get a backup of the data inside such a database was to use some sort of export function that had to be programmed into the database application (think of the BACKUP TSQL command or a brick-level backup of an Exchange server).

Microsoft tackled this problem by introducing VSS, which presents a fully readable point-in-time snapshot of a filesystem to the (backup) application that initiates the snapshot. That way, a backup application can read the database file contents and put it away safely in case it is ever needed.

However, there are two problems when reading files from a filesystem that is "frozen" in time:
  • a file can be in the process of being written (i.e. only 400 bytes of a 512-byte block are filled with actual data).
  • data can still be sitting in a filesystem cache or memory buffer and not yet be written to the disk (or only to the filesystem journal).
On top of the filesystem issues, there are two problems when reading a database that is still in use but "frozen" purely at a filesystem level:
  • at the time of the snapshot, a transaction could still be in progress. This can be an issue when the transaction is not supposed to be committed to the database at the end: as you know, a database query can initiate thousands of changes and perform a ROLLBACK at the end to reset any changes made since the start of the transaction.

    A good (fictitious) example here is when you try to withdraw 1000 euros in cash from an ATM: if you change your mind right before pressing the "confirm transaction" button on the ATM screen, then you don't want your 1000 euros to be really gone because a database snapshot was taken at that very moment and your final "ROLLBACK" command is not included in it!

  • some data could still be in memory and not written to a logfile or a database file (so-called "dirty pages").
Crash consistency versus transactional consistency
If you don't take these four problems into account, then restoring a snapshot of such a filesystem would in fact be the same as bringing the server back up after you suddenly pulled the power plug. Such a snapshot is said to be in a crash-consistent state, i.e. the same state as after a sudden power loss.

Modern filesystems have built-in mechanisms (so-called "journalling") to tackle these problems and to ensure that when such a "frozen" filesystem is restored from a backup, the open files are put back in as consistent a state as possible. Obviously, any data that only existed in memory and was never written to a filesystem journal/disk is lost. Databases rely on transaction logging to recover from a crash-consistent state back to a consistent database; this is typically done by simply rolling back all unfinished transactions, effectively ignoring all transactions that were not committed or rolled back.

Windows VSS wants to go beyond a crash-consistent snapshot and solves both the filesystem and the database problem by not only freezing all I/O to the filesystem but also asking both the filesystem and all applications to flush their dirty data to disk. This allows the creation of both a filesystem-consistent and an application-consistent backup. VSS has built-in support for several Windows-native technologies such as NTFS filesystems, Active Directory databases, DNS databases, ... to flush their data to disk before the snapshot is presented to the backup application requesting the snapshot. Other programs, such as SQL/Oracle databases or Exchange mailservers, use "VSS Writer" plugins to get notified when a VSS snapshot is taken and when they have to flush their dirty database pages to disk to bring the database in a transactionally consistent state.
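As a side note: if you want to see which applications on a given Windows guest have registered such a VSS writer, the built-in vssadmin tool will list them (Windows 2003 and later; the output obviously depends on what is installed):

rem list all registered VSS writers and their current state
vssadmin list writers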

From Technet:

[...] If an application has no writer, the shadow copy will still occur and all of the data, in whatever form it is in at the time of the copy, will be included in the shadow copy. This means that there might be inconsistent data that is now contained in the shadow copy. This data inconsistency is caused by incomplete writes, data buffered in the application that is not written, or open files that are in the middle of a write operation. Even though the file system flushes all buffers prior to creating a shadow copy, the data on the disk can only be guaranteed to be crash-consistent if the application has completed all transactions and has written all of the data to the disk. (Data on disk is “crash-consistent” if it is the same as it would be after a system failure or power outage.). [...] All files that were open will still exist, but are not guaranteed to be free of incomplete I/O operations or data corruption.

Under this design, the responsibility for data consistency has been shifted from the requestor application to the production application. The advantage of this approach is that application developers — those most knowledgeable about their applications — can ensure, through development of their own writers, the maximum effectiveness of the shadow copy creation process.

Conclusions for the physical world: the above makes clear that there is a huge benefit in using VSS when working on physical machines: VSS is a requirement to be able to back up the entire database files and to ensure that the database is not in an inconsistent state when you restore the database and log files and attempt to mount them. The main advantage here is that a restored database does not have to go through a series of consistency checks that typically take up many, many hours.

Going to the virtual world
In the virtual world, there are several different types of backups that can be performed:
  • Performing the backup inside the guest OS.
  • Performing a backup of the harddisk files (VHD/VMDK) when using a virtualization product that is hosted on another operating system, such as Microsoft Virtual Server or VMWare Workstation/Server.
  • Performing a backup of the harddisk files (VHD/VMDK) when using a bare-metal hypervisor based product such as Microsoft Hyper-V or VMWare's ESX/ESXi Server.
Obviously, when you perform the backup inside the guest OS, you still encounter the same problems as when attempting to back up a physical host: open files and database files are locked and thus cannot be backed up directly, so you have to revert to using VSS for the reasons discussed above.

But what about the other two ways of performing a virtual machine backup, when attempting to back up the entire harddisk file? For starters, it is important to realize that "file locking" now occurs at two levels:
  1. The VHD/VMDK harddisk files themselves are opened and locked by the virtualization software (be it the hypervisor for bare-metal virtualization or the executable when using hosted virtualization);
  2. Files can be opened and locked inside in the guest operating system.
The first issue of the open VHD/VMDK harddisk files is solved depending on the virtualization product: if you are using host-based virtualization, you can obtain a readable VHD/VMDK file by using VSS on the host operating system and asking to present an application-consistent variant of the VHD/VMDK files. If you are using a bare-metal hypervisor, a typical mechanism is by taking a snapshot of a virtual machine (which, for example in VMWare ESX, shifts the file lock from the VMDK file to the snapshot delta file, thus releasing the VMDK file for reading).

Open files inside the guest OS
Ironically, the solution to the first problem of open VHD/VMDK host files introduces the second problem of open files inside the guest OS: once you have your snapshot of the VHD/VMDK files (be it through VSS for host-based virtualization or a VM snapshot for bare-metal hypervisors)... that snapshot is only in a crash-consistent state! After all, it is a point-in-time "freeze" of the entire harddisk, and restoring such an image file would be equivalent to restarting the server after a total power loss occurred.

VMWare attempted to tackle this problem by introducing a "filesystem sync driver" in their VMTools (which you are supposed to install in every virtual machine running on a VMWare product). This filesystem sync driver mimics VSS in the sense that it requests that the filesystem flush its buffers to disk, guaranteeing that the snapshot -- and thus the corresponding full virtual machine backup -- is in a filesystem-consistent state. Obviously, this does not solve the problem for databases, which tend to react quite violently to these kinds of non-VSS "freezes" of the filesystem. Prototypical horror stories can be read here (AD) and here (Exchange).

So what are the real solutions for this problem? I can think of two at this moment:
  1. After taking a snapshot, do not only back up the disks but also the memory. Then, when restoring the backup, do not "power on" the virtual machine but instead "resume" it. At first, the machine will probably be "shocked" to see that time has leapt forward and that many TCP/IP connections are suddenly being dropped, but the database server you are running should be able to handle this and properly commit any unsaved data from memory to disk.

  2. Trigger a VSS operation inside the guest OS to commit all changes to disk and ensure filesystem- and application-level consistency, and only then take the full virtual machine snapshot (see the sketch after this list).
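For reference, on ESX 3.x both the memory-inclusion and the quiescing knob are exposed by the service console's snapshot command. A hedged sketch -- the datastore path and VM name are made up, and do double-check the argument order against vmware-cmd's own usage output (as far as I recall it is name, description, quiesce flag, include-memory flag):

# option 1: include the guest's memory in the snapshot (include-memory flag = 1)
vmware-cmd /vmfs/volumes/datastore1/myvm/myvm.vmx createsnapshot prebackup "pre-backup snapshot" 0 1
# option 2: let the tools quiesce the guest filesystem before snapshotting (quiesce flag = 1)
vmware-cmd /vmfs/volumes/datastore1/myvm/myvm.vmx createsnapshot prebackup "pre-backup snapshot" 1 0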
The VSS interaction with the guest operating system was first introduced by vizionCore in their vRanger Pro 3.2.0 -- which required the installation of an additional service inside the guest VM plus .NET 2.0, and was only officially supported for Windows 2003 SP1+ in 32-bit. With the release of VMWare Consolidated Backup 1.5, VMWare announced that the default quiescing of disks on ESX 3.5 Update 2 would now be done using the new VSS driver -- supported on Windows 2003/2008/Vista in both 32 & 64-bit variants. Hurray! Problem solved, right?

So VSS seems nice, but is it necessary?
Obviously, your gut feeling will tell you that it is "nicer" and "more gentle" to the guest virtual machine to use VSS when taking a snapshot and a backup. The arguments on the difference between crash consistency, filesystem consistency and application-level consistency (which translates to transactional consistency for databases) give solid grounds to this gut feeling.

Personally, I cannot find an argument that VSS is really necessary to create a full virtual machine backup. Filesystems and databases have been hardened in the physical world to recover from a crash-consistent state, which is exactly the state you obtain when you snapshot a running virtual machine, back it up and restore it. Hands-on experience with this robustness can be read on several informal channels such as forum posts here.

However, if you want to be sure that your database is in a consistent state (for a faster recovery) and certainty that those few seconds of data that were not yet committed from memory to disk are in fact included in your snapshot, then VSS is what you need. The next question to answer is: what is the risk of VSS messing up and is this probability larger than not being able to restore a non-VSS-based snapshot?

Conclusion
Performing live backups of virtual machines seems like an interesting and simple feature of virtualisation at first. However, at a second glance, there are some important decisions to be made regarding the use of VSS/snapshotting technology that can impact your restore strategy and success. Even without any quiescing mechanism, the operating system should be able to handle the crash-consistent backups that live machine backups produce, and these should therefore be sufficiently reliable. With the ready availability of VSS in the new VMWare Tools that come with ESX 3.5 Update 2, much more than crash-consistent backups can be guaranteed without the need to install additional agents. The increased reliability and faster restore time (no filesystem/database consistency checks) that come with VSS-quiesced snapshots make full virtual machine backups a fully mature solution without the need to worry about possibly inconsistent backups.

Side remarks
Some additional remarks regarding full virtual machine backup:
  • Full VM backups can be an addition to guest-based file level backups, but they can never be a complete replacement:

    • you might take a full VM-based snapshot of your Exchange or SQL database every day, but a file-based/brick-level backup (which is far more convenient to use for your typical single-file/single-mailbox restore operations) might be taken several times a day, depending on the SLA that your IT department has with the rest of the company.

    • a full VM backup is a good place to start a full server recovery. It is a bad place to start a single-file or a single-mailbox restore.

    • full VM backups using VSS do not allow the backup of SQL transaction logs (see "what is not supported" in the SQL VSS Writer overview), nor do they commit transaction logs to the database in order to clear up the transaction logs (an absolute necessity for Exchange databases or for several types of SQL databases).

  • Microsoft does not support any form of snapshotting technology on domain controllers. For more information, see MSKB 888794 on "Considerations when hosting Active Directory domain controller in virtual hosting environments".
Edit (12 Aug 2008): Veeam has released a very interesting whitepaper that discusses the necessity for VSS awareness not only during the backup process, but also during the restore process. They give the example of a domain controller that runs into USN rollbacks when it is backed up using VSS but not restored using VSS-aware software. Another nice example is Exchange 2003, which requires VSS-aware restore software in order to be supported by Microsoft.


Postscriptum: I started writing this article a few days before VCB 1.5 was released, and the original point I was trying to make at that time was that there were too many disadvantages to the available VSS implementations (yet another service to install, .NET 2.0, very limited OS support) to really profit from the benefits that VSS could offer. Of course, in the meantime, VMWare has taken away most of those objections by including VSS support in their VMTools for a wide range of server operating systems. This forced me to reconsider my view on whether VSS would be a good idea or not.

Thursday, June 19, 2008

SQL Server 2005 Express Edition on Windows 2008 x64

While experimenting with the Microsoft App-V 4.5 Release Candidate (more on that soon), I decided to go for a full-blown installation on Windows 2008 x64. Since this is only on my home network, I don't run a dedicated SQL server so I went for the natural choice of installing SQL Server 2005 Express Edition SP2 on my freshly installed Windows 2008 x64 App-V server.

This turned out to be less trivial than I thought. The short answer is: if you want to have a painless install of SQL Server 2005 Express Edition, take the download that includes the “Advanced Services” and simply don’t install them. The “smaller” download package does not include some necessary files for a successful x64 installation.

If you want to go the hard way and patch the setup for easier automated deployment (or just to be ‘1337 and be able to say that you fixed Microsoft’s SQL Server installer for 64-bit systems…), then follow these steps:

  • First of all, you should know that SP2 is the first Vista/Windows 2008 certified edition (think UAC, think session zero hardening, think enhanced security). Secondly, SQL Server 2005 Express Edition SP2 is supported to run under WOW64. That is very comforting to know, and I hadn't expected a true 64-bit edition for free. So why does it complain about installing a 32-bit version on a 64-bit machine then?



    "The installation package has a missing file, or you are running a 32-bit only Setup program on a 64-bit computer"

    Of course, what you don't see is that SQL setup first installs the SQL Native Client in the background (as a prerequisite), and the error message conveniently forgets to mention that this is in fact the installation that is failing. The error message was indeed accurate, but the problem was not that I was trying to run a 32-bit installer on a 64-bit machine; it was that the 64-bit installer for the SQL Native Client is not included in the package! What’s even worse, some other essential x64 packages are also not included in the smallest SQL Express 2005 SP2 download.

  • So you have to include the missing files manually:

    1. Download the “SQL Server 2005 Express Edition SP2 with Advanced Services” package.

    2. Run both the SQL Express installers with the /X switch to extract the setup files (to different directories):

      sqlexpr.exe /x
      sqlexpr_adv.exe /x

    3. Next, locate the 64-bit SQL Native Client (sqlncli_x64.msi) and 64-bit SQL VSS Writer (SqlWriter_x64.msi) from the Advanced Services setup and copy them to the "Setup" directory of the regular SQL Express installation.
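      In concrete terms, the copy step could look like this (a sketch: the extraction directories are made up, and the exact subdirectory where the two MSI's live inside the Advanced Services extraction may differ, so locate them first):

      rem assuming sqlexpr.exe /x was extracted to c:\sqlexpr and sqlexpr_adv.exe /x to c:\sqlexpr_adv
      rem adjust the source paths to wherever the two MSI's actually are in the Advanced Services extraction
      copy c:\sqlexpr_adv\Setup\sqlncli_x64.msi c:\sqlexpr\Setup\
      copy c:\sqlexpr_adv\Setup\SqlWriter_x64.msi c:\sqlexpr\Setup\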
Et voila! The installer works now. One day, we will live in a perfect world of unambiguous error messages...

Now off to do some more SoftGri... ehr.. I mean Microsoft Application Vir... ehr... I mean App-V testing!

Sunday, May 25, 2008

Installing LSI Logic RAID monitoring tools under the ESX service console

As I discussed in a recent post, I used a Dell Perc 5i SAS controller in my ESX whitebox server. One of the nice features of this controller is that it is a rebranded LSI Logic controller (with a different board layout!), supported by LSI Logic firmwares and the excellent monitoring tools that LSI offers.

Of course, it is important to keep track of your RAID array status, so I decided to install the MegaCLI monitoring software under the ESX Server 3.5 Service Console. Here's how I did it and configured the monitoring on my system:
  • The MegaCLI software can be downloaded from the LSI Logic website. I used version 1.01.39 for Linux, which comes in a RPM file.

  • After uploading the RPM file to the service console, it was a matter of installing it using the "rpm" command:

    rpm -i -v MegaCli-1.01.39-0.i386.rpm

    This installs the "MegaCli" and "MegaCli64" commands in the /opt/MegaRAID/MegaCli/ directory of the service console.
That's it, MegaCLI is ready to be used now. Some useful commands are the following:
  • /opt/MegaRAID/MegaCli/MegaCli -AdpAllInfo -aALL
    This lists the adapter information for all LSI Logic adapters found in your system.

  • /opt/MegaRAID/MegaCli/MegaCli -LDInfo -LALL -aALL
    This lists the logical drives for all LSI Logic adapters found in your system. The "State" should be set to "optimal" in order to have a fully operational array.

  • /opt/MegaRAID/MegaCli/MegaCli -PDList -aALL
    This lists all the physical drives for the adapters in your system; the "Firmware state" indicates whether the drive is online or not.
The next step is to automate the analysis of the drive status and to alert when things go bad. To do this, I added an hourly cron job that lists the physical drives and then analyzes the output of the MegaCLI command.
  • I created a file called "analysis.awk" in the /opt/MegaRAID/MegaCli directory with the following contents:

    # This is a little AWK program that interprets MegaCLI output

    /Device Id/ { counter += 1; device[counter] = $3 }
    /Firmware state/ { state_drive[counter] = $3 }
    /Inquiry/ { name_drive[counter] = $3 " " $4 " " $5 " " $6 }
    END {
    for (i=1; i<=counter; i+=1) printf ( "Device %02d (%s) status is: %s <br/>\n", device[i], name_drive[i], state_drive[i]); }

    This awk program processes the output of MegaCli, as you can test by running the following command:

    ./MegaCli -PDList -aALL | awk -f analysis.awk

    from within the /opt/MegaRAID/MegaCli directory.

  • Then I created the cron job by placing a file called raidstatus in /etc/cron.hourly, with the following contents:

    #!/bin/sh

    /opt/MegaRAID/MegaCli/MegaCli -PdList -aALL| awk -f /opt/MegaRAID/MegaCli/analysis.awk >/tmp/megarc.raidstatus

    if grep -qv ": Online" /tmp/megarc.raidstatus
    then
    /usr/local/bin/smtp_send.pl -t tim@pretnet.local -s "Warning: RAID status no longer optimal" -f esx@pretnet.local -m "`cat /tmp/megarc.raidstatus`" -r exchange.pretnet.local
    fi

    rm -f /tmp/megarc.raidstatus
    exit 0

    Don't forget to run a chmod a+x /etc/cron.hourly/raidstatus in order to make the file executable by all users.
In order to send an e-mail when things go wrong, I used the SMTP_Send Perl script smtp_send.pl that was discussed by Duncan Epping on his blog.
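Before trusting the hourly job, it may be worth triggering the script once by hand and testing the mail path separately (the addresses and mail relay below simply mirror the parameters used in the cron script above and are specific to my home lab; substitute your own):

# run the cron job manually and watch for errors
sh /etc/cron.hourly/raidstatus
# send a test mail through the same relay that the script uses
/usr/local/bin/smtp_send.pl -t tim@pretnet.local -s "MegaCLI cron test" -f esx@pretnet.local -m "test message" -r exchange.pretnet.local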

Thursday, May 22, 2008

Renaming a VirtualCenter 2.5 server

After running my VirtualCenter server on a standalone host for quite some time, I decided to join it into the domain that I am running on my ESX box (in order to let it participate in the automated WSUS patching mechanism). This also seemed like a perfect opportunity to rename the server's hostname from W2K3-VC.pretnet.local to virtualcenter.pretnet.local. However, after the hostname change, the VMWare VirtualCenter service would no longer start with an Event ID 1000 in the eventlog.

Somehow, this didn't come as a surprise ;). This has been discussed before on the VMWare forums (here and here), but I post it here because I did not immediately find a step-by-step walkthrough.

The problem was in fact twofold, the solution rather simple:
  • Renaming SQL servers is a bad idea in general (so it appears). For my small, non-production environment, I use the SQL Server 2005 Express edition that comes with the VirtualCenter installation. If you rename a SQL server, you need to internally update the system tables using a set of stored procedures in order to make everything consistent again. This is done by installing the "SQL Server Management Studio Express" and then executing the following TSQL statements:

    sp_dropserver 'W2K3-VC\SQLEXP_VIM'
    GO
    sp_addserver 'VIRTUALCENTER\SQLEXP_VIM', local
    GO
    sp_helpserver
    SELECT @@SERVERNAME, SERVERPROPERTY('ServerName')


    The first statement removes the old server instance (replace W2K3-VC with your old server name), the second statement adds the new server instance (replace VIRTUALCENTER with your new server name). The sp_helpserver and SELECT statements query the internal system tables and variables for the SQL server instance names that are actually recognized. You need to perform a reboot before the last two statements report the proper instance name.

  • Secondly, the System ODBC connection that is used by VMWare required an update to point to the new SQL Server instance. This was of course done using the familiar "Data Sources (ODBC)" management console.
Afterwards, the VMWare Virtual Center Server service started just fine again.

Friday, May 2, 2008

Enabling Subject Alternate Name certificates

When requesting certificates from your freshly installed Certification Authority, it can come in handy to specify multiple DNS names that this certificate should be valid for. This principle is known as specifying a list of "subject alternate names" that the server is also reachable under.

Unfortunately, this mechanism doesn't work out of the box with Windows CA's. On your CA, you first need to enable a setting that allows the usage of SAN attributes. Open a command box and type (on one line):

certutil -setreg policy\EditFlags +EDITF_ATTRIBUTESUBJECTALTNAME2

net stop CertSvc & net start CertSvc

Afterwards, pass the SAN:dns=&dns= attribute (with your DNS names filled in) when requesting certificates to get multiple DNS names into the certificate.
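For example, when submitting a request file with certreq, the attribute can be passed on the command line (a sketch: the host names are made up and the request file must already exist):

rem submit an existing request and ask the CA to add two DNS names as subject alternate names
certreq -submit -attrib "SAN:dns=appserver.contoso.com&dns=appserver" request.req appserver.cer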

Wednesday, April 30, 2008

Windows 2008 Certificate Authority and Windows 2000/XP/2003 clients

I was experimenting with Windows 2008 Certificate Services the other day in order to create certificates for WSUS 3.0 and for doing SSL tunneling of RDP towards the internet. I noticed that several of my clients were unable to automatically install the WSUS client, with vague errors in the event log (Win32HResult=0x00000000):


I quickly discovered that the problem was related to the certificate that I had issued for the WSUS IIS server. It turned out that Windows 2008 WSUS clients could connect without any problem to the WSUS webpage, but Windows 2003 and Windows XP clients could not. What made it even more puzzling is that on a Windows XP system, connecting to the IIS homepage didn't succeed using Internet Explorer, but worked perfectly fine using Firefox.

Opening the certificate of my WSUS server gave the following result:


with a "This certificate has an nonvalid digital signature" error in the "Certification Path" details for both the issued certificate and my CA certificate.

Root cause:
The answer is bleeding obvious: Windows 2008 has several new additions to the cryptography API, called Cryptography Next Generation (CNG), that are used in the V3 certificate templates for CA's and webservers in Windows 2008. Amongst those new features is support for new certificate signing algorithms (in my case SHA512, a SHA-2 variant) which older clients do not recognize. Windows XP SP3 adds SHA-2 support to XP; I suppose a future hotfix will add compatibility for Windows 2003.

Solution:
In the absence of a worldwide XP SP3 deployment and a working hotfix for W2K3, the only option here is to ensure that the Windows 2008 CA certificate is created with a non-CNG cryptographic provider. If you already created a CA certificate using the new CNG features, the only option is to reinstall your CA and regenerate your CA certificate -- remember how mum always told you to think things over twice before just plainly installing a W2K8 CA... I bet you regret that now (just like I did :D)? Reinstalling your CA can be messy and make your PKI infrastructure go berserk, so this time do think twice before going down that road!

Step by Step plan of attack (POA)
So you have decided you want to proceed? First verify that you are indeed using a CNG CSP. To do this, open your registry editor and navigate to the following key:

[HKLM\SYSTEM\CurrentControlSet\Services\CertSvc\Configuration\{CAname}\CSP]


If you find a CNGHashAlgorithm REG_SZ value, and the HashAlgorithm DWORD is set to 0xFFFFFFFF, then you are using a CNG CSP. If the HashAlgorithm is set to a value such as 0x00008003, then you are already using a "classic" CSP. You can also use the following command on the CA to retrieve the CSP:

certutil -getreg ca\csp\HashAlgorithm
certutil -getreg ca\csp\Provider


which will return the HashAlgorithm and the name of the CSP. For more information, I refer to the Microsoft whitepaper "Active Directory Certificate Server Enhancements in Windows Server Code Name Longhorn" -- you crypto-boys out there will love it.
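If you just want to check which signature algorithm an already issued certificate (or your CA certificate) carries, certutil can dump it for you; the file name below is just an example:

rem dump the certificate and filter for the signature/hash algorithm lines
certutil -dump mycert.cer | findstr /i "algorithm"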

Keep in mind that when you are adding the Certificate Services role to your Windows 2008 server, you need to specify the proper cryptographic service provider. The image below displays some of the options; what is important to remember here is that all the service providers that contain a hash sign ("#") are CNG providers and thus incompatible with Windows XP SP2/Windows 2003 and earlier clients.


The default cryptographic service provider for Windows 2003 is the "Microsoft Strong Cryptographic Provider", so that is what you want to use. Notice how selecting this provider reduces the number of certificate signing options... SHA-2 algorithms are no longer included! Proceed as usual to end up with a CA that produces certificates that can be handled by legacy clients.

Sunday, March 9, 2008

ESX 3.5 on a whitebox

It has been very quiet from my end for the past weeks because I was very busy at a client & at the same time spending all my free time working on my ESX-on-whitebox hardware project. After being inspired by some colleagues, I decided to order the following hardware:
  • Asus P5BP-E/4L motherboard
    This motherboard supports an Intel S775 processor, has VGA and audio onboard and most importantly, the LAN controllers on this motherboard are ESX certified (Broadcom 57xx chipset).

  • Intel Q6600 Quad Core processor (2.4 GHz) and 8 GB ECC RAM (4x 2GB)
    Just to be sure I have enough CPU power and memory resource pools :)

  • Dell Perc 5i Integrated SAS Controller
    My colleagues advised me that storage was the biggest bottleneck in their ESX whiteboxes (based around the very nice Asus P5M2/SAS board), so I decided to go for a dedicated hardware controller. I picked up a Dell Perc 5i controller -- more or less a rebranded LSI Logic 8408 SAS controller -- on eBay, with 256MB of RAM and a battery backup unit, for about 175 EUR.

    The main advantage of SAS controllers is that they also support the (cheaper) SATA consumer drives. A quick test confirmed this; I had absolutely no problems at all with this controller & even flashed the latest LSI Logic firmware to it :).

    Maybe of interest for some: the later Dell firmwares and also the later LSI logic firmwares for this controller provide support for Write Back without a BBU present.

  • SATA to SAS cables
    The Dell Perc 5i has SFF-8484 SAS connectors on board, so I purchased two Adaptec SFF-8484 to 4xSATA cables from a nearby store to attach all the drives.

  • 8 Seagate SATA harddisks (4x 1TB and 4x 200GB)
    Space... loads of space.
The hardest thing was getting all these disks in my Silentmaxx ST11 casing; it required some case modding and loads of patience to get everything well fitted. The 500W PSU that is necessary to provide enough juice, was recycled from an Antec Sonata case. I also added a small 3Com 3C905 100Mbps card for my ISP modem connection.

The installation of ESX 3.5 was a piece of cake, and I can confirm that the above hardware works like a charm. For those interested, I also noticed that ESX 3.5 supports the ICH7 SATA controllers (found on many consumer motherboards as well). I think -- but this has to be confirmed by someone else -- that you need to configure your ICH7 disks in a RAID before the ESX kernel will accept them as a storage pool.

Sunday, February 3, 2008

MAV 4.5: How to perform a Dynamic Suite Composition

The blog of Justin Zarb gives a step-by-step guide of how Dynamic Suite Composition (DSC) in Microsoft Application Virtualization 4.5 works. He describes how to include a Snag-It bubble into an existing Office 2007 bubble. There are not many technical details about how DSC works, but some interesting facts are mentioned there:
  • Apparently, it is possible to compose multiple bubbles, but only one level deep. If you attempt to include an OSD file that itself has another DSC dependency, that third bubble is not included.

  • Sequencing tip: make sure your sequencer workstation has all the software installed that you want your second bubble to hook onto. For example: if you are sequencing an application that integrates with Office 2007, do a fat installation of Office 2007 first, and only then start the monitoring and sequencing of the add-ins.

    Personal note: also for applications that depend on Java or Oracle clients, you obviously first need to prepare your sequencer workstation by installing those core components.

  • The user changes that are made in the dynamically composed bubbles are all redirected to the primary bubble's UsrVo_sftfs.pkg files.
I have been planning for a few weeks to delve deeper into DSC in the MAV 4.5 beta and to check in more detail what Justin describes, but a project at a customer is currently steering my spare time towards non-MAV related things. Be sure to check back here regularly for more information on DSC.

Saturday, January 5, 2008

Microsoft SoftGrid 4.1 SP1 and 4.2 Hotfixes

The Microsoft SoftGrid blog contains the announcement for the first Hotfix Rollup Packages for SoftGrid 4.1 SP1 and SoftGrid 4.2. The main new feature for these two packages is support for the MSI Utility that was released at Christmas.

Further improvements include:
  • Better ActiveUpgrade and better downgrade of a package version.
  • Improvements to nonpaged pool usage when sequencing large applications.
  • Improvements when you sequence applications that use both the Microsoft .NET Framework 1.1 and the .NET Framework 2.0.
  • Improvements to command-line parameter handling of virtualized child processes.
The new versions are downloadable from the Microsoft Support site:
Enjoy!