Tim's Technical Thoughts

Friday, August 8, 2008

WM6 and self-signed certificates

When playing around with a new (unofficial) WM6.1 ROM for my Mio A701, I bumped into a well-known problem with installing self-signed certificates on (homebrew?) WM6 ROMs: installing a new CA certificate fails with the error message "The certificate was not successfully added; please restart your device and try again". Obviously, restarting the device did not fix the problem.

A few months ago, I already encountered the problem and I knew you could bypass it by importing the certificate directly into the mobile device's registry. However, the procedures that I read all involved:

1. flashing Windows Mobile 5 (or a WM6 version that was patched to accept any certificate),
2. importing the certificate in that temporary ROM,
3. exporting the relevant registry data,
4. reflashing back to the ROM that has the certificate problem,
5. importing the certificate through the registry file you obtained earlier in step 3.

As you can imagine, this is quite some work, and since I am a lazy person by nature, I did not want to go back to WM5 after just having flashed my Mio to a brand-new and shiny WM6. Therefore, I decided to develop a shorter workaround that doesn't involve reflashing.

The tricky part is that you need to create the proper registry file to import. This file looks like:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\Comm\Security\SystemCertificates\Root\Certificates\824AF72AB87E17AC777098A4164D7A90C90C0D69]
"Blob"=hex:19,00,00,00,01,00,00,00,10,00,00,00,4f,e5,c4,01,4e,7d,89,4a,da,42,\
3f,f7,24,0f,7f,a2,19,00,00,00,01,00,00,00,10,00,00,00,cb,bc,40,37,8a,45,2c,\
...

(please disregard the unintentional wrapping of the registry location; everything between the square brackets should be on one line).

The difficult part is converting your self-signed certificate to the proper registry format. Here's how I did that:

On a regular PC, use Internet Explorer to go to a website with the certificate that you want to install on your mobile device (typically this will be Outlook Web Access or something). Open the certificate and install it on your local PC (let the certificate import wizard automatically place the certificate in whatever store it finds necessary).
View the certificate (in Internet Explorer or by using the Certificate MMC) and go to the "Details" tab. There you will find the "Thumbprint" of the certificate. You will need to look up this number in a few moments, so be sure to remember the first few digits. In the case of the company I work for, the thumbprint is "824af72ab8somethingsomething".
Open your registry editor and go to the following location:

HKEY_CURRENT_USER\Software\Microsoft\SystemCertificates\Root\Certificates\

There should be a registry key that has the thumbprint of your certificate as its name:

Rightclick that registry key and click "Export...". Choose a location for the exported registry data.
Next, open the registry export in Notepad. Replace the registry key location (between the square brackets) with HKEY_LOCAL_MACHINE\Comm\Security\SystemCertificates\Root\Certificates\ followed by the thumbprint. Next, replace the first 12 bytes in the "Blob" registry value with: hex:19,00,00,00,01,00,00,00,10,00,00,00.
Your result should look like this:
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\Comm\Security\SystemCertificates\Root\Certificates\824AF72AB87E17AC777098A4164D7A90C90C0D69]
"Blob"=hex:19,00,00,00,01,00,00,00,10,00,00,00,4f,e5,c4,01,4e,7d,89,4a,da,42,\
3f,f7,24,0f,7f,a2,19,00,00,00,01,00,00,00,10,00,00,00,cb,bc,40,37,8a,45,2c,\
...
Compare this with the original registry export: the only differences are the registry key location and the first 12 bytes of the "Blob" value.
Save the registry file, copy it to your mobile device and import it there. Voila! Finished!
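The manual edit in Notepad can also be scripted. The following is a minimal Python sketch, not a tested tool: it assumes the export contains a single certificate key with a "Blob" value laid out as in the example above, and the helper names (`convert_reg_export`, `wrap_hex`) are made up for this illustration. The target key and the 12-byte prefix are the ones given in the steps above.

```python
import re

# Target key on the device and the 12-byte prefix, both taken from the steps above.
WM_ROOT = r"HKEY_LOCAL_MACHINE\Comm\Security\SystemCertificates\Root\Certificates"
WM_PREFIX = "19,00,00,00,01,00,00,00,10,00,00,00".split(",")


def wrap_hex(data):
    """Lay the bytes out roughly like regedit does (22 bytes, then 25 per line)."""
    chunks = [data[:22]] + [data[i:i + 25] for i in range(22, len(data), 25)]
    return '"Blob"=hex:' + ",\\\n  ".join(",".join(chunk) for chunk in chunks)


def convert_reg_export(text):
    """Return the desktop export rewritten for the Windows Mobile root store."""
    # Undo regedit's line wrapping so the Blob value sits on a single line.
    text = re.sub(r"\\\r?\n[ \t]*", "", text)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Keep only the last path component: the certificate thumbprint.
            thumbprint = stripped[1:-1].rsplit("\\", 1)[-1]
            line = "[" + WM_ROOT + "\\" + thumbprint + "]"
        elif stripped.lower().startswith('"blob"=hex:'):
            data = stripped.split(":", 1)[1].split(",")
            if len(data) < len(WM_PREFIX):
                raise ValueError("Blob value is shorter than 12 bytes")
            # Swap the first 12 bytes for the prefix, keep everything else.
            line = wrap_hex(WM_PREFIX + data[len(WM_PREFIX):])
        out.append(line)
    return "\n".join(out) + "\n"
```

To use it, read the exported file into a string, pass it through `convert_reg_export` and write the result to a new file. Regedit normally saves "Version 5.00" exports as UTF-16, so opening the file with `encoding="utf-16"` is a reasonable starting point; which encoding the import tool on the device expects is something to verify yourself.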

You can use the "Certificates" control panel to verify that your certificate is properly recognized!

Note: ActiveSync will not immediately recognize the new certificate, so you must either kill and restart the ActiveSync process on your device or restart the device itself (but first wait at least a few minutes so that Windows Mobile can commit your registry changes to memory!).

Obviously, this is completely not supported or endorsed by anybody on this planet. Perform these actions at your own risk and be sure you know what to do in case you brick your device!

Tuesday, July 29, 2008

Full backups of virtual machines and Windows VSS

Introduction
One of the new features that is appearing in backup products that take backups of an entire virtual machine, as opposed to using an agent inside the guest operating system, is the ability to cooperate with Windows VSS (Volume Shadow Copy Service) inside the guest. For example, the recently released version 1.5 of VMWare's Consolidated Backup now supports VSS quiescing for Windows 2003, Windows Vista and Windows 2008; vizioncore's vRanger Pro backup utility has supported VSS for Windows 2003 for several versions already.

Several opinions exist on whether this is in fact a useful feature or not; for example, not so long ago the developers of esXpress talked about not including VSS quiescing into their product at that time because it adds additional complexity and does not offer any significant benefits in their opinion (see here). This discussion is still alive as you can see for example here, and the big question is indeed: can you rely on live backups of database virtual machines?

The early days of VSS
The root of the discussion is at the intended use of VSS: on a physical machine that is running a database application such as SQL Server, Exchange or even Active Directory or a DHCP server for that matter, you cannot directly read the database files since they are exclusively locked by the database application. This used to be particularly troublesome because the only way to get a backup of the data inside such a database is to use some sort of export function that had to be programmed into the database application (think of the BACKUP TSQL command or a brick-level backup of an Exchange server).

Microsoft tackled this problem by introducing VSS, which presents a fully readable point-in-time snapshot of a filesystem to the (backup) application that initiates the snapshot. That way, a backup application can read the database file contents and put it away safely in case it is ever needed.

However, there are two problems when reading files from a filesystem that is "frozen" in time:

a file can be in the process of being written (e.g. only 400 bytes of a 512-byte block are filled with actual data).
data can still be in a filesystem cache or buffer in memory and not yet written to the disk (in the filesystem journal).

On top of the filesystem issues, there are two problems when reading a database that is still in use but "frozen" purely at a filesystem level:

at the time of the snapshot, a transaction could still be in progress. This can be an issue when the transaction is not supposed to be committed to the database at the end: as you know, a database query can initiate thousands of changes and perform a ROLLBACK at the end to reset any changes made since the start of the transaction.

A good (fictitious) example here is when you try to withdraw 1000 euros in cash from an ATM: if you change your mind right before clicking the "confirm transaction" button on the ATM screen, then you don't want your 1000 euros to be really gone if at the same time a database snapshot is taken and your final "ROLLBACK" command is not included in the database!
some data could still be in memory and not written to a logfile or a database file (so-called "dirty pages").

Crash consistency versus transactional consistency
If you don't take these four problems into account, then restoring a snapshot of such a filesystem would be in fact the same as bringing back up the server after you suddenly pulled the power plug. Such a snapshot is said to be in a crash-consistent state, i.e. the same state as a sudden power-loss.

Modern filesystems have built-in mechanisms (so-called "journalling") to tackle these problems and to ensure that when such a "frozen" filesystem is restored from a backup, the open files are put back in as consistent a state as possible. Obviously, any data that only existed in memory and was never written to a filesystem journal/disk is lost. Databases rely on transaction logging to recover from a crash-consistent state back to a consistent database; this is typically done by simply rolling back all unfinished transactions, effectively ignoring every transaction that had not been committed when the snapshot was taken.
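To make the recovery idea concrete, here is a toy Python sketch of log-based crash recovery. It is a deliberate simplification and not how SQL Server or Exchange actually implement it: the log format and the `recover` function are invented for this illustration. Only writes that belong to a committed transaction survive the replay, which is exactly what protects the ATM customer from the example above.

```python
def recover(log):
    """Rebuild the database state from a transaction log after a crash.

    Each log entry is a tuple: ("begin", txn), ("write", txn, key, value),
    ("commit", txn) or ("rollback", txn).
    """
    # First pass: find out which transactions reached their COMMIT.
    committed = {entry[1] for entry in log if entry[0] == "commit"}

    # Second pass: replay the writes of committed transactions only, in log order.
    state = {}
    for entry in log:
        if entry[0] == "write" and entry[1] in committed:
            _, _, key, value = entry
            state[key] = value
    return state


log = [
    ("begin", 1), ("write", 1, "balance", 5000), ("commit", 1),
    # Transaction 2 withdraws 1000 euros, but the snapshot is taken
    # before its COMMIT or ROLLBACK reaches the log.
    ("begin", 2), ("write", 2, "balance", 4000),
]
print(recover(log))  # {'balance': 5000}
```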

Windows VSS wants to go beyond a crash-consistent snapshot and solves both the filesystem and database problem by not only freezing all I/O to the filesystem but also asking both the filesystem and all applications to flush their dirty data to disk. This allows the creation of a backup that is both filesystem-consistent and application-consistent. VSS has built-in support for several Windows-native technologies such as NTFS filesystems, Active Directory databases, DNS databases, ... to flush their data to disk before the snapshot is presented to the backup application requesting the snapshot. Other programs, such as SQL/Oracle databases or Exchange mailservers, use "VSS Writer" plugins to get notified when a VSS snapshot is taken and when they have to flush their dirty database pages to disk to bring the database into a transactionally consistent state.

From Technet:

[...] If an application has no writer, the shadow copy will still occur and all of the data, in whatever form it is in at the time of the copy, will be included in the shadow copy. This means that there might be inconsistent data that is now contained in the shadow copy. This data inconsistency is caused by incomplete writes, data buffered in the application that is not written, or open files that are in the middle of a write operation. Even though the file system flushes all buffers prior to creating a shadow copy, the data on the disk can only be guaranteed to be crash-consistent if the application has completed all transactions and has written all of the data to the disk. (Data on disk is “crash-consistent” if it is the same as it would be after a system failure or power outage.). [...] All files that were open will still exist, but are not guaranteed to be free of incomplete I/O operations or data corruption.

Under this design, the responsibility for data consistency has been shifted from the requestor application to the production application. The advantage of this approach is that application developers — those most knowledgeable about their applications — can ensure, through development of their own writers, the maximum effectiveness of the shadow copy creation process.

Conclusions for the physical world: the above makes clear that there is a huge benefit in using VSS when working on physical machines: VSS is a requirement to be able to back up the entire database files and to ensure that the database is not in an inconsistent state when you restore the database and log files and attempt to mount them. The main advantage here is that a restored database does not have to go through a series of consistency checks that typically take up many, many hours.

Going to the virtual world
In the virtual world, there are several different types of backups that can be performed:

Performing the backup inside the guest OS.
Performing a backup of the harddisk files (VHD/VMDK) when using a virtualization product that is hosted on another operating system, such as Microsoft Virtual Server or VMWare Workstation/Server.
Performing a backup of the harddisk files (VHD/VMDK) when using a bare-metal hypervisor based product such as Microsoft Hyper-V or VMWare's ESX/ESXi Server.

Obviously, when you perform the backup inside the guest OS, you still encounter the same problems as when attempting to back up a physical host: open files and database files are locked and thus cannot be backed up directly, so you have to revert to using VSS for the reasons discussed above.

But what about the other two ways of performing a virtual machine backup, when attempting to back up the entire harddisk file? For starters, it is important to realize that "file locking" now occurs at two levels:

The VHD/VMDK harddisk files themselves are opened and locked by the virtualization software (be it the hypervisor for bare-metal virtualization or the executable when using hosted virtualization);
Files can be opened and locked inside the guest operating system.

The first issue of the open VHD/VMDK harddisk files is solved depending on the virtualization product: if you are using host-based virtualization, you can obtain a readable VHD/VMDK file by using VSS on the host operating system and asking to present an application-consistent variant of the VHD/VMDK files. If you are using a bare-metal hypervisor, a typical mechanism is by taking a snapshot of a virtual machine (which, for example in VMWare ESX, shifts the file lock from the VMDK file to the snapshot delta file, thus releasing the VMDK file for reading).

Open files inside the guest OS
Ironically, the solution to the first problem of open VHD/VMDK host files introduces the second problem of open files inside the guest OS: once you have your snapshot of the VHD/VMDK files (be it through VSS for host-based virtualization or a VM snapshot for bare-metal hypervisors)... that snapshot is only in a crash-consistent state! After all, it is a point-in-time "freeze" of the entire harddisk and restoring such an image file would be equivalent to restarting the server after a total power loss occurred.

VMWare attempted to tackle this problem by introducing a "filesystem sync driver" in their VMTools (which you are supposed to install in every virtual machine running on a VMWare product). This filesystem sync driver mimics VSS in the sense that it requests that the filesystem flush its buffers to disk, guaranteeing that the snapshot -- and thus the corresponding full virtual machine backup -- is in a filesystem-consistent state. Obviously, this does not solve the problem for databases, which tend to react quite violently to this kind of non-VSS "freeze" of the filesystem. Prototypical horror stories can be read here (AD) and here (Exchange).

So what are the real solutions for this problem? I can think of two at this moment:

After taking a snapshot, do not only back up the disks but also the memory. Then, when restoring the backup, do not "power on" the virtual machine but instead "resume" it. At first, the machine will probably be "shocked" to see that the time has leapt forward and that many TCP/IP connections are suddenly being dropped, but the database server you are running should be able to handle this and properly commit any unsaved data from memory to disk.
Trigger a VSS operation inside the guest OS to commit all changes to disk and ensure filesystem- and application-level consistency, and only then take the full virtual machine snapshot.
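The difference the second option makes can be shown with a toy model in Python. This is purely illustrative: `ToyDatabase` and `take_snapshot` are invented for this sketch and have nothing to do with the real VSS or VMWare interfaces. The point is only the ordering: flush first, snapshot second.

```python
class ToyDatabase:
    """A database that buffers writes in memory before they reach the disk."""

    def __init__(self):
        self.disk = {}    # what a disk-level snapshot can see
        self.dirty = {}   # "dirty pages" that only exist in memory

    def write(self, key, value):
        self.dirty[key] = value

    def flush(self):
        # What a quiesce request asks the application to do.
        self.disk.update(self.dirty)
        self.dirty.clear()


def take_snapshot(db, quiesce):
    """Copy the on-disk state, optionally asking the database to flush first."""
    if quiesce:
        db.flush()
    return dict(db.disk)


db = ToyDatabase()
db.write("order-42", "paid")
print(take_snapshot(db, quiesce=False))  # {}
print(take_snapshot(db, quiesce=True))   # {'order-42': 'paid'}
```

Without the flush, the snapshot is the crash-consistent picture: the write that was still in memory is simply missing.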

The VSS interaction with the guest operating system was first introduced by vizionCore in their vRanger Pro 3.2.0 -- which required the installation of an additional service inside the guest VM, .NET 2.0 and was only officially supported for 32-bit Windows 2003 SP1+. With the release of VMWare Consolidated Backup 1.5, VMWare announced that the default quiescing of disks on ESX 3.5 Update 2 would now be done using the new VSS driver -- supported on Windows 2003/2008/Vista in both 32 & 64-bit variants. Hurray! Problem solved, right?

So VSS seems nice, but is it necessary?
Obviously, your gut feeling will tell you that it is "nicer" and "more gentle" to the guest virtual machine when using VSS when taking a snapshot and a backup. The arguments on the difference between crash-consistency, filesystem consistency and application-level consistency (which translates to transactional consistency for databases) give solid grounds to this gut feeling.

Personally, I cannot find an argument that states that VSS is also really necessary to create a full virtual machine backup. In the physical world, filesystems and databases have been hardened to recover from the crash-consistent state that you obtain when taking a snapshot of a running virtual machine to back up and restore. Hands-on experience about this robustness can be read on several informal channels such as forum posts here.

However, if you want to be sure that your database is in a consistent state (for a faster recovery) and be certain that those few seconds of data that were not yet committed from memory to disk are in fact included in your snapshot, then VSS is what you need. The next question to answer is: what is the risk of VSS messing up, and is this probability larger than that of not being able to restore a non-VSS-based snapshot?

Conclusion
Performing live backups of virtual machines seems like an interesting and simple feature of virtualisation at first. However, at a second glance, there are some important decisions to be made regarding the use of VSS/snapshotting technology that can impact your restore strategy and success. Even without any quiescing mechanism, the operating system should be able to handle the crash-consistent backups that are taken by performing live machine backups and should therefore be sufficiently reliable. With the ready availability of VSS in the new VMWare Tools that come with ESX 3.5 Update 2, much more than crash-consistent backups can be guaranteed without the need to install additional agents. The increased reliability and faster restore time (no filesystem/database consistency checks) that come with VSS-quiesced snapshots now make full virtual machine backups a fully mature solution, without the need to worry about possibly inconsistent backups.

Side remarks
Some additional remarks regarding full virtual machine backup:

Full VM backups can be an addition to guest-based file level backups, but they can never be a complete replacement:
- you might take a full VM based snapshot of your Exchange or SQL database every day, but a filebased/bricklevel backup (which is far more convenient to use for your typical single file/single mailbox restore operations) might be taken several times a day, depending on the SLA that your IT department has with the rest of the company.
- a full VM backup is a good place to start a full server recovery. It is a bad place to start a single-file or a single-mailbox restore.
- full VM backups using VSS do not allow the backup of SQL transaction logs (see "what is not supported" in the SQL VSS Writer overview), nor do they commit transaction logs to the database in order to clear up the transaction logs (an absolute necessity for Exchange databases or for several types of SQL databases).
Microsoft does not support any form of snapshotting technology on domain controllers. For more information, see MSKB 888794 on "Considerations when hosting Active Directory domain controller in virtual hosting environments".

Edit (12 Aug 2008): Veeam has released a very interesting whitepaper that discusses not only the necessity for VSS awareness during the backup process, but also during the restore process. They give the example of a domain controller that performs USN rollbacks when being backed up using VSS but not restored using VSS-aware software. Another nice example is Exchange 2003, which requires VSS-aware restore software in order to be supported by Microsoft.

Postscriptum: I started writing this article a few days before VCB 1.5 was released, and the original point I was trying to make at that time was that there were too many disadvantages to the available VSS implementations (yet another service to install, .NET 2.0, very limited OS support) to really profit from the benefits that VSS could offer. Of course, in the meantime, VMWare has taken away most of those objections by including VSS support in their VMTools for a wide range of server operating systems. This forced me to reconsider my view on whether VSS would be a good idea or not.

Thursday, June 19, 2008

SQL Server 2005 Express Edition on Windows 2008 x64

While experimenting with the Microsoft App-V 4.5 Release Candidate (more on that soon), I decided to go for a full-blown installation on Windows 2008 x64. Since this is only on my home network, I don't run a dedicated SQL server so I went for the natural choice of installing SQL Server 2005 Express Edition SP2 on my freshly installed Windows 2008 x64 App-V server.

This turned out to be less trivial than I thought. The short answer is: if you want to have a painless install of SQL Server 2005 Express Edition, take the download that includes the “Advanced Services” and simply don’t install them. The “smaller” download package does not include some necessary files for a successful x64 installation.

If you want to go the hard way and patch the setup for easier automated deployment (or just to be ‘1337 and be able to say that you fixed Microsoft’s SQL Server installer for 64-bit systems…), then follow these steps:

First of all, you should know that SP2 is the first Vista/Windows 2008 certified edition (think UAC, think session zero hardening, think enhanced security). Secondly, SQL Server 2005 Express Edition SP2 is supported to run under WOW64. That is very comforting to know, and I hadn't expected a true 64-bit edition for free. So why does it complain about installing a 32-bit version on a 64-bit machine then?

"The installation package has a missing file, or you are running a 32-bit only Setup program on a 64-bit computer"

Of course, what you don't see is that SQL is first installing the SQL Native Client in the background (as a prerequisite) and the error message conveniently forgets to mention that this is in fact the installation that is not succeeding. The error message was indeed accurate, but the error was not that I was trying to run a 32-bit installer on a 64-bit machine, but that the 64-bit installer for the SQL Native Client is not included in the package! What’s even worse, some other essential x64 packages are also not included in the smallest SQL Express 2005 SP2 download.
So you have to include the missing files manually:
1. Download the “SQL Server 2005 Express Edition SP2 with Advanced Services” package.
2. Run both the SQL Express installers with the /X switch to extract the setup files (to different directories):
  
  sqlexpr.exe /x
  sqlexpr_adv.exe /x
3. Next, locate the 64-bit SQL Native Client (sqlncli_x64.msi) and 64-bit SQL VSS Writer (SqlWriter_x64.msi) from the Advanced Services setup and copy them to the "Setup" directory of the regular SQL Express installation.
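If you want to automate step 3 for repeated deployments, the copy can be scripted. A minimal Python sketch, assuming both packages were already extracted with /x: the function name and its two directory arguments are placeholders for this illustration, and the search through the Advanced Services directory is recursive because the exact subfolder is not assumed here.

```python
import shutil
from pathlib import Path

# The two x64 packages named in step 3.
X64_PACKAGES = ("sqlncli_x64.msi", "SqlWriter_x64.msi")


def copy_x64_packages(advanced_dir, express_dir):
    """Copy the x64 MSI files from the Advanced Services extract into the
    "Setup" directory of the regular SQL Express extract."""
    setup_dir = Path(express_dir) / "Setup"
    if not setup_dir.is_dir():
        raise FileNotFoundError(f"no Setup directory in {express_dir}")
    copied = []
    for name in X64_PACKAGES:
        # Look for the file anywhere below the Advanced Services extract.
        matches = sorted(Path(advanced_dir).rglob(name))
        if not matches:
            raise FileNotFoundError(f"{name} not found in {advanced_dir}")
        target = setup_dir / name
        shutil.copy2(matches[0], target)
        copied.append(target)
    return copied
```

Call it with the two extraction directories you chose in step 2; it returns the paths of the copied files, or raises if one of the packages is missing.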

Et voila! The installer works now. One day, we will live in a perfect world of unambiguous error messages...

Now off to do some more SoftGri... ehr.. I mean Microsoft Application Vir... ehr... I mean App-V testing!

Sunday, May 25, 2008

Installing LSI Logic RAID monitoring tools under the ESX service console

As I discussed in a recent post, I used a Dell Perc 5i SAS controller in my ESX whitebox server. One of the nice features of this controller is that it is a rebranded LSI Logic controller (with a different board layout!), supported by LSI Logic firmwares and the excellent monitoring tools that LSI offers.

Of course, it is important to keep track of your RAID array status, so I decided to install the MegaCLI monitoring software under the ESX Server 3.5 Service Console. Here's how I did it and configured the monitoring on my system:

The MegaCLI software can be downloaded from the LSI Logic website. I used version 1.01.39 for Linux, which comes in an RPM file.
After uploading the RPM file to the service console, it was a matter of installing it using the "rpm" command:

rpm -i -v MegaCli-1.01.39-0.i386.rpm

This installs the "MegaCli" and "MegaCli64" commands in the /opt/MegaRAID/MegaCli/ directory of the service console.

That's it, MegaCLI is ready to be used now. Some useful commands are the following:

/opt/MegaRAID/MegaCli/MegaCli -AdpAllInfo -aALL
This lists the adapter information for all LSI Logic adapters found in your system.
/opt/MegaRAID/MegaCli/MegaCli -LDInfo -LALL -aALL
This lists the logical drives for all LSI Logic adapters found in your system. The "State" should be set to "optimal" in order to have a fully operational array.
/opt/MegaRAID/MegaCli/MegaCli -PDList -aALL
This lists all the physical drives for the adapters in your system; the "Firmware state" indicates whether the drive is online or not.

The next step is to automate the analysis of the drive status and to alert when things go bad. To do this, I added an hourly cron job that lists the physical drives and then analyzes the output of the MegaCLI command.

I created a file called "analysis.awk" in the /opt/MegaRAID/MegaCli directory with the following contents:

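# This is a little AWK program that interprets MegaCLI output

# Each "Device Id" line starts the record of a new physical drive
/Device Id/ { counter += 1; device[counter] = $3 }
# The third field of the "Firmware state" line holds the drive state
/Firmware state/ { state_drive[counter] = $3 }
# Fields 3 to 6 of the "Inquiry" line identify the drive
/Inquiry/ { name_drive[counter] = $3 " " $4 " " $5 " " $6 }
END {
    for (i = 1; i <= counter; i += 1)
        printf("Device %02d (%s) status is: %s <br/>\n", device[i], name_drive[i], state_drive[i])
}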
This awk program processes the output of MegaCli, as you can test by running the following command:

./MegaCli -PDList -aALL | awk -f analysis.awk

from within the /opt/MegaRAID/MegaCli directory.
Then I created the cron job by placing a file called raidstatus in /etc/cron.hourly, with the following contents:

#!/bin/sh

/opt/MegaRAID/MegaCli/MegaCli -PdList -aALL| awk -f /opt/MegaRAID/MegaCli/analysis.awk >/tmp/megarc.raidstatus

if grep -qv "status is: Online" /tmp/megarc.raidstatus
then
/usr/local/bin/smtp_send.pl -t tim@pretnet.local -s "Warning: RAID status no longer optimal" -f esx@pretnet.local -m "`cat /tmp/megarc.raidstatus`" -r exchange.pretnet.local
fi

rm -f /tmp/megarc.raidstatus
exit 0

Don't forget to run a chmod a+x /etc/cron.hourly/raidstatus in order to make the file executable by all users.
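For readers who prefer to see the logic of the awk program and the grep test in one place, here is a rough Python equivalent. It is a sketch only: the sample text is hand-written to resemble the three line types the awk program looks for, not captured from a real controller, and the function names are made up.

```python
def parse_pdlist(output):
    """Collect device id and firmware state per drive from PDList-style output."""
    drives = []
    for line in output.splitlines():
        fields = line.split()
        if "Device Id" in line and len(fields) >= 3:
            # A "Device Id" line starts a new drive record (third field = id).
            drives.append({"id": fields[2], "state": None})
        elif "Firmware state" in line and drives and len(fields) >= 3:
            # Third field of the "Firmware state" line is the state itself.
            drives[-1]["state"] = fields[2]
    return drives


def failed_drives(drives):
    """Return every drive whose state is anything other than Online."""
    return [d for d in drives if d["state"] != "Online"]


# Hand-written sample, for illustration only.
sample = """\
Device Id: 0
Firmware state: Online
Device Id: 1
Firmware state: Failed
"""
print(failed_drives(parse_pdlist(sample)))  # [{'id': '1', 'state': 'Failed'}]
```

An empty result means all drives report Online, which is the case in which the cron job stays silent.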

In order to send an e-mail when things go wrong, I used the SMTP_Send Perl script smtp_send.pl that was discussed by Duncan Epping on his blog.

Thursday, May 22, 2008

Renaming a VirtualCenter 2.5 server

After running my VirtualCenter server on a standalone host for quite some time, I decided to join it into the domain that I am running on my ESX box (in order to let it participate in the automated WSUS patching mechanism). This also seemed like a perfect opportunity to rename the server's hostname from W2K3-VC.pretnet.local to virtualcenter.pretnet.local. However, after the hostname change, the VMWare VirtualCenter service would no longer start with an Event ID 1000 in the eventlog.

Somehow, this didn't come as a surprise ;). This has been discussed before on the VMWare forums (here and here), but I post it here because I did not immediately find a step-by-step walkthrough.

The problem was in fact twofold, the solution rather simple:

Renaming SQL servers is a bad idea in general (so it appears). For my small, nonproduction environment, I use the SQL Server 2005 Express edition that comes with the VirtualCenter installation. If you rename a SQL server, you need to internally update the system tables using a set of stored procedures in order to make everything consistent again. This is done by installing the "SQL Server Management Studio Express" and then executing the following TSQL statements:

sp_dropserver 'W2K3-VC\SQLEXP_VIM'
GO
sp_addserver 'VIRTUALCENTER\SQLEXP_VIM', local
GO
sp_helpserver
SELECT @@SERVERNAME, SERVERPROPERTY('ServerName')

The first statement removes the old server instance (replace W2K3-VC with your old server name), the second statement adds the new server instance (replace VIRTUALCENTER with your new server name). The sp_helpserver and SELECT statements query the internal database and variables for the actually recognized SQL server instances. You need to perform a reboot before the last two statements return the proper instances.
Secondly, the System ODBC connection that is used by VMWare required an update to point to the new SQL Server instance. This was of course done using the familiar "Data Sources (ODBC)" management console.
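If you have to do this more than once, the two rename statements from the first step can be generated from the old and new host names. A small Python sketch under a few assumptions: the helper name is invented, the default instance name is the SQLEXP_VIM instance used above, and the output is meant to be pasted into Management Studio Express rather than executed by the script.

```python
def rename_statements(old_host, new_host, instance="SQLEXP_VIM"):
    """Build the sp_dropserver/sp_addserver batch for a renamed SQL Server host."""
    for name in (old_host, new_host, instance):
        if "'" in name:
            raise ValueError("names must not contain single quotes")
    old = f"{old_host}\\{instance}"
    new = f"{new_host}\\{instance}"
    return "\n".join([
        f"EXEC sp_dropserver '{old}'",
        "GO",
        f"EXEC sp_addserver '{new}', 'local'",
        "GO",
    ])


print(rename_statements("W2K3-VC", "VIRTUALCENTER"))
```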

Afterwards, the VMWare Virtual Center Server service started just fine again.

Friday, May 2, 2008

Enabling Subject Alternate Name certificates

When requesting certificates from your freshly installed Certification Authority, it can come in handy to specify multiple DNS names that this certificate should be valid for. This principle is known as specifying a list of "subject alternative names" (SAN, also written "subject alternate names") under which the server is also reachable.

Unfortunately, this mechanism doesn't work out of the box with Windows CAs. On your CA, you first need to enable a setting that allows the usage of SAN attributes. Open a command prompt, run the following command (on one line), and then restart the Certificate Services service:

certutil -setreg policy\EditFlags +EDITF_ATTRIBUTESUBJECTALTNAME2

net stop CertSvc & net start CertSvc

Afterwards, use the SAN:dns=name1&dns=name2 attribute (one dns= entry per host name) when requesting certificates to enable multiple DNS names.
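Building the attribute string by hand gets error-prone with more than two names. A tiny Python sketch that assembles it from a list; the function name and the host names in the example are made up for illustration.

```python
def san_attribute(dns_names):
    """Join DNS names into a request attribute: one dns= entry per name, separated by &."""
    if not dns_names:
        raise ValueError("at least one DNS name is required")
    return "SAN:" + "&".join(f"dns={name}" for name in dns_names)


# Hypothetical host names.
print(san_attribute(["wsus.pretnet.local", "wsus"]))  # SAN:dns=wsus.pretnet.local&dns=wsus
```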

Wednesday, April 30, 2008

Windows 2008 Certificate Authority and Windows 2000/XP/2003 clients

I was experimenting with Windows 2008 Certificate Services the other day in order to create certificates for WSUS 3.0 and for doing SSL tunneling of RDP towards the internet. I noticed that several of my clients were unable to automatically install the WSUS client, with vague errors in the event log (Win32HResult=0x00000000):

I quickly discovered that the problem was related to the certificate that I had issued for the WSUS IIS server. It turned out that Windows 2008 WSUS clients could connect without any problem to the WSUS webpage, but Windows 2003 and Windows XP clients could not. What made it even more puzzling is that on a Windows XP system, connecting to the IIS homepage didn't succeed using Internet Explorer, but worked perfectly fine using Firefox.

Opening the certificate of my WSUS server gave the following result:

with a "This certificate has an nonvalid digital signature" error in the "Certification Path" details for both the issued certificate and my CA certificate.

Root cause:
The answer is bleeding obvious: Windows 2008 has several new additions to the cryptography API, called Cryptography Next Generation (CNG), that are used in the V3 certificate templates for CAs and web servers in Windows 2008. Amongst those new features is support for new certificate signing algorithms (in my case SHA512, a SHA-2 variant), which are not recognized by older clients. Windows XP SP3 adds support for XP; I suppose a future hotfix will add compatibility for Windows 2003.

Solution:
In the absence of a worldwide XP SP3 deployment and a working hotfix for W2K3, the only option here is to ensure that the Windows 2008 CA certificate is created with a non-CNG cryptographic provider. If you already created a CA certificate using the new CNG features, the only option is to reinstall your CA and regenerate your CA certificate --- remember how mum always told you to think things over twice before just plainly installing a W2K8 CA... I bet you regret that now (just like I did :D)? Reinstalling your CA could be messy, and make your PKI infrastructure go berserk, so this time do think twice before going down that road!

Step by Step plan of attack (POA)
So you have decided you want to proceed? First verify that you are indeed using a CNG CSP. To do this, open your registry editor and navigate to the following key:

[HKLM\SYSTEM\CurrentControlSet\Services\CertSvc\Configuration\{CAname}\CSP]

If you find a CNGHashAlgorithm REG_SZ value, and the HashAlgorithm DWORD is set to 0xFFFFFFFF, then you are using a CNG CSP. If the HashAlgorithm is set to a value such as 0x00008003, then you are already using a "classic" CSP. You can also use the following command on the CA to retrieve the CSP:

certutil -getreg ca\csp\HashAlgorithm
certutil -getreg ca\csp\Provider

which will return the HashAlgorithm and the name of the CSP. For more information, I refer to the Microsoft whitepaper "Active Directory Certificate Server Enhancements in Windows Server Code Name Longhorn"; you crypto-boys out there will love it.
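The registry test described above boils down to two conditions. A minimal Python sketch of that decision, assuming you have already read the values under the CSP key into a dictionary (for example by hand from the certutil output); the function name is invented for this illustration.

```python
CNG_MARKER = 0xFFFFFFFF  # HashAlgorithm value that indicates a CNG CSP, per the text above


def uses_cng_csp(csp_values):
    """csp_values maps value names under the CSP key to their data."""
    return ("CNGHashAlgorithm" in csp_values
            and csp_values.get("HashAlgorithm") == CNG_MARKER)


print(uses_cng_csp({"HashAlgorithm": 0xFFFFFFFF, "CNGHashAlgorithm": "SHA512"}))  # True
print(uses_cng_csp({"HashAlgorithm": 0x00008003}))                                # False
```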

Keep in mind that when you are adding the Certificate Services Role to your Windows 2008 server, you need to specify the proper cryptographic service provider. The image below displays some of the options; what is important to remember here is that all the service providers that contain a hash sign ("#") are CNG providers and thus incompatible with Windows XP SP2/Windows 2003 and earlier clients.

The default cryptographic service provider for Windows 2003 is the "Microsoft Strong Cryptographic Provider", so that is what you want to use. Notice how selecting this provider reduces the number of certificate signing options... SHA-2 algorithms are no longer included! Proceed as usual to end up with a CA that produces certificates that can be handled by legacy clients.
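The hash-sign rule of thumb is easy to apply mechanically. A short Python sketch: the provider names in the list are examples of what the setup wizard may offer and are included for illustration only, so treat the list as an assumption rather than a complete inventory.

```python
def is_cng_provider(name):
    """Rule of thumb from the text: provider names containing '#' are CNG providers."""
    return "#" in name


# Example provider names (illustrative, not a complete list).
providers = [
    "RSA#Microsoft Software Key Storage Provider",
    "ECDSA_P256#Microsoft Software Key Storage Provider",
    "Microsoft Strong Cryptographic Provider",
]
legacy_safe = [name for name in providers if not is_cng_provider(name)]
print(legacy_safe)  # ['Microsoft Strong Cryptographic Provider']
```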