Last week I was onsite with a new customer, spending some time familiarizing myself with their existing infrastructure. They are running vSphere 4.1 on Dell servers and EqualLogic storage. While the hardware layout was solid and installed well, it quickly became apparent that the SysAdmin was a fish out of water. There was a knowledge and comfort gap that could cause problems down the road if he became careless. Little did I know that ‘down the road’ would be two hours later.
As the next few hours progressed, I found that he was using BackupExec agents within his critical business applications (2 VMs) and backing up directly. The rest of his VMs were unprotected other than EqualLogic snapshots. I pointed out to him that his snapshot schedule created one snapshot per hour, keeping 12 snapshots. This protected him for less than one day. He was at significant risk, and I recommended a more aggressive snapshot schedule as well as a more comprehensive backup plan.
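The arithmetic behind that warning is simple enough to sketch. The helper name `retention_hours` is hypothetical and only illustrates the calculation: the recovery window is the snapshot interval multiplied by the number of snapshots kept.

```shell
#!/bin/sh
# Hypothetical helper: hours of history covered by a snapshot schedule.
# Usage: retention_hours INTERVAL_HOURS SNAPSHOTS_KEPT
retention_hours() {
    echo $(( $1 * $2 ))
}

# Hourly snapshots with 12 kept, as on this customer's array
retention_hours 1 12
```

With hourly snapshots and 12 kept, anything deleted more than 12 hours ago is already gone from the snapshot history.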
We covered several other topics over the next few hours such as vMotion, Storage vMotion, and basic VM operations. As I prepared to leave, he decided to perform a few Storage vMotion operations on VMs he had renamed, which had left mismatched folder names within the datastore. One of them happened to be his Domain Controller, which I will call ‘MismatchDC’. When he tried to migrate it to another datastore, vCenter threw an error stating that “the file mismatchDC-vmdk could not be found” and shut down the VM. Within minutes, his phone was exploding with calls. Nobody could log in or get Exchange email. He had pooched his Domain Controller.
While panic set in around his office and he was scrambling around, I opened up a datastore browser and was shocked. Apparently the SysAdmin had deleted some ‘unnecessary files’, such as the VMX, VMDK, and everything else except the NVRAM and mismatchDC-flat.vmdk files! These remained simply because they wouldn’t delete when he tried to delete them.
A quick check of the EqualLogic snapshots determined that he had deleted these files more than a day ago, so they were no longer available in any snapshot. I also found that the Domain Controller was not on the list of ‘critical servers’ that they were protecting with BackupExec agents. There was no way to restore this VM! He started making plans to build a new VM as a DC, run DCPROMO, and sync across the WAN from his second site on the other side of the country.
As he was making his recovery plans and trying to explain to his boss how they would be down for the rest of the day, I recalled that I had read a KB article a while ago about recovering from a flat.vmdk file. After a quick collaboration with my friend Google, I found it. While it didn’t exactly fit my needs, it was enough to get rolling. I took the size of the flat.vmdk file, divided by 1024, and divided by 1024 again to get the size of the VMDK in GB. Luckily it came out to 25 on the nose. I assumed that it was a Win 2k3 server with a 25 GB drive, which the SysAdmin verified from memory, so I created a new VM called Temp as a Win2k3 server with 1 vCPU, 2 GB RAM, and a 25 GB HDD.
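The size conversion can be sketched in a few lines of shell. The helper name `size_to_gb` and the `bytes`/`kb` unit argument are hypothetical. Dividing by 1024 twice fits a size reported in KB; a size in bytes (which is what `ls -l` typically prints) needs a third division. The sketch also assumes a shell with 64-bit arithmetic, which may not hold on every ESX console.

```shell
#!/bin/sh
# Hypothetical helper: convert a flat.vmdk size to whole GB.
# Usage: size_to_gb SIZE UNIT   (UNIT is "bytes" or "kb")
size_to_gb() {
    size=$1
    unit=$2
    case "$unit" in
        bytes) divisor=$((1024 * 1024 * 1024)) ;;  # bytes -> KB -> MB -> GB
        kb)    divisor=$((1024 * 1024)) ;;         # KB -> MB -> GB
        *)     echo "unknown unit: $unit" >&2; return 1 ;;
    esac
    # A remainder means the disk is not a whole number of GB, so a
    # replacement disk sized in whole GB would not match exactly.
    if [ $((size % divisor)) -ne 0 ]; then
        echo "warning: size is not a whole number of GB" >&2
    fi
    echo $((size / divisor))
}

# A 25 GB disk expressed in bytes and in KB; both calls print 25.
size_to_gb 26843545600 bytes
size_to_gb 26214400 kb
```

The remainder check matters because the replacement disk has to match the original size exactly; a result that is not "on the nose" means the temp VM's disk cannot simply be sized in whole GB.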
I enabled remote SSH on the host and logged in via PuTTY. I went to the mismatchDC folder to verify its contents.
I was correct in finding only the mismatchDC.nvram and mismatchDC-flat.vmdk files there. I then moved over to the Temp VM location and found the temp-flat.vmdk file. I renamed this to prevent it from being overwritten.
mv temp-flat.vmdk temp-flat.vmdk.old
Next, I copied over the original flat file and renamed it temp-flat.vmdk.
cp /vmfs/volumes/vmfs01/mismatchDC/mismatchDC-flat.vmdk temp-flat.vmdk
When finished, I simply attempted to start the Temp VM.
As it started up, I saw the familiar Windows 2003 server splash screen, and called the SysAdmin over to verify that it was working.
After a few minutes of evaluation, it was proven that the Domain Controller was back up and running with less than an hour of data loss.
Steps to recover a VM from just the flat.vmdk file:
- Build a new temp VM with an EXACTLY identical vmdk file size
- Connect via CLI
- Rename the temp-flat.vmdk file
- Copy the existing flat.vmdk file and rename it to temp-flat.vmdk
- Power on the temp VM
Lessons learned:
- Back up your servers, physical or virtual
- Snapshots are NOT backups
- Don’t delete things you don’t understand
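The rename-and-copy part of the recovery steps above can be sketched as a small shell function. This is a minimal sketch, not a supported procedure: the function name `swap_flat_vmdk` is hypothetical, and the Temp VM folder in the commented example is an assumption, since only the mismatchDC path appears in the commands above. It assumes the temp VM is powered off while the files are swapped.

```shell
#!/bin/sh
# Hypothetical sketch of the rename-and-copy steps; all paths are arguments.
# Usage: swap_flat_vmdk ORIGINAL_FLAT TEMP_VM_DIR TEMP_VM_NAME
swap_flat_vmdk() {
    src_flat=$1
    temp_dir=$2
    temp_name=$3
    temp_flat="$temp_dir/$temp_name-flat.vmdk"

    # Refuse to continue if either flat file is missing.
    [ -f "$src_flat" ] || { echo "missing: $src_flat" >&2; return 1; }
    [ -f "$temp_flat" ] || { echo "missing: $temp_flat" >&2; return 1; }

    # Set the temp VM's empty flat file aside so it is not overwritten.
    mv "$temp_flat" "$temp_flat.old" || return 1

    # Copy (not move) the original so the only surviving data stays in place.
    cp "$src_flat" "$temp_flat" || return 1
}

# Example; the Temp folder path is an assumption:
# swap_flat_vmdk /vmfs/volumes/vmfs01/mismatchDC/mismatchDC-flat.vmdk \
#     /vmfs/volumes/vmfs01/Temp temp
```

Copying rather than moving keeps the original flat file untouched, so a failed attempt does not cost you the only remaining copy of the data.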