Recover a VM from the vm-flat.vmdk file

Last week I was onsite with a new customer, spending some time familiarizing myself with their existing infrastructure.  They are running vSphere 4.1 on Dell servers and EqualLogic storage.  While the hardware layout was solid and installed well, it quickly became apparent that the SysAdmin was a duck out of water.  There was a knowledge and comfort gap that could cause problems down the road if he became careless.  Little did I know that ‘down the road’ would be 2 hours later.

As the next few hours progressed, I found that he was using BackupExec agents within his critical business applications (2 VMs) and backing those up directly.  The rest of his VMs were unprotected other than EqualLogic snapshots.  I pointed out to him that his snapshot schedule created one snapshot per hour, keeping 12 snapshots.  That covered only the last 12 hours, well short of a full day.  He was at significant risk, and I recommended a more aggressive snapshot schedule as well as a more comprehensive backup plan.

We covered several other topics over the next few hours such as vMotion, Storage vMotion, and basic VM operations.  As I prepared to leave, he decided to perform a few Storage vMotion operations on VMs he had renamed, which had left mismatched folder names within the datastore.  One of them happened to be his Domain Controller, which I will call ‘MismatchDC’.  When he tried to migrate it to another datastore, vCenter threw an error stating that “the file mismatchDC-vmdk could not be found” and shut down the VM.  Within minutes, his phone was exploding with calls.  Nobody could log in or get Exchange email.  He had pooched his Domain Controller.

While panic set in around his office and he was scrambling around, I opened up a datastore browser and was shocked.  Apparently the SysAdmin had deleted some ‘unnecessary files’, such as the VMX, VMDK, and everything else except the NVRAM and mismatchDC-flat.vmdk files!  These remained simply because they wouldn’t delete when he tried.

A quick check of the EqualLogic snapshots determined that he had deleted these files more than a day ago, so no snapshot still contained them.  I also found that the Domain Controller was not on the list of ‘critical servers’ that they were protecting with BackupExec agents.  There was no way to restore this VM!  He started making plans to build a new VM as a DC, run DCPROMO, and sync across the WAN from his second site on the other side of the country.

As he was making his recovery plans and trying to explain to his boss how they would be down for the rest of the day, I recalled that I had read a KB article a while ago about recovering from a flat.vmdk file.  After a quick collaboration with my friend Google, I found it.  While it didn’t exactly fit my needs, it was enough to get rolling.  I took the size of the flat.vmdk file, divided by 1024, and divided by 1024 again to get the size of the VMDK in GB (two divisions take a size reported in KB to GB; a size in bytes needs a third).  Luckily it came out to 25 on the nose.  The SysAdmin confirmed from memory that it was a Win2k3 server with a 25 GB drive, so I created a new VM called Temp as a Win2k3 server with 1 vCPU, 2 GB RAM, and a 25 GB HDD.
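The size arithmetic can be sketched in shell.  This is only a rough illustration, not something I ran that day: the helper names kib_to_gib, bytes_to_gib and is_exact_gib are made up for this example, and which one applies depends on whether the size you read is in KB or in bytes.

```shell
#!/bin/sh
# Hypothetical helpers for converting a reported file size to whole GB.
# Shell integer division truncates, so is_exact_gib is used to confirm
# the size really is "on the nose" before sizing the new disk.

kib_to_gib() {
    # $1 = size in KB; two divisions by 1024: KB -> MB -> GB
    echo $(( $1 / 1024 / 1024 ))
}

bytes_to_gib() {
    # $1 = size in bytes; three divisions by 1024: bytes -> KB -> MB -> GB
    echo $(( $1 / 1024 / 1024 / 1024 ))
}

is_exact_gib() {
    # $1 = size in bytes; succeeds only for a whole number of GB
    [ $(( $1 % (1024 * 1024 * 1024) )) -eq 0 ]
}

kib_to_gib 26214400         # 25 GB expressed in KB
bytes_to_gib 26843545600    # the same 25 GB expressed in bytes
```

If is_exact_gib fails for the byte count, the disk was not created as a whole number of GB, and the new disk has to be sized to match exactly rather than rounded.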

I enabled remote SSH on the host and logged in via PuTTY.  I went to the mismatchDC folder to verify its contents.

cd /vmfs/volumes/vmfs01/mismatchDC/

I was correct in finding only the mismatchDC.nvram and mismatchDC-flat.vmdk files there.  I then moved over to the Temp VM location.

cd ../temp/

and found the temp-flat.vmdk file.  I renamed this to prevent it from being overwritten.

mv temp-flat.vmdk temp-flat.vmdk.old

Next, I copied over the original flat file and renamed it temp-flat.vmdk.

cp /vmfs/volumes/vmfs01/mismatchDC/mismatchDC-flat.vmdk temp-flat.vmdk

When finished, I simply attempted to start the Temp VM.

As it started up, I saw the familiar Windows 2003 server splash screen, and called the SysAdmin over to verify that it was working.

After a few minutes of evaluation, we confirmed that the Domain Controller was back up and running with less than an hour of data loss.

Steps to recover a VM from just the flat.vmdk file:

  1. Build new temp VM with EXACTLY identical vmdk file size
  2. Connect via CLI
  3. Rename temp-flat.vmdk file
  4. Copy existing-flat.vmdk file and rename to temp-flat.vmdk
  5. Power on temp VM
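
Steps 3 and 4 can be wrapped in a small shell function that refuses to touch anything unless the two flat files are the same size.  Treat this as a sketch under assumptions: swap_flat_vmdk and file_size are hypothetical names, the size check via `ls -ln` and `awk` is my own addition rather than part of the KB article, and I have not tested it on an ESXi host.

```shell
#!/bin/sh
# Sketch of the flat-file swap (steps 3 and 4). Names and paths are examples.

file_size() {
    # Size in bytes, read from column 5 of `ls -ln` output.
    ls -ln "$1" | awk '{print $5}'
}

swap_flat_vmdk() {
    src_flat=$1     # surviving flat file of the broken VM
    temp_flat=$2    # flat file of the newly created temp VM

    [ -f "$src_flat" ]  || { echo "missing: $src_flat" >&2; return 1; }
    [ -f "$temp_flat" ] || { echo "missing: $temp_flat" >&2; return 1; }

    # Step 1 sanity check: the temp disk must be exactly the same size.
    if [ "$(file_size "$src_flat")" != "$(file_size "$temp_flat")" ]; then
        echo "size mismatch, leaving both files alone" >&2
        return 1
    fi

    # Step 3: move the temp VM's empty flat file out of the way.
    mv "$temp_flat" "$temp_flat.old" || return 1

    # Step 4: copy (not move) the original, so the source stays untouched.
    cp "$src_flat" "$temp_flat"
}
```

With the paths from this story, the call would look like: swap_flat_vmdk /vmfs/volumes/vmfs01/mismatchDC/mismatchDC-flat.vmdk /vmfs/volumes/vmfs01/temp/temp-flat.vmdk.  Copying instead of moving costs time and space, but it leaves the only surviving copy of the data alone if something goes wrong.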

Lessons Learned:

  1. Back up your servers, physical or virtual
  2. Snapshots are NOT backups
  3. Don’t delete things you don’t understand

VMware KB article 1002511

 

About timantz

I am a Solutions Architect for Mosaic Technology, helping people around the country with their virtualization, storage, backup and recovery projects.

10 Responses to Recover a VM from the vm-flat.vmdk file

  1. ITforMe says:

    Great story Tim. I’m amazed that this worked, as I’m sure they were as well. Thanks for sharing!

  2. Good job on getting it back up and operational, Tim. I think it is actually a good thing that the EqualLogic array didn’t have any snapshots. He could have seriously messed up his AD environment. Restoring a domain controller from a snapshot, from VMware or otherwise, is ALWAYS a bad idea when there is AD replication going on (USN problems).

    • timantz says:

      Josh,
      I agree with you there. AD restores can cause more problems than they solve in most cases. At the time I was gathering information and trying to see what tools I had to work with. Fortunately for him, I was able to get the existing vmdk back online without lost data. I was sweating a bit for the guy, even if it wasn’t my datacenter. I am glad that I was there to get him back online quickly. I felt sort of responsible, as I was the one that showed him Storage vMotion in the first place. That’s what I like to call ‘running with scissors’.
      -Tim

  3. Irshaad says:

    Hi Timantz,
    In our IT setup we have one physical domain controller and one additional domain controller running as a VM on vSphere 4.1. Recently, due to a problem, I restarted the ESXi host, and during this process I restarted the domain controller VM. It did not come back up; it appeared in the VM inventory as an invalid VM and I couldn’t restart it. I then removed the VM from the inventory, created a new VM with the same name as the domain controller VM, and attached the old domain controller VM’s VMDK file to it (no flat VMDK file was found). The new VM (domain controller) booted perfectly. Then the problems started: due to an AD replication problem between the existing physical DC and the VM DC, domain users were not able to log in, and the multi-role Exchange servers stopped communicating with each other and stopped working. I shut down the VM DC and everything started working fine. Now my question is: how can I bring the VM DC back to life?
    Looking forward to your valuable advice!

    • timantz says:

      This sounds more likely to be an AD synchronization issue rather than a VM issue. My guess is that your virtual DC has been offline too long and has gotten out of sync with the latest AD database. I would try an AD repair, and if that doesn’t work, possibly demote and promote the VM again via the dcpromo.exe utility. Please don’t take my advice as gospel, as I haven’t been a Windows Domain Administrator in over 4 years and am very rusty. My focus is primarily in the virtual world. Before you do anything to your DC, I would run it past someone who is much more familiar with these sorts of issues than I am.
      -Tim

  4. agarkoff says:

    Thank you very much!
    I accidentally removed most of the files of a running VM. Only *flat*.vmdk and a few others survived because they were opened by the VMX. After rebooting the hypervisor, that VM did not boot, of course.

    Renaming *flat*.vmdk to *regular*.vmdk and playing with the .raw and .dsc extensions gave no results. I found your article and it helped.

    Your article reminded me of a similar problem: on my brother’s computer there was a problem with the DOS partition table, and the largest partition, with his movie/porn/leisure collection, disappeared.

    Using the GParted LiveCD I recreated the partition with exactly the same parameters, and happy bro could enjoy his (moving) pictures after the next boot.

  5. Walgran Apolonio says:

    We are so grateful for this post. I can really say that we love you man!!!

  6. DCIM Tools says:

    Great stuff, Tim. Glad you were able to get it back up and running.

  7. BW says:

    Tim, interesting article.. I’m in a very similar situation, are you still around to answer a few questions?

  8. RB says:

    You’re the man. I just fixed my issue using your method. I still don’t understand how those files were removed.

    Thanks,
    RB
