by Harley Stagner on November 25, 2009
I ran into a situation recently with a client that had an Exchange server VM that would not start due to a locked VMDK file on the ESX host. I am writing this post to remind myself and others how to unlock a VMDK (or any open file) on an ESX (with Service Console) host without rebooting the host. Why couldn’t I reboot the host? Here’s the story.
I was called in to help a client with an ESX 3.5 host failure in their environment. HA had kicked in, but they only had one host available (with multiple VMs already running on it), and the Exchange server VM would not boot. I got the Exchange server to boot, but that is a story for a different post. When the Exchange VM booted, I discovered that the system volume (C:) was nearly out of space, so the Exchange Transport service would not start. To fix that, I used VMware Converter to convert the existing Exchange VM to a new VM on the same ESX host with a larger system volume (C:).
However, there was a catch. There was not enough space on the datastore to convert both the system volume (C:) and the data drive of the Exchange VM. So, I converted only the system volume (C:) and resized it accordingly. The plan was then to move the data drive into the newly converted VM’s folder and attach it to the VM. It would have worked, too, if it weren’t for that meddling open file lock. I got the old “Cannot start <VMName> because there is a lock on a file” error when I tried to start the new VM.
I verified that the original Exchange VM was powered off, so I figured it must be a hung process that had the VMDK file still open. Here is how you can find and kill the process.
- Log on to the Service Console of your ESX host.
- Use the Linux lsof (list open files) command along with grep to find the VM process that is responsible for locking your VMDK file. The process ID (PID) is in the second column of the output.
lsof | grep vmname (where vmname is the name of the VM in question)
- Then, kill any processes that have your VMDK file open with the kill -9 command.
kill -9 pid (where pid is the process ID of the offending process)
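When several VMs share a datastore, the grep output can be long. The function below is a minimal sketch that pulls out only the process IDs. lock_pids is a hypothetical name, and the sketch assumes the default lsof layout, in which PID is the second column. It deliberately does not kill anything, so you can confirm with ps that each PID really is the stale process (and not a running VM) before using kill -9.

```shell
#!/bin/sh
# lock_pids (hypothetical helper): read lsof output on stdin and print the
# unique PIDs of every line that contains the given string (VM or VMDK name).
# Assumes the default lsof layout: COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME.
lock_pids() {
    if [ -z "$1" ]; then
        echo "usage: lsof | lock_pids <vmname-or-vmdk>" >&2
        return 1
    fi
    # index() is a plain substring match, so dots in file names are not
    # treated as regex wildcards. The numeric test on $2 skips the header line.
    awk -v pat="$1" 'index($0, pat) > 0 && $2 ~ /^[0-9]+$/ { print $2 }' | sort -u
}

# Usage on the Service Console (review each PID before killing it):
#   lsof | lock_pids vmname
#   ps -fp <pid>
#   kill -9 <pid>
```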
That’s it! The next time you need to clear a stuck process on your ESX host, you can track it down and kill it instead of rebooting the host.
by Harley Stagner on November 20, 2009
This VCDX Enterprise Admin exam note is about N-Port ID Virtualization (NPIV). On the exam blueprint this is section 1.1.S.3. According to the VI:3 Fibre Channel SAN Configuration Guide on pg. 86:
NPIV is an ANSI T11 standard that describes how a single Fibre Channel HBA port can register with the fabric using several worldwide port names (WWPNs). This allows a fabric-attached N-port to claim multiple fabric addresses. Each address appears as a unique entity on the Fibre Channel fabric.
So, what does this mean?
NPIV allows a single FC HBA port to register multiple unique WWNs with the SAN fabric. Each of these WWNs can be assigned to an individual virtual machine. This allows the SAN administrator to monitor and route storage access on a per-VM basis. Pretty cool, huh? Here is my understanding of how the NPIV configuration process works with regard to VMware.
Notes for section 1.1.S.3: Configure and use NPIV HBAs
- NPIV must be configured on the ports that you will use on your FC switches. (The configuration steps depend on the switch vendor.)
- Your HBAs on the ESX host must support NPIV. You can check whether they support NPIV using the following commands:
cat /proc/scsi/qla2300/? | grep NPIV (For Qlogic HBAs)
cat /proc/scsi/lpfc/? | grep NPIV (For Emulex HBAs)
- The physical HBAs on an ESX host must be able to access the same LUNs that you will be presenting to the VMs through NPIV.
- When you are configuring an NPIV LUN for access, make sure that the NPIV LUN number and NPIV target ID match the physical LUN number and target ID.
- Now when you create a VM and want to use NPIV, you need to do the following:
- Create the VM as normal until you get to the disk portion. The NPIV disk will need to be a Raw Device Map (RDM) disk.
- If you want to be able to VMotion the VM that uses NPIV, the RDM file needs to be on the same shared datastore where the virtual machine resides.
- The RDM can be either virtual or physical compatibility mode, depending on your needs after weighing the pros and cons of each choice.
- When you get to the end of the VM creation process, select the “Edit the virtual machine settings before submitting the creation task” checkbox and click Continue.
- Select the Options tab, then Fibre Channel NPIV.
- For a new VM choose the “Generate new WWNs” option. This will generate and assign new WWNs (four pairs: WWPN & WWNN) to the virtual machine.
- The other two options are “Leave unchanged” and “Remove WWN assignment.”
- Click OK to save the changes.
- When the VM powers up, it uses each of the WWN pairs in sequence to try to discover an access path to the storage.
- The number of VPORTs (virtual ports) that are instantiated is equal to the number of physical HBAs, up to 4.
- A VPORT is created on each physical HBA where a physical access path is found.
- The physical paths will be used to determine the virtual paths used to access the LUN.
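Because the ? glob only matches single-character entries and the QLogic and Emulex commands have to be run separately, a loop that walks both driver directories can be handier on a host with several HBAs. This is a sketch that assumes the /proc/scsi layout used in the commands above; check_npiv and the PROC_SCSI override are hypothetical names. It only reports the matching lines; interpreting them is still up to you and your HBA vendor’s documentation.

```shell
#!/bin/sh
# check_npiv (hypothetical helper): for every QLogic (qla2300) and Emulex
# (lpfc) HBA entry under /proc/scsi, print the lines that mention NPIV.
# PROC_SCSI is a hypothetical override so the loop can be tried against a
# copied directory; on the ESX host it defaults to /proc/scsi.
check_npiv() {
    root="${PROC_SCSI:-/proc/scsi}"
    for drv in qla2300 lpfc; do
        [ -d "$root/$drv" ] || continue
        for f in "$root/$drv"/*; do
            [ -f "$f" ] || continue
            hba=$(basename "$f")
            if grep -q NPIV "$f"; then
                # Prefix each matching line with driver/HBA number.
                grep NPIV "$f" | sed "s|^|$drv/$hba: |"
            else
                echo "$drv/$hba: no NPIV lines found"
            fi
        done
    done
}

# Usage on the Service Console:
#   check_npiv
```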
NPIV Confirmation Techniques:
To identify the HBAs (including the HBA #) in a host system, use the following commands:
ls /proc/scsi/qla2300 (For Qlogic HBAs)
ls /proc/scsi/lpfc (For Emulex HBAs)
Now, to confirm that NPIV traffic is going through an HBA, you can check Fibre Channel traffic on the virtual switch port, or you can do the following on the ESX host:
cat /proc/scsi/qla2300/HBA# (For Qlogic HBAs)
cat /proc/scsi/lpfc/HBA# (For Emulex HBAs)
- HBA# is the number for the particular HBA you are checking.
- You can also confirm the creation of a VPORT using the same commands.
Additional Notes / Best Practices
- The WWN information for an NPIV-enabled virtual machine is stored in the virtual machine’s vmx file.
- When you clone a virtual machine or template with a WWN assigned to it, the clone does not retain the WWN.
- Always use the VI Client to manipulate virtual machines with WWNs.
by Harley Stagner on November 19, 2009
This note is about Objective 1.1.S.2: Verify SAN LUN Accessibility. There are two simple ways to determine if a SAN LUN is accessible.
- Browse the datastore in the VI Client and check that you can open the folder of a virtual machine on that datastore (LUN).
- On the command line, list the contents of the /vmfs/volumes directory. Then navigate to the LUN in question and check that the VM files show up.
ls -l /vmfs/volumes
cd /vmfs/volumes/LUN_3 (for example)
ls -l
by Harley Stagner on November 18, 2009
This VCDX study note deals with round-robin multipathing. Section 1.1.K.4 of the VCDX Enterprise Administration Exam blueprint asks the certification candidate to “Explain the use cases for round-robin load balancing.” Well, in ESX 3.5, round-robin is only “experimentally” supported. So, in my opinion, the use cases are limited to test/dev rather than production environments. That being said, the round-robin path selection policy is used to load-balance I/O across the paths from the ESX host to the LUNs that are presented to it. Here is my take on round-robin, as I understand it. If anyone reading has something to add, please do so in the comments section.
- I believe round-robin is most effective in Active/Active arrays, where a LUN can be “owned” by more than one storage processor.
- Round-robin can be set up for both Active/Active and Active/Passive arrays. However, if it is used on an Active/Passive array, then the array must NOT be configured to automatically switch controllers (otherwise path ping-ponging / thrashing can occur).
- Path switching for failover can be set in the VI Client, but load balancing can only be set from the command line.
- You set the path switching policy for load balancing on a per-LUN basis using the esxcfg-mpath command.
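Conceptually, round-robin keeps sending I/O down one path to a LUN and then rotates to the next path. The function below is a simplified model for building intuition, not ESX code. It ASSUMES the host moves to the next path once the current path has carried either a maximum number of commands or a maximum number of blocks, whichever comes first; the name rr_model and its arguments are hypothetical. Each input line is the block count of one I/O command.

```shell
#!/bin/sh
# rr_model (hypothetical, simplified): simulate which path each I/O command
# would use under a round-robin policy.
# ASSUMPTION: the current path is kept until it has carried max_cmds commands
# or max_blocks blocks, whichever comes first; then the next path is used.
# Arguments: number_of_paths max_cmds max_blocks. Input: one block count per line.
rr_model() {
    paths=$1 max_cmds=$2 max_blocks=$3
    awk -v n="$paths" -v mc="$max_cmds" -v mb="$max_blocks" '
        BEGIN { path = 0; cmds = 0; blocks = 0 }
        {
            print "cmd " NR " (" $1 " blocks) -> path " path
            cmds++
            blocks += $1
            if (cmds >= mc || blocks >= mb) {
                # Rotate to the next path and reset the counters.
                path = (path + 1) % n
                cmds = 0
                blocks = 0
            }
        }'
}
```

For example, printf '10\n10\n10\n' | rr_model 2 50 2048 keeps every command on path 0, because neither limit is reached.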
This brings us to section 1.1.S.1 in “Skills and Abilities”: Perform advanced multi-pathing configuration (specifically, configure round-robin behavior using command-line tools).
Notes for 1.1.S.1
- Configure the path switching policy using the esxcfg-mpath command.
- Round-robin and esxcfg-mpath are discussed in the Round-Robin Load Balancing technical note from VMware. Refer to this technical note for esxcfg-mpath options and syntax.
- Below are some examples of setting a round-robin custom load balancing policy using esxcfg-mpath.
esxcfg-mpath --lun=vmhba0:0:0 -p custom (Sets this particular LUN to use a custom policy. After this has been set, there is no need to specify the -p option on additional custom configurations for that LUN.)
esxcfg-mpath --lun=vmhba0:0:0 -H any -B 4000 (Sets the custom HBA policy with the -H option so that the host switches to any available HBA after 4000 blocks have been issued on the current path.)
esxcfg-mpath --lun=vmhba0:0:0 -C 100 (Sets the maximum number of commands issued before a path switch to 100 using the -C, or --custom-max-commands, option.)
esxcfg-mpath -q --lun=vmhba0:0:0 (Shows the current load-balancing policy settings for that particular LUN.)
- There are some default settings for the --custom-max-blocks (2048) and --custom-max-commands (50) options. These defaults can be changed with the esxcfg-advcfg command as shown below.
esxcfg-advcfg -s 4000 /Disk/SPBlksToSwitch (Sets the default number of blocks sent over a given path before a path switch to 4000.)
esxcfg-advcfg -s 100 /Disk/SPCmdsToSwitch (Sets the default number of I/O commands sent over a given path before a path switch to 100.)
esxcfg-advcfg -g /Disk/SPBlksToSwitch (Shows the current setting of the SPBlksToSwitch parameter.)
esxcfg-advcfg -g /Disk/SPCmdsToSwitch (Shows the current setting of the SPCmdsToSwitch parameter.)
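If you want the same custom policy on several LUNs, it can help to generate the commands first and review them. The sketch below only prints commands built from the flags used in the examples above (-p custom, -H any, -B, -C, -q). rr_plan is a hypothetical name, and its LUN check is only a loose shape test for vmhbaA:T:L names.

```shell
#!/bin/sh
# rr_plan (hypothetical helper): print, without running, the esxcfg-mpath
# commands from this note for each LUN given.
# Arguments: max_blocks max_commands LUN [LUN ...]
rr_plan() {
    blocks=$1 cmds=$2
    shift 2
    for lun in "$@"; do
        # Loose check that the name looks like vmhbaA:T:L.
        case $lun in
            vmhba[0-9]*:[0-9]*:[0-9]*) ;;
            *) echo "skipping malformed LUN: $lun" >&2; continue ;;
        esac
        echo "esxcfg-mpath --lun=$lun -p custom"
        echo "esxcfg-mpath --lun=$lun -H any -B $blocks"
        echo "esxcfg-mpath --lun=$lun -C $cmds"
        echo "esxcfg-mpath -q --lun=$lun"
    done
}
```

On the Service Console you could run rr_plan 4000 100 vmhba0:0:0 vmhba0:0:1, read the output, and pipe it to sh once you are happy with it.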
by Harley Stagner on November 17, 2009
I recently started preparing for the VCDX Enterprise Administration Exam on VI:3. To help myself and others, I will be posting notes on my blog to refer back to as I prepare for the test. The first note is for objective 1.1: Create and Administer VMFS datastores using advanced techniques. All of the objectives in the Enterprise Administration Exam Blueprint are split into Knowledge (K), Skills and Abilities (S), and Tools (T). So, my posts will refer to an objective using the following syntax:
Objective#.(K|S|T).Sub-Objective# (for example, 1.1.K.3 is item 3 in the Knowledge section of objective 1.1)
Without further ado, here is my note.
1.1.K.3 – Explain the process used to align VMFS partitions
- VMFS partitions are aligned by default if they are created with the VI Client.
- Identify LUNs that are available to your ESX host using this command:
esxcfg-vmhbadevs
- Note the Linux device names (as seen by the Service Console) of the LUNs you will be using for VMFS, e.g. /dev/sda.
- See if any existing partitions are aligned with this command:
fdisk -lu /dev/sd* (where * is your device letter, e.g. /dev/sda)
- A properly aligned partition should start at sector 128.
- To align a partition (on /dev/sda, for example), run fdisk and follow this sequence of steps:
fdisk /dev/sda
- Type “p” to print the partition table and determine whether any VMFS partitions exist. They will have a system ID of “fb”.
- Back up your data on the partition if you need it. If there are any VMFS partitions and you want to delete them, type “d”.
- Type “n” to create a new partition
- Type “p” to create a primary partition
- Type “1” to create partition number 1
- Select the defaults to use the complete disk
- Type “t” to set the partition’s system ID
- Type “fb” for the VMFS system ID
- Type “x” to go into expert mode
- Type “b” to adjust the starting block number
- Type “1” to choose partition 1
- Type “128” to set it to 128
- Type “w” to write the label and partition information to the disk
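Because fdisk -lu reports the Start column in 512-byte sectors, a start of 128 puts the partition at 128 × 512 = 65,536 bytes, a 64 KB boundary. The awk filter below is a minimal sketch for checking that from the fdisk output before and after the steps above. check_alignment is a hypothetical name, and treating any multiple of 128 as aligned is an assumption of this sketch (this note’s recommendation is exactly 128). It skips the optional boot flag column.

```shell
#!/bin/sh
# check_alignment (hypothetical helper): read `fdisk -lu` output on stdin and
# report each partition's start sector and byte offset (512-byte sectors).
# ASSUMPTION: any start that is a multiple of 128 sectors (64 KB) counts as
# aligned; the recommendation in this note is a start of exactly 128.
check_alignment() {
    awk '$1 ~ /^\/dev\// {
        # If the partition is bootable, column 2 is "*" and Start moves to column 3.
        start = ($2 == "*") ? $3 : $2
        status = (start % 128 == 0) ? "aligned" : "NOT aligned"
        print $1 ": start " start " (" (start * 512) " bytes) " status
    }'
}

# Usage on the Service Console:
#   fdisk -lu /dev/sda | check_alignment
```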
Now that you have an aligned partition you can create a VMFS volume on that partition using the following command:
vmkfstools -C vmfs3 -S YourLabel vmhba#:#:#:#
-C creates the vmfs3 volume and -S labels the volume. Replace YourLabel with your volume label, and replace vmhba#:#:#:# with your Adapter:Target:LUN:Partition.
For example, vmhba1:0:2:1.
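The vmhba name is just the four numbers joined with colons. As a small sketch, the hypothetical helper below builds the vmkfstools command from a label and the Adapter, Target, LUN, and Partition numbers, rejects non-numeric parts and labels containing spaces, and prints the command rather than running it.

```shell
#!/bin/sh
# vmfs_create_cmd (hypothetical helper): assemble the vmkfstools command from
# this note and print it for review instead of running it.
# Arguments: label adapter target lun partition, e.g. YourLabel 1 0 2 1.
vmfs_create_cmd() {
    if [ $# -ne 5 ]; then
        echo "usage: vmfs_create_cmd <label> <adapter> <target> <lun> <partition>" >&2
        return 1
    fi
    label=$1
    case $label in
        ''|*[[:space:]]*) echo "label must be non-empty and contain no spaces" >&2; return 1 ;;
    esac
    shift
    # Adapter, target, LUN, and partition must all be plain numbers.
    for n in "$@"; do
        case $n in
            ''|*[!0-9]*) echo "not a number: $n" >&2; return 1 ;;
        esac
    done
    echo "vmkfstools -C vmfs3 -S $label vmhba$1:$2:$3:$4"
}
```

For example, vmfs_create_cmd YourLabel 1 0 2 1 prints the command for the vmhba1:0:2:1 example.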
Verify that ESX sees the new VMFS volume with the following command:
vdf -h
by Harley Stagner on October 21, 2009
by Harley Stagner on October 9, 2009
Maintenance mode on ESX hosts is great. As long as DRS is set to automatic and VMotion is working, all of the VMs will be evacuated from the host. However, I always seem to run into one VM that still has a CDROM drive connected to it, which stops the VMotion until the CDROM drive is disconnected.
Since this happened quite frequently, I decided to automate maintenance mode with PowerShell. The script below will:
- Search for any VMs that have CDROM drives connected
- Disconnect the CDROM drives from those VMs
- VMotion the VMs to the remaining ESX hosts (there is no way that I know of to automatically VMotion the VMs just by putting the host into maintenance mode with PowerShell)
- Put the target host into maintenance mode
A special thanks goes to Steve Beaver. His post “Working with Maintenance Mode in PowerShell” helped me figure out the automatic VMotion to the remaining hosts.
So, here it is:
# Name: esx-maintenance-prep.ps1
# Purpose: Disconnects all CDROM drives on VMs from a chosen host. It will VMotion the
# VMs on that host. Then, it will put the host into maintenance mode.
#
# Created: 09/11/2009
# Author: Harley Stagner
# Version: 1
#
# TODO:
# -Wrap work into functions
# -Use if statement to account for VMs that are powered off
################################################################
# Connect to your vCenter Server (you will be prompted for the server name)
Connect-VIServer
Write-Host "Choose your server from the list below:"
# Set up variables for the host selection
$colESX = Get-VMHost
$selectionNumber = 0
# Hashtable: menu number -> host name
$colHostSelection = @{}
# Array of menu numbers, in order
$colSelection = @()
# Create the menu choices for the hosts
$colESX | Sort-Object Name | ForEach-Object {
    $selectionNumber = $selectionNumber + 1
    $colHostSelection.Add("$selectionNumber", "$_")
    Write-Host "$selectionNumber - $_"
}
# Needed to match the option for the switch statement.
# Hashtable keys are unordered, so sort them numerically first.
$colHostSelection.Keys | Sort-Object { [int]$_ } | ForEach-Object {
    $colSelection += $_
}
# Have the VM admin make the host selection
$VMHostSelection = Read-Host "Please choose an option between" $colSelection[0] "and" $colSelection[-1]
$VMHost = $colHostSelection[$VMHostSelection]
# Switch statement to get the appropriate host to put into maintenance mode.
switch ($colSelection[0]..$colSelection[-1]) {
    {$_ -eq $VMHostSelection} {
        # Disconnect the CDROM drives from the VMs.
        Get-VMHost $VMHost | Get-VM | Get-CDDrive | Set-CDDrive -Connected $false -Confirm:$false
        # Every other host is a VMotion target. Compare names exactly:
        # a regex match against "esx1" would also exclude "esx10".
        $targets = @($colESX | Where-Object { $_.Name -ne $VMHost })
        $i = 0
        # vMotion the VMs to the other hosts, spreading them round-robin.
        Get-VMHost -Name $VMHost | Get-VM | ForEach-Object {
            Move-VM -VM $_ -Destination ($targets[$i % $targets.Count])
            $i++
        }
        # Put the chosen host into maintenance mode.
        Get-VMHost -Name $VMHost | Set-VMHost -State Maintenance | ForEach-Object { "Entering maintenance mode on '" + $VMHost + "'" } ; continue
    }
}
Disconnect-VIServer -Confirm:$False
#END SCRIPT#
As you can see, I still have a couple of items on my to-do list for this script. However, it is very usable as it stands.
by Harley Stagner on October 8, 2009
I know I may be late on this. However, I think it is cool enough to mention. I noticed that some VMware KB articles now have videos! This is an excellent resource, and I hope that this trend continues.
Just head over to the VMware Support Page to check out some of the videos on the right under “Top Support Videos.”
An example of one of these videos is in this article:
Committing Snapshots From Within the Service Console
Bravo VMware, Bravo! I’d love to see the videos continue.
by Harley Stagner on October 6, 2009

Has anyone else seen the above vRanger Pro Error?
by Harley Stagner on October 1, 2009