Unable to access a file since it is locked

This one caused me hours of grief last week 🙁. I arrived in the office and was immediately accosted by the VDI admin, who informed me that he could not access a VMDK. For the purposes of this blog the file will be called “Win XP SP2-flat.vmdk”; it is located on a shared SAN LUN on our ESX 3.5 infrastructure (3 ESX servers). The first thing we did was create a new VM and attach the file as its disk. When we started the VM, we received this wonderful and succinct error message:

 Unable to access a file since it is locked

So we attempted to copy (vmkfstools -i) the locked file to another location and received the following error message. By now I was starting to get annoyed:

root@XXXXXXX Win XP SP2# cp "Win XP SP2-flat.vmdk"
/vmfs/volumes/LUNVDI02/WinXP_SP2/
cp: cannot open `Win XP SP2-flat.vmdk' for reading: Device or resource busy

Note: above is a single wrapped line.

So how do I find out why it is locked? Well, first I ran the following on all the hosts:

 ps ax |grep "Win XP SP2"

Now this usually works, but in this case it didn’t find anything. I could not afford to restart my hosts at that time; change control would have had a fit.

So the thinking cap was duly placed upon my head. This is the end result (suitably shortened to remove the dead ends that brought no benefit):

Log on to the ESX host where the VM was last known to be running and issue the following command:

vmkfstools -D /vmfs/volumes/<insert your pathname here>

This dumps the information on the file into /var/log/vmkernel
Next issue the command:

less /var/log/vmkernel

Scroll to the bottom. The output will be similar to the following:

a.  Sep 29 15:49:17 vm22 vmkernel: 2:00:15:18.435 cpu6:1038)FS3: 130:
    <START vmware-16.log>
b.  Sep 29 15:49:17 vm22 vmkernel: 2:00:15:18.435 cpu6:1038)Lock
    [type 10c00001 offset 30439424 v 21, hb offset 4154368
c.  Sep 29 15:49:17 vm22 vmkernel: gen 66493, mode 1, owner
    46c60a7c-94813bcf-4273-*0017a44c7727* mtime 8781867]

Note: Bold type has been added to the number for emphasis, and the lines have been wrapped.

d.  Sep 29 15:49:17 vm22 vmkernel: 2:00:15:18.435 cpu6:1038)Addr
    <4, 588, 7>, gen 20, links 1, type reg, flags 0x0, uid 0, gid 0, mode 644
e.  Sep 29 15:49:17 vm22 vmkernel: 2:00:15:18.435 cpu6:1038)len 23973,
    nb 1 tbz 0, zla 2, bs 65536
f.  Sep 29 15:49:17 vm22 vmkernel: 2:00:15:18.435 cpu6:1038)FS3: 132:
    <END vmware-16.log>

Note: The lettering at the beginning of each line has been added for readability, and the lines have been wrapped.

Now you can see that the owner of the lock is shown on line c. All you need in this case is the last part of the GUID string: 0017a44c7727
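Rather than picking the owner out of the log by eye, the last part of the GUID can be pulled out with sed. This is only a minimal sketch: the function name lock_owner_suffix is made up for illustration, and it assumes the "owner" field and its GUID sit on one physical line in /var/log/vmkernel (the sample above is wrapped for display).

```shell
# Print the last dash-separated part of the lock owner GUID.
# Reads vmkernel log lines on stdin; if several dumps are present,
# only the most recent (last) owner is printed.
# The function name is hypothetical, not part of any VMware tool.
lock_owner_suffix() {
    sed -n 's/.*owner [0-9a-fA-F]*-[0-9a-fA-F]*-[0-9a-fA-F]*-\([0-9a-fA-F]*\).*/\1/p' | tail -n 1
}

# Usage on the host, right after running vmkfstools -D:
#   tail -n 50 /var/log/vmkernel | lock_owner_suffix
```

Taking only the tail of the log keeps older dumps of other files out of the result.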
The next step is to issue the following command:

esxcfg-info | grep -i 'system uuid' | awk -F '-' '{print $NF}'

This displays the last part of the system UUID of the ESX server. Now this is the laborious part, as the command needs to be run on each ESX server in the cluster to discover the owner. That said, it is only a maximum of 32 hosts, correct? 😉
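If the hosts can be reached over SSH, the per-host check can be looped instead of typed 32 times. The sketch below is an assumption-heavy illustration: it presumes SSH logins that are allowed to run esxcfg-info (root SSH is typically disabled by default on ESX 3.5), and both the function name find_lock_owner_host and the host names in the usage comment are hypothetical.

```shell
# Print the first host whose system UUID ends in the given lock owner suffix.
# $1            owner suffix from the vmkernel log, e.g. 0017a44c7727
# remaining     hosts to check (hypothetical names, e.g. esx01 esx02 esx03)
# Returns 1 if no host matches.
find_lock_owner_host() {
    owner="$1"
    shift
    for host in "$@"; do
        # Same pipeline as in the post, run remotely. \$NF is escaped so
        # that awk on the remote side sees it, not the local shell.
        suffix=$(ssh "$host" "esxcfg-info | grep -i 'system uuid' | awk -F '-' '{print \$NF}'" </dev/null)
        # Case-insensitive, whole-line comparison against the owner suffix.
        if printf '%s\n' "$suffix" | grep -qix "$owner"; then
            echo "$host"
            return 0
        fi
    done
    return 1
}

# Usage (host names are placeholders):
#   find_lock_owner_host 0017a44c7727 esx01 esx02 esx03
```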
Once the ESX server that matches the owner UUID has been found, log on to that ESX server and run the command:

ps -elf|grep vmname

where vmname is the name of the problem VM. Example output:

a.  4 S root 7570 1 0 65 -10 - 435 schedu Sep27 ? 00:00:02
/usr/lib/vmware/bin/vmkload_app /usr/lib/vmware/bin/vmware-vmx
-ssched.group=host/user/pool2-@ pipe=/tmp/vmhsdaemon-0/vmxf7fb85ef5d8b3522;
vm=f7fb85ef5d8b3522 /vmfs/volumes/470e25b6-37016b37-a2b3-001b78bedd4c
/XXXXXXXX/XXXXXXXXXX.vmx0

Note: the above output is wrapped; in reality it is a single line.

As you can see, there is a process running, shown by the PID 7570 in the above example. This process will need killing; to do so, issue the command:

kill -9 pid

Once the kill is complete, the files should be released, and there you go. This is a lesson in command-line skills.
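The PID lookup from the ps output can also be scripted. This is a rough sketch, assuming the ps -elf column layout shown in the example above (PID in the fourth column); the function name vmx_pids is invented for illustration. The [v] in the pattern stops the awk process from matching its own command line, which is the same problem grep has.

```shell
# Print the PIDs of vmware-vmx processes whose command line mentions the VM.
# $1     a string identifying the VM, e.g. part of its .vmx path
# stdin  output of `ps -elf`
# The function name is hypothetical.
vmx_pids() {
    awk -v vm="$1" '/[v]mware-vmx/ && index($0, vm) { print $4 }'
}

# Usage on the host that owns the lock:
#   ps -elf | vmx_pids vmname
# Check the PID it prints before passing it to kill -9.
```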

2 comments

  1. What kind of datastores are you using? With NFS, this happens a lot if you need to manually kill VMs that no longer respond to vCenter Server (or the host itself.) When you look in the VM’s directory in the datastore, you see something like this:

    -rw------- 1 root root 8684 Oct 1 08:44 arundel (up2date).nvram
    -rw------- 1 root root 606 Sep 30 06:45 arundel (up2date).vmdk
    -rw------- 1 root root 504 Sep 30 06:45 arundel (up2date).vmsd
    -rwxr-xr-x 1 root root 3880 Sep 30 09:36 arundel (up2date).vmx
    -rw------- 1 root root 272 Sep 30 09:36 arundel (up2date).vmxf
    -rwxrwxrwx 1 root root 84 Oct 1 08:50 .lck-3183720100000000
    -rwxrwxrwx 1 root root 84 Oct 1 08:50 .lck-fdb4900100000000

    Note the existence of the two files whose names start with “.lck”. Remove those, and you’ll be able to re-start the VMs.

  2. Hey Tom,

    Just to add to your post:

    – this is also documented in the VMware KB system: http://kb.vmware.com/kb/10051/

    – the last part of the GUID string that you bolded comes directly from the MAC address of the default Service Console on ESX. If you just want to find which ESX host has control of the file, you can run ‘ifconfig | grep HWa’ and it will give you all the hardware addresses. You should see a MAC address that matches the last part of the GUID on the host that is locking the file.
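    To compare the two by machine rather than by eye, the MAC address needs its colons stripped and its letters lowercased. A small sketch, with a made-up function name (mac_to_owner_suffix); the MAC in the usage comment is simply the owner suffix from the post written in ifconfig style, not a value taken from a real host.

```shell
# Convert a MAC address as printed by ifconfig (e.g. 00:17:A4:4C:77:27)
# into the form used in the lock owner field (e.g. 0017a44c7727).
# The function name is hypothetical.
mac_to_owner_suffix() {
    printf '%s\n' "$1" | tr -d ':' | tr 'A-F' 'a-f'
}

# Usage:
#   mac_to_owner_suffix 00:17:A4:4C:77:27
# On a host, each hardware address could be fed through it in turn:
#   ifconfig | grep HWa | awk '{print $NF}'
```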
