Unmounting a volume with open file handles to deleted files without killing the process
Recently we've been turning on compression in our Oracle databases because it saves us a ton of disk space and actually improves database performance. Among other things, the process involves writing the newly compressed tablespaces to new data files (the systems in question are not using ASM). Since we're moving the DBs off their current filesystems anyway, it gives us the opportunity to do some housecleaning before moving them back, such as unmounting, checking and possibly resizing the existing filesystems.
Occasionally, after the data has been migrated away and the old data files deleted, Oracle still has open handles to the deleted files.
So the DBA tells me the filesystem is clear and I can have it, but then this happens:
[root@kwt-r3oql00 E1Q]# umount /dev/mapper/vg_kwt_r3oql20_s00-oracle_E1Q_sapdata7
umount: /oracle/E1Q/sapdata7: device is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
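As the error output suggests, fuser gives a quick first look at which processes are holding the mount (the -v and -m flags are standard fuser options):
fuser -vm /oracle/E1Q/sapdata7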
And when we go to see who the culprit is:
[root@kwt-r3oql00 E1Q]# lsof /oracle/E1Q/sapdata7
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
oracle_13 13214 oracle 520u REG 253,17 10485768192 17317890 /oracle/E1Q/sapdata7/sr3usr_1/sr3usr.data1 (deleted)
oracle_13 13254 oracle 519u REG 253,17 10485768192 17317890 /oracle/E1Q/sapdata7/sr3usr_1/sr3usr.data1 (deleted)
oracle_13 13258 oracle 520u REG 253,17 10485768192 17317890 /oracle/E1Q/sapdata7/sr3usr_1/sr3usr.data1 (deleted)
oracle_13 13286 oracle 517u REG 253,17 10485768192 17317890 /oracle/E1Q/sapdata7/sr3usr_1/sr3usr.data1 (deleted)
oracle_13 13294 oracle 520u REG 253,17 10485768192 17317890 /oracle/E1Q/sapdata7/sr3usr_1/sr3usr.data1 (deleted)
oracle_13 13526 oracle 506u REG 253,17 10485768192 17317890 /oracle/E1Q/sapdata7/sr3usr_1/sr3usr.data1 (deleted)
oracle_13 13996 oracle 512u REG 253,17 10485768192 17317890 /oracle/E1Q/sapdata7/sr3usr_1/sr3usr.data1 (deleted)
oracle_14 14424 oracle 493u REG 253,17 10485768192 17317890 /oracle/E1Q/sapdata7/sr3usr_1/sr3usr.data1 (deleted)
oracle_14 14488 oracle 515u REG 253,17 10485768192 17317890 /oracle/E1Q/sapdata7/sr3usr_1/sr3usr.data1 (deleted)
oracle_14 14574 oracle 403u REG 253,17 10485768192 17317890 /oracle/E1Q/sapdata7/sr3usr_1/sr3usr.data1 (deleted)
oracle_14 14712 oracle 506u REG 253,17 10485768192 17317890 /oracle/E1Q/sapdata7/sr3usr_1/sr3usr.data1 (deleted)
oracle_23 23296 oracle 273u REG 253,17 10485768192 17317890 /oracle/E1Q/sapdata7/sr3usr_1/sr3usr.data1 (deleted)
oracle_24 24260 oracle 277u REG 253,17 10485768192 17317890 /oracle/E1Q/sapdata7/sr3usr_1/sr3usr.data1 (deleted)
This is super annoying because the DB seems to have forgotten about those open handles and internally has no mechanism for gracefully releasing them. If you can tolerate the outage, restarting the DB will release those handles, but if you can't, there is another solution.
The GNU debugger (gdb) will allow you to attach to the process and close the file handles. It's not without risk, but I've been using it successfully for years. The main drawback is how tedious it is to sift through /proc looking for the open file handles and closing them one by one with the debugger interactively.
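Done by hand, it looks roughly like this, using the first PID and file descriptor from the lsof output above (your PIDs and fd numbers will differ). close() returns 0 on success:
gdb -p 13214
(gdb) p close(520)
(gdb) detach
(gdb) quit
Now multiply that by a dozen processes, each with a handful of descriptors, and the appeal of automating it becomes obvious.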
Laziness being the father of invention, I wrote the following script to do this for you. Simply give it the path to check for open handles to deleted files, and it will find the processes, attach the debugger and close the handles.
#!/bin/bash
FS=$1
if [[ -z $FS ]]; then
    echo "Please provide a filesystem path to check for open file descriptors to deleted files"
    exit 1
fi
echo "WARNING: This is super-dangerous. Please don't use it in Prod without a"
echo "         really good reason and a change request/blackout"
read -p "Type 'C' and Enter to continue, anything else to abort: " CHECK
if [[ "$CHECK" != "C" ]]; then
    echo "Aborted by user. No changes made."
    exit 0
fi
# Get a list of processes with open file handles on the given filesystem
PIDLIST=`lsof "$FS" | egrep -v 'PID' | awk '{print $2}' | sort -u`
for PID in $PIDLIST; do
    unset DESCLIST
    # Get the file descriptors for that PID that refer to deleted files on the
    # target filesystem (the /proc symlink target still shows the old path)
    DESCLIST=`ls -l /proc/${PID}/fd | grep deleted | grep "$FS" | awk '{print $9}'`
    # Skip processes that have nothing to close
    if [[ -z $DESCLIST ]]; then continue; fi
    # Display the list
    echo "The process $PID has open file descriptors for deleted files:"
    echo "${DESCLIST}" | sed 's/^/    /g'
    # Create a name for the gdb command script
    DEBUGSCRIPT=/tmp/.$PID
    # Remove any previous version of the script
    if [[ -f $DEBUGSCRIPT ]]; then /bin/rm $DEBUGSCRIPT; fi
    # Write a close command for each deleted file descriptor
    for DFD in $DESCLIST; do
        echo "p close(${DFD})" >> $DEBUGSCRIPT
    done
    # Detach and close the debugger
    echo "detach" >> $DEBUGSCRIPT
    echo "quit" >> $DEBUGSCRIPT
    echo "Forcibly closing handles for deleted files on process $PID"
    # Run the debugger in batch mode to execute the script and close the handles
    /usr/bin/gdb --pid $PID --batch -x $DEBUGSCRIPT
    # Wait before deleting the script
    sleep 1
    # Clean up the script
    /bin/rm $DEBUGSCRIPT
done
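Assuming you save it as something like close-deleted-handles.sh (the name is just an example), a run against the filesystem above would look something like this:
./close-deleted-handles.sh /oracle/E1Q/sapdata7
lsof /oracle/E1Q/sapdata7     # should now come back empty
umount /oracle/E1Q/sapdata7
Once lsof shows nothing left on the mount, the umount goes through and you can get on with checking and resizing the filesystem.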