Heartbeat with dual SCSI config

Alan Robertson alanr@unix.sh
Wed, 06 Mar 2002 11:56:31 -0700


--------------010103000301040007020809
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Roberto Zini wrote:

> Alan Robertson wrote:
> 
>>Hi Roberto,
>>
>>I've CCed this reply to the linux-ha mailing list.
>>
>>Roberto Zini wrote:
>>
>>
>>>Hi Alan !
>>>
>>>In the past few days I tried your heartbeat solution on a couple of
>>>Linux boxes and so far I was impressed by the results I got !
>>>
>>Thanks!
>>
>>
>>>I'm trying to use the HB (heartbeat) solution so as to allow a given
>>>process (eg, Apache running several CGI scripts) to operate on a
>>>SCSI disk which is shared between the above 2 boxes. Just to provide
>>>you with some numbers, I'm using a couple of Adaptec 29160 HA (configured as
>>>ID=14 on the first box and ID=13 on the second one) whose secondary bus is
>>>connected to an external SCSI disk configured as ID=5 (the primary SCSI
>>>bus is being used for the boot/root disk).
>>>
>>>Let me preface that I'm neither a SCSI expert nor a Linux one but in
>>>my tests I've seen that both boxes are able to access the same
>>>shared HD (which has been prepared with a Linux partition) at the same
>>>time (ie, they can "mount" it without problems) which can lead to data
>>>corruption if both OSes try to write the same chunk of data.
>>>
>>>I'm wondering if there is an HW/SW solution which prevents the "failover"
>>>box (from the HB point of view) from mounting the external disk when it's
>>>already being mounted by the primary box.
>>>
>>You could write a resource script which removes the /dev entry when the
>>other side has the disk mounted.  Or you could do the equivalent at the
>>kernel level like this:
>>        echo "scsi-remove-single-device A 0 D 0" > /proc/scsi/scsi
>>to make the kernel believe the device is gone, and then also do
>>        echo "scsi-add-single-device A 0 D 0" > /proc/scsi/scsi
>>to make it come back just before takeover.
>>
> 
> HEY ! THIS IS REALLY SOMETHING !


But, I read the kernel code, and it has warnings surrounding this feature. 
I've written this as a resource script, and have attached it.  It's also in 
CVS.  It worked for me in my (very) limited testing.

READ THE WARNINGS (and tell me how it works if you try it anyway ;-))
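
For anyone who wants to see what would be sent to the kernel before trying
this on real hardware, here is a minimal dry-run sketch. It only prints the
command strings quoted above and never touches /proc/scsi/scsi. The function
name scsi_cmd is made up for this illustration and is not part of the
attached script.

```shell
#!/bin/sh
# Dry-run sketch: print the command string for a device address instead of
# writing it to /proc/scsi/scsi. scsi_cmd is a hypothetical helper name.

# usage: scsi_cmd add|remove <host>:<channel>:<target>[:<lun>]
scsi_cmd() {
  op=$1
  spec=$2
  # Split the address on ":"; anything past a fourth field lands in $extra.
  IFS=: read -r host channel target lun extra <<EOF
$spec
EOF
  if [ -z "$host" ] || [ -z "$channel" ] || [ -z "$target" ] || [ -n "$extra" ]
  then
    echo "Invalid SCSI instance $spec" >&2
    return 1
  fi
  # The LUN defaults to 0 when it is omitted, as in the attached script.
  echo "scsi-$op-single-device $host $channel $target ${lun:-0}"
}

# What "stop" and then "start" would send for host 0, channel 0, target 11:
scsi_cmd remove 0:0:11
scsi_cmd add 0:0:11
```

Redirecting that output into /proc/scsi/scsi is the step the kernel
warnings are about, so it is left out here on purpose.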

	-- Alan Robertson
	   alanr@unix.sh



--------------010103000301040007020809
Content-Type: text/plain;
 name="LinuxSCSI.in"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="LinuxSCSI.in"

#!/bin/sh
#
# $Id: LinuxSCSI.in,v 1.1 2002/03/06 18:43:20 alan Exp $
# 
# LinuxSCSI
#
# Description:	Enables/Disables SCSI devices to protect them from being
#		used by mistake
#
#
# Author:	Alan Robertson
#		Support: linux-ha-dev@lists.tummy.com
# License:	GNU Lesser General Public License (LGPL)
# Copyright:	(C) 2002 IBM
#
# CAVEATS:	See the usage message for some important warnings
#
# usage: ./LinuxSCSI <host>:<channel>:<target>[:<lun>] (start|stop|status)
#
#<host>:	Host (adapter) number of the SCSI device to query
#<channel>:	Channel (bus) number on that adapter
#<target>:	Target ID of the SCSI device under consideration
#<lun>:		LUN of the SCSI device under consideration
#			(optional)
#
#
# An example usage in /etc/ha.d/haresources: 
#       node1  10.0.0.170 LinuxSCSI::0:0:11 
#
unset LC_ALL; export LC_ALL
unset LANGUAGE; export LANGUAGE

usage() {
  cat <<-! >&2
	usage: $0 <host>:<channel>:<target>[:<lun>] (start|stop|status)

	$0 manages the availability of a SCSI device from the point
	of view of the linux kernel.  It makes Linux stop believing the
	device is there, and it makes it come back again.

	The purpose of this resource script is to keep admins from
	messing with a shared disk that is managed by the HA subsystem and
	is currently owned by the other side.

	To get maximum benefit from this feature, you should (manually)
	disable the resources on boot, and let your HA software enable
	them when it wants to acquire the disk.
	
	The kernel code says this is potentially dangerous.  DO NOT USE
	IT ON AN ACTIVE DEVICE.  If the device is inactive, this script
	will make it stay inactive, when given "stop".  If you inactivate
	the wrong device, you will have to reboot your machine, and your
	data may take a hit!

	Here are the warnings from the kernel source about the "stop"
	operation as of 2.4.10:

	Consider this feature pre-BETA.
	    CAUTION: This is not for hotplugging your peripherals. As
	    SCSI was not designed for this you could damage your
	    hardware and thoroughly confuse the SCSI subsystem.

	Similar warnings apply to the "start" operation...

	 Consider this feature BETA.
	     CAUTION: This is not for hotplugging your peripherals.
	     As SCSI was not designed for this you could damage your
	     hardware !
	However perhaps it is legal to switch on an already connected
	device. It is perhaps not guaranteed this device doesn't corrupt
	an ongoing data transfer.

	So, Caveat Emptor, and test this pre-beta feature thoroughly on
	your kernel and your configuration with real load on the SCSI
	before using it in production!

	!
  exit 1
}

zeropat="[ 0]0"

prefix=@prefix@
exec_prefix=@exec_prefix@
#. @sysconfdir@/ha.d/shellfuncs

PROCSCSI=/proc/scsi/scsi

scsi_methods() {
  cat <<-!
	start
	stop
	status
	methods
	!
}


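parseinst() {
  lun=0
  case "$1" in

    # Test the four-field form first: "*" in a case pattern also matches
    # ":", so the three-field pattern would otherwise swallow the LUN.
    [0-9]*:[0-9]*:[0-9]*:[0-9]*)
	lun=`echo "$1" | cut -d: -f4`;;

    [0-9]*:[0-9]*:[0-9]*);;

    *)	host=error
	channel=error
	target=error
	lun=error
	echo "Invalid SCSI instance $1" >&2
	# Return now so the assignments below cannot overwrite "error"
	return 1;;
  esac
  host=`echo "$1" | cut -d: -f1`
  channel=`echo "$1" | cut -d: -f2`
  target=`echo "$1" | cut -d: -f3`
}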


#
# start: Enable the given SCSI device in the kernel
#
scsi_start() {
  parseinst "$1"
  [ $target = error ] && exit 1
  echo "scsi-add-single-device $host $channel $target $lun" >>$PROCSCSI
  if
    scsi_status "$1"
  then
    return 0
  else
    echo "ERROR: SCSI device $1 not active!"
    return 1
  fi
}


#
# stop: Disable the given SCSI device in the kernel
#
scsi_stop() {
  parseinst "$1"
  [ $target = error ] && exit 1
  echo "scsi-remove-single-device $host $channel $target $lun" >>$PROCSCSI
}


#
# status: is the given device now available?
#
scsi_status() {
  parseinst "$1"
  [ $target = error ] && exit 1
  [ $channel -eq 0 ]	&& channel=$zeropat
  [ $target -eq 0 ]	&& target=$zeropat
  [ $lun -eq 0 ]	&& lun=$zeropat
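  # /proc/scsi/scsi prints these numbers zero-padded (e.g. "Id: 05"),
  # so allow leading zeros in front of each value.
  greppat="Host: *scsi$host *Channel: *0*$channel *Id: *0*$target *Lun: *0*$lun"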
  grep -i "$greppat" $PROCSCSI >/dev/null
}

if
  [ $# -eq 1 -a "X$1" = "Xmethods" ]
then
  scsi_methods
  exit $?
fi

instance=$1
# Look for the start, stop, status, or methods calls...
case "$2" in
  stop)
	scsi_stop "$instance"
	exit $?;;
  start)
	scsi_start "$instance"
	exit $?;;
  status) 
	if
	  scsi_status "$instance"
	then
	  echo "SCSI device $instance is running"
	  exit 0
	else
	  echo "SCSI device $instance is stopped"
	  exit 1
	fi;;


#
# methods: What methods do we support?
#
  methods) 
	scsi_methods
	exit $?;;

  *)
	usage
	exit 1;;

esac

exit 1

--------------010103000301040007020809--
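
The status check in the script above is a grep over /proc/scsi/scsi. Below
is a small sketch of that match run against a sample listing instead of the
live file, so it can be tried on a machine with no SCSI hardware. The sample
text (including the vendor and model strings) and the function name
scsi_present are assumptions made for this illustration. The pattern allows
leading zeros because the listing prints each number as two digits.

```shell
#!/bin/sh
# Sketch of the status match against a sample listing. The listing text and
# the name scsi_present are illustrative, not taken from the script above.

SAMPLE_LISTING='Attached devices:
Host: scsi0 Channel: 00 Id: 05 Lun: 00
  Vendor: EXAMPLE  Model: DISK             Rev: 0001
  Type:   Direct-Access                    ANSI SCSI revision: 03'

# usage: scsi_present <host> <channel> <target> <lun>   (listing on stdin)
# Succeeds if the listing has an entry for that address.
scsi_present() {
  # "0*" accepts the zero-padded form; the trailing "$" anchors the LUN
  # so that, for example, LUN 1 cannot match "Lun: 10".
  pat="Host: *scsi$1 *Channel: *0*$2 *Id: *0*$3 *Lun: *0*$4 *\$"
  grep "$pat" >/dev/null
}

if printf '%s\n' "$SAMPLE_LISTING" | scsi_present 0 0 5 0
then
  echo "SCSI device 0:0:5 is running"
else
  echo "SCSI device 0:0:5 is stopped"
fi
```

To check a real machine, feed it the live file instead, for example
scsi_present 0 0 5 0 </proc/scsi/scsi. Reading the file is harmless; only
the add and remove writes carry the kernel warnings.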