Resent-Date: Fri, 18 Sep 1998 05:47:34 +0200 (MET DST)
Sender: schmitz@lbl.gov
Date: Thu, 17 Sep 1998 20:47:28 -0700
From: Michael Schmitz <schmitz@lcbvax.cchem.berkeley.edu>
Reply-To: MSchmitz@lbl.gov
Organization: Tinoco Lab, UC Berkeley / Lawrence Berkeley Laboratory
To: linux-m68k@lists.linux-m68k.org, linux-mac68k@baltimore.wwaves.com,
        alan@cymru.net, Jes.Sorensen@cern.ch
Subject: Patch for generic NCR5380 SCSI driver
Resent-From: linux-m68k@phil.uni-sb.de

This is a multi-part message in MIME format.
--------------6344831F8616DAB62AE38F22
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Hi,

Attached is my diff for the generic NCR5380 driver to fix the nasty
reselect/select races we had on the Mac. The bulk of the patch adds
if (hostdata->connected) return -1; and similar tests for reselection
everywhere in the select code where it seemed necessary. I've copied
that part from the good old Atari driver (we know that one works fine).
Another part implements the abort and reset code the way it's used in
the Atari driver. I'm not sure that's really necessary: the problem in
the Atari driver was that commands timing out while not on any of the
queues, with the coroutine not running, would have had to reacquire the
ST-DMA lock from interrupt context.
If that was the _only_ reason why we can't use the simple reset code in
the driver, that part of the diff shouldn't be applied.
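
To illustrate the pattern of those reselection tests, here is a minimal
userland sketch. It is not the driver code: struct hostdata_sketch,
select_sketch() and the three "stages" are made up for illustration, and
the reselection interrupt is simulated by setting the pointer at a chosen
stage. The real NCR5380_select also resets MODE_REG and friends before
bailing out.

```c
#include <stddef.h>

/* Hypothetical, stripped-down host state; only 'connected' matters here. */
struct hostdata_sketch {
    void *volatile connected;   /* non-NULL once a target has reselected us */
};

/* Simplified select: arbitrate, assert SEL, wait for BSY. */
enum { SELECT_STAGES = 3 };

/*
 * Simulated select. 'reselect_before_stage' is the 0-based stage at which
 * a reselection "interrupt" arrives; pass -1 for none. Returns 0 if the
 * select ran to completion, -1 if it backed off because of a reselection.
 * '*stages_run' reports how many stages were actually executed.
 */
static int select_sketch(struct hostdata_sketch *h,
                         int reselect_before_stage, int *stages_run)
{
    static int dummy_nexus;     /* stand-in for the reconnected command */
    int stage;

    *stages_run = 0;
    for (stage = 0; stage < SELECT_STAGES; stage++) {
        if (stage == reselect_before_stage)
            h->connected = &dummy_nexus;    /* what the IRQ handler would do */

        /* The test the patch adds before each step of the select. */
        if (h->connected)
            return -1;

        (*stages_run)++;
    }
    return 0;
}
```

With no reselection the sketch runs all three stages and returns 0; with
a reselection arriving before stage 1 it stops after one stage and
returns -1, leaving the reconnected command to be serviced by the caller.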

I've added two things to the coroutine: a test for reselection if the
select failed (we don't need to waste time searching the issue queue to
the end and failing each select here), and two lines to prevent
restarting the search of the issue queue from the head of the queue
after each failed select (we loop over the first two entries in the
queue forever otherwise). The second is absolutely required; the first
is merely a speedup.
Tested on Mac. From what I can see it should not hurt other machines,
but it probably needs more testing.
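
The queue problem can be reproduced outside the kernel. The following is
a rough sketch under simplifying assumptions: struct cmd and
scan_issue_queue() are hypothetical names, 'next' stands in for
host_scribble, every select is assumed to fail, and max_attempts exists
only so that the unfixed variant terminates.

```c
#include <stddef.h>

/* Hypothetical stand-in for Scsi_Cmnd; 'next' plays host_scribble. */
struct cmd {
    int id;
    struct cmd *next;
};

/*
 * Walk the issue queue and try to "select" each command. Every select
 * fails here, so each command is put back at the head of the queue, as
 * the driver does. Returns the number of select attempts, capped at
 * max_attempts.
 */
static int scan_issue_queue(struct cmd **queue, int apply_fix,
                            int max_attempts)
{
    struct cmd *tmp, *prev;
    int attempts = 0;

    for (tmp = *queue, prev = NULL; tmp; prev = tmp, tmp = tmp->next) {
        if (attempts >= max_attempts)
            break;

        /* Unlink tmp from the queue before attempting the select. */
        if (prev)
            prev->next = tmp->next;
        else
            *queue = tmp->next;
        tmp->next = NULL;

        attempts++;             /* the select is attempted and fails */

        /* Failed select: reinsert the command at the head. */
        tmp->next = *queue;
        *queue = tmp;

        /*
         * The two lines from the patch: continue after prev instead of
         * after the new head, so the search does not start over.
         */
        if (apply_fix && prev)
            tmp = prev;
    }
    return attempts;
}
```

For a queue of three commands, the fixed variant makes three attempts
and stops, while the unfixed one bounces between the first two entries
until it hits the cap.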

The patch is relative to 2.1.115-mac but should apply with only minor
problems.

Trying to integrate this with the mainstream source raises the question
of how to handle the different requirements of m68k and other archs WRT
use of sti() vs. restore_flags() and similar ... currently the m68k (Mac)
specific stuff is enclosed in #ifdef CONFIG_MAC but that won't please
everybody.
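
For anyone not familiar with the difference: sti() enables interrupts
unconditionally, while restore_flags() puts back whatever state
save_flags() recorded. The userland sketch below only models that with a
plain variable; the sim_* helpers and leave_with_*() functions are
invented for illustration and are not kernel interfaces.

```c
/* Simulated CPU interrupt-enable bit (1 = enabled). Illustration only. */
static int irq_enabled = 1;

static unsigned long sim_save_flags(void)        { return (unsigned long)irq_enabled; }
static void sim_cli(void)                        { irq_enabled = 0; }
static void sim_sti(void)                        { irq_enabled = 1; }
static void sim_restore_flags(unsigned long f)   { irq_enabled = (int)f; }

/* Critical section that leaves via sti(): always ends with IRQs enabled. */
static int leave_with_sti(int enabled_on_entry)
{
    unsigned long flags;

    irq_enabled = enabled_on_entry;
    flags = sim_save_flags();
    sim_cli();
    /* ... critical section ... */
    (void)flags;                /* saved state is ignored on this path */
    sim_sti();
    return irq_enabled;
}

/* Critical section that leaves via restore_flags(): ends as it began. */
static int leave_with_restore(int enabled_on_entry)
{
    unsigned long flags;

    irq_enabled = enabled_on_entry;
    flags = sim_save_flags();
    sim_cli();
    /* ... critical section ... */
    sim_restore_flags(flags);
    return irq_enabled;
}
```

The two only differ when the caller entered with interrupts already
disabled: leave_with_sti(0) ends with interrupts enabled, while
leave_with_restore(0) leaves them disabled.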

	Michael

P.S.: I hope Netscrap doesn't garble the attachment, I can't simply
include it anymore :-(
--------------6344831F8616DAB62AE38F22
Content-Type: text/plain; charset=us-ascii; name="NCR5380-2.1.115-mac.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="NCR5380-2.1.115-mac.diff"

--- drivers/scsi/NCR5380.c.org	Thu Sep  3 20:38:21 1998
+++ drivers/scsi/NCR5380.c	Wed Sep 16 22:57:42 1998
@@ -1370,9 +1370,14 @@
 							tmp->host_scribble = (unsigned char *)
 							    hostdata->issue_queue;
 							hostdata->issue_queue = tmp;
+							/*
+							 * prevent reset of issue queue search 
+							 */
+							 if (prev)
+								tmp = prev;
 							done = 0;
 #ifdef CONFIG_MAC
-							sti();
+							restore_flags(flags); /*seems OK, was: sti();*/
 #else
 							restore_flags(flags);
 #endif /* CONFIG_MAC */
@@ -1380,6 +1385,21 @@
 							printk("scsi%d : main(): select() failed, returned to issue_queue\n",
 							       instance->host_no);
 #endif
+							/* 
+							 * MSch: reselects happening between testing for hostdata->connected 
+							 * and successful arbitration by NCR5380_select abort the selection
+							 * above and reinsert the attempted command to the head of the issue
+							 * queue. Fine. But that also restarted the search at the head of the 
+							 * issue queue, attempting the next select, failing again etc. while 
+							 * the Right Thing would be to service the reconnected command.
+							 */
+							if (hostdata->connected) {
+#if (NDEBUG & NDEBUG_MAIN)
+							  printk("scsi%d: main(): reselection (%d) while select (%d) attempted\n", 
+								 instance->host_no, hostdata->connected->target, tmp->target);
+#endif
+							  break;
+							}
 						}
 					}	/* if target/lun is not busy */
 				}	/* for */
@@ -1484,6 +1504,9 @@
 					if ((NCR5380_read(STATUS_REG) & (SR_SEL | SR_IO)) ==
 					    (SR_SEL | SR_IO)) {
 						done = 0;
+#ifdef CONFIG_MAC
+						ENABLE_IRQ(); /*required for level triggered ints; was: sti();*/
+#else
 						restore_flags(flags);
 #endif /* CONFIG_MAC */
 #if (NDEBUG & NDEBUG_INTR)
@@ -1623,6 +1646,17 @@
  *
  *      If failed (no target) : cmd->scsi_done() will be called, and the 
  *              cmd->result host byte set to DID_BAD_TARGET.
+ *
+ *	MSch 980914: since NCR5380_main enables interrupts before selection
+ *		(we need the timer int.) it is possible that a reselection 
+ *		happens while NCR5380_select is arbitrating for the bus or 
+ *		selecting the target. Numerous steps in this function are 
+ *		run without timeouts, so commands hanging on select will 
+ *		time out in the midlevel code if lucky. 
+ *		Roman Hodek fixed this in the Atari driver, and I've 
+ *		experienced driver hangup due to reselect while selecting
+ *		on the Mac, so I've adopted the Atari fixes for the generic
+ *		driver.
  */
 static int NCR5380_select(struct Scsi_Host *instance, Scsi_Cmnd * cmd, int tag) {
 	NCR5380_local_declare();
@@ -1643,6 +1677,10 @@
 	}
 #endif
 
+	if (hostdata->connected) {
+	  	return -1;
+	}
+
 	 hostdata->restart_select = 0;
 #if defined (NDEBUG) && (NDEBUG & NDEBUG_ARBITRATION)
 	 NCR5380_print(instance);
@@ -1652,6 +1690,10 @@
 	 save_flags(flags);
 	 cli();
 
+	 if (hostdata->connected) {
+		restore_flags(flags);
+	  	return -1;
+	}
 	/* 
 	 * Set the phase bits to 0, otherwise the NCR5380 won't drive the 
 	 * data bus during SELECTION.
@@ -1684,7 +1726,8 @@
 		}
 	}
 #else				/* NCR_TIMEOUT */
-	while (!(NCR5380_read(INITIATOR_COMMAND_REG) & ICR_ARBITRATION_PROGRESS));
+	while (!(NCR5380_read(INITIATOR_COMMAND_REG) & ICR_ARBITRATION_PROGRESS)
+	       && !hostdata->connected);
 #endif
 
 #if (NDEBUG & NDEBUG_ARBITRATION)
@@ -1693,6 +1736,11 @@
 	 __asm__("nop");
 #endif
 
+	if (hostdata->connected) {
+	  NCR5380_write(MODE_REG, MR_BASE); 
+	  return -1;
+	}
+
 	/* 
 	 * The arbitration delay is 2.2us, but this is a minimum and there is 
 	 * no maximum so we can safely sleep for ceil(2.2) usecs to accommodate
@@ -1704,8 +1752,9 @@
 
 	/* Check for lost arbitration */
 	if ((NCR5380_read(INITIATOR_COMMAND_REG) & ICR_ARBITRATION_LOST) ||
-	     (NCR5380_read(CURRENT_SCSI_DATA_REG) & hostdata->id_higher_mask) ||
-	  (NCR5380_read(INITIATOR_COMMAND_REG) & ICR_ARBITRATION_LOST)) {
+	    (NCR5380_read(CURRENT_SCSI_DATA_REG) & hostdata->id_higher_mask) ||
+	    (NCR5380_read(INITIATOR_COMMAND_REG) & ICR_ARBITRATION_LOST) || 
+	    hostdata->connected) {
 		NCR5380_write(MODE_REG, MR_BASE);
 #if (NDEBUG & NDEBUG_ARBITRATION)
 		printk("scsi%d : lost arbitration, deasserting MR_ARBITRATE\n",
@@ -1713,14 +1762,21 @@
 #endif
 		return -1;
 	}
-	NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE | ICR_ASSERT_SEL);
+	/* 
+	 * MSch: need to assert BSY while selecting, otherwise a target might 
+	 * detect bus free and burst in after we won arbitration
+	 * Fix by Roman Hodek; see atari_NCR5380.c 
+	 */
+	NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE | ICR_ASSERT_SEL |
+		                             ICR_ASSERT_BSY );
 
 	if (!(hostdata->flags & FLAG_DTC3181E) &&
 		/* RvC: DTC3181E has some trouble with this
 		 *	so we simply removed it. Seems to work with
 		 *	only Mustek scanner attached
 		 */
-		(NCR5380_read(INITIATOR_COMMAND_REG) & ICR_ARBITRATION_LOST)) 
+		((NCR5380_read(INITIATOR_COMMAND_REG) & ICR_ARBITRATION_LOST) ||
+	         hostdata->connected)) 
 	{
 		NCR5380_write(MODE_REG, MR_BASE);
 		NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE);
@@ -1737,6 +1793,12 @@
 
 	udelay(2);
 
+	if (hostdata->connected) {
+	  NCR5380_write(MODE_REG, MR_BASE);
+	  NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE);
+	  return -1;
+	}
+
 #if (NDEBUG & NDEBUG_ARBITRATION)
 	printk("scsi%d : won arbitration\n", instance->host_no);
 #endif
@@ -1759,6 +1821,11 @@
 		     ICR_ASSERT_DATA | ICR_ASSERT_ATN | ICR_ASSERT_SEL));
 	NCR5380_write(MODE_REG, MR_BASE);
 
+	if (hostdata->connected) {
+	  NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE);
+	  return -1;
+	}
+
 	/* 
 	 * Reselect interrupts must be turned off prior to the dropping of BSY,
 	 * otherwise we will trigger an interrupt.
@@ -1834,9 +1901,12 @@
 	hostdata->selecting = 0; /* clear this pointer, because we passed the
 				waiting period */
 #else
+#if 0   /* MSch: testing */
 	while ((jiffies < timeout) && !(NCR5380_read(STATUS_REG) &
 					(SR_BSY | SR_IO)));
 #endif
+#endif
+#if 0   /* MSch: testing */
 	if ((NCR5380_read(STATUS_REG) & (SR_SEL | SR_IO)) ==
 	    (SR_SEL | SR_IO)) {
 		NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE);
@@ -1846,6 +1916,16 @@
 		NCR5380_write(SELECT_ENABLE_REG, hostdata->id_mask);
 		return -1;
 	}
+#else
+	/* MSch: fix from Atari driver */
+	/* ++roman: If a target conformed to the SCSI standard, it wouldn't assert
+	 * IO while SEL is true. But again, there are some disks out there in the
+	 * world that do that nevertheless. (Somebody claimed that this announces
+	 * reselection capability of the target.) So we better skip that test and
+	 * only wait for BSY... (Famous german words: Der Kluegere gibt nach :-)
+	 */
+	while ((jiffies < timeout) && !(NCR5380_read(STATUS_REG) & SR_BSY));
+#endif
 	/* 
 	 * No less than two deskew delays after the initiator detects the 
 	 * BSY signal is true, it shall release the SEL signal and may 
@@ -1919,7 +1999,12 @@
 	printk("scsi%d : target %d selected, going into MESSAGE OUT phase.\n",
 	       instance->host_no, cmd->target);
 #endif
+#ifdef CONFIG_MAC
+	/* MSch: Mac kludge, prohibit disconnects if can_queue <= 2 */
+	tmp[0] = IDENTIFY(( ((instance->irq == IRQ_NONE) || (instance->can_queue < 3)) ? 0 : 1), cmd->lun);
+#else
 	tmp[0] = IDENTIFY(((instance->irq == IRQ_NONE) ? 0 : 1), cmd->lun);
+#endif
 #ifdef SCSI2
 	if (cmd->device->tagged_queue && (tag != TAG_NONE)) {
 		tmp[1] = SIMPLE_QUEUE_TAG;
@@ -2161,15 +2246,18 @@
 	 udelay(25);
 	 NCR5380_write(INITIATOR_COMMAND_REG, ICR_BASE);
 	 restore_flags(flags);
-}				/*
+}
+/*
+ * Function : do_abort (Scsi_Host *host)
+ * 
+ * Purpose : abort the currently established nexus.  Should only be 
+ *      called from a routine which can drop into a 
+ * 
+ * Returns : 0 on success, -1 on failure.
+ */ 
+
+static int do_abort(struct Scsi_Host *host) {
 
-				 * Function : do_abort (Scsi_Host *host)
-				 * 
-				 * Purpose : abort the currently established nexus.  Should only be 
-				 *      called from a routine which can drop into a 
-				 * 
-				 * Returns : 0 on success, -1 on failure.
-				 */ static int do_abort(struct Scsi_Host *host) {
 	NCR5380_local_declare();
 	unsigned char tmp, *msgptr, phase;
 	int len;
@@ -3285,6 +3373,9 @@
  *       connected, you have to wait for it to complete.  If this is 
  *       a problem, we could implement longjmp() / setjmp(), setjmp()
  *       called where the loop started in NCR5380_main().
+ *
+ * XXX - MSch: I've changed the abort and reset code to work like the code
+ *       in the Atari NCR5380 driver, the old reset code was broken there.
  */
 
 #ifndef NCR5380_abort
@@ -3303,10 +3394,6 @@
 
 	NCR5380_print_status(instance);
 
-	printk("scsi%d : aborting command\n", instance->host_no);
-	print_Scsi_Cmnd(cmd);
-
-	NCR5380_print_status(instance);
 
 	save_flags(flags);
 	cli();
@@ -3335,7 +3422,9 @@
  * into BUS FREE.
  */
 
+#if 0
 		NCR5380_write(INITIATOR_COMMAND_REG, ICR_ASSERT_ATN);
+#endif
 /* 
  * Since we can't change phases until we've completed the current 
  * handshake, we have to source or sink a byte of data if the current
@@ -3346,8 +3435,19 @@
  * Return control to the executing NCR drive so we can clear the
  * aborted flag and get back into our main loop.
  */
-
-		return 0;
+		if (do_abort(instance) == 0) {
+			hostdata->aborted = 1;
+			hostdata->connected = NULL;
+			cmd->result = DID_ABORT << 16;
+			hostdata->busy[cmd->target] &= ~(1 << cmd->lun);
+			restore_flags(flags);
+			cmd->scsi_done(cmd);
+			return SCSI_ABORT_SUCCESS;
+		} else {
+/*			restore_flags(flags); */
+			printk("scsi%d: abort of connected command failed!\n", instance->host_no);
+			return SCSI_ABORT_ERROR;
+		}
 	}
 #endif
 
@@ -3373,7 +3473,7 @@
 			printk("scsi%d : abort removed command from issue queue.\n",
 			       instance->host_no);
 #endif
-			tmp->done(tmp);
+			tmp->scsi_done(tmp);
 			return SCSI_ABORT_SUCCESS;
 		}
 #if (NDEBUG  & NDEBUG_ABORT)
@@ -3398,7 +3498,7 @@
 #if (NDEBUG & NDEBUG_ABORT)
 		printk("scsi%d : abort failed, command connected.\n", instance->host_no);
 #endif
-		return SCSI_ABORT_NOT_RUNNING;
+		return SCSI_ABORT_SNOOZE;	/* was: SCSI_ABORT_NOT_RUNNING */
 	}
 /*
  * Case 4: If the command is currently disconnected from the bus, and 
@@ -3452,8 +3552,9 @@
 					*prev = (Scsi_Cmnd *) tmp->host_scribble;
 					tmp->host_scribble = NULL;
 					tmp->result = DID_ABORT << 16;
+					hostdata->busy[cmd->target] &= ~(1 << cmd->lun);
 					restore_flags(flags);
-					tmp->done(tmp);
+					tmp->scsi_done(tmp);
 					return SCSI_ABORT_SUCCESS;
 				}
 		}
@@ -3481,6 +3582,10 @@
  *
  * Returns : SCSI_RESET_WAKEUP
  *
+ *	MSch: commands affected by a bus reset are subject to timeout, but the
+ *	midlevel code (using old error handling code) doesn't disable timeouts
+ *	immediately. Plus do_reset just resets the bus, without inserting connected
+ *	and disconnected commands into the issue queue. 
  */
 
 #ifndef NCR5380_reset
@@ -3488,10 +3593,88 @@
 #endif
 int NCR5380_reset(Scsi_Cmnd * cmd, unsigned int dummy) {
 	NCR5380_local_declare();
+	struct Scsi_Host *instance;
+	struct NCR5380_hostdata *hostdata;
+	int           i;
+	unsigned long flags;
+	Scsi_Cmnd *connected, *disconnected_queue;
+
 	NCR5380_setup(cmd->host);
 
+	instance = cmd->host;
+	hostdata = (struct NCR5380_hostdata *) instance->hostdata;
+
 	NCR5380_print_status(cmd->host);
+
+#if 0	/* old code */
 	do_reset(cmd->host);
 
 	return SCSI_RESET_WAKEUP;
+#else
+	/* get in phase */
+	NCR5380_write( TARGET_COMMAND_REG,
+		      PHASE_SR_TO_TCR( NCR5380_read(STATUS_REG) ));
+	/* assert RST */
+	NCR5380_write( INITIATOR_COMMAND_REG, ICR_BASE | ICR_ASSERT_RST );
+	udelay (40);
+	/* reset NCR registers */
+	NCR5380_write( INITIATOR_COMMAND_REG, ICR_BASE );
+	NCR5380_write( MODE_REG, MR_BASE );
+	NCR5380_write( TARGET_COMMAND_REG, 0 );
+	NCR5380_write( SELECT_ENABLE_REG, 0 );
+	/* ++roman: reset interrupt condition! otherwise no interrupts get
+	 * through anymore ... */
+	(void)NCR5380_read( RESET_PARITY_INTERRUPT_REG );
+
+	/* XXX Should now be done by midlevel code, but it's broken XXX */
+	/* XXX see below                                            XXX */
+
+	/* MSch: old-style reset: actually abort all command processing here */
+
+	/* After the reset, there are no more connected or disconnected commands
+	 * and no busy units; to avoid problems with re-inserting the commands
+	 * into the issue_queue (via scsi_done()), the aborted commands are
+	 * remembered in local variables first.
+	 */
+	save_flags(flags);
+	cli();
+	connected = (Scsi_Cmnd *)hostdata->connected;
+	hostdata->connected = NULL;
+	disconnected_queue = (Scsi_Cmnd *)hostdata->disconnected_queue;
+	hostdata->disconnected_queue = NULL;
+
+	for( i = 0; i < 8; ++i )
+	  hostdata->busy[i] = 0;
+#ifdef REAL_DMA
+	hostdata->dma_len = 0;
+#endif
+	restore_flags(flags);
+
+	/* In order to tell the mid-level code which commands were aborted, 
+	 * set the command status to DID_RESET and call scsi_done() !!!
+	 * This ultimately aborts processing of these commands in the mid-level.
+	 */
+
+	if ((cmd = connected)) {
+	  printk("scsi%d: reset aborted a connected command\n", (cmd)->host->host_no);
+	  cmd->result = (cmd->result & 0xffff) | (DID_RESET << 16);
+	  cmd->scsi_done( cmd );
+	}
+
+	for (i = 0; (cmd = disconnected_queue); ++i) {
+	  disconnected_queue = (Scsi_Cmnd *) cmd->host_scribble;
+	  cmd->host_scribble = NULL;
+	  cmd->result = (cmd->result & 0xffff) | (DID_RESET << 16);
+	  cmd->scsi_done( cmd );
+	}
+	if (i > 0)
+	  printk("scsi: reset aborted %d disconnected command(s)\n", i);
+
+	/* since all commands have been explicitly terminated, we need to tell
+	 * the midlevel code that the reset was SUCCESSFUL, and there is no 
+	 * need to 'wake up' the commands by a request_sense
+	 */
+	return SCSI_RESET_SUCCESS | SCSI_RESET_BUS_RESET;
+
+#endif
 }

--------------6344831F8616DAB62AE38F22--

