LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
@ 2004-05-22 19:07 Lorenzo Allegrucci
  2004-05-22 19:21 ` Andrew Morton
  2004-05-23  8:27 ` Jens Axboe
  0 siblings, 2 replies; 17+ messages in thread
From: Lorenzo Allegrucci @ 2004-05-22 19:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel


I get a 100% reproducible oops mounting an ext3 or reiserfs
partition with -o barrier enabled.
Hand written oops (for ext3):

hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
hda: drive_cmd: error=0x04 { DriveStatusError }
end_request: I/O error, dev hda, sector 84666781
------------[ cut here ]------------
kernel BUG at include/linux/blkdev.h:576!
invalid operand: 0000 [#1]
PREEMPT
Modules linked in: lp emu10k1 sound soundcore ac97_codec unix
CPU:    0
EIP:    0060:[<c02125ae>]    Not tainted VLI
EFLAGS: 00010046   (2.6.6-mm5)
EIP is at __ide_end_request+0xbe/0x110
eax: 00000e7a   ebx: da88636c   ecx: 00000000   edx: da88636c
esi: 00000000   edi: c038180c   ebp: 00000008   esp: c0368ec4
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c0368000 task=c02d1a40)
Stack: 00000001 00000008 da88636c 050be99d 00000000 c02128fc 00000008 c0116718
       00000000 c036bb01 00000246 00000001 00000000 c038180c 00000286 c0368000
       dffdfa78 c038180c c0212a79 c02abc65 c0381760 042aa50a 00000051 dffdf951
Call Trace:
 [<c02128fc>] ide_complete_barrier+0xec/0x160
 [<c0116718>] release_console_sem+0xc8/0xe0
 [<c0212a79>] ide_end_drive_cmd+0x109/0x370
 [<c021d54c>] idedisk_error+0x6c/0x1f0
 [<c02024d2>] end_that_request_last+0x52/0xb0
 [<c021258c>] __ide_end_request+0x9c/0x110
 [<c01fefc2>] elv_queue_empty+0x12/0x20
 [<c0213548>] ide_do_request+0x78/0x390
 [<c0213059>] drive_cmd_intr+0x79/0xd0
 [<c0213cd3>] ide_intr+0xd3/0x170
 [<c0212fe0>] drive_cmd_intr+0x0/0xd0
 [<c0106f60>] handle_IRQ_event+0x30/0x60
 [<c01072b0>] do_IRQ+0xc0/0x1a0
========================
 [<c01058c8>] common_interrupt+0x18/0x20
 [<c01038d3>] default_idle+0x23/0x30
 [<c010393c>] cpu_idle+0x2c/0x40
 [<c03455d9>] start_kernel+0x179/0x1c0
 [<c0345320>] unknown_bootoption+0x0/0x110

Code: 75 1d 89 d8 e8 f4 fe fe ff 8b 47 70 8b 40 08 c7 40 20 00 00 00 00 c7 04
 <0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing


hda is a Maxtor 6Y060L0

-- 
Lorenzo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
  2004-05-22 19:07 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier Lorenzo Allegrucci
@ 2004-05-22 19:21 ` Andrew Morton
  2004-05-22 21:20   ` Jens Axboe
  2004-05-23  8:27 ` Jens Axboe
  1 sibling, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2004-05-22 19:21 UTC (permalink / raw)
  To: Lorenzo Allegrucci; +Cc: linux-kernel, Jens Axboe

Lorenzo Allegrucci <l_allegrucci@despammed.com> wrote:
>
> 
> I get a 100% reproducible oops mounting an ext3 or reiserfs
> partition with -o barrier enabled.
> Hand written oops (for ext3):

That's a lot of hand-writing.  Thanks for doing that.  You can usually omit
the hex numbers in [brackets] when doing this.

The crash is here:

static inline void blkdev_dequeue_request(struct request *req)
{
	BUG_ON(list_empty(&req->queuelist));

perhaps related to that I/O error sending the code through less-tested
paths.


> hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
> hda: drive_cmd: error=0x04 { DriveStatusError }
> end_request: I/O error, dev hda, sector 84666781
> ------------[ cut here ]------------
> kernel BUG at include/linux/blkdev.h:576!
> invalid operand: 0000 [#1]
> PREEMPT
> Modules linked in: lp emu10k1 sound soundcore ac97_codec unix
> CPU:    0
> EIP:    0060:[<c02125ae>]    Not tainted VLI
> EFLAGS: 00010046   (2.6.6-mm5)
> EIP is at __ide_end_request+0xbe/0x110
> eax: 00000e7a   ebx: da88636c   ecx: 00000000   edx: da88636c
> esi: 00000000   edi: c038180c   ebp: 00000008   esp: c0368ec4
> ds: 007b   es: 007b   ss: 0068
> Process swapper (pid: 0, threadinfo=c0368000 task=c02d1a40)
> Stack: 00000001 00000008 da88636c 050be99d 00000000 c02128fc 00000008 c0116718
>        00000000 c036bb01 00000246 00000001 00000000 c038180c 00000286 c0368000
>        dffdfa78 c038180c c0212a79 c02abc65 c0381760 042aa50a 00000051 dffdf951
> Call Trace:
>  [<c02128fc>] ide_complete_barrier+0xec/0x160
>  [<c0116718>] release_console_sem+0xc8/0xe0
>  [<c0212a79>] ide_end_drive_cmd+0x109/0x370
>  [<c021d54c>] idedisk_error+0x6c/0x1f0
>  [<c02024d2>] end_that_request_last+0x52/0xb0
>  [<c021258c>] __ide_end_request+0x9c/0x110
>  [<c01fefc2>] elv_queue_empty+0x12/0x20
>  [<c0213548>] ide_do_request+0x78/0x390
>  [<c0213059>] drive_cmd_intr+0x79/0xd0
>  [<c0213cd3>] ide_intr+0xd3/0x170
>  [<c0212fe0>] drive_cmd_intr+0x0/0xd0
>  [<c0106f60>] handle_IRQ_event+0x30/0x60
>  [<c01072b0>] do_IRQ+0xc0/0x1a0
> ========================
>  [<c01058c8>] common_interrupt+0x18/0x20
>  [<c01038d3>] default_idle+0x23/0x30
>  [<c010393c>] cpu_idle+0x2c/0x40
>  [<c03455d9>] start_kernel+0x179/0x1c0
>  [<c0345320>] unknown_bootoption+0x0/0x110
> 
> Code: 75 1d 89 d8 e8 f4 fe fe ff 8b 47 70 8b 40 08 c7 40 20 00 00 00 00 c7 04
>  <0>Kernel panic: Fatal exception in interrupt
> In interrupt handler - not syncing
> 
> 
> hda is a Maxtor 6Y060L0
> 
> -- 
> Lorenzo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
  2004-05-22 19:21 ` Andrew Morton
@ 2004-05-22 21:20   ` Jens Axboe
  2004-05-22 21:30     ` Jens Axboe
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2004-05-22 21:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Lorenzo Allegrucci, linux-kernel

On Sat, May 22 2004, Andrew Morton wrote:
> Lorenzo Allegrucci <l_allegrucci@despammed.com> wrote:
> >
> > 
> > I get a 100% reproducible oops mounting an ext3 or reiserfs
> > partition with -o barrier enabled.
> > Hand written oops (for ext3):
> 
> That's a lot of hand-writing.  Thanks for doing that.  You can usually omit
> the hex numbers in [brackets] when doing this.
> 
> The crash is here:
> 
> static inline void blkdev_dequeue_request(struct request *req)
> {
> 	BUG_ON(list_empty(&req->queuelist));
> 
> perhaps related to that I/O error sending the code through less-tested
> paths.

Ouch indeed, I'll get that fixed up first thing in the morning.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
  2004-05-22 21:20   ` Jens Axboe
@ 2004-05-22 21:30     ` Jens Axboe
  2004-05-23  8:58       ` Lorenzo Allegrucci
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2004-05-22 21:30 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Lorenzo Allegrucci, linux-kernel

On Sat, May 22 2004, Jens Axboe wrote:
> On Sat, May 22 2004, Andrew Morton wrote:
> > Lorenzo Allegrucci <l_allegrucci@despammed.com> wrote:
> > >
> > > 
> > > I get a 100% reproducible oops mounting an ext3 or reiserfs
> > > partition with -o barrier enabled.
> > > Hand written oops (for ext3):
> > 
> > That's a lot of hand-writing.  Thanks for doing that.  You can usually omit
> > the hex numbers in [brackets] when doing this.
> > 
> > The crash is here:
> > 
> > static inline void blkdev_dequeue_request(struct request *req)
> > {
> > 	BUG_ON(list_empty(&req->queuelist));
> > 
> > perhaps related to that I/O error sending the code through less-tested
> > paths.
> 
> Ouch indeed, I'll get that fixed up first thing in the morning.

Can you test this work-around? The work-around should be perfectly safe,
this is just a case where a BUG_ON() does more harm than good :-)

--- drivers/ide/ide-io.c~	2004-05-21 11:02:58.000000000 +0200
+++ drivers/ide/ide-io.c	2004-05-22 23:28:37.692944185 +0200
@@ -291,6 +291,8 @@
 		sector = real_rq->hard_sector;
 
 	bad_sectors = real_rq->hard_nr_sectors - good_sectors;
+	/* work-around, make sure request is on queue */
+	elv_requeue_request(drive->queue, real_rq);
 	if (good_sectors)
 		__ide_end_request(drive, real_rq, 1, good_sectors);
 	if (bad_sectors)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
  2004-05-22 19:07 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier Lorenzo Allegrucci
  2004-05-22 19:21 ` Andrew Morton
@ 2004-05-23  8:27 ` Jens Axboe
  2004-05-23 10:37   ` Kurt Garloff
  1 sibling, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2004-05-23  8:27 UTC (permalink / raw)
  To: Lorenzo Allegrucci; +Cc: Andrew Morton, linux-kernel

On Sat, May 22 2004, Lorenzo Allegrucci wrote:
> 
> I get a 100% reproducible oops mounting an ext3 or reiserfs
> partition with -o barrier enabled.
> Hand written oops (for ext3):
> 
> hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
> hda: drive_cmd: error=0x04 { DriveStatusError }
> end_request: I/O error, dev hda, sector 84666781
> ------------[ cut here ]------------
> kernel BUG at include/linux/blkdev.h:576!
> invalid operand: 0000 [#1]
> PREEMPT
> Modules linked in: lp emu10k1 sound soundcore ac97_codec unix
> CPU:    0
> EIP:    0060:[<c02125ae>]    Not tainted VLI
> EFLAGS: 00010046   (2.6.6-mm5)
> EIP is at __ide_end_request+0xbe/0x110
> eax: 00000e7a   ebx: da88636c   ecx: 00000000   edx: da88636c
> esi: 00000000   edi: c038180c   ebp: 00000008   esp: c0368ec4
> ds: 007b   es: 007b   ss: 0068
> Process swapper (pid: 0, threadinfo=c0368000 task=c02d1a40)
> Stack: 00000001 00000008 da88636c 050be99d 00000000 c02128fc 00000008 c0116718
>        00000000 c036bb01 00000246 00000001 00000000 c038180c 00000286 c0368000
>        dffdfa78 c038180c c0212a79 c02abc65 c0381760 042aa50a 00000051 dffdf951
> Call Trace:
>  [<c02128fc>] ide_complete_barrier+0xec/0x160

So here's a complete patch, that both fixes the code bug and adds some
support for (hopefully) fool proof checking proper flush support in the
drive. Can you give this a test, please? Post the dmesg ide boot
messages and whether it works, thanks.

diff -urp -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.6.6-mm5/drivers/ide/ide-disk.c linux-2.6.6-mm5/drivers/ide/ide-disk.c
--- /opt/kernel/linux-2.6.6-mm5/drivers/ide/ide-disk.c	2004-05-23 09:56:20.098457257 +0200
+++ linux-2.6.6-mm5/drivers/ide/ide-disk.c	2004-05-23 10:23:36.211541946 +0200
@@ -1392,13 +1392,6 @@ static int set_nowerr(ide_drive_t *drive
 	return 0;
 }
 
-/* check if CACHE FLUSH (EXT) command is supported (bits defined in ATA-6) */
-#define ide_id_has_flush_cache(id)	((id)->cfs_enable_2 & 0x3000)
-
-/* some Maxtor disks have bit 13 defined incorrectly so check bit 10 too */
-#define ide_id_has_flush_cache_ext(id)	\
-	(((id)->cfs_enable_2 & 0x2400) == 0x2400)
-
 static int write_cache (ide_drive_t *drive, int arg)
 {
 	ide_task_t args;
@@ -1597,6 +1590,7 @@ static void idedisk_setup (ide_drive_t *
 {
 	struct hd_driveid *id = drive->id;
 	unsigned long long capacity;
+	int barrier = 0;
 
 	idedisk_add_settings(drive);
 
@@ -1729,8 +1723,29 @@ static void idedisk_setup (ide_drive_t *
 
 	write_cache(drive, 1);
 
-	blk_queue_ordered(drive->queue, 1);
-	blk_queue_issue_flush_fn(drive->queue, idedisk_issue_flush);
+	/*
+	 * decide if we can sanely support flushes and barriers on
+	 * this drive
+	 */
+	if (drive->addressing == 1) {
+		/*
+		 * if capacity is over 2^28 sectors, drive must support
+		 * FLUSH_CACHE_EXT
+		 */
+		if (ide_id_has_flush_cache_ext(id))
+			barrier = 1;
+		else if (capacity <= (1ULL << 28))
+			barrier = 1;
+		else
+			printk("%s: drive is buggy, no support for FLUSH_CACHE_EXT with lba48\n", drive->name);
+	} else if (ide_id_has_flush_cache(id))
+		barrier = 1;
+
+	printk("%s: cache flushes %ssupported\n", drive->name, barrier ? "" : "not ");
+	if (barrier) {
+		blk_queue_ordered(drive->queue, 1);
+		blk_queue_issue_flush_fn(drive->queue, idedisk_issue_flush);
+	}
 
 #ifdef CONFIG_BLK_DEV_IDE_TCQ_DEFAULT
 	if (drive->using_dma)
diff -urp -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.6.6-mm5/drivers/ide/ide-io.c linux-2.6.6-mm5/drivers/ide/ide-io.c
--- /opt/kernel/linux-2.6.6-mm5/drivers/ide/ide-io.c	2004-05-23 09:56:20.100457041 +0200
+++ linux-2.6.6-mm5/drivers/ide/ide-io.c	2004-05-23 10:22:59.204542947 +0200
@@ -67,7 +67,7 @@ static void ide_fill_flush_cmd(ide_drive
 	rq->buffer = buf;
 	rq->buffer[0] = WIN_FLUSH_CACHE;
 
-	if (drive->id->cfs_enable_2 & 0x2400)
+	if (ide_id_has_flush_cache_ext(drive->id))
 		rq->buffer[0] = WIN_FLUSH_CACHE_EXT;
 }
 
@@ -229,7 +229,7 @@ u64 ide_get_error_location(ide_drive_t *
 	lcyl = args[4];
 	sect = args[3];
 
-	if (drive->id->cfs_enable_2 & 0x2400) {
+	if (ide_id_has_flush_cache_ext(drive->id)) {
 		low = (hcyl << 16) | (lcyl << 8) | sect;
 		HWIF(drive)->OUTB(drive->ctl|0x80, IDE_CONTROL_REG);
 		high = ide_read_24(drive);
@@ -277,32 +277,48 @@ static void ide_complete_barrier(ide_dri
 	}
 
 	/*
-	 * bummer, flush failed. if it was the pre-flush, fail the barrier.
-	 * if it was the post-flush, complete the succesful part of the request
-	 * and fail the rest
+	 * we need to end real_rq, but it's not on the queue currently.
+	 * put it back on the queue, so we don't have to special case
+	 * anything else for completing it
 	 */
-	good_sectors = 0;
-	if (blk_barrier_postflush(rq)) {
-		sector = ide_get_error_location(drive, rq->buffer);
-
-		if ((sector >= real_rq->hard_sector) &&
-		    (sector < real_rq->hard_sector + real_rq->hard_nr_sectors))
-			good_sectors = sector - real_rq->hard_sector;
-	} else
-		sector = real_rq->hard_sector;
+	elv_requeue_request(drive->queue, real_rq);
+	
+	/*
+	 * drive aborted flush command, assume FLUSH_CACHE_* doesn't
+	 * work and disable barrier support
+	 */
+	if (error & ABRT_ERR) {
+		printk(KERN_ERR "%s: barrier support doesn't work\n", drive->name);
+		__ide_end_request(drive, real_rq, 0, real_rq->hard_nr_sectors);
+		blk_queue_ordered(drive->queue, 0);
+		blk_queue_issue_flush_fn(drive->queue, NULL);
+	} else {
+		/*
+		 * find out what part of the request failed
+		 */
+		good_sectors = 0;
+		if (blk_barrier_postflush(rq)) {
+			sector = ide_get_error_location(drive, rq->buffer);
 
-	bad_sectors = real_rq->hard_nr_sectors - good_sectors;
-	if (good_sectors)
-		__ide_end_request(drive, real_rq, 1, good_sectors);
-	if (bad_sectors)
-		__ide_end_request(drive, real_rq, 0, bad_sectors);
+			if ((sector >= real_rq->hard_sector) &&
+			    (sector < real_rq->hard_sector + real_rq->hard_nr_sectors))
+				good_sectors = sector - real_rq->hard_sector;
+		} else
+			sector = real_rq->hard_sector;
+
+		bad_sectors = real_rq->hard_nr_sectors - good_sectors;
+		if (good_sectors)
+			__ide_end_request(drive, real_rq, 1, good_sectors);
+		if (bad_sectors)
+			__ide_end_request(drive, real_rq, 0, bad_sectors);
+
+		printk(KERN_ERR "%s: failed barrier write: "
+				"sector=%Lx(good=%d/bad=%d)\n",
+				drive->name, (unsigned long long)sector,
+				good_sectors, bad_sectors);
+	}
 
 	drive->doing_barrier = 0;
-
-	printk(KERN_ERR "%s: failed barrier write: "
-			"sector=%Lx(good=%d/bad=%d)\n",
-			drive->name, (unsigned long long)sector,
-			good_sectors, bad_sectors);
 }
 
 /**
diff -urp -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.6.6-mm5/include/linux/ide.h linux-2.6.6-mm5/include/linux/ide.h
--- /opt/kernel/linux-2.6.6-mm5/include/linux/ide.h	2004-05-23 09:56:21.401316264 +0200
+++ linux-2.6.6-mm5/include/linux/ide.h	2004-05-23 10:09:21.313967868 +0200
@@ -1716,4 +1716,11 @@ static inline int ata_can_queue(ide_driv
 
 extern struct bus_type ide_bus_type;
 
+/* check if CACHE FLUSH (EXT) command is supported (bits defined in ATA-6) */
+#define ide_id_has_flush_cache(id)	((id)->cfs_enable_2 & 0x3000)
+
+/* some Maxtor disks have bit 13 defined incorrectly so check bit 10 too */
+#define ide_id_has_flush_cache_ext(id)	\
+	(((id)->cfs_enable_2 & 0x2400) == 0x2400)
+
 #endif /* _IDE_H */

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
  2004-05-22 21:30     ` Jens Axboe
@ 2004-05-23  8:58       ` Lorenzo Allegrucci
  2004-05-23  9:11         ` Jens Axboe
  0 siblings, 1 reply; 17+ messages in thread
From: Lorenzo Allegrucci @ 2004-05-23  8:58 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, linux-kernel

On Saturday 22 May 2004 23:30, Jens Axboe wrote:
> On Sat, May 22 2004, Jens Axboe wrote:
> > On Sat, May 22 2004, Andrew Morton wrote:
> > > Lorenzo Allegrucci <l_allegrucci@despammed.com> wrote:
> > > > I get a 100% reproducible oops mounting an ext3 or reiserfs
> > > > partition with -o barrier enabled.
> > > > Hand written oops (for ext3):
> > >
> > > That's a lot of hand-writing.  Thanks for doing that.  You can usually
> > > omit the hex numbers in [brackets] when doing this.
> > >
> > > The crash is here:
> > >
> > > static inline void blkdev_dequeue_request(struct request *req)
> > > {
> > > 	BUG_ON(list_empty(&req->queuelist));
> > >
> > > perhaps related to that I/O error sending the code through less-tested
> > > paths.
> >
> > Ouch indeed, I'll get that fixed up first thing in the morning.
>
> Can you test this work-around? The work-around should be perfectly safe,
> this is just a case where a BUG_ON() does more harm than good :-)
>
> --- drivers/ide/ide-io.c~	2004-05-21 11:02:58.000000000 +0200
> +++ drivers/ide/ide-io.c	2004-05-22 23:28:37.692944185 +0200
> @@ -291,6 +291,8 @@
>  		sector = real_rq->hard_sector;
>
>  	bad_sectors = real_rq->hard_nr_sectors - good_sectors;
> +	/* work-around, make sure request is on queue */
> +	elv_requeue_request(drive->queue, real_rq);
>  	if (good_sectors)
>  		__ide_end_request(drive, real_rq, 1, good_sectors);
>  	if (bad_sectors)

The oops goes away but:

hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
hda: drive_cmd: error=0x04 { DriveStatusError }
end_request: I/O error, dev hda, sector 84667085
Buffer I/O error on device hda11, logical block 559
lost page write due to I/O error on hda11
hda: failed barrier write: sector=50beacd(good=0/bad=8)
Aborting journal on device hda11.
ext3_abort called.
EXT3-fs abort (device hda11): ext3_journal_start: Detected aborted journal
Remounting filesystem read-only

-- 
Lorenzo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
  2004-05-23  8:58       ` Lorenzo Allegrucci
@ 2004-05-23  9:11         ` Jens Axboe
  2004-05-23 10:03           ` Jens Axboe
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2004-05-23  9:11 UTC (permalink / raw)
  To: Lorenzo Allegrucci; +Cc: Andrew Morton, linux-kernel

On Sun, May 23 2004, Lorenzo Allegrucci wrote:
> On Saturday 22 May 2004 23:30, Jens Axboe wrote:
> > On Sat, May 22 2004, Jens Axboe wrote:
> > > On Sat, May 22 2004, Andrew Morton wrote:
> > > > Lorenzo Allegrucci <l_allegrucci@despammed.com> wrote:
> > > > > I get a 100% reproducible oops mounting an ext3 or reiserfs
> > > > > partition with -o barrier enabled.
> > > > > Hand written oops (for ext3):
> > > >
> > > > That's a lot of hand-writing.  Thanks for doing that.  You can usually
> > > > omit the hex numbers in [brackets] when doing this.
> > > >
> > > > The crash is here:
> > > >
> > > > static inline void blkdev_dequeue_request(struct request *req)
> > > > {
> > > > 	BUG_ON(list_empty(&req->queuelist));
> > > >
> > > > perhaps related to that I/O error sending the code through less-tested
> > > > paths.
> > >
> > > Ouch indeed, I'll get that fixed up first thing in the morning.
> >
> > Can you test this work-around? The work-around should be perfectly safe,
> > this is just a case where a BUG_ON() does more harm than good :-)
> >
> > --- drivers/ide/ide-io.c~	2004-05-21 11:02:58.000000000 +0200
> > +++ drivers/ide/ide-io.c	2004-05-22 23:28:37.692944185 +0200
> > @@ -291,6 +291,8 @@
> >  		sector = real_rq->hard_sector;
> >
> >  	bad_sectors = real_rq->hard_nr_sectors - good_sectors;
> > +	/* work-around, make sure request is on queue */
> > +	elv_requeue_request(drive->queue, real_rq);
> >  	if (good_sectors)
> >  		__ide_end_request(drive, real_rq, 1, good_sectors);
> >  	if (bad_sectors)
> 
> The oops goes away but:
> 
> hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
> hda: drive_cmd: error=0x04 { DriveStatusError }
> end_request: I/O error, dev hda, sector 84667085
> Buffer I/O error on device hda11, logical block 559
> lost page write due to I/O error on hda11
> hda: failed barrier write: sector=50beacd(good=0/bad=8)
> Aborting journal on device hda11.
> ext3_abort called.
> EXT3-fs abort (device hda11): ext3_journal_start: Detected aborted journal
> Remounting filesystem read-only

That's expected with that patch. Try the one I just sent you as well.
ext3 doesn't recover nicely though, I'll see if I can find a way to pass
back -EOPNOTSUPP in case of ABRT in completer_barrier().

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
  2004-05-23  9:11         ` Jens Axboe
@ 2004-05-23 10:03           ` Jens Axboe
  2004-05-23 15:32             ` Lorenzo Allegrucci
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2004-05-23 10:03 UTC (permalink / raw)
  To: Lorenzo Allegrucci; +Cc: Andrew Morton, linux-kernel

On Sun, May 23 2004, Jens Axboe wrote:
> On Sun, May 23 2004, Lorenzo Allegrucci wrote:
> > On Saturday 22 May 2004 23:30, Jens Axboe wrote:
> > > On Sat, May 22 2004, Jens Axboe wrote:
> > > > On Sat, May 22 2004, Andrew Morton wrote:
> > > > > Lorenzo Allegrucci <l_allegrucci@despammed.com> wrote:
> > > > > > I get a 100% reproducible oops mounting an ext3 or reiserfs
> > > > > > partition with -o barrier enabled.
> > > > > > Hand written oops (for ext3):
> > > > >
> > > > > That's a lot of hand-writing.  Thanks for doing that.  You can usually
> > > > > omit the hex numbers in [brackets] when doing this.
> > > > >
> > > > > The crash is here:
> > > > >
> > > > > static inline void blkdev_dequeue_request(struct request *req)
> > > > > {
> > > > > 	BUG_ON(list_empty(&req->queuelist));
> > > > >
> > > > > perhaps related to that I/O error sending the code through less-tested
> > > > > paths.
> > > >
> > > > Ouch indeed, I'll get that fixed up first thing in the morning.
> > >
> > > Can you test this work-around? The work-around should be perfectly safe,
> > > this is just a case where a BUG_ON() does more harm than good :-)
> > >
> > > --- drivers/ide/ide-io.c~	2004-05-21 11:02:58.000000000 +0200
> > > +++ drivers/ide/ide-io.c	2004-05-22 23:28:37.692944185 +0200
> > > @@ -291,6 +291,8 @@
> > >  		sector = real_rq->hard_sector;
> > >
> > >  	bad_sectors = real_rq->hard_nr_sectors - good_sectors;
> > > +	/* work-around, make sure request is on queue */
> > > +	elv_requeue_request(drive->queue, real_rq);
> > >  	if (good_sectors)
> > >  		__ide_end_request(drive, real_rq, 1, good_sectors);
> > >  	if (bad_sectors)
> > 
> > The oops goes away but:
> > 
> > hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
> > hda: drive_cmd: error=0x04 { DriveStatusError }
> > end_request: I/O error, dev hda, sector 84667085
> > Buffer I/O error on device hda11, logical block 559
> > lost page write due to I/O error on hda11
> > hda: failed barrier write: sector=50beacd(good=0/bad=8)
> > Aborting journal on device hda11.
> > ext3_abort called.
> > EXT3-fs abort (device hda11): ext3_journal_start: Detected aborted journal
> > Remounting filesystem read-only
> 
> That's expected with that patch. Try the one I just sent you as well.
> ext3 doesn't recover nicely though, I'll see if I can find a way to pass
> back -EOPNOTSUPP in case of ABRT in completer_barrier().

Here's a rolled up updated version that tries to get async notification
of missing barrier support working as well. reiser currently doesn't
cope with that correctly (fails mount), ext3 seems to but gets stuck.
Andrew has that fixed already, I think :-)

Lorenzo, can you test this on top of 2.6.6-mm5?

diff -urp -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.6.6-mm5/drivers/block/ll_rw_blk.c linux-2.6.6-mm5/drivers/block/ll_rw_blk.c
--- /opt/kernel/linux-2.6.6-mm5/drivers/block/ll_rw_blk.c	2004-05-23 09:56:19.000000000 +0200
+++ linux-2.6.6-mm5/drivers/block/ll_rw_blk.c	2004-05-23 11:29:57.774204303 +0200
@@ -2706,10 +2706,17 @@ void blk_recalc_rq_sectors(struct reques
 static int __end_that_request_first(struct request *req, int uptodate,
 				    int nr_bytes)
 {
-	int total_bytes, bio_nbytes, error = 0, next_idx = 0;
+	int total_bytes, bio_nbytes, error, next_idx = 0;
 	struct bio *bio;
 
 	/*
+	 * extend uptodate bool to allow < 0 value to be direct io error
+	 */
+	error = 0;
+	if (end_io_error(uptodate))
+		error = !uptodate ? -EIO : uptodate;
+
+	/*
 	 * for a REQ_BLOCK_PC request, we want to carry any eventual
 	 * sense key with us all the way through
 	 */
@@ -2717,7 +2724,6 @@ static int __end_that_request_first(stru
 		req->errors = 0;
 
 	if (!uptodate) {
-		error = -EIO;
 		if (blk_fs_request(req) && !(req->flags & REQ_QUIET))
 			printk("end_request: I/O error, dev %s, sector %llu\n",
 				req->rq_disk ? req->rq_disk->disk_name : "?",
@@ -2800,7 +2806,7 @@ static int __end_that_request_first(stru
 /**
  * end_that_request_first - end I/O on a request
  * @req:      the request being processed
- * @uptodate: 0 for I/O error
+ * @uptodate: 1 for success, 0 for I/O error, < 0 for specific error
  * @nr_sectors: number of sectors to end I/O on
  *
  * Description:
@@ -2821,7 +2827,7 @@ EXPORT_SYMBOL(end_that_request_first);
 /**
  * end_that_request_chunk - end I/O on a request
  * @req:      the request being processed
- * @uptodate: 0 for I/O error
+ * @uptodate: 1 for success, 0 for I/O error, < 0 for specific error
  * @nr_bytes: number of bytes to complete
  *
  * Description:
diff -urp -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.6.6-mm5/drivers/ide/ide-disk.c linux-2.6.6-mm5/drivers/ide/ide-disk.c
--- /opt/kernel/linux-2.6.6-mm5/drivers/ide/ide-disk.c	2004-05-23 09:56:20.000000000 +0200
+++ linux-2.6.6-mm5/drivers/ide/ide-disk.c	2004-05-23 11:48:51.095716239 +0200
@@ -1392,13 +1392,6 @@ static int set_nowerr(ide_drive_t *drive
 	return 0;
 }
 
-/* check if CACHE FLUSH (EXT) command is supported (bits defined in ATA-6) */
-#define ide_id_has_flush_cache(id)	((id)->cfs_enable_2 & 0x3000)
-
-/* some Maxtor disks have bit 13 defined incorrectly so check bit 10 too */
-#define ide_id_has_flush_cache_ext(id)	\
-	(((id)->cfs_enable_2 & 0x2400) == 0x2400)
-
 static int write_cache (ide_drive_t *drive, int arg)
 {
 	ide_task_t args;
@@ -1597,6 +1590,7 @@ static void idedisk_setup (ide_drive_t *
 {
 	struct hd_driveid *id = drive->id;
 	unsigned long long capacity;
+	int barrier = 0;
 
 	idedisk_add_settings(drive);
 
@@ -1729,8 +1723,32 @@ static void idedisk_setup (ide_drive_t *
 
 	write_cache(drive, 1);
 
-	blk_queue_ordered(drive->queue, 1);
-	blk_queue_issue_flush_fn(drive->queue, idedisk_issue_flush);
+	/*
+	 * decide if we can sanely support flushes and barriers on
+	 * this drive. if we have FLUSH_CACHE_EXT, it's always safe.
+	 * if not, it's only safe if drive has FLUSH_CACHE, and:
+	 *	- capacity is less than required for LBA48
+	 *	- doesn't use LBA48
+	 */
+	if (ide_id_has_flush_cache_ext(id))
+		barrier = 1;
+	else {
+		if (drive->addressing == 1) {
+			if (capacity <= (1ULL << 28))
+				barrier = 1;
+			else
+				printk("%s: drive is buggy, no support for FLUSH_CACHE_EXT with lba48\n", drive->name);
+		} else {
+			if (ide_id_has_flush_cache(id))
+				barrier = 1;
+		}
+	}
+			barrier = 1;
+	printk("%s: cache flushes %ssupported\n", drive->name, barrier ? "" : "not ");
+	if (barrier) {
+		blk_queue_ordered(drive->queue, 1);
+		blk_queue_issue_flush_fn(drive->queue, idedisk_issue_flush);
+	}
 
 #ifdef CONFIG_BLK_DEV_IDE_TCQ_DEFAULT
 	if (drive->using_dma)
diff -urp -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.6.6-mm5/drivers/ide/ide-io.c linux-2.6.6-mm5/drivers/ide/ide-io.c
--- /opt/kernel/linux-2.6.6-mm5/drivers/ide/ide-io.c	2004-05-23 09:56:20.000000000 +0200
+++ linux-2.6.6-mm5/drivers/ide/ide-io.c	2004-05-23 12:00:13.150994698 +0200
@@ -67,7 +67,7 @@ static void ide_fill_flush_cmd(ide_drive
 	rq->buffer = buf;
 	rq->buffer[0] = WIN_FLUSH_CACHE;
 
-	if (drive->id->cfs_enable_2 & 0x2400)
+	if (ide_id_has_flush_cache_ext(drive->id))
 		rq->buffer[0] = WIN_FLUSH_CACHE_EXT;
 }
 
@@ -115,10 +115,10 @@ static int __ide_end_request(ide_drive_t
 	 * if failfast is set on a request, override number of sectors and
 	 * complete the whole request right now
 	 */
-	if (blk_noretry_request(rq) && !uptodate)
+	if (blk_noretry_request(rq) && end_io_error(uptodate))
 		nr_sectors = rq->hard_nr_sectors;
 
-	if (!blk_fs_request(rq) && !uptodate && !rq->errors)
+	if (!blk_fs_request(rq) && end_io_error(uptodate) && !rq->errors)
 		rq->errors = -EIO;
 
 	/*
@@ -229,7 +229,7 @@ u64 ide_get_error_location(ide_drive_t *
 	lcyl = args[4];
 	sect = args[3];
 
-	if (drive->id->cfs_enable_2 & 0x2400) {
+	if (ide_id_has_flush_cache_ext(drive->id)) {
 		low = (hcyl << 16) | (lcyl << 8) | sect;
 		HWIF(drive)->OUTB(drive->ctl|0x80, IDE_CONTROL_REG);
 		high = ide_read_24(drive);
@@ -277,32 +277,48 @@ static void ide_complete_barrier(ide_dri
 	}
 
 	/*
-	 * bummer, flush failed. if it was the pre-flush, fail the barrier.
-	 * if it was the post-flush, complete the succesful part of the request
-	 * and fail the rest
+	 * we need to end real_rq, but it's not on the queue currently.
+	 * put it back on the queue, so we don't have to special case
+	 * anything else for completing it
 	 */
-	good_sectors = 0;
-	if (blk_barrier_postflush(rq)) {
-		sector = ide_get_error_location(drive, rq->buffer);
-
-		if ((sector >= real_rq->hard_sector) &&
-		    (sector < real_rq->hard_sector + real_rq->hard_nr_sectors))
-			good_sectors = sector - real_rq->hard_sector;
-	} else
-		sector = real_rq->hard_sector;
+	elv_requeue_request(drive->queue, real_rq);
+	
+	/*
+	 * drive aborted flush command, assume FLUSH_CACHE_* doesn't
+	 * work and disable barrier support
+	 */
+	if (error & ABRT_ERR) {
+		printk(KERN_ERR "%s: barrier support doesn't work\n", drive->name);
+		__ide_end_request(drive, real_rq, -EOPNOTSUPP, real_rq->hard_nr_sectors);
+		blk_queue_ordered(drive->queue, 0);
+		blk_queue_issue_flush_fn(drive->queue, NULL);
+	} else {
+		/*
+		 * find out what part of the request failed
+		 */
+		good_sectors = 0;
+		if (blk_barrier_postflush(rq)) {
+			sector = ide_get_error_location(drive, rq->buffer);
 
-	bad_sectors = real_rq->hard_nr_sectors - good_sectors;
-	if (good_sectors)
-		__ide_end_request(drive, real_rq, 1, good_sectors);
-	if (bad_sectors)
-		__ide_end_request(drive, real_rq, 0, bad_sectors);
+			if ((sector >= real_rq->hard_sector) &&
+			    (sector < real_rq->hard_sector + real_rq->hard_nr_sectors))
+				good_sectors = sector - real_rq->hard_sector;
+		} else
+			sector = real_rq->hard_sector;
+
+		bad_sectors = real_rq->hard_nr_sectors - good_sectors;
+		if (good_sectors)
+			__ide_end_request(drive, real_rq, 1, good_sectors);
+		if (bad_sectors)
+			__ide_end_request(drive, real_rq, 0, bad_sectors);
+
+		printk(KERN_ERR "%s: failed barrier write: "
+				"sector=%Lx(good=%d/bad=%d)\n",
+				drive->name, (unsigned long long)sector,
+				good_sectors, bad_sectors);
+	}
 
 	drive->doing_barrier = 0;
-
-	printk(KERN_ERR "%s: failed barrier write: "
-			"sector=%Lx(good=%d/bad=%d)\n",
-			drive->name, (unsigned long long)sector,
-			good_sectors, bad_sectors);
 }
 
 /**
diff -urp -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.6.6-mm5/include/linux/blkdev.h linux-2.6.6-mm5/include/linux/blkdev.h
--- /opt/kernel/linux-2.6.6-mm5/include/linux/blkdev.h	2004-05-23 09:56:21.000000000 +0200
+++ linux-2.6.6-mm5/include/linux/blkdev.h	2004-05-23 11:57:30.227604989 +0200
@@ -571,6 +571,14 @@ extern void end_that_request_last(struct
 extern int process_that_request_first(struct request *, unsigned int);
 extern void end_request(struct request *req, int uptodate);
 
+/*
+ * end_that_request_first/chunk() takes an uptodate argument. we account
+ * any value <= as an io error. 0 means -EIO for compatability reasons,
+ * any other < 0 value is the direct error type. An uptodate value of
+ * 1 indicates successful io completion
+ */
+#define end_io_error(uptodate)	(unlikely((uptodate) <= 0))
+
 static inline void blkdev_dequeue_request(struct request *req)
 {
 	BUG_ON(list_empty(&req->queuelist));
diff -urp -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.6.6-mm5/include/linux/ide.h linux-2.6.6-mm5/include/linux/ide.h
--- /opt/kernel/linux-2.6.6-mm5/include/linux/ide.h	2004-05-23 09:56:21.000000000 +0200
+++ linux-2.6.6-mm5/include/linux/ide.h	2004-05-23 10:09:21.313967868 +0200
@@ -1716,4 +1716,11 @@ static inline int ata_can_queue(ide_driv
 
 extern struct bus_type ide_bus_type;
 
+/* check if CACHE FLUSH (EXT) command is supported (bits defined in ATA-6) */
+#define ide_id_has_flush_cache(id)	((id)->cfs_enable_2 & 0x3000)
+
+/* some Maxtor disks have bit 13 defined incorrectly so check bit 10 too */
+#define ide_id_has_flush_cache_ext(id)	\
+	(((id)->cfs_enable_2 & 0x2400) == 0x2400)
+
 #endif /* _IDE_H */

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
  2004-05-23  8:27 ` Jens Axboe
@ 2004-05-23 10:37   ` Kurt Garloff
  2004-05-23 10:56     ` Jens Axboe
  0 siblings, 1 reply; 17+ messages in thread
From: Kurt Garloff @ 2004-05-23 10:37 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Lorenzo Allegrucci, Andrew Morton, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1361 bytes --]

Hi Jens,

On Sun, May 23, 2004 at 10:27:28AM +0200, Jens Axboe wrote:
> @@ -1729,8 +1723,29 @@ static void idedisk_setup (ide_drive_t *
>  
>  	write_cache(drive, 1);
>  
> -	blk_queue_ordered(drive->queue, 1);
> -	blk_queue_issue_flush_fn(drive->queue, idedisk_issue_flush);
> +	/*
> +	 * decide if we can sanely support flushes and barriers on
> +	 * this drive
> +	 */
> +	if (drive->addressing == 1) {
> +		/*
> +		 * if capacity is over 2^28 sectors, drive must support
> +		 * FLUSH_CACHE_EXT
> +		 */
> +		if (ide_id_has_flush_cache_ext(id))
> +			barrier = 1;
> +		else if (capacity <= (1ULL << 28))
> +			barrier = 1;
> +		else
> +			printk("%s: drive is buggy, no support for FLUSH_CACHE_EXT with lba48\n", drive->name);

So, for drives with LBA48, you enable barriers either if report to
support it or if their capacity is _smaller_ than 2^28. If neither
is the case, it's left disabled and the kernel complains.
Is it safe to enable for 
(addressing == 1 && !ide_id_has_flush_cache_ext() && capacity <= 1<<28)
?
Shouldn't we check ide_has_flush_cache() then, as for the non-
LBA48 drives?

> +	} else if (ide_id_has_flush_cache(id))
> +		barrier = 1;

Regards,
-- 
Kurt Garloff  <garloff@suse.de>                            Cologne, DE 
SUSE LINUX AG / Novell, Nuernberg, DE               Director SUSE Labs

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
  2004-05-23 10:37   ` Kurt Garloff
@ 2004-05-23 10:56     ` Jens Axboe
  0 siblings, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2004-05-23 10:56 UTC (permalink / raw)
  To: Kurt Garloff, Lorenzo Allegrucci, Andrew Morton, linux-kernel
  Cc: Bartlomiej Zolnierkiewicz

On Sun, May 23 2004, Kurt Garloff wrote:
> Hi Jens,
> 
> On Sun, May 23, 2004 at 10:27:28AM +0200, Jens Axboe wrote:
> > @@ -1729,8 +1723,29 @@ static void idedisk_setup (ide_drive_t *
> >  
> >  	write_cache(drive, 1);
> >  
> > -	blk_queue_ordered(drive->queue, 1);
> > -	blk_queue_issue_flush_fn(drive->queue, idedisk_issue_flush);
> > +	/*
> > +	 * decide if we can sanely support flushes and barriers on
> > +	 * this drive
> > +	 */
> > +	if (drive->addressing == 1) {
> > +		/*
> > +		 * if capacity is over 2^28 sectors, drive must support
> > +		 * FLUSH_CACHE_EXT
> > +		 */
> > +		if (ide_id_has_flush_cache_ext(id))
> > +			barrier = 1;
> > +		else if (capacity <= (1ULL << 28))
> > +			barrier = 1;
> > +		else
> > +			printk("%s: drive is buggy, no support for FLUSH_CACHE_EXT with lba48\n", drive->name);
> 
> So, for drives with LBA48, you enable barriers either if report to
> support it or if their capacity is _smaller_ than 2^28. If neither
> is the case, it's left disabled and the kernel complains.
> Is it safe to enable for 
> (addressing == 1 && !ide_id_has_flush_cache_ext() && capacity <= 1<<28)
> ?

Yes that's safe.

> Shouldn't we check ide_has_flush_cache() then, as for the non-
> LBA48 drives?

Yep.

Yeah that was buggy, see later posting (which had a barrier = 1 hanging
from update, argh). Here's a better version imho. I have one drive here
that does support FLUSH_CACHE, but doesn't flag cfs_enable_2 as such. So
I think our logic should be:

        /*
         * decide if we can sanely support flushes and barriers on
         * this drive. unfortunately not all drives advertise
         * FLUSH_CACHE
         * support even if they support it. So assume FLUSH_CACHE is
         * there
         * if write back caching is enabled. LBA48 drives are newer, so
         * expect it to flag support properly. We can safely support
         * FLUSH_CACHE on lba48, if capacity doesn't exceed lba28
         */
        barrier = drive->wcache;
        if (drive->addressing == 1) {
                barrier = ide_id_has_flush_cache(id);
                if (capacity > (1ULL << 28) && !ide_id_has_flush_cache_ext(id))
                        barrier = 0;
        }

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
  2004-05-23 10:03           ` Jens Axboe
@ 2004-05-23 15:32             ` Lorenzo Allegrucci
  2004-05-23 15:45               ` Jens Axboe
  0 siblings, 1 reply; 17+ messages in thread
From: Lorenzo Allegrucci @ 2004-05-23 15:32 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, linux-kernel

On Sunday 23 May 2004 12:03, Jens Axboe wrote:

> Here's a rolled up updated version that tries to get async notification
> of missing barrier support working as well. reiser currently doesn't
> cope with that correctly (fails mount), ext3 seems to but gets stuck.
> Andrew has that fixed already, I think :-)
>
> Lorenzo, can you test this on top of 2.6.6-mm5?

Problem fixed, but there is some performance regression

ext3 (default)
untar		read		copy		remove
0m53.861s	0m24.942s	1m30.164s	0m20.664s
0m7.132s	0m1.191s	0m0.766s	0m0.076s
0m5.807s	0m3.345s	0m9.996s	0m1.719s

ext3 (-o barrier=1)
untar		read		copy		remove
0m52.117s	0m28.502s	1m51.153s	0m25.561s
0m7.231s	0m1.209s	0m0.738s	0m0.071s
0m6.117s	0m3.191s	0m9.347s	0m1.635s

-- 
Lorenzo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
  2004-05-23 15:32             ` Lorenzo Allegrucci
@ 2004-05-23 15:45               ` Jens Axboe
  2004-05-23 16:43                 ` Lorenzo Allegrucci
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2004-05-23 15:45 UTC (permalink / raw)
  To: Lorenzo Allegrucci; +Cc: Andrew Morton, linux-kernel

On Sun, May 23 2004, Lorenzo Allegrucci wrote:
> On Sunday 23 May 2004 12:03, Jens Axboe wrote:
> 
> > Here's a rolled up updated version that tries to get async notification
> > of missing barrier support working as well. reiser currently doesn't
> > cope with that correctly (fails mount), ext3 seems to but gets stuck.
> > Andrew has that fixed already, I think :-)
> >
> > Lorenzo, can you test this on top of 2.6.6-mm5?
> 
> Problem fixed, but there is some performance regression
> 
> ext3 (default)
> untar		read		copy		remove
> 0m53.861s	0m24.942s	1m30.164s	0m20.664s
> 0m7.132s	0m1.191s	0m0.766s	0m0.076s
> 0m5.807s	0m3.345s	0m9.996s	0m1.719s
> 
> ext3 (-o barrier=1)
> untar		read		copy		remove
> 0m52.117s	0m28.502s	1m51.153s	0m25.561s
> 0m7.231s	0m1.209s	0m0.738s	0m0.071s
> 0m6.117s	0m3.191s	0m9.347s	0m1.635s

Not sure what you mean here, but yes of course -o barrier=1 is going to
be slower than default + write back caching. What you should compare is
without barrier support and hdparm -W0 /dev/hdX, if -o barrier=1 with
caching on is slower then that's a regression :-)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
  2004-05-23 15:45               ` Jens Axboe
@ 2004-05-23 16:43                 ` Lorenzo Allegrucci
  2004-05-23 16:56                   ` Jens Axboe
  0 siblings, 1 reply; 17+ messages in thread
From: Lorenzo Allegrucci @ 2004-05-23 16:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, linux-kernel

On Sunday 23 May 2004 17:45, Jens Axboe wrote:
> On Sun, May 23 2004, Lorenzo Allegrucci wrote:
> > On Sunday 23 May 2004 12:03, Jens Axboe wrote:
> > > Here's a rolled up updated version that tries to get async notification
> > > of missing barrier support working as well. reiser currently doesn't
> > > cope with that correctly (fails mount), ext3 seems to but gets stuck.
> > > Andrew has that fixed already, I think :-)
> > >
> > > Lorenzo, can you test this on top of 2.6.6-mm5?
> >
> > Problem fixed, but there is some performance regression
> >
> > ext3 (default)
> > untar		read		copy		remove
> > 0m53.861s	0m24.942s	1m30.164s	0m20.664s
> > 0m7.132s	0m1.191s	0m0.766s	0m0.076s
> > 0m5.807s	0m3.345s	0m9.996s	0m1.719s
> >
> > ext3 (-o barrier=1)
> > untar		read		copy		remove
> > 0m52.117s	0m28.502s	1m51.153s	0m25.561s
> > 0m7.231s	0m1.209s	0m0.738s	0m0.071s
> > 0m6.117s	0m3.191s	0m9.347s	0m1.635s
>
> Not sure what you mean here

Untar, read, copy and remove the OpenOffice tarball, each test
run with cold cache (mount/umount cycle).

> but yes of course -o barrier=1 is going to 
> be slower than default + write back caching. What you should compare is
> without barrier support and hdparm -W0 /dev/hdX, if -o barrier=1 with
> caching on is slower then that's a regression :-)

hdparm -W0 /dev/hda

ext3 (-o barrier=0)
untar		read		copy		remove
1m55.190s	0m27.633s	2m19.072s	0m21.348s
0m7.081s	0m1.189s	0m0.724s	0m0.083s
0m6.502s	0m3.244s	0m9.715s	0m1.633s

ext3 (-o barrier=1)
untar		read		copy		remove
1m55.358s	0m23.831s	2m16.674s	0m21.508s
0m7.153s	0m1.200s	0m0.748s	0m0.087s
0m6.775s	0m3.358s	0m9.985s	0m1.781s


haparm -W1 /dev/hda

ext3 (-o barrier=0)
untar		read		copy		remove
0m55.405s	0m26.230s	1m28.765s	0m20.766s
0m7.195s	0m1.199s	0m0.773s	0m0.081s
0m6.502s	0m3.359s	0m9.672s	0m1.868s

ext3 (-o barrier=1)
untar		read		copy		remove
0m52.117s	0m28.502s	1m51.153s	0m25.561s
0m7.231s	0m1.209s	0m0.738s	0m0.071s
0m6.117s	0m3.191s	0m9.347s	0m1.635s

-- 
Lorenzo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
  2004-05-23 16:43                 ` Lorenzo Allegrucci
@ 2004-05-23 16:56                   ` Jens Axboe
  2004-05-23 17:17                     ` Lorenzo Allegrucci
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2004-05-23 16:56 UTC (permalink / raw)
  To: Lorenzo Allegrucci; +Cc: Andrew Morton, linux-kernel

On Sun, May 23 2004, Lorenzo Allegrucci wrote:
> On Sunday 23 May 2004 17:45, Jens Axboe wrote:
> > On Sun, May 23 2004, Lorenzo Allegrucci wrote:
> > > On Sunday 23 May 2004 12:03, Jens Axboe wrote:
> > > > Here's a rolled up updated version that tries to get async notification
> > > > of missing barrier support working as well. reiser currently doesn't
> > > > cope with that correctly (fails mount), ext3 seems to but gets stuck.
> > > > Andrew has that fixed already, I think :-)
> > > >
> > > > Lorenzo, can you test this on top of 2.6.6-mm5?
> > >
> > > Problem fixed, but there is some performance regression
> > >
> > > ext3 (default)
> > > untar		read		copy		remove
> > > 0m53.861s	0m24.942s	1m30.164s	0m20.664s
> > > 0m7.132s	0m1.191s	0m0.766s	0m0.076s
> > > 0m5.807s	0m3.345s	0m9.996s	0m1.719s
> > >
> > > ext3 (-o barrier=1)
> > > untar		read		copy		remove
> > > 0m52.117s	0m28.502s	1m51.153s	0m25.561s
> > > 0m7.231s	0m1.209s	0m0.738s	0m0.071s
> > > 0m6.117s	0m3.191s	0m9.347s	0m1.635s
> >
> > Not sure what you mean here
> 
> Untar, read, copy and remove the OpenOffice tarball, each test
> run with cold cache (mount/umount cycle).

I understand that, I just don't see how you can call it a regression.
It's a given that barrier will be slower.

> > but yes of course -o barrier=1 is going to 
> > be slower than default + write back caching. What you should compare is
> > without barrier support and hdparm -W0 /dev/hdX, if -o barrier=1 with
> > caching on is slower then that's a regression :-)
> 
> hdparm -W0 /dev/hda
> 
> ext3 (-o barrier=0)
> untar		read		copy		remove
> 1m55.190s	0m27.633s	2m19.072s	0m21.348s
> 0m7.081s	0m1.189s	0m0.724s	0m0.083s
> 0m6.502s	0m3.244s	0m9.715s	0m1.633s
> 
> ext3 (-o barrier=1)
> untar		read		copy		remove
> 1m55.358s	0m23.831s	2m16.674s	0m21.508s
> 0m7.153s	0m1.200s	0m0.748s	0m0.087s
> 0m6.775s	0m3.358s	0m9.985s	0m1.781s
> 
> 
> haparm -W1 /dev/hda
> 
> ext3 (-o barrier=0)
> untar		read		copy		remove
> 0m55.405s	0m26.230s	1m28.765s	0m20.766s
> 0m7.195s	0m1.199s	0m0.773s	0m0.081s
> 0m6.502s	0m3.359s	0m9.672s	0m1.868s
> 
> ext3 (-o barrier=1)
> untar		read		copy		remove
> 0m52.117s	0m28.502s	1m51.153s	0m25.561s
> 0m7.231s	0m1.209s	0m0.738s	0m0.071s
> 0m6.117s	0m3.191s	0m9.347s	0m1.635s

Your results look a bit over the map, how many runs are your averaging
for each one?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
  2004-05-23 16:56                   ` Jens Axboe
@ 2004-05-23 17:17                     ` Lorenzo Allegrucci
  2004-05-23 17:31                       ` Jens Axboe
  0 siblings, 1 reply; 17+ messages in thread
From: Lorenzo Allegrucci @ 2004-05-23 17:17 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, linux-kernel

On Sunday 23 May 2004 18:56, Jens Axboe wrote:

> > Untar, read, copy and remove the OpenOffice tarball, each test
> > run with cold cache (mount/umount cycle).
>
> I understand that, I just don't see how you can call it a regression.
> It's a given that barrier will be slower.

I'm sorry, I didn't know :)

I read from www.kerneltrap.org:

Request barriers, also known as write barriers, provide a mechanism for 
guaranteeing the order of disk I/O operations without actually waiting for 
the data to be written to disk. Specifically, a request barrier guarantees 
that any data queued up prior to the the barrier will be written to disk 
before data queued up after the barrier. Without a request barrier, the block 
layer can reorder how data is written to disk for maximum performance. The 
problem with this being most notably with journaling filesystems which 
require that their metadata be updated prior to actually updating data, 
allowing true crash recovery. Without request barriers, a journaling 
filesystem has to wait for the metadata change to be written to disk before 
it can proceed with actually updating the filesystem. Hence, the addition of 
request barriers provides a performance boost for journaled filesystems. In 
-mm5, request barriers are enabled by default for reiserfs and ext3.

> > > but yes of course -o barrier=1 is going to
> > > be slower than default + write back caching. What you should compare is
> > > without barrier support and hdparm -W0 /dev/hdX, if -o barrier=1 with
> > > caching on is slower then that's a regression :-)
> >
> > hdparm -W0 /dev/hda
> >
> > ext3 (-o barrier=0)
> > untar		read		copy		remove
> > 1m55.190s	0m27.633s	2m19.072s	0m21.348s
> > 0m7.081s	0m1.189s	0m0.724s	0m0.083s
> > 0m6.502s	0m3.244s	0m9.715s	0m1.633s
> >
> > ext3 (-o barrier=1)
> > untar		read		copy		remove
> > 1m55.358s	0m23.831s	2m16.674s	0m21.508s
> > 0m7.153s	0m1.200s	0m0.748s	0m0.087s
> > 0m6.775s	0m3.358s	0m9.985s	0m1.781s
> >
> >
> > haparm -W1 /dev/hda
> >
> > ext3 (-o barrier=0)
> > untar		read		copy		remove
> > 0m55.405s	0m26.230s	1m28.765s	0m20.766s
> > 0m7.195s	0m1.199s	0m0.773s	0m0.081s
> > 0m6.502s	0m3.359s	0m9.672s	0m1.868s
> >
> > ext3 (-o barrier=1)
> > untar		read		copy		remove
> > 0m52.117s	0m28.502s	1m51.153s	0m25.561s
> > 0m7.231s	0m1.209s	0m0.738s	0m0.071s
> > 0m6.117s	0m3.191s	0m9.347s	0m1.635s
>
> Your results look a bit over the map, how many runs are your averaging
> for each one?

Just one run, no averaging.
Yes, it's not a scientific approach, but I have not enough time
and this is my production machine :)
By experience I can say that numbers between each run are quite
stable and reproducible.

-- 
Lorenzo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
  2004-05-23 17:17                     ` Lorenzo Allegrucci
@ 2004-05-23 17:31                       ` Jens Axboe
  2004-05-23 20:41                         ` Lorenzo Allegrucci
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2004-05-23 17:31 UTC (permalink / raw)
  To: Lorenzo Allegrucci; +Cc: Andrew Morton, linux-kernel

On Sun, May 23 2004, Lorenzo Allegrucci wrote:
> On Sunday 23 May 2004 18:56, Jens Axboe wrote:
> 
> > > Untar, read, copy and remove the OpenOffice tarball, each test
> > > run with cold cache (mount/umount cycle).
> >
> > I understand that, I just don't see how you can call it a regression.
> > It's a given that barrier will be slower.
> 
> I'm sorry, I didn't know :)
> 
> I read from www.kerneltrap.org:
> 
> Request barriers, also known as write barriers, provide a mechanism for 

[snip]

Ah ok, I see the confusion! ext3 does nothing clever with barriers, it
just uses it for data integrity. A journalled fs is normally not safe to
run on write back cached drives if they aren't battery backed. You could
try reiser instead, it's has a more intelligent use of barriers.

> > > > but yes of course -o barrier=1 is going to
> > > > be slower than default + write back caching. What you should compare is
> > > > without barrier support and hdparm -W0 /dev/hdX, if -o barrier=1 with
> > > > caching on is slower then that's a regression :-)
> > >
> > > hdparm -W0 /dev/hda
> > >
> > > ext3 (-o barrier=0)
> > > untar		read		copy		remove
> > > 1m55.190s	0m27.633s	2m19.072s	0m21.348s
> > > 0m7.081s	0m1.189s	0m0.724s	0m0.083s
> > > 0m6.502s	0m3.244s	0m9.715s	0m1.633s
> > >
> > > ext3 (-o barrier=1)
> > > untar		read		copy		remove
> > > 1m55.358s	0m23.831s	2m16.674s	0m21.508s
> > > 0m7.153s	0m1.200s	0m0.748s	0m0.087s
> > > 0m6.775s	0m3.358s	0m9.985s	0m1.781s
> > >
> > >
> > > haparm -W1 /dev/hda
> > >
> > > ext3 (-o barrier=0)
> > > untar		read		copy		remove
> > > 0m55.405s	0m26.230s	1m28.765s	0m20.766s
> > > 0m7.195s	0m1.199s	0m0.773s	0m0.081s
> > > 0m6.502s	0m3.359s	0m9.672s	0m1.868s
> > >
> > > ext3 (-o barrier=1)
> > > untar		read		copy		remove
> > > 0m52.117s	0m28.502s	1m51.153s	0m25.561s
> > > 0m7.231s	0m1.209s	0m0.738s	0m0.071s
> > > 0m6.117s	0m3.191s	0m9.347s	0m1.635s
> >
> > Your results look a bit over the map, how many runs are your averaging
> > for each one?
> 
> Just one run, no averaging.
> Yes, it's not a scientific approach, but I have not enough time
> and this is my production machine :)
> By experience I can say that numbers between each run are quite
> stable and reproducible.

It just looks odd that eg reads vary as much as they do, and that -o
barrier=1 makes -W0 reads faster (and faster then -W1 even). remove
looks reasonable for -W1, but -W0 is still faster. That is _really_ odd.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
  2004-05-23 17:31                       ` Jens Axboe
@ 2004-05-23 20:41                         ` Lorenzo Allegrucci
  0 siblings, 0 replies; 17+ messages in thread
From: Lorenzo Allegrucci @ 2004-05-23 20:41 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, linux-kernel

On Sunday 23 May 2004 19:31, Jens Axboe wrote:

> It just looks odd that eg reads vary as much as they do, and that -o
> barrier=1 makes -W0 reads faster (and faster then -W1 even). remove
> looks reasonable for -W1, but -W0 is still faster. That is _really_ odd.

I rerun all tests with noatime.
Using noatime numbers look more predictable but I cannot guarantee
any statistical relevance.

hdparm -W1
ext3 (-o barrier=0,noatime)
untar		read		copy		remove
0m56.744s	0m22.452s	1m26.558s	0m20.683s
0m7.187s	0m1.153s	0m0.747s	0m0.101s
0m6.077s	0m3.045s	0m9.132s	0m1.907s

ext3 (-o barrier=1,noatime)
untar		read		copy		remove
0m55.152s	0m21.340s	1m23.705s	0m19.369s
0m7.237s	0m1.217s	0m0.788s	0m0.077s
0m6.513s	0m3.082s	0m9.226s	0m1.507s


haparm -W0
ext3 (-o barrier=0,noatime)
untar		read		copy		remove
1m55.221s	0m23.768s	2m12.458s	0m22.256s
0m7.227s	0m1.216s	0m0.756s	0m0.085s
0m6.424s	0m3.002s	0m9.085s	0m1.560s

ext3 (-o barrier=1,noatime)
untar		read		copy		remove
1m56.578s	0m23.382s	2m14.117s	0m22.289s
0m7.035s	0m1.202s	0m0.766s	0m0.089s
0m6.814s	0m3.103s	0m9.141s	0m1.696s

-- 
Lorenzo

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2004-05-23 20:40 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-05-22 19:07 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier Lorenzo Allegrucci
2004-05-22 19:21 ` Andrew Morton
2004-05-22 21:20   ` Jens Axboe
2004-05-22 21:30     ` Jens Axboe
2004-05-23  8:58       ` Lorenzo Allegrucci
2004-05-23  9:11         ` Jens Axboe
2004-05-23 10:03           ` Jens Axboe
2004-05-23 15:32             ` Lorenzo Allegrucci
2004-05-23 15:45               ` Jens Axboe
2004-05-23 16:43                 ` Lorenzo Allegrucci
2004-05-23 16:56                   ` Jens Axboe
2004-05-23 17:17                     ` Lorenzo Allegrucci
2004-05-23 17:31                       ` Jens Axboe
2004-05-23 20:41                         ` Lorenzo Allegrucci
2004-05-23  8:27 ` Jens Axboe
2004-05-23 10:37   ` Kurt Garloff
2004-05-23 10:56     ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).