LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
@ 2004-05-22 19:07 Lorenzo Allegrucci
2004-05-22 19:21 ` Andrew Morton
2004-05-23 8:27 ` Jens Axboe
0 siblings, 2 replies; 17+ messages in thread
From: Lorenzo Allegrucci @ 2004-05-22 19:07 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
I get a 100% reproducible oops mounting an ext3 or reiserfs
partition with -o barrier enabled.
Hand written oops (for ext3):
hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
hda: drive_cmd: error=0x04 { DriveStatusError }
end_request: I/O error, dev hda, sector 84666781
------------[ cut here ]------------
kernel BUG at include/linux/blkdev.h:576!
invalid operand: 0000 [#1]
PREEMPT
Modules linked in: lp emu10k1 sound soundcore ac97_codec unix
CPU: 0
EIP: 0060:[<c02125ae>] Not tainted VLI
EFLAGS: 00010046 (2.6.6-mm5)
EIP is at __ide_end_request+0xbe/0x110
eax: 00000e7a ebx: da88636c ecx: 00000000 edx: da88636c
esi: 00000000 edi: c038180c ebp: 00000008 esp: c0368ec4
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c0368000 task=c02d1a40)
Stack: 00000001 00000008 da88636c 050be99d 00000000 c02128fc 00000008 c0116718
00000000 c036bb01 00000246 00000001 00000000 c038180c 00000286 c0368000
dffdfa78 c038180c c0212a79 c02abc65 c0381760 042aa50a 00000051 dffdf951
Call Trace:
[<c02128fc>] ide_complete_barrier+0xec/0x160
[<c0116718>] release_console_sem+0xc8/0xe0
[<c0212a79>] ide_end_drive_cmd+0x109/0x370
[<c021d54c>] idedisk_error+0x6c/0x1f0
[<c02024d2>] end_that_request_last+0x52/0xb0
[<c021258c>] __ide_end_request+0x9c/0x110
[<c01fefc2>] elv_queue_empty+0x12/0x20
[<c0213548>] ide_do_request+0x78/0x390
[<c0213059>] drive_cmd_intr+0x79/0xd0
[<c0213cd3>] ide_intr+0xd3/0x170
[<c0212fe0>] drive_cmd_intr+0x0/0xd0
[<c0106f60>] handle_IRQ_event+0x30/0x60
[<c01072b0>] do_IRQ+0xc0/0x1a0
========================
[<c01058c8>] common_interrupt+0x18/0x20
[<c01038d3>] default_idle+0x23/0x30
[<c010393c>] cpu_idle+0x2c/0x40
[<c03455d9>] start_kernel+0x179/0x1c0
[<c0345320>] unknown_bootoption+0x0/0x110
Code: 75 1d 89 d8 e8 f4 fe fe ff 8b 47 70 8b 40 08 c7 40 20 00 00 00 00 c7 04
<0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing
hda is a Maxtor 6Y060L0
--
Lorenzo
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
2004-05-22 19:07 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier Lorenzo Allegrucci
@ 2004-05-22 19:21 ` Andrew Morton
2004-05-22 21:20 ` Jens Axboe
2004-05-23 8:27 ` Jens Axboe
1 sibling, 1 reply; 17+ messages in thread
From: Andrew Morton @ 2004-05-22 19:21 UTC (permalink / raw)
To: Lorenzo Allegrucci; +Cc: linux-kernel, Jens Axboe
Lorenzo Allegrucci <l_allegrucci@despammed.com> wrote:
>
>
> I get a 100% reproducible oops mounting an ext3 or reiserfs
> partition with -o barrier enabled.
> Hand written oops (for ext3):
That's a lot of hand-writing. Thanks for doing that. You can usually omit
the hex numbers in [brackets] when doing this.
The crash is here:
static inline void blkdev_dequeue_request(struct request *req)
{
BUG_ON(list_empty(&req->queuelist));
perhaps related to that I/O error sending the code through less-tested
paths.
> hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
> hda: drive_cmd: error=0x04 { DriveStatusError }
> end_request: I/O error, dev hda, sector 84666781
> ------------[ cut here ]------------
> kernel BUG at include/linux/blkdev.h:576!
> invalid operand: 0000 [#1]
> PREEMPT
> Modules linked in: lp emu10k1 sound soundcore ac97_codec unix
> CPU: 0
> EIP: 0060:[<c02125ae>] Not tainted VLI
> EFLAGS: 00010046 (2.6.6-mm5)
> EIP is at __ide_end_request+0xbe/0x110
> eax: 00000e7a ebx: da88636c ecx: 00000000 edx: da88636c
> esi: 00000000 edi: c038180c ebp: 00000008 esp: c0368ec4
> ds: 007b es: 007b ss: 0068
> Process swapper (pid: 0, threadinfo=c0368000 task=c02d1a40)
> Stack: 00000001 00000008 da88636c 050be99d 00000000 c02128fc 00000008 c0116718
> 00000000 c036bb01 00000246 00000001 00000000 c038180c 00000286 c0368000
> dffdfa78 c038180c c0212a79 c02abc65 c0381760 042aa50a 00000051 dffdf951
> Call Trace:
> [<c02128fc>] ide_complete_barrier+0xec/0x160
> [<c0116718>] release_console_sem+0xc8/0xe0
> [<c0212a79>] ide_end_drive_cmd+0x109/0x370
> [<c021d54c>] idedisk_error+0x6c/0x1f0
> [<c02024d2>] end_that_request_last+0x52/0xb0
> [<c021258c>] __ide_end_request+0x9c/0x110
> [<c01fefc2>] elv_queue_empty+0x12/0x20
> [<c0213548>] ide_do_request+0x78/0x390
> [<c0213059>] drive_cmd_intr+0x79/0xd0
> [<c0213cd3>] ide_intr+0xd3/0x170
> [<c0212fe0>] drive_cmd_intr+0x0/0xd0
> [<c0106f60>] handle_IRQ_event+0x30/0x60
> [<c01072b0>] do_IRQ+0xc0/0x1a0
> ========================
> [<c01058c8>] common_interrupt+0x18/0x20
> [<c01038d3>] default_idle+0x23/0x30
> [<c010393c>] cpu_idle+0x2c/0x40
> [<c03455d9>] start_kernel+0x179/0x1c0
> [<c0345320>] unknown_bootoption+0x0/0x110
>
> Code: 75 1d 89 d8 e8 f4 fe fe ff 8b 47 70 8b 40 08 c7 40 20 00 00 00 00 c7 04
> <0>Kernel panic: Fatal exception in interrupt
> In interrupt handler - not syncing
>
>
> hda is a Maxtor 6Y060L0
>
> --
> Lorenzo
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
2004-05-22 19:21 ` Andrew Morton
@ 2004-05-22 21:20 ` Jens Axboe
2004-05-22 21:30 ` Jens Axboe
0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2004-05-22 21:20 UTC (permalink / raw)
To: Andrew Morton; +Cc: Lorenzo Allegrucci, linux-kernel
On Sat, May 22 2004, Andrew Morton wrote:
> Lorenzo Allegrucci <l_allegrucci@despammed.com> wrote:
> >
> >
> > I get a 100% reproducible oops mounting an ext3 or reiserfs
> > partition with -o barrier enabled.
> > Hand written oops (for ext3):
>
> That's a lot of hand-writing. Thanks for doing that. You can usually omit
> the hex numbers in [brackets] when doing this.
>
> The crash is here:
>
> static inline void blkdev_dequeue_request(struct request *req)
> {
> BUG_ON(list_empty(&req->queuelist));
>
> perhaps related to that I/O error sending the code through less-tested
> paths.
Ouch indeed, I'll get that fixed up first thing in the morning.
--
Jens Axboe
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
2004-05-22 21:20 ` Jens Axboe
@ 2004-05-22 21:30 ` Jens Axboe
2004-05-23 8:58 ` Lorenzo Allegrucci
0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2004-05-22 21:30 UTC (permalink / raw)
To: Andrew Morton; +Cc: Lorenzo Allegrucci, linux-kernel
On Sat, May 22 2004, Jens Axboe wrote:
> On Sat, May 22 2004, Andrew Morton wrote:
> > Lorenzo Allegrucci <l_allegrucci@despammed.com> wrote:
> > >
> > >
> > > I get a 100% reproducible oops mounting an ext3 or reiserfs
> > > partition with -o barrier enabled.
> > > Hand written oops (for ext3):
> >
> > That's a lot of hand-writing. Thanks for doing that. You can usually omit
> > the hex numbers in [brackets] when doing this.
> >
> > The crash is here:
> >
> > static inline void blkdev_dequeue_request(struct request *req)
> > {
> > BUG_ON(list_empty(&req->queuelist));
> >
> > perhaps related to that I/O error sending the code through less-tested
> > paths.
>
> Ouch indeed, I'll get that fixed up first thing in the morning.
Can you test this work-around? The work-around should be perfectly safe,
this is just a case where a BUG_ON() does more harm than good :-)
--- drivers/ide/ide-io.c~ 2004-05-21 11:02:58.000000000 +0200
+++ drivers/ide/ide-io.c 2004-05-22 23:28:37.692944185 +0200
@@ -291,6 +291,8 @@
sector = real_rq->hard_sector;
bad_sectors = real_rq->hard_nr_sectors - good_sectors;
+ /* work-around, make sure request is on queue */
+ elv_requeue_request(drive->queue, real_rq);
if (good_sectors)
__ide_end_request(drive, real_rq, 1, good_sectors);
if (bad_sectors)
--
Jens Axboe
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
2004-05-22 19:07 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier Lorenzo Allegrucci
2004-05-22 19:21 ` Andrew Morton
@ 2004-05-23 8:27 ` Jens Axboe
2004-05-23 10:37 ` Kurt Garloff
1 sibling, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2004-05-23 8:27 UTC (permalink / raw)
To: Lorenzo Allegrucci; +Cc: Andrew Morton, linux-kernel
On Sat, May 22 2004, Lorenzo Allegrucci wrote:
>
> I get a 100% reproducible oops mounting an ext3 or reiserfs
> partition with -o barrier enabled.
> Hand written oops (for ext3):
>
> hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
> hda: drive_cmd: error=0x04 { DriveStatusError }
> end_request: I/O error, dev hda, sector 84666781
> ------------[ cut here ]------------
> kernel BUG at include/linux/blkdev.h:576!
> invalid operand: 0000 [#1]
> PREEMPT
> Modules linked in: lp emu10k1 sound soundcore ac97_codec unix
> CPU: 0
> EIP: 0060:[<c02125ae>] Not tainted VLI
> EFLAGS: 00010046 (2.6.6-mm5)
> EIP is at __ide_end_request+0xbe/0x110
> eax: 00000e7a ebx: da88636c ecx: 00000000 edx: da88636c
> esi: 00000000 edi: c038180c ebp: 00000008 esp: c0368ec4
> ds: 007b es: 007b ss: 0068
> Process swapper (pid: 0, threadinfo=c0368000 task=c02d1a40)
> Stack: 00000001 00000008 da88636c 050be99d 00000000 c02128fc 00000008 c0116718
> 00000000 c036bb01 00000246 00000001 00000000 c038180c 00000286 c0368000
> dffdfa78 c038180c c0212a79 c02abc65 c0381760 042aa50a 00000051 dffdf951
> Call Trace:
> [<c02128fc>] ide_complete_barrier+0xec/0x160
So here's a complete patch, that both fixes the code bug and adds some
support for (hopefully) fool proof checking proper flush support in the
drive. Can you give this a test, please? Post the dmesg ide boot
messages and whether it works, thanks.
diff -urp -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.6.6-mm5/drivers/ide/ide-disk.c linux-2.6.6-mm5/drivers/ide/ide-disk.c
--- /opt/kernel/linux-2.6.6-mm5/drivers/ide/ide-disk.c 2004-05-23 09:56:20.098457257 +0200
+++ linux-2.6.6-mm5/drivers/ide/ide-disk.c 2004-05-23 10:23:36.211541946 +0200
@@ -1392,13 +1392,6 @@ static int set_nowerr(ide_drive_t *drive
return 0;
}
-/* check if CACHE FLUSH (EXT) command is supported (bits defined in ATA-6) */
-#define ide_id_has_flush_cache(id) ((id)->cfs_enable_2 & 0x3000)
-
-/* some Maxtor disks have bit 13 defined incorrectly so check bit 10 too */
-#define ide_id_has_flush_cache_ext(id) \
- (((id)->cfs_enable_2 & 0x2400) == 0x2400)
-
static int write_cache (ide_drive_t *drive, int arg)
{
ide_task_t args;
@@ -1597,6 +1590,7 @@ static void idedisk_setup (ide_drive_t *
{
struct hd_driveid *id = drive->id;
unsigned long long capacity;
+ int barrier = 0;
idedisk_add_settings(drive);
@@ -1729,8 +1723,29 @@ static void idedisk_setup (ide_drive_t *
write_cache(drive, 1);
- blk_queue_ordered(drive->queue, 1);
- blk_queue_issue_flush_fn(drive->queue, idedisk_issue_flush);
+ /*
+ * decide if we can sanely support flushes and barriers on
+ * this drive
+ */
+ if (drive->addressing == 1) {
+ /*
+ * if capacity is over 2^28 sectors, drive must support
+ * FLUSH_CACHE_EXT
+ */
+ if (ide_id_has_flush_cache_ext(id))
+ barrier = 1;
+ else if (capacity <= (1ULL << 28))
+ barrier = 1;
+ else
+ printk("%s: drive is buggy, no support for FLUSH_CACHE_EXT with lba48\n", drive->name);
+ } else if (ide_id_has_flush_cache(id))
+ barrier = 1;
+
+ printk("%s: cache flushes %ssupported\n", drive->name, barrier ? "" : "not ");
+ if (barrier) {
+ blk_queue_ordered(drive->queue, 1);
+ blk_queue_issue_flush_fn(drive->queue, idedisk_issue_flush);
+ }
#ifdef CONFIG_BLK_DEV_IDE_TCQ_DEFAULT
if (drive->using_dma)
diff -urp -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.6.6-mm5/drivers/ide/ide-io.c linux-2.6.6-mm5/drivers/ide/ide-io.c
--- /opt/kernel/linux-2.6.6-mm5/drivers/ide/ide-io.c 2004-05-23 09:56:20.100457041 +0200
+++ linux-2.6.6-mm5/drivers/ide/ide-io.c 2004-05-23 10:22:59.204542947 +0200
@@ -67,7 +67,7 @@ static void ide_fill_flush_cmd(ide_drive
rq->buffer = buf;
rq->buffer[0] = WIN_FLUSH_CACHE;
- if (drive->id->cfs_enable_2 & 0x2400)
+ if (ide_id_has_flush_cache_ext(drive->id))
rq->buffer[0] = WIN_FLUSH_CACHE_EXT;
}
@@ -229,7 +229,7 @@ u64 ide_get_error_location(ide_drive_t *
lcyl = args[4];
sect = args[3];
- if (drive->id->cfs_enable_2 & 0x2400) {
+ if (ide_id_has_flush_cache_ext(drive->id)) {
low = (hcyl << 16) | (lcyl << 8) | sect;
HWIF(drive)->OUTB(drive->ctl|0x80, IDE_CONTROL_REG);
high = ide_read_24(drive);
@@ -277,32 +277,48 @@ static void ide_complete_barrier(ide_dri
}
/*
- * bummer, flush failed. if it was the pre-flush, fail the barrier.
- * if it was the post-flush, complete the succesful part of the request
- * and fail the rest
+ * we need to end real_rq, but it's not on the queue currently.
+ * put it back on the queue, so we don't have to special case
+ * anything else for completing it
*/
- good_sectors = 0;
- if (blk_barrier_postflush(rq)) {
- sector = ide_get_error_location(drive, rq->buffer);
-
- if ((sector >= real_rq->hard_sector) &&
- (sector < real_rq->hard_sector + real_rq->hard_nr_sectors))
- good_sectors = sector - real_rq->hard_sector;
- } else
- sector = real_rq->hard_sector;
+ elv_requeue_request(drive->queue, real_rq);
+
+ /*
+ * drive aborted flush command, assume FLUSH_CACHE_* doesn't
+ * work and disable barrier support
+ */
+ if (error & ABRT_ERR) {
+ printk(KERN_ERR "%s: barrier support doesn't work\n", drive->name);
+ __ide_end_request(drive, real_rq, 0, real_rq->hard_nr_sectors);
+ blk_queue_ordered(drive->queue, 0);
+ blk_queue_issue_flush_fn(drive->queue, NULL);
+ } else {
+ /*
+ * find out what part of the request failed
+ */
+ good_sectors = 0;
+ if (blk_barrier_postflush(rq)) {
+ sector = ide_get_error_location(drive, rq->buffer);
- bad_sectors = real_rq->hard_nr_sectors - good_sectors;
- if (good_sectors)
- __ide_end_request(drive, real_rq, 1, good_sectors);
- if (bad_sectors)
- __ide_end_request(drive, real_rq, 0, bad_sectors);
+ if ((sector >= real_rq->hard_sector) &&
+ (sector < real_rq->hard_sector + real_rq->hard_nr_sectors))
+ good_sectors = sector - real_rq->hard_sector;
+ } else
+ sector = real_rq->hard_sector;
+
+ bad_sectors = real_rq->hard_nr_sectors - good_sectors;
+ if (good_sectors)
+ __ide_end_request(drive, real_rq, 1, good_sectors);
+ if (bad_sectors)
+ __ide_end_request(drive, real_rq, 0, bad_sectors);
+
+ printk(KERN_ERR "%s: failed barrier write: "
+ "sector=%Lx(good=%d/bad=%d)\n",
+ drive->name, (unsigned long long)sector,
+ good_sectors, bad_sectors);
+ }
drive->doing_barrier = 0;
-
- printk(KERN_ERR "%s: failed barrier write: "
- "sector=%Lx(good=%d/bad=%d)\n",
- drive->name, (unsigned long long)sector,
- good_sectors, bad_sectors);
}
/**
diff -urp -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.6.6-mm5/include/linux/ide.h linux-2.6.6-mm5/include/linux/ide.h
--- /opt/kernel/linux-2.6.6-mm5/include/linux/ide.h 2004-05-23 09:56:21.401316264 +0200
+++ linux-2.6.6-mm5/include/linux/ide.h 2004-05-23 10:09:21.313967868 +0200
@@ -1716,4 +1716,11 @@ static inline int ata_can_queue(ide_driv
extern struct bus_type ide_bus_type;
+/* check if CACHE FLUSH (EXT) command is supported (bits defined in ATA-6) */
+#define ide_id_has_flush_cache(id) ((id)->cfs_enable_2 & 0x3000)
+
+/* some Maxtor disks have bit 13 defined incorrectly so check bit 10 too */
+#define ide_id_has_flush_cache_ext(id) \
+ (((id)->cfs_enable_2 & 0x2400) == 0x2400)
+
#endif /* _IDE_H */
--
Jens Axboe
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
2004-05-22 21:30 ` Jens Axboe
@ 2004-05-23 8:58 ` Lorenzo Allegrucci
2004-05-23 9:11 ` Jens Axboe
0 siblings, 1 reply; 17+ messages in thread
From: Lorenzo Allegrucci @ 2004-05-23 8:58 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, linux-kernel
On Saturday 22 May 2004 23:30, Jens Axboe wrote:
> On Sat, May 22 2004, Jens Axboe wrote:
> > On Sat, May 22 2004, Andrew Morton wrote:
> > > Lorenzo Allegrucci <l_allegrucci@despammed.com> wrote:
> > > > I get a 100% reproducible oops mounting an ext3 or reiserfs
> > > > partition with -o barrier enabled.
> > > > Hand written oops (for ext3):
> > >
> > > That's a lot of hand-writing. Thanks for doing that. You can usually
> > > omit the hex numbers in [brackets] when doing this.
> > >
> > > The crash is here:
> > >
> > > static inline void blkdev_dequeue_request(struct request *req)
> > > {
> > > BUG_ON(list_empty(&req->queuelist));
> > >
> > > perhaps related to that I/O error sending the code through less-tested
> > > paths.
> >
> > Ouch indeed, I'll get that fixed up first thing in the morning.
>
> Can you test this work-around? The work-around should be perfectly safe,
> this is just a case where a BUG_ON() does more harm than good :-)
>
> --- drivers/ide/ide-io.c~ 2004-05-21 11:02:58.000000000 +0200
> +++ drivers/ide/ide-io.c 2004-05-22 23:28:37.692944185 +0200
> @@ -291,6 +291,8 @@
> sector = real_rq->hard_sector;
>
> bad_sectors = real_rq->hard_nr_sectors - good_sectors;
> + /* work-around, make sure request is on queue */
> + elv_requeue_request(drive->queue, real_rq);
> if (good_sectors)
> __ide_end_request(drive, real_rq, 1, good_sectors);
> if (bad_sectors)
The oops goes away but:
hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
hda: drive_cmd: error=0x04 { DriveStatusError }
end_request: I/O error, dev hda, sector 84667085
Buffer I/O error on device hda11, logical block 559
lost page write due to I/O error on hda11
hda: failed barrier write: sector=50beacd(good=0/bad=8)
Aborting journal on device hda11.
ext3_abort called.
EXT3-fs abort (device hda11): ext3_journal_start: Detected aborted journal
Remounting filesystem read-only
--
Lorenzo
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
2004-05-23 8:58 ` Lorenzo Allegrucci
@ 2004-05-23 9:11 ` Jens Axboe
2004-05-23 10:03 ` Jens Axboe
0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2004-05-23 9:11 UTC (permalink / raw)
To: Lorenzo Allegrucci; +Cc: Andrew Morton, linux-kernel
On Sun, May 23 2004, Lorenzo Allegrucci wrote:
> On Saturday 22 May 2004 23:30, Jens Axboe wrote:
> > On Sat, May 22 2004, Jens Axboe wrote:
> > > On Sat, May 22 2004, Andrew Morton wrote:
> > > > Lorenzo Allegrucci <l_allegrucci@despammed.com> wrote:
> > > > > I get a 100% reproducible oops mounting an ext3 or reiserfs
> > > > > partition with -o barrier enabled.
> > > > > Hand written oops (for ext3):
> > > >
> > > > That's a lot of hand-writing. Thanks for doing that. You can usually
> > > > omit the hex numbers in [brackets] when doing this.
> > > >
> > > > The crash is here:
> > > >
> > > > static inline void blkdev_dequeue_request(struct request *req)
> > > > {
> > > > BUG_ON(list_empty(&req->queuelist));
> > > >
> > > > perhaps related to that I/O error sending the code through less-tested
> > > > paths.
> > >
> > > Ouch indeed, I'll get that fixed up first thing in the morning.
> >
> > Can you test this work-around? The work-around should be perfectly safe,
> > this is just a case where a BUG_ON() does more harm than good :-)
> >
> > --- drivers/ide/ide-io.c~ 2004-05-21 11:02:58.000000000 +0200
> > +++ drivers/ide/ide-io.c 2004-05-22 23:28:37.692944185 +0200
> > @@ -291,6 +291,8 @@
> > sector = real_rq->hard_sector;
> >
> > bad_sectors = real_rq->hard_nr_sectors - good_sectors;
> > + /* work-around, make sure request is on queue */
> > + elv_requeue_request(drive->queue, real_rq);
> > if (good_sectors)
> > __ide_end_request(drive, real_rq, 1, good_sectors);
> > if (bad_sectors)
>
> The oops goes away but:
>
> hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
> hda: drive_cmd: error=0x04 { DriveStatusError }
> end_request: I/O error, dev hda, sector 84667085
> Buffer I/O error on device hda11, logical block 559
> lost page write due to I/O error on hda11
> hda: failed barrier write: sector=50beacd(good=0/bad=8)
> Aborting journal on device hda11.
> ext3_abort called.
> EXT3-fs abort (device hda11): ext3_journal_start: Detected aborted journal
> Remounting filesystem read-only
That's expected with that patch. Try the one I just sent you as well.
ext3 doesn't recover nicely though, I'll see if I can find a way to pass
back -EOPNOTSUPP in case of ABRT in completer_barrier().
--
Jens Axboe
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
2004-05-23 9:11 ` Jens Axboe
@ 2004-05-23 10:03 ` Jens Axboe
2004-05-23 15:32 ` Lorenzo Allegrucci
0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2004-05-23 10:03 UTC (permalink / raw)
To: Lorenzo Allegrucci; +Cc: Andrew Morton, linux-kernel
On Sun, May 23 2004, Jens Axboe wrote:
> On Sun, May 23 2004, Lorenzo Allegrucci wrote:
> > On Saturday 22 May 2004 23:30, Jens Axboe wrote:
> > > On Sat, May 22 2004, Jens Axboe wrote:
> > > > On Sat, May 22 2004, Andrew Morton wrote:
> > > > > Lorenzo Allegrucci <l_allegrucci@despammed.com> wrote:
> > > > > > I get a 100% reproducible oops mounting an ext3 or reiserfs
> > > > > > partition with -o barrier enabled.
> > > > > > Hand written oops (for ext3):
> > > > >
> > > > > That's a lot of hand-writing. Thanks for doing that. You can usually
> > > > > omit the hex numbers in [brackets] when doing this.
> > > > >
> > > > > The crash is here:
> > > > >
> > > > > static inline void blkdev_dequeue_request(struct request *req)
> > > > > {
> > > > > BUG_ON(list_empty(&req->queuelist));
> > > > >
> > > > > perhaps related to that I/O error sending the code through less-tested
> > > > > paths.
> > > >
> > > > Ouch indeed, I'll get that fixed up first thing in the morning.
> > >
> > > Can you test this work-around? The work-around should be perfectly safe,
> > > this is just a case where a BUG_ON() does more harm than good :-)
> > >
> > > --- drivers/ide/ide-io.c~ 2004-05-21 11:02:58.000000000 +0200
> > > +++ drivers/ide/ide-io.c 2004-05-22 23:28:37.692944185 +0200
> > > @@ -291,6 +291,8 @@
> > > sector = real_rq->hard_sector;
> > >
> > > bad_sectors = real_rq->hard_nr_sectors - good_sectors;
> > > + /* work-around, make sure request is on queue */
> > > + elv_requeue_request(drive->queue, real_rq);
> > > if (good_sectors)
> > > __ide_end_request(drive, real_rq, 1, good_sectors);
> > > if (bad_sectors)
> >
> > The oops goes away but:
> >
> > hda: drive_cmd: status=0x51 { DriveReady SeekComplete Error }
> > hda: drive_cmd: error=0x04 { DriveStatusError }
> > end_request: I/O error, dev hda, sector 84667085
> > Buffer I/O error on device hda11, logical block 559
> > lost page write due to I/O error on hda11
> > hda: failed barrier write: sector=50beacd(good=0/bad=8)
> > Aborting journal on device hda11.
> > ext3_abort called.
> > EXT3-fs abort (device hda11): ext3_journal_start: Detected aborted journal
> > Remounting filesystem read-only
>
> That's expected with that patch. Try the one I just sent you as well.
> ext3 doesn't recover nicely though, I'll see if I can find a way to pass
> back -EOPNOTSUPP in case of ABRT in completer_barrier().
Here's a rolled up updated version that tries to get async notification
of missing barrier support working as well. reiser currently doesn't
cope with that correctly (fails mount), ext3 seems to but gets stuck.
Andrew has that fixed already, I think :-)
Lorenzo, can you test this on top of 2.6.6-mm5?
diff -urp -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.6.6-mm5/drivers/block/ll_rw_blk.c linux-2.6.6-mm5/drivers/block/ll_rw_blk.c
--- /opt/kernel/linux-2.6.6-mm5/drivers/block/ll_rw_blk.c 2004-05-23 09:56:19.000000000 +0200
+++ linux-2.6.6-mm5/drivers/block/ll_rw_blk.c 2004-05-23 11:29:57.774204303 +0200
@@ -2706,10 +2706,17 @@ void blk_recalc_rq_sectors(struct reques
static int __end_that_request_first(struct request *req, int uptodate,
int nr_bytes)
{
- int total_bytes, bio_nbytes, error = 0, next_idx = 0;
+ int total_bytes, bio_nbytes, error, next_idx = 0;
struct bio *bio;
/*
+ * extend uptodate bool to allow < 0 value to be direct io error
+ */
+ error = 0;
+ if (end_io_error(uptodate))
+ error = !uptodate ? -EIO : uptodate;
+
+ /*
* for a REQ_BLOCK_PC request, we want to carry any eventual
* sense key with us all the way through
*/
@@ -2717,7 +2724,6 @@ static int __end_that_request_first(stru
req->errors = 0;
if (!uptodate) {
- error = -EIO;
if (blk_fs_request(req) && !(req->flags & REQ_QUIET))
printk("end_request: I/O error, dev %s, sector %llu\n",
req->rq_disk ? req->rq_disk->disk_name : "?",
@@ -2800,7 +2806,7 @@ static int __end_that_request_first(stru
/**
* end_that_request_first - end I/O on a request
* @req: the request being processed
- * @uptodate: 0 for I/O error
+ * @uptodate: 1 for success, 0 for I/O error, < 0 for specific error
* @nr_sectors: number of sectors to end I/O on
*
* Description:
@@ -2821,7 +2827,7 @@ EXPORT_SYMBOL(end_that_request_first);
/**
* end_that_request_chunk - end I/O on a request
* @req: the request being processed
- * @uptodate: 0 for I/O error
+ * @uptodate: 1 for success, 0 for I/O error, < 0 for specific error
* @nr_bytes: number of bytes to complete
*
* Description:
diff -urp -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.6.6-mm5/drivers/ide/ide-disk.c linux-2.6.6-mm5/drivers/ide/ide-disk.c
--- /opt/kernel/linux-2.6.6-mm5/drivers/ide/ide-disk.c 2004-05-23 09:56:20.000000000 +0200
+++ linux-2.6.6-mm5/drivers/ide/ide-disk.c 2004-05-23 11:48:51.095716239 +0200
@@ -1392,13 +1392,6 @@ static int set_nowerr(ide_drive_t *drive
return 0;
}
-/* check if CACHE FLUSH (EXT) command is supported (bits defined in ATA-6) */
-#define ide_id_has_flush_cache(id) ((id)->cfs_enable_2 & 0x3000)
-
-/* some Maxtor disks have bit 13 defined incorrectly so check bit 10 too */
-#define ide_id_has_flush_cache_ext(id) \
- (((id)->cfs_enable_2 & 0x2400) == 0x2400)
-
static int write_cache (ide_drive_t *drive, int arg)
{
ide_task_t args;
@@ -1597,6 +1590,7 @@ static void idedisk_setup (ide_drive_t *
{
struct hd_driveid *id = drive->id;
unsigned long long capacity;
+ int barrier = 0;
idedisk_add_settings(drive);
@@ -1729,8 +1723,32 @@ static void idedisk_setup (ide_drive_t *
write_cache(drive, 1);
- blk_queue_ordered(drive->queue, 1);
- blk_queue_issue_flush_fn(drive->queue, idedisk_issue_flush);
+ /*
+ * decide if we can sanely support flushes and barriers on
+ * this drive. if we have FLUSH_CACHE_EXT, it's always safe.
+ * if not, it's only safe if drive has FLUSH_CACHE, and:
+ * - capacity is less than required for LBA48
+ * - doesn't use LBA48
+ */
+ if (ide_id_has_flush_cache_ext(id))
+ barrier = 1;
+ else {
+ if (drive->addressing == 1) {
+ if (capacity <= (1ULL << 28))
+ barrier = 1;
+ else
+ printk("%s: drive is buggy, no support for FLUSH_CACHE_EXT with lba48\n", drive->name);
+ } else {
+ if (ide_id_has_flush_cache(id))
+ barrier = 1;
+ }
+ }
+ barrier = 1;
+ printk("%s: cache flushes %ssupported\n", drive->name, barrier ? "" : "not ");
+ if (barrier) {
+ blk_queue_ordered(drive->queue, 1);
+ blk_queue_issue_flush_fn(drive->queue, idedisk_issue_flush);
+ }
#ifdef CONFIG_BLK_DEV_IDE_TCQ_DEFAULT
if (drive->using_dma)
diff -urp -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.6.6-mm5/drivers/ide/ide-io.c linux-2.6.6-mm5/drivers/ide/ide-io.c
--- /opt/kernel/linux-2.6.6-mm5/drivers/ide/ide-io.c 2004-05-23 09:56:20.000000000 +0200
+++ linux-2.6.6-mm5/drivers/ide/ide-io.c 2004-05-23 12:00:13.150994698 +0200
@@ -67,7 +67,7 @@ static void ide_fill_flush_cmd(ide_drive
rq->buffer = buf;
rq->buffer[0] = WIN_FLUSH_CACHE;
- if (drive->id->cfs_enable_2 & 0x2400)
+ if (ide_id_has_flush_cache_ext(drive->id))
rq->buffer[0] = WIN_FLUSH_CACHE_EXT;
}
@@ -115,10 +115,10 @@ static int __ide_end_request(ide_drive_t
* if failfast is set on a request, override number of sectors and
* complete the whole request right now
*/
- if (blk_noretry_request(rq) && !uptodate)
+ if (blk_noretry_request(rq) && end_io_error(uptodate))
nr_sectors = rq->hard_nr_sectors;
- if (!blk_fs_request(rq) && !uptodate && !rq->errors)
+ if (!blk_fs_request(rq) && end_io_error(uptodate) && !rq->errors)
rq->errors = -EIO;
/*
@@ -229,7 +229,7 @@ u64 ide_get_error_location(ide_drive_t *
lcyl = args[4];
sect = args[3];
- if (drive->id->cfs_enable_2 & 0x2400) {
+ if (ide_id_has_flush_cache_ext(drive->id)) {
low = (hcyl << 16) | (lcyl << 8) | sect;
HWIF(drive)->OUTB(drive->ctl|0x80, IDE_CONTROL_REG);
high = ide_read_24(drive);
@@ -277,32 +277,48 @@ static void ide_complete_barrier(ide_dri
}
/*
- * bummer, flush failed. if it was the pre-flush, fail the barrier.
- * if it was the post-flush, complete the succesful part of the request
- * and fail the rest
+ * we need to end real_rq, but it's not on the queue currently.
+ * put it back on the queue, so we don't have to special case
+ * anything else for completing it
*/
- good_sectors = 0;
- if (blk_barrier_postflush(rq)) {
- sector = ide_get_error_location(drive, rq->buffer);
-
- if ((sector >= real_rq->hard_sector) &&
- (sector < real_rq->hard_sector + real_rq->hard_nr_sectors))
- good_sectors = sector - real_rq->hard_sector;
- } else
- sector = real_rq->hard_sector;
+ elv_requeue_request(drive->queue, real_rq);
+
+ /*
+ * drive aborted flush command, assume FLUSH_CACHE_* doesn't
+ * work and disable barrier support
+ */
+ if (error & ABRT_ERR) {
+ printk(KERN_ERR "%s: barrier support doesn't work\n", drive->name);
+ __ide_end_request(drive, real_rq, -EOPNOTSUPP, real_rq->hard_nr_sectors);
+ blk_queue_ordered(drive->queue, 0);
+ blk_queue_issue_flush_fn(drive->queue, NULL);
+ } else {
+ /*
+ * find out what part of the request failed
+ */
+ good_sectors = 0;
+ if (blk_barrier_postflush(rq)) {
+ sector = ide_get_error_location(drive, rq->buffer);
- bad_sectors = real_rq->hard_nr_sectors - good_sectors;
- if (good_sectors)
- __ide_end_request(drive, real_rq, 1, good_sectors);
- if (bad_sectors)
- __ide_end_request(drive, real_rq, 0, bad_sectors);
+ if ((sector >= real_rq->hard_sector) &&
+ (sector < real_rq->hard_sector + real_rq->hard_nr_sectors))
+ good_sectors = sector - real_rq->hard_sector;
+ } else
+ sector = real_rq->hard_sector;
+
+ bad_sectors = real_rq->hard_nr_sectors - good_sectors;
+ if (good_sectors)
+ __ide_end_request(drive, real_rq, 1, good_sectors);
+ if (bad_sectors)
+ __ide_end_request(drive, real_rq, 0, bad_sectors);
+
+ printk(KERN_ERR "%s: failed barrier write: "
+ "sector=%Lx(good=%d/bad=%d)\n",
+ drive->name, (unsigned long long)sector,
+ good_sectors, bad_sectors);
+ }
drive->doing_barrier = 0;
-
- printk(KERN_ERR "%s: failed barrier write: "
- "sector=%Lx(good=%d/bad=%d)\n",
- drive->name, (unsigned long long)sector,
- good_sectors, bad_sectors);
}
/**
diff -urp -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.6.6-mm5/include/linux/blkdev.h linux-2.6.6-mm5/include/linux/blkdev.h
--- /opt/kernel/linux-2.6.6-mm5/include/linux/blkdev.h 2004-05-23 09:56:21.000000000 +0200
+++ linux-2.6.6-mm5/include/linux/blkdev.h 2004-05-23 11:57:30.227604989 +0200
@@ -571,6 +571,14 @@ extern void end_that_request_last(struct
extern int process_that_request_first(struct request *, unsigned int);
extern void end_request(struct request *req, int uptodate);
+/*
+ * end_that_request_first/chunk() takes an uptodate argument. we account
+ * any value <= as an io error. 0 means -EIO for compatability reasons,
+ * any other < 0 value is the direct error type. An uptodate value of
+ * 1 indicates successful io completion
+ */
+#define end_io_error(uptodate) (unlikely((uptodate) <= 0))
+
static inline void blkdev_dequeue_request(struct request *req)
{
BUG_ON(list_empty(&req->queuelist));
diff -urp -X /home/axboe/cdrom/exclude /opt/kernel/linux-2.6.6-mm5/include/linux/ide.h linux-2.6.6-mm5/include/linux/ide.h
--- /opt/kernel/linux-2.6.6-mm5/include/linux/ide.h 2004-05-23 09:56:21.000000000 +0200
+++ linux-2.6.6-mm5/include/linux/ide.h 2004-05-23 10:09:21.313967868 +0200
@@ -1716,4 +1716,11 @@ static inline int ata_can_queue(ide_driv
extern struct bus_type ide_bus_type;
+/* check if CACHE FLUSH (EXT) command is supported (bits defined in ATA-6) */
+#define ide_id_has_flush_cache(id) ((id)->cfs_enable_2 & 0x3000)
+
+/* some Maxtor disks have bit 13 defined incorrectly so check bit 10 too */
+#define ide_id_has_flush_cache_ext(id) \
+ (((id)->cfs_enable_2 & 0x2400) == 0x2400)
+
#endif /* _IDE_H */
--
Jens Axboe
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
2004-05-23 8:27 ` Jens Axboe
@ 2004-05-23 10:37 ` Kurt Garloff
2004-05-23 10:56 ` Jens Axboe
0 siblings, 1 reply; 17+ messages in thread
From: Kurt Garloff @ 2004-05-23 10:37 UTC (permalink / raw)
To: Jens Axboe; +Cc: Lorenzo Allegrucci, Andrew Morton, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1361 bytes --]
Hi Jens,
On Sun, May 23, 2004 at 10:27:28AM +0200, Jens Axboe wrote:
> @@ -1729,8 +1723,29 @@ static void idedisk_setup (ide_drive_t *
>
> write_cache(drive, 1);
>
> - blk_queue_ordered(drive->queue, 1);
> - blk_queue_issue_flush_fn(drive->queue, idedisk_issue_flush);
> + /*
> + * decide if we can sanely support flushes and barriers on
> + * this drive
> + */
> + if (drive->addressing == 1) {
> + /*
> + * if capacity is over 2^28 sectors, drive must support
> + * FLUSH_CACHE_EXT
> + */
> + if (ide_id_has_flush_cache_ext(id))
> + barrier = 1;
> + else if (capacity <= (1ULL << 28))
> + barrier = 1;
> + else
> + printk("%s: drive is buggy, no support for FLUSH_CACHE_EXT with lba48\n", drive->name);
So, for drives with LBA48, you enable barriers either if report to
support it or if their capacity is _smaller_ than 2^28. If neither
is the case, it's left disabled and the kernel complains.
Is it safe to enable for
(addressing == 1 && !ide_id_has_flush_cache_ext() && capacity <= 1<<28)
?
Shouldn't we check ide_has_flush_cache() then, as for the non-
LBA48 drives?
> + } else if (ide_id_has_flush_cache(id))
> + barrier = 1;
Regards,
--
Kurt Garloff <garloff@suse.de> Cologne, DE
SUSE LINUX AG / Novell, Nuernberg, DE Director SUSE Labs
[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
2004-05-23 10:37 ` Kurt Garloff
@ 2004-05-23 10:56 ` Jens Axboe
0 siblings, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2004-05-23 10:56 UTC (permalink / raw)
To: Kurt Garloff, Lorenzo Allegrucci, Andrew Morton, linux-kernel
Cc: Bartlomiej Zolnierkiewicz
On Sun, May 23 2004, Kurt Garloff wrote:
> Hi Jens,
>
> On Sun, May 23, 2004 at 10:27:28AM +0200, Jens Axboe wrote:
> > @@ -1729,8 +1723,29 @@ static void idedisk_setup (ide_drive_t *
> >
> > write_cache(drive, 1);
> >
> > - blk_queue_ordered(drive->queue, 1);
> > - blk_queue_issue_flush_fn(drive->queue, idedisk_issue_flush);
> > + /*
> > + * decide if we can sanely support flushes and barriers on
> > + * this drive
> > + */
> > + if (drive->addressing == 1) {
> > + /*
> > + * if capacity is over 2^28 sectors, drive must support
> > + * FLUSH_CACHE_EXT
> > + */
> > + if (ide_id_has_flush_cache_ext(id))
> > + barrier = 1;
> > + else if (capacity <= (1ULL << 28))
> > + barrier = 1;
> > + else
> > + printk("%s: drive is buggy, no support for FLUSH_CACHE_EXT with lba48\n", drive->name);
>
> So, for drives with LBA48, you enable barriers either if report to
> support it or if their capacity is _smaller_ than 2^28. If neither
> is the case, it's left disabled and the kernel complains.
> Is it safe to enable for
> (addressing == 1 && !ide_id_has_flush_cache_ext() && capacity <= 1<<28)
> ?
Yes that's safe.
> Shouldn't we check ide_has_flush_cache() then, as for the non-
> LBA48 drives?
Yep.
Yeah that was buggy, see later posting (which had a barrier = 1 hanging
from update, argh). Here's a better version imho. I have one drive here
that does support FLUSH_CACHE, but doesn't flag cfs_enable_2 as such. So
I think our logic should be:
/*
* decide if we can sanely support flushes and barriers on
* this drive. unfortunately not all drives advertise
* FLUSH_CACHE
* support even if they support it. So assume FLUSH_CACHE is
* there
* if write back caching is enabled. LBA48 drives are newer, so
* expect it to flag support properly. We can safely support
* FLUSH_CACHE on lba48, if capacity doesn't exceed lba28
*/
barrier = drive->wcache;
if (drive->addressing == 1) {
barrier = ide_id_has_flush_cache(id);
if (capacity > (1ULL << 28) && !ide_id_has_flush_cache_ext(id))
barrier = 0;
}
--
Jens Axboe
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
2004-05-23 10:03 ` Jens Axboe
@ 2004-05-23 15:32 ` Lorenzo Allegrucci
2004-05-23 15:45 ` Jens Axboe
0 siblings, 1 reply; 17+ messages in thread
From: Lorenzo Allegrucci @ 2004-05-23 15:32 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, linux-kernel
On Sunday 23 May 2004 12:03, Jens Axboe wrote:
> Here's a rolled up updated version that tries to get async notification
> of missing barrier support working as well. reiser currently doesn't
> cope with that correctly (fails mount), ext3 seems to but gets stuck.
> Andrew has that fixed already, I think :-)
>
> Lorenzo, can you test this on top of 2.6.6-mm5?
Problem fixed, but there is some performance regression
ext3 (default)
untar read copy remove
0m53.861s 0m24.942s 1m30.164s 0m20.664s
0m7.132s 0m1.191s 0m0.766s 0m0.076s
0m5.807s 0m3.345s 0m9.996s 0m1.719s
ext3 (-o barrier=1)
untar read copy remove
0m52.117s 0m28.502s 1m51.153s 0m25.561s
0m7.231s 0m1.209s 0m0.738s 0m0.071s
0m6.117s 0m3.191s 0m9.347s 0m1.635s
--
Lorenzo
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
2004-05-23 15:32 ` Lorenzo Allegrucci
@ 2004-05-23 15:45 ` Jens Axboe
2004-05-23 16:43 ` Lorenzo Allegrucci
0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2004-05-23 15:45 UTC (permalink / raw)
To: Lorenzo Allegrucci; +Cc: Andrew Morton, linux-kernel
On Sun, May 23 2004, Lorenzo Allegrucci wrote:
> On Sunday 23 May 2004 12:03, Jens Axboe wrote:
>
> > Here's a rolled up updated version that tries to get async notification
> > of missing barrier support working as well. reiser currently doesn't
> > cope with that correctly (fails mount), ext3 seems to but gets stuck.
> > Andrew has that fixed already, I think :-)
> >
> > Lorenzo, can you test this on top of 2.6.6-mm5?
>
> Problem fixed, but there is some performance regression
>
> ext3 (default)
> untar read copy remove
> 0m53.861s 0m24.942s 1m30.164s 0m20.664s
> 0m7.132s 0m1.191s 0m0.766s 0m0.076s
> 0m5.807s 0m3.345s 0m9.996s 0m1.719s
>
> ext3 (-o barrier=1)
> untar read copy remove
> 0m52.117s 0m28.502s 1m51.153s 0m25.561s
> 0m7.231s 0m1.209s 0m0.738s 0m0.071s
> 0m6.117s 0m3.191s 0m9.347s 0m1.635s
Not sure what you mean here, but yes of course -o barrier=1 is going to
be slower than default + write back caching. What you should compare is
without barrier support and hdparm -W0 /dev/hdX, if -o barrier=1 with
caching on is slower then that's a regression :-)
--
Jens Axboe
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
2004-05-23 15:45 ` Jens Axboe
@ 2004-05-23 16:43 ` Lorenzo Allegrucci
2004-05-23 16:56 ` Jens Axboe
0 siblings, 1 reply; 17+ messages in thread
From: Lorenzo Allegrucci @ 2004-05-23 16:43 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, linux-kernel
On Sunday 23 May 2004 17:45, Jens Axboe wrote:
> On Sun, May 23 2004, Lorenzo Allegrucci wrote:
> > On Sunday 23 May 2004 12:03, Jens Axboe wrote:
> > > Here's a rolled up updated version that tries to get async notification
> > > of missing barrier support working as well. reiser currently doesn't
> > > cope with that correctly (fails mount), ext3 seems to but gets stuck.
> > > Andrew has that fixed already, I think :-)
> > >
> > > Lorenzo, can you test this on top of 2.6.6-mm5?
> >
> > Problem fixed, but there is some performance regression
> >
> > ext3 (default)
> > untar read copy remove
> > 0m53.861s 0m24.942s 1m30.164s 0m20.664s
> > 0m7.132s 0m1.191s 0m0.766s 0m0.076s
> > 0m5.807s 0m3.345s 0m9.996s 0m1.719s
> >
> > ext3 (-o barrier=1)
> > untar read copy remove
> > 0m52.117s 0m28.502s 1m51.153s 0m25.561s
> > 0m7.231s 0m1.209s 0m0.738s 0m0.071s
> > 0m6.117s 0m3.191s 0m9.347s 0m1.635s
>
> Not sure what you mean here
Untar, read, copy and remove the OpenOffice tarball, each test
run with cold cache (mount/umount cycle).
> but yes of course -o barrier=1 is going to
> be slower than default + write back caching. What you should compare is
> without barrier support and hdparm -W0 /dev/hdX, if -o barrier=1 with
> caching on is slower then that's a regression :-)
hdparm -W0 /dev/hda
ext3 (-o barrier=0)
untar read copy remove
1m55.190s 0m27.633s 2m19.072s 0m21.348s
0m7.081s 0m1.189s 0m0.724s 0m0.083s
0m6.502s 0m3.244s 0m9.715s 0m1.633s
ext3 (-o barrier=1)
untar read copy remove
1m55.358s 0m23.831s 2m16.674s 0m21.508s
0m7.153s 0m1.200s 0m0.748s 0m0.087s
0m6.775s 0m3.358s 0m9.985s 0m1.781s
haparm -W1 /dev/hda
ext3 (-o barrier=0)
untar read copy remove
0m55.405s 0m26.230s 1m28.765s 0m20.766s
0m7.195s 0m1.199s 0m0.773s 0m0.081s
0m6.502s 0m3.359s 0m9.672s 0m1.868s
ext3 (-o barrier=1)
untar read copy remove
0m52.117s 0m28.502s 1m51.153s 0m25.561s
0m7.231s 0m1.209s 0m0.738s 0m0.071s
0m6.117s 0m3.191s 0m9.347s 0m1.635s
--
Lorenzo
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
2004-05-23 16:43 ` Lorenzo Allegrucci
@ 2004-05-23 16:56 ` Jens Axboe
2004-05-23 17:17 ` Lorenzo Allegrucci
0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2004-05-23 16:56 UTC (permalink / raw)
To: Lorenzo Allegrucci; +Cc: Andrew Morton, linux-kernel
On Sun, May 23 2004, Lorenzo Allegrucci wrote:
> On Sunday 23 May 2004 17:45, Jens Axboe wrote:
> > On Sun, May 23 2004, Lorenzo Allegrucci wrote:
> > > On Sunday 23 May 2004 12:03, Jens Axboe wrote:
> > > > Here's a rolled up updated version that tries to get async notification
> > > > of missing barrier support working as well. reiser currently doesn't
> > > > cope with that correctly (fails mount), ext3 seems to but gets stuck.
> > > > Andrew has that fixed already, I think :-)
> > > >
> > > > Lorenzo, can you test this on top of 2.6.6-mm5?
> > >
> > > Problem fixed, but there is some performance regression
> > >
> > > ext3 (default)
> > > untar read copy remove
> > > 0m53.861s 0m24.942s 1m30.164s 0m20.664s
> > > 0m7.132s 0m1.191s 0m0.766s 0m0.076s
> > > 0m5.807s 0m3.345s 0m9.996s 0m1.719s
> > >
> > > ext3 (-o barrier=1)
> > > untar read copy remove
> > > 0m52.117s 0m28.502s 1m51.153s 0m25.561s
> > > 0m7.231s 0m1.209s 0m0.738s 0m0.071s
> > > 0m6.117s 0m3.191s 0m9.347s 0m1.635s
> >
> > Not sure what you mean here
>
> Untar, read, copy and remove the OpenOffice tarball, each test
> run with cold cache (mount/umount cycle).
I understand that, I just don't see how you can call it a regression.
It's a given that barrier will be slower.
> > but yes of course -o barrier=1 is going to
> > be slower than default + write back caching. What you should compare is
> > without barrier support and hdparm -W0 /dev/hdX, if -o barrier=1 with
> > caching on is slower then that's a regression :-)
>
> hdparm -W0 /dev/hda
>
> ext3 (-o barrier=0)
> untar read copy remove
> 1m55.190s 0m27.633s 2m19.072s 0m21.348s
> 0m7.081s 0m1.189s 0m0.724s 0m0.083s
> 0m6.502s 0m3.244s 0m9.715s 0m1.633s
>
> ext3 (-o barrier=1)
> untar read copy remove
> 1m55.358s 0m23.831s 2m16.674s 0m21.508s
> 0m7.153s 0m1.200s 0m0.748s 0m0.087s
> 0m6.775s 0m3.358s 0m9.985s 0m1.781s
>
>
> haparm -W1 /dev/hda
>
> ext3 (-o barrier=0)
> untar read copy remove
> 0m55.405s 0m26.230s 1m28.765s 0m20.766s
> 0m7.195s 0m1.199s 0m0.773s 0m0.081s
> 0m6.502s 0m3.359s 0m9.672s 0m1.868s
>
> ext3 (-o barrier=1)
> untar read copy remove
> 0m52.117s 0m28.502s 1m51.153s 0m25.561s
> 0m7.231s 0m1.209s 0m0.738s 0m0.071s
> 0m6.117s 0m3.191s 0m9.347s 0m1.635s
Your results look a bit over the map, how many runs are your averaging
for each one?
--
Jens Axboe
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
2004-05-23 16:56 ` Jens Axboe
@ 2004-05-23 17:17 ` Lorenzo Allegrucci
2004-05-23 17:31 ` Jens Axboe
0 siblings, 1 reply; 17+ messages in thread
From: Lorenzo Allegrucci @ 2004-05-23 17:17 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, linux-kernel
On Sunday 23 May 2004 18:56, Jens Axboe wrote:
> > Untar, read, copy and remove the OpenOffice tarball, each test
> > run with cold cache (mount/umount cycle).
>
> I understand that, I just don't see how you can call it a regression.
> It's a given that barrier will be slower.
I'm sorry, I didn't know :)
I read from www.kerneltrap.org:
Request barriers, also known as write barriers, provide a mechanism for
guaranteeing the order of disk I/O operations without actually waiting for
the data to be written to disk. Specifically, a request barrier guarantees
that any data queued up prior to the the barrier will be written to disk
before data queued up after the barrier. Without a request barrier, the block
layer can reorder how data is written to disk for maximum performance. The
problem with this being most notably with journaling filesystems which
require that their metadata be updated prior to actually updating data,
allowing true crash recovery. Without request barriers, a journaling
filesystem has to wait for the metadata change to be written to disk before
it can proceed with actually updating the filesystem. Hence, the addition of
request barriers provides a performance boost for journaled filesystems. In
-mm5, request barriers are enabled by default for reiserfs and ext3.
> > > but yes of course -o barrier=1 is going to
> > > be slower than default + write back caching. What you should compare is
> > > without barrier support and hdparm -W0 /dev/hdX, if -o barrier=1 with
> > > caching on is slower then that's a regression :-)
> >
> > hdparm -W0 /dev/hda
> >
> > ext3 (-o barrier=0)
> > untar read copy remove
> > 1m55.190s 0m27.633s 2m19.072s 0m21.348s
> > 0m7.081s 0m1.189s 0m0.724s 0m0.083s
> > 0m6.502s 0m3.244s 0m9.715s 0m1.633s
> >
> > ext3 (-o barrier=1)
> > untar read copy remove
> > 1m55.358s 0m23.831s 2m16.674s 0m21.508s
> > 0m7.153s 0m1.200s 0m0.748s 0m0.087s
> > 0m6.775s 0m3.358s 0m9.985s 0m1.781s
> >
> >
> > haparm -W1 /dev/hda
> >
> > ext3 (-o barrier=0)
> > untar read copy remove
> > 0m55.405s 0m26.230s 1m28.765s 0m20.766s
> > 0m7.195s 0m1.199s 0m0.773s 0m0.081s
> > 0m6.502s 0m3.359s 0m9.672s 0m1.868s
> >
> > ext3 (-o barrier=1)
> > untar read copy remove
> > 0m52.117s 0m28.502s 1m51.153s 0m25.561s
> > 0m7.231s 0m1.209s 0m0.738s 0m0.071s
> > 0m6.117s 0m3.191s 0m9.347s 0m1.635s
>
> Your results look a bit over the map, how many runs are your averaging
> for each one?
Just one run, no averaging.
Yes, it's not a scientific approach, but I have not enough time
and this is my production machine :)
By experience I can say that numbers between each run are quite
stable and reproducible.
--
Lorenzo
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
2004-05-23 17:17 ` Lorenzo Allegrucci
@ 2004-05-23 17:31 ` Jens Axboe
2004-05-23 20:41 ` Lorenzo Allegrucci
0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2004-05-23 17:31 UTC (permalink / raw)
To: Lorenzo Allegrucci; +Cc: Andrew Morton, linux-kernel
On Sun, May 23 2004, Lorenzo Allegrucci wrote:
> On Sunday 23 May 2004 18:56, Jens Axboe wrote:
>
> > > Untar, read, copy and remove the OpenOffice tarball, each test
> > > run with cold cache (mount/umount cycle).
> >
> > I understand that, I just don't see how you can call it a regression.
> > It's a given that barrier will be slower.
>
> I'm sorry, I didn't know :)
>
> I read from www.kerneltrap.org:
>
> Request barriers, also known as write barriers, provide a mechanism for
[snip]
Ah ok, I see the confusion! ext3 does nothing clever with barriers, it
just uses it for data integrity. A journalled fs is normally not safe to
run on write back cached drives if they aren't battery backed. You could
try reiser instead, it's has a more intelligent use of barriers.
> > > > but yes of course -o barrier=1 is going to
> > > > be slower than default + write back caching. What you should compare is
> > > > without barrier support and hdparm -W0 /dev/hdX, if -o barrier=1 with
> > > > caching on is slower then that's a regression :-)
> > >
> > > hdparm -W0 /dev/hda
> > >
> > > ext3 (-o barrier=0)
> > > untar read copy remove
> > > 1m55.190s 0m27.633s 2m19.072s 0m21.348s
> > > 0m7.081s 0m1.189s 0m0.724s 0m0.083s
> > > 0m6.502s 0m3.244s 0m9.715s 0m1.633s
> > >
> > > ext3 (-o barrier=1)
> > > untar read copy remove
> > > 1m55.358s 0m23.831s 2m16.674s 0m21.508s
> > > 0m7.153s 0m1.200s 0m0.748s 0m0.087s
> > > 0m6.775s 0m3.358s 0m9.985s 0m1.781s
> > >
> > >
> > > haparm -W1 /dev/hda
> > >
> > > ext3 (-o barrier=0)
> > > untar read copy remove
> > > 0m55.405s 0m26.230s 1m28.765s 0m20.766s
> > > 0m7.195s 0m1.199s 0m0.773s 0m0.081s
> > > 0m6.502s 0m3.359s 0m9.672s 0m1.868s
> > >
> > > ext3 (-o barrier=1)
> > > untar read copy remove
> > > 0m52.117s 0m28.502s 1m51.153s 0m25.561s
> > > 0m7.231s 0m1.209s 0m0.738s 0m0.071s
> > > 0m6.117s 0m3.191s 0m9.347s 0m1.635s
> >
> > Your results look a bit over the map, how many runs are your averaging
> > for each one?
>
> Just one run, no averaging.
> Yes, it's not a scientific approach, but I have not enough time
> and this is my production machine :)
> By experience I can say that numbers between each run are quite
> stable and reproducible.
It just looks odd that eg reads vary as much as they do, and that -o
barrier=1 makes -W0 reads faster (and faster then -W1 even). remove
looks reasonable for -W1, but -W0 is still faster. That is _really_ odd.
--
Jens Axboe
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier
2004-05-23 17:31 ` Jens Axboe
@ 2004-05-23 20:41 ` Lorenzo Allegrucci
0 siblings, 0 replies; 17+ messages in thread
From: Lorenzo Allegrucci @ 2004-05-23 20:41 UTC (permalink / raw)
To: Jens Axboe; +Cc: Andrew Morton, linux-kernel
On Sunday 23 May 2004 19:31, Jens Axboe wrote:
> It just looks odd that eg reads vary as much as they do, and that -o
> barrier=1 makes -W0 reads faster (and faster then -W1 even). remove
> looks reasonable for -W1, but -W0 is still faster. That is _really_ odd.
I rerun all tests with noatime.
Using noatime numbers look more predictable but I cannot guarantee
any statistical relevance.
hdparm -W1
ext3 (-o barrier=0,noatime)
untar read copy remove
0m56.744s 0m22.452s 1m26.558s 0m20.683s
0m7.187s 0m1.153s 0m0.747s 0m0.101s
0m6.077s 0m3.045s 0m9.132s 0m1.907s
ext3 (-o barrier=1,noatime)
untar read copy remove
0m55.152s 0m21.340s 1m23.705s 0m19.369s
0m7.237s 0m1.217s 0m0.788s 0m0.077s
0m6.513s 0m3.082s 0m9.226s 0m1.507s
haparm -W0
ext3 (-o barrier=0,noatime)
untar read copy remove
1m55.221s 0m23.768s 2m12.458s 0m22.256s
0m7.227s 0m1.216s 0m0.756s 0m0.085s
0m6.424s 0m3.002s 0m9.085s 0m1.560s
ext3 (-o barrier=1,noatime)
untar read copy remove
1m56.578s 0m23.382s 2m14.117s 0m22.289s
0m7.035s 0m1.202s 0m0.766s 0m0.089s
0m6.814s 0m3.103s 0m9.141s 0m1.696s
--
Lorenzo
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2004-05-23 20:40 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-05-22 19:07 2.6.6-mm5 oops mounting ext3 or reiserfs with -o barrier Lorenzo Allegrucci
2004-05-22 19:21 ` Andrew Morton
2004-05-22 21:20 ` Jens Axboe
2004-05-22 21:30 ` Jens Axboe
2004-05-23 8:58 ` Lorenzo Allegrucci
2004-05-23 9:11 ` Jens Axboe
2004-05-23 10:03 ` Jens Axboe
2004-05-23 15:32 ` Lorenzo Allegrucci
2004-05-23 15:45 ` Jens Axboe
2004-05-23 16:43 ` Lorenzo Allegrucci
2004-05-23 16:56 ` Jens Axboe
2004-05-23 17:17 ` Lorenzo Allegrucci
2004-05-23 17:31 ` Jens Axboe
2004-05-23 20:41 ` Lorenzo Allegrucci
2004-05-23 8:27 ` Jens Axboe
2004-05-23 10:37 ` Kurt Garloff
2004-05-23 10:56 ` Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).