Vladislav Bolkhovitin wrote:
> Nick Piggin wrote:
>> On Wednesday 29 October 2008 06:38, Vladislav Bolkhovitin wrote:
>>> Nick Piggin wrote:
>>>> On Saturday 25 October 2008 03:10, Vladislav Bolkhovitin wrote:
>>>>> Hi,
>>>>>
>>>>> During recent debugging session of my SCSI target SCST
>>>>> (http://scst.sf.net) I noticed many
>>>>>
>>>>> WARNING: at fs/buffer.c:1186 mark_buffer_dirty+0x51/0x66()
>>>>>
>>>>> messages in kernel log on the initiator. I attached the full log of
>>>>> several of them.
>>>>>
>>>>> My target was buggy and I was working on fixing it, but I suppose Linux
>>>>> should handle such failures more gracefully. In all the cases the target
>>>>> had one type of failure: it "ate" a SCSI command and never returned
>>>>> result of it.
>>>> Right. This is one of the warnings I see in my fault-injection testing.
>>>> It is fixed by my patch to clean up and improve the page and buffer
>>>> error handling in the vm/fs.
>>> Can you specify which patch you referring? Is it in 2.6.27?
>> It's just an RFC at the moment which I posted to fsdevel. Not in 2.6.27.
>
> I see. I'm looking forward to see it in 2.6.28 or .29. This is really a
> needed work.
>
> BTW, have you even seen in your fault-injection testing that after
> receiving a failure from a SCSI device during heavy load ext3 file
> system mounted on it gets corrupted and journal replay on remount
> doesn't repair it, only manual e2fsck helps? I've many times seen that,
> including cases when the target was remaining up and fully functional.
> See, e.g., "MOANING MODE ON" part in
> http://marc.info/?l=linux-scsi&m=121932252324432&w=2. I haven't checked
> that case since then, although I see such corruptions quite often. But
> in all them I can't so clearly say that it isn't a target's failure.

I've just checked it with 2.6.27.
The situation has greatly improved: dbench was able to complete several runs under constant TASK_ABORTED "bombarding" (TASK RESET task management commands issued with "sg_reset -b" every 31 seconds from another "connection" to that device via the qla2xxx initiator driver; you can see those resets in the attached log). But when I then unmounted the affected partition, e2fsck found errors on it. See attachments for details. The target was fine and completely healthy the whole time.
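
For anyone who wants to reproduce the reset "bombarding", here is a rough sketch of the loop, not the exact script used for this test. Only "sg_reset -b" and the 31 second interval come from the description above. The device node, the function names and the use of Python are assumptions made for illustration.

```python
import subprocess
import time


def reset_command(device):
    """Build the reset invocation described above: sg_reset -b <device>."""
    return ["sg_reset", "-b", device]


def bombard(device, interval=31, count=None,
            run=subprocess.call, sleep=time.sleep):
    """Issue a reset every `interval` seconds.

    count=None loops until interrupted. `run` and `sleep` are injectable
    so the loop can be dry-run without touching a real device.
    """
    issued = 0
    while count is None or issued < count:
        run(reset_command(device))
        issued += 1
        # No sleep after the final reset of a bounded run.
        if count is None or issued < count:
            sleep(interval)
    return issued


# Hypothetical usage: /dev/sg1 is a placeholder for the sg node of the
# second "connection" to the device, not a path taken from the report.
# bombard("/dev/sg1")
```

Run it as root from the second initiator connection while dbench is working on the mounted file system, then unmount the partition and check it with e2fsck.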