LKML Archive on lore.kernel.org
* filesystem corruption on xfs after 2.6.25-rc1 (bisected, powerpc related?)
@ 2008-02-25 11:23 Gaudenz Steinlin
  2008-02-25 11:34 ` Johannes Berg
  2008-02-25 18:15 ` [xfs-masters] " Eric Sandeen
  0 siblings, 2 replies; 14+ messages in thread
From: Gaudenz Steinlin @ 2008-02-25 11:23 UTC (permalink / raw)
  To: xfs-masters; +Cc: xfs, johannes, linux-kernel

Hi

Since upgrading to 2.6.25-rc1 I see filesystem corruption on my XFS
filesystem. I can reproduce this by doing "git reset --hard v2.6.25-rc1"
on a git checkout which is on some other revision. Git outputs strange
error messages (like file xxx is a directory when xxx really is a file)
and sometimes the filesystem "hangs" (I can no longer do any operations
on it even from another shell). If I reboot with a working kernel and
check the filesystem xfs_check reports many errors. I also see the
problem when doing other (not related to git) operations on the
filesystem. Git reset is just the easiest way to reproduce it.

I was able to track this corruption down to commit
a69b176df246d59626e6a9c640b44c0921fa4566 ([XFS] Use the generic bitops
rather than implementing them ourselves.) using git bisect.

Reverting edd319dc527733e61eec5bdc9ce20c94634b6482 ([XFS] Fix
xfs_lowbit64) to avoid merge conflicts and the faulty commit on top of
2.6.25-rc3 fixes the problem.

My filesystem is on an LVM2 logical volume and my computer is a
PowerBook G4 (model 5,8). I'm using GCC 4.2.3.

My problem is similar to the problem Johannes Berg reported in:
http://oss.sgi.com/archives/xfs/2008-02/msg00244.html

AFAIK Johannes also uses a PowerBook. Maybe this is an endianness issue.

Gaudenz

--
Ever tried. Ever failed. No matter.
Try again. Fail again. Fail better.
~ Samuel Beckett ~

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: filesystem corruption on xfs after 2.6.25-rc1 (bisected, powerpc related?)
  2008-02-25 11:23 filesystem corruption on xfs after 2.6.25-rc1 (bisected, powerpc related?) Gaudenz Steinlin
@ 2008-02-25 11:34 ` Johannes Berg
  2008-02-25 18:15 ` [xfs-masters] " Eric Sandeen
  1 sibling, 0 replies; 14+ messages in thread
From: Johannes Berg @ 2008-02-25 11:34 UTC (permalink / raw)
  To: Gaudenz Steinlin; +Cc: xfs-masters, xfs, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 904 bytes --]

Hi,

> Git reset is just the easiest way to reproduce it.

Interesting :)

> I was able to track this corruption down to commit
> a69b176df246d59626e6a9c640b44c0921fa4566 ([XFS] Use the generic bitops
> rather than implementing them ourselves.) using git bisect.
>
> Reverting edd319dc527733e61eec5bdc9ce20c94634b6482 ([XFS] Fix
> xfs_lowbit64) to avoid merge conflicts and the faulty commit on top of
> 2.6.25-rc3 fixes the problem.

Odd. The replaced code doesn't look like it has any sort of endianness
assumptions.

> My filesystem is on an LVM2 logical volume and my computer is a
> PowerBook G4 (model 5,8). I'm using GCC 4.2.3.
>
> My problem is similar to the problem Johannes Berg reported in:
> http://oss.sgi.com/archives/xfs/2008-02/msg00244.html
>
> AFAIK Johannes also uses a PowerBook.

Indeed, I do, forgot to mention that, thanks for copying me.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [xfs-masters] filesystem corruption on xfs after 2.6.25-rc1 (bisected, powerpc related?)
  2008-02-25 11:23 filesystem corruption on xfs after 2.6.25-rc1 (bisected, powerpc related?) Gaudenz Steinlin
  2008-02-25 11:34 ` Johannes Berg
@ 2008-02-25 18:15 ` Eric Sandeen
  2008-02-25 23:42   ` Rafael J. Wysocki
  1 sibling, 1 reply; 14+ messages in thread
From: Eric Sandeen @ 2008-02-25 18:15 UTC (permalink / raw)
  To: xfs-masters; +Cc: xfs, johannes, linux-kernel Mailing List

Gaudenz Steinlin wrote:
> Hi
>
> Since upgrading to 2.6.25-rc1 I see filesystem corruption on my XFS
> filesystem. I can reproduce this by doing "git reset --hard v2.6.25-rc1"
> on a git checkout which is on some other revision. Git outputs strange
> error messages (like file xxx is a directory when xxx really is a file)
> and sometimes the filesystem "hangs" (I can no longer do any operations
> on it even from another shell). If I reboot with a working kernel and
> check the filesystem xfs_check reports many errors. I also see the
> problem when doing other (not related to git) operations on the
> filesystem. Git reset is just the easiest way to reproduce it.
>
> I was able to track this corruption down to commit
> a69b176df246d59626e6a9c640b44c0921fa4566 ([XFS] Use the generic bitops
> rather than implementing them ourselves.) using git bisect.
>
> Reverting edd319dc527733e61eec5bdc9ce20c94634b6482 ([XFS] Fix
> xfs_lowbit64) to avoid merge conflicts and the faulty commit on top of
> 2.6.25-rc3 fixes the problem.

If you're feeling motivated, maybe you can narrow it down to which of
the changes - xfs_highbit32, xfs_highbit64, xfs_lowbit32, or
xfs_lowbit64 - is causing the problem? (or maybe they all are ...)

Or maybe someone looking at the commit can immediately see the
problem... but I can't :)

-Eric

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [xfs-masters] filesystem corruption on xfs after 2.6.25-rc1 (bisected, powerpc related?)
  2008-02-25 18:15 ` [xfs-masters] " Eric Sandeen
@ 2008-02-25 23:42   ` Rafael J. Wysocki
  2008-02-25 23:48     ` Eric Sandeen
  0 siblings, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2008-02-25 23:42 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs-masters, xfs, johannes, linux-kernel Mailing List

On Monday, 25 of February 2008, Eric Sandeen wrote:
> Gaudenz Steinlin wrote:
> > Hi
> >
> > Since upgrading to 2.6.25-rc1 I see filesystem corruption on my XFS
> > filesystem. I can reproduce this by doing "git reset --hard v2.6.25-rc1"
> > on a git checkout which is on some other revision. Git outputs strange
> > error messages (like file xxx is a directory when xxx really is a file)
> > and sometimes the filesystem "hangs" (I can no longer do any operations
> > on it even from another shell). If I reboot with a working kernel and
> > check the filesystem xfs_check reports many errors. I also see the
> > problem when doing other (not related to git) operations on the
> > filesystem. Git reset is just the easiest way to reproduce it.
> >
> > I was able to track this corruption down to commit
> > a69b176df246d59626e6a9c640b44c0921fa4566 ([XFS] Use the generic bitops
> > rather than implementing them ourselves.) using git bisect.
> >
> > Reverting edd319dc527733e61eec5bdc9ce20c94634b6482 ([XFS] Fix
> > xfs_lowbit64) to avoid merge conflicts and the faulty commit on top of
> > 2.6.25-rc3 fixes the problem.
>
> If you're feeling motivated, maybe you can narrow it down to which of
> the changes - xfs_highbit32, xfs_highbit64, xfs_lowbit32, or
> xfs_lowbit64 - is causing the problem? (or maybe they all are ...)
>
> Or maybe someone looking at the commit can immediately see the
> problem... but I can't :)

Well, IMO a reproducible filesystem corruption is a serious enough issue
for reverting all of the commits in question.

Thanks,
Rafael

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [xfs-masters] filesystem corruption on xfs after 2.6.25-rc1 (bisected, powerpc related?)
  2008-02-25 23:42   ` Rafael J. Wysocki
@ 2008-02-25 23:48     ` Eric Sandeen
  2008-02-25 23:52       ` Rafael J. Wysocki
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Sandeen @ 2008-02-25 23:48 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: xfs-masters, xfs, johannes, linux-kernel Mailing List

Rafael J. Wysocki wrote:
> On Monday, 25 of February 2008, Eric Sandeen wrote:

>> If you're feeling motivated, maybe you can narrow it down to which of
>> the changes - xfs_highbit32, xfs_highbit64, xfs_lowbit32, or
>> xfs_lowbit64 - is causing the problem? (or maybe they all are ...)
>>
>> Or maybe someone looking at the commit can immediately see the
>> problem... but I can't :)
>
> Well, IMO a reproducible filesystem corruption is a serious enough issue
> for reverting all of the commits in question.

I'm not suggesting a partial revert; I just wonder which part of the
change is causing the problem, as part of the debugging process.

-Eric

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [xfs-masters] filesystem corruption on xfs after 2.6.25-rc1 (bisected, powerpc related?)
  2008-02-25 23:48     ` Eric Sandeen
@ 2008-02-25 23:52       ` Rafael J. Wysocki
  2008-02-25 23:57         ` [xfs-masters] " Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2008-02-25 23:52 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs-masters, xfs, johannes, linux-kernel Mailing List

On Tuesday, 26 of February 2008, Eric Sandeen wrote:
> Rafael J. Wysocki wrote:
> > On Monday, 25 of February 2008, Eric Sandeen wrote:
> >
> >> If you're feeling motivated, maybe you can narrow it down to which of
> >> the changes - xfs_highbit32, xfs_highbit64, xfs_lowbit32, or
> >> xfs_lowbit64 - is causing the problem? (or maybe they all are ...)
> >>
> >> Or maybe someone looking at the commit can immediately see the
> >> problem... but I can't :)
> >
> > Well, IMO a reproducible filesystem corruption is a serious enough issue
> > for reverting all of the commits in question.
>
> I'm not suggesting a partial revert; I just wonder which part of the
> change is causing the problem, as part of the debugging process.

Understood.

My point is, if that's not practical (whatever the reason), I'd consider
reverting all of the commits in question.

Thanks,
Rafael

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [xfs-masters] Re: filesystem corruption on xfs after 2.6.25-rc1 (bisected, powerpc related?)
  2008-02-25 23:52       ` Rafael J. Wysocki
@ 2008-02-25 23:57         ` Christoph Hellwig
  2008-02-26 0:13           ` Rafael J. Wysocki
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2008-02-25 23:57 UTC (permalink / raw)
  To: xfs-masters; +Cc: Eric Sandeen, xfs, johannes, linux-kernel Mailing List

On Tue, Feb 26, 2008 at 12:52:56AM +0100, Rafael J. Wysocki wrote:
> > I'm not suggesting a partial revert; I just wonder which part of the
> > change is causing the problem, as part of the debugging process.
>
> Understood.
>
> My point is, if that's not practical (whatever the reason), I'd consider
> reverting all of the commits in question.

If you could revert all of them and verify it makes the problem go away
that would be a very good start already.

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [xfs-masters] Re: filesystem corruption on xfs after 2.6.25-rc1 (bisected, powerpc related?)
  2008-02-25 23:57         ` [xfs-masters] " Christoph Hellwig
@ 2008-02-26 0:13           ` Rafael J. Wysocki
  2008-02-26 7:34             ` Gaudenz Steinlin
  2008-02-26 11:44             ` Gaudenz Steinlin
  0 siblings, 2 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2008-02-26 0:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: xfs-masters, Eric Sandeen, xfs, johannes, linux-kernel Mailing List, Gaudenz Steinlin

On Tuesday, 26 of February 2008, Christoph Hellwig wrote:
> On Tue, Feb 26, 2008 at 12:52:56AM +0100, Rafael J. Wysocki wrote:
> > > I'm not suggesting a partial revert; I just wonder which part of the
> > > change is causing the problem, as part of the debugging process.
> >
> > Understood.
> >
> > My point is, if that's not practical (whatever the reason), I'd consider
> > reverting all of the commits in question.
>
> If you could revert all of them and verify it makes the problem go away
> that would be a very good start already.

The original reporter (CC added) said exactly that, if I understood him
correctly: http://lkml.org/lkml/2008/2/25/123

Thanks,
Rafael

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [xfs-masters] Re: filesystem corruption on xfs after 2.6.25-rc1 (bisected, powerpc related?)
  2008-02-26 0:13           ` Rafael J. Wysocki
@ 2008-02-26 7:34             ` Gaudenz Steinlin
  2008-02-26 11:44             ` Gaudenz Steinlin
  1 sibling, 0 replies; 14+ messages in thread
From: Gaudenz Steinlin @ 2008-02-26 7:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Christoph Hellwig, xfs-masters, Eric Sandeen, xfs, johannes, linux-kernel Mailing List

On Tue, Feb 26, 2008 at 01:13:56AM +0100, Rafael J. Wysocki wrote:
> On Tuesday, 26 of February 2008, Christoph Hellwig wrote:
> > On Tue, Feb 26, 2008 at 12:52:56AM +0100, Rafael J. Wysocki wrote:
> > > > I'm not suggesting a partial revert; I just wonder which part of the
> > > > change is causing the problem, as part of the debugging process.
> > >
> > > Understood.
> > >
> > > My point is, if that's not practical (whatever the reason), I'd consider
> > > reverting all of the commits in question.
> >
> > If you could revert all of them and verify it makes the problem go away
> > that would be a very good start already.
>
> The original reporter (CC added) said exactly that, if I understood him
> correctly:
>
> http://lkml.org/lkml/2008/2/25/123

Sorry if I was not clear. The problematic commit after bisecting is
a69b176df246d59626e6a9c640b44c0921fa4566. Reverting this commit and
commit edd319dc527733e61eec5bdc9ce20c94634b6482 fixes the problem. So
all other commits in the XFS merge for 2.6.25 seem to be OK. I had to
revert the second commit only to avoid a merge conflict.

And I forgot to mention in my first post: Please CC me on all replies.
I'm not subscribed to the lists.

Gaudenz

--
Ever tried. Ever failed. No matter.
Try again. Fail again. Fail better.
~ Samuel Beckett ~

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [xfs-masters] Re: filesystem corruption on xfs after 2.6.25-rc1 (bisected, powerpc related?)
  2008-02-26 0:13           ` Rafael J. Wysocki
  2008-02-26 7:34             ` Gaudenz Steinlin
@ 2008-02-26 11:44             ` Gaudenz Steinlin
  2008-02-26 18:11               ` Johannes Berg
  2008-02-26 20:05               ` Eric Sandeen
  1 sibling, 2 replies; 14+ messages in thread
From: Gaudenz Steinlin @ 2008-02-26 11:44 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Christoph Hellwig, xfs-masters, Eric Sandeen, xfs, johannes, linux-kernel Mailing List

On Tue, Feb 26, 2008 at 01:13:56AM +0100, Rafael J. Wysocki wrote:
> On Tuesday, 26 of February 2008, Christoph Hellwig wrote:
> > On Tue, Feb 26, 2008 at 12:52:56AM +0100, Rafael J. Wysocki wrote:
> > > > I'm not suggesting a partial revert; I just wonder which part of the
> > > > change is causing the problem, as part of the debugging process.

I debugged this a bit further by testing the 4 changed functions
individually. The problem only occurs with the new version of
xfs_lowbit64.

Gaudenz

--
Ever tried. Ever failed. No matter.
Try again. Fail again. Fail better.
~ Samuel Beckett ~

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [xfs-masters] Re: filesystem corruption on xfs after 2.6.25-rc1 (bisected, powerpc related?)
  2008-02-26 11:44             ` Gaudenz Steinlin
@ 2008-02-26 18:11               ` Johannes Berg
  2008-02-28 14:40                 ` Eric Sandeen
  2008-02-26 20:05               ` Eric Sandeen
  1 sibling, 1 reply; 14+ messages in thread
From: Johannes Berg @ 2008-02-26 18:11 UTC (permalink / raw)
  To: Gaudenz Steinlin
  Cc: Rafael J. Wysocki, Christoph Hellwig, xfs-masters, Eric Sandeen, xfs, linux-kernel Mailing List, Andi Kleen

[-- Attachment #1: Type: text/plain, Size: 1016 bytes --]

> I debugged this a bit further by testing the 4 changed functions
> individually. The problem only occurs with the new version of
> xfs_lowbit64.

Eh, uh, of course. Now that I look at that code it becomes obvious.

find_first_bit() works on unsigned longs, not 64-bit quantities, so
find_first_bit((unsigned long *)&t, 64) isn't equivalent to finding the
lowest bit set in a 64-bit quantity.

Think of the memory layout of a 64-bit word:

LE:  low 32 bits | high 32 bits
BE: high 32 bits | low 32 bits

Take a look at the start of include/asm-powerpc/bitops.h, and note how
bitops don't define the memory layout at all :)

So find_first_bit(&t, 64) on BE will give you the number of the first
bit of the 32-bit rotated quantity, ie. of ((t<<32) | (t>>32)). The
problem doesn't happen with highbit64 because fls64 was specifically
coded for this purpose.

You really need to keep xfs_lowbit64 defined as it was before, or, maybe
even better, define ffs64 in parallel to fls64.

johannes

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply [flat|nested] 14+ messages in thread
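The word-order effect described in the message above can be reproduced in user space on any host. The following is a minimal sketch, not kernel code: `first_bit_32bit_words` is a hypothetical stand-in for find_first_bit() on a machine where unsigned long is 32 bits, and `layout_le`, `layout_be`, `lowbit_le`, `lowbit_be` and `rotate32` are names invented here for illustration. The two layout helpers simulate how each byte order places the halves of a 64-bit value in memory, so the result does not depend on the machine running the program.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for find_first_bit() with a 32-bit unsigned long:
 * scan 32-bit words starting at word 0 and return the index of the first
 * set bit, or nbits if no bit is set.
 */
unsigned int first_bit_32bit_words(const uint32_t *words, unsigned int nbits)
{
    assert(nbits % 32 == 0);
    for (unsigned int i = 0; i < nbits / 32; i++) {
        if (words[i] == 0)
            continue;
        for (unsigned int b = 0; b < 32; b++)
            if (words[i] & (UINT32_C(1) << b))
                return i * 32 + b;
    }
    return nbits;
}

/* Little-endian memory order: low half first, then high half. */
void layout_le(uint64_t t, uint32_t w[2])
{
    w[0] = (uint32_t)t;
    w[1] = (uint32_t)(t >> 32);
}

/* Big-endian memory order: high half first, then low half. */
void layout_be(uint64_t t, uint32_t w[2])
{
    w[0] = (uint32_t)(t >> 32);
    w[1] = (uint32_t)t;
}

/* What a word-wise scan reports for t under each simulated layout. */
unsigned int lowbit_le(uint64_t t)
{
    uint32_t w[2];
    layout_le(t, w);
    return first_bit_32bit_words(w, 64);
}

unsigned int lowbit_be(uint64_t t)
{
    uint32_t w[2];
    layout_be(t, w);
    return first_bit_32bit_words(w, 64);
}

/* Swap the two 32-bit halves: ((t<<32) | (t>>32)). */
uint64_t rotate32(uint64_t t)
{
    return (t << 32) | (t >> 32);
}
```

For t = 1, lowbit_le() reports bit 0 while lowbit_be() reports bit 32. More generally, lowbit_be(t) equals lowbit_le(rotate32(t)), which matches the "32-bit rotated quantity" description above.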
* Re: [xfs-masters] Re: filesystem corruption on xfs after 2.6.25-rc1 (bisected, powerpc related?)
  2008-02-26 18:11               ` Johannes Berg
@ 2008-02-28 14:40                 ` Eric Sandeen
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Sandeen @ 2008-02-28 14:40 UTC (permalink / raw)
  To: Johannes Berg
  Cc: Gaudenz Steinlin, Rafael J. Wysocki, Christoph Hellwig, xfs-masters, xfs, linux-kernel Mailing List, Andi Kleen

Johannes Berg wrote:
>> I debugged this a bit further by testing the 4 changed functions
>> individually. The problem only occurs with the new version of
>> xfs_lowbit64.

...

> You really need to keep xfs_lowbit64 defined as it was before, or, maybe
> even better, define ffs64 in parallel to fls64.

Yep, I agree.

-Eric

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [xfs-masters] Re: filesystem corruption on xfs after 2.6.25-rc1 (bisected, powerpc related?)
  2008-02-26 11:44             ` Gaudenz Steinlin
  2008-02-26 18:11               ` Johannes Berg
@ 2008-02-26 20:05               ` Eric Sandeen
  2008-02-26 20:59                 ` Mark Goodwin
  1 sibling, 1 reply; 14+ messages in thread
From: Eric Sandeen @ 2008-02-26 20:05 UTC (permalink / raw)
  To: xfs-masters
  Cc: Rafael J. Wysocki, Christoph Hellwig, xfs, johannes, linux-kernel Mailing List

Gaudenz Steinlin wrote:
> On Tue, Feb 26, 2008 at 01:13:56AM +0100, Rafael J. Wysocki wrote:
>> On Tuesday, 26 of February 2008, Christoph Hellwig wrote:
>>> On Tue, Feb 26, 2008 at 12:52:56AM +0100, Rafael J. Wysocki wrote:
>>>>> I'm not suggesting a partial revert; I just wonder which part of the
>>>>> change is causing the problem, as part of the debugging process.
>
> I debugged this a bit further by testing the 4 changed functions
> individually. The problem only occurs with the new version of
> xfs_lowbit64.

FWIW, Dave & I did some testing/debugging on 32-bit powerpc, and it is
indeed only xfs_lowbit64 which is doing the wrong thing on that arch,
because generic find_next_bit is doing the wrong thing on big-endian
32-bit systems, for sizes > 32 bits, near as I can tell.

Rather than reverting it all, I think just changing xfs_lowbit64 back to:

int
xfs_lowbit64(
	__uint64_t	v)
{
	__uint32_t	w = (__uint32_t)v;
	int		n = 0;

	if (w) {	/* lower bits */
		n = ffs(w);
	} else {	/* upper bits */
		w = (__uint32_t)(v >> 32);
		if (w && (n = ffs(w)))
			n += 32;
	}
	return n - 1;
}

for now should fix it (this is essentially just ffs64())

-Eric

^ permalink raw reply [flat|nested] 14+ messages in thread
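For readers who want to try the arithmetic outside the kernel, here is a minimal user-space sketch of the "ffs64" idea mentioned in the thread. It is an illustration under stated assumptions, not the kernel's implementation: `sketch_ffs32`, `sketch_ffs64`, `sketch_lowbit64` and `sketch_selfcheck` are hypothetical names, and a plain shift loop replaces ffs() so the block depends only on the C standard headers. Because it splits the value arithmetically with shifts rather than reinterpreting memory as an array of longs, the result does not depend on byte order.

```c
#include <assert.h>
#include <stdint.h>

/* 1-based index of the lowest set bit of w, or 0 if w is zero
 * (the same contract as ffs()). */
int sketch_ffs32(uint32_t w)
{
    int n = 1;

    if (w == 0)
        return 0;
    while ((w & 1) == 0) {
        w >>= 1;
        n++;
    }
    return n;
}

/* Hypothetical ffs64: look at the low half first, then the high half. */
int sketch_ffs64(uint64_t v)
{
    uint32_t lo = (uint32_t)v;
    uint32_t hi = (uint32_t)(v >> 32);

    if (lo)
        return sketch_ffs32(lo);
    if (hi)
        return 32 + sketch_ffs32(hi);
    return 0;
}

/* 0-based lowest set bit, -1 for zero: the convention of the function
 * quoted in the message above. */
int sketch_lowbit64(uint64_t v)
{
    return sketch_ffs64(v) - 1;
}

/* Check every single-bit value, with and without a higher bit also set. */
void sketch_selfcheck(void)
{
    assert(sketch_ffs64(0) == 0);
    for (int b = 0; b < 64; b++) {
        uint64_t v = UINT64_C(1) << b;

        assert(sketch_ffs64(v) == b + 1);
        /* Higher bits must not change the answer. */
        assert(sketch_ffs64(v | (UINT64_C(1) << 63)) == b + 1);
    }
}
```

Calling sketch_selfcheck() from a small main() exercises all 64 bit positions; sketch_lowbit64(0) returns -1, as the quoted function does for an input of zero.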
* Re: [xfs-masters] Re: filesystem corruption on xfs after 2.6.25-rc1 (bisected, powerpc related?)
  2008-02-26 20:05               ` Eric Sandeen
@ 2008-02-26 20:59                 ` Mark Goodwin
  0 siblings, 0 replies; 14+ messages in thread
From: Mark Goodwin @ 2008-02-26 20:59 UTC (permalink / raw)
  To: xfs-masters
  Cc: Rafael J. Wysocki, Christoph Hellwig, xfs, johannes, linux-kernel Mailing List

Eric Sandeen wrote:
> Gaudenz Steinlin wrote:
>> On Tue, Feb 26, 2008 at 01:13:56AM +0100, Rafael J. Wysocki wrote:
>>> On Tuesday, 26 of February 2008, Christoph Hellwig wrote:
>>>> On Tue, Feb 26, 2008 at 12:52:56AM +0100, Rafael J. Wysocki wrote:
>>>>>> I'm not suggesting a partial revert; I just wonder which part of the
>>>>>> change is causing the problem, as part of the debugging process.
>> I debugged this a bit further by testing the 4 changed functions
>> individually. The problem only occurs with the new version of
>> xfs_lowbit64.
>
> FWIW, Dave & I did some testing/debugging on 32-bit powerpc, and it is
> indeed only xfs_lowbit64 which is doing the wrong thing on that arch,
> because generic find_next_bit is doing the wrong thing on big-endian
> 32-bit systems, for sizes > 32 bits, near as I can tell.
>
> Rather than reverting it all, I think just changing xfs_lowbit64 back to:
> ...

Thanks Eric (and Dave), we'll look at your proposed fix. But for now,
the bit ops cleanup patch has been fully reverted, see Lachlan's earlier
mail and git pull request.

We're also expanding our internal h/w test matrix to include some
additional platforms to avoid this kind of thing in the future.

Cheers
-- Mark

^ permalink raw reply [flat|nested] 14+ messages in thread