LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* kerneloops.org: 2.6.28-rc regression in epoll (list corruption)
@ 2008-10-26 18:29 Arjan van de Ven
  2008-10-26 18:38 ` Davide Libenzi
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Arjan van de Ven @ 2008-10-26 18:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: torvalds, Rafael J. Wysocki, davidel

This one is upcoming fast (and I just hit it as well)

http://www.kerneloops.org/searchweek.php?search=ep_poll_callback

seems epoll grew some list corruption....


-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: kerneloops.org: 2.6.28-rc regression in epoll (list corruption)
  2008-10-26 18:29 kerneloops.org: 2.6.28-rc regression in epoll (list corruption) Arjan van de Ven
@ 2008-10-26 18:38 ` Davide Libenzi
  2008-10-26 18:43 ` Rafael J. Wysocki
  2008-10-26 18:58 ` Linus Torvalds
  2 siblings, 0 replies; 6+ messages in thread
From: Davide Libenzi @ 2008-10-26 18:38 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andrew Morton, Linux Kernel Mailing List, Linus Torvalds,
	Rafael J. Wysocki

On Sun, 26 Oct 2008, Arjan van de Ven wrote:

> This one is upcoming fast (and I just hit it as well)
> 
> http://www.kerneloops.org/searchweek.php?search=ep_poll_callback
> 
> seems epoll grew some list corruption....

I sent a patch Andrew-ward on 10/17:

http://marc.info/?l=linux-kernel&m=122428548613067&w=2



- Davide



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: kerneloops.org: 2.6.28-rc regression in epoll (list corruption)
  2008-10-26 18:29 kerneloops.org: 2.6.28-rc regression in epoll (list corruption) Arjan van de Ven
  2008-10-26 18:38 ` Davide Libenzi
@ 2008-10-26 18:43 ` Rafael J. Wysocki
  2008-10-26 18:58 ` Linus Torvalds
  2 siblings, 0 replies; 6+ messages in thread
From: Rafael J. Wysocki @ 2008-10-26 18:43 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel, torvalds, davidel, Andrew Morton

On Sunday, 26 of October 2008, Arjan van de Ven wrote:
> This one is upcoming fast (and I just hit it as well)
> 
> http://www.kerneloops.org/searchweek.php?search=ep_poll_callback
> 
> seems epoll grew some list corruption....

Yes, I think this is http://bugzilla.kernel.org/show_bug.cgi?id=11831 and
there's a patch from Davide fixing this bug, at:
http://lkml.org/lkml/2008/10/17/491

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: kerneloops.org: 2.6.28-rc regression in epoll (list corruption)
  2008-10-26 18:29 kerneloops.org: 2.6.28-rc regression in epoll (list corruption) Arjan van de Ven
  2008-10-26 18:38 ` Davide Libenzi
  2008-10-26 18:43 ` Rafael J. Wysocki
@ 2008-10-26 18:58 ` Linus Torvalds
  2008-10-26 19:33   ` Arjan van de Ven
  2008-10-26 19:35   ` Davide Libenzi
  2 siblings, 2 replies; 6+ messages in thread
From: Linus Torvalds @ 2008-10-26 18:58 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linux Kernel Mailing List, Rafael J. Wysocki, davidel, Thomas Gleixner



On Sun, 26 Oct 2008, Arjan van de Ven wrote:
>
> This one is upcoming fast (and I just hit it as well)
> 
> http://www.kerneloops.org/searchweek.php?search=ep_poll_callback
> 
> seems epoll grew some list corruption....

It sounds very much like f337b9c58332bdecde965b436e47ea4c94d30da0 ("epoll: 
drop unnecessary test") deleted a test that wasn't so unnecessary after 
all..

That ep_poll_callback() code is:

        /* If this file is already in the ready list we exit soon */
        if (ep_is_linked(&epi->rdllink))
                goto is_linked;

        list_add_tail(&epi->rdllink, &ep->rdllist);

and the unnecessary test that was removed looks _very_ much like that kind 
of code.

Thomas? Davide?

And if somebody knows how to reproduce this reliably, it would be really 
good to hear if doing a revert on that thing just fixed it. It should 
revert cleanly - it's the only change to fs/eventpoll.c since 2.6.27.

I'm somewhat inclined to revert it without even getting confirmation, 
since I wanted to do an early -rc2 today with all the brown-paper-bag 
fixes that have accumulated. But it would be good to get some 
confirmation.

Btw, that whole logic in ep_send_events() sounds a bit scary. It says:

	We can loop without lock because this is a task private list.

and it's true that "txlist" is a private list, but it still seems to 
depend on the fact that none of the "struct epitem"s on that list can be 
reached by any other means. And that whole thing depends on the magic 
behaviour of 'ep->ovflist', but there are a lot of code sequences that do 
*not* seem to test that at all.

Maybe the rqace/bug has always been there (ie some sequence that works 
with an "epi->rdllink" without checking ovflist), but the "unnecessary" 
test protected us from seeing it in practice.

			Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: kerneloops.org: 2.6.28-rc regression in epoll (list corruption)
  2008-10-26 18:58 ` Linus Torvalds
@ 2008-10-26 19:33   ` Arjan van de Ven
  2008-10-26 19:35   ` Davide Libenzi
  1 sibling, 0 replies; 6+ messages in thread
From: Arjan van de Ven @ 2008-10-26 19:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Rafael J. Wysocki, davidel, Thomas Gleixner

On Sun, 26 Oct 2008 11:58:06 -0700 (PDT)
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> 
> 
> On Sun, 26 Oct 2008, Arjan van de Ven wrote:
> >
> > This one is upcoming fast (and I just hit it as well)
> > 
> > http://www.kerneloops.org/searchweek.php?search=ep_poll_callback
> > 
> > seems epoll grew some list corruption....
> 
> It sounds very much like f337b9c58332bdecde965b436e47ea4c94d30da0
> ("epoll: drop unnecessary test") deleted a test that wasn't so
> unnecessary after all..
> 
> That ep_poll_callback() code is:
> 
>         /* If this file is already in the ready list we exit soon */
>         if (ep_is_linked(&epi->rdllink))
>                 goto is_linked;
> 
>         list_add_tail(&epi->rdllink, &ep->rdllist);
> 
> and the unnecessary test that was removed looks _very_ much like that
> kind of code.
> 
> Thomas? Davide?
> 
> And if somebody knows how to reproduce this reliably, it would be
> really good to hear if doing a revert on that thing just fixed it. It
> should revert cleanly - it's the only change to fs/eventpoll.c since
> 2.6.27.
> 

I did the revert and did the same thing I did before, and haven't seen
it yet.

(this is obviously a really small sample size on both sides of the
line, but it's at least not show it easily now while I saw it a few
times before.. after I fixed my Wifi and manually set the r8169 mac)
-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: kerneloops.org: 2.6.28-rc regression in epoll (list corruption)
  2008-10-26 18:58 ` Linus Torvalds
  2008-10-26 19:33   ` Arjan van de Ven
@ 2008-10-26 19:35   ` Davide Libenzi
  1 sibling, 0 replies; 6+ messages in thread
From: Davide Libenzi @ 2008-10-26 19:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arjan van de Ven, Linux Kernel Mailing List, Rafael J. Wysocki,
	Thomas Gleixner

On Sun, 26 Oct 2008, Linus Torvalds wrote:

> 
> 
> On Sun, 26 Oct 2008, Arjan van de Ven wrote:
> >
> > This one is upcoming fast (and I just hit it as well)
> > 
> > http://www.kerneloops.org/searchweek.php?search=ep_poll_callback
> > 
> > seems epoll grew some list corruption....
> 
> It sounds very much like f337b9c58332bdecde965b436e47ea4c94d30da0 ("epoll: 
> drop unnecessary test") deleted a test that wasn't so unnecessary after 
> all..
> 
> That ep_poll_callback() code is:
> 
>         /* If this file is already in the ready list we exit soon */
>         if (ep_is_linked(&epi->rdllink))
>                 goto is_linked;
> 
>         list_add_tail(&epi->rdllink, &ep->rdllist);
> 
> and the unnecessary test that was removed looks _very_ much like that kind 
> of code.

No, the test was in the re-insertion loop. This is the patch that fixes it:

---
 fs/eventpoll.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Index: linux-2.6.mod/fs/eventpoll.c
===================================================================
--- linux-2.6.mod.orig/fs/eventpoll.c	2008-10-17 15:51:09.000000000 -0700
+++ linux-2.6.mod/fs/eventpoll.c	2008-10-17 15:54:14.000000000 -0700
@@ -930,8 +930,15 @@
 	 * inside the main ready-list here.
 	 */
 	for (nepi = ep->ovflist; (epi = nepi) != NULL;
-	     nepi = epi->next, epi->next = EP_UNACTIVE_PTR)
-		list_add_tail(&epi->rdllink, &ep->rdllist);
+	     nepi = epi->next, epi->next = EP_UNACTIVE_PTR) {
+		/*
+		 * If the above loop quit with errors, the epoll item might still
+		 * be linked to "txlist", and the list_splice() done below will
+		 * take care of those cases.
+		 */
+		if (!ep_is_linked(&epi->rdllink))
+			list_add_tail(&epi->rdllink, &ep->rdllist);
+	}
 	/*
 	 * We need to set back ep->ovflist to EP_UNACTIVE_PTR, so that after
 	 * releasing the lock, events will be queued in the normal way inside



> And if somebody knows how to reproduce this reliably, it would be really 
> good to hear if doing a revert on that thing just fixed it. It should 
> revert cleanly - it's the only change to fs/eventpoll.c since 2.6.27.

That's 100% due to the removed test that went in when you sucked up Andrew 
bits after .27.
Patch has been confirmed by me and bug submitters.




> I'm somewhat inclined to revert it without even getting confirmation, 
> since I wanted to do an early -rc2 today with all the brown-paper-bag 
> fixes that have accumulated. But it would be good to get some 
> confirmation.

No need to revert, since one of the removed tests is really un-needed.




> Btw, that whole logic in ep_send_events() sounds a bit scary. It says:
> 
> 	We can loop without lock because this is a task private list.
> 
> and it's true that "txlist" is a private list, but it still seems to 
> depend on the fact that none of the "struct epitem"s on that list can be 
> reached by any other means. And that whole thing depends on the magic 
> behaviour of 'ep->ovflist', but there are a lot of code sequences that do 
> *not* seem to test that at all.
> 
> Maybe the rqace/bug has always been there (ie some sequence that works 
> with an "epi->rdllink" without checking ovflist), but the "unnecessary" 
> test protected us from seeing it in practice.

No. Bug was introduced by the removed test. Thomas contacted me telling 
that such test was un-needed, and at first sight it looked OK to drop it 
(since during the loop time the callback inserts into 'ep->ovflist').
But we didn't notice that a premature loop exit might leave items linked 
to 'txlist', that the splice below takes care of re-insert them.
You can speed up things by sucking the patch directly, w/out waiting 
Andrew's pull.




- Davide



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2008-10-26 19:36 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-10-26 18:29 kerneloops.org: 2.6.28-rc regression in epoll (list corruption) Arjan van de Ven
2008-10-26 18:38 ` Davide Libenzi
2008-10-26 18:43 ` Rafael J. Wysocki
2008-10-26 18:58 ` Linus Torvalds
2008-10-26 19:33   ` Arjan van de Ven
2008-10-26 19:35   ` Davide Libenzi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).