LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 0/4] Improve swap page error handling
@ 2007-01-08 13:48 Richard Purdie
  2007-01-08 19:31 ` Hugh Dickins
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Purdie @ 2007-01-08 13:48 UTC (permalink / raw)
  To: Nick Piggin, Hugh Dickins, kernel list; +Cc: Andrew Morton

Improve the error handling when writes fail to a swap page. 

Currently, the kernel will repeatedly retry the write which is unlikely
to ever succeed. Instead we allow the pages to be unused and then marked
as bad at which prevents reuse. It should hopefully be suitable for
testing in -mm.

These patches are a based on a patch by Nick Piggin and some of my own
patches/bugfixes as discussed on LKML.

Richard


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/4] Improve swap page error handling
  2007-01-08 13:48 [PATCH 0/4] Improve swap page error handling Richard Purdie
@ 2007-01-08 19:31 ` Hugh Dickins
  0 siblings, 0 replies; 5+ messages in thread
From: Hugh Dickins @ 2007-01-08 19:31 UTC (permalink / raw)
  To: Richard Purdie; +Cc: Nick Piggin, kernel list, Andrew Morton

On Mon, 8 Jan 2007, Richard Purdie wrote:
> Improve the error handling when writes fail to a swap page. 
> 
> Currently, the kernel will repeatedly retry the write which is unlikely
> to ever succeed. Instead we allow the pages to be unused and then marked
> as bad at which prevents reuse. It should hopefully be suitable for
> testing in -mm.
> 
> These patches are a based on a patch by Nick Piggin and some of my own
> patches/bugfixes as discussed on LKML.

No, not this way, I'm afraid.  Sorry, I don't remember the prior
discussion on LKML, must have flooded past when my attention was
elsewhere.

Is it worth doing this at all?  Probably, but I've no experience
whatsoever of swap write errors, so it's hard for me to judge: my
guess is that many cases would turn out to be software errors (e.g.
lower level needing more memory to perform the write).  But you'd
be right to counter: let's assume they're hardware errors, and
then fix up any software errors when reported.

If it is worth doing this, then you'll need to add code to write
back the swap header, to note the bad pages permanently: you may
well have been waiting to see what reception the patches so far
get, before embarking on that.

1/4 seemed mostly reasonable to me at first, though I had a few
nits about the way you'd divided it between try_to_unuse_entry
and try_to_unuse (e.g. the complication of !start_mm_p belongs
to a later patch, and the mm_users check belongs in try_to_unuse).

I was uneasy with 2/4, wondered if swap_free(entry, page) would
be a better direction to go than your swap_free_markbad(entry).

I disliked 3/4: that initial_locked argument to try_to_unuse_entry,
which (I think) was only necessary because you'd put the lock_page
in try_to_unuse_entry instead of try_to_unuse in the first one.

I rather liked 4/4, the way you let the page rotate around and at
that point discover the failure and fix it up.  But that reveals
both the problem with your approach and its solution.

The killer problem, doing it this way, is that unuse_mm does
down_read(&mm->mmap_sem) for every mm it's tried on, but it's very
possible that the process doing shrink_page_list holds an mmap_sem:
most likely only for reading, but even a recursive down_read can
be deadlocked by a waiting down_write.  There may be more such
problems, I didn't bother to think further once I'd seen that.

But it hints at the solution too.  You're calling your
try_to_unuse_page_entry shortly before the call to try_to_unmap,
and try_to_unmap manages to find all the pages without needing
any mmap_sems at all.

I guess your approach simply highlights my laziness: all the mm
stuff that you factor out in 1/4 dates from before rmap came in.
It still serves a vital purpose, when dealing with pages freshly
read in from swap, when there's no record of what mm or tmpfs file
they belong to; but once that's known, it much quicker to use an
rmap technique, looping through the vmas of the relevant anon_vma
(taking its lock), than scanning all the (swapped) mms in the system.

All I bothered to do, after objrmap came in, was force a shortcut in
unuse_vma once the anon_vma is known (that page_address_in_vma call):
it didn't seem worth rejigging the whole mm loop.  But now you have
another use for unusing a swap entry, that is surely the way to go:
the mm loop is unnecessary, or becomes unnecessary, once we know
the anon_vma to which the swap page belongs.  Which is already so
for your failed-write swap pages.

Hugh

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/4] Improve swap page error handling
  2007-01-10 22:32 ` Nick Piggin
@ 2007-01-10 23:52   ` Richard Purdie
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Purdie @ 2007-01-10 23:52 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Hugh Dickins, kernel list, Andrew Morton

On Thu, 2007-01-11 at 09:32 +1100, Nick Piggin wrote:
> Richard Purdie wrote: 
> > I think you were cc'd on some of it but you never commented. Anyhow,
> > I've reworked this patch series based on your comments. The hints were
> > appreciated, thanks. This was the way I'd originally hoped to be able to
> > work things, I just couldn't find the right way to do it.
> 
> IMO it seems a bit complex for so small a benefit. Last time I was
> working on this, I thought it would be almost as good to do something
> simple like stop trying to write out the page if PG_error is set (and
> clear that bit in delete_from_swap_cache or try_to_unusesomewhere).
> This way the admin could swapoff and scan the swap device at some
> point.

FWIW, the patches have got a lot less invasive and I was pleased with
the way the last set I posted worked out. 

1/4 is a lot of noise adding the page parameter to swap_free but doesn't
actually change much. 
2/4 is the guts of the solution in the form of two new functions. 
3/4 just hooks it in. 
4/4 is an optional cleanup.

I guess the key point for me is that the lack of proper handling of this
was bringing one of my systems to its knees due to IO loops before these
patches. Yes, there are ways to minimise the impact but why not fix it
properly? This proposal is certainly nowhere near as invasive as the
previous ones and since its got this far...

> > You can't proceed to do that until you're able to identify the bad pages
> > so this would be a necessary first step towards that, yes.
> 
> Agreed here, FWIW. I think that might be just as well done in
> userspace?

Maybe, I haven't made my mind up about that yet. I'd have to see how the
code looked I guess.

Cheers,

Richard


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 0/4] Improve swap page error handling
  2007-01-10 18:04 Richard Purdie
@ 2007-01-10 22:32 ` Nick Piggin
  2007-01-10 23:52   ` Richard Purdie
  0 siblings, 1 reply; 5+ messages in thread
From: Nick Piggin @ 2007-01-10 22:32 UTC (permalink / raw)
  To: Richard Purdie; +Cc: Hugh Dickins, kernel list, Andrew Morton

Richard Purdie wrote:

>>No, not this way, I'm afraid.  Sorry, I don't remember the prior
>>discussion on LKML, must have flooded past when my attention was
>>elsewhere.
> 
> 
> I think you were cc'd on some of it but you never commented. Anyhow,
> I've reworked this patch series based on your comments. The hints were
> appreciated, thanks. This was the way I'd originally hoped to be able to
> work things, I just couldn't find the right way to do it.

IMO it seems a bit complex for so small a benefit. Last time I was
working on this, I thought it would be almost as good to do something
simple like stop trying to write out the page if PG_error is set (and
clear that bit in delete_from_swap_cache or try_to_unusesomewhere).
This way the admin could swapoff and scan the swap device at some
point.

>>Is it worth doing this at all?  Probably, but I've no experience
>>whatsoever of swap write errors, so it's hard for me to judge: my
>>guess is that many cases would turn out to be software errors (e.g.
>>lower level needing more memory to perform the write).  But you'd
>>be right to counter: let's assume they're hardware errors, and
>>then fix up any software errors when reported.
> 
> 
> I have a swap block driver where hardware write errors are more likely
> and hence have a need to handle them more gracefully than IO loops. It
> seems like a good idea to avoid the IO loops anyway.
> 
> 
>>If it is worth doing this, then you'll need to add code to write
>>back the swap header, to note the bad pages permanently: you may
>>well have been waiting to see what reception the patches so far
>>get, before embarking on that.
> 
> 
> You can't proceed to do that until you're able to identify the bad pages
> so this would be a necessary first step towards that, yes.

Agreed here, FWIW. I think that might be just as well done in
userspace?

Nick

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 0/4] Improve swap page error handling
@ 2007-01-10 18:04 Richard Purdie
  2007-01-10 22:32 ` Nick Piggin
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Purdie @ 2007-01-10 18:04 UTC (permalink / raw)
  To: Nick Piggin, Hugh Dickins, kernel list; +Cc: Andrew Morton

Improve the error handling when writes fail to a swap page. 

Currently, the kernel will repeatedly retry the write which is unlikely
to ever succeed. Instead we allow the pages to be unused and then marked
as bad at which prevents reuse. It should hopefully be suitable for
testing in -mm.

Hugh Dickins (on a previous incarnation of this series):
> No, not this way, I'm afraid.  Sorry, I don't remember the prior
> discussion on LKML, must have flooded past when my attention was
> elsewhere.

I think you were cc'd on some of it but you never commented. Anyhow,
I've reworked this patch series based on your comments. The hints were
appreciated, thanks. This was the way I'd originally hoped to be able to
work things, I just couldn't find the right way to do it.

> Is it worth doing this at all?  Probably, but I've no experience
> whatsoever of swap write errors, so it's hard for me to judge: my
> guess is that many cases would turn out to be software errors (e.g.
> lower level needing more memory to perform the write).  But you'd
> be right to counter: let's assume they're hardware errors, and
> then fix up any software errors when reported.

I have a swap block driver where hardware write errors are more likely
and hence have a need to handle them more gracefully than IO loops. It
seems like a good idea to avoid the IO loops anyway.

> If it is worth doing this, then you'll need to add code to write
> back the swap header, to note the bad pages permanently: you may
> well have been waiting to see what reception the patches so far
> get, before embarking on that.

You can't proceed to do that until you're able to identify the bad pages
so this would be a necessary first step towards that, yes.

> I was uneasy with 2/4, wondered if swap_free(entry, page) would
> be a better direction to go than your swap_free_markbad(entry).

Agreed, see the following 1/4. 

Patch 4/4 in this series is optional but its appended in hope. It cleans
up code at the expense of what looks like a performance optimisation. I
found the code as it stands rather confusing as a newcomer to that code.

Richard


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2007-01-10 23:53 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-01-08 13:48 [PATCH 0/4] Improve swap page error handling Richard Purdie
2007-01-08 19:31 ` Hugh Dickins
2007-01-10 18:04 Richard Purdie
2007-01-10 22:32 ` Nick Piggin
2007-01-10 23:52   ` Richard Purdie

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).