LKML Archive on lore.kernel.org
From: Neil Brown <neilb@suse.de>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	netdev@vger.kernel.org, trond.myklebust@fys.uio.no,
	Pekka Enberg <penberg@cs.helsinki.fi>
Subject: Re: [PATCH 00/28] Swap over NFS -v16
Date: Mon, 10 Mar 2008 16:15:56 +1100
Message-ID: <18388.50188.552322.780524@notabene.brown>
In-Reply-To: message from Peter Zijlstra on Friday March 7

On Friday March 7, a.p.zijlstra@chello.nl wrote:
> Hi Neil,
> 
> I'm so glad you are working with me on this and writing this in human
> English. It seems to be my eternal shortcoming to communicate my ideas
> clearly :-/. Thanks for your effort!

:-)
It always helps to have a second brain with a different perspective.


> 
> On Fri, 2008-03-07 at 14:33 +1100, Neil Brown wrote:
> > 
> > [I don't find the above wholly satisfying.  There seems to be too much
> >  hand-waving.  If someone can provide better text explaining why
> >  swapout is a special case, that would be great.]
> 
> Anonymous pages are dirty by definition (except the zero page, but I
> think we recently ditched it). So shrinking of the anonymous pool will
> require swapping.

Well, there is the swap cache.  That's probably what I was thinking of
when I said "clean anonymous pages".  I suspect they are the first to
go!

> 
> It is indeed the last refuge for those with GFP_NOFS. Along with the
> strict limit on the amount of dirty file pages, it also ensures writing
> those out will never deadlock the machine, as there are always clean file
> pages and/or anonymous pages to launder.

The difficulty I have is justifying exactly why page-cache writeout
will not deadlock.  What if all the memory that is not dirty page cache
is anonymous, and swap isn't enabled?
Maybe the number returned by "determine_dirtyable_memory" in
page-writeback.c excludes anonymous pages?  I wonder if the meaning of
NR_FREE_PAGES, NR_INACTIVE, etc. is documented anywhere....

...
> 
> Right. I've had a long conversation on PG_emergency with Pekka. And I
> think the conclusion was that PG_emergency will create more headaches
> than it solves. I probably have the conversation in my IRC logs and
> could email it if you're interested (and Pekka doesn't object).

Maybe that depends on the exact semantics of PG_emergency?
I remember you being concerned that PG_emergency never changes between
allocation and freeing, and that wouldn't work well with slub.
The semantics I envision have it possibly changing quite often.
What it means is:
   The last allocation done from this page was in a low-memory
   condition.
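
As a rough illustration of that semantic, here is a minimal userspace
sketch.  Everything in it (struct toy_page, toy_page_alloc, the 4096-byte
page size) is a hypothetical stand-in, not kernel code; it only shows a
per-page flag being rewritten on every allocation so that it describes the
most recent allocation rather than the page's whole lifetime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096	/* assumed page size for the sketch */

/* Hypothetical stand-in for struct page; only the flag matters here. */
struct toy_page {
	bool emergency;		/* models the proposed PG_emergency bit */
	size_t used;		/* bytes handed out from this page so far */
};

/*
 * Carve 'size' bytes out of 'page'.  'low_memory' says whether the caller
 * had to dip into the reserves to get this far.  The flag is overwritten
 * on every successful allocation, so it always reflects the last one.
 * Returns the offset of the new object, or -1 if the page is full.
 */
static long toy_page_alloc(struct toy_page *page, size_t size, bool low_memory)
{
	long offset;

	assert(page != NULL);
	if (size > TOY_PAGE_SIZE - page->used)
		return -1;	/* no room; flag is left untouched */

	offset = (long)page->used;
	page->used += size;
	page->emergency = low_memory;
	return offset;
}
```

A caller would inspect page->emergency immediately after the allocation
returns; a later allocation from the same page may flip it back.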

You really need some way to tell if the result of kmalloc/kmemalloc
should be treated as reserved.
I think you had code which first tried the allocation without
GFP_MEMALLOC and then if that failed, tried again *with*
GFP_MEMALLOC.  If that then succeeded, it is assumed to be an
allocation from reserves.  That seemed rather ugly, though I guess you
could wrap it in a function to hide the ugliness:

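void *kmalloc_reserve(size_t size, int *reserve, gfp_t gfp_flags)
{
	/* First try without touching the reserves. */
	void *result = kmalloc(size, gfp_flags & ~GFP_MEMALLOC);
	if (result) {
		*reserve = 0;
		return result;
	}
	/* That failed: retry with access to the reserves. */
	result = kmalloc(size, gfp_flags | GFP_MEMALLOC);
	if (result) {
		/* Success only on the retry, so assume reserve memory. */
		*reserve = 1;
		return result;
	}
	/* Both attempts failed; *reserve is left unchanged. */
	return NULL;
}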
???
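
The same two-step shape can be tried out in userspace.  The sketch below
is only a toy model under stated assumptions: struct toy_pool, toy_alloc
and toy_alloc_reserve are made-up names, and the "reserve" is just a
second byte budget rather than anything resembling the page allocator's
watermarks.  It shows how the caller learns whether the object came out
of the reserve.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy memory budget; all names here are hypothetical, not kernel APIs. */
struct toy_pool {
	size_t normal_left;	/* bytes available to ordinary allocations */
	size_t reserve_left;	/* extra bytes usable only in an emergency */
};

/* Allocate from the normal budget, or from the reserve if permitted. */
static void *toy_alloc(struct toy_pool *pool, size_t size, bool may_use_reserve)
{
	size_t *budget;
	void *obj;

	assert(pool != NULL && size > 0);
	if (size <= pool->normal_left)
		budget = &pool->normal_left;
	else if (may_use_reserve && size <= pool->reserve_left)
		budget = &pool->reserve_left;
	else
		return NULL;

	obj = malloc(size);
	if (obj)
		*budget -= size;	/* charge the budget only on success */
	return obj;
}

/* Try without the reserve first, then with it, and report which one hit. */
static void *toy_alloc_reserve(struct toy_pool *pool, size_t size,
			       bool *from_reserve)
{
	void *obj = toy_alloc(pool, size, false);

	if (obj) {
		*from_reserve = false;
		return obj;
	}
	obj = toy_alloc(pool, size, true);
	*from_reserve = (obj != NULL);
	return obj;
}
```

With a pool of 100 normal and 50 reserve bytes, an 80-byte request is
served normally and a following 40-byte request only succeeds on the
second attempt, so it is reported as coming from the reserve.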

> 
> I've already heard interest from other people to use these hooks to
> provide swap on other non-block filesystems such as jffs2, logfs and the
> like.

I'm interested in the swap_in/swap_out interface for external
write-intent bitmaps for md/raid arrays.
You can have a write-intent bitmap which records which blocks might be
dirty if the host crashes, so that resync is much faster.
It can be stored in a file in a separate filesystem, but that is
currently implemented by using bmap to enumerate the blocks and then
reading/writing directly to the device (like swap).  Your interface
would be much nicer for that (not that I think having a
write-intent-bitmap on an NFS filesystem would be a clever idea ;-)
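
For readers unfamiliar with the idea, a write-intent bitmap can be
sketched in a few lines of userspace C.  This is an illustration only,
not the md implementation: the chunk count, the wib_* names and the
in-memory-only storage are all assumptions.  A real bitmap has to reach
stable storage before the data write it covers is issued, which is exactly
where a file-backed interface would come in.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define WIB_CHUNKS	64	/* hypothetical number of chunks tracked */
#define WIB_WORD_BITS	(sizeof(unsigned long) * CHAR_BIT)
#define WIB_WORDS	((WIB_CHUNKS + WIB_WORD_BITS - 1) / WIB_WORD_BITS)

/* One bit per chunk: set means "this chunk might be out of sync". */
struct write_intent_bitmap {
	unsigned long bits[WIB_WORDS];
};

/* Record the intent to write 'chunk'; must happen before the data write. */
static void wib_mark(struct write_intent_bitmap *b, size_t chunk)
{
	assert(chunk < WIB_CHUNKS);
	b->bits[chunk / WIB_WORD_BITS] |= 1UL << (chunk % WIB_WORD_BITS);
}

/* Clear the bit once the write has reached every device. */
static void wib_clear(struct write_intent_bitmap *b, size_t chunk)
{
	assert(chunk < WIB_CHUNKS);
	b->bits[chunk / WIB_WORD_BITS] &= ~(1UL << (chunk % WIB_WORD_BITS));
}

static bool wib_test(const struct write_intent_bitmap *b, size_t chunk)
{
	assert(chunk < WIB_CHUNKS);
	return (b->bits[chunk / WIB_WORD_BITS] >> (chunk % WIB_WORD_BITS)) & 1UL;
}

/* After a crash only the chunks still marked need to be resynced. */
static size_t wib_chunks_to_resync(const struct write_intent_bitmap *b)
{
	size_t chunk, count = 0;

	for (chunk = 0; chunk < WIB_CHUNKS; chunk++)
		if (wib_test(b, chunk))
			count++;
	return count;
}
```

If chunks 3 and 40 are marked and only the write to chunk 3 completes
before a crash, the resync has one chunk to examine instead of all 64.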

I'll look forward to your next patch set....

One thing I had thought odd while reading the patches, but haven't
found an opportunity to mention before, is the "IS_SWAPFILE" test in
nfs-swapper.patch.
This seems like a layering violation.  It would be better if the test
were based on whether ->swapfile had been called on the file.  That way
my write-intent bitmaps would get the same benefit.

NeilBrown
