LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Neil Brown <neilb@suse.de>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
netdev@vger.kernel.org, trond.myklebust@fys.uio.no
Subject: Re: [PATCH 00/28] Swap over NFS -v16
Date: Tue, 26 Feb 2008 17:03:26 +1100 [thread overview]
Message-ID: <18371.43950.150842.429997@notabene.brown> (raw)
In-Reply-To: message from Andrew Morton on Saturday February 23
On Saturday February 23, akpm@linux-foundation.org wrote:
> On Wed, 20 Feb 2008 15:46:10 +0100 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> > Another posting of the full swap over NFS series.
>
> Well I looked. There's rather a lot of it and I wouldn't pretend to
> understand it.
But pretending is fun :-)
>
> What is the NFS and net people's take on all of this?
Well I'm only vaguely an NFS person, barely a net person, sporadically
an mm person, but I've had a look and it seems to mostly make sense.
We introduce a new "emergency" concept for page allocation.
The size of the emergency pool is set by various reservations by
different potential users.
If the number of free pages is below the "emergency" size, then only
users with a "MEMALLOC" flag get to allocate pages. Further, those
pages get a "reserve" flag set which propagates into slab/slub so
kmalloc/kmemalloc only return memory from those pages to MEMALLOC
users.
MEMALLOC users are those that set PF_MEMALLOC. A socket can get
SOCK_MEMALLOC set which will cause certain pieces of code to
temporarily set PF_MEMALLOC while working on that socket.
The upshot is that providing any MEMALLOC user reserves an appropriate
amount of emergency space, returns the emergency memory promptly, and
sets PF_MEMALLOC whenever allocating memory, it's memory allocations
should never fail.
As memory is requested is small units, but allocated as pages, there
needs to be a conversion from small-units to pages. One of the
patches does this and appears to err on the side of be over-generous,
which is the right thing to do.
Memory reservations are organised in a tree. I really don't
understand the tree. Is it just to make /proc/reserve_info look more
helpful?
Certainly all the individual reservations need to be recorded, and the
cumulative reservation needs also to be recorded (currently in the
root of the tree) but what are all the other levels used for?
Reservations are used for all the transient memory that might be used
by the network stack. This particularly includes the route cache and
skbs for incoming messages. I have no idea if there is anything else
that needs to be allowed for.
Filesystems can advertise (via address_space_operations) that files
may be used as swap file. They then provide swapout/swapin methods
which are like writepage/readpage but may behave differently and have
a different way to get credentials from a 'struct file'.
So in general, the patch set looks to have the right sort of shape. I
cannot be very authoritative on the details as there are a lot of
them, and they touch code that I'm not very familiar with.
Some specific comments on patches:
reserve-slub.patch
Please avoid irrelevant reformatting in patches. It makes them
harder to read. e.g.:
-static void setup_object(struct kmem_cache *s, struct page *page,
- void *object)
+static void setup_object(struct kmem_cache *s, struct page *page, void *object)
mm-kmem_estimate_pages.patch
This introduces
kestimate
kestimate_single
kmem_estimate_pages
The last obviously returns a number of pages. The contrast seems
to suggest the others don't. But they do...
I don't think the names are very good, but I concede that it is
hard to choose good names here. Maybe:
kmalloc_estimate_variable
kmalloc_estimate_fixed
kmem_alloc_estimate
???
mm-reserve.patch
I'm confused by __mem_reserve_add.
+ reserve = mem_reserve_root.pages;
+ __calc_reserve(res, pages, 0);
+ reserve = mem_reserve_root.pages - reserve;
__calc_reserve will always add 'pages' to mem_reserve_root.pages.
So this is a complex way of doing
reserve = pages;
__calc_reserve(res, pages, 0);
And as you can calculate reserve before calling __calc_reserve
(which seems odd when stated that way), the whole function looks
like it could become:
ret = adjust_memalloc_reserve(pages);
if (!ret)
__calc_reserve(res, pages, limit);
return ret;
What am I missing?
Also, mem_reserve_disconnect really should be a "void" function.
Just put a BUG_ON(ret) and don't return anything.
Finally, I'll just repeat that the purpose of the tree structure
eludes me.
net-sk_allocation.patch
Why are the "GFP_KERNEL" call sites just changed to
"sk->sk_allocation" rather than "sk_allocation(sk, GFP_KERNEL)" ??
I assume there is a good reason, and seeing it in the change log
would educate me and make the patch more obviously correct.
netvm-reserve.patch
Function names again:
sk_adjust_memalloc
sk_set_memalloc
sound similar. Purpose is completely different.
mm-page_file_methods.patch
This makes page_offset and others more expensive by adding a
conditional jump to a function call that is not usually made.
Why do swap pages have a different index to everyone else?
nfs-swap_ops.patch
What happens if you have two swap files on the same NFS
filesystem?
I assume ->swapfile gets called twice. But it hasn't been written
to nest, so the first swapoff will disable swapping for both
files??
That's all for now :-)
NeilBrown
next prev parent reply other threads:[~2008-02-26 6:03 UTC|newest]
Thread overview: 73+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-02-20 14:46 Peter Zijlstra
2008-02-20 14:46 ` [PATCH 01/28] mm: gfp_to_alloc_flags() Peter Zijlstra
2008-02-20 14:46 ` [PATCH 02/28] mm: tag reseve pages Peter Zijlstra
2008-02-20 14:46 ` [PATCH 03/28] mm: slb: add knowledge of reserve pages Peter Zijlstra
2008-02-20 14:46 ` [PATCH 04/28] mm: kmem_estimate_pages() Peter Zijlstra
2008-02-23 8:05 ` Andrew Morton
2008-02-20 14:46 ` [PATCH 05/28] mm: allow PF_MEMALLOC from softirq context Peter Zijlstra
2008-02-23 8:05 ` Andrew Morton
2008-02-20 14:46 ` [PATCH 06/28] mm: serialize access to min_free_kbytes Peter Zijlstra
2008-02-20 14:46 ` [PATCH 07/28] mm: emergency pool Peter Zijlstra
2008-02-23 8:05 ` Andrew Morton
2008-02-20 14:46 ` [PATCH 08/28] mm: system wide ALLOC_NO_WATERMARK Peter Zijlstra
2008-02-23 8:05 ` Andrew Morton
2008-02-20 14:46 ` [PATCH 09/28] mm: __GFP_MEMALLOC Peter Zijlstra
2008-02-23 8:06 ` Andrew Morton
2008-02-20 14:46 ` [PATCH 10/28] mm: memory reserve management Peter Zijlstra
2008-02-23 8:06 ` Andrew Morton
2008-02-20 14:46 ` [PATCH 11/28] selinux: tag avc cache alloc as non-critical Peter Zijlstra
2008-02-20 14:46 ` [PATCH 12/28] net: wrap sk->sk_backlog_rcv() Peter Zijlstra
2008-02-20 14:46 ` [PATCH 13/28] net: packet split receive api Peter Zijlstra
2008-02-20 14:46 ` [PATCH 14/28] net: sk_allocation() - concentrate socket related allocations Peter Zijlstra
2008-02-20 14:46 ` [PATCH 15/28] netvm: network reserve infrastructure Peter Zijlstra
2008-02-23 8:06 ` Andrew Morton
2008-02-24 6:52 ` Mike Snitzer
2008-02-20 14:46 ` [PATCH 16/28] netvm: INET reserves Peter Zijlstra
2008-02-20 14:46 ` [PATCH 17/28] netvm: hook skb allocation to reserves Peter Zijlstra
2008-02-23 8:06 ` Andrew Morton
2008-02-20 14:46 ` [PATCH 18/28] netvm: filter emergency skbs Peter Zijlstra
2008-02-20 14:46 ` [PATCH 19/28] netvm: prevent a stream specific deadlock Peter Zijlstra
2008-02-20 14:46 ` [PATCH 20/28] netfilter: NF_QUEUE vs emergency skbs Peter Zijlstra
2008-02-20 14:46 ` [PATCH 21/28] netvm: skb processing Peter Zijlstra
2008-02-20 14:46 ` [PATCH 22/28] mm: add support for non block device backed swap files Peter Zijlstra
2008-02-20 16:30 ` Randy Dunlap
2008-02-20 16:46 ` Peter Zijlstra
2008-02-26 12:45 ` Miklos Szeredi
2008-02-26 12:58 ` Peter Zijlstra
2008-02-20 14:46 ` [PATCH 23/28] mm: methods for teaching filesystems about PG_swapcache pages Peter Zijlstra
2008-02-20 14:46 ` [PATCH 24/28] nfs: remove mempools Peter Zijlstra
2008-02-20 14:46 ` [PATCH 25/28] nfs: teach the NFS client how to treat PG_swapcache pages Peter Zijlstra
2008-02-20 14:46 ` [PATCH 26/28] nfs: disable data cache revalidation for swapfiles Peter Zijlstra
2008-02-20 14:46 ` [PATCH 27/28] nfs: enable swap on NFS Peter Zijlstra
2008-02-20 14:46 ` [PATCH 28/28] nfs: fix various memory recursions possible with swap over NFS Peter Zijlstra
2008-02-23 8:06 ` [PATCH 00/28] Swap over NFS -v16 Andrew Morton
2008-02-26 6:03 ` Neil Brown [this message]
2008-02-26 10:50 ` Peter Zijlstra
2008-02-26 12:00 ` Peter Zijlstra
2008-02-26 15:29 ` Miklos Szeredi
2008-02-26 15:41 ` Peter Zijlstra
2008-02-26 15:43 ` Peter Zijlstra
2008-02-26 15:47 ` Miklos Szeredi
2008-02-26 17:56 ` Andrew Morton
2008-02-27 5:51 ` Neil Brown
2008-02-27 7:58 ` Peter Zijlstra
2008-02-27 8:05 ` Pekka Enberg
2008-02-27 8:14 ` Peter Zijlstra
2008-02-27 8:33 ` Peter Zijlstra
2008-02-27 8:43 ` Pekka J Enberg
2008-02-29 11:51 ` Peter Zijlstra
2008-02-29 11:58 ` Pekka Enberg
2008-02-29 12:18 ` Peter Zijlstra
2008-02-29 12:29 ` Pekka Enberg
2008-02-29 1:29 ` Neil Brown
2008-02-29 10:21 ` Peter Zijlstra
2008-03-02 22:18 ` Neil Brown
2008-03-02 23:33 ` Peter Zijlstra
2008-03-03 23:41 ` Neil Brown
2008-03-04 10:28 ` Peter Zijlstra
[not found] ` <1837 <1204626509.6241.39.camel@lappy>
2008-03-07 3:33 ` Neil Brown
2008-03-07 11:17 ` Peter Zijlstra
2008-03-07 11:55 ` Peter Zijlstra
2008-03-10 5:15 ` Neil Brown
2008-03-10 9:17 ` Peter Zijlstra
2008-03-14 5:22 ` Neil Brown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=18371.43950.150842.429997@notabene.brown \
--to=neilb@suse.de \
--cc=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=netdev@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=trond.myklebust@fys.uio.no \
--subject='Re: [PATCH 00/28] Swap over NFS -v16' \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).