LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Neil Brown <neilb@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
netdev@vger.kernel.org, trond.myklebust@fys.uio.no
Subject: Re: [PATCH 00/28] Swap over NFS -v16
Date: Wed, 27 Feb 2008 08:58:32 +0100 [thread overview]
Message-ID: <1204099113.6242.353.camel@lappy> (raw)
In-Reply-To: <18372.64081.995262.986841@notabene.brown>
On Wed, 2008-02-27 at 16:51 +1100, Neil Brown wrote:
> Hi Peter,
>
> On Tuesday February 26, a.p.zijlstra@chello.nl wrote:
> > Hi Neil,
> >
> > Thanks for taking a look, and giving such elaborate feedback. I'll try
> > and address these issues asap, but first let me reply to a few points
> > here.
>
> Thanks... the tree thing is starting to make sense, and I'm not
> confused my __mem_reserve_add any more :-)
>
> I've been having a closer read of some of the code that I skimmed over
> before and I have some more questions.
>
> 1/ I note there is no way to tell if memory returned by kmalloc is
> from the emergency reserve - which contrasts with alloc_page
> which does make that information available through page->reserve.
> This seems a slightly unfortunate aspect of the interface.
Yes, but alas there is no room to store such information in kmalloc().
That is, in a sane way. I think it was Daniel Phillips who suggested
encoding it in the return pointer by flipping the low bit - but that is
just too ugly and breaks all current kmalloc sites to boot.
> It seems to me that __alloc_skb could be simpler if this
> information was available. It currently tries the allocation
> normally, then if that fails it retries with __GFP_MEMALLOC set and
> if that works it assume it was from the emergency pool ... which it
> might not be, though the assumption is safe enough.
>
> It would seem to be nicer if you could just make the one alloc call,
> setting GFP_MEMALLOC if that might be appropriate. Then if the
> memory came from the emergency reserve, flag it as an emergency skb.
>
> However doing that would have issues with reservations.... the
> mem_reserve would need to be passed to kmalloc :-(
Yes, it would require a massive overhaul of quite a few things. I agree,
it would all be nicer, but I think you see why I didn't do it.
> 2/ It doesn't appear to be possible to wait on a reservation. i.e. if
> mem_reserve_*_charge fails, we might sometimes want to wait until
> it will succeed. This feature is an integral part of how mempools
> work and are used. If you want reservations to (be able to)
> replace mempools, then waiting really is needed.
>
> It seems that waiting would work best if it was done quite low down
> inside kmalloc. That would require kmalloc to know which
> 'mem_reserve' you are using which it currently doesn't.
>
> If it did, then it could choose to use emergency pages if the
> mem_reserve allowed it more space, otherwise require a regular page.
> And if __GFP_WAIT is set then each time around the loop it could
> revise whether to use an emergency page or not, based on whether it
> can successfully charge the reservation.
Like mempools, we could add a wrapper with a mem_reserve and waitqueue
inside, strip __GFP_WAIT, try, see if the reservation allows, and wait
if not.
I haven't yet done such a wrapper because it wasn't needed. But it could
be done.
> Of course, having a mem_reserve available for PF_MEMALLOC
> allocations would require changing every kmalloc call in the
> protected region, which might not be ideal, but might not be a
> major hassle, and would ensure that you find all kmalloc calls that
> might be made while in PF_MALLOC state.
I assumed the current PF_MEMALLOC usage was good enough for the current
reserves - not quite true as its potentially unlimited, but it seems to
work in practise.
I did try to find all allocation sites in the paths I enabled
PF_MEMALLOC over.
> 3/ Thinking about the tree structure a bit more: Your motivation
> seems to be that it allows you to make two separate reservations,
> and then charge some memory usage again either-one-of-the-other.
> This seems to go against a key attribute of reservations. I would
> have thought that an important rule about reservations is that
> no-one else can use memory reserved for a particular purpose.
> So if IPv6 reserves some memory, and the IPv4 uses it, that doesn't
> seem like something we want to encourage...
Well, we only have one kind of skb, a network packet doesn't know if it
belongs to IPv4 or IPv6 (or yet a whole different address familiy) when
it comes in. So we grow the skb pool to overflow both defragment caches.
But yeah, its something where you need to know what you're doing - as
with so many other things in the kernel, hence I didn't worry too much.
> 4/ __netdev_alloc_page is bothering me a bit.
> This is used to allocate memory for incoming fragments in a
> (potentially multi-fragment) packet. But we only rx_emergency_get
> for each page as it arrives rather than all at once at the start.
> So you could have a situation where two very large packets are
> arriving at the same time and there is enough reserve to hold
> either of them but not both. The current code will hand out that
> reservation a little (well, one page) at a time to each packet and
> will run out before either packet has been fully received. This
> seems like a bad thing. Is it?
>
> Is it possible to do the rx_emergency_get just once of the whole
> packet I wonder?
I honestly don't know enough about network cards and drivers to answer
this. It was a great feat I managed this much :-)
next prev parent reply other threads:[~2008-02-27 7:59 UTC|newest]
Thread overview: 73+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-02-20 14:46 Peter Zijlstra
2008-02-20 14:46 ` [PATCH 01/28] mm: gfp_to_alloc_flags() Peter Zijlstra
2008-02-20 14:46 ` [PATCH 02/28] mm: tag reseve pages Peter Zijlstra
2008-02-20 14:46 ` [PATCH 03/28] mm: slb: add knowledge of reserve pages Peter Zijlstra
2008-02-20 14:46 ` [PATCH 04/28] mm: kmem_estimate_pages() Peter Zijlstra
2008-02-23 8:05 ` Andrew Morton
2008-02-20 14:46 ` [PATCH 05/28] mm: allow PF_MEMALLOC from softirq context Peter Zijlstra
2008-02-23 8:05 ` Andrew Morton
2008-02-20 14:46 ` [PATCH 06/28] mm: serialize access to min_free_kbytes Peter Zijlstra
2008-02-20 14:46 ` [PATCH 07/28] mm: emergency pool Peter Zijlstra
2008-02-23 8:05 ` Andrew Morton
2008-02-20 14:46 ` [PATCH 08/28] mm: system wide ALLOC_NO_WATERMARK Peter Zijlstra
2008-02-23 8:05 ` Andrew Morton
2008-02-20 14:46 ` [PATCH 09/28] mm: __GFP_MEMALLOC Peter Zijlstra
2008-02-23 8:06 ` Andrew Morton
2008-02-20 14:46 ` [PATCH 10/28] mm: memory reserve management Peter Zijlstra
2008-02-23 8:06 ` Andrew Morton
2008-02-20 14:46 ` [PATCH 11/28] selinux: tag avc cache alloc as non-critical Peter Zijlstra
2008-02-20 14:46 ` [PATCH 12/28] net: wrap sk->sk_backlog_rcv() Peter Zijlstra
2008-02-20 14:46 ` [PATCH 13/28] net: packet split receive api Peter Zijlstra
2008-02-20 14:46 ` [PATCH 14/28] net: sk_allocation() - concentrate socket related allocations Peter Zijlstra
2008-02-20 14:46 ` [PATCH 15/28] netvm: network reserve infrastructure Peter Zijlstra
2008-02-23 8:06 ` Andrew Morton
2008-02-24 6:52 ` Mike Snitzer
2008-02-20 14:46 ` [PATCH 16/28] netvm: INET reserves Peter Zijlstra
2008-02-20 14:46 ` [PATCH 17/28] netvm: hook skb allocation to reserves Peter Zijlstra
2008-02-23 8:06 ` Andrew Morton
2008-02-20 14:46 ` [PATCH 18/28] netvm: filter emergency skbs Peter Zijlstra
2008-02-20 14:46 ` [PATCH 19/28] netvm: prevent a stream specific deadlock Peter Zijlstra
2008-02-20 14:46 ` [PATCH 20/28] netfilter: NF_QUEUE vs emergency skbs Peter Zijlstra
2008-02-20 14:46 ` [PATCH 21/28] netvm: skb processing Peter Zijlstra
2008-02-20 14:46 ` [PATCH 22/28] mm: add support for non block device backed swap files Peter Zijlstra
2008-02-20 16:30 ` Randy Dunlap
2008-02-20 16:46 ` Peter Zijlstra
2008-02-26 12:45 ` Miklos Szeredi
2008-02-26 12:58 ` Peter Zijlstra
2008-02-20 14:46 ` [PATCH 23/28] mm: methods for teaching filesystems about PG_swapcache pages Peter Zijlstra
2008-02-20 14:46 ` [PATCH 24/28] nfs: remove mempools Peter Zijlstra
2008-02-20 14:46 ` [PATCH 25/28] nfs: teach the NFS client how to treat PG_swapcache pages Peter Zijlstra
2008-02-20 14:46 ` [PATCH 26/28] nfs: disable data cache revalidation for swapfiles Peter Zijlstra
2008-02-20 14:46 ` [PATCH 27/28] nfs: enable swap on NFS Peter Zijlstra
2008-02-20 14:46 ` [PATCH 28/28] nfs: fix various memory recursions possible with swap over NFS Peter Zijlstra
2008-02-23 8:06 ` [PATCH 00/28] Swap over NFS -v16 Andrew Morton
2008-02-26 6:03 ` Neil Brown
2008-02-26 10:50 ` Peter Zijlstra
2008-02-26 12:00 ` Peter Zijlstra
2008-02-26 15:29 ` Miklos Szeredi
2008-02-26 15:41 ` Peter Zijlstra
2008-02-26 15:43 ` Peter Zijlstra
2008-02-26 15:47 ` Miklos Szeredi
2008-02-26 17:56 ` Andrew Morton
2008-02-27 5:51 ` Neil Brown
2008-02-27 7:58 ` Peter Zijlstra [this message]
2008-02-27 8:05 ` Pekka Enberg
2008-02-27 8:14 ` Peter Zijlstra
2008-02-27 8:33 ` Peter Zijlstra
2008-02-27 8:43 ` Pekka J Enberg
2008-02-29 11:51 ` Peter Zijlstra
2008-02-29 11:58 ` Pekka Enberg
2008-02-29 12:18 ` Peter Zijlstra
2008-02-29 12:29 ` Pekka Enberg
2008-02-29 1:29 ` Neil Brown
2008-02-29 10:21 ` Peter Zijlstra
2008-03-02 22:18 ` Neil Brown
2008-03-02 23:33 ` Peter Zijlstra
2008-03-03 23:41 ` Neil Brown
2008-03-04 10:28 ` Peter Zijlstra
[not found] ` <1837 <1204626509.6241.39.camel@lappy>
2008-03-07 3:33 ` Neil Brown
2008-03-07 11:17 ` Peter Zijlstra
2008-03-07 11:55 ` Peter Zijlstra
2008-03-10 5:15 ` Neil Brown
2008-03-10 9:17 ` Peter Zijlstra
2008-03-14 5:22 ` Neil Brown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1204099113.6242.353.camel@lappy \
--to=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=neilb@suse.de \
--cc=netdev@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=trond.myklebust@fys.uio.no \
--subject='Re: [PATCH 00/28] Swap over NFS -v16' \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).