From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-mm@kvack.org, David Miller <davem@davemloft.net>
Subject: Re: [PATCH 9/9] net: vm deadlock avoidance core
Date: Thu, 18 Jan 2007 13:18:44 +0100
Message-ID: <1169122724.6197.50.camel@twins>
In-Reply-To: <20070118104144.GA20925@2ka.mipt.ru>

On Thu, 2007-01-18 at 13:41 +0300, Evgeniy Polyakov wrote:

> > > What about 'level-7' ack as you described in introduction?
> > 
> > Take NFS, it does full data traffic in kernel.
> 
> NFS case is exactly the situation, when you only need to generate an ACK.

No it is not, it needs the full RPC response.

> > > You artificially limit the system to just adding a reserve to generate one ack.
> > > For that purpose you do not need to have all those flags - just reserve
> > > some data in the network core and use it when the system is in OOM (or reclaim)
> > > for critical data paths.
> > 
> > How would that end up being different, I would have to replace all
> > allocations done in the full network processing path.
> > 
> > This seems a much less invasive method, all the (allocation) code can
> > stay the way it is and use the normal allocation functions.

> And actually we are starting to talk about a different approach - having
> a separate allocator for the network, which will be turned on at OOM (reclaim
> or at any other time).

I think we might be, I'm more talking about requirements on the
allocator, while you seem to talk about implementations.

Replacing the allocator, or splitting it in two based on a condition are
all fine as long as they observe the requirements.

The requirement I add is that there is a reserve nobody touches unless
given express permission.

You could implement this by modifying each reachable allocator call site,
sticking in a branch that uses an alternate allocator when the normal
route fails and we do have permission; much like:

   foo = kmalloc(size, gfp_mask);
+  if (!foo && special)
+    foo = my_alloc(size);

And earlier versions of this work did something like that. But it
litters the code quite badly and it is quite easy to miss spots. There can
be quite a few allocations in processing network data.
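
To make the call-site variant concrete, here is a minimal user-space
sketch of that branch. It is an illustration only, not the patch set:
malloc() stands in for kmalloc(), the reserve is a hypothetical fixed
buffer with a bump allocator behind my_alloc(), and the
simulate_failure argument exists purely so the failing path can be
exercised. Freeing is deliberately left out; a real version would have
to tell reserve memory apart from normal memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reserve: memory nobody touches without express permission. */
#define RESERVE_SIZE 4096
static _Alignas(16) unsigned char reserve[RESERVE_SIZE];
static size_t reserve_used;

/* Hypothetical bump allocator over the reserve (no free path). */
static void *my_alloc(size_t size)
{
	/* Round up so every returned pointer stays 16-byte aligned. */
	size_t aligned = (size + 15) & ~(size_t)15;

	/* Reject overflow of the rounding and requests that do not fit. */
	if (aligned < size || aligned > RESERVE_SIZE - reserve_used)
		return NULL;

	void *p = reserve + reserve_used;
	reserve_used += aligned;
	return p;
}

/*
 * The branch from the mail, wrapped in a helper. 'special' is the
 * permission to dip into the reserve; 'simulate_failure' is an
 * assumption of this sketch that forces the normal route to fail.
 */
static void *alloc_with_fallback(size_t size, bool special,
				 bool simulate_failure)
{
	void *foo = simulate_failure ? NULL : malloc(size);

	if (!foo && special)
		foo = my_alloc(size);
	return foo;
}
```

Every allocation in the network path would need to go through such a
helper, which is exactly the littering problem described above.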

Hence my work on integrating this into the regular memory allocators.

FYI; 'special' evaluates to something like:
  !(gfp_mask & __GFP_NOMEMALLOC) &&
  ((gfp_mask & __GFP_EMERGENCY) || 
   (!in_irq() && (current->flags & PF_MEMALLOC)))
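
Spelled out as a stand-alone function, the condition reads as follows.
This is a sketch for reading along, not kernel code: the DEMO_* bit
values are made up, and in_irq() and current->flags are replaced by
plain parameters so it compiles in user space.

```c
#include <stdbool.h>

/* Hypothetical bit values; the real kernel definitions differ. */
#define DEMO_GFP_NOMEMALLOC 0x1u
#define DEMO_GFP_EMERGENCY  0x2u
#define DEMO_PF_MEMALLOC    0x4u

/*
 * Mirrors the 'special' expression:
 *  - an explicit NOMEMALLOC request always forbids the reserve;
 *  - otherwise an EMERGENCY allocation is always allowed;
 *  - otherwise the task must carry PF_MEMALLOC and must not be
 *    running in hard-irq context.
 */
static bool may_use_reserve(unsigned int gfp_mask, bool in_irq,
			    unsigned int task_flags)
{
	if (gfp_mask & DEMO_GFP_NOMEMALLOC)
		return false;
	if (gfp_mask & DEMO_GFP_EMERGENCY)
		return true;
	return !in_irq && (task_flags & DEMO_PF_MEMALLOC);
}
```

Note the ordering: the NOMEMALLOC opt-out wins over both ways of being
granted permission.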


> If you do not mind, I would like to refresh the
> discussion about the network tree allocator, which utilizes its own pool of
> pages,

very high order pages, no?

This means that you either have to allocate at boot time and cannot
resize or add pools, which means you waste all that memory if the network
load never comes near using the reserved amount.

Or, you get into all the same trouble the hugepages folks are trying so
very hard to solve.

> performs self-defragmentation of the memory,

Does it move memory about? 

All it does is try to avoid fragmentation by policy - a problem
impossible to solve in general; but can achieve good results in view of
practical limitations on program behaviour.

Does your policy work for the given workload? We'll see.

Also, on what level? Each level has both internal and external
fragmentation. I can argue that having large immovable objects in memory
adds to the fragmentation issues on the page-allocator level.

> is very SMP
> friendly in that it is per-cpu like slab and never frees
> objects on different CPUs, so they always stay in the same cache.

This makes it very hard to guarantee a reserve limit. (Not impossible,
just more difficult)

> Among other goodies it allows full sending/receiving zero-copy.

That won't ever work unless you have page aligned objects, otherwise you
cannot map them into user-space. Which seems to be at odds with your
tight packing/reduce internal fragmentation goals.

Zero-copy entails mapping the page the hardware writes the packet to
into user-space, right?

Since it is impossible to predict to whom the next packet is addressed,
the packets must be written (by hardware) to different pages.
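
A rough bit of arithmetic shows why this is at odds with tight packing.
The sketch below assumes a 4096-byte page and one packet per set of
pages; the helper names are hypothetical and nothing here comes from an
actual allocator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for the example; real systems may differ. */
#define DEMO_PAGE_SIZE 4096u

/* An object can only be mapped into user-space if it starts on a page. */
static bool is_page_aligned(uintptr_t addr)
{
	return (addr & (DEMO_PAGE_SIZE - 1)) == 0;
}

/* Whole pages needed when each packet gets pages of its own. */
static size_t pages_needed(size_t len)
{
	return (len + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
}

/* Bytes lost to internal fragmentation under that layout. */
static size_t internal_waste(size_t len)
{
	return pages_needed(len) * DEMO_PAGE_SIZE - len;
}
```

Under these assumptions a 1500-byte packet occupies one page and leaves
2596 bytes unused, well over half the page.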


