LKML Archive on lore.kernel.org
From: Andrew Morton <akpm@linux-foundation.org>
To: Christoph Lameter <cl@linux-foundation.org>
Cc: peterz@infradead.org, rientjes@google.com, npiggin@suse.de,
menage@google.com, dfults@sgi.com, linux-kernel@vger.kernel.org,
containers@lists.osdl.org
Subject: Re: [patch 0/7] cpuset writeback throttling
Date: Wed, 5 Nov 2008 12:56:03 -0800
Message-ID: <20081105125603.36d0c222.akpm@linux-foundation.org>
In-Reply-To: <Pine.LNX.4.64.0811051434300.31450@quilx.com>

On Wed, 5 Nov 2008 14:40:05 -0600 (CST)
Christoph Lameter <cl@linux-foundation.org> wrote:

> On Wed, 5 Nov 2008, Andrew Morton wrote:
>
> > > That means running reclaim. But we are only interested in getting rid of
> > > dirty pages. Plus the filesystem guys have repeatedly pointed out that
> > > page sized I/O to random places in a file is not a good thing to do. There
> > > was actually talk of stopping kswapd from writing out pages!
> >
> > They don't have to be reclaimed.
>
> Well the LRU is used for reclaim. If you step over it then it's using the
> existing reclaim logic in vmscan.c right?

Only if you use it that way.

I imagine that a suitable implementation would start IO on the page
then move it to the other end of the LRU. ie: treat it as referenced.
Pretty simple stuff.

If we were to do writeout on the page's inode instead then we'd need
to move the page out of the way somehow, presumably by rotating it.

It's all workable outable.
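The first variant above (start IO on a dirty page, then move it to the
other end of the LRU without reclaiming it) can be sketched in userspace
C. This is only an illustration under stated assumptions: struct
toy_page, struct toy_lru, toy_start_io() and toy_writeout_from_lru() are
hypothetical names invented here, not kernel types or kernel functions,
and real writeback would also have to deal with locking and IO
completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration; none of these are kernel types. */
struct toy_page {
	int id;
	bool dirty;
	bool under_io;
	struct toy_page *prev, *next;
};

struct toy_lru {
	struct toy_page *head;	/* hot end: most recently referenced */
	struct toy_page *tail;	/* cold end: scanned first */
};

static void lru_unlink(struct toy_lru *lru, struct toy_page *p)
{
	if (p->prev)
		p->prev->next = p->next;
	else
		lru->head = p->next;
	if (p->next)
		p->next->prev = p->prev;
	else
		lru->tail = p->prev;
	p->prev = p->next = NULL;
}

static void lru_add_head(struct toy_lru *lru, struct toy_page *p)
{
	p->prev = NULL;
	p->next = lru->head;
	if (lru->head)
		lru->head->prev = p;
	else
		lru->tail = p;
	lru->head = p;
}

/* Hypothetical: pretend the page has been submitted for writeout. */
static void toy_start_io(struct toy_page *p)
{
	p->dirty = false;
	p->under_io = true;
}

/*
 * Walk the list from the cold end. For each dirty page, start IO and
 * rotate the page to the hot end, i.e. treat it as referenced. Clean
 * pages are left where they are and nothing is reclaimed.
 * Returns the number of pages IO was started on.
 */
int toy_writeout_from_lru(struct toy_lru *lru, int nr_to_write)
{
	struct toy_page *p = lru->tail;
	int written = 0;

	assert(nr_to_write >= 0);
	while (p && written < nr_to_write) {
		struct toy_page *prev = p->prev;	/* save before unlinking */

		if (p->dirty) {
			toy_start_io(p);
			lru_unlink(lru, p);
			lru_add_head(lru, p);
			written++;
		}
		p = prev;
	}
	return written;
}
```

The only list operation needed is the rotation, which is why the walk
does not have to go anywhere near the reclaim logic.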
> > > > There would probably be performance benefits in doing
> > > > address_space-ordered writeback, so the dirty-memory throttling could
> > > > pick a dirty page off the LRU, go find its inode and then feed that
> > > > into __sync_single_inode().
> > >
> > > We cannot call into the writeback functions for an inode from a reclaim
> > > context. We can write back single pages but not a range of pages from an
> > > inode due to various locking issues (see discussion on slab defrag
> > > patchset).
> >
> > We're not in a reclaim context. We're in sys_write() context.
>
> Dirtying a page can occur from a variety of kernel contexts.

This writeback will occur from one quite specific place:
balance_dirty_pages(). That's called from sys_write() and pagefaults.
Other scruffy places like splice too.

But none of that matters - the fact is that we're _already_ doing
writeback from balance_dirty_pages(). All we're talking about here is
alternative schemes for looking up the pages to write.

> > > > But _are_ people hitting this problem? I haven't seen any real-looking
> > > > reports in ages. Is there some workaround? If so, what is it? How
> > > > serious is this problem now?
> > >
> > > Are there people who are actually having memcg based solutions deployed?
> > > No enterprise release includes it yet so I guess that there is not much of
> > > a use yet.
> >
> > If you know the answer then please provide it. If you don't, please
> > say "I don't know".
>
> I thought we were talking about memcg related reports. I have dealt with
> scores of the cpuset related ones in my prior job.
>
> Workarounds are:
>
> 1. Reduce the global dirty ratios so that the number of dirty pages in a
> cpuset cannot become too high.

That would be less than the smallest node's memory capacity, I guess.

> 2. Do not create small cpusets where the system can dirty all pages.
>
> 3. Find other ways to limit the dirty pages (run sync once in a while or
> so).

hm, OK.
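To put a number on workaround 1: the sketch below computes the largest
whole-percent global dirty ratio for which the global dirty limit stays
below the smallest node's capacity. It is a simplification, assuming the
ratio is applied to total memory (the kernel applies it to dirtyable
memory, which is smaller), and max_safe_dirty_ratio() is a hypothetical
helper named here for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the largest whole-percent ratio such that
 *   total_memory * ratio / 100 < smallest_node_memory
 * Assumption: the ratio applies to total memory, which overstates the
 * real limit. Sizes are in any consistent unit, e.g. MB.
 */
unsigned int max_safe_dirty_ratio(const unsigned long long *node_size,
				  size_t nr_nodes)
{
	unsigned long long total = 0, smallest;
	unsigned int ratio;
	size_t i;

	assert(nr_nodes > 0);
	smallest = node_size[0];
	for (i = 0; i < nr_nodes; i++) {
		total += node_size[i];
		if (node_size[i] < smallest)
			smallest = node_size[i];
	}
	if (total == 0)
		return 0;

	/* Compare without dividing, so nothing is lost to rounding. */
	for (ratio = 100; ratio > 0; ratio--)
		if (total * ratio < smallest * 100)
			return ratio;
	return 0;
}
```

With three 4096MB nodes and one 2048MB node this gives 14: 14% of
14336MB is about 2007MB, which fits in the small node, while 15% is
about 2150MB, which does not.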
See, here's my problem: we have a pile of new code which fixes some
problem. But the problem seems to be fairly small - it only affects a
small number of sophisticated users and they already have workarounds
in place.

So the world wouldn't end if we just didn't merge it. Those users
stick with their workarounds and the kernel remains simpler and
smaller.

How do we work out which is the best choice here? I don't have enough
information to do this.
Thread overview: 45+ messages
2008-10-30 19:23 David Rientjes
2008-10-30 19:23 ` [patch 1/7] cpusets: add dirty map to struct address_space David Rientjes
2008-11-04 21:09 ` Andrew Morton
2008-11-04 21:20 ` Christoph Lameter
2008-11-04 21:42 ` Andrew Morton
2008-10-30 19:23 ` [patch 2/7] pdflush: allow the passing of a nodemask parameter David Rientjes
2008-10-30 19:23 ` [patch 3/7] mm: make page writeback obey cpuset constraints David Rientjes
2008-10-30 19:23 ` [patch 4/7] mm: cpuset aware reclaim writeout David Rientjes
2008-10-30 19:23 ` [patch 5/7] mm: throttle writeout with cpuset awareness David Rientjes
2008-10-30 19:23 ` [patch 6/7] cpusets: per cpuset dirty ratios David Rientjes
2008-10-30 19:23 ` [patch 7/7] cpusets: update documentation for writeback throttling David Rientjes
2008-10-30 21:08 ` [patch 0/7] cpuset " Dave Chinner
2008-10-30 21:33 ` Christoph Lameter
2008-10-30 22:03 ` Dave Chinner
2008-10-31 13:47 ` Christoph Lameter
2008-10-31 16:36 ` David Rientjes
2008-11-04 20:47 ` Andrew Morton
2008-11-04 20:53 ` Peter Zijlstra
2008-11-04 20:58 ` Christoph Lameter
2008-11-04 21:10 ` David Rientjes
2008-11-04 21:16 ` Andrew Morton
2008-11-04 21:21 ` Peter Zijlstra
2008-11-04 21:50 ` Andrew Morton
2008-11-04 22:17 ` Christoph Lameter
2008-11-04 22:35 ` Andrew Morton
2008-11-04 22:52 ` Christoph Lameter
2008-11-04 23:36 ` Andrew Morton
2008-11-05 1:31 ` KAMEZAWA Hiroyuki
2008-11-05 3:09 ` Andrew Morton
2008-11-05 2:45 ` Christoph Lameter
2008-11-05 3:05 ` Andrew Morton
2008-11-05 4:31 ` KAMEZAWA Hiroyuki
2008-11-10 9:02 ` Andrea Righi
2008-11-10 10:02 ` David Rientjes
2008-11-05 13:52 ` Christoph Lameter
2008-11-05 18:41 ` Andrew Morton
2008-11-05 20:21 ` Christoph Lameter
2008-11-05 20:31 ` Andrew Morton
2008-11-05 20:40 ` Christoph Lameter
2008-11-05 20:56 ` Andrew Morton [this message]
2008-11-05 21:28 ` Christoph Lameter
2008-11-05 21:55 ` Paul Menage
2008-11-05 22:04 ` David Rientjes
2008-11-06 1:34 ` KAMEZAWA Hiroyuki
2008-11-06 20:35 ` David Rientjes