From: Neil Brown <neilb@suse.de>
To: linux-kernel@vger.kernel.org
Subject: [PATCH - RFC] allow setting vm_dirty below 1% for large memory machines
Date: Tue, 9 Jan 2007 19:57:50 +1100
Message-ID: <17827.22798.625018.673326@notabene.brown>


Imagine a machine with lots of memory - say 100Gig.

Suppose there is one (largish) filesystem that is ext3 (or maybe
reiser) with the default data=ordered.

Suppose this filesystem is being written to steadily so that the
maximum amount of memory is always dirty.  With the default
vm.dirty_ratio of 40%, this could be 40Gig.

When the journal triggers a commit, all the dirty data needs to be
flushed out in order to adhere to the "data=ordered" semantics.
This can take a while.

While this is happening, some small updates such as an 'atime' update
can block waiting for the journal to be unlocked again after the flush.

Waiting for 40Gig to flush for an atime update to complete is clearly
unsatisfactory.

We can reduce the amount of dirty memory by lowering vm.dirty_ratio,
but even setting it down to 1 still allows 1Gig of dirty data, which
can cause unpleasant pauses (and this was on a kernel where '1' still
meant something; in current kernels, '5' is the effective minimum).

So this patch removes the minimum of '5' and introduces a new tunable
'vm.dirty_kb' which sets an upper limit in kibibytes.

This allows the amount of dirty memory to be limited to - say - 50M
which should flush fast enough.

So: is this patch acceptable?  And should a lower default value for
vm_dirty_kb be used?


Some of the details in the above description might not be 100%
accurate (I'm not sure of the exact connection between atime updates
and journal commits).  The symptoms are:
  While generating constant write traffic on a machine with > 20Gig
  of RAM, performing assorted read-only operations can sometimes
  produce a pause of tens of seconds.
  The pause can be removed by:
    - mounting noatime
    - mounting data=writeback
    - setting vm.dirty_kb to 1000 with this patch.

Maybe the problem is really just in atime updates, but I feel that it
is broader than that.

Thanks for any comments.

NeilBrown

-----------------
Allow fixed limit on amount of dirty memory.


On large memory machines, an integer percentage (dirty_ratio) does not
allow sufficiently fine control over the limit on the amount of dirty
memory, especially when that percentage is forced to be >=5.

So remove the >=5 restriction and introduce 'vm_dirty_kb' which sets
an upper limit in kibibytes to the amount of dirty memory.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./include/linux/writeback.h |    1 +
 ./kernel/sysctl.c           |   11 +++++++++++
 ./mm/page-writeback.c       |   19 +++++++++++++++----
 3 files changed, 27 insertions(+), 4 deletions(-)

diff .prev/include/linux/writeback.h ./include/linux/writeback.h
--- .prev/include/linux/writeback.h	2007-01-09 17:16:00.000000000 +1100
+++ ./include/linux/writeback.h	2007-01-09 17:16:31.000000000 +1100
@@ -95,6 +95,7 @@ static inline int laptop_spinned_down(vo
 /* These are exported to sysctl. */
 extern int dirty_background_ratio;
 extern int vm_dirty_ratio;
+extern int vm_dirty_kb;
 extern int dirty_writeback_interval;
 extern int dirty_expire_interval;
 extern int block_dump;

diff .prev/kernel/sysctl.c ./kernel/sysctl.c
--- .prev/kernel/sysctl.c	2007-01-09 17:16:00.000000000 +1100
+++ ./kernel/sysctl.c	2007-01-09 17:17:57.000000000 +1100
@@ -860,6 +860,17 @@ static ctl_table vm_table[] = {
 		.extra2		= &one_hundred,
 	},
 	{
+		.ctl_name	= -2,
+		.procname	= "dirty_kb",
+		.data		= &vm_dirty_kb,
+		.maxlen		= sizeof(vm_dirty_kb),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec_minmax,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero,
+		.extra2		= NULL,
+	},
+	{
 		.ctl_name	= VM_DIRTY_WB_CS,
 		.procname	= "dirty_writeback_centisecs",
 		.data		= &dirty_writeback_interval,

diff .prev/mm/page-writeback.c ./mm/page-writeback.c
--- .prev/mm/page-writeback.c	2007-01-09 17:16:00.000000000 +1100
+++ ./mm/page-writeback.c	2007-01-09 17:52:55.000000000 +1100
@@ -75,6 +75,11 @@ int dirty_background_ratio = 10;
 int vm_dirty_ratio = 40;
 
 /*
+ * If that percentage exceeds this limit, use this instead
+ */
+int vm_dirty_kb = 10000000; /* 10 gigabytes, way too much really */
+
+/*
  * The interval between `kupdate'-style writebacks, in jiffies
  */
 int dirty_writeback_interval = 5 * HZ;
@@ -149,15 +154,21 @@ get_dirty_limits(long *pbackground, long
 	if (dirty_ratio > unmapped_ratio / 2)
 		dirty_ratio = unmapped_ratio / 2;
 
-	if (dirty_ratio < 5)
-		dirty_ratio = 5;
-
 	background_ratio = dirty_background_ratio;
 	if (background_ratio >= dirty_ratio)
 		background_ratio = dirty_ratio / 2;
+	if (dirty_background_ratio && !background_ratio)
+		background_ratio = 1;
 
-	background = (background_ratio * available_memory) / 100;
 	dirty = (dirty_ratio * available_memory) / 100;
+	if (dirty > vm_dirty_kb / (PAGE_SIZE/1024))
+		dirty = vm_dirty_kb / (PAGE_SIZE/1024);
+	if (dirty_ratio == 0)
+		background = 0;
+	else if (background_ratio >= dirty_ratio)
+		background = dirty / 2;
+	else
+		background = dirty * background_ratio / dirty_ratio;
 	tsk = current;
 	if (tsk->flags & PF_LESS_THROTTLE || rt_task(tsk)) {
 		background += background / 4;
