LKML Archive on lore.kernel.org
* xfsaild causing 30+ wakeups/s on an idle system since 2.6.25-rcX
@ 2008-02-17 16:41 Török Edwin
  2008-02-17 16:51 ` Oliver Pinter
  2008-02-18 23:22 ` Linda Walsh
  0 siblings, 2 replies; 7+ messages in thread
From: Török Edwin @ 2008-02-17 16:41 UTC (permalink / raw)
  To: xfs; +Cc: Linux Kernel, Arjan van de Ven

[-- Attachment #1: Type: text/plain, Size: 362 bytes --]

Hi,

xfsaild is causing many wakeups; a quick investigation shows that
xfsaild_push always returns a 30 msec timeout value.

This is on an idle system, running only GNOME and gnome-terminal.

I suggest changing the timeout logic in xfsaild to be more
power-consumption friendly.

See my original report to the PowerTOP mailing list below.

Best regards,
--Edwin

[-- Attachment #2: Re: new offender in 2.6.25-git: xfsaild.eml --]
[-- Type: message/rfc822, Size: 2550 bytes --]

From: Arjan van de Ven <arjan@linux.intel.com>
To: "Török Edwin" <edwintorok@gmail.com>
Cc: power@bughost.org
Subject: Re: new offender in 2.6.25-git: xfsaild
Date: Sun, 17 Feb 2008 08:09:06 -0800
Message-ID: <47B85C22.3070504@linux.intel.com>

Török Edwin wrote:
> Török Edwin wrote:
>> Hi,
>>
>> On latest -git of 2.6.25 I am getting lots of wakeups from xfsaild.
>>   23.5% ( 33.3)           xfsaild : schedule_timeout (process_timeout)
>>   
> 
> [Should I Cc: xfs mailing list / lkml on this?]
> 
> The problem seems to be with the timeout logic in xfsaild_push, which
> can return four timeout values (in msecs): 1000, 10, 20, 30.
> I inserted a marker and attached a probe function; schedule_timeout
> always got called with 9 jiffies (which is 30 msecs; I have HZ=300).
> 
> Changing xfs_trans_ail.c:270 from "tout += 20" to "tout = 1000" made
> xfsaild do only 1 wakeup/s instead of 33!
> 
> For some reason xfsaild always thinks it has work (I/O) to do, and
> never chooses the 1000 msec sleep value.
> 

sounds like an XFS bug... worth reporting to the xfs/lkml folks for sure.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: xfsaild causing 30+ wakeups/s on an idle system since 2.6.25-rcX
  2008-02-17 16:41 xfsaild causing 30+ wakeups/s on an idle system since 2.6.25-rcX Török Edwin
@ 2008-02-17 16:51 ` Oliver Pinter
  2008-02-17 22:47   ` David Chinner
  2008-02-18 23:22 ` Linda Walsh
  1 sibling, 1 reply; 7+ messages in thread
From: Oliver Pinter @ 2008-02-17 16:51 UTC (permalink / raw)
  To: Török Edwin
  Cc: xfs, Linux Kernel, Arjan van de Ven, Christoph Lameter,
	David Chinner, Christoph Hellwig

On 2/17/08, Török Edwin <edwintorok@gmail.com> wrote:
> Hi,
>
> xfsaild is causing many wakeups, a quick investigation shows
> xfsaild_push is always
> returning 30 msecs timeout value.
>
> This is on an idle system, running only gnome, and gnome-terminal.
>
> I suggest changing the timeout logic in xfsaild to be more power
> consumption friendly.
>
> See below my original report to the powerTOP mailing list.
>
> Best regards,
> --Edwin
>


-- 
Thanks,
Oliver


* Re: xfsaild causing 30+ wakeups/s on an idle system since 2.6.25-rcX
  2008-02-17 16:51 ` Oliver Pinter
@ 2008-02-17 22:47   ` David Chinner
  2008-02-18  9:41     ` Török Edwin
  0 siblings, 1 reply; 7+ messages in thread
From: David Chinner @ 2008-02-17 22:47 UTC (permalink / raw)
  To: Oliver Pinter
  Cc: Török Edwin, xfs, Linux Kernel, Arjan van de Ven,
	Christoph Lameter, David Chinner, Christoph Hellwig

On Sun, Feb 17, 2008 at 05:51:08PM +0100, Oliver Pinter wrote:
> On 2/17/08, Török Edwin <edwintorok@gmail.com> wrote:
> > Hi,
> >
> > xfsaild is causing many wakeups, a quick investigation shows
> > xfsaild_push is always
> > returning 30 msecs timeout value.

That's a bug, and has nothing to do with power consumption. ;)

I can see that there is a dirty item in the filesystem:

Entering kdb (current=0xe00000b8f4fe0000, pid 30046) on processor 3 due to Breakpoint @ 0xa000000100454fc0
[3]kdb> bt
Stack traceback for pid 30046
0xe00000b8f4fe0000    30046        2  1    3   R  0xe00000b8f4fe0340 *xfsaild
0xa000000100454fc0 xfsaild_push
        args (0xe00000b8ffff9090, 0xe00000b8f4fe7e30, 0x31b)
....
[3]kdb> xail 0xe00000b8ffff9090
AIL for mp 0xe00000b8ffff9090, oldest first
[0] type buf flags: 0x1 <in ail >   lsn [1:13133]
   buf 0xe00000b880258800 blkno 0x0 flags: 0x2 <dirty >
   Superblock (at 0xe00000b8f9b3c000)
[3]kdb>

the superblock is dirty, and the lsn is well beyond the target
of the xfsaild hence it *should* be idling. However, it isn't
idling because there is a dirty item in the list and the idle trigger
of "is list empty" is not tripping.

I only managed to reproduce this on a lazy superblock counter
filesystem (i.e.  new mkfs and recent kernel), as it logs the
superblock every so often, and that is probably what is keeping the
fs dirty like this.

Can you see if the patch below fixes the problem?

---

Idle state is not being detected properly by the xfsaild push
code. Idle state is currently detected by an empty list, which
may never happen on a mostly idle filesystem or one using lazy
superblock counters. A single dirty item in the list results in
repeated looping to push everything past the target, because the
code fails to check whether we managed to push anything.

Fix by treating a dirty list with everything past the target as
an idle state and setting the timeout appropriately.

Signed-off-by: Dave Chinner <dgc@sgi.com>
---
 fs/xfs/xfs_trans_ail.c |   15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

Index: 2.6.x-xfs-new/fs/xfs/xfs_trans_ail.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/xfs_trans_ail.c	2008-02-18 09:14:34.000000000 +1100
+++ 2.6.x-xfs-new/fs/xfs/xfs_trans_ail.c	2008-02-18 09:18:52.070682570 +1100
@@ -261,14 +261,17 @@ xfsaild_push(
 		xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE);
 	}
 
-	/*
-	 * We reached the target so wait a bit longer for I/O to complete and
-	 * remove pushed items from the AIL before we start the next scan from
-	 * the start of the AIL.
-	 */
-	if ((XFS_LSN_CMP(lsn, target) >= 0)) {
+	if (count && (XFS_LSN_CMP(lsn, target) >= 0)) {
+		/*
+		 * We reached the target so wait a bit longer for I/O to
+		 * complete and remove pushed items from the AIL before we
+		 * start the next scan from the start of the AIL.
+		 */
 		tout += 20;
 		last_pushed_lsn = 0;
+	} else if (!count) {
+		/* We're past our target or empty, so idle */
+		tout = 1000;
 	} else if ((restarts > XFS_TRANS_PUSH_AIL_RESTARTS) ||
 		   (count && ((stuck * 100) / count > 90))) {
 		/*


* Re: xfsaild causing 30+ wakeups/s on an idle system since 2.6.25-rcX
  2008-02-17 22:47   ` David Chinner
@ 2008-02-18  9:41     ` Török Edwin
  2008-02-18 10:21       ` David Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Török Edwin @ 2008-02-18  9:41 UTC (permalink / raw)
  To: David Chinner
  Cc: Oliver Pinter, xfs, Linux Kernel, Arjan van de Ven,
	Christoph Lameter, Christoph Hellwig

David Chinner wrote:
> On Sun, Feb 17, 2008 at 05:51:08PM +0100, Oliver Pinter wrote:
>   
>> On 2/17/08, Török Edwin <edwintorok@gmail.com> wrote:
>>     
>>> Hi,
>>>
>>> xfsaild is causing many wakeups, a quick investigation shows
>>> xfsaild_push is always
>>> returning 30 msecs timeout value.
>>>       
>
> That's a bug

Ok. Your patch fixes the 30+ wakeups :)

> , and has nothing to do with power consumption. ;)
>   

I suggest using a sysctl value (such as
/proc/sys/vm/dirty_writeback_centisecs), instead of a hardcoded default
1000.
That would further reduce the wakeups.

>
> I only managed to reproduce this on a lazy superblock counter
> filesystem (i.e.  new mkfs and recent kernel), 

The filesystem was created in July 2007

> Can you see if the patch below fixes the problem.

Yes, it reduces wakeups to 1/sec.

Thanks,
--Edwin


* Re: xfsaild causing 30+ wakeups/s on an idle system since 2.6.25-rcX
  2008-02-18  9:41     ` Török Edwin
@ 2008-02-18 10:21       ` David Chinner
  0 siblings, 0 replies; 7+ messages in thread
From: David Chinner @ 2008-02-18 10:21 UTC (permalink / raw)
  To: Török Edwin
  Cc: David Chinner, Oliver Pinter, xfs, Linux Kernel,
	Arjan van de Ven, Christoph Lameter, Christoph Hellwig

On Mon, Feb 18, 2008 at 11:41:39AM +0200, Török Edwin wrote:
> David Chinner wrote:
> > On Sun, Feb 17, 2008 at 05:51:08PM +0100, Oliver Pinter wrote:
> >   
> >> On 2/17/08, Török Edwin <edwintorok@gmail.com> wrote:
> >>     
> >>> Hi,
> >>>
> >>> xfsaild is causing many wakeups, a quick investigation shows
> >>> xfsaild_push is always
> >>> returning 30 msecs timeout value.
> >>>       
> >
> > That's a bug
> 
> Ok. Your patch fixes the 30+ wakeups :)

Good. I'll push it out for review then.

> > , and has nothing to do with power consumption. ;)
> >   
> 
> I suggest using a sysctl value (such as
> /proc/sys/vm/dirty_writeback_centisecs), instead of a hardcoded default
> 1000.

No, too magic. I dislike adding knobs to work around issues that
really should be fixed by having sane default behaviour.  Further
down the track, as we correct known issues with the AIL push code,
we'll be able to increase this idle timeout or even make it purely
wakeup driven once we get back to an idle state. However, right now
it still needs that once-a-second wakeup to work around a nasty
corner case that can hang the filesystem....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: xfsaild causing 30+ wakeups/s on an idle system since 2.6.25-rcX
  2008-02-17 16:41 xfsaild causing 30+ wakeups/s on an idle system since 2.6.25-rcX Török Edwin
  2008-02-17 16:51 ` Oliver Pinter
@ 2008-02-18 23:22 ` Linda Walsh
  2008-02-19  8:20   ` David Chinner
  1 sibling, 1 reply; 7+ messages in thread
From: Linda Walsh @ 2008-02-18 23:22 UTC (permalink / raw)
  To: Török Edwin; +Cc: xfs, Linux Kernel, Arjan van de Ven

Not to look excessively dumb, but what's xfsaild?

xfs seems to be sprouting daemons at a more rapid pace
these days...xfsbufd, xfssyncd, xfsdatad, xfslogd, xfs_mru_cache, and
now xfsaild?

Not a complaint if it ups performance, but I do sorta wonder what all
of them are for and why they are needed "now" but not for, say,
kernels before 2.6.18 (arbitrary number picked out of hat).

Like bufd writes out buffers, logd writes/handles the log, datad?  Isn't
the data in buffers? mru_cache? -- isn't that handled by the Linux
block layer?  Sorry...just a bit confused by the additions...

Are there any design docs (scribbles?) saying what these do and why
they were added so I can just go read 'em myself?  I'm sure they
were added for good reason...just am curious more than anything.

Thanks,
-linda



* Re: xfsaild causing 30+ wakeups/s on an idle system since 2.6.25-rcX
  2008-02-18 23:22 ` Linda Walsh
@ 2008-02-19  8:20   ` David Chinner
  0 siblings, 0 replies; 7+ messages in thread
From: David Chinner @ 2008-02-19  8:20 UTC (permalink / raw)
  To: Linda Walsh
  Cc: Török Edwin, xfs, Linux Kernel, Arjan van de Ven

On Mon, Feb 18, 2008 at 03:22:02PM -0800, Linda Walsh wrote:
> Not to look excessively dumb, but what's xfsaild?

AIL = Active Item List

It is a sorted list of all the logged metadata objects that have not
yet been written back to disk.  The xfsaild is responsible for tail
pushing the log, i.e. writing back objects in the AIL in the most
efficient manner possible.

Why a thread? Because allowing parallelism in tail pushing is a
scalability problem, and moving this to its own thread completely
isolates it from parallelism. Tail pushing only requires a small
amount of CPU time, but it requires a global-scope spinlock.
Isolating the pushing to a single CPU means the spinlock is not
contended across every CPU in the machine.

How much did it improve scalability? On a 2048p machine with an
MPI job that did a synchronised close of 12,000 files (6 per CPU),
the close time went from ~5400s without the thread to 9s with the
xfsaild. That's only about 600x faster. ;)

> xfs seems to be sprouting daemons at a more rapid pace
> these days...xfsbufd, xfssyncd, xfsdatad, xfslogd, xfs_mru_cache, and
> now xfsaild?

Why not? Got to make use of all those cores machines have these
days. ;)

Fundamentally, threads are cheap and simple. We'll keep adding
threads where it makes sense as long as it improves performance and
scalability.

> Are there any design docs (scribbles?) saying what these do and why
> they were added so I can just go read 'em myself?  I'm sure they
> were added for good reason...just am curious more than anything.

'git log' is your friend. The commits that introduce the new threads
explain why they are necessary. ;)

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


