From: Andrea Arcangeli
To: Jindřich Makovička
Cc: Mel Gorman, Andrew Morton
Subject: Re: khugepaged: gets stuck when writing to USB flash, 2.6.38-rc2
Date: Wed, 2 Feb 2011 01:26:05 +0100
Message-ID: <20110202002605.GD16981@random.random>

On Tue, Feb 01, 2011 at 10:24:00PM +0100, Jindřich Makovička wrote:
> With -rc2, there is
> $ ps aux | grep -E "kswap|khugep"
> root       474  0.0  0.0      0     0 ?        S    20:44   0:00 [kswapd0]
> root       540  0.0  0.0      0     0 ?        DN   20:44   0:00 [khugepaged]
> Sysrq-t output is attached.

khugepaged is missing at the top because dmesg is too small to fit all

Anyway, I see lots of tasks allocating transparent hugepages (you have
some heavy Java load allocating plenty of hugepages), and they're
all stuck in migrate_pages->wait_on_page_writeback and

> Good news is, I don't see these issues with -rc3.

Ah, try again. I didn't check the diff between -rc2 and -rc3, so I can't
tell what helped, but it sounds too easy that it got magically fixed
by -rc3.

Anyway, it's not THP itself; it has to be something in compaction. If it
happens again, you can be sure that doing "echo never >defrag" will fix
it (if compaction really is the cause). Ironically, you can leave
khugepaged/defrag set to "always". It's OK if khugepaged stays in D
state. With CONFIG_NUMA=n, khugepaged in D state is not noticeable at
all, because it allocates all hugepages without holding the mmap_sem.
With CONFIG_NUMA=y it tries to allocate the hugepage from the right
node, so it has to pass a vma down to the allocator to track the right
allocation node. That requires holding the mmap_sem in read mode during
the allocation to keep the vma from going away, but it's no big deal.
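
A minimal sketch for checking which defrag mode is active before and
after the "echo never >defrag" step. It assumes the mainline sysfs
directory /sys/kernel/mm/transparent_hugepage and the bracketed
"always madvise [never]" file format; the helper names are made up for
illustration. It only reads the setting, since changing it needs root.

```python
import re
from pathlib import Path

# Assumed mainline sysfs location; the message only gives the relative name "defrag".
THP_DIR = Path("/sys/kernel/mm/transparent_hugepage")


def active_choice(text):
    """Return the bracketed (active) value, e.g. 'never' from 'always madvise [never]'."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def page_fault_defrag(thp_dir=THP_DIR):
    """Read the page-fault defrag mode, or None if the file cannot be read."""
    try:
        return active_choice((Path(thp_dir) / "defrag").read_text())
    except OSError:
        # No THP support, no sysfs, or no permission: report "unknown".
        return None


if __name__ == "__main__":
    print("defrag:", page_fault_defrag())
```

If this reports "never" and the stalls in wait_on_page_writeback are
gone, that points at compaction rather than THP.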

Maybe we need to change compaction to never block unless some
__GFP_COMPACTION_WAIT bitflag is set. It's perfectly OK to fail some
hugepage allocations under congestion like that, rather than trying so
hard to allocate hugepages. The only thing that would then need to pass
down a __GFP_COMPACTION_WAIT would be fork(), for the kernel stack
allocation... everything else should have a 4k fallback. Even
khugepaged doesn't need to try so hard to compact if the system is under huge
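
As a toy model of that proposal (not kernel code): __GFP_COMPACTION_WAIT
is only a suggested flag here, and Gfp, Outcome and
try_high_order_alloc are hypothetical names. Real compaction has many
more cases; this only shows that blocking becomes opt-in and everyone
else fails fast and takes the 4k fallback.

```python
from enum import Enum, Flag, auto


class Gfp(Flag):
    NONE = 0
    COMPACTION_WAIT = auto()  # stands in for the proposed __GFP_COMPACTION_WAIT


class Outcome(Enum):
    HUGEPAGE = "got the high-order page without blocking"
    HUGEPAGE_AFTER_WAIT = "blocked on writeback, then got the page"
    FAILED = "failed fast; caller uses its 4k fallback"


def try_high_order_alloc(free_block, under_writeback, flags):
    """Decide the outcome of a high-order allocation in this simplified model."""
    if free_block:
        return Outcome.HUGEPAGE  # nothing to compact
    if not under_writeback:
        return Outcome.HUGEPAGE  # assume compaction can migrate pages right away
    if Gfp.COMPACTION_WAIT in flags:
        return Outcome.HUGEPAGE_AFTER_WAIT  # only opted-in callers may block
    return Outcome.FAILED


if __name__ == "__main__":
    # A THP page fault during heavy writeback: fail fast instead of hanging.
    print(try_high_order_alloc(False, True, Gfp.NONE).value)
    # fork()'s kernel stack allocation has no 4k fallback, so it may wait.
    print(try_high_order_alloc(False, True, Gfp.COMPACTION_WAIT).value)
```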

Usually to reproduce you need "cp /dev/zero /mnt/usbdrive", and that
tends to hang all systems, with or without THP... it's hard to
quantify what is normal and what is not.

I have another latency issue, reported for a heavy write fs-network
load, that is much easier to quantify and is most certainly
related to the use of compaction even for the jumbo frames and large
network skbs. It's still compaction related (not THP related as THP on
but with compaction only used by THP it doesn't happen). I'll let you
know when that is fixed so you have a patch to try, as it may benefit
your workload too. In the meantime, if you have more data, let me know.

