LKML Archive on lore.kernel.org
* [PATCH 0/7] HWPOISON for hugepage backed KVM guest
@ 2011-01-21  6:28 Naoya Horiguchi
  2011-01-21  6:28 ` [PATCH 1/7] hugetlb: check swap entry in follow_hugetlb_page() Naoya Horiguchi
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: Naoya Horiguchi @ 2011-01-21  6:28 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Wu Fengguang, Mel Gorman, Christoph Lameter,
	Huang Ying, Fernando Luis Vazquez Cao, tony.luck, LKML, linux-mm

Hi,

I wrote the "HWPOISON for hugepage" patchset last year, but it didn't
cover hugepages used by a KVM guest, because follow_hugetlb_page(),
which is called in the guest page fault path, didn't recognize a pmd
entry in swap entry format.
This patchset fixes that and makes both soft and hard offline available
on a hugepage backed KVM guest.

I appreciate all of your comments and reviews.

Thanks,
Naoya Horiguchi

Summary:

  [PATCH 1/7] hugetlb: check swap entry in follow_hugetlb_page()
  [PATCH 2/7] check hugepage swap entry in get_user_pages_fast()
  [PATCH 3/7] remove putback_lru_pages() in hugepage migration context
  [PATCH 4/7] hugetlb, migration: add migration_hugepage_entry_wait()
  [PATCH 5/7] hugetlb: fix race condition between hugepage soft offline and page fault
  [PATCH 6/7] HWPOISON: pass order to set/clear_page_hwpoison_huge_page()
  [PATCH 7/7] HWPOISON, hugetlb: fix hard offline for hugepage backed KVM guest

  arch/x86/mm/gup.c       |    9 +++++++++
  include/linux/swapops.h |   20 ++++++++++++++++++++
  mm/hugetlb.c            |   39 +++++++++++++++++++++++++++++----------
  mm/memory-failure.c     |   24 +++++++++++++-----------
  mm/migrate.c            |   33 +++++++++++++++++++++++++++++++++
  5 files changed, 104 insertions(+), 21 deletions(-)


* [PATCH 1/7] hugetlb: check swap entry in follow_hugetlb_page()
  2011-01-21  6:28 [PATCH 0/7] HWPOISON for hugepage backed KVM guest Naoya Horiguchi
@ 2011-01-21  6:28 ` Naoya Horiguchi
  2011-01-21  6:28 ` [PATCH 2/7] check hugepage swap entry in get_user_pages_fast() Naoya Horiguchi
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Naoya Horiguchi @ 2011-01-21  6:28 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Wu Fengguang, Mel Gorman, Christoph Lameter,
	Huang Ying, Fernando Luis Vazquez Cao, tony.luck, LKML, linux-mm

The KVM host calls follow_hugetlb_page() during HVA-to-PFN translation
(through get_user_pages()), so it needs to handle swap entries
in order to detect HWPOISONed or migrating hugepages.
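
For reviewers who want the resulting condition in isolation, here is a
minimal userspace sketch of the check this patch ends up with. The
model_* types and names are hypothetical stand-ins for illustration,
not kernel interfaces, and the coredump special case is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a huge PTE; not the kernel's pte_t. */
enum model_pte_kind { MODEL_PTE_NONE, MODEL_PTE_PRESENT, MODEL_PTE_SWAP };

static_assert(MODEL_PTE_NONE == 0, "zero-initialised entries must read as empty");

struct model_pte {
	enum model_pte_kind kind;
	bool writable;	/* meaningful only for MODEL_PTE_PRESENT */
};

/*
 * Returns true when the lookup must go through the fault path instead of
 * handing back the page: there is no entry, the entry is in swap entry
 * format (HWPOISON or migration), or a write is requested on a read-only
 * mapping.
 */
static inline bool model_needs_fault(const struct model_pte *pte, bool want_write)
{
	bool absent = !pte || pte->kind == MODEL_PTE_NONE;
	bool swap = !absent && pte->kind != MODEL_PTE_PRESENT;

	if (absent || swap)
		return true;
	return want_write && !pte->writable;
}
```

The point of the extra "swap" term is that a non-empty, non-present
entry is no longer treated as if it mapped a usable page.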

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/hugetlb.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git v2.6.38-rc1/mm/hugetlb.c v2.6.38-rc1/mm/hugetlb.c
index bb0b7c1..97c7471 100644
--- v2.6.38-rc1/mm/hugetlb.c
+++ v2.6.38-rc1/mm/hugetlb.c
@@ -2731,6 +2731,7 @@ int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	while (vaddr < vma->vm_end && remainder) {
 		pte_t *pte;
 		int absent;
+		int swap;
 		struct page *page;
 
 		/*
@@ -2740,6 +2741,7 @@ int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h));
 		absent = !pte || huge_pte_none(huge_ptep_get(pte));
+		swap = !absent && !pte_present(*pte);
 
 		/*
 		 * When coredumping, it suits get_dump_page if we just return
@@ -2754,7 +2756,7 @@ int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			break;
 		}
 
-		if (absent ||
+		if (absent || swap ||
 		    ((flags & FOLL_WRITE) && !pte_write(huge_ptep_get(pte)))) {
 			int ret;
 
-- 
1.7.3.4



* [PATCH 2/7] check hugepage swap entry in get_user_pages_fast()
  2011-01-21  6:28 [PATCH 0/7] HWPOISON for hugepage backed KVM guest Naoya Horiguchi
  2011-01-21  6:28 ` [PATCH 1/7] hugetlb: check swap entry in follow_hugetlb_page() Naoya Horiguchi
@ 2011-01-21  6:28 ` Naoya Horiguchi
  2011-01-21  6:28 ` [PATCH 3/7] remove putback_lru_pages() in hugepage migration context Naoya Horiguchi
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Naoya Horiguchi @ 2011-01-21  6:28 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Wu Fengguang, Mel Gorman, Christoph Lameter,
	Huang Ying, Fernando Luis Vazquez Cao, tony.luck, LKML, linux-mm

When the hugepage associated with a given address is HWPOISONed
or under page migration, get_user_pages_fast() needs to fall back
to the slow path in order to make the page fault fail (when HWPOISONed)
or to wait for migration to complete (when under migration).
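
The ordering of the checks matters here, so a small userspace sketch of
the decision may help. The flag bits and model_* names below are
assumptions made up for this illustration; they are not the real x86
definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits for a modelled PMD value; chosen for illustration. */
#define MODEL_PMD_PRESENT	(UINT64_C(1) << 0)
#define MODEL_PMD_LARGE		(UINT64_C(1) << 7)

static_assert(MODEL_PMD_PRESENT != MODEL_PMD_LARGE, "flags must be distinct");

enum model_gup_result {
	MODEL_GUP_FALLBACK,	/* fast path gives up; caller takes the slow path */
	MODEL_GUP_HUGE,		/* fast path handles a present huge mapping */
	MODEL_GUP_TABLE		/* fast path descends into the PTE table */
};

static inline enum model_gup_result model_gup_pmd(uint64_t pmd)
{
	if (pmd == 0)
		return MODEL_GUP_FALLBACK;	/* empty entry */
	if (!(pmd & MODEL_PMD_PRESENT))
		return MODEL_GUP_FALLBACK;	/* swap style: hwpoison or migration */
	if (pmd & MODEL_PMD_LARGE)
		return MODEL_GUP_HUGE;
	return MODEL_GUP_TABLE;
}
```

Testing for a non-present entry before looking at the size bit means
that whatever bits a swap-style entry happens to carry are never
interpreted as mapping flags.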

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 arch/x86/mm/gup.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git v2.6.38-rc1/arch/x86/mm/gup.c v2.6.38-rc1/arch/x86/mm/gup.c
index dbe34b9..93b74dd 100644
--- v2.6.38-rc1/arch/x86/mm/gup.c
+++ v2.6.38-rc1/arch/x86/mm/gup.c
@@ -176,6 +176,15 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
 		 */
 		if (pmd_none(pmd) || pmd_trans_splitting(pmd))
 			return 0;
+		/*
+		 * PMD can be in swap entry style when the hugepage
+		 * pointed to by it is hwpoisoned or under migration.
+		 * Because the swap entry format has no flag showing
+		 * the page size, pmd_large() cannot detect it.
+		 * So then we just fall back to the slow path.
+		 */
+		if (unlikely(!pmd_present(pmd)))
+			return 0;
 		if (unlikely(pmd_large(pmd))) {
 			if (!gup_huge_pmd(pmd, addr, next, write, pages, nr))
 				return 0;
-- 
1.7.3.4



* [PATCH 3/7] remove putback_lru_pages() in hugepage migration context
  2011-01-21  6:28 [PATCH 0/7] HWPOISON for hugepage backed KVM guest Naoya Horiguchi
  2011-01-21  6:28 ` [PATCH 1/7] hugetlb: check swap entry in follow_hugetlb_page() Naoya Horiguchi
  2011-01-21  6:28 ` [PATCH 2/7] check hugepage swap entry in get_user_pages_fast() Naoya Horiguchi
@ 2011-01-21  6:28 ` Naoya Horiguchi
  2011-01-21  6:40   ` Minchan Kim
  2011-01-21  6:28 ` [PATCH 4/7] hugetlb, migration: add migration_hugepage_entry_wait() Naoya Horiguchi
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 10+ messages in thread
From: Naoya Horiguchi @ 2011-01-21  6:28 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Wu Fengguang, Mel Gorman, Christoph Lameter,
	Huang Ying, Fernando Luis Vazquez Cao, tony.luck, LKML, linux-mm,
	Minchan Kim

This putback_lru_pages() call was added by commit cf608ac19c to allow
memory compaction to count the number of pages that failed to migrate.

But we should not call it for a hugepage, because page->lru of a hugepage
is used differently from that of a normal page:

   in-use hugepage : page->lru is unlinked,
   free hugepage   : page->lru is linked to the free hugepage list,

so putting hugepages back on the LRU lists breaks this rule.
We just drop the call (without any impact on memory compaction).

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
---
 mm/memory-failure.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git v2.6.38-rc1/mm/memory-failure.c v2.6.38-rc1/mm/memory-failure.c
index 548fbd7..b4910e8 100644
--- v2.6.38-rc1/mm/memory-failure.c
+++ v2.6.38-rc1/mm/memory-failure.c
@@ -1295,7 +1295,6 @@ static int soft_offline_huge_page(struct page *page, int flags)
 	ret = migrate_huge_pages(&pagelist, new_page, MPOL_MF_MOVE_ALL, 0,
 				true);
 	if (ret) {
-		putback_lru_pages(&pagelist);
 		pr_debug("soft offline: %#lx: migration failed %d, type %lx\n",
 			 pfn, ret, page->flags);
 		if (ret > 0)
-- 
1.7.3.4



* [PATCH 4/7] hugetlb, migration: add migration_hugepage_entry_wait()
  2011-01-21  6:28 [PATCH 0/7] HWPOISON for hugepage backed KVM guest Naoya Horiguchi
                   ` (2 preceding siblings ...)
  2011-01-21  6:28 ` [PATCH 3/7] remove putback_lru_pages() in hugepage migration context Naoya Horiguchi
@ 2011-01-21  6:28 ` Naoya Horiguchi
  2011-01-21  6:28 ` [PATCH 5/7] hugetlb: fix race condition between hugepage soft offline and page fault Naoya Horiguchi
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Naoya Horiguchi @ 2011-01-21  6:28 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Wu Fengguang, Mel Gorman, Christoph Lameter,
	Huang Ying, Fernando Luis Vazquez Cao, tony.luck, LKML, linux-mm

migration_entry_wait() doesn't work for hugepages, because page->ptl
on a hugepage is unused for now. So this patch introduces a hugepage
variant of this function.
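
The new function pins the old page with get_page_unless_zero() before
it drops the lock and sleeps, so the page cannot be freed underneath
the waiter. A minimal userspace sketch of that "take a reference only
if the count is non-zero" step, using C11 atomics, could look as
follows. The model_* names are hypothetical and this is not the
kernel's implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the refcount part of struct page. */
struct model_page {
	atomic_int refcount;
};

/* Take a reference only if the count is still non-zero. */
static inline bool model_get_page_unless_zero(struct model_page *page)
{
	int old = atomic_load(&page->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when this was the last one. */
static inline bool model_put_page(struct model_page *page)
{
	int old = atomic_fetch_sub(&page->refcount, 1);

	assert(old > 0);
	return old == 1;
}
```

A waiter that fails to take the reference simply returns and lets the
fault be retried, which matches the "goto out" path in the patch.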

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 include/linux/swapops.h |    8 ++++++++
 mm/hugetlb.c            |    3 ++-
 mm/migrate.c            |   33 +++++++++++++++++++++++++++++++++
 3 files changed, 43 insertions(+), 1 deletions(-)

diff --git v2.6.38-rc1/include/linux/swapops.h v2.6.38-rc1/include/linux/swapops.h
index cd42e30..a220ef5 100644
--- v2.6.38-rc1/include/linux/swapops.h
+++ v2.6.38-rc1/include/linux/swapops.h
@@ -169,3 +169,11 @@ static inline int non_swap_entry(swp_entry_t entry)
 	return 0;
 }
 #endif
+
+#if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_MIGRATION)
+extern void migration_hugepage_entry_wait(struct mm_struct *mm, pmd_t *pmd,
+					unsigned long address);
+#else
+static inline void migration_hugepage_entry_wait(struct mm_struct *mm,
+				 pmd_t *pmd, unsigned long address) { }
+#endif
diff --git v2.6.38-rc1/mm/hugetlb.c v2.6.38-rc1/mm/hugetlb.c
index 97c7471..d3b856a 100644
--- v2.6.38-rc1/mm/hugetlb.c
+++ v2.6.38-rc1/mm/hugetlb.c
@@ -2618,7 +2618,8 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (ptep) {
 		entry = huge_ptep_get(ptep);
 		if (unlikely(is_hugetlb_entry_migration(entry))) {
-			migration_entry_wait(mm, (pmd_t *)ptep, address);
+			migration_hugepage_entry_wait(mm, (pmd_t *)ptep,
+						      address);
 			return 0;
 		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
 			return VM_FAULT_HWPOISON_LARGE | 
diff --git v2.6.38-rc1/mm/migrate.c v2.6.38-rc1/mm/migrate.c
index 46fe8cc..363685f 100644
--- v2.6.38-rc1/mm/migrate.c
+++ v2.6.38-rc1/mm/migrate.c
@@ -220,6 +220,39 @@ out:
 	pte_unmap_unlock(ptep, ptl);
 }
 
+void migration_hugepage_entry_wait(struct mm_struct *mm, pmd_t *pmd,
+				   unsigned long address)
+{
+	pte_t *ptep, pte;
+	spinlock_t *ptl;
+	swp_entry_t entry;
+	struct page *page;
+
+	ptep = (pte_t *)pmd;
+	ptl = &(mm)->page_table_lock;
+	spin_lock(ptl);
+	pte = *ptep;
+	if (!is_swap_pte(pte))
+		goto out;
+
+	entry = pte_to_swp_entry(pte);
+	if (!is_migration_entry(entry))
+		goto out;
+
+	page = migration_entry_to_page(entry);
+
+	if (!get_page_unless_zero(page))
+		goto out;
+	spin_unlock(ptl);
+	pte_unmap(ptep);
+	wait_on_page_locked(page);
+	put_page(page);
+	return;
+out:
+	spin_unlock(ptl);
+	pte_unmap(ptep);
+}
+
 /*
  * Replace the page in the mapping.
  *
-- 
1.7.3.4



* [PATCH 5/7] hugetlb: fix race condition between hugepage soft offline and page fault
  2011-01-21  6:28 [PATCH 0/7] HWPOISON for hugepage backed KVM guest Naoya Horiguchi
                   ` (3 preceding siblings ...)
  2011-01-21  6:28 ` [PATCH 4/7] hugetlb, migration: add migration_hugepage_entry_wait() Naoya Horiguchi
@ 2011-01-21  6:28 ` Naoya Horiguchi
  2011-01-21  6:28 ` [PATCH 6/7] HWPOISON: pass order to set/clear_page_hwpoison_huge_page() Naoya Horiguchi
  2011-01-21  6:29 ` [PATCH 7/7] HWPOISON, hugetlb: fix hard offline for hugepage backed KVM guest Naoya Horiguchi
  6 siblings, 0 replies; 10+ messages in thread
From: Naoya Horiguchi @ 2011-01-21  6:28 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Wu Fengguang, Mel Gorman, Christoph Lameter,
	Huang Ying, Fernando Luis Vazquez Cao, tony.luck, LKML, linux-mm

When hugepage soft offline succeeds, the old hugepage is expected
to be temporarily enqueued on the free hugepage list, and then dequeued
as a HWPOISONed hugepage.

But there is a race window that breaks reference counting,
as the following sequence shows:

  soft offline                       page fault

  soft_offline_huge_page
    migrate_huge_pages
      unmap_and_move_huge_page
        lock_page
        try_to_unmap
        move_to_new_page
          migrate_page
            migrate_page_copy
                                     hugetlb_fault
                                       migration_hugepage_entry_wait
                                         get_page_unless_zero
                                         wait_on_page_locked
          remove_migration_ptes
        unlock_page
  -------------------------------------------------------------------
        put_page                         put_page
    dequeue_hwpoisoned_huge_page


The two put_page() calls below the horizontal line race with each other.
If put_page() from soft offline comes first, the HWPOISONed hugepage
remains on the free hugepage list, causing wrong results.

It's hard to fix this problem by locking, because we cannot control
page faults with the page lock.
So this patch just adds a HWPOISON check to free_huge_page(),
which ensures that the last user of the old hugepage dequeues it
from the free hugepage list.
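
To make the two orderings concrete, here is a single-threaded userspace
sketch that replays them. It is a simplified model under stated
assumptions: the model_* names are hypothetical, hugetlb_lock and the
per-node counters are omitted, and the refcount is a plain int.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the old hugepage. */
struct model_hpage {
	int refcount;
	bool hwpoison;
	bool on_free_list;
};

/* Model of the free path with the HWPOISON check added by this patch. */
static inline void model_free_huge_page(struct model_hpage *p)
{
	p->on_free_list = true;			/* enqueue */
	if (p->hwpoison) {			/* added check */
		p->on_free_list = false;	/* dequeue as poisoned */
		p->refcount = 1;		/* keep the bad page pinned */
	}
}

/* Drop a reference; the last user frees the page. */
static inline void model_hpage_put(struct model_hpage *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		model_free_huge_page(p);
}

/* Explicit dequeue; returns -1 (busy) when the page is not on the free list. */
static inline int model_dequeue_hwpoisoned(struct model_hpage *p)
{
	if (!p->on_free_list)
		return -1;
	p->on_free_list = false;
	p->refcount = 1;
	return 0;
}
```

If the soft offline side drops its reference first, its explicit
dequeue finds nothing on the free list, and the faulting side's final
put then goes through the free path, where the added check removes the
poisoned page again. If the faulting side drops first, the explicit
dequeue succeeds as before.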

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/hugetlb.c |   28 +++++++++++++++++++++-------
 1 files changed, 21 insertions(+), 7 deletions(-)

diff --git v2.6.38-rc1/mm/hugetlb.c v2.6.38-rc1/mm/hugetlb.c
index d3b856a..b777c81 100644
--- v2.6.38-rc1/mm/hugetlb.c
+++ v2.6.38-rc1/mm/hugetlb.c
@@ -524,6 +524,8 @@ struct hstate *size_to_hstate(unsigned long size)
 	return NULL;
 }
 
+static int __dequeue_hwpoisoned_huge_page(struct page *hpage, struct hstate *h);
+
 static void free_huge_page(struct page *page)
 {
 	/*
@@ -548,6 +550,8 @@ static void free_huge_page(struct page *page)
 		h->surplus_huge_pages_node[nid]--;
 	} else {
 		enqueue_huge_page(h, page);
+		if (unlikely(PageHWPoison(page)))
+			__dequeue_hwpoisoned_huge_page(page, h);
 	}
 	spin_unlock(&hugetlb_lock);
 	if (mapping)
@@ -2932,17 +2936,11 @@ static int is_hugepage_on_freelist(struct page *hpage)
 	return 0;
 }
 
-/*
- * This function is called from memory failure code.
- * Assume the caller holds page lock of the head page.
- */
-int dequeue_hwpoisoned_huge_page(struct page *hpage)
+static int __dequeue_hwpoisoned_huge_page(struct page *hpage, struct hstate *h)
 {
-	struct hstate *h = page_hstate(hpage);
 	int nid = page_to_nid(hpage);
 	int ret = -EBUSY;
 
-	spin_lock(&hugetlb_lock);
 	if (is_hugepage_on_freelist(hpage)) {
 		list_del(&hpage->lru);
 		set_page_refcounted(hpage);
@@ -2950,6 +2948,22 @@ int dequeue_hwpoisoned_huge_page(struct page *hpage)
 		h->free_huge_pages_node[nid]--;
 		ret = 0;
 	}
+	return ret;
+}
+
+/*
+ * This function is called from memory failure code.
+ * Assume the caller holds page lock of the head page.
+ */
+int dequeue_hwpoisoned_huge_page(struct page *hpage)
+{
+	struct hstate *h = page_hstate(hpage);
+	int ret;
+
+	if (!h)
+		return 0;
+	spin_lock(&hugetlb_lock);
+	ret = __dequeue_hwpoisoned_huge_page(hpage, h);
 	spin_unlock(&hugetlb_lock);
 	return ret;
 }
-- 
1.7.3.4



* [PATCH 6/7] HWPOISON: pass order to set/clear_page_hwpoison_huge_page()
  2011-01-21  6:28 [PATCH 0/7] HWPOISON for hugepage backed KVM guest Naoya Horiguchi
                   ` (4 preceding siblings ...)
  2011-01-21  6:28 ` [PATCH 5/7] hugetlb: fix race condition between hugepage soft offline and page fault Naoya Horiguchi
@ 2011-01-21  6:28 ` Naoya Horiguchi
  2011-01-21  6:29 ` [PATCH 7/7] HWPOISON, hugetlb: fix hard offline for hugepage backed KVM guest Naoya Horiguchi
  6 siblings, 0 replies; 10+ messages in thread
From: Naoya Horiguchi @ 2011-01-21  6:28 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Wu Fengguang, Mel Gorman, Christoph Lameter,
	Huang Ying, Fernando Luis Vazquez Cao, tony.luck, LKML, linux-mm

When a surplus hugepage is soft-offlined, the old hugepage is freed
directly to the buddy allocator, after which we have no access to its
hstate. So we need to pass the page order to the PG_HWPoison set/clear
functions.
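
The idea is simply that the caller records the order while the
compound page is still intact and the helper derives the page count
from it. A minimal userspace sketch, with hypothetical model_* names
and a bool standing in for the PG_hwpoison flag:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one base page of a hugepage. */
struct model_subpage {
	bool hwpoison;
};

/* Set or clear the flag on all 2^order base pages starting at 'head'. */
static inline void model_mark_hwpoison(struct model_subpage *head, int order,
					bool value)
{
	size_t nr_pages;
	size_t i;

	assert(order >= 0 && order < 31);
	nr_pages = (size_t)1 << order;
	for (i = 0; i < nr_pages; i++)
		head[i].hwpoison = value;
}
```

Because the order is an argument, the helper never has to read
compound page metadata that may already be gone by the time it runs.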

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
---
 mm/memory-failure.c |   21 ++++++++++++---------
 1 files changed, 12 insertions(+), 9 deletions(-)

diff --git v2.6.38-rc1/mm/memory-failure.c v2.6.38-rc1/mm/memory-failure.c
index b4910e8..eed1846 100644
--- v2.6.38-rc1/mm/memory-failure.c
+++ v2.6.38-rc1/mm/memory-failure.c
@@ -927,18 +927,18 @@ static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
 	return ret;
 }
 
-static void set_page_hwpoison_huge_page(struct page *hpage)
+static void set_page_hwpoison_huge_page(struct page *hpage, int order)
 {
 	int i;
-	int nr_pages = 1 << compound_trans_order(hpage);
+	int nr_pages = 1 << order;
 	for (i = 0; i < nr_pages; i++)
 		SetPageHWPoison(hpage + i);
 }
 
-static void clear_page_hwpoison_huge_page(struct page *hpage)
+static void clear_page_hwpoison_huge_page(struct page *hpage, int order)
 {
 	int i;
-	int nr_pages = 1 << compound_trans_order(hpage);
+	int nr_pages = 1 << order;
 	for (i = 0; i < nr_pages; i++)
 		ClearPageHWPoison(hpage + i);
 }
@@ -1002,7 +1002,8 @@ int __memory_failure(unsigned long pfn, int trapno, int flags)
 				atomic_long_sub(nr_pages, &mce_bad_pages);
 				return 0;
 			}
-			set_page_hwpoison_huge_page(hpage);
+			set_page_hwpoison_huge_page(hpage,
+						    compound_order(hpage));
 			res = dequeue_hwpoisoned_huge_page(hpage);
 			action_result(pfn, "free huge",
 				      res ? IGNORED : DELAYED);
@@ -1078,7 +1079,7 @@ int __memory_failure(unsigned long pfn, int trapno, int flags)
 	 * page lock held, we can safely set PG_hwpoison bits on tail pages.
 	 */
 	if (PageHuge(p))
-		set_page_hwpoison_huge_page(hpage);
+		set_page_hwpoison_huge_page(hpage, compound_order(hpage));
 
 	wait_on_page_writeback(p);
 
@@ -1197,7 +1198,8 @@ int unpoison_memory(unsigned long pfn)
 		atomic_long_sub(nr_pages, &mce_bad_pages);
 		freeit = 1;
 		if (PageHuge(page))
-			clear_page_hwpoison_huge_page(page);
+			clear_page_hwpoison_huge_page(page,
+						      compound_order(page));
 	}
 	unlock_page(page);
 
@@ -1275,6 +1277,7 @@ static int soft_offline_huge_page(struct page *page, int flags)
 	int ret;
 	unsigned long pfn = page_to_pfn(page);
 	struct page *hpage = compound_head(page);
+	int order = compound_order(hpage);
 	LIST_HEAD(pagelist);
 
 	ret = get_any_page(page, pfn, flags);
@@ -1303,8 +1306,8 @@ static int soft_offline_huge_page(struct page *page, int flags)
 	}
 done:
 	if (!PageHWPoison(hpage))
-		atomic_long_add(1 << compound_trans_order(hpage), &mce_bad_pages);
-	set_page_hwpoison_huge_page(hpage);
+		atomic_long_add(1 << order, &mce_bad_pages);
+	set_page_hwpoison_huge_page(hpage, order);
 	dequeue_hwpoisoned_huge_page(hpage);
 	/* keep elevated page count for bad page */
 	return ret;
-- 
1.7.3.4



* [PATCH 7/7] HWPOISON, hugetlb: fix hard offline for hugepage backed KVM guest
  2011-01-21  6:28 [PATCH 0/7] HWPOISON for hugepage backed KVM guest Naoya Horiguchi
                   ` (5 preceding siblings ...)
  2011-01-21  6:28 ` [PATCH 6/7] HWPOISON: pass order to set/clear_page_hwpoison_huge_page() Naoya Horiguchi
@ 2011-01-21  6:29 ` Naoya Horiguchi
  6 siblings, 0 replies; 10+ messages in thread
From: Naoya Horiguchi @ 2011-01-21  6:29 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Wu Fengguang, Mel Gorman, Christoph Lameter,
	Huang Ying, Fernando Luis Vazquez Cao, tony.luck, LKML, linux-mm

When a qemu-kvm process touches a HWPOISONed page,
we expect the resulting SIGBUS to cause an MCE in the guest OS.
But currently this doesn't work on a hugepage backed KVM guest,
because is_hwpoison_address() can't detect the HWPOISON entry
on a PMD and the guest repeats the page fault indefinitely.

This patch fixes it.
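
As a rough illustration of the changed lookup, here is a userspace
sketch. Everything named model_* is hypothetical, and the PTE-level
branch is an assumption about what the unchanged tail of
is_hwpoison_address() does, since that part is not shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical kinds of swap-style entry; not the kernel's encoding. */
enum model_swp_type { MODEL_SWP_SWAP, MODEL_SWP_MIGRATION, MODEL_SWP_HWPOISON };

/* Hypothetical model of a page table entry at either PMD or PTE level. */
struct model_entry {
	bool none;			/* entry is empty */
	bool present;			/* entry maps memory or a lower table */
	bool large;			/* present entry maps a hugepage */
	enum model_swp_type type;	/* valid when !none && !present */
};

static inline bool model_entry_hwpoisoned(const struct model_entry *e)
{
	return !e->none && !e->present && e->type == MODEL_SWP_HWPOISON;
}

/* 'pte' may be NULL when 'pmd' does not point to a PTE table. */
static inline bool model_is_hwpoison_address(const struct model_entry *pmd,
					     const struct model_entry *pte)
{
	/* Before this patch the equivalent branch always reported "no". */
	if (!pmd->present || pmd->large)
		return model_entry_hwpoisoned(pmd);
	assert(pte != NULL);
	return model_entry_hwpoisoned(pte);
}
```

With the PMD-level entry checked as well, a poisoned hugepage is
reported instead of being mistaken for an ordinary missing mapping.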

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Huang Ying <ying.huang@intel.com>
---
 include/linux/swapops.h |   12 ++++++++++++
 mm/hugetlb.c            |    4 +++-
 mm/memory-failure.c     |    2 +-
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git v2.6.38-rc1/include/linux/swapops.h v2.6.38-rc1/include/linux/swapops.h
index a220ef5..2c1a942 100644
--- v2.6.38-rc1/include/linux/swapops.h
+++ v2.6.38-rc1/include/linux/swapops.h
@@ -177,3 +177,15 @@ extern void migration_hugepage_entry_wait(struct mm_struct *mm, pmd_t *pmd,
 static inline void migration_hugepage_entry_wait(struct mm_struct *mm,
 				 pmd_t *pmd, unsigned long address) { }
 #endif
+
+#if defined(CONFIG_MEMORY_FAILURE) && defined(CONFIG_HUGETLB_PAGE)
+extern int is_hugetlb_entry_hwpoisoned(pte_t pte);
+#else
+static inline int is_hugetlb_entry_hwpoisoned(pte_t pte)
+{
+	return 0;
+}
+#endif
+
+
+
diff --git v2.6.38-rc1/mm/hugetlb.c v2.6.38-rc1/mm/hugetlb.c
index b777c81..c65922e 100644
--- v2.6.38-rc1/mm/hugetlb.c
+++ v2.6.38-rc1/mm/hugetlb.c
@@ -2185,7 +2185,8 @@ static int is_hugetlb_entry_migration(pte_t pte)
 		return 0;
 }
 
-static int is_hugetlb_entry_hwpoisoned(pte_t pte)
+#ifdef CONFIG_MEMORY_FAILURE
+int is_hugetlb_entry_hwpoisoned(pte_t pte)
 {
 	swp_entry_t swp;
 
@@ -2197,6 +2198,7 @@ static int is_hugetlb_entry_hwpoisoned(pte_t pte)
 	} else
 		return 0;
 }
+#endif
 
 void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
 			    unsigned long end, struct page *ref_page)
diff --git v2.6.38-rc1/mm/memory-failure.c v2.6.38-rc1/mm/memory-failure.c
index eed1846..8ee5038 100644
--- v2.6.38-rc1/mm/memory-failure.c
+++ v2.6.38-rc1/mm/memory-failure.c
@@ -1461,7 +1461,7 @@ int is_hwpoison_address(unsigned long addr)
 	pmdp = pmd_offset(pudp, addr);
 	pmd = *pmdp;
 	if (!pmd_present(pmd) || pmd_large(pmd))
-		return 0;
+		return is_hugetlb_entry_hwpoisoned(*(pte_t *)pmdp);
 	ptep = pte_offset_map(pmdp, addr);
 	pte = *ptep;
 	pte_unmap(ptep);
-- 
1.7.3.4



* Re: [PATCH 3/7] remove putback_lru_pages() in hugepage migration context
  2011-01-21  6:28 ` [PATCH 3/7] remove putback_lru_pages() in hugepage migration context Naoya Horiguchi
@ 2011-01-21  6:40   ` Minchan Kim
  2011-01-21 10:00     ` Naoya Horiguchi
  0 siblings, 1 reply; 10+ messages in thread
From: Minchan Kim @ 2011-01-21  6:40 UTC (permalink / raw)
  To: Naoya Horiguchi
  Cc: Andi Kleen, Andrew Morton, Wu Fengguang, Mel Gorman,
	Christoph Lameter, Huang Ying, Fernando Luis Vazquez Cao,
	tony.luck, LKML, linux-mm

Hello,

On Fri, Jan 21, 2011 at 3:28 PM, Naoya Horiguchi
<n-horiguchi@ah.jp.nec.com> wrote:
> This putback_lru_pages() is inserted at cf608ac19c to allow
> memory compaction to count the number of migration failed pages.
>
> But we should not do it for a hugepage because page->lru of a hugepage
> is used differently from that of a normal page:
>
>   in-use hugepage : page->lru is unlinked,
>   free hugepage   : page->lru is linked to the free hugepage list,
>
> so putting back hugepages to LRU lists collapses this rule.
> We just drop this change (without any impact on memory compaction.)
>
> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> Cc: Minchan Kim <minchan.kim@gmail.com>

As I said previously, it seems to be a mistake made during the patch merge.
I didn't add it in my original patch. You can see my final patch:
https://lkml.org/lkml/2010/8/24/248

Anyway, I realized it recently so I sent the patch to Andrew.
Could you see this one?
https://lkml.org/lkml/2011/1/20/241

Thanks for notifying me.



-- 
Kind regards,
Minchan Kim


* Re: [PATCH 3/7] remove putback_lru_pages() in hugepage migration context
  2011-01-21  6:40   ` Minchan Kim
@ 2011-01-21 10:00     ` Naoya Horiguchi
  0 siblings, 0 replies; 10+ messages in thread
From: Naoya Horiguchi @ 2011-01-21 10:00 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Wu Fengguang, Mel Gorman, Christoph Lameter,
	Huang Ying, Fernando Luis Vazquez Cao, tony.luck, LKML, linux-mm,
	Andi Kleen

Hi,

On Fri, Jan 21, 2011 at 03:40:35PM +0900, Minchan Kim wrote:
> Hello,
> 
> On Fri, Jan 21, 2011 at 3:28 PM, Naoya Horiguchi
> <n-horiguchi@ah.jp.nec.com> wrote:
> > This putback_lru_pages() is inserted at cf608ac19c to allow
> > memory compaction to count the number of migration failed pages.
> >
> > But we should not do it for a hugepage because page->lru of a hugepage
> > is used differently from that of a normal page:
> >
> >   in-use hugepage : page->lru is unlinked,
> >   free hugepage   : page->lru is linked to the free hugepage list,
> >
> > so putting back hugepages to LRU lists collapses this rule.
> > We just drop this change (without any impact on memory compaction.)
> >
> > Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
> > Cc: Minchan Kim <minchan.kim@gmail.com>
> 
> As I said previously, It seems mistake during patch merge.
> I didn't add it in my original patch. You can see my final patch.
> https://lkml.org/lkml/2010/8/24/248

OK.

> Anyway, I realized it recently so I sent the patch to Andrew.
> Could you see this one?
> https://lkml.org/lkml/2011/1/20/241

This patch does not seem to change the behavior of hugepage soft offline,
so I have no objection.

-- Naoya Horiguchi

