From: Peter Xu <peterx@redhat.com>
To: Tiberiu Georgescu <tiberiu.georgescu@nutanix.com>
Cc: David Hildenbrand <david@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Alistair Popple <apopple@nvidia.com>,
	Ivan Teterevkov <ivan.teterevkov@nutanix.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Hugh Dickins <hughd@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	"Carl Waldspurger [C]" <carl.waldspurger@nutanix.com>,
	Florian Schmidt <flosch@nutanix.com>,
	Jonathan Davies <jond@nutanix.com>
Subject: Re: [PATCH RFC 0/4] mm: Enable PM_SWAP for shmem with PTE_MARKER
Date: Fri, 20 Aug 2021 15:12:01 -0400
Message-ID: <YR/+gfL8RCP8XoB1@t490s>
In-Reply-To: <B130B700-B3DB-4D07-A632-73030BCBC715@nutanix.com>

Hello, Tiberiu,

On Fri, Aug 20, 2021 at 04:49:58PM +0000, Tiberiu Georgescu wrote:
> Firstly, I am worried lseek with the SEEK_HOLE flag would page in pages from
> swap, so using it would be a direct factor on its own output. If people are working
> on Live Migration, this would not be ideal. I am not 100% sure this is how lseek
> works, so please feel free to contradict me, but I think it would swap in some
> of the pages that it seeks through, if not all, to figure out when to stop. Unless it
> leverages the page cache somehow, or an internal bitmap.

It shouldn't.  The lseek(2) man page is clear on that:

       SEEK_DATA
              Adjust the file offset to the next location in the file greater
              than or equal to offset containing data.  If offset points to
              data, then the file offset is set to offset.
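
For reference, a minimal userspace sketch of walking a file with SEEK_DATA and
SEEK_HOLE.  The helper name print_data_extents() is made up for illustration
and is not part of any series discussed here.  It only shows the call pattern;
it does not by itself demonstrate anything about swap-in behaviour.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: print every data extent of an open file and
 * return how many were found, or -1 on error.
 */
long print_data_extents(int fd)
{
    off_t end = lseek(fd, 0, SEEK_END);
    off_t pos = 0;
    long count = 0;

    if (end < 0)
        return -1;

    while (pos < end) {
        /* Next offset >= pos that contains data. */
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0) {
            if (errno == ENXIO)
                break;          /* only a hole remains up to EOF */
            return -1;
        }

        /* Next hole after that data; EOF counts as an implicit hole. */
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0)
            return -1;
        assert(hole > data);    /* the loop must always make progress */

        printf("data: [%lld, %lld)\n", (long long)data, (long long)hole);
        count++;
        pos = hole;
    }
    return count;
}
```

On a filesystem without hole support the whole file is reported as a single
data extent, so a caller should not rely on the extents being minimal.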

Again, I think your requirement is different from CRIU's, so I think mincore()
is the right thing for you.

> 
> Secondly, mincore() could return some "false positives" for this particular use
> case. That is because it returns flag=1 for pages which are still in the swap
> cache, so the output becomes ambiguous.

I don't think so; mincore() should return flag=0 whether the page is still in
the swap cache or has already been dropped from it.  I think its name and
documentation also suggest that: "as long as it's not in RAM, the flag is
cleared".  That's why I think it should indeed be what you're looking for, if
the swp entry can be ignored.  More below on that.
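
To make the mincore() usage concrete, here is a small sketch of reading the
per-page flag from userspace.  count_resident_pages() is a hypothetical helper
name used only for this example.  It does not settle the swap-cache question
above; it only shows how the flag is obtained for a mapped range.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: count the pages in [addr, addr + len) that
 * mincore() reports as resident.  addr must be page-aligned.
 * Returns -1 on error.
 */
long count_resident_pages(void *addr, size_t len)
{
    long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);

    if (len == 0)
        return 0;

    size_t ps = (size_t)page_size;
    size_t npages = (len + ps - 1) / ps;    /* one vector byte per page */
    unsigned char *vec = malloc(npages);
    long resident = 0;

    if (!vec)
        return -1;

    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }

    /* Only the least significant bit of each byte is defined. */
    for (size_t i = 0; i < npages; i++)
        resident += vec[i] & 1;

    free(vec);
    return resident;
}
```

A caller would map the shmem region, pass the mapping address and length, and
compare the result against the number of pages it expects to be in RAM.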

Note that my series is, as you mentioned, missing the changes to support
mincore() (otherwise I would have known of its existence!).  It would be
trivial to add that, but let's first see whether mincore() satisfies your need.

[...]

> It is possible for the swap device to be network attached and shared, so multiple
> hosts would need to understand its content. Then it is no longer internal to one
> kernel only.
> 
> By being swap-aware, we can skip swapped-out pages during migration (to prevent IO and potential thrashing), and transfer those pages in another way that
> is zero-copy.

That sounds reasonable, but I'm not aware of any user-API that exposes swap
entries to userspace, or is there one?

I.e., how do you know which swap device is which?  How do you guarantee the
kernel swp entry information won't change over time?

Thanks,

-- 
Peter Xu

