Netdev Archive on
help / color / mirror / Atom feed
From: Jason Gunthorpe <>
To: Matthew Wilcox <>
Cc:, Christoph Hellwig <>,
	Joao Martins <>,
	John Hubbard <>,
	Logan Gunthorpe <>,
	Ming Lei <>,,,,,,
Subject: Re: Phyr Starter
Date: Mon, 10 Jan 2022 20:41:26 -0400	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

On Mon, Jan 10, 2022 at 07:34:49PM +0000, Matthew Wilcox wrote:

> Finally, it may be possible to stop using scatterlist to describe the
> input to the DMA-mapping operation.  We may be able to get struct
> scatterlist down to just dma_address and dma_length, with chaining
> handled through an enclosing struct.

Can you talk about this some more? IMHO one of the key properties of
the scatterlist is that it can hold huge amounts of pages without
having to do any kind of special allocation due to the chaining.

The same will be true of the phyr idea right?

> I would like to see phyr replace bio_vec everywhere it's currently used.
> I don't have time to do that work now because I'm busy with folios.
> If someone else wants to take that on, I shall cheer from the sidelines.
> What I do intend to do is:

I wonder if we mixed things though..

IMHO there is a lot of optimization to be had by having a
datastructure that is expressly 'the physical pages underlying a
contiguous chunk of va'

If you limit to that scenario then we can be more optimal because
things like byte granular offsets and size in the interior pages don't
need to exist. Every interior chunk is always aligned to its order and
we only need to record the order.

An overall starting offset and total length allow computing the slice
of the first/last entry.

If the physical address is always aligned then we get 12 free bits
from the min 4k alignment and also only need to store order, not an
arbitary byte granular length.

The win is I think we can meaningfully cover most common cases using
only 8 bytes per physical chunk. The 12 bits can be used to encode the
common orders (4k, 2M, 1G, etc) and some smart mechanism to get
another 16 bits to cover 'everything'.

IMHO storage density here is quite important, we end up having to keep
this stuff around for a long time.

I say this here, because I've always though bio_vec/etc are more
general than we actually need, being byte granular at every chunk.

>  - Add an interface to gup.c to pin/unpin N phyrs
>  - Add a sg_map_phyrs()
>    This will take an array of phyrs and allocate an sg for them
>  - Whatever else I need to do to make one RDMA driver happy with
>    this scheme

I spent alot of time already cleaning all the DMA code in RDMA - it is
now nicely uniform and ready to do this sort of change. I was
expecting to be a bio_vec, but this is fine too.

What is needed is a full scatterlist replacement, including the IOMMU

For the IOMMU I would expect the datastructure to be re-used, we start
with a list of physical pages and then 'dma map' gives us a list of
IOVA physical pages, in another allocation, but exactly the same

This 'dma map' could return a pointer to the first datastructure if
there is no iommu, allocate a single entry list if the whole thing can
be linearly mapped with the iommu, and other baroque cases (like pci
offset/etc) will need to allocate full array. ie good HW runs fast and
is memory efficient.

It would be nice to see a patch sketching showing what this
datastructure could look like.

VFIO would like this structure as well as it currently is a very
inefficient page at a time loop when it iommu maps things.


  reply	other threads:[~2022-01-11  0:41 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-10 19:34 Matthew Wilcox
2022-01-11  0:41 ` Jason Gunthorpe [this message]
2022-01-11  4:32   ` Matthew Wilcox
2022-01-11 15:01     ` Jason Gunthorpe
2022-01-11 18:33       ` Matthew Wilcox
2022-01-11 20:21         ` Jason Gunthorpe
2022-01-11 21:25           ` Matthew Wilcox
2022-01-11 22:09             ` Logan Gunthorpe
2022-01-11 22:57               ` Jason Gunthorpe
2022-01-11 23:02                 ` Logan Gunthorpe
2022-01-11 22:53             ` Jason Gunthorpe
2022-01-11 22:57               ` Logan Gunthorpe
2022-01-11 23:02                 ` Jason Gunthorpe
2022-01-11 23:08                   ` Logan Gunthorpe
2022-01-12 18:37               ` Matthew Wilcox
2022-01-12 19:08                 ` Jason Gunthorpe
2022-01-20 14:03                 ` Christoph Hellwig
2022-01-20 17:17                   ` Jason Gunthorpe
2022-01-20 14:00       ` Christoph Hellwig
2022-01-11  9:05   ` Daniel Vetter
2022-01-11 20:26     ` Jason Gunthorpe
2022-01-20 14:09       ` Christoph Hellwig
2022-01-20 13:56   ` Christoph Hellwig
2022-01-20 15:27     ` Keith Busch
2022-01-20 15:28       ` Christoph Hellwig
2022-01-20 17:54       ` Robin Murphy
2022-01-11  8:17 ` John Hubbard
2022-01-11 14:01   ` Matthew Wilcox
2022-01-11 15:02     ` Jason Gunthorpe
2022-01-11 17:31   ` Logan Gunthorpe
2022-01-20 14:12   ` Christoph Hellwig
2022-01-20 21:35     ` John Hubbard
2022-01-11 11:40 ` Thomas Zimmermann
2022-01-11 13:56   ` Matthew Wilcox
2022-01-11 14:10     ` Thomas Zimmermann
2022-01-20 13:39 ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
    --subject='Re: Phyr Starter' \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).