LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* Concerns about "mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support"
@ 2015-02-20  5:01 Benjamin Herrenschmidt
  2015-02-20  5:06 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2015-02-20  5:01 UTC (permalink / raw)
  To: Sreekanth Reddy
  Cc: Martin K. Petersen, James Bottomley, Linux Kernel Mailing List,
	scsi, Christoph Hellwig

Hi Sreekanth !

While looking at some (unrelated) issue where mtp2sas seems to be using
32-bit DMA instead of 64-bit DMA on some POWER platforms, I noticed this
patch which was merged as 5fb1bf8aaa832e1e9ca3198de7bbecb8eff7db9c.

Can you confirm my understanding that you are:

 - Setting the DMA mask to 32-bit

 - Mapping pages for DMA

 - Changing the DMA mask to 64-bit

?

If yes, then I don't think this is a legal thing to do and definitely
not something supported by all architectures. It might work by accident,
but there is no telling that any translation/DMA mapping provided before
a call to set_dma_mask() is still valid after that call.

The architecture might have to completely reconfigure the iommu, for
example on some PowerPC platforms, we switch from a remapped mapping to
a direct linear map of all memory, all translations established before
the switch might be lost (it depends on the specific implementation).

How does it work on x86 with DMAR ?

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Concerns about "mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support"
  2015-02-20  5:01 Concerns about "mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support" Benjamin Herrenschmidt
@ 2015-02-20  5:06 ` Benjamin Herrenschmidt
  2015-02-20  5:22   ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2015-02-20  5:06 UTC (permalink / raw)
  To: Sreekanth Reddy
  Cc: Martin K. Petersen, James Bottomley, Linux Kernel Mailing List,
	scsi, Christoph Hellwig

On Fri, 2015-02-20 at 16:01 +1100, Benjamin Herrenschmidt wrote:
> Hi Sreekanth !
> 
> While looking at some (unrelated) issue where mtp2sas seems to be using
> 32-bit DMA instead of 64-bit DMA on some POWER platforms, I noticed this
> patch which was merged as 5fb1bf8aaa832e1e9ca3198de7bbecb8eff7db9c.
> 
> Can you confirm my understanding that you are:
> 
>  - Setting the DMA mask to 32-bit
> 
>  - Mapping pages for DMA
> 
>  - Changing the DMA mask to 64-bit
> 
> ?
> 
> If yes, then I don't think this is a legal thing to do and definitely
> not something supported by all architectures. It might work by accident,
> but there is no telling that any translation/DMA mapping provided before
> a call to set_dma_mask() is still valid after that call.
> 
> The architecture might have to completely reconfigure the iommu, for
> example on some PowerPC platforms, we switch from a remapped mapping to
> a direct linear map of all memory, all translations established before
> the switch might be lost (it depends on the specific implementation).
> 
> How does it work on x86 with DMAR ?

Note that even on powerpc platforms where it would work because we
maintain both 32-bit and 64-bit bypass windows in the device address
space simultaneously, you will leak iommu entries unless you also switch
back to 32-bit when freeing the 32-bit mappings... (and you would
probably crash if you tried to free a 64-bit mapping while in 32-bit
mode).

The iommu APIs weren't designed with that "switching mask" facility in
mind...

Ben.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Concerns about "mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support"
  2015-02-20  5:06 ` Benjamin Herrenschmidt
@ 2015-02-20  5:22   ` Benjamin Herrenschmidt
  2015-02-20  5:45     ` James Bottomley
  2015-02-20  7:16     ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2015-02-20  5:22 UTC (permalink / raw)
  To: Sreekanth Reddy
  Cc: Martin K. Petersen, James Bottomley, Linux Kernel Mailing List,
	scsi, Christoph Hellwig

On Fri, 2015-02-20 at 16:06 +1100, Benjamin Herrenschmidt wrote:

> Note that even on powerpc platforms where it would work because we
> maintain both 32-bit and 64-bit bypass windows in the device address
> space simultaneously, you will leak iommu entries unless you also switch
> back to 32-bit when freeing the 32-bit mappings... (and you would
> probably crash if you tried to free a 64-bit mapping while in 32-bit
> mode).
> 
> The iommu APIs weren't designed with that "switching mask" facility in
> mind...

Looking a bit more closely, you basically do

 - set_dma_mask(64-bit)
 - set_consistent_dma_mask(32-bit)

Now, I don't know how x86 will react to the conflicting masks, but on
ppc64, I'm pretty sure the second one will barf. IE, the first one will
establish a set of direct mapping ops which give you a bypass of the
iommu to all of memory. The second one will then do a
dma_supported(mask) call which will hit the direct ops, and they will
fail since a 32-bit mask cannot address the bypass completely.

Are architectures really required to support such mismatching dma_mask
and consistent_dma_mask ? what a bloody trainwreck ... :-(

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Concerns about "mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support"
  2015-02-20  5:22   ` Benjamin Herrenschmidt
@ 2015-02-20  5:45     ` James Bottomley
  2015-02-20  7:19       ` Benjamin Herrenschmidt
  2015-04-02  5:39       ` Benjamin Herrenschmidt
  2015-02-20  7:16     ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 8+ messages in thread
From: James Bottomley @ 2015-02-20  5:45 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Sreekanth Reddy, Martin K. Petersen, Linux Kernel Mailing List,
	scsi, Christoph Hellwig

On Fri, 2015-02-20 at 16:22 +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2015-02-20 at 16:06 +1100, Benjamin Herrenschmidt wrote:
> 
> > Note that even on powerpc platforms where it would work because we
> > maintain both 32-bit and 64-bit bypass windows in the device address
> > space simultaneously, you will leak iommu entries unless you also switch
> > back to 32-bit when freeing the 32-bit mappings... (and you would
> > probably crash if you tried to free a 64-bit mapping while in 32-bit
> > mode).
> > 
> > The iommu APIs weren't designed with that "switching mask" facility in
> > mind...
> 
> Looking a bit more closely, you basically do
> 
>  - set_dma_mask(64-bit)
>  - set_consistent_dma_mask(32-bit)
> 
> Now, I don't know how x86 will react to the conflicting masks, but on
> ppc64, I'm pretty sure the second one will barf. IE, the first one will
> establish a set of direct mapping ops which give you a bypass of the
> iommu to all of memory. The second one will then do a
> dma_supported(mask) call which will hit the direct ops, and they will
> fail since a 32-bit mask cannot address the bypass completely.
> 
> Are architectures really required to support such mismatching dma_mask
> and consistent_dma_mask ? what a bloody trainwreck ... :-(

Ben, this is legal by design.  It was specifically designed for the
aic79xx SCSI card, but can be used for a variety of other reasons.  The
aic79xx hardware problem was that the DMA engine could address the whole
of memory (it had two address modes, a 39 bit one and a 64 bit one) but
the script engine that runs the mailboxes only had a 32 bit activation
register (the activating write points at the physical address of the
script to begin executing).  This meant that the scripts that run in
memory had to be in the first 4GB of physical memory, hence the split
mask.  The DMA mask specifies that the card can transfer from anywhere
in physical memory, but the consistent_dma_mask says that the consistent
allocation used to get scripts memory must come from the lower 4GB.

James



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Concerns about "mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support"
  2015-02-20  5:22   ` Benjamin Herrenschmidt
  2015-02-20  5:45     ` James Bottomley
@ 2015-02-20  7:16     ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2015-02-20  7:16 UTC (permalink / raw)
  To: Sreekanth Reddy
  Cc: Martin K. Petersen, James Bottomley, Linux Kernel Mailing List,
	scsi, Christoph Hellwig

On Fri, 2015-02-20 at 16:22 +1100, Benjamin Herrenschmidt wrote:

> Looking a bit more closely, you basically do
> 
>  - set_dma_mask(64-bit)
>  - set_consistent_dma_mask(32-bit)
> 
> Now, I don't know how x86 will react to the conflicting masks, but on
> ppc64, I'm pretty sure the second one will barf. IE, the first one will
> establish a set of direct mapping ops which give you a bypass of the
> iommu to all of memory. The second one will then do a
> dma_supported(mask) call which will hit the direct ops, and they will
> fail since a 32-bit mask cannot address the bypass completely.
> 
> Are architectures really required to support such mismatching dma_mask
> and consistent_dma_mask ? what a bloody trainwreck ... :-(

Oh well, looks like x86 supports it and it won't be too hard to support
it on ppc64 as well. We even had some code along those lines for FSL
platforms with an ifdef due to the amount of drivers that used to fail
setting the consistent mask properly but that seems to be mostly fixed
nowadays.

Ben.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Concerns about "mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support"
  2015-02-20  5:45     ` James Bottomley
@ 2015-02-20  7:19       ` Benjamin Herrenschmidt
  2015-04-02  5:39       ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2015-02-20  7:19 UTC (permalink / raw)
  To: James Bottomley
  Cc: Sreekanth Reddy, Martin K. Petersen, Linux Kernel Mailing List,
	scsi, Christoph Hellwig

On Thu, 2015-02-19 at 21:45 -0800, James Bottomley wrote:
> Ben, this is legal by design.  It was specifically designed for the
> aic79xx SCSI card, but can be used for a variety of other reasons.  The
> aic79xx hardware problem was that the DMA engine could address the whole
> of memory (it had two address modes, a 39 bit one and a 64 bit one) but
> the script engine that runs the mailboxes only had a 32 bit activation
> register (the activating write points at the physical address of the
> script to begin executing).  This meant that the scripts that run in
> memory had to be in the first 4GB of physical memory, hence the split
> mask.  The DMA mask specifies that the card can transfer from anywhere
> in physical memory, but the consistent_dma_mask says that the consistent
> allocation used to get scripts memory must come from the lower 4GB.

Right, ok, it looks like it's easy enough to support with ZONE_DMA32, I'm
testing patches to create it unconditionally on ppc64 (it used to depend
on us using swiotlb on embedded platforms) and I'll shoot that upstream
if it passes.

Ben.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Concerns about "mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support"
  2015-02-20  5:45     ` James Bottomley
  2015-02-20  7:19       ` Benjamin Herrenschmidt
@ 2015-04-02  5:39       ` Benjamin Herrenschmidt
  2015-04-02  5:59         ` James Bottomley
  1 sibling, 1 reply; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2015-04-02  5:39 UTC (permalink / raw)
  To: James Bottomley
  Cc: Sreekanth Reddy, Martin K. Petersen, Linux Kernel Mailing List,
	scsi, Christoph Hellwig

On Thu, 2015-02-19 at 21:45 -0800, James Bottomley wrote:

> Ben, this is legal by design.  It was specifically designed for the
> aic79xx SCSI card, but can be used for a variety of other reasons.  The
> aic79xx hardware problem was that the DMA engine could address the whole
> of memory (it had two address modes, a 39 bit one and a 64 bit one) but
> the script engine that runs the mailboxes only had a 32 bit activation
> register (the activating write points at the physical address of the
> script to begin executing).  This meant that the scripts that run in
> memory had to be in the first 4GB of physical memory, hence the split
> mask.  The DMA mask specifies that the card can transfer from anywhere
> in physical memory, but the consistent_dma_mask says that the consistent
> allocation used to get scripts memory must come from the lower 4GB.

So looking at that again...

This is interesting ... basically any driver using a different mask has
been broken on powerpc for basically ever. The whole concept was poorly
designed, for example,  the set_consistent_mask isn't a dma_map_ops
unlike everything else.

In some cases, what we want is convey a base+offset information to
drivers but we can't do that.

This stuff cannot work with setups like a lot of our iommus where we
have a remapped region at the bottom of the DMA address space and a
bypass (direct map) region high up.

Basically, we can fix it, at least for most platforms, but it will be
hard, invasive, *and* will need to go to stable. Grmbl.

We'll have to replace our "direct" DMA ops (which just apply an offset)
which we use for devices that set a 64-bit mask on platform that support
a bypass window, with some smart-ass hybrid variant that selectively
shoot stuff up to the bypass window or down via the iommu remapped based
on the applicable mask for a given operation.

It would be nice if we could come up with a way to inform the driver
that we support that sort of "bypass" region with an offset. That would
allow drivers that have that 64-bit base + 32-bit offset scheme to work
much more efficiently for us. The driver could configure the base to be
our "bypass window offset", and we could use ZONE_DMA32 for consistent
DMAs.

It would also help us with things like some GPUs that can only do 40-bit
DMA (which won't allow them to reach our bypass region normally) but do
have a way to configure the generated top bits of all DMA addresses in
some fixed register.

Any idea what such an interface might look like ?

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Concerns about "mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support"
  2015-04-02  5:39       ` Benjamin Herrenschmidt
@ 2015-04-02  5:59         ` James Bottomley
  0 siblings, 0 replies; 8+ messages in thread
From: James Bottomley @ 2015-04-02  5:59 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Sreekanth Reddy, Martin K. Petersen, Linux Kernel Mailing List,
	scsi, Christoph Hellwig

On Thu, 2015-04-02 at 16:39 +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2015-02-19 at 21:45 -0800, James Bottomley wrote:
> 
> > Ben, this is legal by design.  It was specifically designed for the
> > aic79xx SCSI card, but can be used for a variety of other reasons.  The
> > aic79xx hardware problem was that the DMA engine could address the whole
> > of memory (it had two address modes, a 39 bit one and a 64 bit one) but
> > the script engine that runs the mailboxes only had a 32 bit activation
> > register (the activating write points at the physical address of the
> > script to begin executing).  This meant that the scripts that run in
> > memory had to be in the first 4GB of physical memory, hence the split
> > mask.  The DMA mask specifies that the card can transfer from anywhere
> > in physical memory, but the consistent_dma_mask says that the consistent
> > allocation used to get scripts memory must come from the lower 4GB.
> 
> So looking at that again...
> 
> This is interesting ... basically any driver using a different mask has
> been broken on powerpc for basically ever. The whole concept was poorly
> designed, for example,  the set_consistent_mask isn't a dma_map_ops
> unlike everything else.
> 
> In some cases, what we want is convey a base+offset information to
> drivers but we can't do that.
> 
> This stuff cannot work with setups like a lot of our iommus where we
> have a remapped region at the bottom of the DMA address space and a
> bypass (direct map) region high up.
> 
> Basically, we can fix it, at least for most platforms, but it will be
> hard, invasive, *and* will need to go to stable. Grmbl.

Well, it was originally a hack for altix, because they had no regions
below 4GB and had to specifically manufacture them.  As you know, in
Linux, if Intel doesn't need it, no-one cares and the implementation
bitrots.

> We'll have to replace our "direct" DMA ops (which just apply an offset)
> which we use for devices that set a 64-bit mask on platform that support
> a bypass window, with some smart-ass hybrid variant that selectively
> shoot stuff up to the bypass window or down via the iommu remapped based
> on the applicable mask for a given operation.
> 
> It would be nice if we could come up with a way to inform the driver
> that we support that sort of "bypass" region with an offset. That would
> allow drivers that have that 64-bit base + 32-bit offset scheme to work
> much more efficiently for us. The driver could configure the base to be
> our "bypass window offset", and we could use ZONE_DMA32 for consistent
> DMAs.
> 
> It would also help us with things like some GPUs that can only do 40-bit
> DMA (which won't allow them to reach our bypass region normally) but do
> have a way to configure the generated top bits of all DMA addresses in
> some fixed register.
> 
> Any idea what such an interface might look like ?

Why doesn't the API we have today work (modulo a better implementation)?
A consistent mask specifies a wide range of possible locations for your
bypass region.  You just have to select one and attach it somehow and
remember to use it for consistent allocations.  As long as
set_consistent_mask becomes a dma op, it should all work, right?

James



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-04-02  5:59 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-20  5:01 Concerns about "mpt2sas: Added Reply Descriptor Post Queue (RDPQ) Array support" Benjamin Herrenschmidt
2015-02-20  5:06 ` Benjamin Herrenschmidt
2015-02-20  5:22   ` Benjamin Herrenschmidt
2015-02-20  5:45     ` James Bottomley
2015-02-20  7:19       ` Benjamin Herrenschmidt
2015-04-02  5:39       ` Benjamin Herrenschmidt
2015-04-02  5:59         ` James Bottomley
2015-02-20  7:16     ` Benjamin Herrenschmidt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).