LKML Archive on lore.kernel.org
* Re: PCI Bursting with PIO
       [not found] <fa.Mbzb1/2dWnp5V/5ElzijlkAstZU@ifi.uio.no>
@ 2008-02-16  6:00 ` Robert Hancock
  2008-02-17  4:53   ` Dan Gora
  0 siblings, 1 reply; 10+ messages in thread
From: Robert Hancock @ 2008-02-16  6:00 UTC (permalink / raw)
  To: Dan Gora; +Cc: linux-kernel

Dan Gora wrote:
> Hi,
> 
> I am trying to optimize a driver for a slave only PCI device and am
> having a lot of trouble getting any kind of PCI burst transactions in
> either the read or the write direction.  Using bcopy/memcpy or even a
> hand-crafted while (len--) { *pdst++ = *psrc++; } (with pdst and psrc
> unsigned long*) I can only get writes to burst and even in that case
> only for 2 data phases (8 bytes) and only on 64 bit machines.  The
> best that I have managed is to use a hand crafted asm function which
> copies the data through mmx registers on i386 machines, but that still
> only bursts a maximum of 16 bytes in the write direction and not at
> all in the read direction.  The source and destination pointers are
> both aligned to 8 byte boundaries, so I don't think that it's an
> alignment issue.

The chipset is being limited by what the CPU is giving it. If the CPU 
sends only a small amount of data in one access then the chipset usually 
does not try to burst more than that.

> 
> Is there any way to get PIO to burst over the PCI bus in the read and
> write direction?  My device has 4 BAR registers, but the area where I
> am transferring data is marked 'prefetchable' (although the others are
> not).  I read here: http://lkml.org/lkml/2004/9/23/393 that this was a
> prerequisite, but it is apparently not sufficient.  He also mentioned
> that the area had to be marked as write-back, but it's not clear how
> you can tell (no, /proc/mtrr doesn't tell you) or that it has anything
> to do with bursting reads.
> 
> Any ideas would be really appreciated,

Well, in order for the CPU to batch up more writes you'd have to map the
BAR as either write-combining or write-back. If it's not listed in
/proc/mtrr it will be the default setting of uncacheable. X has code to
set up the video memory on the video card as write-combining so it can
get better write performance; you could do something similar.

Setting it as write-back might allow you to get the reads to do bursting
as well (since the CPU will do a cache-line fill instead of individual
accesses), but then if the device is modifying this memory area, you'll
get stale data back unless you add code to invalidate those cache lines
before reading the data. You could run into some other less obvious
issues as well, as normally device memory regions are not mapped write-back.

In general, especially if you need to read data back from the device, 
implementing a DMA engine would be by far the better option. Most 
chipsets seem not at all optimized for handling sequential reads from 
PCI memory from the CPU. (Even in the DMA case, you have to be careful 
with what type of memory read transaction you use when transferring from 
host memory - some chipsets don't like to burst more than one cycle if 
you use normal Memory Read instead of Memory Read Line or Memory Read 
Multiple.)

* Re: PCI Bursting with PIO
  2008-02-16  6:00 ` PCI Bursting with PIO Robert Hancock
@ 2008-02-17  4:53   ` Dan Gora
  0 siblings, 0 replies; 10+ messages in thread
From: Dan Gora @ 2008-02-17  4:53 UTC (permalink / raw)
  To: linux-kernel

On Feb 15, 2008 10:00 PM, Robert Hancock <rwh461@mail.usask.ca> wrote:
>
> Well, in order for the CPU to batch up more writes you'd have to map the
>   BAR as either write-combining or write-back. If it's not listed in
> /proc/mtrr it will be the default setting of uncacheable.

Ok, this is pretty much what I thought, but I still don't really have
any idea how to do this.  ioremap() doesn't take any flags and I'm not
using ioremap_nocache(), plus the BAR is marked prefetchable...

> X has code to
> set up the video memory on the video card as write-combining so it can
> get better write performance, you could do something similar.

Alan mentioned this as well, but I haven't tried to hunt down this code
yet.  If you have any pointers as to where I might find it, I would
appreciate it.

> Setting it as write-back might allow you to get the reads to do bursting
> as well (since the CPU will do a cache-line fill instead of individual
> accesses)

I don't see what the cache write policy has to do with the reads.  If
the region is marked cacheable, then all reads should try and read a
cache line, right?  The write-back or write-through policy only has to
do with the writes.  If it's write-through then writes go directly to
RAM; if it's write-back then they hit the cache and are flushed when
the line is flushed (LRU replacement, explicit cache line flush,
etc.), right?

> but then if the device is modifying this memory area, unless
> you add code to invalidate those cache lines before reading the data
> you'll get stale data back.

Yeah this could definitely be tricky, would pci_dma_sync suffice for this?

> You could run into some other less obvious
> issues as well, as normally device memory regions are not mapped write-back.
>
> In general, especially if you need to read data back from the device,
> implementing a DMA engine would be by far the better option. Most
> chipsets seem not at all optimized for handling sequential reads from
> PCI memory from the CPU. (Even in the DMA case, you have to be careful
> with what type of memory read transaction you use when transferring from
> host memory - some chipsets don't like to burst more than one cycle if
> you use normal Memory Read instead of Memory Read Line or Memory Read
> Multiple.)

True enough... Fortunately my device allows me to set these...

What I am trying to avoid is PCI read transactions in general.  PCI
reads are slow pretty much regardless of whether they originate from
the device or from the host, because of the multitude of bridges they
have to go through (I've seen 5 in some cases... sheesh).  So
ultimately I'd like everything going to the device to be written
from the host, and everything going towards the host to be DMA'd into
RAM by the device.  At least then we can take advantage of PCI write
posting and not have to wait for the write to actually complete
before we plod on.  But this depends on at least getting good write
burst performance from the host, so that the time to write the data
from the host is less than the time it would take for the device to
read the data out of RAM.

thanks again for your help!
dan

* Re: PCI Bursting with PIO
       [not found]   ` <fa.kFanAY/O5VMnSf6YXqEyxmsR62U@ifi.uio.no>
@ 2008-02-17 19:06     ` Robert Hancock
  0 siblings, 0 replies; 10+ messages in thread
From: Robert Hancock @ 2008-02-17 19:06 UTC (permalink / raw)
  To: Dan Gora; +Cc: linux-kernel

Dan Gora wrote:
> On Feb 15, 2008 10:00 PM, Robert Hancock <rwh461@mail.usask.ca> wrote:
>> Well, in order for the CPU to batch up more writes you'd have to map the
>>   BAR as either write-combining or write-back. If it's not listed in
>> /proc/mtrr it will be the default setting of uncacheable.
> 
> Ok, this is pretty much what I thought, but I still don't really have
> any idea how to do this.  ioremap() doesn't take any flags and I'm not
> using ioremap_nocache(), plus the BAR is marked prefetchable...

Likely easiest to do it from userspace by writing into /proc/mtrr to 
change the memory type attributes. Have a look at Documentation/mtrr.txt.
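
As a rough illustration of the userspace route, here is a minimal sketch
that builds the request line described in Documentation/mtrr.txt and
writes it to /proc/mtrr. The helper names are made up for this example,
and the base and size in the usage note are placeholders, not values
from this thread. On x86 the size is expected to be a power of two with
the base aligned to it, and the write needs root.

```c
#include <stdio.h>
#include <string.h>

/* Build the "base=... size=... type=write-combining" request line.
 * Returns the formatted length, or -1 if the buffer is too small. */
static int format_wc_request(char *buf, size_t buflen,
                             unsigned long long base,
                             unsigned long long size)
{
    int n = snprintf(buf, buflen,
                     "base=0x%llx size=0x%llx type=write-combining\n",
                     base, size);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}

/* Write one request line to the MTRR control file (normally /proc/mtrr).
 * Returns 0 on success, -1 on any failure. */
static int write_mtrr_request(const char *path, const char *line)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(line, f) >= 0;
    /* fclose() flushes the buffer, so a rejected request can show up here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A caller would do something like format_wc_request(line, sizeof line,
0xf8000000ULL, 0x400000ULL) followed by write_mtrr_request("/proc/mtrr",
line), substituting the real BAR address and length. Reading /proc/mtrr
back afterwards should show whether the region was accepted.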

> 
>> X has code to
>> set up the video memory on the video card as write-combining so it can
>> get better write performance, you could do something similar.
> 
> Alan mentioned this as well, but I haven't tried to hunt down this code
> yet.  If you have any pointers as to where I might find it, I would
> appreciate it.
> 
>> Setting it as write-back might allow you to get the reads to do bursting
>> as well (since the CPU will do a cache-line fill instead of individual
>> accesses)
> 
> I don't see what the cache write policy has to do with the reads.  If
> the region is marked cacheable, then all reads should try and read a
> cache line, right?  The write-back or write-through policy only has to
> do with the writes.  If it's write-through then writes go directly to
> RAM; if it's write-back then they hit the cache and are flushed when
> the line is flushed (LRU replacement, explicit cache line flush,
> etc.), right?

The caching attribute affects reads as well. If the region is marked
uncacheable or write-combining then reads will never be cached; they
are cached only if it's marked write-back.

> 
>> but then if the device is modifying this memory area, unless
>> you add code to invalidate those cache lines before reading the data
>> you'll get stale data back.
> 
> Yeah this could definitely be tricky, would pci_dma_sync suffice for this?

No, that's not meant to handle this case of stale data in the CPU's
cache, since that doesn't normally happen. Something like clflush or
wbinvd would do it, those being x86-specific, of course.
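
To make the clflush suggestion concrete, here is a minimal x86-only
sketch that flushes every cache line covering a buffer, using the
compiler intrinsics _mm_clflush() and _mm_mfence() from <emmintrin.h>.
The function name is invented for this example and the 64-byte line
size is an assumption; real code should query the CPU for it. This is
a userspace-style illustration, not driver code from this thread.

```c
#include <emmintrin.h>  /* _mm_clflush(), _mm_mfence() (SSE2) */
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u  /* assumed line size; not queried from the CPU */

/* Flush (write back and invalidate) each cache line overlapping
 * [addr, addr + len). Returns the number of lines flushed. */
static size_t flush_cache_range(const void *addr, size_t len)
{
    uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    size_t lines = 0;

    if (len == 0)
        return 0;

    _mm_mfence();                   /* order earlier accesses first */
    for (; p < end; p += CACHE_LINE) {
        _mm_clflush((const void *)p);
        lines++;
    }
    _mm_mfence();                   /* flushes complete before later reads */
    return lines;
}
```

The idea would be to call this on the mapped region right before
reading data the device may have changed, so the next read misses the
cache and goes out to the device again.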

> 
>> You could run into some other less obvious
>> issues as well, as normally device memory regions are not mapped write-back.
>>
>> In general, especially if you need to read data back from the device,
>> implementing a DMA engine would be by far the better option. Most
>> chipsets seem not at all optimized for handling sequential reads from
>> PCI memory from the CPU. (Even in the DMA case, you have to be careful
>> with what type of memory read transaction you use when transferring from
>> host memory - some chipsets don't like to burst more than one cycle if
>> you use normal Memory Read instead of Memory Read Line or Memory Read
>> Multiple.)
> 
> True enough... Fortunately my device allows me to set these...
> 
> What I am trying to avoid is PCI read transactions in general.  PCI
> reads are slow pretty much regardless of whether they originate from
> the device or from the host, because of the multitude of bridges they
> have to go through (I've seen 5 in some cases... sheesh).  So
> ultimately I'd like everything going to the device to be written
> from the host, and everything going towards the host to be DMA'd into
> RAM by the device.  At least then we can take advantage of PCI write
> posting and not have to wait for the write to actually complete
> before we plod on.  But this depends on at least getting good write
> burst performance from the host, so that the time to write the data
> from the host is less than the time it would take for the device to
> read the data out of RAM.
> 
> thanks again for your help!
> dan

Setting write-combining should be fairly easy without too many weird
side effects. Trying to use write-back to get burst reads is potentially 
doable, but may be fraught with difficulty.

I think DMA in both directions is still likely better though, unless the 
data you are writing is very small. Most chipsets have pretty small
posting buffers, so the amount write posting will help you is small. If
you fill them up you'll just stall the CPU. With a DMA read, at least
only the device will stall.

* Re: PCI Bursting with PIO
  2008-02-15 18:00   ` Dan Gora
  2008-02-15 18:41     ` H. Peter Anvin
@ 2008-02-15 19:00     ` Alan Cox
  1 sibling, 0 replies; 10+ messages in thread
From: Alan Cox @ 2008-02-15 19:00 UTC (permalink / raw)
  To: Dan Gora; +Cc: linux-kernel

On Fri, 15 Feb 2008 10:00:28 -0800
"Dan Gora" <dan.gora@gmail.com> wrote:

> On Fri, Feb 15, 2008 at 5:02 AM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > > Is there any way to get PIO to burst over the PCI bus in the read and
> >  > write direction?  My device has 4 BAR registers, but the area where I
> >
> >  I think you are doing about as well as the X folks did when they spent
> >  time on trying to optimise pio transfers to and from graphics card RAM.
> >
> 
> That's good to know.  Do you have a link or anything to their
> discussion, or some key words that I could use to hunt it down?

It was some time ago but a look at the X tree will find you the code.
It's basically the same as you did - using MMX.

* Re: PCI Bursting with PIO
  2008-02-15 18:00   ` Dan Gora
@ 2008-02-15 18:41     ` H. Peter Anvin
  2008-02-15 19:00     ` Alan Cox
  1 sibling, 0 replies; 10+ messages in thread
From: H. Peter Anvin @ 2008-02-15 18:41 UTC (permalink / raw)
  To: Dan Gora; +Cc: linux-kernel

Dan Gora wrote:
>>
>>  Put a DMA controller on it ;)
> 
> Ugh.. sadly that's what's coming.  I really don't get why the
> northbridge cannot burst however.

Because the early Intel northbridges didn't, so no one else bothered
either, since everyone designed their hardware to not require that 
capability.

	-hpa

* Re: PCI Bursting with PIO
  2008-02-15 13:02 ` Alan Cox
@ 2008-02-15 18:00   ` Dan Gora
  2008-02-15 18:41     ` H. Peter Anvin
  2008-02-15 19:00     ` Alan Cox
  0 siblings, 2 replies; 10+ messages in thread
From: Dan Gora @ 2008-02-15 18:00 UTC (permalink / raw)
  To: linux-kernel

On Fri, Feb 15, 2008 at 5:02 AM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > Is there any way to get PIO to burst over the PCI bus in the read and
>  > write direction?  My device has 4 BAR registers, but the area where I
>
>  I think you are doing about as well as the X folks did when they spent
>  time on trying to optimise pio transfers to and from graphics card RAM.
>

That's good to know.  Do you have a link or anything to their
discussion, or some key words that I could use to hunt it down?

>
>  > Any ideas would be really appreciated,
>
>  Put a DMA controller on it ;)

Ugh.. sadly that's what's coming.  I really don't get why the
northbridge cannot burst however.  If the memory is mapped
prefetchable and you have to do a PCI read through 3 PCIe bridges to
finally get to your device it seems like it would _really_ behoove the
bridge to do a Memory Read Multiple and get the whole cache line.  I
have searched around a lot and there doesn't seem to be any info at
all on how you can persuade these bridges to do different PCI commands
or burst.  I don't know why....

thanks again for your help,

dan

* Re: PCI Bursting with PIO
  2008-02-15 10:54 ` Andi Kleen
@ 2008-02-15 17:55   ` Dan Gora
  0 siblings, 0 replies; 10+ messages in thread
From: Dan Gora @ 2008-02-15 17:55 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

On Fri, Feb 15, 2008 at 2:54 AM, Andi Kleen <andi@firstfloor.org> wrote:
> "Dan Gora" <dan.gora@gmail.com> writes:
>  >
>  > Is there any way to get PIO
>
>  I assume you really mean MMIO, not PIO. PIO would be port IO.

Sorry, I always saw it referred to as "Programmed I/O" as opposed to DMA...

>  You should set the MMIO mapping to write combining using an MTRR.

Sorry to be thick here, but how would I go about doing that?

>  You might need to add appropriate memory barriers if you rely
>  on write ordering though.

Ok, thanks for the info...

dan

* Re: PCI Bursting with PIO
  2008-02-15  3:28 Dan Gora
  2008-02-15 10:54 ` Andi Kleen
@ 2008-02-15 13:02 ` Alan Cox
  2008-02-15 18:00   ` Dan Gora
  1 sibling, 1 reply; 10+ messages in thread
From: Alan Cox @ 2008-02-15 13:02 UTC (permalink / raw)
  To: Dan Gora; +Cc: linux-kernel

> Is there any way to get PIO to burst over the PCI bus in the read and
> write direction?  My device has 4 BAR registers, but the area where I

I think you are doing about as well as the X folks did when they spent
time on trying to optimise pio transfers to and from graphics card RAM.

> Any ideas would be really appreciated,

Put a DMA controller on it ;)

Alan

* Re: PCI Bursting with PIO
  2008-02-15  3:28 Dan Gora
@ 2008-02-15 10:54 ` Andi Kleen
  2008-02-15 17:55   ` Dan Gora
  2008-02-15 13:02 ` Alan Cox
  1 sibling, 1 reply; 10+ messages in thread
From: Andi Kleen @ 2008-02-15 10:54 UTC (permalink / raw)
  To: Dan Gora; +Cc: linux-kernel

"Dan Gora" <dan.gora@gmail.com> writes:
>
> Is there any way to get PIO 

I assume you really mean MMIO, not PIO. PIO would be port IO.

> to burst over the PCI bus in the read and
> write direction? 

You should set the MMIO mapping to write combining using an MTRR.
You might need to add appropriate memory barriers if you rely
on write ordering though.
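
A minimal sketch of where such a barrier would go, assuming the data
area is mapped write-combining and the device starts processing when a
separate register is written. That "doorbell" register, the function
name, and the parameters are all hypothetical, not something described
in this thread. The sketch uses the x86 _mm_sfence() intrinsic; in
kernel code the portable wmb() would play that role.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_sfence() */

/* Copy nwords 64-bit words into a (presumed write-combining) region,
 * then notify the device through a hypothetical doorbell register. */
static void wc_copy_and_kick(volatile uint64_t *dst, const uint64_t *src,
                             size_t nwords,
                             volatile uint32_t *doorbell, uint32_t value)
{
    for (size_t i = 0; i < nwords; i++)
        dst[i] = src[i];    /* stores may sit in the CPU's WC buffers */

    /* Make the data stores visible before the doorbell write, so the
     * device does not start on a partially written buffer. */
    _mm_sfence();

    *doorbell = value;
}
```

Without the fence, combined writes can reach the device in a different
order than the program issued them, which only matters if the device
depends on that order.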

-Andi

* PCI Bursting with PIO
@ 2008-02-15  3:28 Dan Gora
  2008-02-15 10:54 ` Andi Kleen
  2008-02-15 13:02 ` Alan Cox
  0 siblings, 2 replies; 10+ messages in thread
From: Dan Gora @ 2008-02-15  3:28 UTC (permalink / raw)
  To: linux-kernel

Hi,

I am trying to optimize a driver for a slave only PCI device and am
having a lot of trouble getting any kind of PCI burst transactions in
either the read or the write direction.  Using bcopy/memcpy or even a
hand-crafted while (len--) { *pdst++ = *psrc++; } (with pdst and psrc
unsigned long*) I can only get writes to burst and even in that case
only for 2 data phases (8 bytes) and only on 64 bit machines.  The
best that I have managed is to use a hand crafted asm function which
copies the data through mmx registers on i386 machines, but that still
only bursts a maximum of 16 bytes in the write direction and not at
all in the read direction.  The source and destination pointers are
both aligned to 8 byte boundaries, so I don't think that it's an
alignment issue.
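
For reference, the hand-crafted loop mentioned above could be written
out roughly as follows. This is a sketch only: the function name is
invented, len is assumed to count unsigned long words rather than
bytes, and the destination is declared volatile here so the compiler
neither merges nor drops the individual stores.

```c
#include <stddef.h>

/* Copy len words to the device window, one word-sized store each. */
static void pio_copy_words(volatile unsigned long *pdst,
                           const unsigned long *psrc, size_t len)
{
    while (len--)
        *pdst++ = *psrc++;
}
```

Each iteration issues a single store of sizeof(unsigned long) bytes,
which is 8 bytes on a 64-bit machine, matching the 2-data-phase writes
described above.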

Is there any way to get PIO to burst over the PCI bus in the read and
write direction?  My device has 4 BAR registers, but the area where I
am transferring data is marked 'prefetchable' (although the others are
not).  I read here: http://lkml.org/lkml/2004/9/23/393 that this was a
prerequisite, but it is apparently not sufficient.  He also mentioned
that the area had to be marked as write-back, but it's not clear how
you can tell (no, /proc/mtrr doesn't tell you) or that it has anything
to do with bursting reads.

Any ideas would be really appreciated,

thanks-
dan
