LKML Archive on lore.kernel.org
* [RFC PATCH] drivers: ata: ahci_sunxi: Increased SATA/AHCI DMA TX/RX FIFOs
From: Uenal Mutlu @ 2019-05-10 19:25 UTC
To: Jens Axboe, Maxime Ripard, Chen-Yu Tsai, linux-ide, linux-arm-kernel, linux-kernel
Cc: Uenal Mutlu, linux-sunxi, u-boot, linux-amarula, Jagan Teki, Pablo Greco, Mark Rutland, Oliver Schinagl, Linus Walleij, Hans de Goede, FUKAUMI Naoki, Andre Przywara

Increasing the SATA/AHCI DMA TX/RX FIFOs (P0DMACR.TXTS and .RXTS) from
default 0x0 each to 0x3 each raises write performance to about 120MB/s,
up from a lame 36MB/s to 45MB/s previously. Read performance is about
200MB/s [tested on SSD using dd bs=4K count=512K].

Tested on the Banana Pi R1 (aka Lamobo R1) and Banana Pi M1 SBCs
with Allwinner A20 32bit-SoCs (ARMv7-a / arm-linux-gnueabihf).
These devices are RaspberryPi-like small devices.

RFC: Since more than about 25 similar SBC/SoC models use the
ahci_sunxi driver, users are encouraged to test it on all the
affected boards and give feedback.

List of the affected sunxi and other boards and SoCs with SATA using
the ahci_sunxi driver:
  $ grep -i -e "^&ahci" arch/arm/boot/dts/sun*dts
  and http://linux-sunxi.org/Category:Devices_with_SATA_port

Signed-off-by: Uenal Mutlu <um@mutluit.com>
---
 drivers/ata/ahci_sunxi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ata/ahci_sunxi.c b/drivers/ata/ahci_sunxi.c
index 911710643305..257986431c79 100644
--- a/drivers/ata/ahci_sunxi.c
+++ b/drivers/ata/ahci_sunxi.c
@@ -158,7 +158,7 @@ static void ahci_sunxi_start_engine(struct ata_port *ap)
 	struct ahci_host_priv *hpriv = ap->host->private_data;

 	/* Setup DMA before DMA start */
-	sunxi_clrsetbits(hpriv->mmio + AHCI_P0DMACR, 0x0000ff00, 0x00004400);
+	sunxi_clrsetbits(hpriv->mmio + AHCI_P0DMACR, 0x0000ffff, 0x00004433);

 	/* Start DMA */
 	sunxi_setbits(port_mmio + PORT_CMD, PORT_CMD_START);
--
2.11.0
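As a reading aid, the effect of the one-line change can be modeled in user space. The sketch below assumes that sunxi_clrsetbits() is a plain read-modify-write that first clears the mask bits and then ORs in the value (this is suggested by its name and arguments, but its body is not shown in this thread). The names clrsetbits_model, p0dmacr_old and p0dmacr_new are hypothetical, and the functions work on an ordinary integer rather than on the MMIO register.

```c
#include <assert.h>
#include <stdint.h>

/*
 * User-space model of a clear-then-set register update (assumed behavior
 * of sunxi_clrsetbits): clear every bit in clr_mask, then OR in set_val.
 */
static inline uint32_t clrsetbits_model(uint32_t reg_val, uint32_t clr_mask,
					uint32_t set_val)
{
	/* Sanity check: the bits being set should lie inside the cleared mask. */
	assert((set_val & ~clr_mask) == 0);

	reg_val &= ~clr_mask;
	reg_val |= set_val;
	return reg_val;
}

/* Call before the patch: only bits 15:8 are rewritten, bits 7:0 are kept. */
static inline uint32_t p0dmacr_old(uint32_t before)
{
	return clrsetbits_model(before, 0x0000ff00u, 0x00004400u);
}

/* Call after the patch: bits 15:0 are rewritten, so the low byte becomes 0x33. */
static inline uint32_t p0dmacr_new(uint32_t before)
{
	return clrsetbits_model(before, 0x0000ffffu, 0x00004433u);
}
```

For a register that happens to hold 0x56 in its low byte, p0dmacr_old() leaves that byte alone and returns 0x00004456, while p0dmacr_new() returns 0x00004433. In both cases the upper 16 bits pass through unchanged.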
* Re: [RFC PATCH] drivers: ata: ahci_sunxi: Increased SATA/AHCI DMA TX/RX FIFOs
From: U.Mutlu @ 2019-05-11 18:12 UTC
To: Stefan Monnier, linux-ide
Cc: linux-sunxi, linux-arm-kernel, linux-kernel, u-boot

Stefan Monnier wrote on 05/11/2019 03:37 PM:
>> Increasing the SATA/AHCI DMA TX/RX FIFOs (P0DMACR.TXTS and .RXTS) from
>> default 0x0 each to 0x3 each raises write performance to about 120MB/s,
>> up from a lame 36MB/s to 45MB/s previously. Read performance is about
>> 200MB/s [tested on SSD using dd bs=4K count=512K].
>
> Such a simple patch to fix such a long-standing performance problem that
> everyone [ well, apparently not quite everyone ] assumed was a hardware
> limitation...
>
> And yet, April 1st is long gone.
>
> Is it really for real?

Yes, it's indeed real, Stefan; really no April 1st joke. :-)

As you indicated, this problem of slow SATA write speed on these small
devices has persisted for more than 5 years. This patch finally solves it.

On my test device (BPI-R1) the optimum block size seems to be 12K, which
gives up to 129 MB/s write speed.

Here are some test results with different block sizes, all giving a write
speed of 125 to 129 MB/s:

time sh -c "dd if=/dev/zero of=test.tmp bs=$bs count=$count conv=fdatasync"

------------ bs=8K / count=256K / 1 ------------------
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 16.9237 s, 127 MB/s

real    0m16.935s
user    0m0.388s
sys     0m15.777s

------------ bs=8K / count=256K / 2 ------------------
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 16.9916 s, 126 MB/s

real    0m17.973s
user    0m0.326s
sys     0m16.806s

------------ bs=8K / count=256K / 3 ------------------
262144+0 records in
262144+0 records out
2147483648 bytes (2.1 GB) copied, 17.0085 s, 126 MB/s

real    0m17.993s
user    0m0.442s
sys     0m16.588s

------------ bs=12K / count=171K / 1 ------------------
175104+0 records in
175104+0 records out
2151677952 bytes (2.2 GB) copied, 16.8474 s, 128 MB/s

real    0m16.860s
user    0m0.205s
sys     0m15.705s

------------ bs=12K / count=171K / 2 ------------------
175104+0 records in
175104+0 records out
2151677952 bytes (2.2 GB) copied, 16.6934 s, 129 MB/s

real    0m17.669s
user    0m0.227s
sys     0m16.355s

------------ bs=12K / count=171K / 3 ------------------
175104+0 records in
175104+0 records out
2151677952 bytes (2.2 GB) copied, 16.6684 s, 129 MB/s

real    0m17.654s
user    0m0.388s
sys     0m16.118s

------------ bs=16K / count=128K / 1 ------------------
131072+0 records in
131072+0 records out
2147483648 bytes (2.1 GB) copied, 17.1845 s, 125 MB/s

real    0m17.200s
user    0m0.251s
sys     0m16.060s

------------ bs=16K / count=128K / 2 ------------------
131072+0 records in
131072+0 records out
2147483648 bytes (2.1 GB) copied, 16.9221 s, 127 MB/s

real    0m17.902s
user    0m0.170s
sys     0m16.763s

------------ bs=16K / count=128K / 3 ------------------
131072+0 records in
131072+0 records out
2147483648 bytes (2.1 GB) copied, 16.8845 s, 127 MB/s

real    0m17.868s
user    0m0.167s
sys     0m16.736s
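The dd figures above can be cross-checked with a little arithmetic. The sketch below assumes dd's usual conventions, which the posted numbers are consistent with: a "K" suffix on bs and count multiplies by 1024, and the reported rate is in decimal megabytes (10^6 bytes) per second. The function names dd_total_bytes and dd_mb_per_s are hypothetical helpers, not part of dd or of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Total bytes written for "bs=<bs_k>K count=<count_k>K",
 * with each K suffix taken as a factor of 1024.
 */
static inline uint64_t dd_total_bytes(uint64_t bs_k, uint64_t count_k)
{
	return (bs_k * 1024u) * (count_k * 1024u);
}

/* Throughput in decimal MB/s (10^6 bytes per second) for a given run time. */
static inline double dd_mb_per_s(uint64_t bytes, double seconds)
{
	assert(seconds > 0.0);
	return (double)bytes / seconds / 1e6;
}
```

With bs=8K and count=256K, dd_total_bytes() gives the 2147483648 bytes shown in the log, and dividing by 16.9237 s gives roughly 126.9, which dd rounds to 127 MB/s. The 12K / 171K runs come out at 2151677952 bytes, matching the slightly larger total in those log entries.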
* Re: [RFC PATCH] drivers: ata: ahci_sunxi: Increased SATA/AHCI DMA TX/RX FIFOs
From: Maxime Ripard @ 2019-05-12 12:12 UTC
To: Uenal Mutlu
Cc: Jens Axboe, Chen-Yu Tsai, linux-ide, linux-arm-kernel, linux-kernel, linux-sunxi, u-boot, linux-amarula, Jagan Teki, Pablo Greco, Mark Rutland, Oliver Schinagl, Linus Walleij, Hans de Goede, FUKAUMI Naoki, Andre Przywara

Hi,

On Fri, May 10, 2019 at 09:25:50PM +0200, Uenal Mutlu wrote:
> Increasing the SATA/AHCI DMA TX/RX FIFOs (P0DMACR.TXTS and .RXTS) from
> default 0x0 each to 0x3 each raises write performance to about 120MB/s,
> up from a lame 36MB/s to 45MB/s previously. Read performance is about
> 200MB/s [tested on SSD using dd bs=4K count=512K].
>
> Tested on the Banana Pi R1 (aka Lamobo R1) and Banana Pi M1 SBCs
> with Allwinner A20 32bit-SoCs (ARMv7-a / arm-linux-gnueabihf).
> These devices are RaspberryPi-like small devices.
>
> RFC: Since more than about 25 similar SBC/SoC models use the
> ahci_sunxi driver, users are encouraged to test it on all the
> affected boards and give feedback.
>
> List of the affected sunxi and other boards and SoCs with SATA using
> the ahci_sunxi driver:
>   $ grep -i -e "^&ahci" arch/arm/boot/dts/sun*dts
>   and http://linux-sunxi.org/Category:Devices_with_SATA_port
>
> Signed-off-by: Uenal Mutlu <um@mutluit.com>
> ---
>  drivers/ata/ahci_sunxi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/ata/ahci_sunxi.c b/drivers/ata/ahci_sunxi.c
> index 911710643305..257986431c79 100644
> --- a/drivers/ata/ahci_sunxi.c
> +++ b/drivers/ata/ahci_sunxi.c
> @@ -158,7 +158,7 @@ static void ahci_sunxi_start_engine(struct ata_port *ap)
>  	struct ahci_host_priv *hpriv = ap->host->private_data;
>
>  	/* Setup DMA before DMA start */
> -	sunxi_clrsetbits(hpriv->mmio + AHCI_P0DMACR, 0x0000ff00, 0x00004400);
> +	sunxi_clrsetbits(hpriv->mmio + AHCI_P0DMACR, 0x0000ffff, 0x00004433);

Having comments / defines here would be great, once fixed:
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>

Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
* Re: [RFC PATCH] drivers: ata: ahci_sunxi: Increased SATA/AHCI DMA TX/RX FIFOs
From: U.Mutlu @ 2019-05-12 16:08 UTC
To: Maxime Ripard
Cc: Jens Axboe, Chen-Yu Tsai, linux-ide, linux-arm-kernel, linux-kernel, linux-sunxi, u-boot, linux-amarula, Jagan Teki, Pablo Greco, Mark Rutland, Oliver Schinagl, Linus Walleij, Hans de Goede, FUKAUMI Naoki, Andre Przywara

Hi Maxime & Others,

what follows is a somewhat lengthy technical story behind this patch;
you can just skip it and jump to the end.


As can be seen in ahci_sunxi.c, the port used in this patch
is this one (32bit):
  #define AHCI_P0DMACR 0x0170
It's a so-called "Vendor Specific Port" according to the SATA/AHCI specs by Intel.
The data behind it is actually a struct, consisting of 4 fields,
each 4 bits long, plus a 16-bit field that is marked as Reserved
in secondary literature (see below):

struct AHCI_P0DMACR_t
{
  unsigned TXTS  : 4,
           RXTS  : 4,
           TXABL : 4,
           RXABL : 4,
           Res1  : 16;
};

This struct is just my own creation for my own tests; it's not part of the
driver source. The patch touches only the first 2 fields: TXTS and RXTS.

See this documentation of a similar product by Texas Instruments for the above struct:
  https://www.ti.com/lit/ug/sprugj8c/sprugj8c.pdf
  TMS320C674x/OMAP-L1x Processor, Serial ATA (SATA) Controller, User's Guide,
  Literature Number: SPRUGJ8C, March 2011,
  Page 68, Chapter 4.33 "Port DMA Control Register (P0DMACR)"

The above TI document describes the two fields as follows:

TXTS:
  Transmit Transaction Size (TX_TRANSACTION_SIZE). This field defines the
  DMA transaction size in DWORDs for transmit (system bus read, device
  write) operation. [...]

RXTS:
  Receive Transaction Size (RX_TRANSACTION_SIZE). This field defines the
  Port DMA transaction size in DWORDs for receive (system bus write,
  device read) operation. [...]
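One way the raw constant 0x00004433 could be spelled out with named fields is sketched below. The macro and function names are hypothetical (the driver uses only the raw constants), and the bit positions assume the layout of the struct above with TXTS in the lowest nibble, which follows the TI document for a similar controller and not any Allwinner documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field positions, taken from the struct layout above. */
#define P0DMACR_TXTS_SHIFT	0	/* bits  3:0  transmit transaction size */
#define P0DMACR_RXTS_SHIFT	4	/* bits  7:4  receive transaction size  */
#define P0DMACR_TXABL_SHIFT	8	/* bits 11:8                            */
#define P0DMACR_RXABL_SHIFT	12	/* bits 15:12                           */
#define P0DMACR_FIELD_MASK	0xfu	/* every field is 4 bits wide           */

/* Combine the four 4-bit fields into the low 16 bits of the register value. */
static inline uint32_t p0dmacr_pack(uint32_t txts, uint32_t rxts,
				    uint32_t txabl, uint32_t rxabl)
{
	/* Each field must fit into its 4 bits. */
	assert(txts <= P0DMACR_FIELD_MASK && rxts <= P0DMACR_FIELD_MASK);
	assert(txabl <= P0DMACR_FIELD_MASK && rxabl <= P0DMACR_FIELD_MASK);

	return (txts  << P0DMACR_TXTS_SHIFT)  |
	       (rxts  << P0DMACR_RXTS_SHIFT)  |
	       (txabl << P0DMACR_TXABL_SHIFT) |
	       (rxabl << P0DMACR_RXABL_SHIFT);
}

/* Extract one 4-bit field from a register value. */
static inline uint32_t p0dmacr_field(uint32_t reg, unsigned int shift)
{
	return (reg >> shift) & P0DMACR_FIELD_MASK;
}
```

Under these assumptions, p0dmacr_pack(3, 3, 4, 4) reproduces the value 0x00004433 written by the patch, and p0dmacr_pack(0, 0, 4, 4) reproduces the old value 0x00004400, which makes it visible that only the TXTS and RXTS fields change.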
So, in my patch the fields TXTS and RXTS are set to 3 each.
Without the patch, these fields seem to have some random content
(I've seen 5 and 6, 4 and 4, and 0 and 0 on different devices),
as the previous code doesn't touch these 2 fields (i.e. these two fields
are not within the old mask of 0xff00; cf. ahci_sunxi.c, function
ahci_sunxi_start_engine(...)).


Some background on my hunt for product documentation:

I couldn't find any product documentation for the SATA/AHCI
in these SoCs by Allwinner Technology (allwinnertech.com),
unlike with comparable products from other companies.

I asked Allwinner, but they just said that the A20 of my SBC
is (allegedly) no longer current and that support for it
has ended (but this can hardly be true, as the A20 SoC is still
being marketed on their website).
They instead sent me a bunch of really irrelevant PDFs which have
nothing to do with SATA/AHCI.

So, the company Allwinner Technology unfortunately was not cooperative
in providing information on the SATA/AHCI implementation in their SoCs :-(
Even the ports used in the current ahci_sunxi.c in the Linux tree are undocumented;
the code is even commented with "/* This magic is from the original code */"
and below that comment many ports are used for which no documentation is available,
or at least I couldn't find any on the Internet. The initial programmer,
in 2014 and earlier, was Daniel Wang (danielwang@allwinnertech.com),
but email to that address bounces.

So, I was forced to research secondary literature from other vendors
like Texas Instruments (thanks TI !) and Intel, and also to study
very old source code in the old Linux repositories (as it differs
much from the current version) going back to the year 2014, and had
to do many (blind) experiments until I found this solution.

The User's Guide by Texas Instruments given above (and their similar
documents for newer products) helped me a lot in finding the solution.
It's of course not really the correct documentation for the Allwinner SoCs,
but still better than nothing.

If I only had the right documentation, then I could surely try
to further improve on the result already achieved by this patch,
as up to 300 MiB/s is possible with SATA-II.


Yes, I'll resend the patch with some appropriate comments.

Uenal Mutlu


Maxime Ripard wrote on 05/12/2019 02:12 PM:
> Hi,
>
> On Fri, May 10, 2019 at 09:25:50PM +0200, Uenal Mutlu wrote:
>> Increasing the SATA/AHCI DMA TX/RX FIFOs (P0DMACR.TXTS and .RXTS) from
>> default 0x0 each to 0x3 each raises write performance to about 120MB/s,
>> up from a lame 36MB/s to 45MB/s previously. Read performance is about
>> 200MB/s [tested on SSD using dd bs=4K count=512K].
>>
>> Tested on the Banana Pi R1 (aka Lamobo R1) and Banana Pi M1 SBCs
>> with Allwinner A20 32bit-SoCs (ARMv7-a / arm-linux-gnueabihf).
>> These devices are RaspberryPi-like small devices.
>>
>> RFC: Since more than about 25 similar SBC/SoC models use the
>> ahci_sunxi driver, users are encouraged to test it on all the
>> affected boards and give feedback.
>>
>> List of the affected sunxi and other boards and SoCs with SATA using
>> the ahci_sunxi driver:
>>   $ grep -i -e "^&ahci" arch/arm/boot/dts/sun*dts
>>   and http://linux-sunxi.org/Category:Devices_with_SATA_port
>>
>> Signed-off-by: Uenal Mutlu <um@mutluit.com>
>> ---
>>  drivers/ata/ahci_sunxi.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/ata/ahci_sunxi.c b/drivers/ata/ahci_sunxi.c
>> index 911710643305..257986431c79 100644
>> --- a/drivers/ata/ahci_sunxi.c
>> +++ b/drivers/ata/ahci_sunxi.c
>> @@ -158,7 +158,7 @@ static void ahci_sunxi_start_engine(struct ata_port *ap)
>>  	struct ahci_host_priv *hpriv = ap->host->private_data;
>>
>>  	/* Setup DMA before DMA start */
>> -	sunxi_clrsetbits(hpriv->mmio + AHCI_P0DMACR, 0x0000ff00, 0x00004400);
>> +	sunxi_clrsetbits(hpriv->mmio + AHCI_P0DMACR, 0x0000ffff, 0x00004433);
>
> Having comments / defines here would be great, once fixed:
> Acked-by: Maxime Ripard <maxime.ripard@bootlin.com>
>
> Maxime
>
> --
> Maxime Ripard, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com
* Re: [RFC PATCH] drivers: ata: ahci_sunxi: Increased SATA/AHCI DMA TX/RX FIFOs
From: Maxime Ripard @ 2019-05-12 17:40 UTC
To: U.Mutlu
Cc: Jens Axboe, Chen-Yu Tsai, linux-ide, linux-arm-kernel, linux-kernel, linux-sunxi, u-boot, linux-amarula, Jagan Teki, Pablo Greco, Mark Rutland, Oliver Schinagl, Linus Walleij, Hans de Goede, FUKAUMI Naoki, Andre Przywara

Hi,

On Sun, May 12, 2019 at 06:08:19PM +0200, U.Mutlu wrote:
> Hi Maxime & Others,
>
> what follows is a somewhat lengthy technical story behind this patch;
> you can just skip it and jump to the end.
>
>
> As can be seen in ahci_sunxi.c, the port used in this patch
> is this one (32bit):
>   #define AHCI_P0DMACR 0x0170
> It's a so-called "Vendor Specific Port" according to the SATA/AHCI specs by Intel.
> The data behind it is actually a struct, consisting of 4 fields,
> each 4 bits long, plus a 16-bit field that is marked as Reserved
> in secondary literature (see below):
>
> struct AHCI_P0DMACR_t
> {
>   unsigned TXTS  : 4,
>            RXTS  : 4,
>            TXABL : 4,
>            RXABL : 4,
>            Res1  : 16;
> };
>
> This struct is just my own creation for my own tests; it's not part of the
> driver source. The patch touches only the first 2 fields: TXTS and RXTS.
>
> See this documentation of a similar product by Texas Instruments for the above struct:
>   https://www.ti.com/lit/ug/sprugj8c/sprugj8c.pdf
>   TMS320C674x/OMAP-L1x Processor, Serial ATA (SATA) Controller, User's Guide,
>   Literature Number: SPRUGJ8C, March 2011,
>   Page 68, Chapter 4.33 "Port DMA Control Register (P0DMACR)"
>
> The above TI document describes the two fields as follows:
>
> TXTS:
>   Transmit Transaction Size (TX_TRANSACTION_SIZE). This field defines the
>   DMA transaction size in DWORDs for transmit (system bus read, device
>   write) operation. [...]
>
> RXTS:
>   Receive Transaction Size (RX_TRANSACTION_SIZE). This field defines the
>   Port DMA transaction size in DWORDs for receive (system bus write,
>   device read) operation. [...]
>
>
> So, in my patch the fields TXTS and RXTS are set to 3 each.
> Without the patch, these fields seem to have some random content
> (I've seen 5 and 6, 4 and 4, and 0 and 0 on different devices),
> as the previous code doesn't touch these 2 fields (i.e. these two fields
> are not within the old mask of 0xff00; cf. ahci_sunxi.c, function
> ahci_sunxi_start_engine(...)).
>
>
> Some background on my hunt for product documentation:
>
> I couldn't find any product documentation for the SATA/AHCI
> in these SoCs by Allwinner Technology (allwinnertech.com),
> unlike with comparable products from other companies.
>
> I asked Allwinner, but they just said that the A20 of my SBC
> is (allegedly) no longer current and that support for it
> has ended (but this can hardly be true, as the A20 SoC is still
> being marketed on their website).
> They instead sent me a bunch of really irrelevant PDFs which have
> nothing to do with SATA/AHCI.
>
> So, the company Allwinner Technology unfortunately was not cooperative
> in providing information on the SATA/AHCI implementation in their SoCs :-(
> Even the ports used in the current ahci_sunxi.c in the Linux tree are undocumented;
> the code is even commented with "/* This magic is from the original code */"
> and below that comment many ports are used for which no documentation is available,
> or at least I couldn't find any on the Internet. The initial programmer,
> in 2014 and earlier, was Daniel Wang (danielwang@allwinnertech.com),
> but email to that address bounces.
>
> So, I was forced to research secondary literature from other vendors
> like Texas Instruments (thanks TI !) and Intel, and also to study
> very old source code in the old Linux repositories (as it differs
> much from the current version) going back to the year 2014, and had
> to do many (blind) experiments until I found this solution.
>
> The User's Guide by Texas Instruments given above (and their similar
> documents for newer products) helped me a lot in finding the solution.
> It's of course not really the correct documentation for the Allwinner SoCs,
> but still better than nothing.
>
> If I only had the right documentation, then I could surely try
> to further improve on the result already achieved by this patch,
> as up to 300 MiB/s is possible with SATA-II.
>
>
> Yes, I'll resend the patch with some appropriate comments.

That's awesome research and explanation, thanks! :)

Maxime

--
Maxime Ripard, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com