LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH 2.6.21-rc4 00/15] md raid5 acceleration and async_tx
@ 2007-03-23  6:51 Dan Williams
  2007-03-23  6:51 ` [PATCH 2.6.21-rc4 01/15] dmaengine: add base support for the async_tx api Dan Williams
                   ` (14 more replies)
  0 siblings, 15 replies; 16+ messages in thread
From: Dan Williams @ 2007-03-23  6:51 UTC (permalink / raw)
  To: neilb, christopher.leech, linux-raid, linux-kernel
  Cc: akpm, torvalds, yur, wd, arjan, rmk+kernel

The following patch set implements the async_tx api and modifies
md-raid5 to issue memory copies and xor calculations asynchronously.
Async_tx is an extension of the existing dmaengine interface in the
kernel.  Async_tx allows kernel code to utilize application specific
acceleration engines when present and fall back to software routines
otherwise.  Further details can be found in the individual patch
headers. 
 
The implementation has been in -mm since 2.6.20-mm1 and has been
released in various forms to IOP users since 2.6.18
(http://sourceforge.net/projects/xscaleiop).  This release includes
cleanups to the iop-adma driver.  The md-raid5 patches are just rebased
versions of the 2.6.20-rc5 release. 
 
Given the release of an acceleration engine driver for another platform
(440spe http://marc.info/?l=linux-raid&m=117400143317440&w=2), I feel
more comfortable broaching the subject of 2.6.22 inclusion.
 
Regards, 
Dan
 
 
Dan Williams (15):
        dmaengine: add base support for the async_tx api
        ARM: Add drivers/dma to arch/arm/Kconfig
        dmaengine: add the async_tx api
        md: add raid5_run_ops and support routines
        md: use raid5_run_ops for stripe cache operations
        md: move write operations to raid5_run_ops
        md: move raid5 compute block operations to raid5_run_ops
        md: move raid5 parity checks to raid5_run_ops
        md: satisfy raid5 read requests via raid5_run_ops
        md: use async_tx and raid5_run_ops for raid5 expansion operations
        md: move raid5 io requests to raid5_run_ops
        md: remove raid5 compute_block and compute_parity5
        dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines
        iop13xx: Surface the iop13xx adma units to the iop-adma driver
        iop3xx: Surface the iop3xx DMA and AAU units to the iop-adma driver
 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2.6.21-rc4 01/15] dmaengine: add base support for the async_tx api
  2007-03-23  6:51 [PATCH 2.6.21-rc4 00/15] md raid5 acceleration and async_tx Dan Williams
@ 2007-03-23  6:51 ` Dan Williams
  2007-03-23  6:51 ` [PATCH 2.6.21-rc4 02/15] ARM: Add drivers/dma to arch/arm/Kconfig Dan Williams
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2007-03-23  6:51 UTC (permalink / raw)
  To: neilb, christopher.leech, linux-raid, linux-kernel
  Cc: akpm, torvalds, yur, wd, arjan, rmk+kernel

The async_tx api provides methods for describing a chain of asynchronous
bulk memory transfers/transforms with support for inter-transactional
dependencies.  It is implemented as a dmaengine client
that smooths over the details of different hardware offload engine
implementations.  Code that is written to the api can optimize for
asynchrnous operation and the api will fit the chain of operations to the
available offload resources.

Currently the raid5 implementation in the MD raid456 driver has been
converted to the async_tx api.  A driver for the offload engines on the
Intel Xscale series of I/O processors, iop-adma, is provided.  With the
iop-adma driver and async_txi, raid456 is able to offload copy, xor, and
xor-zero-sum operations to hardware engines.

On iop342 tiobench showed higher throughput for sequential writes (20 - 30%
improvement) and sequential reads to a degraded array (40 - 55%
improvement).  For the other cases performance was roughly equal, +/- a few
percentage points.  On a x86-smp platform the performance of the async_tx
implementation (in synchronous mode) was also +/- a few percentage points
of the original implementation.  According to 'top' CPU utilization was
positively affected in the offload case, but exact measurements have yet to
be taken.

The tiobench command line used for testing was:
tiobench --size 2048 --block 4096 --block 131072 --dir /mnt/raid --numruns 5
* iop342 had 1GB of memory available

This patch:
1/ introduces struct dma_async_tx_descriptor as a common field for all dmaengine
software descriptors
2/ converts the device_memcpy_* methods into separate prep, set src/dest, and
submit stages
3/ adds support for capabilities beyond memcpy (xor, memset, xor zero sum, completion
interrupts)
4/ converts ioatdma to the new semantics

Changelog:
* drop dma mapping methods, suggested by Chris Leech
* fix ioat_dma_dependency_added, also caught by Andrew Morton
* fix dma_sync_wait, change from Andrew Morton
* uninline large functions, change from Andrew Morton
* add tx->callback = NULL to dmaengine calls to interoperate with async_tx
  calls

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 drivers/dma/dmaengine.c   |  194 ++++++++++++++++++++++++++++++++++-
 drivers/dma/ioatdma.c     |  248 ++++++++++++++++++++-------------------------
 drivers/dma/ioatdma.h     |    8 +
 include/linux/dmaengine.h |  237 ++++++++++++++++++++++++-------------------
 4 files changed, 439 insertions(+), 248 deletions(-)

diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 322ee29..2285f33 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -59,6 +59,7 @@
 
 #include <linux/init.h>
 #include <linux/module.h>
+#include <linux/mm.h>
 #include <linux/device.h>
 #include <linux/dmaengine.h>
 #include <linux/hardirq.h>
@@ -66,6 +67,7 @@
 #include <linux/percpu.h>
 #include <linux/rcupdate.h>
 #include <linux/mutex.h>
+#include <linux/jiffies.h>
 
 static DEFINE_MUTEX(dma_list_mutex);
 static LIST_HEAD(dma_device_list);
@@ -165,6 +167,24 @@ static struct dma_chan *dma_client_chan_alloc(struct dma_client *client)
 	return NULL;
 }
 
+enum dma_status dma_sync_wait(struct dma_chan *chan, dma_cookie_t cookie)
+{
+	enum dma_status status;
+	unsigned long dma_sync_wait_timeout = jiffies + msecs_to_jiffies(5000);
+
+	dma_async_issue_pending(chan);
+	do {
+		status = dma_async_is_tx_complete(chan, cookie, NULL, NULL);
+		if (time_after_eq(jiffies, dma_sync_wait_timeout)) {
+			printk(KERN_ERR "dma_sync_wait_timeout!\n");
+			return DMA_ERROR;
+		}
+	} while (status == DMA_IN_PROGRESS);
+
+	return status;
+}
+EXPORT_SYMBOL(dma_sync_wait);
+
 /**
  * dma_chan_cleanup - release a DMA channel's resources
  * @kref: kernel reference structure that contains the DMA channel device
@@ -211,7 +231,8 @@ static void dma_chans_rebalance(void)
 	mutex_lock(&dma_list_mutex);
 
 	list_for_each_entry(client, &dma_client_list, global_node) {
-		while (client->chans_desired > client->chan_count) {
+		while (client->chans_desired < 0 ||
+			client->chans_desired > client->chan_count) {
 			chan = dma_client_chan_alloc(client);
 			if (!chan)
 				break;
@@ -220,7 +241,8 @@ static void dma_chans_rebalance(void)
 	                                       chan,
 	                                       DMA_RESOURCE_ADDED);
 		}
-		while (client->chans_desired < client->chan_count) {
+		while (client->chans_desired >= 0 &&
+			client->chans_desired < client->chan_count) {
 			spin_lock_irqsave(&client->lock, flags);
 			chan = list_entry(client->channels.next,
 			                  struct dma_chan,
@@ -297,12 +319,12 @@ EXPORT_SYMBOL(dma_async_client_unregister);
  * @number: count of DMA channels requested
  *
  * Clients call dma_async_client_chan_request() to specify how many
- * DMA channels they need, 0 to free all currently allocated.
+ * DMA channels they need, 0 to free all currently allocated. A request
+ * < 0 indicates the client wants to handle all engines in the system.
  * The resulting allocations/frees are indicated to the client via the
  * event callback.
  */
-void dma_async_client_chan_request(struct dma_client *client,
-			unsigned int number)
+void dma_async_client_chan_request(struct dma_client *client, int number)
 {
 	client->chans_desired = number;
 	dma_chans_rebalance();
@@ -322,6 +344,28 @@ int dma_async_device_register(struct dma_device *device)
 	if (!device)
 		return -ENODEV;
 
+	/* validate device routines */
+	BUG_ON(test_bit(DMA_MEMCPY, &device->capabilities) &&
+		!device->device_prep_dma_memcpy);
+	BUG_ON(test_bit(DMA_XOR, &device->capabilities) &&
+		!device->device_prep_dma_xor);
+	BUG_ON(test_bit(DMA_ZERO_SUM, &device->capabilities) &&
+		!device->device_prep_dma_zero_sum);
+	BUG_ON(test_bit(DMA_MEMSET, &device->capabilities) &&
+		!device->device_prep_dma_memset);
+	BUG_ON(test_bit(DMA_ZERO_SUM, &device->capabilities) &&
+		!device->device_prep_dma_interrupt);
+
+	BUG_ON(!device->device_alloc_chan_resources);
+	BUG_ON(!device->device_free_chan_resources);
+	BUG_ON(!device->device_tx_submit);
+	BUG_ON(!device->device_set_dest);
+	BUG_ON(!device->device_set_src);
+	BUG_ON(!device->device_dependency_added);
+	BUG_ON(!device->device_is_tx_complete);
+	BUG_ON(!device->device_issue_pending);
+	BUG_ON(!device->dev);
+
 	init_completion(&device->done);
 	kref_init(&device->refcount);
 	device->dev_id = id++;
@@ -397,6 +441,146 @@ void dma_async_device_unregister(struct dma_device *device)
 }
 EXPORT_SYMBOL(dma_async_device_unregister);
 
+/**
+ * dma_async_memcpy_buf_to_buf - offloaded copy between virtual addresses
+ * @chan: DMA channel to offload copy to
+ * @dest: destination address (virtual)
+ * @src: source address (virtual)
+ * @len: length
+ *
+ * Both @dest and @src must be mappable to a bus address according to the
+ * DMA mapping API rules for streaming mappings.
+ * Both @dest and @src must stay memory resident (kernel memory or locked
+ * user space pages).
+ */
+dma_cookie_t dma_async_memcpy_buf_to_buf(struct dma_chan *chan,
+        void *dest, void *src, size_t len)
+{
+	struct dma_device *dev = chan->device;
+	struct dma_async_tx_descriptor *tx;
+	dma_addr_t addr;
+	dma_cookie_t cookie;
+	int cpu;
+
+	tx = dev->device_prep_dma_memcpy(chan, len, 0);
+	if (!tx)
+		return -ENOMEM;
+
+	tx->ack = 1;
+	tx->callback = NULL;
+	addr = dma_map_single(dev->dev, src, len, DMA_TO_DEVICE);
+	dev->device_set_src(addr, tx, 0);
+	addr = dma_map_single(dev->dev, dest, len, DMA_FROM_DEVICE);
+	dev->device_set_dest(addr, tx, 0);
+	cookie = dev->device_tx_submit(tx);
+
+	cpu = get_cpu();
+	per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
+	per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+	put_cpu();
+
+	return cookie;
+}
+EXPORT_SYMBOL(dma_async_memcpy_buf_to_buf);
+
+/**
+ * dma_async_memcpy_buf_to_pg - offloaded copy from address to page
+ * @chan: DMA channel to offload copy to
+ * @page: destination page
+ * @offset: offset in page to copy to
+ * @kdata: source address (virtual)
+ * @len: length
+ *
+ * Both @page/@offset and @kdata must be mappable to a bus address according
+ * to the DMA mapping API rules for streaming mappings.
+ * Both @page/@offset and @kdata must stay memory resident (kernel memory or
+ * locked user space pages)
+ */
+dma_cookie_t dma_async_memcpy_buf_to_pg(struct dma_chan *chan,
+        struct page *page, unsigned int offset, void *kdata, size_t len)
+{
+	struct dma_device *dev = chan->device;
+	struct dma_async_tx_descriptor *tx;
+	dma_addr_t addr;
+	dma_cookie_t cookie;
+	int cpu;
+
+	tx = dev->device_prep_dma_memcpy(chan, len, 0);
+	if (!tx)
+		return -ENOMEM;
+
+	tx->ack = 1;
+	tx->callback = NULL;
+	addr = dma_map_single(dev->dev, kdata, len, DMA_TO_DEVICE);
+	dev->device_set_src(addr, tx, 0);
+	addr = dma_map_page(dev->dev, page, offset, len, DMA_FROM_DEVICE);
+	dev->device_set_dest(addr, tx, 0);
+	cookie = dev->device_tx_submit(tx);
+
+	cpu = get_cpu();
+	per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
+	per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+	put_cpu();
+
+	return cookie;
+}
+EXPORT_SYMBOL(dma_async_memcpy_buf_to_pg);
+
+/**
+ * dma_async_memcpy_pg_to_pg - offloaded copy from page to page
+ * @chan: DMA channel to offload copy to
+ * @dest_pg: destination page
+ * @dest_off: offset in page to copy to
+ * @src_pg: source page
+ * @src_off: offset in page to copy from
+ * @len: length
+ *
+ * Both @dest_page/@dest_off and @src_page/@src_off must be mappable to a bus
+ * address according to the DMA mapping API rules for streaming mappings.
+ * Both @dest_page/@dest_off and @src_page/@src_off must stay memory resident
+ * (kernel memory or locked user space pages).
+ */
+dma_cookie_t dma_async_memcpy_pg_to_pg(struct dma_chan *chan,
+        struct page *dest_pg, unsigned int dest_off, struct page *src_pg,
+        unsigned int src_off, size_t len)
+{
+	struct dma_device *dev = chan->device;
+	struct dma_async_tx_descriptor *tx;
+	dma_addr_t addr;
+	dma_cookie_t cookie;
+	int cpu;
+
+	tx = dev->device_prep_dma_memcpy(chan, len, 0);
+	if (!tx)
+		return -ENOMEM;
+
+	tx->ack = 1;
+	tx->callback = NULL;
+	addr = dma_map_page(dev->dev, src_pg, src_off, len, DMA_TO_DEVICE);
+	dev->device_set_src(addr, tx, 0);
+	addr = dma_map_page(dev->dev, dest_pg, dest_off, len, DMA_FROM_DEVICE);
+	dev->device_set_dest(addr, tx, 0);
+	cookie = dev->device_tx_submit(tx);
+
+	cpu = get_cpu();
+	per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
+	per_cpu_ptr(chan->local, cpu)->memcpy_count++;
+	put_cpu();
+
+	return cookie;
+}
+EXPORT_SYMBOL(dma_async_memcpy_pg_to_pg);
+
+void dma_async_tx_descriptor_init(struct dma_async_tx_descriptor *tx,
+	struct dma_chan *chan)
+{
+	tx->chan = chan;
+	spin_lock_init(&tx->lock);
+	INIT_LIST_HEAD(&tx->depend_node);
+	INIT_LIST_HEAD(&tx->depend_list);
+}
+EXPORT_SYMBOL(dma_async_tx_descriptor_init);
+
 static int __init dma_bus_init(void)
 {
 	mutex_init(&dma_list_mutex);
diff --git a/drivers/dma/ioatdma.c b/drivers/dma/ioatdma.c
index 8e87261..6d0f32e 100644
--- a/drivers/dma/ioatdma.c
+++ b/drivers/dma/ioatdma.c
@@ -31,6 +31,7 @@
 #include <linux/dmaengine.h>
 #include <linux/delay.h>
 #include <linux/dma-mapping.h>
+#include <linux/async_tx.h>
 #include "ioatdma.h"
 #include "ioatdma_io.h"
 #include "ioatdma_registers.h"
@@ -39,6 +40,7 @@
 #define to_ioat_chan(chan) container_of(chan, struct ioat_dma_chan, common)
 #define to_ioat_device(dev) container_of(dev, struct ioat_device, common)
 #define to_ioat_desc(lh) container_of(lh, struct ioat_desc_sw, node)
+#define tx_to_ioat_desc(tx) container_of(tx, struct ioat_desc_sw, async_tx)
 
 /* internal functions */
 static int __devinit ioat_probe(struct pci_dev *pdev, const struct pci_device_id *ent);
@@ -99,6 +101,8 @@ static struct ioat_desc_sw *ioat_dma_alloc_descriptor(
 	}
 
 	memset(desc, 0, sizeof(*desc));
+	dma_async_tx_descriptor_init(&desc_sw->async_tx, &ioat_chan->common);
+	INIT_LIST_HEAD(&desc_sw->group_list);
 	desc_sw->hw = desc;
 	desc_sw->phys = phys;
 
@@ -215,45 +219,25 @@ static void ioat_dma_free_chan_resources(struct dma_chan *chan)
 	ioatdma_chan_write16(ioat_chan, IOAT_CHANCTRL_OFFSET, chanctrl);
 }
 
-/**
- * do_ioat_dma_memcpy - actual function that initiates a IOAT DMA transaction
- * @ioat_chan: IOAT DMA channel handle
- * @dest: DMA destination address
- * @src: DMA source address
- * @len: transaction length in bytes
- */
-
-static dma_cookie_t do_ioat_dma_memcpy(struct ioat_dma_chan *ioat_chan,
-                                       dma_addr_t dest,
-                                       dma_addr_t src,
-                                       size_t len)
+static struct dma_async_tx_descriptor *
+ioat_dma_prep_memcpy(struct dma_chan *chan, size_t len, int int_en)
 {
-	struct ioat_desc_sw *first;
-	struct ioat_desc_sw *prev;
-	struct ioat_desc_sw *new;
-	dma_cookie_t cookie;
+	struct ioat_dma_chan *ioat_chan = to_ioat_chan(chan);
+	struct ioat_desc_sw *first, *prev, *new;
 	LIST_HEAD(new_chain);
 	u32 copy;
 	size_t orig_len;
-	dma_addr_t orig_src, orig_dst;
-	unsigned int desc_count = 0;
-	unsigned int append = 0;
-
-	if (!ioat_chan || !dest || !src)
-		return -EFAULT;
+	int desc_count = 0;
 
 	if (!len)
-		return ioat_chan->common.cookie;
+		return NULL;
 
 	orig_len = len;
-	orig_src = src;
-	orig_dst = dest;
 
 	first = NULL;
 	prev = NULL;
 
 	spin_lock_bh(&ioat_chan->desc_lock);
-
 	while (len) {
 		if (!list_empty(&ioat_chan->free_desc)) {
 			new = to_ioat_desc(ioat_chan->free_desc.next);
@@ -270,9 +254,8 @@ static dma_cookie_t do_ioat_dma_memcpy(struct ioat_dma_chan *ioat_chan,
 
 		new->hw->size = copy;
 		new->hw->ctl = 0;
-		new->hw->src_addr = src;
-		new->hw->dst_addr = dest;
-		new->cookie = 0;
+		new->async_tx.cookie = 0;
+		new->async_tx.ack = 1;
 
 		/* chain together the physical address list for the HW */
 		if (!first)
@@ -281,129 +264,90 @@ static dma_cookie_t do_ioat_dma_memcpy(struct ioat_dma_chan *ioat_chan,
 			prev->hw->next = (u64) new->phys;
 
 		prev = new;
-
 		len  -= copy;
-		dest += copy;
-		src  += copy;
-
 		list_add_tail(&new->node, &new_chain);
 		desc_count++;
 	}
-	new->hw->ctl = IOAT_DMA_DESCRIPTOR_CTL_CP_STS;
-	new->hw->next = 0;
 
-	/* cookie incr and addition to used_list must be atomic */
+	list_splice(&new_chain, &new->group_list);
 
-	cookie = ioat_chan->common.cookie;
-	cookie++;
-	if (cookie < 0)
-		cookie = 1;
-	ioat_chan->common.cookie = new->cookie = cookie;
+	new->hw->ctl = IOAT_DMA_DESCRIPTOR_CTL_CP_STS;
+	new->hw->next = 0;
+	new->group_count = desc_count;
+	new->async_tx.ack = 0; /* client is in control of this ack */
+	new->async_tx.cookie = -EBUSY;
+	new->async_tx.type = DMA_MEMCPY;
 
-	pci_unmap_addr_set(new, src, orig_src);
-	pci_unmap_addr_set(new, dst, orig_dst);
 	pci_unmap_len_set(new, src_len, orig_len);
 	pci_unmap_len_set(new, dst_len, orig_len);
-
-	/* write address into NextDescriptor field of last desc in chain */
-	to_ioat_desc(ioat_chan->used_desc.prev)->hw->next = first->phys;
-	list_splice_init(&new_chain, ioat_chan->used_desc.prev);
-
-	ioat_chan->pending += desc_count;
-	if (ioat_chan->pending >= 20) {
-		append = 1;
-		ioat_chan->pending = 0;
-	}
-
 	spin_unlock_bh(&ioat_chan->desc_lock);
 
-	if (append)
-		ioatdma_chan_write8(ioat_chan,
-		                    IOAT_CHANCMD_OFFSET,
-		                    IOAT_CHANCMD_APPEND);
-	return cookie;
+	return new ? &new->async_tx : NULL;
 }
 
-/**
- * ioat_dma_memcpy_buf_to_buf - wrapper that takes src & dest bufs
- * @chan: IOAT DMA channel handle
- * @dest: DMA destination address
- * @src: DMA source address
- * @len: transaction length in bytes
- */
-
-static dma_cookie_t ioat_dma_memcpy_buf_to_buf(struct dma_chan *chan,
-                                               void *dest,
-                                               void *src,
-                                               size_t len)
+static void
+ioat_set_src(dma_addr_t addr, struct dma_async_tx_descriptor *tx, int index)
 {
-	dma_addr_t dest_addr;
-	dma_addr_t src_addr;
-	struct ioat_dma_chan *ioat_chan = to_ioat_chan(chan);
+	struct ioat_desc_sw *iter, *desc = tx_to_ioat_desc(tx);
+	struct ioat_dma_chan *ioat_chan = to_ioat_chan(tx->chan);
 
-	dest_addr = pci_map_single(ioat_chan->device->pdev,
-		dest, len, PCI_DMA_FROMDEVICE);
-	src_addr = pci_map_single(ioat_chan->device->pdev,
-		src, len, PCI_DMA_TODEVICE);
+	pci_unmap_addr_set(desc, src, addr);
 
-	return do_ioat_dma_memcpy(ioat_chan, dest_addr, src_addr, len);
-}
+	list_for_each_entry(iter, &desc->group_list, node) {
+		iter->hw->src_addr = addr;
+		addr += ioat_chan->xfercap;
+	}
 
-/**
- * ioat_dma_memcpy_buf_to_pg - wrapper, copying from a buf to a page
- * @chan: IOAT DMA channel handle
- * @page: pointer to the page to copy to
- * @offset: offset into that page
- * @src: DMA source address
- * @len: transaction length in bytes
- */
+}
 
-static dma_cookie_t ioat_dma_memcpy_buf_to_pg(struct dma_chan *chan,
-                                              struct page *page,
-                                              unsigned int offset,
-                                              void *src,
-                                              size_t len)
+static void
+ioat_set_dest(dma_addr_t addr, struct dma_async_tx_descriptor *tx, int index)
 {
-	dma_addr_t dest_addr;
-	dma_addr_t src_addr;
-	struct ioat_dma_chan *ioat_chan = to_ioat_chan(chan);
+	struct ioat_desc_sw *iter, *desc = tx_to_ioat_desc(tx);
+	struct ioat_dma_chan *ioat_chan = to_ioat_chan(tx->chan);
 
-	dest_addr = pci_map_page(ioat_chan->device->pdev,
-		page, offset, len, PCI_DMA_FROMDEVICE);
-	src_addr = pci_map_single(ioat_chan->device->pdev,
-		src, len, PCI_DMA_TODEVICE);
+	pci_unmap_addr_set(desc, dst, addr);
 
-	return do_ioat_dma_memcpy(ioat_chan, dest_addr, src_addr, len);
+	list_for_each_entry(iter, &desc->group_list, node) {
+		iter->hw->dst_addr = addr;
+		addr += ioat_chan->xfercap;
+	}
 }
 
-/**
- * ioat_dma_memcpy_pg_to_pg - wrapper, copying between two pages
- * @chan: IOAT DMA channel handle
- * @dest_pg: pointer to the page to copy to
- * @dest_off: offset into that page
- * @src_pg: pointer to the page to copy from
- * @src_off: offset into that page
- * @len: transaction length in bytes. This is guaranteed not to make a copy
- *	 across a page boundary.
- */
-
-static dma_cookie_t ioat_dma_memcpy_pg_to_pg(struct dma_chan *chan,
-                                             struct page *dest_pg,
-                                             unsigned int dest_off,
-                                             struct page *src_pg,
-                                             unsigned int src_off,
-                                             size_t len)
+static dma_cookie_t
+ioat_tx_submit(struct dma_async_tx_descriptor *tx)
 {
-	dma_addr_t dest_addr;
-	dma_addr_t src_addr;
-	struct ioat_dma_chan *ioat_chan = to_ioat_chan(chan);
+	struct ioat_dma_chan *ioat_chan = to_ioat_chan(tx->chan);
+	struct ioat_desc_sw *desc = tx_to_ioat_desc(tx);
+	struct ioat_desc_sw *group_start = list_entry(desc->group_list.next,
+		struct ioat_desc_sw, node);
+	int append = 0;
+	dma_cookie_t cookie;
+
+	spin_lock_bh(&ioat_chan->desc_lock);
+	/* cookie incr and addition to used_list must be atomic */
+	cookie = ioat_chan->common.cookie;
+	cookie++;
+	if (cookie < 0)
+		cookie = 1;
+	ioat_chan->common.cookie = desc->async_tx.cookie = cookie;
 
-	dest_addr = pci_map_page(ioat_chan->device->pdev,
-		dest_pg, dest_off, len, PCI_DMA_FROMDEVICE);
-	src_addr = pci_map_page(ioat_chan->device->pdev,
-		src_pg, src_off, len, PCI_DMA_TODEVICE);
+	/* write address into NextDescriptor field of last desc in chain */
+	to_ioat_desc(ioat_chan->used_desc.prev)->hw->next = group_start->phys;
+	list_splice_init(&desc->group_list, ioat_chan->used_desc.prev);
+
+	ioat_chan->pending += desc->group_count;
+	if (ioat_chan->pending >= 20) {
+		append = 1;
+		ioat_chan->pending = 0;
+	}
+	spin_unlock_bh(&ioat_chan->desc_lock);
 
-	return do_ioat_dma_memcpy(ioat_chan, dest_addr, src_addr, len);
+	if (append)
+		ioatdma_chan_write8(ioat_chan,
+		                    IOAT_CHANCMD_OFFSET,
+		                    IOAT_CHANCMD_APPEND);
+	return cookie;
 }
 
 /**
@@ -467,8 +411,8 @@ static void ioat_dma_memcpy_cleanup(struct ioat_dma_chan *chan)
 		 * exceeding xfercap, perhaps. If so, only the last one will
 		 * have a cookie, and require unmapping.
 		 */
-		if (desc->cookie) {
-			cookie = desc->cookie;
+		if (desc->async_tx.cookie) {
+			cookie = desc->async_tx.cookie;
 
 			/* yes we are unmapping both _page and _single alloc'd
 			   regions with unmap_page. Is this *really* that bad?
@@ -484,13 +428,18 @@ static void ioat_dma_memcpy_cleanup(struct ioat_dma_chan *chan)
 		}
 
 		if (desc->phys != phys_complete) {
-			/* a completed entry, but not the last, so cleanup */
-			list_del(&desc->node);
-			list_add_tail(&desc->node, &chan->free_desc);
+			/* a completed entry, but not the last, so cleanup
+			 * if the client is done with the descriptor
+			 */
+			if (desc->async_tx.ack) {
+				list_del(&desc->node);
+				list_add_tail(&desc->node, &chan->free_desc);
+			} else
+				desc->async_tx.cookie = 0;
 		} else {
 			/* last used desc. Do not remove, so we can append from
 			   it, but don't look at it next time, either */
-			desc->cookie = 0;
+			desc->async_tx.cookie = 0;
 
 			/* TODO check status bits? */
 			break;
@@ -506,6 +455,17 @@ static void ioat_dma_memcpy_cleanup(struct ioat_dma_chan *chan)
 	spin_unlock(&chan->cleanup_lock);
 }
 
+static void ioat_dma_dependency_added(struct dma_chan *chan)
+{
+	struct ioat_dma_chan *ioat_chan = to_ioat_chan(chan);
+	spin_lock_bh(&ioat_chan->desc_lock);
+	if (ioat_chan->pending == 0) {
+		spin_unlock_bh(&ioat_chan->desc_lock);
+		ioat_dma_memcpy_cleanup(ioat_chan);
+	} else
+		spin_unlock_bh(&ioat_chan->desc_lock);
+}
+
 /**
  * ioat_dma_is_complete - poll the status of a IOAT DMA transaction
  * @chan: IOAT DMA channel handle
@@ -607,6 +567,7 @@ static void ioat_start_null_desc(struct ioat_dma_chan *ioat_chan)
 
 	desc->hw->ctl = IOAT_DMA_DESCRIPTOR_NUL;
 	desc->hw->next = 0;
+	desc->async_tx.ack = 1;
 
 	list_add_tail(&desc->node, &ioat_chan->used_desc);
 	spin_unlock_bh(&ioat_chan->desc_lock);
@@ -633,6 +594,8 @@ static int ioat_self_test(struct ioat_device *device)
 	u8 *src;
 	u8 *dest;
 	struct dma_chan *dma_chan;
+	struct dma_async_tx_descriptor *tx;
+	dma_addr_t addr;
 	dma_cookie_t cookie;
 	int err = 0;
 
@@ -658,7 +621,13 @@ static int ioat_self_test(struct ioat_device *device)
 		goto out;
 	}
 
-	cookie = ioat_dma_memcpy_buf_to_buf(dma_chan, dest, src, IOAT_TEST_SIZE);
+	tx = ioat_dma_prep_memcpy(dma_chan, IOAT_TEST_SIZE, 0);
+	async_tx_ack(tx);
+	addr = dma_map_single(dma_chan->device->dev, src, IOAT_TEST_SIZE, DMA_TO_DEVICE);
+	ioat_set_src(addr, tx, 0);
+	addr = dma_map_single(dma_chan->device->dev, dest, IOAT_TEST_SIZE, DMA_FROM_DEVICE);
+	ioat_set_dest(addr, tx, 0);
+	cookie = ioat_tx_submit(tx);
 	ioat_dma_memcpy_issue_pending(dma_chan);
 	msleep(1);
 
@@ -754,13 +723,16 @@ static int __devinit ioat_probe(struct pci_dev *pdev,
 	INIT_LIST_HEAD(&device->common.channels);
 	enumerate_dma_channels(device);
 
+	set_bit(DMA_MEMCPY, &device->common.capabilities);
 	device->common.device_alloc_chan_resources = ioat_dma_alloc_chan_resources;
 	device->common.device_free_chan_resources = ioat_dma_free_chan_resources;
-	device->common.device_memcpy_buf_to_buf = ioat_dma_memcpy_buf_to_buf;
-	device->common.device_memcpy_buf_to_pg = ioat_dma_memcpy_buf_to_pg;
-	device->common.device_memcpy_pg_to_pg = ioat_dma_memcpy_pg_to_pg;
-	device->common.device_memcpy_complete = ioat_dma_is_complete;
-	device->common.device_memcpy_issue_pending = ioat_dma_memcpy_issue_pending;
+	device->common.device_prep_dma_memcpy = ioat_dma_prep_memcpy;
+	device->common.device_set_src = ioat_set_src;
+	device->common.device_set_dest = ioat_set_dest;
+	device->common.device_is_tx_complete = ioat_dma_is_complete;
+	device->common.device_issue_pending = ioat_dma_memcpy_issue_pending;
+	device->common.device_dependency_added = ioat_dma_dependency_added;
+	device->common.dev = &pdev->dev;
 	printk(KERN_INFO "Intel(R) I/OAT DMA Engine found, %d channels\n",
 		device->common.chancnt);
 
diff --git a/drivers/dma/ioatdma.h b/drivers/dma/ioatdma.h
index 62b26a9..fed259a 100644
--- a/drivers/dma/ioatdma.h
+++ b/drivers/dma/ioatdma.h
@@ -105,15 +105,20 @@ struct ioat_dma_chan {
 /**
  * struct ioat_desc_sw - wrapper around hardware descriptor
  * @hw: hardware DMA descriptor
+ * @async_tx:
  * @node:
+ * @group_list:
+ * @group_cnt:
  * @cookie:
  * @phys:
  */
 
 struct ioat_desc_sw {
 	struct ioat_dma_descriptor *hw;
+	struct dma_async_tx_descriptor async_tx;
 	struct list_head node;
-	dma_cookie_t cookie;
+	struct list_head group_list;
+	int group_count;
 	dma_addr_t phys;
 	DECLARE_PCI_UNMAP_ADDR(src)
 	DECLARE_PCI_UNMAP_LEN(src_len)
@@ -122,4 +127,3 @@ struct ioat_desc_sw {
 };
 
 #endif /* IOATDMA_H */
-
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h
index c94d8f1..a35040d 100644
--- a/include/linux/dmaengine.h
+++ b/include/linux/dmaengine.h
@@ -21,13 +21,12 @@
 #ifndef DMAENGINE_H
 #define DMAENGINE_H
 
-#ifdef CONFIG_DMA_ENGINE
-
 #include <linux/device.h>
 #include <linux/uio.h>
 #include <linux/kref.h>
 #include <linux/completion.h>
 #include <linux/rcupdate.h>
+#include <linux/dma-mapping.h>
 
 /**
  * enum dma_event - resource PNP/power managment events
@@ -65,6 +64,47 @@ enum dma_status {
 };
 
 /**
+ * enum dma_transaction_type - DMA transaction types/indexes
+ */
+enum dma_transaction_type {
+	DMA_MEMCPY,
+	DMA_XOR,
+	DMA_PQ_XOR,
+	DMA_DUAL_XOR,
+	DMA_PQ_UPDATE,
+	DMA_ZERO_SUM,
+	DMA_PQ_ZERO_SUM,
+	DMA_MEMSET,
+	DMA_MEMCPY_CRC32C,
+	DMA_INTERRUPT,
+};
+
+#define DMA_TX_TYPE_START (DMA_MEMCPY)
+#define DMA_TX_TYPE_END (DMA_INTERRUPT)
+#define dma_async_for_each_tx_type(index) for (\
+	 index = (unsigned long) DMA_TX_TYPE_START;\
+	 index <= (unsigned long) DMA_TX_TYPE_END;\
+	 index++)
+
+#define DMA_CHAN_REQUEST_ALL (-1)
+
+/**
+ * enum dma_capability - DMA transfer/transform capabilities
+ */
+enum dma_capability {
+	DMA_CAP_MEMCPY	      = (1 << DMA_MEMCPY),
+	DMA_CAP_XOR	      = (1 << DMA_XOR),
+	DMA_CAP_PQ_XOR	      = (1 << DMA_PQ_XOR),
+	DMA_CAP_DUAL_XOR      = (1 << DMA_DUAL_XOR),
+	DMA_CAP_PQ_UPDATE     = (1 << DMA_PQ_UPDATE),
+	DMA_CAP_ZERO_SUM      = (1 << DMA_ZERO_SUM),
+	DMA_CAP_PQ_ZERO_SUM   = (1 << DMA_PQ_ZERO_SUM),
+	DMA_CAP_MEMSET	      = (1 << DMA_MEMSET),
+	DMA_CAP_MEMCPY_CRC32C = (1 << DMA_MEMCPY_CRC32C),
+	DMA_CAP_INTERRUPT     = (1 << DMA_INTERRUPT),
+};
+
+/**
  * struct dma_chan_percpu - the per-CPU part of struct dma_chan
  * @refcount: local_t used for open-coded "bigref" counting
  * @memcpy_count: transaction counter
@@ -149,154 +189,142 @@ typedef void (*dma_event_callback) (struct dma_client *client,
  */
 struct dma_client {
 	dma_event_callback	event_callback;
-	unsigned int		chan_count;
-	unsigned int		chans_desired;
+	int			chan_count;
+	int			chans_desired;
 
 	spinlock_t		lock;
 	struct list_head	channels;
 	struct list_head	global_node;
 };
 
+typedef void (*dma_async_tx_callback)(void *dma_async_param);
+/**
+ * struct dma_async_tx_descriptor - async transaction descriptor
+ * @cookie: tracking cookie for this transaction, set to -EBUSY if
+ *	this tx is sitting on a dependency list
+ * @ack: the descriptor can not be reused until the client acknowledges
+ *	receipt, i.e. has has a chance to establish any dependency chains
+ * @type: allows backend implementations to key off the tx_type
+ * @callback: routine to call after this operation is complete
+ * @callback_param: general parameter to pass to the callback routine
+ * @chan: target channel for this operation
+ * @depend_list: at completion this list of transactions are submitted
+ * @depend_node: allow this transaction to be executed after another
+ *	transaction has completed
+ * @parent: pointer to the next level up in the dependency chain
+ * @lock: protect the dependency list
+ */
+struct dma_async_tx_descriptor {
+	dma_cookie_t cookie;
+	int ack;
+	enum dma_transaction_type type;
+	dma_async_tx_callback callback;
+	void *callback_param;
+	struct dma_chan *chan;
+	struct list_head depend_list;
+	struct list_head depend_node;
+	struct dma_async_tx_descriptor *parent;
+	spinlock_t lock;
+};
+
 /**
  * struct dma_device - info on the entity supplying DMA services
  * @chancnt: how many DMA channels are supported
  * @channels: the list of struct dma_chan
  * @global_node: list_head for global dma_device_list
+ * @capabilities: one or more dma_capability flags
+ * @max_xor: maximum number of xor sources, 0 if no capability
  * @refcount: reference count
  * @done: IO completion struct
  * @dev_id: unique device ID
+ * @dev: struct device reference for dma mapping api
  * @device_alloc_chan_resources: allocate resources and return the
  *	number of allocated descriptors
  * @device_free_chan_resources: release DMA channel's resources
- * @device_memcpy_buf_to_buf: memcpy buf pointer to buf pointer
- * @device_memcpy_buf_to_pg: memcpy buf pointer to struct page
- * @device_memcpy_pg_to_pg: memcpy struct page/offset to struct page/offset
- * @device_memcpy_complete: poll the status of an IOAT DMA transaction
- * @device_memcpy_issue_pending: push appended descriptors to hardware
+ * @device_prep_dma_memcpy: prepares a memcpy operation
+ * @device_prep_dma_xor: prepares a xor operation
+ * @device_prep_dma_zero_sum: prepares a zero_sum operation
+ * @device_prep_dma_memset: prepares a memset operation
+ * @device_prep_dma_interrupt: prepares an end of chain interrupt operation
+ * @device_tx_submit: execute an operation
+ * @device_set_dest: set a destination address in a hardware descriptor
+ * @device_set_src: set a source address in a hardware descriptor
+ * @device_dependency_added: async_tx notifies the channel about new deps
+ * @device_issue_pending: push pending transactions to hardware
  */
 struct dma_device {
 
 	unsigned int chancnt;
 	struct list_head channels;
 	struct list_head global_node;
+	unsigned long capabilities;
+	unsigned int max_xor;
 
 	struct kref refcount;
 	struct completion done;
 
 	int dev_id;
+	struct device *dev;
 
 	int (*device_alloc_chan_resources)(struct dma_chan *chan);
 	void (*device_free_chan_resources)(struct dma_chan *chan);
-	dma_cookie_t (*device_memcpy_buf_to_buf)(struct dma_chan *chan,
-			void *dest, void *src, size_t len);
-	dma_cookie_t (*device_memcpy_buf_to_pg)(struct dma_chan *chan,
-			struct page *page, unsigned int offset, void *kdata,
-			size_t len);
-	dma_cookie_t (*device_memcpy_pg_to_pg)(struct dma_chan *chan,
-			struct page *dest_pg, unsigned int dest_off,
-			struct page *src_pg, unsigned int src_off, size_t len);
-	enum dma_status (*device_memcpy_complete)(struct dma_chan *chan,
+
+	struct dma_async_tx_descriptor *(*device_prep_dma_memcpy)(
+		struct dma_chan *chan, size_t len, int int_en);
+	struct dma_async_tx_descriptor *(*device_prep_dma_xor)(
+		struct dma_chan *chan, unsigned int src_cnt, size_t len,
+		int int_en);
+	struct dma_async_tx_descriptor *(*device_prep_dma_zero_sum)(
+		struct dma_chan *chan, unsigned int src_cnt, size_t len,
+		u32 *result, int int_en);
+	struct dma_async_tx_descriptor *(*device_prep_dma_memset)(
+		struct dma_chan *chan, int value, size_t len, int int_en);
+	struct dma_async_tx_descriptor *(*device_prep_dma_interrupt)(
+		struct dma_chan *chan);
+
+	dma_cookie_t (*device_tx_submit)(struct dma_async_tx_descriptor *tx);
+	void (*device_set_dest)(dma_addr_t addr,
+		struct dma_async_tx_descriptor *tx, int index);
+	void (*device_set_src)(dma_addr_t addr,
+		struct dma_async_tx_descriptor *tx, int index);
+	void (*device_dependency_added)(struct dma_chan *chan);
+	enum dma_status (*device_is_tx_complete)(struct dma_chan *chan,
 			dma_cookie_t cookie, dma_cookie_t *last,
 			dma_cookie_t *used);
-	void (*device_memcpy_issue_pending)(struct dma_chan *chan);
+	void (*device_issue_pending)(struct dma_chan *chan);
 };
 
 /* --- public DMA engine API --- */
 
 struct dma_client *dma_async_client_register(dma_event_callback event_callback);
 void dma_async_client_unregister(struct dma_client *client);
-void dma_async_client_chan_request(struct dma_client *client,
-		unsigned int number);
-
-/**
- * dma_async_memcpy_buf_to_buf - offloaded copy between virtual addresses
- * @chan: DMA channel to offload copy to
- * @dest: destination address (virtual)
- * @src: source address (virtual)
- * @len: length
- *
- * Both @dest and @src must be mappable to a bus address according to the
- * DMA mapping API rules for streaming mappings.
- * Both @dest and @src must stay memory resident (kernel memory or locked
- * user space pages).
- */
-static inline dma_cookie_t dma_async_memcpy_buf_to_buf(struct dma_chan *chan,
-	void *dest, void *src, size_t len)
-{
-	int cpu = get_cpu();
-	per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
-	per_cpu_ptr(chan->local, cpu)->memcpy_count++;
-	put_cpu();
-
-	return chan->device->device_memcpy_buf_to_buf(chan, dest, src, len);
-}
-
-/**
- * dma_async_memcpy_buf_to_pg - offloaded copy from address to page
- * @chan: DMA channel to offload copy to
- * @page: destination page
- * @offset: offset in page to copy to
- * @kdata: source address (virtual)
- * @len: length
- *
- * Both @page/@offset and @kdata must be mappable to a bus address according
- * to the DMA mapping API rules for streaming mappings.
- * Both @page/@offset and @kdata must stay memory resident (kernel memory or
- * locked user space pages)
- */
-static inline dma_cookie_t dma_async_memcpy_buf_to_pg(struct dma_chan *chan,
-	struct page *page, unsigned int offset, void *kdata, size_t len)
-{
-	int cpu = get_cpu();
-	per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
-	per_cpu_ptr(chan->local, cpu)->memcpy_count++;
-	put_cpu();
-
-	return chan->device->device_memcpy_buf_to_pg(chan, page, offset,
-	                                             kdata, len);
-}
-
-/**
- * dma_async_memcpy_pg_to_pg - offloaded copy from page to page
- * @chan: DMA channel to offload copy to
- * @dest_pg: destination page
- * @dest_off: offset in page to copy to
- * @src_pg: source page
- * @src_off: offset in page to copy from
- * @len: length
- *
- * Both @dest_page/@dest_off and @src_page/@src_off must be mappable to a bus
- * address according to the DMA mapping API rules for streaming mappings.
- * Both @dest_page/@dest_off and @src_page/@src_off must stay memory resident
- * (kernel memory or locked user space pages).
- */
-static inline dma_cookie_t dma_async_memcpy_pg_to_pg(struct dma_chan *chan,
-	struct page *dest_pg, unsigned int dest_off, struct page *src_pg,
-	unsigned int src_off, size_t len)
-{
-	int cpu = get_cpu();
-	per_cpu_ptr(chan->local, cpu)->bytes_transferred += len;
-	per_cpu_ptr(chan->local, cpu)->memcpy_count++;
-	put_cpu();
-
-	return chan->device->device_memcpy_pg_to_pg(chan, dest_pg, dest_off,
-	                                            src_pg, src_off, len);
-}
+void dma_async_client_chan_request(struct dma_client *client, int number);
+dma_cookie_t dma_async_memcpy_buf_to_buf(struct dma_chan *chan,
+        void *dest, void *src, size_t len);
+dma_cookie_t dma_async_memcpy_buf_to_pg(struct dma_chan *chan,
+        struct page *page, unsigned int offset, void *kdata, size_t len);
+dma_cookie_t dma_async_memcpy_pg_to_pg(struct dma_chan *chan,
+        struct page *dest_pg, unsigned int dest_off, struct page *src_pg,
+        unsigned int src_off, size_t len);
+void dma_async_tx_descriptor_init(struct dma_async_tx_descriptor *tx,
+	struct dma_chan *chan);
 
 /**
- * dma_async_memcpy_issue_pending - flush pending copies to HW
+ * dma_async_issue_pending - flush pending transactions to HW
  * @chan: target DMA channel
  *
  * This allows drivers to push copies to HW in batches,
  * reducing MMIO writes where possible.
  */
-static inline void dma_async_memcpy_issue_pending(struct dma_chan *chan)
+static inline void dma_async_issue_pending(struct dma_chan *chan)
 {
-	return chan->device->device_memcpy_issue_pending(chan);
+	return chan->device->device_issue_pending(chan);
 }
 
+#define dma_async_memcpy_issue_pending(chan) dma_async_issue_pending(chan)
+
 /**
- * dma_async_memcpy_complete - poll for transaction completion
+ * dma_async_is_tx_complete - poll for transaction completion
  * @chan: DMA channel
  * @cookie: transaction identifier to check status of
  * @last: returns last completed cookie, can be NULL
@@ -306,12 +334,15 @@ static inline void dma_async_memcpy_issue_pending(struct dma_chan *chan)
  * internal state and can be used with dma_async_is_complete() to check
  * the status of multiple cookies without re-checking hardware state.
  */
-static inline enum dma_status dma_async_memcpy_complete(struct dma_chan *chan,
+static inline enum dma_status dma_async_is_tx_complete(struct dma_chan *chan,
 	dma_cookie_t cookie, dma_cookie_t *last, dma_cookie_t *used)
 {
-	return chan->device->device_memcpy_complete(chan, cookie, last, used);
+	return chan->device->device_is_tx_complete(chan, cookie, last, used);
 }
 
+#define dma_async_memcpy_complete(chan, cookie, last, used)\
+	dma_async_is_tx_complete(chan, cookie, last, used)
+
 /**
  * dma_async_is_complete - test a cookie against chan state
  * @cookie: transaction identifier to test status of
@@ -334,6 +365,7 @@ static inline enum dma_status dma_async_is_complete(dma_cookie_t cookie,
 	return DMA_IN_PROGRESS;
 }
 
+enum dma_status dma_sync_wait(struct dma_chan *chan, dma_cookie_t cookie);
 
 /* --- DMA device --- */
 
@@ -362,5 +394,4 @@ dma_cookie_t dma_memcpy_pg_to_iovec(struct dma_chan *chan, struct iovec *iov,
 	struct dma_pinned_list *pinned_list, struct page *page,
 	unsigned int offset, size_t len);
 
-#endif /* CONFIG_DMA_ENGINE */
 #endif /* DMAENGINE_H */

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2.6.21-rc4 02/15] ARM: Add drivers/dma to arch/arm/Kconfig
  2007-03-23  6:51 [PATCH 2.6.21-rc4 00/15] md raid5 acceleration and async_tx Dan Williams
  2007-03-23  6:51 ` [PATCH 2.6.21-rc4 01/15] dmaengine: add base support for the async_tx api Dan Williams
@ 2007-03-23  6:51 ` Dan Williams
  2007-03-23  6:51 ` [PATCH 2.6.21-rc4 03/15] dmaengine: add the async_tx api Dan Williams
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2007-03-23  6:51 UTC (permalink / raw)
  To: neilb, christopher.leech, linux-raid, linux-kernel
  Cc: akpm, torvalds, yur, wd, arjan, rmk+kernel

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 arch/arm/Kconfig |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e7baca2..74077e3 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -997,6 +997,8 @@ source "drivers/mmc/Kconfig"
 
 source "drivers/rtc/Kconfig"
 
+source "drivers/dma/Kconfig"
+
 endmenu
 
 source "fs/Kconfig"

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2.6.21-rc4 03/15] dmaengine: add the async_tx api
  2007-03-23  6:51 [PATCH 2.6.21-rc4 00/15] md raid5 acceleration and async_tx Dan Williams
  2007-03-23  6:51 ` [PATCH 2.6.21-rc4 01/15] dmaengine: add base support for the async_tx api Dan Williams
  2007-03-23  6:51 ` [PATCH 2.6.21-rc4 02/15] ARM: Add drivers/dma to arch/arm/Kconfig Dan Williams
@ 2007-03-23  6:51 ` Dan Williams
  2007-03-23  6:51 ` [PATCH 2.6.21-rc4 04/15] md: add raid5_run_ops and support routines Dan Williams
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2007-03-23  6:51 UTC (permalink / raw)
  To: neilb, christopher.leech, linux-raid, linux-kernel
  Cc: akpm, torvalds, yur, wd, arjan, rmk+kernel

async_tx is an api to describe a series of bulk memory
transfers/transforms.  When possible these transactions are carried out by
asynchrounous dma engines.  The api handles inter-transaction dependencies
and hides dma channel management from the client.  When a dma engine is not
present the transaction is carried out via synchronous software routines.

Xor operations are handled by async_tx, to this end xor.c is moved into
drivers/dma and is changed to take an explicit destination address and
a series of sources to match the hardware engine implementation.

When CONFIG_DMA_ENGINE is not set the asynchrounous path is compiled away.

Changelog:
* fixed a leftover debug print
* don't allow callbacks in async_interrupt_cond
* fixed xor_block changes
* fixed usage of ASYNC_TX_XOR_DROP_DEST
* drop dma mapping methods, suggested by Chris Leech
* printk warning fixups from Andrew Morton
* don't use inline in C files, Adrian Bunk
* select the API when MD is enabled

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 drivers/Makefile         |    1 
 drivers/dma/Kconfig      |   15 +
 drivers/dma/Makefile     |    1 
 drivers/dma/async_tx.c   |  905 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/dma/xor.c        |  153 ++++++++
 drivers/md/Kconfig       |    3 
 drivers/md/Makefile      |    6 
 drivers/md/raid5.c       |   52 +--
 drivers/md/xor.c         |  154 --------
 include/linux/async_tx.h |  180 +++++++++
 include/linux/raid/xor.h |    5 
 11 files changed, 1286 insertions(+), 189 deletions(-)

diff --git a/drivers/Makefile b/drivers/Makefile
index 3a718f5..2e8de9e 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -62,6 +62,7 @@ obj-$(CONFIG_I2C)		+= i2c/
 obj-$(CONFIG_W1)		+= w1/
 obj-$(CONFIG_HWMON)		+= hwmon/
 obj-$(CONFIG_PHONE)		+= telephony/
+obj-$(CONFIG_ASYNC_TX_DMA)	+= dma/
 obj-$(CONFIG_MD)		+= md/
 obj-$(CONFIG_BT)		+= bluetooth/
 obj-$(CONFIG_ISDN)		+= isdn/
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 30d021d..292ddad 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -7,8 +7,8 @@ menu "DMA Engine support"
 config DMA_ENGINE
 	bool "Support for DMA engines"
 	---help---
-	  DMA engines offload copy operations from the CPU to dedicated
-	  hardware, allowing the copies to happen asynchronously.
+          DMA engines offload bulk memory operations from the CPU to dedicated
+          hardware, allowing the operations to happen asynchronously.
 
 comment "DMA Clients"
 
@@ -22,6 +22,16 @@ config NET_DMA
 	  Since this is the main user of the DMA engine, it should be enabled;
 	  say Y here.
 
+config ASYNC_TX_DMA
+	tristate "Asynchronous Bulk Memory Transfers/Transforms API"
+	---help---
+	  This enables the async_tx management layer for dma engines.
+	  Subsystems coded to this API will use offload engines for bulk
+	  memory operations where present.  Software implementations are
+	  called when a dma engine is not present or fails to allocate
+	  memory to carry out the transaction.
+	  Current subsystems ported to async_tx: MD_RAID4,5
+
 comment "DMA Devices"
 
 config INTEL_IOATDMA
@@ -30,5 +40,4 @@ config INTEL_IOATDMA
 	default m
 	---help---
 	  Enable support for the Intel(R) I/OAT DMA engine.
-
 endmenu
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index bdcfdbd..6a99341 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_DMA_ENGINE) += dmaengine.o
 obj-$(CONFIG_NET_DMA) += iovlock.o
 obj-$(CONFIG_INTEL_IOATDMA) += ioatdma.o
+obj-$(CONFIG_ASYNC_TX_DMA) += async_tx.o xor.o
diff --git a/drivers/dma/async_tx.c b/drivers/dma/async_tx.c
new file mode 100644
index 0000000..8f7e701
--- /dev/null
+++ b/drivers/dma/async_tx.c
@@ -0,0 +1,905 @@
+/*
+ * Copyright(c) 2006 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+#include <linux/kernel.h>
+#include <linux/interrupt.h>
+#include <linux/dma-mapping.h>
+#include <linux/raid/xor.h>
+#include <linux/async_tx.h>
+
+#define ASYNC_TX_DEBUG 0
+#define PRINTK(x...) ((void)(ASYNC_TX_DEBUG && printk(x)))
+
+#ifdef CONFIG_DMA_ENGINE
+static struct dma_client *async_api_client;
+static struct async_channel_entry async_channel_htable[] = {
+	[DMA_MEMCPY] = { .list =
+		LIST_HEAD_INIT(async_channel_htable[DMA_MEMCPY].list), },
+	[DMA_XOR] = { .list =
+		LIST_HEAD_INIT(async_channel_htable[DMA_XOR].list), },
+	[DMA_PQ_XOR] = { .list =
+		LIST_HEAD_INIT(async_channel_htable[DMA_PQ_XOR].list), },
+	[DMA_DUAL_XOR] = { .list =
+		LIST_HEAD_INIT(async_channel_htable[DMA_DUAL_XOR].list), },
+	[DMA_PQ_UPDATE] = { .list =
+		LIST_HEAD_INIT(async_channel_htable[DMA_PQ_UPDATE].list), },
+	[DMA_ZERO_SUM] = { .list =
+		LIST_HEAD_INIT(async_channel_htable[DMA_ZERO_SUM].list), },
+	[DMA_PQ_ZERO_SUM] = { .list =
+		LIST_HEAD_INIT(async_channel_htable[DMA_PQ_ZERO_SUM].list), },
+	[DMA_MEMSET] = { .list =
+		LIST_HEAD_INIT(async_channel_htable[DMA_MEMSET].list), },
+	[DMA_MEMCPY_CRC32C] = { .list =
+		LIST_HEAD_INIT(async_channel_htable[DMA_MEMCPY_CRC32C].list), },
+	[DMA_INTERRUPT] = { .list =
+		LIST_HEAD_INIT(async_channel_htable[DMA_INTERRUPT].list), },
+};
+
+struct async_channel_entry async_tx_master_list = {
+	.list = LIST_HEAD_INIT(async_tx_master_list.list),
+};
+EXPORT_SYMBOL_GPL(async_tx_master_list);
+
+static void
+free_dma_chan_ref(struct rcu_head *rcu)
+{
+	struct dma_chan_ref *ref;
+	ref = container_of(rcu, struct dma_chan_ref, rcu);
+	dma_chan_put(ref->chan);
+	kfree(ref);
+}
+
+static void
+init_dma_chan_ref(struct dma_chan_ref *ref, struct dma_chan *chan)
+{
+	INIT_LIST_HEAD(&ref->async_node);
+	INIT_RCU_HEAD(&ref->rcu);
+	ref->chan = chan;
+}
+
+static void
+dma_channel_add_remove(struct dma_client *client,
+	struct dma_chan *chan, enum dma_event event)
+{
+	unsigned long i, flags;
+	struct dma_chan_ref *master_ref, *ref;
+	struct async_channel_entry *channel_entry;
+
+	switch (event) {
+	case DMA_RESOURCE_ADDED:
+		PRINTK("async_tx: dma resource added (capabilities: %#lx)\n",
+			chan->device->capabilities);
+		/* add the channel to the generic management list */
+		master_ref = kmalloc(sizeof(*master_ref), GFP_KERNEL);
+		if (master_ref) {
+			/* keep a reference until async_tx is unloaded */
+			dma_chan_get(chan);
+			init_dma_chan_ref(master_ref, chan);
+			spin_lock_irqsave(&async_tx_master_list.lock, flags);
+			list_add_tail_rcu(&master_ref->async_node,
+				&async_tx_master_list.list);
+			spin_unlock_irqrestore(&async_tx_master_list.lock,
+				flags);
+		} else {
+			printk(KERN_WARNING "async_tx: unable to create"
+				" new master entry in response to"
+				" a DMA_RESOURCE_ADDED event"
+				" (-ENOMEM)\n");
+			return;
+		}
+
+		/* add an entry for each capability of this channel */
+		dma_async_for_each_tx_type(i) {
+			if (test_bit(i, &chan->device->capabilities))
+				ref = kmalloc(sizeof(*ref), GFP_KERNEL);
+			else
+				continue;
+
+			if (ref) {
+				channel_entry = &async_channel_htable[i];
+				init_dma_chan_ref(ref, chan);
+				spin_lock_irqsave(&channel_entry->lock, flags);
+				atomic_inc(&channel_entry->version);
+				list_add_tail_rcu(&ref->async_node,
+					&channel_entry->list);
+				spin_unlock_irqrestore(&channel_entry->lock,
+					flags);
+			} else {
+				printk(KERN_WARNING "async_tx: unable to create"
+					" new op-specific entry in response to"
+					" a DMA_RESOURCE_ADDED event"
+					" (-ENOMEM)\n");
+				return;
+			}
+		}
+		break;
+	case DMA_RESOURCE_REMOVED:
+		PRINTK("async_tx: dma resource removed (capabilities: %#lx)\n",
+			chan->device->capabilities);
+		dma_async_for_each_tx_type(i) {
+			if (!test_bit(i, &chan->device->capabilities))
+				continue;
+
+			channel_entry = &async_channel_htable[i];
+
+			spin_lock_irqsave(&channel_entry->lock, flags);
+			list_for_each_entry_rcu(ref, &channel_entry->list,
+				async_node) {
+				if (ref->chan == chan) {
+					atomic_inc(&channel_entry->version);
+					list_del_rcu(&ref->async_node);
+					call_rcu(&ref->rcu, free_dma_chan_ref);
+					break;
+				}
+			}
+			spin_unlock_irqrestore(&channel_entry->lock, flags);
+		}
+		break;
+	case DMA_RESOURCE_SUSPEND:
+	case DMA_RESOURCE_RESUME:
+		printk(KERN_WARNING "async_tx: does not support dma channel"
+			" suspend/resume\n");
+		break;
+	default:
+		BUG();
+	}
+}
+
+static int __init
+async_tx_init(void)
+{
+	unsigned long i;
+	struct async_channel_entry *channel_entry;
+	int cpu;
+
+	spin_lock_init(&async_tx_master_list.lock);
+
+	dma_async_for_each_tx_type(i) {
+		channel_entry = &async_channel_htable[i];
+		spin_lock_init(&channel_entry->lock);
+		channel_entry->local_iter = alloc_percpu(struct async_iter_percpu);
+		if (!channel_entry->local_iter) {
+			i++;
+			goto err;
+		}
+
+		atomic_set(&channel_entry->version, 0);
+
+		for_each_possible_cpu(cpu) {
+			struct async_iter_percpu *local_iter =
+				channel_entry->local_iter;
+			per_cpu_ptr(local_iter, cpu)->iter = &channel_entry->list;
+			per_cpu_ptr(local_iter, cpu)->local_version = 0;
+		}
+	}
+
+	async_api_client = dma_async_client_register(dma_channel_add_remove);
+
+	if (!async_api_client)
+		goto err;
+
+	dma_async_client_chan_request(async_api_client, DMA_CHAN_REQUEST_ALL);
+
+	printk("async_tx: api initialized (async)\n");
+
+	return 0;
+err:
+	printk("async_tx: initialization failure\n");
+
+	while (--i >= 0)
+		free_percpu(async_channel_htable[i].local_iter);
+
+	return 1;
+}
+
+static void __exit async_tx_exit(void)
+{
+	unsigned long i, flags;
+	struct async_channel_entry *channel_entry;
+	struct dma_chan_ref *ref;
+
+	if (async_api_client)
+		dma_async_client_unregister(async_api_client);
+
+	dma_async_for_each_tx_type(i) {
+		channel_entry = &async_channel_htable[i];
+		if (channel_entry->local_iter)
+			free_percpu(channel_entry->local_iter);
+
+		/* free all the per operation channel references */
+		spin_lock_irqsave(&channel_entry->lock, flags);
+		list_for_each_entry_rcu(ref, &channel_entry->list, async_node) {
+			list_del_rcu(&ref->async_node);
+			call_rcu(&ref->rcu, free_dma_chan_ref);
+		}
+		spin_unlock_irqrestore(&channel_entry->lock, flags);
+	}
+
+	/* free all the channels on the master list */
+	spin_lock_irqsave(&async_tx_master_list.lock, flags);
+	list_for_each_entry_rcu(ref, &async_tx_master_list.list, async_node) {
+		dma_chan_put(ref->chan); /* permit backing devices to go away */
+		list_del_rcu(&ref->async_node);
+		call_rcu(&ref->rcu, free_dma_chan_ref);
+	}
+	spin_unlock_irqrestore(&async_tx_master_list.lock, flags);
+}
+
+/**
+ * async_tx_find_channel - find a channel to carry out the operation or let
+ *	the transaction execute synchronously
+ * @depend_tx: transaction dependency
+ * @tx_type: transaction type
+ */
+static struct dma_chan *
+async_tx_find_channel(struct dma_async_tx_descriptor *depend_tx,
+	enum dma_transaction_type tx_type)
+{
+	/* see if we can keep the chain on one channel */
+	if (depend_tx &&
+		test_bit(tx_type, &depend_tx->chan->device->capabilities))
+		return depend_tx->chan;
+	else {
+		int cpu;
+		struct async_channel_entry *channel_entry =
+			&async_channel_htable[tx_type];
+		struct async_iter_percpu *local_iter;
+		struct list_head *iter;
+		struct dma_chan *chan;
+
+		rcu_read_lock();
+		if (list_empty(&channel_entry->list)) {
+			rcu_read_unlock();
+			return NULL;
+		}
+
+		cpu = get_cpu();
+		local_iter = per_cpu_ptr(channel_entry->local_iter, cpu);
+		put_cpu();
+
+		/* ensure the percpu place holder is pointing to a
+		 * valid list entry and get the next channel in the
+		 * round robin
+		 */
+		if (unlikely(local_iter->local_version !=
+			atomic_read(&channel_entry->version))) {
+			local_iter->local_version =
+				atomic_read(&channel_entry->version);
+			iter = channel_entry->list.next;
+		} else {
+			iter = local_iter->iter->next;
+			/* wrap around detect */
+			if (iter == &channel_entry->list)
+				iter = iter->next;
+		}
+
+		/* if we are still pointing to the head then the list
+		 * recently became empty
+		 */
+		if (iter == &channel_entry->list)
+			chan = NULL;
+		else {
+			local_iter->iter = iter;
+			chan = list_entry(iter, struct dma_chan_ref, async_node)->chan;
+		}
+		rcu_read_unlock();
+
+		return chan;
+	}
+}
+#else
+static int __init async_tx_init(void)
+{
+	printk("async_tx: api initialized (sync-only)\n");
+	return 0;
+}
+
+static void __exit async_tx_exit(void)
+{
+	do { } while (0);
+}
+
+static struct dma_chan *
+async_tx_find_channel(struct dma_async_tx_descriptor *depend_tx,
+	enum dma_transaction_type tx_type)
+{
+	return NULL;
+}
+#endif
+
+static void
+async_tx_submit(struct dma_chan *chan, struct dma_async_tx_descriptor *tx,
+	enum async_tx_flags flags, struct dma_async_tx_descriptor *depend_tx,
+	dma_async_tx_callback callback, void *callback_param)
+{
+	tx->callback = callback;
+	tx->callback_param = callback_param;
+
+	/* set this new tx to run after depend_tx if:
+	 * 1/ a dependency exists (depend_tx is !NULL)
+	 * 2/ the tx can not be submitted to the current channel
+	 */
+	if (depend_tx && depend_tx->chan != chan) {
+		/* if ack is already set then we cannot be sure
+		 * we are referring to the correct operation
+		 */
+		BUG_ON(depend_tx->ack);
+
+		tx->parent = depend_tx;
+		spin_lock_bh(&depend_tx->lock);
+		list_add_tail(&tx->depend_node, &depend_tx->depend_list);
+		if (depend_tx->cookie == 0) {
+			struct dma_chan *dep_chan = depend_tx->chan;
+			struct dma_device *dep_dev = dep_chan->device;
+			dep_dev->device_dependency_added(dep_chan);
+		}
+		spin_unlock_bh(&depend_tx->lock);
+	} else {
+		tx->parent = NULL;
+		chan->device->device_tx_submit(tx);
+	}
+
+	if (flags & ASYNC_TX_ACK)
+		async_tx_ack(tx);
+
+	if (depend_tx && (flags & ASYNC_TX_DEP_ACK))
+		async_tx_ack(depend_tx);
+}
+
+/**
+ * sync_epilog - actions to take if an operation is run synchronously
+ * @flags: async_tx flags
+ * @depend_tx: transaction depends on depend_tx
+ * @callback: function to call when the transaction completes
+ * @callback_param: parameter to pass to the callback routine
+ */
+static void
+sync_epilog(unsigned long flags, struct dma_async_tx_descriptor *depend_tx,
+	dma_async_tx_callback callback, void *callback_param)
+{
+	if (callback)
+		callback(callback_param);
+
+	if (depend_tx && (flags & ASYNC_TX_DEP_ACK))
+		async_tx_ack(depend_tx);
+}
+
+static void
+do_async_xor(struct dma_async_tx_descriptor *tx, struct dma_device *device,
+	struct dma_chan *chan, struct page *dest, struct page **src_list,
+	unsigned int offset, unsigned int src_cnt, size_t len,
+	enum async_tx_flags flags, struct dma_async_tx_descriptor *depend_tx,
+	dma_async_tx_callback callback, void *callback_param)
+{
+	dma_addr_t dma_addr;
+	enum dma_data_direction dir;
+	int i;
+
+	PRINTK("%s: len: %zu\n", __FUNCTION__, len);
+
+	dir = (flags & ASYNC_TX_ASSUME_COHERENT) ?
+		DMA_NONE : DMA_FROM_DEVICE;
+
+	dma_addr = dma_map_page(device->dev, dest, offset, len, dir);
+     	device->device_set_dest(dma_addr, tx, 0);
+
+	dir = (flags & ASYNC_TX_ASSUME_COHERENT) ?
+		DMA_NONE : DMA_TO_DEVICE;
+
+	for (i = 0; i < src_cnt; i++) {
+		dma_addr = dma_map_page(device->dev, src_list[i],
+			offset, len, dir);
+	     	device->device_set_src(dma_addr, tx, i);
+	}
+
+	async_tx_submit(chan, tx, flags, depend_tx, callback,
+		callback_param);
+}
+
+static void
+do_sync_xor(struct page *dest, struct page **src_list, unsigned int offset,
+	unsigned int src_cnt, size_t len, enum async_tx_flags flags,
+	struct dma_async_tx_descriptor *depend_tx,
+	dma_async_tx_callback callback, void *callback_param)
+{
+	void *_dest;
+	int i;
+
+	PRINTK("%s: len: %zu\n", __FUNCTION__, len);
+
+	/* reuse the 'src_list' array to convert to buffer pointers */
+	for (i = 0; i < src_cnt; i++)
+		src_list[i] = (struct page *)
+			(page_address(src_list[i]) + offset);
+
+	/* set destination address */
+	_dest = page_address(dest) + offset;
+
+	if (flags & ASYNC_TX_XOR_ZERO_DST)
+		memset(_dest, 0, len);
+
+	xor_block(src_cnt, len, _dest,
+		(void **) src_list);
+
+	sync_epilog(flags, depend_tx, callback, callback_param);
+}
+
+/**
+ * async_xor - attempt to xor a set of blocks with a dma engine.
+ *	xor_block always uses the dest as a source so the ASYNC_TX_XOR_ZERO_DST
+ *	flag must be set to not include dest data in the calculation.  The
+ *	assumption with dma eninges is that they only use the destination
+ *	buffer as a source when it is explicity specified in the source list.
+ * @dest: destination page
+ * @src_list: array of source pages (if the dest is also a source it must be
+ *	at index zero).  The contents of this array may be overwritten.
+ * @offset: offset in pages to start transaction
+ * @src_cnt: number of source pages
+ * @len: length in bytes
+ * @flags: ASYNC_TX_XOR_ZERO_DST, ASYNC_TX_XOR_DROP_DEST,
+ 	ASYNC_TX_ASSUME_COHERENT, ASYNC_TX_ACK, ASYNC_TX_DEP_ACK
+ * @depend_tx: xor depends on the result of this transaction.
+ * @callback: function to call when the xor completes
+ * @callback_param: parameter to pass to the callback routine
+ */
+struct dma_async_tx_descriptor *
+async_xor(struct page *dest, struct page **src_list, unsigned int offset,
+	unsigned int src_cnt, size_t len, enum async_tx_flags flags,
+	struct dma_async_tx_descriptor *depend_tx,
+	dma_async_tx_callback callback, void *callback_param)
+{
+	struct dma_chan *chan = async_tx_find_channel(depend_tx, DMA_XOR);
+	struct dma_device *device = chan ? chan->device : NULL;
+	struct dma_async_tx_descriptor *tx = NULL;
+	dma_async_tx_callback _callback;
+	void *_callback_param;
+	unsigned long local_flags;
+	int xor_src_cnt;
+	int i = 0, src_off = 0, int_en;
+
+	BUG_ON(src_cnt <= 1);
+
+	while (src_cnt) {
+		local_flags = flags;
+		if (device) { /* run the xor asynchronously */
+			xor_src_cnt = min(src_cnt, device->max_xor);
+			/* if we are submitting additional xors
+			 * only set the callback on the last transaction
+			 */
+			if (src_cnt > xor_src_cnt) {
+				local_flags &= ~(ASYNC_TX_ACK | ASYNC_TX_INT_EN);
+				_callback = NULL;
+				_callback_param = NULL;
+			} else {
+				_callback = callback;
+				_callback_param = callback_param;
+			}
+
+			int_en = (local_flags & ASYNC_TX_INT_EN) ? 1 : 0;
+
+			tx = device->device_prep_dma_xor(
+				chan, xor_src_cnt, len, int_en);
+
+			if (tx) {
+				do_async_xor(tx, device, chan, dest,
+				&src_list[src_off], offset, xor_src_cnt, len,
+				local_flags, depend_tx, _callback,
+				_callback_param);
+			} else /* fall through */
+				goto xor_sync;
+		} else { /* run the xor synchronously */
+xor_sync:
+			/* in the sync case the dest is an implied source
+			 * (assumes the dest is at the src_off index)
+			 */
+			if (flags & ASYNC_TX_XOR_DROP_DST) {
+				src_cnt--;
+				src_off++;
+			}
+
+			/* process up to 'MAX_XOR_BLOCKS' sources */
+			xor_src_cnt = min(src_cnt, (unsigned int) MAX_XOR_BLOCKS);
+
+			/* if we are submitting additional xors
+			 * only set the callback on the last transaction
+			 */
+			if (src_cnt > xor_src_cnt) {
+				local_flags &= ~(ASYNC_TX_ACK | ASYNC_TX_INT_EN);
+				_callback = NULL;
+				_callback_param = NULL;
+			} else {
+				_callback = callback;
+				_callback_param = callback_param;
+			}
+
+			/* wait for any prerequisite operations */
+			if (depend_tx) {
+				/* if ack is already set then we cannot be sure
+				 * we are referring to the correct operation
+				 */
+				BUG_ON(depend_tx->ack);
+				if (dma_wait_for_async_tx(depend_tx) == DMA_ERROR)
+					panic("%s: DMA_ERROR waiting for depend_tx\n",
+						__FUNCTION__);
+			}
+
+			do_sync_xor(dest, &src_list[src_off], offset,
+				xor_src_cnt, len, local_flags, depend_tx,
+				_callback, _callback_param);
+		}
+
+		/* the previous tx is hidden from the client,
+		 * so ack it
+		 */
+		if (i && depend_tx)
+			async_tx_ack(depend_tx);
+
+		depend_tx = tx;
+
+		if (src_cnt > xor_src_cnt) {
+			/* drop completed sources */
+			src_cnt -= xor_src_cnt;
+			src_off += xor_src_cnt;
+
+			/* unconditionally preserve the destination */
+			flags &= ~ASYNC_TX_XOR_ZERO_DST;
+
+			/* use the intermediate result a source, but remember
+			 * it's dropped, because it's implied, in the sync case
+			 */
+			src_list[--src_off] = dest;
+			src_cnt++;
+			flags |= ASYNC_TX_XOR_DROP_DST;
+		} else
+			src_cnt = 0;
+		i++;
+	}
+
+	return tx;
+}
+EXPORT_SYMBOL_GPL(async_xor);
+
+static int page_is_zero(struct page *p, size_t len)
+{
+	char *a = page_address(p);
+	return ((*(u32*)a) == 0 &&
+		memcmp(a, a+4, len-4)==0);
+}
+
+/**
+ * async_xor_zero_sum - attempt a xor parity check with a dma engine.
+ * @dest: destination page used if the xor is performed synchronously
+ * @src_list: array of source pages.  The dest page must be listed as a source
+ * 	at index zero.  The contents of this array may be overwritten.
+ * @offset: offset in pages to start transaction
+ * @src_cnt: number of source pages
+ * @len: length in bytes
+ * @result: 0 if sum == 0 else non-zero
+ * @flags: ASYNC_TX_ASSUME_COHERENT, ASYNC_TX_ACK
+ * @depend_tx: xor depends on the result of this transaction.
+ * @callback: function to call when the xor completes
+ * @callback_param: parameter to pass to the callback routine
+ */
+struct dma_async_tx_descriptor *
+async_xor_zero_sum(struct page *dest, struct page **src_list,
+	unsigned int offset, unsigned int src_cnt, size_t len,
+	u32 *result, enum async_tx_flags flags,
+	struct dma_async_tx_descriptor *depend_tx,
+	dma_async_tx_callback callback, void *callback_param)
+{
+	struct dma_chan *chan = async_tx_find_channel(depend_tx, DMA_ZERO_SUM);
+	struct dma_device *device = chan ? chan->device : NULL;
+	int int_en = (flags & ASYNC_TX_INT_EN) ? 1 : 0;
+	struct dma_async_tx_descriptor *tx = device ?
+		device->device_prep_dma_zero_sum(chan, src_cnt, len, result,
+			int_en) : NULL;
+	int i;
+
+	if (tx) {
+		dma_addr_t dma_addr;
+		enum dma_data_direction dir;
+
+		PRINTK("%s: (async) len: %zu\n", __FUNCTION__, len);
+
+		dir = (flags & ASYNC_TX_ASSUME_COHERENT) ?
+			DMA_NONE : DMA_TO_DEVICE;
+
+		for (i = 0; i < src_cnt; i++) {
+			dma_addr = dma_map_page(device->dev, src_list[i],
+				offset, len, dir);
+		     	device->device_set_src(dma_addr, tx, i);
+		}
+
+		async_tx_submit(chan, tx, flags, depend_tx, callback,
+			callback_param);
+	} else {
+		unsigned long xor_flags = flags;
+
+		PRINTK("%s: (sync) len: %zu\n", __FUNCTION__, len);
+
+		xor_flags |= ASYNC_TX_XOR_DROP_DST;
+		xor_flags &= ~ASYNC_TX_ACK;
+
+		tx = async_xor(dest, src_list, offset, src_cnt, len, xor_flags,
+			depend_tx, NULL, NULL);
+
+		if (tx) {
+			if (dma_wait_for_async_tx(tx) == DMA_ERROR)
+				panic("%s: DMA_ERROR waiting for tx\n",
+					__FUNCTION__);
+			async_tx_ack(tx);
+		}
+
+		*result = page_is_zero(dest, len) ? 0 : 1;
+
+		tx = NULL;
+
+		sync_epilog(flags, depend_tx, callback, callback_param);
+	}
+
+	return tx;
+}
+EXPORT_SYMBOL_GPL(async_xor_zero_sum);
+
+/**
+ * async_memcpy - attempt to copy memory with a dma engine.
+ * @dest: destination page
+ * @src: src page
+ * @offset: offset in pages to start transaction
+ * @len: length in bytes
+ * @flags: ASYNC_TX_ASSUME_COHERENT, ASYNC_TX_ACK, ASYNC_TX_DEP_ACK,
+ *	ASYNC_TX_KMAP_SRC, ASYNC_TX_KMAP_DST
+ * @depend_tx: memcpy depends on the result of this transaction
+ * @callback: function to call when the memcpy completes
+ * @callback_param: parameter to pass to the callback routine
+ */
+struct dma_async_tx_descriptor *
+async_memcpy(struct page *dest, struct page *src, unsigned int dest_offset,
+	unsigned int src_offset, size_t len, enum async_tx_flags flags,
+	struct dma_async_tx_descriptor *depend_tx,
+	dma_async_tx_callback callback, void *callback_param)
+{
+	struct dma_chan *chan = async_tx_find_channel(depend_tx, DMA_MEMCPY);
+	struct dma_device *device = chan ? chan->device : NULL;
+	int int_en = (flags & ASYNC_TX_INT_EN) ? 1 : 0;
+	struct dma_async_tx_descriptor *tx = device ?
+		device->device_prep_dma_memcpy(chan, len,
+		int_en) : NULL;
+
+	if (tx) { /* run the memcpy asynchronously */
+		dma_addr_t dma_addr;
+		enum dma_data_direction dir;
+
+		PRINTK("%s: (async) len: %zu\n", __FUNCTION__, len);
+
+		dir = (flags & ASYNC_TX_ASSUME_COHERENT) ?
+			DMA_NONE : DMA_FROM_DEVICE;
+
+		dma_addr = dma_map_page(device->dev, dest, dest_offset, len, dir);
+	     	device->device_set_dest(dma_addr, tx, 0);
+
+		dir = (flags & ASYNC_TX_ASSUME_COHERENT) ?
+			DMA_NONE : DMA_TO_DEVICE;
+
+		dma_addr = dma_map_page(device->dev, src, src_offset, len, dir);
+	     	device->device_set_src(dma_addr, tx, 0);
+
+		async_tx_submit(chan, tx, flags, depend_tx, callback,
+			callback_param);
+	} else { /* run the memcpy synchronously */
+		void *dest_buf, *src_buf;
+		PRINTK("%s: (sync) len: %zu\n", __FUNCTION__, len);
+
+		/* wait for any prerequisite operations */
+		if (depend_tx) {
+			/* if ack is already set then we cannot be sure
+			 * we are referring to the correct operation
+			 */
+			BUG_ON(depend_tx->ack);
+			if (dma_wait_for_async_tx(depend_tx) == DMA_ERROR)
+				panic("%s: DMA_ERROR waiting for depend_tx\n",
+					__FUNCTION__);
+		}
+
+		if (flags & ASYNC_TX_KMAP_DST)
+			dest_buf = kmap_atomic(dest, KM_USER0) + dest_offset;
+		else
+			dest_buf = page_address(dest) + dest_offset;
+
+		if (flags & ASYNC_TX_KMAP_SRC)
+			src_buf = kmap_atomic(src, KM_USER0) + src_offset;
+		else
+			src_buf = page_address(src) + src_offset;
+
+		memcpy(dest_buf, src_buf, len);
+
+		if (flags & ASYNC_TX_KMAP_DST)
+			kunmap_atomic(dest_buf, KM_USER0);
+
+		if (flags & ASYNC_TX_KMAP_SRC)
+			kunmap_atomic(src_buf, KM_USER0);
+
+		sync_epilog(flags, depend_tx, callback, callback_param);
+	}
+
+	return tx;
+}
+EXPORT_SYMBOL_GPL(async_memcpy);
+
+/**
+ * async_memset - attempt to fill memory with a dma engine.
+ * @dest: destination page
+ * @val: fill value
+ * @offset: offset in pages to start transaction
+ * @len: length in bytes
+ * @flags: ASYNC_TX_ASSUME_COHERENT, ASYNC_TX_ACK
+ * @depend_tx: memset depends on the result of this transaction
+ * @callback: function to call when the memcpy completes
+ * @callback_param: parameter to pass to the callback routine
+ */
+struct dma_async_tx_descriptor *
+async_memset(struct page *dest, int val, unsigned int offset,
+	size_t len, enum async_tx_flags flags,
+	struct dma_async_tx_descriptor *depend_tx,
+	dma_async_tx_callback callback, void *callback_param)
+{
+	struct dma_chan *chan = async_tx_find_channel(depend_tx, DMA_MEMSET);
+	struct dma_device *device = chan ? chan->device : NULL;
+	int int_en = (flags & ASYNC_TX_INT_EN) ? 1 : 0;
+	struct dma_async_tx_descriptor *tx = device ?
+		device->device_prep_dma_memset(chan, val, len,
+			int_en) : NULL;
+
+	if (tx) { /* run the memset asynchronously */
+		dma_addr_t dma_addr;
+		enum dma_data_direction dir;
+
+		PRINTK("%s: (async) len: %zu\n", __FUNCTION__, len);
+		dir = (flags & ASYNC_TX_ASSUME_COHERENT) ?
+			DMA_NONE : DMA_FROM_DEVICE;
+
+		dma_addr = dma_map_page(device->dev, dest, offset, len, dir);
+	     	device->device_set_dest(dma_addr, tx, 0);
+
+		async_tx_submit(chan, tx, flags, depend_tx, callback,
+			callback_param);
+	} else { /* run the memset synchronously */
+		void *dest_buf;
+		PRINTK("%s: (sync) len: %zu\n", __FUNCTION__, len);
+
+		dest_buf = (void *) (((char *) page_address(dest)) + offset);
+
+		/* wait for any prerequisite operations */
+		if (depend_tx) {
+			/* if ack is already set then we cannot be sure
+			 * we are referring to the correct operation
+			 */
+			BUG_ON(depend_tx->ack);
+			if (dma_wait_for_async_tx(depend_tx) == DMA_ERROR)
+				panic("%s: DMA_ERROR waiting for depend_tx\n",
+					__FUNCTION__);
+		}
+
+		memset(dest_buf, val, len);
+
+		sync_epilog(flags, depend_tx, callback, callback_param);
+	}
+
+	return tx;
+}
+EXPORT_SYMBOL_GPL(async_memset);
+
+/**
+ * async_interrupt - cause an interrupt to asynchrounously flush pending
+ *	completion callbacks, or schedule new callback.  Note: this rouine
+ *	assumes that all dma channels have the DMA_INTERRUPT capability
+ * @flags: ASYNC_TX_DEP_ACK
+ * @depend_tx: interrupt depends the result of this transaction
+ * @callback: function to call after the interrupt fires
+ * @callback_param: parameter to pass to the callback routine
+ */
+struct dma_async_tx_descriptor *
+async_interrupt(enum async_tx_flags flags,
+	struct dma_async_tx_descriptor *depend_tx,
+	dma_async_tx_callback callback, void *callback_param)
+{
+	struct dma_chan *chan = async_tx_find_channel(depend_tx, DMA_INTERRUPT);
+	struct dma_device *device = chan ? chan->device : NULL;
+	struct dma_async_tx_descriptor *tx = device ?
+		device->device_prep_dma_interrupt(chan) : NULL;
+
+	if (tx) {
+		PRINTK("%s: (async)\n", __FUNCTION__);
+
+		async_tx_submit(chan, tx, flags, depend_tx, callback,
+			callback_param);
+	} else {
+		PRINTK("%s: (sync)\n", __FUNCTION__);
+
+		/* wait for any prerequisite operations */
+		if (depend_tx) {
+			/* if ack is already set then we cannot be sure
+			 * we are referring to the correct operation
+			 */
+			BUG_ON(depend_tx->ack);
+			if (dma_wait_for_async_tx(depend_tx) == DMA_ERROR)
+				panic("%s: DMA_ERROR waiting for depend_tx\n",
+					__FUNCTION__);
+		}
+
+		sync_epilog(flags, depend_tx, callback, callback_param);
+	}
+
+	return tx;
+}
+EXPORT_SYMBOL_GPL(async_interrupt);
+
+/**
+ * async_interrupt_cond - same as async_interrupt except that this will only be
+ *	if next_op must be run on a different channel.  Note: this rouine
+ *	assumes that all dma channels have the DMA_INTERRUPT capability
+ * @next_op: the next operation type to be submitted
+ * @flags: ASYNC_TX_DEP_ACK
+ * @depend_tx: interrupt depends the result of this transaction
+ */
+struct dma_async_tx_descriptor *
+async_interrupt_cond(enum dma_transaction_type next_op,
+	enum async_tx_flags flags, struct dma_async_tx_descriptor *depend_tx)
+{
+	int chan_switch = depend_tx ?
+		!test_bit(next_op, &depend_tx->chan->device->capabilities) : 0;
+	struct dma_chan *chan = chan_switch ? depend_tx->chan : NULL;
+	struct dma_device *device = chan ? chan->device : NULL;
+	struct dma_async_tx_descriptor *tx = device ?
+		device->device_prep_dma_interrupt(chan) : NULL;
+
+	if (!chan_switch)
+		return depend_tx;
+	else if (tx) {
+		PRINTK("%s: (async)\n", __FUNCTION__);
+
+		async_tx_submit(chan, tx, flags, depend_tx, NULL, NULL);
+	} else {
+		PRINTK("%s: (sync)\n", __FUNCTION__);
+
+		/* wait for any prerequisite operations */
+		if (depend_tx) {
+			/* if ack is already set then we cannot be sure
+			 * we are referring to the correct operation
+			 */
+			BUG_ON(depend_tx->ack);
+			if (dma_wait_for_async_tx(depend_tx) == DMA_ERROR)
+				panic("%s: DMA_ERROR waiting for depend_tx\n",
+					__FUNCTION__);
+		}
+
+		sync_epilog(flags, depend_tx, NULL, NULL);
+	}
+
+	return tx;
+}
+EXPORT_SYMBOL_GPL(async_interrupt_cond);
+
+module_init(async_tx_init);
+module_exit(async_tx_exit);
+
+MODULE_AUTHOR("Intel Corporation");
+MODULE_DESCRIPTION("Asynchronous Bulk Memory Transactions API");
+MODULE_LICENSE("GPL");
diff --git a/drivers/dma/xor.c b/drivers/dma/xor.c
new file mode 100644
index 0000000..6eb3416
--- /dev/null
+++ b/drivers/dma/xor.c
@@ -0,0 +1,153 @@
+/*
+ * xor.c : Multiple Devices driver for Linux
+ *
+ * Copyright (C) 1996, 1997, 1998, 1999, 2000,
+ * Ingo Molnar, Matti Aarnio, Jakub Jelinek, Richard Henderson.
+ *
+ * Dispatch optimized RAID-5 checksumming functions.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * You should have received a copy of the GNU General Public License
+ * (for example /usr/src/linux/COPYING); if not, write to the Free
+ * Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#define BH_TRACE 0
+#include <linux/module.h>
+#include <linux/raid/md.h>
+#include <linux/raid/xor.h>
+#include <asm/xor.h>
+
+/* The xor routines to use.  */
+static struct xor_block_template *active_template;
+
+void
+xor_block(unsigned int src_count, unsigned int bytes, void *dest, void **srcs)
+{
+	unsigned long *p1, *p2, *p3, *p4;
+
+	p1 = (unsigned long *) srcs[0];
+	if (src_count == 1) {
+		active_template->do_2(bytes, dest, p1);
+		return;
+	}
+
+	p2 = (unsigned long *) srcs[1];
+	if (src_count == 2) {
+		active_template->do_3(bytes, dest, p1, p2);
+		return;
+	}
+
+	p3 = (unsigned long *) srcs[2];
+	if (src_count == 3) {
+		active_template->do_4(bytes, dest, p1, p2, p3);
+		return;
+	}
+
+	p4 = (unsigned long *) srcs[3];
+	active_template->do_5(bytes, dest, p1, p2, p3, p4);
+}
+
+/* Set of all registered templates.  */
+static struct xor_block_template *template_list;
+
+#define BENCH_SIZE (PAGE_SIZE)
+
+static void
+do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2)
+{
+	int speed;
+	unsigned long now;
+	int i, count, max;
+
+	tmpl->next = template_list;
+	template_list = tmpl;
+
+	/*
+	 * Count the number of XORs done during a whole jiffy, and use
+	 * this to calculate the speed of checksumming.  We use a 2-page
+	 * allocation to have guaranteed color L1-cache layout.
+	 */
+	max = 0;
+	for (i = 0; i < 5; i++) {
+		now = jiffies;
+		count = 0;
+		while (jiffies == now) {
+			mb();
+			tmpl->do_2(BENCH_SIZE, b1, b2);
+			mb();
+			count++;
+			mb();
+		}
+		if (count > max)
+			max = count;
+	}
+
+	speed = max * (HZ * BENCH_SIZE / 1024);
+	tmpl->speed = speed;
+
+	printk("   %-10s: %5d.%03d MB/sec\n", tmpl->name,
+	       speed / 1000, speed % 1000);
+}
+
+static int
+calibrate_xor_block(void)
+{
+	void *b1, *b2;
+	struct xor_block_template *f, *fastest;
+
+	b1 = (void *) __get_free_pages(GFP_KERNEL, 2);
+	if (! b1) {
+		printk("xor: Yikes!  No memory available.\n");
+		return -ENOMEM;
+	}
+	b2 = b1 + 2*PAGE_SIZE + BENCH_SIZE;
+
+	/*
+	 * If this arch/cpu has a short-circuited selection, don't loop through all
+	 * the possible functions, just test the best one
+	 */
+
+	fastest = NULL;
+
+#ifdef XOR_SELECT_TEMPLATE
+		fastest = XOR_SELECT_TEMPLATE(fastest);
+#endif
+
+#define xor_speed(templ)	do_xor_speed((templ), b1, b2)
+
+	if (fastest) {
+		printk(KERN_INFO "xor: automatically using best checksumming function: %s\n",
+			fastest->name);
+		xor_speed(fastest);
+	} else {
+		printk(KERN_INFO "xor: measuring software checksumming speed\n");
+		XOR_TRY_TEMPLATES;
+		fastest = template_list;
+		for (f = fastest; f; f = f->next)
+			if (f->speed > fastest->speed)
+				fastest = f;
+	}
+
+	printk("xor: using function: %s (%d.%03d MB/sec)\n",
+	       fastest->name, fastest->speed / 1000, fastest->speed % 1000);
+
+#undef xor_speed
+
+	free_pages((unsigned long)b1, 2);
+
+	active_template = fastest;
+	return 0;
+}
+
+static __exit void xor_exit(void) { }
+
+EXPORT_SYMBOL(xor_block);
+MODULE_LICENSE("GPL");
+
+module_init(calibrate_xor_block);
+module_exit(xor_exit);
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 4540ade..79a361e 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -8,6 +8,7 @@ menu "Multi-device support (RAID and LVM)"
 
 config MD
 	bool "Multiple devices driver support (RAID and LVM)"
+	select ASYNC_TX_DMA
 	help
 	  Support multiple physical spindles through a single logical device.
 	  Required for RAID and logical volume management.
@@ -108,7 +109,7 @@ config MD_RAID10
 
 config MD_RAID456
 	tristate "RAID-4/RAID-5/RAID-6 mode"
-	depends on BLK_DEV_MD
+	depends on BLK_DEV_MD && ASYNC_TX_DMA
 	---help---
 	  A RAID-5 set of N drives with a capacity of C MB per drive provides
 	  the capacity of C * (N - 1) MB, and protects against a failure
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 34957a6..23c3049 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -17,15 +17,15 @@ raid456-objs	:= raid5.o raid6algos.o raid6recov.o raid6tables.o \
 hostprogs-y	:= mktables
 
 # Note: link order is important.  All raid personalities
-# and xor.o must come before md.o, as they each initialise 
-# themselves, and md.o may use the personalities when it 
+# must come before md.o, as they each initialise
+# themselves, and md.o may use the personalities when it
 # auto-initialised.
 
 obj-$(CONFIG_MD_LINEAR)		+= linear.o
 obj-$(CONFIG_MD_RAID0)		+= raid0.o
 obj-$(CONFIG_MD_RAID1)		+= raid1.o
 obj-$(CONFIG_MD_RAID10)		+= raid10.o
-obj-$(CONFIG_MD_RAID456)	+= raid456.o xor.o
+obj-$(CONFIG_MD_RAID456)	+= raid456.o
 obj-$(CONFIG_MD_MULTIPATH)	+= multipath.o
 obj-$(CONFIG_MD_FAULTY)		+= faulty.o
 obj-$(CONFIG_BLK_DEV_MD)	+= md-mod.o
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 54a1ad5..6386814 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -916,25 +916,25 @@ static void copy_data(int frombio, struct bio *bio,
 	}
 }
 
-#define check_xor() 	do { 						\
-			   if (count == MAX_XOR_BLOCKS) {		\
-				xor_block(count, STRIPE_SIZE, ptr);	\
-				count = 1;				\
-			   }						\
+#define check_xor()	do {						       \
+		     	   if (count == MAX_XOR_BLOCKS) {		       \
+				xor_block(count, STRIPE_SIZE, dest, ptr);      \
+				count = 0;				       \
+			   }						       \
 			} while(0)
 
 
 static void compute_block(struct stripe_head *sh, int dd_idx)
 {
 	int i, count, disks = sh->disks;
-	void *ptr[MAX_XOR_BLOCKS], *p;
+	void *ptr[MAX_XOR_BLOCKS], *dest, *p;
 
 	PRINTK("compute_block, stripe %llu, idx %d\n", 
 		(unsigned long long)sh->sector, dd_idx);
 
-	ptr[0] = page_address(sh->dev[dd_idx].page);
-	memset(ptr[0], 0, STRIPE_SIZE);
-	count = 1;
+	dest = page_address(sh->dev[dd_idx].page);
+	memset(dest, 0, STRIPE_SIZE);
+	count = 0;
 	for (i = disks ; i--; ) {
 		if (i == dd_idx)
 			continue;
@@ -948,8 +948,8 @@ static void compute_block(struct stripe_head *sh, int dd_idx)
 
 		check_xor();
 	}
-	if (count != 1)
-		xor_block(count, STRIPE_SIZE, ptr);
+	if (count)
+		xor_block(count, STRIPE_SIZE, dest, ptr);
 	set_bit(R5_UPTODATE, &sh->dev[dd_idx].flags);
 }
 
@@ -957,14 +957,14 @@ static void compute_parity5(struct stripe_head *sh, int method)
 {
 	raid5_conf_t *conf = sh->raid_conf;
 	int i, pd_idx = sh->pd_idx, disks = sh->disks, count;
-	void *ptr[MAX_XOR_BLOCKS];
+	void *ptr[MAX_XOR_BLOCKS], *dest;
 	struct bio *chosen;
 
 	PRINTK("compute_parity5, stripe %llu, method %d\n",
 		(unsigned long long)sh->sector, method);
 
-	count = 1;
-	ptr[0] = page_address(sh->dev[pd_idx].page);
+	count = 0;
+	dest = page_address(sh->dev[pd_idx].page);
 	switch(method) {
 	case READ_MODIFY_WRITE:
 		BUG_ON(!test_bit(R5_UPTODATE, &sh->dev[pd_idx].flags));
@@ -987,7 +987,7 @@ static void compute_parity5(struct stripe_head *sh, int method)
 		}
 		break;
 	case RECONSTRUCT_WRITE:
-		memset(ptr[0], 0, STRIPE_SIZE);
+		memset(dest, 0, STRIPE_SIZE);
 		for (i= disks; i-- ;)
 			if (i!=pd_idx && sh->dev[i].towrite) {
 				chosen = sh->dev[i].towrite;
@@ -1003,9 +1003,9 @@ static void compute_parity5(struct stripe_head *sh, int method)
 	case CHECK_PARITY:
 		break;
 	}
-	if (count>1) {
-		xor_block(count, STRIPE_SIZE, ptr);
-		count = 1;
+	if (count) {
+		xor_block(count, STRIPE_SIZE, dest, ptr);
+		count = 0;
 	}
 	
 	for (i = disks; i--;)
@@ -1037,8 +1037,8 @@ static void compute_parity5(struct stripe_head *sh, int method)
 				check_xor();
 			}
 	}
-	if (count != 1)
-		xor_block(count, STRIPE_SIZE, ptr);
+	if (count)
+		xor_block(count, STRIPE_SIZE, dest, ptr);
 	
 	if (method != CHECK_PARITY) {
 		set_bit(R5_UPTODATE, &sh->dev[pd_idx].flags);
@@ -1132,7 +1132,7 @@ static void compute_parity6(struct stripe_head *sh, int method)
 static void compute_block_1(struct stripe_head *sh, int dd_idx, int nozero)
 {
 	int i, count, disks = sh->disks;
-	void *ptr[MAX_XOR_BLOCKS], *p;
+	void *ptr[MAX_XOR_BLOCKS], *dest, *p;
 	int pd_idx = sh->pd_idx;
 	int qd_idx = raid6_next_disk(pd_idx, disks);
 
@@ -1143,9 +1143,9 @@ static void compute_block_1(struct stripe_head *sh, int dd_idx, int nozero)
 		/* We're actually computing the Q drive */
 		compute_parity6(sh, UPDATE_PARITY);
 	} else {
-		ptr[0] = page_address(sh->dev[dd_idx].page);
-		if (!nozero) memset(ptr[0], 0, STRIPE_SIZE);
-		count = 1;
+		dest = page_address(sh->dev[dd_idx].page);
+		if (!nozero) memset(dest, 0, STRIPE_SIZE);
+		count = 0;
 		for (i = disks ; i--; ) {
 			if (i == dd_idx || i == qd_idx)
 				continue;
@@ -1159,8 +1159,8 @@ static void compute_block_1(struct stripe_head *sh, int dd_idx, int nozero)
 
 			check_xor();
 		}
-		if (count != 1)
-			xor_block(count, STRIPE_SIZE, ptr);
+		if (count)
+			xor_block(count, STRIPE_SIZE, dest, ptr);
 		if (!nozero) set_bit(R5_UPTODATE, &sh->dev[dd_idx].flags);
 		else clear_bit(R5_UPTODATE, &sh->dev[dd_idx].flags);
 	}
diff --git a/drivers/md/xor.c b/drivers/md/xor.c
deleted file mode 100644
index 324897c..0000000
--- a/drivers/md/xor.c
+++ /dev/null
@@ -1,154 +0,0 @@
-/*
- * xor.c : Multiple Devices driver for Linux
- *
- * Copyright (C) 1996, 1997, 1998, 1999, 2000,
- * Ingo Molnar, Matti Aarnio, Jakub Jelinek, Richard Henderson.
- *
- * Dispatch optimized RAID-5 checksumming functions.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2, or (at your option)
- * any later version.
- *
- * You should have received a copy of the GNU General Public License
- * (for example /usr/src/linux/COPYING); if not, write to the Free
- * Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- */
-
-#define BH_TRACE 0
-#include <linux/module.h>
-#include <linux/raid/md.h>
-#include <linux/raid/xor.h>
-#include <asm/xor.h>
-
-/* The xor routines to use.  */
-static struct xor_block_template *active_template;
-
-void
-xor_block(unsigned int count, unsigned int bytes, void **ptr)
-{
-	unsigned long *p0, *p1, *p2, *p3, *p4;
-
-	p0 = (unsigned long *) ptr[0];
-	p1 = (unsigned long *) ptr[1];
-	if (count == 2) {
-		active_template->do_2(bytes, p0, p1);
-		return;
-	}
-
-	p2 = (unsigned long *) ptr[2];
-	if (count == 3) {
-		active_template->do_3(bytes, p0, p1, p2);
-		return;
-	}
-
-	p3 = (unsigned long *) ptr[3];
-	if (count == 4) {
-		active_template->do_4(bytes, p0, p1, p2, p3);
-		return;
-	}
-
-	p4 = (unsigned long *) ptr[4];
-	active_template->do_5(bytes, p0, p1, p2, p3, p4);
-}
-
-/* Set of all registered templates.  */
-static struct xor_block_template *template_list;
-
-#define BENCH_SIZE (PAGE_SIZE)
-
-static void
-do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2)
-{
-	int speed;
-	unsigned long now;
-	int i, count, max;
-
-	tmpl->next = template_list;
-	template_list = tmpl;
-
-	/*
-	 * Count the number of XORs done during a whole jiffy, and use
-	 * this to calculate the speed of checksumming.  We use a 2-page
-	 * allocation to have guaranteed color L1-cache layout.
-	 */
-	max = 0;
-	for (i = 0; i < 5; i++) {
-		now = jiffies;
-		count = 0;
-		while (jiffies == now) {
-			mb();
-			tmpl->do_2(BENCH_SIZE, b1, b2);
-			mb();
-			count++;
-			mb();
-		}
-		if (count > max)
-			max = count;
-	}
-
-	speed = max * (HZ * BENCH_SIZE / 1024);
-	tmpl->speed = speed;
-
-	printk("   %-10s: %5d.%03d MB/sec\n", tmpl->name,
-	       speed / 1000, speed % 1000);
-}
-
-static int
-calibrate_xor_block(void)
-{
-	void *b1, *b2;
-	struct xor_block_template *f, *fastest;
-
-	b1 = (void *) __get_free_pages(GFP_KERNEL, 2);
-	if (! b1) {
-		printk("raid5: Yikes!  No memory available.\n");
-		return -ENOMEM;
-	}
-	b2 = b1 + 2*PAGE_SIZE + BENCH_SIZE;
-
-	/*
-	 * If this arch/cpu has a short-circuited selection, don't loop through all
-	 * the possible functions, just test the best one
-	 */
-
-	fastest = NULL;
-
-#ifdef XOR_SELECT_TEMPLATE
-		fastest = XOR_SELECT_TEMPLATE(fastest);
-#endif
-
-#define xor_speed(templ)	do_xor_speed((templ), b1, b2)
-
-	if (fastest) {
-		printk(KERN_INFO "raid5: automatically using best checksumming function: %s\n",
-			fastest->name);
-		xor_speed(fastest);
-	} else {
-		printk(KERN_INFO "raid5: measuring checksumming speed\n");
-		XOR_TRY_TEMPLATES;
-		fastest = template_list;
-		for (f = fastest; f; f = f->next)
-			if (f->speed > fastest->speed)
-				fastest = f;
-	}
-
-	printk("raid5: using function: %s (%d.%03d MB/sec)\n",
-	       fastest->name, fastest->speed / 1000, fastest->speed % 1000);
-
-#undef xor_speed
-
-	free_pages((unsigned long)b1, 2);
-
-	active_template = fastest;
-	return 0;
-}
-
-static __exit void xor_exit(void) { }
-
-EXPORT_SYMBOL(xor_block);
-MODULE_LICENSE("GPL");
-
-module_init(calibrate_xor_block);
-module_exit(xor_exit);
diff --git a/include/linux/async_tx.h b/include/linux/async_tx.h
new file mode 100644
index 0000000..302c4f6
--- /dev/null
+++ b/include/linux/async_tx.h
@@ -0,0 +1,180 @@
+/*
+ * Copyright(c) 2006 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+#include <linux/dmaengine.h>
+#include <linux/spinlock.h>
+#include <linux/interrupt.h>
+
+struct dma_chan_ref {
+	struct dma_chan *chan;
+	struct list_head async_node;
+	struct rcu_head rcu;
+};
+
+struct async_iter_percpu {
+	struct list_head *iter;
+	unsigned long local_version;
+};
+
+struct async_channel_entry {
+	struct list_head list;
+	spinlock_t lock;
+	struct async_iter_percpu *local_iter;
+	atomic_t version;
+};
+
+/**
+ * async_tx_flags - modifiers for the async_* calls
+ * @ASYNC_TX_XOR_ZERO_DST: for synchronous xor: zero the destination
+ *	asynchronous assumes a pre-zeroed destination
+ * @ASYNC_TX_XOR_DROP_DST: for synchronous xor: drop source index zero (dest)
+ *	the dest is an implicit source to the synchronous routine
+ * @ASYNC_TX_ASSUME_COHERENT: skip cache maintenance operations
+ * @ASYNC_TX_ACK: immediately ack the descriptor, preclude setting up a
+ *	dependency chain
+ * @ASYNC_TX_DEP_ACK: ack the dependency
+ * @ASYNC_TX_INT_EN: have the dma engine trigger an interrupt on completion
+ * @ASYNC_TX_KMAP_SRC: take an atomic mapping (KM_USER0) on the source page(s)
+ *	if the transaction is to be performed synchronously
+ * @ASYNC_TX_KMAP_DST: take an atomic mapping (KM_USER0) on the dest page(s)
+ *	if the transaction is to be performed synchronously
+ */
+enum async_tx_flags {
+	ASYNC_TX_XOR_ZERO_DST	 = (1 << 0),
+	ASYNC_TX_XOR_DROP_DST	 = (1 << 1),
+	ASYNC_TX_ASSUME_COHERENT = (1 << 2),
+	ASYNC_TX_ACK		 = (1 << 3),
+	ASYNC_TX_DEP_ACK	 = (1 << 4),
+	ASYNC_TX_INT_EN		 = (1 << 5),
+	ASYNC_TX_KMAP_SRC	 = (1 << 6),
+	ASYNC_TX_KMAP_DST	 = (1 << 7),
+};
+
+#ifdef CONFIG_DMA_ENGINE
+static inline enum dma_status
+dma_wait_for_async_tx(struct dma_async_tx_descriptor *tx)
+{
+	enum dma_status status;
+	struct dma_async_tx_descriptor *iter;
+
+	if (!tx)
+		return DMA_SUCCESS;
+
+	/* poll through the dependency chain, return when tx is complete */
+	do {
+		iter = tx;
+		while (iter->cookie == -EBUSY)
+			iter = iter->parent;
+
+		status = dma_sync_wait(iter->chan, iter->cookie);
+	} while (status == DMA_IN_PROGRESS || (iter != tx));
+
+	return status;
+}
+
+extern struct async_channel_entry async_tx_master_list;
+static inline void async_tx_issue_pending_all(void)
+{
+	struct dma_chan_ref *ref;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(ref, &async_tx_master_list.list, async_node)
+		ref->chan->device->device_issue_pending(ref->chan);
+	rcu_read_unlock();
+}
+
+static inline void
+async_tx_run_dependencies(struct dma_async_tx_descriptor *tx,
+	struct dma_chan *host_chan)
+{
+	struct dma_async_tx_descriptor *dep_tx, *_dep_tx;
+	struct dma_device *dev;
+	struct dma_chan *chan;
+
+	list_for_each_entry_safe(dep_tx, _dep_tx, &tx->depend_list,
+		depend_node) {
+		chan = dep_tx->chan;
+		dev = chan->device;
+		/* we can't depend on ourselves */
+		BUG_ON(chan == host_chan);
+		list_del(&dep_tx->depend_node);
+		dev->device_tx_submit(dep_tx);
+
+		/* we need to poke the engine as client code does not
+		 * know about dependency submission events
+		 */
+		dev->device_issue_pending(chan);
+	}
+}
+#else
+static inline void
+async_tx_run_dependencies(struct dma_async_tx_descriptor *tx,
+	struct dma_chan *host_chan)
+{
+	do { } while (0);
+}
+
+static inline enum dma_status
+dma_wait_for_async_tx(struct dma_async_tx_descriptor *tx)
+{
+	return DMA_SUCCESS;
+}
+
+static inline void async_tx_issue_pending_all(void)
+{
+	do { } while (0);
+}
+#endif
+
+static inline void
+async_tx_ack(struct dma_async_tx_descriptor *tx)
+{
+	tx->ack = 1;
+}
+
+struct dma_async_tx_descriptor *
+async_xor(struct page *dest, struct page **src_list, unsigned int offset,
+	unsigned int src_cnt, size_t len, enum async_tx_flags flags,
+	struct dma_async_tx_descriptor *depend_tx,
+	dma_async_tx_callback callback, void *callback_param);
+struct dma_async_tx_descriptor *
+async_xor_zero_sum(struct page *dest, struct page **src_list,
+	unsigned int offset, unsigned int src_cnt, size_t len,
+	u32 *result, enum async_tx_flags flags,
+	struct dma_async_tx_descriptor *depend_tx,
+	dma_async_tx_callback callback, void *callback_param);
+struct dma_async_tx_descriptor *
+async_memcpy(struct page *dest, struct page *src, unsigned int dest_offset,
+	unsigned int src_offset, size_t len, enum async_tx_flags flags,
+	struct dma_async_tx_descriptor *depend_tx,
+	dma_async_tx_callback callback, void *callback_param);
+struct dma_async_tx_descriptor *
+async_memset(struct page *dest, int val, unsigned int offset,
+	size_t len, enum async_tx_flags flags,
+	struct dma_async_tx_descriptor *depend_tx,
+	dma_async_tx_callback callback, void *callback_param);
+struct dma_async_tx_descriptor *
+async_interrupt(enum async_tx_flags flags,
+	struct dma_async_tx_descriptor *depend_tx,
+	dma_async_tx_callback callback, void *callback_param);
+struct dma_async_tx_descriptor *
+async_interrupt_cond(enum dma_transaction_type next_op,
+	enum async_tx_flags flags,
+	struct dma_async_tx_descriptor *depend_tx);
diff --git a/include/linux/raid/xor.h b/include/linux/raid/xor.h
index f0d67cb..d151f16 100644
--- a/include/linux/raid/xor.h
+++ b/include/linux/raid/xor.h
@@ -3,9 +3,10 @@
 
 #include <linux/raid/md.h>
 
-#define MAX_XOR_BLOCKS 5
+#define MAX_XOR_BLOCKS 4
 
-extern void xor_block(unsigned int count, unsigned int bytes, void **ptr);
+extern void xor_block(unsigned int count, unsigned int bytes,
+	void *dest, void **srcs);
 
 struct xor_block_template {
         struct xor_block_template *next;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2.6.21-rc4 04/15] md: add raid5_run_ops and support routines
  2007-03-23  6:51 [PATCH 2.6.21-rc4 00/15] md raid5 acceleration and async_tx Dan Williams
                   ` (2 preceding siblings ...)
  2007-03-23  6:51 ` [PATCH 2.6.21-rc4 03/15] dmaengine: add the async_tx api Dan Williams
@ 2007-03-23  6:51 ` Dan Williams
  2007-03-23  6:52 ` [PATCH 2.6.21-rc4 05/15] md: use raid5_run_ops for stripe cache operations Dan Williams
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2007-03-23  6:51 UTC (permalink / raw)
  To: neilb, christopher.leech, linux-raid, linux-kernel
  Cc: akpm, torvalds, yur, wd, arjan, rmk+kernel

Prepare the raid5 implementation to use async_tx for running stripe
operations:
* biofill (copy data into request buffers to satisfy a read request)
* compute block (generate a missing block in the cache from the other
blocks)
* prexor (subtract existing data as part of the read-modify-write process)
* biodrain (copy data out of request buffers to satisfy a write request)
* postxor (recalculate parity for new data that has entered the cache)
* check (verify that the parity is correct)
* io (submit i/o to the member disks)

Changelog:
* removed ops_complete_biodrain in favor of ops_complete_postxor and
ops_complete_write.
* removed the workqueue
* call bi_end_io for reads in ops_complete_biofill

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 drivers/md/raid5.c         |  520 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/raid/raid5.h |   63 +++++
 2 files changed, 580 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 6386814..b7185a1 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -52,6 +52,7 @@
 #include "raid6.h"
 
 #include <linux/raid/bitmap.h>
+#include <linux/async_tx.h>
 
 /*
  * Stripe cache
@@ -324,6 +325,525 @@ static struct stripe_head *get_active_stripe(raid5_conf_t *conf, sector_t sector
 	return sh;
 }
 
+static int
+raid5_end_read_request(struct bio * bi, unsigned int bytes_done, int error);
+static int
+raid5_end_write_request (struct bio *bi, unsigned int bytes_done, int error);
+
+static void ops_run_io(struct stripe_head *sh)
+{
+	raid5_conf_t *conf = sh->raid_conf;
+	int i, disks = sh->disks;
+
+	might_sleep();
+
+	for (i=disks; i-- ;) {
+		int rw;
+		struct bio *bi;
+		mdk_rdev_t *rdev;
+		if (test_and_clear_bit(R5_Wantwrite, &sh->dev[i].flags))
+			rw = WRITE;
+		else if (test_and_clear_bit(R5_Wantread, &sh->dev[i].flags))
+			rw = READ;
+		else
+			continue;
+
+		bi = &sh->dev[i].req;
+
+		bi->bi_rw = rw;
+		if (rw == WRITE)
+			bi->bi_end_io = raid5_end_write_request;
+		else
+			bi->bi_end_io = raid5_end_read_request;
+
+		rcu_read_lock();
+		rdev = rcu_dereference(conf->disks[i].rdev);
+		if (rdev && test_bit(Faulty, &rdev->flags))
+			rdev = NULL;
+		if (rdev)
+			atomic_inc(&rdev->nr_pending);
+		rcu_read_unlock();
+
+		if (rdev) {
+			if (test_bit(STRIPE_SYNCING, &sh->state) ||
+				test_bit(STRIPE_EXPAND_SOURCE, &sh->state) ||
+				test_bit(STRIPE_EXPAND_READY, &sh->state))
+				md_sync_acct(rdev->bdev, STRIPE_SECTORS);
+
+			bi->bi_bdev = rdev->bdev;
+			PRINTK("%s: for %llu schedule op %ld on disc %d\n",
+				__FUNCTION__, (unsigned long long)sh->sector,
+				bi->bi_rw, i);
+			atomic_inc(&sh->count);
+			bi->bi_sector = sh->sector + rdev->data_offset;
+			bi->bi_flags = 1 << BIO_UPTODATE;
+			bi->bi_vcnt = 1;
+			bi->bi_max_vecs = 1;
+			bi->bi_idx = 0;
+			bi->bi_io_vec = &sh->dev[i].vec;
+			bi->bi_io_vec[0].bv_len = STRIPE_SIZE;
+			bi->bi_io_vec[0].bv_offset = 0;
+			bi->bi_size = STRIPE_SIZE;
+			bi->bi_next = NULL;
+			if (rw == WRITE &&
+			    test_bit(R5_ReWrite, &sh->dev[i].flags))
+				atomic_add(STRIPE_SECTORS, &rdev->corrected_errors);
+			generic_make_request(bi);
+		} else {
+			if (rw == WRITE)
+				set_bit(STRIPE_DEGRADED, &sh->state);
+			PRINTK("skip op %ld on disc %d for sector %llu\n",
+				bi->bi_rw, i, (unsigned long long)sh->sector);
+			clear_bit(R5_LOCKED, &sh->dev[i].flags);
+			set_bit(STRIPE_HANDLE, &sh->state);
+		}
+	}
+}
+
+static struct dma_async_tx_descriptor *
+async_copy_data(int frombio, struct bio *bio, struct page *page, sector_t sector,
+	struct dma_async_tx_descriptor *tx)
+{
+	struct bio_vec *bvl;
+	struct page *bio_page;
+	int i;
+	int page_offset;
+
+	if (bio->bi_sector >= sector)
+		page_offset = (signed)(bio->bi_sector - sector) * 512;
+	else
+		page_offset = (signed)(sector - bio->bi_sector) * -512;
+	bio_for_each_segment(bvl, bio, i) {
+		int len = bio_iovec_idx(bio,i)->bv_len;
+		int clen;
+		int b_offset = 0;
+
+		if (page_offset < 0) {
+			b_offset = -page_offset;
+			page_offset += b_offset;
+			len -= b_offset;
+		}
+
+		if (len > 0 && page_offset + len > STRIPE_SIZE)
+			clen = STRIPE_SIZE - page_offset;
+		else clen = len;
+
+		if (clen > 0) {
+			b_offset += bio_iovec_idx(bio,i)->bv_offset;
+			bio_page = bio_iovec_idx(bio,i)->bv_page;
+			if (frombio)
+				tx = async_memcpy(page, bio_page, page_offset,
+					b_offset, clen,
+					ASYNC_TX_DEP_ACK | ASYNC_TX_KMAP_SRC,
+					tx, NULL, NULL);
+			else
+				tx = async_memcpy(bio_page, page, b_offset,
+					page_offset, clen,
+					ASYNC_TX_DEP_ACK | ASYNC_TX_KMAP_DST,
+					tx, NULL, NULL);
+		}
+		if (clen < len) /* hit end of page */
+			break;
+		page_offset +=  len;
+	}
+
+	return tx;
+}
+
+static void ops_complete_biofill(void *stripe_head_ref)
+{
+	struct stripe_head *sh = stripe_head_ref;
+	struct bio *return_bi = NULL, *bi;
+	int i;
+
+	PRINTK("%s: stripe %llu\n", __FUNCTION__,
+		(unsigned long long)sh->sector);
+
+	/* clear completed biofills */
+	spin_lock(&sh->lock);
+	BUG_ON(!test_and_clear_bit(STRIPE_OP_BIOFILL, &sh->ops.ack));
+	BUG_ON(!test_and_clear_bit(STRIPE_OP_BIOFILL, &sh->ops.pending));
+	for (i=sh->disks ; i-- ;) {
+		struct r5dev *dev = &sh->dev[i];
+		/* acknowledge completion of a biofill operation */
+		if (test_bit(R5_Wantfill, &dev->flags) && !dev->toread)
+			clear_bit(R5_Wantfill, &dev->flags);
+
+		/* reply to the read */
+		if (dev->read && !test_bit(R5_Wantfill, &dev->flags) &&
+			!test_bit(STRIPE_OP_BIOFILL, &sh->ops.pending)) {
+			return_bi = dev->read;
+			dev->read = NULL;
+		}
+	}
+	spin_unlock(&sh->lock);
+
+	while ((bi=return_bi)) {
+		int bytes = bi->bi_size;
+
+		return_bi = bi->bi_next;
+		bi->bi_next = NULL;
+		bi->bi_size = 0;
+		bi->bi_end_io(bi, bytes,
+			      test_bit(BIO_UPTODATE, &bi->bi_flags)
+			        ? 0 : -EIO);
+	}
+	
+	release_stripe(sh);
+}
+
+static void ops_run_biofill(struct stripe_head *sh)
+{
+	struct bio *return_bi = NULL;
+	struct dma_async_tx_descriptor *tx = NULL;
+	raid5_conf_t *conf = sh->raid_conf;
+	int i;
+
+	PRINTK("%s: stripe %llu\n", __FUNCTION__,
+		(unsigned long long)sh->sector);
+
+	for (i=sh->disks ; i-- ;) {
+		struct r5dev *dev = &sh->dev[i];
+		if (test_bit(R5_Wantfill, &dev->flags)) {
+			struct bio *rbi, *rbi2;
+			spin_lock_irq(&conf->device_lock);
+			rbi = dev->toread;
+			dev->toread = NULL;
+			spin_unlock_irq(&conf->device_lock);
+			while (rbi && rbi->bi_sector < dev->sector + STRIPE_SECTORS) {
+				tx = async_copy_data(0, rbi, dev->page,
+					dev->sector, tx);
+				rbi2 = r5_next_bio(rbi, dev->sector);
+				spin_lock_irq(&conf->device_lock);
+				if (--rbi->bi_phys_segments == 0) {
+					rbi->bi_next = return_bi;
+					return_bi = rbi;
+				}
+				spin_unlock_irq(&conf->device_lock);
+				rbi = rbi2;
+			}
+			dev->read = return_bi;
+		}
+	}
+
+	atomic_inc(&sh->count);
+	async_interrupt(ASYNC_TX_DEP_ACK | ASYNC_TX_ACK, tx,
+		ops_complete_biofill, sh);
+}
+
+static void ops_complete_compute5(void *stripe_head_ref)
+{
+	struct stripe_head *sh = stripe_head_ref;
+	int target = sh->ops.target;
+	struct r5dev *tgt = &sh->dev[target];
+
+	PRINTK("%s: stripe %llu\n", __FUNCTION__,
+		(unsigned long long)sh->sector);
+
+	set_bit(R5_UPTODATE, &tgt->flags);
+	BUG_ON(!test_and_clear_bit(R5_Wantcompute, &tgt->flags));
+	BUG_ON(test_and_set_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.complete));
+	set_bit(STRIPE_HANDLE, &sh->state);
+	release_stripe(sh);
+}
+
+static struct dma_async_tx_descriptor *
+ops_run_compute5(struct stripe_head *sh, unsigned long pending)
+{
+	/* kernel stack size limits the total number of disks */
+	int disks = sh->disks;
+	struct page *xor_srcs[disks];
+	int target = sh->ops.target;
+	struct r5dev *tgt = &sh->dev[target];
+	struct page *xor_dest = tgt->page;
+	int count = 0;
+	struct dma_async_tx_descriptor *tx;
+	int i;
+
+	PRINTK("%s: stripe %llu block: %d\n",
+		__FUNCTION__, (unsigned long long)sh->sector, target);
+	BUG_ON(!test_bit(R5_Wantcompute, &tgt->flags));
+
+	for (i=disks ; i-- ; )
+		if (i != target)
+			xor_srcs[count++] = sh->dev[i].page;
+
+	atomic_inc(&sh->count);
+
+	tx = async_xor(xor_dest, xor_srcs, 0, count, STRIPE_SIZE,
+		ASYNC_TX_XOR_ZERO_DST | ASYNC_TX_INT_EN, NULL,
+		ops_complete_compute5, sh);
+
+	/* ack now if postxor is not set to be run */
+	if (tx && !test_bit(STRIPE_OP_POSTXOR, &pending))
+		async_tx_ack(tx);
+
+	return tx;
+}
+
+static void ops_complete_prexor(void *stripe_head_ref)
+{
+	struct stripe_head *sh = stripe_head_ref;
+
+	PRINTK("%s: stripe %llu\n", __FUNCTION__,
+		(unsigned long long)sh->sector);
+
+	set_bit(STRIPE_OP_PREXOR, &sh->ops.complete);
+}
+
+static struct dma_async_tx_descriptor *
+ops_run_prexor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
+{
+	/* kernel stack size limits the total number of disks */
+	int disks = sh->disks;
+	struct page *xor_srcs[disks];
+	int count = 0, pd_idx = sh->pd_idx, i;
+
+	/* existing parity data subtracted */
+	struct page *xor_dest = xor_srcs[count++] = sh->dev[pd_idx].page;
+
+	PRINTK("%s: stripe %llu\n", __FUNCTION__,
+		(unsigned long long)sh->sector);
+
+	for (i=disks ; i-- ;) {
+		struct r5dev *dev = &sh->dev[i];
+		/* Only process blocks that are known to be uptodate */
+		if (dev->towrite && test_bit(R5_Wantprexor, &dev->flags))
+			xor_srcs[count++] = dev->page;
+	}
+
+	tx = async_xor(xor_dest, xor_srcs, 0, count, STRIPE_SIZE,
+		ASYNC_TX_DEP_ACK | ASYNC_TX_XOR_DROP_DST, tx,
+		ops_complete_prexor, sh);
+
+	/* trigger a channel switch if necesary */
+	tx = async_interrupt_cond(DMA_MEMCPY, ASYNC_TX_DEP_ACK, tx);
+
+	return tx;
+}
+
+static struct dma_async_tx_descriptor *
+ops_run_biodrain(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
+{
+	int disks = sh->disks;
+	int pd_idx = sh->pd_idx, i;
+
+	/* check if prexor is active which means only process blocks
+	 * that are part of a read-modify-write (Wantprexor)
+	 */
+	int prexor = test_bit(STRIPE_OP_PREXOR, &sh->ops.pending);
+
+	PRINTK("%s: stripe %llu\n", __FUNCTION__,
+		(unsigned long long)sh->sector);
+
+	for (i=disks ; i-- ;) {
+		struct r5dev *dev = &sh->dev[i];
+		struct bio *chosen;
+		int towrite;
+
+		towrite = 0;
+		if (prexor) { /* rmw */
+			if (dev->towrite && test_bit(R5_Wantprexor, &dev->flags))
+				towrite = 1;
+		} else { /* rcw */
+			if (i!=pd_idx && dev->towrite &&
+				test_bit(R5_LOCKED, &dev->flags))
+				towrite = 1;
+		}
+
+		if (towrite) {
+			struct bio *wbi;
+
+			spin_lock(&sh->lock);
+			chosen = dev->towrite;
+			dev->towrite = NULL;
+			BUG_ON(dev->written);
+			wbi = dev->written = chosen;
+			spin_unlock(&sh->lock);
+
+			while (wbi && wbi->bi_sector < dev->sector + STRIPE_SECTORS) {
+				tx = async_copy_data(1, wbi, dev->page,
+					dev->sector, tx);
+				wbi = r5_next_bio(wbi, dev->sector);
+			}
+		}
+	}
+
+	return tx;
+}
+
+static void ops_complete_postxor(void *stripe_head_ref)
+{
+	struct stripe_head *sh = stripe_head_ref;
+
+	PRINTK("%s: stripe %llu\n", __FUNCTION__,
+		(unsigned long long)sh->sector);
+
+	BUG_ON(test_and_set_bit(STRIPE_OP_POSTXOR, &sh->ops.complete));
+	set_bit(STRIPE_HANDLE, &sh->state);
+	release_stripe(sh);
+}
+
+static void ops_complete_write(void *stripe_head_ref)
+{
+	struct stripe_head *sh = stripe_head_ref;
+	int disks = sh->disks, i, pd_idx = sh->pd_idx;
+
+	PRINTK("%s: stripe %llu\n", __FUNCTION__,
+		(unsigned long long)sh->sector);
+
+	for (i=disks ; i-- ;) {
+		struct r5dev *dev = &sh->dev[i];
+		if (dev->written || i == pd_idx)
+			set_bit(R5_UPTODATE, &dev->flags);
+	}
+
+	BUG_ON(test_and_set_bit(STRIPE_OP_BIODRAIN, &sh->ops.complete));
+	BUG_ON(test_and_set_bit(STRIPE_OP_POSTXOR, &sh->ops.complete));
+
+	set_bit(STRIPE_HANDLE, &sh->state);
+	release_stripe(sh);
+}
+
+static void
+ops_run_postxor(struct stripe_head *sh, struct dma_async_tx_descriptor *tx)
+{
+	/* kernel stack size limits the total number of disks */
+	int disks = sh->disks;
+	struct page *xor_srcs[disks];
+
+	int count = 0, pd_idx = sh->pd_idx, i;
+	struct page *xor_dest;
+	int prexor = test_bit(STRIPE_OP_PREXOR, &sh->ops.pending);
+	unsigned long flags;
+	dma_async_tx_callback callback;
+
+	PRINTK("%s: stripe %llu\n", __FUNCTION__,
+		(unsigned long long)sh->sector);
+
+	/* check if prexor is active which means only process blocks
+	 * that are part of a read-modify-write (written)
+	 */
+	if (prexor) {
+		xor_dest = xor_srcs[count++] = sh->dev[pd_idx].page;
+		for (i=disks; i--;) {
+			struct r5dev *dev = &sh->dev[i];
+			if (dev->written)
+				xor_srcs[count++] = dev->page;
+		}
+	} else {
+		xor_dest = sh->dev[pd_idx].page;
+		for (i=disks; i--;) {
+			struct r5dev *dev = &sh->dev[i];
+			if (i!=pd_idx)
+				xor_srcs[count++] = dev->page;
+		}
+	}
+
+	/* check whether this postxor is part of a write */
+	callback = test_bit(STRIPE_OP_BIODRAIN, &sh->ops.pending) ?
+		ops_complete_write : ops_complete_postxor;
+
+	/* 1/ if we prexor'd then the dest is reused as a source
+	 * 2/ if we did not prexor then we are redoing the parity
+	 * set ASYNC_TX_XOR_DROP_DST and ASYNC_TX_XOR_ZERO_DST
+	 * for the synchronous xor case
+	 */
+	flags = ASYNC_TX_DEP_ACK | ASYNC_TX_ACK | ASYNC_TX_INT_EN |
+		(prexor ? ASYNC_TX_XOR_DROP_DST : ASYNC_TX_XOR_ZERO_DST);
+
+	atomic_inc(&sh->count);
+	tx = async_xor(xor_dest, xor_srcs, 0, count, STRIPE_SIZE,
+		flags, tx, callback, sh);
+}
+
+static void ops_complete_check(void *stripe_head_ref)
+{
+	struct stripe_head *sh = stripe_head_ref;
+	int pd_idx = sh->pd_idx;
+
+	PRINTK("%s: stripe %llu\n", __FUNCTION__,
+		(unsigned long long)sh->sector);
+
+	if (test_and_clear_bit(STRIPE_OP_MOD_DMA_CHECK, &sh->ops.pending) &&
+		sh->ops.zero_sum_result == 0)
+		set_bit(R5_UPTODATE, &sh->dev[pd_idx].flags);
+
+	BUG_ON(test_and_set_bit(STRIPE_OP_CHECK, &sh->ops.complete));
+	set_bit(STRIPE_HANDLE, &sh->state);
+	release_stripe(sh);
+}
+
+static void ops_run_check(struct stripe_head *sh)
+{
+	/* kernel stack size limits the total number of disks */
+	int disks = sh->disks;
+	struct page *xor_srcs[disks];
+	struct dma_async_tx_descriptor *tx;
+
+	int count = 0, pd_idx = sh->pd_idx, i;
+	struct page *xor_dest = xor_srcs[count++] = sh->dev[pd_idx].page;
+
+	PRINTK("%s: stripe %llu\n", __FUNCTION__,
+		(unsigned long long)sh->sector);
+
+	for (i=disks; i--;) {
+		struct r5dev *dev = &sh->dev[i];
+		if (i != pd_idx)
+			xor_srcs[count++] = dev->page;
+	}
+
+	tx = async_xor_zero_sum(xor_dest, xor_srcs, 0, count, STRIPE_SIZE,
+		&sh->ops.zero_sum_result, 0, NULL, NULL, NULL);
+
+	if (tx)
+		set_bit(STRIPE_OP_MOD_DMA_CHECK, &sh->ops.pending);
+	else
+		clear_bit(STRIPE_OP_MOD_DMA_CHECK, &sh->ops.pending);
+
+	atomic_inc(&sh->count);
+	tx = async_interrupt(ASYNC_TX_DEP_ACK | ASYNC_TX_ACK, tx,
+		ops_complete_check, sh);
+}
+
+static void raid5_run_ops(struct stripe_head *sh, unsigned long pending)
+{
+	int overlap_clear=0, i, disks = sh->disks;
+	struct dma_async_tx_descriptor *tx = NULL;
+
+	if (test_bit(STRIPE_OP_BIOFILL, &pending)) {
+		ops_run_biofill(sh);
+		overlap_clear++;
+	}
+
+	if (test_bit(STRIPE_OP_COMPUTE_BLK, &pending))
+		tx = ops_run_compute5(sh, pending);
+
+	if (test_bit(STRIPE_OP_PREXOR, &pending))
+		tx = ops_run_prexor(sh, tx);
+
+	if (test_bit(STRIPE_OP_BIODRAIN, &pending)) {
+		tx = ops_run_biodrain(sh, tx);
+		overlap_clear++;
+	}
+
+	if (test_bit(STRIPE_OP_POSTXOR, &pending))
+		ops_run_postxor(sh, tx);
+
+	if (test_bit(STRIPE_OP_CHECK, &pending))
+		ops_run_check(sh);
+
+	if (test_bit(STRIPE_OP_IO, &pending))
+		ops_run_io(sh);
+
+	if (overlap_clear)
+		for (i=disks; i-- ;) {
+			struct r5dev *dev = &sh->dev[i];
+			if (test_and_clear_bit(R5_Overlap, &dev->flags))
+				wake_up(&sh->raid_conf->wait_for_overlap);
+		}
+}
+
 static int grow_one_stripe(raid5_conf_t *conf)
 {
 	struct stripe_head *sh;
diff --git a/include/linux/raid/raid5.h b/include/linux/raid/raid5.h
index d8286db..3541d2c 100644
--- a/include/linux/raid/raid5.h
+++ b/include/linux/raid/raid5.h
@@ -116,13 +116,42 @@
  *  attach a request to an active stripe (add_stripe_bh())
  *     lockdev attach-buffer unlockdev
  *  handle a stripe (handle_stripe())
- *     lockstripe clrSTRIPE_HANDLE ... (lockdev check-buffers unlockdev) .. change-state .. record io needed unlockstripe schedule io
+ *     lockstripe clrSTRIPE_HANDLE ... (lockdev check-buffers unlockdev) .. change-state .. record io/ops needed unlockstripe schedule io/ops
  *  release an active stripe (release_stripe())
  *     lockdev if (!--cnt) { if  STRIPE_HANDLE, add to handle_list else add to inactive-list } unlockdev
  *
  * The refcount counts each thread that have activated the stripe,
  * plus raid5d if it is handling it, plus one for each active request
- * on a cached buffer.
+ * on a cached buffer, and plus one if the stripe is undergoing stripe
+ * operations.
+ *
+ * Stripe operations are performed outside the stripe lock,
+ * the stripe operations are:
+ * -copying data between the stripe cache and user application buffers
+ * -computing blocks to save a disk access, or to recover a missing block
+ * -updating the parity on a write operation (reconstruct write and read-modify-write)
+ * -checking parity correctness
+ * -running i/o to disk
+ * These operations are carried out by raid5_run_ops which uses the async_tx
+ * api to (optionally) offload operations to dedicated hardware engines.
+ * When requesting an operation handle_stripe sets the pending bit for the
+ * operation and increments the count.  raid5_run_ops is then run whenever
+ * the count is non-zero.
+ * There are some critical dependencies between the operations that prevent some
+ * from being requested while another is in flight.
+ * 1/ Parity check operations destroy the in cache version of the parity block,
+ *    so we prevent parity dependent operations like writes and compute_blocks
+ *    from starting while a check is in progress.  Some dma engines can perform
+ *    the check without damaging the parity block, in these cases the parity block
+ *    is re-marked up to date (assuming the check was successful) and is not
+ *    re-read from disk.
+ * 2/ When a write operation is requested we immediately lock the affected blocks,
+ *    and mark them as not up to date.  This causes new read requests to be held
+ *    off, as well as parity checks and compute block operations.
+ * 3/ Once a compute block operation has been requested handle_stripe treats that
+ *    block as if it is up to date.  raid5_run_ops
+ *    guaruntees that any operation that is dependent on the
+ *    compute block result is initiated after the compute block completes.
  */
 
 struct stripe_head {
@@ -136,11 +165,19 @@ struct stripe_head {
 	spinlock_t		lock;
 	int			bm_seq;	/* sequence number for bitmap flushes */
 	int			disks;			/* disks in stripe */
+	struct stripe_operations {
+		unsigned long	   pending;  /* pending operations (set for request->issue->complete) */
+		unsigned long	   ack;	     /* submitted operations (set for issue->complete */
+		unsigned long	   complete; /* completed operations flags (set for complete) */
+		int		   target;   /* STRIPE_OP_COMPUTE_BLK target */
+		int		   count;    /* raid5_runs_ops is set to run when this is non-zero */
+		u32		   zero_sum_result;
+	} ops;
 	struct r5dev {
 		struct bio	req;
 		struct bio_vec	vec;
 		struct page	*page;
-		struct bio	*toread, *towrite, *written;
+		struct bio	*toread, *read, *towrite, *written;
 		sector_t	sector;			/* sector of this page */
 		unsigned long	flags;
 	} dev[1]; /* allocated with extra space depending of RAID geometry */
@@ -158,6 +195,10 @@ struct stripe_head {
 #define	R5_ReWrite	9	/* have tried to over-write the readerror */
 
 #define	R5_Expanded	10	/* This block now has post-expand data */
+#define	R5_Wantcompute	11	/* compute_block in progress treat as uptodate */
+#define	R5_Wantfill	12	/* dev->toread contains a bio that needs filling */
+#define	R5_Wantprexor	13	/* distinguish blocks ready for rmw from other "towrites" */
+
 /*
  * Write method
  */
@@ -179,6 +220,22 @@ struct stripe_head {
 #define	STRIPE_EXPANDING	9
 #define	STRIPE_EXPAND_SOURCE	10
 #define	STRIPE_EXPAND_READY	11
+
+/*
+ * Operations flags (in issue order)
+ */
+#define STRIPE_OP_BIOFILL	0
+#define STRIPE_OP_COMPUTE_BLK	1
+#define STRIPE_OP_PREXOR	2
+#define STRIPE_OP_BIODRAIN	3
+#define STRIPE_OP_POSTXOR	4
+#define STRIPE_OP_CHECK	5
+#define STRIPE_OP_IO		6
+
+/* modifiers to the base operations */
+#define STRIPE_OP_MOD_REPAIR_PD	7 /* compute the parity block and write it back */
+#define STRIPE_OP_MOD_DMA_CHECK	8 /* parity is not corrupted by the check */
+
 /*
  * Plugging:
  *

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2.6.21-rc4 05/15] md: use raid5_run_ops for stripe cache operations
  2007-03-23  6:51 [PATCH 2.6.21-rc4 00/15] md raid5 acceleration and async_tx Dan Williams
                   ` (3 preceding siblings ...)
  2007-03-23  6:51 ` [PATCH 2.6.21-rc4 04/15] md: add raid5_run_ops and support routines Dan Williams
@ 2007-03-23  6:52 ` Dan Williams
  2007-03-23  6:52 ` [PATCH 2.6.21-rc4 06/15] md: move write operations to raid5_run_ops Dan Williams
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2007-03-23  6:52 UTC (permalink / raw)
  To: neilb, christopher.leech, linux-raid, linux-kernel
  Cc: akpm, torvalds, yur, wd, arjan, rmk+kernel

Each stripe has three flag variables to reflect the state of operations
(pending, ack, and complete).
-pending: set to request servicing in raid5_run_ops
-ack: set to reflect that raid5_runs_ops has seen this request
-complete: set when the operation is complete and it is ok for handle_stripe5
to clear 'pending' and 'ack'.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 drivers/md/raid5.c |   65 +++++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 56 insertions(+), 9 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index b7185a1..0397e33 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -126,6 +126,7 @@ static void __release_stripe(raid5_conf_t *conf, struct stripe_head *sh)
 			}
 			md_wakeup_thread(conf->mddev->thread);
 		} else {
+			BUG_ON(sh->ops.pending);
 			if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
 				atomic_dec(&conf->preread_active_stripes);
 				if (atomic_read(&conf->preread_active_stripes) < IO_THRESHOLD)
@@ -225,7 +226,8 @@ static void init_stripe(struct stripe_head *sh, sector_t sector, int pd_idx, int
 
 	BUG_ON(atomic_read(&sh->count) != 0);
 	BUG_ON(test_bit(STRIPE_HANDLE, &sh->state));
-	
+	BUG_ON(sh->ops.pending || sh->ops.ack || sh->ops.complete);
+
 	CHECK_DEVLOCK();
 	PRINTK("init_stripe called, stripe %llu\n", 
 		(unsigned long long)sh->sector);
@@ -241,11 +243,11 @@ static void init_stripe(struct stripe_head *sh, sector_t sector, int pd_idx, int
 	for (i = sh->disks; i--; ) {
 		struct r5dev *dev = &sh->dev[i];
 
-		if (dev->toread || dev->towrite || dev->written ||
+		if (dev->toread || dev->read || dev->towrite || dev->written ||
 		    test_bit(R5_LOCKED, &dev->flags)) {
-			printk("sector=%llx i=%d %p %p %p %d\n",
+			printk("sector=%llx i=%d %p %p %p %p %d\n",
 			       (unsigned long long)sh->sector, i, dev->toread,
-			       dev->towrite, dev->written,
+			       dev->read, dev->towrite, dev->written,
 			       test_bit(R5_LOCKED, &dev->flags));
 			BUG();
 		}
@@ -325,6 +327,43 @@ static struct stripe_head *get_active_stripe(raid5_conf_t *conf, sector_t sector
 	return sh;
 }
 
+/* check_op() ensures that we only dequeue an operation once */
+#define check_op(op) do {\
+	if (test_bit(op, &sh->ops.pending) &&\
+		!test_bit(op, &sh->ops.complete)) {\
+		if (test_and_set_bit(op, &sh->ops.ack))\
+			clear_bit(op, &pending);\
+		else\
+			ack++;\
+	} else\
+		clear_bit(op, &pending);\
+} while(0)
+
+/* find new work to run, do not resubmit work that is already
+ * in flight
+ */
+static unsigned long get_stripe_work(struct stripe_head *sh)
+{
+	unsigned long pending;
+	int ack = 0;
+
+	pending = sh->ops.pending;
+
+	check_op(STRIPE_OP_BIOFILL);
+	check_op(STRIPE_OP_COMPUTE_BLK);
+	check_op(STRIPE_OP_PREXOR);
+	check_op(STRIPE_OP_BIODRAIN);
+	check_op(STRIPE_OP_POSTXOR);
+	check_op(STRIPE_OP_CHECK);
+	if (test_and_clear_bit(STRIPE_OP_IO, &sh->ops.pending))
+		ack++;
+
+	sh->ops.count -= ack;
+	BUG_ON(sh->ops.count < 0);
+
+	return pending;
+}
+
 static int
 raid5_end_read_request(struct bio * bi, unsigned int bytes_done, int error);
 static int
@@ -1859,7 +1898,6 @@ static int stripe_to_pdidx(sector_t stripe, raid5_conf_t *conf, int disks)
  *    schedule a write of some buffers
  *    return confirmation of parity correctness
  *
- * Parity calculations are done inside the stripe lock
  * buffers are taken off read_list or write_list, and bh_cache buffers
  * get BH_Lock set before the stripe lock is released.
  *
@@ -1877,10 +1915,11 @@ static void handle_stripe5(struct stripe_head *sh)
 	int non_overwrite = 0;
 	int failed_num=0;
 	struct r5dev *dev;
+	unsigned long pending=0;
 
-	PRINTK("handling stripe %llu, cnt=%d, pd_idx=%d\n",
-		(unsigned long long)sh->sector, atomic_read(&sh->count),
-		sh->pd_idx);
+	PRINTK("handling stripe %llu, state=%#lx cnt=%d, pd_idx=%d ops=%lx:%lx:%lx\n",
+	       (unsigned long long)sh->sector, sh->state, atomic_read(&sh->count),
+	       sh->pd_idx, sh->ops.pending, sh->ops.ack, sh->ops.complete);
 
 	spin_lock(&sh->lock);
 	clear_bit(STRIPE_HANDLE, &sh->state);
@@ -2330,8 +2369,14 @@ static void handle_stripe5(struct stripe_head *sh)
 			}
 	}
 
+	if (sh->ops.count)
+		pending = get_stripe_work(sh);
+
 	spin_unlock(&sh->lock);
 
+	if (pending)
+		raid5_run_ops(sh, pending);
+
 	while ((bi=return_bi)) {
 		int bytes = bi->bi_size;
 
@@ -3828,8 +3873,10 @@ static void raid5d (mddev_t *mddev)
 			handled++;
 		}
 
-		if (list_empty(&conf->handle_list))
+		if (list_empty(&conf->handle_list)) {
+			async_tx_issue_pending_all();
 			break;
+		}
 
 		first = conf->handle_list.next;
 		sh = list_entry(first, struct stripe_head, lru);

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2.6.21-rc4 06/15] md: move write operations to raid5_run_ops
  2007-03-23  6:51 [PATCH 2.6.21-rc4 00/15] md raid5 acceleration and async_tx Dan Williams
                   ` (4 preceding siblings ...)
  2007-03-23  6:52 ` [PATCH 2.6.21-rc4 05/15] md: use raid5_run_ops for stripe cache operations Dan Williams
@ 2007-03-23  6:52 ` Dan Williams
  2007-03-23  6:52 ` [PATCH 2.6.21-rc4 07/15] md: move raid5 compute block " Dan Williams
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2007-03-23  6:52 UTC (permalink / raw)
  To: neilb, christopher.leech, linux-raid, linux-kernel
  Cc: akpm, torvalds, yur, wd, arjan, rmk+kernel

handle_stripe sets STRIPE_OP_PREXOR, STRIPE_OP_BIODRAIN, STRIPE_OP_POSTXOR
to request a write to the stripe cache.  raid5_run_ops is triggerred to run
and executes the request outside the stripe lock.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 drivers/md/raid5.c |  152 +++++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 131 insertions(+), 21 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 0397e33..4d1adb5 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1788,7 +1788,75 @@ static void compute_block_2(struct stripe_head *sh, int dd_idx1, int dd_idx2)
 	}
 }
 
+static int handle_write_operations5(struct stripe_head *sh, int rcw, int expand)
+{
+	int i, pd_idx = sh->pd_idx, disks = sh->disks;
+	int locked=0;
+
+	if (rcw == 0) {
+		/* skip the drain operation on an expand */
+		if (!expand) {
+			BUG_ON(test_and_set_bit(STRIPE_OP_BIODRAIN,
+				&sh->ops.pending));
+			sh->ops.count++;
+		}
+
+		BUG_ON(test_and_set_bit(STRIPE_OP_POSTXOR, &sh->ops.pending));
+		sh->ops.count++;
+
+		for (i=disks ; i-- ;) {
+			struct r5dev *dev = &sh->dev[i];
+
+			if (dev->towrite) {
+				set_bit(R5_LOCKED, &dev->flags);
+				if (!expand)
+					clear_bit(R5_UPTODATE, &dev->flags);
+				locked++;
+			}
+		}
+	} else {
+		BUG_ON(!(test_bit(R5_UPTODATE, &sh->dev[pd_idx].flags) ||
+			test_bit(R5_Wantcompute, &sh->dev[pd_idx].flags)));
+
+		BUG_ON(test_and_set_bit(STRIPE_OP_PREXOR, &sh->ops.pending) ||
+			test_and_set_bit(STRIPE_OP_BIODRAIN, &sh->ops.pending) ||
+			test_and_set_bit(STRIPE_OP_POSTXOR, &sh->ops.pending));
+
+		sh->ops.count += 3;
+
+		for (i=disks ; i-- ;) {
+			struct r5dev *dev = &sh->dev[i];
+			if (i==pd_idx)
+				continue;
 
+			/* For a read-modify write there may be blocks that are
+			 * locked for reading while others are ready to be written
+			 * so we distinguish these blocks by the R5_Wantprexor bit
+			 */
+			if (dev->towrite &&
+			    (test_bit(R5_UPTODATE, &dev->flags) ||
+			    test_bit(R5_Wantcompute, &dev->flags))) {
+				set_bit(R5_Wantprexor, &dev->flags);
+				set_bit(R5_LOCKED, &dev->flags);
+				clear_bit(R5_UPTODATE, &dev->flags);
+				locked++;
+			}
+		}
+	}
+
+	/* keep the parity disk locked while asynchronous operations
+	 * are in flight
+	 */
+	set_bit(R5_LOCKED, &sh->dev[pd_idx].flags);
+	clear_bit(R5_UPTODATE, &sh->dev[pd_idx].flags);
+	locked++;
+
+	PRINTK("%s: stripe %llu locked: %d pending: %lx\n",
+		__FUNCTION__, (unsigned long long)sh->sector,
+		locked, sh->ops.pending);
+
+	return locked;
+}
 
 /*
  * Each stripe/dev can have one or more bion attached.
@@ -2151,8 +2219,67 @@ static void handle_stripe5(struct stripe_head *sh)
 		set_bit(STRIPE_HANDLE, &sh->state);
 	}
 
-	/* now to consider writing and what else, if anything should be read */
-	if (to_write) {
+	/* Now we check to see if any write operations have recently
+	 * completed
+	 */
+
+	/* leave prexor set until postxor is done, allows us to distinguish
+	 * a rmw from a rcw during biodrain
+	 */
+	if (test_bit(STRIPE_OP_PREXOR, &sh->ops.complete) &&
+		test_bit(STRIPE_OP_POSTXOR, &sh->ops.complete)) {
+
+		clear_bit(STRIPE_OP_PREXOR, &sh->ops.complete);
+		clear_bit(STRIPE_OP_PREXOR, &sh->ops.ack);
+		clear_bit(STRIPE_OP_PREXOR, &sh->ops.pending);
+
+		for (i=disks; i--;)
+			clear_bit(R5_Wantprexor, &sh->dev[i].flags);
+	}
+
+	/* if only POSTXOR is set then this is an 'expand' postxor */
+	if (test_bit(STRIPE_OP_BIODRAIN, &sh->ops.complete) &&
+		test_bit(STRIPE_OP_POSTXOR, &sh->ops.complete)) {
+
+		clear_bit(STRIPE_OP_BIODRAIN, &sh->ops.complete);
+		clear_bit(STRIPE_OP_BIODRAIN, &sh->ops.ack);
+		clear_bit(STRIPE_OP_BIODRAIN, &sh->ops.pending);
+
+		clear_bit(STRIPE_OP_POSTXOR, &sh->ops.complete);
+		clear_bit(STRIPE_OP_POSTXOR, &sh->ops.ack);
+		clear_bit(STRIPE_OP_POSTXOR, &sh->ops.pending);
+
+		/* All the 'written' buffers and the parity block are ready to be
+		 * written back to disk
+		 */
+		BUG_ON(!test_bit(R5_UPTODATE, &sh->dev[sh->pd_idx].flags));
+		for (i=disks; i--;) {
+			dev = &sh->dev[i];
+			if (test_bit(R5_LOCKED, &dev->flags) &&
+				(i == sh->pd_idx || dev->written)) {
+				PRINTK("Writing block %d\n", i);
+				set_bit(R5_Wantwrite, &dev->flags);
+				if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending))
+					sh->ops.count++;
+				if (!test_bit(R5_Insync, &dev->flags)
+				    || (i==sh->pd_idx && failed == 0))
+					set_bit(STRIPE_INSYNC, &sh->state);
+			}
+		}
+		if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
+			atomic_dec(&conf->preread_active_stripes);
+			if (atomic_read(&conf->preread_active_stripes) < IO_THRESHOLD)
+				md_wakeup_thread(conf->mddev->thread);
+		}
+	}
+
+	/* 1/ Now to consider new write requests and what else, if anything should be read
+	 * 2/ Check operations clobber the parity block so do not start new writes while
+	 *    a check is in flight
+	 * 3/ Write operations do not stack
+	 */
+	if (to_write && !test_bit(STRIPE_OP_POSTXOR, &sh->ops.pending) &&
+		!test_bit(STRIPE_OP_CHECK, &sh->ops.pending)) {
 		int rmw=0, rcw=0;
 		for (i=disks ; i--;) {
 			/* would I have to read this buffer for read_modify_write */
@@ -2219,25 +2346,8 @@ static void handle_stripe5(struct stripe_head *sh)
 			}
 		/* now if nothing is locked, and if we have enough data, we can start a write request */
 		if (locked == 0 && (rcw == 0 ||rmw == 0) &&
-		    !test_bit(STRIPE_BIT_DELAY, &sh->state)) {
-			PRINTK("Computing parity...\n");
-			compute_parity5(sh, rcw==0 ? RECONSTRUCT_WRITE : READ_MODIFY_WRITE);
-			/* now every locked buffer is ready to be written */
-			for (i=disks; i--;)
-				if (test_bit(R5_LOCKED, &sh->dev[i].flags)) {
-					PRINTK("Writing block %d\n", i);
-					locked++;
-					set_bit(R5_Wantwrite, &sh->dev[i].flags);
-					if (!test_bit(R5_Insync, &sh->dev[i].flags)
-					    || (i==sh->pd_idx && failed == 0))
-						set_bit(STRIPE_INSYNC, &sh->state);
-				}
-			if (test_and_clear_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
-				atomic_dec(&conf->preread_active_stripes);
-				if (atomic_read(&conf->preread_active_stripes) < IO_THRESHOLD)
-					md_wakeup_thread(conf->mddev->thread);
-			}
-		}
+		    !test_bit(STRIPE_BIT_DELAY, &sh->state))
+			locked += handle_write_operations5(sh, rcw, 0);
 	}
 
 	/* maybe we need to check and possibly fix the parity for this stripe

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2.6.21-rc4 07/15] md: move raid5 compute block operations to raid5_run_ops
  2007-03-23  6:51 [PATCH 2.6.21-rc4 00/15] md raid5 acceleration and async_tx Dan Williams
                   ` (5 preceding siblings ...)
  2007-03-23  6:52 ` [PATCH 2.6.21-rc4 06/15] md: move write operations to raid5_run_ops Dan Williams
@ 2007-03-23  6:52 ` Dan Williams
  2007-03-23  6:52 ` [PATCH 2.6.21-rc4 08/15] md: move raid5 parity checks " Dan Williams
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2007-03-23  6:52 UTC (permalink / raw)
  To: neilb, christopher.leech, linux-raid, linux-kernel
  Cc: akpm, torvalds, yur, wd, arjan, rmk+kernel

handle_stripe sets STRIPE_OP_COMPUTE_BLK to request servicing from
raid5_run_ops.  It also sets a flag for the block being computed to let
other parts of handle_stripe submit dependent operations.  raid5_run_ops
guarantees that the compute operation completes before any dependent
operation starts.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 drivers/md/raid5.c |  125 +++++++++++++++++++++++++++++++++++++++-------------
 1 files changed, 93 insertions(+), 32 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 4d1adb5..9856742 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1980,7 +1980,7 @@ static void handle_stripe5(struct stripe_head *sh)
 	int i;
 	int syncing, expanding, expanded;
 	int locked=0, uptodate=0, to_read=0, to_write=0, failed=0, written=0;
-	int non_overwrite = 0;
+	int compute=0, req_compute=0, non_overwrite=0;
 	int failed_num=0;
 	struct r5dev *dev;
 	unsigned long pending=0;
@@ -2032,8 +2032,8 @@ static void handle_stripe5(struct stripe_head *sh)
 		/* now count some things */
 		if (test_bit(R5_LOCKED, &dev->flags)) locked++;
 		if (test_bit(R5_UPTODATE, &dev->flags)) uptodate++;
+		if (test_bit(R5_Wantcompute, &dev->flags)) BUG_ON(++compute > 1);
 
-		
 		if (dev->toread) to_read++;
 		if (dev->towrite) {
 			to_write++;
@@ -2188,31 +2188,82 @@ static void handle_stripe5(struct stripe_head *sh)
 	 * parity, or to satisfy requests
 	 * or to load a block that is being partially written.
 	 */
-	if (to_read || non_overwrite || (syncing && (uptodate < disks)) || expanding) {
-		for (i=disks; i--;) {
-			dev = &sh->dev[i];
-			if (!test_bit(R5_LOCKED, &dev->flags) && !test_bit(R5_UPTODATE, &dev->flags) &&
-			    (dev->toread ||
-			     (dev->towrite && !test_bit(R5_OVERWRITE, &dev->flags)) ||
-			     syncing ||
-			     expanding ||
-			     (failed && (sh->dev[failed_num].toread ||
-					 (sh->dev[failed_num].towrite && !test_bit(R5_OVERWRITE, &sh->dev[failed_num].flags))))
-				    )
-				) {
-				/* we would like to get this block, possibly
-				 * by computing it, but we might not be able to
+	if (to_read || non_overwrite || (syncing && (uptodate + compute < disks)) || expanding ||
+		test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending)) {
+
+		/* Clear completed compute operations.  Parity recovery
+		 * (STRIPE_OP_MOD_REPAIR_PD) implies a write-back which is handled
+		 * later on in this routine
+		 */
+		if (test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.complete) &&
+			!test_bit(STRIPE_OP_MOD_REPAIR_PD, &sh->ops.pending)) {
+			clear_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.complete);
+			clear_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.ack);
+			clear_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending);
+		}
+
+		/* look for blocks to read/compute, skip this if a compute
+		 * is already in flight, or if the stripe contents are in the
+		 * midst of changing due to a write
+		 */
+		if (!test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending) &&
+			!test_bit(STRIPE_OP_PREXOR, &sh->ops.pending) &&
+			!test_bit(STRIPE_OP_POSTXOR, &sh->ops.pending)) {
+			for (i=disks; i--;) {
+				dev = &sh->dev[i];
+
+				/* don't schedule compute operations or reads on
+				 * the parity block while a check is in flight
 				 */
-				if (uptodate == disks-1) {
-					PRINTK("Computing block %d\n", i);
-					compute_block(sh, i);
-					uptodate++;
-				} else if (test_bit(R5_Insync, &dev->flags)) {
-					set_bit(R5_LOCKED, &dev->flags);
-					set_bit(R5_Wantread, &dev->flags);
-					locked++;
-					PRINTK("Reading block %d (sync=%d)\n", 
-						i, syncing);
+				if ((i == sh->pd_idx) && test_bit(STRIPE_OP_CHECK, &sh->ops.pending))
+					continue;
+
+				if (!test_bit(R5_LOCKED, &dev->flags) && !test_bit(R5_UPTODATE, &dev->flags) &&
+				     (dev->toread ||
+				     (dev->towrite && !test_bit(R5_OVERWRITE, &dev->flags)) ||
+				     syncing ||
+				     expanding ||
+				     (failed && (sh->dev[failed_num].toread ||
+						 (sh->dev[failed_num].towrite &&
+						 	!test_bit(R5_OVERWRITE, &sh->dev[failed_num].flags))))
+					    )
+					) {
+					/* 1/ We would like to get this block, possibly
+					 * by computing it, but we might not be able to.
+					 *
+					 * 2/ Since parity check operations potentially
+					 * make the parity block !uptodate it will need
+					 * to be refreshed before any compute operations
+					 * on data disks are scheduled.
+					 *
+					 * 3/ We hold off parity block re-reads until check
+					 * operations have quiesced.
+					 */
+					if ((uptodate == disks-1) && !test_bit(STRIPE_OP_CHECK, &sh->ops.pending)) {
+						set_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending);
+						set_bit(R5_Wantcompute, &dev->flags);
+						sh->ops.target = i;
+						BUG_ON(req_compute++);
+						sh->ops.count++;
+						/* Careful: from this point on 'uptodate' is in the eye of
+						 * raid5_run_ops which services 'compute' operations before
+						 * writes. R5_Wantcompute flags a block that will be R5_UPTODATE
+						 * by the time it is needed for a subsequent operation.
+						 */
+						uptodate++;
+					} else if ((uptodate < disks-1) && test_bit(R5_Insync, &dev->flags)) {
+						/* Note: we hold off compute operations while checks are in flight,
+						 * but we still prefer 'compute' over 'read' hence we only read if
+						 * (uptodate < disks-1)
+						 */
+						set_bit(R5_LOCKED, &dev->flags);
+						set_bit(R5_Wantread, &dev->flags);
+						if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending))
+							sh->ops.count++;
+						locked++;
+						PRINTK("Reading block %d (sync=%d)\n",
+							i, syncing);
+					}
 				}
 			}
 		}
@@ -2287,7 +2338,7 @@ static void handle_stripe5(struct stripe_head *sh)
 			if ((dev->towrite || i == sh->pd_idx) &&
 			    (!test_bit(R5_LOCKED, &dev->flags) 
 				    ) &&
-			    !test_bit(R5_UPTODATE, &dev->flags)) {
+			    !(test_bit(R5_UPTODATE, &dev->flags) || test_bit(R5_Wantcompute, &dev->flags))) {
 				if (test_bit(R5_Insync, &dev->flags)
 /*				    && !(!mddev->insync && i == sh->pd_idx) */
 					)
@@ -2298,7 +2349,7 @@ static void handle_stripe5(struct stripe_head *sh)
 			if (!test_bit(R5_OVERWRITE, &dev->flags) && i != sh->pd_idx &&
 			    (!test_bit(R5_LOCKED, &dev->flags) 
 				    ) &&
-			    !test_bit(R5_UPTODATE, &dev->flags)) {
+			    !(test_bit(R5_UPTODATE, &dev->flags) || test_bit(R5_Wantcompute, &dev->flags))) {
 				if (test_bit(R5_Insync, &dev->flags)) rcw++;
 				else rcw += 2*disks;
 			}
@@ -2311,7 +2362,8 @@ static void handle_stripe5(struct stripe_head *sh)
 			for (i=disks; i--;) {
 				dev = &sh->dev[i];
 				if ((dev->towrite || i == sh->pd_idx) &&
-				    !test_bit(R5_LOCKED, &dev->flags) && !test_bit(R5_UPTODATE, &dev->flags) &&
+				    !test_bit(R5_LOCKED, &dev->flags) &&
+				    !(test_bit(R5_UPTODATE, &dev->flags) || test_bit(R5_Wantcompute, &dev->flags)) &&
 				    test_bit(R5_Insync, &dev->flags)) {
 					if (test_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
 					{
@@ -2330,7 +2382,8 @@ static void handle_stripe5(struct stripe_head *sh)
 			for (i=disks; i--;) {
 				dev = &sh->dev[i];
 				if (!test_bit(R5_OVERWRITE, &dev->flags) && i != sh->pd_idx &&
-				    !test_bit(R5_LOCKED, &dev->flags) && !test_bit(R5_UPTODATE, &dev->flags) &&
+				    !test_bit(R5_LOCKED, &dev->flags) &&
+				    !(test_bit(R5_UPTODATE, &dev->flags) || test_bit(R5_Wantcompute, &dev->flags)) &&
 				    test_bit(R5_Insync, &dev->flags)) {
 					if (test_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
 					{
@@ -2345,8 +2398,16 @@ static void handle_stripe5(struct stripe_head *sh)
 				}
 			}
 		/* now if nothing is locked, and if we have enough data, we can start a write request */
-		if (locked == 0 && (rcw == 0 ||rmw == 0) &&
-		    !test_bit(STRIPE_BIT_DELAY, &sh->state))
+		/* since handle_stripe can be called at any time we need to handle the case
+		 * where a compute block operation has been submitted and then a subsequent
+		 * call wants to start a write request.  raid5_run_ops only handles the case where
+		 * compute block and postxor are requested simultaneously.  If this
+		 * is not the case then new writes need to be held off until the compute
+		 * completes.
+		 */
+		if ((req_compute || !test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending)) &&
+			(locked == 0 && (rcw == 0 ||rmw == 0) &&
+			!test_bit(STRIPE_BIT_DELAY, &sh->state)))
 			locked += handle_write_operations5(sh, rcw, 0);
 	}
 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2.6.21-rc4 08/15] md: move raid5 parity checks to raid5_run_ops
  2007-03-23  6:51 [PATCH 2.6.21-rc4 00/15] md raid5 acceleration and async_tx Dan Williams
                   ` (6 preceding siblings ...)
  2007-03-23  6:52 ` [PATCH 2.6.21-rc4 07/15] md: move raid5 compute block " Dan Williams
@ 2007-03-23  6:52 ` Dan Williams
  2007-03-23  6:52 ` [PATCH 2.6.21-rc4 09/15] md: satisfy raid5 read requests via raid5_run_ops Dan Williams
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2007-03-23  6:52 UTC (permalink / raw)
  To: neilb, christopher.leech, linux-raid, linux-kernel
  Cc: akpm, torvalds, yur, wd, arjan, rmk+kernel

handle_stripe sets STRIPE_OP_CHECK to request a check operation in
raid5_run_ops.  If raid5_run_ops is able to perform the check with a
dma engine the parity will be preserved in memory removing the need to
re-read it from disk, as is necessary in the synchronous case.

'Repair' operations re-use the same logic as compute block, with the caveat
that the results of the compute block are immediately written back to the
parity disk.  To differentiate these operations the STRIPE_OP_MOD_REPAIR_PD
flag is added.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 drivers/md/raid5.c |   81 ++++++++++++++++++++++++++++++++++++++++------------
 1 files changed, 62 insertions(+), 19 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 9856742..17a114c 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2411,32 +2411,75 @@ static void handle_stripe5(struct stripe_head *sh)
 			locked += handle_write_operations5(sh, rcw, 0);
 	}
 
-	/* maybe we need to check and possibly fix the parity for this stripe
-	 * Any reads will already have been scheduled, so we just see if enough data
-	 * is available
+	/* 1/ Maybe we need to check and possibly fix the parity for this stripe.
+	 *    Any reads will already have been scheduled, so we just see if enough data
+	 *    is available.
+	 * 2/ Hold off parity checks while parity dependent operations are in flight
+	 *    (conflicting writes are protected by the 'locked' variable)
 	 */
-	if (syncing && locked == 0 &&
-	    !test_bit(STRIPE_INSYNC, &sh->state)) {
+	if ((syncing && locked == 0 && !test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending) &&
+		!test_bit(STRIPE_INSYNC, &sh->state)) ||
+	    	test_bit(STRIPE_OP_CHECK, &sh->ops.pending) ||
+	    	test_bit(STRIPE_OP_MOD_REPAIR_PD, &sh->ops.pending)) {
+
 		set_bit(STRIPE_HANDLE, &sh->state);
-		if (failed == 0) {
-			BUG_ON(uptodate != disks);
-			compute_parity5(sh, CHECK_PARITY);
-			uptodate--;
-			if (page_is_zero(sh->dev[sh->pd_idx].page)) {
-				/* parity is correct (on disc, not in buffer any more) */
-				set_bit(STRIPE_INSYNC, &sh->state);
-			} else {
-				conf->mddev->resync_mismatches += STRIPE_SECTORS;
-				if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
-					/* don't try to repair!! */
+		/* Take one of the following actions:
+		 * 1/ start a check parity operation if (uptodate == disks)
+		 * 2/ finish a check parity operation and act on the result
+		 * 3/ skip to the writeback section if we previously
+		 *    initiated a recovery operation
+		 */
+		if (failed == 0 && !test_bit(STRIPE_OP_MOD_REPAIR_PD, &sh->ops.pending)) {
+			if (!test_and_set_bit(STRIPE_OP_CHECK, &sh->ops.pending)) {
+				BUG_ON(uptodate != disks);
+				clear_bit(R5_UPTODATE, &sh->dev[sh->pd_idx].flags);
+				sh->ops.count++;
+				uptodate--;
+			} else if (test_and_clear_bit(STRIPE_OP_CHECK, &sh->ops.complete)) {
+				clear_bit(STRIPE_OP_CHECK, &sh->ops.ack);
+				clear_bit(STRIPE_OP_CHECK, &sh->ops.pending);
+
+				if (sh->ops.zero_sum_result == 0)
+					/* parity is correct (on disc, not in buffer any more) */
 					set_bit(STRIPE_INSYNC, &sh->state);
 				else {
-					compute_block(sh, sh->pd_idx);
-					uptodate++;
+					conf->mddev->resync_mismatches += STRIPE_SECTORS;
+					if (test_bit(MD_RECOVERY_CHECK, &conf->mddev->recovery))
+						/* don't try to repair!! */
+						set_bit(STRIPE_INSYNC, &sh->state);
+					else {
+						BUG_ON(test_and_set_bit(
+							STRIPE_OP_COMPUTE_BLK,
+							&sh->ops.pending));
+						set_bit(STRIPE_OP_MOD_REPAIR_PD,
+							&sh->ops.pending);
+						BUG_ON(test_and_set_bit(R5_Wantcompute,
+							&sh->dev[sh->pd_idx].flags));
+						sh->ops.target = sh->pd_idx;
+						sh->ops.count++;
+						uptodate++;
+					}
 				}
 			}
 		}
-		if (!test_bit(STRIPE_INSYNC, &sh->state)) {
+
+		/* check if we can clear a parity disk reconstruct */
+		if (test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.complete) &&
+			test_bit(STRIPE_OP_MOD_REPAIR_PD, &sh->ops.pending)) {
+
+			clear_bit(STRIPE_OP_MOD_REPAIR_PD, &sh->ops.pending);
+			clear_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.complete);
+			clear_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.ack);
+			clear_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending);
+		}
+
+		/* Wait for check parity and compute block operations to complete
+		 * before write-back
+		 */
+		if (!test_bit(STRIPE_INSYNC, &sh->state) &&
+			!test_bit(STRIPE_OP_CHECK, &sh->ops.pending) &&
+			!test_bit(STRIPE_OP_COMPUTE_BLK, &sh->ops.pending)) {
+
 			/* either failed parity check, or recovery is happening */
 			if (failed==0)
 				failed_num = sh->pd_idx;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2.6.21-rc4 09/15] md: satisfy raid5 read requests via raid5_run_ops
  2007-03-23  6:51 [PATCH 2.6.21-rc4 00/15] md raid5 acceleration and async_tx Dan Williams
                   ` (7 preceding siblings ...)
  2007-03-23  6:52 ` [PATCH 2.6.21-rc4 08/15] md: move raid5 parity checks " Dan Williams
@ 2007-03-23  6:52 ` Dan Williams
  2007-03-23  6:52 ` [PATCH 2.6.21-rc4 10/15] md: use async_tx and raid5_run_ops for raid5 expansion operations Dan Williams
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2007-03-23  6:52 UTC (permalink / raw)
  To: neilb, christopher.leech, linux-raid, linux-kernel
  Cc: akpm, torvalds, yur, wd, arjan, rmk+kernel

Use raid5_run_ops to carry out the memory copies for a raid5 read request.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 drivers/md/raid5.c |   40 +++++++++++++++-------------------------
 1 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 17a114c..bcd23fb 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1980,7 +1980,7 @@ static void handle_stripe5(struct stripe_head *sh)
 	int i;
 	int syncing, expanding, expanded;
 	int locked=0, uptodate=0, to_read=0, to_write=0, failed=0, written=0;
-	int compute=0, req_compute=0, non_overwrite=0;
+	int to_fill=0, compute=0, req_compute=0, non_overwrite=0;
 	int failed_num=0;
 	struct r5dev *dev;
 	unsigned long pending=0;
@@ -2004,34 +2004,20 @@ static void handle_stripe5(struct stripe_head *sh)
 		dev = &sh->dev[i];
 		clear_bit(R5_Insync, &dev->flags);
 
-		PRINTK("check %d: state 0x%lx read %p write %p written %p\n",
-			i, dev->flags, dev->toread, dev->towrite, dev->written);
-		/* maybe we can reply to a read */
+		PRINTK("check %d: state 0x%lx toread %p read %p write %p written %p\n",
+		i, dev->flags, dev->toread, dev->read, dev->towrite, dev->written);
+
+		/* maybe we can start a biofill operation */
 		if (test_bit(R5_UPTODATE, &dev->flags) && dev->toread) {
-			struct bio *rbi, *rbi2;
-			PRINTK("Return read for disc %d\n", i);
-			spin_lock_irq(&conf->device_lock);
-			rbi = dev->toread;
-			dev->toread = NULL;
-			if (test_and_clear_bit(R5_Overlap, &dev->flags))
-				wake_up(&conf->wait_for_overlap);
-			spin_unlock_irq(&conf->device_lock);
-			while (rbi && rbi->bi_sector < dev->sector + STRIPE_SECTORS) {
-				copy_data(0, rbi, dev->page, dev->sector);
-				rbi2 = r5_next_bio(rbi, dev->sector);
-				spin_lock_irq(&conf->device_lock);
-				if (--rbi->bi_phys_segments == 0) {
-					rbi->bi_next = return_bi;
-					return_bi = rbi;
-				}
-				spin_unlock_irq(&conf->device_lock);
-				rbi = rbi2;
-			}
+			to_read--;
+			if (!test_bit(STRIPE_OP_BIOFILL, &sh->ops.pending))
+				set_bit(R5_Wantfill, &dev->flags);
 		}
 
 		/* now count some things */
 		if (test_bit(R5_LOCKED, &dev->flags)) locked++;
 		if (test_bit(R5_UPTODATE, &dev->flags)) uptodate++;
+		if (test_bit(R5_Wantfill, &dev->flags)) to_fill++;
 		if (test_bit(R5_Wantcompute, &dev->flags)) BUG_ON(++compute > 1);
 
 		if (dev->toread) to_read++;
@@ -2055,9 +2041,13 @@ static void handle_stripe5(struct stripe_head *sh)
 			set_bit(R5_Insync, &dev->flags);
 	}
 	rcu_read_unlock();
+
+	if (to_fill && !test_and_set_bit(STRIPE_OP_BIOFILL, &sh->ops.pending))
+		sh->ops.count++;
+
 	PRINTK("locked=%d uptodate=%d to_read=%d"
-		" to_write=%d failed=%d failed_num=%d\n",
-		locked, uptodate, to_read, to_write, failed, failed_num);
+		" to_write=%d to_fill=%d failed=%d failed_num=%d\n",
+		locked, uptodate, to_read, to_write, to_fill, failed, failed_num);
 	/* check if the array has lost two devices and, if so, some requests might
 	 * need to be failed
 	 */

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2.6.21-rc4 10/15] md: use async_tx and raid5_run_ops for raid5 expansion operations
  2007-03-23  6:51 [PATCH 2.6.21-rc4 00/15] md raid5 acceleration and async_tx Dan Williams
                   ` (8 preceding siblings ...)
  2007-03-23  6:52 ` [PATCH 2.6.21-rc4 09/15] md: satisfy raid5 read requests via raid5_run_ops Dan Williams
@ 2007-03-23  6:52 ` Dan Williams
  2007-03-23  6:52 ` [PATCH 2.6.21-rc4 11/15] md: move raid5 io requests to raid5_run_ops Dan Williams
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2007-03-23  6:52 UTC (permalink / raw)
  To: neilb, christopher.leech, linux-raid, linux-kernel
  Cc: akpm, torvalds, yur, wd, arjan, rmk+kernel

The parity calculation for an expansion operation is the same as the
calculation performed at the end of a write with the caveat that all blocks
in the stripe are scheduled to be written.  An expansion operation is
identified as a stripe with the POSTXOR flag set and the BIODRAIN flag not
set.

The bulk copy operation to the new stripe is handled inline by async_tx.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 drivers/md/raid5.c |   48 ++++++++++++++++++++++++++++++++++++------------
 1 files changed, 36 insertions(+), 12 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index bcd23fb..84a3f35 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2511,18 +2511,32 @@ static void handle_stripe5(struct stripe_head *sh)
 		}
 	}
 
-	if (expanded && test_bit(STRIPE_EXPANDING, &sh->state)) {
-		/* Need to write out all blocks after computing parity */
-		sh->disks = conf->raid_disks;
-		sh->pd_idx = stripe_to_pdidx(sh->sector, conf, conf->raid_disks);
-		compute_parity5(sh, RECONSTRUCT_WRITE);
+	/* Finish postxor operations initiated by the expansion
+	 * process
+	 */
+	if (test_bit(STRIPE_OP_POSTXOR, &sh->ops.complete) &&
+		!test_bit(STRIPE_OP_BIODRAIN, &sh->ops.pending)) {
+
+		clear_bit(STRIPE_EXPANDING, &sh->state);
+
+		clear_bit(STRIPE_OP_POSTXOR, &sh->ops.pending);
+		clear_bit(STRIPE_OP_POSTXOR, &sh->ops.ack);
+		clear_bit(STRIPE_OP_POSTXOR, &sh->ops.complete);
+
 		for (i= conf->raid_disks; i--;) {
-			set_bit(R5_LOCKED, &sh->dev[i].flags);
-			locked++;
 			set_bit(R5_Wantwrite, &sh->dev[i].flags);
+			if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending))
+				sh->ops.count++;
 		}
-		clear_bit(STRIPE_EXPANDING, &sh->state);
-	} else if (expanded) {
+	}
+
+	if (expanded && test_bit(STRIPE_EXPANDING, &sh->state) &&
+		!test_bit(STRIPE_OP_POSTXOR, &sh->ops.pending)) {
+		/* Need to write out all blocks after computing parity */
+		sh->disks = conf->raid_disks;
+		sh->pd_idx = stripe_to_pdidx(sh->sector, conf, conf->raid_disks);
+		locked += handle_write_operations5(sh, 0, 1);
+	} else if (expanded && !test_bit(STRIPE_OP_POSTXOR, &sh->ops.pending)) {
 		clear_bit(STRIPE_EXPAND_READY, &sh->state);
 		atomic_dec(&conf->reshape_stripes);
 		wake_up(&conf->wait_for_overlap);
@@ -2533,6 +2547,7 @@ static void handle_stripe5(struct stripe_head *sh)
 		/* We have read all the blocks in this stripe and now we need to
 		 * copy some of them into a target stripe for expand.
 		 */
+		struct dma_async_tx_descriptor *tx = NULL;
 		clear_bit(STRIPE_EXPAND_SOURCE, &sh->state);
 		for (i=0; i< sh->disks; i++)
 			if (i != sh->pd_idx) {
@@ -2556,9 +2571,12 @@ static void handle_stripe5(struct stripe_head *sh)
 					release_stripe(sh2);
 					continue;
 				}
-				memcpy(page_address(sh2->dev[dd_idx].page),
-				       page_address(sh->dev[i].page),
-				       STRIPE_SIZE);
+
+				/* place all the copies on one channel */
+				tx = async_memcpy(sh2->dev[dd_idx].page,
+					sh->dev[i].page, 0, 0, STRIPE_SIZE,
+					ASYNC_TX_DEP_ACK, tx, NULL, NULL);
+
 				set_bit(R5_Expanded, &sh2->dev[dd_idx].flags);
 				set_bit(R5_UPTODATE, &sh2->dev[dd_idx].flags);
 				for (j=0; j<conf->raid_disks; j++)
@@ -2570,6 +2588,12 @@ static void handle_stripe5(struct stripe_head *sh)
 					set_bit(STRIPE_HANDLE, &sh2->state);
 				}
 				release_stripe(sh2);
+
+				/* done submitting copies, wait for them to complete */
+				if (i + 1 >= sh->disks) {
+					async_tx_ack(tx);
+					dma_wait_for_async_tx(tx);
+				}
 			}
 	}
 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2.6.21-rc4 11/15] md: move raid5 io requests to raid5_run_ops
  2007-03-23  6:51 [PATCH 2.6.21-rc4 00/15] md raid5 acceleration and async_tx Dan Williams
                   ` (9 preceding siblings ...)
  2007-03-23  6:52 ` [PATCH 2.6.21-rc4 10/15] md: use async_tx and raid5_run_ops for raid5 expansion operations Dan Williams
@ 2007-03-23  6:52 ` Dan Williams
  2007-03-23  6:52 ` [PATCH 2.6.21-rc4 12/15] md: remove raid5 compute_block and compute_parity5 Dan Williams
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2007-03-23  6:52 UTC (permalink / raw)
  To: neilb, christopher.leech, linux-raid, linux-kernel
  Cc: akpm, torvalds, yur, wd, arjan, rmk+kernel

handle_stripe now only updates the state of stripes.  All execution of
operations is moved to raid5_run_ops.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 drivers/md/raid5.c |   68 ++++++++--------------------------------------------
 1 files changed, 10 insertions(+), 58 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 84a3f35..0be26c2 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2360,6 +2360,8 @@ static void handle_stripe5(struct stripe_head *sh)
 						PRINTK("Read_old block %d for r-m-w\n", i);
 						set_bit(R5_LOCKED, &dev->flags);
 						set_bit(R5_Wantread, &dev->flags);
+						if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending))
+							sh->ops.count++;
 						locked++;
 					} else {
 						set_bit(STRIPE_DELAYED, &sh->state);
@@ -2380,6 +2382,8 @@ static void handle_stripe5(struct stripe_head *sh)
 						PRINTK("Read_old block %d for Reconstruct\n", i);
 						set_bit(R5_LOCKED, &dev->flags);
 						set_bit(R5_Wantread, &dev->flags);
+						if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending))
+							sh->ops.count++;
 						locked++;
 					} else {
 						set_bit(STRIPE_DELAYED, &sh->state);
@@ -2479,6 +2483,8 @@ static void handle_stripe5(struct stripe_head *sh)
 
 			set_bit(R5_LOCKED, &dev->flags);
 			set_bit(R5_Wantwrite, &dev->flags);
+			if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending))
+				sh->ops.count++;
 			clear_bit(STRIPE_DEGRADED, &sh->state);
 			locked++;
 			set_bit(STRIPE_INSYNC, &sh->state);
@@ -2500,12 +2506,16 @@ static void handle_stripe5(struct stripe_head *sh)
 		dev = &sh->dev[failed_num];
 		if (!test_bit(R5_ReWrite, &dev->flags)) {
 			set_bit(R5_Wantwrite, &dev->flags);
+			if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending))
+				sh->ops.count++;
 			set_bit(R5_ReWrite, &dev->flags);
 			set_bit(R5_LOCKED, &dev->flags);
 			locked++;
 		} else {
 			/* let's read it back */
 			set_bit(R5_Wantread, &dev->flags);
+			if (!test_and_set_bit(STRIPE_OP_IO, &sh->ops.pending))
+				sh->ops.count++;
 			set_bit(R5_LOCKED, &dev->flags);
 			locked++;
 		}
@@ -2615,64 +2625,6 @@ static void handle_stripe5(struct stripe_head *sh)
 			      test_bit(BIO_UPTODATE, &bi->bi_flags)
 			        ? 0 : -EIO);
 	}
-	for (i=disks; i-- ;) {
-		int rw;
-		struct bio *bi;
-		mdk_rdev_t *rdev;
-		if (test_and_clear_bit(R5_Wantwrite, &sh->dev[i].flags))
-			rw = WRITE;
-		else if (test_and_clear_bit(R5_Wantread, &sh->dev[i].flags))
-			rw = READ;
-		else
-			continue;
- 
-		bi = &sh->dev[i].req;
- 
-		bi->bi_rw = rw;
-		if (rw == WRITE)
-			bi->bi_end_io = raid5_end_write_request;
-		else
-			bi->bi_end_io = raid5_end_read_request;
- 
-		rcu_read_lock();
-		rdev = rcu_dereference(conf->disks[i].rdev);
-		if (rdev && test_bit(Faulty, &rdev->flags))
-			rdev = NULL;
-		if (rdev)
-			atomic_inc(&rdev->nr_pending);
-		rcu_read_unlock();
- 
-		if (rdev) {
-			if (syncing || expanding || expanded)
-				md_sync_acct(rdev->bdev, STRIPE_SECTORS);
-
-			bi->bi_bdev = rdev->bdev;
-			PRINTK("for %llu schedule op %ld on disc %d\n",
-				(unsigned long long)sh->sector, bi->bi_rw, i);
-			atomic_inc(&sh->count);
-			bi->bi_sector = sh->sector + rdev->data_offset;
-			bi->bi_flags = 1 << BIO_UPTODATE;
-			bi->bi_vcnt = 1;	
-			bi->bi_max_vecs = 1;
-			bi->bi_idx = 0;
-			bi->bi_io_vec = &sh->dev[i].vec;
-			bi->bi_io_vec[0].bv_len = STRIPE_SIZE;
-			bi->bi_io_vec[0].bv_offset = 0;
-			bi->bi_size = STRIPE_SIZE;
-			bi->bi_next = NULL;
-			if (rw == WRITE &&
-			    test_bit(R5_ReWrite, &sh->dev[i].flags))
-				atomic_add(STRIPE_SECTORS, &rdev->corrected_errors);
-			generic_make_request(bi);
-		} else {
-			if (rw == WRITE)
-				set_bit(STRIPE_DEGRADED, &sh->state);
-			PRINTK("skip op %ld on disc %d for sector %llu\n",
-				bi->bi_rw, i, (unsigned long long)sh->sector);
-			clear_bit(R5_LOCKED, &sh->dev[i].flags);
-			set_bit(STRIPE_HANDLE, &sh->state);
-		}
-	}
 }
 
 static void handle_stripe6(struct stripe_head *sh, struct page *tmp_page)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2.6.21-rc4 12/15] md: remove raid5 compute_block and compute_parity5
  2007-03-23  6:51 [PATCH 2.6.21-rc4 00/15] md raid5 acceleration and async_tx Dan Williams
                   ` (10 preceding siblings ...)
  2007-03-23  6:52 ` [PATCH 2.6.21-rc4 11/15] md: move raid5 io requests to raid5_run_ops Dan Williams
@ 2007-03-23  6:52 ` Dan Williams
  2007-03-23  6:52 ` [PATCH 2.6.21-rc4 13/15] dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines Dan Williams
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2007-03-23  6:52 UTC (permalink / raw)
  To: neilb, christopher.leech, linux-raid, linux-kernel
  Cc: akpm, torvalds, yur, wd, arjan, rmk+kernel

replaced by raid5_run_ops

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 drivers/md/raid5.c |  124 ----------------------------------------------------
 1 files changed, 0 insertions(+), 124 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 0be26c2..062df02 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1482,130 +1482,6 @@ static void copy_data(int frombio, struct bio *bio,
 			   }						       \
 			} while(0)
 
-
-static void compute_block(struct stripe_head *sh, int dd_idx)
-{
-	int i, count, disks = sh->disks;
-	void *ptr[MAX_XOR_BLOCKS], *dest, *p;
-
-	PRINTK("compute_block, stripe %llu, idx %d\n", 
-		(unsigned long long)sh->sector, dd_idx);
-
-	dest = page_address(sh->dev[dd_idx].page);
-	memset(dest, 0, STRIPE_SIZE);
-	count = 0;
-	for (i = disks ; i--; ) {
-		if (i == dd_idx)
-			continue;
-		p = page_address(sh->dev[i].page);
-		if (test_bit(R5_UPTODATE, &sh->dev[i].flags))
-			ptr[count++] = p;
-		else
-			printk(KERN_ERR "compute_block() %d, stripe %llu, %d"
-				" not present\n", dd_idx,
-				(unsigned long long)sh->sector, i);
-
-		check_xor();
-	}
-	if (count)
-		xor_block(count, STRIPE_SIZE, dest, ptr);
-	set_bit(R5_UPTODATE, &sh->dev[dd_idx].flags);
-}
-
-static void compute_parity5(struct stripe_head *sh, int method)
-{
-	raid5_conf_t *conf = sh->raid_conf;
-	int i, pd_idx = sh->pd_idx, disks = sh->disks, count;
-	void *ptr[MAX_XOR_BLOCKS], *dest;
-	struct bio *chosen;
-
-	PRINTK("compute_parity5, stripe %llu, method %d\n",
-		(unsigned long long)sh->sector, method);
-
-	count = 0;
-	dest = page_address(sh->dev[pd_idx].page);
-	switch(method) {
-	case READ_MODIFY_WRITE:
-		BUG_ON(!test_bit(R5_UPTODATE, &sh->dev[pd_idx].flags));
-		for (i=disks ; i-- ;) {
-			if (i==pd_idx)
-				continue;
-			if (sh->dev[i].towrite &&
-			    test_bit(R5_UPTODATE, &sh->dev[i].flags)) {
-				ptr[count++] = page_address(sh->dev[i].page);
-				chosen = sh->dev[i].towrite;
-				sh->dev[i].towrite = NULL;
-
-				if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
-					wake_up(&conf->wait_for_overlap);
-
-				BUG_ON(sh->dev[i].written);
-				sh->dev[i].written = chosen;
-				check_xor();
-			}
-		}
-		break;
-	case RECONSTRUCT_WRITE:
-		memset(dest, 0, STRIPE_SIZE);
-		for (i= disks; i-- ;)
-			if (i!=pd_idx && sh->dev[i].towrite) {
-				chosen = sh->dev[i].towrite;
-				sh->dev[i].towrite = NULL;
-
-				if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
-					wake_up(&conf->wait_for_overlap);
-
-				BUG_ON(sh->dev[i].written);
-				sh->dev[i].written = chosen;
-			}
-		break;
-	case CHECK_PARITY:
-		break;
-	}
-	if (count) {
-		xor_block(count, STRIPE_SIZE, dest, ptr);
-		count = 0;
-	}
-	
-	for (i = disks; i--;)
-		if (sh->dev[i].written) {
-			sector_t sector = sh->dev[i].sector;
-			struct bio *wbi = sh->dev[i].written;
-			while (wbi && wbi->bi_sector < sector + STRIPE_SECTORS) {
-				copy_data(1, wbi, sh->dev[i].page, sector);
-				wbi = r5_next_bio(wbi, sector);
-			}
-
-			set_bit(R5_LOCKED, &sh->dev[i].flags);
-			set_bit(R5_UPTODATE, &sh->dev[i].flags);
-		}
-
-	switch(method) {
-	case RECONSTRUCT_WRITE:
-	case CHECK_PARITY:
-		for (i=disks; i--;)
-			if (i != pd_idx) {
-				ptr[count++] = page_address(sh->dev[i].page);
-				check_xor();
-			}
-		break;
-	case READ_MODIFY_WRITE:
-		for (i = disks; i--;)
-			if (sh->dev[i].written) {
-				ptr[count++] = page_address(sh->dev[i].page);
-				check_xor();
-			}
-	}
-	if (count)
-		xor_block(count, STRIPE_SIZE, dest, ptr);
-	
-	if (method != CHECK_PARITY) {
-		set_bit(R5_UPTODATE, &sh->dev[pd_idx].flags);
-		set_bit(R5_LOCKED,   &sh->dev[pd_idx].flags);
-	} else
-		clear_bit(R5_UPTODATE, &sh->dev[pd_idx].flags);
-}
-
 static void compute_parity6(struct stripe_head *sh, int method)
 {
 	raid6_conf_t *conf = sh->raid_conf;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2.6.21-rc4 13/15] dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines
  2007-03-23  6:51 [PATCH 2.6.21-rc4 00/15] md raid5 acceleration and async_tx Dan Williams
                   ` (11 preceding siblings ...)
  2007-03-23  6:52 ` [PATCH 2.6.21-rc4 12/15] md: remove raid5 compute_block and compute_parity5 Dan Williams
@ 2007-03-23  6:52 ` Dan Williams
  2007-03-23  6:52 ` [PATCH 2.6.21-rc4 14/15] iop13xx: Surface the iop13xx adma units to the iop-adma driver Dan Williams
  2007-03-23  6:52 ` [PATCH 2.6.21-rc4 15/15] iop3xx: Surface the iop3xx DMA and AAU " Dan Williams
  14 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2007-03-23  6:52 UTC (permalink / raw)
  To: neilb, christopher.leech, linux-raid, linux-kernel
  Cc: akpm, torvalds, yur, wd, arjan, rmk+kernel

This is a driver for the iop DMA/AAU/ADMA units which are capable of pq_xor,
pq_update, pq_zero_sum, xor, dual_xor, xor_zero_sum, fill, copy+crc, and copy
operations.

Changelog:
* fixed a slot allocation bug in do_iop13xx_adma_xor that caused too few
slots to be requested eventually leading to data corruption
* enabled the slot allocation routine to attempt to free slots before
returning -ENOMEM
* switched the cleanup routine to solely use the software chain and the
status register to determine if a descriptor is complete.  This is
necessary to support other IOP engines that do not have status writeback
capability
* make the driver iop generic
* modified the allocation routines to understand allocating a group of
slots for a single operation
* added a null xor initialization operation for the xor only channel on
iop3xx
* support xor operations on buffers larger than the hardware maximum
* split the do_* routines into separate prep, src/dest set, submit stages
* added async_tx support (dependent operations initiation at cleanup time)
* simplified group handling
* added interrupt support (callbacks via tasklets)
* brought the pending depth inline with ioat (i.e. 4 descriptors)
* drop dma mapping methods, suggested by Chris Leech
* don't use inline in C files, Adrian Bunk
* remove static tasklet declarations
* make iop_adma_alloc_slots easier to read and remove chances for a
corrupted descriptor chain
* fix locking bug in iop_adma_alloc_chan_resources, Benjamin Herrenschmidt

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 drivers/dma/Kconfig                 |    8 
 drivers/dma/Makefile                |    1 
 drivers/dma/iop-adma.c              | 1469 +++++++++++++++++++++++++++++++++++
 include/asm-arm/hardware/iop_adma.h |  121 +++
 4 files changed, 1599 insertions(+), 0 deletions(-)

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 292ddad..1c2ae4e 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -40,4 +40,12 @@ config INTEL_IOATDMA
 	default m
 	---help---
 	  Enable support for the Intel(R) I/OAT DMA engine.
+
+config INTEL_IOP_ADMA
+        tristate "Intel IOP ADMA support"
+        depends on DMA_ENGINE && (ARCH_IOP32X || ARCH_IOP33X || ARCH_IOP13XX)
+        default m
+        ---help---
+          Enable support for the Intel(R) IOP Series RAID engines.
+
 endmenu
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index 6a99341..8ebf10d 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -1,4 +1,5 @@
 obj-$(CONFIG_DMA_ENGINE) += dmaengine.o
 obj-$(CONFIG_NET_DMA) += iovlock.o
 obj-$(CONFIG_INTEL_IOATDMA) += ioatdma.o
+obj-$(CONFIG_INTEL_IOP_ADMA) += iop-adma.o
 obj-$(CONFIG_ASYNC_TX_DMA) += async_tx.o xor.o
diff --git a/drivers/dma/iop-adma.c b/drivers/dma/iop-adma.c
new file mode 100644
index 0000000..3ac03ef
--- /dev/null
+++ b/drivers/dma/iop-adma.c
@@ -0,0 +1,1469 @@
+/*
+ * Copyright(c) 2006 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+
+/*
+ * This driver supports the asynchrounous DMA copy and RAID engines available
+ * on the Intel Xscale(R) family of I/O Processors (IOP 32x, 33x, 134x)
+ */
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/async_tx.h>
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/spinlock.h>
+#include <linux/interrupt.h>
+#include <linux/platform_device.h>
+#include <asm/arch/adma.h>
+#include <asm/memory.h>
+
+#define to_iop_adma_chan(chan) container_of(chan, struct iop_adma_chan, common)
+#define to_iop_adma_device(dev) container_of(dev, struct iop_adma_device, common)
+#define tx_to_iop_adma_slot(tx) container_of(tx, struct iop_adma_desc_slot, async_tx)
+
+#define IOP_ADMA_DEBUG 0
+#define PRINTK(x...) ((void)(IOP_ADMA_DEBUG && printk(x)))
+
+/**
+ * iop_adma_free_slots - flags descriptor slots for reuse
+ * @slot: Slot to free
+ * Caller must hold &iop_chan->lock while calling this function
+ */
+static void iop_adma_free_slots(struct iop_adma_desc_slot *slot)
+{
+	int stride = slot->slots_per_op;
+
+	while (stride--) {
+		slot->slots_per_op = 0;
+		slot = list_entry(slot->slot_node.next,
+				struct iop_adma_desc_slot,
+				slot_node);
+	}
+}
+
+static dma_cookie_t
+iop_adma_run_tx_complete_actions(struct iop_adma_desc_slot *desc,
+	struct iop_adma_chan *iop_chan, dma_cookie_t cookie)
+{
+	BUG_ON(desc->async_tx.cookie < 0);
+	spin_lock_bh(&desc->async_tx.lock);
+	if (desc->async_tx.cookie > 0) {
+		cookie = desc->async_tx.cookie;
+		desc->async_tx.cookie = 0;
+
+		/* call the callback (must not sleep or submit new
+		 * operations to this channel)
+		 */
+		if (desc->async_tx.callback)
+			desc->async_tx.callback(
+				desc->async_tx.callback_param);
+
+		/* unmap dma addresses
+		 * (unmap_single vs unmap_page?)
+		 */
+		if (desc->group_head && desc->async_tx.type != DMA_INTERRUPT) {
+			struct iop_adma_desc_slot *unmap = desc->group_head;
+			struct device *dev =
+				&iop_chan->device->pdev->dev;
+			u32 len = unmap->unmap_len;
+			u32 src_cnt = unmap->unmap_src_cnt;
+			dma_addr_t addr = iop_desc_get_dest_addr(unmap,
+				iop_chan);
+
+			dma_unmap_page(dev, addr, len, DMA_FROM_DEVICE);
+			while(src_cnt--) {
+				addr = iop_desc_get_src_addr(unmap,
+							iop_chan,
+							src_cnt);
+				dma_unmap_page(dev, addr, len,
+					DMA_TO_DEVICE);
+			}
+			desc->group_head = NULL;
+		}
+	}
+
+	/* run dependent operations */
+	async_tx_run_dependencies(&desc->async_tx, &iop_chan->common);
+	spin_unlock_bh(&desc->async_tx.lock);
+
+	return cookie;
+}
+
+static int
+iop_adma_clean_slot(struct iop_adma_desc_slot *desc,
+	struct iop_adma_chan *iop_chan)
+{
+	/* the client is allowed to attach dependent operations
+	 * until 'ack' is set
+	 */
+	if (!desc->async_tx.ack)
+		return 0;
+
+	/* leave the last descriptor in the chain
+	 * so we can append to it
+	 */
+	if (desc->chain_node.next == &iop_chan->chain)
+		return 1;
+
+	PRINTK("\tfree slot: %d slots_per_op: %d\n", desc->idx,
+		desc->slots_per_op);
+
+	list_del(&desc->chain_node);
+	iop_adma_free_slots(desc);
+
+	return 0;
+}
+
+static void __iop_adma_slot_cleanup(struct iop_adma_chan *iop_chan)
+{
+	struct iop_adma_desc_slot *iter, *_iter, *group_start = NULL;
+	dma_cookie_t cookie = 0;
+	u32 current_desc = iop_chan_get_current_descriptor(iop_chan);
+	int busy = iop_chan_is_busy(iop_chan);
+	int seen_current = 0, slot_cnt = 0, slots_per_op = 0;
+
+	PRINTK("iop adma%d: %s\n", iop_chan->device->id, __FUNCTION__);
+	/* free completed slots from the chain starting with
+	 * the oldest descriptor
+	 */
+	list_for_each_entry_safe(iter, _iter, &iop_chan->chain,
+					chain_node) {
+		PRINTK("\tcookie: %d slot: %d busy: %d "
+			"this_desc: %#x next_desc: %#x ack: %d\n",
+			iter->async_tx.cookie, iter->idx, busy, iter->phys,
+			iop_desc_get_next_desc(iter),
+			iter->async_tx.ack);
+		prefetch(_iter);
+		prefetch(&_iter->async_tx);
+
+		/* do not advance past the current descriptor loaded into the
+		 * hardware channel, subsequent descriptors are either in process
+		 * or have not been submitted
+		 */
+		if (seen_current)
+			break;
+
+		/* stop the search if we reach the current descriptor and the
+		 * channel is busy, or if it appears that the current descriptor
+		 * needs to be re-read (i.e. has been appended to)
+		 */
+		if (iter->phys == current_desc) {
+			BUG_ON(seen_current++);
+			if (busy || iop_desc_get_next_desc(iter))
+				break;
+		}
+
+		/* detect the start of a group transaction */
+		if (!slot_cnt && !slots_per_op) {
+			slot_cnt = iter->slot_cnt;
+			slots_per_op = iter->slots_per_op;
+			if (slot_cnt <= slots_per_op) {
+				slot_cnt = 0;
+				slots_per_op = 0;
+			}
+		}
+
+		if (slot_cnt) {
+			PRINTK("\tgroup++\n");
+			if (!group_start)
+				group_start = iter;
+			slot_cnt -= slots_per_op;
+		}
+
+		/* all the members of a group are complete */
+		if (slots_per_op != 0 && slot_cnt == 0) {
+			struct iop_adma_desc_slot *grp_iter, *_grp_iter;
+			int end_of_chain = 0;
+			PRINTK("\tgroup end\n");
+
+			/* collect the total results */
+			if (group_start->xor_check_result) {
+				u32 zero_sum_result = 0;
+				slot_cnt = group_start->slot_cnt;
+				grp_iter = group_start;
+
+				list_for_each_entry_from(grp_iter,
+					&iop_chan->chain, chain_node) {
+					zero_sum_result |=
+						iop_desc_get_zero_result(grp_iter);
+					PRINTK("\titer%d result: %d\n", grp_iter->idx,
+						zero_sum_result);
+					slot_cnt -= slots_per_op;
+					if (slot_cnt == 0)
+						break;
+				}
+				PRINTK("\tgroup_start->xor_check_result: %p\n",
+					group_start->xor_check_result);
+				*group_start->xor_check_result = zero_sum_result;
+			}
+
+			/* clean up the group */
+			slot_cnt = group_start->slot_cnt;
+			grp_iter = group_start;
+			list_for_each_entry_safe_from(grp_iter, _grp_iter,
+				&iop_chan->chain, chain_node) {
+				cookie = iop_adma_run_tx_complete_actions(
+					grp_iter, iop_chan, cookie);
+
+				slot_cnt -= slots_per_op;
+				end_of_chain = iop_adma_clean_slot(grp_iter,
+					iop_chan);
+
+				if (slot_cnt == 0 || end_of_chain)
+					break;
+			}
+
+			/* the group should be complete at this point */
+			BUG_ON(slot_cnt);
+
+			slots_per_op = 0;
+			group_start = NULL;
+			if (end_of_chain)
+				break;
+			else
+				continue;
+		} else if (slots_per_op) /* wait for group completion */
+			continue;
+
+		/* write back zero sum results (single descriptor case) */
+		if (iter->xor_check_result && iter->async_tx.cookie)
+			*iter->xor_check_result = iop_desc_get_zero_result(iter);
+
+		cookie = iop_adma_run_tx_complete_actions(iter, iop_chan, cookie);
+		if (iop_adma_clean_slot(iter, iop_chan))
+			break;
+	}
+
+	BUG_ON(!seen_current);
+
+	iop_chan_idle(busy, iop_chan);
+
+	if (cookie > 0) {
+		iop_chan->completed_cookie = cookie;
+		PRINTK("\tcompleted cookie %d\n", cookie);
+	}
+}
+
+static void
+iop_adma_slot_cleanup(struct iop_adma_chan *iop_chan)
+{
+	spin_lock_bh(&iop_chan->lock);
+	__iop_adma_slot_cleanup(iop_chan);
+	spin_unlock_bh(&iop_chan->lock);
+}
+
+static void iop_adma_tasklet(unsigned long data)
+{
+	struct iop_adma_chan *chan = (struct iop_adma_chan *) data;
+	__iop_adma_slot_cleanup(chan);
+}
+
+static struct iop_adma_desc_slot *
+iop_adma_alloc_slots(struct iop_adma_chan *iop_chan, int num_slots,
+			int slots_per_op)
+{
+	struct iop_adma_desc_slot *iter, *_iter, *alloc_start = NULL;
+	struct list_head chain = LIST_HEAD_INIT(chain);
+	int slots_found, retry = 0;
+
+	/* start search from the last allocated descrtiptor
+	 * if a contiguous allocation can not be found start searching
+	 * from the beginning of the list
+	 */
+retry:
+	slots_found = 0;
+	if (retry == 0)
+		iter = iop_chan->last_used;
+	else
+		iter = list_entry(&iop_chan->all_slots,
+			struct iop_adma_desc_slot,
+			slot_node);
+
+	list_for_each_entry_safe_continue(iter, _iter, &iop_chan->all_slots, slot_node) {
+		prefetch(_iter);
+		prefetch(&_iter->async_tx);
+		if (iter->slots_per_op) {
+			/* give up after finding the first busy slot
+			 * on the second pass through the list
+			 */
+			if (retry)
+				break;
+
+			slots_found = 0;
+			continue;
+		}
+
+		/* start the allocation if the slot is correctly aligned */
+		if (!slots_found++) {
+			if (iop_desc_is_aligned(iter, slots_per_op))
+				alloc_start = iter;
+			else {
+				slots_found = 0;
+				continue;
+			}
+		}
+
+		if (slots_found == num_slots) {
+			struct iop_adma_desc_slot *alloc_tail = NULL;
+			struct iop_adma_desc_slot *last_used = NULL;
+			iter = alloc_start;
+			while (num_slots) {
+				int i;
+				PRINTK("iop adma%d: allocated slot: %d "
+					"(desc %p phys: %#x) slots_per_op %d\n",
+					iop_chan->device->id,
+					iter->idx, iter->hw_desc, iter->phys,
+					slots_per_op);
+
+				/* pre-ack all but the last descriptor */
+				if (num_slots != slots_per_op)
+					iter->async_tx.ack = 1;
+				else
+					iter->async_tx.ack = 0;
+
+				list_add_tail(&iter->chain_node, &chain);
+				alloc_tail = iter;
+				iter->async_tx.cookie = 0;
+				iter->slot_cnt = num_slots;
+				iter->xor_check_result = NULL;
+				for (i = 0; i < slots_per_op; i++) {
+					iter->slots_per_op = slots_per_op - i;
+					last_used = iter;
+					iter = list_entry(iter->slot_node.next,
+						struct iop_adma_desc_slot,
+						slot_node);
+				}
+				num_slots -= slots_per_op;
+			}
+			alloc_tail->group_head = alloc_start;
+			alloc_tail->async_tx.cookie = -EBUSY;
+			list_splice(&chain, &alloc_tail->group_list);
+			iop_chan->last_used = last_used;
+			iop_desc_clear_next_desc(alloc_start);
+			iop_desc_clear_next_desc(alloc_tail);
+			return alloc_tail;
+		}
+	}
+	if (!retry++)
+		goto retry;
+
+	/* try to free some slots if the allocation fails */
+	tasklet_schedule(&iop_chan->irq_tasklet);
+	
+	return NULL;
+}
+
+static void iop_chan_start_null_memcpy(struct iop_adma_chan *iop_chan);
+static void iop_chan_start_null_xor(struct iop_adma_chan *iop_chan);
+
+/* returns the number of allocated descriptors */
+static int iop_adma_alloc_chan_resources(struct dma_chan *chan)
+{
+	char *hw_desc;
+	int idx;
+	struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
+	struct iop_adma_desc_slot *slot = NULL;
+	int init = iop_chan->slots_allocated ? 0 : 1;
+	struct iop_adma_platform_data *plat_data =
+		iop_chan->device->pdev->dev.platform_data;
+	int num_descs_in_pool = plat_data->pool_size/IOP_ADMA_SLOT_SIZE;
+
+	/* Allocate descriptor slots */
+	do {
+		idx = iop_chan->slots_allocated;
+		if (idx == num_descs_in_pool)
+			break;
+
+		slot = kzalloc(sizeof(*slot), GFP_KERNEL);
+		if (!slot) {
+			printk(KERN_INFO "IOP ADMA Channel only initialized"
+				" %d descriptor slots", idx);
+			break;
+		}
+		hw_desc = (char *) iop_chan->device->dma_desc_pool_virt;
+		slot->hw_desc = (void *) &hw_desc[idx * IOP_ADMA_SLOT_SIZE];
+
+		dma_async_tx_descriptor_init(&slot->async_tx, chan);
+		INIT_LIST_HEAD(&slot->chain_node);
+		INIT_LIST_HEAD(&slot->slot_node);
+		INIT_LIST_HEAD(&slot->group_list);
+		hw_desc = (char *) iop_chan->device->dma_desc_pool;
+		slot->phys = (dma_addr_t) &hw_desc[idx * IOP_ADMA_SLOT_SIZE];
+		slot->idx = idx;
+
+		spin_lock_bh(&iop_chan->lock);
+		iop_chan->slots_allocated++;
+		list_add_tail(&slot->slot_node, &iop_chan->all_slots);
+		spin_unlock_bh(&iop_chan->lock);
+	} while (iop_chan->slots_allocated < num_descs_in_pool);
+
+	if (idx && !iop_chan->last_used)
+		iop_chan->last_used = list_entry(iop_chan->all_slots.next,
+					struct iop_adma_desc_slot,
+					slot_node);
+
+	PRINTK("iop adma%d: allocated %d descriptor slots last_used: %p\n",
+		iop_chan->device->id, iop_chan->slots_allocated,
+		iop_chan->last_used);
+
+	/* initialize the channel and the chain with a null operation */
+	if (init) {
+		if (test_bit(DMA_MEMCPY,
+			&iop_chan->device->common.capabilities))
+			iop_chan_start_null_memcpy(iop_chan);
+		else if (test_bit(DMA_XOR,
+			&iop_chan->device->common.capabilities))
+			iop_chan_start_null_xor(iop_chan);
+		else
+			BUG();
+	}
+
+	return (idx > 0) ? idx : -ENOMEM;
+}
+
+static dma_cookie_t
+iop_desc_assign_cookie(struct iop_adma_chan *iop_chan,
+	struct iop_adma_desc_slot *desc)
+{
+	dma_cookie_t cookie = iop_chan->common.cookie;
+	cookie++;
+	if (cookie < 0)
+		cookie = 1;
+	iop_chan->common.cookie = desc->async_tx.cookie = cookie;
+	return cookie;
+}
+
+static void iop_adma_check_threshold(struct iop_adma_chan *iop_chan)
+{
+	PRINTK("iop adma%d: pending: %d\n", iop_chan->device->id,
+		iop_chan->pending);
+
+	if (iop_chan->pending >= IOP_ADMA_THRESHOLD) {
+		iop_chan->pending = 0;
+		iop_chan_append(iop_chan);
+	}
+}
+
+static dma_cookie_t
+iop_adma_tx_submit(struct dma_async_tx_descriptor *tx)
+{
+	struct iop_adma_desc_slot *sw_desc = tx_to_iop_adma_slot(tx);
+	struct iop_adma_chan *iop_chan = to_iop_adma_chan(tx->chan);
+	struct iop_adma_desc_slot *group_start, *old_chain_tail;
+	int slot_cnt;
+	int slots_per_op;
+	dma_cookie_t cookie;
+
+	group_start = sw_desc->group_head;
+	slot_cnt = group_start->slot_cnt;
+	slots_per_op = group_start->slots_per_op;
+
+	spin_lock_bh(&iop_chan->lock);
+	cookie = iop_desc_assign_cookie(iop_chan, sw_desc);
+
+	old_chain_tail = list_entry(iop_chan->chain.prev,
+		struct iop_adma_desc_slot, chain_node);
+	list_splice_init(&sw_desc->group_list, &old_chain_tail->chain_node);
+
+	/* fix up the hardware chain */
+	iop_desc_set_next_desc(old_chain_tail, group_start->phys);
+
+	/* 1/ don't add pre-chained descriptors
+	 * 2/ dummy read to flush next_desc write
+	 */
+	BUG_ON(iop_desc_get_next_desc(sw_desc));
+
+	/* increment the pending count by the number of slots
+	 * memcpy operations have a 1:1 (slot:operation) relation
+	 * other operations are heavier and will pop the threshold
+	 * more often.
+	 */
+	iop_chan->pending += slot_cnt;
+	iop_adma_check_threshold(iop_chan);
+	spin_unlock_bh(&iop_chan->lock);
+
+	PRINTK("iop adma%d: %s cookie: %d slot: %d\n", iop_chan->device->id,
+		__FUNCTION__, sw_desc->async_tx.cookie, sw_desc->idx);
+
+	return cookie;
+}
+
+struct dma_async_tx_descriptor *
+iop_adma_prep_dma_interrupt(struct dma_chan *chan)
+{
+	struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
+	struct iop_adma_desc_slot *sw_desc, *group_start;
+	int slot_cnt, slots_per_op;
+
+	PRINTK("iop adma%d: %s\n", iop_chan->device->id, __FUNCTION__);
+
+	spin_lock_bh(&iop_chan->lock);
+	slot_cnt = iop_chan_interrupt_slot_count(&slots_per_op, iop_chan);
+	sw_desc = iop_adma_alloc_slots(iop_chan, slot_cnt, slots_per_op);
+	if (sw_desc) {
+		group_start = sw_desc->group_head;
+		iop_desc_init_interrupt(group_start, iop_chan);
+		sw_desc->async_tx.type = DMA_INTERRUPT;
+	}
+	spin_unlock_bh(&iop_chan->lock);
+
+	return sw_desc ? &sw_desc->async_tx : NULL;
+}
+
+struct dma_async_tx_descriptor *
+iop_adma_prep_dma_memcpy(struct dma_chan *chan, size_t len, int int_en)
+{
+	struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
+	struct iop_adma_desc_slot *sw_desc, *group_start;
+	int slot_cnt, slots_per_op;
+
+	if (unlikely(!len))
+		return NULL;
+	BUG_ON(unlikely(len > IOP_ADMA_MAX_BYTE_COUNT));
+
+	PRINTK("iop adma%d: %s len: %u\n",
+	iop_chan->device->id, __FUNCTION__, len);
+
+	spin_lock_bh(&iop_chan->lock);
+	slot_cnt = iop_chan_memcpy_slot_count(len, &slots_per_op);
+	sw_desc = iop_adma_alloc_slots(iop_chan, slot_cnt, slots_per_op);
+	if (sw_desc) {
+		group_start = sw_desc->group_head;
+		iop_desc_init_memcpy(group_start, int_en);
+		iop_desc_set_byte_count(group_start, iop_chan, len);
+		sw_desc->unmap_src_cnt = 1;
+		sw_desc->unmap_len = len;
+		sw_desc->async_tx.type = DMA_MEMCPY;
+	}
+	spin_unlock_bh(&iop_chan->lock);
+
+	return sw_desc ? &sw_desc->async_tx : NULL;
+}
+
+struct dma_async_tx_descriptor *
+iop_adma_prep_dma_memset(struct dma_chan *chan, int value, size_t len,
+	int int_en)
+{
+	struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
+	struct iop_adma_desc_slot *sw_desc, *group_start;
+	int slot_cnt, slots_per_op;
+
+	if (unlikely(!len))
+		return NULL;
+	BUG_ON(unlikely(len > IOP_ADMA_MAX_BYTE_COUNT));
+
+	PRINTK("iop adma%d: %s len: %u\n",
+	iop_chan->device->id, __FUNCTION__, len);
+
+	spin_lock_bh(&iop_chan->lock);
+	slot_cnt = iop_chan_memset_slot_count(len, &slots_per_op);
+	sw_desc = iop_adma_alloc_slots(iop_chan, slot_cnt, slots_per_op);
+	if (sw_desc) {
+		group_start = sw_desc->group_head;
+		iop_desc_init_memset(group_start, int_en);
+		iop_desc_set_byte_count(group_start, iop_chan, len);
+		iop_desc_set_block_fill_val(group_start, value);
+		sw_desc->unmap_src_cnt = 1;
+		sw_desc->unmap_len = len;
+		sw_desc->async_tx.type = DMA_MEMSET;
+	}
+	spin_unlock_bh(&iop_chan->lock);
+
+	return sw_desc ? &sw_desc->async_tx : NULL;
+}
+
+struct dma_async_tx_descriptor *
+iop_adma_prep_dma_xor(struct dma_chan *chan, unsigned int src_cnt, size_t len,
+	int int_en)
+{
+	struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
+	struct iop_adma_desc_slot *sw_desc, *group_start;
+	int slot_cnt, slots_per_op;
+
+	if (unlikely(!len))
+		return NULL;
+	BUG_ON(unlikely(len > IOP_ADMA_XOR_MAX_BYTE_COUNT));
+
+	PRINTK("iop adma%d: %s src_cnt: %d len: %u int_en: %d\n",
+	iop_chan->device->id, __FUNCTION__, src_cnt, len, int_en);
+
+	spin_lock_bh(&iop_chan->lock);
+	slot_cnt = iop_chan_xor_slot_count(len, src_cnt, &slots_per_op);
+	sw_desc = iop_adma_alloc_slots(iop_chan, slot_cnt, slots_per_op);
+	if (sw_desc) {
+		group_start = sw_desc->group_head;
+		iop_desc_init_xor(group_start, src_cnt, int_en);
+		iop_desc_set_byte_count(group_start, iop_chan, len);
+		sw_desc->unmap_src_cnt = src_cnt;
+		sw_desc->unmap_len = len;
+		sw_desc->async_tx.type = DMA_XOR;
+	}
+	spin_unlock_bh(&iop_chan->lock);
+
+	return sw_desc ? &sw_desc->async_tx : NULL;
+}
+
+struct dma_async_tx_descriptor *
+iop_adma_prep_dma_zero_sum(struct dma_chan *chan, unsigned int src_cnt,
+	size_t len, u32 *result, int int_en)
+{
+	struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
+	struct iop_adma_desc_slot *sw_desc, *group_start;
+	int slot_cnt, slots_per_op;
+
+	if (unlikely(!len))
+		return NULL;
+
+	PRINTK("iop adma%d: %s src_cnt: %d len: %u\n",
+	iop_chan->device->id, __FUNCTION__, src_cnt, len);
+
+	spin_lock_bh(&iop_chan->lock);
+	slot_cnt = iop_chan_zero_sum_slot_count(len, src_cnt, &slots_per_op);
+	sw_desc = iop_adma_alloc_slots(iop_chan, slot_cnt, slots_per_op);
+	if (sw_desc) {
+		group_start = sw_desc->group_head;
+		iop_desc_init_zero_sum(group_start, src_cnt, int_en);
+		iop_desc_set_zero_sum_byte_count(group_start, len);
+		group_start->xor_check_result = result;
+		PRINTK("\t%s: group_start->xor_check_result: %p\n",
+			__FUNCTION__, group_start->xor_check_result);
+		sw_desc->unmap_src_cnt = src_cnt;
+		sw_desc->unmap_len = len;
+		sw_desc->async_tx.type = DMA_ZERO_SUM;
+	}
+	spin_unlock_bh(&iop_chan->lock);
+
+	return sw_desc ? &sw_desc->async_tx : NULL;
+}
+
+static void
+iop_adma_set_dest(dma_addr_t addr, struct dma_async_tx_descriptor *tx,
+	int index)
+{
+	struct iop_adma_desc_slot *sw_desc = tx_to_iop_adma_slot(tx);
+	struct iop_adma_chan *iop_chan = to_iop_adma_chan(tx->chan);
+
+	/* to do: support transfers lengths > IOP_ADMA_MAX_BYTE_COUNT */
+	iop_desc_set_dest_addr(sw_desc->group_head, iop_chan, addr);
+}
+
+static void
+iop_adma_set_src(dma_addr_t addr, struct dma_async_tx_descriptor *tx,
+	int index)
+{
+	struct iop_adma_desc_slot *sw_desc = tx_to_iop_adma_slot(tx);
+	struct iop_adma_desc_slot *group_start = sw_desc->group_head;
+
+	switch (tx->type) {
+	case DMA_MEMCPY:
+		iop_desc_set_memcpy_src_addr(group_start, addr);
+		break;
+	case DMA_XOR:
+		iop_desc_set_xor_src_addr(group_start, index, addr);
+		break;
+	case DMA_ZERO_SUM:
+		iop_desc_set_zero_sum_src_addr(group_start, index, addr);
+		break;
+	/* todo: case DMA_PQ_XOR: */
+	/* todo: case DMA_DUAL_XOR: */
+	/* todo: case DMA_PQ_UPDATE: */
+	/* todo: case DMA_PQ_ZERO_SUM: */
+	/* todo: case DMA_MEMCPY_CRC32C: */
+	case DMA_MEMSET:
+	default:
+		do {
+			struct iop_adma_chan *iop_chan =
+				to_iop_adma_chan(tx->chan);
+			printk(KERN_ERR "iop adma%d: unsupport tx_type: %d\n",
+				iop_chan->device->id, tx->type);
+			BUG();
+		} while (0);
+	}
+}
+
+static void iop_adma_dependency_added(struct dma_chan *chan)
+{
+	struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
+	tasklet_schedule(&iop_chan->irq_tasklet);
+}
+
+static void iop_adma_free_chan_resources(struct dma_chan *chan)
+{
+	struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
+	struct iop_adma_desc_slot *iter, *_iter;
+	int in_use_descs = 0;
+
+	iop_adma_slot_cleanup(iop_chan);
+
+	spin_lock_bh(&iop_chan->lock);
+	list_for_each_entry_safe(iter, _iter, &iop_chan->chain,
+					chain_node) {
+		in_use_descs++;
+		list_del(&iter->chain_node);
+	}
+	list_for_each_entry_safe_reverse(iter, _iter, &iop_chan->all_slots, slot_node) {
+		list_del(&iter->slot_node);
+		kfree(iter);
+		iop_chan->slots_allocated--;
+	}
+	iop_chan->last_used = NULL;
+
+	PRINTK("iop adma%d %s slots_allocated %d\n", iop_chan->device->id,
+		__FUNCTION__, iop_chan->slots_allocated);
+	spin_unlock_bh(&iop_chan->lock);
+
+	/* one is ok since we left it on there on purpose */
+	if (in_use_descs > 1)
+		printk(KERN_ERR "IOP: Freeing %d in use descriptors!\n",
+			in_use_descs - 1);
+}
+
+/**
+ * iop_adma_is_complete - poll the status of an ADMA transaction
+ * @chan: ADMA channel handle
+ * @cookie: ADMA transaction identifier
+ */
+static enum dma_status iop_adma_is_complete(struct dma_chan *chan,
+                                            dma_cookie_t cookie,
+                                            dma_cookie_t *done,
+                                            dma_cookie_t *used)
+{
+	struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
+	dma_cookie_t last_used;
+	dma_cookie_t last_complete;
+	enum dma_status ret;
+
+	last_used = chan->cookie;
+	last_complete = iop_chan->completed_cookie;
+
+	if (done)
+		*done= last_complete;
+	if (used)
+		*used = last_used;
+
+	ret = dma_async_is_complete(cookie, last_complete, last_used);
+	if (ret == DMA_SUCCESS)
+		return ret;
+
+	iop_adma_slot_cleanup(iop_chan);
+
+	last_used = chan->cookie;
+	last_complete = iop_chan->completed_cookie;
+
+	if (done)
+		*done= last_complete;
+	if (used)
+		*used = last_used;
+
+	return dma_async_is_complete(cookie, last_complete, last_used);
+}
+
+static irqreturn_t iop_adma_eot_handler(int irq, void *data)
+{
+	struct iop_adma_chan *chan = data;
+
+	PRINTK("iop adma%d: %s\n", chan->device->id, __FUNCTION__);
+
+	tasklet_schedule(&chan->irq_tasklet);
+
+	iop_adma_device_clear_eot_status(chan);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t iop_adma_eoc_handler(int irq, void *data)
+{
+	struct iop_adma_chan *chan = data;
+
+	PRINTK("iop adma%d: %s\n", chan->device->id, __FUNCTION__);
+
+	tasklet_schedule(&chan->irq_tasklet);
+
+	iop_adma_device_clear_eoc_status(chan);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t iop_adma_err_handler(int irq, void *data)
+{
+	struct iop_adma_chan *chan = data;
+	int id = chan->device->id;
+	unsigned long status = iop_chan_get_status(chan);
+
+	printk(KERN_ERR "iop adma%d: error ( %s%s%s%s%s%s%s)\n", id,
+		iop_is_err_int_parity(status, chan) ? "int_parity " : "",
+		iop_is_err_mcu_abort(status, chan) ? "mcu_abort " : "",
+		iop_is_err_int_tabort(status, chan) ? "int_tabort " : "",
+		iop_is_err_int_mabort(status, chan) ? "int_mabort " : "",
+		iop_is_err_pci_tabort(status, chan) ? "pci_tabort " : "",
+		iop_is_err_pci_mabort(status, chan) ? "pci_mabort " : "",
+		iop_is_err_split_tx(status, chan) ? "split_tx " : "");
+
+	iop_adma_device_clear_err_status(chan);
+
+	BUG();
+
+	return IRQ_HANDLED;
+}
+
+static void iop_adma_issue_pending(struct dma_chan *chan)
+{
+	struct iop_adma_chan *iop_chan = to_iop_adma_chan(chan);
+
+	if (iop_chan->pending) {
+		iop_chan->pending = 0;
+		iop_chan_append(iop_chan);
+	}
+}
+
+/*
+ * Perform a transaction to verify the HW works.
+ */
+#define IOP_ADMA_TEST_SIZE 2000
+
+static int __devinit iop_adma_memcpy_self_test(struct iop_adma_device *device)
+{
+	int i;
+	void *src, *dest;
+	dma_addr_t src_dma, dest_dma;
+	struct dma_chan *dma_chan;
+	dma_cookie_t cookie;
+	struct dma_async_tx_descriptor *tx;
+	int err = 0;
+	struct iop_adma_chan *iop_chan;
+
+	PRINTK("iop adma%d: %s\n", device->id, __FUNCTION__);
+
+	src = kzalloc(sizeof(u8) * IOP_ADMA_TEST_SIZE, GFP_KERNEL);
+	if (!src)
+		return -ENOMEM;
+	dest = kzalloc(sizeof(u8) * IOP_ADMA_TEST_SIZE, GFP_KERNEL);
+	if (!dest) {
+		kfree(src);
+		return -ENOMEM;
+	}
+
+	/* Fill in src buffer */
+	for (i = 0; i < IOP_ADMA_TEST_SIZE; i++)
+		((u8 *) src)[i] = (u8)i;
+
+	memset(dest, 0, IOP_ADMA_TEST_SIZE);
+
+	/* Start copy, using first DMA channel */
+	dma_chan = container_of(device->common.channels.next,
+	                        struct dma_chan,
+	                        device_node);
+	if (iop_adma_alloc_chan_resources(dma_chan) < 1) {
+		err = -ENODEV;
+		goto out;
+	}
+
+	tx = iop_adma_prep_dma_memcpy(dma_chan, IOP_ADMA_TEST_SIZE, 1);
+	dest_dma = dma_map_single(dma_chan->device->dev, dest, IOP_ADMA_TEST_SIZE, DMA_FROM_DEVICE);
+	iop_adma_set_dest(dest_dma, tx, 0);
+	src_dma = dma_map_single(dma_chan->device->dev, src, IOP_ADMA_TEST_SIZE, DMA_TO_DEVICE);
+	iop_adma_set_src(src_dma, tx, 0);
+
+	cookie = iop_adma_tx_submit(tx);
+	iop_adma_issue_pending(dma_chan);
+	async_tx_ack(tx);
+	msleep(1);
+
+	if (iop_adma_is_complete(dma_chan, cookie, NULL, NULL) != DMA_SUCCESS) {
+		printk(KERN_ERR "iop adma%d: Self-test copy timed out, disabling\n",
+			device->id);
+		err = -ENODEV;
+		goto free_resources;
+	}
+
+	iop_chan = to_iop_adma_chan(dma_chan);
+	dma_sync_single_for_cpu(&iop_chan->device->pdev->dev, dest_dma,
+		IOP_ADMA_TEST_SIZE, DMA_FROM_DEVICE);
+	if (memcmp(src, dest, IOP_ADMA_TEST_SIZE)) {
+		printk(KERN_ERR "iop adma%d: Self-test copy failed compare, disabling\n",
+			device->id);
+		err = -ENODEV;
+		goto free_resources;
+	}
+
+free_resources:
+	iop_adma_free_chan_resources(dma_chan);
+out:
+	kfree(src);
+	kfree(dest);
+	return err;
+}
+
+#define IOP_ADMA_NUM_SRC_TEST 4 /* must be <= 15 */
+static int __devinit iop_adma_xor_zero_sum_self_test(struct iop_adma_device *device)
+{
+	int i, src_idx;
+	struct page *dest;
+	struct page *xor_srcs[IOP_ADMA_NUM_SRC_TEST];
+	struct page *zero_sum_srcs[IOP_ADMA_NUM_SRC_TEST + 1];
+	dma_addr_t dma_addr, dest_dma;
+	struct dma_async_tx_descriptor *tx;
+	struct dma_chan *dma_chan;
+	dma_cookie_t cookie;
+	u8 cmp_byte = 0;
+	u32 cmp_word;
+	u32 zero_sum_result;
+	int err = 0;
+	struct iop_adma_chan *iop_chan;
+
+	PRINTK("iop adma%d: %s\n", device->id, __FUNCTION__);
+
+	for (src_idx = 0; src_idx < IOP_ADMA_NUM_SRC_TEST; src_idx++) {
+		xor_srcs[src_idx] = alloc_page(GFP_KERNEL);
+		if (!xor_srcs[src_idx])
+			while (src_idx--) {
+				__free_page(xor_srcs[src_idx]);
+				return -ENOMEM;
+			}
+	}
+
+	dest = alloc_page(GFP_KERNEL);
+	if (!dest)
+		while (src_idx--) {
+			__free_page(xor_srcs[src_idx]);
+			return -ENOMEM;
+		}
+
+	/* Fill in src buffers */
+	for (src_idx = 0; src_idx < IOP_ADMA_NUM_SRC_TEST; src_idx++) {
+		u8 *ptr = page_address(xor_srcs[src_idx]);
+		for (i = 0; i < PAGE_SIZE; i++)
+			ptr[i] = (1 << src_idx);
+	}
+
+	for (src_idx = 0; src_idx < IOP_ADMA_NUM_SRC_TEST; src_idx++)
+		cmp_byte ^= (u8) (1 << src_idx);
+
+	cmp_word = (cmp_byte << 24) | (cmp_byte << 16) | (cmp_byte << 8) | cmp_byte;
+
+	memset(page_address(dest), 0, PAGE_SIZE);
+
+	dma_chan = container_of(device->common.channels.next,
+	                        struct dma_chan,
+	                        device_node);
+	if (iop_adma_alloc_chan_resources(dma_chan) < 1) {
+		err = -ENODEV;
+		goto out;
+	}
+
+	/* test xor */
+	tx = iop_adma_prep_dma_xor(dma_chan, IOP_ADMA_NUM_SRC_TEST, PAGE_SIZE, 1);
+	dest_dma = dma_map_page(dma_chan->device->dev, dest, 0, PAGE_SIZE, DMA_FROM_DEVICE);
+	iop_adma_set_dest(dest_dma, tx, 0);
+
+	for (i = 0; i < IOP_ADMA_NUM_SRC_TEST; i++) {
+		dma_addr = dma_map_page(dma_chan->device->dev, xor_srcs[i], 0,
+			PAGE_SIZE, DMA_TO_DEVICE);
+		iop_adma_set_src(dma_addr, tx, i);
+	}
+
+	cookie = iop_adma_tx_submit(tx);
+	iop_adma_issue_pending(dma_chan);
+	async_tx_ack(tx);
+	msleep(8);
+
+	if (iop_adma_is_complete(dma_chan, cookie, NULL, NULL) != DMA_SUCCESS) {
+		printk(KERN_ERR "iop_adma: Self-test xor timed out, disabling\n");
+		err = -ENODEV;
+		goto free_resources;
+	}
+
+	iop_chan = to_iop_adma_chan(dma_chan);
+	dma_sync_single_for_cpu(&iop_chan->device->pdev->dev, dest_dma,
+		PAGE_SIZE, DMA_FROM_DEVICE);
+	for (i = 0; i < (PAGE_SIZE / sizeof(u32)); i++) {
+		u32 *ptr = page_address(dest);
+		if (ptr[i] != cmp_word) {
+			printk(KERN_ERR "iop_adma: Self-test xor failed compare, disabling\n");
+			err = -ENODEV;
+			goto free_resources;
+		}
+	}
+	dma_sync_single_for_device(&iop_chan->device->pdev->dev, dest_dma,
+		PAGE_SIZE, DMA_TO_DEVICE);
+
+	/* skip zero sum if the capability is not present */
+	if (!test_bit(DMA_ZERO_SUM, &dma_chan->device->capabilities))
+		goto free_resources;
+
+	/* zero sum the sources with the destintation page */
+	for (i = 0; i < IOP_ADMA_NUM_SRC_TEST; i++)
+		zero_sum_srcs[i] = xor_srcs[i];
+	zero_sum_srcs[i] = dest;
+
+	zero_sum_result = 1;
+
+	tx = iop_adma_prep_dma_zero_sum(dma_chan, IOP_ADMA_NUM_SRC_TEST + 1,
+		PAGE_SIZE, &zero_sum_result, 1);
+	for (i = 0; i < IOP_ADMA_NUM_SRC_TEST + 1; i++) {
+		dma_addr = dma_map_page(dma_chan->device->dev, zero_sum_srcs[i], 0,
+			PAGE_SIZE, DMA_TO_DEVICE);
+		iop_adma_set_src(dma_addr, tx, i);
+	}
+
+	cookie = iop_adma_tx_submit(tx);
+	iop_adma_issue_pending(dma_chan);
+	async_tx_ack(tx);
+	msleep(8);
+
+	if (iop_adma_is_complete(dma_chan, cookie, NULL, NULL) != DMA_SUCCESS) {
+		printk(KERN_ERR "iop_adma: Self-test zero sum timed out, disabling\n");
+		err = -ENODEV;
+		goto free_resources;
+	}
+
+	if (zero_sum_result != 0) {
+		printk(KERN_ERR "iop_adma: Self-test zero sum failed compare, disabling\n");
+		err = -ENODEV;
+		goto free_resources;
+	}
+
+	/* test memset */
+	tx = iop_adma_prep_dma_memset(dma_chan, 0, PAGE_SIZE, 1);
+	dma_addr = dma_map_page(dma_chan->device->dev, dest, 0, PAGE_SIZE, DMA_FROM_DEVICE);
+	iop_adma_set_dest(dma_addr, tx, 0);
+
+	cookie = iop_adma_tx_submit(tx);
+	iop_adma_issue_pending(dma_chan);
+	async_tx_ack(tx);
+	msleep(8);
+
+	if (iop_adma_is_complete(dma_chan, cookie, NULL, NULL) != DMA_SUCCESS) {
+		printk(KERN_ERR "iop_adma: Self-test memset timed out, disabling\n");
+		err = -ENODEV;
+		goto free_resources;
+	}
+
+	for (i = 0; i < PAGE_SIZE/sizeof(u32); i++) {
+		u32 *ptr = page_address(dest);
+		if (ptr[i]) {
+			printk(KERN_ERR "iop_adma: Self-test memset failed compare, disabling\n");
+			err = -ENODEV;
+			goto free_resources;
+		}
+	}
+
+	/* test for non-zero parity sum */
+	zero_sum_result = 0;
+	tx = iop_adma_prep_dma_zero_sum(dma_chan, IOP_ADMA_NUM_SRC_TEST + 1,
+		PAGE_SIZE, &zero_sum_result, 1);
+	for (i = 0; i < IOP_ADMA_NUM_SRC_TEST + 1; i++) {
+		dma_addr = dma_map_page(dma_chan->device->dev, zero_sum_srcs[i], 0,
+			PAGE_SIZE, DMA_TO_DEVICE);
+		iop_adma_set_src(dma_addr, tx, i);
+	}
+
+	cookie = iop_adma_tx_submit(tx);
+	iop_adma_issue_pending(dma_chan);
+	async_tx_ack(tx);
+	msleep(8);
+
+	if (iop_adma_is_complete(dma_chan, cookie, NULL, NULL) != DMA_SUCCESS) {
+		printk(KERN_ERR "iop_adma: Self-test non-zero sum timed out, disabling\n");
+		err = -ENODEV;
+		goto free_resources;
+	}
+
+	if (zero_sum_result != 1) {
+		printk(KERN_ERR "iop_adma: Self-test non-zero sum failed compare, disabling\n");
+		err = -ENODEV;
+		goto free_resources;
+	}
+
+free_resources:
+	iop_adma_free_chan_resources(dma_chan);
+out:
+	src_idx = IOP_ADMA_NUM_SRC_TEST;
+	while (src_idx--)
+		__free_page(xor_srcs[src_idx]);
+	__free_page(dest);
+	return err;
+}
+
+static int __devexit iop_adma_remove(struct platform_device *dev)
+{
+	struct iop_adma_device *device = platform_get_drvdata(dev);
+	struct dma_chan *chan, *_chan;
+	struct iop_adma_chan *iop_chan;
+	int i;
+	struct iop_adma_platform_data *plat_data = dev->dev.platform_data;
+
+	dma_async_device_unregister(&device->common);
+
+	for (i = 0; i < 3; i++) {
+		unsigned int irq;
+		irq = platform_get_irq(dev, i);
+		free_irq(irq, device);
+	}
+
+	dma_free_coherent(&dev->dev, plat_data->pool_size,
+			device->dma_desc_pool_virt, device->dma_desc_pool);
+
+	do {
+		struct resource *res;
+		res = platform_get_resource(dev, IORESOURCE_MEM, 0);
+		release_mem_region(res->start, res->end - res->start);
+	} while (0);
+
+	list_for_each_entry_safe(chan, _chan, &device->common.channels,
+				device_node) {
+		iop_chan = to_iop_adma_chan(chan);
+		list_del(&chan->device_node);
+		kfree(iop_chan);
+	}
+	kfree(device);
+
+	return 0;
+}
+
+static int __devinit iop_adma_probe(struct platform_device *pdev)
+{
+	struct resource *res;
+	int ret=0, irq_eot=0, irq_eoc=0, irq_err=0, irq, i;
+	struct iop_adma_device *adev;
+	struct iop_adma_chan *iop_chan;
+	struct iop_adma_platform_data *plat_data = pdev->dev.platform_data;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (!res)
+		return -ENODEV;
+
+	if (!request_mem_region(res->start, res->end - res->start, pdev->name))
+		return -EBUSY;
+
+	if ((adev = kzalloc(sizeof(*adev), GFP_KERNEL)) == NULL) {
+		ret = -ENOMEM;
+		goto err_adev_alloc;
+	}
+
+	/* allocate coherent memory for hardware descriptors
+	 * note: writecombine gives slightly better performance, but
+	 * requires that we explicitly drain the write buffer
+	 */
+	if ((adev->dma_desc_pool_virt = dma_alloc_writecombine(&pdev->dev,
+					plat_data->pool_size,
+					&adev->dma_desc_pool,
+					GFP_KERNEL)) == NULL) {
+		ret = -ENOMEM;
+		goto err_dma_alloc;
+	}
+
+	PRINTK("%s: allocted descriptor pool virt %p phys %p\n",
+	__FUNCTION__, adev->dma_desc_pool_virt, (void *) adev->dma_desc_pool);
+
+	adev->id = plat_data->hw_id;
+	adev->common.capabilities = plat_data->capabilities;
+
+	adev->pdev = pdev;
+	platform_set_drvdata(pdev, adev);
+
+	INIT_LIST_HEAD(&adev->common.channels);
+
+	/* set base routines */
+	adev->common.device_tx_submit = iop_adma_tx_submit;
+	adev->common.device_set_dest = iop_adma_set_dest;
+	adev->common.device_set_src = iop_adma_set_src;
+	adev->common.device_alloc_chan_resources = iop_adma_alloc_chan_resources;
+	adev->common.device_free_chan_resources = iop_adma_free_chan_resources;
+	adev->common.device_is_tx_complete = iop_adma_is_complete;
+	adev->common.device_issue_pending = iop_adma_issue_pending;
+	adev->common.device_dependency_added = iop_adma_dependency_added;
+	adev->common.dev = &pdev->dev;
+
+	/* set prep routines based on capability */
+	if (test_bit(DMA_MEMCPY, &adev->common.capabilities))
+		adev->common.device_prep_dma_memcpy = iop_adma_prep_dma_memcpy;
+	if (test_bit(DMA_MEMSET, &adev->common.capabilities))
+		adev->common.device_prep_dma_memset = iop_adma_prep_dma_memset;
+	if (test_bit(DMA_XOR, &adev->common.capabilities)) {
+		adev->common.max_xor = iop_adma_get_max_xor();
+		adev->common.device_prep_dma_xor = iop_adma_prep_dma_xor;
+	}
+	if (test_bit(DMA_ZERO_SUM, &adev->common.capabilities))
+		adev->common.device_prep_dma_zero_sum =
+			iop_adma_prep_dma_zero_sum;
+	if (test_bit(DMA_INTERRUPT, &adev->common.capabilities))
+		adev->common.device_prep_dma_interrupt =
+			iop_adma_prep_dma_interrupt;
+
+	if ((iop_chan = kzalloc(sizeof(*iop_chan), GFP_KERNEL)) == NULL) {
+		ret = -ENOMEM;
+		goto err_chan_alloc;
+	}
+	iop_chan->device = adev;
+
+	if ((iop_chan->mmr_base = ioremap(res->start, res->end - res->start))
+		== NULL) {
+		ret = -ENOMEM;
+		goto err_ioremap;
+	}
+	tasklet_init(&iop_chan->irq_tasklet, iop_adma_tasklet, (unsigned long)
+		iop_chan);
+
+	/* clear errors before enabling interrupts */
+	iop_adma_device_clear_err_status(iop_chan);
+
+	for (i = 0; i < 3; i++) {
+		irq = platform_get_irq(pdev, i);
+		if (irq < 0) {
+			ret = -ENXIO;
+			switch (i) {
+			case 0:
+				goto err_irq0;
+			case 1:
+				goto err_irq1;
+			case 2:
+				goto err_irq2;
+			}
+		} else {
+			switch (i) {
+			case 0:
+				irq_eot = irq;
+				ret = request_irq(irq, iop_adma_eot_handler,
+					 0, pdev->name, iop_chan);
+				if (ret) {
+					ret = -EIO;
+					goto err_irq0;
+				}
+				break;
+			case 1:
+				irq_eoc = irq;
+				ret = request_irq(irq, iop_adma_eoc_handler,
+					0, pdev->name, iop_chan);
+				if (ret) {
+					ret = -EIO;
+					goto err_irq1;
+				}
+				break;
+			case 2:
+				irq_err = irq;
+				ret = request_irq(irq, iop_adma_err_handler,
+					0, pdev->name, iop_chan);
+				if (ret) {
+					ret = -EIO;
+					goto err_irq2;
+				}
+				break;
+			}
+		}
+	}
+
+	spin_lock_init(&iop_chan->lock);
+	init_timer(&iop_chan->cleanup_watchdog);
+	iop_chan->cleanup_watchdog.data = (unsigned long) iop_chan;
+	iop_chan->cleanup_watchdog.function = iop_adma_tasklet;
+	INIT_LIST_HEAD(&iop_chan->chain);
+	INIT_LIST_HEAD(&iop_chan->all_slots);
+	INIT_RCU_HEAD(&iop_chan->common.rcu);
+	iop_chan->common.device = &adev->common;
+	list_add_tail(&iop_chan->common.device_node, &adev->common.channels);
+
+	if (test_bit(DMA_MEMCPY, &adev->common.capabilities)) {
+		ret = iop_adma_memcpy_self_test(adev);
+		PRINTK("iop adma%d: memcpy self test returned %d\n", adev->id, ret);
+		if (ret)
+			goto err_self_test;
+	}
+
+	if (test_bit(DMA_XOR, &adev->common.capabilities) ||
+		test_bit(DMA_MEMSET, &adev->common.capabilities)) {
+		ret = iop_adma_xor_zero_sum_self_test(adev);
+		PRINTK("iop adma%d: xor self test returned %d\n", adev->id, ret);
+		if (ret)
+			goto err_self_test;
+	}
+
+	printk(KERN_INFO "Intel(R) IOP ADMA Engine found [%d]: "
+	  "( %s%s%s%s%s%s%s%s%s%s)\n",
+	  adev->id,
+	  test_bit(DMA_PQ_XOR, &adev->common.capabilities) ? "pq_xor " : "",
+	  test_bit(DMA_PQ_UPDATE, &adev->common.capabilities) ? "pq_update " : "",
+	  test_bit(DMA_PQ_ZERO_SUM, &adev->common.capabilities) ? "pq_zero_sum " : "",
+	  test_bit(DMA_XOR, &adev->common.capabilities) ? "xor " : "",
+	  test_bit(DMA_DUAL_XOR, &adev->common.capabilities) ? "dual_xor " : "",
+	  test_bit(DMA_ZERO_SUM, &adev->common.capabilities) ? "xor_zero_sum " : "",
+	  test_bit(DMA_MEMSET, &adev->common.capabilities)  ? "memset " : "",
+	  test_bit(DMA_MEMCPY_CRC32C, &adev->common.capabilities) ? "memcpy+crc " : "",
+	  test_bit(DMA_MEMCPY, &adev->common.capabilities) ? "memcpy " : "",
+	  test_bit(DMA_INTERRUPT, &adev->common.capabilities) ? "int " : "");
+
+	dma_async_device_register(&adev->common);
+	goto out;
+
+err_self_test:
+	free_irq(irq_err, adev);
+err_irq2:
+	free_irq(irq_eoc, adev);
+err_irq1:
+	free_irq(irq_eot, adev);
+err_irq0:
+err_ioremap:
+	kfree(iop_chan);
+err_chan_alloc:
+	dma_free_coherent(&adev->pdev->dev, plat_data->pool_size,
+			adev->dma_desc_pool_virt, adev->dma_desc_pool);
+err_dma_alloc:
+	kfree(adev);
+err_adev_alloc:
+	release_mem_region(res->start, res->end - res->start);
+out:
+	return ret;
+}
+
+static void iop_chan_start_null_memcpy(struct iop_adma_chan *iop_chan)
+{
+	struct iop_adma_desc_slot *sw_desc, *group_start;
+	dma_cookie_t cookie;
+	int slot_cnt, slots_per_op;
+
+	PRINTK("iop adma%d: %s\n", iop_chan->device->id, __FUNCTION__);
+
+	spin_lock_bh(&iop_chan->lock);
+	slot_cnt = iop_chan_memcpy_slot_count(0, &slots_per_op);
+	sw_desc = iop_adma_alloc_slots(iop_chan, slot_cnt, slots_per_op);
+	if (sw_desc) {
+		group_start = sw_desc->group_head;
+
+		list_splice_init(&sw_desc->group_list, &iop_chan->chain);
+		sw_desc->async_tx.ack = 1;
+		iop_desc_init_memcpy(group_start, 0);
+		iop_desc_set_byte_count(group_start, iop_chan, 0);
+		iop_desc_set_dest_addr(group_start, iop_chan, 0);
+		iop_desc_set_memcpy_src_addr(group_start, 0);
+
+		cookie = iop_chan->common.cookie;
+		cookie++;
+		if (cookie <= 1)
+			cookie = 2;
+
+		/* initialize the completed cookie to be less than
+		 * the most recently used cookie
+		 */
+		iop_chan->completed_cookie = cookie - 1;
+		iop_chan->common.cookie = sw_desc->async_tx.cookie = cookie;
+
+		/* channel should not be busy */
+		BUG_ON(iop_chan_is_busy(iop_chan));
+
+		/* clear any prior error-status bits */
+		iop_adma_device_clear_err_status(iop_chan);
+
+		/* disable operation */
+		iop_chan_disable(iop_chan);
+
+		/* set the descriptor address */
+		iop_chan_set_next_descriptor(iop_chan, sw_desc->phys);
+
+		/* run the descriptor */
+		iop_chan_enable(iop_chan);
+	} else
+		printk(KERN_ERR "iop adma%d failed to allocate null descriptor\n",
+			iop_chan->device->id);
+	spin_unlock_bh(&iop_chan->lock);
+}
+
+static void iop_chan_start_null_xor(struct iop_adma_chan *iop_chan)
+{
+	struct iop_adma_desc_slot *sw_desc, *group_start;
+	dma_cookie_t cookie;
+	int slot_cnt, slots_per_op;
+
+	PRINTK("iop adma%d: %s\n", iop_chan->device->id, __FUNCTION__);
+
+	spin_lock_bh(&iop_chan->lock);
+	slot_cnt = iop_chan_xor_slot_count(0, 2, &slots_per_op);
+	sw_desc = iop_adma_alloc_slots(iop_chan, slot_cnt, slots_per_op);
+	if (sw_desc) {
+		group_start = sw_desc->group_head;
+		list_splice_init(&sw_desc->group_list, &iop_chan->chain);
+		sw_desc->async_tx.ack = 1;
+		iop_desc_init_null_xor(group_start, 2, 0);
+		iop_desc_set_byte_count(group_start, iop_chan, 0);
+		iop_desc_set_dest_addr(group_start, iop_chan, 0);
+		iop_desc_set_xor_src_addr(group_start, 0, 0);
+		iop_desc_set_xor_src_addr(group_start, 1, 0);
+
+		cookie = iop_chan->common.cookie;
+		cookie++;
+		if (cookie <= 1)
+			cookie = 2;
+
+		/* initialize the completed cookie to be less than
+		 * the most recently used cookie
+		 */
+		iop_chan->completed_cookie = cookie - 1;
+		iop_chan->common.cookie = sw_desc->async_tx.cookie = cookie;
+
+		/* channel should not be busy */
+		BUG_ON(iop_chan_is_busy(iop_chan));
+
+		/* clear any prior error-status bits */
+		iop_adma_device_clear_err_status(iop_chan);
+
+		/* disable operation */
+		iop_chan_disable(iop_chan);
+
+		/* set the descriptor address */
+		iop_chan_set_next_descriptor(iop_chan, sw_desc->phys);
+
+		/* run the descriptor */
+		iop_chan_enable(iop_chan);
+	} else
+		printk(KERN_ERR "iop adma%d failed to allocate null descriptor\n",
+			iop_chan->device->id);
+	spin_unlock_bh(&iop_chan->lock);
+}
+
+static struct platform_driver iop_adma_driver = {
+	.probe		= iop_adma_probe,
+	.remove		= iop_adma_remove,
+	.driver		= {
+		.owner	= THIS_MODULE,
+		.name	= "IOP-ADMA",
+	},
+};
+
+static int __init iop_adma_init (void)
+{
+	/* it's currently unsafe to unload this module */
+	/* if forced, worst case is that rmmod hangs */
+	__unsafe(THIS_MODULE);
+
+	return platform_driver_register(&iop_adma_driver);
+}
+
+static void __exit iop_adma_exit (void)
+{
+	platform_driver_unregister(&iop_adma_driver);
+	return;
+}
+
+module_init(iop_adma_init);
+module_exit(iop_adma_exit);
+
+MODULE_AUTHOR("Intel Corporation");
+MODULE_DESCRIPTION("IOP ADMA Engine Driver");
+MODULE_LICENSE("GPL");
diff --git a/include/asm-arm/hardware/iop_adma.h b/include/asm-arm/hardware/iop_adma.h
new file mode 100644
index 0000000..549a7d8
--- /dev/null
+++ b/include/asm-arm/hardware/iop_adma.h
@@ -0,0 +1,121 @@
+/*
+ * Copyright(c) 2006 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+#ifndef IOP_ADMA_H
+#define IOP_ADMA_H
+#include <linux/types.h>
+#include <linux/dmaengine.h>
+#include <linux/interrupt.h>
+
+#define IOP_ADMA_SLOT_SIZE 32
+#define IOP_ADMA_THRESHOLD 4
+
+/**
+ * struct iop_adma_device - internal representation of an ADMA device
+ * @pdev: Platform device
+ * @id: HW ADMA Device selector
+ * @dma_desc_pool: base of DMA descriptor region (DMA address)
+ * @dma_desc_pool_virt: base of DMA descriptor region (CPU address)
+ * @common: embedded struct dma_device
+ */
+struct iop_adma_device {
+	struct platform_device *pdev;
+	int id;
+	dma_addr_t dma_desc_pool;
+	void *dma_desc_pool_virt;
+	struct dma_device common;
+};
+
+/**
+ * struct iop_adma_chan - internal representation of an ADMA device
+ * @pending: allows batching of hardware operations
+ * @completed_cookie: identifier for the most recently completed operation
+ * @lock: serializes enqueue/dequeue operations to the slot pool
+ * @mmr_base: memory mapped register base
+ * @chain: device chain view of the descriptors
+ * @device: parent device
+ * @common: common dmaengine channel object members
+ * @last_used: place holder for allocation to continue from where it left off
+ * @all_slots: complete domain of slots usable by the channel
+ * @cleanup_watchdog: workaround missed interrupts on iop3xx
+ * @slots_allocated: records the actual size of the descriptor slot pool
+ * @irq_tasklet: bottom half where iop_adma_slot_cleanup runs
+ */
+struct iop_adma_chan {
+	int pending;
+	dma_cookie_t completed_cookie;
+	spinlock_t lock;
+	void __iomem *mmr_base;
+	struct list_head chain;
+	struct iop_adma_device *device;
+	struct dma_chan common;
+	struct iop_adma_desc_slot *last_used;
+	struct list_head all_slots;
+	struct timer_list cleanup_watchdog;
+	int slots_allocated;
+	struct tasklet_struct irq_tasklet;
+};
+
+/**
+ * struct iop_adma_desc_slot - IOP-ADMA software descriptor
+ * @slot_node: node on the iop_adma_chan.all_slots list
+ * @chain_node: node on the op_adma_chan.chain list
+ * @hw_desc: virtual address of the hardware descriptor chain
+ * @phys: hardware address of the hardware descriptor chain
+ * @group_head: first operation in a transaction
+ * @slot_cnt: total slots used in an transaction (group of operations)
+ * @slots_per_op: number of slots per operation
+ * @idx: pool index
+ * @unmap_src_cnt: number of xor sources
+ * @unmap_len: transaction bytecount
+ * @async_tx: support for the async_tx api
+ * @group_list: list of slots that make up a multi-descriptor transaction
+ *	for example transfer lengths larger than the supported hw max
+ * @xor_check_result: result of zero sum
+ * @crc32_result: result crc calculation
+ */
+struct iop_adma_desc_slot {
+	struct list_head slot_node;
+	struct list_head chain_node;
+	void *hw_desc;
+	dma_addr_t phys;
+	struct iop_adma_desc_slot *group_head;
+	u16 slot_cnt;
+	u16 slots_per_op;
+	u16 idx;
+	u16 unmap_src_cnt;
+	size_t unmap_len;
+	struct dma_async_tx_descriptor async_tx;
+	struct list_head group_list;
+	union {
+		u32 *xor_check_result;
+		u32 *crc32_result;
+	};
+};
+
+struct iop_adma_platform_data {
+        int hw_id;
+        unsigned long capabilities;
+        size_t pool_size;
+};
+
+#define to_iop_sw_desc(addr_hw_desc) container_of(addr_hw_desc, struct iop_adma_desc_slot, hw_desc)
+#define iop_hw_desc_slot_idx(hw_desc, idx) ( (void *) (((unsigned long) hw_desc) + ((idx) << 5)) )
+#endif

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2.6.21-rc4 14/15] iop13xx: Surface the iop13xx adma units to the iop-adma driver
  2007-03-23  6:51 [PATCH 2.6.21-rc4 00/15] md raid5 acceleration and async_tx Dan Williams
                   ` (12 preceding siblings ...)
  2007-03-23  6:52 ` [PATCH 2.6.21-rc4 13/15] dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines Dan Williams
@ 2007-03-23  6:52 ` Dan Williams
  2007-03-23  6:52 ` [PATCH 2.6.21-rc4 15/15] iop3xx: Surface the iop3xx DMA and AAU " Dan Williams
  14 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2007-03-23  6:52 UTC (permalink / raw)
  To: neilb, christopher.leech, linux-raid, linux-kernel
  Cc: akpm, torvalds, yur, wd, arjan, rmk+kernel

Adds the platform device definitions and the architecture specific
support routines (i.e. register initialization and descriptor formats) for the
iop-adma driver.

Changelog:
* added 'descriptor pool size' to the platform data
* add base support for buffer sizes larger than 16MB (hw max)
* build error fix from Kirill A. Shutemov
* rebase for async_tx changes
* add interrupt support
* do not call platform register macros in driver code

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 arch/arm/mach-iop13xx/setup.c          |  194 +++++++++++
 include/asm-arm/arch-iop13xx/adma.h    |  545 ++++++++++++++++++++++++++++++++
 include/asm-arm/arch-iop13xx/iop13xx.h |   34 +-
 3 files changed, 752 insertions(+), 21 deletions(-)

diff --git a/arch/arm/mach-iop13xx/setup.c b/arch/arm/mach-iop13xx/setup.c
index 9a46bcd..43189c8 100644
--- a/arch/arm/mach-iop13xx/setup.c
+++ b/arch/arm/mach-iop13xx/setup.c
@@ -25,6 +25,7 @@
 #include <asm/hardware.h>
 #include <asm/irq.h>
 #include <asm/io.h>
+#include <asm/hardware/iop_adma.h>
 
 #define IOP13XX_UART_XTAL 33334000
 #define IOP13XX_SETUP_DEBUG 0
@@ -236,6 +237,140 @@ static unsigned long iq8134x_probe_flash_size(void)
 }
 #endif
 
+/* ADMA Channels */
+static struct resource iop13xx_adma_0_resources[] = {
+	[0] = {
+		.start = IOP13XX_ADMA_PHYS_BASE(0),
+		.end = IOP13XX_ADMA_UPPER_PA(0),
+		.flags = IORESOURCE_MEM,
+	},
+	[1] = {
+		.start = IRQ_IOP13XX_ADMA0_EOT,
+		.end = IRQ_IOP13XX_ADMA0_EOT,
+		.flags = IORESOURCE_IRQ
+	},
+	[2] = {
+		.start = IRQ_IOP13XX_ADMA0_EOC,
+		.end = IRQ_IOP13XX_ADMA0_EOC,
+		.flags = IORESOURCE_IRQ
+	},
+	[3] = {
+		.start = IRQ_IOP13XX_ADMA0_ERR,
+		.end = IRQ_IOP13XX_ADMA0_ERR,
+		.flags = IORESOURCE_IRQ
+	}
+};
+
+static struct resource iop13xx_adma_1_resources[] = {
+	[0] = {
+		.start = IOP13XX_ADMA_PHYS_BASE(1),
+		.end = IOP13XX_ADMA_UPPER_PA(1),
+		.flags = IORESOURCE_MEM,
+	},
+	[1] = {
+		.start = IRQ_IOP13XX_ADMA1_EOT,
+		.end = IRQ_IOP13XX_ADMA1_EOT,
+		.flags = IORESOURCE_IRQ
+	},
+	[2] = {
+		.start = IRQ_IOP13XX_ADMA1_EOC,
+		.end = IRQ_IOP13XX_ADMA1_EOC,
+		.flags = IORESOURCE_IRQ
+	},
+	[3] = {
+		.start = IRQ_IOP13XX_ADMA1_ERR,
+		.end = IRQ_IOP13XX_ADMA1_ERR,
+		.flags = IORESOURCE_IRQ
+	}
+};
+
+static struct resource iop13xx_adma_2_resources[] = {
+	[0] = {
+		.start = IOP13XX_ADMA_PHYS_BASE(2),
+		.end = IOP13XX_ADMA_UPPER_PA(2),
+		.flags = IORESOURCE_MEM,
+	},
+	[1] = {
+		.start = IRQ_IOP13XX_ADMA2_EOT,
+		.end = IRQ_IOP13XX_ADMA2_EOT,
+		.flags = IORESOURCE_IRQ
+	},
+	[2] = {
+		.start = IRQ_IOP13XX_ADMA2_EOC,
+		.end = IRQ_IOP13XX_ADMA2_EOC,
+		.flags = IORESOURCE_IRQ
+	},
+	[3] = {
+		.start = IRQ_IOP13XX_ADMA2_ERR,
+		.end = IRQ_IOP13XX_ADMA2_ERR,
+		.flags = IORESOURCE_IRQ
+	}
+};
+
+static u64 iop13xx_adma_dmamask = DMA_64BIT_MASK;
+static struct iop_adma_platform_data iop13xx_adma_0_data = {
+	.hw_id = 0,
+	.capabilities =	DMA_CAP_MEMCPY | DMA_CAP_XOR | DMA_CAP_DUAL_XOR |
+			DMA_CAP_ZERO_SUM | DMA_CAP_MEMSET |
+			DMA_CAP_MEMCPY_CRC32C | DMA_CAP_INTERRUPT,
+	.pool_size = PAGE_SIZE,
+};
+
+static struct iop_adma_platform_data iop13xx_adma_1_data = {
+	.hw_id = 1,
+	.capabilities =	DMA_CAP_MEMCPY | DMA_CAP_XOR | DMA_CAP_DUAL_XOR |
+			DMA_CAP_ZERO_SUM | DMA_CAP_MEMSET |
+			DMA_CAP_MEMCPY_CRC32C | DMA_CAP_INTERRUPT,
+	.pool_size = PAGE_SIZE,
+};
+
+static struct iop_adma_platform_data iop13xx_adma_2_data = {
+	.hw_id = 2,
+	.capabilities =	DMA_CAP_MEMCPY | DMA_CAP_XOR | DMA_CAP_DUAL_XOR |
+			DMA_CAP_ZERO_SUM | DMA_CAP_MEMSET |
+			DMA_CAP_MEMCPY_CRC32C | DMA_CAP_PQ_XOR |
+			DMA_CAP_PQ_UPDATE | DMA_CAP_PQ_ZERO_SUM |
+			DMA_CAP_INTERRUPT,
+	.pool_size = PAGE_SIZE,
+};
+
+/* The ids are fixed up later in iop13xx_platform_init */
+static struct platform_device iop13xx_adma_0_channel = {
+	.name = "IOP-ADMA",
+	.id = 0,
+	.num_resources = 4,
+	.resource = iop13xx_adma_0_resources,
+	.dev = {
+		.dma_mask = &iop13xx_adma_dmamask,
+		.coherent_dma_mask = DMA_64BIT_MASK,
+		.platform_data = (void *) &iop13xx_adma_0_data,
+	},
+};
+
+static struct platform_device iop13xx_adma_1_channel = {
+	.name = "IOP-ADMA",
+	.id = 0,
+	.num_resources = 4,
+	.resource = iop13xx_adma_1_resources,
+	.dev = {
+		.dma_mask = &iop13xx_adma_dmamask,
+		.coherent_dma_mask = DMA_64BIT_MASK,
+		.platform_data = (void *) &iop13xx_adma_1_data,
+	},
+};
+
+static struct platform_device iop13xx_adma_2_channel = {
+	.name = "IOP-ADMA",
+	.id = 0,
+	.num_resources = 4,
+	.resource = iop13xx_adma_2_resources,
+	.dev = {
+		.dma_mask = &iop13xx_adma_dmamask,
+		.coherent_dma_mask = DMA_64BIT_MASK,
+		.platform_data = (void *) &iop13xx_adma_2_data,
+	},
+};
+
 void __init iop13xx_map_io(void)
 {
 	/* Initialize the Static Page Table maps */
@@ -244,11 +379,12 @@ void __init iop13xx_map_io(void)
 
 static int init_uart = 0;
 static int init_i2c = 0;
+static int init_adma = 0;
 
 void __init iop13xx_platform_init(void)
 {
 	int i;
-	u32 uart_idx, i2c_idx, plat_idx;
+	u32 uart_idx, i2c_idx, adma_idx, plat_idx;
 	struct platform_device *iop13xx_devices[IQ81340_MAX_PLAT_DEVICES];
 
 	/* set the bases so we can read the device id */
@@ -298,6 +434,12 @@ void __init iop13xx_platform_init(void)
 		}
 	}
 
+	if (init_adma == IOP13XX_INIT_ADMA_DEFAULT) {
+		init_adma |= IOP13XX_INIT_ADMA_0;
+		init_adma |= IOP13XX_INIT_ADMA_1;
+		init_adma |= IOP13XX_INIT_ADMA_2;
+	}
+
 	plat_idx = 0;
 	uart_idx = 0;
 	i2c_idx = 0;
@@ -336,6 +478,26 @@ void __init iop13xx_platform_init(void)
 		}
 	}
 
+	adma_idx = 0;
+	for(i = 0; i < IQ81340_NUM_ADMA; i++) {
+		if ((init_adma & (1 << i)) && IOP13XX_SETUP_DEBUG)
+			printk("Adding adma%d to platform device list\n", i);
+		switch(init_adma & (1 << i)) {
+		case IOP13XX_INIT_ADMA_0:
+			iop13xx_adma_0_channel.id = adma_idx++;
+			iop13xx_devices[plat_idx++] = &iop13xx_adma_0_channel;
+			break;
+		case IOP13XX_INIT_ADMA_1:
+			iop13xx_adma_1_channel.id = adma_idx++;
+			iop13xx_devices[plat_idx++] = &iop13xx_adma_1_channel;
+			break;
+		case IOP13XX_INIT_ADMA_2:
+			iop13xx_adma_2_channel.id = adma_idx++;
+			iop13xx_devices[plat_idx++] = &iop13xx_adma_2_channel;
+			break;
+		}
+	}
+
 #ifdef CONFIG_MTD_PHYSMAP
 	iq8134x_flash_resource.end = iq8134x_flash_resource.start +
 				iq8134x_probe_flash_size() - 1;
@@ -403,5 +565,35 @@ static int __init iop13xx_init_i2c_setup(char *str)
 	return 1;
 }
 
+static int __init iop13xx_init_adma_setup(char *str)
+{
+	if (str)	{
+		while (*str != '\0') {
+			switch(*str) {
+			case '0':
+				init_adma |= IOP13XX_INIT_ADMA_0;
+				break;
+			case '1':
+				init_adma |= IOP13XX_INIT_ADMA_1;
+				break;
+			case '2':
+				init_adma |= IOP13XX_INIT_ADMA_2;
+				break;
+			case ',':
+			case '=':
+				break;
+			default:
+				PRINTK("\"iop13xx_init_adma\" malformed"
+					    " at character: \'%c\'", *str);
+				*(str + 1) = '\0';
+				init_adma = IOP13XX_INIT_ADMA_DEFAULT;
+			}
+			str++;
+		}
+	}
+	return 1;
+}
+
+__setup("iop13xx_init_adma", iop13xx_init_adma_setup);
 __setup("iop13xx_init_uart", iop13xx_init_uart_setup);
 __setup("iop13xx_init_i2c", iop13xx_init_i2c_setup);
diff --git a/include/asm-arm/arch-iop13xx/adma.h b/include/asm-arm/arch-iop13xx/adma.h
new file mode 100644
index 0000000..f0a5041
--- /dev/null
+++ b/include/asm-arm/arch-iop13xx/adma.h
@@ -0,0 +1,545 @@
+/*
+ * Copyright(c) 2006 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+#ifndef _ADMA_H
+#define _ADMA_H
+#include <linux/types.h>
+#include <asm/io.h>
+#include <asm/hardware.h>
+#include <asm/hardware/iop_adma.h>
+
+#define ADMA_ACCR(chan)	(chan->mmr_base + 0x0)
+#define ADMA_ACSR(chan)	(chan->mmr_base + 0x4)
+#define ADMA_ADAR(chan)	(chan->mmr_base + 0x8)
+#define ADMA_IIPCR(chan)	(chan->mmr_base + 0x18)
+#define ADMA_IIPAR(chan)	(chan->mmr_base + 0x1c)
+#define ADMA_IIPUAR(chan)	(chan->mmr_base + 0x20)
+#define ADMA_ANDAR(chan)	(chan->mmr_base + 0x24)
+#define ADMA_ADCR(chan)	(chan->mmr_base + 0x28)
+#define ADMA_CARMD(chan)	(chan->mmr_base + 0x2c)
+#define ADMA_ABCR(chan)	(chan->mmr_base + 0x30)
+#define ADMA_DLADR(chan)	(chan->mmr_base + 0x34)
+#define ADMA_DUADR(chan)	(chan->mmr_base + 0x38)
+#define ADMA_SLAR(src, chan)	(chan->mmr_base + (0x3c + (src <<3)))
+#define ADMA_SUAR(src, chan)	(chan->mmr_base + (0x40 + (src <<3)))
+
+struct iop13xx_adma_src {
+	u32 src_addr;
+	union {
+		u32 upper_src_addr;
+		struct {
+			unsigned int pq_upper_src_addr:24;
+			unsigned int pq_dmlt:8;
+		};
+	};
+};
+
+struct iop13xx_adma_desc_ctrl {
+	unsigned int int_en:1;
+	unsigned int xfer_dir:2;
+	unsigned int src_select:4;
+	unsigned int zero_result:1;
+	unsigned int block_fill_en:1;
+	unsigned int crc_gen_en:1;
+	unsigned int crc_xfer_dis:1;
+	unsigned int crc_seed_fetch_dis:1;
+	unsigned int status_write_back_en:1;
+	unsigned int endian_swap_en:1;
+	unsigned int reserved0:2;
+	unsigned int pq_update_xfer_en:1;
+	unsigned int dual_xor_en:1;
+	unsigned int pq_xfer_en:1;
+	unsigned int p_xfer_dis:1;
+	unsigned int reserved1:10;
+	unsigned int relax_order_en:1;
+	unsigned int no_snoop_en:1;
+};
+
+struct iop13xx_adma_byte_count {
+	unsigned int byte_count:24;
+	unsigned int host_if:3;
+	unsigned int reserved:2;
+	unsigned int zero_result_err_q:1;
+	unsigned int zero_result_err:1;
+	unsigned int tx_complete:1;
+};
+
+struct iop13xx_adma_desc_hw {
+	u32 next_desc;
+	union {
+		u32 desc_ctrl;
+		struct iop13xx_adma_desc_ctrl desc_ctrl_field;
+	};
+	union {
+		u32 crc_addr;
+		u32 block_fill_data;
+		u32 q_dest_addr;
+	};
+	union {
+		u32 byte_count;
+		struct iop13xx_adma_byte_count byte_count_field;
+	};
+	union {
+		u32 dest_addr;
+		u32 p_dest_addr;
+	};
+	union {
+		u32 upper_dest_addr;
+		u32 pq_upper_dest_addr;
+	};
+	struct iop13xx_adma_src src[1];
+};
+
+struct iop13xx_adma_desc_dual_xor {
+	u32 next_desc;
+	u32 desc_ctrl;
+	u32 reserved;
+	u32 byte_count;
+	u32 h_dest_addr;
+	u32 h_upper_dest_addr;
+	u32 src0_addr;
+	u32 upper_src0_addr;
+	u32 src1_addr;
+	u32 upper_src1_addr;
+	u32 h_src_addr;
+	u32 h_upper_src_addr;
+	u32 d_src_addr;
+	u32 d_upper_src_addr;
+	u32 d_dest_addr;
+	u32 d_upper_dest_addr;
+};
+
+struct iop13xx_adma_desc_pq_update {
+	u32 next_desc;
+	u32 desc_ctrl;
+	u32 reserved;
+	u32 byte_count;
+	u32 p_dest_addr;
+	u32 p_upper_dest_addr;
+	u32 src0_addr;
+	u32 upper_src0_addr;
+	u32 src1_addr;
+	u32 upper_src1_addr;
+	u32 p_src_addr;
+	u32 p_upper_src_addr;
+	u32 q_src_addr;
+	struct {
+		unsigned int q_upper_src_addr:24;
+		unsigned int q_dmlt:8;
+	};
+	u32 q_dest_addr;
+	u32 q_upper_dest_addr;
+};
+
+static inline int iop_adma_get_max_xor(void)
+{
+	return 16;
+}
+
+static inline u32 iop_chan_get_current_descriptor(struct iop_adma_chan *chan)
+{
+	return __raw_readl(ADMA_ADAR(chan));
+}
+
+static inline void iop_chan_set_next_descriptor(struct iop_adma_chan *chan,
+						u32 next_desc_addr)
+{
+	__raw_writel(next_desc_addr, ADMA_ANDAR(chan));
+}
+
+#define ADMA_STATUS_BUSY (1 << 13)
+
+static inline char iop_chan_is_busy(struct iop_adma_chan *chan)
+{
+	if (__raw_readl(ADMA_ACSR(chan)) &
+		ADMA_STATUS_BUSY)
+		return 1;
+	else
+		return 0;
+}
+
+static inline int iop_chan_get_desc_align(struct iop_adma_chan *chan, int num_slots)
+{
+	return 1;
+}
+#define iop_desc_is_aligned(x, y) 1
+
+static inline int iop_chan_memcpy_slot_count(size_t len, int *slots_per_op)
+{
+	*slots_per_op = 1;
+	return 1;
+}
+
+#define iop_chan_interrupt_slot_count(s, c) iop_chan_memcpy_slot_count(0, s)
+
+static inline int iop_chan_memset_slot_count(size_t len, int *slots_per_op)
+{
+	*slots_per_op = 1;
+	return 1;
+}
+
+static inline int iop_chan_xor_slot_count(size_t len, int src_cnt, int *slots_per_op)
+{
+	int num_slots;
+	/* slots_to_find = 1 for basic descriptor + 1 per 4 sources above 1
+	 * (1 source => 8 bytes) (1 slot => 32 bytes)
+	 */
+	num_slots = 1 + (((src_cnt - 1) << 3) >> 5);
+	if (((src_cnt - 1) << 3) & 0x1f)
+		num_slots++;
+
+	*slots_per_op = num_slots;
+
+	return num_slots;
+}
+
+#define ADMA_MAX_BYTE_COUNT	(16 * 1024 * 1024)
+#define IOP_ADMA_MAX_BYTE_COUNT ADMA_MAX_BYTE_COUNT
+#define IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT ADMA_MAX_BYTE_COUNT
+#define IOP_ADMA_XOR_MAX_BYTE_COUNT ADMA_MAX_BYTE_COUNT
+#define iop_chan_zero_sum_slot_count(l, s, o) iop_chan_xor_slot_count(l, s, o)
+
+static inline u32 iop_desc_get_dest_addr(struct iop_adma_desc_slot *desc,
+					struct iop_adma_chan *chan)
+{
+	struct iop13xx_adma_desc_hw *hw_desc = desc->hw_desc;
+	return hw_desc->dest_addr;
+}
+
+static inline u32 iop_desc_get_byte_count(struct iop_adma_desc_slot *desc,
+					struct iop_adma_chan *chan)
+{
+	struct iop13xx_adma_desc_hw *hw_desc = desc->hw_desc;
+	return hw_desc->byte_count_field.byte_count;
+}
+
+static inline u32 iop_desc_get_src_addr(struct iop_adma_desc_slot *desc,
+					struct iop_adma_chan *chan,
+					int src_idx)
+{
+	struct iop13xx_adma_desc_hw *hw_desc = desc->hw_desc;
+	return hw_desc->src[src_idx].src_addr;
+}
+
+static inline u32 iop_desc_get_src_count(struct iop_adma_desc_slot *desc,
+					struct iop_adma_chan *chan)
+{
+	struct iop13xx_adma_desc_hw *hw_desc = desc->hw_desc;
+	return hw_desc->desc_ctrl_field.src_select + 1;
+}
+
+static inline void
+iop_desc_init_memcpy(struct iop_adma_desc_slot *desc, int int_en)
+{
+	struct iop13xx_adma_desc_hw *hw_desc = desc->hw_desc;
+	union {
+		u32 value;
+		struct iop13xx_adma_desc_ctrl field;
+	} u_desc_ctrl;
+
+	u_desc_ctrl.value = 0;
+	u_desc_ctrl.field.xfer_dir = 3; /* local to internal bus */
+	u_desc_ctrl.field.int_en = int_en;
+	hw_desc->desc_ctrl = u_desc_ctrl.value;
+	hw_desc->crc_addr = 0;
+}
+
+static inline void
+iop_desc_init_memset(struct iop_adma_desc_slot *desc, int int_en)
+{
+	struct iop13xx_adma_desc_hw *hw_desc = desc->hw_desc;
+	union {
+		u32 value;
+		struct iop13xx_adma_desc_ctrl field;
+	} u_desc_ctrl;
+
+	u_desc_ctrl.value = 0;
+	u_desc_ctrl.field.xfer_dir = 3; /* local to internal bus */
+	u_desc_ctrl.field.block_fill_en = 1;
+	u_desc_ctrl.field.int_en = int_en;
+	hw_desc->desc_ctrl = u_desc_ctrl.value;
+	hw_desc->crc_addr = 0;
+}
+
+/* to do: support buffers larger than ADMA_MAX_BYTE_COUNT */
+static inline void
+iop_desc_init_xor(struct iop_adma_desc_slot *desc, int src_cnt, int int_en)
+{
+	struct iop13xx_adma_desc_hw *hw_desc = desc->hw_desc;
+	union {
+		u32 value;
+		struct iop13xx_adma_desc_ctrl field;
+	} u_desc_ctrl;
+
+	u_desc_ctrl.value = 0;
+	u_desc_ctrl.field.src_select = src_cnt - 1;
+	u_desc_ctrl.field.xfer_dir = 3; /* local to internal bus */
+	u_desc_ctrl.field.int_en = int_en;
+	hw_desc->desc_ctrl = u_desc_ctrl.value;
+	hw_desc->crc_addr = 0;
+
+}
+#define iop_desc_init_null_xor(d, s, i) iop_desc_init_xor(d, s, i)
+
+/* to do: support buffers larger than ADMA_MAX_BYTE_COUNT */
+static inline int
+iop_desc_init_zero_sum(struct iop_adma_desc_slot *desc, int src_cnt, int int_en)
+{
+	struct iop13xx_adma_desc_hw *hw_desc = desc->hw_desc;
+	union {
+		u32 value;
+		struct iop13xx_adma_desc_ctrl field;
+	} u_desc_ctrl;
+
+	u_desc_ctrl.value = 0;
+	u_desc_ctrl.field.src_select = src_cnt - 1;
+	u_desc_ctrl.field.xfer_dir = 3; /* local to internal bus */
+	u_desc_ctrl.field.zero_result = 1;
+	u_desc_ctrl.field.status_write_back_en = 1;
+	u_desc_ctrl.field.int_en = int_en;
+	hw_desc->desc_ctrl = u_desc_ctrl.value;
+	hw_desc->crc_addr = 0;
+
+	return 1;
+}
+
+static inline void iop_desc_set_byte_count(struct iop_adma_desc_slot *desc,
+					struct iop_adma_chan *chan,
+					u32 byte_count)
+{
+	struct iop13xx_adma_desc_hw *hw_desc = desc->hw_desc;
+	hw_desc->byte_count = byte_count;
+}
+
+static inline void iop_desc_set_zero_sum_byte_count(struct iop_adma_desc_slot *desc,
+					u32 len)
+{
+	int slots_per_op = desc->slots_per_op;
+	struct iop13xx_adma_desc_hw *hw_desc = desc->hw_desc, *iter;
+	int i = 0;
+
+	if (len <= IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT) {
+		hw_desc->byte_count = len;
+	} else {
+		do {
+			iter = iop_hw_desc_slot_idx(hw_desc, i);
+			iter->byte_count = IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT;
+			len -= IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT;
+			i += slots_per_op;
+		} while (len > IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT);
+
+		if (len) {
+			iter = iop_hw_desc_slot_idx(hw_desc, i);
+			iter->byte_count = len;
+		}
+	}
+}
+
+
+static inline void iop_desc_set_dest_addr(struct iop_adma_desc_slot *desc,
+					struct iop_adma_chan *chan,
+					dma_addr_t addr)
+{
+	struct iop13xx_adma_desc_hw *hw_desc = desc->hw_desc;
+	hw_desc->dest_addr = addr;
+	hw_desc->upper_dest_addr = 0;
+}
+
+static inline void iop_desc_set_memcpy_src_addr(struct iop_adma_desc_slot *desc,
+					dma_addr_t addr)
+{
+	struct iop13xx_adma_desc_hw *hw_desc = desc->hw_desc;
+	hw_desc->src[0].src_addr = addr;
+	hw_desc->src[0].upper_src_addr = 0;
+}
+
+static inline void iop_desc_set_xor_src_addr(struct iop_adma_desc_slot *desc,
+					int src_idx, dma_addr_t addr)
+{
+	int slot_cnt = desc->slot_cnt, slots_per_op = desc->slots_per_op;
+	struct iop13xx_adma_desc_hw *hw_desc = desc->hw_desc, *iter;
+	int i = 0;
+
+	do {
+		iter = iop_hw_desc_slot_idx(hw_desc, i);
+		iter->src[src_idx].src_addr = addr;
+		iter->src[src_idx].upper_src_addr = 0;
+		slot_cnt -= slots_per_op;
+		if (slot_cnt) {
+			i += slots_per_op;
+			addr += IOP_ADMA_XOR_MAX_BYTE_COUNT;
+		}
+	} while (slot_cnt);
+}
+
+static inline void
+iop_desc_init_interrupt(struct iop_adma_desc_slot *desc,
+	struct iop_adma_chan *chan)
+{
+	iop_desc_init_memcpy(desc, 1);
+	iop_desc_set_byte_count(desc, chan, 0);
+	iop_desc_set_dest_addr(desc, chan, 0);
+	iop_desc_set_memcpy_src_addr(desc, 0);
+}
+
+#define iop_desc_set_zero_sum_src_addr iop_desc_set_xor_src_addr
+
+static inline void iop_desc_set_next_desc(struct iop_adma_desc_slot *desc,
+					u32 next_desc_addr)
+{
+	struct iop13xx_adma_desc_hw *hw_desc = desc->hw_desc;
+	BUG_ON(hw_desc->next_desc);
+	hw_desc->next_desc = next_desc_addr;
+}
+
+static inline u32 iop_desc_get_next_desc(struct iop_adma_desc_slot *desc)
+{
+	struct iop13xx_adma_desc_hw *hw_desc = desc->hw_desc;
+	return hw_desc->next_desc;
+}
+
+static inline void iop_desc_clear_next_desc(struct iop_adma_desc_slot *desc)
+{
+	struct iop13xx_adma_desc_hw *hw_desc = desc->hw_desc;
+	hw_desc->next_desc = 0;
+}
+
+static inline void iop_desc_set_block_fill_val(struct iop_adma_desc_slot *desc,
+						u32 val)
+{
+	struct iop13xx_adma_desc_hw *hw_desc = desc->hw_desc;
+	hw_desc->block_fill_data = val;
+}
+
+static inline int iop_desc_get_zero_result(struct iop_adma_desc_slot *desc)
+{
+	struct iop13xx_adma_desc_hw *hw_desc = desc->hw_desc;
+	struct iop13xx_adma_desc_ctrl desc_ctrl = hw_desc->desc_ctrl_field;
+	struct iop13xx_adma_byte_count byte_count = hw_desc->byte_count_field;
+
+	BUG_ON(!(byte_count.tx_complete && desc_ctrl.zero_result));
+
+	if(desc_ctrl.pq_xfer_en)
+		return byte_count.zero_result_err_q;
+	else
+		return byte_count.zero_result_err;
+}
+
+static inline void iop_chan_append(struct iop_adma_chan *chan)
+{
+	u32 adma_accr;
+
+	adma_accr = __raw_readl(ADMA_ACCR(chan));
+	adma_accr |= 0x2;
+	__raw_writel(adma_accr, ADMA_ACCR(chan));
+}
+
+static inline void iop_chan_idle(int busy, struct iop_adma_chan *chan)
+{
+	do { } while (0);
+}
+
+static inline u32 iop_chan_get_status(struct iop_adma_chan *chan)
+{
+	return __raw_readl(ADMA_ACSR(chan));
+}
+
+static inline void iop_chan_disable(struct iop_adma_chan *chan)
+{
+	u32 adma_chan_ctrl = __raw_readl(ADMA_ACCR(chan));
+	adma_chan_ctrl &= ~0x1;
+	__raw_writel(adma_chan_ctrl, ADMA_ACCR(chan));
+}
+
+static inline void iop_chan_enable(struct iop_adma_chan *chan)
+{
+	u32 adma_chan_ctrl;
+
+	/* drain write buffer */
+	asm volatile ("mcr p15, 0, r1, c7, c10, 4" : : : "%r1");
+	adma_chan_ctrl = __raw_readl(ADMA_ACCR(chan));
+	adma_chan_ctrl |= 0x1;
+	__raw_writel(adma_chan_ctrl, ADMA_ACCR(chan));
+}
+
+static inline void iop_adma_device_clear_eot_status(struct iop_adma_chan *chan)
+{
+	u32 status = __raw_readl(ADMA_ACSR(chan));
+	status &= (1 << 12);
+	__raw_writel(status, ADMA_ACSR(chan));
+}
+
+static inline void iop_adma_device_clear_eoc_status(struct iop_adma_chan *chan)
+{
+	u32 status = __raw_readl(ADMA_ACSR(chan));
+	status &= (1 << 11);
+	__raw_writel(status, ADMA_ACSR(chan));
+}
+
+static inline void iop_adma_device_clear_err_status(struct iop_adma_chan *chan)
+{
+	u32 status = __raw_readl(ADMA_ACSR(chan));
+	status &= (1 << 9) | (1 << 5) | (1 << 4) | (1 << 3);
+	__raw_writel(status, ADMA_ACSR(chan));
+}
+
+static inline int
+iop_is_err_int_parity(unsigned long status, struct iop_adma_chan *chan)
+{
+	return test_bit(9, &status);
+}
+
+static inline int
+iop_is_err_mcu_abort(unsigned long status, struct iop_adma_chan *chan)
+{
+	return test_bit(5, &status);
+}
+
+static inline int
+iop_is_err_int_tabort(unsigned long status, struct iop_adma_chan *chan)
+{
+	return test_bit(4, &status);
+}
+
+static inline int
+iop_is_err_int_mabort(unsigned long status, struct iop_adma_chan *chan)
+{
+	return test_bit(3, &status);
+}
+
+static inline int
+iop_is_err_pci_tabort(unsigned long status, struct iop_adma_chan *chan)
+{
+	return 0;
+}
+
+static inline int
+iop_is_err_pci_mabort(unsigned long status, struct iop_adma_chan *chan)
+{
+	return 0;
+}
+
+static inline int
+iop_is_err_split_tx(unsigned long status, struct iop_adma_chan *chan)
+{
+	return 0;
+}
+
+#endif /* _ADMA_H */
diff --git a/include/asm-arm/arch-iop13xx/iop13xx.h b/include/asm-arm/arch-iop13xx/iop13xx.h
index d26b755..57c0f7d 100644
--- a/include/asm-arm/arch-iop13xx/iop13xx.h
+++ b/include/asm-arm/arch-iop13xx/iop13xx.h
@@ -161,12 +161,22 @@ static inline int iop13xx_cpu_id(void)
 #define IOP13XX_INIT_I2C_1	      (1 << 1)
 #define IOP13XX_INIT_I2C_2	      (1 << 2)
 
-#define IQ81340_NUM_UART     2
-#define IQ81340_NUM_I2C      3
-#define IQ81340_NUM_PHYS_MAP_FLASH 1
+/* ADMA selection flags */
+/* INIT_ADMA_DEFAULT = Rely on CONFIG_IOP13XX_ADMA* */
+#define IOP13XX_INIT_ADMA_DEFAULT     (0)
+#define IOP13XX_INIT_ADMA_0           (1 << 0)
+#define IOP13XX_INIT_ADMA_1           (1 << 1)
+#define IOP13XX_INIT_ADMA_2           (1 << 2)
+
+/* Platform devices */
+#define IQ81340_NUM_UART     		2
+#define IQ81340_NUM_I2C      		3
+#define IQ81340_NUM_PHYS_MAP_FLASH	1
+#define IQ81340_NUM_ADMA     		3
 #define IQ81340_MAX_PLAT_DEVICES (IQ81340_NUM_UART +\
 				IQ81340_NUM_I2C +\
-				IQ81340_NUM_PHYS_MAP_FLASH)
+				IQ81340_NUM_PHYS_MAP_FLASH +\
+				IQ81340_NUM_ADMA)
 
 /*========================== PMMR offsets for key registers ============*/
 #define IOP13XX_ATU0_PMMR_OFFSET   	0x00048000
@@ -410,22 +420,6 @@ static inline int iop13xx_cpu_id(void)
 /*==============================ADMA UNITS===============================*/
 #define IOP13XX_ADMA_PHYS_BASE(chan)	IOP13XX_REG_ADDR32_PHYS((chan << 9))
 #define IOP13XX_ADMA_UPPER_PA(chan)	(IOP13XX_ADMA_PHYS_BASE(chan) + 0xc0)
-#define IOP13XX_ADMA_OFFSET(chan, ofs)	IOP13XX_REG_ADDR32((chan << 9) + (ofs))
-
-#define IOP13XX_ADMA_ACCR(chan)      IOP13XX_ADMA_OFFSET(chan, 0x0)
-#define IOP13XX_ADMA_ACSR(chan)      IOP13XX_ADMA_OFFSET(chan, 0x4)
-#define IOP13XX_ADMA_ADAR(chan)      IOP13XX_ADMA_OFFSET(chan, 0x8)
-#define IOP13XX_ADMA_IIPCR(chan)     IOP13XX_ADMA_OFFSET(chan, 0x18)
-#define IOP13XX_ADMA_IIPAR(chan)     IOP13XX_ADMA_OFFSET(chan, 0x1c)
-#define IOP13XX_ADMA_IIPUAR(chan)    IOP13XX_ADMA_OFFSET(chan, 0x20)
-#define IOP13XX_ADMA_ANDAR(chan)     IOP13XX_ADMA_OFFSET(chan, 0x24)
-#define IOP13XX_ADMA_ADCR(chan)      IOP13XX_ADMA_OFFSET(chan, 0x28)
-#define IOP13XX_ADMA_CARMD(chan)     IOP13XX_ADMA_OFFSET(chan, 0x2c)
-#define IOP13XX_ADMA_ABCR(chan)      IOP13XX_ADMA_OFFSET(chan, 0x30)
-#define IOP13XX_ADMA_DLADR(chan)     IOP13XX_ADMA_OFFSET(chan, 0x34)
-#define IOP13XX_ADMA_DUADR(chan)     IOP13XX_ADMA_OFFSET(chan, 0x38)
-#define IOP13XX_ADMA_SLAR(src, chan) IOP13XX_ADMA_OFFSET(chan, 0x3c + (src <<3))
-#define IOP13XX_ADMA_SUAR(src, chan) IOP13XX_ADMA_OFFSET(chan, 0x40 + (src <<3))
 
 /*==============================XSI BRIDGE===============================*/
 #define IOP13XX_XBG_BECSR		IOP13XX_REG_ADDR32(0x178c)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2.6.21-rc4 15/15] iop3xx: Surface the iop3xx DMA and AAU units to the iop-adma driver
  2007-03-23  6:51 [PATCH 2.6.21-rc4 00/15] md raid5 acceleration and async_tx Dan Williams
                   ` (13 preceding siblings ...)
  2007-03-23  6:52 ` [PATCH 2.6.21-rc4 14/15] iop13xx: Surface the iop13xx adma units to the iop-adma driver Dan Williams
@ 2007-03-23  6:52 ` Dan Williams
  14 siblings, 0 replies; 16+ messages in thread
From: Dan Williams @ 2007-03-23  6:52 UTC (permalink / raw)
  To: neilb, christopher.leech, linux-raid, linux-kernel
  Cc: akpm, torvalds, yur, wd, arjan, rmk+kernel

Adds the platform device definitions and the architecture specific support
routines (i.e. register initialization and descriptor formats) for the
iop-adma driver.

Changelog:
* add support for > 1k zero sum buffer sizes
* added dma/aau platform devices to iq80321 and iq80332 setup
* fixed the calculation in iop_desc_is_aligned
* support xor buffer sizes larger than 16MB
* fix places where software descriptors are assumed to be contiguous, only
hardware descriptors are contiguous
for up to a PAGE_SIZE buffer size
* convert to async_tx
* add interrupt support
* add platform devices for 80219 boards
* do not call platform register macros in driver code
* remove switch() statements for compatible register offsets/layouts

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 arch/arm/mach-iop32x/glantank.c        |    2 
 arch/arm/mach-iop32x/iq31244.c         |    5 
 arch/arm/mach-iop32x/iq80321.c         |    3 
 arch/arm/mach-iop32x/n2100.c           |    2 
 arch/arm/mach-iop33x/iq80331.c         |    3 
 arch/arm/mach-iop33x/iq80332.c         |    3 
 arch/arm/plat-iop/Makefile             |    2 
 arch/arm/plat-iop/adma.c               |  198 +++++++
 include/asm-arm/arch-iop32x/adma.h     |    5 
 include/asm-arm/arch-iop33x/adma.h     |    5 
 include/asm-arm/hardware/iop3xx-adma.h |  893 ++++++++++++++++++++++++++++++++
 include/asm-arm/hardware/iop3xx.h      |   68 --
 12 files changed, 1129 insertions(+), 60 deletions(-)

diff --git a/arch/arm/mach-iop32x/glantank.c b/arch/arm/mach-iop32x/glantank.c
index 45f4f13..2e0099b 100644
--- a/arch/arm/mach-iop32x/glantank.c
+++ b/arch/arm/mach-iop32x/glantank.c
@@ -180,6 +180,8 @@ static void __init glantank_init_machine(void)
 	platform_device_register(&iop3xx_i2c1_device);
 	platform_device_register(&glantank_flash_device);
 	platform_device_register(&glantank_serial_device);
+	platform_device_register(&iop3xx_dma_0_channel);
+	platform_device_register(&iop3xx_dma_1_channel);
 
 	pm_power_off = glantank_power_off;
 }
diff --git a/arch/arm/mach-iop32x/iq31244.c b/arch/arm/mach-iop32x/iq31244.c
index 571ac35..bf1c112 100644
--- a/arch/arm/mach-iop32x/iq31244.c
+++ b/arch/arm/mach-iop32x/iq31244.c
@@ -276,9 +276,14 @@ static void __init iq31244_init_machine(void)
 	platform_device_register(&iop3xx_i2c1_device);
 	platform_device_register(&iq31244_flash_device);
 	platform_device_register(&iq31244_serial_device);
+	platform_device_register(&iop3xx_dma_0_channel);
+	platform_device_register(&iop3xx_dma_1_channel);
 
 	if (is_80219())
 		pm_power_off = ep80219_power_off;
+
+	if (!is_80219())	
+		platform_device_register(&iop3xx_aau_channel);
 }
 
 MACHINE_START(IQ31244, "Intel IQ31244")
diff --git a/arch/arm/mach-iop32x/iq80321.c b/arch/arm/mach-iop32x/iq80321.c
index 361c70c..474ec2a 100644
--- a/arch/arm/mach-iop32x/iq80321.c
+++ b/arch/arm/mach-iop32x/iq80321.c
@@ -180,6 +180,9 @@ static void __init iq80321_init_machine(void)
 	platform_device_register(&iop3xx_i2c1_device);
 	platform_device_register(&iq80321_flash_device);
 	platform_device_register(&iq80321_serial_device);
+	platform_device_register(&iop3xx_dma_0_channel);
+	platform_device_register(&iop3xx_dma_1_channel);
+	platform_device_register(&iop3xx_aau_channel);
 }
 
 MACHINE_START(IQ80321, "Intel IQ80321")
diff --git a/arch/arm/mach-iop32x/n2100.c b/arch/arm/mach-iop32x/n2100.c
index 5f07344..8e6fe13 100644
--- a/arch/arm/mach-iop32x/n2100.c
+++ b/arch/arm/mach-iop32x/n2100.c
@@ -245,6 +245,8 @@ static void __init n2100_init_machine(void)
 	platform_device_register(&iop3xx_i2c0_device);
 	platform_device_register(&n2100_flash_device);
 	platform_device_register(&n2100_serial_device);
+	platform_device_register(&iop3xx_dma_0_channel);
+	platform_device_register(&iop3xx_dma_1_channel);
 
 	pm_power_off = n2100_power_off;
 
diff --git a/arch/arm/mach-iop33x/iq80331.c b/arch/arm/mach-iop33x/iq80331.c
index 1a9e361..b4d12bf 100644
--- a/arch/arm/mach-iop33x/iq80331.c
+++ b/arch/arm/mach-iop33x/iq80331.c
@@ -135,6 +135,9 @@ static void __init iq80331_init_machine(void)
 	platform_device_register(&iop33x_uart0_device);
 	platform_device_register(&iop33x_uart1_device);
 	platform_device_register(&iq80331_flash_device);
+	platform_device_register(&iop3xx_dma_0_channel);
+	platform_device_register(&iop3xx_dma_1_channel);
+	platform_device_register(&iop3xx_aau_channel);
 }
 
 MACHINE_START(IQ80331, "Intel IQ80331")
diff --git a/arch/arm/mach-iop33x/iq80332.c b/arch/arm/mach-iop33x/iq80332.c
index 96d6f0f..2abb2d8 100644
--- a/arch/arm/mach-iop33x/iq80332.c
+++ b/arch/arm/mach-iop33x/iq80332.c
@@ -135,6 +135,9 @@ static void __init iq80332_init_machine(void)
 	platform_device_register(&iop33x_uart0_device);
 	platform_device_register(&iop33x_uart1_device);
 	platform_device_register(&iq80332_flash_device);
+	platform_device_register(&iop3xx_dma_0_channel);
+	platform_device_register(&iop3xx_dma_1_channel);
+	platform_device_register(&iop3xx_aau_channel);
 }
 
 MACHINE_START(IQ80332, "Intel IQ80332")
diff --git a/arch/arm/plat-iop/Makefile b/arch/arm/plat-iop/Makefile
index 4d2b1da..36bff03 100644
--- a/arch/arm/plat-iop/Makefile
+++ b/arch/arm/plat-iop/Makefile
@@ -12,6 +12,7 @@ obj-$(CONFIG_ARCH_IOP32X) += setup.o
 obj-$(CONFIG_ARCH_IOP32X) += time.o
 obj-$(CONFIG_ARCH_IOP32X) += io.o
 obj-$(CONFIG_ARCH_IOP32X) += cp6.o
+obj-$(CONFIG_ARCH_IOP32X) += adma.o
 
 # IOP33X
 obj-$(CONFIG_ARCH_IOP33X) += gpio.o
@@ -21,6 +22,7 @@ obj-$(CONFIG_ARCH_IOP33X) += setup.o
 obj-$(CONFIG_ARCH_IOP33X) += time.o
 obj-$(CONFIG_ARCH_IOP33X) += io.o
 obj-$(CONFIG_ARCH_IOP33X) += cp6.o
+obj-$(CONFIG_ARCH_IOP33X) += adma.o
 
 # IOP13XX
 obj-$(CONFIG_ARCH_IOP13XX) += cp6.o
diff --git a/arch/arm/plat-iop/adma.c b/arch/arm/plat-iop/adma.c
new file mode 100644
index 0000000..9d0fee6
--- /dev/null
+++ b/arch/arm/plat-iop/adma.c
@@ -0,0 +1,198 @@
+/*
+ * Copyright(c) 2006 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+
+/*
+ * Platform device definitions for the iop3xx dma/xor engines
+ */
+
+#include <linux/platform_device.h>
+#include <asm/hardware/iop3xx.h>
+#include <linux/dma-mapping.h>
+#include <asm/arch/adma.h>
+#include <asm/hardware/iop_adma.h>
+
+#ifdef CONFIG_ARCH_IOP32X
+#define IRQ_DMA0_EOT IRQ_IOP32X_DMA0_EOT 
+#define IRQ_DMA0_EOC IRQ_IOP32X_DMA0_EOC 
+#define IRQ_DMA0_ERR IRQ_IOP32X_DMA0_ERR 
+
+#define IRQ_DMA1_EOT IRQ_IOP32X_DMA1_EOT 
+#define IRQ_DMA1_EOC IRQ_IOP32X_DMA1_EOC 
+#define IRQ_DMA1_ERR IRQ_IOP32X_DMA1_ERR 
+
+#define IRQ_AA_EOT IRQ_IOP32X_AA_EOT
+#define IRQ_AA_EOC IRQ_IOP32X_AA_EOC
+#define IRQ_AA_ERR IRQ_IOP32X_AA_ERR
+#endif
+#ifdef CONFIG_ARCH_IOP33X
+#define IRQ_DMA0_EOT IRQ_IOP33X_DMA0_EOT 
+#define IRQ_DMA0_EOC IRQ_IOP33X_DMA0_EOC 
+#define IRQ_DMA0_ERR IRQ_IOP33X_DMA0_ERR 
+
+#define IRQ_DMA1_EOT IRQ_IOP33X_DMA1_EOT 
+#define IRQ_DMA1_EOC IRQ_IOP33X_DMA1_EOC 
+#define IRQ_DMA1_ERR IRQ_IOP33X_DMA1_ERR 
+
+#define IRQ_AA_EOT IRQ_IOP33X_AA_EOT
+#define IRQ_AA_EOC IRQ_IOP33X_AA_EOC
+#define IRQ_AA_ERR IRQ_IOP33X_AA_ERR
+#endif
+/* AAU and DMA Channels */
+static struct resource iop3xx_dma_0_resources[] = {
+	[0] = {
+		.start = IOP3XX_DMA_PHYS_BASE(0),
+		.end = IOP3XX_DMA_UPPER_PA(0),
+		.flags = IORESOURCE_MEM,
+	},
+	[1] = {
+		.start = IRQ_DMA0_EOT,
+		.end = IRQ_DMA0_EOT,
+		.flags = IORESOURCE_IRQ
+	},
+	[2] = {
+		.start = IRQ_DMA0_EOC,
+		.end = IRQ_DMA0_EOC,
+		.flags = IORESOURCE_IRQ
+	},
+	[3] = {
+		.start = IRQ_DMA0_ERR,
+		.end = IRQ_DMA0_ERR,
+		.flags = IORESOURCE_IRQ
+	}
+};
+
+static struct resource iop3xx_dma_1_resources[] = {
+	[0] = {
+		.start = IOP3XX_DMA_PHYS_BASE(1),
+		.end = IOP3XX_DMA_UPPER_PA(1),
+		.flags = IORESOURCE_MEM,
+	},
+	[1] = {
+		.start = IRQ_DMA1_EOT,
+		.end = IRQ_DMA1_EOT,
+		.flags = IORESOURCE_IRQ
+	},
+	[2] = {
+		.start = IRQ_DMA1_EOC,
+		.end = IRQ_DMA1_EOC,
+		.flags = IORESOURCE_IRQ
+	},
+	[3] = {
+		.start = IRQ_DMA1_ERR,
+		.end = IRQ_DMA1_ERR,
+		.flags = IORESOURCE_IRQ
+	}
+};
+
+
+static struct resource iop3xx_aau_resources[] = {
+	[0] = {
+		.start = IOP3XX_AAU_PHYS_BASE,
+		.end = IOP3XX_AAU_UPPER_PA,
+		.flags = IORESOURCE_MEM,
+	},
+	[1] = {
+		.start = IRQ_AA_EOT,
+		.end = IRQ_AA_EOT,
+		.flags = IORESOURCE_IRQ
+	},
+	[2] = {
+		.start = IRQ_AA_EOC,
+		.end = IRQ_AA_EOC,
+		.flags = IORESOURCE_IRQ
+	},
+	[3] = {
+		.start = IRQ_AA_ERR,
+		.end = IRQ_AA_ERR,
+		.flags = IORESOURCE_IRQ
+	}
+};
+
+static u64 iop3xx_adma_dmamask = DMA_32BIT_MASK;
+
+static struct iop_adma_platform_data iop3xx_dma_0_data = {
+	.hw_id = DMA0_ID,
+	#ifdef CONFIG_ARCH_IOP32X /* the 32x DMA does not perform CRC32C */
+	.capabilities =	DMA_CAP_MEMCPY | DMA_CAP_INTERRUPT,
+	#else
+	.capabilities =	DMA_CAP_MEMCPY | DMA_CAP_MEMCPY_CRC32C |
+		DMA_CAP_INTERRUPT,
+	#endif
+	.pool_size = PAGE_SIZE,
+};
+
+static struct iop_adma_platform_data iop3xx_dma_1_data = {
+	.hw_id = DMA1_ID,
+	#ifdef CONFIG_ARCH_IOP32X /* the 32x DMA does not perform CRC32C */
+	.capabilities =	DMA_CAP_MEMCPY | DMA_CAP_INTERRUPT,
+	#else
+	.capabilities =	DMA_CAP_MEMCPY | DMA_CAP_MEMCPY_CRC32C |
+		DMA_CAP_INTERRUPT,
+	#endif
+	.pool_size = PAGE_SIZE,
+};
+
+static struct iop_adma_platform_data iop3xx_aau_data = {
+	.hw_id = AAU_ID,
+	#ifdef CONFIG_ARCH_IOP32X /* the 32x AAU does not perform zero sum */
+	.capabilities =	DMA_CAP_XOR | DMA_CAP_MEMSET | DMA_CAP_INTERRUPT,
+	#else
+	.capabilities =	DMA_CAP_XOR | DMA_CAP_ZERO_SUM | DMA_CAP_MEMSET |
+		DMA_CAP_INTERRUPT,
+	#endif
+	.pool_size = 3 * PAGE_SIZE,
+};
+
+struct platform_device iop3xx_dma_0_channel = {
+	.name = "IOP-ADMA",
+	.id = 0,
+	.num_resources = 4,
+	.resource = iop3xx_dma_0_resources,
+	.dev = {
+		.dma_mask = &iop3xx_adma_dmamask,
+		.coherent_dma_mask = DMA_64BIT_MASK,
+		.platform_data = (void *) &iop3xx_dma_0_data,
+	},
+};
+
+struct platform_device iop3xx_dma_1_channel = {
+	.name = "IOP-ADMA",
+	.id = 1,
+	.num_resources = 4,
+	.resource = iop3xx_dma_1_resources,
+	.dev = {
+		.dma_mask = &iop3xx_adma_dmamask,
+		.coherent_dma_mask = DMA_64BIT_MASK,
+		.platform_data = (void *) &iop3xx_dma_1_data,
+	},
+};
+
+struct platform_device iop3xx_aau_channel = {
+	.name = "IOP-ADMA",
+	.id = 2,
+	.num_resources = 4,
+	.resource = iop3xx_aau_resources,
+	.dev = {
+		.dma_mask = &iop3xx_adma_dmamask,
+		.coherent_dma_mask = DMA_64BIT_MASK,
+		.platform_data = (void *) &iop3xx_aau_data,
+	},
+};
diff --git a/include/asm-arm/arch-iop32x/adma.h b/include/asm-arm/arch-iop32x/adma.h
new file mode 100644
index 0000000..5ed9203
--- /dev/null
+++ b/include/asm-arm/arch-iop32x/adma.h
@@ -0,0 +1,5 @@
+#ifndef IOP32X_ADMA_H
+#define IOP32X_ADMA_H
+#include <asm/hardware/iop3xx-adma.h>
+#endif
+
diff --git a/include/asm-arm/arch-iop33x/adma.h b/include/asm-arm/arch-iop33x/adma.h
new file mode 100644
index 0000000..4b92f79
--- /dev/null
+++ b/include/asm-arm/arch-iop33x/adma.h
@@ -0,0 +1,5 @@
+#ifndef IOP33X_ADMA_H
+#define IOP33X_ADMA_H
+#include <asm/hardware/iop3xx-adma.h>
+#endif
+
diff --git a/include/asm-arm/hardware/iop3xx-adma.h b/include/asm-arm/hardware/iop3xx-adma.h
new file mode 100644
index 0000000..6f33724
--- /dev/null
+++ b/include/asm-arm/hardware/iop3xx-adma.h
@@ -0,0 +1,893 @@
+/*
+ * Copyright(c) 2006 Intel Corporation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59
+ * Temple Place - Suite 330, Boston, MA  02111-1307, USA.
+ *
+ * The full GNU General Public License is included in this distribution in the
+ * file called COPYING.
+ */
+#ifndef _ADMA_H
+#define _ADMA_H
+#include <linux/types.h>
+#include <asm/io.h>
+#include <asm/hardware.h>
+#include <asm/hardware/iop_adma.h>
+
+/* Memory copy units */
+#define DMA_CCR(chan)		(chan->mmr_base + 0x0)
+#define DMA_CSR(chan)		(chan->mmr_base + 0x4)
+#define DMA_DAR(chan)		(chan->mmr_base + 0xc)
+#define DMA_NDAR(chan)		(chan->mmr_base + 0x10)
+#define DMA_PADR(chan)		(chan->mmr_base + 0x14)
+#define DMA_PUADR(chan)	(chan->mmr_base + 0x18)
+#define DMA_LADR(chan)		(chan->mmr_base + 0x1c)
+#define DMA_BCR(chan)		(chan->mmr_base + 0x20)
+#define DMA_DCR(chan)		(chan->mmr_base + 0x24)
+
+/* Application accelerator unit  */
+#define AAU_ACR(chan)		(chan->mmr_base + 0x0)
+#define AAU_ASR(chan)		(chan->mmr_base + 0x4)
+#define AAU_ADAR(chan)		(chan->mmr_base + 0x8)
+#define AAU_ANDAR(chan)	(chan->mmr_base + 0xc)
+#define AAU_SAR(src, chan)	(chan->mmr_base + (0x10 + ((src) << 2)))
+#define AAU_DAR(chan)		(chan->mmr_base + 0x20)
+#define AAU_ABCR(chan)		(chan->mmr_base + 0x24)
+#define AAU_ADCR(chan)		(chan->mmr_base + 0x28)
+#define AAU_SAR_EDCR(src_edc)	(chan->mmr_base + (0x02c + ((src_edc-4) << 2)))
+#define AAU_EDCR0_IDX	8
+#define AAU_EDCR1_IDX	17
+#define AAU_EDCR2_IDX	26
+
+#define DMA0_ID 0
+#define DMA1_ID 1
+#define AAU_ID 2
+
+struct iop3xx_aau_desc_ctrl {
+	unsigned int int_en:1;
+	unsigned int blk1_cmd_ctrl:3;
+	unsigned int blk2_cmd_ctrl:3;
+	unsigned int blk3_cmd_ctrl:3;
+	unsigned int blk4_cmd_ctrl:3;
+	unsigned int blk5_cmd_ctrl:3;
+	unsigned int blk6_cmd_ctrl:3;
+	unsigned int blk7_cmd_ctrl:3;
+	unsigned int blk8_cmd_ctrl:3;
+	unsigned int blk_ctrl:2;
+	unsigned int dual_xor_en:1;
+	unsigned int tx_complete:1;
+	unsigned int zero_result_err:1;
+	unsigned int zero_result_en:1;
+	unsigned int dest_write_en:1;
+};
+
+struct iop3xx_aau_e_desc_ctrl {
+	unsigned int reserved:1;
+	unsigned int blk1_cmd_ctrl:3;
+	unsigned int blk2_cmd_ctrl:3;
+	unsigned int blk3_cmd_ctrl:3;
+	unsigned int blk4_cmd_ctrl:3;
+	unsigned int blk5_cmd_ctrl:3;
+	unsigned int blk6_cmd_ctrl:3;
+	unsigned int blk7_cmd_ctrl:3;
+	unsigned int blk8_cmd_ctrl:3;
+	unsigned int reserved2:7;
+};
+
+struct iop3xx_dma_desc_ctrl {
+	unsigned int pci_transaction:4;
+	unsigned int int_en:1;
+	unsigned int dac_cycle_en:1;
+	unsigned int mem_to_mem_en:1;
+	unsigned int crc_data_tx_en:1;
+	unsigned int crc_gen_en:1;
+	unsigned int crc_seed_dis:1;
+	unsigned int reserved:21;
+	unsigned int crc_tx_complete:1;
+};
+
+struct iop3xx_desc_dma {
+	u32 next_desc;
+	union {
+		u32 pci_src_addr;
+		u32 pci_dest_addr;
+		u32 src_addr;
+	};
+	union {
+		u32 upper_pci_src_addr;
+		u32 upper_pci_dest_addr;
+	};
+	union {
+		u32 local_pci_src_addr;
+		u32 local_pci_dest_addr;
+		u32 dest_addr;
+	};
+	u32 byte_count;
+	union {
+		u32 desc_ctrl;
+		struct iop3xx_dma_desc_ctrl desc_ctrl_field;
+	};
+	u32 crc_addr;
+};
+
+struct iop3xx_desc_aau {
+	u32 next_desc;
+	u32 src[4];
+	u32 dest_addr;
+	u32 byte_count;
+	union {
+		u32 desc_ctrl;
+		struct iop3xx_aau_desc_ctrl desc_ctrl_field;
+	};
+	union {
+		u32 src_addr;
+		u32 e_desc_ctrl;
+		struct iop3xx_aau_e_desc_ctrl e_desc_ctrl_field;
+	} src_edc[31];
+};
+
+struct iop3xx_aau_gfmr {
+	unsigned int gfmr1:8;
+	unsigned int gfmr2:8;
+	unsigned int gfmr3:8;
+	unsigned int gfmr4:8;
+};
+
+struct iop3xx_desc_pq_xor {
+	u32 next_desc;
+	u32 src[3];
+	union {
+		u32 data_mult1;
+		struct iop3xx_aau_gfmr data_mult1_field;
+	};
+	u32 dest_addr;
+	u32 byte_count;
+	union {
+		u32 desc_ctrl;
+		struct iop3xx_aau_desc_ctrl desc_ctrl_field;
+	};
+	union {
+		u32 src_addr;
+		u32 e_desc_ctrl;
+		struct iop3xx_aau_e_desc_ctrl e_desc_ctrl_field;
+		u32 data_multiplier;
+		struct iop3xx_aau_gfmr data_mult_field;
+		u32 reserved;
+	} src_edc_gfmr[19];
+};
+
+struct iop3xx_desc_dual_xor {
+	u32 next_desc;
+	u32 src0_addr;
+	u32 src1_addr;
+	u32 h_src_addr;
+	u32 d_src_addr;
+	u32 h_dest_addr;
+	u32 byte_count;
+	union {
+		u32 desc_ctrl;
+		struct iop3xx_aau_desc_ctrl desc_ctrl_field;
+	};
+	u32 d_dest_addr;
+};
+
+union iop3xx_desc {
+	struct iop3xx_desc_aau *aau;
+	struct iop3xx_desc_dma *dma;
+	struct iop3xx_desc_pq_xor *pq_xor;
+	struct iop3xx_desc_dual_xor *dual_xor;
+	void *ptr;
+};
+
+static inline int iop_adma_get_max_xor(void)
+{
+	return 32;
+}
+
+static inline u32 iop_chan_get_current_descriptor(struct iop_adma_chan *chan)
+{
+	int id = chan->device->id;
+
+	switch (id) {
+	case DMA0_ID:
+	case DMA1_ID:
+		return __raw_readl(DMA_DAR(chan));
+	case AAU_ID:
+		return __raw_readl(AAU_ADAR(chan));
+	default:
+		BUG();
+	}
+	return 0;
+}
+
+static inline void iop_chan_set_next_descriptor(struct iop_adma_chan *chan,
+						u32 next_desc_addr)
+{
+	int id = chan->device->id;
+
+	switch (id) {
+	case DMA0_ID:
+	case DMA1_ID:
+		__raw_writel(next_desc_addr, DMA_NDAR(chan));
+		break;
+	case AAU_ID:
+		__raw_writel(next_desc_addr, AAU_ANDAR(chan));
+		break;
+	}
+
+}
+
+#define IOP_ADMA_STATUS_BUSY (1 << 10)
+#define IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT (1024)
+#define IOP_ADMA_XOR_MAX_BYTE_COUNT (16 * 1024 * 1024)
+#define IOP_ADMA_MAX_BYTE_COUNT (16 * 1024 * 1024)
+
+static inline int iop_chan_is_busy(struct iop_adma_chan *chan)
+{
+	u32 status = __raw_readl(DMA_CSR(chan));
+	return (status & IOP_ADMA_STATUS_BUSY) ? 1 : 0;
+}
+
+static inline int iop_desc_is_aligned(struct iop_adma_desc_slot *desc,
+					int num_slots)
+{
+	/* num_slots will only ever be 1, 2, 4, or 8 */
+	return (desc->idx & (num_slots - 1)) ? 0 : 1;
+}
+
+/* to do: support large (i.e. > hw max) buffer sizes */
+static inline int iop_chan_memcpy_slot_count(size_t len, int *slots_per_op)
+{
+	*slots_per_op = 1;
+	return 1;
+}
+
+/* to do: support large (i.e. > hw max) buffer sizes */
+static inline int iop_chan_memset_slot_count(size_t len, int *slots_per_op)
+{
+	*slots_per_op = 1;
+	return 1;
+}
+
+static inline int iop3xx_aau_xor_slot_count(size_t len, int src_cnt,
+					int *slots_per_op)
+{
+	const static int slot_count_table[] = { 0,
+					        1, 1, 1, 1, /* 01 - 04 */
+					        2, 2, 2, 2, /* 05 - 08 */
+					        4, 4, 4, 4, /* 09 - 12 */
+					        4, 4, 4, 4, /* 13 - 16 */
+					        8, 8, 8, 8, /* 17 - 20 */
+					        8, 8, 8, 8, /* 21 - 24 */
+					        8, 8, 8, 8, /* 25 - 28 */
+					        8, 8, 8, 8, /* 29 - 32 */
+					      };
+	*slots_per_op = slot_count_table[src_cnt];
+	return *slots_per_op;
+}
+
+static inline int
+iop_chan_interrupt_slot_count(int *slots_per_op, struct iop_adma_chan *chan)
+{
+	switch (chan->device->id) {
+	case DMA0_ID:
+	case DMA1_ID:
+		return iop_chan_memcpy_slot_count(0, slots_per_op);
+	case AAU_ID:
+		return iop3xx_aau_xor_slot_count(0, 2, slots_per_op);
+	default:
+		BUG();
+	}
+	return 0;
+}
+
+static inline int iop_chan_xor_slot_count(size_t len, int src_cnt,
+						int *slots_per_op)
+{
+	int slot_cnt = iop3xx_aau_xor_slot_count(len, src_cnt, slots_per_op);
+
+	if (len <= IOP_ADMA_XOR_MAX_BYTE_COUNT)
+		return slot_cnt;
+
+	len -= IOP_ADMA_XOR_MAX_BYTE_COUNT;
+	while (len > IOP_ADMA_XOR_MAX_BYTE_COUNT) {
+		len -= IOP_ADMA_XOR_MAX_BYTE_COUNT;
+		slot_cnt += *slots_per_op;
+	}
+
+	if (len)
+		slot_cnt += *slots_per_op;
+
+	return slot_cnt;
+}
+
+/* zero sum on iop3xx is limited to 1k at a time so it requires multiple
+ * descriptors
+ */
+static inline int iop_chan_zero_sum_slot_count(size_t len, int src_cnt,
+						int *slots_per_op)
+{
+	int slot_cnt = iop3xx_aau_xor_slot_count(len, src_cnt, slots_per_op);
+
+	if (len <= IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT)
+		return slot_cnt;
+
+	len -= IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT;
+	while (len > IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT) {
+		len -= IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT;
+		slot_cnt += *slots_per_op;
+	}
+
+	if (len)
+		slot_cnt += *slots_per_op;
+
+	return slot_cnt;
+}
+
+static inline u32 iop_desc_get_dest_addr(struct iop_adma_desc_slot *desc,
+					struct iop_adma_chan *chan)
+{
+	union iop3xx_desc hw_desc = { .ptr = desc->hw_desc, };
+
+	switch (chan->device->id) {
+	case DMA0_ID:
+	case DMA1_ID:
+		return hw_desc.dma->dest_addr;
+	case AAU_ID:
+		return hw_desc.aau->dest_addr;
+	default:
+		BUG();
+	}
+	return 0;
+}
+
+static inline u32 iop_desc_get_byte_count(struct iop_adma_desc_slot *desc,
+					struct iop_adma_chan *chan)
+{
+	union iop3xx_desc hw_desc = { .ptr = desc->hw_desc, };
+
+	switch (chan->device->id) {
+	case DMA0_ID:
+	case DMA1_ID:
+		return hw_desc.dma->byte_count;
+	case AAU_ID:
+		return hw_desc.aau->byte_count;
+	default:
+		BUG();
+	}
+	return 0;
+}
+
+static inline int iop3xx_src_edc_idx(int src_idx)
+{
+	const static int src_edc_idx_table[] = { 0, 0, 0, 0,
+						 0, 1, 2, 3,
+						 5, 6, 7, 8,
+						 9, 10, 11, 12,
+						 14, 15, 16, 17,
+						 18, 19, 20, 21,
+						 23, 24, 25, 26,
+						 27, 28, 29, 30,
+					       };
+
+	return src_edc_idx_table[src_idx];
+}
+
+static inline u32 iop_desc_get_src_addr(struct iop_adma_desc_slot *desc,
+					struct iop_adma_chan *chan,
+					int src_idx)
+{
+	union iop3xx_desc hw_desc = { .ptr = desc->hw_desc, };
+
+	switch (chan->device->id) {
+	case DMA0_ID:
+	case DMA1_ID:
+		return hw_desc.dma->src_addr;
+	case AAU_ID:
+		break;
+	default:
+		BUG();
+	}
+
+	if (src_idx < 4)
+		return hw_desc.aau->src[src_idx];
+	else
+		return hw_desc.aau->src_edc[iop3xx_src_edc_idx(src_idx)].src_addr;
+}
+
+static inline void iop3xx_aau_desc_set_src_addr(struct iop3xx_desc_aau *hw_desc,
+					int src_idx, dma_addr_t addr)
+{
+	if (src_idx < 4)
+		hw_desc->src[src_idx] = addr;
+	else
+		hw_desc->src_edc[iop3xx_src_edc_idx(src_idx)].src_addr = addr;
+}
+
+static inline void
+iop_desc_init_memcpy(struct iop_adma_desc_slot *desc, int int_en)
+{
+	struct iop3xx_desc_dma *hw_desc = desc->hw_desc;
+	union {
+		u32 value;
+		struct iop3xx_dma_desc_ctrl field;
+	} u_desc_ctrl;
+
+	u_desc_ctrl.value = 0;
+	u_desc_ctrl.field.mem_to_mem_en = 1;
+	u_desc_ctrl.field.pci_transaction = 0xe; /* memory read block */
+	u_desc_ctrl.field.int_en = int_en;
+	hw_desc->desc_ctrl = u_desc_ctrl.value;
+	hw_desc->upper_pci_src_addr = 0;
+	hw_desc->crc_addr = 0;
+}
+
+static inline void
+iop_desc_init_memset(struct iop_adma_desc_slot *desc, int int_en)
+{
+	struct iop3xx_desc_aau *hw_desc = desc->hw_desc;
+	union {
+		u32 value;
+		struct iop3xx_aau_desc_ctrl field;
+	} u_desc_ctrl;
+
+	u_desc_ctrl.value = 0;
+	u_desc_ctrl.field.blk1_cmd_ctrl = 0x2; /* memory block fill */
+	u_desc_ctrl.field.dest_write_en = 1;
+	u_desc_ctrl.field.int_en = int_en;
+	hw_desc->desc_ctrl = u_desc_ctrl.value;
+}
+
+static inline u32
+iop3xx_desc_init_xor(struct iop3xx_desc_aau *hw_desc, int src_cnt, int int_en)
+{
+	int i, shift;
+	u32 edcr;
+	union {
+		u32 value;
+		struct iop3xx_aau_desc_ctrl field;
+	} u_desc_ctrl;
+
+	u_desc_ctrl.value = 0;
+	switch (src_cnt) {
+	case 25 ... 32:
+		u_desc_ctrl.field.blk_ctrl = 0x3; /* use EDCR[2:0] */
+		edcr = 0;
+		shift = 1;
+		for (i = 24; i < src_cnt; i++) {
+			edcr |= (1 << shift);
+			shift += 3;
+		}
+		hw_desc->src_edc[AAU_EDCR2_IDX].e_desc_ctrl = edcr;
+		src_cnt = 24;
+		/* fall through */
+	case 17 ... 24:
+		if (!u_desc_ctrl.field.blk_ctrl) {
+			hw_desc->src_edc[AAU_EDCR2_IDX].e_desc_ctrl = 0;
+			u_desc_ctrl.field.blk_ctrl = 0x3; /* use EDCR[2:0] */
+		}
+		edcr = 0;
+		shift = 1;
+		for (i = 16; i < src_cnt; i++) {
+			edcr |= (1 << shift);
+			shift += 3;
+		}
+		hw_desc->src_edc[AAU_EDCR1_IDX].e_desc_ctrl = edcr;
+		src_cnt = 16;
+		/* fall through */
+	case 9 ... 16:
+		if (!u_desc_ctrl.field.blk_ctrl)
+			u_desc_ctrl.field.blk_ctrl = 0x2; /* use EDCR0 */
+		edcr = 0;
+		shift = 1;
+		for (i = 8; i < src_cnt; i++) {
+			edcr |= (1 << shift);
+			shift += 3;
+		}
+		hw_desc->src_edc[AAU_EDCR0_IDX].e_desc_ctrl = edcr;
+		src_cnt = 8;
+		/* fall through */
+	case 2 ... 8:
+		shift = 1;
+		for (i = 0; i < src_cnt; i++) {
+			u_desc_ctrl.value |= (1 << shift);
+			shift += 3;
+		}
+
+		if (!u_desc_ctrl.field.blk_ctrl && src_cnt > 4)
+			u_desc_ctrl.field.blk_ctrl = 0x1; /* use mini-desc */
+	}
+
+	u_desc_ctrl.field.dest_write_en = 1;
+	u_desc_ctrl.field.blk1_cmd_ctrl = 0x7; /* direct fill */
+	u_desc_ctrl.field.int_en = int_en;
+	hw_desc->desc_ctrl = u_desc_ctrl.value;
+
+	return u_desc_ctrl.value;
+}
+
+static inline void
+iop_desc_init_xor(struct iop_adma_desc_slot *desc, int src_cnt, int int_en)
+{
+	iop3xx_desc_init_xor(desc->hw_desc, src_cnt, int_en);
+}
+
+/* return the number of operations */
+static inline int
+iop_desc_init_zero_sum(struct iop_adma_desc_slot *desc, int src_cnt, int int_en)
+{
+	int slot_cnt = desc->slot_cnt, slots_per_op = desc->slots_per_op;
+	struct iop3xx_desc_aau *hw_desc, *prev_hw_desc, *iter;
+	union {
+		u32 value;
+		struct iop3xx_aau_desc_ctrl field;
+	} u_desc_ctrl;
+	int i, j;
+
+	hw_desc = desc->hw_desc;
+
+	for (i = 0, j = 0; (slot_cnt -= slots_per_op) >= 0;
+		i += slots_per_op, j++) {
+		iter = iop_hw_desc_slot_idx(hw_desc, i);
+		u_desc_ctrl.value = iop3xx_desc_init_xor(iter, src_cnt, int_en);
+		u_desc_ctrl.field.dest_write_en = 0;
+		u_desc_ctrl.field.zero_result_en = 1;
+		u_desc_ctrl.field.int_en = int_en;
+		iter->desc_ctrl = u_desc_ctrl.value;
+
+		/* for the subsequent descriptors preserve the store queue
+		 * and chain them together
+		 */
+		if (i) {
+			prev_hw_desc = iop_hw_desc_slot_idx(hw_desc, i - slots_per_op);
+			prev_hw_desc->next_desc = (u32) (desc->phys + (i << 5));
+		}
+	}
+
+	return j;
+}
+
+static inline void
+iop_desc_init_null_xor(struct iop_adma_desc_slot *desc, int src_cnt, int int_en)
+{
+	struct iop3xx_desc_aau *hw_desc = desc->hw_desc;
+	union {
+		u32 value;
+		struct iop3xx_aau_desc_ctrl field;
+	} u_desc_ctrl;
+
+	u_desc_ctrl.value = 0;
+	switch (src_cnt) {
+	case 25 ... 32:
+		u_desc_ctrl.field.blk_ctrl = 0x3; /* use EDCR[2:0] */
+		hw_desc->src_edc[AAU_EDCR2_IDX].e_desc_ctrl = 0;
+		/* fall through */
+	case 17 ... 24:
+		if (!u_desc_ctrl.field.blk_ctrl) {
+			hw_desc->src_edc[AAU_EDCR2_IDX].e_desc_ctrl = 0;
+			u_desc_ctrl.field.blk_ctrl = 0x3; /* use EDCR[2:0] */
+		}
+		hw_desc->src_edc[AAU_EDCR1_IDX].e_desc_ctrl = 0;
+		/* fall through */
+	case 9 ... 16:
+		if (!u_desc_ctrl.field.blk_ctrl)
+			u_desc_ctrl.field.blk_ctrl = 0x2; /* use EDCR0 */
+		hw_desc->src_edc[AAU_EDCR0_IDX].e_desc_ctrl = 0;
+		/* fall through */
+	case 1 ... 8:
+		if (!u_desc_ctrl.field.blk_ctrl && src_cnt > 4)
+			u_desc_ctrl.field.blk_ctrl = 0x1; /* use mini-desc */
+	}
+
+	u_desc_ctrl.field.dest_write_en = 0;
+	u_desc_ctrl.field.int_en = int_en;
+	hw_desc->desc_ctrl = u_desc_ctrl.value;
+}
+
+static inline void iop_desc_set_byte_count(struct iop_adma_desc_slot *desc,
+					struct iop_adma_chan *chan,
+					u32 byte_count)
+{
+	union iop3xx_desc hw_desc = { .ptr = desc->hw_desc, };
+
+	switch (chan->device->id) {
+	case DMA0_ID:
+	case DMA1_ID:
+		hw_desc.dma->byte_count = byte_count;
+		break;
+	case AAU_ID:
+		hw_desc.aau->byte_count = byte_count;
+		break;
+	default:
+		BUG();
+	}
+}
+
+static inline void
+iop_desc_init_interrupt(struct iop_adma_desc_slot *desc, struct iop_adma_chan *chan)
+{
+	union iop3xx_desc hw_desc = { .ptr = desc->hw_desc, };
+
+	switch (chan->device->id) {
+	case DMA0_ID:
+	case DMA1_ID:
+		iop_desc_init_memcpy(desc, 1);
+		hw_desc.dma->byte_count = 0;
+		hw_desc.dma->dest_addr = 0;
+		hw_desc.dma->src_addr = 0;
+		break;
+	case AAU_ID:
+		iop_desc_init_null_xor(desc, 2, 1);
+		hw_desc.aau->byte_count = 0;
+		hw_desc.aau->dest_addr = 0;
+		hw_desc.aau->src[0] = 0;
+		hw_desc.aau->src[1] = 0;
+		break;
+	default:
+		BUG();
+	}
+}
+
+static inline void
+iop_desc_set_zero_sum_byte_count(struct iop_adma_desc_slot *desc, u32 len)
+{
+	int slots_per_op = desc->slots_per_op;
+	struct iop3xx_desc_aau *hw_desc = desc->hw_desc, *iter;
+	int i = 0;
+
+	if (len <= IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT) {
+		hw_desc->byte_count = len;
+	} else {
+		do {
+			iter = iop_hw_desc_slot_idx(hw_desc, i);
+			iter->byte_count = IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT;
+			len -= IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT;
+			i += slots_per_op;
+		} while (len > IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT);
+
+		if (len) {
+			iter = iop_hw_desc_slot_idx(hw_desc, i);
+			iter->byte_count = len;
+		}
+	}
+}
+
+static inline void iop_desc_set_dest_addr(struct iop_adma_desc_slot *desc,
+					struct iop_adma_chan *chan,
+					dma_addr_t addr)
+{
+	union iop3xx_desc hw_desc = { .ptr = desc->hw_desc, };
+
+	switch (chan->device->id) {
+	case DMA0_ID:
+	case DMA1_ID:
+		hw_desc.dma->dest_addr = addr;
+		break;
+	case AAU_ID:
+		hw_desc.aau->dest_addr = addr;
+		break;
+	default:
+		BUG();
+	}
+}
+
+static inline void iop_desc_set_memcpy_src_addr(struct iop_adma_desc_slot *desc,
+					dma_addr_t addr)
+{
+	struct iop3xx_desc_dma *hw_desc = desc->hw_desc;
+	hw_desc->src_addr = addr;
+}
+
+static inline void iop_desc_set_zero_sum_src_addr(struct iop_adma_desc_slot *desc,
+					int src_idx, dma_addr_t addr)
+{
+
+	struct iop3xx_desc_aau *hw_desc = desc->hw_desc, *iter;
+	int slot_cnt = desc->slot_cnt, slots_per_op = desc->slots_per_op;
+	int i;
+
+	for (i = 0; (slot_cnt -= slots_per_op) >= 0;
+		i += slots_per_op, addr += IOP_ADMA_ZERO_SUM_MAX_BYTE_COUNT) {
+		iter = iop_hw_desc_slot_idx(hw_desc, i);
+		iop3xx_aau_desc_set_src_addr(iter, src_idx, addr);
+	}
+}
+
+static inline void iop_desc_set_xor_src_addr(struct iop_adma_desc_slot *desc,
+					int src_idx, dma_addr_t addr)
+{
+
+	struct iop3xx_desc_aau *hw_desc = desc->hw_desc, *iter;
+	int slot_cnt = desc->slot_cnt, slots_per_op = desc->slots_per_op;
+	int i;
+
+	for (i = 0; (slot_cnt -= slots_per_op) >= 0;
+		i += slots_per_op, addr += IOP_ADMA_XOR_MAX_BYTE_COUNT) {
+		iter = iop_hw_desc_slot_idx(hw_desc, i);
+		iop3xx_aau_desc_set_src_addr(iter, src_idx, addr);
+	}
+}
+
+static inline void iop_desc_set_next_desc(struct iop_adma_desc_slot *desc,
+					u32 next_desc_addr)
+{
+	/* hw_desc->next_desc is the same location for all channels */
+	union iop3xx_desc hw_desc = { .ptr = desc->hw_desc, };
+	BUG_ON(hw_desc.dma->next_desc);
+	hw_desc.dma->next_desc = next_desc_addr;
+}
+
+static inline u32 iop_desc_get_next_desc(struct iop_adma_desc_slot *desc)
+{
+	/* hw_desc->next_desc is the same location for all channels */
+	union iop3xx_desc hw_desc = { .ptr = desc->hw_desc, };
+	return hw_desc.dma->next_desc;
+}
+
+static inline void iop_desc_clear_next_desc(struct iop_adma_desc_slot *desc)
+{
+	/* hw_desc->next_desc is the same location for all channels */
+	union iop3xx_desc hw_desc = { .ptr = desc->hw_desc, };
+	hw_desc.dma->next_desc = 0;
+}
+
+static inline void iop_desc_set_block_fill_val(struct iop_adma_desc_slot *desc,
+						u32 val)
+{
+	struct iop3xx_desc_aau *hw_desc = desc->hw_desc;
+	hw_desc->src[0] = val;
+}
+
+static inline int iop_desc_get_zero_result(struct iop_adma_desc_slot *desc)
+{
+	struct iop3xx_desc_aau *hw_desc = desc->hw_desc;
+	struct iop3xx_aau_desc_ctrl desc_ctrl = hw_desc->desc_ctrl_field;
+
+	BUG_ON(!(desc_ctrl.tx_complete && desc_ctrl.zero_result_en));
+	return desc_ctrl.zero_result_err;
+}
+
+static inline void iop_chan_append(struct iop_adma_chan *chan)
+{
+	u32 dma_chan_ctrl;	
+	/* workaround dropped interrupts on 3xx */
+	mod_timer(&chan->cleanup_watchdog, jiffies + msecs_to_jiffies(3));
+
+	dma_chan_ctrl = __raw_readl(DMA_CCR(chan));
+	dma_chan_ctrl |= 0x2;
+	__raw_writel(dma_chan_ctrl, DMA_CCR(chan));
+}
+
+static inline void iop_chan_idle(int busy, struct iop_adma_chan *chan)
+{
+	if (!busy)
+		del_timer(&chan->cleanup_watchdog);
+}
+
+static inline u32 iop_chan_get_status(struct iop_adma_chan *chan)
+{
+	return __raw_readl(DMA_CSR(chan));
+}
+
+static inline void iop_chan_disable(struct iop_adma_chan *chan)
+{
+	u32 dma_chan_ctrl = __raw_readl(DMA_CCR(chan));
+	dma_chan_ctrl &= ~1;
+	__raw_writel(dma_chan_ctrl, DMA_CCR(chan));
+}
+
+static inline void iop_chan_enable(struct iop_adma_chan *chan)
+{
+	u32 dma_chan_ctrl = __raw_readl(DMA_CCR(chan));
+
+	/* drain write buffer */
+	asm volatile ("mcr p15, 0, r1, c7, c10, 4" : : : "%r1");
+
+	dma_chan_ctrl |= 1;
+	__raw_writel(dma_chan_ctrl, DMA_CCR(chan));
+}
+
+static inline void iop_adma_device_clear_eot_status(struct iop_adma_chan *chan)
+{
+	u32 status = __raw_readl(DMA_CSR(chan));
+	status &= (1 << 9);
+	__raw_writel(status, DMA_CSR(chan));
+}
+
+static inline void iop_adma_device_clear_eoc_status(struct iop_adma_chan *chan)
+{
+	u32 status = __raw_readl(DMA_CSR(chan));
+	status &= (1 << 8);
+	__raw_writel(status, DMA_CSR(chan));
+}
+
+static inline void iop_adma_device_clear_err_status(struct iop_adma_chan *chan)
+{
+	u32 status = __raw_readl(DMA_CSR(chan));
+
+	switch (chan->device->id) {
+	case DMA0_ID:
+	case DMA1_ID:
+		status &= (1 << 5) | (1 << 3) | (1 << 2) | (1 << 1);
+		break;
+	case AAU_ID:
+		status &= (1 << 5);
+		break;
+	default:
+		BUG();
+	}
+
+	__raw_writel(status, DMA_CSR(chan));
+}
+
+static inline int
+iop_is_err_int_parity(unsigned long status, struct iop_adma_chan *chan)
+{
+	return 0;
+}
+
+static inline int
+iop_is_err_mcu_abort(unsigned long status, struct iop_adma_chan *chan)
+{
+	return 0;
+}
+
+static inline int
+iop_is_err_int_tabort(unsigned long status, struct iop_adma_chan *chan)
+{
+	return 0;
+}
+
+static inline int
+iop_is_err_int_mabort(unsigned long status, struct iop_adma_chan *chan)
+{
+	return test_bit(5, &status);
+}
+
+static inline int
+iop_is_err_pci_tabort(unsigned long status, struct iop_adma_chan *chan)
+{
+	switch (chan->device->id) {
+	case DMA0_ID:
+	case DMA1_ID:
+		return test_bit(2, &status);
+	default:
+		return 0;
+	}
+}
+
+static inline int
+iop_is_err_pci_mabort(unsigned long status, struct iop_adma_chan *chan)
+{
+	switch (chan->device->id) {
+	case DMA0_ID:
+	case DMA1_ID:
+		return test_bit(3, &status);
+	default:
+		return 0;
+	}
+}
+
+static inline int
+iop_is_err_split_tx(unsigned long status, struct iop_adma_chan *chan)
+{
+	switch (chan->device->id) {
+	case DMA0_ID:
+	case DMA1_ID:
+		return test_bit(1, &status);
+	default:
+		return 0;
+	}
+}
+#endif /* _ADMA_H */
diff --git a/include/asm-arm/hardware/iop3xx.h b/include/asm-arm/hardware/iop3xx.h
index 15141a9..01e631d 100644
--- a/include/asm-arm/hardware/iop3xx.h
+++ b/include/asm-arm/hardware/iop3xx.h
@@ -128,24 +128,9 @@ extern void gpio_line_set(int line, int value);
 #define IOP3XX_IAR		(volatile u32 *)IOP3XX_REG_ADDR(0x0380)
 
 /* DMA Controller  */
-#define IOP3XX_DMA0_CCR		(volatile u32 *)IOP3XX_REG_ADDR(0x0400)
-#define IOP3XX_DMA0_CSR		(volatile u32 *)IOP3XX_REG_ADDR(0x0404)
-#define IOP3XX_DMA0_DAR		(volatile u32 *)IOP3XX_REG_ADDR(0x040c)
-#define IOP3XX_DMA0_NDAR	(volatile u32 *)IOP3XX_REG_ADDR(0x0410)
-#define IOP3XX_DMA0_PADR	(volatile u32 *)IOP3XX_REG_ADDR(0x0414)
-#define IOP3XX_DMA0_PUADR	(volatile u32 *)IOP3XX_REG_ADDR(0x0418)
-#define IOP3XX_DMA0_LADR	(volatile u32 *)IOP3XX_REG_ADDR(0x041c)
-#define IOP3XX_DMA0_BCR		(volatile u32 *)IOP3XX_REG_ADDR(0x0420)
-#define IOP3XX_DMA0_DCR		(volatile u32 *)IOP3XX_REG_ADDR(0x0424)
-#define IOP3XX_DMA1_CCR		(volatile u32 *)IOP3XX_REG_ADDR(0x0440)
-#define IOP3XX_DMA1_CSR		(volatile u32 *)IOP3XX_REG_ADDR(0x0444)
-#define IOP3XX_DMA1_DAR		(volatile u32 *)IOP3XX_REG_ADDR(0x044c)
-#define IOP3XX_DMA1_NDAR	(volatile u32 *)IOP3XX_REG_ADDR(0x0450)
-#define IOP3XX_DMA1_PADR	(volatile u32 *)IOP3XX_REG_ADDR(0x0454)
-#define IOP3XX_DMA1_PUADR	(volatile u32 *)IOP3XX_REG_ADDR(0x0458)
-#define IOP3XX_DMA1_LADR	(volatile u32 *)IOP3XX_REG_ADDR(0x045c)
-#define IOP3XX_DMA1_BCR		(volatile u32 *)IOP3XX_REG_ADDR(0x0460)
-#define IOP3XX_DMA1_DCR		(volatile u32 *)IOP3XX_REG_ADDR(0x0464)
+#define IOP3XX_DMA_PHYS_BASE(chan) (IOP3XX_PERIPHERAL_PHYS_BASE + \
+					(0x400 + (chan << 6)))
+#define IOP3XX_DMA_UPPER_PA(chan)  (IOP3XX_DMA_PHYS_BASE(chan) + 0x27)
 
 /* Peripheral bus interface  */
 #define IOP3XX_PBCR		(volatile u32 *)IOP3XX_REG_ADDR(0x0680)
@@ -194,48 +179,8 @@ extern void gpio_line_set(int line, int value);
 #define IOP_TMR_RATIO_1_1  0x00
 
 /* Application accelerator unit  */
-#define IOP3XX_AAU_ACR		(volatile u32 *)IOP3XX_REG_ADDR(0x0800)
-#define IOP3XX_AAU_ASR		(volatile u32 *)IOP3XX_REG_ADDR(0x0804)
-#define IOP3XX_AAU_ADAR		(volatile u32 *)IOP3XX_REG_ADDR(0x0808)
-#define IOP3XX_AAU_ANDAR	(volatile u32 *)IOP3XX_REG_ADDR(0x080c)
-#define IOP3XX_AAU_SAR1		(volatile u32 *)IOP3XX_REG_ADDR(0x0810)
-#define IOP3XX_AAU_SAR2		(volatile u32 *)IOP3XX_REG_ADDR(0x0814)
-#define IOP3XX_AAU_SAR3		(volatile u32 *)IOP3XX_REG_ADDR(0x0818)
-#define IOP3XX_AAU_SAR4		(volatile u32 *)IOP3XX_REG_ADDR(0x081c)
-#define IOP3XX_AAU_DAR		(volatile u32 *)IOP3XX_REG_ADDR(0x0820)
-#define IOP3XX_AAU_ABCR		(volatile u32 *)IOP3XX_REG_ADDR(0x0824)
-#define IOP3XX_AAU_ADCR		(volatile u32 *)IOP3XX_REG_ADDR(0x0828)
-#define IOP3XX_AAU_SAR5		(volatile u32 *)IOP3XX_REG_ADDR(0x082c)
-#define IOP3XX_AAU_SAR6		(volatile u32 *)IOP3XX_REG_ADDR(0x0830)
-#define IOP3XX_AAU_SAR7		(volatile u32 *)IOP3XX_REG_ADDR(0x0834)
-#define IOP3XX_AAU_SAR8		(volatile u32 *)IOP3XX_REG_ADDR(0x0838)
-#define IOP3XX_AAU_EDCR0	(volatile u32 *)IOP3XX_REG_ADDR(0x083c)
-#define IOP3XX_AAU_SAR9		(volatile u32 *)IOP3XX_REG_ADDR(0x0840)
-#define IOP3XX_AAU_SAR10	(volatile u32 *)IOP3XX_REG_ADDR(0x0844)
-#define IOP3XX_AAU_SAR11	(volatile u32 *)IOP3XX_REG_ADDR(0x0848)
-#define IOP3XX_AAU_SAR12	(volatile u32 *)IOP3XX_REG_ADDR(0x084c)
-#define IOP3XX_AAU_SAR13	(volatile u32 *)IOP3XX_REG_ADDR(0x0850)
-#define IOP3XX_AAU_SAR14	(volatile u32 *)IOP3XX_REG_ADDR(0x0854)
-#define IOP3XX_AAU_SAR15	(volatile u32 *)IOP3XX_REG_ADDR(0x0858)
-#define IOP3XX_AAU_SAR16	(volatile u32 *)IOP3XX_REG_ADDR(0x085c)
-#define IOP3XX_AAU_EDCR1	(volatile u32 *)IOP3XX_REG_ADDR(0x0860)
-#define IOP3XX_AAU_SAR17	(volatile u32 *)IOP3XX_REG_ADDR(0x0864)
-#define IOP3XX_AAU_SAR18	(volatile u32 *)IOP3XX_REG_ADDR(0x0868)
-#define IOP3XX_AAU_SAR19	(volatile u32 *)IOP3XX_REG_ADDR(0x086c)
-#define IOP3XX_AAU_SAR20	(volatile u32 *)IOP3XX_REG_ADDR(0x0870)
-#define IOP3XX_AAU_SAR21	(volatile u32 *)IOP3XX_REG_ADDR(0x0874)
-#define IOP3XX_AAU_SAR22	(volatile u32 *)IOP3XX_REG_ADDR(0x0878)
-#define IOP3XX_AAU_SAR23	(volatile u32 *)IOP3XX_REG_ADDR(0x087c)
-#define IOP3XX_AAU_SAR24	(volatile u32 *)IOP3XX_REG_ADDR(0x0880)
-#define IOP3XX_AAU_EDCR2	(volatile u32 *)IOP3XX_REG_ADDR(0x0884)
-#define IOP3XX_AAU_SAR25	(volatile u32 *)IOP3XX_REG_ADDR(0x0888)
-#define IOP3XX_AAU_SAR26	(volatile u32 *)IOP3XX_REG_ADDR(0x088c)
-#define IOP3XX_AAU_SAR27	(volatile u32 *)IOP3XX_REG_ADDR(0x0890)
-#define IOP3XX_AAU_SAR28	(volatile u32 *)IOP3XX_REG_ADDR(0x0894)
-#define IOP3XX_AAU_SAR29	(volatile u32 *)IOP3XX_REG_ADDR(0x0898)
-#define IOP3XX_AAU_SAR30	(volatile u32 *)IOP3XX_REG_ADDR(0x089c)
-#define IOP3XX_AAU_SAR31	(volatile u32 *)IOP3XX_REG_ADDR(0x08a0)
-#define IOP3XX_AAU_SAR32	(volatile u32 *)IOP3XX_REG_ADDR(0x08a4)
+#define IOP3XX_AAU_PHYS_BASE (IOP3XX_PERIPHERAL_PHYS_BASE + 0x800)
+#define IOP3XX_AAU_UPPER_PA (IOP3XX_AAU_PHYS_BASE + 0xa7)
 
 /* I2C bus interface unit  */
 #define IOP3XX_ICR0		(volatile u32 *)IOP3XX_REG_ADDR(0x1680)
@@ -315,6 +260,9 @@ static inline void write_tisr(u32 val)
 	asm volatile("mcr p6, 0, %0, c6, c1, 0" : : "r" (val));
 }
 
+extern struct platform_device iop3xx_dma_0_channel;
+extern struct platform_device iop3xx_dma_1_channel;
+extern struct platform_device iop3xx_aau_channel;
 extern struct platform_device iop3xx_i2c0_device;
 extern struct platform_device iop3xx_i2c1_device;
 

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2007-03-23  6:56 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-23  6:51 [PATCH 2.6.21-rc4 00/15] md raid5 acceleration and async_tx Dan Williams
2007-03-23  6:51 ` [PATCH 2.6.21-rc4 01/15] dmaengine: add base support for the async_tx api Dan Williams
2007-03-23  6:51 ` [PATCH 2.6.21-rc4 02/15] ARM: Add drivers/dma to arch/arm/Kconfig Dan Williams
2007-03-23  6:51 ` [PATCH 2.6.21-rc4 03/15] dmaengine: add the async_tx api Dan Williams
2007-03-23  6:51 ` [PATCH 2.6.21-rc4 04/15] md: add raid5_run_ops and support routines Dan Williams
2007-03-23  6:52 ` [PATCH 2.6.21-rc4 05/15] md: use raid5_run_ops for stripe cache operations Dan Williams
2007-03-23  6:52 ` [PATCH 2.6.21-rc4 06/15] md: move write operations to raid5_run_ops Dan Williams
2007-03-23  6:52 ` [PATCH 2.6.21-rc4 07/15] md: move raid5 compute block " Dan Williams
2007-03-23  6:52 ` [PATCH 2.6.21-rc4 08/15] md: move raid5 parity checks " Dan Williams
2007-03-23  6:52 ` [PATCH 2.6.21-rc4 09/15] md: satisfy raid5 read requests via raid5_run_ops Dan Williams
2007-03-23  6:52 ` [PATCH 2.6.21-rc4 10/15] md: use async_tx and raid5_run_ops for raid5 expansion operations Dan Williams
2007-03-23  6:52 ` [PATCH 2.6.21-rc4 11/15] md: move raid5 io requests to raid5_run_ops Dan Williams
2007-03-23  6:52 ` [PATCH 2.6.21-rc4 12/15] md: remove raid5 compute_block and compute_parity5 Dan Williams
2007-03-23  6:52 ` [PATCH 2.6.21-rc4 13/15] dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines Dan Williams
2007-03-23  6:52 ` [PATCH 2.6.21-rc4 14/15] iop13xx: Surface the iop13xx adma units to the iop-adma driver Dan Williams
2007-03-23  6:52 ` [PATCH 2.6.21-rc4 15/15] iop3xx: Surface the iop3xx DMA and AAU " Dan Williams

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).