LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [RFC] Stacking bio support
@ 2008-03-11 10:52 Daniel Phillips
  2008-03-11 11:33 ` Ph. Marek
  2008-03-14 13:59 ` Alan D. Brunelle
  0 siblings, 2 replies; 14+ messages in thread
From: Daniel Phillips @ 2008-03-11 10:52 UTC (permalink / raw)
  To: linux-kernel

Or: The Head of the Axe

With a nice shiny new interface paradigm in hand that can be easily 
adapted to a variety of alternate back ends[1], let me now supply an 
important building block for such a back end.  This three patch series 
improves both the existing block layer and device mapper without 
disrupting anything, and lays the groundwork for generalizing the block 
layer to be much more flexible than it is now.  The ultimate goal of 
this effort is a complete merge of device mapper capabilities into the 
generic block layer, which could equally well be called lvm3 or bio2.  
While admittedly an ambitious undertaking, I should not need to explain 
why it is necessary, or why one cannot get to the end of a long road 
without taking the first step.

This patch set adds support for block device stacking to the block io 
layer.  Due to tightening up some code and removing unnecessary slab 
objects, the patch set as a whole reduces core kernel size slightly.  
Further code size reductions can be expected as things progress, and 
modest memory savings as well.

1: bio.single.alloc-2.6.23.12

This patch eliminates 50% of the slab allocations in the bio fast path, 
tightens up some code, and turns struct bio into a variable sized 
object.  It replaces the bio slab by an array of bio slabs and removes 
the existing biovec slab array.  A small amount of system memory is 
saved and some code is removed.

diffstat patch/1/bio.single.alloc-2.6.23.12
 drivers/md/dm.c     |    2
 fs/bio.c            |  187 
++++++++++++++--------------------------------------
 include/linux/bio.h |    2
 3 files changed, 54 insertions(+), 137 deletions(-)

2: bio.hide.endio-2.6.23.12

In preparation for moving the bio endio function pointer from a fixed 
location in the bio to a location on the internal stack, direct 
structure references to the bio endio field are disallowed.  This patch 
renames the endio field, introduces wrappers to get/put field contents, 
and edits all existing references to the field to go through the 
wrappers.

This is a large patch, consisting entirely of trivial changes.  May I 
have some volunteer proofreading on this, please?  With this many edits 
there are sure to be a few slips.  Not all of the affected files have 
even been compiled.  I would appreciate very much if interested 
observers would volunteer.

diffstat patch/2/bio.hide.endio-2.6.23.12
 block/ll_rw_blk.c            |    6 ++---
 drivers/block/floppy.c       |    2 -
 drivers/block/pktcdvd.c      |    8 +++----
 drivers/md/dm-crypt.c        |    2 -
 drivers/md/dm-emc.c          |    2 -
 drivers/md/dm-io.c           |    2 -
 drivers/md/dm.c              |    2 -
 drivers/md/faulty.c          |    4 +--
 drivers/md/md.c              |    6 ++---
 drivers/md/multipath.c       |    4 +--
 drivers/md/raid1.c           |   44 
+++++++++++++++++++++----------------------
 drivers/md/raid10.c          |   24 +++++++++++------------
 drivers/md/raid5.c           |   18 ++++++++---------
 drivers/s390/block/xpram.c   |    2 -
 drivers/scsi/scsi_lib.c      |    2 -
 fs/bio.c                     |   12 +++++------
 fs/block_dev.c               |    2 -
 fs/buffer.c                  |    2 -
 fs/direct-io.c               |    4 +--
 fs/gfs2/super.c              |    3 --
 fs/jfs/jfs_logmgr.c          |    4 +--
 fs/jfs/jfs_metapage.c        |    4 +--
 fs/mpage.c                   |    4 +--
 fs/ocfs2/cluster/heartbeat.c |    2 -
 fs/xfs/linux-2.6/xfs_aops.c  |    6 ++---
 fs/xfs/linux-2.6/xfs_buf.c   |    4 +--
 include/linux/bio.h          |   12 ++++++++++-
 kernel/power/swap.c          |    2 -
 mm/bounce.c                  |    8 +++----
 mm/page_io.c                 |    2 -
 30 files changed, 104 insertions(+), 95 deletions(-)

3: bio.stack-2.6.23.12

This adds an internal stack to each struct bio and introduces two new 
bio operations:

        data = bio_push(bio, worksize, endio);
        data = bio_pop(bio);

The first is used before submitting a bio and the second is used in the 
endio handler, which gives the driver a nice way to share context 
between the two events.  If the requested amount stack space is not 
available in the bio, the bio stack is automatically extended.

Currently the stack size in the bio is set to zero and is always 
extended on the first bio_push.  An upcoming revision will add a 
mechanism for a block driver to specify the initial amount of stack 
space it knows will always be needed.  Some time further in the future, 
a mechanism for discovering the stack requirements of several block 
devices in a stack may be added, so that a typical bio submission is 
able to traverse the entire stack with only a single bio allocation.  
(I learned from Andreas Dilger last month at FAST that Lustre already 
implements a mechanism along these lines.)

In addition to stacking context data, the bio endio handler is also 
placed on the stack.  This reflects the fact that the only valid 
purpose for putting data on the bio stack is to communicate it to a 
stacked endio handler, so it makes sense to set the endio handler at 
the same time as allocating stack space.

It is not as clear whether the bio target device should be moved to the 
stack as well.  If it were, the generic endio handler could easily 
account congestion data etc for the device to which the bio was 
submitted.  For now, this field just stays as it was.

This patch changes the wrapper functions introduced in the previous 
patch to access the endio function through the stack instead of a 
direct field lookup.  

As currently implemented, an empty bio stack adds eight bytes to a bio 
on a 32 bit machine.  The expectation is that considerably more memory 
in stacking drivers will be saved in return.  (As it happens, this size 
increase disappears in the noise of slab alignment effects, but all the 
same...)  A bio variant could be created without any stack for the 
common case of direct bio submission to a nonstacking driver, though 
the expected memory saving is arguably not worth the effort.

The fast path of the bio stacking code looks like this:

void *bio_push(struct bio *bio, unsigned size, bio_end_io_t *endio)
{
        struct bioframe *frame = bio->bi_stack;
        size += sizeof(struct bioframe);
        size += -size & (BIOSTACK_ALIGN - 1);
        bio->bi_stack += size;
        *(struct bioframe *)bio->bi_stack = (struct bioframe){
                .stacksize = frame->stacksize - size,
                .framesize = size, .endio = endio };
        return frame->space;
}

void *bio_pop(struct bio *bio)
{
        struct bioframe *frame = bio->bi_stack;
        frame = bio->bi_stack -= frame->framesize;
        return frame->space;
}

For the push, just an 8 byte structure assignment, add to memory and 
some arithmetic.  The pop is just an add to memory.  Both are lockless 
and cacheline efficient.  The slab allocations that will be replaced by 
this mechanism are not lockless.

diffstat patch/3/bio.stack-2.6.23.12
 fs/bio.c            |   48 
++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/bio.h |   11 +++++++++--
 2 files changed, 57 insertions(+), 2 deletions(-)

Results

Booting a typical workstation to a graphical desktop with a vanilla 
2.6.23.13 kernel leaves the slabs looking like this:

cat /proc/slabinfo | grep bio | cut -b-38
biovec-256             2      2   3072
biovec-128             2      5   1536
biovec-64              4     10    768
biovec-16              5     15    256
biovec-4              11     59     64
biovec-1              59    203     16
bio                   96    150    128

Ater bio.bvec.combine, thinks look a little different:

cat /proc/slabinfo | grep bio | cut -b-38
bio-256                2      2   3200
bio-128                2      4   1664
bio-64                16     16    896
bio-16                20     20    384
bio-4                 30     30    128
bio-1                 90    150    128

The original bio slab is gone and the biovec slabs have been replaced by 
bio slabs with slightly larger object size.  (The slab object size 
distribution will eventually be retuned so that the smallest slabs are 
not same size when rounded to a hardware cache line boundary.)

After bio.stack, the slabs look like:

cat /proc/slabinfo | grep bio | cut -b-38
biospace               0      0    128
bio-256                2      2   3200
bio-128                2      4   1664
bio-64                 5     16    896
bio-16                 9     20    384
bio-4                  6     30    128
bio-1                120    150    128

A new biospace slab has appeared to hold bio stack extension, otherwise 
things look much the same.

4: dm.reduce.allocs-2.6.23.12

Each bio submitted to a device mapper device currently requires a 
minimum of six slab allocations:

   * Submitter creates the original bio (2 slab allocations)

   * device mapper makes a clone of the bio (2 slab allocations)

   * device mapper allocates a dm_io and a dm_target_io structure,
     attaches the dm_target_io to the dm_io and attaches the dm_io
     to the private pointer of the clone (2 slab allocations)

Each additional layer in a stacked device adds at least four more 
allocations.  Further allocations are typically done inside the target 
driver, for example, ddsnap does at least one additional allocation for 
all bio transfers except origin reads.

Thus, for a simple stack consisting of a ddsnap driver on top of an lvm 
partition a minimum of 11 slab allocations take place each time a bio 
travels down the stack.  The object of the fourth patch in this set is 
to reduce this to a single allocation per bio in the common case, by 
taking advantage of the bio stacking functionality implemented in the 
first three patches.

The bio.single.alloc patch removes three slab allocations by allowing a 
bio or a clone to be created with a single allocation instead of two.  
The dm.reduce.allocs patch, when finished, will replace the three slab 
allocations at each device mapper level with a single call to bio_push.  
Finally, ddsnap can be modified to use bio_push instead of allocating a 
hook structure, which arrives at the desired single allocation for the 
entire path.

All About Clone_and_Map

Each device mapper virtual device is actually a concatenation of device 
mapper targets.  Device mapper has to handle the case where an incoming 
bio has to be split across an arbitrary number of target devices that 
it overlaps.  This is implemented in the call chain:

    dm_request  -> 
        __split_bio ->
            __clone_and_map ->
                __map_bio

The __clone_and_map function is the one of interest (note the CRUD START 
and CRUD END comments in dm.c.).

Device mapper proceeds by working from beginning to end of the original 
bio, creating and submitting clone bios that do not cross target 
boundaries until the remainder fits entirely within a single target.  
The final case is the common one, and in fact the only case when the 
virtual device has a single target.  Today's patch optimizes the 
remainder case to use the push/pop mechanism to allow the original bio 
to be resubmitted instead of submitting a clone.  The other cases are 
left alone for now, so the code does not actually get smaller yet.   
But you can compare my new "dm_map" function to the __map_bio function 
used by the clones and see that it is quite a bit shorter already, and 
so is the code that drives it from __clone_and_map.

In the general case where a bio crosses target boundaries, cloning 
cannot be avoided because we want to initiate multiple dependent 
transfers in parallel.  However, the dm_io and dm_target_io allocations 
can be replaced by stack allocations within the bio.

Finally, the whole call chain above can be cleaned up considerably, 
because it really should be a single function, and suffers from 
considerable fluff associated with communicating between the wrongly 
factored pieces.  The result will look lot like what the generic bio 
layer needs in order to do the tricks that device mapper does.

In passing I have to say that __clone_and_map, apart from looking pretty 
ugly, is a really nice hack.  Joe Thornber surely got out of the right 
side of bed the day he dreamed that up.  It is just too bad that our 
little community managed to annoy him so much that he ran off to work 
in a bank, thus depriving us of further insights from him, and leaving 
Device Mapper core in a state of suspended animation for the last four 
years while Sun did not stand still.  Food for thought.

Finally

Once again, thankyou intrepid reader for reading all the way to here.  I 
have heard at least one complaint that my posts are too long.  Well, I 
have this to say about that: if you think today's post is a lot of 
words to read, don't even think about reading War and Peace :-)

Stay tuned for more yummy patches on the road to lvm3.

[1] ddsetup, http://lkml.org/lkml/2008/3/5/82, An alternative interface 
to device mapper

http://zumastor.org/lvm3/bio.single.alloc-2.6.23.12
http://zumastor.org/lvm3/bio.hide.endio-2.6.23.12
http://zumastor.org/lvm3/bio.stack-2.6.23.12
http://zumastor.org/lvm3/dm.reduce.allocs-2.6.23.12

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Stacking bio support
  2008-03-11 10:52 [RFC] Stacking bio support Daniel Phillips
@ 2008-03-11 11:33 ` Ph. Marek
  2008-03-11 12:07   ` Daniel Phillips
  2008-03-14 13:59 ` Alan D. Brunelle
  1 sibling, 1 reply; 14+ messages in thread
From: Ph. Marek @ 2008-03-11 11:33 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: linux-kernel

Hello Daniel!

On Dienstag, 11. März 2008, Daniel Phillips wrote:
> 3: bio.stack-2.6.23.12
>
> This adds an internal stack to each struct bio and introduces two new
> bio operations:
>
>         data = bio_push(bio, worksize, endio);
>         data = bio_pop(bio);
>
> The first is used before submitting a bio and the second is used in the
> endio handler, which gives the driver a nice way to share context
> between the two events.  If the requested amount stack space is not
> available in the bio, the bio stack is automatically extended.
...
> (I learned from Andreas Dilger last month at FAST that Lustre already
> implements a mechanism along these lines.)
Win32 has IRP stacks, which do mostly the same AFAIU.
	http://msdn2.microsoft.com/en-us/library/ms796144.aspx

How do you handle the reallocation?
- If you don't do it (but rely on the fact that the initial allocation is
  enough), you might end up with NO_MORE_IRP_STACK_LOCATIONS
    http://msdn2.microsoft.com/en-us/library/ms793675.aspx
- If you do reallocate, the allocations have to register themselves in
  the emergency pool (see the current thread about swapping over NFS)

I don't say that it's impossible ... just that some "interesting" things will 
await you. 

> Currently the stack size in the bio is set to zero and is always
> extended on the first bio_push.  An upcoming revision will add a
> mechanism for a block driver to specify the initial amount of stack
> space it knows will always be needed.  Some time further in the future,
> a mechanism for discovering the stack requirements of several block
> devices in a stack may be added, so that a typical bio submission is
> able to traverse the entire stack with only a single bio allocation.
That's different from the Win32 way AFAIK - there it's defined that every 
layer *has* to use its own stack location. (But it's been some time since I 
needed that, so I might be wrong.)


But I sure hope you succeed!


Regards,

Phil

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Stacking bio support
  2008-03-11 11:33 ` Ph. Marek
@ 2008-03-11 12:07   ` Daniel Phillips
  2008-03-11 12:38     ` Boaz Harrosh
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Phillips @ 2008-03-11 12:07 UTC (permalink / raw)
  To: Ph. Marek; +Cc: linux-kernel

On Tuesday 11 March 2008 04:33, Ph. Marek wrote:
> Win32 has IRP stacks, which do mostly the same AFAIU.
> 	http://msdn2.microsoft.com/en-us/library/ms796144.aspx

That seems to be filling a similar need all right, though it looks
like a fancier (read: clunkier) solution.

> How do you handle the reallocation?
> - If you don't do it (but rely on the fact that the initial allocation is
>   enough), you might end up with NO_MORE_IRP_STACK_LOCATIONS
>     http://msdn2.microsoft.com/en-us/library/ms793675.aspx
> - If you do reallocate, the allocations have to register themselves in
>   the emergency pool (see the current thread about swapping over NFS)

Yes, I reallocate.  I do not currently register these with the
emergency pool, good spotting.  I intend to do all such reallocations
with GFP_MEMALLOC (out of tree deadlock-prevention allocation flag) and
rely on (out of tree) bio throttling to prevent the memalloc reserve
from being exhausted.  Hopefully these things will be in-tree in due
course.

Incidentally, the bio stack should make the bio throttling somewhat
more elegant, a nice circular effect.

> I don't say that it's impossible ... just that some "interesting" things will 
> await you. 

Tell me about it :-)

> That's different from the Win32 way AFAIK - there it's defined that every 
> layer *has* to use its own stack location. (But it's been some time since I 
> needed that, so I might be wrong.)

I think you are right.  In fact, I thought about this for a couple of
years, always getting hung up at exactly that point.  When I stopped
trying to see the stack as a fixed size object with preassigned frames,
the rest fell into place.  One obvious problem with the pre-assigned
approach: you don't always know the path ahead of time that a bio
will take to a physical device.
 
> But I sure hope you succeed!

Thankyou for your useful comments.  I do need to present a solution
complete with deadlock prevention.  I guess the bio code will end up
simpler there too, because with the memalloc anti-deadlock approach,
the array of bio mempools can go away.

Regards,

Daniel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Stacking bio support
  2008-03-11 12:07   ` Daniel Phillips
@ 2008-03-11 12:38     ` Boaz Harrosh
  2008-04-10 13:04       ` Daniel Phillips
  0 siblings, 1 reply; 14+ messages in thread
From: Boaz Harrosh @ 2008-03-11 12:38 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Ph. Marek, linux-kernel

On Tue, Mar 11 2008 at 14:07 +0200, Daniel Phillips <phillips@phunq.net> wrote:
> On Tuesday 11 March 2008 04:33, Ph. Marek wrote:
>> Win32 has IRP stacks, which do mostly the same AFAIU.
>> 	http://msdn2.microsoft.com/en-us/library/ms796144.aspx
> 
> That seems to be filling a similar need all right, though it looks
> like a fancier (read: clunkier) solution.
> 
>> How do you handle the reallocation?
>> - If you don't do it (but rely on the fact that the initial allocation is
>>   enough), you might end up with NO_MORE_IRP_STACK_LOCATIONS
>>     http://msdn2.microsoft.com/en-us/library/ms793675.aspx
>> - If you do reallocate, the allocations have to register themselves in
>>   the emergency pool (see the current thread about swapping over NFS)
> 
> Yes, I reallocate.  I do not currently register these with the
> emergency pool, good spotting.  I intend to do all such reallocations
> with GFP_MEMALLOC (out of tree deadlock-prevention allocation flag) and
> rely on (out of tree) bio throttling to prevent the memalloc reserve
> from being exhausted.  Hopefully these things will be in-tree in due
> course.
> 

I guess that is not worse then current implementation. So many slab
allocations saved, lets you have a few of your own.
>From passed experience, though, relocation is changed to a link-list
(chaining) of sorts the first time relocation starts to hit consistently.
for 1-in-100 case it can stay, any higher then that better allocate more
space and chain it to the old. It also fits better with the pools paradigm.
I would leave the reallocation for a while but make sure it is all hidden
behind the right API to be easily enhanced later on.
 
> Incidentally, the bio stack should make the bio throttling somewhat
> more elegant, a nice circular effect.
> 
>> I don't say that it's impossible ... just that some "interesting" things will 
>> await you. 
> 
> Tell me about it :-)
> 
>> That's different from the Win32 way AFAIK - there it's defined that every 
>> layer *has* to use its own stack location. (But it's been some time since I 
>> needed that, so I might be wrong.)
> 
> I think you are right.  In fact, I thought about this for a couple of
> years, always getting hung up at exactly that point.  When I stopped
> trying to see the stack as a fixed size object with preassigned frames,
> the rest fell into place.  One obvious problem with the pre-assigned
> approach: you don't always know the path ahead of time that a bio
> will take to a physical device.
>  
>> But I sure hope you succeed!
> 
> Thankyou for your useful comments.  I do need to present a solution
> complete with deadlock prevention.  I guess the bio code will end up
> simpler there too, because with the memalloc anti-deadlock approach,
> the array of bio mempools can go away.
> 
> Regards,
> 
> Daniel

Me too, I'll be watching out on your progress, It looks like the building
blocks of some very advanced possibilities. ("Pure Data")

Do you have a public git tree that we can inspect from time to time.

Cheers 
Boaz

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Stacking bio support
  2008-03-11 10:52 [RFC] Stacking bio support Daniel Phillips
  2008-03-11 11:33 ` Ph. Marek
@ 2008-03-14 13:59 ` Alan D. Brunelle
  2008-03-14 14:04   ` [RFC] Stacking bio support - patch 1/4 for 2.6.23.17 (stable) Alan D. Brunelle
                     ` (4 more replies)
  1 sibling, 5 replies; 14+ messages in thread
From: Alan D. Brunelle @ 2008-03-14 13:59 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: linux-kernel

Hi Daniel - 

I was able to get the first 3 patches ported to the stable tree of 2.6.23, but there were compilation errors - fixed in the 4th patch in the stream I'll post next. Regardless of that, it still failed to boot on a 2-way AMD64. Do you have a set of these patches that builds correctly & boots! :-)

There are checkpatch.pl notices about your patches as well. In any event, like I said above, I'll post the 4 patches that get it to at least build properly. 

Alan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Stacking bio support - patch 1/4 for 2.6.23.17 (stable)
  2008-03-14 13:59 ` Alan D. Brunelle
@ 2008-03-14 14:04   ` Alan D. Brunelle
  2008-03-14 19:41     ` Daniel Phillips
  2008-03-14 14:04   ` [RFC] Stacking bio support Alan D. Brunelle
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Alan D. Brunelle @ 2008-03-14 14:04 UTC (permalink / raw)
  To: Alan D. Brunelle; +Cc: Daniel Phillips, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1 bytes --]



[-- Attachment #2: 0001-BIO-single-alloc.patch --]
[-- Type: text/x-patch, Size: 4512 bytes --]

>From 7a91a074d4442d37ec43830c58ac94d39d17bd3d Mon Sep 17 00:00:00 2001
From: Alan D. Brunelle <alan.brunelle@hp.com>
Date: Fri, 14 Mar 2008 07:52:09 -0400
Subject: [PATCH] BIO single alloc


---
 drivers/md/dm.c |   76 ++++++++++++++++++++++++++++++++++++++++++-------------
 1 files changed, 58 insertions(+), 18 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index fac09d5..02173df 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -491,11 +491,9 @@ static void dec_pending(struct dm_io *io, int error)
 	}
 }
 
-static int clone_endio(struct bio *bio, unsigned int done, int error)
+static int dm_endio(struct bio *bio, unsigned int done, int error, struct dm_target_io *tio)
 {
 	int r = 0;
-	struct dm_target_io *tio = bio->bi_private;
-	struct mapped_device *md = tio->io->md;
 	dm_endio_fn endio = tio->ti->type->end_io;
 
 	if (bio->bi_size)
@@ -522,17 +520,28 @@ static int clone_endio(struct bio *bio, unsigned int done, int error)
 	}
 
 	dec_pending(tio->io, error);
+	return r;
+}
 
+static int clone_endio(struct bio *bio, unsigned int done, int error)
+{
+	struct dm_target_io *tio = bio->bi_private;
+	struct mapped_device *md = tio->io->md;
+	int r = dm_endio(bio, done, error, tio);
 	/*
 	 * Store md for cleanup instead of tio which is about to get freed.
 	 */
 	bio->bi_private = md->bs;
-
 	bio_put(bio);
 	free_tio(md, tio);
 	return r;
 }
 
+static int dm_endio_pop(struct bio *bio, unsigned int done, int error)
+{
+	return dm_endio(bio, done, error, bio_pop(bio));
+}
+
 static sector_t max_io_len(struct mapped_device *md,
 			   sector_t sector, struct dm_target *ti)
 {
@@ -553,8 +562,7 @@ static sector_t max_io_len(struct mapped_device *md,
 	return len;
 }
 
-static void __map_bio(struct dm_target *ti, struct bio *clone,
-		      struct dm_target_io *tio)
+static void __map_bio(struct dm_target *ti, struct bio *clone, struct dm_target_io *tio)
 {
 	int r;
 	sector_t sector;
@@ -600,6 +608,37 @@ static void __map_bio(struct dm_target *ti, struct bio *clone,
 	}
 }
 
+static void dm_map(struct bio *bio, struct dm_target *ti, struct dm_target_io *tio)
+{
+	int r;
+	sector_t sector;
+
+	/*
+	 * Map the bio.  If r == 0 we don't need to do
+	 * anything, the target has assumed ownership of
+	 * this io.
+	 */
+	atomic_inc(&tio->io->io_count);
+	sector = bio->bi_sector;
+	r = ti->type->map(ti, bio, &tio->info);
+	if (r == DM_MAPIO_REMAPPED) {
+		/* the bio has been remapped so dispatch it */
+
+		blk_add_trace_remap(bdev_get_queue(bio->bi_bdev), bio,
+				    tio->io->bio->bi_bdev->bd_dev,
+				    bio->bi_sector, sector);
+
+		generic_make_request(bio);
+	} else if (r < 0 || r == DM_MAPIO_REQUEUE) {
+		/* error the io and bail out, or requeue it if needed */
+		bio_pop(bio);
+		dec_pending(tio->io, r);
+	} else if (r) {
+		DMWARN("unimplemented target map return value: %d", r);
+		BUG();
+	}
+}
+
 struct clone_info {
 	struct mapped_device *md;
 	struct dm_table *map;
@@ -676,23 +715,18 @@ static int __clone_and_map(struct clone_info *ci)
 
 	max = max_io_len(ci->md, ci->sector, ti);
 
-	/*
-	 * Allocate a target io object.
-	 */
-	tio = alloc_tio(ci->md);
-	tio->io = ci->io;
-	tio->ti = ti;
-	memset(&tio->info, 0, sizeof(tio->info));
-
 	if (ci->sector_count <= max) {
 		/*
 		 * Optimise for the simple case where we can do all of
 		 * the remaining io with a single clone.
 		 */
-		clone = clone_bio(bio, ci->sector, ci->idx,
-				  bio->bi_vcnt - ci->idx, ci->sector_count,
-				  ci->md->bs);
-		__map_bio(ti, clone, tio);
+		tio = bio_push(bio, sizeof(*tio), dm_endio_pop);
+		*tio = (typeof(*tio)){ .io = ci->io, .ti = ti };
+		bio->bi_sector = ci->sector;
+		bio->bi_idx = ci->idx;
+		bio->bi_size = to_bytes(ci->sector_count);
+		bio->bi_flags &= ~(1 << BIO_SEG_VALID);
+		dm_map(bio, ti, tio);
 		ci->sector_count = 0;
 
 	} else if (to_sector(bio->bi_io_vec[ci->idx].bv_len) <= max) {
@@ -704,6 +738,9 @@ static int __clone_and_map(struct clone_info *ci)
 		sector_t remaining = max;
 		sector_t bv_len;
 
+		tio = alloc_tio(ci->md);
+		*tio = (typeof(*tio)){ .io = ci->io, .ti = ti };
+
 		for (i = ci->idx; remaining && (i < bio->bi_vcnt); i++) {
 			bv_len = to_sector(bio->bi_io_vec[i].bv_len);
 
@@ -730,6 +767,9 @@ static int __clone_and_map(struct clone_info *ci)
 		sector_t remaining = to_sector(bv->bv_len);
 		unsigned int offset = 0;
 
+		tio = alloc_tio(ci->md);
+		*tio = (typeof(*tio)){ .io = ci->io, .ti = ti };
+
 		do {
 			if (offset) {
 				ti = dm_table_find_target(ci->map, ci->sector);
-- 
1.5.2.5


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Stacking bio support
  2008-03-14 13:59 ` Alan D. Brunelle
  2008-03-14 14:04   ` [RFC] Stacking bio support - patch 1/4 for 2.6.23.17 (stable) Alan D. Brunelle
@ 2008-03-14 14:04   ` Alan D. Brunelle
  2008-03-14 14:06     ` Alan D. Brunelle
  2008-03-14 14:05   ` [RFC] Stacking bio support - patch 3/4 for 2.6.23.17 (stable) Alan D. Brunelle
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Alan D. Brunelle @ 2008-03-14 14:04 UTC (permalink / raw)
  To: Alan D. Brunelle; +Cc: Daniel Phillips, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1 bytes --]



[-- Attachment #2: 0002-Hide-endio.patch --]
[-- Type: text/x-patch, Size: 33547 bytes --]

>From afe047bdd4707a1013400f88b50b323d031ae95f Mon Sep 17 00:00:00 2001
From: Alan D. Brunelle <alan.brunelle@hp.com>
Date: Fri, 14 Mar 2008 07:52:38 -0400
Subject: [PATCH] Hide endio


---
 block/ll_rw_blk.c            |    6 ++--
 drivers/block/floppy.c       |    2 +-
 drivers/block/pktcdvd.c      |    8 +++---
 drivers/md/dm-crypt.c        |    2 +-
 drivers/md/dm-emc.c          |    2 +-
 drivers/md/dm-io.c           |    2 +-
 drivers/md/dm.c              |    2 +-
 drivers/md/faulty.c          |    4 +-
 drivers/md/md.c              |    6 ++--
 drivers/md/multipath.c       |    4 +-
 drivers/md/raid1.c           |   44 +++++++++++++++++++++---------------------
 drivers/md/raid10.c          |   24 +++++++++++-----------
 drivers/md/raid5.c           |   18 ++++++++--------
 drivers/s390/block/xpram.c   |    2 +-
 drivers/scsi/scsi_lib.c      |    2 +-
 fs/bio.c                     |   12 +++++-----
 fs/block_dev.c               |    2 +-
 fs/buffer.c                  |    2 +-
 fs/direct-io.c               |    4 +-
 fs/gfs2/super.c              |    3 +-
 fs/jfs/jfs_logmgr.c          |    4 +-
 fs/jfs/jfs_metapage.c        |    4 +-
 fs/mpage.c                   |    4 +-
 fs/ocfs2/cluster/heartbeat.c |    2 +-
 fs/xfs/linux-2.6/xfs_aops.c  |    6 ++--
 fs/xfs/linux-2.6/xfs_buf.c   |    4 +-
 include/linux/bio.h          |   12 ++++++++++-
 kernel/power/swap.c          |    2 +-
 mm/bounce.c                  |    8 +++---
 mm/page_io.c                 |    2 +-
 30 files changed, 104 insertions(+), 95 deletions(-)

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index 026cf24..b6c9ff7 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -564,14 +564,14 @@ static int ordered_bio_endio(struct request *rq, struct bio *bio,
 	if (error && !q->orderr)
 		q->orderr = error;
 
-	endio = bio->bi_end_io;
+	endio = bio_get_endio(bio);
 	private = bio->bi_private;
-	bio->bi_end_io = flush_dry_bio_endio;
+	bio_set_endio(bio, flush_dry_bio_endio);
 	bio->bi_private = q;
 
 	bio_endio(bio, nbytes, error);
 
-	bio->bi_end_io = endio;
+	bio_set_endio(bio, endio);
 	bio->bi_private = private;
 
 	return 1;
diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 085b779..ac378cd 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -3855,7 +3855,7 @@ static int __floppy_read_block_0(struct block_device *bdev)
 	bio.bi_sector = 0;
 	init_completion(&complete);
 	bio.bi_private = &complete;
-	bio.bi_end_io = floppy_rb0_complete;
+	bio_set_endio(&bio, floppy_rb0_complete);
 
 	submit_bio(READ, &bio);
 	generic_unplug_device(bdev_get_queue(bdev));
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index fadbfd8..51431a0 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -1150,7 +1150,7 @@ static void pkt_gather_data(struct pktcdvd_device *pd, struct packet_data *pkt)
 		bio->bi_max_vecs = 1;
 		bio->bi_sector = pkt->sector + f * (CD_FRAMESIZE >> 9);
 		bio->bi_bdev = pd->bdev;
-		bio->bi_end_io = pkt_end_io_read;
+		bio_set_endio(bio, pkt_end_io_read);
 		bio->bi_private = pkt;
 
 		p = (f * CD_FRAMESIZE) / PAGE_SIZE;
@@ -1250,7 +1250,7 @@ static int pkt_start_recovery(struct packet_data *pkt)
 	BUG_ON(pkt->bio->bi_rw != (1 << BIO_RW));
 	BUG_ON(pkt->bio->bi_vcnt != pkt->frames);
 	BUG_ON(pkt->bio->bi_size != pkt->frames * CD_FRAMESIZE);
-	BUG_ON(pkt->bio->bi_end_io != pkt_end_io_packet_write);
+	BUG_ON(bio_get_endio(pkt->bio) != pkt_end_io_packet_write);
 	BUG_ON(pkt->bio->bi_private != pkt);
 
 	drop_super(sb);
@@ -1446,7 +1446,7 @@ static void pkt_start_write(struct pktcdvd_device *pd, struct packet_data *pkt)
 	pkt->w_bio->bi_max_vecs = PACKET_MAX_SIZE;
 	pkt->w_bio->bi_sector = pkt->sector;
 	pkt->w_bio->bi_bdev = pd->bdev;
-	pkt->w_bio->bi_end_io = pkt_end_io_packet_write;
+	bio_set_endio(pkt->w_bio, pkt_end_io_packet_write);
 	pkt->w_bio->bi_private = pkt;
 	for (f = 0; f < pkt->frames; f++)
 		if (!bio_add_page(pkt->w_bio, bvec[f].bv_page, CD_FRAMESIZE, bvec[f].bv_offset))
@@ -2503,7 +2503,7 @@ static int pkt_make_request(struct request_queue *q, struct bio *bio)
 		psd->bio = bio;
 		cloned_bio->bi_bdev = pd->bdev;
 		cloned_bio->bi_private = psd;
-		cloned_bio->bi_end_io = pkt_end_io_read_cloned;
+		bio_set_endio(cloned_bio, pkt_end_io_read_cloned);
 		pd->stats.secs_r += bio->bi_size >> 9;
 		pkt_queue_bio(pd, cloned_bio);
 		return 0;
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index ba2e135..f99cc94 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -549,7 +549,7 @@ static void clone_init(struct dm_crypt_io *io, struct bio *clone)
 	struct crypt_config *cc = io->target->private;
 
 	clone->bi_private = io;
-	clone->bi_end_io  = crypt_endio;
+	bio_set_endio(clone, crypt_endio);
 	clone->bi_bdev    = cc->dev->bdev;
 	clone->bi_rw      = io->base_bio->bi_rw;
 	clone->bi_destructor = dm_crypt_bio_destructor;
diff --git a/drivers/md/dm-emc.c b/drivers/md/dm-emc.c
index 265c467..ef78699 100644
--- a/drivers/md/dm-emc.c
+++ b/drivers/md/dm-emc.c
@@ -76,7 +76,7 @@ static struct bio *get_failover_bio(struct dm_path *path, unsigned data_size)
 	bio->bi_bdev = path->dev->bdev;
 	bio->bi_sector = 0;
 	bio->bi_private = path;
-	bio->bi_end_io = emc_endio;
+	bio_set_endio(bio, emc_endio);
 
 	page = alloc_page(GFP_ATOMIC);
 	if (!page) {
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index f3a7724..14be510 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -300,7 +300,7 @@ static void do_region(int rw, unsigned int region, struct io_region *where,
 		bio = bio_alloc_bioset(GFP_NOIO, num_bvecs, io->client->bios);
 		bio->bi_sector = where->sector + (where->count - remaining);
 		bio->bi_bdev = where->bdev;
-		bio->bi_end_io = endio;
+		bio_set_endio(bio, endio);
 		bio->bi_private = io;
 		bio->bi_destructor = dm_bio_destructor;
 		bio->bi_max_vecs--;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 02173df..fcdcf51 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -573,7 +573,7 @@ static void __map_bio(struct dm_target *ti, struct bio *clone, struct dm_target_
 	 */
 	BUG_ON(!clone->bi_size);
 
-	clone->bi_end_io = clone_endio;
+	bio_set_endio(clone, clone_endio);
 	clone->bi_private = tio;
 
 	/*
diff --git a/drivers/md/faulty.c b/drivers/md/faulty.c
index cb059cf..15a34f1 100644
--- a/drivers/md/faulty.c
+++ b/drivers/md/faulty.c
@@ -76,7 +76,7 @@ static int faulty_fail(struct bio *bio, unsigned int bytes_done, int error)
 		bio_put(bio);
 
 	clear_bit(BIO_UPTODATE, &b->bi_flags);
-	return (b->bi_end_io)(b, bytes_done, -EIO);
+	return (bio_get_endio(b))(b, bytes_done, -EIO);
 }
 
 typedef struct faulty_conf {
@@ -212,7 +212,7 @@ static int make_request(struct request_queue *q, struct bio *bio)
 		struct bio *b = bio_clone(bio, GFP_NOIO);
 		b->bi_bdev = conf->rdev->bdev;
 		b->bi_private = bio;
-		b->bi_end_io = faulty_fail;
+		bio_set_endio(b, faulty_fail);
 		generic_make_request(b);
 		return 0;
 	} else {
diff --git a/drivers/md/md.c b/drivers/md/md.c
index f883b7e..e0901c5 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -450,7 +450,7 @@ void md_super_write(mddev_t *mddev, mdk_rdev_t *rdev,
 	bio->bi_sector = sector;
 	bio_add_page(bio, page, size, 0);
 	bio->bi_private = rdev;
-	bio->bi_end_io = super_written;
+	bio_set_endio(bio, super_written);
 	bio->bi_rw = rw;
 
 	atomic_inc(&mddev->pending_writes);
@@ -459,7 +459,7 @@ void md_super_write(mddev_t *mddev, mdk_rdev_t *rdev,
 		rw |= (1<<BIO_RW_BARRIER);
 		rbio = bio_clone(bio, GFP_NOIO);
 		rbio->bi_private = bio;
-		rbio->bi_end_io = super_written_barrier;
+		bio_set_endio(rbio, super_written_barrier);
 		submit_bio(rw, rbio);
 	} else
 		submit_bio(rw, bio);
@@ -512,7 +512,7 @@ int sync_page_io(struct block_device *bdev, sector_t sector, int size,
 	bio_add_page(bio, page, size, 0);
 	init_completion(&event);
 	bio->bi_private = &event;
-	bio->bi_end_io = bi_complete;
+	bio_set_endio(bio, bi_complete);
 	submit_bio(rw, bio);
 	wait_for_completion(&event);
 
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index 1e2af43..7b3953e 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -179,7 +179,7 @@ static int multipath_make_request (struct request_queue *q, struct bio * bio)
 	mp_bh->bio.bi_sector += multipath->rdev->data_offset;
 	mp_bh->bio.bi_bdev = multipath->rdev->bdev;
 	mp_bh->bio.bi_rw |= (1 << BIO_RW_FAILFAST);
-	mp_bh->bio.bi_end_io = multipath_end_request;
+	bio_set_endio(&mp_bh->bio, multipath_end_request);
 	mp_bh->bio.bi_private = mp_bh;
 	generic_make_request(&mp_bh->bio);
 	return 0;
@@ -425,7 +425,7 @@ static void multipathd (mddev_t *mddev)
 			bio->bi_sector += conf->multipaths[mp_bh->path].rdev->data_offset;
 			bio->bi_bdev = conf->multipaths[mp_bh->path].rdev->bdev;
 			bio->bi_rw |= (1 << BIO_RW_FAILFAST);
-			bio->bi_end_io = multipath_end_request;
+			bio_set_endio(bio, multipath_end_request);
 			bio->bi_private = mp_bh;
 			generic_make_request(bio);
 		}
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index f33a729..63ffaaf 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -197,7 +197,7 @@ static void put_buf(r1bio_t *r1_bio)
 
 	for (i=0; i<conf->raid_disks; i++) {
 		struct bio *bio = r1_bio->bios[i];
-		if (bio->bi_end_io)
+		if (bio_get_endio(bio))
 			rdev_dec_pending(conf->mirrors[i].rdev, r1_bio->mddev);
 	}
 
@@ -839,7 +839,7 @@ static int make_request(struct request_queue *q, struct bio * bio)
 
 		read_bio->bi_sector = r1_bio->sector + mirror->rdev->data_offset;
 		read_bio->bi_bdev = mirror->rdev->bdev;
-		read_bio->bi_end_io = raid1_end_read_request;
+		bio_set_endio(read_bio, raid1_end_read_request);
 		read_bio->bi_rw = READ | do_sync;
 		read_bio->bi_private = r1_bio;
 
@@ -910,7 +910,7 @@ static int make_request(struct request_queue *q, struct bio * bio)
 
 		mbio->bi_sector	= r1_bio->sector + conf->mirrors[i].rdev->data_offset;
 		mbio->bi_bdev = conf->mirrors[i].rdev->bdev;
-		mbio->bi_end_io	= raid1_end_write_request;
+		bio_set_endio(mbio, raid1_end_write_request;)
 		mbio->bi_rw = WRITE | do_barriers | do_sync;
 		mbio->bi_private = r1_bio;
 
@@ -1224,7 +1224,7 @@ static void sync_request_write(mddev_t *mddev, r1bio_t *r1_bio)
 		int primary;
 		if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) {
 			for (i=0; i<mddev->raid_disks; i++)
-				if (r1_bio->bios[i]->bi_end_io == end_sync_read)
+				if (bio_get_endio(r1_bio->bios[i]) == end_sync_read)
 					md_error(mddev, conf->mirrors[i].rdev);
 
 			md_done_sync(mddev, r1_bio->sectors, 1);
@@ -1232,15 +1232,15 @@ static void sync_request_write(mddev_t *mddev, r1bio_t *r1_bio)
 			return;
 		}
 		for (primary=0; primary<mddev->raid_disks; primary++)
-			if (r1_bio->bios[primary]->bi_end_io == end_sync_read &&
+			if (bio_get_endio(r1_bio->bios[primary]) == end_sync_read &&
 			    test_bit(BIO_UPTODATE, &r1_bio->bios[primary]->bi_flags)) {
-				r1_bio->bios[primary]->bi_end_io = NULL;
+				bio_set_endio(r1_bio->bios[primary], NULL);
 				rdev_dec_pending(conf->mirrors[primary].rdev, mddev);
 				break;
 			}
 		r1_bio->read_disk = primary;
 		for (i=0; i<mddev->raid_disks; i++)
-			if (r1_bio->bios[i]->bi_end_io == end_sync_read) {
+			if (bio_set_endio(r1_bio->bios[i], end_sync_read)) {
 				int j;
 				int vcnt = r1_bio->sectors >> (PAGE_SHIFT- 9);
 				struct bio *pbio = r1_bio->bios[primary];
@@ -1261,7 +1261,7 @@ static void sync_request_write(mddev_t *mddev, r1bio_t *r1_bio)
 				if (j >= 0)
 					mddev->resync_mismatches += r1_bio->sectors;
 				if (j < 0 || test_bit(MD_RECOVERY_CHECK, &mddev->recovery)) {
-					sbio->bi_end_io = NULL;
+					bio_set_endio(sbio, NULL);
 					rdev_dec_pending(conf->mirrors[i].rdev, mddev);
 				} else {
 					/* fixup the bio for reuse */
@@ -1309,7 +1309,7 @@ static void sync_request_write(mddev_t *mddev, r1bio_t *r1_bio)
 			if (s > (PAGE_SIZE>>9))
 				s = PAGE_SIZE >> 9;
 			do {
-				if (r1_bio->bios[d]->bi_end_io == end_sync_read) {
+				if (bio_get_endio(r1_bio->bios[d]) == end_sync_read) {
 					/* No rcu protection needed here devices
 					 * can only be removed when no resync is
 					 * active, and resync is currently active
@@ -1337,7 +1337,7 @@ static void sync_request_write(mddev_t *mddev, r1bio_t *r1_bio)
 					if (d == 0)
 						d = conf->raid_disks;
 					d--;
-					if (r1_bio->bios[d]->bi_end_io != end_sync_read)
+					if (bio_get_endio(r1_bio->bios[d]) != end_sync_read)
 						continue;
 					rdev = conf->mirrors[d].rdev;
 					atomic_add(s, &rdev->corrected_errors);
@@ -1353,7 +1353,7 @@ static void sync_request_write(mddev_t *mddev, r1bio_t *r1_bio)
 					if (d == 0)
 						d = conf->raid_disks;
 					d--;
-					if (r1_bio->bios[d]->bi_end_io != end_sync_read)
+					if (bio_get_endio(r1_bio->bios[d]) != end_sync_read)
 						continue;
 					rdev = conf->mirrors[d].rdev;
 					if (sync_page_io(rdev->bdev,
@@ -1387,14 +1387,14 @@ static void sync_request_write(mddev_t *mddev, r1bio_t *r1_bio)
 	atomic_set(&r1_bio->remaining, 1);
 	for (i = 0; i < disks ; i++) {
 		wbio = r1_bio->bios[i];
-		if (wbio->bi_end_io == NULL ||
-		    (wbio->bi_end_io == end_sync_read &&
+		if (bio_get_endio(wbio) == NULL ||
+		    (bio_get_endio(wbio) == end_sync_read &&
 		     (i == r1_bio->read_disk ||
 		      !test_bit(MD_RECOVERY_SYNC, &mddev->recovery))))
 			continue;
 
 		wbio->bi_rw = WRITE;
-		wbio->bi_end_io = end_sync_write;
+		bio_set_endio(wbio, end_sync_write);
 		atomic_inc(&r1_bio->remaining);
 		md_sync_acct(conf->mirrors[i].rdev->bdev, wbio->bi_size >> 9);
 
@@ -1580,7 +1580,7 @@ static void raid1d(mddev_t *mddev)
 					bio->bi_sector = r1_bio->sector +
 						conf->mirrors[i].rdev->data_offset;
 					bio->bi_bdev = conf->mirrors[i].rdev->bdev;
-					bio->bi_end_io = raid1_end_write_request;
+					bio_set_endio(bio, raid1_end_write_request);
 					bio->bi_rw = WRITE | do_sync;
 					bio->bi_private = r1_bio;
 					r1_bio->bios[i] = bio;
@@ -1628,7 +1628,7 @@ static void raid1d(mddev_t *mddev)
 					       (unsigned long long)r1_bio->sector);
 				bio->bi_sector = r1_bio->sector + rdev->data_offset;
 				bio->bi_bdev = rdev->bdev;
-				bio->bi_end_io = raid1_end_read_request;
+				bio_set_endio(bio, raid1_end_read_request);
 				bio->bi_rw = READ | do_sync;
 				bio->bi_private = r1_bio;
 				unplug = 1;
@@ -1763,7 +1763,7 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 		bio->bi_phys_segments = 0;
 		bio->bi_hw_segments = 0;
 		bio->bi_size = 0;
-		bio->bi_end_io = NULL;
+		bio_set_endio(bio, NULL);
 		bio->bi_private = NULL;
 
 		rdev = rcu_dereference(conf->mirrors[i].rdev);
@@ -1773,12 +1773,12 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 			continue;
 		} else if (!test_bit(In_sync, &rdev->flags)) {
 			bio->bi_rw = WRITE;
-			bio->bi_end_io = end_sync_write;
+			bio_set_endio(bio, end_sync_write);
 			write_targets ++;
 		} else {
 			/* may need to read from here */
 			bio->bi_rw = READ;
-			bio->bi_end_io = end_sync_read;
+			bio_set_endio(bio, end_sync_read);
 			if (test_bit(WriteMostly, &rdev->flags)) {
 				if (wonly < 0)
 					wonly = i;
@@ -1834,7 +1834,7 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 
 		for (i=0 ; i < conf->raid_disks; i++) {
 			bio = r1_bio->bios[i];
-			if (bio->bi_end_io) {
+			if (bio_get_endio(bio)) {
 				page = bio->bi_io_vec[bio->bi_vcnt].bv_page;
 				if (bio_add_page(bio, page, len, 0) == 0) {
 					/* stop here */
@@ -1842,7 +1842,7 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 					while (i > 0) {
 						i--;
 						bio = r1_bio->bios[i];
-						if (bio->bi_end_io==NULL)
+						if (bio_get_endio(bio) == NULL)
 							continue;
 						/* remove last page from this bio */
 						bio->bi_vcnt--;
@@ -1867,7 +1867,7 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 		atomic_set(&r1_bio->remaining, read_targets);
 		for (i=0; i<conf->raid_disks; i++) {
 			bio = r1_bio->bios[i];
-			if (bio->bi_end_io == end_sync_read) {
+			if (bio_get_endio(bio) == end_sync_read) {
 				md_sync_acct(bio->bi_bdev, nr_sectors);
 				generic_make_request(bio);
 			}
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 4e53792..1d1aee0 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -863,7 +863,7 @@ static int make_request(struct request_queue *q, struct bio * bio)
 		read_bio->bi_sector = r10_bio->devs[slot].addr +
 			mirror->rdev->data_offset;
 		read_bio->bi_bdev = mirror->rdev->bdev;
-		read_bio->bi_end_io = raid10_end_read_request;
+		bio_set_endio(read_bio, raid10_end_read_request);
 		read_bio->bi_rw = READ | do_sync;
 		read_bio->bi_private = r10_bio;
 
@@ -909,7 +909,7 @@ static int make_request(struct request_queue *q, struct bio * bio)
 		mbio->bi_sector	= r10_bio->devs[i].addr+
 			conf->mirrors[d].rdev->data_offset;
 		mbio->bi_bdev = conf->mirrors[d].rdev->bdev;
-		mbio->bi_end_io	= raid10_end_write_request;
+		bio_set_endio(mbio, raid10_end_write_request);
 		mbio->bi_rw = WRITE | do_sync;
 		mbio->bi_private = r10_bio;
 
@@ -1273,7 +1273,7 @@ static void sync_request_write(mddev_t *mddev, r10bio_t *r10_bio)
 
 		tbio = r10_bio->devs[i].bio;
 
-		if (tbio->bi_end_io != end_sync_read)
+		if (bio_get_endio(tbio) != end_sync_read)
 			continue;
 		if (i == first)
 			continue;
@@ -1320,7 +1320,7 @@ static void sync_request_write(mddev_t *mddev, r10bio_t *r10_bio)
 			       page_address(fbio->bi_io_vec[j].bv_page),
 			       PAGE_SIZE);
 		}
-		tbio->bi_end_io = end_sync_write;
+		bio_set_endio(tbio, end_sync_write);
 
 		d = r10_bio->devs[i].devnum;
 		atomic_inc(&conf->mirrors[d].rdev->nr_pending);
@@ -1588,7 +1588,7 @@ static void raid10d(mddev_t *mddev)
 				bio->bi_bdev = rdev->bdev;
 				bio->bi_rw = READ | do_sync;
 				bio->bi_private = r10_bio;
-				bio->bi_end_io = raid10_end_read_request;
+				bio_set_endio(bio, raid10_end_read_request);
 				unplug = 1;
 				generic_make_request(bio);
 			}
@@ -1796,7 +1796,7 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 						bio->bi_next = biolist;
 						biolist = bio;
 						bio->bi_private = r10_bio;
-						bio->bi_end_io = end_sync_read;
+						bio_set_endio(bio, end_sync_read);
 						bio->bi_rw = READ;
 						bio->bi_sector = r10_bio->devs[j].addr +
 							conf->mirrors[d].rdev->data_offset;
@@ -1813,7 +1813,7 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 						bio->bi_next = biolist;
 						biolist = bio;
 						bio->bi_private = r10_bio;
-						bio->bi_end_io = end_sync_write;
+						bio_set_endio(bio, end_sync_write);
 						bio->bi_rw = WRITE;
 						bio->bi_sector = r10_bio->devs[k].addr +
 							conf->mirrors[i].rdev->data_offset;
@@ -1873,7 +1873,7 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 		for (i=0; i<conf->copies; i++) {
 			int d = r10_bio->devs[i].devnum;
 			bio = r10_bio->devs[i].bio;
-			bio->bi_end_io = NULL;
+			bio_set_endio(bio, NULL);
 			clear_bit(BIO_UPTODATE, &bio->bi_flags);
 			if (conf->mirrors[d].rdev == NULL ||
 			    test_bit(Faulty, &conf->mirrors[d].rdev->flags))
@@ -1883,7 +1883,7 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 			bio->bi_next = biolist;
 			biolist = bio;
 			bio->bi_private = r10_bio;
-			bio->bi_end_io = end_sync_read;
+			bio_set_endio(bio, end_sync_read);
 			bio->bi_rw = READ;
 			bio->bi_sector = r10_bio->devs[i].addr +
 				conf->mirrors[d].rdev->data_offset;
@@ -1894,7 +1894,7 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 		if (count < 2) {
 			for (i=0; i<conf->copies; i++) {
 				int d = r10_bio->devs[i].devnum;
-				if (r10_bio->devs[i].bio->bi_end_io)
+				if (bio_get_endio(&r10_bio->devs[i].bio))
 					rdev_dec_pending(conf->mirrors[d].rdev, mddev);
 			}
 			put_buf(r10_bio);
@@ -1906,7 +1906,7 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 	for (bio = biolist; bio ; bio=bio->bi_next) {
 
 		bio->bi_flags &= ~(BIO_POOL_MASK - 1);
-		if (bio->bi_end_io)
+		if (bio_get_endio(bio))
 			bio->bi_flags |= 1 << BIO_UPTODATE;
 		bio->bi_vcnt = 0;
 		bio->bi_idx = 0;
@@ -1956,7 +1956,7 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 		r10_bio = bio->bi_private;
 		r10_bio->sectors = nr_sectors;
 
-		if (bio->bi_end_io == end_sync_read) {
+		if (bio_get_endio(bio) == end_sync_read) {
 			md_sync_acct(bio->bi_bdev, nr_sectors);
 			generic_make_request(bio);
 		}
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 3085228..9fe55db 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -113,9 +113,9 @@ static void return_io(struct bio *return_bi)
 		return_bi = bi->bi_next;
 		bi->bi_next = NULL;
 		bi->bi_size = 0;
-		bi->bi_end_io(bi, bytes,
+		(bio_get_endio(bi))(bi, bytes,
 			      test_bit(BIO_UPTODATE, &bi->bi_flags)
-			        ? 0 : -EIO);
+			        ? 0 : -EIO); // use bio_endio!
 		bi = return_bi;
 	}
 }
@@ -414,9 +414,9 @@ static void ops_run_io(struct stripe_head *sh)
 
 		bi->bi_rw = rw;
 		if (rw == WRITE)
-			bi->bi_end_io = raid5_end_write_request;
+			bio_set_endio(bi, raid5_end_write_request);
 		else
-			bi->bi_end_io = raid5_end_read_request;
+			bio_set_endio(bi, raid5_end_read_request);
 
 		rcu_read_lock();
 		rdev = rcu_dereference(conf->disks[i].rdev);
@@ -3101,9 +3101,9 @@ static void handle_stripe6(struct stripe_head *sh, struct page *tmp_page)
 
 		bi->bi_rw = rw;
 		if (rw == WRITE)
-			bi->bi_end_io = raid5_end_write_request;
+			bio_set_endio(bi, raid5_end_write_request);
 		else
-			bi->bi_end_io = raid5_end_read_request;
+			bio_set_endio(bi, raid5_end_read_request);
 
 		rcu_read_lock();
 		rdev = rcu_dereference(conf->disks[i].rdev);
@@ -3433,7 +3433,7 @@ static int chunk_aligned_read(struct request_queue *q, struct bio * raid_bio)
 	 *   set bi_end_io to a new function, and set bi_private to the
 	 *     original bio.
 	 */
-	align_bi->bi_end_io  = raid5_align_endio;
+	bio_set_endio(align_bi, raid5_align_endio);
 	align_bi->bi_private = raid_bio;
 	/*
 	 *	compute position
@@ -3612,7 +3612,7 @@ static int make_request(struct request_queue *q, struct bio * bi)
 		if ( rw == WRITE )
 			md_write_end(mddev);
 		bi->bi_size = 0;
-		bi->bi_end_io(bi, bytes,
+		(bio_get_endio(bi))((bi, bytes,
 			      test_bit(BIO_UPTODATE, &bi->bi_flags)
 			        ? 0 : -EIO);
 	}
@@ -3893,7 +3893,7 @@ static int  retry_aligned_read(raid5_conf_t *conf, struct bio *raid_bio)
 		int bytes = raid_bio->bi_size;
 
 		raid_bio->bi_size = 0;
-		raid_bio->bi_end_io(raid_bio, bytes,
+		(bio_get_endio(raid_bio))(raid_bio, bytes,
 			      test_bit(BIO_UPTODATE, &raid_bio->bi_flags)
 			        ? 0 : -EIO);
 	}
diff --git a/drivers/s390/block/xpram.c b/drivers/s390/block/xpram.c
index 354a060..c7a3c82 100644
--- a/drivers/s390/block/xpram.c
+++ b/drivers/s390/block/xpram.c
@@ -232,7 +232,7 @@ static int xpram_make_request(struct request_queue *q, struct bio *bio)
 	set_bit(BIO_UPTODATE, &bio->bi_flags);
 	bytes = bio->bi_size;
 	bio->bi_size = 0;
-	bio->bi_end_io(bio, bytes, 0);
+	(bio_get_endio(bio))(bio, bytes, 0); // use bio_endio!
 	return 0;
 fail:
 	bio_io_error(bio, bio->bi_size);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index a417a6f..1c841a5 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -324,7 +324,7 @@ static int scsi_req_map_sg(struct request *rq, struct scatterlist *sgl,
 					err = -ENOMEM;
 					goto free_bios;
 				}
-				bio->bi_end_io = scsi_bi_endio;
+				bio_set_endio(bio, scsi_bi_endio);
 			}
 
 			if (bio_add_pc_page(q, bio, page, bytes, off) !=
diff --git a/fs/bio.c b/fs/bio.c
index 29a44c1..29cdd8b 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -841,7 +841,7 @@ static struct bio *__bio_map_kern(struct request_queue *q, void *data,
 		offset = 0;
 	}
 
-	bio->bi_end_io = bio_map_kern_endio;
+	bio_set_endio(bio, bio_map_kern_endio);
 	return bio;
 }
 
@@ -1011,7 +1011,7 @@ void bio_check_pages_dirty(struct bio *bio)
  *   is the preferred way to end I/O on a bio, it takes care of decrementing
  *   bi_size and clearing BIO_UPTODATE on error. @error is 0 on success, and
  *   and one of the established -Exxxx (-EIO, for instance) error values in
- *   case something went wrong. Noone should call bi_end_io() directly on
+ *   case something went wrong. Noone should call bi_endio() directly on
  *   a bio unless they own it and thus know that it has an end_io function.
  **/
 void bio_endio(struct bio *bio, unsigned int bytes_done, int error)
@@ -1028,8 +1028,8 @@ void bio_endio(struct bio *bio, unsigned int bytes_done, int error)
 	bio->bi_size -= bytes_done;
 	bio->bi_sector += (bytes_done >> 9);
 
-	if (bio->bi_end_io)
-		bio->bi_end_io(bio, bytes_done, error);
+	if (bio_get_endio(bio))
+		(bio_get_endio(bio))(bio, bytes_done, error);
 }
 
 void bio_pair_release(struct bio_pair *bp)
@@ -1106,8 +1106,8 @@ struct bio_pair *bio_split(struct bio *bi, mempool_t *pool, int first_sectors)
 	bp->bio1.bi_max_vecs = 1;
 	bp->bio2.bi_max_vecs = 1;
 
-	bp->bio1.bi_end_io = bio_pair_end_1;
-	bp->bio2.bi_end_io = bio_pair_end_2;
+	bio_set_endio(&bp->bio1, bio_pair_end_1);
+	bio_set_endio(&bp->bio2, bio_pair_end_2);
 
 	bp->bio1.bi_private = bi;
 	bp->bio2.bi_private = pool;
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 2980eab..62c0711 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -293,7 +293,7 @@ blkdev_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 		/* bio_alloc should not fail with GFP_KERNEL flag */
 		bio = bio_alloc(GFP_KERNEL, nvec);
 		bio->bi_bdev = I_BDEV(inode);
-		bio->bi_end_io = blk_end_aio;
+		bio_set_endio(bio, blk_end_aio);
 		bio->bi_private = iocb;
 		bio->bi_sector = pos >> blkbits;
 same_bio:
diff --git a/fs/buffer.c b/fs/buffer.c
index 0e5ec37..e4f8738 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2686,7 +2686,7 @@ int submit_bh(int rw, struct buffer_head * bh)
 	bio->bi_idx = 0;
 	bio->bi_size = bh->b_size;
 
-	bio->bi_end_io = end_bio_bh_io_sync;
+	bio_set_endio(bio, end_bio_bh_io_sync);
 	bio->bi_private = bh;
 
 	bio_get(bio);
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 901dc55..43c5b76 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -328,9 +328,9 @@ dio_bio_alloc(struct dio *dio, struct block_device *bdev,
 	bio->bi_bdev = bdev;
 	bio->bi_sector = first_sector;
 	if (dio->is_async)
-		bio->bi_end_io = dio_bio_end_aio;
+		bio_set_endio(bio, dio_bio_end_aio);
 	else
-		bio->bi_end_io = dio_bio_end_io;
+		bio_set_endio(bio, dio_bio_end_io);
 
 	dio->bio = bio;
 	return 0;
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index f916b97..e32f69a 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -237,8 +237,7 @@ int gfs2_read_super(struct gfs2_sbd *sdp, sector_t sector)
 	bio->bi_sector = sector * (sb->s_blocksize >> 9);
 	bio->bi_bdev = sb->s_bdev;
 	bio_add_page(bio, page, PAGE_SIZE, 0);
-
-	bio->bi_end_io = end_bio_io_page;
+	bio_set_endio(bio, end_bio_io_page);
 	bio->bi_private = page;
 	submit_bio(READ_SYNC | (1 << BIO_RW_META), bio);
 	wait_on_page_locked(page);
diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
index de3e4a5..d8e947a 100644
--- a/fs/jfs/jfs_logmgr.c
+++ b/fs/jfs/jfs_logmgr.c
@@ -2015,7 +2015,7 @@ static int lbmRead(struct jfs_log * log, int pn, struct lbuf ** bpp)
 	bio->bi_idx = 0;
 	bio->bi_size = LOGPSIZE;
 
-	bio->bi_end_io = lbmIODone;
+	bio_set_endio(bio, lbmIODone);
 	bio->bi_private = bp;
 	submit_bio(READ_SYNC, bio);
 
@@ -2156,7 +2156,7 @@ static void lbmStartIO(struct lbuf * bp)
 	bio->bi_idx = 0;
 	bio->bi_size = LOGPSIZE;
 
-	bio->bi_end_io = lbmIODone;
+	bio_set_endio(bio, lbmIODone);
 	bio->bi_private = bp;
 
 	/* check if journaling to disk has been disabled */
diff --git a/fs/jfs/jfs_metapage.c b/fs/jfs/jfs_metapage.c
index 62e96be..6924c4c 100644
--- a/fs/jfs/jfs_metapage.c
+++ b/fs/jfs/jfs_metapage.c
@@ -443,7 +443,7 @@ static int metapage_writepage(struct page *page, struct writeback_control *wbc)
 		bio = bio_alloc(GFP_NOFS, 1);
 		bio->bi_bdev = inode->i_sb->s_bdev;
 		bio->bi_sector = pblock << (inode->i_blkbits - 9);
-		bio->bi_end_io = metapage_write_end_io;
+		bio_set_endio(bio, metapage_write_end_io);
 		bio->bi_private = page;
 
 		/* Don't call bio_add_page yet, we may add to this vec */
@@ -513,7 +513,7 @@ static int metapage_readpage(struct file *fp, struct page *page)
 			bio = bio_alloc(GFP_NOFS, 1);
 			bio->bi_bdev = inode->i_sb->s_bdev;
 			bio->bi_sector = pblock << (inode->i_blkbits - 9);
-			bio->bi_end_io = metapage_read_end_io;
+			bio_set_endio(bio, metapage_read_end_io);
 			bio->bi_private = page;
 			len = xlen << inode->i_blkbits;
 			offset = block_offset << inode->i_blkbits;
diff --git a/fs/mpage.c b/fs/mpage.c
index c1698f2..b04ea49 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -92,9 +92,9 @@ static int mpage_end_io_write(struct bio *bio, unsigned int bytes_done, int err)
 
 static struct bio *mpage_bio_submit(int rw, struct bio *bio)
 {
-	bio->bi_end_io = mpage_end_io_read;
+	bio_set_endio(bio, mpage_end_io_read);
 	if (rw == WRITE)
-		bio->bi_end_io = mpage_end_io_write;
+		bio_set_endio(bio, mpage_end_io_write);
 	submit_bio(rw, bio);
 	return NULL;
 }
diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 2bd7f78..1ea9ea3 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -265,7 +265,7 @@ static struct bio *o2hb_setup_one_bio(struct o2hb_region *reg,
 	bio->bi_sector = (reg->hr_start_block + cs) << (bits - 9);
 	bio->bi_bdev = reg->hr_bdev;
 	bio->bi_private = wc;
-	bio->bi_end_io = o2hb_bio_end_io;
+	bio_set_endio(bio, o2hb_bio_end_io);
 
 	vec_start = (cs << bits) % PAGE_CACHE_SIZE;
 	while(cs < max_slots) {
diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
index 5f152f6..fcb5b09 100644
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -338,8 +338,8 @@ xfs_end_bio(
 	ioend->io_error = test_bit(BIO_UPTODATE, &bio->bi_flags) ? 0 : error;
 
 	/* Toss bio and pass work off to an xfsdatad thread */
-	bio->bi_private = NULL;
-	bio->bi_end_io = NULL;
+	bio->bi_private = NULL; // why?
+	bio_set_endio(bio, NULL); // why?
 	bio_put(bio);
 
 	xfs_finish_ioend(ioend, 0);
@@ -354,7 +354,7 @@ xfs_submit_ioend_bio(
 	atomic_inc(&ioend->io_remaining);
 
 	bio->bi_private = ioend;
-	bio->bi_end_io = xfs_end_bio;
+	bio_set_endio(bio, xfs_end_bio);
 
 	submit_bio(WRITE, bio);
 	ASSERT(!bio_flagged(bio, BIO_EOPNOTSUPP));
diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c
index 386fbff..2f5f9fd 100644
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -1198,7 +1198,7 @@ _xfs_buf_ioapply(
 
 		bio->bi_bdev = bp->b_target->bt_bdev;
 		bio->bi_sector = sector - (offset >> BBSHIFT);
-		bio->bi_end_io = xfs_buf_bio_end_io;
+		bio_set_endio(bio, xfs_buf_bio_end_io);
 		bio->bi_private = bp;
 
 		bio_add_page(bio, bp->b_pages[0], PAGE_CACHE_SIZE, 0);
@@ -1236,7 +1236,7 @@ next_chunk:
 	bio = bio_alloc(GFP_NOIO, nr_pages);
 	bio->bi_bdev = bp->b_target->bt_bdev;
 	bio->bi_sector = sector;
-	bio->bi_end_io = xfs_buf_bio_end_io;
+	bio_set_endio(bio, xfs_buf_bio_end_io);
 	bio->bi_private = bp;
 
 	for (; size && nr_pages; nr_pages--, map_i++) {
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 1ddef34..4c8d7a0 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -108,7 +108,7 @@ struct bio {
 
 	struct bio_vec		*bi_io_vec;	/* the actual vec list */
 
-	bio_end_io_t		*bi_end_io;
+	bio_end_io_t		*bi_endio;
 	atomic_t		bi_cnt;		/* pin count */
 
 	void			*bi_private;
@@ -184,6 +184,16 @@ struct bio {
 #define bio_rw_ahead(bio)	((bio)->bi_rw & (1 << BIO_RW_AHEAD))
 #define bio_rw_meta(bio)	((bio)->bi_rw & (1 << BIO_RW_META))
 
+static inline void bio_set_endio(struct bio *bio, bio_end_io_t *endio)
+{
+	bio->bi_endio = endio;
+}
+
+static inline bio_end_io_t *bio_get_endio(struct bio *bio)
+{
+	return bio->bi_endio;
+}
+
 /*
  * will die
  */
diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index 917aba1..5b2f33a 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -70,7 +70,7 @@ static int submit(int rw, pgoff_t page_off, struct page *page,
 		return -ENOMEM;
 	bio->bi_sector = page_off * (PAGE_SIZE >> 9);
 	bio->bi_bdev = resume_bdev;
-	bio->bi_end_io = end_swap_bio_read;
+	bio_set_endio(bio, end_swap_bio_read);
 
 	if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
 		printk("swsusp: ERROR: adding page to bio at %ld\n", page_off);
diff --git a/mm/bounce.c b/mm/bounce.c
index 179fe38..3c0fbfc 100644
--- a/mm/bounce.c
+++ b/mm/bounce.c
@@ -262,13 +262,13 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig,
 	bio->bi_size = (*bio_orig)->bi_size;
 
 	if (pool == page_pool) {
-		bio->bi_end_io = bounce_end_io_write;
+		bio_set_endio(bio, bounce_end_io_write);
 		if (rw == READ)
-			bio->bi_end_io = bounce_end_io_read;
+			bio_set_endio(bio, bounce_end_io_read);
 	} else {
-		bio->bi_end_io = bounce_end_io_write_isa;
+		bio_set_endio(bio, bounce_end_io_write_isa);
 		if (rw == READ)
-			bio->bi_end_io = bounce_end_io_read_isa;
+			bio_set_endio(bio, bounce_end_io_read_isa);
 	}
 
 	bio->bi_private = *bio_orig;
diff --git a/mm/page_io.c b/mm/page_io.c
index dbffec0..f1aaee0 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -39,7 +39,7 @@ static struct bio *get_swap_bio(gfp_t gfp_flags, pgoff_t index,
 		bio->bi_vcnt = 1;
 		bio->bi_idx = 0;
 		bio->bi_size = PAGE_SIZE;
-		bio->bi_end_io = end_io;
+		bio_set_endio(bio, end_io);
 	}
 	return bio;
 }
-- 
1.5.2.5


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Stacking bio support - patch 3/4 for 2.6.23.17 (stable)
  2008-03-14 13:59 ` Alan D. Brunelle
  2008-03-14 14:04   ` [RFC] Stacking bio support - patch 1/4 for 2.6.23.17 (stable) Alan D. Brunelle
  2008-03-14 14:04   ` [RFC] Stacking bio support Alan D. Brunelle
@ 2008-03-14 14:05   ` Alan D. Brunelle
  2008-03-14 14:05   ` [RFC] Stacking bio support - patch 4/4 " Alan D. Brunelle
  2008-03-16 21:38   ` [RFC] Stacking bio support Daniel Phillips
  4 siblings, 0 replies; 14+ messages in thread
From: Alan D. Brunelle @ 2008-03-14 14:05 UTC (permalink / raw)
  To: Alan D. Brunelle; +Cc: Daniel Phillips, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1 bytes --]



[-- Attachment #2: 0003-BIO-stack.patch --]
[-- Type: text/x-patch, Size: 4866 bytes --]

>From 1cb4c136df6a84b45d4e6bccbef159455b28e949 Mon Sep 17 00:00:00 2001
From: Alan D. Brunelle <alan.brunelle@hp.com>
Date: Fri, 14 Mar 2008 08:27:55 -0400
Subject: [PATCH] BIO stack


---
 fs/bio.c            |   64 +++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/bio.h |   14 ++++++++--
 2 files changed, 75 insertions(+), 3 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 29cdd8b..d57f0f3 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -34,6 +34,63 @@ static struct kmem_cache *bio_slab __read_mostly;
 
 #define BIOVEC_NR_POOLS 6
 
+/* Bio push/pop (C) 2008, Daniel Phillips <phillips@phunq.net> */
+
+#define BIOSTACK_ALIGN (1 << 2)
+#define BIOCHUNK_SIZE (1 << 7)
+#define BIOCHUNK_MAGIC (0xddc0ffee)
+#define BIOCHUNK_OVERHEAD (sizeof(struct biochunk) + sizeof(struct bioframe))
+#define BIOCHUNK_PAYLOAD (BIOCHUNK_SIZE - BIOCHUNK_OVERHEAD)
+
+struct biochunk { u32 magic; void *oldstack; char frames[]; };
+
+static struct kmem_cache *biospace __read_mostly;
+
+static struct biochunk *alloc_biochunk(void)
+{
+	return kmem_cache_alloc(biospace, __GFP_NOFAIL);
+}
+
+static void free_biochunk(struct biochunk *chunk)
+{
+	kmem_cache_free(biospace, chunk);
+}
+
+void *bio_push(struct bio *bio, unsigned size, bio_end_io_t *endio)
+{
+	struct bioframe *frame = bio->bi_stack;
+	size += sizeof(struct bioframe);
+	size += -size & (BIOSTACK_ALIGN - 1);
+	if (unlikely(size > frame->stacksize)) {
+		struct biochunk *chunk = alloc_biochunk();
+		bio_end_io_t *old = frame->endio;
+		BUG_ON(size > BIOCHUNK_PAYLOAD);
+		*chunk = (struct biochunk){ .oldstack = bio->bi_stack, .magic = BIOCHUNK_MAGIC };
+		frame = bio->bi_stack = chunk->frames;
+		*frame = (struct bioframe){ .stacksize = BIOCHUNK_PAYLOAD, .endio = old };
+	}
+	bio->bi_stack += size;
+	*(struct bioframe *)bio->bi_stack = (struct bioframe){
+		.stacksize = frame->stacksize - size,
+		.framesize = size, .endio = endio };
+	return frame->space;
+}
+EXPORT_SYMBOL_GPL(bio_push);
+
+void *bio_pop(struct bio *bio)
+{
+	struct bioframe *frame = bio->bi_stack;
+	if (unlikely(!frame->framesize)) {
+		struct biochunk *chunk = bio->bi_stack - sizeof(struct biochunk);
+		BUG_ON(chunk->magic != BIOCHUNK_MAGIC);
+		frame = bio->bi_stack = chunk->oldstack;
+		free_biochunk(chunk);
+	}
+	frame = bio->bi_stack -= frame->framesize;
+	return frame->space;
+}
+EXPORT_SYMBOL_GPL(bio_pop);
+
 /*
  * a small number of entries is fine, not going to be performance critical.
  * basically we just need to survive
@@ -113,6 +170,9 @@ void bio_free(struct bio *bio, struct bio_set *bio_set)
 
 	BIO_BUG_ON(pool_idx >= BIOVEC_NR_POOLS);
 
+	if (!((struct bioframe *)bio->bi_stack)->framesize)
+		bio_pop(bio);
+
 	mempool_free(bio->bi_io_vec, bio_set->bvec_pools[pool_idx]);
 	mempool_free(bio, bio_set->bio_pool);
 }
@@ -179,6 +239,8 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
 			bio->bi_max_vecs = bvec_slabs[idx].nr_vecs;
 		}
 		bio->bi_io_vec = bvl;
+		bio->bi_stack = &bio->space;
+		bio->space.framesize = sizeof(struct bioframe);
 	}
 out:
 	return bio;
@@ -1206,6 +1268,8 @@ static int __init init_bio(void)
 	if (!bio_split_pool)
 		panic("bio: can't create split pool\n");
 
+	biospace = kmem_cache_create("biospace", BIOCHUNK_SIZE, 0, SLAB_PANIC, NULL);
+
 	return 0;
 }
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 4c8d7a0..cca89d1 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -68,6 +68,11 @@ typedef int (bio_end_io_t) (struct bio *, unsigned int, int);
 typedef void (bio_destructor_t) (struct bio *);
 
 /*
+ * Support endio handler stacking with per-handler private workspace
+ */
+struct bioframe { u16 framesize, stacksize; bio_end_io_t *endio; char space[]; };
+
+/*
  * main unit of I/O for the block layer and lower layers (ie drivers and
  * stacking drivers)
  */
@@ -114,6 +119,8 @@ struct bio {
 	void			*bi_private;
 
 	bio_destructor_t	*bi_destructor;	/* destructor */
+	void			*bi_stack;	/* endio stacking */
+	struct bioframe		space;
 };
 
 /*
@@ -186,12 +193,12 @@ struct bio {
 
 static inline void bio_set_endio(struct bio *bio, bio_end_io_t *endio)
 {
-	bio->bi_endio = endio;
+	((struct bioframe *)bio->bi_stack)->endio = endio;
 }
 
 static inline bio_end_io_t *bio_get_endio(struct bio *bio)
 {
-	return bio->bi_endio;
+	return ((struct bioframe *)bio->bi_stack)->endio;
 }
 
 /*
@@ -295,7 +302,8 @@ extern struct bio *bio_alloc(gfp_t, int);
 extern struct bio *bio_alloc_bioset(gfp_t, int, struct bio_set *);
 extern void bio_put(struct bio *);
 extern void bio_free(struct bio *, struct bio_set *);
-
+extern void *bio_push(struct bio *bio, unsigned size, bio_end_io_t *endio);
+extern void *bio_pop(struct bio *bio);
 extern void bio_endio(struct bio *, unsigned int, int);
 struct request_queue;
 extern int bio_phys_segments(struct request_queue *, struct bio *);
-- 
1.5.2.5


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Stacking bio support - patch 4/4 for 2.6.23.17 (stable)
  2008-03-14 13:59 ` Alan D. Brunelle
                     ` (2 preceding siblings ...)
  2008-03-14 14:05   ` [RFC] Stacking bio support - patch 3/4 for 2.6.23.17 (stable) Alan D. Brunelle
@ 2008-03-14 14:05   ` Alan D. Brunelle
  2008-03-16 21:38   ` [RFC] Stacking bio support Daniel Phillips
  4 siblings, 0 replies; 14+ messages in thread
From: Alan D. Brunelle @ 2008-03-14 14:05 UTC (permalink / raw)
  To: Alan D. Brunelle; +Cc: Daniel Phillips, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1 bytes --]



[-- Attachment #2: 0004-Fixed-compilation-bugs.patch --]
[-- Type: text/x-patch, Size: 2232 bytes --]

>From 278ee2744eddfe67743ed3967bec3f9b773d9015 Mon Sep 17 00:00:00 2001
From: Alan D. Brunelle <alan.brunelle@hp.com>
Date: Fri, 14 Mar 2008 09:46:33 -0400
Subject: [PATCH] Fixed compilation bugs


---
 drivers/md/raid1.c |    4 ++--
 drivers/md/raid5.c |    6 +++---
 fs/bio.c           |    2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 63ffaaf..ce6b576 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -910,7 +910,7 @@ static int make_request(struct request_queue *q, struct bio * bio)
 
 		mbio->bi_sector	= r1_bio->sector + conf->mirrors[i].rdev->data_offset;
 		mbio->bi_bdev = conf->mirrors[i].rdev->bdev;
-		bio_set_endio(mbio, raid1_end_write_request;)
+		bio_set_endio(mbio, raid1_end_write_request);
 		mbio->bi_rw = WRITE | do_barriers | do_sync;
 		mbio->bi_private = r1_bio;
 
@@ -1240,7 +1240,7 @@ static void sync_request_write(mddev_t *mddev, r1bio_t *r1_bio)
 			}
 		r1_bio->read_disk = primary;
 		for (i=0; i<mddev->raid_disks; i++)
-			if (bio_set_endio(r1_bio->bios[i], end_sync_read)) {
+			if (bio_get_endio(r1_bio->bios[i]) == end_sync_read) {
 				int j;
 				int vcnt = r1_bio->sectors >> (PAGE_SHIFT- 9);
 				struct bio *pbio = r1_bio->bios[primary];
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 9fe55db..f83feb8 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3608,13 +3608,13 @@ static int make_request(struct request_queue *q, struct bio * bi)
 	spin_unlock_irq(&conf->device_lock);
 	if (remaining == 0) {
 		int bytes = bi->bi_size;
+		bio_end_io_t *endio = bio_get_endio(bi);
 
 		if ( rw == WRITE )
 			md_write_end(mddev);
 		bi->bi_size = 0;
-		(bio_get_endio(bi))((bi, bytes,
-			      test_bit(BIO_UPTODATE, &bi->bi_flags)
-			        ? 0 : -EIO);
+		endio(bi, bytes,
+			test_bit(BIO_UPTODATE, &bi->bi_flags) ? 0 : -EIO);
 	}
 	return 0;
 }
diff --git a/fs/bio.c b/fs/bio.c
index d57f0f3..0ecd345 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -199,7 +199,7 @@ void bio_init(struct bio *bio)
 	bio->bi_hw_back_size = 0;
 	bio->bi_size = 0;
 	bio->bi_max_vecs = 0;
-	bio->bi_end_io = NULL;
+	bio_set_endio(bio, NULL);
 	atomic_set(&bio->bi_cnt, 1);
 	bio->bi_private = NULL;
 }
-- 
1.5.2.5


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Stacking bio support
  2008-03-14 14:04   ` [RFC] Stacking bio support Alan D. Brunelle
@ 2008-03-14 14:06     ` Alan D. Brunelle
  0 siblings, 0 replies; 14+ messages in thread
From: Alan D. Brunelle @ 2008-03-14 14:06 UTC (permalink / raw)
  To: Alan D. Brunelle; +Cc: Daniel Phillips, linux-kernel

Sorry, this is patch 2/4...

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Stacking bio support - patch 1/4 for 2.6.23.17 (stable)
  2008-03-14 14:04   ` [RFC] Stacking bio support - patch 1/4 for 2.6.23.17 (stable) Alan D. Brunelle
@ 2008-03-14 19:41     ` Daniel Phillips
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel Phillips @ 2008-03-14 19:41 UTC (permalink / raw)
  To: Alan D. Brunelle; +Cc: linux-kernel

Thankyou very much :-)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Stacking bio support
  2008-03-14 13:59 ` Alan D. Brunelle
                     ` (3 preceding siblings ...)
  2008-03-14 14:05   ` [RFC] Stacking bio support - patch 4/4 " Alan D. Brunelle
@ 2008-03-16 21:38   ` Daniel Phillips
  4 siblings, 0 replies; 14+ messages in thread
From: Daniel Phillips @ 2008-03-16 21:38 UTC (permalink / raw)
  To: Alan D. Brunelle; +Cc: linux-kernel

On Friday 14 March 2008 06:59, Alan D. Brunelle wrote:
> I was able to get the first 3 patches ported to the stable tree of
> 2.6.23, but there were compilation errors - fixed in the 4th patch
> in the stream I'll post next. Regardless of that, it still failed
> to boot on a 2-way AMD64. Do you have a set of these patches that
> builds correctly & boots! :-)    
 
Here you are:

http://code.google.com/p/zumastor/source/browse/www/lvm3/bio.single.alloc-2.6.24.3
http://code.google.com/p/zumastor/source/browse/www/lvm3/bio.hide.endio-2.6.24.3
http://code.google.com/p/zumastor/source/browse/www/lvm3/bio.stack-2.6.24.3
http://code.google.com/p/zumastor/source/browse/www/lvm3/dm.reduce.allocs-2.6.24.3

This compiles and boots for me, though I only tried device mapper
under UML, and then only lightly.  It would be nice to see some reports
on the stability on this, one way or the other.  Next move is to clean
up __clone_and_map the rest of the way.

Thanks for your merge.  Not easy, hmm?  The mainline bio changes
generated a bunch of tracking work, but they are sensible changes so
no regrets (the bogus endio bytes parameter and int return are gone).
Now... it appears to me that bi_size is an entirely redundant field,
so while we are turfing bio junk out...

Regards,

Daniel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Stacking bio support
  2008-03-11 12:38     ` Boaz Harrosh
@ 2008-04-10 13:04       ` Daniel Phillips
  2008-04-10 13:34         ` Boaz Harrosh
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Phillips @ 2008-04-10 13:04 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: Ph. Marek, linux-kernel

On Tuesday 11 March 2008 05:38, Boaz Harrosh wrote:
> Do you have a public git tree that we can inspect from time to time.

(long lag...)

Yes I do, here:

   http://phunq.net/ddtree

And there will be a paper on the lvm3 design (for discussion) at Linuxtag
in Berlin, in May.

Regards,

Daniel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Stacking bio support
  2008-04-10 13:04       ` Daniel Phillips
@ 2008-04-10 13:34         ` Boaz Harrosh
  0 siblings, 0 replies; 14+ messages in thread
From: Boaz Harrosh @ 2008-04-10 13:34 UTC (permalink / raw)
  To: Daniel Phillips; +Cc: Ph. Marek, linux-kernel

On Thu, Apr 10 2008 at 16:04 +0300, Daniel Phillips <phillips@phunq.net> wrote:
> On Tuesday 11 March 2008 05:38, Boaz Harrosh wrote:
>> Do you have a public git tree that we can inspect from time to time.
> 
> (long lag...)
> 
> Yes I do, here:
> 
>    http://phunq.net/ddtree
> 
> And there will be a paper on the lvm3 design (for discussion) at Linuxtag
> in Berlin, in May.
> 
> Regards,
> 
> Daniel

Thanks I'll have a look. This is a brave and important work you are doing here
I envision all kind of stuff that can be done with this. I wish you the smoothest
possible way to get it accepted into the kernel.

Tell me if there is any thing I can do to help.

Boaz


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2008-04-10 13:35 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-03-11 10:52 [RFC] Stacking bio support Daniel Phillips
2008-03-11 11:33 ` Ph. Marek
2008-03-11 12:07   ` Daniel Phillips
2008-03-11 12:38     ` Boaz Harrosh
2008-04-10 13:04       ` Daniel Phillips
2008-04-10 13:34         ` Boaz Harrosh
2008-03-14 13:59 ` Alan D. Brunelle
2008-03-14 14:04   ` [RFC] Stacking bio support - patch 1/4 for 2.6.23.17 (stable) Alan D. Brunelle
2008-03-14 19:41     ` Daniel Phillips
2008-03-14 14:04   ` [RFC] Stacking bio support Alan D. Brunelle
2008-03-14 14:06     ` Alan D. Brunelle
2008-03-14 14:05   ` [RFC] Stacking bio support - patch 3/4 for 2.6.23.17 (stable) Alan D. Brunelle
2008-03-14 14:05   ` [RFC] Stacking bio support - patch 4/4 " Alan D. Brunelle
2008-03-16 21:38   ` [RFC] Stacking bio support Daniel Phillips

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).