LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* kmemcheck caught read from freed memory (cfq_free_io_context)
@ 2008-04-01 21:08 Vegard Nossum
2008-04-01 21:36 ` Peter Zijlstra
0 siblings, 1 reply; 69+ messages in thread
From: Vegard Nossum @ 2008-04-01 21:08 UTC (permalink / raw)
To: Jens Axboe
Cc: Ingo Molnar, Peter Zijlstra, Pekka Enberg, Linux Kernel Mailing List
Hi,
This appeared in my logs:
kmemcheck: Caught 32-bit read from freed memory (f7042348)
Pid: 1374, comm: bash Not tainted (2.6.25-rc7 #92)
EIP: 0060:[<c0502f0d>] EFLAGS: 00210202 CPU: 0
EIP is at call_for_each_cic+0x2d/0x44
EAX: 00200286 EBX: 00000001 ECX: c200e908 EDX: f7042348
ESI: f6c26c60 EDI: c0503310 EBP: f70fff38 ESP: c082ec88
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 8005003b CR2: f7826904 CR3: 36cd7000 CR4: 000006c0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff4ff0 DR7: 00000400
[<c041cff8>] kmemcheck_read+0xa8/0xe0
[<c041d1d5>] kmemcheck_access+0x1a5/0x244
[<c0668252>] do_page_fault+0x622/0x6fc
[<c06666aa>] error_code+0x72/0x78
[<c050323f>] cfq_free_io_context+0xf/0x70
[<c04fc4d7>] put_io_context+0x4f/0x58
[<c04fc568>] exit_io_context+0x60/0x6c
[<c042f871>] do_exit+0x4d9/0x6f0
[<c042fab1>] do_group_exit+0x29/0x88
[<c042fb1f>] sys_exit_group+0xf/0x14
[<c0406105>] sysenter_past_esp+0x6d/0xa4
[<ffffffff>] 0xffffffff
The error occurs in cfq_free_io_context()'s call to
call_for_each_cic() which looks like this:
rcu_read_lock();
hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list) {
func(ioc, cic);
called++;
}
rcu_read_unlock();
The function that is called is cic_free_func(). It is postulated that
hlist_for_each_entry_rcu() will dereference the previously freed list
element to get the ->next pointer.
After a short discussion with Pekka Enberg and Peter Zijlstra, it
seemed evident that this list traversal should use
hlist_for_each_entry_safe_rcu() instead, which would buffer the next
pointer before the object is freed.
Does this report seem to be valid?
The kernel is 2.6.25-rc7.
Kind regards,
Vegard Nossum
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-01 21:08 kmemcheck caught read from freed memory (cfq_free_io_context) Vegard Nossum
@ 2008-04-01 21:36 ` Peter Zijlstra
2008-04-01 22:51 ` Paul E. McKenney
2008-04-02 7:17 ` Jens Axboe
0 siblings, 2 replies; 69+ messages in thread
From: Peter Zijlstra @ 2008-04-01 21:36 UTC (permalink / raw)
To: Vegard Nossum
Cc: Jens Axboe, Ingo Molnar, Pekka Enberg, Linux Kernel Mailing List,
Paul E. McKenney
On Tue, 2008-04-01 at 23:08 +0200, Vegard Nossum wrote:
> Hi,
>
> This appeared in my logs:
>
> kmemcheck: Caught 32-bit read from freed memory (f7042348)
>
> Pid: 1374, comm: bash Not tainted (2.6.25-rc7 #92)
> EIP: 0060:[<c0502f0d>] EFLAGS: 00210202 CPU: 0
> EIP is at call_for_each_cic+0x2d/0x44
> EAX: 00200286 EBX: 00000001 ECX: c200e908 EDX: f7042348
> ESI: f6c26c60 EDI: c0503310 EBP: f70fff38 ESP: c082ec88
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> CR0: 8005003b CR2: f7826904 CR3: 36cd7000 CR4: 000006c0
> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> DR6: ffff4ff0 DR7: 00000400
> [<c041cff8>] kmemcheck_read+0xa8/0xe0
> [<c041d1d5>] kmemcheck_access+0x1a5/0x244
> [<c0668252>] do_page_fault+0x622/0x6fc
> [<c06666aa>] error_code+0x72/0x78
> [<c050323f>] cfq_free_io_context+0xf/0x70
> [<c04fc4d7>] put_io_context+0x4f/0x58
> [<c04fc568>] exit_io_context+0x60/0x6c
> [<c042f871>] do_exit+0x4d9/0x6f0
> [<c042fab1>] do_group_exit+0x29/0x88
> [<c042fb1f>] sys_exit_group+0xf/0x14
> [<c0406105>] sysenter_past_esp+0x6d/0xa4
> [<ffffffff>] 0xffffffff
>
> The error occurs in cfq_free_io_context()'s call to
> call_for_each_cic() which looks like this:
>
> rcu_read_lock();
> hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list) {
> func(ioc, cic);
> called++;
> }
> rcu_read_unlock();
>
> The function that is called is cic_free_func(). It is postulated that
> hlist_for_each_entry_rcu() will dereference the previously freed list
> element to get the ->next pointer.
>
> After a short discussion with Pekka Enberg and Peter Zijlstra, it
> seemed evident that this list traversal should use
> hlist_for_each_entry_safe_rcu() instead, which would buffer the next
> pointer before the object is freed.
>
> Does this report seem to be valid?
>
> The kernel is 2.6.25-rc7.
The missing hlist for loop would look something like so:
#define hlist_for_each_entry_safe_rcu(tpos, pos, n, head, member) \
for (pos = (head)->first; \
rcu_dereference(pos) && ({ n = pos->next; 1; }) && \
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
pos = n)
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-01 21:36 ` Peter Zijlstra
@ 2008-04-01 22:51 ` Paul E. McKenney
2008-04-02 6:15 ` Peter Zijlstra
2008-04-02 7:17 ` Jens Axboe
1 sibling, 1 reply; 69+ messages in thread
From: Paul E. McKenney @ 2008-04-01 22:51 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Vegard Nossum, Jens Axboe, Ingo Molnar, Pekka Enberg,
Linux Kernel Mailing List
On Tue, Apr 01, 2008 at 11:36:28PM +0200, Peter Zijlstra wrote:
> On Tue, 2008-04-01 at 23:08 +0200, Vegard Nossum wrote:
> > Hi,
> >
> > This appeared in my logs:
> >
> > kmemcheck: Caught 32-bit read from freed memory (f7042348)
> >
> > Pid: 1374, comm: bash Not tainted (2.6.25-rc7 #92)
> > EIP: 0060:[<c0502f0d>] EFLAGS: 00210202 CPU: 0
> > EIP is at call_for_each_cic+0x2d/0x44
> > EAX: 00200286 EBX: 00000001 ECX: c200e908 EDX: f7042348
> > ESI: f6c26c60 EDI: c0503310 EBP: f70fff38 ESP: c082ec88
> > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > CR0: 8005003b CR2: f7826904 CR3: 36cd7000 CR4: 000006c0
> > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > DR6: ffff4ff0 DR7: 00000400
> > [<c041cff8>] kmemcheck_read+0xa8/0xe0
> > [<c041d1d5>] kmemcheck_access+0x1a5/0x244
> > [<c0668252>] do_page_fault+0x622/0x6fc
> > [<c06666aa>] error_code+0x72/0x78
> > [<c050323f>] cfq_free_io_context+0xf/0x70
> > [<c04fc4d7>] put_io_context+0x4f/0x58
> > [<c04fc568>] exit_io_context+0x60/0x6c
> > [<c042f871>] do_exit+0x4d9/0x6f0
> > [<c042fab1>] do_group_exit+0x29/0x88
> > [<c042fb1f>] sys_exit_group+0xf/0x14
> > [<c0406105>] sysenter_past_esp+0x6d/0xa4
> > [<ffffffff>] 0xffffffff
> >
> > The error occurs in cfq_free_io_context()'s call to
> > call_for_each_cic() which looks like this:
> >
> > rcu_read_lock();
> > hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list) {
> > func(ioc, cic);
> > called++;
> > }
> > rcu_read_unlock();
> >
> > The function that is called is cic_free_func(). It is postulated that
> > hlist_for_each_entry_rcu() will dereference the previously freed list
> > element to get the ->next pointer.
> >
> > After a short discussion with Pekka Enberg and Peter Zijlstra, it
> > seemed evident that this list traversal should use
> > hlist_for_each_entry_safe_rcu() instead, which would buffer the next
> > pointer before the object is freed.
> >
> > Does this report seem to be valid?
> >
> > The kernel is 2.6.25-rc7.
>
> The missing hlist for loop would look something like so:
>
> #define hlist_for_each_entry_safe_rcu(tpos, pos, n, head, member) \
> for (pos = (head)->first; \
> rcu_dereference(pos) && ({ n = pos->next; 1; }) && \
> ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
> pos = n)
Why the heck is cic_free_func() immediately doing a kmem_cache_free()
on the cfq_io_context structure??? Shouldn't we have a call_rcu() or a
synchronize_rcu() in there somewhere??? Given the way this is written,
wouldn't readers on other code paths get dumped onto the freelist?
This would not be a good thing...
Thanx, Paul
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-01 22:51 ` Paul E. McKenney
@ 2008-04-02 6:15 ` Peter Zijlstra
2008-04-02 7:19 ` Jens Axboe
2008-04-02 10:24 ` Paul E. McKenney
0 siblings, 2 replies; 69+ messages in thread
From: Peter Zijlstra @ 2008-04-02 6:15 UTC (permalink / raw)
To: paulmck
Cc: Vegard Nossum, Jens Axboe, Ingo Molnar, Pekka Enberg,
Linux Kernel Mailing List
On Tue, 2008-04-01 at 15:51 -0700, Paul E. McKenney wrote:
> On Tue, Apr 01, 2008 at 11:36:28PM +0200, Peter Zijlstra wrote:
> > On Tue, 2008-04-01 at 23:08 +0200, Vegard Nossum wrote:
> > > Hi,
> > >
> > > This appeared in my logs:
> > >
> > > kmemcheck: Caught 32-bit read from freed memory (f7042348)
> > >
> > > Pid: 1374, comm: bash Not tainted (2.6.25-rc7 #92)
> > > EIP: 0060:[<c0502f0d>] EFLAGS: 00210202 CPU: 0
> > > EIP is at call_for_each_cic+0x2d/0x44
> > > EAX: 00200286 EBX: 00000001 ECX: c200e908 EDX: f7042348
> > > ESI: f6c26c60 EDI: c0503310 EBP: f70fff38 ESP: c082ec88
> > > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > > CR0: 8005003b CR2: f7826904 CR3: 36cd7000 CR4: 000006c0
> > > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > > DR6: ffff4ff0 DR7: 00000400
> > > [<c041cff8>] kmemcheck_read+0xa8/0xe0
> > > [<c041d1d5>] kmemcheck_access+0x1a5/0x244
> > > [<c0668252>] do_page_fault+0x622/0x6fc
> > > [<c06666aa>] error_code+0x72/0x78
> > > [<c050323f>] cfq_free_io_context+0xf/0x70
> > > [<c04fc4d7>] put_io_context+0x4f/0x58
> > > [<c04fc568>] exit_io_context+0x60/0x6c
> > > [<c042f871>] do_exit+0x4d9/0x6f0
> > > [<c042fab1>] do_group_exit+0x29/0x88
> > > [<c042fb1f>] sys_exit_group+0xf/0x14
> > > [<c0406105>] sysenter_past_esp+0x6d/0xa4
> > > [<ffffffff>] 0xffffffff
> > >
> > > The error occurs in cfq_free_io_context()'s call to
> > > call_for_each_cic() which looks like this:
> > >
> > > rcu_read_lock();
> > > hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list) {
> > > func(ioc, cic);
> > > called++;
> > > }
> > > rcu_read_unlock();
> > >
> > > The function that is called is cic_free_func(). It is postulated that
> > > hlist_for_each_entry_rcu() will dereference the previously freed list
> > > element to get the ->next pointer.
> > >
> > > After a short discussion with Pekka Enberg and Peter Zijlstra, it
> > > seemed evident that this list traversal should use
> > > hlist_for_each_entry_safe_rcu() instead, which would buffer the next
> > > pointer before the object is freed.
> > >
> > > Does this report seem to be valid?
> > >
> > > The kernel is 2.6.25-rc7.
> >
> > The missing hlist for loop would look something like so:
> >
> > #define hlist_for_each_entry_safe_rcu(tpos, pos, n, head, member) \
> > for (pos = (head)->first; \
> > rcu_dereference(pos) && ({ n = pos->next; 1; }) && \
> > ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
> > pos = n)
>
> Why the heck is cic_free_func() immediately doing a kmem_cache_free()
> on the cfq_io_context structure??? Shouldn't we have a call_rcu() or a
> synchronize_rcu() in there somewhere??? Given the way this is written,
> wouldn't readers on other code paths get dumped onto the freelist?
> This would not be a good thing...
I was told, but didn't check for myself it is using SLAB_RCU. Of course
that requires an object validation pass, and I didn't check it does that
either.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-01 21:36 ` Peter Zijlstra
2008-04-01 22:51 ` Paul E. McKenney
@ 2008-04-02 7:17 ` Jens Axboe
2008-04-02 7:20 ` Pekka J Enberg
2008-04-02 10:40 ` Paul E. McKenney
1 sibling, 2 replies; 69+ messages in thread
From: Jens Axboe @ 2008-04-02 7:17 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Vegard Nossum, Ingo Molnar, Pekka Enberg,
Linux Kernel Mailing List, Paul E. McKenney
On Tue, Apr 01 2008, Peter Zijlstra wrote:
> On Tue, 2008-04-01 at 23:08 +0200, Vegard Nossum wrote:
> > Hi,
> >
> > This appeared in my logs:
> >
> > kmemcheck: Caught 32-bit read from freed memory (f7042348)
> >
> > Pid: 1374, comm: bash Not tainted (2.6.25-rc7 #92)
> > EIP: 0060:[<c0502f0d>] EFLAGS: 00210202 CPU: 0
> > EIP is at call_for_each_cic+0x2d/0x44
> > EAX: 00200286 EBX: 00000001 ECX: c200e908 EDX: f7042348
> > ESI: f6c26c60 EDI: c0503310 EBP: f70fff38 ESP: c082ec88
> > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > CR0: 8005003b CR2: f7826904 CR3: 36cd7000 CR4: 000006c0
> > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > DR6: ffff4ff0 DR7: 00000400
> > [<c041cff8>] kmemcheck_read+0xa8/0xe0
> > [<c041d1d5>] kmemcheck_access+0x1a5/0x244
> > [<c0668252>] do_page_fault+0x622/0x6fc
> > [<c06666aa>] error_code+0x72/0x78
> > [<c050323f>] cfq_free_io_context+0xf/0x70
> > [<c04fc4d7>] put_io_context+0x4f/0x58
> > [<c04fc568>] exit_io_context+0x60/0x6c
> > [<c042f871>] do_exit+0x4d9/0x6f0
> > [<c042fab1>] do_group_exit+0x29/0x88
> > [<c042fb1f>] sys_exit_group+0xf/0x14
> > [<c0406105>] sysenter_past_esp+0x6d/0xa4
> > [<ffffffff>] 0xffffffff
> >
> > The error occurs in cfq_free_io_context()'s call to
> > call_for_each_cic() which looks like this:
> >
> > rcu_read_lock();
> > hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list) {
> > func(ioc, cic);
> > called++;
> > }
> > rcu_read_unlock();
> >
> > The function that is called is cic_free_func(). It is postulated that
> > hlist_for_each_entry_rcu() will dereference the previously freed list
> > element to get the ->next pointer.
> >
> > After a short discussion with Pekka Enberg and Peter Zijlstra, it
> > seemed evident that this list traversal should use
> > hlist_for_each_entry_safe_rcu() instead, which would buffer the next
> > pointer before the object is freed.
> >
> > Does this report seem to be valid?
> >
> > The kernel is 2.6.25-rc7.
>
> The missing hlist for loop would look something like so:
>
> #define hlist_for_each_entry_safe_rcu(tpos, pos, n, head, member) \
> for (pos = (head)->first; \
> rcu_dereference(pos) && ({ n = pos->next; 1; }) && \
> ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
> pos = n)
Good catch, I wonder why it didn't complain in my testing. I've added a
patch to fix that, please see it here:
http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=51998b2da4e5db65cb24317329059044083ea151
--
Jens Axboe
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 6:15 ` Peter Zijlstra
@ 2008-04-02 7:19 ` Jens Axboe
2008-04-02 10:24 ` Paul E. McKenney
1 sibling, 0 replies; 69+ messages in thread
From: Jens Axboe @ 2008-04-02 7:19 UTC (permalink / raw)
To: Peter Zijlstra
Cc: paulmck, Vegard Nossum, Ingo Molnar, Pekka Enberg,
Linux Kernel Mailing List
On Wed, Apr 02 2008, Peter Zijlstra wrote:
> On Tue, 2008-04-01 at 15:51 -0700, Paul E. McKenney wrote:
> > On Tue, Apr 01, 2008 at 11:36:28PM +0200, Peter Zijlstra wrote:
> > > On Tue, 2008-04-01 at 23:08 +0200, Vegard Nossum wrote:
> > > > Hi,
> > > >
> > > > This appeared in my logs:
> > > >
> > > > kmemcheck: Caught 32-bit read from freed memory (f7042348)
> > > >
> > > > Pid: 1374, comm: bash Not tainted (2.6.25-rc7 #92)
> > > > EIP: 0060:[<c0502f0d>] EFLAGS: 00210202 CPU: 0
> > > > EIP is at call_for_each_cic+0x2d/0x44
> > > > EAX: 00200286 EBX: 00000001 ECX: c200e908 EDX: f7042348
> > > > ESI: f6c26c60 EDI: c0503310 EBP: f70fff38 ESP: c082ec88
> > > > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > > > CR0: 8005003b CR2: f7826904 CR3: 36cd7000 CR4: 000006c0
> > > > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > > > DR6: ffff4ff0 DR7: 00000400
> > > > [<c041cff8>] kmemcheck_read+0xa8/0xe0
> > > > [<c041d1d5>] kmemcheck_access+0x1a5/0x244
> > > > [<c0668252>] do_page_fault+0x622/0x6fc
> > > > [<c06666aa>] error_code+0x72/0x78
> > > > [<c050323f>] cfq_free_io_context+0xf/0x70
> > > > [<c04fc4d7>] put_io_context+0x4f/0x58
> > > > [<c04fc568>] exit_io_context+0x60/0x6c
> > > > [<c042f871>] do_exit+0x4d9/0x6f0
> > > > [<c042fab1>] do_group_exit+0x29/0x88
> > > > [<c042fb1f>] sys_exit_group+0xf/0x14
> > > > [<c0406105>] sysenter_past_esp+0x6d/0xa4
> > > > [<ffffffff>] 0xffffffff
> > > >
> > > > The error occurs in cfq_free_io_context()'s call to
> > > > call_for_each_cic() which looks like this:
> > > >
> > > > rcu_read_lock();
> > > > hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list) {
> > > > func(ioc, cic);
> > > > called++;
> > > > }
> > > > rcu_read_unlock();
> > > >
> > > > The function that is called is cic_free_func(). It is postulated that
> > > > hlist_for_each_entry_rcu() will dereference the previously freed list
> > > > element to get the ->next pointer.
> > > >
> > > > After a short discussion with Pekka Enberg and Peter Zijlstra, it
> > > > seemed evident that this list traversal should use
> > > > hlist_for_each_entry_safe_rcu() instead, which would buffer the next
> > > > pointer before the object is freed.
> > > >
> > > > Does this report seem to be valid?
> > > >
> > > > The kernel is 2.6.25-rc7.
> > >
> > > The missing hlist for loop would look something like so:
> > >
> > > #define hlist_for_each_entry_safe_rcu(tpos, pos, n, head, member) \
> > > for (pos = (head)->first; \
> > > rcu_dereference(pos) && ({ n = pos->next; 1; }) && \
> > > ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
> > > pos = n)
> >
> > Why the heck is cic_free_func() immediately doing a kmem_cache_free()
> > on the cfq_io_context structure??? Shouldn't we have a call_rcu() or a
> > synchronize_rcu() in there somewhere??? Given the way this is written,
> > wouldn't readers on other code paths get dumped onto the freelist?
> > This would not be a good thing...
>
> I was told, but didn't check for myself it is using SLAB_RCU. Of course
> that requires an object validation pass, and I didn't check it does that
> either.
Yes, it's using SLAB_DESTROY_BY_RCU. If you find faults with that,
please let me know.
--
Jens Axboe
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 7:17 ` Jens Axboe
@ 2008-04-02 7:20 ` Pekka J Enberg
2008-04-02 7:24 ` Jens Axboe
2008-04-02 10:40 ` Paul E. McKenney
1 sibling, 1 reply; 69+ messages in thread
From: Pekka J Enberg @ 2008-04-02 7:20 UTC (permalink / raw)
To: Jens Axboe
Cc: Peter Zijlstra, Vegard Nossum, Ingo Molnar,
Linux Kernel Mailing List, Paul E. McKenney
On Wed, 2 Apr 2008, Jens Axboe wrote:
> Good catch, I wonder why it didn't complain in my testing. I've added a
> patch to fix that, please see it here:
You probably don't have kmemcheck in your kernel ;-)
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 7:20 ` Pekka J Enberg
@ 2008-04-02 7:24 ` Jens Axboe
2008-04-02 7:28 ` Ingo Molnar
0 siblings, 1 reply; 69+ messages in thread
From: Jens Axboe @ 2008-04-02 7:24 UTC (permalink / raw)
To: Pekka J Enberg
Cc: Peter Zijlstra, Vegard Nossum, Ingo Molnar,
Linux Kernel Mailing List, Paul E. McKenney
On Wed, Apr 02 2008, Pekka J Enberg wrote:
> On Wed, 2 Apr 2008, Jens Axboe wrote:
> > Good catch, I wonder why it didn't complain in my testing. I've added a
> > patch to fix that, please see it here:
>
> You probably don't have kmemcheck in your kernel ;-)
Ehm no, you are right :)
--
Jens Axboe
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 7:24 ` Jens Axboe
@ 2008-04-02 7:28 ` Ingo Molnar
2008-04-02 7:31 ` Jens Axboe
2008-04-02 10:55 ` Paul E. McKenney
0 siblings, 2 replies; 69+ messages in thread
From: Ingo Molnar @ 2008-04-02 7:28 UTC (permalink / raw)
To: Jens Axboe
Cc: Pekka J Enberg, Peter Zijlstra, Vegard Nossum,
Linux Kernel Mailing List, Paul E. McKenney
* Jens Axboe <jens.axboe@oracle.com> wrote:
> On Wed, Apr 02 2008, Pekka J Enberg wrote:
> > On Wed, 2 Apr 2008, Jens Axboe wrote:
> > > Good catch, I wonder why it didn't complain in my testing. I've added a
> > > patch to fix that, please see it here:
> >
> > You probably don't have kmemcheck in your kernel ;-)
>
> Ehm no, you are right :)
... and you can get kmemcheck by testing on x86.git/latest:
http://people.redhat.com/mingo/x86.git/README
;-)
Ingo
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 7:28 ` Ingo Molnar
@ 2008-04-02 7:31 ` Jens Axboe
2008-04-02 10:55 ` Paul E. McKenney
1 sibling, 0 replies; 69+ messages in thread
From: Jens Axboe @ 2008-04-02 7:31 UTC (permalink / raw)
To: Ingo Molnar
Cc: Pekka J Enberg, Peter Zijlstra, Vegard Nossum,
Linux Kernel Mailing List, Paul E. McKenney
On Wed, Apr 02 2008, Ingo Molnar wrote:
>
> * Jens Axboe <jens.axboe@oracle.com> wrote:
>
> > On Wed, Apr 02 2008, Pekka J Enberg wrote:
> > > On Wed, 2 Apr 2008, Jens Axboe wrote:
> > > > Good catch, I wonder why it didn't complain in my testing. I've added a
> > > > patch to fix that, please see it here:
> > >
> > > You probably don't have kmemcheck in your kernel ;-)
> >
> > Ehm no, you are right :)
>
> ... and you can get kmemcheck by testing on x86.git/latest:
>
> http://people.redhat.com/mingo/x86.git/README
>
> ;-)
Thanks Ingo, I'll give it a spin to verify this fix!
--
Jens Axboe
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 6:15 ` Peter Zijlstra
2008-04-02 7:19 ` Jens Axboe
@ 2008-04-02 10:24 ` Paul E. McKenney
1 sibling, 0 replies; 69+ messages in thread
From: Paul E. McKenney @ 2008-04-02 10:24 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Vegard Nossum, Jens Axboe, Ingo Molnar, Pekka Enberg,
Linux Kernel Mailing List
On Wed, Apr 02, 2008 at 08:15:56AM +0200, Peter Zijlstra wrote:
> On Tue, 2008-04-01 at 15:51 -0700, Paul E. McKenney wrote:
> > On Tue, Apr 01, 2008 at 11:36:28PM +0200, Peter Zijlstra wrote:
> > > On Tue, 2008-04-01 at 23:08 +0200, Vegard Nossum wrote:
> > > > Hi,
> > > >
> > > > This appeared in my logs:
> > > >
> > > > kmemcheck: Caught 32-bit read from freed memory (f7042348)
> > > >
> > > > Pid: 1374, comm: bash Not tainted (2.6.25-rc7 #92)
> > > > EIP: 0060:[<c0502f0d>] EFLAGS: 00210202 CPU: 0
> > > > EIP is at call_for_each_cic+0x2d/0x44
> > > > EAX: 00200286 EBX: 00000001 ECX: c200e908 EDX: f7042348
> > > > ESI: f6c26c60 EDI: c0503310 EBP: f70fff38 ESP: c082ec88
> > > > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > > > CR0: 8005003b CR2: f7826904 CR3: 36cd7000 CR4: 000006c0
> > > > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > > > DR6: ffff4ff0 DR7: 00000400
> > > > [<c041cff8>] kmemcheck_read+0xa8/0xe0
> > > > [<c041d1d5>] kmemcheck_access+0x1a5/0x244
> > > > [<c0668252>] do_page_fault+0x622/0x6fc
> > > > [<c06666aa>] error_code+0x72/0x78
> > > > [<c050323f>] cfq_free_io_context+0xf/0x70
> > > > [<c04fc4d7>] put_io_context+0x4f/0x58
> > > > [<c04fc568>] exit_io_context+0x60/0x6c
> > > > [<c042f871>] do_exit+0x4d9/0x6f0
> > > > [<c042fab1>] do_group_exit+0x29/0x88
> > > > [<c042fb1f>] sys_exit_group+0xf/0x14
> > > > [<c0406105>] sysenter_past_esp+0x6d/0xa4
> > > > [<ffffffff>] 0xffffffff
> > > >
> > > > The error occurs in cfq_free_io_context()'s call to
> > > > call_for_each_cic() which looks like this:
> > > >
> > > > rcu_read_lock();
> > > > hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list) {
> > > > func(ioc, cic);
> > > > called++;
> > > > }
> > > > rcu_read_unlock();
> > > >
> > > > The function that is called is cic_free_func(). It is postulated that
> > > > hlist_for_each_entry_rcu() will dereference the previously freed list
> > > > element to get the ->next pointer.
> > > >
> > > > After a short discussion with Pekka Enberg and Peter Zijlstra, it
> > > > seemed evident that this list traversal should use
> > > > hlist_for_each_entry_safe_rcu() instead, which would buffer the next
> > > > pointer before the object is freed.
> > > >
> > > > Does this report seem to be valid?
> > > >
> > > > The kernel is 2.6.25-rc7.
> > >
> > > The missing hlist for loop would look something like so:
> > >
> > > #define hlist_for_each_entry_safe_rcu(tpos, pos, n, head, member) \
> > > for (pos = (head)->first; \
> > > rcu_dereference(pos) && ({ n = pos->next; 1; }) && \
> > > ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
> > > pos = n)
> >
> > Why the heck is cic_free_func() immediately doing a kmem_cache_free()
> > on the cfq_io_context structure??? Shouldn't we have a call_rcu() or a
> > synchronize_rcu() in there somewhere??? Given the way this is written,
> > wouldn't readers on other code paths get dumped onto the freelist?
> > This would not be a good thing...
>
> I was told, but didn't check for myself it is using SLAB_RCU. Of course
> that requires an object validation pass, and I didn't check it does that
> either.
But doesn't SLAB_RCU only wait for a grace period when returning a
slab to the system, rather than on each kmem_cache_free()?
Thanx, Paul
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 7:17 ` Jens Axboe
2008-04-02 7:20 ` Pekka J Enberg
@ 2008-04-02 10:40 ` Paul E. McKenney
2008-04-02 10:46 ` Pekka Enberg
2008-04-02 10:53 ` Peter Zijlstra
1 sibling, 2 replies; 69+ messages in thread
From: Paul E. McKenney @ 2008-04-02 10:40 UTC (permalink / raw)
To: Jens Axboe
Cc: Peter Zijlstra, Vegard Nossum, Ingo Molnar, Pekka Enberg,
Linux Kernel Mailing List
On Wed, Apr 02, 2008 at 09:17:10AM +0200, Jens Axboe wrote:
> On Tue, Apr 01 2008, Peter Zijlstra wrote:
> > On Tue, 2008-04-01 at 23:08 +0200, Vegard Nossum wrote:
> > > Hi,
> > >
> > > This appeared in my logs:
> > >
> > > kmemcheck: Caught 32-bit read from freed memory (f7042348)
> > >
> > > Pid: 1374, comm: bash Not tainted (2.6.25-rc7 #92)
> > > EIP: 0060:[<c0502f0d>] EFLAGS: 00210202 CPU: 0
> > > EIP is at call_for_each_cic+0x2d/0x44
> > > EAX: 00200286 EBX: 00000001 ECX: c200e908 EDX: f7042348
> > > ESI: f6c26c60 EDI: c0503310 EBP: f70fff38 ESP: c082ec88
> > > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > > CR0: 8005003b CR2: f7826904 CR3: 36cd7000 CR4: 000006c0
> > > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > > DR6: ffff4ff0 DR7: 00000400
> > > [<c041cff8>] kmemcheck_read+0xa8/0xe0
> > > [<c041d1d5>] kmemcheck_access+0x1a5/0x244
> > > [<c0668252>] do_page_fault+0x622/0x6fc
> > > [<c06666aa>] error_code+0x72/0x78
> > > [<c050323f>] cfq_free_io_context+0xf/0x70
> > > [<c04fc4d7>] put_io_context+0x4f/0x58
> > > [<c04fc568>] exit_io_context+0x60/0x6c
> > > [<c042f871>] do_exit+0x4d9/0x6f0
> > > [<c042fab1>] do_group_exit+0x29/0x88
> > > [<c042fb1f>] sys_exit_group+0xf/0x14
> > > [<c0406105>] sysenter_past_esp+0x6d/0xa4
> > > [<ffffffff>] 0xffffffff
> > >
> > > The error occurs in cfq_free_io_context()'s call to
> > > call_for_each_cic() which looks like this:
> > >
> > > rcu_read_lock();
> > > hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list) {
> > > func(ioc, cic);
> > > called++;
> > > }
> > > rcu_read_unlock();
> > >
> > > The function that is called is cic_free_func(). It is postulated that
> > > hlist_for_each_entry_rcu() will dereference the previously freed list
> > > element to get the ->next pointer.
> > >
> > > After a short discussion with Pekka Enberg and Peter Zijlstra, it
> > > seemed evident that this list traversal should use
> > > hlist_for_each_entry_safe_rcu() instead, which would buffer the next
> > > pointer before the object is freed.
> > >
> > > Does this report seem to be valid?
> > >
> > > The kernel is 2.6.25-rc7.
> >
> > The missing hlist for loop would look something like so:
> >
> > #define hlist_for_each_entry_safe_rcu(tpos, pos, n, head, member) \
> > for (pos = (head)->first; \
> > rcu_dereference(pos) && ({ n = pos->next; 1; }) && \
> > ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
> > pos = n)
>
> Good catch, I wonder why it didn't complain in my testing. I've added a
> patch to fix that, please see it here:
>
> http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=51998b2da4e5db65cb24317329059044083ea151
I am still confused.
o The hlist_for_each_entry_safe_rcu() is under rcu_read_lock().
o The kmem_cache has SLAB_DESTROY_BY_RCU.
o This means that a given slab should not be returned to the
system until a grace period elapses.
o So the bugginess (or not) of this code should not be affected
by adding hlist_for_each_entry_safe_rcu() here.
(I am not seeing the checks that would be needed to avoid
something being kmem_cache_free()ed while being accessed,
but might be missing something.)
Thanx, Paul
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 10:40 ` Paul E. McKenney
@ 2008-04-02 10:46 ` Pekka Enberg
2008-04-02 10:49 ` Peter Zijlstra
2008-04-02 10:53 ` Peter Zijlstra
1 sibling, 1 reply; 69+ messages in thread
From: Pekka Enberg @ 2008-04-02 10:46 UTC (permalink / raw)
To: paulmck
Cc: Jens Axboe, Peter Zijlstra, Vegard Nossum, Ingo Molnar,
Linux Kernel Mailing List, Christoph Lameter
Hi Paul,
On Wed, Apr 2, 2008 at 1:40 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> I am still confused.
>
> o The kmem_cache has SLAB_DESTROY_BY_RCU.
>
> o This means that a given slab should not be returned to the
> system until a grace period elapses.
Yeah, that's what I thought too, that this is a SLUB bug but Peter
convinced me otherwise. SLUB keeps the _page_ around so the pointer
will be _valid_, although it might not be _your_ pointer so the caller
needs to do some validation step. Or at least that's how I understood
what Peter was saying.
Pekka
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 10:46 ` Pekka Enberg
@ 2008-04-02 10:49 ` Peter Zijlstra
2008-04-02 10:54 ` Pekka J Enberg
2008-04-02 17:35 ` Christoph Lameter
0 siblings, 2 replies; 69+ messages in thread
From: Peter Zijlstra @ 2008-04-02 10:49 UTC (permalink / raw)
To: Pekka Enberg
Cc: paulmck, Jens Axboe, Vegard Nossum, Ingo Molnar,
Linux Kernel Mailing List, Christoph Lameter
On Wed, 2008-04-02 at 13:46 +0300, Pekka Enberg wrote:
> Hi Paul,
>
> On Wed, Apr 2, 2008 at 1:40 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > I am still confused.
> >
> > o The kmem_cache has SLAB_DESTROY_BY_RCU.
> >
> > o This means that a given slab should not be returned to the
> > system until a grace period elapses.
>
> Yeah, that's what I thought too, that this is a SLUB bug but Peter
> convinced me otherwise. SLUB keeps the _page_ around so the pointer
> will be _valid_, although it might not be _your_ pointer so the caller
> needs to do some validation step. Or at least that's how I understood
> what Peter was saying.
Correct, that is always how i understood SLAB_DESTROY_BY_RCU to work.
Does SLAB (as opposed to SLUB) do it differently?
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 10:40 ` Paul E. McKenney
2008-04-02 10:46 ` Pekka Enberg
@ 2008-04-02 10:53 ` Peter Zijlstra
2008-04-02 11:13 ` Jens Axboe
1 sibling, 1 reply; 69+ messages in thread
From: Peter Zijlstra @ 2008-04-02 10:53 UTC (permalink / raw)
To: paulmck
Cc: Jens Axboe, Vegard Nossum, Ingo Molnar, Pekka Enberg,
Linux Kernel Mailing List
On Wed, 2008-04-02 at 03:40 -0700, Paul E. McKenney wrote:
> On Wed, Apr 02, 2008 at 09:17:10AM +0200, Jens Axboe wrote:
> > On Tue, Apr 01 2008, Peter Zijlstra wrote:
> > > On Tue, 2008-04-01 at 23:08 +0200, Vegard Nossum wrote:
> > > > Hi,
> > > >
> > > > This appeared in my logs:
> > > >
> > > > kmemcheck: Caught 32-bit read from freed memory (f7042348)
> > > >
> > > > Pid: 1374, comm: bash Not tainted (2.6.25-rc7 #92)
> > > > EIP: 0060:[<c0502f0d>] EFLAGS: 00210202 CPU: 0
> > > > EIP is at call_for_each_cic+0x2d/0x44
> > > > EAX: 00200286 EBX: 00000001 ECX: c200e908 EDX: f7042348
> > > > ESI: f6c26c60 EDI: c0503310 EBP: f70fff38 ESP: c082ec88
> > > > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > > > CR0: 8005003b CR2: f7826904 CR3: 36cd7000 CR4: 000006c0
> > > > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > > > DR6: ffff4ff0 DR7: 00000400
> > > > [<c041cff8>] kmemcheck_read+0xa8/0xe0
> > > > [<c041d1d5>] kmemcheck_access+0x1a5/0x244
> > > > [<c0668252>] do_page_fault+0x622/0x6fc
> > > > [<c06666aa>] error_code+0x72/0x78
> > > > [<c050323f>] cfq_free_io_context+0xf/0x70
> > > > [<c04fc4d7>] put_io_context+0x4f/0x58
> > > > [<c04fc568>] exit_io_context+0x60/0x6c
> > > > [<c042f871>] do_exit+0x4d9/0x6f0
> > > > [<c042fab1>] do_group_exit+0x29/0x88
> > > > [<c042fb1f>] sys_exit_group+0xf/0x14
> > > > [<c0406105>] sysenter_past_esp+0x6d/0xa4
> > > > [<ffffffff>] 0xffffffff
> > > >
> > > > The error occurs in cfq_free_io_context()'s call to
> > > > call_for_each_cic() which looks like this:
> > > >
> > > > rcu_read_lock();
> > > > hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list) {
> > > > func(ioc, cic);
> > > > called++;
> > > > }
> > > > rcu_read_unlock();
> > > >
> > > > The function that is called is cic_free_func(). It is postulated that
> > > > hlist_for_each_entry_rcu() will dereference the previously freed list
> > > > element to get the ->next pointer.
> > > >
> > > > After a short discussion with Pekka Enberg and Peter Zijlstra, it
> > > > seemed evident that this list traversal should use
> > > > hlist_for_each_entry_safe_rcu() instead, which would buffer the next
> > > > pointer before the object is freed.
> > > >
> > > > Does this report seem to be valid?
> > > >
> > > > The kernel is 2.6.25-rc7.
> > >
> > > The missing hlist for loop would look something like so:
> > >
> > > #define hlist_for_each_entry_safe_rcu(tpos, pos, n, head, member) \
> > > for (pos = (head)->first; \
> > > rcu_dereference(pos) && ({ n = pos->next; 1; }) && \
> > > ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
> > > pos = n)
> >
> > Good catch, I wonder why it didn't complain in my testing. I've added a
> > patch to fix that, please see it here:
> >
> > http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=51998b2da4e5db65cb24317329059044083ea151
>
> I am still confused.
>
> o The hlist_for_each_entry_safe_rcu() is under rcu_read_lock().
>
> o The kmem_cache has SLAB_DESTROY_BY_RCU.
>
> o This means that a given slab should not be returned to the
> system until a grace period elapses.
>
> o So the bugginess (or not) of this code should not be affected
> by adding hlist_for_each_entry_safe_rcu() here.
>
> (I am not seeing the checks that would be needed to avoid
> something being kmem_cache_free()ed while being accessed,
> but might be missing something.)
Agreed, when looking at this code its not making sense.
cfq_cic_lookup() is also mightily confusing. Only the actual
radix_tree_lookup() call is protected by RCU, I'm not seeing what
guarantees the existance of cic after rcu_read_unlock().
Nor does it do a validation check to see if cic->key == cfqd, something
that would be needed when using SLAB_DESTROY_BY_RCU.
This is most fishy code.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 10:49 ` Peter Zijlstra
@ 2008-04-02 10:54 ` Pekka J Enberg
2008-04-02 17:35 ` Christoph Lameter
1 sibling, 0 replies; 69+ messages in thread
From: Pekka J Enberg @ 2008-04-02 10:54 UTC (permalink / raw)
To: Peter Zijlstra
Cc: paulmck, Jens Axboe, Vegard Nossum, Ingo Molnar,
Linux Kernel Mailing List, Christoph Lameter
On Wed, 2 Apr 2008, Peter Zijlstra wrote:
> Correct, that is always how i understood SLAB_DESTROY_BY_RCU to work.
>
> Does SLAB (as opposed to SLUB) do it differently?
No, SLAB works this way as well.
Pekka
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 7:28 ` Ingo Molnar
2008-04-02 7:31 ` Jens Axboe
@ 2008-04-02 10:55 ` Paul E. McKenney
2008-04-02 10:59 ` Peter Zijlstra
2008-04-02 11:01 ` Pekka Enberg
1 sibling, 2 replies; 69+ messages in thread
From: Paul E. McKenney @ 2008-04-02 10:55 UTC (permalink / raw)
To: Ingo Molnar
Cc: Jens Axboe, Pekka J Enberg, Peter Zijlstra, Vegard Nossum,
Linux Kernel Mailing List
On Wed, Apr 02, 2008 at 09:28:46AM +0200, Ingo Molnar wrote:
>
> * Jens Axboe <jens.axboe@oracle.com> wrote:
>
> > On Wed, Apr 02 2008, Pekka J Enberg wrote:
> > > On Wed, 2 Apr 2008, Jens Axboe wrote:
> > > > Good catch, I wonder why it didn't complain in my testing. I've added a
> > > > patch to fix that, please see it here:
> > >
> > > You probably don't have kmemcheck in your kernel ;-)
> >
> > Ehm no, you are right :)
>
> ... and you can get kmemcheck by testing on x86.git/latest:
>
> http://people.redhat.com/mingo/x86.git/README
>
> ;-)
I will check this when I get back to some bandwidth -- but in the meantime,
does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
newly-freed items in that case, as long as you did rcu_read_lock()
before gaining a reference to them and don't hold the reference past
the matching rcu_read_unlock().
Thanx, Paul
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 10:55 ` Paul E. McKenney
@ 2008-04-02 10:59 ` Peter Zijlstra
2008-04-02 11:33 ` Fabio Checconi
2008-04-02 11:01 ` Pekka Enberg
1 sibling, 1 reply; 69+ messages in thread
From: Peter Zijlstra @ 2008-04-02 10:59 UTC (permalink / raw)
To: paulmck
Cc: Ingo Molnar, Jens Axboe, Pekka J Enberg, Vegard Nossum,
Linux Kernel Mailing List
On Wed, 2008-04-02 at 03:55 -0700, Paul E. McKenney wrote:
> On Wed, Apr 02, 2008 at 09:28:46AM +0200, Ingo Molnar wrote:
> >
> > * Jens Axboe <jens.axboe@oracle.com> wrote:
> >
> > > On Wed, Apr 02 2008, Pekka J Enberg wrote:
> > > > On Wed, 2 Apr 2008, Jens Axboe wrote:
> > > > > Good catch, I wonder why it didn't complain in my testing. I've added a
> > > > > patch to fix that, please see it here:
> > > >
> > > > You probably don't have kmemcheck in your kernel ;-)
> > >
> > > Ehm no, you are right :)
> >
> > ... and you can get kmemcheck by testing on x86.git/latest:
> >
> > http://people.redhat.com/mingo/x86.git/README
> >
> > ;-)
>
> I will check this when I get back to some bandwidth -- but in the meantime,
> does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
> newly-freed items in that case, as long as you did rcu_read_lock()
> before gaining a reference to them and don't hold the reference past
> the matching rcu_read_unlock().
I don't think it does.
It would have to register an call_rcu callback itself in order to mark
it freed - and handle the race with the object being handed out again.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 10:55 ` Paul E. McKenney
2008-04-02 10:59 ` Peter Zijlstra
@ 2008-04-02 11:01 ` Pekka Enberg
2008-04-02 11:07 ` Jens Axboe
2008-04-02 16:08 ` Paul E. McKenney
1 sibling, 2 replies; 69+ messages in thread
From: Pekka Enberg @ 2008-04-02 11:01 UTC (permalink / raw)
To: paulmck
Cc: Ingo Molnar, Jens Axboe, Peter Zijlstra, Vegard Nossum,
Linux Kernel Mailing List
Hi Paul,
On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> I will check this when I get back to some bandwidth -- but in the meantime,
> does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
> newly-freed items in that case, as long as you did rcu_read_lock()
> before gaining a reference to them and don't hold the reference past
> the matching rcu_read_unlock().
No, kmemcheck is work in progress and does not know about
SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
was because Peter, Vegard, and myself identified this particular
warning as a real problem. But yeah, kmemcheck can cause false
positives for RCU for now.
Pekka
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 11:01 ` Pekka Enberg
@ 2008-04-02 11:07 ` Jens Axboe
2008-04-02 11:08 ` Peter Zijlstra
2008-04-02 16:08 ` Paul E. McKenney
1 sibling, 1 reply; 69+ messages in thread
From: Jens Axboe @ 2008-04-02 11:07 UTC (permalink / raw)
To: Pekka Enberg
Cc: paulmck, Ingo Molnar, Peter Zijlstra, Vegard Nossum,
Linux Kernel Mailing List
On Wed, Apr 02 2008, Pekka Enberg wrote:
> Hi Paul,
>
> On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > I will check this when I get back to some bandwidth -- but in the meantime,
> > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
> > newly-freed items in that case, as long as you did rcu_read_lock()
> > before gaining a reference to them and don't hold the reference past
> > the matching rcu_read_unlock().
>
> No, kmemcheck is work in progress and does not know about
> SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
> was because Peter, Vegard, and myself identified this particular
> warning as a real problem. But yeah, kmemcheck can cause false
> positives for RCU for now.
Makes sense, and to me Pauls analysis of the code looks totally correct
- there's no bug there, at least related to hlist traversal and
kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
the grace for freeing.
--
Jens Axboe
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 11:07 ` Jens Axboe
@ 2008-04-02 11:08 ` Peter Zijlstra
2008-04-02 11:11 ` Pekka Enberg
2008-04-02 11:14 ` Jens Axboe
0 siblings, 2 replies; 69+ messages in thread
From: Peter Zijlstra @ 2008-04-02 11:08 UTC (permalink / raw)
To: Jens Axboe
Cc: Pekka Enberg, paulmck, Ingo Molnar, Vegard Nossum,
Linux Kernel Mailing List
On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
> On Wed, Apr 02 2008, Pekka Enberg wrote:
> > Hi Paul,
> >
> > On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> > > I will check this when I get back to some bandwidth -- but in the meantime,
> > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
> > > newly-freed items in that case, as long as you did rcu_read_lock()
> > > before gaining a reference to them and don't hold the reference past
> > > the matching rcu_read_unlock().
> >
> > No, kmemcheck is work in progress and does not know about
> > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
> > was because Peter, Vegard, and myself identified this particular
> > warning as a real problem. But yeah, kmemcheck can cause false
> > positives for RCU for now.
>
> Makes sense, and to me Pauls analysis of the code looks totally correct
> - there's no bug there, at least related to hlist traversal and
> kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
> the grace for freeing.
but what holds off the slab allocator re-issueing that same object and
someone else writing other stuff into it?
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 11:08 ` Peter Zijlstra
@ 2008-04-02 11:11 ` Pekka Enberg
2008-04-02 11:14 ` Peter Zijlstra
2008-04-02 17:36 ` Christoph Lameter
2008-04-02 11:14 ` Jens Axboe
1 sibling, 2 replies; 69+ messages in thread
From: Pekka Enberg @ 2008-04-02 11:11 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Jens Axboe, paulmck, Ingo Molnar, Vegard Nossum,
Linux Kernel Mailing List, Christoph Lameter
On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
> > Makes sense, and to me Pauls analysis of the code looks totally correct
> > - there's no bug there, at least related to hlist traversal and
> > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
> > the grace for freeing.
On Wed, Apr 2, 2008 at 2:08 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> but what holds off the slab allocator re-issueing that same object and
> someone else writing other stuff into it?
Nothing. So you cannot access the object at all after you've called
kmem_cache_free(). SLAB_RCU or no SLAB_RCU.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 10:53 ` Peter Zijlstra
@ 2008-04-02 11:13 ` Jens Axboe
0 siblings, 0 replies; 69+ messages in thread
From: Jens Axboe @ 2008-04-02 11:13 UTC (permalink / raw)
To: Peter Zijlstra
Cc: paulmck, Vegard Nossum, Ingo Molnar, Pekka Enberg,
Linux Kernel Mailing List
On Wed, Apr 02 2008, Peter Zijlstra wrote:
> On Wed, 2008-04-02 at 03:40 -0700, Paul E. McKenney wrote:
> > On Wed, Apr 02, 2008 at 09:17:10AM +0200, Jens Axboe wrote:
> > > On Tue, Apr 01 2008, Peter Zijlstra wrote:
> > > > On Tue, 2008-04-01 at 23:08 +0200, Vegard Nossum wrote:
> > > > > Hi,
> > > > >
> > > > > This appeared in my logs:
> > > > >
> > > > > kmemcheck: Caught 32-bit read from freed memory (f7042348)
> > > > >
> > > > > Pid: 1374, comm: bash Not tainted (2.6.25-rc7 #92)
> > > > > EIP: 0060:[<c0502f0d>] EFLAGS: 00210202 CPU: 0
> > > > > EIP is at call_for_each_cic+0x2d/0x44
> > > > > EAX: 00200286 EBX: 00000001 ECX: c200e908 EDX: f7042348
> > > > > ESI: f6c26c60 EDI: c0503310 EBP: f70fff38 ESP: c082ec88
> > > > > DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> > > > > CR0: 8005003b CR2: f7826904 CR3: 36cd7000 CR4: 000006c0
> > > > > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > > > > DR6: ffff4ff0 DR7: 00000400
> > > > > [<c041cff8>] kmemcheck_read+0xa8/0xe0
> > > > > [<c041d1d5>] kmemcheck_access+0x1a5/0x244
> > > > > [<c0668252>] do_page_fault+0x622/0x6fc
> > > > > [<c06666aa>] error_code+0x72/0x78
> > > > > [<c050323f>] cfq_free_io_context+0xf/0x70
> > > > > [<c04fc4d7>] put_io_context+0x4f/0x58
> > > > > [<c04fc568>] exit_io_context+0x60/0x6c
> > > > > [<c042f871>] do_exit+0x4d9/0x6f0
> > > > > [<c042fab1>] do_group_exit+0x29/0x88
> > > > > [<c042fb1f>] sys_exit_group+0xf/0x14
> > > > > [<c0406105>] sysenter_past_esp+0x6d/0xa4
> > > > > [<ffffffff>] 0xffffffff
> > > > >
> > > > > The error occurs in cfq_free_io_context()'s call to
> > > > > call_for_each_cic() which looks like this:
> > > > >
> > > > > rcu_read_lock();
> > > > > hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list) {
> > > > > func(ioc, cic);
> > > > > called++;
> > > > > }
> > > > > rcu_read_unlock();
> > > > >
> > > > > The function that is called is cic_free_func(). It is postulated that
> > > > > hlist_for_each_entry_rcu() will dereference the previously freed list
> > > > > element to get the ->next pointer.
> > > > >
> > > > > After a short discussion with Pekka Enberg and Peter Zijlstra, it
> > > > > seemed evident that this list traversal should use
> > > > > hlist_for_each_entry_safe_rcu() instead, which would buffer the next
> > > > > pointer before the object is freed.
> > > > >
> > > > > Does this report seem to be valid?
> > > > >
> > > > > The kernel is 2.6.25-rc7.
> > > >
> > > > The missing hlist for loop would look something like so:
> > > >
> > > > #define hlist_for_each_entry_safe_rcu(tpos, pos, n, head, member) \
> > > > for (pos = (head)->first; \
> > > > rcu_dereference(pos) && ({ n = pos->next; 1; }) && \
> > > > ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
> > > > pos = n)
> > >
> > > Good catch, I wonder why it didn't complain in my testing. I've added a
> > > patch to fix that, please see it here:
> > >
> > > http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=51998b2da4e5db65cb24317329059044083ea151
> >
> > I am still confused.
> >
> > o The hlist_for_each_entry_safe_rcu() is under rcu_read_lock().
> >
> > o The kmem_cache has SLAB_DESTROY_BY_RCU.
> >
> > o This means that a given slab should not be returned to the
> > system until a grace period elapses.
> >
> > o So the bugginess (or not) of this code should not be affected
> > by adding hlist_for_each_entry_safe_rcu() here.
> >
> > (I am not seeing the checks that would be needed to avoid
> > something being kmem_cache_free()ed while being accessed,
> > but might be missing something.)
>
> Agreed, when looking at this code its not making sense.
Ditto agree
> cfq_cic_lookup() is also mightily confusing. Only the actual
> radix_tree_lookup() call is protected by RCU, I'm not seeing what
> guarantees the existance of cic after rcu_read_unlock().
>
> Nor does it do a validation check to see if cic->key == cfqd, something
> that would be needed when using SLAB_DESTROY_BY_RCU.
It checks cic->key != NULL, it's set to NULL when it's invalid. Not sure
if it could transition to some other cfqd and radix_tree_lookup() still
returning the cic for the old key, if so it would need a check for ->key
== cfqd there as well (like in the one-hit cache above, it checks ->key
== cfqd).
> This is most fishy code.
Well, it's definitely not straight forward ;-)
--
Jens Axboe
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 11:11 ` Pekka Enberg
@ 2008-04-02 11:14 ` Peter Zijlstra
2008-04-02 11:18 ` Pekka Enberg
2008-04-02 17:36 ` Christoph Lameter
1 sibling, 1 reply; 69+ messages in thread
From: Peter Zijlstra @ 2008-04-02 11:14 UTC (permalink / raw)
To: Pekka Enberg
Cc: Jens Axboe, paulmck, Ingo Molnar, Vegard Nossum,
Linux Kernel Mailing List, Christoph Lameter
On Wed, 2008-04-02 at 14:11 +0300, Pekka Enberg wrote:
> On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
> > > Makes sense, and to me Pauls analysis of the code looks totally correct
> > > - there's no bug there, at least related to hlist traversal and
> > > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
> > > the grace for freeing.
>
> On Wed, Apr 2, 2008 at 2:08 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > but what holds off the slab allocator re-issueing that same object and
> > someone else writing other stuff into it?
>
> Nothing. So you cannot access the object at all after you've called
> kmem_cache_free(). SLAB_RCU or no SLAB_RCU.
Well, you can, but you have to validate you get the object you were
looking for.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 11:08 ` Peter Zijlstra
2008-04-02 11:11 ` Pekka Enberg
@ 2008-04-02 11:14 ` Jens Axboe
2008-04-02 11:20 ` Peter Zijlstra
1 sibling, 1 reply; 69+ messages in thread
From: Jens Axboe @ 2008-04-02 11:14 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Pekka Enberg, paulmck, Ingo Molnar, Vegard Nossum,
Linux Kernel Mailing List
On Wed, Apr 02 2008, Peter Zijlstra wrote:
> On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
> > On Wed, Apr 02 2008, Pekka Enberg wrote:
> > > Hi Paul,
> > >
> > > On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
> > > <paulmck@linux.vnet.ibm.com> wrote:
> > > > I will check this when I get back to some bandwidth -- but in the meantime,
> > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
> > > > newly-freed items in that case, as long as you did rcu_read_lock()
> > > > before gaining a reference to them and don't hold the reference past
> > > > the matching rcu_read_unlock().
> > >
> > > No, kmemcheck is work in progress and does not know about
> > > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
> > > was because Peter, Vegard, and myself identified this particular
> > > warning as a real problem. But yeah, kmemcheck can cause false
> > > positives for RCU for now.
> >
> > Makes sense, and to me Pauls analysis of the code looks totally correct
> > - there's no bug there, at least related to hlist traversal and
> > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
> > the grace for freeing.
>
> but what holds off the slab allocator re-issueing that same object and
> someone else writing other stuff into it?
Nothing, that's how rcu destry works here. But for the validation to be
WRONG radix_tree_lookup(..., old_key) must return cic for new_key, not
NULL.
--
Jens Axboe
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 11:14 ` Peter Zijlstra
@ 2008-04-02 11:18 ` Pekka Enberg
0 siblings, 0 replies; 69+ messages in thread
From: Pekka Enberg @ 2008-04-02 11:18 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Jens Axboe, paulmck, Ingo Molnar, Vegard Nossum,
Linux Kernel Mailing List, Christoph Lameter
On Wed, 2008-04-02 at 14:11 +0300, Pekka Enberg wrote:
> > Nothing. So you cannot access the object at all after you've called
> > kmem_cache_free(). SLAB_RCU or no SLAB_RCU.
On Wed, Apr 2, 2008 at 2:14 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> Well, you can, but you have to validate you get the object you were
> looking for.
Yes. I keep forgetting that. Sorry.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 11:14 ` Jens Axboe
@ 2008-04-02 11:20 ` Peter Zijlstra
2008-04-02 11:25 ` Peter Zijlstra
2008-04-02 11:32 ` Jens Axboe
0 siblings, 2 replies; 69+ messages in thread
From: Peter Zijlstra @ 2008-04-02 11:20 UTC (permalink / raw)
To: Jens Axboe
Cc: Pekka Enberg, paulmck, Ingo Molnar, Vegard Nossum,
Linux Kernel Mailing List
On Wed, 2008-04-02 at 13:14 +0200, Jens Axboe wrote:
> On Wed, Apr 02 2008, Peter Zijlstra wrote:
> > On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
> > > On Wed, Apr 02 2008, Pekka Enberg wrote:
> > > > Hi Paul,
> > > >
> > > > On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
> > > > <paulmck@linux.vnet.ibm.com> wrote:
> > > > > I will check this when I get back to some bandwidth -- but in the meantime,
> > > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
> > > > > newly-freed items in that case, as long as you did rcu_read_lock()
> > > > > before gaining a reference to them and don't hold the reference past
> > > > > the matching rcu_read_unlock().
> > > >
> > > > No, kmemcheck is work in progress and does not know about
> > > > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
> > > > was because Peter, Vegard, and myself identified this particular
> > > > warning as a real problem. But yeah, kmemcheck can cause false
> > > > positives for RCU for now.
> > >
> > > Makes sense, and to me Pauls analysis of the code looks totally correct
> > > - there's no bug there, at least related to hlist traversal and
> > > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
> > > the grace for freeing.
> >
> > but what holds off the slab allocator re-issueing that same object and
> > someone else writing other stuff into it?
>
> Nothing, that's how rcu destry works here. But for the validation to be
> WRONG radix_tree_lookup(..., old_key) must return cic for new_key, not
> NULL.
>
A B C
cfq_cic_lookup(cfqd_1, ioc)
rcu_read_lock()
cic = radix_tree_lookup(, cfqd_q);
cfq_cic_free()
cfq_cic_link(cfqd_2, ioc,)
rcu_read_unlock()
and now we have that:
cic->key == cfqd_2
I'm not seeing anything stopping this from happening.
Which is also why we need hlist_for_each_safe_rcu() because as soon as
we kfree()d the thing, someone else might get the object and start
poking at the hlist pointers, wrecking out iteration.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 11:20 ` Peter Zijlstra
@ 2008-04-02 11:25 ` Peter Zijlstra
2008-04-02 11:32 ` Jens Axboe
1 sibling, 0 replies; 69+ messages in thread
From: Peter Zijlstra @ 2008-04-02 11:25 UTC (permalink / raw)
To: Jens Axboe
Cc: Pekka Enberg, paulmck, Ingo Molnar, Vegard Nossum,
Linux Kernel Mailing List
On Wed, 2008-04-02 at 13:20 +0200, Peter Zijlstra wrote:
> On Wed, 2008-04-02 at 13:14 +0200, Jens Axboe wrote:
> > On Wed, Apr 02 2008, Peter Zijlstra wrote:
> > > On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
> > > > On Wed, Apr 02 2008, Pekka Enberg wrote:
> > > > > Hi Paul,
> > > > >
> > > > > On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
> > > > > <paulmck@linux.vnet.ibm.com> wrote:
> > > > > > I will check this when I get back to some bandwidth -- but in the meantime,
> > > > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
> > > > > > newly-freed items in that case, as long as you did rcu_read_lock()
> > > > > > before gaining a reference to them and don't hold the reference past
> > > > > > the matching rcu_read_unlock().
> > > > >
> > > > > No, kmemcheck is work in progress and does not know about
> > > > > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
> > > > > was because Peter, Vegard, and myself identified this particular
> > > > > warning as a real problem. But yeah, kmemcheck can cause false
> > > > > positives for RCU for now.
> > > >
> > > > Makes sense, and to me Pauls analysis of the code looks totally correct
> > > > - there's no bug there, at least related to hlist traversal and
> > > > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
> > > > the grace for freeing.
> > >
> > > but what holds off the slab allocator re-issueing that same object and
> > > someone else writing other stuff into it?
> >
> > Nothing, that's how rcu destry works here. But for the validation to be
> > WRONG radix_tree_lookup(..., old_key) must return cic for new_key, not
> > NULL.
> >
>
>
> A B C
>
> cfq_cic_lookup(cfqd_1, ioc)
>
> rcu_read_lock()
> cic = radix_tree_lookup(, cfqd_q);
>
> cfq_cic_free()
>
> cfq_cic_link(cfqd_2, ioc,)
>
> rcu_read_unlock()
>
>
> and now we have that:
>
> cic->key == cfqd_2
>
>
> I'm not seeing anything stopping this from happening.
>
> Which is also why we need hlist_for_each_safe_rcu() because as soon as
> we kfree()d the thing, someone else might get the object and start
> poking at the hlist pointers, wrecking out iteration.
Or worse, when C doesn't happen and B free's the very last object and
the slab does get returned, any usage of cic after rcu_read_unlock()
might poke into free memory.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 11:20 ` Peter Zijlstra
2008-04-02 11:25 ` Peter Zijlstra
@ 2008-04-02 11:32 ` Jens Axboe
2008-04-02 11:37 ` Peter Zijlstra
1 sibling, 1 reply; 69+ messages in thread
From: Jens Axboe @ 2008-04-02 11:32 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Pekka Enberg, paulmck, Ingo Molnar, Vegard Nossum,
Linux Kernel Mailing List
On Wed, Apr 02 2008, Peter Zijlstra wrote:
> On Wed, 2008-04-02 at 13:14 +0200, Jens Axboe wrote:
> > On Wed, Apr 02 2008, Peter Zijlstra wrote:
> > > On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
> > > > On Wed, Apr 02 2008, Pekka Enberg wrote:
> > > > > Hi Paul,
> > > > >
> > > > > On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
> > > > > <paulmck@linux.vnet.ibm.com> wrote:
> > > > > > I will check this when I get back to some bandwidth -- but in the meantime,
> > > > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
> > > > > > newly-freed items in that case, as long as you did rcu_read_lock()
> > > > > > before gaining a reference to them and don't hold the reference past
> > > > > > the matching rcu_read_unlock().
> > > > >
> > > > > No, kmemcheck is work in progress and does not know about
> > > > > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
> > > > > was because Peter, Vegard, and myself identified this particular
> > > > > warning as a real problem. But yeah, kmemcheck can cause false
> > > > > positives for RCU for now.
> > > >
> > > > Makes sense, and to me Pauls analysis of the code looks totally correct
> > > > - there's no bug there, at least related to hlist traversal and
> > > > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
> > > > the grace for freeing.
> > >
> > > but what holds off the slab allocator re-issueing that same object and
> > > someone else writing other stuff into it?
> >
> > Nothing, that's how rcu destry works here. But for the validation to be
> > WRONG radix_tree_lookup(..., old_key) must return cic for new_key, not
> > NULL.
> >
>
>
> A B C
>
> cfq_cic_lookup(cfqd_1, ioc)
>
> rcu_read_lock()
> cic = radix_tree_lookup(, cfqd_q);
>
> cfq_cic_free()
>
> cfq_cic_link(cfqd_2, ioc,)
>
> rcu_read_unlock()
>
>
> and now we have that:
>
> cic->key == cfqd_2
>
>
> I'm not seeing anything stopping this from happening.
I don't follow your A-B-C here, what do they refer to?
--
Jens Axboe
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 10:59 ` Peter Zijlstra
@ 2008-04-02 11:33 ` Fabio Checconi
2008-04-02 11:43 ` Jens Axboe
2008-04-02 13:32 ` Paul E. McKenney
0 siblings, 2 replies; 69+ messages in thread
From: Fabio Checconi @ 2008-04-02 11:33 UTC (permalink / raw)
To: Peter Zijlstra
Cc: paulmck, Ingo Molnar, Jens Axboe, Pekka J Enberg, Vegard Nossum,
Linux Kernel Mailing List
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Wed, Apr 02, 2008 12:59:21PM +0200
>
> On Wed, 2008-04-02 at 03:55 -0700, Paul E. McKenney wrote:
> > On Wed, Apr 02, 2008 at 09:28:46AM +0200, Ingo Molnar wrote:
> > >
> > > * Jens Axboe <jens.axboe@oracle.com> wrote:
> > >
> > > > On Wed, Apr 02 2008, Pekka J Enberg wrote:
> > > > > On Wed, 2 Apr 2008, Jens Axboe wrote:
> > > > > > Good catch, I wonder why it didn't complain in my testing. I've added a
> > > > > > patch to fix that, please see it here:
> > > > >
> > > > > You probably don't have kmemcheck in your kernel ;-)
> > > >
> > > > Ehm no, you are right :)
> > >
> > > ... and you can get kmemcheck by testing on x86.git/latest:
> > >
> > > http://people.redhat.com/mingo/x86.git/README
> > >
> > > ;-)
> >
> > I will check this when I get back to some bandwidth -- but in the meantime,
> > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
> > newly-freed items in that case, as long as you did rcu_read_lock()
> > before gaining a reference to them and don't hold the reference past
> > the matching rcu_read_unlock().
>
> I don't think it does.
>
> It would have to register an call_rcu callback itself in order to mark
> it freed - and handle the race with the object being handed out again.
>
I had the same problem while debugging a cfq-derived i/o scheduler,
and I found nothing preventing the reuse of the freed memory.
The patch below seemed to fix the logic.
Signed-off-by: Fabio Checconi <fabio@gandalf.sssup.it>
---
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 0f962ec..f26da2b 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1143,24 +1143,37 @@ static void cfq_put_queue(struct cfq_queue *cfqq)
}
/*
- * Call func for each cic attached to this ioc. Returns number of cic's seen.
+ * Call func for each cic attached to this ioc.
*/
-static unsigned int
+static void
call_for_each_cic(struct io_context *ioc,
void (*func)(struct io_context *, struct cfq_io_context *))
{
struct cfq_io_context *cic;
struct hlist_node *n;
- int called = 0;
rcu_read_lock();
- hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list) {
+ hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list)
func(ioc, cic);
- called++;
- }
rcu_read_unlock();
+}
+
+static void cfq_cic_free_rcu(struct rcu_head *head)
+{
+ struct cfq_io_context *cic;
+
+ cic = container_of(head, struct cfq_io_context, rcu_head);
+
+ kmem_cache_free(cfq_ioc_pool, cic);
+ elv_ioc_count_dec(ioc_count);
+
+ if (ioc_gone && !elv_ioc_count_read(ioc_count))
+ complete(ioc_gone);
+}
- return called;
+static void cfq_cic_free(struct cfq_io_context *cic)
+{
+ call_rcu(&cic->rcu_head, cfq_cic_free_rcu);
}
static void cic_free_func(struct io_context *ioc, struct cfq_io_context *cic)
@@ -1174,24 +1187,18 @@ static void cic_free_func(struct io_context *ioc, struct cfq_io_context *cic)
hlist_del_rcu(&cic->cic_list);
spin_unlock_irqrestore(&ioc->lock, flags);
- kmem_cache_free(cfq_ioc_pool, cic);
+ cfq_cic_free(cic);
}
static void cfq_free_io_context(struct io_context *ioc)
{
- int freed;
-
/*
- * ioc->refcount is zero here, so no more cic's are allowed to be
- * linked into this ioc. So it should be ok to iterate over the known
- * list, we will see all cic's since no new ones are added.
+ * ioc->refcount is zero here, or we are called from elv_unregister(),
+ * so no more cic's are allowed to be linked into this ioc. So it
+ * should be ok to iterate over the known list, we will see all cic's
+ * since no new ones are added.
*/
- freed = call_for_each_cic(ioc, cic_free_func);
-
- elv_ioc_count_mod(ioc_count, -freed);
-
- if (ioc_gone && !elv_ioc_count_read(ioc_count))
- complete(ioc_gone);
+ call_for_each_cic(ioc, cic_free_func);
}
static void cfq_exit_cfqq(struct cfq_data *cfqd, struct cfq_queue *cfqq)
@@ -1458,15 +1465,6 @@ cfq_get_queue(struct cfq_data *cfqd, int is_sync, struct io_context *ioc,
return cfqq;
}
-static void cfq_cic_free(struct cfq_io_context *cic)
-{
- kmem_cache_free(cfq_ioc_pool, cic);
- elv_ioc_count_dec(ioc_count);
-
- if (ioc_gone && !elv_ioc_count_read(ioc_count))
- complete(ioc_gone);
-}
-
/*
* We drop cfq io contexts lazily, so we may find a dead one.
*/
@@ -2138,7 +2136,7 @@ static int __init cfq_slab_setup(void)
if (!cfq_pool)
goto fail;
- cfq_ioc_pool = KMEM_CACHE(cfq_io_context, SLAB_DESTROY_BY_RCU);
+ cfq_ioc_pool = KMEM_CACHE(cfq_io_context, 0);
if (!cfq_ioc_pool)
goto fail;
@@ -2286,7 +2284,6 @@ static void __exit cfq_exit(void)
smp_wmb();
if (elv_ioc_count_read(ioc_count))
wait_for_completion(ioc_gone);
- synchronize_rcu();
cfq_slab_kill();
}
diff --git a/include/linux/iocontext.h b/include/linux/iocontext.h
index 1b4ccf2..50e448c 100644
--- a/include/linux/iocontext.h
+++ b/include/linux/iocontext.h
@@ -54,6 +54,8 @@ struct cfq_io_context {
void (*dtor)(struct io_context *); /* destructor */
void (*exit)(struct io_context *); /* called on task exit */
+
+ struct rcu_head rcu_head;
};
/*
^ permalink raw reply related [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 11:32 ` Jens Axboe
@ 2008-04-02 11:37 ` Peter Zijlstra
2008-04-02 11:42 ` Jens Axboe
0 siblings, 1 reply; 69+ messages in thread
From: Peter Zijlstra @ 2008-04-02 11:37 UTC (permalink / raw)
To: Jens Axboe
Cc: Pekka Enberg, paulmck, Ingo Molnar, Vegard Nossum,
Linux Kernel Mailing List
On Wed, 2008-04-02 at 13:32 +0200, Jens Axboe wrote:
> On Wed, Apr 02 2008, Peter Zijlstra wrote:
> > On Wed, 2008-04-02 at 13:14 +0200, Jens Axboe wrote:
> > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
> > > > On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
> > > > > On Wed, Apr 02 2008, Pekka Enberg wrote:
> > > > > > Hi Paul,
> > > > > >
> > > > > > On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
> > > > > > <paulmck@linux.vnet.ibm.com> wrote:
> > > > > > > I will check this when I get back to some bandwidth -- but in the meantime,
> > > > > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
> > > > > > > newly-freed items in that case, as long as you did rcu_read_lock()
> > > > > > > before gaining a reference to them and don't hold the reference past
> > > > > > > the matching rcu_read_unlock().
> > > > > >
> > > > > > No, kmemcheck is work in progress and does not know about
> > > > > > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
> > > > > > was because Peter, Vegard, and myself identified this particular
> > > > > > warning as a real problem. But yeah, kmemcheck can cause false
> > > > > > positives for RCU for now.
> > > > >
> > > > > Makes sense, and to me Pauls analysis of the code looks totally correct
> > > > > - there's no bug there, at least related to hlist traversal and
> > > > > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
> > > > > the grace for freeing.
> > > >
> > > > but what holds off the slab allocator re-issueing that same object and
> > > > someone else writing other stuff into it?
> > >
> > > Nothing, that's how rcu destry works here. But for the validation to be
> > > WRONG radix_tree_lookup(..., old_key) must return cic for new_key, not
> > > NULL.
> > >
> >
> >
> > A B C
> >
> > cfq_cic_lookup(cfqd_1, ioc)
> >
> > rcu_read_lock()
> > cic = radix_tree_lookup(, cfqd_q);
> >
> > cfq_cic_free()
> >
> > cfq_cic_link(cfqd_2, ioc,)
> >
> > rcu_read_unlock()
> >
> >
> > and now we have that:
> >
> > cic->key == cfqd_2
> >
> >
> > I'm not seeing anything stopping this from happening.
>
> I don't follow your A-B-C here, what do they refer to?
A does a radix_tree_lookup() of cfqd_1 (darn typos)
B does a kfree of the same cic found by A
C does an alloc and gets the same cic as freed by B and inserts it
in a different location.
So that when we return to A, cic->key == cfqd_2 even though we did a
lookup for cfqd_1.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 11:37 ` Peter Zijlstra
@ 2008-04-02 11:42 ` Jens Axboe
2008-04-02 11:47 ` Peter Zijlstra
0 siblings, 1 reply; 69+ messages in thread
From: Jens Axboe @ 2008-04-02 11:42 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Pekka Enberg, paulmck, Ingo Molnar, Vegard Nossum,
Linux Kernel Mailing List
On Wed, Apr 02 2008, Peter Zijlstra wrote:
> On Wed, 2008-04-02 at 13:32 +0200, Jens Axboe wrote:
> > On Wed, Apr 02 2008, Peter Zijlstra wrote:
> > > On Wed, 2008-04-02 at 13:14 +0200, Jens Axboe wrote:
> > > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
> > > > > On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
> > > > > > On Wed, Apr 02 2008, Pekka Enberg wrote:
> > > > > > > Hi Paul,
> > > > > > >
> > > > > > > On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
> > > > > > > <paulmck@linux.vnet.ibm.com> wrote:
> > > > > > > > I will check this when I get back to some bandwidth -- but in the meantime,
> > > > > > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
> > > > > > > > newly-freed items in that case, as long as you did rcu_read_lock()
> > > > > > > > before gaining a reference to them and don't hold the reference past
> > > > > > > > the matching rcu_read_unlock().
> > > > > > >
> > > > > > > No, kmemcheck is work in progress and does not know about
> > > > > > > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
> > > > > > > was because Peter, Vegard, and myself identified this particular
> > > > > > > warning as a real problem. But yeah, kmemcheck can cause false
> > > > > > > positives for RCU for now.
> > > > > >
> > > > > > Makes sense, and to me Pauls analysis of the code looks totally correct
> > > > > > - there's no bug there, at least related to hlist traversal and
> > > > > > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
> > > > > > the grace for freeing.
> > > > >
> > > > > but what holds off the slab allocator re-issueing that same object and
> > > > > someone else writing other stuff into it?
> > > >
> > > > Nothing, that's how rcu destry works here. But for the validation to be
> > > > WRONG radix_tree_lookup(..., old_key) must return cic for new_key, not
> > > > NULL.
> > > >
> > >
> > >
> > > A B C
> > >
> > > cfq_cic_lookup(cfqd_1, ioc)
> > >
> > > rcu_read_lock()
> > > cic = radix_tree_lookup(, cfqd_q);
> > >
> > > cfq_cic_free()
> > >
> > > cfq_cic_link(cfqd_2, ioc,)
> > >
> > > rcu_read_unlock()
> > >
> > >
> > > and now we have that:
> > >
> > > cic->key == cfqd_2
> > >
> > >
> > > I'm not seeing anything stopping this from happening.
> >
> > I don't follow your A-B-C here, what do they refer to?
>
> A does a radix_tree_lookup() of cfqd_1 (darn typos)
> B does a kfree of the same cic found by A
> C does an alloc and gets the same cic as freed by B and inserts it
> in a different location.
>
> So that when we return to A, cic->key == cfqd_2 even though we did a
> lookup for cfqd_1.
That I follow, my question was if A, B, and C refer to different
processes but with a shared io context? I'm assuming that is correct...
And it does look buggy. It looks my assumption of what slab rcu destroy
did is WRONG, it should be replaced by a manual call_rcu() freeing
instead.
--
Jens Axboe
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 11:33 ` Fabio Checconi
@ 2008-04-02 11:43 ` Jens Axboe
2008-04-02 12:36 ` Jens Axboe
2008-04-02 13:32 ` Paul E. McKenney
1 sibling, 1 reply; 69+ messages in thread
From: Jens Axboe @ 2008-04-02 11:43 UTC (permalink / raw)
To: Peter Zijlstra, paulmck, Ingo Molnar, Pekka J Enberg,
Vegard Nossum, Linux Kernel Mailing List
On Wed, Apr 02 2008, Fabio Checconi wrote:
> > From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Date: Wed, Apr 02, 2008 12:59:21PM +0200
> >
> > On Wed, 2008-04-02 at 03:55 -0700, Paul E. McKenney wrote:
> > > On Wed, Apr 02, 2008 at 09:28:46AM +0200, Ingo Molnar wrote:
> > > >
> > > > * Jens Axboe <jens.axboe@oracle.com> wrote:
> > > >
> > > > > On Wed, Apr 02 2008, Pekka J Enberg wrote:
> > > > > > On Wed, 2 Apr 2008, Jens Axboe wrote:
> > > > > > > Good catch, I wonder why it didn't complain in my testing. I've added a
> > > > > > > patch to fix that, please see it here:
> > > > > >
> > > > > > You probably don't have kmemcheck in your kernel ;-)
> > > > >
> > > > > Ehm no, you are right :)
> > > >
> > > > ... and you can get kmemcheck by testing on x86.git/latest:
> > > >
> > > > http://people.redhat.com/mingo/x86.git/README
> > > >
> > > > ;-)
> > >
> > > I will check this when I get back to some bandwidth -- but in the meantime,
> > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
> > > newly-freed items in that case, as long as you did rcu_read_lock()
> > > before gaining a reference to them and don't hold the reference past
> > > the matching rcu_read_unlock().
> >
> > I don't think it does.
> >
> > It would have to register an call_rcu callback itself in order to mark
> > it freed - and handle the race with the object being handed out again.
> >
>
> I had the same problem while debugging a cfq-derived i/o scheduler,
> and I found nothing preventing the reuse of the freed memory.
> The patch below seemed to fix the logic.
Thanks, from a first look this looks like it'll fix this bad rcu slab
usage. I'll give it some closer scrutiny and testing.
--
Jens Axboe
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 11:42 ` Jens Axboe
@ 2008-04-02 11:47 ` Peter Zijlstra
2008-04-02 11:53 ` Jens Axboe
0 siblings, 1 reply; 69+ messages in thread
From: Peter Zijlstra @ 2008-04-02 11:47 UTC (permalink / raw)
To: Jens Axboe
Cc: Pekka Enberg, paulmck, Ingo Molnar, Vegard Nossum,
Linux Kernel Mailing List
On Wed, 2008-04-02 at 13:42 +0200, Jens Axboe wrote:
> On Wed, Apr 02 2008, Peter Zijlstra wrote:
> > On Wed, 2008-04-02 at 13:32 +0200, Jens Axboe wrote:
> > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
> > > > On Wed, 2008-04-02 at 13:14 +0200, Jens Axboe wrote:
> > > > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
> > > > > > On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
> > > > > > > On Wed, Apr 02 2008, Pekka Enberg wrote:
> > > > > > > > Hi Paul,
> > > > > > > >
> > > > > > > > On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
> > > > > > > > <paulmck@linux.vnet.ibm.com> wrote:
> > > > > > > > > I will check this when I get back to some bandwidth -- but in the meantime,
> > > > > > > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
> > > > > > > > > newly-freed items in that case, as long as you did rcu_read_lock()
> > > > > > > > > before gaining a reference to them and don't hold the reference past
> > > > > > > > > the matching rcu_read_unlock().
> > > > > > > >
> > > > > > > > No, kmemcheck is work in progress and does not know about
> > > > > > > > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
> > > > > > > > was because Peter, Vegard, and myself identified this particular
> > > > > > > > warning as a real problem. But yeah, kmemcheck can cause false
> > > > > > > > positives for RCU for now.
> > > > > > >
> > > > > > > Makes sense, and to me Pauls analysis of the code looks totally correct
> > > > > > > - there's no bug there, at least related to hlist traversal and
> > > > > > > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
> > > > > > > the grace for freeing.
> > > > > >
> > > > > > but what holds off the slab allocator re-issueing that same object and
> > > > > > someone else writing other stuff into it?
> > > > >
> > > > > Nothing, that's how rcu destry works here. But for the validation to be
> > > > > WRONG radix_tree_lookup(..., old_key) must return cic for new_key, not
> > > > > NULL.
> > > > >
> > > >
> > > >
> > > > A B C
> > > >
> > > > cfq_cic_lookup(cfqd_1, ioc)
> > > >
> > > > rcu_read_lock()
> > > > cic = radix_tree_lookup(, cfqd_q);
> > > >
> > > > cfq_cic_free()
> > > >
> > > > cfq_cic_link(cfqd_2, ioc,)
> > > >
> > > > rcu_read_unlock()
> > > >
> > > >
> > > > and now we have that:
> > > >
> > > > cic->key == cfqd_2
> > > >
> > > >
> > > > I'm not seeing anything stopping this from happening.
> > >
> > > I don't follow your A-B-C here, what do they refer to?
> >
> > A does a radix_tree_lookup() of cfqd_1 (darn typos)
> > B does a kfree of the same cic found by A
> > C does an alloc and gets the same cic as freed by B and inserts it
> > in a different location.
> >
> > So that when we return to A, cic->key == cfqd_2 even though we did a
> > lookup for cfqd_1.
>
> That I follow, my question was if A, B, and C refer to different
> processes but with a shared io context? I'm assuming that is correct...
Ah, yeah, whatever is needed to make this race happen :-)
> And it does look buggy. It looks my assumption of what slab rcu destroy
> did is WRONG, it should be replaced by a manual call_rcu() freeing
> instead.
Yeah, SLAB_DESTROY_BY_RCU should have a _HUGE_ comment explaining it,
I'm sure this is not the first (nor the last) time people get that
wrong.
This would be one of those things that score very low on Rusty's API
list.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 11:47 ` Peter Zijlstra
@ 2008-04-02 11:53 ` Jens Axboe
2008-04-02 12:13 ` Peter Zijlstra
2008-04-02 12:26 ` Peter Zijlstra
0 siblings, 2 replies; 69+ messages in thread
From: Jens Axboe @ 2008-04-02 11:53 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Pekka Enberg, paulmck, Ingo Molnar, Vegard Nossum,
Linux Kernel Mailing List
On Wed, Apr 02 2008, Peter Zijlstra wrote:
> On Wed, 2008-04-02 at 13:42 +0200, Jens Axboe wrote:
> > On Wed, Apr 02 2008, Peter Zijlstra wrote:
> > > On Wed, 2008-04-02 at 13:32 +0200, Jens Axboe wrote:
> > > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
> > > > > On Wed, 2008-04-02 at 13:14 +0200, Jens Axboe wrote:
> > > > > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
> > > > > > > On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
> > > > > > > > On Wed, Apr 02 2008, Pekka Enberg wrote:
> > > > > > > > > Hi Paul,
> > > > > > > > >
> > > > > > > > > On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
> > > > > > > > > <paulmck@linux.vnet.ibm.com> wrote:
> > > > > > > > > > I will check this when I get back to some bandwidth -- but in the meantime,
> > > > > > > > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
> > > > > > > > > > newly-freed items in that case, as long as you did rcu_read_lock()
> > > > > > > > > > before gaining a reference to them and don't hold the reference past
> > > > > > > > > > the matching rcu_read_unlock().
> > > > > > > > >
> > > > > > > > > No, kmemcheck is work in progress and does not know about
> > > > > > > > > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
> > > > > > > > > was because Peter, Vegard, and myself identified this particular
> > > > > > > > > warning as a real problem. But yeah, kmemcheck can cause false
> > > > > > > > > positives for RCU for now.
> > > > > > > >
> > > > > > > > Makes sense, and to me Pauls analysis of the code looks totally correct
> > > > > > > > - there's no bug there, at least related to hlist traversal and
> > > > > > > > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
> > > > > > > > the grace for freeing.
> > > > > > >
> > > > > > > but what holds off the slab allocator re-issueing that same object and
> > > > > > > someone else writing other stuff into it?
> > > > > >
> > > > > > Nothing, that's how rcu destry works here. But for the validation to be
> > > > > > WRONG radix_tree_lookup(..., old_key) must return cic for new_key, not
> > > > > > NULL.
> > > > > >
> > > > >
> > > > >
> > > > > A B C
> > > > >
> > > > > cfq_cic_lookup(cfqd_1, ioc)
> > > > >
> > > > > rcu_read_lock()
> > > > > cic = radix_tree_lookup(, cfqd_q);
> > > > >
> > > > > cfq_cic_free()
> > > > >
> > > > > cfq_cic_link(cfqd_2, ioc,)
> > > > >
> > > > > rcu_read_unlock()
> > > > >
> > > > >
> > > > > and now we have that:
> > > > >
> > > > > cic->key == cfqd_2
> > > > >
> > > > >
> > > > > I'm not seeing anything stopping this from happening.
> > > >
> > > > I don't follow your A-B-C here, what do they refer to?
> > >
> > > A does a radix_tree_lookup() of cfqd_1 (darn typos)
> > > B does a kfree of the same cic found by A
> > > C does an alloc and gets the same cic as freed by B and inserts it
> > > in a different location.
> > >
> > > So that when we return to A, cic->key == cfqd_2 even though we did a
> > > lookup for cfqd_1.
> >
> > That I follow, my question was if A, B, and C refer to different
> > processes but with a shared io context? I'm assuming that is correct...
>
> Ah, yeah, whatever is needed to make this race happen :-)
The only place where you'll have multiple processes involved with this
at all is if they share io contexts. That is also why the bug isn't that
critical, since it's not possible right now (CLONE_IO flag must be
used).
> > And it does look buggy. It looks my assumption of what slab rcu destroy
> > did is WRONG, it should be replaced by a manual call_rcu() freeing
> > instead.
>
> Yeah, SLAB_DESTROY_BY_RCU should have a _HUGE_ comment explaining it,
> I'm sure this is not the first (nor the last) time people get that
> wrong.
It should, SLAB_DESTROY_BY_RCU is definitely useful, but it is expected
to be an 'easier' way of doing the call_rcu() manually. So it definitely
needs more documentation.
--
Jens Axboe
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 11:53 ` Jens Axboe
@ 2008-04-02 12:13 ` Peter Zijlstra
2008-04-02 12:28 ` Jens Axboe
` (2 more replies)
2008-04-02 12:26 ` Peter Zijlstra
1 sibling, 3 replies; 69+ messages in thread
From: Peter Zijlstra @ 2008-04-02 12:13 UTC (permalink / raw)
To: Jens Axboe
Cc: Pekka Enberg, paulmck, Ingo Molnar, Vegard Nossum,
Linux Kernel Mailing List, Andrew Morton, Randy Dunlap
On Wed, 2008-04-02 at 13:53 +0200, Jens Axboe wrote:
> > Yeah, SLAB_DESTROY_BY_RCU should have a _HUGE_ comment explaining it,
> > I'm sure this is not the first (nor the last) time people get that
> > wrong.
>
> It should, SLAB_DESTROY_BY_RCU is definitely useful, but it is expected
> to be an 'easier' way of doing the call_rcu() manually. So it definitely
> needs more documentation.
>
Ok I gave it a go, how bad is this text?
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
diff --git a/include/linux/slab.h b/include/linux/slab.h
index f950a89..e049ddc 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -25,6 +25,32 @@
#define SLAB_CACHE_DMA 0x00004000UL /* Use GFP_DMA memory */
#define SLAB_STORE_USER 0x00010000UL /* DEBUG: Store the last owner for bug hunting */
#define SLAB_PANIC 0x00040000UL /* Panic if kmem_cache_create() fails */
+/*
+ * SLAB_DESTROY_BY_RCU - **WARNING** READ THIS!
+ *
+ * This delays freeing the SLAB page by a grace period, it does _NOT_
+ * delay object freeing. This means that if you do kmem_cache_free()
+ * that memory location is free to be reused at any time. Thus it may
+ * be possible to see another object there in the same RCU grace period.
+ *
+ * This feature only ensures the memory location backing the object
+ * stays valid, the trick to using this is relying on an independent
+ * object validation pass. Something like:
+ *
+ * rcu_read_lock()
+ * again:
+ * obj = lockless_lookup(key);
+ * if (obj) {
+ * if (!try_get_ref(obj)) // might fail for free objects
+ * goto again;
+ *
+ * if (obj->key != key) { // not the object we expected
+ * put_ref(obj);
+ * goto again;
+ * }
+ * }
+ * rcu_read_unlock();
+ */
#define SLAB_DESTROY_BY_RCU 0x00080000UL /* Defer freeing slabs to RCU */
#define SLAB_MEM_SPREAD 0x00100000UL /* Spread some memory over cpuset */
#define SLAB_TRACE 0x00200000UL /* Trace allocations and frees */
^ permalink raw reply related [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 11:53 ` Jens Axboe
2008-04-02 12:13 ` Peter Zijlstra
@ 2008-04-02 12:26 ` Peter Zijlstra
2008-04-02 12:34 ` Jens Axboe
1 sibling, 1 reply; 69+ messages in thread
From: Peter Zijlstra @ 2008-04-02 12:26 UTC (permalink / raw)
To: Jens Axboe
Cc: Pekka Enberg, paulmck, Ingo Molnar, Vegard Nossum,
Linux Kernel Mailing List
On Wed, 2008-04-02 at 13:53 +0200, Jens Axboe wrote:
> On Wed, Apr 02 2008, Peter Zijlstra wrote:
> > On Wed, 2008-04-02 at 13:42 +0200, Jens Axboe wrote:
> > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
> > > > On Wed, 2008-04-02 at 13:32 +0200, Jens Axboe wrote:
> > > > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
> > > > > > On Wed, 2008-04-02 at 13:14 +0200, Jens Axboe wrote:
> > > > > > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
> > > > > > > > On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
> > > > > > > > > On Wed, Apr 02 2008, Pekka Enberg wrote:
> > > > > > > > > > Hi Paul,
> > > > > > > > > >
> > > > > > > > > > On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
> > > > > > > > > > <paulmck@linux.vnet.ibm.com> wrote:
> > > > > > > > > > > I will check this when I get back to some bandwidth -- but in the meantime,
> > > > > > > > > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
> > > > > > > > > > > newly-freed items in that case, as long as you did rcu_read_lock()
> > > > > > > > > > > before gaining a reference to them and don't hold the reference past
> > > > > > > > > > > the matching rcu_read_unlock().
> > > > > > > > > >
> > > > > > > > > > No, kmemcheck is work in progress and does not know about
> > > > > > > > > > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
> > > > > > > > > > was because Peter, Vegard, and myself identified this particular
> > > > > > > > > > warning as a real problem. But yeah, kmemcheck can cause false
> > > > > > > > > > positives for RCU for now.
> > > > > > > > >
> > > > > > > > > Makes sense, and to me Pauls analysis of the code looks totally correct
> > > > > > > > > - there's no bug there, at least related to hlist traversal and
> > > > > > > > > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
> > > > > > > > > the grace for freeing.
> > > > > > > >
> > > > > > > > but what holds off the slab allocator re-issueing that same object and
> > > > > > > > someone else writing other stuff into it?
> > > > > > >
> > > > > > > Nothing, that's how rcu destry works here. But for the validation to be
> > > > > > > WRONG radix_tree_lookup(..., old_key) must return cic for new_key, not
> > > > > > > NULL.
> > > > > > >
> > > > > >
> > > > > >
> > > > > > A B C
> > > > > >
> > > > > > cfq_cic_lookup(cfqd_1, ioc)
> > > > > >
> > > > > > rcu_read_lock()
> > > > > > cic = radix_tree_lookup(, cfqd_q);
> > > > > >
> > > > > > cfq_cic_free()
> > > > > >
> > > > > > cfq_cic_link(cfqd_2, ioc,)
> > > > > >
> > > > > > rcu_read_unlock()
> > > > > >
> > > > > >
> > > > > > and now we have that:
> > > > > >
> > > > > > cic->key == cfqd_2
> > > > > >
> > > > > >
> > > > > > I'm not seeing anything stopping this from happening.
> > > > >
> > > > > I don't follow your A-B-C here, what do they refer to?
> > > >
> > > > A does a radix_tree_lookup() of cfqd_1 (darn typos)
> > > > B does a kfree of the same cic found by A
> > > > C does an alloc and gets the same cic as freed by B and inserts it
> > > > in a different location.
> > > >
> > > > So that when we return to A, cic->key == cfqd_2 even though we did a
> > > > lookup for cfqd_1.
> > >
> > > That I follow, my question was if A, B, and C refer to different
> > > processes but with a shared io context? I'm assuming that is correct...
> >
> > Ah, yeah, whatever is needed to make this race happen :-)
>
> The only place where you'll have multiple processes involved with this
> at all is if they share io contexts. That is also why the bug isn't that
> critical, since it's not possible right now (CLONE_IO flag must be
> used).
There are 3 races here:
1) A continues with another object than intended
(requires CLONE_IO)
2) A does hlist_for_each_rcu() and races with B,C so that
we continue the iteration on a possibly unrelated list.
3) cic is freed after the !cic->key check.
I'm not familiar enough with the code yet to see if 3 really is an
possibility. But from what I can see there is nothing guarding its
existence.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 12:13 ` Peter Zijlstra
@ 2008-04-02 12:28 ` Jens Axboe
2008-04-02 13:26 ` Paul E. McKenney
2008-04-02 13:43 ` Andi Kleen
2 siblings, 0 replies; 69+ messages in thread
From: Jens Axboe @ 2008-04-02 12:28 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Pekka Enberg, paulmck, Ingo Molnar, Vegard Nossum,
Linux Kernel Mailing List, Andrew Morton, Randy Dunlap
On Wed, Apr 02 2008, Peter Zijlstra wrote:
> On Wed, 2008-04-02 at 13:53 +0200, Jens Axboe wrote:
>
> > > Yeah, SLAB_DESTROY_BY_RCU should have a _HUGE_ comment explaining it,
> > > I'm sure this is not the first (nor the last) time people get that
> > > wrong.
> >
> > It should, SLAB_DESTROY_BY_RCU is definitely useful, but it is expected
> > to be an 'easier' way of doing the call_rcu() manually. So it definitely
> > needs more documentation.
> >
>
> Ok I gave it a go, how bad is this text?
I think it looks good. The key point is this:
"This delays freeing the SLAB page by a grace period, it does _NOT_ delay
object freeing."
which is right in the front of the text and with sample validation
below. So you can add my acked-by to that, if you want.
--
Jens Axboe
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 12:26 ` Peter Zijlstra
@ 2008-04-02 12:34 ` Jens Axboe
0 siblings, 0 replies; 69+ messages in thread
From: Jens Axboe @ 2008-04-02 12:34 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Pekka Enberg, paulmck, Ingo Molnar, Vegard Nossum,
Linux Kernel Mailing List
On Wed, Apr 02 2008, Peter Zijlstra wrote:
> On Wed, 2008-04-02 at 13:53 +0200, Jens Axboe wrote:
> > On Wed, Apr 02 2008, Peter Zijlstra wrote:
> > > On Wed, 2008-04-02 at 13:42 +0200, Jens Axboe wrote:
> > > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
> > > > > On Wed, 2008-04-02 at 13:32 +0200, Jens Axboe wrote:
> > > > > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
> > > > > > > On Wed, 2008-04-02 at 13:14 +0200, Jens Axboe wrote:
> > > > > > > > On Wed, Apr 02 2008, Peter Zijlstra wrote:
> > > > > > > > > On Wed, 2008-04-02 at 13:07 +0200, Jens Axboe wrote:
> > > > > > > > > > On Wed, Apr 02 2008, Pekka Enberg wrote:
> > > > > > > > > > > Hi Paul,
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
> > > > > > > > > > > <paulmck@linux.vnet.ibm.com> wrote:
> > > > > > > > > > > > I will check this when I get back to some bandwidth -- but in the meantime,
> > > > > > > > > > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
> > > > > > > > > > > > newly-freed items in that case, as long as you did rcu_read_lock()
> > > > > > > > > > > > before gaining a reference to them and don't hold the reference past
> > > > > > > > > > > > the matching rcu_read_unlock().
> > > > > > > > > > >
> > > > > > > > > > > No, kmemcheck is work in progress and does not know about
> > > > > > > > > > > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
> > > > > > > > > > > was because Peter, Vegard, and myself identified this particular
> > > > > > > > > > > warning as a real problem. But yeah, kmemcheck can cause false
> > > > > > > > > > > positives for RCU for now.
> > > > > > > > > >
> > > > > > > > > > Makes sense, and to me Pauls analysis of the code looks totally correct
> > > > > > > > > > - there's no bug there, at least related to hlist traversal and
> > > > > > > > > > kmem_cache_free(), since we are under rcu_read_lock() and thus hold off
> > > > > > > > > > the grace for freeing.
> > > > > > > > >
> > > > > > > > > but what holds off the slab allocator re-issueing that same object and
> > > > > > > > > someone else writing other stuff into it?
> > > > > > > >
> > > > > > > > Nothing, that's how rcu destry works here. But for the validation to be
> > > > > > > > WRONG radix_tree_lookup(..., old_key) must return cic for new_key, not
> > > > > > > > NULL.
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > A B C
> > > > > > >
> > > > > > > cfq_cic_lookup(cfqd_1, ioc)
> > > > > > >
> > > > > > > rcu_read_lock()
> > > > > > > cic = radix_tree_lookup(, cfqd_q);
> > > > > > >
> > > > > > > cfq_cic_free()
> > > > > > >
> > > > > > > cfq_cic_link(cfqd_2, ioc,)
> > > > > > >
> > > > > > > rcu_read_unlock()
> > > > > > >
> > > > > > >
> > > > > > > and now we have that:
> > > > > > >
> > > > > > > cic->key == cfqd_2
> > > > > > >
> > > > > > >
> > > > > > > I'm not seeing anything stopping this from happening.
> > > > > >
> > > > > > I don't follow your A-B-C here, what do they refer to?
> > > > >
> > > > > A does a radix_tree_lookup() of cfqd_1 (darn typos)
> > > > > B does a kfree of the same cic found by A
> > > > > C does an alloc and gets the same cic as freed by B and inserts it
> > > > > in a different location.
> > > > >
> > > > > So that when we return to A, cic->key == cfqd_2 even though we did a
> > > > > lookup for cfqd_1.
> > > >
> > > > That I follow, my question was if A, B, and C refer to different
> > > > processes but with a shared io context? I'm assuming that is correct...
> > >
> > > Ah, yeah, whatever is needed to make this race happen :-)
> >
> > The only place where you'll have multiple processes involved with this
> > at all is if they share io contexts. That is also why the bug isn't that
> > critical, since it's not possible right now (CLONE_IO flag must be
> > used).
>
> There are 3 races here:
>
> 1) A continues with another object than intended
> (requires CLONE_IO)
>
> 2) A does hlist_for_each_rcu() and races with B,C so that
> we continue the iteration on a possibly unrelated list.
>
> 3) cic is freed after the !cic->key check.
>
> I'm not familiar enough with the code yet to see if 3 really is an
> possibility. But from what I can see there is nothing guarding its
> existence.
All 3 require CLONE_IO, because if that is not set, there's a 1:1
mapping between a process and io context (no sharing occurs).
--
Jens Axboe
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 11:43 ` Jens Axboe
@ 2008-04-02 12:36 ` Jens Axboe
2008-04-02 12:36 ` Jens Axboe
0 siblings, 1 reply; 69+ messages in thread
From: Jens Axboe @ 2008-04-02 12:36 UTC (permalink / raw)
To: Peter Zijlstra, paulmck, Ingo Molnar, Pekka J Enberg,
Vegard Nossum, Linux Kernel Mailing List
On Wed, Apr 02 2008, Jens Axboe wrote:
> On Wed, Apr 02 2008, Fabio Checconi wrote:
> > > From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > Date: Wed, Apr 02, 2008 12:59:21PM +0200
> > >
> > > On Wed, 2008-04-02 at 03:55 -0700, Paul E. McKenney wrote:
> > > > On Wed, Apr 02, 2008 at 09:28:46AM +0200, Ingo Molnar wrote:
> > > > >
> > > > > * Jens Axboe <jens.axboe@oracle.com> wrote:
> > > > >
> > > > > > On Wed, Apr 02 2008, Pekka J Enberg wrote:
> > > > > > > On Wed, 2 Apr 2008, Jens Axboe wrote:
> > > > > > > > Good catch, I wonder why it didn't complain in my testing. I've added a
> > > > > > > > patch to fix that, please see it here:
> > > > > > >
> > > > > > > You probably don't have kmemcheck in your kernel ;-)
> > > > > >
> > > > > > Ehm no, you are right :)
> > > > >
> > > > > ... and you can get kmemcheck by testing on x86.git/latest:
> > > > >
> > > > > http://people.redhat.com/mingo/x86.git/README
> > > > >
> > > > > ;-)
> > > >
> > > > I will check this when I get back to some bandwidth -- but in the meantime,
> > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
> > > > newly-freed items in that case, as long as you did rcu_read_lock()
> > > > before gaining a reference to them and don't hold the reference past
> > > > the matching rcu_read_unlock().
> > >
> > > I don't think it does.
> > >
> > > It would have to register an call_rcu callback itself in order to mark
> > > it freed - and handle the race with the object being handed out again.
> > >
> >
> > I had the same problem while debugging a cfq-derived i/o scheduler,
> > and I found nothing preventing the reuse of the freed memory.
> > The patch below seemed to fix the logic.
>
> Thanks, from a first look this looks like it'll fix this bad rcu slab
> usage. I'll give it some closer scrutiny and testing.
(CC reinstated, sometimes mutt is really annoying and drops the person
you are replying too :-(
Looks good and tests fine as well. I've applied it, on top of the
hlist_for_each_entry_safe_rcu() fix.
http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=436151ad32b4a59e0d36165a7d6312545f126661
--
Jens Axboe
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 12:36 ` Jens Axboe
@ 2008-04-02 12:36 ` Jens Axboe
2008-04-02 12:55 ` Fabio Checconi
0 siblings, 1 reply; 69+ messages in thread
From: Jens Axboe @ 2008-04-02 12:36 UTC (permalink / raw)
To: Peter Zijlstra, paulmck, Ingo Molnar, Pekka J Enberg,
Vegard Nossum, Linux Kernel Mailing List
Cc: fabio
On Wed, Apr 02 2008, Jens Axboe wrote:
> On Wed, Apr 02 2008, Jens Axboe wrote:
> > On Wed, Apr 02 2008, Fabio Checconi wrote:
> > > > From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > > Date: Wed, Apr 02, 2008 12:59:21PM +0200
> > > >
> > > > On Wed, 2008-04-02 at 03:55 -0700, Paul E. McKenney wrote:
> > > > > On Wed, Apr 02, 2008 at 09:28:46AM +0200, Ingo Molnar wrote:
> > > > > >
> > > > > > * Jens Axboe <jens.axboe@oracle.com> wrote:
> > > > > >
> > > > > > > On Wed, Apr 02 2008, Pekka J Enberg wrote:
> > > > > > > > On Wed, 2 Apr 2008, Jens Axboe wrote:
> > > > > > > > > Good catch, I wonder why it didn't complain in my testing. I've added a
> > > > > > > > > patch to fix that, please see it here:
> > > > > > > >
> > > > > > > > You probably don't have kmemcheck in your kernel ;-)
> > > > > > >
> > > > > > > Ehm no, you are right :)
> > > > > >
> > > > > > ... and you can get kmemcheck by testing on x86.git/latest:
> > > > > >
> > > > > > http://people.redhat.com/mingo/x86.git/README
> > > > > >
> > > > > > ;-)
> > > > >
> > > > > I will check this when I get back to some bandwidth -- but in the meantime,
> > > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
> > > > > newly-freed items in that case, as long as you did rcu_read_lock()
> > > > > before gaining a reference to them and don't hold the reference past
> > > > > the matching rcu_read_unlock().
> > > >
> > > > I don't think it does.
> > > >
> > > > It would have to register an call_rcu callback itself in order to mark
> > > > it freed - and handle the race with the object being handed out again.
> > > >
> > >
> > > I had the same problem while debugging a cfq-derived i/o scheduler,
> > > and I found nothing preventing the reuse of the freed memory.
> > > The patch below seemed to fix the logic.
> >
> > Thanks, from a first look this looks like it'll fix this bad rcu slab
> > usage. I'll give it some closer scrutiny and testing.
>
> (CC reinstated, sometimes mutt is really annoying and drops the person
> you are replying too :-(
(for real...)
>
> Looks good and tests fine as well. I've applied it, on top of the
> hlist_for_each_entry_safe_rcu() fix.
>
> http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=436151ad32b4a59e0d36165a7d6312545f126661
>
> --
> Jens Axboe
>
--
Jens Axboe
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 12:36 ` Jens Axboe
@ 2008-04-02 12:55 ` Fabio Checconi
2008-04-02 12:58 ` Jens Axboe
0 siblings, 1 reply; 69+ messages in thread
From: Fabio Checconi @ 2008-04-02 12:55 UTC (permalink / raw)
To: Jens Axboe
Cc: Peter Zijlstra, paulmck, Ingo Molnar, Pekka J Enberg,
Vegard Nossum, Linux Kernel Mailing List
> From: Jens Axboe <jens.axboe@oracle.com>
> Date: Wed, Apr 02, 2008 02:36:39PM +0200
>
> > Looks good and tests fine as well. I've applied it, on top of the
> > hlist_for_each_entry_safe_rcu() fix.
> >
> > http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=436151ad32b4a59e0d36165a7d6312545f126661
> >
ok, thanks. anyway I don't think the hlist_for_each_entry_safe_rcu()
is needed at this point, since the pos->next pointer is still valid
(at least) until the next rcu_read_unlock(). am I wrong?
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 12:55 ` Fabio Checconi
@ 2008-04-02 12:58 ` Jens Axboe
2008-04-02 12:58 ` Jens Axboe
2008-04-02 13:37 ` Paul E. McKenney
0 siblings, 2 replies; 69+ messages in thread
From: Jens Axboe @ 2008-04-02 12:58 UTC (permalink / raw)
To: Peter Zijlstra, paulmck, Ingo Molnar, Pekka J Enberg,
Vegard Nossum, Linux Kernel Mailing List
On Wed, Apr 02 2008, Fabio Checconi wrote:
> > From: Jens Axboe <jens.axboe@oracle.com>
> > Date: Wed, Apr 02, 2008 02:36:39PM +0200
> >
> > > Looks good and tests fine as well. I've applied it, on top of the
> > > hlist_for_each_entry_safe_rcu() fix.
> > >
> > > http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=436151ad32b4a59e0d36165a7d6312545f126661
> > >
>
> ok, thanks. anyway I don't think the hlist_for_each_entry_safe_rcu()
> is needed at this point, since the pos->next pointer is still valid
> (at least) until the next rcu_read_unlock(). am I wrong?
it isn't, but it's still clearer. so either that or a nice comment, I
just stuck with what I had already committed.
--
Jens Axboe
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 12:58 ` Jens Axboe
@ 2008-04-02 12:58 ` Jens Axboe
2008-04-02 13:16 ` Fabio Checconi
2008-04-02 13:37 ` Paul E. McKenney
1 sibling, 1 reply; 69+ messages in thread
From: Jens Axboe @ 2008-04-02 12:58 UTC (permalink / raw)
To: Peter Zijlstra, paulmck, Ingo Molnar, Pekka J Enberg,
Vegard Nossum, Linux Kernel Mailing List
Cc: fabio
On Wed, Apr 02 2008, Jens Axboe wrote:
> On Wed, Apr 02 2008, Fabio Checconi wrote:
> > > From: Jens Axboe <jens.axboe@oracle.com>
> > > Date: Wed, Apr 02, 2008 02:36:39PM +0200
> > >
> > > > Looks good and tests fine as well. I've applied it, on top of the
> > > > hlist_for_each_entry_safe_rcu() fix.
> > > >
> > > > http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=436151ad32b4a59e0d36165a7d6312545f126661
> > > >
> >
> > ok, thanks. anyway I don't think the hlist_for_each_entry_safe_rcu()
> > is needed at this point, since the pos->next pointer is still valid
> > (at least) until the next rcu_read_unlock(). am I wrong?
>
> it isn't, but it's still clearer. so either that or a nice comment, I
> just stuck with what I had already committed.
Christ, mutt is still dropping you. Sorry about that.
--
Jens Axboe
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 12:58 ` Jens Axboe
@ 2008-04-02 13:16 ` Fabio Checconi
2008-04-02 16:14 ` Paul E. McKenney
0 siblings, 1 reply; 69+ messages in thread
From: Fabio Checconi @ 2008-04-02 13:16 UTC (permalink / raw)
To: Jens Axboe
Cc: Peter Zijlstra, paulmck, Ingo Molnar, Pekka J Enberg,
Vegard Nossum, Linux Kernel Mailing List
> From: Jens Axboe <jens.axboe@oracle.com>
> Date: Wed, Apr 02, 2008 02:58:58PM +0200
>
> On Wed, Apr 02 2008, Jens Axboe wrote:
> > On Wed, Apr 02 2008, Fabio Checconi wrote:
> > > > From: Jens Axboe <jens.axboe@oracle.com>
> > > > Date: Wed, Apr 02, 2008 02:36:39PM +0200
> > > >
> > > > > Looks good and tests fine as well. I've applied it, on top of the
> > > > > hlist_for_each_entry_safe_rcu() fix.
> > > > >
> > > > > http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=436151ad32b4a59e0d36165a7d6312545f126661
> > > > >
> > >
> > > ok, thanks. anyway I don't think the hlist_for_each_entry_safe_rcu()
> > > is needed at this point, since the pos->next pointer is still valid
> > > (at least) until the next rcu_read_unlock(). am I wrong?
> >
> > it isn't, but it's still clearer. so either that or a nice comment, I
> > just stuck with what I had already committed.
>
ok I agree on that. the only remaining concern I have is that when
I first looked at it it seemed to me that hlist_for_each_entry_safe_rcu()
was missing by purpose from the list interface, since hlist_del_rcu()
can be called anyway during the traversal from a concurrent context,
so the semantics of *_safe_* have to be carried out by other means
(i.e., call_rcu()).
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 12:13 ` Peter Zijlstra
2008-04-02 12:28 ` Jens Axboe
@ 2008-04-02 13:26 ` Paul E. McKenney
2008-04-02 13:43 ` Andi Kleen
2 siblings, 0 replies; 69+ messages in thread
From: Paul E. McKenney @ 2008-04-02 13:26 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Jens Axboe, Pekka Enberg, Ingo Molnar, Vegard Nossum,
Linux Kernel Mailing List, Andrew Morton, Randy Dunlap
On Wed, Apr 02, 2008 at 02:13:42PM +0200, Peter Zijlstra wrote:
> On Wed, 2008-04-02 at 13:53 +0200, Jens Axboe wrote:
>
> > > Yeah, SLAB_DESTROY_BY_RCU should have a _HUGE_ comment explaining it,
> > > I'm sure this is not the first (nor the last) time people get that
> > > wrong.
> >
> > It should, SLAB_DESTROY_BY_RCU is definitely useful, but it is expected
> > to be an 'easier' way of doing the call_rcu() manually. So it definitely
> > needs more documentation.
> >
>
> Ok I gave it a go, how bad is this text?
Looks good to me!
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index f950a89..e049ddc 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -25,6 +25,32 @@
> #define SLAB_CACHE_DMA 0x00004000UL /* Use GFP_DMA memory */
> #define SLAB_STORE_USER 0x00010000UL /* DEBUG: Store the last owner for bug hunting */
> #define SLAB_PANIC 0x00040000UL /* Panic if kmem_cache_create() fails */
> +/*
> + * SLAB_DESTROY_BY_RCU - **WARNING** READ THIS!
> + *
> + * This delays freeing the SLAB page by a grace period, it does _NOT_
> + * delay object freeing. This means that if you do kmem_cache_free()
> + * that memory location is free to be reused at any time. Thus it may
> + * be possible to see another object there in the same RCU grace period.
> + *
> + * This feature only ensures the memory location backing the object
> + * stays valid, the trick to using this is relying on an independent
> + * object validation pass. Something like:
> + *
> + * rcu_read_lock()
> + * again:
> + * obj = lockless_lookup(key);
> + * if (obj) {
> + * if (!try_get_ref(obj)) // might fail for free objects
> + * goto again;
> + *
> + * if (obj->key != key) { // not the object we expected
> + * put_ref(obj);
> + * goto again;
> + * }
> + * }
> + * rcu_read_unlock();
> + */
> #define SLAB_DESTROY_BY_RCU 0x00080000UL /* Defer freeing slabs to RCU */
> #define SLAB_MEM_SPREAD 0x00100000UL /* Spread some memory over cpuset */
> #define SLAB_TRACE 0x00200000UL /* Trace allocations and frees */
>
>
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 11:33 ` Fabio Checconi
2008-04-02 11:43 ` Jens Axboe
@ 2008-04-02 13:32 ` Paul E. McKenney
2008-04-02 13:40 ` Jens Axboe
1 sibling, 1 reply; 69+ messages in thread
From: Paul E. McKenney @ 2008-04-02 13:32 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Jens Axboe, Pekka J Enberg,
Vegard Nossum, Linux Kernel Mailing List
On Wed, Apr 02, 2008 at 01:33:27PM +0200, Fabio Checconi wrote:
> > From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Date: Wed, Apr 02, 2008 12:59:21PM +0200
> >
> > On Wed, 2008-04-02 at 03:55 -0700, Paul E. McKenney wrote:
> > > On Wed, Apr 02, 2008 at 09:28:46AM +0200, Ingo Molnar wrote:
> > > >
> > > > * Jens Axboe <jens.axboe@oracle.com> wrote:
> > > >
> > > > > On Wed, Apr 02 2008, Pekka J Enberg wrote:
> > > > > > On Wed, 2 Apr 2008, Jens Axboe wrote:
> > > > > > > Good catch, I wonder why it didn't complain in my testing. I've added a
> > > > > > > patch to fix that, please see it here:
> > > > > >
> > > > > > You probably don't have kmemcheck in your kernel ;-)
> > > > >
> > > > > Ehm no, you are right :)
> > > >
> > > > ... and you can get kmemcheck by testing on x86.git/latest:
> > > >
> > > > http://people.redhat.com/mingo/x86.git/README
> > > >
> > > > ;-)
> > >
> > > I will check this when I get back to some bandwidth -- but in the meantime,
> > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
> > > newly-freed items in that case, as long as you did rcu_read_lock()
> > > before gaining a reference to them and don't hold the reference past
> > > the matching rcu_read_unlock().
> >
> > I don't think it does.
> >
> > It would have to register an call_rcu callback itself in order to mark
> > it freed - and handle the race with the object being handed out again.
>
> I had the same problem while debugging a cfq-derived i/o scheduler,
> and I found nothing preventing the reuse of the freed memory.
> The patch below seemed to fix the logic.
Looks good to me from a strictly RCU viewpoint -- I must confess great
ignorance of the CFQ code. :-/
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Fabio Checconi <fabio@gandalf.sssup.it>
> ---
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index 0f962ec..f26da2b 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -1143,24 +1143,37 @@ static void cfq_put_queue(struct cfq_queue *cfqq)
> }
>
> /*
> - * Call func for each cic attached to this ioc. Returns number of cic's seen.
> + * Call func for each cic attached to this ioc.
> */
> -static unsigned int
> +static void
> call_for_each_cic(struct io_context *ioc,
> void (*func)(struct io_context *, struct cfq_io_context *))
> {
> struct cfq_io_context *cic;
> struct hlist_node *n;
> - int called = 0;
>
> rcu_read_lock();
> - hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list) {
> + hlist_for_each_entry_rcu(cic, n, &ioc->cic_list, cic_list)
> func(ioc, cic);
> - called++;
> - }
> rcu_read_unlock();
> +}
> +
> +static void cfq_cic_free_rcu(struct rcu_head *head)
> +{
> + struct cfq_io_context *cic;
> +
> + cic = container_of(head, struct cfq_io_context, rcu_head);
> +
> + kmem_cache_free(cfq_ioc_pool, cic);
> + elv_ioc_count_dec(ioc_count);
> +
> + if (ioc_gone && !elv_ioc_count_read(ioc_count))
> + complete(ioc_gone);
> +}
>
> - return called;
> +static void cfq_cic_free(struct cfq_io_context *cic)
> +{
> + call_rcu(&cic->rcu_head, cfq_cic_free_rcu);
> }
>
> static void cic_free_func(struct io_context *ioc, struct cfq_io_context *cic)
> @@ -1174,24 +1187,18 @@ static void cic_free_func(struct io_context *ioc, struct cfq_io_context *cic)
> hlist_del_rcu(&cic->cic_list);
> spin_unlock_irqrestore(&ioc->lock, flags);
>
> - kmem_cache_free(cfq_ioc_pool, cic);
> + cfq_cic_free(cic);
> }
>
> static void cfq_free_io_context(struct io_context *ioc)
> {
> - int freed;
> -
> /*
> - * ioc->refcount is zero here, so no more cic's are allowed to be
> - * linked into this ioc. So it should be ok to iterate over the known
> - * list, we will see all cic's since no new ones are added.
> + * ioc->refcount is zero here, or we are called from elv_unregister(),
> + * so no more cic's are allowed to be linked into this ioc. So it
> + * should be ok to iterate over the known list, we will see all cic's
> + * since no new ones are added.
> */
> - freed = call_for_each_cic(ioc, cic_free_func);
> -
> - elv_ioc_count_mod(ioc_count, -freed);
> -
> - if (ioc_gone && !elv_ioc_count_read(ioc_count))
> - complete(ioc_gone);
> + call_for_each_cic(ioc, cic_free_func);
> }
>
> static void cfq_exit_cfqq(struct cfq_data *cfqd, struct cfq_queue *cfqq)
> @@ -1458,15 +1465,6 @@ cfq_get_queue(struct cfq_data *cfqd, int is_sync, struct io_context *ioc,
> return cfqq;
> }
>
> -static void cfq_cic_free(struct cfq_io_context *cic)
> -{
> - kmem_cache_free(cfq_ioc_pool, cic);
> - elv_ioc_count_dec(ioc_count);
> -
> - if (ioc_gone && !elv_ioc_count_read(ioc_count))
> - complete(ioc_gone);
> -}
> -
> /*
> * We drop cfq io contexts lazily, so we may find a dead one.
> */
> @@ -2138,7 +2136,7 @@ static int __init cfq_slab_setup(void)
> if (!cfq_pool)
> goto fail;
>
> - cfq_ioc_pool = KMEM_CACHE(cfq_io_context, SLAB_DESTROY_BY_RCU);
> + cfq_ioc_pool = KMEM_CACHE(cfq_io_context, 0);
> if (!cfq_ioc_pool)
> goto fail;
>
> @@ -2286,7 +2284,6 @@ static void __exit cfq_exit(void)
> smp_wmb();
> if (elv_ioc_count_read(ioc_count))
> wait_for_completion(ioc_gone);
> - synchronize_rcu();
> cfq_slab_kill();
> }
>
> diff --git a/include/linux/iocontext.h b/include/linux/iocontext.h
> index 1b4ccf2..50e448c 100644
> --- a/include/linux/iocontext.h
> +++ b/include/linux/iocontext.h
> @@ -54,6 +54,8 @@ struct cfq_io_context {
>
> void (*dtor)(struct io_context *); /* destructor */
> void (*exit)(struct io_context *); /* called on task exit */
> +
> + struct rcu_head rcu_head;
> };
>
> /*
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 12:58 ` Jens Axboe
2008-04-02 12:58 ` Jens Axboe
@ 2008-04-02 13:37 ` Paul E. McKenney
2008-04-02 13:41 ` Jens Axboe
1 sibling, 1 reply; 69+ messages in thread
From: Paul E. McKenney @ 2008-04-02 13:37 UTC (permalink / raw)
To: Jens Axboe
Cc: Peter Zijlstra, Ingo Molnar, Pekka J Enberg, Vegard Nossum,
Linux Kernel Mailing List
On Wed, Apr 02, 2008 at 02:58:19PM +0200, Jens Axboe wrote:
> On Wed, Apr 02 2008, Fabio Checconi wrote:
> > > From: Jens Axboe <jens.axboe@oracle.com>
> > > Date: Wed, Apr 02, 2008 02:36:39PM +0200
> > >
> > > > Looks good and tests fine as well. I've applied it, on top of the
> > > > hlist_for_each_entry_safe_rcu() fix.
> > > >
> > > > http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=436151ad32b4a59e0d36165a7d6312545f126661
> > > >
> >
> > ok, thanks. anyway I don't think the hlist_for_each_entry_safe_rcu()
> > is needed at this point, since the pos->next pointer is still valid
> > (at least) until the next rcu_read_unlock(). am I wrong?
>
> it isn't, but it's still clearer. so either that or a nice comment, I
> just stuck with what I had already committed.
Given Peter's comment, would you be willing to drop the
hlist_for_each_entry_safe_rcu()? People tend to read too much into
the "safe" sometimes. ;-)
Thanx, Paul
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 13:32 ` Paul E. McKenney
@ 2008-04-02 13:40 ` Jens Axboe
2008-04-02 16:15 ` Paul E. McKenney
0 siblings, 1 reply; 69+ messages in thread
From: Jens Axboe @ 2008-04-02 13:40 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Peter Zijlstra, Ingo Molnar, Pekka J Enberg, Vegard Nossum,
Linux Kernel Mailing List
On Wed, Apr 02 2008, Paul E. McKenney wrote:
> On Wed, Apr 02, 2008 at 01:33:27PM +0200, Fabio Checconi wrote:
> > > From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > Date: Wed, Apr 02, 2008 12:59:21PM +0200
> > >
> > > On Wed, 2008-04-02 at 03:55 -0700, Paul E. McKenney wrote:
> > > > On Wed, Apr 02, 2008 at 09:28:46AM +0200, Ingo Molnar wrote:
> > > > >
> > > > > * Jens Axboe <jens.axboe@oracle.com> wrote:
> > > > >
> > > > > > On Wed, Apr 02 2008, Pekka J Enberg wrote:
> > > > > > > On Wed, 2 Apr 2008, Jens Axboe wrote:
> > > > > > > > Good catch, I wonder why it didn't complain in my testing. I've added a
> > > > > > > > patch to fix that, please see it here:
> > > > > > >
> > > > > > > You probably don't have kmemcheck in your kernel ;-)
> > > > > >
> > > > > > Ehm no, you are right :)
> > > > >
> > > > > ... and you can get kmemcheck by testing on x86.git/latest:
> > > > >
> > > > > http://people.redhat.com/mingo/x86.git/README
> > > > >
> > > > > ;-)
> > > >
> > > > I will check this when I get back to some bandwidth -- but in the meantime,
> > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
> > > > newly-freed items in that case, as long as you did rcu_read_lock()
> > > > before gaining a reference to them and don't hold the reference past
> > > > the matching rcu_read_unlock().
> > >
> > > I don't think it does.
> > >
> > > It would have to register an call_rcu callback itself in order to mark
> > > it freed - and handle the race with the object being handed out again.
> >
> > I had the same problem while debugging a cfq-derived i/o scheduler,
> > and I found nothing preventing the reuse of the freed memory.
> > The patch below seemed to fix the logic.
>
> Looks good to me from a strictly RCU viewpoint -- I must confess great
> ignorance of the CFQ code. :-/
That can always be rectified, given enough spare time :-)
> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Thanks, added.
--
Jens Axboe
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 13:37 ` Paul E. McKenney
@ 2008-04-02 13:41 ` Jens Axboe
2008-04-02 15:33 ` Paul E. McKenney
0 siblings, 1 reply; 69+ messages in thread
From: Jens Axboe @ 2008-04-02 13:41 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Peter Zijlstra, Ingo Molnar, Pekka J Enberg, Vegard Nossum,
Linux Kernel Mailing List
On Wed, Apr 02 2008, Paul E. McKenney wrote:
> On Wed, Apr 02, 2008 at 02:58:19PM +0200, Jens Axboe wrote:
> > On Wed, Apr 02 2008, Fabio Checconi wrote:
> > > > From: Jens Axboe <jens.axboe@oracle.com>
> > > > Date: Wed, Apr 02, 2008 02:36:39PM +0200
> > > >
> > > > > Looks good and tests fine as well. I've applied it, on top of the
> > > > > hlist_for_each_entry_safe_rcu() fix.
> > > > >
> > > > > http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=436151ad32b4a59e0d36165a7d6312545f126661
> > > > >
> > >
> > > ok, thanks. anyway I don't think the hlist_for_each_entry_safe_rcu()
> > > is needed at this point, since the pos->next pointer is still valid
> > > (at least) until the next rcu_read_unlock(). am I wrong?
> >
> > it isn't, but it's still clearer. so either that or a nice comment, I
> > just stuck with what I had already committed.
>
> Given Peter's comment, would you be willing to drop the
> hlist_for_each_entry_safe_rcu()? People tend to read too much into
> the "safe" sometimes. ;-)
Sure, let me rebase it... I already sent a pull request to Linus, and he
just loves when I rebase and change the diffstats behind his back :)
--
Jens Axboe
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 12:13 ` Peter Zijlstra
2008-04-02 12:28 ` Jens Axboe
2008-04-02 13:26 ` Paul E. McKenney
@ 2008-04-02 13:43 ` Andi Kleen
2 siblings, 0 replies; 69+ messages in thread
From: Andi Kleen @ 2008-04-02 13:43 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Jens Axboe, Pekka Enberg, paulmck, Ingo Molnar, Vegard Nossum,
Linux Kernel Mailing List, Andrew Morton, Randy Dunlap
Peter Zijlstra <a.p.zijlstra@chello.nl> writes:
>
> Ok I gave it a go, how bad is this text?
Should be also in the kernel doc description of kmem_cache_create(), so it
appears in manpages etc.
-Andi
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index f950a89..e049ddc 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -25,6 +25,32 @@
> #define SLAB_CACHE_DMA 0x00004000UL /* Use GFP_DMA memory */
> #define SLAB_STORE_USER 0x00010000UL /* DEBUG: Store the last owner for bug hunting */
> #define SLAB_PANIC 0x00040000UL /* Panic if kmem_cache_create() fails */
> +/*
> + * SLAB_DESTROY_BY_RCU - **WARNING** READ THIS!
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 13:41 ` Jens Axboe
@ 2008-04-02 15:33 ` Paul E. McKenney
2008-04-02 16:31 ` Jens Axboe
0 siblings, 1 reply; 69+ messages in thread
From: Paul E. McKenney @ 2008-04-02 15:33 UTC (permalink / raw)
To: Jens Axboe
Cc: Peter Zijlstra, Ingo Molnar, Pekka J Enberg, Vegard Nossum,
Linux Kernel Mailing List
On Wed, Apr 02, 2008 at 03:41:23PM +0200, Jens Axboe wrote:
> On Wed, Apr 02 2008, Paul E. McKenney wrote:
> > On Wed, Apr 02, 2008 at 02:58:19PM +0200, Jens Axboe wrote:
> > > On Wed, Apr 02 2008, Fabio Checconi wrote:
> > > > > From: Jens Axboe <jens.axboe@oracle.com>
> > > > > Date: Wed, Apr 02, 2008 02:36:39PM +0200
> > > > >
> > > > > > Looks good and tests fine as well. I've applied it, on top of the
> > > > > > hlist_for_each_entry_safe_rcu() fix.
> > > > > >
> > > > > > http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=436151ad32b4a59e0d36165a7d6312545f126661
> > > > > >
> > > >
> > > > ok, thanks. anyway I don't think the hlist_for_each_entry_safe_rcu()
> > > > is needed at this point, since the pos->next pointer is still valid
> > > > (at least) until the next rcu_read_unlock(). am I wrong?
> > >
> > > it isn't, but it's still clearer. so either that or a nice comment, I
> > > just stuck with what I had already committed.
> >
> > Given Peter's comment, would you be willing to drop the
> > hlist_for_each_entry_safe_rcu()? People tend to read too much into
> > the "safe" sometimes. ;-)
>
> Sure, let me rebase it... I already sent a pull request to Linus, and he
> just loves when I rebase and change the diffstats behind his back :)
Would it help if I submitted the patch to you to back it out later? ;-)
Thanx, Paul
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 11:01 ` Pekka Enberg
2008-04-02 11:07 ` Jens Axboe
@ 2008-04-02 16:08 ` Paul E. McKenney
2008-04-02 16:15 ` Vegard Nossum
1 sibling, 1 reply; 69+ messages in thread
From: Paul E. McKenney @ 2008-04-02 16:08 UTC (permalink / raw)
To: Pekka Enberg
Cc: Ingo Molnar, Jens Axboe, Peter Zijlstra, Vegard Nossum,
Linux Kernel Mailing List
On Wed, Apr 02, 2008 at 02:01:13PM +0300, Pekka Enberg wrote:
> Hi Paul,
>
> On Wed, Apr 2, 2008 at 1:55 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > I will check this when I get back to some bandwidth -- but in the meantime,
> > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
> > newly-freed items in that case, as long as you did rcu_read_lock()
> > before gaining a reference to them and don't hold the reference past
> > the matching rcu_read_unlock().
>
> No, kmemcheck is work in progress and does not know about
> SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
> was because Peter, Vegard, and myself identified this particular
> warning as a real problem. But yeah, kmemcheck can cause false
> positives for RCU for now.
Would the following be an appropriate fix? It seems to me to be in
the same spirit as the existing check for s->ctor.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
slub_kmemcheck.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/slub_kmemcheck.c b/mm/slub_kmemcheck.c
index 8620a8b..e07f62a 100644
--- a/mm/slub_kmemcheck.c
+++ b/mm/slub_kmemcheck.c
@@ -93,6 +93,6 @@ kmemcheck_slab_alloc(struct kmem_cache *s, gfp_t gfpflags, void *object)
void
kmemcheck_slab_free(struct kmem_cache *s, void *object)
{
- if (!s->ctor)
+ if (!s->ctor && !(s->flags & SLAB_DESTROY_BY_RCU))
kmemcheck_mark_freed(object, s->objsize);
}
^ permalink raw reply related [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 13:16 ` Fabio Checconi
@ 2008-04-02 16:14 ` Paul E. McKenney
0 siblings, 0 replies; 69+ messages in thread
From: Paul E. McKenney @ 2008-04-02 16:14 UTC (permalink / raw)
To: Jens Axboe, Peter Zijlstra, Ingo Molnar, Pekka J Enberg,
Vegard Nossum, Linux Kernel Mailing List
On Wed, Apr 02, 2008 at 03:16:25PM +0200, Fabio Checconi wrote:
> > From: Jens Axboe <jens.axboe@oracle.com>
> > Date: Wed, Apr 02, 2008 02:58:58PM +0200
> >
> > On Wed, Apr 02 2008, Jens Axboe wrote:
> > > On Wed, Apr 02 2008, Fabio Checconi wrote:
> > > > > From: Jens Axboe <jens.axboe@oracle.com>
> > > > > Date: Wed, Apr 02, 2008 02:36:39PM +0200
> > > > >
> > > > > > Looks good and tests fine as well. I've applied it, on top of the
> > > > > > hlist_for_each_entry_safe_rcu() fix.
> > > > > >
> > > > > > http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=436151ad32b4a59e0d36165a7d6312545f126661
> > > > > >
> > > >
> > > > ok, thanks. anyway I don't think the hlist_for_each_entry_safe_rcu()
> > > > is needed at this point, since the pos->next pointer is still valid
> > > > (at least) until the next rcu_read_unlock(). am I wrong?
> > >
> > > it isn't, but it's still clearer. so either that or a nice comment, I
> > > just stuck with what I had already committed.
> >
>
> ok I agree on that. the only remaining concern I have is that when
> I first looked at it it seemed to me that hlist_for_each_entry_safe_rcu()
> was missing by purpose from the list interface, since hlist_del_rcu()
> can be called anyway during the traversal from a concurrent context,
> so the semantics of *_safe_* have to be carried out by other means
> (i.e., call_rcu()).
Exactly. ;-)
Thanx, Paul
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 13:40 ` Jens Axboe
@ 2008-04-02 16:15 ` Paul E. McKenney
0 siblings, 0 replies; 69+ messages in thread
From: Paul E. McKenney @ 2008-04-02 16:15 UTC (permalink / raw)
To: Jens Axboe
Cc: Peter Zijlstra, Ingo Molnar, Pekka J Enberg, Vegard Nossum,
Linux Kernel Mailing List
On Wed, Apr 02, 2008 at 03:40:46PM +0200, Jens Axboe wrote:
> On Wed, Apr 02 2008, Paul E. McKenney wrote:
> > On Wed, Apr 02, 2008 at 01:33:27PM +0200, Fabio Checconi wrote:
> > > > From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > > Date: Wed, Apr 02, 2008 12:59:21PM +0200
> > > >
> > > > On Wed, 2008-04-02 at 03:55 -0700, Paul E. McKenney wrote:
> > > > > On Wed, Apr 02, 2008 at 09:28:46AM +0200, Ingo Molnar wrote:
> > > > > >
> > > > > > * Jens Axboe <jens.axboe@oracle.com> wrote:
> > > > > >
> > > > > > > On Wed, Apr 02 2008, Pekka J Enberg wrote:
> > > > > > > > On Wed, 2 Apr 2008, Jens Axboe wrote:
> > > > > > > > > Good catch, I wonder why it didn't complain in my testing. I've added a
> > > > > > > > > patch to fix that, please see it here:
> > > > > > > >
> > > > > > > > You probably don't have kmemcheck in your kernel ;-)
> > > > > > >
> > > > > > > Ehm no, you are right :)
> > > > > >
> > > > > > ... and you can get kmemcheck by testing on x86.git/latest:
> > > > > >
> > > > > > http://people.redhat.com/mingo/x86.git/README
> > > > > >
> > > > > > ;-)
> > > > >
> > > > > I will check this when I get back to some bandwidth -- but in the meantime,
> > > > > does kmemcheck special-case SLAB_DESTROY_BY_RCU? It is legal to access
> > > > > newly-freed items in that case, as long as you did rcu_read_lock()
> > > > > before gaining a reference to them and don't hold the reference past
> > > > > the matching rcu_read_unlock().
> > > >
> > > > I don't think it does.
> > > >
> > > > It would have to register an call_rcu callback itself in order to mark
> > > > it freed - and handle the race with the object being handed out again.
> > >
> > > I had the same problem while debugging a cfq-derived i/o scheduler,
> > > and I found nothing preventing the reuse of the freed memory.
> > > The patch below seemed to fix the logic.
> >
> > Looks good to me from a strictly RCU viewpoint -- I must confess great
> > ignorance of the CFQ code. :-/
>
> That can always be rectified, given enough spare time :-)
Hey, I am not complaining about 2007 being gone already, because I am
still wondering what the heck happened to 2005 and 2006!!! ;-)
Thanx, Paul
> > Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>
> Thanks, added.
>
> --
> Jens Axboe
>
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 16:08 ` Paul E. McKenney
@ 2008-04-02 16:15 ` Vegard Nossum
2008-04-02 16:32 ` Pekka J Enberg
2008-04-02 16:59 ` Paul E. McKenney
0 siblings, 2 replies; 69+ messages in thread
From: Vegard Nossum @ 2008-04-02 16:15 UTC (permalink / raw)
To: paulmck
Cc: Pekka Enberg, Ingo Molnar, Jens Axboe, Peter Zijlstra,
Linux Kernel Mailing List
On Wed, Apr 2, 2008 at 6:08 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Wed, Apr 02, 2008 at 02:01:13PM +0300, Pekka Enberg wrote:
> > No, kmemcheck is work in progress and does not know about
> > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
> > was because Peter, Vegard, and myself identified this particular
> > warning as a real problem. But yeah, kmemcheck can cause false
> > positives for RCU for now.
>
> Would the following be an appropriate fix? It seems to me to be in
> the same spirit as the existing check for s->ctor.
In my opinion, no.
It would fix the false positives, but would in fact also hide cases
such as this one with cfq, e.g. the real cases of mis-use.
Peter Zijlstra suggested this:
> It would have to register an call_rcu callback itself in order to mark
> it freed - and handle the race with the object being handed out again.
I will try to look into this -- for now, I need to understand RCU
first (I've seen your LWN articles -- great work! :-))
Kind regards,
Vegard Nossum
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 15:33 ` Paul E. McKenney
@ 2008-04-02 16:31 ` Jens Axboe
2008-04-02 17:00 ` Paul E. McKenney
0 siblings, 1 reply; 69+ messages in thread
From: Jens Axboe @ 2008-04-02 16:31 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Peter Zijlstra, Ingo Molnar, Pekka J Enberg, Vegard Nossum,
Linux Kernel Mailing List
On Wed, Apr 02 2008, Paul E. McKenney wrote:
> On Wed, Apr 02, 2008 at 03:41:23PM +0200, Jens Axboe wrote:
> > On Wed, Apr 02 2008, Paul E. McKenney wrote:
> > > On Wed, Apr 02, 2008 at 02:58:19PM +0200, Jens Axboe wrote:
> > > > On Wed, Apr 02 2008, Fabio Checconi wrote:
> > > > > > From: Jens Axboe <jens.axboe@oracle.com>
> > > > > > Date: Wed, Apr 02, 2008 02:36:39PM +0200
> > > > > >
> > > > > > > Looks good and tests fine as well. I've applied it, on top of the
> > > > > > > hlist_for_each_entry_safe_rcu() fix.
> > > > > > >
> > > > > > > http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=436151ad32b4a59e0d36165a7d6312545f126661
> > > > > > >
> > > > >
> > > > > ok, thanks. anyway I don't think the hlist_for_each_entry_safe_rcu()
> > > > > is needed at this point, since the pos->next pointer is still valid
> > > > > (at least) until the next rcu_read_unlock(). am I wrong?
> > > >
> > > > it isn't, but it's still clearer. so either that or a nice comment, I
> > > > just stuck with what I had already committed.
> > >
> > > Given Peter's comment, would you be willing to drop the
> > > hlist_for_each_entry_safe_rcu()? People tend to read too much into
> > > the "safe" sometimes. ;-)
> >
> > Sure, let me rebase it... I already sent a pull request to Linus, and he
> > just loves when I rebase and change the diffstats behind his back :)
>
> Would it help if I submitted the patch to you to back it out later? ;-)
Hehe, I already wrote to Linus and explained why it changed, so what is
done is done :)
--
Jens Axboe
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 16:15 ` Vegard Nossum
@ 2008-04-02 16:32 ` Pekka J Enberg
2008-04-02 18:23 ` Paul E. McKenney
2008-04-02 16:59 ` Paul E. McKenney
1 sibling, 1 reply; 69+ messages in thread
From: Pekka J Enberg @ 2008-04-02 16:32 UTC (permalink / raw)
To: Vegard Nossum
Cc: paulmck, Ingo Molnar, Jens Axboe, Peter Zijlstra,
Linux Kernel Mailing List
Hi Vegard,
On Wed, Apr 2, 2008 at 6:08 PM, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > Would the following be an appropriate fix? It seems to me to be in
> > the same spirit as the existing check for s->ctor.
On Wed, 2 Apr 2008, Vegard Nossum wrote:
> In my opinion, no.
>
> It would fix the false positives, but would in fact also hide cases
> such as this one with cfq, e.g. the real cases of mis-use.
Yes, but we might as well put Paul's patch in now and remove that later
when we have a proper fix, no?
On Wed, 2 Apr 2008, Vegard Nossum wrote:
> Peter Zijlstra suggested this:
> > It would have to register an call_rcu callback itself in order to mark
> > it freed - and handle the race with the object being handed out again.
>
> I will try to look into this -- for now, I need to understand RCU
> first (I've seen your LWN articles -- great work! :-))
Well, maybe we can add two new states: RCU_FREED and RCU_VALIDATED? The
object is flagged with the first one as soon as an object is handed over
to kmem_cache_free() and the latter needs to hook to the validation phase
of RCU (how is that done btw?). Then kmemcheck could even give a better
error message: "RCU-freed object used without validation."
And with delayed free for kmemcheck we discussed before, we'd hold on to
the objects long enough to actually see these error conditions.
Pekka
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 16:15 ` Vegard Nossum
2008-04-02 16:32 ` Pekka J Enberg
@ 2008-04-02 16:59 ` Paul E. McKenney
2008-04-02 17:31 ` Vegard Nossum
1 sibling, 1 reply; 69+ messages in thread
From: Paul E. McKenney @ 2008-04-02 16:59 UTC (permalink / raw)
To: Vegard Nossum
Cc: Pekka Enberg, Ingo Molnar, Jens Axboe, Peter Zijlstra,
Linux Kernel Mailing List
On Wed, Apr 02, 2008 at 06:15:52PM +0200, Vegard Nossum wrote:
> On Wed, Apr 2, 2008 at 6:08 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Wed, Apr 02, 2008 at 02:01:13PM +0300, Pekka Enberg wrote:
> > > No, kmemcheck is work in progress and does not know about
> > > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
> > > was because Peter, Vegard, and myself identified this particular
> > > warning as a real problem. But yeah, kmemcheck can cause false
> > > positives for RCU for now.
> >
> > Would the following be an appropriate fix? It seems to me to be in
> > the same spirit as the existing check for s->ctor.
>
> In my opinion, no.
>
> It would fix the false positives, but would in fact also hide cases
> such as this one with cfq, e.g. the real cases of mis-use.
Though this case apparently does not qualify as misuse until such
time as CLONE_IO is implemented.
And doesn't the current check for ->ctor also potentially hide misuse?
> Peter Zijlstra suggested this:
> > It would have to register an call_rcu callback itself in order to mark
> > it freed - and handle the race with the object being handed out again.
>
> I will try to look into this -- for now, I need to understand RCU
> first (I've seen your LWN articles -- great work! :-))
Glad you liked them!
And Peter's suggested approach would indeed be more accurate. But I
will still put my patch forward as a stopgap. ;-)
Thanx, Paul
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 16:31 ` Jens Axboe
@ 2008-04-02 17:00 ` Paul E. McKenney
0 siblings, 0 replies; 69+ messages in thread
From: Paul E. McKenney @ 2008-04-02 17:00 UTC (permalink / raw)
To: Jens Axboe
Cc: Peter Zijlstra, Ingo Molnar, Pekka J Enberg, Vegard Nossum,
Linux Kernel Mailing List
On Wed, Apr 02, 2008 at 06:31:14PM +0200, Jens Axboe wrote:
> On Wed, Apr 02 2008, Paul E. McKenney wrote:
> > On Wed, Apr 02, 2008 at 03:41:23PM +0200, Jens Axboe wrote:
> > > On Wed, Apr 02 2008, Paul E. McKenney wrote:
> > > > On Wed, Apr 02, 2008 at 02:58:19PM +0200, Jens Axboe wrote:
> > > > > On Wed, Apr 02 2008, Fabio Checconi wrote:
> > > > > > > From: Jens Axboe <jens.axboe@oracle.com>
> > > > > > > Date: Wed, Apr 02, 2008 02:36:39PM +0200
> > > > > > >
> > > > > > > > Looks good and tests fine as well. I've applied it, on top of the
> > > > > > > > hlist_for_each_entry_safe_rcu() fix.
> > > > > > > >
> > > > > > > > http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=436151ad32b4a59e0d36165a7d6312545f126661
> > > > > > > >
> > > > > >
> > > > > > ok, thanks. anyway I don't think the hlist_for_each_entry_safe_rcu()
> > > > > > is needed at this point, since the pos->next pointer is still valid
> > > > > > (at least) until the next rcu_read_unlock(). am I wrong?
> > > > >
> > > > > it isn't, but it's still clearer. so either that or a nice comment, I
> > > > > just stuck with what I had already committed.
> > > >
> > > > Given Peter's comment, would you be willing to drop the
> > > > hlist_for_each_entry_safe_rcu()? People tend to read too much into
> > > > the "safe" sometimes. ;-)
> > >
> > > Sure, let me rebase it... I already sent a pull request to Linus, and he
> > > just loves when I rebase and change the diffstats behind his back :)
> >
> > Would it help if I submitted the patch to you to back it out later? ;-)
>
> Hehe, I already wrote to Linus and explained why it changed, so what is
> done is done :)
;-)
Thanx, Paul
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 16:59 ` Paul E. McKenney
@ 2008-04-02 17:31 ` Vegard Nossum
0 siblings, 0 replies; 69+ messages in thread
From: Vegard Nossum @ 2008-04-02 17:31 UTC (permalink / raw)
To: paulmck
Cc: Pekka Enberg, Ingo Molnar, Jens Axboe, Peter Zijlstra,
Linux Kernel Mailing List
On Wed, Apr 2, 2008 at 6:59 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Wed, Apr 02, 2008 at 06:15:52PM +0200, Vegard Nossum wrote:
> > On Wed, Apr 2, 2008 at 6:08 PM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> > > On Wed, Apr 02, 2008 at 02:01:13PM +0300, Pekka Enberg wrote:
> > > > No, kmemcheck is work in progress and does not know about
> > > > SLAB_DESTROY_BY_RCU yet. The reason I asked Vegard to post the warning
> > > > was because Peter, Vegard, and myself identified this particular
> > > > warning as a real problem. But yeah, kmemcheck can cause false
> > > > positives for RCU for now.
> > >
> > > Would the following be an appropriate fix? It seems to me to be in
> > > the same spirit as the existing check for s->ctor.
> >
> > In my opinion, no.
> >
> > It would fix the false positives, but would in fact also hide cases
> > such as this one with cfq, e.g. the real cases of mis-use.
>
> Though this case apparently does not qualify as misuse until such
> time as CLONE_IO is implemented.
>
> And doesn't the current check for ->ctor also potentially hide misuse?
Hm, no. Objects with a ctor should retain its "initializedness"
between allocations of the same object. This is one of the features of
slab allocation.
> > Peter Zijlstra suggested this:
> > > It would have to register an call_rcu callback itself in order to mark
> > > it freed - and handle the race with the object being handed out again.
>
> Glad you liked them!
>
> And Peter's suggested approach would indeed be more accurate. But I
> will still put my patch forward as a stopgap. ;-)
Yes, I realize that it will be needed as a temporary fix. It _is_
better to hide some real warnings in favour of the showing false ones.
(But a TODO comment is in order, I believe.)
Thanks.
Vegard
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 10:49 ` Peter Zijlstra
2008-04-02 10:54 ` Pekka J Enberg
@ 2008-04-02 17:35 ` Christoph Lameter
1 sibling, 0 replies; 69+ messages in thread
From: Christoph Lameter @ 2008-04-02 17:35 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Pekka Enberg, paulmck, Jens Axboe, Vegard Nossum, Ingo Molnar,
Linux Kernel Mailing List
On Wed, 2 Apr 2008, Peter Zijlstra wrote:
> Does SLAB (as opposed to SLUB) do it differently?
No it works the same in both.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 11:11 ` Pekka Enberg
2008-04-02 11:14 ` Peter Zijlstra
@ 2008-04-02 17:36 ` Christoph Lameter
1 sibling, 0 replies; 69+ messages in thread
From: Christoph Lameter @ 2008-04-02 17:36 UTC (permalink / raw)
To: Pekka Enberg
Cc: Peter Zijlstra, Jens Axboe, paulmck, Ingo Molnar, Vegard Nossum,
Linux Kernel Mailing List
On Wed, 2 Apr 2008, Pekka Enberg wrote:
> On Wed, Apr 2, 2008 at 2:08 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > but what holds off the slab allocator re-issueing that same object and
> > someone else writing other stuff into it?
>
> Nothing. So you cannot access the object at all after you've called
> kmem_cache_free(). SLAB_RCU or no SLAB_RCU.
You can of course access the object. And if you establish a definite state
of locks on free (and through a ctor) then its even okay to take locks.
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 16:32 ` Pekka J Enberg
@ 2008-04-02 18:23 ` Paul E. McKenney
2008-04-02 19:53 ` Pekka Enberg
0 siblings, 1 reply; 69+ messages in thread
From: Paul E. McKenney @ 2008-04-02 18:23 UTC (permalink / raw)
To: Pekka J Enberg
Cc: Vegard Nossum, Ingo Molnar, Jens Axboe, Peter Zijlstra,
Linux Kernel Mailing List
On Wed, Apr 02, 2008 at 07:32:26PM +0300, Pekka J Enberg wrote:
> Hi Vegard,
>
> On Wed, Apr 2, 2008 at 6:08 PM, Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > > Would the following be an appropriate fix? It seems to me to be in
> > > the same spirit as the existing check for s->ctor.
>
> On Wed, 2 Apr 2008, Vegard Nossum wrote:
> > In my opinion, no.
> >
> > It would fix the false positives, but would in fact also hide cases
> > such as this one with cfq, e.g. the real cases of mis-use.
>
> Yes, but we might as well put Paul's patch in now and remove that later
> when we have a proper fix, no?
>
> On Wed, 2 Apr 2008, Vegard Nossum wrote:
> > Peter Zijlstra suggested this:
> > > It would have to register an call_rcu callback itself in order to mark
> > > it freed - and handle the race with the object being handed out again.
> >
> > I will try to look into this -- for now, I need to understand RCU
> > first (I've seen your LWN articles -- great work! :-))
>
> Well, maybe we can add two new states: RCU_FREED and RCU_VALIDATED? The
> object is flagged with the first one as soon as an object is handed over
> to kmem_cache_free() and the latter needs to hook to the validation phase
> of RCU (how is that done btw?). Then kmemcheck could even give a better
> error message: "RCU-freed object used without validation."
>
> And with delayed free for kmemcheck we discussed before, we'd hold on to
> the objects long enough to actually see these error conditions.
Well, one approach would be to add an rcu_head to the kmem_cache
structure, along with a flag stating that the rcu_head is in use. I hope
that there is a better approach, as this introduces a lock roundtrip
into kmemcheck_slab_free(). Is there a better place to put the rcu_head?
Perhaps into the per-CPU allocator? But then we have to track which
CPU has which mark pending, and there are only so many bits in a byte,
as the SGI guys would be quick to point out
Which is why I chickened out and submitted the earlier crude patch.
Anyway, here is a -very- rough sketch of the stupid lock-based approach.
Thanx, Paul
struct kmem_cache {
. . . /* existing fields */
struct rcu_head rcu;
int rcu_available; /* rcu_head above is available for use. */
spinlock_t rcu_lock; /* which of course must be initialized. */
};
Then we need to add a couple of values to the enum shadow:
enum shadow {
... /* existing values */
SHADOW_RCU_FREED,
SHADOW_RCU_FREED_PENDING,
};
Then we have:
void
kmemcheck_slab_free(struct kmem_cache *s, void *object)
{
unsigned long flags;
if (s->ctor)
return;
if (likely(!(s->flags & SLAB_DESTROY_BY_RCU)))
kmemcheck_mark_freed(object, s->objsize);
spin_lock_irqsave(&s->rcu_lock, flags);
if (s->rcu_available) {
kmemcheck_mark_rcu_freed(object, s->objsize);
/* record the address somewhere... */
call_rcu(&s->rcu, kmemcheck_slab_free_rcu);
} else {
kmemcheck_mark_rcu_pending(object, s->objsize);
/* record the address somewhere... */
}
spin_unlock_irqrestore(&s->rcu_lock, flags);
}
void kmemcheck_slab_free_rcu(struct rcu_head *rcu)
{
unsigned long flags;
struct kmem_cache *s = container_of(rcu, struct kmem_cache, rcu);
void *shadow;
spin_lock_irqsave(&s->rcu_lock, flags);
/* recover the previously recorded object address. somehow */
kmemcheck_mark_freed(object, s->objsize);
if (/* there are pending requests */) {
/* get the previously recorded object addresses, somehow */
kmemcheck_mark_rcu_freed(object, s->objsize);
call_rcu(&s->rcu, kmemcheck_slab_free_rcu);
}
spin_unlock_irqrestore(&s->rcu_lock, flags);
}
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 18:23 ` Paul E. McKenney
@ 2008-04-02 19:53 ` Pekka Enberg
2008-04-02 20:15 ` Paul E. McKenney
0 siblings, 1 reply; 69+ messages in thread
From: Pekka Enberg @ 2008-04-02 19:53 UTC (permalink / raw)
To: paulmck
Cc: Vegard Nossum, Ingo Molnar, Jens Axboe, Peter Zijlstra,
Linux Kernel Mailing List
Hi Paul,
On Wed, Apr 02, 2008 at 07:32:26PM +0300, Pekka J Enberg wrote:
> > Well, maybe we can add two new states: RCU_FREED and RCU_VALIDATED? The
> > object is flagged with the first one as soon as an object is handed over
> > to kmem_cache_free() and the latter needs to hook to the validation phase
> > of RCU (how is that done btw?). Then kmemcheck could even give a better
> > error message: "RCU-freed object used without validation."
> >
> > And with delayed free for kmemcheck we discussed before, we'd hold on to
> > the objects long enough to actually see these error conditions.
On Wed, Apr 2, 2008 at 9:23 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> Well, one approach would be to add an rcu_head to the kmem_cache
> structure, along with a flag stating that the rcu_head is in use. I hope
> that there is a better approach, as this introduces a lock roundtrip
> into kmemcheck_slab_free(). Is there a better place to put the rcu_head?
> Perhaps into the per-CPU allocator? But then we have to track which
> CPU has which mark pending, and there are only so many bits in a byte,
> as the SGI guys would be quick to point out
I suppose you haven't actually run kmemcheck on your machine? We're
taking a page fault for _every_ memory access so a lock round-trip in
the SLAB_RCU case is probably not that bad performance-wise :-).
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 19:53 ` Pekka Enberg
@ 2008-04-02 20:15 ` Paul E. McKenney
2008-04-03 15:18 ` Paul E. McKenney
0 siblings, 1 reply; 69+ messages in thread
From: Paul E. McKenney @ 2008-04-02 20:15 UTC (permalink / raw)
To: Pekka Enberg
Cc: Vegard Nossum, Ingo Molnar, Jens Axboe, Peter Zijlstra,
Linux Kernel Mailing List
On Wed, Apr 02, 2008 at 10:53:53PM +0300, Pekka Enberg wrote:
> Hi Paul,
>
> On Wed, Apr 02, 2008 at 07:32:26PM +0300, Pekka J Enberg wrote:
> > > Well, maybe we can add two new states: RCU_FREED and RCU_VALIDATED? The
> > > object is flagged with the first one as soon as an object is handed over
> > > to kmem_cache_free() and the latter needs to hook to the validation phase
> > > of RCU (how is that done btw?). Then kmemcheck could even give a better
> > > error message: "RCU-freed object used without validation."
> > >
> > > And with delayed free for kmemcheck we discussed before, we'd hold on to
> > > the objects long enough to actually see these error conditions.
>
> On Wed, Apr 2, 2008 at 9:23 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > Well, one approach would be to add an rcu_head to the kmem_cache
> > structure, along with a flag stating that the rcu_head is in use. I hope
> > that there is a better approach, as this introduces a lock roundtrip
> > into kmemcheck_slab_free(). Is there a better place to put the rcu_head?
> > Perhaps into the per-CPU allocator? But then we have to track which
> > CPU has which mark pending, and there are only so many bits in a byte,
> > as the SGI guys would be quick to point out
>
> I suppose you haven't actually run kmemcheck on your machine? We're
> taking a page fault for _every_ memory access so a lock round-trip in
> the SLAB_RCU case is probably not that bad performance-wise :-).
Coward that I am, no I have not. ;-)
The thing that worries me even more than the lock is the need to keep
track of the addresses.
Then again, if you are taking a page fault on every access, perhaps not
such a big deal to allocate the memory and link it into a list...
But yikes!!! ;-)
Thanx, Paul
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-02 20:15 ` Paul E. McKenney
@ 2008-04-03 15:18 ` Paul E. McKenney
2008-04-03 19:49 ` Pekka J Enberg
0 siblings, 1 reply; 69+ messages in thread
From: Paul E. McKenney @ 2008-04-03 15:18 UTC (permalink / raw)
To: Pekka Enberg
Cc: Vegard Nossum, Ingo Molnar, Jens Axboe, Peter Zijlstra,
Linux Kernel Mailing List
On Wed, Apr 02, 2008 at 01:15:51PM -0700, Paul E. McKenney wrote:
> On Wed, Apr 02, 2008 at 10:53:53PM +0300, Pekka Enberg wrote:
> > I suppose you haven't actually run kmemcheck on your machine? We're
> > taking a page fault for _every_ memory access so a lock round-trip in
> > the SLAB_RCU case is probably not that bad performance-wise :-).
>
> Coward that I am, no I have not. ;-)
>
> The thing that worries me even more than the lock is the need to keep
> track of the addresses.
>
> Then again, if you are taking a page fault on every access, perhaps not
> such a big deal to allocate the memory and link it into a list...
> But yikes!!! ;-)
OK, so another approach would be to use a larger shadow block for
SLAB_DESTROY_BY_RCU slabs, so that each shadow location would have enough
room for an rcu_head and a size in addition to the flag. That would
trivialize tracking, or, more accurately, delegate such tracking to the
RCU infrastructure.
Of course, the case where the block gets reallocated before the RCU
grace period ends would also need to be handled (which my rough sketch
yesterday did -not- handle, by the way...).
There are a couple of ways of doing this. Probably the easiest approach
is to add more state to the flag, so that the RCU callback would check
to see if reallocation had already happened. If so, it would update the
state to indicate that the rcu_head was again available, and would need to
repost itself if the block had been freed again after being reallocated.
The other approach would be to defer actually adding the block to the
freelist until the grace period expired. This would be more accurate,
but also quite a bit more intrusive.
Thoughts?
Thanx, Paul
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-03 15:18 ` Paul E. McKenney
@ 2008-04-03 19:49 ` Pekka J Enberg
2008-04-03 21:27 ` Paul E. McKenney
0 siblings, 1 reply; 69+ messages in thread
From: Pekka J Enberg @ 2008-04-03 19:49 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Vegard Nossum, Ingo Molnar, Jens Axboe, Peter Zijlstra,
Linux Kernel Mailing List
Hi Paul,
On Thu, 3 Apr 2008, Paul E. McKenney wrote:
> OK, so another approach would be to use a larger shadow block for
> SLAB_DESTROY_BY_RCU slabs, so that each shadow location would have enough
> room for an rcu_head and a size in addition to the flag. That would
> trivialize tracking, or, more accurately, delegate such tracking to the
> RCU infrastructure.
Yeah, or just allocate some extra spaces for the RCU case and not
overload the current shadow pages. But sounds good to me.
On Thu, 3 Apr 2008, Paul E. McKenney wrote:
> Of course, the case where the block gets reallocated before the RCU
> grace period ends would also need to be handled (which my rough sketch
> yesterday did -not- handle, by the way...).
>
> There are a couple of ways of doing this. Probably the easiest approach
> is to add more state to the flag, so that the RCU callback would check
> to see if reallocation had already happened. If so, it would update the
> state to indicate that the rcu_head was again available, and would need to
> repost itself if the block had been freed again after being reallocated.
>
> The other approach would be to defer actually adding the block to the
> freelist until the grace period expired. This would be more accurate,
> but also quite a bit more intrusive.
We already talked about deferring the actual freeing in kmemcheck to
better detect these use-after-free conditions with Vegard. So it's
something that we probably want to do regardless of RCU.
Pekka
^ permalink raw reply [flat|nested] 69+ messages in thread
* Re: kmemcheck caught read from freed memory (cfq_free_io_context)
2008-04-03 19:49 ` Pekka J Enberg
@ 2008-04-03 21:27 ` Paul E. McKenney
0 siblings, 0 replies; 69+ messages in thread
From: Paul E. McKenney @ 2008-04-03 21:27 UTC (permalink / raw)
To: Pekka J Enberg
Cc: Vegard Nossum, Ingo Molnar, Jens Axboe, Peter Zijlstra,
Linux Kernel Mailing List
On Thu, Apr 03, 2008 at 10:49:23PM +0300, Pekka J Enberg wrote:
> Hi Paul,
>
> On Thu, 3 Apr 2008, Paul E. McKenney wrote:
> > OK, so another approach would be to use a larger shadow block for
> > SLAB_DESTROY_BY_RCU slabs, so that each shadow location would have enough
> > room for an rcu_head and a size in addition to the flag. That would
> > trivialize tracking, or, more accurately, delegate such tracking to the
> > RCU infrastructure.
>
> Yeah, or just allocate some extra spaces for the RCU case and not
> overload the current shadow pages. But sounds good to me.
As long as we have an rcu_head for each memory block in the slab, I am
not to worried about where they are allocated.
> On Thu, 3 Apr 2008, Paul E. McKenney wrote:
> > Of course, the case where the block gets reallocated before the RCU
> > grace period ends would also need to be handled (which my rough sketch
> > yesterday did -not- handle, by the way...).
> >
> > There are a couple of ways of doing this. Probably the easiest approach
> > is to add more state to the flag, so that the RCU callback would check
> > to see if reallocation had already happened. If so, it would update the
> > state to indicate that the rcu_head was again available, and would need to
> > repost itself if the block had been freed again after being reallocated.
> >
> > The other approach would be to defer actually adding the block to the
> > freelist until the grace period expired. This would be more accurate,
> > but also quite a bit more intrusive.
>
> We already talked about deferring the actual freeing in kmemcheck to
> better detect these use-after-free conditions with Vegard. So it's
> something that we probably want to do regardless of RCU.
Then it is especially important that the rcu_head be pre-allocated.
Otherwise we could get into out-of-memory deadlocks where a free
operation is blocked by an allocation operation. ;-)
Thanx, Paul
^ permalink raw reply [flat|nested] 69+ messages in thread
end of thread, other threads:[~2008-04-03 21:28 UTC | newest]
Thread overview: 69+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-04-01 21:08 kmemcheck caught read from freed memory (cfq_free_io_context) Vegard Nossum
2008-04-01 21:36 ` Peter Zijlstra
2008-04-01 22:51 ` Paul E. McKenney
2008-04-02 6:15 ` Peter Zijlstra
2008-04-02 7:19 ` Jens Axboe
2008-04-02 10:24 ` Paul E. McKenney
2008-04-02 7:17 ` Jens Axboe
2008-04-02 7:20 ` Pekka J Enberg
2008-04-02 7:24 ` Jens Axboe
2008-04-02 7:28 ` Ingo Molnar
2008-04-02 7:31 ` Jens Axboe
2008-04-02 10:55 ` Paul E. McKenney
2008-04-02 10:59 ` Peter Zijlstra
2008-04-02 11:33 ` Fabio Checconi
2008-04-02 11:43 ` Jens Axboe
2008-04-02 12:36 ` Jens Axboe
2008-04-02 12:36 ` Jens Axboe
2008-04-02 12:55 ` Fabio Checconi
2008-04-02 12:58 ` Jens Axboe
2008-04-02 12:58 ` Jens Axboe
2008-04-02 13:16 ` Fabio Checconi
2008-04-02 16:14 ` Paul E. McKenney
2008-04-02 13:37 ` Paul E. McKenney
2008-04-02 13:41 ` Jens Axboe
2008-04-02 15:33 ` Paul E. McKenney
2008-04-02 16:31 ` Jens Axboe
2008-04-02 17:00 ` Paul E. McKenney
2008-04-02 13:32 ` Paul E. McKenney
2008-04-02 13:40 ` Jens Axboe
2008-04-02 16:15 ` Paul E. McKenney
2008-04-02 11:01 ` Pekka Enberg
2008-04-02 11:07 ` Jens Axboe
2008-04-02 11:08 ` Peter Zijlstra
2008-04-02 11:11 ` Pekka Enberg
2008-04-02 11:14 ` Peter Zijlstra
2008-04-02 11:18 ` Pekka Enberg
2008-04-02 17:36 ` Christoph Lameter
2008-04-02 11:14 ` Jens Axboe
2008-04-02 11:20 ` Peter Zijlstra
2008-04-02 11:25 ` Peter Zijlstra
2008-04-02 11:32 ` Jens Axboe
2008-04-02 11:37 ` Peter Zijlstra
2008-04-02 11:42 ` Jens Axboe
2008-04-02 11:47 ` Peter Zijlstra
2008-04-02 11:53 ` Jens Axboe
2008-04-02 12:13 ` Peter Zijlstra
2008-04-02 12:28 ` Jens Axboe
2008-04-02 13:26 ` Paul E. McKenney
2008-04-02 13:43 ` Andi Kleen
2008-04-02 12:26 ` Peter Zijlstra
2008-04-02 12:34 ` Jens Axboe
2008-04-02 16:08 ` Paul E. McKenney
2008-04-02 16:15 ` Vegard Nossum
2008-04-02 16:32 ` Pekka J Enberg
2008-04-02 18:23 ` Paul E. McKenney
2008-04-02 19:53 ` Pekka Enberg
2008-04-02 20:15 ` Paul E. McKenney
2008-04-03 15:18 ` Paul E. McKenney
2008-04-03 19:49 ` Pekka J Enberg
2008-04-03 21:27 ` Paul E. McKenney
2008-04-02 16:59 ` Paul E. McKenney
2008-04-02 17:31 ` Vegard Nossum
2008-04-02 10:40 ` Paul E. McKenney
2008-04-02 10:46 ` Pekka Enberg
2008-04-02 10:49 ` Peter Zijlstra
2008-04-02 10:54 ` Pekka J Enberg
2008-04-02 17:35 ` Christoph Lameter
2008-04-02 10:53 ` Peter Zijlstra
2008-04-02 11:13 ` Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).