LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* Re: [PATCH] "volatile considered harmful" document
       [not found] ` <fa.fKNBJtZJWOQthlLjc1TDfY6jCLc@ifi.uio.no>
@ 2007-05-12 18:32   ` Robert Hancock
  2007-05-13 16:30     ` Krzysztof Halasa
  0 siblings, 1 reply; 25+ messages in thread
From: Robert Hancock @ 2007-05-12 18:32 UTC (permalink / raw)
  To: Bill Davidsen, linux-kernel; +Cc: Jonathan Corbet

Bill Davidsen wrote:
> Jonathan Corbet wrote:
> 
>> +There are still a few rare situations where volatile makes sense in the
>> +kernel:
>> +
>> +  - The above-mentioned accessor functions might use volatile on
>> +    architectures where direct I/O memory access does work.  
>> Essentially,
>> +    each accessor call becomes a little critical section on its own and
>> +    ensures that the access happens as expected by the programmer.
>> +
>> +  - Inline assembly code which changes memory, but which has no other
>> +    visible side effects, risks being deleted by GCC.  Adding the 
>> volatile
>> +    keyword to asm statements will prevent this removal.
>> +
>> +  - The jiffies variable is special in that it can have a different 
>> value
>> +    every time it is referenced, but it can be read without any special
>> +    locking.  So jiffies can be volatile, but the addition of other
>> +    variables of this type is frowned upon.  Jiffies is considered to 
>> be a
>> +    "stupid legacy" issue in this regard.
> 
> It would seem that any variable which is (a) subject to change by other 
> threads or hardware, and (b) the value of which is going to be used 
> without writing the variable, would be a valid use for volatile.

You don't need volatile in that case, rmb() can be used.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] "volatile considered harmful" document
  2007-05-12 18:32   ` [PATCH] "volatile considered harmful" document Robert Hancock
@ 2007-05-13 16:30     ` Krzysztof Halasa
  2007-05-13 23:26       ` Bill Davidsen
  0 siblings, 1 reply; 25+ messages in thread
From: Krzysztof Halasa @ 2007-05-13 16:30 UTC (permalink / raw)
  To: Robert Hancock; +Cc: Bill Davidsen, linux-kernel, Jonathan Corbet

Robert Hancock <hancockr@shaw.ca> writes:

> You don't need volatile in that case, rmb() can be used.

rmb() invalidates all compiler assumptions, it can be much slower.
-- 
Krzysztof Halasa

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] "volatile considered harmful" document
  2007-05-13 16:30     ` Krzysztof Halasa
@ 2007-05-13 23:26       ` Bill Davidsen
  2007-05-13 23:53         ` Jeff Garzik
  0 siblings, 1 reply; 25+ messages in thread
From: Bill Davidsen @ 2007-05-13 23:26 UTC (permalink / raw)
  To: Krzysztof Halasa; +Cc: Robert Hancock, linux-kernel, Jonathan Corbet

Krzysztof Halasa wrote:
> Robert Hancock <hancockr@shaw.ca> writes:
>
>   
>> You don't need volatile in that case, rmb() can be used.
>>     
>
> rmb() invalidates all compiler assumptions, it can be much slower.
>   
Yes, why would you use rmb() when a read of a volatile generates optimal 
code?

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] "volatile considered harmful" document
  2007-05-13 23:26       ` Bill Davidsen
@ 2007-05-13 23:53         ` Jeff Garzik
  2007-05-14 14:10           ` Bill Davidsen
  0 siblings, 1 reply; 25+ messages in thread
From: Jeff Garzik @ 2007-05-13 23:53 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Krzysztof Halasa, Robert Hancock, linux-kernel, Jonathan Corbet

On Sun, May 13, 2007 at 07:26:13PM -0400, Bill Davidsen wrote:
> Krzysztof Halasa wrote:
> >Robert Hancock <hancockr@shaw.ca> writes:
> >>You don't need volatile in that case, rmb() can be used.

> >rmb() invalidates all compiler assumptions, it can be much slower.

It does not invalidate /all/ assumptions.


> Yes, why would you use rmb() when a read of a volatile generates optimal 
> code?

Read of a volatile is guaranteed to generate the least optimal code.
That's what volatile does, guarantee no optimization of that particular
access.

	Jeff




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] "volatile considered harmful" document
  2007-05-13 23:53         ` Jeff Garzik
@ 2007-05-14 14:10           ` Bill Davidsen
  2007-05-14 14:21             ` [2.6.21] circular locking dependency found in QUOTA OFF Folkert van Heusden
  0 siblings, 1 reply; 25+ messages in thread
From: Bill Davidsen @ 2007-05-14 14:10 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Krzysztof Halasa, Robert Hancock, linux-kernel, Jonathan Corbet

Jeff Garzik wrote:
> On Sun, May 13, 2007 at 07:26:13PM -0400, Bill Davidsen wrote:
>   
>> Krzysztof Halasa wrote:
>>     
>>> Robert Hancock <hancockr@shaw.ca> writes:
>>>       
>>>> You don't need volatile in that case, rmb() can be used.
>>>>         
>
>   
>>> rmb() invalidates all compiler assumptions, it can be much slower.
>>>       
>
> It does not invalidate /all/ assumptions.
>
>
>   
>> Yes, why would you use rmb() when a read of a volatile generates optimal 
>> code?
>>     
>
> Read of a volatile is guaranteed to generate the least optimal code.
> That's what volatile does, guarantee no optimization of that particular
> access.
>   
By optimal you seem to mean "generate fewer CPU cycles by risking use of 
an obsolete value," while by the same term I mean read the correct and 
current value from the memory location without the overhead of locks. If 
your logic doesn't require the correct value, why read it at all? And if 
it does, how fewer cycles and cache impact can anything have than a 
single register load from memory?

Locks are useful when the value will be changed by a thread, or when the 
value must not be changed briefly. That's not always the case.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [2.6.21] circular locking dependency found in QUOTA OFF
  2007-05-14 14:10           ` Bill Davidsen
@ 2007-05-14 14:21             ` Folkert van Heusden
  2007-05-14 17:43               ` Michal Piotrowski
  0 siblings, 1 reply; 25+ messages in thread
From: Folkert van Heusden @ 2007-05-14 14:21 UTC (permalink / raw)
  To: linux-kernel

Hi,

When I cleanly reboot my pc running 2.6.21 on a P4 with HT and 2GB of ram
and system on an 1-filesystem IDE disk, I get the following circular
locking dependency error:

[330961.226405] =======================================================
[330961.226489] [ INFO: possible circular locking dependency detected ]
[330961.226531] 2.6.21 #5
[330961.226569] -------------------------------------------------------
[330961.226611] quotaoff/12249 is trying to acquire lock:
[330961.226652]  (&sb->s_type->i_mutex_key#4){--..}, at: [<c120e2a1>]
mutex_lock+0x8/0xa
[330961.226861]
[330961.226862] but task is already holding lock:
[330961.226938]  (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
mutex_lock+0x8/0xa
[330961.227111]
[330961.227111] which lock already depends on the new lock.
[330961.227112]
[330961.227225]
[330961.227225] the existing dependency chain (in reverse order) is:
[330961.227303]
[330961.227303] -> #1 (&s->s_dquot.dqonoff_mutex){--..}:
[330961.227473]        [<c1039b02>] check_prev_add+0x15b/0x281
[330961.227766]        [<c1039cb3>] check_prevs_add+0x8b/0xe8
[330961.228056]        [<c103b683>] __lock_acquire+0x692/0xb81
[330961.228353]        [<c103bfda>] lock_acquire+0x62/0x81
[330961.228643]        [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
[330961.228934]        [<c120e2a1>] mutex_lock+0x8/0xa
[330961.229221]        [<c109fbbe>] vfs_quota_on_inode+0xc1/0x25f
[330961.229513]        [<c109fdd1>] vfs_quota_on+0x75/0x79
[330961.229803]        [<c10bc92d>] ext3_quota_on+0x95/0xb0
[330961.230093]        [<c10a1eb2>] do_quotactl+0xc9/0x2dd
[330961.230384]        [<c10a214a>] sys_quotactl+0x84/0xd6
[330961.230673]        [<c1003f74>] syscall_call+0x7/0xb
[330961.230963]        [<ffffffff>] 0xffffffff
[330961.231268]
[330961.231268] -> #0 (&sb->s_type->i_mutex_key#4){--..}:
[330961.231469]        [<c10399db>] check_prev_add+0x34/0x281
[330961.231759]        [<c1039cb3>] check_prevs_add+0x8b/0xe8
[330961.232049]        [<c103b683>] __lock_acquire+0x692/0xb81
[330961.232344]        [<c103bfda>] lock_acquire+0x62/0x81
[330961.232632]        [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
[330961.232923]        [<c120e2a1>] mutex_lock+0x8/0xa
[330961.233211]        [<c109fa6c>] vfs_quota_off+0x1cf/0x260
[330961.233500]        [<c10a2088>] do_quotactl+0x29f/0x2dd
[330961.233792]        [<c10a214a>] sys_quotactl+0x84/0xd6
[330961.234081]        [<c1003f74>] syscall_call+0x7/0xb
[330961.234503]        [<ffffffff>] 0xffffffff
[330961.234795]
[330961.234795] other info that might help us debug this:
[330961.234796]
[330961.234908] 2 locks held by quotaoff/12249:
[330961.234947]  #0:  (&type->s_umount_key#15){----}, at: [<c1070b5d>]
get_super+0x53/0x94
[330961.235183]  #1:  (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
mutex_lock+0x8/0xa
[330961.235386]
[330961.235387] stack backtrace:
[330961.235462]  [<c1004d53>] show_trace_log_lvl+0x1a/0x30
[330961.235535]  [<c1004d7b>] show_trace+0x12/0x14
[330961.235606]  [<c1004e75>] dump_stack+0x16/0x18
[330961.235679]  [<c1039352>] print_circular_bug_tail+0x6f/0x71
[330961.235753]  [<c10399db>] check_prev_add+0x34/0x281
[330961.235825]  [<c1039cb3>] check_prevs_add+0x8b/0xe8
[330961.235897]  [<c103b683>] __lock_acquire+0x692/0xb81
[330961.235969]  [<c103bfda>] lock_acquire+0x62/0x81
[330961.236041]  [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
[330961.236113]  [<c120e2a1>] mutex_lock+0x8/0xa
[330961.236185]  [<c109fa6c>] vfs_quota_off+0x1cf/0x260
[330961.236257]  [<c10a2088>] do_quotactl+0x29f/0x2dd
[330961.236330]  [<c10a214a>] sys_quotactl+0x84/0xd6
[330961.236402]  [<c1003f74>] syscall_call+0x7/0xb
[330961.236473]  =======================


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [2.6.21] circular locking dependency found in QUOTA OFF
  2007-05-14 14:21             ` [2.6.21] circular locking dependency found in QUOTA OFF Folkert van Heusden
@ 2007-05-14 17:43               ` Michal Piotrowski
  2007-05-14 17:44                 ` Folkert van Heusden
  2007-05-15 14:09                 ` Jan Kara
  0 siblings, 2 replies; 25+ messages in thread
From: Michal Piotrowski @ 2007-05-14 17:43 UTC (permalink / raw)
  To: Folkert van Heusden; +Cc: Jan Kara, linux-fsdevel, linux-kernel

[adding Jan and fsdevel to CC]

Hi Folkert,

On 14/05/07, Folkert van Heusden <folkert@vanheusden.com> wrote:
> Hi,
>
> When I cleanly reboot my pc running 2.6.21 on a P4 with HT and 2GB of ram
> and system on an 1-filesystem IDE disk, I get the following circular
> locking dependency error:
>
> [330961.226405] =======================================================
> [330961.226489] [ INFO: possible circular locking dependency detected ]
> [330961.226531] 2.6.21 #5
> [330961.226569] -------------------------------------------------------
> [330961.226611] quotaoff/12249 is trying to acquire lock:
> [330961.226652]  (&sb->s_type->i_mutex_key#4){--..}, at: [<c120e2a1>]
> mutex_lock+0x8/0xa
> [330961.226861]
> [330961.226862] but task is already holding lock:
> [330961.226938]  (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
> mutex_lock+0x8/0xa
> [330961.227111]
> [330961.227111] which lock already depends on the new lock.
> [330961.227112]
> [330961.227225]
> [330961.227225] the existing dependency chain (in reverse order) is:
> [330961.227303]
> [330961.227303] -> #1 (&s->s_dquot.dqonoff_mutex){--..}:
> [330961.227473]        [<c1039b02>] check_prev_add+0x15b/0x281
> [330961.227766]        [<c1039cb3>] check_prevs_add+0x8b/0xe8
> [330961.228056]        [<c103b683>] __lock_acquire+0x692/0xb81
> [330961.228353]        [<c103bfda>] lock_acquire+0x62/0x81
> [330961.228643]        [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> [330961.228934]        [<c120e2a1>] mutex_lock+0x8/0xa
> [330961.229221]        [<c109fbbe>] vfs_quota_on_inode+0xc1/0x25f
> [330961.229513]        [<c109fdd1>] vfs_quota_on+0x75/0x79
> [330961.229803]        [<c10bc92d>] ext3_quota_on+0x95/0xb0
> [330961.230093]        [<c10a1eb2>] do_quotactl+0xc9/0x2dd
> [330961.230384]        [<c10a214a>] sys_quotactl+0x84/0xd6
> [330961.230673]        [<c1003f74>] syscall_call+0x7/0xb
> [330961.230963]        [<ffffffff>] 0xffffffff
> [330961.231268]
> [330961.231268] -> #0 (&sb->s_type->i_mutex_key#4){--..}:
> [330961.231469]        [<c10399db>] check_prev_add+0x34/0x281
> [330961.231759]        [<c1039cb3>] check_prevs_add+0x8b/0xe8
> [330961.232049]        [<c103b683>] __lock_acquire+0x692/0xb81
> [330961.232344]        [<c103bfda>] lock_acquire+0x62/0x81
> [330961.232632]        [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> [330961.232923]        [<c120e2a1>] mutex_lock+0x8/0xa
> [330961.233211]        [<c109fa6c>] vfs_quota_off+0x1cf/0x260
> [330961.233500]        [<c10a2088>] do_quotactl+0x29f/0x2dd
> [330961.233792]        [<c10a214a>] sys_quotactl+0x84/0xd6
> [330961.234081]        [<c1003f74>] syscall_call+0x7/0xb
> [330961.234503]        [<ffffffff>] 0xffffffff
> [330961.234795]
> [330961.234795] other info that might help us debug this:
> [330961.234796]
> [330961.234908] 2 locks held by quotaoff/12249:
> [330961.234947]  #0:  (&type->s_umount_key#15){----}, at: [<c1070b5d>]
> get_super+0x53/0x94
> [330961.235183]  #1:  (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
> mutex_lock+0x8/0xa
> [330961.235386]
> [330961.235387] stack backtrace:
> [330961.235462]  [<c1004d53>] show_trace_log_lvl+0x1a/0x30
> [330961.235535]  [<c1004d7b>] show_trace+0x12/0x14
> [330961.235606]  [<c1004e75>] dump_stack+0x16/0x18
> [330961.235679]  [<c1039352>] print_circular_bug_tail+0x6f/0x71
> [330961.235753]  [<c10399db>] check_prev_add+0x34/0x281
> [330961.235825]  [<c1039cb3>] check_prevs_add+0x8b/0xe8
> [330961.235897]  [<c103b683>] __lock_acquire+0x692/0xb81
> [330961.235969]  [<c103bfda>] lock_acquire+0x62/0x81
> [330961.236041]  [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> [330961.236113]  [<c120e2a1>] mutex_lock+0x8/0xa
> [330961.236185]  [<c109fa6c>] vfs_quota_off+0x1cf/0x260
> [330961.236257]  [<c10a2088>] do_quotactl+0x29f/0x2dd
> [330961.236330]  [<c10a214a>] sys_quotactl+0x84/0xd6
> [330961.236402]  [<c1003f74>] syscall_call+0x7/0xb
> [330961.236473]  =======================
>

Is this a 2.6.21 regression?

Regards,
Michal

-- 
Michal K. K. Piotrowski
Kernel Monkeys
(http://kernel.wikidot.com/start)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [2.6.21] circular locking dependency found in QUOTA OFF
  2007-05-14 17:43               ` Michal Piotrowski
@ 2007-05-14 17:44                 ` Folkert van Heusden
  2007-05-15 14:09                 ` Jan Kara
  1 sibling, 0 replies; 25+ messages in thread
From: Folkert van Heusden @ 2007-05-14 17:44 UTC (permalink / raw)
  To: Michal Piotrowski; +Cc: Jan Kara, linux-fsdevel, linux-kernel

> [adding Jan and fsdevel to CC]
> Hi Folkert,
> >When I cleanly reboot my pc running 2.6.21 on a P4 with HT and 2GB of ram
> >and system on an 1-filesystem IDE disk, I get the following circular
> >locking dependency error:
> >
> >[330961.226405] =======================================================
> >[330961.226489] [ INFO: possible circular locking dependency detected ]
> >[330961.226531] 2.6.21 #5
...
> >[330961.236402]  [<c1003f74>] syscall_call+0x7/0xb
> >[330961.236473]  =======================
> 
> Is this a 2.6.21 regression?

This is new for 2.6.21, yes.


Folkert van Heusden

-- 
MultiTail est un flexible tool pour suivre de logfiles et execution de
commandements. Filtrer, pourvoir de couleur, merge, 'diff-view', etc.
http://www.vanheusden.com/multitail/
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [2.6.21] circular locking dependency found in QUOTA OFF
  2007-05-14 17:43               ` Michal Piotrowski
  2007-05-14 17:44                 ` Folkert van Heusden
@ 2007-05-15 14:09                 ` Jan Kara
  2007-05-15 17:52                   ` Folkert van Heusden
  1 sibling, 1 reply; 25+ messages in thread
From: Jan Kara @ 2007-05-15 14:09 UTC (permalink / raw)
  To: Michal Piotrowski
  Cc: akpm, Folkert van Heusden, Jan Kara, linux-fsdevel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4579 bytes --]

  Hi,

  Thanks for report. It seems like a lockdep problem. i_mutex for quota
files is below dqonoff_sem. We were already fixing this for filesystem
specific quota IO functions but obviously we've missed a few cases. I
wonder why it showed up only now... Anyway, attached is a fix. Andrew,
would you add it? Thanks.

								Honza

On Mon 14-05-07 19:43:18, Michal Piotrowski wrote:
> [adding Jan and fsdevel to CC]
> 
> Hi Folkert,
> 
> On 14/05/07, Folkert van Heusden <folkert@vanheusden.com> wrote:
> >Hi,
> >
> >When I cleanly reboot my pc running 2.6.21 on a P4 with HT and 2GB of ram
> >and system on an 1-filesystem IDE disk, I get the following circular
> >locking dependency error:
> >
> >[330961.226405] =======================================================
> >[330961.226489] [ INFO: possible circular locking dependency detected ]
> >[330961.226531] 2.6.21 #5
> >[330961.226569] -------------------------------------------------------
> >[330961.226611] quotaoff/12249 is trying to acquire lock:
> >[330961.226652]  (&sb->s_type->i_mutex_key#4){--..}, at: [<c120e2a1>]
> >mutex_lock+0x8/0xa
> >[330961.226861]
> >[330961.226862] but task is already holding lock:
> >[330961.226938]  (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
> >mutex_lock+0x8/0xa
> >[330961.227111]
> >[330961.227111] which lock already depends on the new lock.
> >[330961.227112]
> >[330961.227225]
> >[330961.227225] the existing dependency chain (in reverse order) is:
> >[330961.227303]
> >[330961.227303] -> #1 (&s->s_dquot.dqonoff_mutex){--..}:
> >[330961.227473]        [<c1039b02>] check_prev_add+0x15b/0x281
> >[330961.227766]        [<c1039cb3>] check_prevs_add+0x8b/0xe8
> >[330961.228056]        [<c103b683>] __lock_acquire+0x692/0xb81
> >[330961.228353]        [<c103bfda>] lock_acquire+0x62/0x81
> >[330961.228643]        [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> >[330961.228934]        [<c120e2a1>] mutex_lock+0x8/0xa
> >[330961.229221]        [<c109fbbe>] vfs_quota_on_inode+0xc1/0x25f
> >[330961.229513]        [<c109fdd1>] vfs_quota_on+0x75/0x79
> >[330961.229803]        [<c10bc92d>] ext3_quota_on+0x95/0xb0
> >[330961.230093]        [<c10a1eb2>] do_quotactl+0xc9/0x2dd
> >[330961.230384]        [<c10a214a>] sys_quotactl+0x84/0xd6
> >[330961.230673]        [<c1003f74>] syscall_call+0x7/0xb
> >[330961.230963]        [<ffffffff>] 0xffffffff
> >[330961.231268]
> >[330961.231268] -> #0 (&sb->s_type->i_mutex_key#4){--..}:
> >[330961.231469]        [<c10399db>] check_prev_add+0x34/0x281
> >[330961.231759]        [<c1039cb3>] check_prevs_add+0x8b/0xe8
> >[330961.232049]        [<c103b683>] __lock_acquire+0x692/0xb81
> >[330961.232344]        [<c103bfda>] lock_acquire+0x62/0x81
> >[330961.232632]        [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> >[330961.232923]        [<c120e2a1>] mutex_lock+0x8/0xa
> >[330961.233211]        [<c109fa6c>] vfs_quota_off+0x1cf/0x260
> >[330961.233500]        [<c10a2088>] do_quotactl+0x29f/0x2dd
> >[330961.233792]        [<c10a214a>] sys_quotactl+0x84/0xd6
> >[330961.234081]        [<c1003f74>] syscall_call+0x7/0xb
> >[330961.234503]        [<ffffffff>] 0xffffffff
> >[330961.234795]
> >[330961.234795] other info that might help us debug this:
> >[330961.234796]
> >[330961.234908] 2 locks held by quotaoff/12249:
> >[330961.234947]  #0:  (&type->s_umount_key#15){----}, at: [<c1070b5d>]
> >get_super+0x53/0x94
> >[330961.235183]  #1:  (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
> >mutex_lock+0x8/0xa
> >[330961.235386]
> >[330961.235387] stack backtrace:
> >[330961.235462]  [<c1004d53>] show_trace_log_lvl+0x1a/0x30
> >[330961.235535]  [<c1004d7b>] show_trace+0x12/0x14
> >[330961.235606]  [<c1004e75>] dump_stack+0x16/0x18
> >[330961.235679]  [<c1039352>] print_circular_bug_tail+0x6f/0x71
> >[330961.235753]  [<c10399db>] check_prev_add+0x34/0x281
> >[330961.235825]  [<c1039cb3>] check_prevs_add+0x8b/0xe8
> >[330961.235897]  [<c103b683>] __lock_acquire+0x692/0xb81
> >[330961.235969]  [<c103bfda>] lock_acquire+0x62/0x81
> >[330961.236041]  [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> >[330961.236113]  [<c120e2a1>] mutex_lock+0x8/0xa
> >[330961.236185]  [<c109fa6c>] vfs_quota_off+0x1cf/0x260
> >[330961.236257]  [<c10a2088>] do_quotactl+0x29f/0x2dd
> >[330961.236330]  [<c10a214a>] sys_quotactl+0x84/0xd6
> >[330961.236402]  [<c1003f74>] syscall_call+0x7/0xb
> >[330961.236473]  =======================
> >
> 
> Is this a 2.6.21 regression?
> 
> Regards,
> Michal
> 
> -- 
> Michal K. K. Piotrowski
> Kernel Monkeys
> (http://kernel.wikidot.com/start)
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

[-- Attachment #2: quota-2.6.21-1-quota_lockdep.diff --]
[-- Type: text/x-patch, Size: 2940 bytes --]

i_mutex on quota files is special. Unlike i_mutexes for other inodes it is
acquired under dqonoff_mutex. Tell lockdep about this lock ranking. Also
comment and code in quota_sync_sb() seem to be bogus (as i_mutex for quota
file can be acquired under dqonoff_mutex). Move truncate_inode_pages()
call under dqonoff_mutex and save some problems with races...

Signed-off-by: Jan Kara <jack@suse.cz>

diff -rupX /home/jack/.kerndiffexclude linux-2.6.21/fs/dquot.c linux-2.6.21-1-quota_lockdep/fs/dquot.c
--- linux-2.6.21/fs/dquot.c	2007-05-15 14:18:47.000000000 +0200
+++ linux-2.6.21-1-quota_lockdep/fs/dquot.c	2007-05-15 14:22:47.000000000 +0200
@@ -1421,7 +1421,7 @@ int vfs_quota_off(struct super_block *sb
 			/* If quota was reenabled in the meantime, we have
 			 * nothing to do */
 			if (!sb_has_quota_enabled(sb, cnt)) {
-				mutex_lock(&toputinode[cnt]->i_mutex);
+				mutex_lock_nested(&toputinode[cnt]->i_mutex, I_MUTEX_QUOTA);
 				toputinode[cnt]->i_flags &= ~(S_IMMUTABLE |
 				  S_NOATIME | S_NOQUOTA);
 				truncate_inode_pages(&toputinode[cnt]->i_data, 0);
diff -rupX /home/jack/.kerndiffexclude linux-2.6.21/fs/quota.c linux-2.6.21-1-quota_lockdep/fs/quota.c
--- linux-2.6.21/fs/quota.c	2006-11-29 22:57:37.000000000 +0100
+++ linux-2.6.21-1-quota_lockdep/fs/quota.c	2007-05-15 15:15:44.000000000 +0200
@@ -158,7 +158,6 @@ static int check_quotactl_valid(struct s
 static void quota_sync_sb(struct super_block *sb, int type)
 {
 	int cnt;
-	struct inode *discard[MAXQUOTAS];
 
 	sb->s_qcop->quota_sync(sb, type);
 	/* This is not very clever (and fast) but currently I don't know about
@@ -168,29 +167,21 @@ static void quota_sync_sb(struct super_b
 		sb->s_op->sync_fs(sb, 1);
 	sync_blockdev(sb->s_bdev);
 
-	/* Now when everything is written we can discard the pagecache so
-	 * that userspace sees the changes. We need i_mutex and so we could
-	 * not do it inside dqonoff_mutex. Moreover we need to be carefull
-	 * about races with quotaoff() (that is the reason why we have own
-	 * reference to inode). */
+	/*
+	 * Now when everything is written we can discard the pagecache so
+	 * that userspace sees the changes.
+	 */
 	mutex_lock(&sb_dqopt(sb)->dqonoff_mutex);
 	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
-		discard[cnt] = NULL;
 		if (type != -1 && cnt != type)
 			continue;
 		if (!sb_has_quota_enabled(sb, cnt))
 			continue;
-		discard[cnt] = igrab(sb_dqopt(sb)->files[cnt]);
+		mutex_lock_nested(&sb_dqopt(sb)->files[cnt]->i_mutex, I_MUTEX_NESTED);
+		truncate_inode_pages(&sb_dqopt(sb)->files[cnt]->i_data, 0);
+		mutex_unlock(&sb_dqopt(sb)->files[cnt]->i_mutex);
 	}
 	mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex);
-	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
-		if (discard[cnt]) {
-			mutex_lock(&discard[cnt]->i_mutex);
-			truncate_inode_pages(&discard[cnt]->i_data, 0);
-			mutex_unlock(&discard[cnt]->i_mutex);
-			iput(discard[cnt]);
-		}
-	}
 }
 
 void sync_dquots(struct super_block *sb, int type)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [2.6.21] circular locking dependency found in QUOTA OFF
  2007-05-15 14:09                 ` Jan Kara
@ 2007-05-15 17:52                   ` Folkert van Heusden
  2007-05-15 18:14                     ` Jan Kara
  2007-05-15 19:18                     ` Jan Kara
  0 siblings, 2 replies; 25+ messages in thread
From: Folkert van Heusden @ 2007-05-15 17:52 UTC (permalink / raw)
  To: Jan Kara; +Cc: Michal Piotrowski, akpm, linux-fsdevel, linux-kernel

I'm afraid it doesn't compile:

  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  CHK     include/linux/compile.h
  CC      fs/dquot.o
  CC      fs/quota.o
fs/quota.c: In function `quota_sync_sb':
fs/quota.c:180: error: `I_MUTEX_NESTED' undeclared (first use in this function)
fs/quota.c:180: error: (Each undeclared identifier is reported only once
fs/quota.c:180: error: for each function it appears in.)
make[1]: *** [fs/quota.o] Error 1
make: *** [fs] Error 2


On Tue, May 15, 2007 at 04:09:39PM +0200, Jan Kara wrote:
>   Hi,
> 
>   Thanks for report. It seems like a lockdep problem. i_mutex for quota
> files is below dqonoff_sem. We were already fixing this for filesystem
> specific quota IO functions but obviously we've missed a few cases. I
> wonder why it showed up only now... Anyway, attached is a fix. Andrew,
> would you add it? Thanks.
> 
> 								Honza
> 
> On Mon 14-05-07 19:43:18, Michal Piotrowski wrote:
> > [adding Jan and fsdevel to CC]
> > 
> > Hi Folkert,
> > 
> > On 14/05/07, Folkert van Heusden <folkert@vanheusden.com> wrote:
> > >Hi,
> > >
> > >When I cleanly reboot my pc running 2.6.21 on a P4 with HT and 2GB of ram
> > >and system on an 1-filesystem IDE disk, I get the following circular
> > >locking dependency error:
> > >
> > >[330961.226405] =======================================================
> > >[330961.226489] [ INFO: possible circular locking dependency detected ]
> > >[330961.226531] 2.6.21 #5
> > >[330961.226569] -------------------------------------------------------
> > >[330961.226611] quotaoff/12249 is trying to acquire lock:
> > >[330961.226652]  (&sb->s_type->i_mutex_key#4){--..}, at: [<c120e2a1>]
> > >mutex_lock+0x8/0xa
> > >[330961.226861]
> > >[330961.226862] but task is already holding lock:
> > >[330961.226938]  (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
> > >mutex_lock+0x8/0xa
> > >[330961.227111]
> > >[330961.227111] which lock already depends on the new lock.
> > >[330961.227112]
> > >[330961.227225]
> > >[330961.227225] the existing dependency chain (in reverse order) is:
> > >[330961.227303]
> > >[330961.227303] -> #1 (&s->s_dquot.dqonoff_mutex){--..}:
> > >[330961.227473]        [<c1039b02>] check_prev_add+0x15b/0x281
> > >[330961.227766]        [<c1039cb3>] check_prevs_add+0x8b/0xe8
> > >[330961.228056]        [<c103b683>] __lock_acquire+0x692/0xb81
> > >[330961.228353]        [<c103bfda>] lock_acquire+0x62/0x81
> > >[330961.228643]        [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> > >[330961.228934]        [<c120e2a1>] mutex_lock+0x8/0xa
> > >[330961.229221]        [<c109fbbe>] vfs_quota_on_inode+0xc1/0x25f
> > >[330961.229513]        [<c109fdd1>] vfs_quota_on+0x75/0x79
> > >[330961.229803]        [<c10bc92d>] ext3_quota_on+0x95/0xb0
> > >[330961.230093]        [<c10a1eb2>] do_quotactl+0xc9/0x2dd
> > >[330961.230384]        [<c10a214a>] sys_quotactl+0x84/0xd6
> > >[330961.230673]        [<c1003f74>] syscall_call+0x7/0xb
> > >[330961.230963]        [<ffffffff>] 0xffffffff
> > >[330961.231268]
> > >[330961.231268] -> #0 (&sb->s_type->i_mutex_key#4){--..}:
> > >[330961.231469]        [<c10399db>] check_prev_add+0x34/0x281
> > >[330961.231759]        [<c1039cb3>] check_prevs_add+0x8b/0xe8
> > >[330961.232049]        [<c103b683>] __lock_acquire+0x692/0xb81
> > >[330961.232344]        [<c103bfda>] lock_acquire+0x62/0x81
> > >[330961.232632]        [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> > >[330961.232923]        [<c120e2a1>] mutex_lock+0x8/0xa
> > >[330961.233211]        [<c109fa6c>] vfs_quota_off+0x1cf/0x260
> > >[330961.233500]        [<c10a2088>] do_quotactl+0x29f/0x2dd
> > >[330961.233792]        [<c10a214a>] sys_quotactl+0x84/0xd6
> > >[330961.234081]        [<c1003f74>] syscall_call+0x7/0xb
> > >[330961.234503]        [<ffffffff>] 0xffffffff
> > >[330961.234795]
> > >[330961.234795] other info that might help us debug this:
> > >[330961.234796]
> > >[330961.234908] 2 locks held by quotaoff/12249:
> > >[330961.234947]  #0:  (&type->s_umount_key#15){----}, at: [<c1070b5d>]
> > >get_super+0x53/0x94
> > >[330961.235183]  #1:  (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
> > >mutex_lock+0x8/0xa
> > >[330961.235386]
> > >[330961.235387] stack backtrace:
> > >[330961.235462]  [<c1004d53>] show_trace_log_lvl+0x1a/0x30
> > >[330961.235535]  [<c1004d7b>] show_trace+0x12/0x14
> > >[330961.235606]  [<c1004e75>] dump_stack+0x16/0x18
> > >[330961.235679]  [<c1039352>] print_circular_bug_tail+0x6f/0x71
> > >[330961.235753]  [<c10399db>] check_prev_add+0x34/0x281
> > >[330961.235825]  [<c1039cb3>] check_prevs_add+0x8b/0xe8
> > >[330961.235897]  [<c103b683>] __lock_acquire+0x692/0xb81
> > >[330961.235969]  [<c103bfda>] lock_acquire+0x62/0x81
> > >[330961.236041]  [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> > >[330961.236113]  [<c120e2a1>] mutex_lock+0x8/0xa
> > >[330961.236185]  [<c109fa6c>] vfs_quota_off+0x1cf/0x260
> > >[330961.236257]  [<c10a2088>] do_quotactl+0x29f/0x2dd
> > >[330961.236330]  [<c10a214a>] sys_quotactl+0x84/0xd6
> > >[330961.236402]  [<c1003f74>] syscall_call+0x7/0xb
> > >[330961.236473]  =======================
> > >
> > 
> > Is this a 2.6.21 regression?
> > 
> > Regards,
> > Michal
> > 
> > -- 
> > Michal K. K. Piotrowski
> > Kernel Monkeys
> > (http://kernel.wikidot.com/start)
> -- 
> Jan Kara <jack@suse.cz>
> SuSE CR Labs

> i_mutex on quota files is special. Unlike i_mutexes for other inodes it is
> acquired under dqonoff_mutex. Tell lockdep about this lock ranking. Also
> comment and code in quota_sync_sb() seem to be bogus (as i_mutex for quota
> file can be acquired under dqonoff_mutex). Move truncate_inode_pages()
> call under dqonoff_mutex and save some problems with races...
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> 
> diff -rupX /home/jack/.kerndiffexclude linux-2.6.21/fs/dquot.c linux-2.6.21-1-quota_lockdep/fs/dquot.c
> --- linux-2.6.21/fs/dquot.c	2007-05-15 14:18:47.000000000 +0200
> +++ linux-2.6.21-1-quota_lockdep/fs/dquot.c	2007-05-15 14:22:47.000000000 +0200
> @@ -1421,7 +1421,7 @@ int vfs_quota_off(struct super_block *sb
>  			/* If quota was reenabled in the meantime, we have
>  			 * nothing to do */
>  			if (!sb_has_quota_enabled(sb, cnt)) {
> -				mutex_lock(&toputinode[cnt]->i_mutex);
> +				mutex_lock_nested(&toputinode[cnt]->i_mutex, I_MUTEX_QUOTA);
>  				toputinode[cnt]->i_flags &= ~(S_IMMUTABLE |
>  				  S_NOATIME | S_NOQUOTA);
>  				truncate_inode_pages(&toputinode[cnt]->i_data, 0);
> diff -rupX /home/jack/.kerndiffexclude linux-2.6.21/fs/quota.c linux-2.6.21-1-quota_lockdep/fs/quota.c
> --- linux-2.6.21/fs/quota.c	2006-11-29 22:57:37.000000000 +0100
> +++ linux-2.6.21-1-quota_lockdep/fs/quota.c	2007-05-15 15:15:44.000000000 +0200
> @@ -158,7 +158,6 @@ static int check_quotactl_valid(struct s
>  static void quota_sync_sb(struct super_block *sb, int type)
>  {
>  	int cnt;
> -	struct inode *discard[MAXQUOTAS];
>  
>  	sb->s_qcop->quota_sync(sb, type);
>  	/* This is not very clever (and fast) but currently I don't know about
> @@ -168,29 +167,21 @@ static void quota_sync_sb(struct super_b
>  		sb->s_op->sync_fs(sb, 1);
>  	sync_blockdev(sb->s_bdev);
>  
> -	/* Now when everything is written we can discard the pagecache so
> -	 * that userspace sees the changes. We need i_mutex and so we could
> -	 * not do it inside dqonoff_mutex. Moreover we need to be carefull
> -	 * about races with quotaoff() (that is the reason why we have own
> -	 * reference to inode). */
> +	/*
> +	 * Now when everything is written we can discard the pagecache so
> +	 * that userspace sees the changes.
> +	 */
>  	mutex_lock(&sb_dqopt(sb)->dqonoff_mutex);
>  	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
> -		discard[cnt] = NULL;
>  		if (type != -1 && cnt != type)
>  			continue;
>  		if (!sb_has_quota_enabled(sb, cnt))
>  			continue;
> -		discard[cnt] = igrab(sb_dqopt(sb)->files[cnt]);
> +		mutex_lock_nested(&sb_dqopt(sb)->files[cnt]->i_mutex, I_MUTEX_NESTED);
> +		truncate_inode_pages(&sb_dqopt(sb)->files[cnt]->i_data, 0);
> +		mutex_unlock(&sb_dqopt(sb)->files[cnt]->i_mutex);
>  	}
>  	mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex);
> -	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
> -		if (discard[cnt]) {
> -			mutex_lock(&discard[cnt]->i_mutex);
> -			truncate_inode_pages(&discard[cnt]->i_data, 0);
> -			mutex_unlock(&discard[cnt]->i_mutex);
> -			iput(discard[cnt]);
> -		}
> -	}
>  }
>  
>  void sync_dquots(struct super_block *sb, int type)



Folkert van Heusden

-- 
www.vanheusden.com/multitail - win een vlaai van multivlaai! zorg
ervoor dat multitail opgenomen wordt in Fedora Core, AIX, Solaris of
HP/UX en win een vlaai naar keuze
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [2.6.21] circular locking dependency found in QUOTA OFF
  2007-05-15 17:52                   ` Folkert van Heusden
@ 2007-05-15 18:14                     ` Jan Kara
  2007-05-15 19:18                     ` Jan Kara
  1 sibling, 0 replies; 25+ messages in thread
From: Jan Kara @ 2007-05-15 18:14 UTC (permalink / raw)
  To: Folkert van Heusden
  Cc: Jan Kara, Michal Piotrowski, akpm, linux-fsdevel, linux-kernel

On Tue 15-05-07 19:52:03, Folkert van Heusden wrote:
> I'm afraid it doesn't compile:
> 
>   CHK     include/linux/version.h
>   CHK     include/linux/utsrelease.h
>   CHK     include/linux/compile.h
>   CC      fs/dquot.o
>   CC      fs/quota.o
> fs/quota.c: In function `quota_sync_sb':
> fs/quota.c:180: error: `I_MUTEX_NESTED' undeclared (first use in this function)
> fs/quota.c:180: error: (Each undeclared identifier is reported only once
> fs/quota.c:180: error: for each function it appears in.)
> make[1]: *** [fs/quota.o] Error 1
> make: *** [fs] Error 2
  Huh, I has compiled just fine for me... Can you send me your .config?
Thanks.

									Honza

> On Tue, May 15, 2007 at 04:09:39PM +0200, Jan Kara wrote:
> >   Hi,
> > 
> >   Thanks for report. It seems like a lockdep problem. i_mutex for quota
> > files is below dqonoff_sem. We were already fixing this for filesystem
> > specific quota IO functions but obviously we've missed a few cases. I
> > wonder why it showed up only now... Anyway, attached is a fix. Andrew,
> > would you add it? Thanks.
> > 
> > 								Honza
> > 
> > On Mon 14-05-07 19:43:18, Michal Piotrowski wrote:
> > > [adding Jan and fsdevel to CC]
> > > 
> > > Hi Folkert,
> > > 
> > > On 14/05/07, Folkert van Heusden <folkert@vanheusden.com> wrote:
> > > >Hi,
> > > >
> > > >When I cleanly reboot my pc running 2.6.21 on a P4 with HT and 2GB of ram
> > > >and system on an 1-filesystem IDE disk, I get the following circular
> > > >locking dependency error:
> > > >
> > > >[330961.226405] =======================================================
> > > >[330961.226489] [ INFO: possible circular locking dependency detected ]
> > > >[330961.226531] 2.6.21 #5
> > > >[330961.226569] -------------------------------------------------------
> > > >[330961.226611] quotaoff/12249 is trying to acquire lock:
> > > >[330961.226652]  (&sb->s_type->i_mutex_key#4){--..}, at: [<c120e2a1>]
> > > >mutex_lock+0x8/0xa
> > > >[330961.226861]
> > > >[330961.226862] but task is already holding lock:
> > > >[330961.226938]  (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
> > > >mutex_lock+0x8/0xa
> > > >[330961.227111]
> > > >[330961.227111] which lock already depends on the new lock.
> > > >[330961.227112]
> > > >[330961.227225]
> > > >[330961.227225] the existing dependency chain (in reverse order) is:
> > > >[330961.227303]
> > > >[330961.227303] -> #1 (&s->s_dquot.dqonoff_mutex){--..}:
> > > >[330961.227473]        [<c1039b02>] check_prev_add+0x15b/0x281
> > > >[330961.227766]        [<c1039cb3>] check_prevs_add+0x8b/0xe8
> > > >[330961.228056]        [<c103b683>] __lock_acquire+0x692/0xb81
> > > >[330961.228353]        [<c103bfda>] lock_acquire+0x62/0x81
> > > >[330961.228643]        [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> > > >[330961.228934]        [<c120e2a1>] mutex_lock+0x8/0xa
> > > >[330961.229221]        [<c109fbbe>] vfs_quota_on_inode+0xc1/0x25f
> > > >[330961.229513]        [<c109fdd1>] vfs_quota_on+0x75/0x79
> > > >[330961.229803]        [<c10bc92d>] ext3_quota_on+0x95/0xb0
> > > >[330961.230093]        [<c10a1eb2>] do_quotactl+0xc9/0x2dd
> > > >[330961.230384]        [<c10a214a>] sys_quotactl+0x84/0xd6
> > > >[330961.230673]        [<c1003f74>] syscall_call+0x7/0xb
> > > >[330961.230963]        [<ffffffff>] 0xffffffff
> > > >[330961.231268]
> > > >[330961.231268] -> #0 (&sb->s_type->i_mutex_key#4){--..}:
> > > >[330961.231469]        [<c10399db>] check_prev_add+0x34/0x281
> > > >[330961.231759]        [<c1039cb3>] check_prevs_add+0x8b/0xe8
> > > >[330961.232049]        [<c103b683>] __lock_acquire+0x692/0xb81
> > > >[330961.232344]        [<c103bfda>] lock_acquire+0x62/0x81
> > > >[330961.232632]        [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> > > >[330961.232923]        [<c120e2a1>] mutex_lock+0x8/0xa
> > > >[330961.233211]        [<c109fa6c>] vfs_quota_off+0x1cf/0x260
> > > >[330961.233500]        [<c10a2088>] do_quotactl+0x29f/0x2dd
> > > >[330961.233792]        [<c10a214a>] sys_quotactl+0x84/0xd6
> > > >[330961.234081]        [<c1003f74>] syscall_call+0x7/0xb
> > > >[330961.234503]        [<ffffffff>] 0xffffffff
> > > >[330961.234795]
> > > >[330961.234795] other info that might help us debug this:
> > > >[330961.234796]
> > > >[330961.234908] 2 locks held by quotaoff/12249:
> > > >[330961.234947]  #0:  (&type->s_umount_key#15){----}, at: [<c1070b5d>]
> > > >get_super+0x53/0x94
> > > >[330961.235183]  #1:  (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
> > > >mutex_lock+0x8/0xa
> > > >[330961.235386]
> > > >[330961.235387] stack backtrace:
> > > >[330961.235462]  [<c1004d53>] show_trace_log_lvl+0x1a/0x30
> > > >[330961.235535]  [<c1004d7b>] show_trace+0x12/0x14
> > > >[330961.235606]  [<c1004e75>] dump_stack+0x16/0x18
> > > >[330961.235679]  [<c1039352>] print_circular_bug_tail+0x6f/0x71
> > > >[330961.235753]  [<c10399db>] check_prev_add+0x34/0x281
> > > >[330961.235825]  [<c1039cb3>] check_prevs_add+0x8b/0xe8
> > > >[330961.235897]  [<c103b683>] __lock_acquire+0x692/0xb81
> > > >[330961.235969]  [<c103bfda>] lock_acquire+0x62/0x81
> > > >[330961.236041]  [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> > > >[330961.236113]  [<c120e2a1>] mutex_lock+0x8/0xa
> > > >[330961.236185]  [<c109fa6c>] vfs_quota_off+0x1cf/0x260
> > > >[330961.236257]  [<c10a2088>] do_quotactl+0x29f/0x2dd
> > > >[330961.236330]  [<c10a214a>] sys_quotactl+0x84/0xd6
> > > >[330961.236402]  [<c1003f74>] syscall_call+0x7/0xb
> > > >[330961.236473]  =======================
> > > >
> > > 
> > > Is this a 2.6.21 regression?
> > > 
> > > Regards,
> > > Michal
> > > 
> > > -- 
> > > Michal K. K. Piotrowski
> > > Kernel Monkeys
> > > (http://kernel.wikidot.com/start)
> > -- 
> > Jan Kara <jack@suse.cz>
> > SuSE CR Labs
> 
> > i_mutex on quota files is special. Unlike i_mutexes for other inodes it is
> > acquired under dqonoff_mutex. Tell lockdep about this lock ranking. Also
> > comment and code in quota_sync_sb() seem to be bogus (as i_mutex for quota
> > file can be acquired under dqonoff_mutex). Move truncate_inode_pages()
> > call under dqonoff_mutex and save some problems with races...
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > 
> > diff -rupX /home/jack/.kerndiffexclude linux-2.6.21/fs/dquot.c linux-2.6.21-1-quota_lockdep/fs/dquot.c
> > --- linux-2.6.21/fs/dquot.c	2007-05-15 14:18:47.000000000 +0200
> > +++ linux-2.6.21-1-quota_lockdep/fs/dquot.c	2007-05-15 14:22:47.000000000 +0200
> > @@ -1421,7 +1421,7 @@ int vfs_quota_off(struct super_block *sb
> >  			/* If quota was reenabled in the meantime, we have
> >  			 * nothing to do */
> >  			if (!sb_has_quota_enabled(sb, cnt)) {
> > -				mutex_lock(&toputinode[cnt]->i_mutex);
> > +				mutex_lock_nested(&toputinode[cnt]->i_mutex, I_MUTEX_QUOTA);
> >  				toputinode[cnt]->i_flags &= ~(S_IMMUTABLE |
> >  				  S_NOATIME | S_NOQUOTA);
> >  				truncate_inode_pages(&toputinode[cnt]->i_data, 0);
> > diff -rupX /home/jack/.kerndiffexclude linux-2.6.21/fs/quota.c linux-2.6.21-1-quota_lockdep/fs/quota.c
> > --- linux-2.6.21/fs/quota.c	2006-11-29 22:57:37.000000000 +0100
> > +++ linux-2.6.21-1-quota_lockdep/fs/quota.c	2007-05-15 15:15:44.000000000 +0200
> > @@ -158,7 +158,6 @@ static int check_quotactl_valid(struct s
> >  static void quota_sync_sb(struct super_block *sb, int type)
> >  {
> >  	int cnt;
> > -	struct inode *discard[MAXQUOTAS];
> >  
> >  	sb->s_qcop->quota_sync(sb, type);
> >  	/* This is not very clever (and fast) but currently I don't know about
> > @@ -168,29 +167,21 @@ static void quota_sync_sb(struct super_b
> >  		sb->s_op->sync_fs(sb, 1);
> >  	sync_blockdev(sb->s_bdev);
> >  
> > -	/* Now when everything is written we can discard the pagecache so
> > -	 * that userspace sees the changes. We need i_mutex and so we could
> > -	 * not do it inside dqonoff_mutex. Moreover we need to be carefull
> > -	 * about races with quotaoff() (that is the reason why we have own
> > -	 * reference to inode). */
> > +	/*
> > +	 * Now when everything is written we can discard the pagecache so
> > +	 * that userspace sees the changes.
> > +	 */
> >  	mutex_lock(&sb_dqopt(sb)->dqonoff_mutex);
> >  	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
> > -		discard[cnt] = NULL;
> >  		if (type != -1 && cnt != type)
> >  			continue;
> >  		if (!sb_has_quota_enabled(sb, cnt))
> >  			continue;
> > -		discard[cnt] = igrab(sb_dqopt(sb)->files[cnt]);
> > +		mutex_lock_nested(&sb_dqopt(sb)->files[cnt]->i_mutex, I_MUTEX_NESTED);
> > +		truncate_inode_pages(&sb_dqopt(sb)->files[cnt]->i_data, 0);
> > +		mutex_unlock(&sb_dqopt(sb)->files[cnt]->i_mutex);
> >  	}
> >  	mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex);
> > -	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
> > -		if (discard[cnt]) {
> > -			mutex_lock(&discard[cnt]->i_mutex);
> > -			truncate_inode_pages(&discard[cnt]->i_data, 0);
> > -			mutex_unlock(&discard[cnt]->i_mutex);
> > -			iput(discard[cnt]);
> > -		}
> > -	}
> >  }
> >  
> >  void sync_dquots(struct super_block *sb, int type)
> 
> 
> 
> Folkert van Heusden
> 
> -- 
> www.vanheusden.com/multitail - win een vlaai van multivlaai! zorg
> ervoor dat multitail opgenomen wordt in Fedora Core, AIX, Solaris of
> HP/UX en win een vlaai naar keuze
> ----------------------------------------------------------------------
> Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [2.6.21] circular locking dependency found in QUOTA OFF
  2007-05-15 17:52                   ` Folkert van Heusden
  2007-05-15 18:14                     ` Jan Kara
@ 2007-05-15 19:18                     ` Jan Kara
  1 sibling, 0 replies; 25+ messages in thread
From: Jan Kara @ 2007-05-15 19:18 UTC (permalink / raw)
  To: Folkert van Heusden; +Cc: Michal Piotrowski, akpm, linux-fsdevel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 9431 bytes --]

On Tue 15-05-07 19:52:03, Folkert van Heusden wrote:
> I'm afraid it doesn't compile:
> 
>   CHK     include/linux/version.h
>   CHK     include/linux/utsrelease.h
>   CHK     include/linux/compile.h
>   CC      fs/dquot.o
>   CC      fs/quota.o
> fs/quota.c: In function `quota_sync_sb':
> fs/quota.c:180: error: `I_MUTEX_NESTED' undeclared (first use in this function)
> fs/quota.c:180: error: (Each undeclared identifier is reported only once
> fs/quota.c:180: error: for each function it appears in.)
> make[1]: *** [fs/quota.o] Error 1
> make: *** [fs] Error 2
  Fixed patch follows (I messed up testing as I thought that lockdep is
under config option Debug mutexes but it is not...). I'm sorry for the
stupid mistake. Andrew, please discard my previous patch and use this new
one...

									Honza
> On Tue, May 15, 2007 at 04:09:39PM +0200, Jan Kara wrote:
> >   Hi,
> > 
> >   Thanks for report. It seems like a lockdep problem. i_mutex for quota
> > files is below dqonoff_sem. We were already fixing this for filesystem
> > specific quota IO functions but obviously we've missed a few cases. I
> > wonder why it showed up only now... Anyway, attached is a fix. Andrew,
> > would you add it? Thanks.
> > 
> > 								Honza
> > 
> > On Mon 14-05-07 19:43:18, Michal Piotrowski wrote:
> > > [adding Jan and fsdevel to CC]
> > > 
> > > Hi Folkert,
> > > 
> > > On 14/05/07, Folkert van Heusden <folkert@vanheusden.com> wrote:
> > > >Hi,
> > > >
> > > >When I cleanly reboot my pc running 2.6.21 on a P4 with HT and 2GB of ram
> > > >and system on an 1-filesystem IDE disk, I get the following circular
> > > >locking dependency error:
> > > >
> > > >[330961.226405] =======================================================
> > > >[330961.226489] [ INFO: possible circular locking dependency detected ]
> > > >[330961.226531] 2.6.21 #5
> > > >[330961.226569] -------------------------------------------------------
> > > >[330961.226611] quotaoff/12249 is trying to acquire lock:
> > > >[330961.226652]  (&sb->s_type->i_mutex_key#4){--..}, at: [<c120e2a1>]
> > > >mutex_lock+0x8/0xa
> > > >[330961.226861]
> > > >[330961.226862] but task is already holding lock:
> > > >[330961.226938]  (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
> > > >mutex_lock+0x8/0xa
> > > >[330961.227111]
> > > >[330961.227111] which lock already depends on the new lock.
> > > >[330961.227112]
> > > >[330961.227225]
> > > >[330961.227225] the existing dependency chain (in reverse order) is:
> > > >[330961.227303]
> > > >[330961.227303] -> #1 (&s->s_dquot.dqonoff_mutex){--..}:
> > > >[330961.227473]        [<c1039b02>] check_prev_add+0x15b/0x281
> > > >[330961.227766]        [<c1039cb3>] check_prevs_add+0x8b/0xe8
> > > >[330961.228056]        [<c103b683>] __lock_acquire+0x692/0xb81
> > > >[330961.228353]        [<c103bfda>] lock_acquire+0x62/0x81
> > > >[330961.228643]        [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> > > >[330961.228934]        [<c120e2a1>] mutex_lock+0x8/0xa
> > > >[330961.229221]        [<c109fbbe>] vfs_quota_on_inode+0xc1/0x25f
> > > >[330961.229513]        [<c109fdd1>] vfs_quota_on+0x75/0x79
> > > >[330961.229803]        [<c10bc92d>] ext3_quota_on+0x95/0xb0
> > > >[330961.230093]        [<c10a1eb2>] do_quotactl+0xc9/0x2dd
> > > >[330961.230384]        [<c10a214a>] sys_quotactl+0x84/0xd6
> > > >[330961.230673]        [<c1003f74>] syscall_call+0x7/0xb
> > > >[330961.230963]        [<ffffffff>] 0xffffffff
> > > >[330961.231268]
> > > >[330961.231268] -> #0 (&sb->s_type->i_mutex_key#4){--..}:
> > > >[330961.231469]        [<c10399db>] check_prev_add+0x34/0x281
> > > >[330961.231759]        [<c1039cb3>] check_prevs_add+0x8b/0xe8
> > > >[330961.232049]        [<c103b683>] __lock_acquire+0x692/0xb81
> > > >[330961.232344]        [<c103bfda>] lock_acquire+0x62/0x81
> > > >[330961.232632]        [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> > > >[330961.232923]        [<c120e2a1>] mutex_lock+0x8/0xa
> > > >[330961.233211]        [<c109fa6c>] vfs_quota_off+0x1cf/0x260
> > > >[330961.233500]        [<c10a2088>] do_quotactl+0x29f/0x2dd
> > > >[330961.233792]        [<c10a214a>] sys_quotactl+0x84/0xd6
> > > >[330961.234081]        [<c1003f74>] syscall_call+0x7/0xb
> > > >[330961.234503]        [<ffffffff>] 0xffffffff
> > > >[330961.234795]
> > > >[330961.234795] other info that might help us debug this:
> > > >[330961.234796]
> > > >[330961.234908] 2 locks held by quotaoff/12249:
> > > >[330961.234947]  #0:  (&type->s_umount_key#15){----}, at: [<c1070b5d>]
> > > >get_super+0x53/0x94
> > > >[330961.235183]  #1:  (&s->s_dquot.dqonoff_mutex){--..}, at: [<c120e2a1>]
> > > >mutex_lock+0x8/0xa
> > > >[330961.235386]
> > > >[330961.235387] stack backtrace:
> > > >[330961.235462]  [<c1004d53>] show_trace_log_lvl+0x1a/0x30
> > > >[330961.235535]  [<c1004d7b>] show_trace+0x12/0x14
> > > >[330961.235606]  [<c1004e75>] dump_stack+0x16/0x18
> > > >[330961.235679]  [<c1039352>] print_circular_bug_tail+0x6f/0x71
> > > >[330961.235753]  [<c10399db>] check_prev_add+0x34/0x281
> > > >[330961.235825]  [<c1039cb3>] check_prevs_add+0x8b/0xe8
> > > >[330961.235897]  [<c103b683>] __lock_acquire+0x692/0xb81
> > > >[330961.235969]  [<c103bfda>] lock_acquire+0x62/0x81
> > > >[330961.236041]  [<c120e322>] __mutex_lock_slowpath+0x75/0x28c
> > > >[330961.236113]  [<c120e2a1>] mutex_lock+0x8/0xa
> > > >[330961.236185]  [<c109fa6c>] vfs_quota_off+0x1cf/0x260
> > > >[330961.236257]  [<c10a2088>] do_quotactl+0x29f/0x2dd
> > > >[330961.236330]  [<c10a214a>] sys_quotactl+0x84/0xd6
> > > >[330961.236402]  [<c1003f74>] syscall_call+0x7/0xb
> > > >[330961.236473]  =======================
> > > >
> > > 
> > > Is this a 2.6.21 regression?
> > > 
> > > Regards,
> > > Michal
> > > 
> > > -- 
> > > Michal K. K. Piotrowski
> > > Kernel Monkeys
> > > (http://kernel.wikidot.com/start)
> > -- 
> > Jan Kara <jack@suse.cz>
> > SuSE CR Labs
> 
> > i_mutex on quota files is special. Unlike i_mutexes for other inodes it is
> > acquired under dqonoff_mutex. Tell lockdep about this lock ranking. Also
> > comment and code in quota_sync_sb() seem to be bogus (as i_mutex for quota
> > file can be acquired under dqonoff_mutex). Move truncate_inode_pages()
> > call under dqonoff_mutex and save some problems with races...
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > 
> > diff -rupX /home/jack/.kerndiffexclude linux-2.6.21/fs/dquot.c linux-2.6.21-1-quota_lockdep/fs/dquot.c
> > --- linux-2.6.21/fs/dquot.c	2007-05-15 14:18:47.000000000 +0200
> > +++ linux-2.6.21-1-quota_lockdep/fs/dquot.c	2007-05-15 14:22:47.000000000 +0200
> > @@ -1421,7 +1421,7 @@ int vfs_quota_off(struct super_block *sb
> >  			/* If quota was reenabled in the meantime, we have
> >  			 * nothing to do */
> >  			if (!sb_has_quota_enabled(sb, cnt)) {
> > -				mutex_lock(&toputinode[cnt]->i_mutex);
> > +				mutex_lock_nested(&toputinode[cnt]->i_mutex, I_MUTEX_QUOTA);
> >  				toputinode[cnt]->i_flags &= ~(S_IMMUTABLE |
> >  				  S_NOATIME | S_NOQUOTA);
> >  				truncate_inode_pages(&toputinode[cnt]->i_data, 0);
> > diff -rupX /home/jack/.kerndiffexclude linux-2.6.21/fs/quota.c linux-2.6.21-1-quota_lockdep/fs/quota.c
> > --- linux-2.6.21/fs/quota.c	2006-11-29 22:57:37.000000000 +0100
> > +++ linux-2.6.21-1-quota_lockdep/fs/quota.c	2007-05-15 15:15:44.000000000 +0200
> > @@ -158,7 +158,6 @@ static int check_quotactl_valid(struct s
> >  static void quota_sync_sb(struct super_block *sb, int type)
> >  {
> >  	int cnt;
> > -	struct inode *discard[MAXQUOTAS];
> >  
> >  	sb->s_qcop->quota_sync(sb, type);
> >  	/* This is not very clever (and fast) but currently I don't know about
> > @@ -168,29 +167,21 @@ static void quota_sync_sb(struct super_b
> >  		sb->s_op->sync_fs(sb, 1);
> >  	sync_blockdev(sb->s_bdev);
> >  
> > -	/* Now when everything is written we can discard the pagecache so
> > -	 * that userspace sees the changes. We need i_mutex and so we could
> > -	 * not do it inside dqonoff_mutex. Moreover we need to be carefull
> > -	 * about races with quotaoff() (that is the reason why we have own
> > -	 * reference to inode). */
> > +	/*
> > +	 * Now when everything is written we can discard the pagecache so
> > +	 * that userspace sees the changes.
> > +	 */
> >  	mutex_lock(&sb_dqopt(sb)->dqonoff_mutex);
> >  	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
> > -		discard[cnt] = NULL;
> >  		if (type != -1 && cnt != type)
> >  			continue;
> >  		if (!sb_has_quota_enabled(sb, cnt))
> >  			continue;
> > -		discard[cnt] = igrab(sb_dqopt(sb)->files[cnt]);
> > +		mutex_lock_nested(&sb_dqopt(sb)->files[cnt]->i_mutex, I_MUTEX_NESTED);
> > +		truncate_inode_pages(&sb_dqopt(sb)->files[cnt]->i_data, 0);
> > +		mutex_unlock(&sb_dqopt(sb)->files[cnt]->i_mutex);
> >  	}
> >  	mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex);
> > -	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
> > -		if (discard[cnt]) {
> > -			mutex_lock(&discard[cnt]->i_mutex);
> > -			truncate_inode_pages(&discard[cnt]->i_data, 0);
> > -			mutex_unlock(&discard[cnt]->i_mutex);
> > -			iput(discard[cnt]);
> > -		}
> > -	}
> >  }
> >  
> >  void sync_dquots(struct super_block *sb, int type)
> 
> 
> 
> Folkert van Heusden
> 
> -- 
> www.vanheusden.com/multitail - win een vlaai van multivlaai! zorg
> ervoor dat multitail opgenomen wordt in Fedora Core, AIX, Solaris of
> HP/UX en win een vlaai naar keuze
> ----------------------------------------------------------------------
> Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com

[-- Attachment #2: quota-2.6.21-1-quota_lockdep.diff --]
[-- Type: text/x-patch, Size: 2939 bytes --]

i_mutex on quota files is special. Unlike i_mutexes for other inodes it is
acquired under dqonoff_mutex. Tell lockdep about this lock ranking. Also
comment and code in quota_sync_sb() seem to be bogus (as i_mutex for quota
file can be acquired under dqonoff_mutex). Move truncate_inode_pages()
call under dqonoff_mutex and save some problems with races...

Signed-off-by: Jan Kara <jack@suse.cz>

diff -rupX /home/jack/.kerndiffexclude linux-2.6.21/fs/dquot.c linux-2.6.21-1-quota_lockdep/fs/dquot.c
--- linux-2.6.21/fs/dquot.c	2007-05-15 14:18:47.000000000 +0200
+++ linux-2.6.21-1-quota_lockdep/fs/dquot.c	2007-05-15 14:22:47.000000000 +0200
@@ -1421,7 +1421,7 @@ int vfs_quota_off(struct super_block *sb
 			/* If quota was reenabled in the meantime, we have
 			 * nothing to do */
 			if (!sb_has_quota_enabled(sb, cnt)) {
-				mutex_lock(&toputinode[cnt]->i_mutex);
+				mutex_lock_nested(&toputinode[cnt]->i_mutex, I_MUTEX_QUOTA);
 				toputinode[cnt]->i_flags &= ~(S_IMMUTABLE |
 				  S_NOATIME | S_NOQUOTA);
 				truncate_inode_pages(&toputinode[cnt]->i_data, 0);
diff -rupX /home/jack/.kerndiffexclude linux-2.6.21/fs/quota.c linux-2.6.21-1-quota_lockdep/fs/quota.c
--- linux-2.6.21/fs/quota.c	2006-11-29 22:57:37.000000000 +0100
+++ linux-2.6.21-1-quota_lockdep/fs/quota.c	2007-05-15 20:36:18.000000000 +0200
@@ -158,7 +158,6 @@ static int check_quotactl_valid(struct s
 static void quota_sync_sb(struct super_block *sb, int type)
 {
 	int cnt;
-	struct inode *discard[MAXQUOTAS];
 
 	sb->s_qcop->quota_sync(sb, type);
 	/* This is not very clever (and fast) but currently I don't know about
@@ -168,29 +167,21 @@ static void quota_sync_sb(struct super_b
 		sb->s_op->sync_fs(sb, 1);
 	sync_blockdev(sb->s_bdev);
 
-	/* Now when everything is written we can discard the pagecache so
-	 * that userspace sees the changes. We need i_mutex and so we could
-	 * not do it inside dqonoff_mutex. Moreover we need to be carefull
-	 * about races with quotaoff() (that is the reason why we have own
-	 * reference to inode). */
+	/*
+	 * Now when everything is written we can discard the pagecache so
+	 * that userspace sees the changes.
+	 */
 	mutex_lock(&sb_dqopt(sb)->dqonoff_mutex);
 	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
-		discard[cnt] = NULL;
 		if (type != -1 && cnt != type)
 			continue;
 		if (!sb_has_quota_enabled(sb, cnt))
 			continue;
-		discard[cnt] = igrab(sb_dqopt(sb)->files[cnt]);
+		mutex_lock_nested(&sb_dqopt(sb)->files[cnt]->i_mutex, I_MUTEX_QUOTA);
+		truncate_inode_pages(&sb_dqopt(sb)->files[cnt]->i_data, 0);
+		mutex_unlock(&sb_dqopt(sb)->files[cnt]->i_mutex);
 	}
 	mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex);
-	for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
-		if (discard[cnt]) {
-			mutex_lock(&discard[cnt]->i_mutex);
-			truncate_inode_pages(&discard[cnt]->i_data, 0);
-			mutex_unlock(&discard[cnt]->i_mutex);
-			iput(discard[cnt]);
-		}
-	}
 }
 
 void sync_dquots(struct super_block *sb, int type)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] "volatile considered harmful" document
  2007-05-09 21:05 [PATCH] "volatile considered harmful" document Jonathan Corbet
                   ` (5 preceding siblings ...)
  2007-05-10 21:54 ` Bill Davidsen
@ 2007-05-16  9:30 ` Thomas De Schampheleire
  6 siblings, 0 replies; 25+ messages in thread
From: Thomas De Schampheleire @ 2007-05-16  9:30 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-kernel, akpm, Randy Dunlap

On 5/9/07, Jonathan Corbet <corbet@lwn.net> wrote:
>  +CREDITS
> +-------
> +
> +Original impetus and research by Randy Dunlap
> +Written by Jonathan Corbet
> +Improvements via coments from Satyam Sharma Johannes Stezenbach
>


Just a small spelling mistake: coments should be comments.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] "volatile considered harmful" document
  2007-05-10 19:35 ` H. Peter Anvin
@ 2007-05-10 22:00   ` Alan Cox
  0 siblings, 0 replies; 25+ messages in thread
From: Alan Cox @ 2007-05-10 22:00 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Jonathan Corbet, linux-kernel, akpm, Randy Dunlap

> In Ethernet drivers, for example, it is common for the network card to
> maintain a pointer in host memory the the latest descriptor written; you
> will generally have a loop of the form:
> 
> 	while ((this_pointer = *pointer_ptr) > my_last_pointer) {
> 		for (pkt = my_last_pointer; pkt < this_pointer; pkt++)
> 			receeive_packet(pkt);
> 		my_last_pointer = this_pointer;
> 	}
> 
> pointer_p can then be a volatile pointer into said coherent memory.

True but you can happily use rmb/wmb for this which are clearer and fit
the rest of the Linux model better.

Alan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] "volatile considered harmful" document
  2007-05-09 21:05 [PATCH] "volatile considered harmful" document Jonathan Corbet
                   ` (4 preceding siblings ...)
  2007-05-10 19:35 ` H. Peter Anvin
@ 2007-05-10 21:54 ` Bill Davidsen
  2007-05-16  9:30 ` Thomas De Schampheleire
  6 siblings, 0 replies; 25+ messages in thread
From: Bill Davidsen @ 2007-05-10 21:54 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-kernel, akpm, Randy Dunlap

Jonathan Corbet wrote:

> +There are still a few rare situations where volatile makes sense in the
> +kernel:
> +
> +  - The above-mentioned accessor functions might use volatile on
> +    architectures where direct I/O memory access does work.  Essentially,
> +    each accessor call becomes a little critical section on its own and
> +    ensures that the access happens as expected by the programmer.
> +
> +  - Inline assembly code which changes memory, but which has no other
> +    visible side effects, risks being deleted by GCC.  Adding the volatile
> +    keyword to asm statements will prevent this removal.
> +
> +  - The jiffies variable is special in that it can have a different value
> +    every time it is referenced, but it can be read without any special
> +    locking.  So jiffies can be volatile, but the addition of other
> +    variables of this type is frowned upon.  Jiffies is considered to be a
> +    "stupid legacy" issue in this regard.

It would seem that any variable which is (a) subject to change by other 
threads or hardware, and (b) the value of which is going to be used 
without writing the variable, would be a valid use for volatile.

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] "volatile considered harmful" document
  2007-05-09 21:05 [PATCH] "volatile considered harmful" document Jonathan Corbet
                   ` (3 preceding siblings ...)
  2007-05-10 16:14 ` Giacomo A. Catenazzi
@ 2007-05-10 19:35 ` H. Peter Anvin
  2007-05-10 22:00   ` Alan Cox
  2007-05-10 21:54 ` Bill Davidsen
  2007-05-16  9:30 ` Thomas De Schampheleire
  6 siblings, 1 reply; 25+ messages in thread
From: H. Peter Anvin @ 2007-05-10 19:35 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-kernel, akpm, Randy Dunlap

Jonathan Corbet wrote:
> OK, here's an updated version of the volatile document - as a plain text
> file this time.  It drops a new file in Documentation/, but might it be
> better as an addition to CodingStyle?
> 
> Comments welcome,

I have found one use of volatile which I consider legitimate: pointers
to data structures read or written by I/O devices in coherent memory
(typically pci_coherent memory.)  This is local to device drivers, but
as far as I can tell, the use of volatile here is legitimate, although
arguably it will be redundant in > 90% of all cases due to the
incidental presence of other memory barriers.

In Ethernet drivers, for example, it is common for the network card to
maintain a pointer in host memory the the latest descriptor written; you
will generally have a loop of the form:

	while ((this_pointer = *pointer_ptr) > my_last_pointer) {
		for (pkt = my_last_pointer; pkt < this_pointer; pkt++)
			receeive_packet(pkt);
		my_last_pointer = this_pointer;
	}

pointer_p can then be a volatile pointer into said coherent memory.

	-hpa

		

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] "volatile considered harmful" document
  2007-05-09 21:05 [PATCH] "volatile considered harmful" document Jonathan Corbet
                   ` (2 preceding siblings ...)
  2007-05-09 22:35 ` Scott Preece
@ 2007-05-10 16:14 ` Giacomo A. Catenazzi
  2007-05-10 19:35 ` H. Peter Anvin
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Giacomo A. Catenazzi @ 2007-05-10 16:14 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-kernel, akpm, Randy Dunlap

Jonathan Corbet wrote:
> +The volatile storage class was originally meant for memory-mapped I/O
> +registers.  Within the kernel, register accesses, too, should be protected

I don't think it deserves to be added in documentation, but just for
reference: in userspace "volatile" is needed in signals (posix mandates
some variables to be volatile, as API, not as funtionality). I don't
know if this was also on the original signal handling.

Anyway user space APIs are not kernel problem ;-)

ciao
	cate

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] "volatile considered harmful" document
  2007-05-09 22:47     ` Jesper Juhl
@ 2007-05-09 23:20       ` Satyam Sharma
  0 siblings, 0 replies; 25+ messages in thread
From: Satyam Sharma @ 2007-05-09 23:20 UTC (permalink / raw)
  To: Jesper Juhl; +Cc: Jonathan Corbet, linux-kernel, akpm, Randy Dunlap

On 5/10/07, Jesper Juhl <jesper.juhl@gmail.com> wrote:
> On 10/05/07, Satyam Sharma <satyam.sharma@gmail.com> wrote:
> > On 5/10/07, Jesper Juhl <jesper.juhl@gmail.com> wrote:
> > > [snip]
> > > you write: "... that the variable could be changed outside of the
> > > current thread of execution ..."
> > >
> > > I suggest: "... that the variable could be changed outside of the
> > > current thread of execution - a sort of simple atomic variable ..."
> >
> > I'm not so sure here. Why would any C programmer (worth his weight in
> > salt) think that volatile objects are automatically _atomic_? At
>
> I honestly don't really know, but I've encountered that confusion a
> few times. Both from friends who (for some reason) believed that and
> from documents on the web that implied it, aparently it's a common
> confusion - a few examples:
>
>     http://lists.freebsd.org/pipermail/freebsd-perl/2004-June/000124.html
>         "... volatile (atomic) fixes the problem. ..."
>
>     http://blogs.msdn.com/ricom/archive/2006/04/28/586406.aspx
>         "That's the point of the volatile keyword. It makes sure that
> the line "dict = d;" is atomic."
>
>     http://forum.java.sun.com/thread.jspa?threadID=5126877&start=0
>         "A volatile variable is also guaranteed to be read or written
> as an atomic operation ..."  (yes, this link talks about Java, which I
> don't know, but if java volatile means atomic, that might explain why
> some people assume the same for C).

Perl / Microsoft / Java programmers are probably not worth their
weight in salt anyway :-)

I'm not an expert in any of the above platforms either, so don't know
if the semantics of "volatile" in those other languages are different
from that in C -- and this document clearly applies to only the kernel
(and thus C). But if this volatile == atomic disease is indeed common
among _C_ programmers too, then your suggested addition would make
sense.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] "volatile considered harmful" document
  2007-05-09 22:31   ` Satyam Sharma
@ 2007-05-09 22:47     ` Jesper Juhl
  2007-05-09 23:20       ` Satyam Sharma
  0 siblings, 1 reply; 25+ messages in thread
From: Jesper Juhl @ 2007-05-09 22:47 UTC (permalink / raw)
  To: Satyam Sharma; +Cc: Jonathan Corbet, linux-kernel, akpm, Randy Dunlap

On 10/05/07, Satyam Sharma <satyam.sharma@gmail.com> wrote:
> On 5/10/07, Jesper Juhl <jesper.juhl@gmail.com> wrote:
> > On 09/05/07, Jonathan Corbet <corbet@lwn.net> wrote:
<snip>
> > > +"the new C book") has the following to say about the volatile keyword:
> > > +
> > > +       The purpose of volatile is to force an implementation to suppress
> > > +       optimization that could otherwise occur.  For example, for a
> > > +       machine with memory-mapped input/output, a pointer to a device
> > > +       register might be declared as a pointer to volatile, in
> > > +       order to prevent the compiler from removing apparently redundant
> > > +       references through the pointer.
> > > +
> > > +C programmers have often taken volatile to mean that the variable could be
> > > +changed outside of the current thread of execution; as a result, they are
> >
> > you write: "... that the variable could be changed outside of the
> > current thread of execution ..."
> >
> > I suggest: "... that the variable could be changed outside of the
> > current thread of execution - a sort of simple atomic variable ..."
>
> I'm not so sure here. Why would any C programmer (worth his weight in
> salt) think that volatile objects are automatically _atomic_? At

I honestly don't really know, but I've encountered that confusion a
few times. Both from friends who (for some reason) believed that and
from documents on the web that implied it, aparently it's a common
confusion - a few examples:

    http://lists.freebsd.org/pipermail/freebsd-perl/2004-June/000124.html
        "... volatile (atomic) fixes the problem. ..."

    http://blogs.msdn.com/ricom/archive/2006/04/28/586406.aspx
        "That's the point of the volatile keyword. It makes sure that
the line "dict = d;" is atomic."

    http://forum.java.sun.com/thread.jspa?threadID=5126877&start=0
        "A volatile variable is also guaranteed to be read or written
as an atomic operation ..."  (yes, this link talks about Java, which I
don't know, but if java volatile means atomic, that might explain why
some people assume the same for C).

In any case, it's not an uncommon belief, so I just thought it made
sense to also make that little note.


> worst, the mistake someone might make would be to _implement_ locking
> primitives using volatile. "that the variable could be changed outside
> of the current thread of execution" sounds sufficient to me, and after
> all, that is exactly what volatile hints to the compiler.
>


-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] "volatile considered harmful" document
  2007-05-09 21:05 [PATCH] "volatile considered harmful" document Jonathan Corbet
  2007-05-09 21:45 ` Jesper Juhl
  2007-05-09 22:05 ` Heikki Orsila
@ 2007-05-09 22:35 ` Scott Preece
  2007-05-10 16:14 ` Giacomo A. Catenazzi
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Scott Preece @ 2007-05-09 22:35 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-kernel, akpm, Randy Dunlap

On 5/9/07, Jonathan Corbet <corbet@lwn.net> wrote:
> OK, here's an updated version of the volatile document - as a plain text
> file this time.  It drops a new file in Documentation/, but might it be
> better as an addition to CodingStyle?
>  ...
---

I think the size of this file is OK as a separate file, but much too
big for CodingStyle. It would be good to also write a pithy paragraph
to go into CodingStyle to convey the rule and point at this file.

scott

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] "volatile considered harmful" document
  2007-05-09 21:45 ` Jesper Juhl
@ 2007-05-09 22:31   ` Satyam Sharma
  2007-05-09 22:47     ` Jesper Juhl
  0 siblings, 1 reply; 25+ messages in thread
From: Satyam Sharma @ 2007-05-09 22:31 UTC (permalink / raw)
  To: Jesper Juhl; +Cc: Jonathan Corbet, linux-kernel, akpm, Randy Dunlap

On 5/10/07, Jesper Juhl <jesper.juhl@gmail.com> wrote:
> On 09/05/07, Jonathan Corbet <corbet@lwn.net> wrote:
> > OK, here's an updated version of the volatile document - as a plain text
> > file this time.  It drops a new file in Documentation/, but might it be
> > better as an addition to CodingStyle?
> >
> IMHO this is better as a sepperate document rather than an adition to
> CodingStyle. The use of volatile is not a style issue, it's a
> correctness issue, so it doesn't belong in the CodingStyle document.

Yes, definitely.

> > diff -ruNp linux-2.6/Documentation/volatile.txt volatile/Documentation/volatile.txt
>
> Might I suggest a different filename: volatile-considered-harmful.txt

Yes, I feel it's important to have a filename that strongly states the
dim view its contents take on its subject (the "volatile" type
qualifier in this case). Sort of like stable-api-nonsense.txt vs
stable-api.txt

> > --- linux-2.6/Documentation/volatile.txt        1969-12-31 17:00:00.000000000 -0700
> > +++ volatile/Documentation/volatile.txt 2007-05-09 14:56:40.000000000 -0600
> > @@ -0,0 +1,127 @@
> > +Why the "volatile" type class should not be used
> > +------------------------------------------------
> > +
> > +The C Programming Language, Second Edition (copyright 1988, still known as
> > +"the new C book") has the following to say about the volatile keyword:
> > +
> > +       The purpose of volatile is to force an implementation to suppress
> > +       optimization that could otherwise occur.  For example, for a
> > +       machine with memory-mapped input/output, a pointer to a device
> > +       register might be declared as a pointer to volatile, in
> > +       order to prevent the compiler from removing apparently redundant
> > +       references through the pointer.
> > +
> > +C programmers have often taken volatile to mean that the variable could be
> > +changed outside of the current thread of execution; as a result, they are
>
> you write: "... that the variable could be changed outside of the
> current thread of execution ..."
>
> I suggest: "... that the variable could be changed outside of the
> current thread of execution - a sort of simple atomic variable ..."

I'm not so sure here. Why would any C programmer (worth his weight in
salt) think that volatile objects are automatically _atomic_? At
worst, the mistake someone might make would be to _implement_ locking
primitives using volatile. "that the variable could be changed outside
of the current thread of execution" sounds sufficient to me, and after
all, that is exactly what volatile hints to the compiler.

> > +For most code, none of the above justifications for volatile
> > +apply.  As a result, the use of volatile is likely to be seen as a
> > +bug and will bring additional scrutiny to the code.  Developers who are
> > +tempted to use volatile should take a step back and think about
> > +what they are truly trying to accomplish.
> > +
>
> Suggested addition :
>
> Patches that remove volatile from current code (provided there's a
> good explanation of why the volatile can be removed and how the bug it
> was hiding has been dealt with) are a good thing.
>
> Friends don't let friends use volatile.

Yes, this addition would be nice :-)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] "volatile considered harmful" document
  2007-05-09 22:05 ` Heikki Orsila
@ 2007-05-09 22:19   ` Randy Dunlap
  0 siblings, 0 replies; 25+ messages in thread
From: Randy Dunlap @ 2007-05-09 22:19 UTC (permalink / raw)
  To: Heikki Orsila; +Cc: Jonathan Corbet, linux-kernel, akpm

Heikki Orsila wrote:
> Imo, the best style to relay information is by directly stating facts.
> I'm going to try to suggest some improvements on this..
> 
> On Wed, May 09, 2007 at 03:05:44PM -0600, Jonathan Corbet wrote:
>> Andrew Morton recently called out[1] use of volatile in a
>> +submitted patch, saying:
>> +
>> +	The volatiles are a worry - volatile is said to be
>> +	basically-always-wrong in-kernel, although we've never managed to
>> +	document why, and i386 cheerfully uses it in readb() and friends.
>> +
>> +In response, Randy Dunlap pulled together some email from Linus[2] on the
>> +topic and suggested that we could maybe "document why."  Here is the
>> +result.
> 
> I think previous text is unnecessary. Just tell the reasons why 
> volatile is bad and what should be used instead, no need to quote 
> people.

Ack, the doc. history isn't needed.

>> +The point that Linus often makes with regard to volatile is that
>> +its purpose is to suppress optimization, which is almost never what one
>> +really wants to do.  In the kernel, one must protect accesses to data
>> +against race conditions, which is very much a different task.  
> 
> Again, I would write this in non-personal way:
> 
> "Volatile is often used to prevent optimization, but it is not the
> behavior that is actually wanted. Access to data must be protected and 
> handled with kernel provided synchronization, mutual exclusion and 
> barriers tools. These tools make volatile useless."
> 
>> +Like volatile, the kernel primitives which make concurrent access to data
>> +safe (spinlocks, mutexes, memory barriers, etc.) are designed to prevent
>> +unwanted optimization.  If they are being used properly, there will be no
>> +need to use volatile as well.
> 
> The previous suggestion takes care of explaining that, remove this.
> 
>> If volatile is still necessary, there is
>> +almost certainly a bug in the code somewhere.  In properly-written kernel
>> +code, volatile can only serve to slow things down.
>> +
>> +Consider a typical block of kernel code:
>> +
>> +    spin_lock(&the_lock);
>> +    do_something_on(&shared_data);
>> +    do_something_else_with(&shared_data);
>> +    spin_unlock(&the_lock);
>> +
>> +If all the code follows the locking rules, the value of shared_data cannot
>> +change unexpectedly while the_lock is held.  Any other code which might
>> +want to play with that data will be waiting on the lock.
> 
> Ok
> 
>> +The spinlock
>> +primitives act as memory barriers - they are explicitly written to do so -
>> +meaning that data accesses will not be optimized across them.
> 
> "Spinlock primitives will serialise execution of code regions, and 
> memory barrier functions must be used inside spinlocks to force 
> loads and stores."
> 
>> So the
>> +compiler might think it knows what will be in some_data, but the
>> +spin_lock() call will force it to forget anything it knows.  There will be
>> +no optimization problems with accesses to that data.
> 
> I would say it directly:
> 
> "Spinlock will force an implicit memory barrier, thus preventing 
> optimizations to data access."
> 
>> +If shared_data were declared volatile, the locking would
>> +still be necessary. But the compiler would also be prevented from
>> +optimizing access to shared _within_ the critical section,
>> +when we know that nobody else can be working with it.  While the lock is
>> +held, shared_data is not volatile.
> 
> ok
> 
>>   This is why Linus says:
>> +
>> +	Also, more importantly, "volatile" is on the wrong _part_ of the
>> +	whole system. In C, it's "data" that is volatile, but that is
>> +	insane. Data isn't volatile - _accesses_ are volatile. So it may
>> +	make sense to say "make this particular _access_ be careful", but
>> +	not "make all accesses to this data use some random strategy".
> 
> Unnecessary quoting, imo. Tell the same information directly without 
> personifying the statement.


-- 
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] "volatile considered harmful" document
  2007-05-09 21:05 [PATCH] "volatile considered harmful" document Jonathan Corbet
  2007-05-09 21:45 ` Jesper Juhl
@ 2007-05-09 22:05 ` Heikki Orsila
  2007-05-09 22:19   ` Randy Dunlap
  2007-05-09 22:35 ` Scott Preece
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Heikki Orsila @ 2007-05-09 22:05 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-kernel, akpm, Randy Dunlap

Imo, the best style to relay information is by directly stating facts.
I'm going to try to suggest some improvements on this..

On Wed, May 09, 2007 at 03:05:44PM -0600, Jonathan Corbet wrote:
> Andrew Morton recently called out[1] use of volatile in a
> +submitted patch, saying:
> +
> +	The volatiles are a worry - volatile is said to be
> +	basically-always-wrong in-kernel, although we've never managed to
> +	document why, and i386 cheerfully uses it in readb() and friends.
> +
> +In response, Randy Dunlap pulled together some email from Linus[2] on the
> +topic and suggested that we could maybe "document why."  Here is the
> +result.

I think previous text is unnecessary. Just tell the reasons why 
volatile is bad and what should be used instead, no need to quote 
people.

> +The point that Linus often makes with regard to volatile is that
> +its purpose is to suppress optimization, which is almost never what one
> +really wants to do.  In the kernel, one must protect accesses to data
> +against race conditions, which is very much a different task.  

Again, I would write this in non-personal way:

"Volatile is often used to prevent optimization, but it is not the
behavior that is actually wanted. Access to data must be protected and 
handled with kernel provided synchronization, mutual exclusion and 
barriers tools. These tools make volatile useless."

> +Like volatile, the kernel primitives which make concurrent access to data
> +safe (spinlocks, mutexes, memory barriers, etc.) are designed to prevent
> +unwanted optimization.  If they are being used properly, there will be no
> +need to use volatile as well.

The previous suggestion takes care of explaining that, remove this.

> If volatile is still necessary, there is
> +almost certainly a bug in the code somewhere.  In properly-written kernel
> +code, volatile can only serve to slow things down.
> +
> +Consider a typical block of kernel code:
> +
> +    spin_lock(&the_lock);
> +    do_something_on(&shared_data);
> +    do_something_else_with(&shared_data);
> +    spin_unlock(&the_lock);
> +
> +If all the code follows the locking rules, the value of shared_data cannot
> +change unexpectedly while the_lock is held.  Any other code which might
> +want to play with that data will be waiting on the lock.

Ok

> +The spinlock
> +primitives act as memory barriers - they are explicitly written to do so -
> +meaning that data accesses will not be optimized across them.

"Spinlock primitives will serialise execution of code regions, and 
memory barrier functions must be used inside spinlocks to force 
loads and stores."

> So the
> +compiler might think it knows what will be in some_data, but the
> +spin_lock() call will force it to forget anything it knows.  There will be
> +no optimization problems with accesses to that data.

I would say it directly:

"Spinlock will force an implicit memory barrier, thus preventing 
optimizations to data access."

> +If shared_data were declared volatile, the locking would
> +still be necessary. But the compiler would also be prevented from
> +optimizing access to shared _within_ the critical section,
> +when we know that nobody else can be working with it.  While the lock is
> +held, shared_data is not volatile.

ok

>   This is why Linus says:
> +
> +	Also, more importantly, "volatile" is on the wrong _part_ of the
> +	whole system. In C, it's "data" that is volatile, but that is
> +	insane. Data isn't volatile - _accesses_ are volatile. So it may
> +	make sense to say "make this particular _access_ be careful", but
> +	not "make all accesses to this data use some random strategy".

Unnecessary quoting, imo. Tell the same information directly without 
personifying the statement.

-- 
Heikki Orsila			Barbie's law:
heikki.orsila@iki.fi		"Math is hard, let's go shopping!"
http://www.iki.fi/shd

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] "volatile considered harmful" document
  2007-05-09 21:05 [PATCH] "volatile considered harmful" document Jonathan Corbet
@ 2007-05-09 21:45 ` Jesper Juhl
  2007-05-09 22:31   ` Satyam Sharma
  2007-05-09 22:05 ` Heikki Orsila
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Jesper Juhl @ 2007-05-09 21:45 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-kernel, akpm, Randy Dunlap

On 09/05/07, Jonathan Corbet <corbet@lwn.net> wrote:
> OK, here's an updated version of the volatile document - as a plain text
> file this time.  It drops a new file in Documentation/, but might it be
> better as an addition to CodingStyle?
>
IMHO this is better as a sepperate document rather than an adition to
CodingStyle. The use of volatile is not a style issue, it's a
correctness issue, so it doesn't belong in the CodingStyle document.


> Comments welcome,
>
Find a few below :)


> jon
>
> Tell kernel developers why they shouldn't use volatile.
>
> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
>
> diff -ruNp linux-2.6/Documentation/volatile.txt volatile/Documentation/volatile.txt

Might I suggest a different filename: volatile-considered-harmful.txt


> --- linux-2.6/Documentation/volatile.txt        1969-12-31 17:00:00.000000000 -0700
> +++ volatile/Documentation/volatile.txt 2007-05-09 14:56:40.000000000 -0600
> @@ -0,0 +1,127 @@
> +Why the "volatile" type class should not be used
> +------------------------------------------------
> +
> +The C Programming Language, Second Edition (copyright 1988, still known as
> +"the new C book") has the following to say about the volatile keyword:
> +
> +       The purpose of volatile is to force an implementation to suppress
> +       optimization that could otherwise occur.  For example, for a
> +       machine with memory-mapped input/output, a pointer to a device
> +       register might be declared as a pointer to volatile, in
> +       order to prevent the compiler from removing apparently redundant
> +       references through the pointer.
> +
> +C programmers have often taken volatile to mean that the variable could be
> +changed outside of the current thread of execution; as a result, they are

you write: "... that the variable could be changed outside of the
current thread of execution ..."

I suggest: "... that the variable could be changed outside of the
current thread of execution - a sort of simple atomic variable ..."

> +sometimes tempted to use it in kernel code when shared data structures are
> +being used.  Andrew Morton recently called out[1] use of volatile in a
> +submitted patch, saying:
> +
> +       The volatiles are a worry - volatile is said to be
> +       basically-always-wrong in-kernel, although we've never managed to
> +       document why, and i386 cheerfully uses it in readb() and friends.
> +
> +In response, Randy Dunlap pulled together some email from Linus[2] on the
> +topic and suggested that we could maybe "document why."  Here is the
> +result.
> +
> +The point that Linus often makes with regard to volatile is that
> +its purpose is to suppress optimization, which is almost never what one
> +really wants to do.  In the kernel, one must protect accesses to data
> +against race conditions, which is very much a different task.
> +
> +Like volatile, the kernel primitives which make concurrent access to data
> +safe (spinlocks, mutexes, memory barriers, etc.) are designed to prevent
> +unwanted optimization.  If they are being used properly, there will be no
> +need to use volatile as well.  If volatile is still necessary, there is
> +almost certainly a bug in the code somewhere.  In properly-written kernel
> +code, volatile can only serve to slow things down.
> +
> +Consider a typical block of kernel code:
> +
> +    spin_lock(&the_lock);
> +    do_something_on(&shared_data);
> +    do_something_else_with(&shared_data);
> +    spin_unlock(&the_lock);
> +
> +If all the code follows the locking rules, the value of shared_data cannot
> +change unexpectedly while the_lock is held.  Any other code which might
> +want to play with that data will be waiting on the lock.  The spinlock
> +primitives act as memory barriers - they are explicitly written to do so -
> +meaning that data accesses will not be optimized across them.  So the
> +compiler might think it knows what will be in some_data, but the
> +spin_lock() call will force it to forget anything it knows.  There will be
> +no optimization problems with accesses to that data.
> +
> +If shared_data were declared volatile, the locking would
> +still be necessary.  But the compiler would also be prevented from
> +optimizing access to shared _within_ the critical section,
> +when we know that nobody else can be working with it.  While the lock is
> +held, shared_data is not volatile.  This is why Linus says:
> +
> +       Also, more importantly, "volatile" is on the wrong _part_ of the
> +       whole system. In C, it's "data" that is volatile, but that is
> +       insane. Data isn't volatile - _accesses_ are volatile. So it may
> +       make sense to say "make this particular _access_ be careful", but
> +       not "make all accesses to this data use some random strategy".
> +
> +When dealing with shared data, proper locking makes volatile unnecessary -
> +and potentially harmful.
> +
> +The volatile storage class was originally meant for memory-mapped I/O
> +registers.  Within the kernel, register accesses, too, should be protected
> +by locks, but one also does not want the compiler "optimizing" register
> +accesses within a critical section.  But, within the kernel, I/O memory
> +accesses are always done through accessor functions; accessing I/O memory
> +directly through pointers is frowned upon and does not work on all
> +architectures.  Those accessors are written to prevent unwanted
> +optimization, so, once again, volatile is unnecessary.
> +
> +Another situation where one might be tempted to use volatile is
> +when the processor is busy-waiting on the value of a variable.  The right
> +way to perform a busy wait is:
> +
> +    while (my_variable != what_i_want)
> +        cpu_relax();
> +
> +The cpu_relax() call can lower CPU power consumption or yield to a
> +hyperthreaded twin processor; it also happens to serve as a memory barrier,
> +so, once again, volatile is unnecessary.  Of course, busy-waiting is
> +generally an anti-social act to begin with.
> +
> +There are still a few rare situations where volatile makes sense in the
> +kernel:
> +
> +  - The above-mentioned accessor functions might use volatile on
> +    architectures where direct I/O memory access does work.  Essentially,
> +    each accessor call becomes a little critical section on its own and
> +    ensures that the access happens as expected by the programmer.
> +
> +  - Inline assembly code which changes memory, but which has no other
> +    visible side effects, risks being deleted by GCC.  Adding the volatile
> +    keyword to asm statements will prevent this removal.
> +
> +  - The jiffies variable is special in that it can have a different value
> +    every time it is referenced, but it can be read without any special
> +    locking.  So jiffies can be volatile, but the addition of other
> +    variables of this type is frowned upon.  Jiffies is considered to be a

suggestion: "frowned strongly upon"

> +    "stupid legacy" issue in this regard.
> +
> +For most code, none of the above justifications for volatile
> +apply.  As a result, the use of volatile is likely to be seen as a
> +bug and will bring additional scrutiny to the code.  Developers who are
> +tempted to use volatile should take a step back and think about
> +what they are truly trying to accomplish.
> +

Suggested addition :

Patches that remove volatile from current code (provided there's a
good explanation of why the volatile can be removed and how the bug it
was hiding has been dealt with) are a good thing.

Friends don't let friends use volatile.


> +NOTES
> +-----
> +
> +[1] http://lwn.net/Articles/233481/
> +[2] http://lwn.net/Articles/233482/
> +
> +CREDITS
> +-------
> +
> +Original impetus and research by Randy Dunlap
> +Written by Jonathan Corbet
> +Improvements via coments from Satyam Sharma Johannes Stezenbach


-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] "volatile considered harmful" document
@ 2007-05-09 21:05 Jonathan Corbet
  2007-05-09 21:45 ` Jesper Juhl
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Jonathan Corbet @ 2007-05-09 21:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm, Randy Dunlap

OK, here's an updated version of the volatile document - as a plain text
file this time.  It drops a new file in Documentation/, but might it be
better as an addition to CodingStyle?

Comments welcome,

jon

Tell kernel developers why they shouldn't use volatile.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>

diff -ruNp linux-2.6/Documentation/volatile.txt volatile/Documentation/volatile.txt
--- linux-2.6/Documentation/volatile.txt	1969-12-31 17:00:00.000000000 -0700
+++ volatile/Documentation/volatile.txt	2007-05-09 14:56:40.000000000 -0600
@@ -0,0 +1,127 @@
+Why the "volatile" type class should not be used
+------------------------------------------------
+
+The C Programming Language, Second Edition (copyright 1988, still known as
+"the new C book") has the following to say about the volatile keyword:
+
+	The purpose of volatile is to force an implementation to suppress
+	optimization that could otherwise occur.  For example, for a
+	machine with memory-mapped input/output, a pointer to a device
+	register might be declared as a pointer to volatile, in
+	order to prevent the compiler from removing apparently redundant
+	references through the pointer.
+
+C programmers have often taken volatile to mean that the variable could be
+changed outside of the current thread of execution; as a result, they are
+sometimes tempted to use it in kernel code when shared data structures are
+being used.  Andrew Morton recently called out[1] use of volatile in a
+submitted patch, saying:
+
+	The volatiles are a worry - volatile is said to be
+	basically-always-wrong in-kernel, although we've never managed to
+	document why, and i386 cheerfully uses it in readb() and friends.
+
+In response, Randy Dunlap pulled together some email from Linus[2] on the
+topic and suggested that we could maybe "document why."  Here is the
+result.
+
+The point that Linus often makes with regard to volatile is that
+its purpose is to suppress optimization, which is almost never what one
+really wants to do.  In the kernel, one must protect accesses to data
+against race conditions, which is very much a different task.  
+
+Like volatile, the kernel primitives which make concurrent access to data
+safe (spinlocks, mutexes, memory barriers, etc.) are designed to prevent
+unwanted optimization.  If they are being used properly, there will be no
+need to use volatile as well.  If volatile is still necessary, there is
+almost certainly a bug in the code somewhere.  In properly-written kernel
+code, volatile can only serve to slow things down.
+
+Consider a typical block of kernel code:
+
+    spin_lock(&the_lock);
+    do_something_on(&shared_data);
+    do_something_else_with(&shared_data);
+    spin_unlock(&the_lock);
+
+If all the code follows the locking rules, the value of shared_data cannot
+change unexpectedly while the_lock is held.  Any other code which might
+want to play with that data will be waiting on the lock.  The spinlock
+primitives act as memory barriers - they are explicitly written to do so -
+meaning that data accesses will not be optimized across them.  So the
+compiler might think it knows what will be in some_data, but the
+spin_lock() call will force it to forget anything it knows.  There will be
+no optimization problems with accesses to that data.
+
+If shared_data were declared volatile, the locking would
+still be necessary.  But the compiler would also be prevented from
+optimizing access to shared _within_ the critical section,
+when we know that nobody else can be working with it.  While the lock is
+held, shared_data is not volatile.  This is why Linus says:
+
+	Also, more importantly, "volatile" is on the wrong _part_ of the
+	whole system. In C, it's "data" that is volatile, but that is
+	insane. Data isn't volatile - _accesses_ are volatile. So it may
+	make sense to say "make this particular _access_ be careful", but
+	not "make all accesses to this data use some random strategy".
+
+When dealing with shared data, proper locking makes volatile unnecessary -
+and potentially harmful.
+
+The volatile storage class was originally meant for memory-mapped I/O
+registers.  Within the kernel, register accesses, too, should be protected
+by locks, but one also does not want the compiler "optimizing" register
+accesses within a critical section.  But, within the kernel, I/O memory
+accesses are always done through accessor functions; accessing I/O memory
+directly through pointers is frowned upon and does not work on all
+architectures.  Those accessors are written to prevent unwanted
+optimization, so, once again, volatile is unnecessary.
+
+Another situation where one might be tempted to use volatile is
+when the processor is busy-waiting on the value of a variable.  The right
+way to perform a busy wait is:
+
+    while (my_variable != what_i_want)
+        cpu_relax();
+
+The cpu_relax() call can lower CPU power consumption or yield to a
+hyperthreaded twin processor; it also happens to serve as a memory barrier,
+so, once again, volatile is unnecessary.  Of course, busy-waiting is
+generally an anti-social act to begin with.
+
+There are still a few rare situations where volatile makes sense in the
+kernel:
+
+  - The above-mentioned accessor functions might use volatile on
+    architectures where direct I/O memory access does work.  Essentially,
+    each accessor call becomes a little critical section on its own and
+    ensures that the access happens as expected by the programmer.
+
+  - Inline assembly code which changes memory, but which has no other
+    visible side effects, risks being deleted by GCC.  Adding the volatile
+    keyword to asm statements will prevent this removal.
+
+  - The jiffies variable is special in that it can have a different value
+    every time it is referenced, but it can be read without any special
+    locking.  So jiffies can be volatile, but the addition of other
+    variables of this type is frowned upon.  Jiffies is considered to be a
+    "stupid legacy" issue in this regard.
+
+For most code, none of the above justifications for volatile
+apply.  As a result, the use of volatile is likely to be seen as a
+bug and will bring additional scrutiny to the code.  Developers who are
+tempted to use volatile should take a step back and think about
+what they are truly trying to accomplish.
+
+NOTES
+-----
+
+[1] http://lwn.net/Articles/233481/
+[2] http://lwn.net/Articles/233482/
+
+CREDITS
+-------
+
+Original impetus and research by Randy Dunlap
+Written by Jonathan Corbet
+Improvements via coments from Satyam Sharma Johannes Stezenbach 


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2007-05-16  9:30 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <fa.UIDGHS72acFv9jKylmdQQwWcXPA@ifi.uio.no>
     [not found] ` <fa.fKNBJtZJWOQthlLjc1TDfY6jCLc@ifi.uio.no>
2007-05-12 18:32   ` [PATCH] "volatile considered harmful" document Robert Hancock
2007-05-13 16:30     ` Krzysztof Halasa
2007-05-13 23:26       ` Bill Davidsen
2007-05-13 23:53         ` Jeff Garzik
2007-05-14 14:10           ` Bill Davidsen
2007-05-14 14:21             ` [2.6.21] circular locking dependency found in QUOTA OFF Folkert van Heusden
2007-05-14 17:43               ` Michal Piotrowski
2007-05-14 17:44                 ` Folkert van Heusden
2007-05-15 14:09                 ` Jan Kara
2007-05-15 17:52                   ` Folkert van Heusden
2007-05-15 18:14                     ` Jan Kara
2007-05-15 19:18                     ` Jan Kara
2007-05-09 21:05 [PATCH] "volatile considered harmful" document Jonathan Corbet
2007-05-09 21:45 ` Jesper Juhl
2007-05-09 22:31   ` Satyam Sharma
2007-05-09 22:47     ` Jesper Juhl
2007-05-09 23:20       ` Satyam Sharma
2007-05-09 22:05 ` Heikki Orsila
2007-05-09 22:19   ` Randy Dunlap
2007-05-09 22:35 ` Scott Preece
2007-05-10 16:14 ` Giacomo A. Catenazzi
2007-05-10 19:35 ` H. Peter Anvin
2007-05-10 22:00   ` Alan Cox
2007-05-10 21:54 ` Bill Davidsen
2007-05-16  9:30 ` Thomas De Schampheleire

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).