Netdev Archive on lore.kernel.org
* rtnl_trylock() versus SCHED_FIFO lockup
@ 2020-08-05 14:25 Rasmus Villemoes
2020-08-05 23:34 ` Stephen Hemminger
0 siblings, 1 reply; 10+ messages in thread
From: Rasmus Villemoes @ 2020-08-05 14:25 UTC (permalink / raw)
To: Network Development
Hi,
We're seeing occasional lockups on an embedded board (running an -rt
kernel), which I believe I've tracked down to the
if (!rtnl_trylock())
return restart_syscall();
in net/bridge/br_sysfs_br.c. The problem is that some SCHED_FIFO task
writes a "1" to the /sys/class/net/foo/bridge/flush file, while some
lower-priority SCHED_FIFO task happens to hold rtnl_lock(). When that
happens, the higher-priority task is stuck in an eternal ERESTARTNOINTR
loop, and the lower-priority task never gets runtime and thus cannot
release the lock.
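Roughly, on the single CPU the two tasks then end up doing:
  high-prio writer:  write("1") -> store_bridge_parm() -> rtnl_trylock()
                     fails -> restart_syscall() -> syscall is restarted
                     immediately -> trylock fails again -> ...
  low-prio holder:   owns rtnl_mutex, but never gets any runtime on the
                     single CPU, so it can never release it
so the writer hogs the CPU forever and the two tasks are live-locked.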
I've written a script that rather quickly reproduces this both on our
target and my desktop machine (pinning everything on one CPU to emulate
the uni-processor board), see below. Also, with this hacky patch
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index 0318a69888d4..df8078c023d2 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -36,6 +36,7 @@ static ssize_t store_bridge_parm(struct device *d,
char *endp;
unsigned long val;
int err;
+ static unsigned int restarts;
if (!ns_capable(dev_net(br->dev)->user_ns, CAP_NET_ADMIN))
return -EPERM;
@@ -44,8 +45,14 @@ static ssize_t store_bridge_parm(struct device *d,
if (endp == buf)
return -EINVAL;
- if (!rtnl_trylock())
- return restart_syscall();
+ if (!rtnl_trylock()) {
+ restarts++;
+ if (restarts < 100)
+ return restart_syscall();
+ pr_err("too many restarts, doing unconditional rtnl_lock()\n");
+ rtnl_lock();
+ }
+ restarts = 0;
err = (*set)(br, val);
if (!err)
priority inheritance kicks in and boosts the lower-prio thread so the
lockup doesn't happen. But I'm failing to come up with a proper solution.
Thoughts?
Thanks,
Rasmus
Reproducer:
#!/bin/bash
dev=br-test
# Higher-priority SCHED_FIFO task: repeatedly writes to the bridge's
# sysfs flush file.
flusher() {
# $$ doesn't work as expected in subshells
read -r pid _ < /proc/self/stat
echo "flusher: PID $pid"
chrt -f -p 20 $pid
while true ; do
echo 1 > /sys/class/net/${dev}/bridge/flush
sleep .15
done
exit 0
}
# Lower-priority SCHED_FIFO task: grabs rtnl_lock via ip(8) each time it
# receives a token on fd 12, then acks on fd 21.
worker() {
read -r pid _ < /proc/self/stat
echo "worker: PID $pid"
chrt -f -p 10 $pid
while true ; do
read -n 1 -u 12
ip addr add 200.201.202.203/24 dev ${dev}
ip addr del 200.201.202.203/24 dev ${dev}
echo -n . >&21
done
exit 0
}
taskset -p 1 $$
chrt -f -p 30 $$
tmpdir=$(mktemp -d)
# Two fifos for ping-pong between the main loop and the worker: tokens go
# out on fd 12, acks come back on fd 21.
mkfifo ${tmpdir}/a
mkfifo ${tmpdir}/b
exec 12<> ${tmpdir}/a
exec 21<> ${tmpdir}/b
ip link add name $dev type bridge
( flusher ) &
flusher_pid=$!
( worker ) &
worker_pid=$!
sleep .1
printf '\n'
count=0
while ! [ -e /tmp/stop ] && [ $count -lt 1000 ]; do
echo -n . >&12
read -n 1 -u 21 -t 10
if [ $? -gt 0 ] ; then
printf '\nlockup?!\n'
sleep 20
break
fi
count=$((count+1))
printf '\r%4d' $count
sleep .02
done
kill $flusher_pid
kill $worker_pid
wait
rm -rf $tmpdir
ip link del $dev type bridge
* Re: rtnl_trylock() versus SCHED_FIFO lockup
2020-08-05 14:25 rtnl_trylock() versus SCHED_FIFO lockup Rasmus Villemoes
@ 2020-08-05 23:34 ` Stephen Hemminger
2020-08-06 9:17 ` Rasmus Villemoes
0 siblings, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2020-08-05 23:34 UTC (permalink / raw)
To: Rasmus Villemoes; +Cc: Network Development
On Wed, 5 Aug 2020 16:25:23 +0200
Rasmus Villemoes <rasmus.villemoes@prevas.dk> wrote:
> Hi,
>
> We're seeing occasional lockups on an embedded board (running an -rt
> kernel), which I believe I've tracked down to the
>
> if (!rtnl_trylock())
> return restart_syscall();
>
> in net/bridge/br_sysfs_br.c. The problem is that some SCHED_FIFO task
> writes a "1" to the /sys/class/net/foo/bridge/flush file, while some
> lower-priority SCHED_FIFO task happens to hold rtnl_lock(). When that
> happens, the higher-priority task is stuck in an eternal ERESTARTNOINTR
> loop, and the lower-priority task never gets runtime and thus cannot
> release the lock.
>
> I've written a script that rather quickly reproduces this both on our
> target and my desktop machine (pinning everything on one CPU to emulate
> the uni-processor board), see below. Also, with this hacky patch
There is a reason for the trylock, it works around a priority inversion.
The real problem is expecting a SCHED_FIFO task to be safe with this
kind of network operation.
* Re: rtnl_trylock() versus SCHED_FIFO lockup
2020-08-05 23:34 ` Stephen Hemminger
@ 2020-08-06 9:17 ` Rasmus Villemoes
2020-08-06 9:46 ` Nikolay Aleksandrov
0 siblings, 1 reply; 10+ messages in thread
From: Rasmus Villemoes @ 2020-08-06 9:17 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: Network Development
On 06/08/2020 01.34, Stephen Hemminger wrote:
> On Wed, 5 Aug 2020 16:25:23 +0200
> Rasmus Villemoes <rasmus.villemoes@prevas.dk> wrote:
>
>> Hi,
>>
>> We're seeing occasional lockups on an embedded board (running an -rt
>> kernel), which I believe I've tracked down to the
>>
>> if (!rtnl_trylock())
>> return restart_syscall();
>>
>> in net/bridge/br_sysfs_br.c. The problem is that some SCHED_FIFO task
>> writes a "1" to the /sys/class/net/foo/bridge/flush file, while some
>> lower-priority SCHED_FIFO task happens to hold rtnl_lock(). When that
>> happens, the higher-priority task is stuck in an eternal ERESTARTNOINTR
>> loop, and the lower-priority task never gets runtime and thus cannot
>> release the lock.
>>
>> I've written a script that rather quickly reproduces this both on our
>> target and my desktop machine (pinning everything on one CPU to emulate
>> the uni-processor board), see below. Also, with this hacky patch
>
> There is a reason for the trylock, it works around a priority inversion.
Can you elaborate? It seems to me that it _causes_ a priority inversion
since priority inheritance doesn't have a chance to kick in.
> The real problem is expecting a SCHED_FIFO task to be safe with this
> kind of network operation.
Maybe. But ignoring the SCHED_FIFO/rt-prio stuff, it also seems a bit
odd to do what is essentially a busy-loop - yes, the restart_syscall()
allows signals to be delivered (including allowing the process to get
killed), but in the absence of any signals, the pattern essentially
boils down to
while (!rtnl_trylock())
;
So even for regular tasks, this seems to needlessly hog the cpu.
I tried this
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index 0318a69888d4..e40e264f9b16 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -44,8 +44,8 @@ static ssize_t store_bridge_parm(struct device *d,
if (endp == buf)
return -EINVAL;
- if (!rtnl_trylock())
- return restart_syscall();
+ if (rtnl_lock_interruptible())
+ return -ERESTARTNOINTR;
err = (*set)(br, val);
if (!err)
with the obvious definition of rtnl_lock_interruptible(), and it makes
the problem go away. Isn't it better to sleep waiting for the lock (and
with -rt, giving proper priority boost) or a signal to arrive rather
than busy-looping back and forth between syscall entry point and the
trylock()?
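For completeness, the rtnl_lock_interruptible() I'm using is just the
obvious wrapper around mutex_lock_interruptible() -- only a sketch, not an
existing API, and it has to live in net/core/rtnetlink.c (plus the usual
declaration/export) since that's where rtnl_mutex is defined:

int rtnl_lock_interruptible(void)
{
        /* 0 on success, -EINTR if a signal arrived while sleeping */
        return mutex_lock_interruptible(&rtnl_mutex);
}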
I see quite a lot of
if (mutex_lock_interruptible(...))
return -ERESTARTSYS;
but for the rtnl_mutex, I see the trylock...restart_syscall pattern
being used in a couple of places. So there must be something special
about the rtnl_mutex?
Thanks,
Rasmus
* Re: rtnl_trylock() versus SCHED_FIFO lockup
2020-08-06 9:17 ` Rasmus Villemoes
@ 2020-08-06 9:46 ` Nikolay Aleksandrov
2020-08-07 3:39 ` Stephen Hemminger
0 siblings, 1 reply; 10+ messages in thread
From: Nikolay Aleksandrov @ 2020-08-06 9:46 UTC (permalink / raw)
To: Rasmus Villemoes, Stephen Hemminger; +Cc: Network Development
On 06/08/2020 12:17, Rasmus Villemoes wrote:
> On 06/08/2020 01.34, Stephen Hemminger wrote:
>> On Wed, 5 Aug 2020 16:25:23 +0200
>> Rasmus Villemoes <rasmus.villemoes@prevas.dk> wrote:
>>
>>> Hi,
>>>
>>> We're seeing occasional lockups on an embedded board (running an -rt
>>> kernel), which I believe I've tracked down to the
>>>
>>> if (!rtnl_trylock())
>>> return restart_syscall();
>>>
>>> in net/bridge/br_sysfs_br.c. The problem is that some SCHED_FIFO task
>>> writes a "1" to the /sys/class/net/foo/bridge/flush file, while some
>>> lower-priority SCHED_FIFO task happens to hold rtnl_lock(). When that
>>> happens, the higher-priority task is stuck in an eternal ERESTARTNOINTR
>>> loop, and the lower-priority task never gets runtime and thus cannot
>>> release the lock.
>>>
>>> I've written a script that rather quickly reproduces this both on our
>>> target and my desktop machine (pinning everything on one CPU to emulate
>>> the uni-processor board), see below. Also, with this hacky patch
>>
>> There is a reason for the trylock, it works around a priority inversion.
>
> Can you elaborate? It seems to me that it _causes_ a priority inversion
> since priority inheritance doesn't have a chance to kick in.
>
>> The real problem is expecting a SCHED_FIFO task to be safe with this
>> kind of network operation.
>
> Maybe. But ignoring the SCHED_FIFO/rt-prio stuff, it also seems a bit
> odd to do what is essentially a busy-loop - yes, the restart_syscall()
> allows signals to be delivered (including allowing the process to get
> killed), but in the absence of any signals, the pattern essentially
> boils down to
>
> while (!rtnl_trylock())
> ;
>
> So even for regular tasks, this seems to needlessly hog the cpu.
>
> I tried this
>
> diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
> index 0318a69888d4..e40e264f9b16 100644
> --- a/net/bridge/br_sysfs_br.c
> +++ b/net/bridge/br_sysfs_br.c
> @@ -44,8 +44,8 @@ static ssize_t store_bridge_parm(struct device *d,
> if (endp == buf)
> return -EINVAL;
>
> - if (!rtnl_trylock())
> - return restart_syscall();
> + if (rtnl_lock_interruptible())
> + return -ERESTARTNOINTR;
>
> err = (*set)(br, val);
> if (!err)
>
> with the obvious definition of rtnl_lock_interruptible(), and it makes
> the problem go away. Isn't it better to sleep waiting for the lock (and
> with -rt, giving proper priority boost) or a signal to arrive rather
> than busy-looping back and forth between syscall entry point and the
> trylock()?
>
> I see quite a lot of
>
> if (mutex_lock_interruptible(...))
> return -ERESTARTSYS;
>
> but for the rtnl_mutex, I see the trylock...restart_syscall pattern
> being used in a couple of places. So there must be something special
> about the rtnl_mutex?
>
> Thanks,
> Rasmus
>
Hi Rasmus,
I haven't tested anything but git history (and some grepping) points to deadlocks when
sysfs entries are being changed under rtnl.
For example check: af38f2989572704a846a5577b5ab3b1e2885cbfb and 336ca57c3b4e2b58ea3273e6d978ab3dfa387b4c
This is a common usage pattern throughout net/, the bridge is not the only case and there are more
commits which talk about deadlocks.
Again I haven't verified anything but it seems on device delete (w/ rtnl held) -> sysfs delete
would wait for current readers, but current readers might be stuck waiting on rtnl and we can deadlock.
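I.e., if I'm reading it right, something like:

  task A (deleting the device, rtnl held):
    unregister_netdevice() -> remove the sysfs entries -> wait for active
    users of the attribute to drain             <- blocks on task B
  task B (writing to the sysfs file):
    counted as an active user of the attribute, then rtnl_lock()
                                                 <- blocks on task A

which would be why the trylock + restart_syscall() pattern is there.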
Cheers,
Nik
* Re: rtnl_trylock() versus SCHED_FIFO lockup
2020-08-06 9:46 ` Nikolay Aleksandrov
@ 2020-08-07 3:39 ` Stephen Hemminger
2020-08-07 8:03 ` Rasmus Villemoes
0 siblings, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2020-08-07 3:39 UTC (permalink / raw)
To: Nikolay Aleksandrov; +Cc: Rasmus Villemoes, Network Development
On Thu, 6 Aug 2020 12:46:43 +0300
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
> On 06/08/2020 12:17, Rasmus Villemoes wrote:
> > On 06/08/2020 01.34, Stephen Hemminger wrote:
> >> On Wed, 5 Aug 2020 16:25:23 +0200
> >> Rasmus Villemoes <rasmus.villemoes@prevas.dk> wrote:
> >>
> >>> Hi,
> >>>
> >>> We're seeing occasional lockups on an embedded board (running an -rt
> >>> kernel), which I believe I've tracked down to the
> >>>
> >>> if (!rtnl_trylock())
> >>> return restart_syscall();
> >>>
> >>> in net/bridge/br_sysfs_br.c. The problem is that some SCHED_FIFO task
> >>> writes a "1" to the /sys/class/net/foo/bridge/flush file, while some
> >>> lower-priority SCHED_FIFO task happens to hold rtnl_lock(). When that
> >>> happens, the higher-priority task is stuck in an eternal ERESTARTNOINTR
> >>> loop, and the lower-priority task never gets runtime and thus cannot
> >>> release the lock.
> >>>
> >>> I've written a script that rather quickly reproduces this both on our
> >>> target and my desktop machine (pinning everything on one CPU to emulate
> >>> the uni-processor board), see below. Also, with this hacky patch
> >>
> >> There is a reason for the trylock, it works around a priority inversion.
> >
> > Can you elaborate? It seems to me that it _causes_ a priority inversion
> > since priority inheritance doesn't have a chance to kick in.
> >
> >> The real problem is expecting a SCHED_FIFO task to be safe with this
> >> kind of network operation.
> >
> > Maybe. But ignoring the SCHED_FIFO/rt-prio stuff, it also seems a bit
> > odd to do what is essentially a busy-loop - yes, the restart_syscall()
> > allows signals to be delivered (including allowing the process to get
> > killed), but in the absence of any signals, the pattern essentially
> > boils down to
> >
> > while (!rtnl_trylock())
> > ;
> >
> > So even for regular tasks, this seems to needlessly hog the cpu.
> >
> > I tried this
> >
> > diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
> > index 0318a69888d4..e40e264f9b16 100644
> > --- a/net/bridge/br_sysfs_br.c
> > +++ b/net/bridge/br_sysfs_br.c
> > @@ -44,8 +44,8 @@ static ssize_t store_bridge_parm(struct device *d,
> > if (endp == buf)
> > return -EINVAL;
> >
> > - if (!rtnl_trylock())
> > - return restart_syscall();
> > + if (rtnl_lock_interruptible())
> > + return -ERESTARTNOINTR;
> >
> > err = (*set)(br, val);
> > if (!err)
> >
> > with the obvious definition of rtnl_lock_interruptible(), and it makes
> > the problem go away. Isn't it better to sleep waiting for the lock (and
> > with -rt, giving proper priority boost) or a signal to arrive rather
> > than busy-looping back and forth between syscall entry point and the
> > trylock()?
> >
> > I see quite a lot of
> >
> > if (mutex_lock_interruptible(...))
> > return -ERESTARTSYS;
> >
> > but for the rtnl_mutex, I see the trylock...restart_syscall pattern
> > being used in a couple of places. So there must be something special
> > about the rtnl_mutex?
> >
> > Thanks,
> > Rasmus
> >
>
> Hi Rasmus,
> I haven't tested anything but git history (and some grepping) points to deadlocks when
> sysfs entries are being changed under rtnl.
> For example check: af38f2989572704a846a5577b5ab3b1e2885cbfb and 336ca57c3b4e2b58ea3273e6d978ab3dfa387b4c
> This is a common usage pattern throughout net/, the bridge is not the only case and there are more
> commits which talk about deadlocks.
> Again I haven't verified anything but it seems on device delete (w/ rtnl held) -> sysfs delete
> would wait for current readers, but current readers might be stuck waiting on rtnl and we can deadlock.
>
I was referring to AB BA lock inversion problems.
Yes the trylock goes back to:
commit af38f2989572704a846a5577b5ab3b1e2885cbfb
Author: Eric W. Biederman <ebiederm@xmission.com>
Date: Wed May 13 17:00:41 2009 +0000
net: Fix bridgeing sysfs handling of rtnl_lock
Holding rtnl_lock when we are unregistering the sysfs files can
deadlock if we unconditionally take rtnl_lock in a sysfs file. So fix
it with the now familiar patter of: rtnl_trylock and syscall_restart()
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The problem is that the unregister of netdevice happens under rtnl and
this unregister path has to remove sysfs and other objects.
So those objects have to have conditional locking.
* Re: rtnl_trylock() versus SCHED_FIFO lockup
2020-08-07 3:39 ` Stephen Hemminger
@ 2020-08-07 8:03 ` Rasmus Villemoes
2020-08-07 15:03 ` Stephen Hemminger
[not found] ` <20200809134924.12056-1-hdanton@sina.com>
0 siblings, 2 replies; 10+ messages in thread
From: Rasmus Villemoes @ 2020-08-07 8:03 UTC (permalink / raw)
To: Stephen Hemminger, Nikolay Aleksandrov; +Cc: Network Development
On 07/08/2020 05.39, Stephen Hemminger wrote:
> On Thu, 6 Aug 2020 12:46:43 +0300
> Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
>
>> On 06/08/2020 12:17, Rasmus Villemoes wrote:
>>> On 06/08/2020 01.34, Stephen Hemminger wrote:
>>>> On Wed, 5 Aug 2020 16:25:23 +0200
>>
>> Hi Rasmus,
>> I haven't tested anything but git history (and some grepping) points to deadlocks when
>> sysfs entries are being changed under rtnl.
>> For example check: af38f2989572704a846a5577b5ab3b1e2885cbfb and 336ca57c3b4e2b58ea3273e6d978ab3dfa387b4c
>> This is a common usage pattern throughout net/, the bridge is not the only case and there are more
>> commits which talk about deadlocks.
>> Again I haven't verified anything but it seems on device delete (w/ rtnl held) -> sysfs delete
>> would wait for current readers, but current readers might be stuck waiting on rtnl and we can deadlock.
>>
>
> I was referring to AB BA lock inversion problems.
Ah, so lock inversion, not priority inversion.
>
> Yes the trylock goes back to:
>
> commit af38f2989572704a846a5577b5ab3b1e2885cbfb
> Author: Eric W. Biederman <ebiederm@xmission.com>
> Date: Wed May 13 17:00:41 2009 +0000
>
> net: Fix bridgeing sysfs handling of rtnl_lock
>
> Holding rtnl_lock when we are unregistering the sysfs files can
> deadlock if we unconditionally take rtnl_lock in a sysfs file. So fix
> it with the now familiar patter of: rtnl_trylock and syscall_restart()
>
> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
>
> The problem is that the unregister of netdevice happens under rtnl and
> this unregister path has to remove sysfs and other objects.
> So those objects have to have conditional locking.
I see. And the reason the "trylock, unwind all the way back to syscall
entry and start over" works is that we then go through
kernfs_fop_write()
mutex_lock(&of->mutex);
if (!kernfs_get_active(of->kn)) {
mutex_unlock(&of->mutex);
len = -ENODEV;
goto out_free;
}
which makes the write fail with ENODEV if the sysfs node has already
been marked for removal.
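(For reference, restart_syscall() is essentially just

static inline int restart_syscall(void)
{
        set_tsk_thread_flag(current, TIF_SIGPENDING);
        return -ERESTARTNOINTR;
}

i.e. it forces the return path through the signal code, where
-ERESTARTNOINTR makes the syscall get re-executed from the top -- quoting
from memory, so the exact details may differ.)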
If I'm reading the code correctly, doing "ip link set dev foobar type
bridge fdb_flush" is equivalent to writing to that sysfs file, except
the former ends up doing an unconditional rtnl_lock() and thus won't
have the livelocking issue.
Thanks,
Rasmus
* Re: rtnl_trylock() versus SCHED_FIFO lockup
2020-08-07 8:03 ` Rasmus Villemoes
@ 2020-08-07 15:03 ` Stephen Hemminger
[not found] ` <20200809134924.12056-1-hdanton@sina.com>
1 sibling, 0 replies; 10+ messages in thread
From: Stephen Hemminger @ 2020-08-07 15:03 UTC (permalink / raw)
To: Rasmus Villemoes; +Cc: Nikolay Aleksandrov, Network Development
On Fri, 7 Aug 2020 10:03:59 +0200
Rasmus Villemoes <rasmus.villemoes@prevas.dk> wrote:
> On 07/08/2020 05.39, Stephen Hemminger wrote:
> > On Thu, 6 Aug 2020 12:46:43 +0300
> > Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
> >
> >> On 06/08/2020 12:17, Rasmus Villemoes wrote:
> >>> On 06/08/2020 01.34, Stephen Hemminger wrote:
> >>>> On Wed, 5 Aug 2020 16:25:23 +0200
>
> >>
> >> Hi Rasmus,
> >> I haven't tested anything but git history (and some grepping) points to deadlocks when
> >> sysfs entries are being changed under rtnl.
> >> For example check: af38f2989572704a846a5577b5ab3b1e2885cbfb and 336ca57c3b4e2b58ea3273e6d978ab3dfa387b4c
> >> This is a common usage pattern throughout net/, the bridge is not the only case and there are more
> >> commits which talk about deadlocks.
> >> Again I haven't verified anything but it seems on device delete (w/ rtnl held) -> sysfs delete
> >> would wait for current readers, but current readers might be stuck waiting on rtnl and we can deadlock.
> >>
> >
> > I was referring to AB BA lock inversion problems.
>
> Ah, so lock inversion, not priority inversion.
>
> >
> > Yes the trylock goes back to:
> >
> > commit af38f2989572704a846a5577b5ab3b1e2885cbfb
> > Author: Eric W. Biederman <ebiederm@xmission.com>
> > Date: Wed May 13 17:00:41 2009 +0000
> >
> > net: Fix bridgeing sysfs handling of rtnl_lock
> >
> > Holding rtnl_lock when we are unregistering the sysfs files can
> > deadlock if we unconditionally take rtnl_lock in a sysfs file. So fix
> > it with the now familiar patter of: rtnl_trylock and syscall_restart()
> >
> > Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
> > Signed-off-by: David S. Miller <davem@davemloft.net>
> >
> >
> > The problem is that the unregister of netdevice happens under rtnl and
> > this unregister path has to remove sysfs and other objects.
> > So those objects have to have conditional locking.
> I see. And the reason the "trylock, unwind all the way back to syscall
> entry and start over" works is that we then go through
>
> kernfs_fop_write()
> mutex_lock(&of->mutex);
> if (!kernfs_get_active(of->kn)) {
> mutex_unlock(&of->mutex);
> len = -ENODEV;
> goto out_free;
> }
>
> which makes the write fail with ENODEV if the sysfs node has already
> been marked for removal.
>
> If I'm reading the code correctly, doing "ip link set dev foobar type
> bridge fdb_flush" is equivalent to writing to that sysfs file, except
> the former ends up doing an unconditional rtnl_lock() and thus won't
> have the livelocking issue.
>
> Thanks,
> Rasmus
ip commands use netlink, and netlink doesn't have the problem because
it doesn't go through a filesystem API.
* Re: rtnl_trylock() versus SCHED_FIFO lockup
[not found] ` <20200809134924.12056-1-hdanton@sina.com>
@ 2020-08-09 14:12 ` Nikolay Aleksandrov
2020-08-09 14:18 ` Nikolay Aleksandrov
2020-08-09 15:32 ` Stephen Hemminger
1 sibling, 1 reply; 10+ messages in thread
From: Nikolay Aleksandrov @ 2020-08-09 14:12 UTC (permalink / raw)
To: Hillf Danton, Stephen Hemminger
Cc: Rasmus Villemoes, Network Development, Markus Elfring
On 09/08/2020 16:49, Hillf Danton wrote:
>
> On Fri, 7 Aug 2020 08:03:32 -0700 Stephen Hemminger wrote:
>> On Fri, 7 Aug 2020 10:03:59 +0200
>> Rasmus Villemoes <rasmus.villemoes@prevas.dk> wrote:
>>
>>> On 07/08/2020 05.39, Stephen Hemminger wrote:
>>>> On Thu, 6 Aug 2020 12:46:43 +0300
>>>> Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
>>>>
>>>>> On 06/08/2020 12:17, Rasmus Villemoes wrote:
>>>>>> On 06/08/2020 01.34, Stephen Hemminger wrote:
>>>>>>> On Wed, 5 Aug 2020 16:25:23 +0200
>>>
>>>>>
>>>>> Hi Rasmus,
>>>>> I haven't tested anything but git history (and some grepping) points to deadlocks when
>>>>> sysfs entries are being changed under rtnl.
>>>>> For example check: af38f2989572704a846a5577b5ab3b1e2885cbfb and 336ca57c3b4e2b58ea3273e6d978ab3dfa387b4c
>>>>> This is a common usage pattern throughout net/, the bridge is not the only case and there are more
>>>>> commits which talk about deadlocks.
>>>>> Again I haven't verified anything but it seems on device delete (w/ rtnl held) -> sysfs delete
>>>>> would wait for current readers, but current readers might be stuck waiting on rtnl and we can deadlock.
>>>>>
>>>>
>>>> I was referring to AB BA lock inversion problems.
>>>
>>> Ah, so lock inversion, not priority inversion.
>
> Hi folks,
>
> Is it likely that kworker helps work around that deadlock, by
> acquiring the rtnl lock in the case that the current fails to
> trylock it?
>
> Hillf
You know it's a user writing to a file expecting config change, right?
There are numerous problems with deferring it (e.g. error handling).
Thanks,
Nik
* Re: rtnl_trylock() versus SCHED_FIFO lockup
2020-08-09 14:12 ` Nikolay Aleksandrov
@ 2020-08-09 14:18 ` Nikolay Aleksandrov
0 siblings, 0 replies; 10+ messages in thread
From: Nikolay Aleksandrov @ 2020-08-09 14:18 UTC (permalink / raw)
To: Hillf Danton, Stephen Hemminger
Cc: Rasmus Villemoes, Network Development, Markus Elfring
On 09/08/2020 17:12, Nikolay Aleksandrov wrote:
> On 09/08/2020 16:49, Hillf Danton wrote:
>>
>> On Fri, 7 Aug 2020 08:03:32 -0700 Stephen Hemminger wrote:
>>> On Fri, 7 Aug 2020 10:03:59 +0200
>>> Rasmus Villemoes <rasmus.villemoes@prevas.dk> wrote:
>>>
>>>> On 07/08/2020 05.39, Stephen Hemminger wrote:
>>>>> On Thu, 6 Aug 2020 12:46:43 +0300
>>>>> Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
>>>>>
>>>>>> On 06/08/2020 12:17, Rasmus Villemoes wrote:
>>>>>>> On 06/08/2020 01.34, Stephen Hemminger wrote:
>>>>>>>> On Wed, 5 Aug 2020 16:25:23 +0200
>>>>
>>>>>>
>>>>>> Hi Rasmus,
>>>>>> I haven't tested anything but git history (and some grepping) points to deadlocks when
>>>>>> sysfs entries are being changed under rtnl.
>>>>>> For example check: af38f2989572704a846a5577b5ab3b1e2885cbfb and 336ca57c3b4e2b58ea3273e6d978ab3dfa387b4c
>>>>>> This is a common usage pattern throughout net/, the bridge is not the only case and there are more
>>>>>> commits which talk about deadlocks.
>>>>>> Again I haven't verified anything but it seems on device delete (w/ rtnl held) -> sysfs delete
>>>>>> would wait for current readers, but current readers might be stuck waiting on rtnl and we can deadlock.
>>>>>>
>>>>>
>>>>> I was referring to AB BA lock inversion problems.
>>>>
>>>> Ah, so lock inversion, not priority inversion.
>>
>> Hi folks,
>>
>> Is it likely that kworker helps work around that deadlock, by
>> acquiring the rtnl lock in the case that the current fails to
>> trylock it?
>>
>> Hillf
>
> You know it's a user writing to a file expecting config change, right?
> There are numerous problems with deferring it (e.g. error handling).
>
> Thanks,
> Nik
OK, admittedly spoke too soon about the error handling. :)
But I still think it suffers from the same problem if the sysfs files are being destroyed
under rtnl while you're writing to one. Their users are "drained", so it will again wait
forever, because neither will rtnl be released nor will the writer finish.
And it may become even more interesting if we're trying to remove the bridge module at that time.
* Re: rtnl_trylock() versus SCHED_FIFO lockup
[not found] ` <20200809134924.12056-1-hdanton@sina.com>
2020-08-09 14:12 ` Nikolay Aleksandrov
@ 2020-08-09 15:32 ` Stephen Hemminger
1 sibling, 0 replies; 10+ messages in thread
From: Stephen Hemminger @ 2020-08-09 15:32 UTC (permalink / raw)
To: Hillf Danton
Cc: Rasmus Villemoes, Nikolay Aleksandrov, Network Development,
Markus Elfring
On Sun, 9 Aug 2020 21:49:24 +0800
Hillf Danton <hdanton@sina.com> wrote:
> +
> +static void br_workfn(struct work_struct *w)
> +{
> + struct br_work *brw = container_of(w, struct br_work, work);
> +
> + rtnl_lock();
> + brw->err = brw->set(brw->br, brw->val);
> + if (!brw->err)
> + netdev_state_change(brw->br->dev);
> + rtnl_unlock();
> +
> + brw->done = true;
> + wake_up(&brw->waitq);
> +}
Sorry, this is unsafe.
This has the potential of running when the bridge itself has been
deleted.