LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()
@ 2021-11-02  2:12 Ziyang Xuan
  2021-11-03 13:50 ` Jason Gunthorpe
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Ziyang Xuan @ 2021-11-02  2:12 UTC (permalink / raw)
  To: davem, kuba; +Cc: jgg, netdev, linux-kernel

The real_dev of a vlan net_device may be freed after
unregister_vlan_dev(). Access the real_dev continually by
vlan_dev_real_dev() will trigger the UAF problem for the
real_dev like following:

==================================================================
BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120
Call Trace:
 kasan_report.cold+0x83/0xdf
 vlan_dev_real_dev+0xf9/0x120
 is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0
 is_eth_port_of_netdev_filter+0x28/0x40
 ib_enum_roce_netdev+0x1a3/0x300
 ib_enum_all_roce_netdevs+0xc7/0x140
 netdevice_event_work_handler+0x9d/0x210
...

Freed by task 9288:
 kasan_save_stack+0x1b/0x40
 kasan_set_track+0x1c/0x30
 kasan_set_free_info+0x20/0x30
 __kasan_slab_free+0xfc/0x130
 slab_free_freelist_hook+0xdd/0x240
 kfree+0xe4/0x690
 kvfree+0x42/0x50
 device_release+0x9f/0x240
 kobject_put+0x1c8/0x530
 put_device+0x1b/0x30
 free_netdev+0x370/0x540
 ppp_destroy_interface+0x313/0x3d0
...

Move the put_device(real_dev) to vlan_dev_free(). Ensure
real_dev not be freed before vlan_dev unregistered.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+e4df4e1389e28972e955@syzkaller.appspotmail.com
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
---
 net/8021q/vlan.c     | 3 ---
 net/8021q/vlan_dev.c | 3 +++
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 55275ef9a31a..a3a0a5e994f5 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
 	}
 
 	vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
-
-	/* Get rid of the vlan's reference to real_dev */
-	dev_put(real_dev);
 }
 
 int vlan_check_real_dev(struct net_device *real_dev,
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index 0c21d1fec852..aeeb5f90417b 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev)
 
 	free_percpu(vlan->vlan_pcpu_stats);
 	vlan->vlan_pcpu_stats = NULL;
+
+	/* Get rid of the vlan's reference to real_dev */
+	dev_put(vlan->real_dev);
 }
 
 void vlan_setup(struct net_device *dev)
-- 
2.25.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-11-02  2:12 [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() Ziyang Xuan
@ 2021-11-03 13:50 ` Jason Gunthorpe
  2021-11-03 15:47   ` Jakub Kicinski
  2021-11-03 14:30 ` patchwork-bot+netdevbpf
  2021-11-15 17:04 ` Petr Machata
  2 siblings, 1 reply; 18+ messages in thread
From: Jason Gunthorpe @ 2021-11-03 13:50 UTC (permalink / raw)
  To: Ziyang Xuan; +Cc: davem, kuba, netdev, linux-kernel

On Tue, Nov 02, 2021 at 10:12:18AM +0800, Ziyang Xuan wrote:
> The real_dev of a vlan net_device may be freed after
> unregister_vlan_dev(). Access the real_dev continually by
> vlan_dev_real_dev() will trigger the UAF problem for the
> real_dev like following:
> 
> ==================================================================
> BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120
> Call Trace:
>  kasan_report.cold+0x83/0xdf
>  vlan_dev_real_dev+0xf9/0x120
>  is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0
>  is_eth_port_of_netdev_filter+0x28/0x40
>  ib_enum_roce_netdev+0x1a3/0x300
>  ib_enum_all_roce_netdevs+0xc7/0x140
>  netdevice_event_work_handler+0x9d/0x210
> ...
> 
> Freed by task 9288:
>  kasan_save_stack+0x1b/0x40
>  kasan_set_track+0x1c/0x30
>  kasan_set_free_info+0x20/0x30
>  __kasan_slab_free+0xfc/0x130
>  slab_free_freelist_hook+0xdd/0x240
>  kfree+0xe4/0x690
>  kvfree+0x42/0x50
>  device_release+0x9f/0x240
>  kobject_put+0x1c8/0x530
>  put_device+0x1b/0x30
>  free_netdev+0x370/0x540
>  ppp_destroy_interface+0x313/0x3d0
> ...
> 
> Move the put_device(real_dev) to vlan_dev_free(). Ensure
> real_dev not be freed before vlan_dev unregistered.
> 
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Reported-by: syzbot+e4df4e1389e28972e955@syzkaller.appspotmail.com
> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
> ---
>  net/8021q/vlan.c     | 3 ---
>  net/8021q/vlan_dev.c | 3 +++
>  2 files changed, 3 insertions(+), 3 deletions(-)

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

(though I can't tell either if there is a possiblecircular dep problem
in some oddball case)

Jason

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-11-02  2:12 [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() Ziyang Xuan
  2021-11-03 13:50 ` Jason Gunthorpe
@ 2021-11-03 14:30 ` patchwork-bot+netdevbpf
  2021-11-15 17:04 ` Petr Machata
  2 siblings, 0 replies; 18+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-11-03 14:30 UTC (permalink / raw)
  To: Ziyang Xuan; +Cc: davem, kuba, jgg, netdev, linux-kernel

Hello:

This patch was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Tue, 2 Nov 2021 10:12:18 +0800 you wrote:
> The real_dev of a vlan net_device may be freed after
> unregister_vlan_dev(). Access the real_dev continually by
> vlan_dev_real_dev() will trigger the UAF problem for the
> real_dev like following:
> 
> ==================================================================
> BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120
> Call Trace:
>  kasan_report.cold+0x83/0xdf
>  vlan_dev_real_dev+0xf9/0x120
>  is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0
>  is_eth_port_of_netdev_filter+0x28/0x40
>  ib_enum_roce_netdev+0x1a3/0x300
>  ib_enum_all_roce_netdevs+0xc7/0x140
>  netdevice_event_work_handler+0x9d/0x210
> ...
> 
> [...]

Here is the summary with links:
  - [net,v2] net: vlan: fix a UAF in vlan_dev_real_dev()
    https://git.kernel.org/netdev/net/c/563bcbae3ba2

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-11-03 13:50 ` Jason Gunthorpe
@ 2021-11-03 15:47   ` Jakub Kicinski
  2021-11-03 16:11     ` Jason Gunthorpe
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2021-11-03 15:47 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Ziyang Xuan, davem, netdev, linux-kernel

On Wed, 3 Nov 2021 10:50:34 -0300 Jason Gunthorpe wrote:
> (though I can't tell either if there is a possiblecircular dep problem
> in some oddball case)

Same, luckily we're just starting a new dev cycle and syzbot can have
at it. 

We should probably not let this patch get into stable right away -
assuming you agree - would you mind nacking the selection if it happens?
I'm not sure I'll get CCed since it doesn't have my tags.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-11-03 15:47   ` Jakub Kicinski
@ 2021-11-03 16:11     ` Jason Gunthorpe
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Gunthorpe @ 2021-11-03 16:11 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Ziyang Xuan, davem, netdev, linux-kernel

On Wed, Nov 03, 2021 at 08:47:46AM -0700, Jakub Kicinski wrote:
> On Wed, 3 Nov 2021 10:50:34 -0300 Jason Gunthorpe wrote:
> > (though I can't tell either if there is a possiblecircular dep problem
> > in some oddball case)
> 
> Same, luckily we're just starting a new dev cycle and syzbot can have
> at it. 
> 
> We should probably not let this patch get into stable right away -
> assuming you agree - would you mind nacking the selection if it happens?
> I'm not sure I'll get CCed since it doesn't have my tags.

I will make an effort, sure. It is hard to be confident due to all the
stable selection emails I get :|

Thanks,
Jason

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-11-02  2:12 [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() Ziyang Xuan
  2021-11-03 13:50 ` Jason Gunthorpe
  2021-11-03 14:30 ` patchwork-bot+netdevbpf
@ 2021-11-15 17:04 ` Petr Machata
  2021-11-15 17:49   ` Jakub Kicinski
  2021-11-19  3:04   ` Ziyang Xuan (William)
  2 siblings, 2 replies; 18+ messages in thread
From: Petr Machata @ 2021-11-15 17:04 UTC (permalink / raw)
  To: Ziyang Xuan; +Cc: davem, kuba, jgg, netdev, linux-kernel


Ziyang Xuan <william.xuanziyang@huawei.com> writes:

> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> index 55275ef9a31a..a3a0a5e994f5 100644
> --- a/net/8021q/vlan.c
> +++ b/net/8021q/vlan.c
> @@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
>  	}
>  
>  	vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
> -
> -	/* Get rid of the vlan's reference to real_dev */
> -	dev_put(real_dev);
>  }
>  
>  int vlan_check_real_dev(struct net_device *real_dev,
> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
> index 0c21d1fec852..aeeb5f90417b 100644
> --- a/net/8021q/vlan_dev.c
> +++ b/net/8021q/vlan_dev.c
> @@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev)
>  
>  	free_percpu(vlan->vlan_pcpu_stats);
>  	vlan->vlan_pcpu_stats = NULL;
> +
> +	/* Get rid of the vlan's reference to real_dev */
> +	dev_put(vlan->real_dev);
>  }
>  
>  void vlan_setup(struct net_device *dev)

This is causing reference counting issues when vetoing is involved.
Consider the following snippet:

    ip link add name bond1 type bond mode 802.3ad
    ip link set dev swp1 master bond1
    ip link add name bond1.100 link bond1 type vlan protocol 802.1ad id 100
    # ^ vetoed, no netdevice created
    ip link del dev bond1

The setup process goes like this: vlan_newlink() calls
register_vlan_dev() calls netdev_upper_dev_link() calls
__netdev_upper_dev_link(), which issues a notifier
NETDEV_PRECHANGEUPPER, which yields a non-zero error,
because a listener vetoed it.

So it unwinds, skipping dev_hold(real_dev), but eventually the VLAN ends
up decreasing reference count of the real_dev. Then when when the bond
netdevice is removed, we get an endless loop of:

    kernel:unregister_netdevice: waiting for bond1 to become free. Usage count = 0 

Moving the dev_hold(real_dev) to always happen even if the
netdev_upper_dev_link() call makes the issue go away.

I'm not sure why this wasn't happening before. After the veto,
register_vlan_dev() follows with a goto out_unregister_netdev, which
calls unregister_netdevice() calls unregister_netdevice_queue(), which
issues a notifier NETDEV_UNREGISTER, which invokes vlan_device_event(),
which calls unregister_vlan_dev(), which used to dev_put(real_dev),
which seems like it should have caused the same issue. Dunno.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-11-15 17:04 ` Petr Machata
@ 2021-11-15 17:49   ` Jakub Kicinski
  2021-11-16 14:20     ` Petr Machata
  2021-11-17 11:50     ` Petr Machata
  2021-11-19  3:04   ` Ziyang Xuan (William)
  1 sibling, 2 replies; 18+ messages in thread
From: Jakub Kicinski @ 2021-11-15 17:49 UTC (permalink / raw)
  To: Petr Machata; +Cc: Ziyang Xuan, davem, jgg, netdev, linux-kernel

On Mon, 15 Nov 2021 18:04:42 +0100 Petr Machata wrote:
> Ziyang Xuan <william.xuanziyang@huawei.com> writes:
> 
> > diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> > index 55275ef9a31a..a3a0a5e994f5 100644
> > --- a/net/8021q/vlan.c
> > +++ b/net/8021q/vlan.c
> > @@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
> >  	}
> >  
> >  	vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
> > -
> > -	/* Get rid of the vlan's reference to real_dev */
> > -	dev_put(real_dev);
> >  }
> >  
> >  int vlan_check_real_dev(struct net_device *real_dev,
> > diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
> > index 0c21d1fec852..aeeb5f90417b 100644
> > --- a/net/8021q/vlan_dev.c
> > +++ b/net/8021q/vlan_dev.c
> > @@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev)
> >  
> >  	free_percpu(vlan->vlan_pcpu_stats);
> >  	vlan->vlan_pcpu_stats = NULL;
> > +
> > +	/* Get rid of the vlan's reference to real_dev */
> > +	dev_put(vlan->real_dev);
> >  }
> >  
> >  void vlan_setup(struct net_device *dev)  
> 
> This is causing reference counting issues when vetoing is involved.
> Consider the following snippet:
> 
>     ip link add name bond1 type bond mode 802.3ad
>     ip link set dev swp1 master bond1
>     ip link add name bond1.100 link bond1 type vlan protocol 802.1ad id 100
>     # ^ vetoed, no netdevice created
>     ip link del dev bond1
> 
> The setup process goes like this: vlan_newlink() calls
> register_vlan_dev() calls netdev_upper_dev_link() calls
> __netdev_upper_dev_link(), which issues a notifier
> NETDEV_PRECHANGEUPPER, which yields a non-zero error,
> because a listener vetoed it.
> 
> So it unwinds, skipping dev_hold(real_dev), but eventually the VLAN ends
> up decreasing reference count of the real_dev. Then when when the bond
> netdevice is removed, we get an endless loop of:
> 
>     kernel:unregister_netdevice: waiting for bond1 to become free. Usage count = 0 
> 
> Moving the dev_hold(real_dev) to always happen even if the
> netdev_upper_dev_link() call makes the issue go away.
> 
> I'm not sure why this wasn't happening before. After the veto,
> register_vlan_dev() follows with a goto out_unregister_netdev, which
> calls unregister_netdevice() calls unregister_netdevice_queue(), which
> issues a notifier NETDEV_UNREGISTER, which invokes vlan_device_event(),
> which calls unregister_vlan_dev(), which used to dev_put(real_dev),
> which seems like it should have caused the same issue. Dunno.

Does the notifier trigger unregister_vlan_dev()? I thought the notifier
triggers when lower dev is unregistered.

I think we should move the dev_hold() to ndo_init(), otherwise 
it's hard to reason if destructor was invoked or not if
register_netdevice() errors out.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-11-15 17:49   ` Jakub Kicinski
@ 2021-11-16 14:20     ` Petr Machata
  2021-11-17 11:50     ` Petr Machata
  1 sibling, 0 replies; 18+ messages in thread
From: Petr Machata @ 2021-11-16 14:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Petr Machata, Ziyang Xuan, davem, jgg, netdev, linux-kernel


Jakub Kicinski <kuba@kernel.org> writes:

> On Mon, 15 Nov 2021 18:04:42 +0100 Petr Machata wrote:
>
>> I'm not sure why this wasn't happening before. After the veto,
>> register_vlan_dev() follows with a goto out_unregister_netdev, which
>> calls unregister_netdevice() calls unregister_netdevice_queue(), which
>> issues a notifier NETDEV_UNREGISTER, which invokes vlan_device_event(),
>> which calls unregister_vlan_dev(), which used to dev_put(real_dev),
>> which seems like it should have caused the same issue. Dunno.
>
> Does the notifier trigger unregister_vlan_dev()? I thought the notifier
> triggers when lower dev is unregistered.

Right, I misinterpreted this bit:

        vlan_info = rtnl_dereference(dev->vlan_info);
        if (!vlan_info)
                goto out;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-11-15 17:49   ` Jakub Kicinski
  2021-11-16 14:20     ` Petr Machata
@ 2021-11-17 11:50     ` Petr Machata
  2021-11-18  1:46       ` Ziyang Xuan (William)
  1 sibling, 1 reply; 18+ messages in thread
From: Petr Machata @ 2021-11-17 11:50 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Petr Machata, Ziyang Xuan, davem, jgg, netdev, linux-kernel


Jakub Kicinski <kuba@kernel.org> writes:

> On Mon, 15 Nov 2021 18:04:42 +0100 Petr Machata wrote:
>> Ziyang Xuan <william.xuanziyang@huawei.com> writes:
>> 
>> > diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
>> > index 55275ef9a31a..a3a0a5e994f5 100644
>> > --- a/net/8021q/vlan.c
>> > +++ b/net/8021q/vlan.c
>> > @@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
>> >  	}
>> >  
>> >  	vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
>> > -
>> > -	/* Get rid of the vlan's reference to real_dev */
>> > -	dev_put(real_dev);
>> >  }
>> >  
>> >  int vlan_check_real_dev(struct net_device *real_dev,
>> > diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
>> > index 0c21d1fec852..aeeb5f90417b 100644
>> > --- a/net/8021q/vlan_dev.c
>> > +++ b/net/8021q/vlan_dev.c
>> > @@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev)
>> >  
>> >  	free_percpu(vlan->vlan_pcpu_stats);
>> >  	vlan->vlan_pcpu_stats = NULL;
>> > +
>> > +	/* Get rid of the vlan's reference to real_dev */
>> > +	dev_put(vlan->real_dev);
>> >  }
>> >  
>> >  void vlan_setup(struct net_device *dev)  
>> 
>> This is causing reference counting issues when vetoing is involved.
>> Consider the following snippet:
>> 
>>     ip link add name bond1 type bond mode 802.3ad
>>     ip link set dev swp1 master bond1
>>     ip link add name bond1.100 link bond1 type vlan protocol 802.1ad id 100
>>     # ^ vetoed, no netdevice created
>>     ip link del dev bond1
>> 
>> The setup process goes like this: vlan_newlink() calls
>> register_vlan_dev() calls netdev_upper_dev_link() calls
>> __netdev_upper_dev_link(), which issues a notifier
>> NETDEV_PRECHANGEUPPER, which yields a non-zero error,
>> because a listener vetoed it.
>> 
>> So it unwinds, skipping dev_hold(real_dev), but eventually the VLAN ends
>> up decreasing reference count of the real_dev. Then when when the bond
>> netdevice is removed, we get an endless loop of:
>> 
>>     kernel:unregister_netdevice: waiting for bond1 to become free. Usage count = 0 
>> 
>> Moving the dev_hold(real_dev) to always happen even if the
>> netdev_upper_dev_link() call makes the issue go away.
>
> I think we should move the dev_hold() to ndo_init(), otherwise 
> it's hard to reason if destructor was invoked or not if
> register_netdevice() errors out.

Ziyang Xuan, do you intend to take care of this?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-11-17 11:50     ` Petr Machata
@ 2021-11-18  1:46       ` Ziyang Xuan (William)
  2021-11-18 14:17         ` Jakub Kicinski
  0 siblings, 1 reply; 18+ messages in thread
From: Ziyang Xuan (William) @ 2021-11-18  1:46 UTC (permalink / raw)
  To: Petr Machata, Jakub Kicinski; +Cc: davem, jgg, netdev, linux-kernel

> 
> Jakub Kicinski <kuba@kernel.org> writes:
> 
>> On Mon, 15 Nov 2021 18:04:42 +0100 Petr Machata wrote:
>>> Ziyang Xuan <william.xuanziyang@huawei.com> writes:
>>>
>>>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
>>>> index 55275ef9a31a..a3a0a5e994f5 100644
>>>> --- a/net/8021q/vlan.c
>>>> +++ b/net/8021q/vlan.c
>>>> @@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
>>>>  	}
>>>>  
>>>>  	vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
>>>> -
>>>> -	/* Get rid of the vlan's reference to real_dev */
>>>> -	dev_put(real_dev);
>>>>  }
>>>>  
>>>>  int vlan_check_real_dev(struct net_device *real_dev,
>>>> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
>>>> index 0c21d1fec852..aeeb5f90417b 100644
>>>> --- a/net/8021q/vlan_dev.c
>>>> +++ b/net/8021q/vlan_dev.c
>>>> @@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev)
>>>>  
>>>>  	free_percpu(vlan->vlan_pcpu_stats);
>>>>  	vlan->vlan_pcpu_stats = NULL;
>>>> +
>>>> +	/* Get rid of the vlan's reference to real_dev */
>>>> +	dev_put(vlan->real_dev);
>>>>  }
>>>>  
>>>>  void vlan_setup(struct net_device *dev)  
>>>
>>> This is causing reference counting issues when vetoing is involved.
>>> Consider the following snippet:
>>>
>>>     ip link add name bond1 type bond mode 802.3ad
>>>     ip link set dev swp1 master bond1
>>>     ip link add name bond1.100 link bond1 type vlan protocol 802.1ad id 100
>>>     # ^ vetoed, no netdevice created
>>>     ip link del dev bond1
>>>
>>> The setup process goes like this: vlan_newlink() calls
>>> register_vlan_dev() calls netdev_upper_dev_link() calls
>>> __netdev_upper_dev_link(), which issues a notifier
>>> NETDEV_PRECHANGEUPPER, which yields a non-zero error,
>>> because a listener vetoed it.
>>>
>>> So it unwinds, skipping dev_hold(real_dev), but eventually the VLAN ends
>>> up decreasing reference count of the real_dev. Then when when the bond
>>> netdevice is removed, we get an endless loop of:
>>>
>>>     kernel:unregister_netdevice: waiting for bond1 to become free. Usage count = 0 
>>>
>>> Moving the dev_hold(real_dev) to always happen even if the
>>> netdev_upper_dev_link() call makes the issue go away.
>>
>> I think we should move the dev_hold() to ndo_init(), otherwise 
>> it's hard to reason if destructor was invoked or not if
>> register_netdevice() errors out.
> 
> Ziyang Xuan, do you intend to take care of this?
> .

I am reading the related processes according to the problem scenario.
And I will give a more clear sequence and root cause as soon as possible
by some necessary tests.

Thank you!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-11-18  1:46       ` Ziyang Xuan (William)
@ 2021-11-18 14:17         ` Jakub Kicinski
  2021-11-19  3:29           ` Ziyang Xuan (William)
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2021-11-18 14:17 UTC (permalink / raw)
  To: Ziyang Xuan (William); +Cc: Petr Machata, davem, jgg, netdev, linux-kernel

On Thu, 18 Nov 2021 09:46:24 +0800 Ziyang Xuan (William) wrote:
> >> I think we should move the dev_hold() to ndo_init(), otherwise 
> >> it's hard to reason if destructor was invoked or not if
> >> register_netdevice() errors out.  
> > 
> > Ziyang Xuan, do you intend to take care of this?
> > .  
> 
> I am reading the related processes according to the problem scenario.
> And I will give a more clear sequence and root cause as soon as possible
> by some necessary tests.

Okay, I still don't see the fix. 

Petr would you mind submitting since you have a repro handy?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-11-15 17:04 ` Petr Machata
  2021-11-15 17:49   ` Jakub Kicinski
@ 2021-11-19  3:04   ` Ziyang Xuan (William)
  1 sibling, 0 replies; 18+ messages in thread
From: Ziyang Xuan (William) @ 2021-11-19  3:04 UTC (permalink / raw)
  To: Petr Machata; +Cc: davem, kuba, jgg, netdev, linux-kernel

> 
> Ziyang Xuan <william.xuanziyang@huawei.com> writes:
> 
>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
>> index 55275ef9a31a..a3a0a5e994f5 100644
>> --- a/net/8021q/vlan.c
>> +++ b/net/8021q/vlan.c
>> @@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
>>  	}
>>  
>>  	vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
>> -
>> -	/* Get rid of the vlan's reference to real_dev */
>> -	dev_put(real_dev);
>>  }
>>  
>>  int vlan_check_real_dev(struct net_device *real_dev,
>> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
>> index 0c21d1fec852..aeeb5f90417b 100644
>> --- a/net/8021q/vlan_dev.c
>> +++ b/net/8021q/vlan_dev.c
>> @@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev)
>>  
>>  	free_percpu(vlan->vlan_pcpu_stats);
>>  	vlan->vlan_pcpu_stats = NULL;
>> +
>> +	/* Get rid of the vlan's reference to real_dev */
>> +	dev_put(vlan->real_dev);
>>  }
>>  
>>  void vlan_setup(struct net_device *dev)
> 
> This is causing reference counting issues when vetoing is involved.
> Consider the following snippet:
> 
>     ip link add name bond1 type bond mode 802.3ad
>     ip link set dev swp1 master bond1
>     ip link add name bond1.100 link bond1 type vlan protocol 802.1ad id 100
>     # ^ vetoed, no netdevice created
>     ip link del dev bond1
> 
> The setup process goes like this: vlan_newlink() calls
> register_vlan_dev() calls netdev_upper_dev_link() calls
> __netdev_upper_dev_link(), which issues a notifier
> NETDEV_PRECHANGEUPPER, which yields a non-zero error,
> because a listener vetoed it.
> 
> So it unwinds, skipping dev_hold(real_dev), but eventually the VLAN ends
> up decreasing reference count of the real_dev. Then when when the bond
> netdevice is removed, we get an endless loop of:
> 
>     kernel:unregister_netdevice: waiting for bond1 to become free. Usage count = 0 
> 
> Moving the dev_hold(real_dev) to always happen even if the
> netdev_upper_dev_link() call makes the issue go away.
> 
> I'm not sure why this wasn't happening before. After the veto,
> register_vlan_dev() follows with a goto out_unregister_netdev, which
> calls unregister_netdevice() calls unregister_netdevice_queue(), which
> issues a notifier NETDEV_UNREGISTER, which invokes vlan_device_event(),
> which calls unregister_vlan_dev(), which used to dev_put(real_dev),
> which seems like it should have caused the same issue. Dunno.

netdev_upper_dev_link() failure in register_vlan_dev() will result in
no dev_hold(real_dev) and vlan_group_set_device(). So NETDEV_UNREGISTER
notifier caused by vlan_dev invokes vlan_device_event() will not find the
vlan_dev in vlan_group, and no unregister_vlan_dev() for the vlan_dev.

> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-11-18 14:17         ` Jakub Kicinski
@ 2021-11-19  3:29           ` Ziyang Xuan (William)
  2021-11-19 10:07             ` Petr Machata
  0 siblings, 1 reply; 18+ messages in thread
From: Ziyang Xuan (William) @ 2021-11-19  3:29 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Petr Machata, davem, jgg, netdev, linux-kernel

> On Thu, 18 Nov 2021 09:46:24 +0800 Ziyang Xuan (William) wrote:
>>>> I think we should move the dev_hold() to ndo_init(), otherwise 
>>>> it's hard to reason if destructor was invoked or not if
>>>> register_netdevice() errors out.  
>>>
>>> Ziyang Xuan, do you intend to take care of this?
>>> .  
>>
>> I am reading the related processes according to the problem scenario.
>> And I will give a more clear sequence and root cause as soon as possible
>> by some necessary tests.
> 
> Okay, I still don't see the fix. 
> 
> Petr would you mind submitting since you have a repro handy?
> .
The sequence for the problem maybe as following:

=============================================================
# create vlan device
vlan_newlink [assume real_dev refcnt == 2 just referenced by itself]
	register_vlan_dev
		register_netdevice(vlan_dev) [success]
		netdev_upper_dev_link [failed]
		unregister_netdevice(vlan_dev) [failure process]
		...
		netdev_run_todo [vlan_dev]
			vlan_dev_free(vlan_dev) [priv_destructor cb]
				dev_put(real_dev) [real_dev refcnt == 1]

# delete real device
unregister_netdevice(real_dev) [real_dev refcnt == 1]
	unregister_netdevice_many
		dev_put(real_dev) [real_dev refcnt == 0]
		net_set_todo(real_dev)
...
netdev_run_todo [real_dev]
	netdev_wait_allrefs(real_dev) [real_dev refcnt == 0]
		pr_emerg("unregister_netdevice: ...", ...)

I am thinking about how to fix the problem. priv_destructor() of the
net_device referenced not only by net_set_todo() but also failure process
in register_netdevice().

I need some time to test my some ideas. And anyone has good ideas, please
do not be stingy.

Thank you!



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-11-19  3:29           ` Ziyang Xuan (William)
@ 2021-11-19 10:07             ` Petr Machata
  2021-11-23  9:01               ` Ziyang Xuan (William)
  0 siblings, 1 reply; 18+ messages in thread
From: Petr Machata @ 2021-11-19 10:07 UTC (permalink / raw)
  To: Ziyang Xuan (William)
  Cc: Jakub Kicinski, Petr Machata, davem, jgg, netdev, linux-kernel


Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes:

> I need some time to test my some ideas. And anyone has good ideas, please
> do not be stingy.

Jakub Kicinski <kuba@kernel.org> writes:

> I think we should move the dev_hold() to ndo_init(), otherwise it's
> hard to reason if destructor was invoked or not if
> register_netdevice() errors out.

That makes sense to me. We always put real_dev in the destructor, so we
should always hold it in the constructor...

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-11-19 10:07             ` Petr Machata
@ 2021-11-23  9:01               ` Ziyang Xuan (William)
  2021-11-23 12:35                 ` Petr Machata
  0 siblings, 1 reply; 18+ messages in thread
From: Ziyang Xuan (William) @ 2021-11-23  9:01 UTC (permalink / raw)
  To: Petr Machata; +Cc: Jakub Kicinski, davem, jgg, netdev, linux-kernel

> 
> Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes:
> 
>> I need some time to test my some ideas. And anyone has good ideas, please
>> do not be stingy.
> 
> Jakub Kicinski <kuba@kernel.org> writes:
> 
>> I think we should move the dev_hold() to ndo_init(), otherwise it's
>> hard to reason if destructor was invoked or not if
>> register_netdevice() errors out.
> 
> That makes sense to me. We always put real_dev in the destructor, so we
> should always hold it in the constructor...

Inject error before dev_hold(real_dev) in register_vlan_dev(), and execute
the following testcase:

ip link add dev dummy1 type dummy
ip link add name dummy1.100 link dummy1 type vlan id 100 // failed for error injection
ip link del dev dummy1

Make the problem repro. The problem is solved using the following fix
according to the Jakub's suggestion:

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index a3a0a5e994f5..abaa5d96ded2 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -184,9 +184,6 @@ int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack)
        if (err)
                goto out_unregister_netdev;

-       /* Account for reference in struct vlan_dev_priv */
-       dev_hold(real_dev);
-
        vlan_stacked_transfer_operstate(real_dev, dev, vlan);
        linkwatch_fire_event(dev); /* _MUST_ call rfc2863_policy() */

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index ab6dee28536d..a54535cbcf4c 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -615,6 +615,9 @@ static int vlan_dev_init(struct net_device *dev)
        if (!vlan->vlan_pcpu_stats)
                return -ENOMEM;

+       /* Get vlan's reference to real_dev */
+       dev_hold(real_dev);


If there is not any other idea and objection, I will send the fix patch later.

Thank you!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-11-23  9:01               ` Ziyang Xuan (William)
@ 2021-11-23 12:35                 ` Petr Machata
  2021-11-25 11:33                   ` Petr Machata
  0 siblings, 1 reply; 18+ messages in thread
From: Petr Machata @ 2021-11-23 12:35 UTC (permalink / raw)
  To: Ziyang Xuan (William)
  Cc: Petr Machata, Jakub Kicinski, davem, jgg, netdev, linux-kernel


Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes:

>> 
>> Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes:
>> 
>>> I need some time to test my some ideas. And anyone has good ideas, please
>>> do not be stingy.
>> 
>> Jakub Kicinski <kuba@kernel.org> writes:
>> 
>>> I think we should move the dev_hold() to ndo_init(), otherwise it's
>>> hard to reason if destructor was invoked or not if
>>> register_netdevice() errors out.
>> 
>> That makes sense to me. We always put real_dev in the destructor, so we
>> should always hold it in the constructor...
>
> Inject error before dev_hold(real_dev) in register_vlan_dev(), and execute
> the following testcase:
>
> ip link add dev dummy1 type dummy
> ip link add name dummy1.100 link dummy1 type vlan id 100 // failed for error injection
> ip link del dev dummy1
>
> Make the problem repro. The problem is solved using the following fix
> according to the Jakub's suggestion:
>
> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
> index a3a0a5e994f5..abaa5d96ded2 100644
> --- a/net/8021q/vlan.c
> +++ b/net/8021q/vlan.c
> @@ -184,9 +184,6 @@ int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack)
>         if (err)
>                 goto out_unregister_netdev;
>
> -       /* Account for reference in struct vlan_dev_priv */
> -       dev_hold(real_dev);
> -
>         vlan_stacked_transfer_operstate(real_dev, dev, vlan);
>         linkwatch_fire_event(dev); /* _MUST_ call rfc2863_policy() */
>
> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
> index ab6dee28536d..a54535cbcf4c 100644
> --- a/net/8021q/vlan_dev.c
> +++ b/net/8021q/vlan_dev.c
> @@ -615,6 +615,9 @@ static int vlan_dev_init(struct net_device *dev)
>         if (!vlan->vlan_pcpu_stats)
>                 return -ENOMEM;
>
> +       /* Get vlan's reference to real_dev */
> +       dev_hold(real_dev);
>
>
> If there is not any other idea and objection, I will send the fix patch later.
>
> Thank you!

This fixes the issue in my repro, and does not seems to break anything.
We'll take it to full regression overnight, I'll report back tomorrow
about the result.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-11-23 12:35                 ` Petr Machata
@ 2021-11-25 11:33                   ` Petr Machata
  2021-11-26  1:48                     ` Ziyang Xuan (William)
  0 siblings, 1 reply; 18+ messages in thread
From: Petr Machata @ 2021-11-25 11:33 UTC (permalink / raw)
  To: Petr Machata
  Cc: Ziyang Xuan (William), Jakub Kicinski, davem, jgg, netdev, linux-kernel


Petr Machata <petrm@nvidia.com> writes:

> Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes:
>
>>> 
>>> Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes:
>>> 
>>>> I need some time to test my some ideas. And anyone has good ideas, please
>>>> do not be stingy.
>>> 
>>> Jakub Kicinski <kuba@kernel.org> writes:
>>> 
>>>> I think we should move the dev_hold() to ndo_init(), otherwise it's
>>>> hard to reason if destructor was invoked or not if
>>>> register_netdevice() errors out.
>>> 
>>> That makes sense to me. We always put real_dev in the destructor, so we
>>> should always hold it in the constructor...
>>
>> Inject error before dev_hold(real_dev) in register_vlan_dev(), and execute
>> the following testcase:
>>
>> ip link add dev dummy1 type dummy
>> ip link add name dummy1.100 link dummy1 type vlan id 100 // failed for error injection
>> ip link del dev dummy1
>>
>> Make the problem repro. The problem is solved using the following fix
>> according to the Jakub's suggestion:
>>
>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
>> index a3a0a5e994f5..abaa5d96ded2 100644
>> --- a/net/8021q/vlan.c
>> +++ b/net/8021q/vlan.c
>> @@ -184,9 +184,6 @@ int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack)
>>         if (err)
>>                 goto out_unregister_netdev;
>>
>> -       /* Account for reference in struct vlan_dev_priv */
>> -       dev_hold(real_dev);
>> -
>>         vlan_stacked_transfer_operstate(real_dev, dev, vlan);
>>         linkwatch_fire_event(dev); /* _MUST_ call rfc2863_policy() */
>>
>> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
>> index ab6dee28536d..a54535cbcf4c 100644
>> --- a/net/8021q/vlan_dev.c
>> +++ b/net/8021q/vlan_dev.c
>> @@ -615,6 +615,9 @@ static int vlan_dev_init(struct net_device *dev)
>>         if (!vlan->vlan_pcpu_stats)
>>                 return -ENOMEM;
>>
>> +       /* Get vlan's reference to real_dev */
>> +       dev_hold(real_dev);
>>
>>
>> If there is not any other idea and objection, I will send the fix patch later.
>>
>> Thank you!
>
> This fixes the issue in my repro, and does not seems to break anything.
> We'll take it to full regression overnight, I'll report back tomorrow
> about the result.

Sorry, was AFK yesterday. It does look good.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()
  2021-11-25 11:33                   ` Petr Machata
@ 2021-11-26  1:48                     ` Ziyang Xuan (William)
  0 siblings, 0 replies; 18+ messages in thread
From: Ziyang Xuan (William) @ 2021-11-26  1:48 UTC (permalink / raw)
  To: Petr Machata; +Cc: Jakub Kicinski, davem, jgg, netdev, linux-kernel

> 
> Petr Machata <petrm@nvidia.com> writes:
> 
>> Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes:
>>
>>>>
>>>> Ziyang Xuan (William) <william.xuanziyang@huawei.com> writes:
>>>>
>>>>> I need some time to test my some ideas. And anyone has good ideas, please
>>>>> do not be stingy.
>>>>
>>>> Jakub Kicinski <kuba@kernel.org> writes:
>>>>
>>>>> I think we should move the dev_hold() to ndo_init(), otherwise it's
>>>>> hard to reason if destructor was invoked or not if
>>>>> register_netdevice() errors out.
>>>>
>>>> That makes sense to me. We always put real_dev in the destructor, so we
>>>> should always hold it in the constructor...
>>>
>>> Inject error before dev_hold(real_dev) in register_vlan_dev(), and execute
>>> the following testcase:
>>>
>>> ip link add dev dummy1 type dummy
>>> ip link add name dummy1.100 link dummy1 type vlan id 100 // failed for error injection
>>> ip link del dev dummy1
>>>
>>> Make the problem repro. The problem is solved using the following fix
>>> according to the Jakub's suggestion:
>>>
>>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
>>> index a3a0a5e994f5..abaa5d96ded2 100644
>>> --- a/net/8021q/vlan.c
>>> +++ b/net/8021q/vlan.c
>>> @@ -184,9 +184,6 @@ int register_vlan_dev(struct net_device *dev, struct netlink_ext_ack *extack)
>>>         if (err)
>>>                 goto out_unregister_netdev;
>>>
>>> -       /* Account for reference in struct vlan_dev_priv */
>>> -       dev_hold(real_dev);
>>> -
>>>         vlan_stacked_transfer_operstate(real_dev, dev, vlan);
>>>         linkwatch_fire_event(dev); /* _MUST_ call rfc2863_policy() */
>>>
>>> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
>>> index ab6dee28536d..a54535cbcf4c 100644
>>> --- a/net/8021q/vlan_dev.c
>>> +++ b/net/8021q/vlan_dev.c
>>> @@ -615,6 +615,9 @@ static int vlan_dev_init(struct net_device *dev)
>>>         if (!vlan->vlan_pcpu_stats)
>>>                 return -ENOMEM;
>>>
>>> +       /* Get vlan's reference to real_dev */
>>> +       dev_hold(real_dev);
>>>
>>>
>>> If there is not any other idea and objection, I will send the fix patch later.
>>>
>>> Thank you!
>>
>> This fixes the issue in my repro, and does not seems to break anything.
>> We'll take it to full regression overnight, I'll report back tomorrow
>> about the result.
> 
> Sorry, was AFK yesterday. It does look good.
> .

Thank you for your efforts very much! I have sent the fix patch.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-11-26  1:50 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-02  2:12 [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev() Ziyang Xuan
2021-11-03 13:50 ` Jason Gunthorpe
2021-11-03 15:47   ` Jakub Kicinski
2021-11-03 16:11     ` Jason Gunthorpe
2021-11-03 14:30 ` patchwork-bot+netdevbpf
2021-11-15 17:04 ` Petr Machata
2021-11-15 17:49   ` Jakub Kicinski
2021-11-16 14:20     ` Petr Machata
2021-11-17 11:50     ` Petr Machata
2021-11-18  1:46       ` Ziyang Xuan (William)
2021-11-18 14:17         ` Jakub Kicinski
2021-11-19  3:29           ` Ziyang Xuan (William)
2021-11-19 10:07             ` Petr Machata
2021-11-23  9:01               ` Ziyang Xuan (William)
2021-11-23 12:35                 ` Petr Machata
2021-11-25 11:33                   ` Petr Machata
2021-11-26  1:48                     ` Ziyang Xuan (William)
2021-11-19  3:04   ` Ziyang Xuan (William)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).