LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Dexuan Cui <decui@microsoft.com>
To: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: KY Srinivasan <kys@microsoft.com>,
	"devel@linuxdriverproject.org" <devel@linuxdriverproject.org>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Jason Wang <jasowang@redhat.com>
Subject: RE: [PATCH 1/4] Drivers: hv: vmbus: implement get/put usage workflow for vmbus channels
Date: Wed, 4 Feb 2015 09:54:09 +0000	[thread overview]
Message-ID: <F792CF86EFE20D4AB8064279AFBA51C61F33B904@HKNPRD3002MB017.064d.mgd.msft.net> (raw)
In-Reply-To: <877fvyrphx.fsf@vitty.brq.redhat.com>

> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> Sent: Wednesday, February 4, 2015 17:32 PM
> To: Dexuan Cui
> Cc: KY Srinivasan; devel@linuxdriverproject.org; Haiyang Zhang; linux-
> kernel@vger.kernel.org; Jason Wang
> Subject: Re: [PATCH 1/4] Drivers: hv: vmbus: implement get/put usage workflow
> for vmbus channels
> 
> Dexuan Cui <decui@microsoft.com> writes:
> 
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:vkuznets@redhat.com]
> >> Sent: Wednesday, February 4, 2015 1:01 AM
> >> To: KY Srinivasan; devel@linuxdriverproject.org
> >> Cc: Haiyang Zhang; linux-kernel@vger.kernel.org; Dexuan Cui; Jason Wang
> >> Subject: [PATCH 1/4] Drivers: hv: vmbus: implement get/put usage workflow
> for
> >> vmbus channels
> >>
> >> free_channel() function frees the channel unconditionally so we need to
> make
> >> sure nobody has any link to it. This is not trivial and there are several
> >> examples of races we have:
> >>
> >> 1) In vmbus_onoffer_rescind() we check for channel existence with
> >>    relid2channel() and then use it. This can go wrong if we're in the middle
> >>    of channel removal (free_channel() was already called).
> >>
> >> 2) In process_chn_event() we check for channel existence with
> >>    pcpu_relid2channel() and then use it. This can also go wrong.
> >>
> >> 3) vmbus_free_channels() just frees all channels, in case we're in the middle
> >>    of vmbus_process_rescind_offer() crash is possible.
> >>
> >> The issue can be solved by holding vmbus_connection.channel_lock
> everywhere,
> >> however, it looks like a way to deadlocks and performance degradation.
> Get/put
> >> workflow fits here the best.
> >>
> >> Implement vmbus_get_channel()/vmbus_put_channel() pair instead of
> >> free_channel().
> >>
> >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> >> ---
> >>  drivers/hv/channel_mgmt.c | 45
> >> ++++++++++++++++++++++++++++++++++++++-------
> >>  drivers/hv/connection.c   |  7 +++++--
> >>  drivers/hv/hyperv_vmbus.h |  4 ++++
> >>  include/linux/hyperv.h    | 13 +++++++++++++
> >>  4 files changed, 60 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> >> index 36bacc7..eb9ce94 100644
> >> --- a/drivers/hv/channel_mgmt.c
> >> +++ b/drivers/hv/channel_mgmt.c
> >> @@ -147,6 +147,8 @@ static struct vmbus_channel *alloc_channel(void)
> >>  		return NULL;
> >>
> >>  	channel->id = atomic_inc_return(&chan_num);
> >> +	atomic_set(&channel->count, 1);
> >> +
> >>  	spin_lock_init(&channel->inbound_lock);
> >>  	spin_lock_init(&channel->lock);
> >>
> >> @@ -178,19 +180,47 @@ static void release_channel(struct work_struct
> *work)
> >>  }
> >>
> >>  /*
> >> - * free_channel - Release the resources used by the vmbus channel object
> >> + * vmbus_put_channel - Decrease the channel usage counter and release
> the
> >> + * resources when this counter reaches zero.
> >>   */
> >> -static void free_channel(struct vmbus_channel *channel)
> >> +void vmbus_put_channel(struct vmbus_channel *channel)
> >>  {
> >> +	unsigned long flags;
> >>
> >>  	/*
> >>  	 * We have to release the channel's workqueue/thread in the vmbus's
> >>  	 * workqueue/thread context
> >>  	 * ie we can't destroy ourselves.
> >>  	 */
> >> -	INIT_WORK(&channel->work, release_channel);
> >> -	queue_work(vmbus_connection.work_queue, &channel->work);
> >> +	spin_lock_irqsave(&channel->lock, flags);
> >> +	if (atomic_dec_and_test(&channel->count)) {
> >> +		channel->dying = true;
> >> +		INIT_WORK(&channel->work, release_channel);
> >> +		spin_unlock_irqrestore(&channel->lock, flags);
> >> +		queue_work(vmbus_connection.work_queue, &channel->work);
> >> +	} else
> >> +		spin_unlock_irqrestore(&channel->lock, flags);
> >> +}
> >> +EXPORT_SYMBOL_GPL(vmbus_put_channel);
> >> +
> >> +/* vmbus_get_channel - Get additional reference to the channel */
> >> +struct vmbus_channel *vmbus_get_channel(struct vmbus_channel *channel)
> >> +{
> >> +	unsigned long flags;
> >> +	struct vmbus_channel *ret = NULL;
> >> +
> >> +	if (!channel)
> >> +		return NULL;
> >> +
> >> +	spin_lock_irqsave(&channel->lock, flags);
> >> +	if (!channel->dying) {
> >> +		atomic_inc(&channel->count);
> >> +		ret = channel;
> >> +	}
> >> +	spin_unlock_irqrestore(&channel->lock, flags);
> >> +	return ret;
> >>  }
> >> +EXPORT_SYMBOL_GPL(vmbus_get_channel);
> >>
> >>  static void percpu_channel_enq(void *arg)
> >>  {
> >> @@ -253,7 +283,7 @@ static void vmbus_process_rescind_offer(struct
> >> work_struct *work)
> >>  		list_del(&channel->sc_list);
> >>  		spin_unlock_irqrestore(&primary_channel->lock, flags);
> >>  	}
> >> -	free_channel(channel);
> >> +	vmbus_put_channel(channel);
> >>  }
> >>
> >>  void vmbus_free_channels(void)
> >> @@ -262,7 +292,7 @@ void vmbus_free_channels(void)
> >>
> >>  	list_for_each_entry(channel, &vmbus_connection.chn_list, listentry) {
> >>  		vmbus_device_unregister(channel->device_obj);
> >> -		free_channel(channel);
> >> +		vmbus_put_channel(channel);
> >>  	}
> >>  }
> >>
> >> @@ -391,7 +421,7 @@ done_init_rescind:
> >>  	spin_unlock_irqrestore(&newchannel->lock, flags);
> >>  	return;
> >>  err_free_chan:
> >> -	free_channel(newchannel);
> >> +	vmbus_put_channel(newchannel);
> >>  }
> >>
> >>  enum {
> >> @@ -549,6 +579,7 @@ static void vmbus_onoffer_rescind(struct
> >> vmbus_channel_message_header *hdr)
> >>  		queue_work(channel->controlwq, &channel->work);
> >>
> >>  	spin_unlock_irqrestore(&channel->lock, flags);
> >> +	vmbus_put_channel(channel);
> >>  }
> >>
> >>  /*
> >> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> >> index c4acd1c..d1ce134 100644
> >> --- a/drivers/hv/connection.c
> >> +++ b/drivers/hv/connection.c
> >> @@ -247,7 +247,8 @@ void vmbus_disconnect(void)
> >>   * Map the given relid to the corresponding channel based on the
> >>   * per-cpu list of channels that have been affinitized to this CPU.
> >>   * This will be used in the channel callback path as we can do this
> >> - * mapping in a lock-free fashion.
> >> + * mapping in a lock-free fashion. Takes additional reference to the
> >> + * channel, all users are supposed to do vmbus_put_channel().
> >>   */
> >>  static struct vmbus_channel *pcpu_relid2channel(u32 relid)
> >>  {
> >> @@ -263,7 +264,7 @@ static struct vmbus_channel
> *pcpu_relid2channel(u32
> >> relid)
> >>  		}
> >>  	}
> >>
> >> -	return found_channel;
> >> +	return vmbus_get_channel(found_channel);
> >>  }
> >>
> >>  /*
> >> @@ -297,6 +298,7 @@ struct vmbus_channel *relid2channel(u32 relid)
> >>  			}
> >>  		}
> >>  	}
> >> +	found_channel = vmbus_get_channel(found_channel);
> >>  	spin_unlock_irqrestore(&vmbus_connection.channel_lock, flags);
> >>
> >>  	return found_channel;
> >> @@ -360,6 +362,7 @@ static void process_chn_event(u32 relid)
> >>  		pr_err("no channel callback for relid - %u\n", relid);
> >>  	}
> >>
> >> +	vmbus_put_channel(channel);
> >>  }
> >>
> >>  /*
> >> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
> >> index b055e53..40d70f0 100644
> >> --- a/drivers/hv/hyperv_vmbus.h
> >> +++ b/drivers/hv/hyperv_vmbus.h
> >> @@ -687,6 +687,10 @@ void vmbus_device_unregister(struct hv_device
> >> *device_obj);
> >>  /* VmbusChildDeviceDestroy( */
> >>  /* struct hv_device *); */
> >>
> >> +/*
> >> + * Get the channel by its relid. Takes additional reference to the channel so
> >> + * all users are supposed to do vmbus_put_channel() when they're done.
> >> + */
> >>  struct vmbus_channel *relid2channel(u32 relid);
> >>
> >>  void vmbus_free_channels(void);
> >> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
> >> index e73cfeb..c576d2d 100644
> >> --- a/include/linux/hyperv.h
> >> +++ b/include/linux/hyperv.h
> >> @@ -649,6 +649,9 @@ struct vmbus_channel {
> >>  	/* Unique channel id */
> >>  	int id;
> >>
> >> +	/* Active reference count */
> >> +	atomic_t count;
> >> +
> >>  	struct list_head listentry;
> >>
> >>  	struct hv_device *device_obj;
> >> @@ -666,6 +669,7 @@ struct vmbus_channel {
> >>  	u8 monitor_bit;
> >>
> >>  	bool rescind; /* got rescind msg */
> >> +	bool dying; /* channel is dying */
> >>
> >>  	u32 ringbuffer_gpadlhandle;
> >>
> >> @@ -907,6 +911,15 @@ extern int vmbus_recvpacket_raw(struct
> >> vmbus_channel *channel,
> >>
> >>  extern void vmbus_ontimer(unsigned long data);
> >>
> >> +/*
> >> + * Decrease reference count for the channel. Frees the channel when its
> usage
> >> + * count reaches zero.
> >> + */
> >> +extern void vmbus_put_channel(struct vmbus_channel *channel);
> >> +
> >> +/* Get additional reference to the channel */
> >> +extern struct vmbus_channel *vmbus_get_channel(struct vmbus_channel
> >> *channel);
> >> +
> >>  /* Base driver object */
> >>  struct hv_driver {
> >>  	const char *name;
> >> --
> >
> > Hi Vitaly,
> > Thanks for the patchset!
> > I once tried to do the same work, but just didn't find enough time. :-)
> >
> > Please see my below questions:
> 
> Thanks for your detailed review! I should have added 'RFC' to my series :-)
> 
> >
> > Since we always get channel->lock when accessing channel->count, I don't
> > think the counter has to be of atomic_t?
> 
> Agreed
> 
> >
> > I suggest we add vmbus_get/put_channel() in vmbus_open/close().
> > In this way, IMO patch 3/4 and 4/4 would be unnecessary?
> >
> 
> I was thinking about that but I wasn't sure it is going to be enough for
> sub-channels. I'll look again and report.
> 
> > In vmbus_exit(), I suspect we should swap the order of the 2 lines:
> >         vmbus_free_channels();
> >         bus_unregister(&hv_bus);
> > I suppose in bus_unregister(), the .remove callbacks of all the drivers are
> > invoked, and vmbus_close() is  invoked for every channel.
> 
> Makes sense!
> 
> >
> > BTW, I just noticed: alloc_channel() -> alloc_workqueue(,,max_active==1,):
> > due to max_active==1 here, a channel's process_offer() and
> > process_rescind_offer() is already serialized.
> > I'm not sure if this fact can be used to simplify the logic?
> 
> Workers are serialized but e.g. vmbus_onoffer_rescind() is not
> serialized with them, e.g. there is no protection against channel
> dessapearance after we found it with relid2channel() (e.g. we're on
> failure path in vmbus_process_offer()).
> 
> Anyway, we have another examples of (theoretically) possible issues and
> get/put workflow should simplify the syncronization.
> 
> > Is the previous commit "Drivers: hv: vmbus: serialize Offer and Rescind offer"
> > necessary?
> 
> It is as not checking work function is bad anyway as (in theory) we can
> run vmbus_process_offer() twice if vmbus_onoffer_rescind() runs before
> vmbus_process_offer() was started.
> 
> >
> > BTW2, I think there is an unsafe race in the 2 paths (this is an existing issue,
> > not related to this patch of yours):
> >
> > vmbus_on_event() -> process_chn_event() -> pcpu_relid2channel ->
> > 	list_for_each_entry(channel, pcpu_head, percpu_list)
> >
> > and
> >
> > vmbus_process_rescind_offer() -> percpu_channel_deq() ->
> > 	list_del(&channel->percpu_list)
> >
> > because the former is running in tasklet and hence can preempt the latter
> (running
> > in process context).  percpu_channel_enq() should have the same
> > issue(?)
> 
> As I said in my cover letter I don't believe my series solves all issues :-)
> 
> >
> > We should add local_bh_disable()/enable() accordingly?
> > Can you please help to fix this?
> 
> Yes, sure, I'll take a look.
> 
> >
> > vmbus_exit() -> vmbus_free_channels() -> vmbus_put_channel() may have
> > a race with vmbus_process_rescind_offer()?
> > After vmbus_process_rescind_offer() ->  vmbus_device_unregister(), the count
> > can already be the initial 1, later vmbus_free_channels() ->
> vmbus_put_channel() can
> > kfree the channel before vmbus_process_rescind_offer() completely
> > finishes?
> 
> I was going to rewrite vmbus_free_channels() not only because of
> that. E.g. vmbus_get_outgoing_channel() presumes all lists (in both
> channel.sc_list and vmbus_connection.chn_list) are valid and we don't
> take care of that. Not that I believe vmbus_get_outgoing_channel()
> may happen when vmbus_free_channels() is active (and we call
> vmbus_free_channels() only if we unload the module)
> 
> 
> Thanks for your thorough review, it gave me further inspiration :-)
> 
> --
>   Vitaly

Thanks, Vitaly!

I know it's tricky here. :-)

Thanks,
-- Dexuan

  reply	other threads:[~2015-02-04 10:09 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-03 17:00 [PATCH 0/4] Drivers: hv: Further protection for the rescind path Vitaly Kuznetsov
2015-02-03 17:00 ` [PATCH 1/4] Drivers: hv: vmbus: implement get/put usage workflow for vmbus channels Vitaly Kuznetsov
2015-02-04  8:18   ` Dexuan Cui
2015-02-04  9:32     ` Vitaly Kuznetsov
2015-02-04  9:54       ` Dexuan Cui [this message]
2015-02-04  9:14   ` Jason Wang
2015-02-04  9:33     ` Vitaly Kuznetsov
2015-02-03 17:00 ` [PATCH 2/4] Drivers: hv: vmbus: do not lose rescind offer on failure in vmbus_process_offer() Vitaly Kuznetsov
2015-02-04  7:42   ` Dexuan Cui
2015-02-03 17:00 ` [PATCH 3/4] Drivers: hv: vmbus: protect vmbus_get_outgoing_channel() against channel removal Vitaly Kuznetsov
2015-02-04  7:27   ` Dexuan Cui
2015-02-03 17:00 ` [PATCH 4/4] hyperv: netvsc: improve protection against rescind offer Vitaly Kuznetsov
2015-02-04  7:29   ` Dexuan Cui
2015-02-04 18:26 ` [PATCH 0/4] Drivers: hv: Further protection for the rescind path KY Srinivasan
2015-02-05 10:10   ` Vitaly Kuznetsov
2015-02-05 12:44     ` Dexuan Cui
2015-02-05 22:47       ` KY Srinivasan
2015-02-06 14:53         ` Dexuan Cui
2015-02-07 16:27           ` KY Srinivasan
2015-02-05 15:52     ` KY Srinivasan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=F792CF86EFE20D4AB8064279AFBA51C61F33B904@HKNPRD3002MB017.064d.mgd.msft.net \
    --to=decui@microsoft.com \
    --cc=devel@linuxdriverproject.org \
    --cc=haiyangz@microsoft.com \
    --cc=jasowang@redhat.com \
    --cc=kys@microsoft.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=vkuznets@redhat.com \
    --subject='RE: [PATCH 1/4] Drivers: hv: vmbus: implement get/put usage workflow for vmbus channels' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).