LKML Archive on lore.kernel.org
From: Sunil Muthuswamy <sunilmut@microsoft.com>
To: Dexuan Cui <decui@microsoft.com>,
KY Srinivasan <kys@microsoft.com>,
Haiyang Zhang <haiyangz@microsoft.com>,
Stephen Hemminger <sthemmin@microsoft.com>,
Sasha Levin <sashal@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Michael Kelley <mikelley@microsoft.com>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH] hv_sock: Fix data loss upon socket close
Date: Tue, 14 May 2019 20:40:56 +0000 [thread overview]
Message-ID: <BN6PR21MB0465E3E8C3C386D21647A237C0080@BN6PR21MB0465.namprd21.prod.outlook.com> (raw)
In-Reply-To: <PU1P153MB01695C88469F32B9ECC7657EBF0D0@PU1P153MB0169.APCP153.PROD.OUTLOOK.COM>
> -----Original Message-----
> From: Dexuan Cui <decui@microsoft.com>
> Sent: Friday, May 10, 2019 8:57 PM
> To: Sunil Muthuswamy <sunilmut@microsoft.com>; KY Srinivasan <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>;
> Stephen Hemminger <sthemmin@microsoft.com>; Sasha Levin <sashal@kernel.org>; David S. Miller <davem@davemloft.net>;
> Michael Kelley <mikelley@microsoft.com>
> Cc: netdev@vger.kernel.org; linux-hyperv@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: RE: [PATCH] hv_sock: Fix data loss upon socket close
>
> > From: Sunil Muthuswamy <sunilmut@microsoft.com>
> > Sent: Wednesday, May 8, 2019 4:11 PM
> >
> > Currently, when an hvsock socket is closed, the socket is shut down and
> > an RST is sent immediately. There is no wait for the FIN packet to arrive
> > from the other end. This can lead to data loss since the connection is
> > terminated abruptly. This can manifest easily in the case of a fast guest
> > hvsock writer and a much slower host hvsock reader. Essentially, hvsock is
> > not following the proper STREAM (TCP) closing handshake mechanism.
>
> Hi Sunil,
> It looks to me the above description is inaccurate.
>
> In the upstream Linux kernel, closing a hv_sock file descriptor may hang
> in vmbus_hvsock_device_unregister() -> msleep(), until the host side of
> the connection is closed. This is bad and should be fixed, but I don't think
> the current code can cause data loss: when Linux calls hvs_destruct() ->
> vmbus_hvsock_device_unregister() -> vmbus_device_unregister() -> ...
> -> vmbus_close() to close the channel, Linux knows the host app has
> already called close(), and normally that means the host app has
> received all the data from the connection.
>
> BTW, technically speaking, in hv_sock there is no RST packet, while there
> is indeed a payload_len==0 packet, which is similar to TCP FIN.
>
> I think by saying "a RST is sent" you mean Linux VM closes the channel.
>
> > The fix involves adding support for the delayed close of hvsock, which is
> > in-line with other socket providers such as virtio.
>
> With this "delayed close" patch, Linux's close() won't hang until the host
> also closes the connection. This is good!
>
The next version of the patch will focus only on implementing the delayed
(or background) close logic. I will update the title and description of the
next version to reflect the change more accurately.
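To sketch the direction (illustrative only, not the final patch: the close_work
member assumed on struct vsock_sock, the helpers hvs_close_timeout() and
hvs_do_close_lock_held(), and the timeout value are all placeholders):

/* Illustrative sketch of a background (delayed) close: if the peer has
 * not shut down by the time the local side calls close(), defer the
 * actual channel teardown to a delayed work item with a fixed timeout.
 * Assumes a close_work delayed_work member on struct vsock_sock.
 */
#define HVS_CLOSE_TIMEOUT	(8 * HZ)	/* placeholder value */

static void hvs_close_timeout(struct work_struct *work)
{
        struct vsock_sock *vsk =
                container_of(work, struct vsock_sock, close_work.work);
        struct sock *sk = sk_vsock(vsk);

        sock_hold(sk);
        lock_sock(sk);
        if (!sock_flag(sk, SOCK_DONE))
                hvs_do_close_lock_held(vsk);	/* placeholder: send FIN and close the channel */
        release_sock(sk);
        sock_put(sk);
}

static void hvs_schedule_deferred_close(struct vsock_sock *vsk)
{
        sock_hold(sk_vsock(vsk));	/* dropped in hvs_close_timeout() */
        INIT_DELAYED_WORK(&vsk->close_work, hvs_close_timeout);
        schedule_delayed_work(&vsk->close_work, HVS_CLOSE_TIMEOUT);
}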
> > While closing, the
> > socket waits for a constant timeout for the FIN packet to arrive from the
> > other end. On timeout, it will terminate the connection (i.e., send an RST).
>
> As I mentioned above, I suppose the "RST" means Linux closes the channel.
>
> When Linux closes a connection, the FIN packet is written into the shared
> guest-to-host channel ringbuffer immediately, so the host is able to see it
> immediately, but the real question is: what if the host kernel and/or host app
> cannot receive the data from the ringbuffer, including the FIN, in a timely manner?
>
> Does the host kernel guarantee it *always* fetches/caches all the data
> from a connection in a timely manner, even if the host app has not accept()'d
> the connection, or the host app is reading from the connection too slowly?
>
> If the host doesn't guarantee that, then even with this patch there is still
> a chance Linux can time out, and close the channel before the host
> finishes receiving all the data.
>
The TCP protocol does not guarantee that all the data gets delivered, especially
during close. Applications are expected to mitigate this by using a
combination of shutdown() and subsequent read() calls on both the client and
the server side.
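As a minimal sketch of that pattern (ordinary POSIX socket calls; the helper
name graceful_close() is just for illustration):

#include <sys/socket.h>
#include <unistd.h>

/* Graceful close from the writer's side: send our FIN with shutdown(),
 * then drain until the peer's FIN arrives (read() returns 0), and only
 * then close the descriptor.
 */
static int graceful_close(int fd)
{
        char buf[256];
        ssize_t n;

        if (shutdown(fd, SHUT_WR) < 0)
                return -1;

        while ((n = read(fd, buf, sizeof(buf))) > 0)
                ;	/* discard any remaining data from the peer */

        return (n == 0) ? close(fd) : -1;
}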
> I'm curious how Windows guest implements the "async close"?
> Does Windows guest also use the same timeout strategy here? If yes,
> what's the timeout value used?
>
Windows also implements the delayed close logic in a similar fashion. You can
look up the MSDN article on 'graceful shutdown'.
> > diff --git a/net/vmw_vsock/hyperv_transport.c
> > b/net/vmw_vsock/hyperv_transport.c
> > index a827547..62b986d 100644
>
> Sorry, I need more time to review the rest of the patch. Will try to reply ASAP.
>
> > -static int hvs_update_recv_data(struct hvsock *hvs)
> > +static int hvs_update_recv_data(struct vsock_sock *vsk)
> >  {
> >  	struct hvs_recv_buf *recv_buf;
> >  	u32 payload_len;
> > +	struct hvsock *hvs = vsk->trans;
> >
> >  	recv_buf = (struct hvs_recv_buf *)(hvs->recv_desc + 1);
> >  	payload_len = recv_buf->hdr.data_size;
> > @@ -543,8 +591,12 @@ static int hvs_update_recv_data(struct hvsock *hvs)
> >  	if (payload_len > HVS_MTU_SIZE)
> >  		return -EIO;
> >
> > -	if (payload_len == 0)
> > +	/* Peer shutdown */
> > +	if (payload_len == 0) {
> > +		struct sock *sk = sk_vsock(vsk);
> >
> >  		hvs->vsk->peer_shutdown |= SEND_SHUTDOWN;
> > +		sk->sk_state_change(sk);
> > +	}
>
> Can you please explain why we need to call this sk->sk_state_change()?
>
> When we call hvs_update_recv_data(), we hold the lock_sock(sk) lock, and we
> know there is at least one byte to read. Since we hold the lock, the other
> code paths, which normally are also required to acquire the lock before
> checking vsk->peer_shutdown, can not race with us.
>
This was to wake up any waiters on a socket state change. Since the updated
patch now only focuses on adding the delayed close logic, I will remove this in
the next version.
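For reference, the kind of waiter that sk->sk_state_change() wakes looks
roughly like this (illustrative only, modelled on the prepare_to_wait()/
finish_wait() loops in af_vsock.c; the function name is made up):

/* Illustrative only: sleep until the peer signals SEND_SHUTDOWN or the
 * timeout expires. sk_state_change() (sock_def_wakeup by default) wakes
 * the sk_sleep(sk) wait queue, so a loop like this gets to re-check
 * vsk->peer_shutdown. Called with lock_sock(sk) held.
 */
static long wait_for_peer_fin(struct sock *sk, struct vsock_sock *vsk,
                              long timeout)
{
        DEFINE_WAIT(wait);

        prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
        while (!(vsk->peer_shutdown & SEND_SHUTDOWN) && timeout) {
                release_sock(sk);
                timeout = schedule_timeout(timeout);
                lock_sock(sk);
                prepare_to_wait(sk_sleep(sk), &wait, TASK_INTERRUPTIBLE);
        }
        finish_wait(sk_sleep(sk), &wait);
        return timeout;
}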
Thanks for the review so far.
> Thanks,
> -- Dexuan
Thread overview: 5+ messages
2019-05-08 23:10 Sunil Muthuswamy
2019-05-09 20:58 ` David Miller
2019-05-14 16:33 ` Sunil Muthuswamy
2019-05-11 3:56 ` Dexuan Cui
2019-05-14 20:40 ` Sunil Muthuswamy [this message]