LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: Yao Liu <yotta.liu@ucloud.cn>
Cc: Josef Bacik <josef@toxicpanda.com>, Jens Axboe <axboe@kernel.dk>,
	linux-block <linux-block@vger.kernel.org>,
	nbd <nbd@other.debian.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/3] nbd: fix connection timed out error after reconnecting to server
Date: Tue, 28 May 2019 12:57:59 -0400	[thread overview]
Message-ID: <20190528165758.zxfrv6fum4vwcv4e@MacBook-Pro-91.local> (raw)
In-Reply-To: <20190527180743.GA20702@192-168-150-246.7~>

On Tue, May 28, 2019 at 02:07:43AM +0800, Yao Liu wrote:
> On Fri, May 24, 2019 at 09:07:42AM -0400, Josef Bacik wrote:
> > On Fri, May 24, 2019 at 05:43:54PM +0800, Yao Liu wrote:
> > > Some I/O requests that have been sent succussfully but have not yet been
> > > replied won't be resubmitted after reconnecting because of server restart,
> > > so we add a list to track them.
> > > 
> > > Signed-off-by: Yao Liu <yotta.liu@ucloud.cn>
> > 
> > Nack, this is what the timeout stuff is supposed to handle.  The commands will
> > timeout and we'll resubmit them if we have alive sockets.  Thanks,
> > 
> > Josef
> > 
> 
> On the one hand, if num_connections == 1 and the only sock has dead,
> then we do nbd_genl_reconfigure to reconnect within dead_conn_timeout,
> nbd_xmit_timeout will not resubmit commands that have been sent
> succussfully but have not yet been replied. The log is as follows:
>  
> [270551.108746] block nbd0: Receive control failed (result -104)
> [270551.108747] block nbd0: Send control failed (result -32)
> [270551.108750] block nbd0: Request send failed, requeueing
> [270551.116207] block nbd0: Attempted send on invalid socket
> [270556.119584] block nbd0: reconnected socket
> [270581.161751] block nbd0: Connection timed out
> [270581.165038] block nbd0: shutting down sockets
> [270581.165041] print_req_error: I/O error, dev nbd0, sector 5123224 flags 8801
> [270581.165149] print_req_error: I/O error, dev nbd0, sector 5123232 flags 8801
> [270581.165580] block nbd0: Connection timed out
> [270581.165587] print_req_error: I/O error, dev nbd0, sector 844680 flags 8801
> [270581.166184] print_req_error: I/O error, dev nbd0, sector 5123240 flags 8801
> [270581.166554] block nbd0: Connection timed out
> [270581.166576] print_req_error: I/O error, dev nbd0, sector 844688 flags 8801
> [270581.167124] print_req_error: I/O error, dev nbd0, sector 5123248 flags 8801
> [270581.167590] block nbd0: Connection timed out
> [270581.167597] print_req_error: I/O error, dev nbd0, sector 844696 flags 8801
> [270581.168021] print_req_error: I/O error, dev nbd0, sector 5123256 flags 8801
> [270581.168487] block nbd0: Connection timed out
> [270581.168493] print_req_error: I/O error, dev nbd0, sector 844704 flags 8801
> [270581.170183] print_req_error: I/O error, dev nbd0, sector 5123264 flags 8801
> [270581.170540] block nbd0: Connection timed out
> [270581.173333] block nbd0: Connection timed out
> [270581.173728] block nbd0: Connection timed out
> [270581.174135] block nbd0: Connection timed out
>  
> On the other hand, if we wait nbd_xmit_timeout to handle resubmission,
> the I/O requests will have a big delay. For example, if timeout time is 30s,
> and from sock dead to nbd_genl_reconfigure returned OK we only spend
> 2s, the I/O requests will still be handled by nbd_xmit_timeout after 30s.

We have to wait for the full timeout anyway to know that the socket went down,
so it'll be re-submitted right away and then we'll wait on the new connection.

Now we could definitely have requests that were submitted well after the first
thing that failed, so their timeout would be longer than simply retrying them,
but we have no idea of knowing which ones timed out and which ones didn't.  This
way lies pain, because we have to matchup tags with handles.  This is why we
rely on the generic timeout infrastructure, so everything is handled correctly
without ending up with duplicate submissions/replies.  Thanks,

Josef

  reply	other threads:[~2019-05-28 16:58 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-24  9:43 Yao Liu
2019-05-24  9:43 ` [PATCH 2/3] nbd: notify userland even if nbd has already disconnected Yao Liu
2019-05-24 13:08   ` Josef Bacik
2019-05-27 18:23     ` Yao Liu
2019-05-28 16:36       ` Mike Christie
2019-05-28 20:05         ` Yao Liu
2019-05-28 16:54       ` Josef Bacik
2019-05-24  9:43 ` [PATCH 3/3] nbd: mark sock as dead even if it's the last one Yao Liu
2019-05-24 13:17   ` Josef Bacik
2019-05-27 18:29     ` Yao Liu
2019-05-24 13:07 ` [PATCH 1/3] nbd: fix connection timed out error after reconnecting to server Josef Bacik
2019-05-27 18:07   ` Yao Liu
2019-05-28 16:57     ` Josef Bacik [this message]
2019-05-28 19:04       ` Yao Liu
2019-05-29 13:49         ` Josef Bacik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190528165758.zxfrv6fum4vwcv4e@MacBook-Pro-91.local \
    --to=josef@toxicpanda.com \
    --cc=axboe@kernel.dk \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nbd@other.debian.org \
    --cc=yotta.liu@ucloud.cn \
    --subject='Re: [PATCH 1/3] nbd: fix connection timed out error after reconnecting to server' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).