LKML Archive on lore.kernel.org
From: Nicolas Cannasse <ncannasse@motion-twin.com>
To: swivel@shells.gnugeneration.com
Cc: linux-kernel@vger.kernel.org
Subject: Re: poll() blocked / packets not received ?
Date: Mon, 20 Oct 2008 14:13:35 +0200
Message-ID: <48FC75EF.1020705@motion-twin.com>
In-Reply-To: <20081020113942.GJ2811@fc6222126.aspadmin.net>
swivel@shells.gnugeneration.com wrote:
> When the other end of the TCP is _gone_ that leads me to believe a FIN
> will not be coming, hence the indefinite ESTABLISHED state. Why it's
> gone is a different question, maybe your problem is at the other end?
> The end initiating a shutdown has to enter FIN_WAIT_1 then FIN_WAIT_2,
> these transitions require the other side to leave ESTABLISHED (receive a
> FIN then ACK) at the very least to proceed.
>
>> I agree with your comment in general, except that we have been running
>> the same application in single-thread environment for years without
>> running into this very specific problem.
>>
>
> Perhaps when you run in multicore/threaded you are stressing the network
> stacks at both ends more, including everything in-between? The
> threading vs. single process relationship is probably not causal, but
> just coincidental.
Not sure why this would happen, since these are the same servers. The only
thing that changed is the part of the software that handles our server
requests. It's either embedded in Apache 1.3 with fork(), or a standalone
multithreaded server that acts as an Apache backend.
So the only networking difference is the additional Apache<->MT-Server
traffic, but it should go over 127.0.0.1 (loopback), so I think it is
purely software and not hardware-related.
> What is the protocol? Are there any timeouts to take care of these
> situations? Do you schedule an alarm or use SO_RCVTIMEO to shutdown
> dead connections and free up consumed threads?
The protocol is MySQL. Since we had the problem with libmysqlclient, we
reimplemented the client protocol from scratch to make sure the problem
was not a bug in the library.
What happens at the protocol level is the following:
a) we connect to the server
b) we make several requests and get answers back
c) at some random (and rare) point, always right after sending a request,
we get stuck waiting for the answer.
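Step (c) is where the thread blocks. As a minimal sketch of the SO_RCVTIMEO option suggested in the quoted mail (the helper names are hypothetical, and it assumes the reply is read with a blocking recv() on a connected socket), a per-socket receive timeout could look like this. The option only bounds recv()/read(); a client that waits in poll(), as described here, would have to rely on poll()'s own timeout argument instead.

```c
#define _GNU_SOURCE   /* glibc: expose POSIX/Linux socket APIs under -std=c99 */
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical helper: make blocking recv()/read() on fd give up after
 * ms milliseconds. SO_RCVTIMEO does not affect poll(), whose timeout
 * argument is independent. */
int set_recv_timeout(int fd, long ms)
{
    struct timeval tv;

    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* Hypothetical helper: read (part of) a reply. Returns the byte count,
 * 0 if the peer closed the connection, or -1 with errno set;
 * EAGAIN/EWOULDBLOCK means the receive timeout expired.
 * Note: retrying after EINTR restarts the full timeout. */
ssize_t recv_reply(int fd, void *buf, size_t len)
{
    ssize_t n;

    do {
        n = recv(fd, buf, len, 0);
    } while (n < 0 && errno == EINTR);
    return n;
}
```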
Sadly, this can happen inside a transaction while we hold the lock on
some shared resource. That locks up the whole website until we run out
of file descriptors because of pending accept()ed connections. At that
point we get an exception and the server (the multithreaded one, not
MySQL) restarts, which releases the lock.
In other cases, when we don't hold a lock, the thread remains blocked
in poll() as I described. After a timeout (I think it's 28800 seconds,
i.e. 8 hours) the MySQL server closes the connection. The client, which
is waiting in poll(), has no timeout of its own (it relies on the MySQL
server), but it doesn't notice that the socket has been closed either.
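For comparison, when the server's FIN does reach the client, a poll() waiting for POLLIN returns and recv() then reports end-of-file (0). The sketch below illustrates that sequence, assuming a Linux host; the helper names are hypothetical and a loopback TCP pair stands in for the client/MySQL connection. One plausible reading of a client that stays in poll() after the server has closed is therefore that the FIN never reached the client's TCP stack, rather than that poll() missed it.

```c
#define _GNU_SOURCE                /* for POLLRDHUP */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical test helper: build a connected TCP pair over 127.0.0.1.
 * Returns 0 and stores both descriptors on success, -1 on failure. */
int tcp_loopback_pair(int *client, int *server)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    int cfd = -1, sfd = -1;

    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* kernel picks a free port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        listen(lfd, 1) == 0 &&
        getsockname(lfd, (struct sockaddr *)&addr, &alen) == 0 &&
        (cfd = socket(AF_INET, SOCK_STREAM, 0)) >= 0 &&
        connect(cfd, (struct sockaddr *)&addr, sizeof addr) == 0)
        sfd = accept(lfd, NULL, NULL);
    close(lfd);
    if (sfd < 0) {
        if (cfd >= 0)
            close(cfd);
        return -1;
    }
    *client = cfd;
    *server = sfd;
    return 0;
}

/* Hypothetical helper: wait up to timeout_ms for fd to become readable,
 * then classify it without consuming data (MSG_PEEK). Returns 1 if data
 * is pending, 0 if the peer closed the connection (FIN received),
 * -2 on timeout, -1 on error (including EINTR or a reset connection). */
int classify_readable(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN | POLLRDHUP };
    char c;
    ssize_t n;
    int r = poll(&p, 1, timeout_ms);

    if (r == 0)
        return -2;
    if (r < 0)
        return -1;
    n = recv(fd, &c, 1, MSG_PEEK);
    if (n > 0)
        return 1;
    return n == 0 ? 0 : -1;
}
```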
We investigated signals extensively, since poll() can also be
interrupted by garbage collector and child-process signals, but we
handle EINTR correctly everywhere it's needed. So unless interrupting
poll() with a signal can somehow consume the data, this is not the
problem here.
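One way to handle EINTR around poll() is to retry with the time remaining rather than restarting the full timeout. A minimal sketch, assuming a POSIX system with CLOCK_MONOTONIC (the wrapper name poll_restart is hypothetical). Since poll() only reports readiness and never reads from the descriptor, an interrupted poll() leaves any received data in the socket's receive queue for the next recv().

```c
#define _GNU_SOURCE   /* glibc: expose clock_gettime()/poll() under -std=c99 */
#include <errno.h>
#include <poll.h>
#include <time.h>

/* Milliseconds from a monotonic clock (unaffected by wall-clock changes). */
static long long now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Hypothetical wrapper: poll() that retries after EINTR while keeping
 * the caller's overall deadline. timeout_ms < 0 waits forever, as with
 * poll(). Returns >0 if descriptors are ready, 0 on timeout, -1 on error. */
int poll_restart(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
    long long deadline = timeout_ms < 0 ? 0 : now_ms() + timeout_ms;

    for (;;) {
        int wait_ms = timeout_ms;
        int r;

        if (timeout_ms >= 0) {
            long long left = deadline - now_ms();
            wait_ms = left > 0 ? (int)left : 0;
        }
        r = poll(fds, nfds, wait_ms);
        if (r >= 0 || errno != EINTR)
            return r;
        /* A signal (GC, SIGCHLD, ...) interrupted the wait. poll() never
         * reads from the socket, so no data was consumed; loop and wait
         * for whatever time is left. */
    }
}
```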
> TCP being reliable can block indefinitely, you can employ TCP keepalive
> to change indefinite to quite a long time.
Sure. We could also use a client timeout, but we don't want to hold the
lock longer than required, and we can't tell the difference between a
request that is simply taking a long time to complete and a lost
connection.
I hope we can somehow understand what's going on.
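TCP keepalive, as suggested in the quoted mail, is one option that does not depend on how long a query runs: the probes are answered by the server's TCP stack whatever mysqld is doing, so it only fires when the peer stops answering at the TCP level. SO_KEEPALIVE is off by default, and on Linux the system-wide defaults (net.ipv4.tcp_keepalive_time = 7200 s, tcp_keepalive_intvl = 75 s, tcp_keepalive_probes = 9) apply unless overridden per socket, which is why keepalive alone only turns "indefinite" into "quite a long time". A minimal sketch, assuming Linux (the helper name and the example values are hypothetical):

```c
#define _GNU_SOURCE   /* glibc: expose TCP_KEEP* under strict -std modes */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on TCP keepalive for fd and tune it
 * (TCP_KEEPIDLE/TCP_KEEPINTVL/TCP_KEEPCNT are Linux-specific).
 *   idle_s  - seconds of idleness before the first probe
 *   intvl_s - seconds between unanswered probes
 *   cnt     - unanswered probes before the connection is dropped
 * A peer that has silently vanished is then detected after roughly
 * idle_s + intvl_s * cnt seconds; a thread blocked in poll() wakes up
 * and the next recv() fails (typically with ETIMEDOUT). */
int enable_keepalive(int fd, int idle_s, int intvl_s, int cnt)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof cnt) < 0)
        return -1;
    return 0;
}
```

For example, enable_keepalive(fd, 60, 10, 5) would report a dead peer after about 60 + 10*5 = 110 seconds of silence, instead of the 28800-second server-side timeout.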
Thanks for the answers so far,
Best,
Nicolas