LKML Archive on
From: Nicolas Cannasse <>
Subject: Re: poll() blocked / packets not received ?
Date: Mon, 20 Oct 2008 14:13:35 +0200
Message-ID: <>
In-Reply-To: <> wrote:
> When the other end of the TCP is _gone_ that leads me to believe a FIN
> will not be coming, hence the indefinite ESTABLISHED state.  Why it's
> gone is a different question, maybe your problem is at the other end?
> The end initiating a shutdown has to enter FIN_WAIT_1 then FIN_WAIT_2,
> these transitions require the other side to leave ESTABLISHED (receive a
> FIN then ACK) at the very least to proceed.
>> I agree with your comment in general, except that we have been running 
>> the same application in single-thread environment for years without 
>> running into this very specific problem.
> Perhaps when you run in multicore/threaded you are stressing the network
> stacks at both ends more, including everything in-between?  The
> threading vs. single process relationship is probably not causal, but
> just coincidental.

Not sure why this should happen, since it's the same servers. The only 
thing that changes is the part of the software that we use to handle our 
server requests. It's either embedded in Apache 1.3 with fork() or a 
standalone multithreaded server which acts as an Apache backend.

So the only difference for networking is that we have additional 
Apache<->MT-Server communications, but they should be on so I 
think they are purely software and not hardware-related.

> What is the protocol?  Are there any timeouts to take care of these
> situations?  Do you schedule an alarm or use SO_RCVTIMEO to shutdown
> dead connections and free up consumed threads?
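
For reference, a minimal sketch of the SO_RCVTIMEO option mentioned above. The helper name set_recv_timeout is hypothetical. Note that this timeout applies to blocking recv()/read() calls; a thread waiting in poll() is bounded by poll()'s own timeout argument instead.

```c
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: make blocking reads on fd give up after
 * `seconds` seconds. A timed-out recv() then returns -1 with errno
 * set to EAGAIN or EWOULDBLOCK.
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int set_recv_timeout(int fd, long seconds)
{
    struct timeval tv;

    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}
```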

The protocol is MySQL. Since we had the problem with libmysqlclient, we 
reimplemented the client from scratch to make sure that it was not 

What happens at the protocol level is the following:

a) we connect to the server
b) we make several requests and get answers back
c) at some (random+rare) point - always after making a request - we're 
stuck while waiting for the answer.
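
The wait in step c) can be sketched as follows, assuming a connected stream socket on Linux. The names wait_for_reply and enum wait_result are hypothetical, not taken from our code. The point is that a bounded poll() followed by a non-consuming peek lets the client tell "reply arrived", "peer closed" and "nothing yet" apart.

```c
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

enum wait_result { WAIT_DATA, WAIT_CLOSED, WAIT_TIMEOUT, WAIT_ERROR };

/* Hypothetical helper: wait up to timeout_ms for the peer to send
 * something on fd. This sketch does not retry on EINTR, so an
 * interrupting signal is reported as WAIT_ERROR. */
enum wait_result wait_for_reply(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    char c;
    ssize_t r;
    int n = poll(&pfd, 1, timeout_ms);

    if (n < 0)
        return WAIT_ERROR;
    if (n == 0)
        return WAIT_TIMEOUT;
    if (pfd.revents & POLLNVAL)
        return WAIT_ERROR;

    /* Readable, hung up or in error: peek one byte without consuming
     * it. MSG_DONTWAIT (available on Linux) keeps the peek from
     * blocking. */
    r = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
    if (r > 0)
        return WAIT_DATA;
    if (r == 0)
        return WAIT_CLOSED;   /* orderly shutdown: the peer's FIN arrived */
    return WAIT_ERROR;
}
```

WAIT_CLOSED is only reported if the FIN actually reaches this host; if it is lost or never sent, this function keeps returning WAIT_TIMEOUT.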

Sadly, this can happen inside a transaction while we hold the lock on 
some shared resource. This locks the whole website until we run out of 
file descriptors because of pending accept()ed connections. In that case 
we get an exception and the server (the multithreaded one, not MySQL) 
restarts, which releases the lock.

In other cases, when we don't hold a lock, the thread remains blocked in 
poll() as I described. After a timeout (I think it's 28800 seconds) the 
MySQL server closes the connection. The client - which is waiting in 
poll() - does not have any timeout activated (it's relying on the MySQL 
server). But it doesn't notice that the socket has been closed 

We looked closely at signals, since poll() can also be interrupted by 
garbage collector and child process signals, but we correctly handle 
EINTR everywhere it's needed. So unless there's a possibility that 
interrupting poll() with a signal might somehow consume the data, this 
is not the problem here.

> TCP being reliable can block indefinitely, you can employ TCP keepalive
> to change indefinite to quite a long time.
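
A sketch of what enabling keepalive could look like on a Linux client socket. The helper name enable_keepalive and the example values are assumptions; TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific per-socket overrides of the system-wide defaults.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on TCP keepalive for fd.
 * idle_s:     seconds of silence before the first probe
 * interval_s: seconds between unanswered probes
 * probes:     unanswered probes before the connection is dropped
 * Returns 0 on success, -1 on error (errno set by setsockopt). */
int enable_keepalive(int fd, int idle_s, int interval_s, int probes)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s,
                   sizeof interval_s) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) < 0)
        return -1;
    return 0;
}
```

With, say, enable_keepalive(fd, 60, 10, 5), a peer that silently disappeared would be detected after roughly 60 + 5 * 10 seconds of silence, at which point a thread blocked in poll() is woken with an error condition on the socket.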

Sure. We could also use a client timeout, but we don't want to hold the 
lock longer than required, and we can't tell the difference between a 
given request that would take too much time to complete and a lost 

Hope we can somehow understand what's going on.
Thanks for the answers so far,

