LKML Archive on
To: Nicolas Cannasse <>
Subject: Re: poll() blocked / packets not received ?
Date: Mon, 20 Oct 2008 06:39:42 -0500	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

On Mon, Oct 20, 2008 at 12:46:56PM +0200, Nicolas Cannasse wrote:
> >>We have Shorewall installed and enabled, but what seems strange is that 
> >>the problem depends on multithreading. It also occurs much more often on 
> >>the 4 core machines than on a 2 core ones (both with Hyperthreading 
> >>activated). We're using kernel 2.6.20-15-server (#2 SMP) provided by 
> >>Ubuntu.
> >>
> >>Any tip on we could fix that or investigate further would be 
> >>appreciated. After one month of debugging we're really out of solution 
> >>now.
> >>
> >>Best,
> >>Nicolas
> >
> >Your usage pattern is a very common one, I highly doubt you are 
> >experiencing
> >a kernel bug here or many people (including myself) would be complaining.
> >
> >Shorewall sounds like it might be suspect, are FIN's not coming in when the
> >remote closes?  You can look in the output of netstat to see what state the
> >TCP is in, still ESTABLISHED?
> Yes, it's still ESTABLISHED, but we can't see the corresponding 
> connection on the other machine while running netstat. I'm not a TCP 
> expert, so I'm not sure in which case this can occur.

If the end that's blocking still has the TCP in ESTABLISHED state, and
the other end doesn't have the TCP at all, you've already identified
why the one end is still ESTABLISHED.  ESTABLISHED state won't be left
until a FIN is received from the other end, at which point the socket
enters CLOSE_WAIT.

When the other end of the TCP is _gone_, that leads me to believe a FIN
will not be coming, hence the indefinite ESTABLISHED state.  Why it's
gone is a different question; maybe your problem is at the other end?
The end initiating a shutdown has to enter FIN_WAIT_1 and then FIN_WAIT_2,
and those transitions require the other side to leave ESTABLISHED
(receive a FIN, then ACK it) at the very least to proceed.

> I agree with your comment in general, except that we have been running 
> the same application in single-thread environment for years without 
> running into this very specific problem.

Perhaps when you run multicore/threaded you are stressing the network
stacks at both ends more, including everything in between?  The
threading vs. single-process relationship is probably not causal, just
coincidental.

What is the protocol?  Are there any timeouts to take care of these
situations?  Do you schedule an alarm or use SO_RCVTIMEO to shutdown
dead connections and free up consumed threads?

TCP, being reliable, can block indefinitely; you can employ TCP keepalive
to change "indefinite" into quite a long time (or tune it shorter).

Vito Caputo

Thread overview: 10+ messages
2008-10-20  8:25 Nicolas Cannasse
2008-10-20 10:15 ` swivel
2008-10-20 10:46   ` Nicolas Cannasse
2008-10-20 11:39     ` swivel [this message]
2008-10-20 12:13       ` Nicolas Cannasse
2008-10-20 12:39       ` Nicolas Cannasse
2008-10-20 15:53         ` David Schwartz
2008-10-20 17:24           ` Nicolas Cannasse
2008-10-20 23:21             ` David Schwartz
2008-10-21  5:12             ` Willy Tarreau
