LKML Archive on lore.kernel.org
From: "David Schwartz" <davids@webmaster.com>
To: <swivel@shells.gnugeneration.com>, <ncannasse@motion-twin.com>
Cc: <linux-kernel@vger.kernel.org>
Subject: RE: poll() blocked / packets not received ?
Date: Mon, 20 Oct 2008 08:53:10 -0700
Message-ID: <MDEHLPKNGKAHNMBLJOLKAEJCAIAD.davids@webmaster.com>
In-Reply-To: <48FC7BEE.1020701@motion-twin.com>
Nick Cannasse wrote:
> Ok, funny thing is that we just found what is occurring...
>
> We had a process that was on a regular basis doing the following :
>
> conntrack -F
>
> This was done in order to prevent the table to grow too big, because we
> were reaching the maximum size as told by :
>
> /proc/sys/net/ipv4/netfilter/ip_conntrack_max
> and
> /proc/sys/net/ipv4/netfilter/ip_conntrack_count
>
> Seems like when there are active connections, this will break netfilter
> and stop delivering packets to the socket.
>
> At least I will have nice sleep tonight.
Note that this solved your symptom, not your problem. You actually have two
problems:

1) You rely on TCP to detect a lost connection on a side that never
transmits any data. TCP simply does not do this. If you are not trying to
send data, you have no assurance that a lost connection will be detected.
(You either need a timeout, or you need to send or dribble some data,
depending on the protocol.)

2) You hold a lock on a shared resource while you wait for a reply over a
network. If this is a low-level "block and wait indefinitely" lock, this
will cause many threads to line up behind a slow/stuck thread. The right fix
depends on your circumstances, but you need to use a suitable
synchronization primitive. (You need to be able to use multiple connections
or defer operations without holding a thread.)
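
As a rough illustration of the first point (a sketch only, not code from
this thread), the wait for a reply can be bounded with a poll() timeout, and
the kernel can be asked to probe an idle connection with TCP keepalive. The
helper names `enable_keepalive` and `wait_for_reply` and all the timeout
values are made up for this example, and Python is used only for brevity;
the per-socket keepalive tuning options are assumed to exist as they do on
Linux.

```python
import select
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Ask the kernel to probe an otherwise idle TCP connection."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning is platform-specific (present on Linux); skip
    # any option this platform does not expose.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        if hasattr(socket, name):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)


def wait_for_reply(sock, timeout_s, max_bytes=4096):
    """Return the next chunk of data, or raise if none arrives in time."""
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    # poll() takes its timeout in milliseconds.
    events = poller.poll(int(timeout_s * 1000))
    if not events:
        # Nothing arrived: the caller should treat the connection as
        # suspect instead of blocking forever.
        raise TimeoutError("no reply within %.2f s" % timeout_s)
    data = sock.recv(max_bytes)
    if not data:
        raise ConnectionError("peer closed the connection")
    return data
```

With something like this, a reader that never sends still notices a dead
peer: either the keepalive probes fail and the socket reports an error, or
the application-level timeout fires first.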

With both of these bugs, you are vulnerable to precisely the scenario you
observed. The TCP connection close packets were lost (in this case due to
premature expiration of the connection tracking, but other things can do
it, such as the server rebooting). TCP could not detect the lost connection
because you never sent any data, so one thread blocked forever and the
other threads lined up behind it.
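
One possible way to keep threads from lining up indefinitely behind a stuck
one, sketched here under assumptions: acquire the lock with a bounded wait,
so a caller that cannot get the shared connection fails fast and can retry,
defer the operation, or use another connection. `SharedConnection` and its
`do_request` callable are hypothetical names for this example; the right
primitive depends on the application.

```python
import threading


class SharedConnection:
    """Hypothetical wrapper around one connection shared by many threads."""

    def __init__(self, do_request):
        self._lock = threading.Lock()
        # do_request performs the blocking network round trip.
        self._do_request = do_request

    def request(self, payload, lock_timeout_s):
        # Bounded wait: if the current holder is stuck, give up instead
        # of queueing behind it forever.
        if not self._lock.acquire(timeout=lock_timeout_s):
            raise TimeoutError("connection busy; retry, defer, or use another")
        try:
            return self._do_request(payload)
        finally:
            self._lock.release()
```

This only limits the damage to waiting threads. The thread that holds the
lock still needs its own timeout on the network wait, or it stays stuck.
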
DS