Netdev Archive on
From: Neal Cardwell <>
To: Ben Greear <>
Cc: netdev <>
Subject: Re: Debugging stuck tcp connection across localhost
Date: Thu, 6 Jan 2022 10:20:32 -0500
Message-ID: <>
In-Reply-To: <>

On Thu, Jan 6, 2022 at 10:06 AM Ben Greear <> wrote:
> Hello,
> I'm working on a strange problem, and could use some help if anyone has ideas.
> On a heavily loaded system (500+ wifi station devices, VRF device per 'real' netdev,
> traffic generation on the netdevs, etc), I see cases where two processes trying
> to communicate across localhost with TCP seem to get a stuck network
> connection:
> [greearb@bendt7 ben_debug]$ grep 4004 netstat.txt |grep
> tcp        0 7988926         ESTABLISHED
> tcp        0  59805          ESTABLISHED
> Both processes in question continue to execute, and as far as I can tell, they are properly
> attempting to read/write the socket, but they are reading/writing 0 bytes (these sockets
> are non-blocking).  If one were stuck not reading, I would expect netstat
> to show bytes in the rcv buffer, but it is zero as you can see above.
> Kernel is 5.15.7+ local hacks.  I can only reproduce this in a big messy complicated
> test case, with my local ath10k-ct and other patches that enable virtual wifi stations,
> but my code can grab logs at the time it sees the problem.  Is there anything
> more I can do to figure out why the TCP connection appears to be stuck?

It could be very useful to get more information about the state of all
the stuck connections (sender and receiver side) with something like:

  ss -tinmo 'sport = :4004 or dport = :4004'
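
A single snapshot shows the state at one instant; several timestamped
snapshots make it easier to see whether counters and timers on a stuck
socket are still moving. The following is a minimal sketch of that idea,
not part of the original message. The function names, the snapshot count,
and the log path are illustrative assumptions, and the filter matches the
port on either end of the connection.

```shell
#!/bin/sh
# Sketch: log timestamped `ss` snapshots for one TCP port.
# port_filter and snapshot_port are hypothetical helper names.

# Build an ss filter expression matching the port as source or destination.
port_filter() {
    printf 'sport = :%s or dport = :%s' "$1" "$1"
}

# Append COUNT snapshots, INTERVAL seconds apart, to the file OUT.
# usage: snapshot_port PORT COUNT INTERVAL OUT
snapshot_port() {
    port=$1 count=$2 interval=$3 out=$4
    i=0
    while [ "$i" -lt "$count" ]; do
        {
            # Timestamp header so snapshots can be lined up with other logs.
            date '+=== %Y-%m-%d %H:%M:%S ==='
            ss -tinmo "$(port_filter "$port")"
        } >> "$out" 2>&1
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `snapshot_port 4004 30 2 /tmp/ss-4004.log` would record 30
snapshots, two seconds apart, which can then be compared side by side.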

I would recommend downloading and building a recent version of the
'ss' tool to maximize the information. Here is a recipe for doing

It could also be very useful to collect and share packet traces, as
long as taking traces does not consume an infeasible amount of space,
or perturb timing in a way that makes the buggy behavior disappear.
For example, as root:

  tcpdump -w /tmp/trace.pcap -s 120 -c 100000000 -i any port 4004 &

If space is an issue, you might start taking traces once things get
stuck to see what the retry behavior, if any, looks like.
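
Another way to bound disk usage is to let tcpdump write a fixed-size ring
of files, so only the most recent traffic is kept while waiting for the
problem to appear. This is a hedged sketch rather than something from the
original message: the helper names and the sizes in the usage note are
assumptions. It relies on tcpdump's -C option (rotate the output file
after roughly that many millions of bytes) and -W option (limit the number
of files, overwriting the oldest).

```shell
#!/bin/sh
# Sketch: bounded packet capture using tcpdump's -C/-W file ring.
# ring_budget_mb and start_ring_capture are hypothetical helper names.

# Worst-case disk usage, in millions of bytes, for FILES files of SIZE each.
# usage: ring_budget_mb SIZE FILES
ring_budget_mb() {
    echo $(( $1 * $2 ))
}

# Start a background capture and record its PID in $capture_pid.
# usage: start_ring_capture PORT SIZE FILES PREFIX
start_ring_capture() {
    port=$1 size_mb=$2 files=$3 prefix=$4
    # -s 120 keeps only the first 120 bytes of each packet (headers).
    # tcpdump appends a numeric suffix to PREFIX for each file in the ring.
    tcpdump -w "$prefix" -s 120 -C "$size_mb" -W "$files" \
        -i any port "$port" &
    capture_pid=$!
}
```

As root, `start_ring_capture 4004 100 10 /tmp/trace.pcap` would keep at
most about `ring_budget_mb 100 10` (1000) million bytes of trace data;
`kill "$capture_pid"` stops the capture once the connection is stuck and
enough retry behavior has been recorded.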



Thread overview: 19+ messages
2022-01-06 14:59 Ben Greear
2022-01-06 15:20 ` Neal Cardwell [this message]
2022-01-06 15:39   ` Ben Greear
2022-01-06 16:16     ` Neal Cardwell
2022-01-06 19:05       ` Ben Greear
2022-01-06 20:04         ` Neal Cardwell
2022-01-06 20:20           ` Ben Greear
2022-01-06 22:26           ` Ben Greear
2022-01-10 18:10             ` Ben Greear
2022-01-10 22:16               ` David Laight
2022-01-11 10:46               ` Eric Dumazet
2022-01-11 21:35                 ` Ben Greear
2022-01-12  7:41                   ` Eric Dumazet
2022-01-12 14:52                     ` Ben Greear
2022-01-12 17:12                       ` Eric Dumazet
2022-01-12 18:01                         ` Debugging stuck tcp connection across localhost [snip] Ben Greear
2022-01-12 18:44                           ` Ben Greear
2022-01-12 18:47                             ` Eric Dumazet
2022-01-12 18:54                               ` Ben Greear
