LKML Archive on lore.kernel.org
* 2.6.25-rc6-git2: warn_on_slowpath for tcp_simple_retransmit
@ 2008-04-01 22:28 Alessandro Suardi
2008-04-02 8:10 ` Ilpo Järvinen
0 siblings, 1 reply; 6+ messages in thread
From: Alessandro Suardi @ 2008-04-01 22:28 UTC (permalink / raw)
To: Linux Kernel; +Cc: netdev
Found this in my FC6-based bittorrent box (K7-800 running
a 2.6.25-rc6-git2 kernel) this evening.
The kernel was upgraded two weeks ago to fix the bug
in which an USB VIA driver hammered the PCI bus
causing ATA disk performance to drop, and has been
running since then (it still is).
So it's actually 2.6.25-rc6-git2 plus patch as in here
http://www.gossamer-threads.com/lists/linux/kernel/895506#895506
If there's anything useful I can do, just ask. Thanks !
[1164423.381894] WARNING: at net/ipv4/tcp_output.c:1811 tcp_simple_retransmit+0xde/0x167()
[1164423.382797] Modules linked in: iptable_filter ip_tables x_tables ipv6 sd_mod floppy usb_storage scsi_mod parport_pc parport ehci_hcd uhci_hcd
[1164423.384115] Pid: 11864, comm: peerguardnf Not tainted 2.6.25-rc6-git2 #2
[1164423.384734] [<c0114964>] warn_on_slowpath+0x41/0x51
[1164423.385262] [<c012e0ab>] ? __lock_acquire+0xac8/0xb10
[1164423.385803] [<c012e0ab>] ? __lock_acquire+0xac8/0xb10
[1164423.386235] [<c02d6dfc>] ? __inet_lookup_established+0x171/0x17b
[1164423.386837] [<c02e7235>] ? tcp_v4_err+0xf4/0x353
[1164423.387338] [<c02e3f71>] tcp_simple_retransmit+0xde/0x167
[1164423.387883] [<c02e733b>] tcp_v4_err+0x1fa/0x353
[1164423.388369] [<c02eec6f>] ? icmp_unreach+0x1e7/0x280
[1164423.388893] [<c02eecc6>] icmp_unreach+0x23e/0x280
[1164423.389392] [<c02ee1ea>] icmp_rcv+0x1c9/0x1f1
[1164423.389893] [<c02d08b2>] ip_local_deliver_finish+0xf6/0x1a1
[1164423.390453] [<c02cc3c1>] nf_reinject+0xc3/0x136
[1164423.390940] [<c02fb3d0>] ipq_rcv_skb+0x359/0x3b5
[1164423.391441] [<c02c8d5c>] netlink_unicast+0x1b8/0x21c
[1164423.391968] [<c02c98f9>] netlink_sendmsg+0x241/0x24e
[1164423.392487] [<c02b166b>] sock_sendmsg+0xc9/0xe0
[1164423.392981] [<c012455d>] ? autoremove_wake_function+0x0/0x33
[1164423.393560] [<c020c81c>] ? copy_from_user+0x3b/0x5e
[1164423.394226] [<c02b7d58>] ? verify_iovec+0x40/0x70
[1164423.394733] [<c02b17cf>] sys_sendmsg+0x14d/0x1a8
[1164423.395228] [<c020ca54>] ? copy_to_user+0x3d/0x49
[1164423.395733] [<c02b1f57>] ? move_addr_to_user+0x3b/0x52
[1164423.396268] [<c02b21a5>] ? sys_recvfrom+0xb8/0xc3
[1164423.396774] [<c012e0ab>] ? __lock_acquire+0xac8/0xb10
[1164423.397305] [<c010463e>] ? do_softirq+0x9a/0xa8
[1164423.397893] [<c0102c8b>] ? restore_nocheck+0x12/0x15
[1164423.398417] [<c012d209>] ? trace_hardirqs_on+0xe6/0x10d
[1164423.398957] [<c0102c8b>] ? restore_nocheck+0x12/0x15
[1164423.399483] [<c02b2577>] sys_socketcall+0x14e/0x166
[1164423.399995] [<c0102bdd>] ? sysenter_past_esp+0x9a/0xa5
[1164423.400528] [<c0102ba2>] sysenter_past_esp+0x5f/0xa5
[1164423.401047] [<c0172824>] ? bio_fs_destructor+0x0/0x10
[1164423.401580] =======================
[1164423.401892] ---[ end trace ec1e34ea2cdbd7ae ]---
[1164425.080514] ------------[ cut here ]------------
[1164425.081052] WARNING: at net/ipv4/tcp_input.c:1771 tcp_enter_frto+0x150/0x1ec()
[1164425.081910] Modules linked in: iptable_filter ip_tables x_tables ipv6 sd_mod floppy usb_storage scsi_mod parport_pc parport ehci_hcd uhci_hcd
[1164425.083225] Pid: 0, comm: swapper Not tainted 2.6.25-rc6-git2 #2
[1164425.083797] [<c0114964>] warn_on_slowpath+0x41/0x51
[1164425.084323] [<c012e0ab>] ? __lock_acquire+0xac8/0xb10
[1164425.084844] [<c012e0ab>] ? __lock_acquire+0xac8/0xb10
[1164425.085374] [<c012e0ab>] ? __lock_acquire+0xac8/0xb10
[1164425.085904] [<c02e4eed>] ? tcp_write_timer+0x16/0x575
[1164425.086434] [<c02e0644>] tcp_enter_frto+0x150/0x1ec
[1164425.086943] [<c02e524c>] tcp_write_timer+0x375/0x575
[1164425.087458] [<c011bb23>] run_timer_softirq+0x112/0x16d
[1164425.087985] [<c02e4ed7>] ? tcp_write_timer+0x0/0x575
[1164425.088490] [<c02e4ed7>] ? tcp_write_timer+0x0/0x575
[1164425.089012] [<c0118c8e>] __do_softirq+0x51/0xa8
[1164425.089505] [<c01045fc>] do_softirq+0x58/0xa8
[1164425.089986] [<c013647d>] ? handle_level_irq+0x0/0xbf
[1164425.090511] [<c0118bee>] irq_exit+0x3b/0x47
[1164425.090975] [<c01046e4>] do_IRQ+0x98/0xad
[1164425.091428] [<c0102e3a>] common_interrupt+0x2e/0x34
[1164425.091939] [<c01017e5>] ? default_idle+0x45/0x72
[1164425.092490] [<c01017a0>] ? default_idle+0x0/0x72
[1164425.092989] [<c0101778>] cpu_idle+0x60/0x88
[1164425.093451] [<c031b6f4>] rest_init+0x5c/0x5e
[1164425.093927] =======================
[1164425.094334] ---[ end trace ec1e34ea2cdbd7ae ]---
--alessandro
"Hold back the years, hold back the hours
I want to live to see the sun break through these days"
(Patrick Wolf, 'This Weather')
* Re: 2.6.25-rc6-git2: warn_on_slowpath for tcp_simple_retransmit
2008-04-01 22:28 2.6.25-rc6-git2: warn_on_slowpath for tcp_simple_retransmit Alessandro Suardi
@ 2008-04-02 8:10 ` Ilpo Järvinen
2008-04-02 20:09 ` Alessandro Suardi
0 siblings, 1 reply; 6+ messages in thread
From: Ilpo Järvinen @ 2008-04-02 8:10 UTC (permalink / raw)
To: Alessandro Suardi; +Cc: Linux Kernel, Netdev
On Wed, 2 Apr 2008, Alessandro Suardi wrote:
> Found this in my FC6-based bittorrent box (K7-800 running
> a 2.6.25-rc6-git2 kernel) this evening.
>
> The kernel was upgraded two weeks ago to fix the bug
> in which an USB VIA driver hammered the PCI bus
> causing ATA disk performance to drop, and has been
> running since then (it still is).
>
> So it's actually 2.6.25-rc6-git2 plus patch as in here
> http://www.gossamer-threads.com/lists/linux/kernel/895506#895506
>
> If there's anything useful I can do, just ask. Thanks !
Can you reproduce?
--
i.
* Re: 2.6.25-rc6-git2: warn_on_slowpath for tcp_simple_retransmit
2008-04-02 8:10 ` Ilpo Järvinen
@ 2008-04-02 20:09 ` Alessandro Suardi
2008-04-02 20:39 ` Ilpo Järvinen
0 siblings, 1 reply; 6+ messages in thread
From: Alessandro Suardi @ 2008-04-02 20:09 UTC (permalink / raw)
To: Ilpo Järvinen; +Cc: Linux Kernel, Netdev
On Wed, Apr 2, 2008 at 10:10 AM, Ilpo Järvinen
<ilpo.jarvinen@helsinki.fi> wrote:
> On Wed, 2 Apr 2008, Alessandro Suardi wrote:
>
> > Found this in my FC6-based bittorrent box (K7-800 running
> > a 2.6.25-rc6-git2 kernel) this evening.
> >
> > The kernel was upgraded two weeks ago to fix the bug
> > in which an USB VIA driver hammered the PCI bus
> > causing ATA disk performance to drop, and has been
> > running since then (it still is).
> >
> > So it's actually 2.6.25-rc6-git2 plus patch as in here
> > http://www.gossamer-threads.com/lists/linux/kernel/895506#895506
> >
> > If there's anything useful I can do, just ask. Thanks !
>
> Can you reproduce?
Nope. That only happened once in this uptime:
[root@donkey ~]# uptime
21:57:21 up 14 days, 22:04, 5 users, load average: 0.64, 0.47, 0.44
The machine runs unattended as a bittorrent client, 24x7,
with a very low traffic (uploading at a steady ~36KB/s,
and downloads happen in peaks).
I VNC into it in the evening and manage the torrents;
that is all.
Now, there's an interesting tidbit:
[root@donkey ~]# ip -s link show eth0
3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:c0:49:a7:33:fe brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped overrun mcast
959044156 60549131 0 0 0 0
TX: bytes packets errors dropped carrier collsns
744533957 66149681 0 0 0 0
[root@donkey ~]# ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:C0:49:A7:33:FE
inet addr:192.168.1.7 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::2c0:49ff:fea7:33fe/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:60551468 errors:0 dropped:0 overruns:0 frame:0
TX packets:66152721 errors:0 dropped:0 overruns:1 carrier:0
collisions:0 txqueuelen:1000
RX bytes:959199592 (914.7 MiB) TX bytes:748395356 (713.7 MiB)
Interrupt:12 Base address:0xce00
Somehow the "overruns" counter is seen differently by 'ip'
and 'ifconfig' - one says 0, the other says 1 - perhaps the
packet that WARN'd me on tcp_simple_retransmit ?
If there's anything else - reproducing seems really really
unlikely - 1 packet in 66 million...
Thanks,
--alessandro
"Hold back the years, hold back the hours
I want to live to see the sun break through these days"
(Patrick Wolf, 'This Weather')
* Re: 2.6.25-rc6-git2: warn_on_slowpath for tcp_simple_retransmit
2008-04-02 20:09 ` Alessandro Suardi
@ 2008-04-02 20:39 ` Ilpo Järvinen
2008-04-03 12:52 ` Ilpo Järvinen
0 siblings, 1 reply; 6+ messages in thread
From: Ilpo Järvinen @ 2008-04-02 20:39 UTC (permalink / raw)
To: Alessandro Suardi; +Cc: Linux Kernel, Netdev
On Wed, 2 Apr 2008, Alessandro Suardi wrote:
> On Wed, Apr 2, 2008 at 10:10 AM, Ilpo Järvinen
> <ilpo.jarvinen@helsinki.fi> wrote:
> > On Wed, 2 Apr 2008, Alessandro Suardi wrote:
> >
> > > Found this in my FC6-based bittorrent box (K7-800 running
> > > a 2.6.25-rc6-git2 kernel) this evening.
> > >
> > > The kernel was upgraded two weeks ago to fix the bug
> > > in which an USB VIA driver hammered the PCI bus
> > > causing ATA disk performance to drop, and has been
> > > running since then (it still is).
> > >
> > > So it's actually 2.6.25-rc6-git2 plus patch as in here
> > > http://www.gossamer-threads.com/lists/linux/kernel/895506#895506
> > >
> > > If there's anything useful I can do, just ask. Thanks !
> >
> > Can you reproduce?
>
> Nope. That only happened once in this uptime:
>
> [root@donkey ~]# uptime
> 21:57:21 up 14 days, 22:04, 5 users, load average: 0.64, 0.47, 0.44
>
> The machine runs unattended as a bittorrent client, 24x7,
> with a very low traffic (uploading at a steady ~36KB/s,
> and downloads happen in peaks).
>
> I VNC into it in the evening and manage the torrents;
> that is all.
>
> Now, there's an interesting tidbit:
>
> [root@donkey ~]# ip -s link show eth0
> 3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
> link/ether 00:c0:49:a7:33:fe brd ff:ff:ff:ff:ff:ff
> RX: bytes packets errors dropped overrun mcast
> 959044156 60549131 0 0 0 0
> TX: bytes packets errors dropped carrier collsns
> 744533957 66149681 0 0 0 0
>
> [root@donkey ~]# ifconfig eth0
> eth0 Link encap:Ethernet HWaddr 00:C0:49:A7:33:FE
> inet addr:192.168.1.7 Bcast:192.168.1.255 Mask:255.255.255.0
> inet6 addr: fe80::2c0:49ff:fea7:33fe/64 Scope:Link
> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
> RX packets:60551468 errors:0 dropped:0 overruns:0 frame:0
> TX packets:66152721 errors:0 dropped:0 overruns:1 carrier:0
> collisions:0 txqueuelen:1000
> RX bytes:959199592 (914.7 MiB) TX bytes:748395356 (713.7 MiB)
> Interrupt:12 Base address:0xce00
>
> Somehow the "overruns" counter is seen differently by 'ip'
> and 'ifconfig' - one says 0, the other says 1 - perhaps the
> packet that WARN'd me on tcp_simple_retransmit ?
...I find that extremely unlikely.
> If there's anything else - reproducing seems really really
> unlikely - 1 packet in 66 million...
Yeah, it seems to be some hard-to-hit corner case, and so far nobody has
a reproducible scenario (or at least I'm not aware of any). I tried
to reproduce it last weekend and failed, even with some netem stimuli
added while torrenting.
I'll probably ask Andrew soon to queue some low-cost debug patch into
-mm to get a few more clues when somebody running -mm happens to hit
it.
--
i.
* Re: 2.6.25-rc6-git2: warn_on_slowpath for tcp_simple_retransmit
2008-04-02 20:39 ` Ilpo Järvinen
@ 2008-04-03 12:52 ` Ilpo Järvinen
0 siblings, 0 replies; 6+ messages in thread
From: Ilpo Järvinen @ 2008-04-03 12:52 UTC (permalink / raw)
To: Alessandro Suardi, David Miller, Arjan van de Ven; +Cc: Linux Kernel, Netdev
On Wed, 2 Apr 2008, Ilpo Järvinen wrote:
> On Wed, 2 Apr 2008, Alessandro Suardi wrote:
>
> > On Wed, Apr 2, 2008 at 10:10 AM, Ilpo Järvinen
> > <ilpo.jarvinen@helsinki.fi> wrote:
> > > On Wed, 2 Apr 2008, Alessandro Suardi wrote:
> > >
> > > > Found this in my FC6-based bittorrent box (K7-800 running
> > > > a 2.6.25-rc6-git2 kernel) this evening.
> > > >
> > > > The kernel was upgraded two weeks ago to fix the bug
> > > > in which an USB VIA driver hammered the PCI bus
> > > > causing ATA disk performance to drop, and has been
> > > > running since then (it still is).
> > > >
> > > > So it's actually 2.6.25-rc6-git2 plus patch as in here
> > > > http://www.gossamer-threads.com/lists/linux/kernel/895506#895506
> > > >
> > > > If there's anything useful I can do, just ask. Thanks !
> > >
> > > Can you reproduce?
> >
> > Nope. That only happened once in this uptime:
> >
> > [root@donkey ~]# uptime
> > 21:57:21 up 14 days, 22:04, 5 users, load average: 0.64, 0.47, 0.44
> >
[...snip...]
> > If there's anything else - reproducing seems really really
> > unlikely - 1 packet in 66 million...
>
> Yeah, it seems to be some hard-to-hit corner case, and so far nobody has
> a reproducible scenario (or at least I'm not aware of any). I tried
> to reproduce it last weekend and failed, even with some netem stimuli
> added while torrenting.
No wonder you won't hit it too often if it's this one. This wasn't
reported by kernels before 2.6.24-gsomething, but I think pre-2.6.24
kernels are broken too: they may return negative packets_in_flight for
a while because of this (which can lead to packet bursts).
This shouldn't cause fackets_out inconsistencies, so there's likely more
to find. I already checked the rest of the lost_out players and found no
other similar bugs in them. Maybe I should try playing with MTU probing
at home to see if it's able to catch more fish.
The patch is only compile-tested.
--
i.
[PATCH] [TCP]: tcp_simple_retransmit can cause S+L
tcp_simple_retransmit does L increment without any checking
whatsoever for overflowing S+L when NewReno is in use.
The simplest scenario I can currently think of is rather
complex in practice (there might be some more straightforward
cases though). I.e., if mss is reduced during mtu probing, it
ends up marking everything lost, and if some duplicate ACKs
arrived prior to that, sacked_out will be non-zero as well,
leading to S+L > packets_out. tcp_clean_rtx_queue on the next
cumulative ACK or tcp_fastretrans_alert on the next duplicate
ACK will fix the S counter.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
include/net/tcp.h | 2 ++
net/ipv4/tcp_input.c | 22 ++++++++++++++++------
net/ipv4/tcp_output.c | 3 +++
3 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 723b368..8f5fc52 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -760,6 +760,8 @@ static inline unsigned int tcp_packets_in_flight(const struct tcp_sock *tp)
return tp->packets_out - tcp_left_out(tp) + tp->retrans_out;
}
+extern int tcp_limit_reno_sacked(struct tcp_sock *tp);
+
/* If cwnd > ssthresh, we may raise ssthresh to be half-way to cwnd.
* The exception is rate halving phase, when cwnd is decreasing towards
* ssthresh.
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 6e46b4c..94f3015 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1625,13 +1625,11 @@ out:
return flag;
}
-/* If we receive more dupacks than we expected counting segments
- * in assumption of absent reordering, interpret this as reordering.
- * The only another reason could be bug in receiver TCP.
+/* Limits sacked_out so that sum with lost_out isn't ever larger than
+ * packets_out. Returns zero if sacked_out adjustement wasn't necessary.
*/
-static void tcp_check_reno_reordering(struct sock *sk, const int addend)
+int tcp_limit_reno_sacked(struct tcp_sock *tp)
{
- struct tcp_sock *tp = tcp_sk(sk);
u32 holes;
holes = max(tp->lost_out, 1U);
@@ -1639,8 +1637,20 @@ static void tcp_check_reno_reordering(struct sock *sk, const int addend)
if ((tp->sacked_out + holes) > tp->packets_out) {
tp->sacked_out = tp->packets_out - holes;
- tcp_update_reordering(sk, tp->packets_out + addend, 0);
+ return 1;
}
+ return 0;
+}
+
+/* If we receive more dupacks than we expected counting segments
+ * in assumption of absent reordering, interpret this as reordering.
+ * The only another reason could be bug in receiver TCP.
+ */
+static void tcp_check_reno_reordering(struct sock *sk, const int addend)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+ if (tcp_limit_reno_sacked(tp))
+ tcp_update_reordering(sk, tp->packets_out + addend, 0);
}
/* Emulate SACKs for SACKless connection: account for a new dupack. */
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 6e25540..441fdd3 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1808,6 +1808,9 @@ void tcp_simple_retransmit(struct sock *sk)
if (!lost)
return;
+ if (tcp_is_reno(tp))
+ tcp_limit_reno_sacked(tp);
+
tcp_verify_left_out(tp);
/* Don't muck with the congestion window here.
--
1.5.2.2
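
The counter arithmetic behind the patch can be tried outside the kernel. What
follows is only a minimal userspace sketch, not kernel code: struct counters,
left_out(), limit_reno_sacked() and demo_mtu_probe_scenario() are hypothetical
stand-ins for the tcp_sock fields and helpers named in the patch, and the
numbers (10 packets out, 3 duplicate ACKs) are made up for illustration. The
clamp of holes to packets_out is an assumption added so that the subtraction
cannot wrap; the hunk boundary in the diff does not show whether the kernel
function does the same.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for the tcp_sock counters used in the
 * patch; this is not the kernel structure. */
struct counters {
	unsigned int packets_out;  /* segments sent, not yet cumulatively ACKed */
	unsigned int sacked_out;   /* S: with NewReno, counts duplicate ACKs */
	unsigned int lost_out;     /* L: segments marked lost */
	unsigned int retrans_out;  /* retransmitted segments still outstanding */
};

/* S+L, the quantity that must not exceed packets_out. */
static unsigned int left_out(const struct counters *c)
{
	return c->sacked_out + c->lost_out;
}

/* Same formula as tcp_packets_in_flight() in the diff context, but computed
 * as a signed value so that an inconsistent S+L shows up as a negative
 * number instead of wrapping around. */
static int packets_in_flight(const struct counters *c)
{
	return (int)c->packets_out - (int)left_out(c) + (int)c->retrans_out;
}

/* Follows the arithmetic of tcp_limit_reno_sacked() from the patch.
 * Returns 1 if sacked_out had to be reduced, 0 otherwise. */
static int limit_reno_sacked(struct counters *c)
{
	unsigned int holes = c->lost_out > 1U ? c->lost_out : 1U;

	/* Assumption: keep holes <= packets_out so the subtraction is safe. */
	if (holes > c->packets_out)
		holes = c->packets_out;

	if (c->sacked_out + holes > c->packets_out) {
		c->sacked_out = c->packets_out - holes;
		return 1;
	}
	return 0;
}

/* Scenario from the commit message: some duplicate ACKs arrive, then the
 * mss shrinks and every outstanding segment is marked lost.
 * Returns packets in flight after the clamp. */
static int demo_mtu_probe_scenario(void)
{
	struct counters c = { .packets_out = 10, .sacked_out = 3 };

	c.lost_out = c.packets_out;            /* everything marked lost */
	assert(left_out(&c) > c.packets_out);  /* 13 > 10: the WARN condition */
	assert(packets_in_flight(&c) == -3);   /* "negative packets_in_flight" */

	assert(limit_reno_sacked(&c) == 1);    /* S is reduced from 3 to 0 */
	assert(left_out(&c) <= c.packets_out); /* S+L is consistent again */
	return packets_in_flight(&c);
}
```

Calling demo_mtu_probe_scenario() from a small main() walks through the
inconsistent state first and then the state after the clamp, which is the
point in tcp_simple_retransmit where the patch places its call ahead of
tcp_verify_left_out.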
* 2.6.25-rc6-git2: warn_on_slowpath for tcp_simple_retransmit
2008-04-03 22:49 2.6.25-rc8-git2: Reported regressions from 2.6.24 Rafael J. Wysocki
@ 2008-04-03 23:30 ` Rafael J. Wysocki
0 siblings, 0 replies; 6+ messages in thread
From: Rafael J. Wysocki @ 2008-04-03 23:30 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Alessandro Suardi, Ilpo Jarvinen
The following report is on the current list of known regressions
from 2.6.24. Please verify if the issue is still present in the
mainline.
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10384
Subject : 2.6.25-rc6-git2: warn_on_slowpath for tcp_simple_retransmit
Submitter : Alessandro Suardi <alessandro.suardi@gmail.com>
Date : 2008-04-02 00:28 (2 days old)
References : http://lkml.org/lkml/2008/4/1/408
Handled-By : Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi>