LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* Re: cups slow on linux-2.6.24
@ 2008-02-10 15:06 Jeff Chua
  2008-02-14 15:02 ` Jozsef Kadlecsik
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff Chua @ 2008-02-10 15:06 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Jozsef Kadlecsik, lkml, Krzysztof Piotr Oledzki, David S. Miller,
	cups-bugs, Netfilter Development Mailinglist, Linus Torvalds


On Feb 5, 2008 9:47 PM, Patrick McHardy wrote:
>> On Feb 5, 2008 4:16 PM, Jozsef Kadlecsik wrote:
>> Patrick, I suppose you need a patch against the latest git, don't you?
> Yes, please. I'll take you first patch for -stable though if you
> send me a Signed-off-by: line.

Please note the lastest git commit is missing one part (which was in 
Jozsef's original patch).

 	commit b2155e7f70b3f058efe94c0c459db023b05057bd
 	Author: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
 	Date:   Thu Feb 7 17:54:56 2008 -0800

 	    [NETFILTER]: nf_conntrack: TCP conntrack reopening fix


I've tested the following missing patch, and it solves the cups slow 
printing problem.


Thanks,
Jeff

--- a/net/netfilter/nf_conntrack_proto_tcp.c.org	2008-02-10 22:45:49 +0800
+++ a/net/netfilter/nf_conntrack_proto_tcp.c	2008-02-10 22:45:56 +0800
@@ -945,7 +945,7 @@

  	ct->proto.tcp.state = new_state;
  	if (old_state != new_state
-	    && new_state == TCP_CONNTRACK_CLOSE)
+	    && new_state == TCP_CONNTRACK_FIN_WAIT)
  		ct->proto.tcp.seen[dir].flags |= IP_CT_TCP_FLAG_CLOSE_INIT;
  	timeout = ct->proto.tcp.retrans >= nf_ct_tcp_max_retrans
  		  && tcp_timeouts[new_state] > nf_ct_tcp_timeout_max_retrans

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
  2008-02-10 15:06 cups slow on linux-2.6.24 Jeff Chua
@ 2008-02-14 15:02 ` Jozsef Kadlecsik
  2008-02-14 22:50   ` David Miller
  0 siblings, 1 reply; 28+ messages in thread
From: Jozsef Kadlecsik @ 2008-02-14 15:02 UTC (permalink / raw)
  To: Jeff Chua
  Cc: Patrick McHardy, lkml, Krzysztof Piotr Oledzki, David S. Miller,
	cups-bugs, Netfilter Development Mailinglist, Linus Torvalds

Hi,

On Sun, 10 Feb 2008, Jeff Chua wrote:

> Please note the lastest git commit is missing one part (which was in Jozsef's
> original patch).

Sorry everyone, that's my fault: the patch I sent for the stable branch 
was correct, however I mistyped a state in the patch for the newest 
git tree - Jeff spotted it, thanks! So here follows the patch against the 
newest git tree which corrects the mistake. Please review and commit.

commit 1177898d185f382f491afaee3cdc752729c74bd2
Author: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Date:   Thu Feb 14 15:51:21 2008 +0100

    Mistyped state corrected.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 202d7fa..6256795 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -945,7 +945,7 @@ static int tcp_packet(struct nf_conn *ct,
 
 	ct->proto.tcp.state = new_state;
 	if (old_state != new_state
-	    && new_state == TCP_CONNTRACK_CLOSE)
+	    && new_state == TCP_CONNTRACK_FIN_WAIT)
 		ct->proto.tcp.seen[dir].flags |= IP_CT_TCP_FLAG_CLOSE_INIT;
 	timeout = ct->proto.tcp.retrans >= nf_ct_tcp_max_retrans
 		  && tcp_timeouts[new_state] > nf_ct_tcp_timeout_max_retrans
 

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
  2008-02-14 15:02 ` Jozsef Kadlecsik
@ 2008-02-14 22:50   ` David Miller
  2008-02-15  1:44     ` Patrick McHardy
  0 siblings, 1 reply; 28+ messages in thread
From: David Miller @ 2008-02-14 22:50 UTC (permalink / raw)
  To: kadlec
  Cc: jeff.chua.linux, kaber, linux-kernel, ole, cups-bugs,
	netfilter-devel, torvalds

From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Date: Thu, 14 Feb 2008 16:02:29 +0100 (CET)

> Hi,
> 
> On Sun, 10 Feb 2008, Jeff Chua wrote:
> 
> > Please note the lastest git commit is missing one part (which was in Jozsef's
> > original patch).
> 
> Sorry everyone, that's my fault: the patch I sent for the stable branch 
> was correct, however I mistyped a state in the patch for the newest 
> git tree - Jeff spotted it, thanks! So here follows the patch against the 
> newest git tree which corrects the mistake. Please review and commit.

I'll apply this since Patrick still isn't feeling well.

Thanks!

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
  2008-02-14 22:50   ` David Miller
@ 2008-02-15  1:44     ` Patrick McHardy
  0 siblings, 0 replies; 28+ messages in thread
From: Patrick McHardy @ 2008-02-15  1:44 UTC (permalink / raw)
  To: David Miller
  Cc: kadlec, jeff.chua.linux, linux-kernel, ole, cups-bugs,
	netfilter-devel, torvalds

David Miller wrote:
> From: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
> Date: Thu, 14 Feb 2008 16:02:29 +0100 (CET)
> 
>> Hi,
>>
>> On Sun, 10 Feb 2008, Jeff Chua wrote:
>>
>>> Please note the lastest git commit is missing one part (which was in Jozsef's
>>> original patch).
>> Sorry everyone, that's my fault: the patch I sent for the stable branch 
>> was correct, however I mistyped a state in the patch for the newest 
>> git tree - Jeff spotted it, thanks! So here follows the patch against the 
>> newest git tree which corrects the mistake. Please review and commit.
> 
> I'll apply this since Patrick still isn't feeling well.


Thanks Dave. I'm almost feeling healthy again, so I'll go through
my backlog over the next days and will push everything important
early next week.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
  2008-02-05  8:16 ` Jozsef Kadlecsik
@ 2008-02-05 13:47   ` Patrick McHardy
  0 siblings, 0 replies; 28+ messages in thread
From: Patrick McHardy @ 2008-02-05 13:47 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: Jeff Chua, lkml, Krzysztof Piotr Oledzki, David S. Miller,
	cups-bugs, Netfilter Development Mailinglist

Jozsef Kadlecsik wrote:
> On Tue, 5 Feb 2008, Jeff Chua wrote:
> 
>> On Feb 5, 2008 4:17 AM, Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> wrote:
>>
>>> Actively closed connections are not handled properly, i.e. the initiator of
>>> the active close should not be taken into account. So could you give a try
>>> to the patch below? Does it just suppress the 'invalid packed ignored' and
>>> all other kernel messages or both suppresses them and produces normal
>>> printing speed?
>> Amazing! You fixed it. No more 'invalid packed ignored', and speed back to
>> normal (continues after approx. 20 seconds of pausing after 503 prints).
> 
> Great! Thank you for the testing!


Thanks for figuring this out.

>> I used the latest git, and have to modify your patch slightly to make it work
>> (changing "conntrack" to "ct").
> 
> Patrick, I suppose you need a patch against the latest git, don't you?


Yes, please. I'll take you first patch for -stable though if you
send me a Signed-off-by: line.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
  2008-02-05  1:17 Jeff Chua
@ 2008-02-05  8:16 ` Jozsef Kadlecsik
  2008-02-05 13:47   ` Patrick McHardy
  0 siblings, 1 reply; 28+ messages in thread
From: Jozsef Kadlecsik @ 2008-02-05  8:16 UTC (permalink / raw)
  To: Jeff Chua
  Cc: Patrick McHardy, lkml, Krzysztof Piotr Oledzki, David S. Miller,
	cups-bugs, Netfilter Development Mailinglist

On Tue, 5 Feb 2008, Jeff Chua wrote:

> On Feb 5, 2008 4:17 AM, Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> wrote:
> 
> > Actively closed connections are not handled properly, i.e. the initiator of
> > the active close should not be taken into account. So could you give a try
> > to the patch below? Does it just suppress the 'invalid packed ignored' and
> > all other kernel messages or both suppresses them and produces normal
> > printing speed?
> 
> Amazing! You fixed it. No more 'invalid packed ignored', and speed back to
> normal (continues after approx. 20 seconds of pausing after 503 prints).

Great! Thank you for the testing!
 
> I used the latest git, and have to modify your patch slightly to make it work
> (changing "conntrack" to "ct").

Patrick, I suppose you need a patch against the latest git, don't you?

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
@ 2008-02-05  1:17 Jeff Chua
  2008-02-05  8:16 ` Jozsef Kadlecsik
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff Chua @ 2008-02-05  1:17 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: Patrick McHardy, lkml, Krzysztof Piotr Oledzki, David S. Miller,
	cups-bugs, Netfilter Development Mailinglist



On Feb 5, 2008 4:17 AM, Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> wrote:

> Actively closed connections are not handled properly, i.e. the initiator 
> of the active close should not be taken into account. So could you give 
> a try to the patch below? Does it just suppress the 'invalid packed 
> ignored' and all other kernel messages or both suppresses them and 
> produces normal printing speed?

Jozsef,

Amazing! You fixed it. No more 'invalid packed ignored', and speed back to 
normal (continues after approx. 20 seconds of pausing after 503 prints).

I used the latest git, and have to modify your patch slightly to make it 
work (changing "conntrack" to "ct").


Thank you for fixing this.

Jeff


Here's your patch modified so it'll apply to the latest git.

--- a/net/netfilter/nf_conntrack_proto_tcp.c.org	2008-02-05 08:29:39 +0800
+++ a/net/netfilter/nf_conntrack_proto_tcp.c	2008-02-05 08:28:05 +0800
@@ -125,7 +125,7 @@
   * CLOSE_WAIT:	ACK seen (after FIN)
   * LAST_ACK:	FIN seen (after FIN)
   * TIME_WAIT:	last ACK seen
- * CLOSE:	closed connection
+ * CLOSE:	closed connection (RST)
   *
   * LISTEN state is not used.
   *
@@ -824,9 +824,23 @@
  	case TCP_CONNTRACK_SYN_SENT:
  		if (old_state < TCP_CONNTRACK_TIME_WAIT)
  			break;
-		if ((ct->proto.tcp.seen[!dir].flags & IP_CT_TCP_FLAG_CLOSE_INIT)
-		    || (ct->proto.tcp.last_dir == dir
-		        && ct->proto.tcp.last_index == TCP_RST_SET)) {
+		/* RFC 1122: "When a connection is closed actively,
+		 * it MUST linger in TIME-WAIT state for a time 2xMSL
+		 * (Maximum Segment Lifetime). However, it MAY accept
+		 * a new SYN from the remote TCP to reopen the connection
+		 * directly from TIME-WAIT state, if..."
+		 * We ignore the conditions because we are in the
+		 * TIME-WAIT state anyway.
+		 *
+		 * Handle aborted connections: we and the server
+		 * think there is an existing connection but the client
+		 * aborts it and starts a new one.
+		 */
+		if (((ct->proto.tcp.seen[dir].flags
+		      | ct->proto.tcp.seen[!dir].flags)
+		     & IP_CT_TCP_FLAG_CLOSE_INIT)
+ 		    || (ct->proto.tcp.last_dir == dir
+ 		        && ct->proto.tcp.last_index == TCP_RST_SET)) {
  			/* Attempt to reopen a closed/aborted connection.
  			 * Delete this connection and look up again. */
  			write_unlock_bh(&tcp_lock);
@@ -838,15 +852,23 @@
  	case TCP_CONNTRACK_IGNORE:
  		/* Ignored packets:
  		 *
+		 * Our connection entry may be out of sync, so ignore
+		 * packets which may signal the real connection between
+		 * the client and the server.
+		 *
  		 * a) SYN in ORIGINAL
  		 * b) SYN/ACK in REPLY
  		 * c) ACK in reply direction after initial SYN in original.
+		 *
+		 * If the ignored packet is invalid, the receiver will send 
+		 * a RST we'll catch below.
  		 */
  		if (index == TCP_SYNACK_SET
  		    && ct->proto.tcp.last_index == TCP_SYN_SET
  		    && ct->proto.tcp.last_dir != dir
  		    && ntohl(th->ack_seq) == ct->proto.tcp.last_end) {
  			/* This SYN/ACK acknowledges a SYN that we earlier
+			/* b) This SYN/ACK acknowledges a SYN that we earlier
  			 * ignored as invalid. This means that the client and
  			 * the server are both in sync, while the firewall is
  			 * not. We kill this session and block the SYN/ACK so
@@ -924,8 +946,7 @@

  	ct->proto.tcp.state = new_state;
  	if (old_state != new_state
-	    && (new_state == TCP_CONNTRACK_FIN_WAIT
-		|| new_state == TCP_CONNTRACK_CLOSE))
+	    && new_state == TCP_CONNTRACK_FIN_WAIT)
  		ct->proto.tcp.seen[dir].flags |= IP_CT_TCP_FLAG_CLOSE_INIT;
  	timeout = ct->proto.tcp.retrans >= nf_ct_tcp_max_retrans
  		  && tcp_timeouts[new_state] > nf_ct_tcp_timeout_max_retrans

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
       [not found] <Pine.LNX.4.64.0802042226340.28805@boston.corp.fedex.com>
@ 2008-02-04 20:17 ` Jozsef Kadlecsik
  0 siblings, 0 replies; 28+ messages in thread
From: Jozsef Kadlecsik @ 2008-02-04 20:17 UTC (permalink / raw)
  To: Jeff Chua
  Cc: Patrick McHardy, lkml, Krzysztof Piotr Oledzki, David S. Miller,
	cups-bugs, Netfilter Development Mailinglist

Hi,

On Mon, 4 Feb 2008, Jeff Chua wrote:

> > Attached are the dump files mentioned.
> 
> Not sure whether the attached files got uploaded. So, I'm sending this one
> more time.

I could reproduce the slow-down by a loop of socat commands. The dump you 
sent looks exactly like the traces I got at dumping the traffic generated by 
socat and that is good.

Actively closed connections are not handled properly, i.e. the initiator 
of the active close should not be taken into account. So could you give a 
try to the patch below? Does it just suppress the 'invalid packed ignored' 
and all other kernel messages or both suppresses them and produces normal 
printing speed?

commit 2067ab64ef876efaa0f4dd8fe73f48f50e215369
Author: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Date:   Sun Feb 3 17:14:22 2008 +0100

    Handle active close for both sides, it is not restricted
    to one side alone.

diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index 7a3f64c..8c7efe2 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -135,7 +135,7 @@ enum tcp_bit_set {
  * CLOSE_WAIT:	ACK seen (after FIN)
  * LAST_ACK:	FIN seen (after FIN)
  * TIME_WAIT:	last ACK seen
- * CLOSE:	closed connection
+ * CLOSE:	closed connection (RST)
  *
  * LISTEN state is not used.
  *
@@ -834,8 +834,21 @@ static int tcp_packet(struct nf_conn *conntrack,
 	case TCP_CONNTRACK_SYN_SENT:
 		if (old_state < TCP_CONNTRACK_TIME_WAIT)
 			break;
-		if ((conntrack->proto.tcp.seen[!dir].flags &
-			IP_CT_TCP_FLAG_CLOSE_INIT)
+		/* RFC 1122: "When a connection is closed actively,
+		 * it MUST linger in TIME-WAIT state for a time 2xMSL
+		 * (Maximum Segment Lifetime). However, it MAY accept
+		 * a new SYN from the remote TCP to reopen the connection
+		 * directly from TIME-WAIT state, if..."
+		 * We ignore the conditions because we are in the
+		 * TIME-WAIT state anyway.
+		 *
+		 * Handle aborted connections: we and the server
+		 * think there is an existing connection but the client
+		 * aborts it and starts a new one.
+		 */
+		if (((conntrack->proto.tcp.seen[dir].flags
+		      | conntrack->proto.tcp.seen[!dir].flags)
+		     & IP_CT_TCP_FLAG_CLOSE_INIT)
 		    || (conntrack->proto.tcp.last_dir == dir
 		        && conntrack->proto.tcp.last_index == TCP_RST_SET)) {
 			/* Attempt to reopen a closed/aborted connection.
@@ -848,18 +861,25 @@ static int tcp_packet(struct nf_conn *conntrack,
 		}
 		/* Fall through */
 	case TCP_CONNTRACK_IGNORE:
-		/* Ignored packets:
+		/* Ignored packets: 
+		 *
+		 * Our connection entry may be out of sync, so ignore
+		 * packets which may signal the real connection between
+		 * the client and the server.
 		 *
 		 * a) SYN in ORIGINAL
 		 * b) SYN/ACK in REPLY
 		 * c) ACK in reply direction after initial SYN in original.
+		 *
+		 * If the ignored packet is invalid, the receiver will send 
+		 * a RST we'll catch below.
 		 */
 		if (index == TCP_SYNACK_SET
 		    && conntrack->proto.tcp.last_index == TCP_SYN_SET
 		    && conntrack->proto.tcp.last_dir != dir
 		    && ntohl(th->ack_seq) ==
 			     conntrack->proto.tcp.last_end) {
-			/* This SYN/ACK acknowledges a SYN that we earlier
+			/* b) This SYN/ACK acknowledges a SYN that we earlier
 			 * ignored as invalid. This means that the client and
 			 * the server are both in sync, while the firewall is
 			 * not. We kill this session and block the SYN/ACK so
@@ -938,8 +958,7 @@ static int tcp_packet(struct nf_conn *conntrack,
 
 	conntrack->proto.tcp.state = new_state;
 	if (old_state != new_state
-	    && (new_state == TCP_CONNTRACK_FIN_WAIT
-		|| new_state == TCP_CONNTRACK_CLOSE))
+	    && new_state == TCP_CONNTRACK_FIN_WAIT)
 		conntrack->proto.tcp.seen[dir].flags |= IP_CT_TCP_FLAG_CLOSE_INIT;
 	timeout = conntrack->proto.tcp.retrans >= nf_ct_tcp_max_retrans
 		  && *tcp_timeouts[new_state] > nf_ct_tcp_timeout_max_retrans

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
  2008-02-02 14:44                       ` Jozsef Kadlecsik
@ 2008-02-03 16:08                         ` Jeff Chua
  0 siblings, 0 replies; 28+ messages in thread
From: Jeff Chua @ 2008-02-03 16:08 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: Patrick McHardy, lkml, Krzysztof Piotr Oledzki, David S. Miller,
	cups-bugs, Netfilter Development Mailinglist

On Feb 2, 2008 10:44 PM, Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> wrote:

> Could I ask you to make two another tests? (I have been unable to
> reproduce the bug so far, but it must be my fault.)

You need to send more than 510 jobs to see the problem.

> In both cases enable loggin invalid messages as Patrick wrote in a previous mail:
> # modprobe ipt_LOG
> # echo 255 >/proc/sys/net/netfilter/nf_conntrack_log_invalid

Ok.

> In the first case run the unpatched 2.4.24 kernel and before doing
> any printing, start dumping all the traffic at the 515 port so that we
> won't miss any connection and send the dump file:
> # tcpdump -i lo -s 0 -w dump.pcap tcp port 515

Attached are the file dump1.bad, and dmesg.bad.

The console only starts printing out the "nf_ct_tcp: invalid packed
ignored" messages after the 22 seconds "pause". Look for pause at
"23:40:27.454677", and resume at "23:40:59.469704". The first console
message at "23:40:59".

> In the second case run the patched kernel and just start printing: do you
> get any 'nf_ct_tcp: invalid SYN' kernel message?

No such message. Even after trying many times. But there's still a 26
"pause" from "23:49:41.368812" to "23:50:07.392278". It's able to
resume and continue printing in "burst".

These tests were ran with just 520 jobs using:

for((i=1; i<520;i++)); do echo $i | lpr -Plp; done


Sorry for the delay. Traveling.

Thanks,
Jeff.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
  2008-02-01  5:28                     ` Jeff Chua
@ 2008-02-02 14:44                       ` Jozsef Kadlecsik
  2008-02-03 16:08                         ` Jeff Chua
  0 siblings, 1 reply; 28+ messages in thread
From: Jozsef Kadlecsik @ 2008-02-02 14:44 UTC (permalink / raw)
  To: Jeff Chua
  Cc: Patrick McHardy, lkml, Krzysztof Piotr Oledzki, David S. Miller,
	cups-bugs, Netfilter Development Mailinglist

Hi Jeff,

On Fri, 1 Feb 2008, Jeff Chua wrote:

> I recaptured it again, and attached are the logs.
[...]

Thank you! One can see a plain connection-initiating SYN, which triggers 
the message. No reply from the server, then three seconds later comes a 
retransmitted SYN and immediately after the SYN/ACK reply. What makes it 
interesting is that the first SYN was let through by the conntrack: it was 
*not* blocked at all.

In the dump file there is no other previous connection between 
127.0.0.1:1021 -> 127.0.0.1:515. But there must be a previous connection 
which was not captured by tcpdump.

Could I ask you to make two another tests? (I have been unable to 
reproduce the bug so far, but it must be my fault.) In both cases enable 
loggin invalid messages as Patrick wrote in a previous mail:

# modprobe ipt_LOG
# echo 255 >/proc/sys/net/netfilter/nf_conntrack_log_invalid

In the first case run the unpatched 2.4.24 kernel and before doing 
any printing, start dumping all the traffic at the 515 port so that we 
won't miss any connection and send the dump file:

# tcpdump -i lo -s 0 -w dump.pcap tcp port 515 

In the second case run the patched kernel and just start printing: do you 
get any 'nf_ct_tcp: invalid SYN' kernel message?

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
  2008-02-01  6:07                           ` David Newall
@ 2008-02-01  6:10                             ` Patrick McHardy
  0 siblings, 0 replies; 28+ messages in thread
From: Patrick McHardy @ 2008-02-01  6:10 UTC (permalink / raw)
  To: David Newall
  Cc: David Miller, kadlec, jeff.chua.linux, linux-kernel, ole,
	cups-bugs, netfilter-devel

David Newall wrote:
> I'm not debating that checksums are wrong. The question was how and
> where? It's not as if there are any unreliable communication paths in a
> loopback interface, so it's surprising that they could be wrong. How? Where?
> 

As I said, loopback doesn't perform full checksum calculation.
Its simply an optimization, nothing to worry about ...


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
  2008-02-01  2:12                         ` David Miller
@ 2008-02-01  6:07                           ` David Newall
  2008-02-01  6:10                             ` Patrick McHardy
  0 siblings, 1 reply; 28+ messages in thread
From: David Newall @ 2008-02-01  6:07 UTC (permalink / raw)
  To: David Miller
  Cc: kaber, kadlec, jeff.chua.linux, linux-kernel, ole, cups-bugs,
	netfilter-devel

David Miller wrote:
> From: David Newall <davidn@davidnewall.com>
> Date: Fri, 01 Feb 2008 11:17:14 +1030
>
>   
>> Patrick McHardy wrote:
>>     
>>> Jozsef Kadlecsik wrote:
>>>       
>>>> Strange, but there are a lot of incorrect checksum packets. How does
>>>> it come on the loopback interface?
>>>>         
>>> Loopback doesn't perform full checksumming, so thats expected.
>>>       
>> The question remains: How do loopback packets get incorrect checksum? 
>> Where and how can they get corrupted?
>>     
>
> There are zeros there, which is an incorrect checksum most
> of the time.
>   

I'm not debating that checksums are wrong. The question was how and
where? It's not as if there are any unreliable communication paths in a
loopback interface, so it's surprising that they could be wrong. How? Where?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
  2008-01-31 10:40                   ` Jozsef Kadlecsik
  2008-01-31 18:53                     ` Patrick McHardy
@ 2008-02-01  5:28                     ` Jeff Chua
  2008-02-02 14:44                       ` Jozsef Kadlecsik
  1 sibling, 1 reply; 28+ messages in thread
From: Jeff Chua @ 2008-02-01  5:28 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: Patrick McHardy, lkml, Krzysztof Piotr Oledzki, David S. Miller,
	cups-bugs, Netfilter Development Mailinglist

[-- Attachment #1: Type: text/plain, Size: 1194 bytes --]

On Jan 31, 2008 6:40 PM, Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> wrote:
> Hi Jeff,
>
> On Thu, 31 Jan 2008, Jeff Chua wrote:
>
> > On the bad run, I got the following message ...
> >
> > boston kernel: nf_ct_tcp: invalid packed ignored IN= OUT=
> > SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=8162
> > DF PROTO=TCP SPT=1016 DPT=515 SEQ=3834958843 ACK=0 WINDOW=32792
> > RES=0x00 SYN URGP=0 OPT (0204400C0402080A0000CC190000000001030307)
> > UID=0 GID=65534
>
> I can't find this packet in the last lo.bad pcap file. Could you dump the
> bad traffic and send it together with the generated log record?


I recaptured it again, and attached are the logs.

The first hang is at (from tcpdump)

13:15:42.330162 IP localhost.corp.fedex.com.1021 >
localhost.corp.fedex.com.printer: S 668990751:668990751(0) win 32792
<mss 16396,sackOK,timestamp 4294925792 0,nop,wscale 3>

Corresponding to dmesg ...

nf_ct_tcp: invalid packed ignored IN= OUT= SRC=127.0.0.1 DST=127.0.0.1
LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=41215 DF PROTO=TCP SPT=1021
DPT=515 SEQ=668990751 ACK=0 WINDOW=32792 RES=0x00 SYN URGP=0 OPT
(0204400C0402080AFFFF5DE00000000001030303) UID=0 GID=65534


Thanks,
Jeff.

[-- Attachment #2: dump1 --]
[-- Type: application/octet-stream, Size: 210129 bytes --]

[-- Attachment #3: dmesg1 --]
[-- Type: application/octet-stream, Size: 3420 bytes --]

nf_ct_tcp: invalid packed ignored IN= OUT= SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=41215 DF PROTO=TCP SPT=1021 DPT=515 SEQ=668990751 ACK=0 WINDOW=32792 RES=0x00 SYN URGP=0 OPT (0204400C0402080AFFFF5DE00000000001030303) UID=0 GID=65534
nf_ct_tcp: killing out of sync session IN= OUT= SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=515 DPT=1021 SEQ=666851408 ACK=668990752 WINDOW=32768 RES=0x00 ACK SYN URGP=0 OPT (0204400C0402080AFFFF5DE1FFFF5DE001030303) UID=0 GID=0
nf_ct_tcp: invalid packed ignored IN= OUT= SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=15031 DF PROTO=TCP SPT=1023 DPT=515 SEQ=715545168 ACK=0 WINDOW=32792 RES=0x00 SYN URGP=0 OPT (0204400C0402080AFFFF60E70000000001030303) UID=0 GID=65534
nf_ct_tcp: killing out of sync session IN= OUT= SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=515 DPT=1023 SEQ=724077908 ACK=715545169 WINDOW=32768 RES=0x00 ACK SYN URGP=0 OPT (0204400C0402080AFFFF60F0FFFF60E701030303) UID=0 GID=0
nf_ct_tcp: invalid packed ignored IN= OUT= SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=29773 DF PROTO=TCP SPT=1022 DPT=515 SEQ=767236025 ACK=0 WINDOW=32792 RES=0x00 SYN URGP=0 OPT (0204400C0402080AFFFF63F40000000001030303) UID=0 GID=65534
nf_ct_tcp: killing out of sync session IN= OUT= SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=515 DPT=1022 SEQ=765295327 ACK=767236026 WINDOW=32768 RES=0x00 ACK SYN URGP=0 OPT (0204400C0402080AFFFF63FDFFFF63F401030303) UID=0 GID=0
nf_ct_tcp: invalid packed ignored IN= OUT= SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=34901 DF PROTO=TCP SPT=1020 DPT=515 SEQ=820965201 ACK=0 WINDOW=32792 RES=0x00 SYN URGP=0 OPT (0204400C0402080AFFFF67040000000001030303) UID=0 GID=65534
nf_ct_tcp: killing out of sync session IN= OUT= SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=515 DPT=1020 SEQ=823328258 ACK=820965202 WINDOW=32768 RES=0x00 ACK SYN URGP=0 OPT (0204400C0402080AFFFF6710FFFF670401030303) UID=0 GID=0
nf_ct_tcp: invalid packed ignored IN= OUT= SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=4642 DF PROTO=TCP SPT=1019 DPT=515 SEQ=856389685 ACK=0 WINDOW=32792 RES=0x00 SYN URGP=0 OPT (0204400C0402080AFFFF6A170000000001030303) UID=0 GID=65534
nf_ct_tcp: killing out of sync session IN= OUT= SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=515 DPT=1019 SEQ=859392335 ACK=856389686 WINDOW=32768 RES=0x00 ACK SYN URGP=0 OPT (0204400C0402080AFFFF6A23FFFF6A1701030303) UID=0 GID=0
nf_ct_tcp: invalid packed ignored IN= OUT= SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=31307 DF PROTO=TCP SPT=1018 DPT=515 SEQ=916330342 ACK=0 WINDOW=32792 RES=0x00 SYN URGP=0 OPT (0204400C0402080AFFFF6D2D0000000001030303) UID=0 GID=65534
nf_ct_tcp: killing out of sync session IN= OUT= SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=515 DPT=1018 SEQ=919377640 ACK=916330343 WINDOW=32768 RES=0x00 ACK SYN URGP=0 OPT (0204400C0402080AFFFF6D2EFFFF6D2D01030303) UID=0 GID=0
nf_ct_tcp: invalid packed ignored IN= OUT= SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=55160 DF PROTO=TCP SPT=1017 DPT=515 SEQ=969385804 ACK=0 WINDOW=32792 RES=0x00 SYN URGP=0 OPT (0204400C0402080AFFFF70340000000001030303) UID=0 GID=65534

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
  2008-02-01  0:47                       ` David Newall
@ 2008-02-01  2:12                         ` David Miller
  2008-02-01  6:07                           ` David Newall
  0 siblings, 1 reply; 28+ messages in thread
From: David Miller @ 2008-02-01  2:12 UTC (permalink / raw)
  To: davidn
  Cc: kaber, kadlec, jeff.chua.linux, linux-kernel, ole, cups-bugs,
	netfilter-devel

From: David Newall <davidn@davidnewall.com>
Date: Fri, 01 Feb 2008 11:17:14 +1030

> Patrick McHardy wrote:
> > Jozsef Kadlecsik wrote:
> >> Strange, but there are a lot of incorrect checksum packets. How does
> >> it come on the loopback interface?
> >
> > Loopback doesn't perform full checksumming, so thats expected.
> 
> The question remains: How do loopback packets get incorrect checksum? 
> Where and how can they get corrupted?

There are zeros there, which is an incorrect checksum most
of the time.

Tcpdump is just not knowledgable about this and tries to
validate the checksum as if it were computed correctly.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
  2008-01-31 18:53                     ` Patrick McHardy
@ 2008-02-01  0:47                       ` David Newall
  2008-02-01  2:12                         ` David Miller
  0 siblings, 1 reply; 28+ messages in thread
From: David Newall @ 2008-02-01  0:47 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Jozsef Kadlecsik, Jeff Chua, lkml, Krzysztof Piotr Oledzki,
	David S. Miller, cups-bugs, Netfilter Development Mailinglist

Patrick McHardy wrote:
> Jozsef Kadlecsik wrote:
>> Strange, but there are a lot of incorrect checksum packets. How does
>> it come on the loopback interface?
>
> Loopback doesn't perform full checksumming, so thats expected.

The question remains: How do loopback packets get incorrect checksum? 
Where and how can they get corrupted?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
  2008-01-31 10:40                   ` Jozsef Kadlecsik
@ 2008-01-31 18:53                     ` Patrick McHardy
  2008-02-01  0:47                       ` David Newall
  2008-02-01  5:28                     ` Jeff Chua
  1 sibling, 1 reply; 28+ messages in thread
From: Patrick McHardy @ 2008-01-31 18:53 UTC (permalink / raw)
  To: Jozsef Kadlecsik
  Cc: Jeff Chua, lkml, Krzysztof Piotr Oledzki, David S. Miller,
	cups-bugs, Netfilter Development Mailinglist

Jozsef Kadlecsik wrote:
> Hi Jeff,
> 
> On Thu, 31 Jan 2008, Jeff Chua wrote:
> 
>> On the bad run, I got the following message ...
>>
>> boston kernel: nf_ct_tcp: invalid packed ignored IN= OUT=
>> SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=8162
>> DF PROTO=TCP SPT=1016 DPT=515 SEQ=3834958843 ACK=0 WINDOW=32792
>> RES=0x00 SYN URGP=0 OPT (0204400C0402080A0000CC190000000001030307)
>> UID=0 GID=65534
> 
> I can't find this packet in the last lo.bad pcap file. Could you dump the 
> bad traffic and send it together with the generated log record?
> 
> Strange, but there are a lot of incorrect checksum packets. How does it 
> come on the loopback interface?


Loopback doesn't perform full checksumming, so thats expected.
Current kernel versions can tell userspace that its only a
partial checksum, but I'm not sure whether tcpdump already
supports this.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
  2008-01-31  5:01                 ` Jeff Chua
@ 2008-01-31 10:40                   ` Jozsef Kadlecsik
  2008-01-31 18:53                     ` Patrick McHardy
  2008-02-01  5:28                     ` Jeff Chua
  0 siblings, 2 replies; 28+ messages in thread
From: Jozsef Kadlecsik @ 2008-01-31 10:40 UTC (permalink / raw)
  To: Jeff Chua
  Cc: Patrick McHardy, lkml, Krzysztof Piotr Oledzki, David S. Miller,
	cups-bugs, Netfilter Development Mailinglist

Hi Jeff,

On Thu, 31 Jan 2008, Jeff Chua wrote:

> On the bad run, I got the following message ...
> 
> boston kernel: nf_ct_tcp: invalid packed ignored IN= OUT=
> SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=8162
> DF PROTO=TCP SPT=1016 DPT=515 SEQ=3834958843 ACK=0 WINDOW=32792
> RES=0x00 SYN URGP=0 OPT (0204400C0402080A0000CC190000000001030307)
> UID=0 GID=65534

I can't find this packet in the last lo.bad pcap file. Could you dump the 
bad traffic and send it together with the generated log record?

Strange, but there are a lot of incorrect checksum packets. How does it 
come on the loopback interface?

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
  2008-01-31  3:25               ` Patrick McHardy
@ 2008-01-31  5:01                 ` Jeff Chua
  2008-01-31 10:40                   ` Jozsef Kadlecsik
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff Chua @ 2008-01-31  5:01 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Jozsef Kadlecsik, lkml, Krzysztof Piotr Oledzki, David S. Miller,
	cups-bugs, Netfilter Development Mailinglist

On Jan 31, 2008 11:25 AM, Patrick McHardy <kaber@trash.net> wrote:

> Actually its probably the SYN/ACK that is dropped. Please try whether
>
> modprobe ipt_LOG
> echo 255 >/proc/sys/net/netfilter/nf_conntrack_log_invalid


On the good run, I don't get any message, which is good.

On the bad run, I got the following message ...

boston kernel: nf_ct_tcp: invalid packed ignored IN= OUT=
SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=8162
DF PROTO=TCP SPT=1016 DPT=515 SEQ=3834958843 ACK=0 WINDOW=32792
RES=0x00 SYN URGP=0 OPT (0204400C0402080A0000CC190000000001030307)
UID=0 GID=65534


This message is displayed repeatedly after the job got "stuck", once
ever 3 seconds coinciding with every 3 seconds of the print job sent.

Hope this helps.

Jeff.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
  2008-01-31  3:21             ` Jeff Chua
@ 2008-01-31  3:25               ` Patrick McHardy
  2008-01-31  5:01                 ` Jeff Chua
  0 siblings, 1 reply; 28+ messages in thread
From: Patrick McHardy @ 2008-01-31  3:25 UTC (permalink / raw)
  To: Jeff Chua
  Cc: Jozsef Kadlecsik, lkml, Krzysztof Piotr Oledzki, David S. Miller,
	cups-bugs, Netfilter Development Mailinglist

Jeff Chua wrote:
> On Jan 31, 2008 10:41 AM, Patrick McHardy <kaber@trash.net> wrote:
>
>   
>> Thanks. In the dump we can see that connections reusing ports
>> always have their first SYN dropped and retransmissted three
>> seconds later. I'm not sure whats causing this yet, do you have
>> any firewall rules that affect loopback traffic?
>>     
>
> No firewall.
>
> And the "lp" is just to print to a file@localhost.
>
> # lpadmin -p lp -i /etc/cups/interfaces/lp -v lpd://localhost/file -o
> printer-error-policy=retry-job
> # lpadmin -p file -i /etc/cups/interfaces/file -v file:/dev/null -o
> printer-error-policy=retry-job
>
> Filter for lp is ..
>   cat $6
>
> Filter for file is ...
>   cat $6 >/tmp/$$
>   

Actually its probably the SYN/ACK that is dropped. Please try whether

modprobe ipt_LOG
echo 255 >/proc/sys/net/netfilter/nf_conntrack_log_invalid

and then running the test again logs something to the ring buffer.
You don't need to run the entire test, just until you get a log
message from conntrack.



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
  2008-01-31  2:41           ` Patrick McHardy
@ 2008-01-31  3:21             ` Jeff Chua
  2008-01-31  3:25               ` Patrick McHardy
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff Chua @ 2008-01-31  3:21 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Jozsef Kadlecsik, lkml, Krzysztof Piotr Oledzki, David S. Miller,
	cups-bugs, Netfilter Development Mailinglist

On Jan 31, 2008 10:41 AM, Patrick McHardy <kaber@trash.net> wrote:

> Thanks. In the dump we can see that connections reusing ports
> always have their first SYN dropped and retransmissted three
> seconds later. I'm not sure whats causing this yet, do you have
> any firewall rules that affect loopback traffic?

No firewall.

And the "lp" is just to print to a file@localhost.

# lpadmin -p lp -i /etc/cups/interfaces/lp -v lpd://localhost/file -o
printer-error-policy=retry-job
# lpadmin -p file -i /etc/cups/interfaces/file -v file:/dev/null -o
printer-error-policy=retry-job

Filter for lp is ..
  cat $6

Filter for file is ...
  cat $6 >/tmp/$$


Thanks,
Jeff.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
       [not found]         ` <b6a2187b0801301826l5a50ce84p7d5dce3d0a74b3c0@mail.gmail.com>
@ 2008-01-31  2:41           ` Patrick McHardy
  2008-01-31  3:21             ` Jeff Chua
  0 siblings, 1 reply; 28+ messages in thread
From: Patrick McHardy @ 2008-01-31  2:41 UTC (permalink / raw)
  To: Jeff Chua
  Cc: Jozsef Kadlecsik, lkml, Krzysztof Piotr Oledzki, David S. Miller,
	cups-bugs, Netfilter Development Mailinglist

Jeff Chua wrote:
> On Jan 31, 2008 10:23 AM, Jeff Chua <jeff.chua.linux@gmail.com> wrote:
>
>   
>> Again, using latest linux, one with
>> 17311393f969090ab060540bd9dbe7dc885a76d5 reverted, and the other
>> without.
>>     
>
> Sorry, here's the attached output files.

Thanks. In the dump we can see that connections reusing ports
always have their first SYN dropped and retransmissted three
seconds later. I'm not sure whats causing this yet, do you have
any firewall rules that affect loopback traffic?

This filter (wireshark) shows one example of the problem in case
someone else wants to have a look:

(tcp.dstport == 1017 || tcp.srcport == 1017) && (tcp.flags == 0x02)


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
  2008-01-30 13:47     ` Patrick McHardy
@ 2008-01-31  2:23       ` Jeff Chua
       [not found]         ` <b6a2187b0801301826l5a50ce84p7d5dce3d0a74b3c0@mail.gmail.com>
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff Chua @ 2008-01-31  2:23 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Jozsef Kadlecsik, lkml, Krzysztof Piotr Oledzki, David S. Miller,
	cups-bugs, Netfilter Development Mailinglist

On Jan 30, 2008 9:47 PM, Patrick McHardy <kaber@trash.net> wrote:

> A binary dump would be more useful:
>
> tcpdump -i lo -w <outfile>
>
> and I guess Jozsef also wants "-s 0" so the full packets are included.

Attached. Again, both runs with this command to print ...

for((i=1; i<1001;i++)); do echo $i | lpr -Plp; done

In the good file (lo.good), look for this timestamp (08:38:11.818587)
when it paused then continue again at 08:38:22.477261. That's almost
11 seconds of "sleep" ... (may be a feature of TCP/IP?).

In the bad file (lo.bad), look for 08:47:55.434722 where it paused,
and then continue at
08:48:24.449176. That's 28 seconds of sleep. But, after it continued,
lpstat shows it's printing a job every 3 seconds. All 100 jobs take
approx 1400 seconds to complete as compared to under 100 seconds for
the good run.

Again, using latest linux, one with
17311393f969090ab060540bd9dbe7dc885a76d5 reverted, and the other
without.


Thanks,
Jeff.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
       [not found]   ` <b6a2187b0801291924k15f883aeg54f704156e4f2e3e@mail.gmail.com>
@ 2008-01-30 13:47     ` Patrick McHardy
  2008-01-31  2:23       ` Jeff Chua
  0 siblings, 1 reply; 28+ messages in thread
From: Patrick McHardy @ 2008-01-30 13:47 UTC (permalink / raw)
  To: Jeff Chua
  Cc: Jozsef Kadlecsik, lkml, Krzysztof Piotr Oledzki, David S. Miller,
	cups-bugs, Netfilter Development Mailinglist

Jeff Chua wrote:
> On Jan 29, 2008 6:53 PM, Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> wrote:
> 
>> As the problem can be reproduced so easily, could you capture a full TCP
>> session and send the pcap file? Thus it could be analyzed, replayed, etc.
>> and found the reason why the patch above slows down the printing.
>> By reverting back it we loose TCP reopening support.
> 
> Not an expert here. I use 'tcpdump -i lo' and don't see anything
> different between the good and bad.  Anyway, attached are the good and
> bad log files.
> 
> There's a slight pause of 27 seconds after 502 jobs (both good and bad
> runs), then the good one will resume and transmit in burst until
> completing all the 1000 times. The bad run will resume and transmit
> one-by-one every 3 seconds.
> 
> Please tell me what options to use with tcpdump to dump more useful output.


A binary dump would be more useful:

tcpdump -i lo -w <outfile>

and I guess Jozsef also wants "-s 0" so the full packets are included.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
  2008-01-28 22:41 Jeff Chua
  2008-01-28 22:56 ` Krzysztof Oledzki
@ 2008-01-29 10:53 ` Jozsef Kadlecsik
       [not found]   ` <b6a2187b0801291924k15f883aeg54f704156e4f2e3e@mail.gmail.com>
  1 sibling, 1 reply; 28+ messages in thread
From: Jozsef Kadlecsik @ 2008-01-29 10:53 UTC (permalink / raw)
  To: Jeff Chua
  Cc: lkml, Krzysztof Piotr Oledzki, Patrick McHardy, David S. Miller,
	cups-bugs

On Tue, 29 Jan 2008, Jeff Chua wrote:

> On Jan 28, 2008 7:18 AM, Jeff Chua <jeff.chua.linux@gmail.com> wrote:
> > I'm sending printing jobs to a network printer (it's actually printing
> > to the localhost simply creating a file), and running this on
> > Linux-2.6.24 will cause the printing to slow down to 1 print every 3
> > seconds after printing 500 times.
> 
> I bisected the kernel since the last known good 2.6.23 and zeroed in to this
> commit.
> 
> commit 17311393f969090ab060540bd9dbe7dc885a76d5
> Author: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
> Date:   Thu Oct 11 14:35:52 2007 -0700
> 
>     [NETFILTER]: nf_conntrack_tcp: fix connection reopening
> 
> 
> Reverting this commit solves the problem.
> 
> Version              1000 jobs
> 2.6.23                 90 sec
> 2.6.24              1,492 sec  <== with commit
> 2.6.24(patch)          90 sec  <== reverted the commit
> 
> 
> Since the code has changed an can't simply revert the commit for the latest
> linux git download, I made a patch to revert the above commit. I've tested and
> it's working, but don't know if it breaks other things. Please review.

As the problem can be reproduced so easily, could you capture a full TCP 
session and send the pcap file? Thus it could be analyzed, replayed, etc. 
and found the reason why the patch above slows down the printing.
By reverting back it we loose TCP reopening support.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@sunserv.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
  2008-01-28 22:56 ` Krzysztof Oledzki
@ 2008-01-28 23:55   ` Jeff Chua
  0 siblings, 0 replies; 28+ messages in thread
From: Jeff Chua @ 2008-01-28 23:55 UTC (permalink / raw)
  To: Krzysztof Oledzki
  Cc: lkml, cups-bugs, Jozsef Kadlecsik, Patrick McHardy,
	David S. Miller, netfilter-devel

2008/1/29 Krzysztof Oledzki <ole@ans.pl>:

> Strange. You stated that 2.6.23.12 is OK, however above patch
> was included in 2.6.23.4:
> Are you 100% sure that 2.6.23.12 is OK?

Sorry, my mistake. I had another system on 2.6.23.12 and was not OK,
so I bisected starting from 2.6.23.

git bisect good bbf25010f1a6b761914430f5fca081ec8c7accd1        # 2.6.23

So, 2.6.23.12 is not ok.

Thanks for pointing this out.

Jeff.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
  2008-01-28 22:41 Jeff Chua
@ 2008-01-28 22:56 ` Krzysztof Oledzki
  2008-01-28 23:55   ` Jeff Chua
  2008-01-29 10:53 ` Jozsef Kadlecsik
  1 sibling, 1 reply; 28+ messages in thread
From: Krzysztof Oledzki @ 2008-01-28 22:56 UTC (permalink / raw)
  To: Jeff Chua
  Cc: lkml, cups-bugs, Jozsef Kadlecsik, Patrick McHardy,
	David S. Miller, cups-bugs, netfilter-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1617 bytes --]



On Tue, 29 Jan 2008, Jeff Chua wrote:

>
>
> On Jan 28, 2008 7:18 AM, Jeff Chua <jeff.chua.linux@gmail.com> wrote:
>> I'm sending printing jobs to a network printer (it's actually printing
>> to the localhost simply creating a file), and running this on
>> Linux-2.6.24 will cause the printing to slow down to 1 print every 3
>> seconds after printing 500 times.
>
> I bisected the kernel since the last known good 2.6.23 and zeroed in to this 
> commit.
>
> commit 17311393f969090ab060540bd9dbe7dc885a76d5
> Author: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
> Date:   Thu Oct 11 14:35:52 2007 -0700
>
>    [NETFILTER]: nf_conntrack_tcp: fix connection reopening
>
>
> Reverting this commit solves the problem.
>
> Version              1000 jobs
> 2.6.23                 90 sec
> 2.6.24              1,492 sec  <== with commit
> 2.6.24(patch)          90 sec  <== reverted the commit
>

Strange. You stated that 2.6.23.12 is OK, however above patch
was included in 2.6.23.4:

http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.23.4
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.23.y.git;a=commitdiff;h=0ac38060c5e1e12e851ed3e281597286b57f9ad1

Commit 0ac38060c5e1e12e851ed3e281597286b57f9ad1 combines two fixes from 
2.6.24:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=17311393f969090ab060540bd9dbe7dc885a76d5
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=bc34b841556aad437baf4199744e55500bfa2088

Are you 100% sure that 2.6.23.12 is OK?

Best regards,

 				Krzysztof Olędzki

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: cups slow on linux-2.6.24
@ 2008-01-28 22:41 Jeff Chua
  2008-01-28 22:56 ` Krzysztof Oledzki
  2008-01-29 10:53 ` Jozsef Kadlecsik
  0 siblings, 2 replies; 28+ messages in thread
From: Jeff Chua @ 2008-01-28 22:41 UTC (permalink / raw)
  To: lkml, cups-bugs, Jozsef Kadlecsik, Krzysztof Piotr Oledzki,
	Patrick McHardy, David S. Miller, cups-bugs



On Jan 28, 2008 7:18 AM, Jeff Chua <jeff.chua.linux@gmail.com> wrote:
> I'm sending printing jobs to a network printer (it's actually printing
> to the localhost simply creating a file), and running this on
> Linux-2.6.24 will cause the printing to slow down to 1 print every 3
> seconds after printing 500 times.

I bisected the kernel since the last known good 2.6.23 and zeroed in to 
this commit.

commit 17311393f969090ab060540bd9dbe7dc885a76d5
Author: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Date:   Thu Oct 11 14:35:52 2007 -0700

     [NETFILTER]: nf_conntrack_tcp: fix connection reopening


Reverting this commit solves the problem.

Version              1000 jobs
2.6.23                 90 sec
2.6.24              1,492 sec  <== with commit
2.6.24(patch)          90 sec  <== reverted the commit


Since the code has changed an can't simply revert the commit for the 
latest linux git download, I made a patch to revert the above commit. I've 
tested and it's working, but don't know if it breaks other things. Please 
review.

Thanks,
Jeff.


--- net/netfilter/nf_conntrack_proto_tcp.c.org	2008-01-29 03:34:43 +0800
+++ net/netfilter/nf_conntrack_proto_tcp.c	2008-01-29 03:36:26 +0800
@@ -836,8 +836,8 @@
  			break;
  		if ((conntrack->proto.tcp.seen[!dir].flags &
  			IP_CT_TCP_FLAG_CLOSE_INIT)
-		    || (conntrack->proto.tcp.last_dir == dir
-		        && conntrack->proto.tcp.last_index == TCP_RST_SET)) {
+		    || after(ntohl(th->seq),
+		        conntrack->proto.tcp.seen[dir].td_end)) {
  			/* Attempt to reopen a closed/aborted connection.
  			 * Delete this connection and look up again. */
  			write_unlock_bh(&tcp_lock);
@@ -845,6 +845,12 @@
  				conntrack->timeout.function((unsigned long)
  							    conntrack);
  			return -NF_REPEAT;
+		} else {
+			write_unlock_bh(&tcp_lock);
+			if (LOG_INVALID(IPPROTO_TCP))
+				nf_log_packet(pf, 0, skb, NULL, NULL,
+					NULL, "nf_ct_tcp: invalid SYN");
+			return -NF_ACCEPT;
  		}
  		/* Fall through */
  	case TCP_CONNTRACK_IGNORE:



^ permalink raw reply	[flat|nested] 28+ messages in thread

* cups slow on linux-2.6.24
@ 2008-01-27 23:18 Jeff Chua
  0 siblings, 0 replies; 28+ messages in thread
From: Jeff Chua @ 2008-01-27 23:18 UTC (permalink / raw)
  To: lkml, cups-bugs

I'm sending printing jobs to a network printer (it's actually printing
to the localhost simply creating a file), and running this on
Linux-2.6.24 will cause the printing to slow down to 1 print every 3
seconds after printing 500 times.

No such symptoms on 2.6.23.12, or 2.6.20.21.

It's repeatable.

I'm just check if this is already reported/resolved before bisecting the kernel.

Thanks,
Jeff.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2008-02-15  1:44 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-02-10 15:06 cups slow on linux-2.6.24 Jeff Chua
2008-02-14 15:02 ` Jozsef Kadlecsik
2008-02-14 22:50   ` David Miller
2008-02-15  1:44     ` Patrick McHardy
  -- strict thread matches above, loose matches on Subject: below --
2008-02-05  1:17 Jeff Chua
2008-02-05  8:16 ` Jozsef Kadlecsik
2008-02-05 13:47   ` Patrick McHardy
     [not found] <Pine.LNX.4.64.0802042226340.28805@boston.corp.fedex.com>
2008-02-04 20:17 ` Jozsef Kadlecsik
2008-01-28 22:41 Jeff Chua
2008-01-28 22:56 ` Krzysztof Oledzki
2008-01-28 23:55   ` Jeff Chua
2008-01-29 10:53 ` Jozsef Kadlecsik
     [not found]   ` <b6a2187b0801291924k15f883aeg54f704156e4f2e3e@mail.gmail.com>
2008-01-30 13:47     ` Patrick McHardy
2008-01-31  2:23       ` Jeff Chua
     [not found]         ` <b6a2187b0801301826l5a50ce84p7d5dce3d0a74b3c0@mail.gmail.com>
2008-01-31  2:41           ` Patrick McHardy
2008-01-31  3:21             ` Jeff Chua
2008-01-31  3:25               ` Patrick McHardy
2008-01-31  5:01                 ` Jeff Chua
2008-01-31 10:40                   ` Jozsef Kadlecsik
2008-01-31 18:53                     ` Patrick McHardy
2008-02-01  0:47                       ` David Newall
2008-02-01  2:12                         ` David Miller
2008-02-01  6:07                           ` David Newall
2008-02-01  6:10                             ` Patrick McHardy
2008-02-01  5:28                     ` Jeff Chua
2008-02-02 14:44                       ` Jozsef Kadlecsik
2008-02-03 16:08                         ` Jeff Chua
2008-01-27 23:18 Jeff Chua

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).