* time for TCP ECN defaulting to on?
@ 2008-11-04 14:32 Daniel J Blueman
  2008-11-04 16:16 ` Dave Hudson
  2008-11-05 22:20 ` Mikael Abrahamsson
  0 siblings, 2 replies; 28+ messages in thread
From: Daniel J Blueman @ 2008-11-04 14:32 UTC (permalink / raw)
  To: Linux Kernel, Linux Netdev, Linux Networking

Is it time to enable TCP ECN by default and get the benefits, since
router support has been around and known about for a really considerable
time?

Perhaps it should be a question of enabling it, and educating people
to disable it if they run into issues, since we'll probably be in the
same situation in 5 years...and it'll be some time before these
kernels hit devices/servers anyway.

Daniel
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: time for TCP ECN defaulting to on?
  2008-11-04 14:32 time for TCP ECN defaulting to on? Daniel J Blueman
@ 2008-11-04 16:16 ` Dave Hudson
  2008-11-04 22:52   ` David Miller
  2008-11-05 22:20 ` Mikael Abrahamsson
  1 sibling, 1 reply; 28+ messages in thread
From: Dave Hudson @ 2008-11-04 16:16 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Linux Kernel, Linux Netdev, Linux Networking

Daniel J Blueman wrote:
> Is it time to enable TCP ECN per default and get the benefits, since
> router support has been around and known-about for really considerable
> time?
> 
> Perhaps it should be a question of enabling it, and educating people
> to disable it if they run into issues, since we'll probably be in the
> same situation in 5 years...and it'll be some time before these
> kernels hit devices/servers anyway.
> 
> Daniel

Unfortunately I think you'll find there are sufficiently large numbers 
of broken SOHO routers out there that if you try this you'll cause a lot 
of problems.  The problems range from no connectivity to in a few 
extreme cases routers actually crashing or behaving in very 
unpredictable ways.  Here's one summary that got presented to the IETF 
about 18 months ago:

http://www.ietf.org/proceedings/07mar/slides/tsvarea-3/sld6.htm

When a clueless end-user gets a Linux-enabled netbook that crashes their 
router while their existing Vista or XP systems appear to work just fine 
then the Linux network stack will get the blame for being buggy, not the 
router :-(


Regards,
Dave




* Re: time for TCP ECN defaulting to on?
  2008-11-04 16:16 ` Dave Hudson
@ 2008-11-04 22:52   ` David Miller
  2008-11-05  1:16     ` Michael Chan
  2008-11-16  9:13     ` Herbert Xu
  0 siblings, 2 replies; 28+ messages in thread
From: David Miller @ 2008-11-04 22:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: daniel.blueman, linux-kernel, netdev, linux-net

From: Dave Hudson <linux-kernel@blueteddy.net>
Date: Tue, 04 Nov 2008 16:16:03 +0000

> Daniel J Blueman wrote:
> > Is it time to enable TCP ECN per default and get the benefits, since
> > router support has been around and known-about for really considerable
> > time?
> > Perhaps it should be a question of enabling it, and educating people
> > to disable it if they run into issues, since we'll probably be in the
> > same situation in 5 years...and it'll be some time before these
> > kernels hit devices/servers anyway.
> > Daniel
>
> Unfortunately I think you'll find there are sufficiently large
> numbers of broken SOHO routers out there that if you try this you'll
> cause a lot of problems.  The problems range from no connectivity to
> in a few extreme cases routers actually crashing or behaving in very
> unpredictable ways.  Here's one summary that got presented to the
> IETF about 18 months ago:
>
> http://www.ietf.org/proceedings/07mar/slides/tsvarea-3/sld6.htm

Another issue is that, even if we turn it on by default, it won't
be on for a significant number of network cards out there.

This is because TSO, which is on by default, doesn't support ECN
in many implementations.


* Re: time for TCP ECN defaulting to on?
  2008-11-04 22:52   ` David Miller
@ 2008-11-05  1:16     ` Michael Chan
  2008-11-05  5:58       ` David Miller
  2008-11-16  9:13     ` Herbert Xu
  1 sibling, 1 reply; 28+ messages in thread
From: Michael Chan @ 2008-11-05  1:16 UTC (permalink / raw)
  To: David Miller
  Cc: linux-kernel, daniel.blueman, linux-kernel, netdev, linux-net


On Tue, 2008-11-04 at 14:52 -0800, David Miller wrote:
> From: Dave Hudson <linux-kernel@blueteddy.net>
> Date: Tue, 04 Nov 2008 16:16:03 +0000
> 
> > Daniel J Blueman wrote:
> > > Is it time to enable TCP ECN per default and get the benefits, since
> > > router support has been around and known-about for really considerable
> > > time?
> > > Perhaps it should be a question of enabling it, and educating people
> > > to disable it if they run into issues, since we'll probably be in the
> > > same situation in 5 years...and it'll be some time before these
> > > kernels hit devices/servers anyway.
> > > Daniel
> >
> > Unfortunately I think you'll find there are sufficiently large
> > numbers of broken SOHO routers out there that if you try this you'll
> > cause a lot of problems.  The problems range from no connectivity to
> > in a few extreme cases routers actually crashing or behaving in very
> > unpredictable ways.  Here's one summary that got presented to the
> > IETF about 18 months ago:
> >
> > http://www.ietf.org/proceedings/07mar/slides/tsvarea-3/sld6.htm
> 
> Another issue is that, even if we turn it on by default, it won't
> be on for a significant number of network cards out there.
> 
> This is because TSO, which is on by default, doesn't support ECN
> in many implementations.

I think this is no longer a limitation.  The GSO code will take care of
ECN properly if the hardware does not support it when doing TSO.




* Re: time for TCP ECN defaulting to on?
  2008-11-05  1:16     ` Michael Chan
@ 2008-11-05  5:58       ` David Miller
  2008-11-05  6:31         ` Michael Chan
  0 siblings, 1 reply; 28+ messages in thread
From: David Miller @ 2008-11-05  5:58 UTC (permalink / raw)
  To: mchan; +Cc: linux-kernel, daniel.blueman, linux-kernel, netdev, linux-net

From: "Michael Chan" <mchan@broadcom.com>
Date: Tue, 04 Nov 2008 17:16:03 -0800

> I think this is no longer a limitation.  The GSO code will take care
> of ECN properly if the hardware does not support it when doing TSO.

Hmm, good point, but if that is what happens I don't know if I agree
with it.

If "take care of ECN" means doing TSO in software, that's in my
opinion the wrong thing to do.


* Re: time for TCP ECN defaulting to on?
  2008-11-05  5:58       ` David Miller
@ 2008-11-05  6:31         ` Michael Chan
  2008-11-05  7:29           ` David Miller
  0 siblings, 1 reply; 28+ messages in thread
From: Michael Chan @ 2008-11-05  6:31 UTC (permalink / raw)
  To: 'David Miller'
  Cc: linux-kernel, daniel.blueman, linux-kernel, netdev, linux-net

David Miller wrote:

> From: "Michael Chan" <mchan@broadcom.com>
> Date: Tue, 04 Nov 2008 17:16:03 -0800
>
> > I think this is no longer a limitation.  The GSO code will take care
> > of ECN properly if the hardware does not support it when doing TSO.
>
> Hmm, good point, but if that is what happens I don't know if I agree
> with it.
>
> If "take care of ECN" means doing TSO in software, that's in my
> opinion the wrong thing to do.
>
Right, it means TSO will be done in software by the GSO code if
ECE or CWR is set in a TSO frame and the driver indicates that
the hardware cannot segment such packets properly.

This allows TSO and ECN to coexist.  Before this, ECN was always
disabled when TSO was enabled.

Assuming ECE and CWR are set infrequently on TSO frames, we still
benefit from hardware TSO most of the time.  Why is it the wrong
thing to do?
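
As a rough illustration of the fallback described above, here is a minimal Python sketch. It is not kernel code: the function names and the `hw_segments_ecn` parameter are hypothetical stand-ins for whatever the driver advertises, and only the two TCP flag bit values are taken from the TCP header layout.

```python
# Hypothetical sketch; none of these names are kernel identifiers.
TCP_FLAG_ECE = 0x40  # ECN-Echo bit in the TCP flags byte
TCP_FLAG_CWR = 0x80  # Congestion Window Reduced bit in the TCP flags byte


def segment_in_software(tcp_flags: int, hw_segments_ecn: bool) -> bool:
    """Return True if a TSO frame should be segmented in software (GSO).

    Software segmentation is chosen only when the frame carries ECE or CWR
    and the hardware is assumed unable to segment such frames correctly.
    """
    carries_ecn_flags = bool(tcp_flags & (TCP_FLAG_ECE | TCP_FLAG_CWR))
    return carries_ecn_flags and not hw_segments_ecn


def hardware_share(frames, hw_segments_ecn: bool) -> float:
    """Fraction of frames that would still be handed to hardware TSO."""
    frames = list(frames)
    if not frames:
        return 1.0  # nothing to send, so nothing falls back to software
    in_hw = sum(1 for f in frames
                if not segment_in_software(f, hw_segments_ecn))
    return in_hw / len(frames)
```

If ECE and CWR really are rare on TSO frames, `hardware_share` stays close to 1.0 even for hardware that cannot segment them, which is the tradeoff being argued for here.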




* Re: time for TCP ECN defaulting to on?
  2008-11-05  6:31         ` Michael Chan
@ 2008-11-05  7:29           ` David Miller
  0 siblings, 0 replies; 28+ messages in thread
From: David Miller @ 2008-11-05  7:29 UTC (permalink / raw)
  To: mchan; +Cc: linux-kernel, daniel.blueman, linux-kernel, netdev, linux-net

From: "Michael Chan" <mchan@broadcom.com>
Date: Tue, 4 Nov 2008 22:31:00 -0800

> Assuming ECE and CWR are set infrequently on TSO frames, we still
> benefit from hardware TSO most of the time.  Why is it the wrong
> thing to do?

I had forgotten about that aspect, and yes this is a
good tradeoff considering that.


* Re: time for TCP ECN defaulting to on?
  2008-11-04 14:32 time for TCP ECN defaulting to on? Daniel J Blueman
  2008-11-04 16:16 ` Dave Hudson
@ 2008-11-05 22:20 ` Mikael Abrahamsson
  2008-11-05 23:10   ` David Miller
  1 sibling, 1 reply; 28+ messages in thread
From: Mikael Abrahamsson @ 2008-11-05 22:20 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Linux Kernel, Linux Netdev, Linux Networking

On Tue, 4 Nov 2008, Daniel J Blueman wrote:

> Is it time to enable TCP ECN per default and get the benefits, since 
> router support has been around and known-about for really considerable 
> time?

I think enabling ECN by default is a bad idea.

Looking at the largest core router vendor out there:

http://www.cisco.com/en/US/docs/ios/12_2t/12_2t8/feature/guide/ftwrdecn.html#wp1031751

ECN has to actually be turned on in the routers along the way, it's not 
default behaviour. No ISP I know of does this, but I can do a poll of 
other ISP engineers in case more information is wanted.

So the upside of enabling it is minimal (I'd gladly see data proving the 
opposite) and the downside is a lot of trouble with lots of older devices 
which behave badly when ECN is enabled.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: time for TCP ECN defaulting to on?
  2008-11-05 22:20 ` Mikael Abrahamsson
@ 2008-11-05 23:10   ` David Miller
  2008-11-07  4:46     ` Mikael Abrahamsson
  0 siblings, 1 reply; 28+ messages in thread
From: David Miller @ 2008-11-05 23:10 UTC (permalink / raw)
  To: swmike; +Cc: daniel.blueman, linux-kernel, netdev, linux-net

From: Mikael Abrahamsson <swmike@swm.pp.se>
Date: Wed, 5 Nov 2008 23:20:25 +0100 (CET)

> So the upside of enabling it is minimal (I'd gladly see data proving
> the opposite) and the downside is a lot of trouble with lots of
> older devices which behave badly when ECN is enabled.

This kind of thinking just perpetuates the problem forever.

If nothing important on end nodes enables it by default, people
running core routers have no reason to turn it on, and so on and so
forth.

Linux is much bigger and smarter than that, so we should break
the loop and enable it by default some point soon.


* Re: time for TCP ECN defaulting to on?
  2008-11-05 23:10   ` David Miller
@ 2008-11-07  4:46     ` Mikael Abrahamsson
  2008-11-07  4:49       ` David Miller
  2008-11-07  7:53       ` Ilpo Järvinen
  0 siblings, 2 replies; 28+ messages in thread
From: Mikael Abrahamsson @ 2008-11-07  4:46 UTC (permalink / raw)
  To: David Miller; +Cc: daniel.blueman, linux-kernel, netdev, linux-net

On Wed, 5 Nov 2008, David Miller wrote:

> This kind of thinking just perpetuates the problem forever.

Well, I also think IPv6 breaks things for some people (mostly buggy DNS 
resolvers) but I wholly support this being default on. The ISP business is 
going in the direction of faster links and smaller interface buffers, 
meaning WRED is used less and less, thus lessening the benefit of ECN.

I see that in 
<http://www.icir.org/floyd/papers/draft-ietf-tsvwg-tcp-ecn-00.txt> there 
is a recommendation to not use ECN on retransmits, is there code right now 
(or planned) to do some kind of "ECN blackhole detection", ie if no 
response is received to SYN with ECN set, continue by sending the second 
SYN without ECN and keep this information for the duration of the TCP 
session?
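
To make the question concrete, the behaviour being asked about could look roughly like the Python sketch below. This is only an illustration of the idea, not existing or planned kernel code: `send_syn` is a hypothetical callback, and the retry counts are arbitrary assumptions.

```python
from typing import Callable, Optional


def connect_with_ecn_fallback(send_syn: Callable[[bool], bool],
                              ecn_attempts: int = 1,
                              plain_attempts: int = 2) -> Optional[bool]:
    """Try ECN-setup SYNs first, then fall back to plain SYNs.

    send_syn(ecn) is a hypothetical callback that transmits one SYN (with
    ECE and CWR set when ecn is True) and returns True if a SYN-ACK arrived
    before the retransmission timer fired.

    Returns True if the peer answered an ECN-setup SYN, False if it only
    answered after the fallback (ECN then stays off for this session),
    and None if no attempt was answered at all.
    """
    for _ in range(ecn_attempts):
        if send_syn(True):
            return True
    for _ in range(plain_attempts):
        if send_syn(False):
            return False
    return None
```

A path that silently drops ECN-setup SYNs would yield `False` here after one lost SYN, at the price of one retransmission timeout on connection setup.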

If there is, I do agree with you that enabling ECN by default is a good 
idea. Without such code, I still believe there are enough broken devices 
out there that will create problems for people.

It's like the TCP option order "bug", where some devices would drop the 
packets because of buggy implementations, that was changed in Linux to 
work around others' buggy code, and I see "ECN blackhole detection" as a 
similar measure.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: time for TCP ECN defaulting to on?
  2008-11-07  4:46     ` Mikael Abrahamsson
@ 2008-11-07  4:49       ` David Miller
  2008-11-07  7:53       ` Ilpo Järvinen
  1 sibling, 0 replies; 28+ messages in thread
From: David Miller @ 2008-11-07  4:49 UTC (permalink / raw)
  To: swmike; +Cc: daniel.blueman, linux-kernel, netdev, linux-net

From: Mikael Abrahamsson <swmike@swm.pp.se>
Date: Fri, 7 Nov 2008 05:46:28 +0100 (CET)

> I see that in
> <http://www.icir.org/floyd/papers/draft-ietf-tsvwg-tcp-ecn-00.txt>
> there is a recommendation to not use ECN on retransmits, is there
> code right now (or planned) to do some kind of "ECN blackhole
> detection", ie if no response is received to SYN with ECN set,
> continue by sending the second SYN without ECN and keep this
> information for the duration of the TCP session?

No, we are firmly against any form of ECN blackhole detection.  Alexey Kuznetsov and
Sally Floyd argued this out exhaustively several years ago.


* Re: time for TCP ECN defaulting to on?
  2008-11-07  4:46     ` Mikael Abrahamsson
  2008-11-07  4:49       ` David Miller
@ 2008-11-07  7:53       ` Ilpo Järvinen
  2008-11-07  8:11         ` Mikael Abrahamsson
  2008-11-07 11:18         ` Dave Hudson
  1 sibling, 2 replies; 28+ messages in thread
From: Ilpo Järvinen @ 2008-11-07  7:53 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: David Miller, daniel.blueman, LKML, Netdev, linux-net

On Fri, 7 Nov 2008, Mikael Abrahamsson wrote:

> On Wed, 5 Nov 2008, David Miller wrote:
> 
> > This kind of thinking just perpetuates the problem forever.
> 
> It's like the TCP option order "bug", where some devices would drop the
> packets because of buggy implementations, that was changed in Linux to work
> around others buggy code, and I see "ECN blackhole detection" as a similar
> measure.

That is an entirely bogus claim! The different ordering of options cost us 
nothing, while disabling ECN certainly has an innumerable cost both in 
performance and in nobody taking the initiative which makes the situation 
worse for everybody.

And about somebody earlier claiming that they'll get the impression that 
the Linux stack is broken (if such people even know that there's some 
network stack in Linux :-))... I'm rather sure those ISP support desks etc. 
put the blame on us anyway, even when loads of counterproof exist, because 
it's just cheaper to do nothing and blame Linux instead. Also, some claims 
asserted by incompetent people easily start to live on among random forums; 
an example from the previous incident: "since disabling timestamps helps, 
it must be that timestamps are broken" (and somebody even "more clueful" 
added that they got enabled for 2.6.27?!?). Needless to say, neither 
holds.


-- 
 i.


* Re: time for TCP ECN defaulting to on?
  2008-11-07  7:53       ` Ilpo Järvinen
@ 2008-11-07  8:11         ` Mikael Abrahamsson
  2008-11-07  9:12           ` Bjørn Mork
  2008-11-07 11:28           ` Ilpo Järvinen
  2008-11-07 11:18         ` Dave Hudson
  1 sibling, 2 replies; 28+ messages in thread
From: Mikael Abrahamsson @ 2008-11-07  8:11 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: David Miller, daniel.blueman, LKML, Netdev, linux-net

On Fri, 7 Nov 2008, Ilpo Järvinen wrote:

> On Fri, 7 Nov 2008, Mikael Abrahamsson wrote:
>
>> On Wed, 5 Nov 2008, David Miller wrote:
>>
>>> This kind of thinking just perpetuates the problem forever.
>>
>> It's like the TCP option order "bug", where some devices would drop the
>> packets because of buggy implementations, that was changed in Linux to work
>> around others buggy code, and I see "ECN blackhole detection" as a similar
>> measure.
>
> That is entirely bogus claim! The different ordering of options cost us
> nothing, while disabling ECN certainly has an innumerable cost both in
> performance and in nobody taking the initiative which makes the situation
> worse for everybody.

I can't comment on whether "ECN blackhole detection" costs anything or 
not, since I haven't been able to find the discussion between Alexey 
Kuznetsov and Sally Floyd that David Miller was referring to. Anything 
more to go on? A direct link to the thread would be great.

I have sent an email (which will hopefully initiate a discussion) to a 
mailing list populated by a lot of the operational ISP community and asked 
around about ECN and views on that. I also checked around on core router 
platforms (Cisco 12000 and Cisco CRS-1, which definitely are two of the top 
three core router platforms deployed in the world) and it seems they do 
not support ECN as far as I can discern. This pretty much puts widespread 
ECN support in the major core ISP networks out of the question in the next 
5 years, leaving ECN support on the slower links where it might be 
deployed faster. I doubt it though.

> And about somebody earlier claiming that they'll get the impression that
> the Linux stack is broken (if such people even know that there's some
> network stack in Linux :-))... I'm rather sure those ISP support desks etc.
> put the blame on us anyway, even when loads of counterproof exist, because
> it's just cheaper to do nothing and blame Linux instead. Also, some claims
> asserted by incompetent people easily start to live on among random forums;
> an example from the previous incident: "since disabling timestamps helps,
> it must be that timestamps are broken" (and somebody even "more clueful"
> added that they got enabled for 2.6.27?!?). Needless to say, neither
> holds.

People just want it to work. People disable IPv6 because their DNS servers 
don't respond properly to AAAA queries; they shut off IPv6 because they 
just want everything to work, they don't want to understand.

Now, IPv6 for me is crucial to the continuing life and prosperity of the 
Internet (NAT is bad). ECN is "nice to have".

But let me check out what the ISP community has to say before we get too 
upset, it might be that people agree and will start requesting ECN in the 
core equipment (I know I will) and then it might be worthwhile after all.

I do see Linux (and Linux users) as leader(s) in deploying new technology, 
with ECN being one of them. Question is how much hurt we're going to take 
for it.

<http://www.merit.edu/mail.archives/nanog/msg12756.html> is a link to my 
email to the NANOG ML referenced above.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: time for TCP ECN defaulting to on?
  2008-11-07  8:11         ` Mikael Abrahamsson
@ 2008-11-07  9:12           ` Bjørn Mork
  2008-11-07 11:16             ` Mikael Abrahamsson
  2008-11-07 11:28           ` Ilpo Järvinen
  1 sibling, 1 reply; 28+ messages in thread
From: Bjørn Mork @ 2008-11-07  9:12 UTC (permalink / raw)
  To: Mikael Abrahamsson
  Cc: Ilpo Järvinen, David Miller, daniel.blueman, LKML, Netdev,
	linux-net

Mikael Abrahamsson <swmike@swm.pp.se> writes:

> I have sent an email (which will hopefully initiate a discussion) to a
> mailinglist populated by a lot of the operational ISP community and
> asked around about ECN and views on that. I also checked around on
> core router platforms (Cisco 12000 and Cisco CRS-1, which definitely
> is two of the top three core router platforms deployed in the world)
> and it seems they do not support ECN as far as I can discern.

I believe you can forget about ECN in core networks as long as these
searches fail:

 http://www.google.com/search?q=%22rfc+5129%22+site%3Acisco.com
 http://www.google.com/search?q=%22rfc+5129%22+site%3Ajuniper.net



Bjørn


* Re: time for TCP ECN defaulting to on?
  2008-11-07  9:12           ` Bjørn Mork
@ 2008-11-07 11:16             ` Mikael Abrahamsson
  0 siblings, 0 replies; 28+ messages in thread
From: Mikael Abrahamsson @ 2008-11-07 11:16 UTC (permalink / raw)
  To: LKML; +Cc: Netdev, linux-net

On Fri, 7 Nov 2008, Bjørn Mork wrote:

> I believe you can forget about ECN in core networks as long as these 
> searches fail:

A lot of ISPs use MPLS yes, but it would be a simplification to say that 
it's useless without MPLS support, because it's quite common that 
congestion happens at the interconnections/borders between ISPs and they 
are very rarely MPLS-labeled.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: time for TCP ECN defaulting to on?
  2008-11-07  7:53       ` Ilpo Järvinen
  2008-11-07  8:11         ` Mikael Abrahamsson
@ 2008-11-07 11:18         ` Dave Hudson
  2008-11-07 14:29           ` Alan Cox
  2008-11-07 14:33           ` Andi Kleen
  1 sibling, 2 replies; 28+ messages in thread
From: Dave Hudson @ 2008-11-07 11:18 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Mikael Abrahamsson, David Miller, daniel.blueman, LKML, Netdev,
	linux-net

Ilpo Järvinen wrote:
> On Fri, 7 Nov 2008, Mikael Abrahamsson wrote:
> 
> And about somebody earlier claiming that they'll get the impression that 
> the Linux stack is broken (if such people even know that there's some 
> network stack in Linux :-))... I'm rather sure those ISP support desks etc. 
> put the blame on us anyway, even when loads of counterproof exist, because 
> it's just cheaper to do nothing and blame Linux instead. Also, some claims 
> asserted by incompetent people easily start to live on among random forums; 
> an example from the previous incident: "since disabling timestamps helps, 
> it must be that timestamps are broken" (and somebody even "more clueful" 
> added that they got enabled for 2.6.27?!?). Needless to say, neither 
> holds.

Not all of the routers in question (the ones that crash, block packets 
or otherwise misbehave) are provided by ISPs - in fact a huge number of 
them are and have been sold retail.  Over time most of those boxes will 
get replaced with ones that don't have the problem because most 
(probably all major) SOHO router suppliers now test that they don't 
break with ECN so eventually there will be a point where enabling ECN by 
default will make a lot of sense (there will be too few broken routers 
to care about).

What I do believe (having spent a lot of years writing embedded device 
and router code - and no, not the ones that crash ;-)) is that if you 
enable a feature that causes just 1% of users to have an out-of-the-box 
problem you'll see a seriously disproportionate response from end users. 
  Most people (and engineers are not "most people" :-)) will blame the 
new thing that they've just added or changed, not the old thing that was 
broken to begin with (it's human nature not to truly understand cause 
and effect).

Whether we like it or not there's currently a known problem deploying 
ECN on a wide scale - it has been sufficient to stop pretty-much 
everyone from enabling it by default so far.


Regards,
Dave




* Re: time for TCP ECN defaulting to on?
  2008-11-07  8:11         ` Mikael Abrahamsson
  2008-11-07  9:12           ` Bjørn Mork
@ 2008-11-07 11:28           ` Ilpo Järvinen
  2008-11-07 11:38             ` Mikael Abrahamsson
  1 sibling, 1 reply; 28+ messages in thread
From: Ilpo Järvinen @ 2008-11-07 11:28 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: David Miller, daniel.blueman, LKML, Netdev, linux-net

On Fri, 7 Nov 2008, Mikael Abrahamsson wrote:

> On Fri, 7 Nov 2008, Ilpo Järvinen wrote:
> 
> > On Fri, 7 Nov 2008, Mikael Abrahamsson wrote:
> >
> > > On Wed, 5 Nov 2008, David Miller wrote:
> > >
> > > > This kind of thinking just perpetuates the problem forever.
> > >
> > > It's like the TCP option order "bug", where some devices would drop the
> > > packets because of buggy implementations, that was changed in Linux to
> > > work
> > > around others buggy code, and I see "ECN blackhole detection" as a similar
> > > measure.
> >
> > That is entirely bogus claim! The different ordering of options cost us
> > nothing, while disabling ECN certainly has an innumerable cost both in
> > performance and in nobody taking the initiative which makes the situation
> > worse for everybody.
> 
> I can't comment on "ECN blackhole detection" costing or costing none since I
> haven't been able to find the discussion between Alexey Kuznetsov and Sally
> Floyd that David Miller was referring to. Anything more to go on? A direct
> link to the thread would be great.

No idea about the mail. But anyway, some cost comes from the fact that 
there is then no desire to fix broken things, nor even to start making 
compliant equipment, thus losing the potential benefits of ECN. It
has been around for years and we're still having this discussion about 
blackhole detection being necessary to keep operating, which is 
ridiculous.

And, were there a need to reorder the TCP headers, it would certainly 
get done with all the associated breakage (not very likely that the need 
will arise, though, because those parts of the header are well utilized 
already). It would basically be the same as with things like window 
scaling: there's no window scaling blackhole detection in the kernel, only 
manually turning it off. Were there detection, why would those devices 
with broken window scaling ever get fixed (and the corresponding end hosts 
would be doomed to a 64k window forever)... Not to mention other similar 
examples.

> I have sent an email (which will hopefully initiate a discussion) to a
> mailinglist populated by a lot of the operational ISP community and asked
> around about ECN and views on that. I also checked around on core router
> platforms (Cisco 12000 and Cisco CRS-1, which definitely is two of the top
> three core router platforms deployed in the world) and it seems they do not
> support ECN as far as I can discern. This pretty much puts widespread ECN
> support in the major core ISP networks out of the question in the next 5
> years, leaving ECN support on the slower links where it might be deployed
> faster. I doubt it though.

I think you partially miss the point here. In many cases not every single 
router has to _support_ ECN to get its benefits, not-supporting is not the 
problem in itself (though it would be nice to get that "fixed" as well) 
but breaking ecn-enabled connections. I suppose you didn't check that 
aspect? I'd guess those mentioned devices will interoperate just fine 
since one can mostly connect ok with ecn too besides rare exceptions 
rather than things being vice-versa.

The most crucial components are anyway the points of congestion. I don't 
know enough ISP topologies, but I suppose those core routers are not the 
ones where traffic towards subscribers' devices congests?

> Now, IPv6 for me is cruicial to the continuing life and prosperity of the
> Internet (NAT is bad). ECN is "nice to have".

Sure.

> I do see Linux (and Linux users) as leader(s) in deploying new technology,
> with ECN being one of them. Question is how much hurt we're going to take for
> it.

I doubt it is any worse than with e.g. timestamps.


-- 
 i.


* Re: time for TCP ECN defaulting to on?
  2008-11-07 11:28           ` Ilpo Järvinen
@ 2008-11-07 11:38             ` Mikael Abrahamsson
  2008-11-07 12:22               ` Ilpo Järvinen
  0 siblings, 1 reply; 28+ messages in thread
From: Mikael Abrahamsson @ 2008-11-07 11:38 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: David Miller, daniel.blueman, LKML, Netdev, linux-net

On Fri, 7 Nov 2008, Ilpo Järvinen wrote:

> I think you partially miss the point here. In many cases not every 
> single router has to _support_ ECN to get its benefits, not-supporting 
> is not the problem in itself (though it would be nice to get that 
> "fixed" as well) but breaking ecn-enabled connections. I suppose you 
> didn't check that aspect? I'd guess those mentioned devices will 
> interoperate just fine since one can mostly connect ok with ecn too 
> besides rare exceptions rather than things being vice-versa.

I don't understand. My point is that most of the ISP core equipment out 
there doesn't act on ECN, rendering it mostly useless. The N in ECN is 
rendered useless because there is no device doing the *notification*. 
They'll just pass the traffic without acting on it differently regardless 
of whether ECN is on or off.
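
A small Python sketch of what "doing the notification" means at a single hop, under simplifying assumptions: `congested` stands for "the queue management has decided to signal congestion on this packet" and `router_marks` for "ECN marking is supported and switched on", both hypothetical parameters. Only the two-bit ECN codepoints in the IP header are standard values.

```python
# ECN codepoints in the two-bit ECN field of the IP header.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def forward(ecn_field: int, congested: bool, router_marks: bool):
    """Return the ECN field of the forwarded packet, or None if dropped.

    Simplified model: when the hop wants to signal congestion it either
    marks an ECN-capable packet as CE or drops the packet.
    """
    if not congested:
        return ecn_field  # passed through untouched
    if router_marks and ecn_field in (ECT_0, ECT_1, CE):
        return CE  # congestion signalled without losing the packet
    return None  # congestion signalled the old way, by dropping


# A hop that does not mark treats ECN-capable and other traffic alike.
same_treatment = forward(ECT_0, True, False) == forward(NOT_ECT, True, False)
```

With `router_marks` false, the ECN-capable packet and the Not-ECT packet meet the same fate, which is the situation described above for most core equipment.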

> The most crucial components are anyway the points of congestion, I don't know 
> enough isp topologies but I suppose those core routers are not the ones where 
> towards subscribers device traffic congests?

There can be congestion anywhere in the network, best would be if all 
routers supported it. My problem with ECN is that the most advanced 
routers do not support it, it's useless with L2/L3 switches (as they have 
very small buffers, there is "nothing" to do WRED on), so that leaves 
potential implementation by either DSLAM/BRAS vendors (where Cisco BRAS 
does support it but it needs to be enabled by the ISP) or the SOHO devices 
which run Linux and might implement it, but I'd rather see them do active 
queue management at all (fair-queue for instance) before asking them to do 
ECN. Of course, if users start to ask for ECN and we get fair-queue at the 
same time, all the better. One very common congestion point is definitely 
the upstream connection of someone's cable or DSL modem.

> I doubt it any worse than with eg. timestamps.

According to <http://www.imperialviolet.org/binary/ecntest.pdf> it's 0.5% 
of hosts that drop packets when ECN is enabled. It's a substantial part of 
the Internet. Yes, not doing blackhole detection might get these hosts 
fixed faster, but at the expense of more end user hurt.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: time for TCP ECN defaulting to on?
  2008-11-07 11:38             ` Mikael Abrahamsson
@ 2008-11-07 12:22               ` Ilpo Järvinen
  2008-11-07 13:43                 ` Daniel J Blueman
  0 siblings, 1 reply; 28+ messages in thread
From: Ilpo Järvinen @ 2008-11-07 12:22 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: David Miller, daniel.blueman, LKML, Netdev, linux-net

On Fri, 7 Nov 2008, Mikael Abrahamsson wrote:

> On Fri, 7 Nov 2008, Ilpo Järvinen wrote:
> 
> > I think you partially miss the point here. In many cases not every single
> > router has to _support_ ECN to get its benefits, not-supporting is not the
> > problem in itself (though it would be nice to get that "fixed" as well) but
> > breaking ecn-enabled connections. I suppose you didn't check that aspect?
> > I'd guess those mentioned devices will interoperate just fine since one can
> > mostly connect ok with ecn too besides rare exceptions rather than things
> > being vice-versa.
> 
> I don't understand. My point is that most of the ISP core equipment out there
> doesn't act on ECN, rendering it mostly useless. The N in ECN is rendered
> useless because there is no device doing the *notification*. They'll just pass
> the traffic without acting on it differently regardless of whether ECN is on
> or off.

Likewise, not enabling ecn renders any device doing notification useless.

One alternative to enabling it fully would be to enable it only on the 
listening side, letting the client end decide, but that is effectively the 
same as not enabling it at all (using the same logic as above).

> > The most crucial components are anyway the points of congestion, I don't
> > know enough isp topologies but I suppose those core routers are not the ones
> > where towards subscribers device traffic congests?
> 
> There can be congestion anywhere in the network, best would be if all routers
> supported it.

I agree. But...

> My problem with ECN is that the most advanced routers do not
> support it, it's useless with L2/L3 switches (as they have very small buffers,
> there is "nothing" to do WRED on), so that leaves potential implementation by
> either DSLAM/BRAS vendors (where Cisco BRAS does support it but it needs to be
> enabled by the ISP) or the SOHO devices which run Linux and might implement
> it, but I'd rather see them do active queue management at all (fair-queue for
> instance) before asking them to do ECN. Of course, if users start to ask for
> ECN and we get fair-queue at the same time, all the better. One very common
> congestion point is definitely the upstream connection of someone's cable or
> DSL modem.

...I'd assume that marking at the end(s) of the cable/DSL link, which is often 
congested, is the lowest-hanging fruit, and as you seem to say (if I 
understood you correctly), some support already exists which could 
then be turned on incrementally. I just realized, though, that it might, after 
all, not be what ISPs like doing, since just getting higher-bandwidth 
subscriptions is perhaps more rewarding from their perspective (please 
don't take this as an offense, it's not meant to be one :-))...

Imho, the end user's end has less to gain, since those packets usually travel 
just locally before getting dropped, but ECN-aware streaming might change 
that as well.

> > I doubt it any worse than with eg. timestamps.
> 
> According to <http://www.imperialviolet.org/binary/ecntest.pdf> it's 0.5% of
> hosts that drop packets when ECN is enabled. It's a substantial part of the
> Internet. Yes, not doing blackhole detection might get these hosts fixed
> faster, but at the expense of more end user hurt.

Did you notice:

However, the failing hosts are not distributed randomly. The
7,627 hosts span only 4,613 /24 subnets. The top twenty
subnets account for 778 failing hosts (10%). WHOIS[1] information
for 18 of those 20 subnets suggests that they are located in China.

Just some bigger device(s) perhaps?
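
To put the quoted figures in proportion, here is a quick back-of-the-envelope sketch. The three counts are taken from the excerpt above; the implied total of tested hosts is an assumption, derived by taking the 0.5% figure at face value against the 7,627 failing hosts.

```python
# Counts quoted from the ecntest.pdf excerpt above.
failing_hosts = 7627
failing_subnets = 4613
top20_failing_hosts = 778


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# Share of all failing hosts that sit in the top twenty /24 subnets.
top20_share = share(top20_failing_hosts, failing_hosts)

# Average number of failing hosts per affected /24 subnet.
hosts_per_subnet = failing_hosts / failing_subnets

# ASSUMPTION: 0.5% is the failure rate over all tested hosts, so the
# number of tested hosts is roughly failing_hosts / 0.005.
implied_tested = failing_hosts / 0.005

print(f"top-20 subnets hold {top20_share:.1f}% of failing hosts")
print(f"average failing hosts per affected /24: {hosts_per_subnet:.2f}")
print(f"implied number of tested hosts: about {implied_tested:,.0f}")
```

The clustering is what matters here: a few shared devices in front of many hosts would produce exactly this kind of skew.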


-- 
 i.


* Re: time for TCP ECN defaulting to on?
  2008-11-07 12:22               ` Ilpo Järvinen
@ 2008-11-07 13:43                 ` Daniel J Blueman
  2008-11-07 14:30                   ` Ilpo Järvinen
  0 siblings, 1 reply; 28+ messages in thread
From: Daniel J Blueman @ 2008-11-07 13:43 UTC (permalink / raw)
  To: Mikael Abrahamsson, David Miller, LKML, Netdev, linux-net,
	Ilpo Järvinen

On Fri, Nov 7, 2008 at 12:22 PM, Ilpo Järvinen
<ilpo.jarvinen@helsinki.fi> wrote:
> On Fri, 7 Nov 2008, Mikael Abrahamsson wrote:
>> On Fri, 7 Nov 2008, Ilpo Järvinen wrote:
>> > I think you partially miss the point here. In many cases not every
>> > single
>> > router has to _support_ ECN to get its benefits, not-supporting is not
>> > the
>> > problem in itself (though it would be nice to get that "fixed" as well)
>> > but
>> > breaking ecn-enabled connections. I suppose you didn't check that
>> > aspect?
>> > I'd guess those mentioned devices will interoperate just fine since one
>> > can
>> > mostly connect ok with ecn too besides rare exceptions rather than
>> > things
>> > being vice-versa.
>>
>> I don't understand. My point is that most of the ISP core equipment out
>> there
>> doesn't act on ECN rendering it mostly useless. The N in ECN renders
>> useless
>> because there is no device doing the *notification*. They'll just pass the
>> traffic without acting on it differently regardless if ECN is on or off.

I've been running with ECN enabled on all my client Linux systems and
(personal) webservers for the past 6 or so years. When I've
encountered issues accessing particular hosts, I turn it and TCP
window scaling off, but invariably the cause turns out to be something else.
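
For anyone wanting to check what their own box is doing, a minimal sketch follows. It assumes the setting is exposed as the net.ipv4.tcp_ecn sysctl under /proc; the helper names are made up for illustration, and the meaning given for value 2 comes from later kernel documentation rather than from kernels current at the time of this thread.

```python
from pathlib import Path

# Assumed location of the ECN sysctl on a Linux host.
TCP_ECN_SYSCTL = Path("/proc/sys/net/ipv4/tcp_ecn")

# Value 2 is documented for later kernels; treat it as an assumption here.
MODES = {
    0: "disabled: never request or accept ECN",
    1: "enabled: request ECN on outgoing connections, accept it on incoming",
    2: "passive: accept ECN on incoming connections, never request it",
}


def describe_tcp_ecn(value: int) -> str:
    """Map a tcp_ecn sysctl value to a human-readable description."""
    return MODES.get(value, f"unknown value {value}")


def read_tcp_ecn(path: Path = TCP_ECN_SYSCTL):
    """Return the current tcp_ecn value, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


current = read_tcp_ecn()
if current is None:
    print("tcp_ecn sysctl not readable on this system")
else:
    print(f"tcp_ecn = {current} ({describe_tcp_ecn(current)})")
```

Turning it off for a test is then a matter of writing 0 to the same sysctl as root.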

If most ECN-broken hardware is embedded consumer appliances (which are
generally short-lifespan and moving more and more to linux), then we
avoid hurting these users by enabling ECN per default when eg
CONFIG_IP_ADVANCED_ROUTER is set (to little direct benefit of course).
It's a start and a constructive idea; by doing this and documenting
it, we provide a wake-up call for vendors, laying the path for
enabling it for all types of host in a few years. Even enabling ECN
for -rc kernels will raise awareness.

Alternatively, an ECN-day could be publicised targeting the linux tech
community, where we can report failing networks/sites to a central
website to quantify actual potential negative impact.

But doing nothing is cyclic - when will the natural break suddenly occur?
-- 
Daniel J Blueman


* Re: time for TCP ECN defaulting to on?
  2008-11-07 11:18         ` Dave Hudson
@ 2008-11-07 14:29           ` Alan Cox
  2008-11-07 14:45             ` David Newall
  2008-11-07 14:33           ` Andi Kleen
  1 sibling, 1 reply; 28+ messages in thread
From: Alan Cox @ 2008-11-07 14:29 UTC (permalink / raw)
  To: Dave Hudson
  Cc: Ilpo Järvinen, Mikael Abrahamsson, David Miller,
	daniel.blueman, LKML, Netdev, linux-net

> Not all of the routers in question (the ones that crash, block packets 
> or otherwise misbehave) are provided by ISPs

The ones that block and misbehave are the bigger problem. The ones that
crash are less of a problem and I think they are less common. Certainly
if they were common then people would be abusing the flaw routinely.
Similarly, end users tend to grasp "if it keeps crashing blame the supplier".

When stuff just mysteriously doesn't work it is a whole lot more
problematic. I think however Sally Floyd had it right and Alexey has it
wrong (as does Davem).

If you turn it off on a retransmit then you provide an immediate
incentive for everyone on the web server end of the business to fix their
network. Especially if you turn it off for second retransmit. That will
cause faulty ECN handling sites to feel "a bit slow" and we know from
marketing data that web site performance is crucial to customer base. A
three or four second delay getting a page up translates into dramatically
reduced hit counts.
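
The fallback described above can be sketched roughly as follows. This is only an illustration of the policy, not kernel code: the names and the threshold of two are hypothetical, chosen to match "turn it off for second retransmit".

```python
from dataclasses import dataclass

# Hypothetical threshold: the initial SYN and the first retransmit still
# request ECN; from the second retransmit onwards ECN is dropped.
ECN_FALLBACK_AFTER = 2


@dataclass
class SynAttempt:
    number: int    # 0 = initial SYN, 1 = first retransmit, and so on
    use_ecn: bool  # whether this SYN still asks for ECN


def plan_syn_attempts(max_retransmits: int,
                      fallback_after: int = ECN_FALLBACK_AFTER):
    """List the SYN attempts and whether each one requests ECN."""
    return [
        SynAttempt(number=n, use_ecn=n < fallback_after)
        for n in range(max_retransmits + 1)
    ]


for attempt in plan_syn_attempts(max_retransmits=4):
    kind = "ECN-setup SYN" if attempt.use_ecn else "plain SYN"
    print(f"attempt {attempt.number}: {kind}")
```

A site that black-holes ECN SYNs still connects, but only after the retransmit timers have run, which is the "a bit slow" effect.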

Alan


* Re: time for TCP ECN defaulting to on?
  2008-11-07 13:43                 ` Daniel J Blueman
@ 2008-11-07 14:30                   ` Ilpo Järvinen
  0 siblings, 0 replies; 28+ messages in thread
From: Ilpo Järvinen @ 2008-11-07 14:30 UTC (permalink / raw)
  To: Daniel J Blueman
  Cc: Mikael Abrahamsson, David Miller, LKML, Netdev, linux-net

On Fri, 7 Nov 2008, Daniel J Blueman wrote:

> then we
> avoid hurting these users by enabling ECN per default when eg
> CONFIG_IP_ADVANCED_ROUTER is set (to little direct benefit of course).

I suppose all distros enable that anyway in generic kernels so it's not 
going to be any different from just enabling it.

> It's a start and a constructive idea; by doing this and documenting
> it, we provide a wake-up call for vendors, laying the path for
> enabling it for all types of host in a few years. Even enabling ECN
> for -rc kernels will raise awareness.
>
> Alternatively, an ECN-day could be publicised targeting the linux tech
> community, where we can report failing networks/sites to a central
> website to quantify actual potential negative impact.

This will still miss much. E.g., the ordering problems were not discovered, 
afaik, until the 2.6.27 release; that's quite a long time of testing without 
anybody noticing that hey, it's broken (it might be that some distro 
circles saw this with some -rcX if they were using them, but that didn't 
gain much attention until 2.6.27 was already out). And at that time 
the imminent Ubuntu release made testers a much more abundant resource 
than with some other kernel version.

Agreed that we definitely should do more than just turn it on and wait for 
trouble, but educating users might turn out to be quite a hard problem. 
And there will certainly be trouble, as even the most comprehensive 
attempts within Linux's dev+tester community are going to leave major holes, 
as was proven by the TCP option ordering saga.


-- 
 i.


* Re: time for TCP ECN defaulting to on?
  2008-11-07 11:18         ` Dave Hudson
  2008-11-07 14:29           ` Alan Cox
@ 2008-11-07 14:33           ` Andi Kleen
  1 sibling, 0 replies; 28+ messages in thread
From: Andi Kleen @ 2008-11-07 14:33 UTC (permalink / raw)
  To: Dave Hudson
  Cc: Ilpo Järvinen, Mikael Abrahamsson, David Miller,
	daniel.blueman, LKML, Netdev, linux-net

Dave Hudson <linux-kernel@blueteddy.net> writes:
>
> Not all of the routers in question (the ones that crash, block packets
> or otherwise misbehave) are provided by ISPs - in fact a huge number
> of them are and have been sold retail.  Over time most of those boxes
> will get replaced with ones that don't have the problem because most
> (probably all major) SOHO router suppliers now test that they don't
> break with ECN so eventually there will be a point where enabling ECN
> by default will make a lot of sense (there will be too few broken
> routers to care about).

One option would be also to enable it by default for IPv6 only.

-Andi

-- 
ak@linux.intel.com


* Re: time for TCP ECN defaulting to on?
  2008-11-07 14:29           ` Alan Cox
@ 2008-11-07 14:45             ` David Newall
  2008-11-07 15:07               ` Rémi Denis-Courmont
  2008-11-07 18:38               ` John Heffner
  0 siblings, 2 replies; 28+ messages in thread
From: David Newall @ 2008-11-07 14:45 UTC (permalink / raw)
  To: Alan Cox
  Cc: Dave Hudson, Ilpo Järvinen, Mikael Abrahamsson,
	David Miller, daniel.blueman, LKML, Netdev, linux-net

Isn't this a question for the IETF to answer?  Are they saying turn on
ECN now?


* Re: time for TCP ECN defaulting to on?
  2008-11-07 14:45             ` David Newall
@ 2008-11-07 15:07               ` Rémi Denis-Courmont
  2008-11-07 18:38               ` John Heffner
  1 sibling, 0 replies; 28+ messages in thread
From: Rémi Denis-Courmont @ 2008-11-07 15:07 UTC (permalink / raw)
  To: ext David Newall
  Cc: Alan Cox, Dave Hudson, Ilpo Järvinen, Mikael Abrahamsson,
	David Miller, daniel.blueman, LKML, Netdev, linux-net

On Friday 07 November 2008 16:45:55 ext David Newall, you wrote:
> Isn't this a question for the IETF to answer?  Are they saying turn on
> ECN now?

For what it's worth, the IESG says RFC3168 is a PROPOSED STANDARD.
This is the "entry-level maturity for the standards track":
(ftp://ftp.rfc-editor.org/in-notes/bcp/bcp9.txt)

   A Proposed Standard specification is generally stable, has resolved
   known design choices, is believed to be well-understood, has received
   significant community review, and appears to enjoy enough community
   interest to be considered valuable.  However, further experience
   might result in a change or even retraction of the specification
   before it advances.
   (...)
   Implementors should treat Proposed Standards as immature
   specifications.  It is desirable to implement them in order to gain
   experience and to validate, test, and clarify the specification.
   However, since the content of Proposed Standards may be changed if
   problems are found or better solutions are identified, deploying
   implementations of such standards into a disruption-sensitive
   environment is not recommended.

-- 
Rémi Denis-Courmont
Maemo Software, Nokia Devices R&D


* Re: time for TCP ECN defaulting to on?
  2008-11-07 14:45             ` David Newall
  2008-11-07 15:07               ` Rémi Denis-Courmont
@ 2008-11-07 18:38               ` John Heffner
  1 sibling, 0 replies; 28+ messages in thread
From: John Heffner @ 2008-11-07 18:38 UTC (permalink / raw)
  To: David Newall
  Cc: Alan Cox, Dave Hudson, Ilpo Järvinen, Mikael Abrahamsson,
	David Miller, daniel.blueman, LKML, Netdev, linux-net

The IETF has provided a spec, and additional documents on deployment
issues.  They have provided all the guidance they are going to.  It's
now up to implementers to weigh the trade-offs.

My own observations and opinions, for what they're worth:

Turning on ECN doesn't hurt as much as it used to.  Back in the early
'00s, there were a lot of devices sold especially to financial
institutions to "protect" their web sites.  These devices dropped any
packets with (previously) reserved header bits set, because some
people used these as a covert information channel.  I believe these
devices are not as common as they once were, but there are still a few
big sites that black hole these packets.  (I know that southwest.com
is still an offender.)
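
To make the failure mode concrete, here is a rough sketch of why such a box eats ECN connection attempts. The flag values are the standard TCP header bits, with ECE and CWR being the two formerly reserved bits that RFC 3168 assigned. The filter function is hypothetical and does not reproduce any particular device's logic.

```python
# Standard TCP flag bits.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20
# Formerly reserved bits, assigned to ECN by RFC 3168.
ECE, CWR = 0x40, 0x80


def is_ecn_setup_syn(flags: int) -> bool:
    """True for a SYN (without ACK) that has both ECE and CWR set."""
    return (flags & (SYN | ACK | ECE | CWR)) == (SYN | ECE | CWR)


def legacy_filter_drops(flags: int) -> bool:
    """Hypothetical pre-ECN filter: drop if any formerly reserved bit is set."""
    return bool(flags & (ECE | CWR))


plain_syn = SYN
ecn_syn = SYN | ECE | CWR

print("plain SYN dropped:", legacy_filter_drops(plain_syn))
print("ECN-setup SYN dropped:", legacy_filter_drops(ecn_syn))
```

The plain SYN passes and the ECN-setup SYN vanishes without any error coming back, which is what makes these sites look like black holes from the client side.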

I have not actually heard of any issues with consumer-grade stuff, but
that may be because ECN has been disabled by default for so long.

Almost no network operators turn on ECN marking in their routers.  In
fact, almost none care to do any sort of AQM.  The practical benefits
of ECN are still somewhat unclear for most people.  For example, it
can help with latency-sensitive applications, but mostly requires a
big queue to work well, so doesn't help as much as you would hope.
There are some interesting ideas on how to better use ECN information,
but these are mostly still research.

ECN black hole detection is pretty simple, and I don't see much reason
not to do it.

  -John


On Fri, Nov 7, 2008 at 6:45 AM, David Newall <davidn@davidnewall.com> wrote:
> Isn't this a question for the IETF to answer?  Are they saying turn on
> ECN now?


* Re: time for TCP ECN defaulting to on?
  2008-11-04 22:52   ` David Miller
  2008-11-05  1:16     ` Michael Chan
@ 2008-11-16  9:13     ` Herbert Xu
  2008-11-16  9:24       ` David Miller
  1 sibling, 1 reply; 28+ messages in thread
From: Herbert Xu @ 2008-11-16  9:13 UTC (permalink / raw)
  To: David Miller
  Cc: linux-kernel, daniel.blueman, linux-kernel, netdev, linux-net

David Miller <davem@davemloft.net> wrote:
>
> This is because TSO, which is on by default, doesn't support ECN
> in many implementations.

Dave, we fixed that ages ago.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


* Re: time for TCP ECN defaulting to on?
  2008-11-16  9:13     ` Herbert Xu
@ 2008-11-16  9:24       ` David Miller
  0 siblings, 0 replies; 28+ messages in thread
From: David Miller @ 2008-11-16  9:24 UTC (permalink / raw)
  To: herbert; +Cc: linux-kernel, daniel.blueman, linux-kernel, netdev, linux-net

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sun, 16 Nov 2008 17:13:06 +0800

> David Miller <davem@davemloft.net> wrote:
> >
> > This is because TSO, which is on by default, doesn't support ECN
> > in many implementations.
> 
> Dave, we fixed that ages ago.

I know, Michael Chan corrected me in a followup posting :)


Thread overview: 28+ messages
2008-11-04 14:32 time for TCP ECN defaulting to on? Daniel J Blueman
2008-11-04 16:16 ` Dave Hudson
2008-11-04 22:52   ` David Miller
2008-11-05  1:16     ` Michael Chan
2008-11-05  5:58       ` David Miller
2008-11-05  6:31         ` Michael Chan
2008-11-05  7:29           ` David Miller
2008-11-16  9:13     ` Herbert Xu
2008-11-16  9:24       ` David Miller
2008-11-05 22:20 ` Mikael Abrahamsson
2008-11-05 23:10   ` David Miller
2008-11-07  4:46     ` Mikael Abrahamsson
2008-11-07  4:49       ` David Miller
2008-11-07  7:53       ` Ilpo Järvinen
2008-11-07  8:11         ` Mikael Abrahamsson
2008-11-07  9:12           ` Bjørn Mork
2008-11-07 11:16             ` Mikael Abrahamsson
2008-11-07 11:28           ` Ilpo Järvinen
2008-11-07 11:38             ` Mikael Abrahamsson
2008-11-07 12:22               ` Ilpo Järvinen
2008-11-07 13:43                 ` Daniel J Blueman
2008-11-07 14:30                   ` Ilpo Järvinen
2008-11-07 11:18         ` Dave Hudson
2008-11-07 14:29           ` Alan Cox
2008-11-07 14:45             ` David Newall
2008-11-07 15:07               ` Rémi Denis-Courmont
2008-11-07 18:38               ` John Heffner
2008-11-07 14:33           ` Andi Kleen
