LKML Archive on lore.kernel.org
* Re: serial flow control appears broken
[not found] <fa.Z6O0xFRT69zes0Mg+agt3Uiwux4@ifi.uio.no>
@ 2007-07-26 7:20 ` Robert Hancock
2007-07-26 16:08 ` Lee Howard
0 siblings, 1 reply; 50+ messages in thread
From: Robert Hancock @ 2007-07-26 7:20 UTC (permalink / raw)
To: Lee Howard; +Cc: linux-serial, tytso, rmk, linux-kernel
Lee Howard wrote:
> Hello.
>
> I have fax modems that will, in their proper behavior with certain
> features, send up to 64 kilobytes of data to the host DTE all at once.
> (So, the fax modem handles an incoming fax and periodically will send
> between 256 bytes and 64 kilobytes of data in bursts.)
>
> When the DCE-DTE (modem-to-host) communication rate is established at
> 115200 bps, data loss occurs on systems using at least Linux kernels 2.6.5
> and 2.6.18 (and probably everything in between and then some more). This
> is because the modem overflows the host's buffer. This is evidenced in
> kernel logging:
>
> Jul 23 14:01:30 gollum kernel: ttyS1: 1 input overrun(s)
> Jul 23 17:09:45 gollum kernel: ttyS1: 1 input overrun(s)
>
> Normally I would blame the modem itself for not honoring the host's flow
> control signals. However, I have worked with the modem manufacturer
> closely on this matter for over three months now. In that process they
> have improved the responsiveness of the modem and have fixed other
> problems, but the end result is that it truly does appear that the
> serial tty driver is not using flow control. Whether software flow
> control (XON/XOFF) or hardware flow control (RTS/CTS) is used the result
> is the same.
>
> This is evidenced in hardware flow control by a little LED labeled "RTS"
> that is on the external modem. This LED lights up when pin 7 of the DB9
> serial connection is given +12Vdc current (signalling "RTS" is on - that
> the host can accept data). The LED goes dark when the current is
> removed (signalling that the host cannot accept data). This "RTS" LED
> never flickers at all, as it should, when receiving these bursts of data
> - the LED stays lit as long as the serial cable is connected to the
> host... and yet I will see those "input overrun" messages. Thus, it
> seems quite clear that the Linux serial tty driver is not deasserting
> RTS as it should in hardware flow control. (And probably the analogous
> problem exists in software flow control, too.)
>
> Please tell me what I can do to help you resolve and/or remedy this
> matter. Also, please let me know if I have contacted the wrong people.
> (I have cross-posted to linux-kernel as a catch-all. I am not
> subscribed to either linux-serial or linux-kernel mailing lists. So
> please CC me in any list responses.)
>
> If it is of any value to know (perhaps they have common code?), the same
> error occurs on FreeBSD 6.2 as well. The problem does not occur on
> Windows. The problem does not occur on RedHat 6.0 (kernel 2.2.5).
What kind of serial port and machine is this on? From what I can see, a
standard 16550 UART (not a special variant) just doesn't have any
support for clearing RTS on its own when its input FIFO gets too full.
The kernel would have to do it in that case. I'm not seeing where it
would be controlling that automatically (as opposed to manually from the
application with TIOCM_RTS). I'm also not sure if the UART gives the
kernel enough information for it to even be able to control this line
properly automatically.
That's assuming it actually is a 16550 or similar with a 16-byte FIFO at
all, which assuming it's a non-ancient PC it should be, but who knows.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: serial flow control appears broken
2007-07-26 7:20 ` serial flow control appears broken Robert Hancock
@ 2007-07-26 16:08 ` Lee Howard
2007-07-26 16:31 ` Alan Cox
2007-07-27 11:32 ` Maciej W. Rozycki
0 siblings, 2 replies; 50+ messages in thread
From: Lee Howard @ 2007-07-26 16:08 UTC (permalink / raw)
To: Robert Hancock; +Cc: linux-serial, tytso, rmk, linux-kernel
Robert Hancock wrote:
> Lee Howard wrote:
>
>> Hello.
>>
>> I have fax modems that will, in their proper behavior with certain
>> features, send up to 64 kilobytes of data to the host DTE all at
>> once. (So, the fax modem handles an incoming fax and periodically
>> will send between 256 bytes and 64 kilobytes of data in bursts.)
>>
>> When the DCE-DTE (modem-to-host) communication rate is established at
>> 115200 bps, data loss occurs on systems using at least Linux kernels
>> 2.6.5 and 2.6.18 (and probably everything in between and then some
>> more). This is because the modem overflows the host's buffer. This
>> is evidenced in kernel logging:
>>
>> Jul 23 14:01:30 gollum kernel: ttyS1: 1 input overrun(s)
>> Jul 23 17:09:45 gollum kernel: ttyS1: 1 input overrun(s)
>>
>> Normally I would blame the modem itself for not honoring the host's
>> flow control signals. However, I have worked with the modem
>> manufacturer closely on this matter for over three months now. In
>> that process they have improved the responsiveness of the modem and
>> have fixed other problems, but the end result is that it truly does
>> appear that the serial tty driver is not using flow control. Whether
>> software flow control (XON/XOFF) or hardware flow control (RTS/CTS)
>> is used the result is the same.
>>
>> This is evidenced in hardware flow control by a little LED labeled
>> "RTS" that is on the external modem. This LED lights up when pin 7
>> of the DB9 serial connection is given +12Vdc current (signalling
>> "RTS" is on - that the host can accept data). The LED goes dark when
>> the current is removed (signalling that the host cannot accept
>> data). This "RTS" LED never flickers at all, as it should, when
>> receiving these bursts of data - the LED stays lit as long as the
>> serial cable is connected to the host... and yet I will see those
>> "input overrun" messages. Thus, it seems quite clear that the Linux
>> serial tty driver is not deasserting RTS as it should in hardware
>> flow control. (And probably the analogous problem exists in software
>> flow control, too.)
>>
>> Please tell me what I can do to help you resolve and/or remedy this
>> matter. Also, please let me know if I have contacted the wrong
>> people. (I have cross-posted to linux-kernel as a catch-all. I am
>> not subscribed to either linux-serial or linux-kernel mailing lists.
>> So please CC me in any list responses.)
>>
>> If it is of any value to know (perhaps they have common code?), the
>> same error occurs on FreeBSD 6.2 as well. The problem does not
>> occur on Windows. The problem does not occur on RedHat 6.0 (kernel
>> 2.2.5).
>
>
> What kind of serial port and machine is this on? From what I can see,
> a standard 16550 UART (not a special variant) just doesn't have any
> support for clearing RTS on its own when its input FIFO gets too full.
> The kernel would have to do it in that case. I'm not seeing where it
> would be controlling that automatically (as opposed to manually from
> the application with TIOCM_RTS). I'm also not sure if the UART gives
> the kernel enough information for it to even be able to control this
> line properly automatically.
>
> That's assuming it actually is a 16550 or similar with a 16-byte FIFO
> at all, which assuming it's a non-ancient PC it should be, but who knows.
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing enabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
It's a Shuttle HOT-661 motherboard (VIA Apollo Pro Plus mainboard
chipset). Both FreeBSD and Linux identify the serial chipset type as
16550A.
If the application were to use TIOCM_RTS how would it know when to apply
it or not? Is there some approach that the application could take to
manage flow control on the serial port? What about software flow
control? Does the application (and not the driver) need to be managing
the DC1/DC3 signalling on the host-side?
Thanks,
Lee.
* Re: serial flow control appears broken
2007-07-26 16:08 ` Lee Howard
@ 2007-07-26 16:31 ` Alan Cox
2007-07-27 5:53 ` Lee Howard
2007-07-27 11:32 ` Maciej W. Rozycki
1 sibling, 1 reply; 50+ messages in thread
From: Alan Cox @ 2007-07-26 16:31 UTC (permalink / raw)
To: Lee Howard; +Cc: Robert Hancock, linux-serial, tytso, rmk, linux-kernel
> Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing enabled
> ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
> ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
>
> It's a Shuttle HOT-661 motherboard (VIA Apollo Pro Plus mainboard
> chipset). Both FreeBSD and Linux identify the serial chipset type as
> 16550A.
So you've got 16 bytes of buffering. That ought to be enough on a modern
PC. The older kernels use quite limited internal buffers which may be a
factor, the current ones have a rewritten tty buffering layer which may
improve matters enormously.
Alan
* Re: serial flow control appears broken
2007-07-26 16:31 ` Alan Cox
@ 2007-07-27 5:53 ` Lee Howard
2007-07-27 13:45 ` Tilman Schmidt
0 siblings, 1 reply; 50+ messages in thread
From: Lee Howard @ 2007-07-27 5:53 UTC (permalink / raw)
To: Alan Cox; +Cc: Robert Hancock, linux-serial, tytso, rmk, linux-kernel
Alan Cox wrote:
>>Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing enabled
>>ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
>>ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
>>
>>It's a Shuttle HOT-661 motherboard (VIA Apollo Pro Plus mainboard
>>chipset). Both FreeBSD and Linux identify the serial chipset type as
>>16550A.
>>
>>
>
>So you've got 16 bytes of buffering. That ought to be enough on a modern
>PC. The older kernels use quite limited internal buffers which may be a
>factor, the current ones have a rewritten tty buffering layer which may
>improve matters enormously.
>
So, does this explain why I wouldn't have a problem at 115200 bps with
kernel 2.2.5 but why I do with 2.6.5 and 2.6.18? Both hardware and
software flow control work fine with 2.2.5 (meaning I don't see any
error message and I don't have any data corruption), but neither works
to avoid the "kernel: ttyS1: 1 input overrun(s)" and consequent data
corruption issue in 2.6.5 nor 2.6.18.
Was there some associated application change in tty handling that needed
to occur between the 2.2 and 2.6 kernels to properly implement flow control?
Thanks,
Lee.
* Re: serial flow control appears broken
2007-07-26 16:08 ` Lee Howard
2007-07-26 16:31 ` Alan Cox
@ 2007-07-27 11:32 ` Maciej W. Rozycki
2007-07-27 17:11 ` Lee Howard
1 sibling, 1 reply; 50+ messages in thread
From: Maciej W. Rozycki @ 2007-07-27 11:32 UTC (permalink / raw)
To: Lee Howard; +Cc: Robert Hancock, linux-serial, tytso, rmk, linux-kernel
On Thu, 26 Jul 2007, Lee Howard wrote:
> If the application were to use TIOCM_RTS how would it know when to apply it or
> not? Is there some approach that the application could take to manage flow
> control on the serial port? What about software flow control? Does the
Well, an application could negate RTS when it receives a character and
is running out of resources for further processing of incoming data.
Smarter UARTs may be able to negate RTS themselves based on the amount of
data in their receive FIFO. The threshold may be configurable.
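As an illustration of toggling RTS by hand, here is a minimal user-space sketch for Linux. It assumes the port is already open and uses the standard TIOCMBIS/TIOCMBIC modem-control ioctls with TIOCM_RTS. The helper name set_rts() is made up for this example and is not part of any API.

```c
#include <sys/ioctl.h>

/*
 * Assert (on != 0) or negate (on == 0) RTS on an open serial port.
 * set_rts() is a hypothetical helper name used only for this sketch.
 * Returns 0 on success, -1 on failure with errno set by ioctl().
 */
int set_rts(int fd, int on)
{
    int bits = TIOCM_RTS;

    /* TIOCMBIS sets the given modem-control bits, TIOCMBIC clears them. */
    return ioctl(fd, on ? TIOCMBIS : TIOCMBIC, &bits);
}
```

An application could call set_rts(fd, 0) when its own buffers are nearly full and set_rts(fd, 1) once it has caught up. This reacts only as fast as the application gets scheduled, so it is unlikely to help with overruns that happen inside the UART itself.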
Maciej
* Re: serial flow control appears broken
2007-07-27 5:53 ` Lee Howard
@ 2007-07-27 13:45 ` Tilman Schmidt
2007-07-27 20:05 ` Lee Howard
2007-08-27 20:38 ` Paul Fulghum
0 siblings, 2 replies; 50+ messages in thread
From: Tilman Schmidt @ 2007-07-27 13:45 UTC (permalink / raw)
To: Lee Howard
Cc: Alan Cox, Robert Hancock, linux-serial, tytso, rmk, linux-kernel
Lee Howard wrote:
>
> So, does this explain why I wouldn't have a problem at 115200 bps with
> kernel 2.2.5 but why I do with 2.6.5 and 2.6.18? Both hardware and
> software flow control work fine with 2.2.5 (meaning I don't see any
> error message and I don't have any data corruption), but neither works
> to avoid the "kernel: ttyS1: 1 input overrun(s)" and consequent data
> corruption issue in 2.6.5 nor 2.6.18.
>
> Was there some associated application change in tty handling that needed
> to occur between the 2.2 and 2.6 kernels to properly implement flow control?
Could this be related?
http://lkml.org/lkml/2007/7/18/245
Quote:
"I've recently found (using 2.6.21.4) that configuring a serial ports
(ST16654) which use the 8250 driver using setserial results in the
UART's FIFOs being disabled (unless you specify autoconfig)."
--
Tilman Schmidt E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)
* Re: serial flow control appears broken
2007-07-27 11:32 ` Maciej W. Rozycki
@ 2007-07-27 17:11 ` Lee Howard
2007-07-27 17:41 ` Alan Cox
2007-07-27 17:53 ` Maciej W. Rozycki
0 siblings, 2 replies; 50+ messages in thread
From: Lee Howard @ 2007-07-27 17:11 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: Robert Hancock, linux-serial, tytso, rmk, linux-kernel
Maciej W. Rozycki wrote:
>On Thu, 26 Jul 2007, Lee Howard wrote:
>
>
>
>>If the application were to use TIOCM_RTS how would it know when to apply it or
>>not? Is there some approach that the application could take to manage flow
>>control on the serial port? What about software flow control? Does the
>>
>>
>
> Well, an application could negate RTS when it receives a character and
>is running out of resources for further processing of incoming data.
>
> Smarter UARTs may be able to negate RTS themselves based on the amount of
>data in their receive FIFO. The threshold may be configurable.
>
Okay, so let's say we've got a loop around a blocking read on the modem
file descriptor...
for (;;) {
read some data from modem
process data from modem
if (end-of-data detected) break;
}
Are you suggesting that the application should be deasserting RTS
after the read and asserting it before?
I had previously thought that the control of RTS was something that the
serial/tty driver was supposed to do independently based on the buffer
fill. Was I wrong?
Thanks,
Lee.
* Re: serial flow control appears broken
2007-07-27 17:11 ` Lee Howard
@ 2007-07-27 17:41 ` Alan Cox
2007-07-27 17:53 ` Maciej W. Rozycki
1 sibling, 0 replies; 50+ messages in thread
From: Alan Cox @ 2007-07-27 17:41 UTC (permalink / raw)
To: Lee Howard
Cc: Maciej W. Rozycki, Robert Hancock, linux-serial, tytso, rmk,
linux-kernel
> I had previously thought that the control of RTS was something that the
> serial/tty driver was supposed to do independently based on the buffer
> fill. Was I wrong?
If the kernel is asked to do CRTSCTS then the kernel handles the flow
control. It uses it when the internal buffers are nearly full.
The direct access to the lines is normally only used by special drivers
such as half duplex radio modem drivers.
* Re: serial flow control appears broken
2007-07-27 17:11 ` Lee Howard
2007-07-27 17:41 ` Alan Cox
@ 2007-07-27 17:53 ` Maciej W. Rozycki
2007-07-27 18:11 ` Lee Howard
` (2 more replies)
1 sibling, 3 replies; 50+ messages in thread
From: Maciej W. Rozycki @ 2007-07-27 17:53 UTC (permalink / raw)
To: Lee Howard; +Cc: Robert Hancock, linux-serial, tytso, rmk, linux-kernel
On Fri, 27 Jul 2007, Lee Howard wrote:
> Okay, so let's say we've got a loop around a blocking read on the modem file
> descriptor...
>
> for (;;) {
> read some data from modem
> process data from modem
> if (end-of-data detected) break;
> }
>
> Are you suggesting that the application should be deasserting RTS after
> the read and asserting it before?
It certainly could -- you were asking how it would know. ;-)
> I had previously thought that the control of RTS was something that the
> serial/tty driver was supposed to do independently based on the buffer fill.
The TTY line discipline driver could do that based on the amount of
received data present in its buffer. And it should if asked to (a brief
look at drivers/char/n_tty.c reveals it does; obviously there may be a bug
somewhere though). So could e.g. the SLIP and PPP line discipline
drivers, though the criteria might be different (apparently they do not,
which is a shame).
The serial drivers have nothing to do with it -- all they can do is
push data upstream, to the discipline driver. They can provide an
interface to hardware flow control features though, if implemented by a
given UART.
Maciej
* Re: serial flow control appears broken
2007-07-27 17:53 ` Maciej W. Rozycki
@ 2007-07-27 18:11 ` Lee Howard
2007-07-30 9:36 ` Maciej W. Rozycki
2007-07-27 18:22 ` Robert Hancock
2007-08-04 18:19 ` Lee Howard
2 siblings, 1 reply; 50+ messages in thread
From: Lee Howard @ 2007-07-27 18:11 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: Robert Hancock, linux-serial, tytso, rmk, linux-kernel
Maciej W. Rozycki wrote:
> The TTY line discipline driver could do that based on the amount of
>received data present in its buffer. And it should if asked to (a brief
>look at drivers/char/n_tty.c reveals it does; obviously there may be a bug
>somewhere though). So could e.g. the SLIP and PPP line discipline
>drivers, though the criteria might be different (apparently they do not,
>which is a shame).
>
> The serial drivers have nothing to do with it -- all they can do is
>push data upstream, to the discipline driver. They can provide an
>interface to hardware flow control features though, if implemented by a
>given UART.
>
Thank you for this clarification. So I should have more correctly been
saying that "tty flow control appears broken". Right?
I've asked the manufacturer to take a look at drivers/char/n_tty.c to
see if they can't see anything obvious.
Thanks,
Lee.
* Re: serial flow control appears broken
2007-07-27 17:53 ` Maciej W. Rozycki
2007-07-27 18:11 ` Lee Howard
@ 2007-07-27 18:22 ` Robert Hancock
2007-07-27 18:46 ` Paul Fulghum
` (4 more replies)
2007-08-04 18:19 ` Lee Howard
2 siblings, 5 replies; 50+ messages in thread
From: Robert Hancock @ 2007-07-27 18:22 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: Lee Howard, linux-serial, tytso, rmk, linux-kernel
Maciej W. Rozycki wrote:
> On Fri, 27 Jul 2007, Lee Howard wrote:
>
>> Okay, so let's say we've got a loop around a blocking read on the modem file
>> descriptor...
>>
>> for (;;) {
>> read some data from modem
>> process data from modem
>> if (end-of-data detected) break;
>> }
>>
>> Are you suggesting that the application should be deasserting RTS after
>> the read and asserting it before?
>
> It certainly could -- you were asking how it would know. ;-)
>
>> I had previously thought that the control of RTS was something that the
>> serial/tty driver was supposed to do independently based on the buffer fill.
>
> The TTY line discipline driver could do that based on the amount of
> received data present in its buffer. And it should if asked to (a brief
> look at drivers/char/n_tty.c reveals it does; obviously there may be a bug
Really, where? In my look through the code I haven't found any mechanism
that would result in RTS being lowered based on TTY buffers filling up,
at least not in the 8250 case.
In this situation, though, it appears it's not the TTY buffers that are
filling but the UART's own buffer. I would think this must be caused by
some kind of interrupt latency that results in not draining the FIFO in
time.
> somewhere though). So could e.g. the SLIP and PPP line discipline
> drivers, though the criteria might be different (apparently they do not,
> which is a shame).
>
> The serial drivers have nothing to do with it -- all they can do is
> push data upstream, to the discipline driver. They can provide an
> interface to hardware flow control features though, if implemented by a
> given UART.
>
> Maciej
>
* Re: serial flow control appears broken
2007-07-27 18:22 ` Robert Hancock
@ 2007-07-27 18:46 ` Paul Fulghum
2007-07-27 19:05 ` Paul Fulghum
` (3 subsequent siblings)
4 siblings, 0 replies; 50+ messages in thread
From: Paul Fulghum @ 2007-07-27 18:46 UTC (permalink / raw)
To: Robert Hancock
Cc: Maciej W. Rozycki, Lee Howard, linux-serial, tytso, rmk, linux-kernel
On Fri, 2007-07-27 at 12:22 -0600, Robert Hancock wrote:
> Maciej W. Rozycki wrote:
> > The TTY line discipline driver could do that based on the amount of
> > received data present in its buffer. And it should if asked to (a brief
> > look at drivers/char/n_tty.c reveals it does; obviously there may be a bug
>
> Really, where? In my look through the code I haven't found any mechanism
> that would result in RTS being lowered based on TTY buffers filling up,
> at least not in the 8250 case.
serial_core.c:uart_throttle()
serial_core.c:uart_unthrottle()
These are called by N_TTY in response to buffer levels.
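To make "in response to buffer levels" concrete, here is a toy user-space model of throttle/unthrottle hysteresis. It is not the kernel's code: the structure, the function names and both thresholds are invented for illustration, and the real N_TTY criteria may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thresholds for illustration only; not the kernel's values. */
#define HIGH_WATER 3072u  /* throttle at or above this many buffered bytes */
#define LOW_WATER  1024u  /* unthrottle at or below this many buffered bytes */

struct rx_model {
    size_t buffered;   /* bytes waiting for the application to read */
    bool   throttled;  /* true while RTS would be held deasserted */
};

/* Model the driver pushing n received bytes up to the line discipline. */
void rx_model_receive(struct rx_model *m, size_t n)
{
    assert(m != NULL);
    m->buffered += n;
    if (!m->throttled && m->buffered >= HIGH_WATER)
        m->throttled = true;   /* point where a throttle hook would drop RTS */
}

/* Model the application draining up to n bytes with read(). */
void rx_model_read(struct rx_model *m, size_t n)
{
    assert(m != NULL);
    m->buffered -= (n > m->buffered) ? m->buffered : n;
    if (m->throttled && m->buffered <= LOW_WATER)
        m->throttled = false;  /* point where an unthrottle hook would raise RTS */
}
```

The gap between the two thresholds keeps RTS from toggling on every byte. Note that this mechanism watches the line discipline's buffer only; it knows nothing about how full the UART's hardware FIFO is.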
--
Paul Fulghum
Microgate Systems, Ltd
* Re: serial flow control appears broken
2007-07-27 18:22 ` Robert Hancock
2007-07-27 18:46 ` Paul Fulghum
@ 2007-07-27 19:05 ` Paul Fulghum
2007-07-30 9:39 ` Maciej W. Rozycki
2007-07-27 19:14 ` Paul Fulghum
` (2 subsequent siblings)
4 siblings, 1 reply; 50+ messages in thread
From: Paul Fulghum @ 2007-07-27 19:05 UTC (permalink / raw)
To: Robert Hancock
Cc: Maciej W. Rozycki, Lee Howard, linux-serial, tytso, rmk, linux-kernel
On Fri, 2007-07-27 at 12:22 -0600, Robert Hancock wrote:
> In this situation, though, it appears it's not the TTY buffers that are
> filling but the UART's own buffer. I would think this must be caused by
> some kind of interrupt latency that results in not draining the FIFO in
> time.
You are right, this error is output when the character flag TTY_OVERRUN
is encountered by n_tty.c which should be set by the driver
in response to a hardware FIFO overrun (not an ldisc buffer overrun).
I can't see anyplace in serial_core.c or 8250.c that sets TTY_OVERRUN.
--
Paul Fulghum
Microgate Systems, Ltd
* Re: serial flow control appears broken
2007-07-27 18:22 ` Robert Hancock
2007-07-27 18:46 ` Paul Fulghum
2007-07-27 19:05 ` Paul Fulghum
@ 2007-07-27 19:14 ` Paul Fulghum
2007-07-28 9:28 ` Russell King
2007-07-30 9:34 ` Maciej W. Rozycki
4 siblings, 0 replies; 50+ messages in thread
From: Paul Fulghum @ 2007-07-27 19:14 UTC (permalink / raw)
To: Robert Hancock
Cc: Maciej W. Rozycki, Lee Howard, linux-serial, tytso, rmk, linux-kernel
OK, I see where TTY_OVERRUN is set:
include/linux/serial_core.h:uart_insert_char()
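From user space, one way to tell the two kinds of overrun apart is the Linux TIOCGICOUNT ioctl, which fills a struct serial_icounter_struct from <linux/serial.h>. The sketch below assumes the driver supports that ioctl; the function name is hypothetical. As far as I understand the counters, `overrun` counts overruns reported by the UART and `buf_overrun` counts data dropped because the tty buffer was full.

```c
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/serial.h>

/*
 * Print the receive and error counters of an open serial port.
 * print_rx_error_counts() is a hypothetical name for this sketch.
 * Returns 0 on success, -1 if fd is invalid or TIOCGICOUNT is unsupported.
 */
int print_rx_error_counts(int fd)
{
    struct serial_icounter_struct ic;

    if (ioctl(fd, TIOCGICOUNT, &ic) < 0)
        return -1;

    printf("rx=%d overrun=%d buf_overrun=%d frame=%d parity=%d\n",
           ic.rx, ic.overrun, ic.buf_overrun, ic.frame, ic.parity);
    return 0;
}
```

Sampling these counters before and after a transfer would show how many overruns occurred even when the kernel log reports only a few.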
--
Paul Fulghum
Microgate Systems, Ltd
* Re: serial flow control appears broken
2007-07-27 13:45 ` Tilman Schmidt
@ 2007-07-27 20:05 ` Lee Howard
2007-08-27 20:38 ` Paul Fulghum
1 sibling, 0 replies; 50+ messages in thread
From: Lee Howard @ 2007-07-27 20:05 UTC (permalink / raw)
To: Tilman Schmidt
Cc: Alan Cox, Robert Hancock, linux-serial, tytso, rmk, linux-kernel
Tilman Schmidt wrote:
>Lee Howard wrote:
>
>
>>So, does this explain why I wouldn't have a problem at 115200 bps with
>>kernel 2.2.5 but why I do with 2.6.5 and 2.6.18? Both hardware and
>>software flow control work fine with 2.2.5 (meaning I don't see any
>>error message and I don't have any data corruption), but neither works
>>to avoid the "kernel: ttyS1: 1 input overrun(s)" and consequent data
>>corruption issue in 2.6.5 nor 2.6.18.
>>
>>Was there some associated application change in tty handling that needed
>>to occur between the 2.2 and 2.6 kernels to properly implement flow control?
>>
>>
>
>Could this be related?
>
>http://lkml.org/lkml/2007/7/18/245
>
>Quote:
>"I've recently found (using 2.6.21.4) that configuring a serial ports
>(ST16654) which use the 8250 driver using setserial results in the
>UART's FIFOs being disabled (unless you specify autoconfig)."
>
>
I'm not running setserial on the port, myself. But to test to see if it
is related, I included this code in the application:
#include <linux/serial.h>
....
struct serial_struct serial;
ioctl(modemFd, TIOCGSERIAL, &serial);
traceModemOp("modem xmit_fifo_size: %u", serial.xmit_fifo_size);
And I get this resulting logging:
"MODEM modem xmit_fifo_size: 16"
So it's clear from here that the xmit_fifo_size is set correctly on this
system.
Thanks,
Lee.
* Re: serial flow control appears broken
2007-08-27 20:38 ` Paul Fulghum
@ 2007-07-27 20:48 ` Lee Howard
2007-07-27 23:28 ` Paul Fulghum
0 siblings, 1 reply; 50+ messages in thread
From: Lee Howard @ 2007-07-27 20:48 UTC (permalink / raw)
To: Paul Fulghum
Cc: Tilman Schmidt, Alan Cox, Robert Hancock, linux-serial, tytso,
rmk, linux-kernel
Paul Fulghum wrote:
> Tilman Schmidt wrote:
>
>> Could this be related?
>>
>> http://lkml.org/lkml/2007/7/18/245
>>
>> Quote:
>> "I've recently found (using 2.6.21.4) that configuring a serial ports
>> (ST16654) which use the 8250 driver using setserial results in the
>> UART's FIFOs being disabled (unless you specify autoconfig)."
>
>
> That would make sense.
>
> Lee's error is a hardware FIFO overrun which could occur
> if the FIFO is being disabled as described in your
> link (by trying to set the uart type with setserial).
I'm not using setserial on this port, myself. If something in init is
calling on setserial then I don't know about it.
That said, tests on the serial port from within the application show
that xmit_fifo_size is set to 16 as it should be.
I wrote up a little test app:
struct serial_struct serial;
ioctl(modemFd, TIOCGSERIAL, &serial);
printf(" type: %d\n", serial.type);
printf(" line: %d\n", serial.line);
printf(" line: %u\n", serial.port); /* NB: label should read "port"; 760 = 0x2f8 */
printf(" irq: %d\n", serial.irq);
printf(" flags: %d\n", serial.flags);
printf(" xmit_fifo_size: %d\n", serial.xmit_fifo_size);
printf(" custom_divisor: %d\n", serial.custom_divisor);
printf(" baud_base: %d\n", serial.baud_base);
printf(" close_delay: %u\n", serial.close_delay);
printf(" io_type: 0x%X\n", serial.io_type);
printf("reserved_char[0]: 0x%X\n", serial.reserved_char[0]);
printf(" hub6: %d\n", serial.hub6);
printf(" closing_wait: %u\n", serial.closing_wait);
printf(" closing_wait2: %u\n", serial.closing_wait2);
printf(" iomem_reg_shift: %u\n", serial.iomem_reg_shift);
printf(" port_high: %u\n", serial.port_high);
printf(" reserved[0]: %d\n", serial.reserved[0]);
Here's the output:
type: 4
line: 1
line: 760
irq: 3
flags: 1358954688
xmit_fifo_size: 16
custom_divisor: 0
baud_base: 115200
close_delay: 500
io_type: 0x0
reserved_char[0]: 0x0
hub6: 0
closing_wait: 30000
closing_wait2: 0
iomem_reg_shift: 0
port_high: 0
reserved[0]: 0
Thanks,
Lee.
* Re: serial flow control appears broken
2007-07-27 20:48 ` Lee Howard
@ 2007-07-27 23:28 ` Paul Fulghum
2007-07-28 4:51 ` Lee Howard
0 siblings, 1 reply; 50+ messages in thread
From: Paul Fulghum @ 2007-07-27 23:28 UTC (permalink / raw)
To: Lee Howard
Cc: Tilman Schmidt, Alan Cox, Robert Hancock, linux-serial, tytso,
rmk, linux-kernel
On Fri, 2007-07-27 at 13:48 -0700, Lee Howard wrote:
> Here's the output:
>
> type: 4
> line: 1
> line: 760
> irq: 3
> flags: 1358954688
> xmit_fifo_size: 16
> custom_divisor: 0
> baud_base: 115200
OK, the FIFO should be enabled.
What is known:
* The error is a hardware FIFO overrun.
- observed message is in n_tty due to driver setting TTY_OVERRUN
* The RTS/CTS flow control is not involved
- this is done only by the ldisc in response to buffer levels
- you verified crtscts is set
- you did not observe RTS change when 'overflow error' logged
- you did observe RTS change when application stopped reading
So this seems to be a latency issue reading the receive
FIFO in the ISR. The current rx FIFO trigger level
should be 8 bytes (UART_FCR_R_TRIG_10) which gives the
ISR 694usec to get the data at 115200bps.
IIRC, in 2.2.X kernels this defaulted to 4 bytes
(TRIG_01) which gave a little more time to service the interrupt.
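The 694 usec figure can be reproduced with a short calculation. The sketch below assumes 8N1 framing (10 bits on the wire per character), a 16-byte FIFO, and characters arriving back to back; the function name is made up for this example.

```c
#include <assert.h>

/*
 * Time, in microseconds, between the receive interrupt firing at
 * `trigger` bytes and a `fifo_size`-byte FIFO becoming full, assuming
 * characters arrive back to back. bits_per_char is 10 for 8N1
 * (1 start + 8 data + 1 stop).
 */
double fifo_headroom_usec(unsigned baud, unsigned bits_per_char,
                          unsigned fifo_size, unsigned trigger)
{
    assert(baud > 0 && trigger <= fifo_size);

    double char_time_usec = 1e6 * bits_per_char / baud;
    return (fifo_size - trigger) * char_time_usec;
}
```

With these assumptions, fifo_headroom_usec(115200, 10, 16, 8) comes to about 694.4 usec, a 4-byte trigger at the same rate gives about 1041.7 usec, and an 8-byte trigger at 57600 bps gives about 1388.9 usec.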
How does the data rate affect the frequency of the overrun errors?
Does 57600bps make them go away?
--
Paul
* Re: serial flow control appears broken
2007-07-27 23:28 ` Paul Fulghum
@ 2007-07-28 4:51 ` Lee Howard
2007-07-28 9:18 ` Russell King
` (2 more replies)
0 siblings, 3 replies; 50+ messages in thread
From: Lee Howard @ 2007-07-28 4:51 UTC (permalink / raw)
To: Paul Fulghum
Cc: Tilman Schmidt, Alan Cox, Robert Hancock, linux-serial, tytso,
rmk, linux-kernel
Paul Fulghum wrote:
>So this seems to be a latency issue reading the receive
>FIFO in the ISR. The current rx FIFO trigger level
>should be 8 bytes (UART_FCR_R_TRIG_10) which gives the
>ISR 694usec to get the data at 115200bps.
>
>IIRC, in 2.2.X kernels this defaulted to 4 bytes
>(TRIG_01) which gave a little more time to service the interrupt.
>
>How does the data rate affect the frequency of the overrun errors?
>Does 57600bps make them go away?
>
>
The overrun error message does not occur on every instance of data
corruption. (I just became aware of this as I've not been paying so
much attention to the error messages as I have been to the corrupt
data.) The data gets far more corrupted than the error messages would
lead me to believe. Since the data being sent from the fax modem to the
host is identical (same image data) every time it's easier for me to
measure the effect of one bitrate over another by examining the number
of missing bytes from the data.
The image has a total of 140465 bytes. Just now I sent it 5 times each
at 115200, 57600, 38400, and 19200 bps.
At 115200 bps the number of bytes skipped were: 63, 5, 44, 48, and 2.
At 57600 bps the number of bytes skipped were: 0, 1, 13, 9, and 12.
At 38400 bps the number of bytes skipped were 858, 0, 0, 0, and 8.
At 19200 bps the number of bytes skipped were 0, 0, 0, 0, and 0.
Curiously, the session at 38400 bps that skipped 858 bytes... coincided,
not just in sequence but also in precise timing within the session, with
a small but noticeable disk load that I caused by grepping through a
hundred session logs. (I can't reproduce it easily, though, because of
disk caching.)
And, perhaps this is relevant... the way that I have the fax modem
sending the data to the host is by receiving it from another fax modem
which is sending it. Thus, the modem on ttyS0 is sending a fax to the
modem on ttyS1. Due to the error correction protocol that is performed
between the two fax endpoints I can guarantee that the data is correct
as it leaves the DCE. I mention this in case there is any limitation to
how the 8250 driver performs when two modems are being run simultaneously.
Thanks,
Lee.
* Re: serial flow control appears broken
2007-07-28 4:51 ` Lee Howard
@ 2007-07-28 9:18 ` Russell King
2007-07-28 12:00 ` Alan Cox
2007-07-28 16:41 ` Ray Lee
2 siblings, 0 replies; 50+ messages in thread
From: Russell King @ 2007-07-28 9:18 UTC (permalink / raw)
To: Lee Howard
Cc: Paul Fulghum, Tilman Schmidt, Alan Cox, Robert Hancock,
linux-serial, tytso, linux-kernel
On Fri, Jul 27, 2007 at 09:51:25PM -0700, Lee Howard wrote:
> Curiously, the session at 38400 bps that skipped 858 bytes... coincided,
> not just in sequence but also in precise timing within the session, with
> a small but noticeable disk load that I caused by grepping through a
> hundred session logs. (I can't reproduce it easily, though, because of
> disk caching.)
If you have other parts of the system which run with IRQs disabled for
a significant time period, then you will get serial corruption. That's
not the serial driver's fault - that's a problem with the other device
drivers/rest of the system.
You may be able to track down where IRQs are being held off for too long
by hooking into the 8250 interrupt handler, and when an overrun error is
reported, printk a _minimal_ message reporting the instruction pointer
obtained via get_irq_regs().
Note, however, that I don't actively maintain serial anymore.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
* Re: serial flow control appears broken
2007-07-27 18:22 ` Robert Hancock
` (2 preceding siblings ...)
2007-07-27 19:14 ` Paul Fulghum
@ 2007-07-28 9:28 ` Russell King
2007-07-30 9:45 ` Maciej W. Rozycki
2007-07-30 9:34 ` Maciej W. Rozycki
4 siblings, 1 reply; 50+ messages in thread
From: Russell King @ 2007-07-28 9:28 UTC (permalink / raw)
To: Robert Hancock
Cc: Maciej W. Rozycki, Lee Howard, linux-serial, tytso, linux-kernel
On Fri, Jul 27, 2007 at 12:22:57PM -0600, Robert Hancock wrote:
> Maciej W. Rozycki wrote:
> > The TTY line discipline driver could do that based on the amount of
> >received data present in its buffer. And it should if asked to (a brief
> >look at drivers/char/n_tty.c reveals it does; obviously there may be a bug
>
> Really, where? In my look through the code I haven't found any mechanism
> that would result in RTS being lowered based on TTY buffers filling up,
> at least not in the 8250 case.
That's something for the line discipline to decide.
> In this situation, though, it appears it's not the TTY buffers that are
> filling but the UART's own buffer. I would think this must be caused by
> some kind of interrupt latency that results in not draining the FIFO in
> time.
Correct, and a suggested approach to tracking down the culprit was
mentioned in a previous email.
Also note that there's nothing the serial driver can do to detect this
condition before it occurs. The problem occurs because the serial driver
is starved of CPU time due to other parts of the system, and the driver
has precisely zero knowledge as to when that's going to happen.
There are two possible scenarios when such starvation can occur:
1. interrupts are disabled for a long period.
2. the serial interrupt has started to run, but has been interrupted
by _another_ interrupt which runs for a long period.
Essentially, any complex interrupt handler (such as an IDE interrupt
doing a multi-sector PIO transfer _in interrupt context_) can cause this
kind of starvation. That's why Linux 1.x had bottom halves - so that
the time consuming work could be moved out of the interrupt handler,
thereby causing minimal blockage of other interrupts.
Unfortunately, that kind of design has been long since forgotten.
Apparently modern machines are fast enough that it doesn't have to be
worried about anymore... Or are they?
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
* Re: serial flow control appears broken
2007-07-28 4:51 ` Lee Howard
2007-07-28 9:18 ` Russell King
@ 2007-07-28 12:00 ` Alan Cox
2007-07-28 15:39 ` Lee Howard
2007-07-28 16:41 ` Ray Lee
2 siblings, 1 reply; 50+ messages in thread
From: Alan Cox @ 2007-07-28 12:00 UTC (permalink / raw)
To: Lee Howard
Cc: Paul Fulghum, Tilman Schmidt, Robert Hancock, linux-serial,
tytso, rmk, linux-kernel
> Curiously, the session at 38400 bps that skipped 858 bytes... coincided,
> not just in sequence but also in precise timing within the session, with
> a small but noticeable disk load that I caused by grepping through a
> hundred session logs. (I can't reproduce it easily, though, because of
> disk caching.)
Can you send me a dmesg? There are some cases when high disk load can
cause high interrupt latency in both 2.2 and 2.6 depending upon what is
configured. I don't think that's related to the main problem but it is
worth knowing about hdparm -u1.
> as it leaves the DCE. I mention this in case there is any limitation to
> how the 8250 driver performs when two modems are being run simultaneously.
It means more load but that shouldn't matter much, and the transmit side
if under load with asynchronous traffic will not lose bytes sending.
Alan
* Re: serial flow control appears broken
2007-07-28 12:00 ` Alan Cox
@ 2007-07-28 15:39 ` Lee Howard
0 siblings, 0 replies; 50+ messages in thread
From: Lee Howard @ 2007-07-28 15:39 UTC (permalink / raw)
To: Alan Cox
Cc: Paul Fulghum, Tilman Schmidt, Robert Hancock, linux-serial,
tytso, rmk, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1235 bytes --]
Alan Cox wrote:
>>Curiously, the session at 38400 bps that skipped 858 bytes... coincided,
>>not just in sequence but also in precise timing within the session, with
>>a small but noticeable disk load that I caused by grepping through a
>>hundred session logs. (I can't reproduce it easily, though, because of
>>disk caching.)
>>
>>
>
>Can you send me a dmesg, there are some cases when high disk load can
>cause high interrupt latency in both 2.2 and 2.6 depending upon what is
>configured.
>
I've attached dmesg output. The OS version I used yesterday to run
those tests was Debian 4.0r0 (kernel 2.6.18-4-686). It's still running,
and that's where I give you this dmesg output from.
> I don't think thats related to the main problem but it is
>worth knowing about hdparm -u1
>
# hdparm -u1 /dev/hda
/dev/hda:
setting unmaskirq to 1 (on)
unmaskirq = 1 (on)
#
After doing this I re-ran the 5 test sends at 115200 bps. The number of
lost bytes were: 0, 14, 8, 0, and 3. Compared with yesterday's 63, 5,
44, 48, and 2 this may indicate an improvement. Note also that in the
4th session where no bytes were lost there was still one element of
corrupt data as detected by the image decoder.
Thanks,
Lee.
[-- Attachment #2: dmesg.out --]
[-- Type: text/plain, Size: 7308 bytes --]
Linux version 2.6.18-4-686 (Debian 2.6.18.dfsg.1-12) (waldi@debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Mon Mar 26 17:17:36 UTC 2007
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000000e000000 (usable)
BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
0MB HIGHMEM available.
224MB LOWMEM available.
On node 0 totalpages: 57344
DMA zone: 4096 pages, LIFO batch:0
Normal zone: 53248 pages, LIFO batch:15
DMI 2.0 present.
ACPI: Unable to locate RSDP
Allocating PCI resources starting at 10000000 (gap: 0e000000:f1ff0000)
Detected 400.953 MHz processor.
Built 1 zonelists. Total pages: 57344
Kernel command line: root=/dev/hda3 ro
Local APIC disabled by BIOS -- you can enable it with "lapic"
mapped APIC to ffffd000 (011c9000)
Enabling fast FPU save and restore... done.
Initializing CPU#0
PID hash table entries: 1024 (order: 10, 4096 bytes)
Console: colour VGA+ 80x25
Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
Memory: 219828k/229376k available (1544k kernel code, 9052k reserved, 577k data, 196k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 802.59 BogoMIPS (lpj=1605193)
Security Framework v1.0.0 initialized
SELinux: Disabled at boot.
Capability LSM initialized
Mount-cache hash table entries: 512
CPU: After generic identify, caps: 0183f9ff 00000000 00000000 00000000 00000000 00000000 00000000
CPU: After vendor identify, caps: 0183f9ff 00000000 00000000 00000000 00000000 00000000 00000000
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 512K
CPU: After all inits, caps: 0183f9ff 00000000 00000000 00000040 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Compat vDSO mapped to ffffe000.
Checking 'hlt' instruction... OK.
SMP alternatives: switching to UP code
Freeing SMP alternatives: 16k freed
CPU0: Intel Pentium II (Deschutes) stepping 03
SMP motherboard not detected.
Local APIC not detected. Using dummy APIC emulation.
Brought up 1 CPUs
migration_cost=0
checking if image is initramfs... it is
Freeing initrd memory: 4375k freed
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xfb4a0, last bus=1
PCI: Using configuration type 1
Setting up standard PCI resources
ACPI: Interpreter disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI: disabled
PnPBIOS: Scanning system for PnP BIOS support...
PnPBIOS: Found PnP BIOS installation structure at 0xc00fc0f0
PnPBIOS: PnP BIOS version 1.0, entry 0xf0000:0xc118, dseg 0xf0000
PnPBIOS: 14 nodes reported by PnP BIOS; 14 recorded by driver
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI: Firmware left 0000:00:0b.0 e100 interrupts enabled, disabling
Boot video device is 0000:01:00.0
PCI: Bridge: 0000:00:01.0
IO window: c000-cfff
MEM window: e4000000-e5ffffff
PREFETCH window: e7000000-e77fffff
PCI: Setting latency timer of device 0000:00:01.0 to 64
NET: Registered protocol family 2
IP route cache hash table entries: 2048 (order: 1, 8192 bytes)
TCP established hash table entries: 8192 (order: 4, 65536 bytes)
TCP bind hash table entries: 4096 (order: 3, 32768 bytes)
TCP: Hash tables configured (established 8192 bind 4096)
TCP reno registered
audit: initializing netlink socket (disabled)
audit(1185563327.604:1): initialized
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
Initializing Cryptographic API
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
Activating ISA DMA hang workarounds.
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
serial8250: ttyS2 at I/O 0x3e8 (irq = 4) is a 16550A
serial8250: ttyS3 at I/O 0x2e8 (irq = 3) is a 16550A
00:0a: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:0d: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
Couldn't register serial port 0000:00:0a.0: -28
Couldn't register serial port 0000:00:0c.0: -28
RAMDISK driver initialized: 16 RAM disks of 8192K size 1024 blocksize
PNP: PS/2 Controller [PNP0303] at 0x60,0x64 irq 1
PNP: PS/2 controller doesn't have AUX irq; using default 12
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
mice: PS/2 mouse device common for all mice
TCP bic registered
NET: Registered protocol family 1
NET: Registered protocol family 17
NET: Registered protocol family 8
NET: Registered protocol family 20
Using IPI No-Shortcut mode
Freeing unused kernel memory: 196k freed
Time: tsc clocksource has been installed.
input: AT Translated Set 2 keyboard as /class/input/input0
e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
e100: Copyright(c) 1999-2005 Intel Corporation
e100: eth0: e100_probe: addr 0xe7a00000, irq 10, MAC addr 00:60:B0:68:53:F4
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller at PCI slot 0000:00:07.1
PCI: VIA IRQ fixup for 0000:00:07.1, from 255 to 0
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
VP_IDE: VIA vt82c596a (rev 06) IDE UDMA33 controller on pci0000:00:07.1
ide0: BM-DMA at 0xd000-0xd007, BIOS settings: hda:pio, hdb:pio
ide1: BM-DMA at 0xd008-0xd00f, BIOS settings: hdc:pio, hdd:pio
Probing IDE interface ide0...
hda: QUANTUM BIGFOOT TS12.7A, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
hdc: LTN301, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 25075008 sectors (12838 MB) w/418KiB Cache, CHS=24876/16/63, UDMA(33)
hda: hda1 hda2 hda3
hdc: ATAPI 32X CD-ROM drive, 120kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
Attempting manual resume
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
input: PC Speaker as /class/input/input1
Real Time Clock Driver v1.12ac
Linux agpgart interface v0.101 (c) Dave Jones
agpgart: Detected VIA Apollo Pro 133 chipset
agpgart: AGP aperture is 64M @ 0xe0000000
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
Adding 498004k swap on /dev/hda2. Priority:-1 extents:1 across:498004k
EXT3 FS on hda3, internal journal
loop: loaded (max 8 devices)
device-mapper: ioctl: 4.7.0-ioctl (2006-06-24) initialised: dm-devel@redhat.com
kjournald starting. Commit interval 5 seconds
EXT3 FS on hda1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
eth0: no IPv6 routers present
* Re: serial flow control appears broken
2007-07-28 4:51 ` Lee Howard
2007-07-28 9:18 ` Russell King
2007-07-28 12:00 ` Alan Cox
@ 2007-07-28 16:41 ` Ray Lee
2007-08-04 18:21 ` Lee Howard
2 siblings, 1 reply; 50+ messages in thread
From: Ray Lee @ 2007-07-28 16:41 UTC (permalink / raw)
To: Lee Howard
Cc: Paul Fulghum, Tilman Schmidt, Alan Cox, Robert Hancock,
linux-serial, tytso, rmk, linux-kernel
On 7/27/07, Lee Howard <faxguy@howardsilvan.com> wrote:
> Curiously, the session at 38400 bps that skipped 858 bytes... coincided,
> not just in sequence but also in precise timing within the session, with
> a small but noticeable disk load that I caused by grepping through a
> hundred session logs. (I can't reproduce it easily, though, because of
> disk caching.)
`echo 1 > /proc/sys/vm/drop_caches` will clear out most (all?) of what
the kernel has cached from the drive. It's there just for this kind of
repeatability of tests...
Ray
* Re: serial flow control appears broken
2007-07-27 18:22 ` Robert Hancock
` (3 preceding siblings ...)
2007-07-28 9:28 ` Russell King
@ 2007-07-30 9:34 ` Maciej W. Rozycki
4 siblings, 0 replies; 50+ messages in thread
From: Maciej W. Rozycki @ 2007-07-30 9:34 UTC (permalink / raw)
To: Robert Hancock; +Cc: Lee Howard, linux-serial, tytso, rmk, linux-kernel
On Fri, 27 Jul 2007, Robert Hancock wrote:
> > The TTY line discipline driver could do that based on the amount of received
> > data present in its buffer. And it should if asked to (a brief look at
> > drivers/char/n_tty.c reveals it does; obviously there may be a bug
>
> Really, where? In my look through the code I haven't found any mechanism that
> would result in RTS being lowered based on TTY buffers filling up, at least
> not in the 8250 case.
Look for calls to ->throttle() and ->unthrottle(). XON and XOFF might be
used instead as a result of these calls though, depending on terminal
settings.
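Which of the two mechanisms the throttle path uses is chosen by the application through termios. A minimal userspace sketch follows, assuming the standard POSIX termios interface plus the Linux/BSD CRTSCTS flag; the helper names are invented for this example and are not taken from any driver.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* lets glibc's <termios.h> expose CRTSCTS */
#endif
#include <termios.h>

/* Select RTS/CTS handshaking and turn off XON/XOFF in a termios copy. */
void use_rtscts(struct termios *tio)
{
    tio->c_cflag |= CRTSCTS;
    tio->c_iflag &= ~(tcflag_t)(IXON | IXOFF | IXANY);
}

/* Select XON/XOFF handshaking and turn off RTS/CTS in a termios copy. */
void use_xonxoff(struct termios *tio)
{
    tio->c_cflag &= ~(tcflag_t)CRTSCTS;
    tio->c_iflag |= IXON | IXOFF;
}

/* Apply RTS/CTS flow control to an already-open tty.
 * Returns 0 on success, -1 on failure with errno set. */
int set_hw_flow_control(int fd)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) < 0)
        return -1;
    use_rtscts(&tio);
    return tcsetattr(fd, TCSANOW, &tio);
}
```

With CRTSCTS set, a throttle request would be expected to end up as RTS being dropped; with IXOFF set, as an XOFF character being sent. Either way the request is only made once the line discipline has seen its buffer fill, so it cannot help if bytes are lost earlier, in the UART itself.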
> In this situation, though, it appears it's not the TTY buffers that are
> filling but the UART's own buffer. I would think this must be caused by some
> kind of interrupt latency that results in not draining the FIFO in time.
Well, the UART only has its FIFO which is rather small, so automatic flow
control would be useful. Though, admittedly, tty_insert_flip_char() might
return some kind of a status related to how much space is left in the
receive buffer which would indicate that there is a lag in data stream
processing -- which in turn may relate to the system being loaded, so that
the receive ISR could decide whether to negate RTS itself for the less
capable UARTs (i.e. ones with no autoflow and a tiny or no FIFO).
Maciej
* Re: serial flow control appears broken
2007-07-27 18:11 ` Lee Howard
@ 2007-07-30 9:36 ` Maciej W. Rozycki
0 siblings, 0 replies; 50+ messages in thread
From: Maciej W. Rozycki @ 2007-07-30 9:36 UTC (permalink / raw)
To: Lee Howard; +Cc: Robert Hancock, linux-serial, tytso, rmk, linux-kernel
On Fri, 27 Jul 2007, Lee Howard wrote:
> >The serial drivers have nothing to do about it -- all they can do is pushing
> >data upstream, to the discipline driver. They can provide an interface to
> >hardware flow control features though, if implemented by a given UART.
> >
>
> Thank you for this clarification. So I should have more correctly been saying
> that "tty flow control appears broken". Right?
Probably. It might be, as Alan suggested, that it is meant to work, but
the latency kills it.
Maciej
* Re: serial flow control appears broken
2007-07-27 19:05 ` Paul Fulghum
@ 2007-07-30 9:39 ` Maciej W. Rozycki
0 siblings, 0 replies; 50+ messages in thread
From: Maciej W. Rozycki @ 2007-07-30 9:39 UTC (permalink / raw)
To: Paul Fulghum
Cc: Robert Hancock, Lee Howard, linux-serial, tytso, rmk, linux-kernel
On Fri, 27 Jul 2007, Paul Fulghum wrote:
> I can't see anyplace in serial_core.c or 8250.c that sets TTY_OVERRUN.
Look for UART_LSR_OE in 8250.c -- the serial core accepts any bit that
has been defined by the low-level driver and sets TTY_OVERRUN in
uart_insert_char().
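The same overrun events can also be observed from userspace without patching the kernel. The sketch below assumes the Linux TIOCGICOUNT ioctl and struct serial_icounter_struct from <linux/serial.h>; the wrapper struct and function names are hypothetical. It reads the "overrun" counter (bytes lost in the UART) separately from "buf_overrun" (bytes dropped because the tty buffer was full), which is the distinction being discussed in this thread.

```c
#include <sys/ioctl.h>
#include <linux/serial.h>

/* Cumulative overrun counters as reported by the serial driver. */
struct overrun_counts {
    int uart;     /* receiver overruns flagged by the UART itself */
    int buffer;   /* characters dropped because the tty buffer was full */
};

/* Read the counters for an open serial port.
 * Returns 0 on success, -1 on failure with errno set by ioctl(). */
int read_overrun_counts(int fd, struct overrun_counts *out)
{
    struct serial_icounter_struct ic;

    if (ioctl(fd, TIOCGICOUNT, &ic) < 0)
        return -1;
    out->uart = ic.overrun;
    out->buffer = ic.buf_overrun;
    return 0;
}

/* Difference between two samples, e.g. taken before and after a session. */
struct overrun_counts overruns_between(struct overrun_counts before,
                                       struct overrun_counts after)
{
    struct overrun_counts d;

    d.uart = after.uart - before.uart;
    d.buffer = after.buffer - before.buffer;
    return d;
}
```

Sampling once before and once after a fax session and taking the difference would show whether any loss happened in the UART or in the tty buffer.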
Maciej
* Re: serial flow control appears broken
2007-07-28 9:28 ` Russell King
@ 2007-07-30 9:45 ` Maciej W. Rozycki
2007-07-30 9:59 ` Russell King
2007-08-02 14:57 ` Mark Lord
0 siblings, 2 replies; 50+ messages in thread
From: Maciej W. Rozycki @ 2007-07-30 9:45 UTC (permalink / raw)
To: Russell King
Cc: Robert Hancock, Lee Howard, linux-serial, tytso, linux-kernel
On Sat, 28 Jul 2007, Russell King wrote:
> Essentially, any complex interrupt handler (such as an IDE interrupt
> doing a multi-sector PIO transfer _in interrupt context_) can cause this
> kind of starvation. That's why Linux 1.x had bottom halves - so that
> the time consuming work could be moved out of the interrupt handler,
> thereby causing minimal blockage of other interrupts.
>
> Unfortunately, that kind of design has been long since forgotten.
> Apparently modern machines are fast enough that it doesn't have to be
> worried about anymore... Or are they?
I would guess it is not that the machines are fast enough, but that this
two-level processing makes things more complicated. Enough that most
people would not bother digging into it unless really forced. Only
occasional latency problems are probably not enough of a force.
Maciej
* Re: serial flow control appears broken
2007-07-30 9:45 ` Maciej W. Rozycki
@ 2007-07-30 9:59 ` Russell King
2007-08-02 14:57 ` Mark Lord
1 sibling, 0 replies; 50+ messages in thread
From: Russell King @ 2007-07-30 9:59 UTC (permalink / raw)
To: Maciej W. Rozycki
Cc: Robert Hancock, Lee Howard, linux-serial, tytso, linux-kernel
On Mon, Jul 30, 2007 at 10:45:19AM +0100, Maciej W. Rozycki wrote:
> On Sat, 28 Jul 2007, Russell King wrote:
>
> > Essentially, any complex interrupt handler (such as an IDE interrupt
> > doing a multi-sector PIO transfer _in interrupt context_) can cause this
> > kind of starvation. That's why Linux 1.x had bottom halves - so that
> > the time consuming work could be moved out of the interrupt handler,
> > thereby causing minimal blockage of other interrupts.
> >
> > Unfortunately, that kind of design has been long since forgotten.
> > Apparently modern machines are fast enough that it doesn't have to be
> > worried about anymore... Or are they?
>
> I would guess it is not that the machines are fast enough, but that this
> two-level processing makes things more complicated. Enough that most
> people would not bother digging into it unless really forced. Only
> occasional latency problems are probably not enough of a force.
It's a shame we don't have a way to measure IRQ latency - it would be
very useful to flag up problems.
I think the best we could do is to arrange for the timer interrupt to
complain if it's delayed by more than 1ms or so - but some architectures
already run their timers with IRQF_DISABLED as a work around some of
the latency issues.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
* Re: serial flow control appears broken
2007-07-30 9:45 ` Maciej W. Rozycki
2007-07-30 9:59 ` Russell King
@ 2007-08-02 14:57 ` Mark Lord
2007-08-02 16:14 ` Robert Hancock
1 sibling, 1 reply; 50+ messages in thread
From: Mark Lord @ 2007-08-02 14:57 UTC (permalink / raw)
To: Maciej W. Rozycki
Cc: Russell King, Robert Hancock, Lee Howard, linux-serial, tytso,
linux-kernel
Maciej W. Rozycki wrote:
> On Sat, 28 Jul 2007, Russell King wrote:
>
>> Essentially, any complex interrupt handler (such as an IDE interrupt
>> doing a multi-sector PIO transfer _in interrupt context_) can cause this
>> kind of starvation. That's why Linux 1.x had bottom halves - so that
>> the time consuming work could be moved out of the interrupt handler,
>> thereby causing minimal blockage of other interrupts.
>>
>> Unfortunately, that kind of design has been long since forgotten.
>> Apparently modern machines are fast enough that it doesn't have to be
>> worried about anymore... Or are they?
>
> I would guess it is not that the machines are fast enough, but that this
> two-level processing makes things more complicated. Enough that most
> people would not bother digging into it unless really forced. Only
> occasional latency problems are probably not enough of a force.
I don't believe the speed of the machine has much to do with it,
as IDE PIO is always at pretty much the same speed (or slower)
regardless of the CPU speed.
Best case is about .120 usec per 16-bit word, but that doesn't often pan out
in practice. More typical is something closer to 1 usec per 16-bit word.
So, for multcount=16 (very common), best case is 16 * 256 * .120 = 491 usec,
plus extra overhead for reading the IDE status register (another usec or so),
and other stuff. Figure maybe 500usec total per interrupt for multcount=16
in the best case, or 4000usec in the worst case.
At 115200bps, we get a byte every 86 usec or so. Assuming the UART FIFO
is set to interrupt (warn) us at 12/16 full, we have 4*86 = 344 usec to
respond and de-assert RTS. Less than that in practice.
Conclusion: using IDE multisector PIO is not a good idea with high speed
serial transfers happening, since we cannot respond quickly enough.
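The arithmetic above can be checked with a few lines of C. This is a sketch under the same assumptions as the estimate: 10 bits per character on the wire (8N1 framing, which is what a byte every 86 usec at 115200 bps implies), a 16-byte FIFO that interrupts at 12 bytes, and 256 16-bit words per sector. The function names are hypothetical.

```c
/* Microseconds per character on the wire, assuming 10 bits per character. */
double char_time_us(unsigned baud)
{
    return 10.0 * 1e6 / (double)baud;
}

/* Time between the FIFO trigger interrupt and the FIFO being full. */
double fifo_headroom_us(unsigned baud, unsigned fifo_depth, unsigned trigger)
{
    return (double)(fifo_depth - trigger) * char_time_us(baud);
}

/* Duration of one multi-sector PIO burst: sectors * 256 words * time/word. */
double pio_burst_us(unsigned sectors, double us_per_word)
{
    return (double)sectors * 256.0 * us_per_word;
}

/* Nonzero if a PIO burst outlasts the headroom of a 16-byte FIFO
 * that interrupts at 12 bytes. */
int burst_exceeds_headroom(unsigned baud, unsigned sectors, double us_per_word)
{
    return pio_burst_us(sectors, us_per_word) >
           fifo_headroom_us(baud, 16, 12);
}
```

With the best-case 0.120 usec per word, a 16-sector burst fits inside the headroom at 19200 bps but not at 115200 bps. With the more typical 1 usec per word it exceeds the headroom at both rates.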
It might be possible to set the receive FIFO trigger threshold lower in the UART (?).
All that said, I doubt that his system is using IDE PIO in the first place.
Dunno how long IDE DMA interrupts take, but it's probably in the 20-50 usec range.
* Re: serial flow control appears broken
2007-08-02 14:57 ` Mark Lord
@ 2007-08-02 16:14 ` Robert Hancock
2007-08-02 16:29 ` Mark Lord
2007-08-02 16:57 ` Alan Cox
0 siblings, 2 replies; 50+ messages in thread
From: Robert Hancock @ 2007-08-02 16:14 UTC (permalink / raw)
To: Mark Lord
Cc: Maciej W. Rozycki, Russell King, Lee Howard, linux-serial, tytso,
linux-kernel
Mark Lord wrote:
> I don't believe the speed of the machine has much to do with it,
> as IDE PIO is always at pretty much the same speed (or slower)
> regardless of the CPU speed.
>
> Best case is about .120 usec per 16-bit word, but that doesn't often pan
> out
> in practice. More typical is something closer to 1 usec per 16-bit word.
>
> So, for multcount=16 (very common), best case is 16 * 256 * .120 = 491
> usec,
> plus extra overhead for reading the IDE status register (another usec or
> so),
> and other stuff. Figure maybe 500usec total per interrupt for multcount=16
> in the best case, or 4000usec in the worst case.
>
> At 115200bps, we get a byte every 86 usec or so. Assuming the UART FIFO
> is set to interrupt (warn) us at 12/16 full, we have 4*86 = 344 usec to
> respond and de-assert RTS. Less than that in practice.
>
> Conclusion: using IDE multisector PIO is not a good idea with high speed
> serial transfers happening, since we cannot respond quickly enough.
>
> It might be possible to set the buffer underrun threshold lower in the
> UART (?).
>
> All that said, I doubt that his system is using IDE PIO in the first place.
> Dunno how long IDE DMA interrupts take, but it's probably in the 20-50
> usec range.
I think that PIO transfers only have to be done with interrupts disabled
on really old, evil controllers (without unmask set). I don't think
libata ever disables interrupts during transfers(?)
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
* Re: serial flow control appears broken
2007-08-02 16:14 ` Robert Hancock
@ 2007-08-02 16:29 ` Mark Lord
2007-08-02 16:40 ` Robert Hancock
` (2 more replies)
2007-08-02 16:57 ` Alan Cox
1 sibling, 3 replies; 50+ messages in thread
From: Mark Lord @ 2007-08-02 16:29 UTC (permalink / raw)
To: Robert Hancock
Cc: Maciej W. Rozycki, Russell King, Lee Howard, linux-serial, tytso,
linux-kernel
Robert Hancock wrote:
> Mark Lord wrote:
>> I don't believe the speed of the machine has much to do with it,
>> as IDE PIO is always at pretty much the same speed (or slower)
>> regardless of the CPU speed.
>>
>> Best case is about .120 usec per 16-bit word, but that doesn't often
>> pan out
>> in practice. More typical is something closer to 1 usec per 16-bit word.
>>
>> So, for multcount=16 (very common), best case is 16 * 256 * .120 = 491
>> usec,
>> plus extra overhead for reading the IDE status register (another usec
>> or so),
>> and other stuff. Figure maybe 500usec total per interrupt for
>> multcount=16
>> in the best case, or 4000usec in the worst case.
>>
>> At 115200bps, we get a byte every 86 usec or so. Assuming the UART FIFO
>> is set to interrupt (warn) us at 12/16 full, we have 4*86 = 344 usec to
>> respond and de-assert RTS. Less than that in practice.
>>
>> Conclusion: using IDE multisector PIO is not a good idea with high speed
>> serial transfers happening, since we cannot respond quickly enough.
>>
>> It might be possible to set the buffer underrun threshold lower in the
>> UART (?).
>>
>> All that said, I doubt that his system is using IDE PIO in the first
>> place.
>> Dunno how long IDE DMA interrupts take, but it's probably in the 20-50
>> usec range.
>
> I think that PIO transfers only have to be done with interrupts disabled
> on really old, evil controllers (without unmask set). I don't think
> libata ever disables interrupts during transfers(?)
That's what "hdparm -u1" (or -u0) controls.
But it doesn't matter a whit here. The problem is that the IDE interrupt
handling can take a long time, regardless of whether it unmasks IRQs or not.
And if that IDE interrupt interrupts a serial interrupt, then the serial
stuff won't get handled until the IDE stuff completes. Thus the problem.
The "fix" could be to have the serial IRQ handler never unmask interrupts,
but that's a bit unsociable to others. The IDE stuff really needs to not
do so much during the actual IRQ handler.
Ingo's RT patches would probably fix all of this.
* Re: serial flow control appears broken
2007-08-02 16:29 ` Mark Lord
@ 2007-08-02 16:40 ` Robert Hancock
2007-08-02 17:13 ` Alan Cox
2007-08-04 19:38 ` Lee Howard
2 siblings, 0 replies; 50+ messages in thread
From: Robert Hancock @ 2007-08-02 16:40 UTC (permalink / raw)
To: Mark Lord
Cc: Maciej W. Rozycki, Russell King, Lee Howard, linux-serial, tytso,
linux-kernel
Mark Lord wrote:
>> I think that PIO transfers only have to be done with interrupts
>> disabled on really old, evil controllers (without unmask set). I don't
>> think libata ever disables interrupts during transfers(?)
>
> That's what "hdparm -u1" (or -u0) controls.
>
> But it doesn't matter a whit here. The problem is that the IDE interrupt
> handling can take a long time, regardless of whether it unmasks IRQs or
> not.
> And if that IDE interrupt interrupts a serial interrupt, then the serial
> stuff won't get handled until the IDE stuff completes. Thus the problem.
>
> The "fix" could be to have the serial IRQ handler never unmask interrupts,
> but that's a bit unsociable to others. The IDE stuff really needs to not
> do so much during the actual IRQ handler.
>
> Ingo's RT patches would probably fix all of this.
libata also doesn't do the actual PIO transfer from the interrupt
handler like old IDE does, either, and it only disables interrupts for
the transfer if it's transferring to/from high memory.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
* Re: serial flow control appears broken
2007-08-02 16:14 ` Robert Hancock
2007-08-02 16:29 ` Mark Lord
@ 2007-08-02 16:57 ` Alan Cox
2007-08-02 17:02 ` Robert Hancock
2007-08-03 9:32 ` Maciej W. Rozycki
1 sibling, 2 replies; 50+ messages in thread
From: Alan Cox @ 2007-08-02 16:57 UTC (permalink / raw)
To: Robert Hancock
Cc: Mark Lord, Maciej W. Rozycki, Russell King, Lee Howard,
linux-serial, tytso, linux-kernel
> I think that PIO transfers only have to be done with interrupts disabled
> on really old, evil controllers (without unmask set). I don't think
> libata ever disables interrupts during transfers(?)
Currently libata PIO is mostly done in the IRQ path. Albert Lee was doing
some work on that but it's actually very hard to fix without doing polled
PIO.
* Re: serial flow control appears broken
2007-08-02 16:57 ` Alan Cox
@ 2007-08-02 17:02 ` Robert Hancock
2007-08-03 9:32 ` Maciej W. Rozycki
1 sibling, 0 replies; 50+ messages in thread
From: Robert Hancock @ 2007-08-02 17:02 UTC (permalink / raw)
To: Alan Cox
Cc: Mark Lord, Maciej W. Rozycki, Russell King, Lee Howard,
linux-serial, tytso, linux-kernel
Alan Cox wrote:
>> I think that PIO transfers only have to be done with interrupts disabled
>> on really old, evil controllers (without unmask set). I don't think
>> libata ever disables interrupts during transfers(?)
>
> Currently libata PIO is mostly done in the IRQ path. Albert Lee was doing
> some work on that but it's actually very hard to fix without doing polled
> PIO.
Ah, right. Misread the code.
* Re: serial flow control appears broken
2007-08-02 16:29 ` Mark Lord
2007-08-02 16:40 ` Robert Hancock
@ 2007-08-02 17:13 ` Alan Cox
2007-08-04 19:38 ` Lee Howard
2 siblings, 0 replies; 50+ messages in thread
From: Alan Cox @ 2007-08-02 17:13 UTC (permalink / raw)
To: Mark Lord
Cc: Robert Hancock, Maciej W. Rozycki, Russell King, Lee Howard,
linux-serial, tytso, linux-kernel
> That's what "hdparm -u1" (or -u0) controls.
Only some of the time.
> Ingo's RT patches would probably fix all of this.
The worst case IDE times we've seen for executing a single indivisible
un-interruptible I/O cycle with a drive are around 1 ms. That's a hardware
limit.
Alan
* Re: serial flow control appears broken
2007-08-02 16:57 ` Alan Cox
2007-08-02 17:02 ` Robert Hancock
@ 2007-08-03 9:32 ` Maciej W. Rozycki
1 sibling, 0 replies; 50+ messages in thread
From: Maciej W. Rozycki @ 2007-08-03 9:32 UTC (permalink / raw)
To: Alan Cox
Cc: Robert Hancock, Mark Lord, Russell King, Lee Howard,
linux-serial, tytso, linux-kernel
On Thu, 2 Aug 2007, Alan Cox wrote:
> Currently libata PIO is mostly done in the IRQ path. Albert Lee was doing
> some work on that but its actually very hard to fix without doing polled
> PIO.
Hmm, when the drive signals it is ready for a PIO data transfer can't
just the interrupt handler mask the originating interrupt and post a
softirq to handle the case? That should be rather straightforward.
Maciej
* Re: serial flow control appears broken
2007-07-27 17:53 ` Maciej W. Rozycki
2007-07-27 18:11 ` Lee Howard
2007-07-27 18:22 ` Robert Hancock
@ 2007-08-04 18:19 ` Lee Howard
2 siblings, 0 replies; 50+ messages in thread
From: Lee Howard @ 2007-08-04 18:19 UTC (permalink / raw)
To: Maciej W. Rozycki; +Cc: Robert Hancock, linux-serial, tytso, rmk, linux-kernel
Maciej W. Rozycki wrote:
>On Fri, 27 Jul 2007, Lee Howard wrote:
>
>
>
>>Okay, so let's say we've got a loop around a blocking read on the modem file
>>descriptor...
>>
>> for (;;) {
>> read some data from modem
>> process data from modem
>> if (end-of-data detected) break;
>> }
>>
>>Are you suggesting that the application should be deasserting RTS after
>>the read and asserting it before?
>>
>>
>
> It certainly could -- you were asking how it would know. ;-)
>
So, to test... I put this in the application before every read:
int flags;
ioctl(modemFd, TIOCMGET, &flags);
flags |= TIOCM_RTS;
ioctl(modemFd, TIOCMSET, &flags);
and this after:
int flags;
ioctl(modemFd, TIOCMGET, &flags);
flags &= ~TIOCM_RTS;
ioctl(modemFd, TIOCMSET, &flags);
Now I can see the RTS light blink on the modem (and during heavy
communication it merely "dims" depending on the amount of delay in the
processing).
However, it does not help. Data still goes missing.
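For reference, a minimal sketch of the same RTS toggling written with TIOCMBIS and TIOCMBIC, which set or clear individual modem-control bits in one call instead of the TIOCMGET/TIOCMSET read-modify-write shown above. The helper name set_rts is hypothetical and this is only an illustration, not the code the application actually uses. Note that when crtscts is enabled the driver may also drive RTS on its own, so manual toggling like this is a test aid at best.

```c
#include <sys/ioctl.h>

/*
 * Assert (on != 0) or deassert (on == 0) the RTS line on an open
 * serial port file descriptor.
 * Returns 0 on success, -1 on failure (errno is set by ioctl()).
 * set_rts is a hypothetical helper name used for illustration.
 */
int set_rts(int fd, int on)
{
    int bits = TIOCM_RTS;

    /* TIOCMBIS sets the given modem bits, TIOCMBIC clears them. */
    return ioctl(fd, on ? TIOCMBIS : TIOCMBIC, &bits);
}
```

In the read loop this would be called as set_rts(modemFd, 1) before each read and set_rts(modemFd, 0) after it.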
Thanks,
Lee.
* Re: serial flow control appears broken
2007-07-28 16:41 ` Ray Lee
@ 2007-08-04 18:21 ` Lee Howard
2007-08-04 22:07 ` Paul Fulghum
0 siblings, 1 reply; 50+ messages in thread
From: Lee Howard @ 2007-08-04 18:21 UTC (permalink / raw)
To: Ray Lee
Cc: Paul Fulghum, Tilman Schmidt, Alan Cox, Robert Hancock,
linux-serial, tytso, rmk, linux-kernel
Ray Lee wrote:
>On 7/27/07, Lee Howard <faxguy@howardsilvan.com> wrote:
>
>
>>Curiously, the session at 38400 bps that skipped 858 bytes... coincided,
>>not just in sequence but also in precise timing within the session, with
>>a small but noticeable disk load that I caused by grepping through a
>>hundred session logs. (I can't reproduce it easily, though, because of
>>disk caching.)
>>
>>
>
>`echo 1 > /proc/sys/vm/drop_caches` will clear out most (all?) of what
>the kernel has cached from the drive. It's there just for this kind of
>repeatability of tests...
>
And in repeat tests it is quite evident that IDE disk activity is,
indeed, at least part of the problem. As IDE disk activity increases an
increased amount of data coming in on the serial port goes missing.
Thanks,
Lee.
* Re: serial flow control appears broken
2007-08-02 16:29 ` Mark Lord
2007-08-02 16:40 ` Robert Hancock
2007-08-02 17:13 ` Alan Cox
@ 2007-08-04 19:38 ` Lee Howard
2 siblings, 0 replies; 50+ messages in thread
From: Lee Howard @ 2007-08-04 19:38 UTC (permalink / raw)
To: Mark Lord
Cc: Robert Hancock, Maciej W. Rozycki, Russell King, linux-serial,
tytso, linux-kernel
Mark Lord wrote:
> The "fix" could be to have the serial IRQ handler never unmask
> interrupts,
> but that's a bit unsociable to others. The IDE stuff really needs to not
> do so much during the actual IRQ handler.
>
> Ingo's RT patches would probably fix all of this.
I did a Fedora 7 installation and installed Ingo's kernel from here:
http://people.redhat.com/mingo/realtime-preempt/yum-testing/yum/i686/kernel-rt-2.6.21-0182.rt11cfsv17.i686.rpm
Even then, the problem still occurs, unfortunately.
Thanks,
Lee.
* Re: serial flow control appears broken
2007-08-04 18:21 ` Lee Howard
@ 2007-08-04 22:07 ` Paul Fulghum
2007-08-05 0:00 ` Lee Howard
0 siblings, 1 reply; 50+ messages in thread
From: Paul Fulghum @ 2007-08-04 22:07 UTC (permalink / raw)
To: Lee Howard
Cc: Ray Lee, Tilman Schmidt, Alan Cox, Robert Hancock, linux-serial,
tytso, rmk, linux-kernel
Lee Howard wrote:
> And in repeat tests it is quite evident that IDE disk activity is,
> indeed, at least part of the problem. As IDE disk activity increases an
> increased amount of data coming in on the serial port goes missing.
Lee, you mentioned 2.2.x kernels did not exhibit this problem.
Was this on the same hardware you are currently testing?
Which 2.2.x version were you using?
Was the 2.2.x serial driver also identifying the UART as a 16550A?
Can you get /proc/interrupts output
from both the current setup and the 2.2.x setup?
It would be interesting to compare the interrupt
assignment and UART setup between the versions.
--
Paul
* Re: serial flow control appears broken
2007-08-04 22:07 ` Paul Fulghum
@ 2007-08-05 0:00 ` Lee Howard
2007-08-05 14:52 ` Paul Fulghum
0 siblings, 1 reply; 50+ messages in thread
From: Lee Howard @ 2007-08-05 0:00 UTC (permalink / raw)
To: Paul Fulghum
Cc: Ray Lee, Tilman Schmidt, Alan Cox, Robert Hancock, linux-serial,
tytso, rmk, linux-kernel
Paul Fulghum wrote:
> Lee Howard wrote:
>
>> And in repeat tests it is quite evident that IDE disk activity is,
>> indeed, at least part of the problem. As IDE disk activity increases
>> an increased amount of data coming in on the serial port goes missing.
>
>
> Lee, you mentioned 2.2.x kernels did not exhibit this problem.
>
> Was this on the same hardware you are currently testing?
Yes it was... except for the hard drive. I have different installs of
different operating systems on different hard drives. I change the hard
drive when switching between 2.2.5 and 2.6.5.
> Which 2.2.x version were you using?
The default 2.2.5 kernel that comes with RedHat 6.0.
> Was the 2.2.x serial driver also identifying the UART as a 16550A?
Yes it does.
> Can you get /proc/interrupts output
> from both the current setup and the 2.2.x setup?
Current (2.6.5):
           CPU0
  0:   14660696          XT-PIC  timer
  1:          8          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  3:    1240314          XT-PIC  serial
  4:     778901          XT-PIC  serial
  8:          1          XT-PIC  rtc
 10:     111647          XT-PIC  eth0
 14:     221202          XT-PIC  ide0
 15:         34          XT-PIC  ide1
NMI:          0
ERR:          5
(2.2.5):
           CPU0
  0:       5908          XT-PIC  timer
  1:         88          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  8:          2          XT-PIC  rtc
 10:         38          XT-PIC  Intel EtherExpress Pro 10/100 Ethernet
 13:          1          XT-PIC  fpu
 14:      36637          XT-PIC  ide0
 15:          4          XT-PIC  ide1
NMI:          0
Thanks,
Lee.
* Re: serial flow control appears broken
2007-08-05 0:00 ` Lee Howard
@ 2007-08-05 14:52 ` Paul Fulghum
0 siblings, 0 replies; 50+ messages in thread
From: Paul Fulghum @ 2007-08-05 14:52 UTC (permalink / raw)
To: Lee Howard
Cc: Ray Lee, Tilman Schmidt, Alan Cox, Robert Hancock, linux-serial,
tytso, rmk, linux-kernel
2.2.5 is using the same UART setup (trigger level of 8) as
the current code. There is no obvious difference in the
interrupt setup (same devices on the same interrupts).
So I have no helpful suggestions :-(
--
Paul
* Re: serial flow control appears broken
2007-07-27 13:45 ` Tilman Schmidt
2007-07-27 20:05 ` Lee Howard
@ 2007-08-27 20:38 ` Paul Fulghum
2007-07-27 20:48 ` Lee Howard
1 sibling, 1 reply; 50+ messages in thread
From: Paul Fulghum @ 2007-08-27 20:38 UTC (permalink / raw)
To: Tilman Schmidt
Cc: Lee Howard, Alan Cox, Robert Hancock, linux-serial, tytso, rmk,
linux-kernel
Tilman Schmidt wrote:
> Could this be related?
>
> http://lkml.org/lkml/2007/7/18/245
>
> Quote:
> "I've recently found (using 2.6.21.4) that configuring a serial ports
> (ST16654) which use the 8250 driver using setserial results in the
> UART's FIFOs being disabled (unless you specify autoconfig)."
That would make sense.
Lee's error is a hardware FIFO overrun which could occur
if the FIFO is being disabled as described in your
link (by trying to set the uart type with setserial).
Since the tty flow control is only triggered
by the line discipline in response to ldisc
buffer levels and not hardware FIFO overruns,
you would never see any flow control action
as reported by Lee.
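One way to tell the two cases apart from user space is the TIOCGICOUNT ioctl, which on Linux serial drivers that support it fills a struct serial_icounter_struct with separate overrun (UART-level) and buf_overrun (buffer-level) counters. The following is a minimal sketch under that assumption; the helper name read_overruns is hypothetical, and whether a given driver and kernel version maintain both counters is not confirmed anywhere in this thread.

```c
#include <sys/ioctl.h>
#include <linux/serial.h>

/*
 * Read the receive overrun counters for an open serial port.
 * On success returns 0 and stores the UART (hardware FIFO) overrun
 * count and the buffer overrun count through the two pointers.
 * Returns -1 if the ioctl fails (errno is set by ioctl()).
 * read_overruns is a hypothetical helper name used for illustration.
 */
int read_overruns(int fd, int *fifo_overruns, int *buffer_overruns)
{
    struct serial_icounter_struct ic;

    if (ioctl(fd, TIOCGICOUNT, &ic) < 0)
        return -1;

    *fifo_overruns = ic.overrun;        /* UART could not be emptied in time */
    *buffer_overruns = ic.buf_overrun;  /* no buffer space left for the data */
    return 0;
}
```

If only the first counter grows while data goes missing, that would point at the hardware FIFO case described above, where the line discipline never gets the chance to throttle.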
--
Paul Fulghum
Microgate Systems, Ltd.
* Re: serial flow control appears broken
2007-07-27 11:56 ` Alan Cox
@ 2007-07-27 18:00 ` Lee Howard
0 siblings, 0 replies; 50+ messages in thread
From: Lee Howard @ 2007-07-27 18:00 UTC (permalink / raw)
To: Alan Cox; +Cc: Uwe Kleine-König, linux-serial, linux-kernel
Alan Cox wrote:
>As the flow control is driven by software on most 16x50 chips (there are
>a couple of exceptions) if we fail to empty the fifo fast enough then any
>flow control will be asserted too late to save the day.
>
>If you stop the application and do the following
>
> cat /dev/ttywhatever
> ^Z
> [stopped]
>
>(so you are asking the OS to buffer data but not ever reading it)
>
>and then fire data at it does the flow control eventually occur ?
>
Yes it does appear to. I told the application to simply sleep(300) at
the appropriate moment, and I watched the application and when it began
the sleep I ran:
cat /dev/ttyS1
(lots of "garbage" began spewing forth)
^Z
(about 2 or 3 seconds and the RTS light goes dark)
Thanks,
Lee.
* Re: serial flow control appears broken
2007-07-27 6:17 ` Lee Howard
@ 2007-07-27 11:56 ` Alan Cox
2007-07-27 18:00 ` Lee Howard
0 siblings, 1 reply; 50+ messages in thread
From: Alan Cox @ 2007-07-27 11:56 UTC (permalink / raw)
To: Lee Howard; +Cc: Uwe Kleine-König, linux-serial, linux-kernel
> -parenb -parodd cs8 -hupcl -cstopb cread clocal crtscts
Ok so crtscts is set, but you have clocal set too. That shouldn't matter.
> Using software flow control this is what stty tells me about the port
> set up done by the application:
This also looks fine
> They seem correct to me, but I am certainly willing to be wrong.
clocal set as well is unusual but if I remember the spec right then
clocal would not interfere with rts/cts handshake and certainly not with
xon/xoff
Looks correct, two boards so it's unlikely both didn't wire it.
> A quick google on "input overrun(s)" may lend some credence (although,
> certainly this is not in any way conclusive) that I'm not the only one
> who may be seeking a solution on this matter.
>
> http://www.google.com/search?hl=en&q=%2B%22input+overrun%28s%29%22
Those look different on the whole - there are two reasons you'll get an
input overrun with a 16x50 UART. The first is because we ran out of
buffers to empty the chip, in which case we would have asserted flow
control in software. The second is if we cannot keep up and fail to empty
the on-chip FIFO within the required time (about 1 ms).
As the flow control is driven by software on most 16x50 chips (there are
a couple of exceptions) if we fail to empty the fifo fast enough then any
flow control will be asserted too late to save the day.
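The "about 1 ms" figure can be checked with a little arithmetic. The sketch below assumes 10 bits per character on the wire (start bit, 8 data bits, one stop bit, matching the cs8 -parenb -cstopb settings), a 16-byte receive FIFO as on a 16550A, and a receive trigger level of 8. The function names are hypothetical, and the calculation ignores the receiver shift register, which buys roughly one further character time before an overrun is actually flagged.

```c
/* Time in microseconds to receive one character at the given line rate. */
double char_time_us(unsigned baud, unsigned bits_per_char)
{
    return (double)bits_per_char * 1e6 / (double)baud;
}

/*
 * Time in microseconds between the receive interrupt (raised when
 * trigger_level characters are in the FIFO) and the FIFO being full,
 * assuming the sender keeps transmitting back to back.
 */
double fifo_headroom_us(unsigned baud, unsigned bits_per_char,
                        unsigned fifo_size, unsigned trigger_level)
{
    return (double)(fifo_size - trigger_level)
           * char_time_us(baud, bits_per_char);
}
```

Under these assumptions one character takes about 86.8 microseconds at 115200 bps, so fifo_headroom_us(115200, 10, 16, 8) comes to roughly 694 microseconds, and filling all 16 bytes takes about 1.39 ms. Any interrupt latency longer than that headroom loses data regardless of what flow control does afterwards.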
If you stop the application and do the following
cat /dev/ttywhatever
^Z
[stopped]
(so you are asking the OS to buffer data but not ever reading it)
and then fire data at it does the flow control eventually occur ?
Alan
* Re: serial flow control appears broken
2007-07-26 16:41 ` Alan Cox
@ 2007-07-27 6:17 ` Lee Howard
2007-07-27 11:56 ` Alan Cox
0 siblings, 1 reply; 50+ messages in thread
From: Lee Howard @ 2007-07-27 6:17 UTC (permalink / raw)
To: Alan Cox; +Cc: Uwe Kleine-König, linux-serial, tytso, rmk, linux-kernel
Alan Cox wrote:
>>The manufacturer is using a scope to look for RTS and they're not seeing
>>it, either. I just use my eyes to look at the LED, but I can see the
>>CTS, DTR, DCD, RD, and TD lights blink, flicker, or dim... (and TD, RD,
>>and CTS tend to go on and off rather quickly).
>>
>>
>
>And you have
>
>1. The port set up correctly for flow control options in the
>kernel ?
>
>
I suppose that you mean that the application has properly set up the
port using termios/tcsetattr/ioctl and the like... rather than if the
kernel build/config options were set to permit flow control (I know of
no relevant flow-control-enabling kernel build options). Using hardware
flow control this is what stty tells me about the port set up done by
the application:
# stty -F /dev/ttyS1 -a
speed 115200 baud; rows 0; columns 0; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>;
eol2 = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R; werase =
^W; lnext = ^V; flush = ^O;
min = 1; time = 0;
-parenb -parodd cs8 -hupcl -cstopb cread clocal crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr -icrnl
-ixon -ixoff -iuclc -ixany -imaxbel
-opost -olcuc -ocrnl -onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0
bs0 vt0 ff0
-isig -icanon -iexten -echo -echoe -echok -echonl -noflsh -xcase -tostop
-echoprt -echoctl -echoke
#
Using software flow control this is what stty tells me about the port
set up done by the application:
# stty -F /dev/ttyS1 -a
speed 115200 baud; rows 0; columns 0; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>;
eol2 = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R; werase =
^W; lnext = ^V; flush = ^O;
min = 1; time = 0;
-parenb -parodd cs8 -hupcl -cstopb cread clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr -icrnl ixon
ixoff -iuclc -ixany -imaxbel
-opost -olcuc -ocrnl -onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0
bs0 vt0 ff0
-isig -icanon -iexten -echo -echoe -echok -echonl -noflsh -xcase -tostop
-echoprt -echoctl -echoke
#
They seem correct to me, but I am certainly willing to be wrong.
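For comparison, the two stty listings above differ only in the crtscts, ixon and ixoff flags. A minimal sketch of how an application might select one mode or the other in a struct termios follows; the function names are hypothetical and this is not taken from the HylaFAX sources. The caller would obtain the structure with tcgetattr() and apply it with tcsetattr().

```c
#define _DEFAULT_SOURCE /* exposes CRTSCTS on glibc */
#include <termios.h>

/* Select RTS/CTS hardware flow control and turn XON/XOFF off. */
void use_hw_flow_control(struct termios *t)
{
    t->c_cflag |= CRTSCTS;
    t->c_iflag &= ~(IXON | IXOFF | IXANY);
}

/* Select XON/XOFF software flow control and turn RTS/CTS off. */
void use_sw_flow_control(struct termios *t)
{
    t->c_cflag &= ~CRTSCTS;
    t->c_iflag |= IXON | IXOFF;
    t->c_iflag &= ~IXANY;
}
```

Either setting only tells the tty layer which mechanism to use once its buffers fill up; it does not change how quickly the UART itself is serviced.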
>2. Verified that the board vendor remembered to wire it ?
>
I don't know how to verify directly that the board manufacturer wired
the serial port correctly. I've tested this on two different
motherboards made several years apart (but, yes, both were made by the
same manufacturer). However, when using RedHat 6.0 (kernel 2.2.5) I
have no problems with data corruption occurring in the data coming from
the DCE. So that tells me that *something* was working before that
isn't working now... and I'm trying to determine what the difference
is... whether it be a problem in modern kernels or whether it be
something that the application (HylaFAX) is not doing to accommodate
whatever changes occurred in modern kernels.
A quick google on "input overrun(s)" may lend some credence (although,
certainly this is not in any way conclusive) that I'm not the only one
who may be seeking a solution on this matter.
http://www.google.com/search?hl=en&q=%2B%22input+overrun%28s%29%22
Thanks,
Lee.
* Re: serial flow control appears broken
2007-07-26 16:28 ` Lee Howard
@ 2007-07-26 16:41 ` Alan Cox
2007-07-27 6:17 ` Lee Howard
0 siblings, 1 reply; 50+ messages in thread
From: Alan Cox @ 2007-07-26 16:41 UTC (permalink / raw)
To: Lee Howard; +Cc: Uwe Kleine-König, linux-serial, tytso, rmk, linux-kernel
> The manufacturer is using a scope to look for RTS and they're not seeing
> it, either. I just use my eyes to look at the LED, but I can see the
> CTS, DTR, DCD, RD, and TD lights blink, flicker, or dim... (and TD, RD,
> and CTS tend to go on and off rather quickly).
And you have
1. The port set up correctly for flow control options in the
kernel ?
2. Verified that the board vendor remembered to wire it ?
* Re: serial flow control appears broken
2007-07-26 12:34 ` Uwe Kleine-König
@ 2007-07-26 16:28 ` Lee Howard
2007-07-26 16:41 ` Alan Cox
0 siblings, 1 reply; 50+ messages in thread
From: Lee Howard @ 2007-07-26 16:28 UTC (permalink / raw)
To: Uwe Kleine-König; +Cc: linux-serial, tytso, rmk, linux-kernel
Uwe Kleine-König wrote:
>Hello,
>
>
>
>>This is evidenced in hardware flow control by a little LED labeled "RTS"
>>that is on the external modem. This LED lights up when pin 7 of the DB9
>>serial connection is given +12Vdc current (signalling "RTS" is on - that
>>the host can accept data). The LED goes dark when the current is
>>removed (signalling that the host cannot accept data). This "RTS" LED
>>never flickers at all, as it should, when receiving these bursts of data
>>- the LED stays lit as long as the serial cable is connected to the
>>host... and yet I will see those "input overrun" messages. Thus, it
>>seems quite clear that the Linux serial tty driver is not deasserting
>>RTS as it should in hardware flow control. (And probably the analogous
>>problem exists in software flow control, too.)
>>
>>
>I don't know the relevant timings for the problem, but just to be sure that
>your prerequisites are correct: How did you check that the LED stays
>lit all the time? Just from looking might not be accurate. You might
>want to measure the signal with an oscilloscope.
>
The manufacturer is using a scope to look for RTS and they're not seeing
it, either. I just use my eyes to look at the LED, but I can see the
CTS, DTR, DCD, RD, and TD lights blink, flicker, or dim... (and TD, RD,
and CTS tend to go on and off rather quickly).
All of that said... even though I don't see RTS flicker or blink or dim
when using kernel 2.2.5 (RedHat 6.0) I don't have any problems using
115200 bps DTE-DCE communication rate.
Thanks,
Lee.
* Re: serial flow control appears broken
2007-07-26 1:52 Lee Howard
@ 2007-07-26 12:34 ` Uwe Kleine-König
2007-07-26 16:28 ` Lee Howard
0 siblings, 1 reply; 50+ messages in thread
From: Uwe Kleine-König @ 2007-07-26 12:34 UTC (permalink / raw)
To: Lee Howard; +Cc: linux-serial, tytso, rmk, linux-kernel
Hello,
> This is evidenced in hardware flow control by a little LED labeled "RTS"
> that is on the external modem. This LED lights up when pin 7 of the DB9
> serial connection is given +12Vdc current (signalling "RTS" is on - that
> the host can accept data). The LED goes dark when the current is
> removed (signalling that the host cannot accept data). This "RTS" LED
> never flickers at all, as it should, when receiving these bursts of data
> - the LED stays lit as long as the serial cable is connected to the
> host... and yet I will see those "input overrun" messages. Thus, it
> seems quite clear that the Linux serial tty driver is not deasserting
> RTS as it should in hardware flow control. (And probably the analogous
> problem exists in software flow control, too.)
I don't know the relevant timings for the problem, but just to be sure that
your prerequisites are correct: How did you check that the LED stays
lit all the time? Just from looking might not be accurate. You might
want to measure the signal with an oscilloscope.
Just my 0.02¢
Uwe
--
Uwe Kleine-König
fib where fib = 0 : 1 : zipWith (+) fib (tail fib)
* serial flow control appears broken
@ 2007-07-26 1:52 Lee Howard
2007-07-26 12:34 ` Uwe Kleine-König
0 siblings, 1 reply; 50+ messages in thread
From: Lee Howard @ 2007-07-26 1:52 UTC (permalink / raw)
To: linux-serial, tytso, rmk, linux-kernel
Hello.
I have fax modems that will, in their proper behavior with certain
features, send up to 64 kilobytes of data to the host DTE all at once.
(So, the fax modem handles an incoming fax and periodically will send
between 256 bytes and 64 kilobytes of data in bursts.)
When the DCE-DTE (modem-to-host) communication rate is established at
115200 bps, data loss occurs on systems using at least Linux kernels 2.6.5
and 2.6.18 (and probably everything in-between and then some more). This
is because the modem overflows the host's buffer. This is evidenced in
kernel logging:
Jul 23 14:01:30 gollum kernel: ttyS1: 1 input overrun(s)
Jul 23 17:09:45 gollum kernel: ttyS1: 1 input overrun(s)
Normally I would blame the modem itself for not honoring the host's flow
control signals. However, I have worked with the modem manufacturer
closely on this matter for over three months now. In that process they
have improved the responsiveness of the modem and have fixed other
problems, but the end result is that it truly does appear that the
serial tty driver is not using flow control. Whether software flow
control (XON/XOFF) or hardware flow control (RTS/CTS) is used the result
is the same.
This is evidenced in hardware flow control by a little LED labeled "RTS"
that is on the external modem. This LED lights up when pin 7 of the DB9
serial connection is given +12Vdc current (signalling "RTS" is on - that
the host can accept data). The LED goes dark when the current is
removed (signalling that the host cannot accept data). This "RTS" LED
never flickers at all, as it should, when receiving these bursts of data
- the LED stays lit as long as the serial cable is connected to the
host... and yet I will see those "input overrun" messages. Thus, it
seems quite clear that the Linux serial tty driver is not deasserting
RTS as it should in hardware flow control. (And probably the analogous
problem exists in software flow control, too.)
Please tell me what I can do to help you resolve and/or remedy this
matter. Also, please let me know if I have contacted the wrong people.
(I have cross-posted to linux-kernel as a catch-all. I am not
subscribed to either linux-serial or linux-kernel mailing lists. So
please CC me in any list responses.)
If it is of any value to know (perhaps they have common code?), the same
error occurs on FreeBSD 6.2 as well. The problem does not occur on
Windows. The problem does not occur on RedHat 6.0 (kernel 2.2.5).
Thanks,
Lee.