Netdev Archive on lore.kernel.org
* Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected @ 2021-08-05 20:53 Martin Zaharinov 2021-08-06 4:40 ` Greg KH 0 siblings, 1 reply; 23+ messages in thread From: Martin Zaharinov @ 2021-08-05 20:53 UTC (permalink / raw) To: netdev, gregkh, Eric Dumazet Hi Net dev team Please check this error : Last time I write for this problem : https://www.spinics.net/lists/netdev/msg707513.html But not find any solution. Config of server is : Bonding port channel (LACP) > Accel PPP server > Huawei switch. Server is work fine users is down/up 500+ users . But in one moment server make spike and affect other vlans in same server . And in accel I see many row with this error. Is there options to find and fix this bug. With accel team I discus this problem and they claim it is kernel bug and need to find solution with Kernel dev team. [2021-08-05 13:52:05.294] vlan912: 24b205903d09718e: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-08-05 13:52:05.298] vlan912: 24b205903d097162: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-08-05 13:52:05.626] vlan641: 24b205903d09711b: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-08-05 13:52:11.000] vlan912: 24b205903d097105: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-08-05 13:52:17.852] vlan912: 24b205903d0971ae: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-08-05 13:52:21.113] vlan641: 24b205903d09715b: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-08-05 13:52:27.963] vlan912: 24b205903d09718d: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-08-05 13:52:30.249] vlan496: 24b205903d097184: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-08-05 13:52:30.992] vlan420: 24b205903d09718a: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-08-05 13:52:33.937] vlan640: 24b205903d0971cd: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-08-05 13:52:40.032] vlan912: 24b205903d097182: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-08-05 13:52:40.420] vlan912: 24b205903d0971d5: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-08-05 13:52:42.799] vlan912: 24b205903d09713a: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-08-05 13:52:42.799] vlan614: 24b205903d0971e5: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-08-05 13:52:43.102] vlan912: 24b205903d097190: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-08-05 13:52:43.850] vlan479: 24b205903d097153: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-08-05 13:52:43.850] vlan479: 24b205903d097141: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-08-05 13:52:43.852] vlan912: 24b205903d097198: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-08-05 13:52:43.977] vlan637: 24b205903d097148: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-08-05 13:52:44.528] vlan637: 24b205903d0971c3: ioctl(PPPIOCCONNECT): Transport endpoint is not connected Martin ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected 2021-08-05 20:53 Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected Martin Zaharinov @ 2021-08-06 4:40 ` Greg KH 2021-08-06 5:40 ` Martin Zaharinov 2021-08-08 15:14 ` Martin Zaharinov 0 siblings, 2 replies; 23+ messages in thread From: Greg KH @ 2021-08-06 4:40 UTC (permalink / raw) To: Martin Zaharinov; +Cc: netdev, Eric Dumazet On Thu, Aug 05, 2021 at 11:53:50PM +0300, Martin Zaharinov wrote: > Hi Net dev team > > > Please check this error : > Last time I write for this problem : https://www.spinics.net/lists/netdev/msg707513.html > > But not find any solution. > > Config of server is : Bonding port channel (LACP) > Accel PPP server > Huawei switch. > > Server is work fine users is down/up 500+ users . > But in one moment server make spike and affect other vlans in same server . > And in accel I see many row with this error. > > Is there options to find and fix this bug. > > With accel team I discus this problem and they claim it is kernel bug and need to find solution with Kernel dev team. > > > [2021-08-05 13:52:05.294] vlan912: 24b205903d09718e: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-08-05 13:52:05.298] vlan912: 24b205903d097162: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-08-05 13:52:05.626] vlan641: 24b205903d09711b: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-08-05 13:52:11.000] vlan912: 24b205903d097105: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-08-05 13:52:17.852] vlan912: 24b205903d0971ae: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-08-05 13:52:21.113] vlan641: 24b205903d09715b: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-08-05 13:52:27.963] vlan912: 24b205903d09718d: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-08-05 13:52:30.249] vlan496: 24b205903d097184: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-08-05 13:52:30.992] vlan420: 24b205903d09718a: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-08-05 13:52:33.937] vlan640: 24b205903d0971cd: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-08-05 13:52:40.032] vlan912: 24b205903d097182: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-08-05 13:52:40.420] vlan912: 24b205903d0971d5: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-08-05 13:52:42.799] vlan912: 24b205903d09713a: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-08-05 13:52:42.799] vlan614: 24b205903d0971e5: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-08-05 13:52:43.102] vlan912: 24b205903d097190: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-08-05 13:52:43.850] vlan479: 24b205903d097153: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-08-05 13:52:43.850] vlan479: 24b205903d097141: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-08-05 13:52:43.852] vlan912: 24b205903d097198: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-08-05 13:52:43.977] vlan637: 24b205903d097148: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-08-05 13:52:44.528] vlan637: 24b205903d0971c3: ioctl(PPPIOCCONNECT): Transport endpoint is not connected These are userspace error messages, not kernel messages. What kernel version are you using? 
thanks, greg k-h ^ permalink raw reply [flat|nested] 23+ messages in thread
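As Greg notes, the "Transport endpoint is not connected" text is produced in user space: the kernel only returns -ENOTCONN from the ioctl(), and the daemon formats it with strerror(). A minimal sketch of how a PPP daemon ends up logging such a line (the function and variable names here are illustrative, not accel-ppp's actual code):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/ppp-ioctl.h>

/* chan_fd: a /dev/ppp descriptor attached to a ppp channel
 * unit:    the ppp unit (pppN interface) to bind that channel to */
static int connect_channel_to_unit(int chan_fd, int unit)
{
    if (ioctl(chan_fd, PPPIOCCONNECT, &unit) < 0) {
        /* errno == ENOTCONN prints "Transport endpoint is not connected" */
        fprintf(stderr, "ioctl(PPPIOCCONNECT): %s\n", strerror(errno));
        return -1;
    }
    return 0;
}

So the open question is not where the string comes from, but why the kernel rejects PPPIOCCONNECT with ENOTCONN in the first place.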
* Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected 2021-08-06 4:40 ` Greg KH @ 2021-08-06 5:40 ` Martin Zaharinov 2021-08-08 15:14 ` Martin Zaharinov 1 sibling, 0 replies; 23+ messages in thread From: Martin Zaharinov @ 2021-08-06 5:40 UTC (permalink / raw) To: Greg KH; +Cc: netdev, Eric Dumazet Hi Greg Latest kernel 5.13.8. I try old version from 5.10 to 5.13 and its same error. Martin > On 6 Aug 2021, at 7:40, Greg KH <gregkh@linuxfoundation.org> wrote: > > On Thu, Aug 05, 2021 at 11:53:50PM +0300, Martin Zaharinov wrote: >> Hi Net dev team >> >> >> Please check this error : >> Last time I write for this problem : https://www.spinics.net/lists/netdev/msg707513.html >> >> But not find any solution. >> >> Config of server is : Bonding port channel (LACP) > Accel PPP server > Huawei switch. >> >> Server is work fine users is down/up 500+ users . >> But in one moment server make spike and affect other vlans in same server . >> And in accel I see many row with this error. >> >> Is there options to find and fix this bug. >> >> With accel team I discus this problem and they claim it is kernel bug and need to find solution with Kernel dev team. >> >> >> [2021-08-05 13:52:05.294] vlan912: 24b205903d09718e: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:05.298] vlan912: 24b205903d097162: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:05.626] vlan641: 24b205903d09711b: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:11.000] vlan912: 24b205903d097105: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:17.852] vlan912: 24b205903d0971ae: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:21.113] vlan641: 24b205903d09715b: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:27.963] vlan912: 24b205903d09718d: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:30.249] vlan496: 24b205903d097184: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:30.992] vlan420: 24b205903d09718a: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:33.937] vlan640: 24b205903d0971cd: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:40.032] vlan912: 24b205903d097182: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:40.420] vlan912: 24b205903d0971d5: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:42.799] vlan912: 24b205903d09713a: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:42.799] vlan614: 24b205903d0971e5: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:43.102] vlan912: 24b205903d097190: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:43.850] vlan479: 24b205903d097153: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:43.850] vlan479: 24b205903d097141: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:43.852] vlan912: 24b205903d097198: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:43.977] vlan637: 24b205903d097148: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:44.528] vlan637: 24b205903d0971c3: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > > These are userspace error messages, not kernel messages. 
> > What kernel version are you using? > > thanks, > > greg k-h ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected 2021-08-06 4:40 ` Greg KH 2021-08-06 5:40 ` Martin Zaharinov @ 2021-08-08 15:14 ` Martin Zaharinov 2021-08-08 15:23 ` Pali Rohár 1 sibling, 1 reply; 23+ messages in thread From: Martin Zaharinov @ 2021-08-08 15:14 UTC (permalink / raw) To: Greg KH; +Cc: netdev, Eric Dumazet, pali Add Pali Rohár, If have any idea . Martin > On 6 Aug 2021, at 7:40, Greg KH <gregkh@linuxfoundation.org> wrote: > > On Thu, Aug 05, 2021 at 11:53:50PM +0300, Martin Zaharinov wrote: >> Hi Net dev team >> >> >> Please check this error : >> Last time I write for this problem : https://www.spinics.net/lists/netdev/msg707513.html >> >> But not find any solution. >> >> Config of server is : Bonding port channel (LACP) > Accel PPP server > Huawei switch. >> >> Server is work fine users is down/up 500+ users . >> But in one moment server make spike and affect other vlans in same server . >> And in accel I see many row with this error. >> >> Is there options to find and fix this bug. >> >> With accel team I discus this problem and they claim it is kernel bug and need to find solution with Kernel dev team. >> >> >> [2021-08-05 13:52:05.294] vlan912: 24b205903d09718e: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:05.298] vlan912: 24b205903d097162: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:05.626] vlan641: 24b205903d09711b: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:11.000] vlan912: 24b205903d097105: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:17.852] vlan912: 24b205903d0971ae: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:21.113] vlan641: 24b205903d09715b: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:27.963] vlan912: 24b205903d09718d: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:30.249] vlan496: 24b205903d097184: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:30.992] vlan420: 24b205903d09718a: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:33.937] vlan640: 24b205903d0971cd: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:40.032] vlan912: 24b205903d097182: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:40.420] vlan912: 24b205903d0971d5: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:42.799] vlan912: 24b205903d09713a: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:42.799] vlan614: 24b205903d0971e5: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:43.102] vlan912: 24b205903d097190: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:43.850] vlan479: 24b205903d097153: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:43.850] vlan479: 24b205903d097141: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:43.852] vlan912: 24b205903d097198: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:43.977] vlan637: 24b205903d097148: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-08-05 13:52:44.528] vlan637: 24b205903d0971c3: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > > These are userspace error messages, not kernel messages. 
> > What kernel version are you using? > > thanks, > > greg k-h ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected 2021-08-08 15:14 ` Martin Zaharinov @ 2021-08-08 15:23 ` Pali Rohár 2021-08-08 15:29 ` Martin Zaharinov 0 siblings, 1 reply; 23+ messages in thread From: Pali Rohár @ 2021-08-08 15:23 UTC (permalink / raw) To: Martin Zaharinov; +Cc: Greg KH, netdev, Eric Dumazet Hello! On Sunday 08 August 2021 18:14:09 Martin Zaharinov wrote: > Add Pali Rohár, > > If have any idea . > > Martin > > > On 6 Aug 2021, at 7:40, Greg KH <gregkh@linuxfoundation.org> wrote: > > > > On Thu, Aug 05, 2021 at 11:53:50PM +0300, Martin Zaharinov wrote: > >> Hi Net dev team > >> > >> > >> Please check this error : > >> Last time I write for this problem : https://www.spinics.net/lists/netdev/msg707513.html > >> > >> But not find any solution. > >> > >> Config of server is : Bonding port channel (LACP) > Accel PPP server > Huawei switch. > >> > >> Server is work fine users is down/up 500+ users . > >> But in one moment server make spike and affect other vlans in same server . When this error started to happen? After kernel upgrade? After pppd upgrade? Or after system upgrade? Or when more users started to connecting? > >> And in accel I see many row with this error. > >> > >> Is there options to find and fix this bug. > >> > >> With accel team I discus this problem and they claim it is kernel bug and need to find solution with Kernel dev team. > >> > >> > >> [2021-08-05 13:52:05.294] vlan912: 24b205903d09718e: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >> [2021-08-05 13:52:05.298] vlan912: 24b205903d097162: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >> [2021-08-05 13:52:05.626] vlan641: 24b205903d09711b: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >> [2021-08-05 13:52:11.000] vlan912: 24b205903d097105: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >> [2021-08-05 13:52:17.852] vlan912: 24b205903d0971ae: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >> [2021-08-05 13:52:21.113] vlan641: 24b205903d09715b: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >> [2021-08-05 13:52:27.963] vlan912: 24b205903d09718d: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >> [2021-08-05 13:52:30.249] vlan496: 24b205903d097184: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >> [2021-08-05 13:52:30.992] vlan420: 24b205903d09718a: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >> [2021-08-05 13:52:33.937] vlan640: 24b205903d0971cd: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >> [2021-08-05 13:52:40.032] vlan912: 24b205903d097182: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >> [2021-08-05 13:52:40.420] vlan912: 24b205903d0971d5: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >> [2021-08-05 13:52:42.799] vlan912: 24b205903d09713a: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >> [2021-08-05 13:52:42.799] vlan614: 24b205903d0971e5: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >> [2021-08-05 13:52:43.102] vlan912: 24b205903d097190: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >> [2021-08-05 13:52:43.850] vlan479: 24b205903d097153: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >> [2021-08-05 13:52:43.850] vlan479: 24b205903d097141: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >> [2021-08-05 13:52:43.852] vlan912: 24b205903d097198: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >> [2021-08-05 13:52:43.977] 
vlan637: 24b205903d097148: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >> [2021-08-05 13:52:44.528] vlan637: 24b205903d0971c3: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > > > > These are userspace error messages, not kernel messages. > > > > What kernel version are you using? Yes, we need to know, what kernel version are you using. > > thanks, > > > > greg k-h > And also another question, what version of pppd daemon are you using? Also, are you able to dump state of ppp channels and ppp units? It is needed to know to which tty device, file descriptor (or socket extension) is (or should be) particular ppp channel bounded. ^ permalink raw reply [flat|nested] 23+ messages in thread
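On the channel/unit dump Pali asks about: the kernel exposes both numbers through ioctls, but only on the file descriptors the daemon itself holds, which is why the dump has to come from inside accel-pppd. A minimal sketch under that assumption, where pppoe_sock is the connected PPPoE socket and unit_fd is the /dev/ppp descriptor that owns the pppN device (PPPIOCGCHAN and PPPIOCGUNIT are real ioctls; the helper itself is hypothetical):

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/ppp-ioctl.h>

static void dump_ppp_binding(int pppoe_sock, int unit_fd)
{
    int chan = -1, unit = -1;

    /* kernel channel index registered for this PPPoE session */
    if (ioctl(pppoe_sock, PPPIOCGCHAN, &chan) < 0)
        perror("ioctl(PPPIOCGCHAN)");
    /* unit number of the descriptor that created/attached the pppN device */
    if (ioctl(unit_fd, PPPIOCGUNIT, &unit) < 0)
        perror("ioctl(PPPIOCGUNIT)");

    printf("pppoe channel %d bound to unit %d (ppp%d)\n", chan, unit, unit);
}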
* Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected 2021-08-08 15:23 ` Pali Rohár @ 2021-08-08 15:29 ` Martin Zaharinov 2021-08-09 15:15 ` Pali Rohár 0 siblings, 1 reply; 23+ messages in thread From: Martin Zaharinov @ 2021-08-08 15:29 UTC (permalink / raw) To: Pali Rohár; +Cc: Greg KH, netdev, Eric Dumazet Hi Pali Kernel 5.13.8 The problem is from kernel 5.8 > I try all major update 5.9, 5.10, 5.11 ,5.12 I use accel-pppd daemon (not pppd) . And yes after users started to connecting . When system boot and connect first time all user connect without any problem . In time of work user disconnect and connect (power cut , fiber cut or other problem in network) , but in time of spike (may be make lock or other problem ) disconnect ~ 400-500 users and affect other users. Process go to load over 100% and In statistic I see many finishing connection and many start connection. And in this time in log get many lines with ioctl(PPPIOCCONNECT): Transport endpoint is not connected. After finish (unlock or other) stop to see this error and system is back to normal. And connect all disconnected users. Martin > On 8 Aug 2021, at 18:23, Pali Rohár <pali@kernel.org> wrote: > > Hello! > > On Sunday 08 August 2021 18:14:09 Martin Zaharinov wrote: >> Add Pali Rohár, >> >> If have any idea . >> >> Martin >> >>> On 6 Aug 2021, at 7:40, Greg KH <gregkh@linuxfoundation.org> wrote: >>> >>> On Thu, Aug 05, 2021 at 11:53:50PM +0300, Martin Zaharinov wrote: >>>> Hi Net dev team >>>> >>>> >>>> Please check this error : >>>> Last time I write for this problem : https://www.spinics.net/lists/netdev/msg707513.html >>>> >>>> But not find any solution. >>>> >>>> Config of server is : Bonding port channel (LACP) > Accel PPP server > Huawei switch. >>>> >>>> Server is work fine users is down/up 500+ users . >>>> But in one moment server make spike and affect other vlans in same server . > > When this error started to happen? After kernel upgrade? After pppd > upgrade? Or after system upgrade? Or when more users started to > connecting? > >>>> And in accel I see many row with this error. >>>> >>>> Is there options to find and fix this bug. >>>> >>>> With accel team I discus this problem and they claim it is kernel bug and need to find solution with Kernel dev team. 
>>>> >>>> >>>> [2021-08-05 13:52:05.294] vlan912: 24b205903d09718e: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>> [2021-08-05 13:52:05.298] vlan912: 24b205903d097162: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>> [2021-08-05 13:52:05.626] vlan641: 24b205903d09711b: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>> [2021-08-05 13:52:11.000] vlan912: 24b205903d097105: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>> [2021-08-05 13:52:17.852] vlan912: 24b205903d0971ae: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>> [2021-08-05 13:52:21.113] vlan641: 24b205903d09715b: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>> [2021-08-05 13:52:27.963] vlan912: 24b205903d09718d: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>> [2021-08-05 13:52:30.249] vlan496: 24b205903d097184: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>> [2021-08-05 13:52:30.992] vlan420: 24b205903d09718a: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>> [2021-08-05 13:52:33.937] vlan640: 24b205903d0971cd: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>> [2021-08-05 13:52:40.032] vlan912: 24b205903d097182: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>> [2021-08-05 13:52:40.420] vlan912: 24b205903d0971d5: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>> [2021-08-05 13:52:42.799] vlan912: 24b205903d09713a: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>> [2021-08-05 13:52:42.799] vlan614: 24b205903d0971e5: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>> [2021-08-05 13:52:43.102] vlan912: 24b205903d097190: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>> [2021-08-05 13:52:43.850] vlan479: 24b205903d097153: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>> [2021-08-05 13:52:43.850] vlan479: 24b205903d097141: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>> [2021-08-05 13:52:43.852] vlan912: 24b205903d097198: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>> [2021-08-05 13:52:43.977] vlan637: 24b205903d097148: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>> [2021-08-05 13:52:44.528] vlan637: 24b205903d0971c3: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>> >>> These are userspace error messages, not kernel messages. >>> >>> What kernel version are you using? > > Yes, we need to know, what kernel version are you using. > >>> thanks, >>> >>> greg k-h >> > > And also another question, what version of pppd daemon are you using? > > Also, are you able to dump state of ppp channels and ppp units? It is > needed to know to which tty device, file descriptor (or socket > extension) is (or should be) particular ppp channel bounded. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected 2021-08-08 15:29 ` Martin Zaharinov @ 2021-08-09 15:15 ` Pali Rohár 2021-08-10 18:27 ` Martin Zaharinov 2021-08-11 11:10 ` Martin Zaharinov 0 siblings, 2 replies; 23+ messages in thread From: Pali Rohár @ 2021-08-09 15:15 UTC (permalink / raw) To: Martin Zaharinov; +Cc: Greg KH, netdev, Eric Dumazet On Sunday 08 August 2021 18:29:30 Martin Zaharinov wrote: > Hi Pali > > Kernel 5.13.8 > > > The problem is from kernel 5.8 > I try all major update 5.9, 5.10, 5.11 ,5.12 > > I use accel-pppd daemon (not pppd) . I'm not using accel-pppd, so cannot help here. I would suggest to try "git bisect" kernel version which started to be problematic for accel-pppd. Providing state of ppp channels and ppp units could help to debug this issue, but I'm not sure if accel-pppd has this debug feature. IIRC only process which has ppp file descriptors can retrieve and dump this information. > And yes after users started to connecting . > > When system boot and connect first time all user connect without any problem . > In time of work user disconnect and connect (power cut , fiber cut or other problem in network) , but in time of spike (may be make lock or other problem ) disconnect ~ 400-500 users and affect other users. Process go to load over 100% and In statistic I see many finishing connection and many start connection. > And in this time in log get many lines with ioctl(PPPIOCCONNECT): Transport endpoint is not connected. After finish (unlock or other) stop to see this error and system is back to normal. And connect all disconnected users. > > Martin > > > On 8 Aug 2021, at 18:23, Pali Rohár <pali@kernel.org> wrote: > > > > Hello! > > > > On Sunday 08 August 2021 18:14:09 Martin Zaharinov wrote: > >> Add Pali Rohár, > >> > >> If have any idea . > >> > >> Martin > >> > >>> On 6 Aug 2021, at 7:40, Greg KH <gregkh@linuxfoundation.org> wrote: > >>> > >>> On Thu, Aug 05, 2021 at 11:53:50PM +0300, Martin Zaharinov wrote: > >>>> Hi Net dev team > >>>> > >>>> > >>>> Please check this error : > >>>> Last time I write for this problem : https://www.spinics.net/lists/netdev/msg707513.html > >>>> > >>>> But not find any solution. > >>>> > >>>> Config of server is : Bonding port channel (LACP) > Accel PPP server > Huawei switch. > >>>> > >>>> Server is work fine users is down/up 500+ users . > >>>> But in one moment server make spike and affect other vlans in same server . > > > > When this error started to happen? After kernel upgrade? After pppd > > upgrade? Or after system upgrade? Or when more users started to > > connecting? > > > >>>> And in accel I see many row with this error. > >>>> > >>>> Is there options to find and fix this bug. > >>>> > >>>> With accel team I discus this problem and they claim it is kernel bug and need to find solution with Kernel dev team. 
> >>>> > >>>> > >>>> [2021-08-05 13:52:05.294] vlan912: 24b205903d09718e: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>> [2021-08-05 13:52:05.298] vlan912: 24b205903d097162: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>> [2021-08-05 13:52:05.626] vlan641: 24b205903d09711b: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>> [2021-08-05 13:52:11.000] vlan912: 24b205903d097105: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>> [2021-08-05 13:52:17.852] vlan912: 24b205903d0971ae: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>> [2021-08-05 13:52:21.113] vlan641: 24b205903d09715b: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>> [2021-08-05 13:52:27.963] vlan912: 24b205903d09718d: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>> [2021-08-05 13:52:30.249] vlan496: 24b205903d097184: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>> [2021-08-05 13:52:30.992] vlan420: 24b205903d09718a: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>> [2021-08-05 13:52:33.937] vlan640: 24b205903d0971cd: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>> [2021-08-05 13:52:40.032] vlan912: 24b205903d097182: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>> [2021-08-05 13:52:40.420] vlan912: 24b205903d0971d5: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>> [2021-08-05 13:52:42.799] vlan912: 24b205903d09713a: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>> [2021-08-05 13:52:42.799] vlan614: 24b205903d0971e5: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>> [2021-08-05 13:52:43.102] vlan912: 24b205903d097190: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>> [2021-08-05 13:52:43.850] vlan479: 24b205903d097153: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>> [2021-08-05 13:52:43.850] vlan479: 24b205903d097141: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>> [2021-08-05 13:52:43.852] vlan912: 24b205903d097198: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>> [2021-08-05 13:52:43.977] vlan637: 24b205903d097148: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>> [2021-08-05 13:52:44.528] vlan637: 24b205903d0971c3: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>> > >>> These are userspace error messages, not kernel messages. > >>> > >>> What kernel version are you using? > > > > Yes, we need to know, what kernel version are you using. > > > >>> thanks, > >>> > >>> greg k-h > >> > > > > And also another question, what version of pppd daemon are you using? > > > > Also, are you able to dump state of ppp channels and ppp units? It is > > needed to know to which tty device, file descriptor (or socket > > extension) is (or should be) particular ppp channel bounded. > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected 2021-08-09 15:15 ` Pali Rohár @ 2021-08-10 18:27 ` Martin Zaharinov 2021-08-11 16:40 ` Guillaume Nault 2021-08-11 11:10 ` Martin Zaharinov 1 sibling, 1 reply; 23+ messages in thread From: Martin Zaharinov @ 2021-08-10 18:27 UTC (permalink / raw) To: Pali Rohár; +Cc: Greg KH, netdev, Eric Dumazet, Guillaume Nault Add Guillaume Nault > On 9 Aug 2021, at 18:15, Pali Rohár <pali@kernel.org> wrote: > > On Sunday 08 August 2021 18:29:30 Martin Zaharinov wrote: >> Hi Pali >> >> Kernel 5.13.8 >> >> >> The problem is from kernel 5.8 > I try all major update 5.9, 5.10, 5.11 ,5.12 >> >> I use accel-pppd daemon (not pppd) . > > I'm not using accel-pppd, so cannot help here. > > I would suggest to try "git bisect" kernel version which started to be > problematic for accel-pppd. > > Providing state of ppp channels and ppp units could help to debug this > issue, but I'm not sure if accel-pppd has this debug feature. IIRC only > process which has ppp file descriptors can retrieve and dump this > information. > >> And yes after users started to connecting . >> >> When system boot and connect first time all user connect without any problem . >> In time of work user disconnect and connect (power cut , fiber cut or other problem in network) , but in time of spike (may be make lock or other problem ) disconnect ~ 400-500 users and affect other users. Process go to load over 100% and In statistic I see many finishing connection and many start connection. >> And in this time in log get many lines with ioctl(PPPIOCCONNECT): Transport endpoint is not connected. After finish (unlock or other) stop to see this error and system is back to normal. And connect all disconnected users. >> >> Martin >> >>> On 8 Aug 2021, at 18:23, Pali Rohár <pali@kernel.org> wrote: >>> >>> Hello! >>> >>> On Sunday 08 August 2021 18:14:09 Martin Zaharinov wrote: >>>> Add Pali Rohár, >>>> >>>> If have any idea . >>>> >>>> Martin >>>> >>>>> On 6 Aug 2021, at 7:40, Greg KH <gregkh@linuxfoundation.org> wrote: >>>>> >>>>> On Thu, Aug 05, 2021 at 11:53:50PM +0300, Martin Zaharinov wrote: >>>>>> Hi Net dev team >>>>>> >>>>>> >>>>>> Please check this error : >>>>>> Last time I write for this problem : https://www.spinics.net/lists/netdev/msg707513.html >>>>>> >>>>>> But not find any solution. >>>>>> >>>>>> Config of server is : Bonding port channel (LACP) > Accel PPP server > Huawei switch. >>>>>> >>>>>> Server is work fine users is down/up 500+ users . >>>>>> But in one moment server make spike and affect other vlans in same server . >>> >>> When this error started to happen? After kernel upgrade? After pppd >>> upgrade? Or after system upgrade? Or when more users started to >>> connecting? >>> >>>>>> And in accel I see many row with this error. >>>>>> >>>>>> Is there options to find and fix this bug. >>>>>> >>>>>> With accel team I discus this problem and they claim it is kernel bug and need to find solution with Kernel dev team. 
>>>>>> >>>>>> >>>>>> [2021-08-05 13:52:05.294] vlan912: 24b205903d09718e: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:05.298] vlan912: 24b205903d097162: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:05.626] vlan641: 24b205903d09711b: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:11.000] vlan912: 24b205903d097105: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:17.852] vlan912: 24b205903d0971ae: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:21.113] vlan641: 24b205903d09715b: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:27.963] vlan912: 24b205903d09718d: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:30.249] vlan496: 24b205903d097184: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:30.992] vlan420: 24b205903d09718a: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:33.937] vlan640: 24b205903d0971cd: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:40.032] vlan912: 24b205903d097182: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:40.420] vlan912: 24b205903d0971d5: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:42.799] vlan912: 24b205903d09713a: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:42.799] vlan614: 24b205903d0971e5: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:43.102] vlan912: 24b205903d097190: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:43.850] vlan479: 24b205903d097153: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:43.850] vlan479: 24b205903d097141: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:43.852] vlan912: 24b205903d097198: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:43.977] vlan637: 24b205903d097148: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:44.528] vlan637: 24b205903d0971c3: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>> >>>>> These are userspace error messages, not kernel messages. >>>>> >>>>> What kernel version are you using? >>> >>> Yes, we need to know, what kernel version are you using. >>> >>>>> thanks, >>>>> >>>>> greg k-h >>>> >>> >>> And also another question, what version of pppd daemon are you using? >>> >>> Also, are you able to dump state of ppp channels and ppp units? It is >>> needed to know to which tty device, file descriptor (or socket >>> extension) is (or should be) particular ppp channel bounded. >> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected 2021-08-10 18:27 ` Martin Zaharinov @ 2021-08-11 16:40 ` Guillaume Nault 0 siblings, 0 replies; 23+ messages in thread From: Guillaume Nault @ 2021-08-11 16:40 UTC (permalink / raw) To: Martin Zaharinov; +Cc: Pali Rohár, Greg KH, netdev, Eric Dumazet On Tue, Aug 10, 2021 at 09:27:14PM +0300, Martin Zaharinov wrote: > Add Guillaume Nault > > > On 9 Aug 2021, at 18:15, Pali Rohár <pali@kernel.org> wrote: > > > > On Sunday 08 August 2021 18:29:30 Martin Zaharinov wrote: > >>>>>> [2021-08-05 13:52:05.294] vlan912: 24b205903d09718e: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>>>> [2021-08-05 13:52:05.298] vlan912: 24b205903d097162: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>>>> [2021-08-05 13:52:05.626] vlan641: 24b205903d09711b: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>>>> [2021-08-05 13:52:11.000] vlan912: 24b205903d097105: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>>>> [2021-08-05 13:52:17.852] vlan912: 24b205903d0971ae: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>>>> [2021-08-05 13:52:21.113] vlan641: 24b205903d09715b: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>>>> [2021-08-05 13:52:27.963] vlan912: 24b205903d09718d: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>>>> [2021-08-05 13:52:30.249] vlan496: 24b205903d097184: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>>>> [2021-08-05 13:52:30.992] vlan420: 24b205903d09718a: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>>>> [2021-08-05 13:52:33.937] vlan640: 24b205903d0971cd: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>>>> [2021-08-05 13:52:40.032] vlan912: 24b205903d097182: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>>>> [2021-08-05 13:52:40.420] vlan912: 24b205903d0971d5: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>>>> [2021-08-05 13:52:42.799] vlan912: 24b205903d09713a: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>>>> [2021-08-05 13:52:42.799] vlan614: 24b205903d0971e5: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>>>> [2021-08-05 13:52:43.102] vlan912: 24b205903d097190: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>>>> [2021-08-05 13:52:43.850] vlan479: 24b205903d097153: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>>>> [2021-08-05 13:52:43.850] vlan479: 24b205903d097141: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>>>> [2021-08-05 13:52:43.852] vlan912: 24b205903d097198: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>>>> [2021-08-05 13:52:43.977] vlan637: 24b205903d097148: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>>>> [2021-08-05 13:52:44.528] vlan637: 24b205903d0971c3: ioctl(PPPIOCCONNECT): Transport endpoint is not connected The PPPIOCCONNECT ioctl returns -ENOTCONN if the ppp channel has been unregistered. From a user space point of view, this means that accel-ppp establishes PPPoE sessions, starts negociating PPP connection parameters on top of them (LCP and authentication) and finally the PPPoE sessions get disconnected before accel-ppp connects them to ppp units (units are roughly the "pppX" network devices). Unregistration of PPPoE channels can happen for the following reasons: * Changing some parameters of the network interface used by the PPPoE connection: MAC address, MTU, bringing the device down. 
  * Reception of a PADT (PPPoE disconnection message sent from the peer).
  * Closing the PPPoE socket.
  * Re-connecting a PPPoE socket with a different session ID (this
    unregisters the previous channel and creates a new one, so that
    shouldn't be the problem you're facing here).

Given that this seems to affect all PPPoE connections, I guess something
happened to the underlying network interface (1st bullet point).

^ permalink raw reply [flat|nested] 23+ messages in thread
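To make the failure point concrete, this is roughly the per-session bring-up sequence Guillaume describes, as a simplified sketch (discovery/PADI-PADS handling and most error checks omitted; this is not accel-ppp's actual code). LCP and authentication run between connect() and the final step, so if the channel is unregistered for any of the reasons listed above during that window, PPPIOCCONNECT is where it surfaces, as ENOTCONN:

#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/if_pppox.h>
#include <linux/ppp-ioctl.h>

int pppoe_session_up(const char *ifname, __be16 sid, const unsigned char *peer_mac)
{
    struct sockaddr_pppox sp = { .sa_family   = AF_PPPOX,
                                 .sa_protocol = PX_PROTO_OE };
    int sock, chan_fd, unit_fd, chan_idx, unit = -1;

    sp.sa_addr.pppoe.sid = sid;
    memcpy(sp.sa_addr.pppoe.remote, peer_mac, 6);
    strncpy(sp.sa_addr.pppoe.dev, ifname, IFNAMSIZ - 1);

    sock = socket(AF_PPPOX, SOCK_STREAM, PX_PROTO_OE);
    connect(sock, (struct sockaddr *)&sp, sizeof(sp));  /* registers the ppp channel */

    ioctl(sock, PPPIOCGCHAN, &chan_idx);                /* kernel channel index */

    chan_fd = open("/dev/ppp", O_RDWR);
    ioctl(chan_fd, PPPIOCATTCHAN, &chan_idx);           /* bind this fd to the channel */

    unit_fd = open("/dev/ppp", O_RDWR);
    ioctl(unit_fd, PPPIOCNEWUNIT, &unit);               /* creates the pppN device */

    /* Fails with ENOTCONN if the channel was unregistered in the meantime. */
    if (ioctl(chan_fd, PPPIOCCONNECT, &unit) < 0)
        return -1;

    return unit;
}

Anything that tears down the PPPoE session or the underlying VLAN/bond device while LCP and authentication are still running produces exactly the log lines quoted above.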
* Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected 2021-08-09 15:15 ` Pali Rohár 2021-08-10 18:27 ` Martin Zaharinov @ 2021-08-11 11:10 ` Martin Zaharinov 2021-08-11 16:48 ` Guillaume Nault 1 sibling, 1 reply; 23+ messages in thread From: Martin Zaharinov @ 2021-08-11 11:10 UTC (permalink / raw) To: Pali Rohár, Guillaume Nault; +Cc: Greg KH, netdev, Eric Dumazet And one more that see. Problem is come when accel start finishing sessions, Now in server have 2k users and restart on one of vlans 3 Olt with 400 users and affect other vlans , And problem is start when start destroying dead sessions from vlan with 3 Olt and this affect all other vlans. May be kernel destroy old session slow and entrained other users by locking other sessions. is there a way to speed up the closing of stopped/dead sessions. Martin > On 9 Aug 2021, at 18:15, Pali Rohár <pali@kernel.org> wrote: > > On Sunday 08 August 2021 18:29:30 Martin Zaharinov wrote: >> Hi Pali >> >> Kernel 5.13.8 >> >> >> The problem is from kernel 5.8 > I try all major update 5.9, 5.10, 5.11 ,5.12 >> >> I use accel-pppd daemon (not pppd) . > > I'm not using accel-pppd, so cannot help here. > > I would suggest to try "git bisect" kernel version which started to be > problematic for accel-pppd. > > Providing state of ppp channels and ppp units could help to debug this > issue, but I'm not sure if accel-pppd has this debug feature. IIRC only > process which has ppp file descriptors can retrieve and dump this > information. > >> And yes after users started to connecting . >> >> When system boot and connect first time all user connect without any problem . >> In time of work user disconnect and connect (power cut , fiber cut or other problem in network) , but in time of spike (may be make lock or other problem ) disconnect ~ 400-500 users and affect other users. Process go to load over 100% and In statistic I see many finishing connection and many start connection. >> And in this time in log get many lines with ioctl(PPPIOCCONNECT): Transport endpoint is not connected. After finish (unlock or other) stop to see this error and system is back to normal. And connect all disconnected users. >> >> Martin >> >>> On 8 Aug 2021, at 18:23, Pali Rohár <pali@kernel.org> wrote: >>> >>> Hello! >>> >>> On Sunday 08 August 2021 18:14:09 Martin Zaharinov wrote: >>>> Add Pali Rohár, >>>> >>>> If have any idea . >>>> >>>> Martin >>>> >>>>> On 6 Aug 2021, at 7:40, Greg KH <gregkh@linuxfoundation.org> wrote: >>>>> >>>>> On Thu, Aug 05, 2021 at 11:53:50PM +0300, Martin Zaharinov wrote: >>>>>> Hi Net dev team >>>>>> >>>>>> >>>>>> Please check this error : >>>>>> Last time I write for this problem : https://www.spinics.net/lists/netdev/msg707513.html >>>>>> >>>>>> But not find any solution. >>>>>> >>>>>> Config of server is : Bonding port channel (LACP) > Accel PPP server > Huawei switch. >>>>>> >>>>>> Server is work fine users is down/up 500+ users . >>>>>> But in one moment server make spike and affect other vlans in same server . >>> >>> When this error started to happen? After kernel upgrade? After pppd >>> upgrade? Or after system upgrade? Or when more users started to >>> connecting? >>> >>>>>> And in accel I see many row with this error. >>>>>> >>>>>> Is there options to find and fix this bug. >>>>>> >>>>>> With accel team I discus this problem and they claim it is kernel bug and need to find solution with Kernel dev team. 
>>>>>> >>>>>> >>>>>> [2021-08-05 13:52:05.294] vlan912: 24b205903d09718e: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:05.298] vlan912: 24b205903d097162: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:05.626] vlan641: 24b205903d09711b: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:11.000] vlan912: 24b205903d097105: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:17.852] vlan912: 24b205903d0971ae: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:21.113] vlan641: 24b205903d09715b: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:27.963] vlan912: 24b205903d09718d: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:30.249] vlan496: 24b205903d097184: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:30.992] vlan420: 24b205903d09718a: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:33.937] vlan640: 24b205903d0971cd: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:40.032] vlan912: 24b205903d097182: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:40.420] vlan912: 24b205903d0971d5: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:42.799] vlan912: 24b205903d09713a: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:42.799] vlan614: 24b205903d0971e5: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:43.102] vlan912: 24b205903d097190: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:43.850] vlan479: 24b205903d097153: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:43.850] vlan479: 24b205903d097141: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:43.852] vlan912: 24b205903d097198: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:43.977] vlan637: 24b205903d097148: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>>> [2021-08-05 13:52:44.528] vlan637: 24b205903d0971c3: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>>>> >>>>> These are userspace error messages, not kernel messages. >>>>> >>>>> What kernel version are you using? >>> >>> Yes, we need to know, what kernel version are you using. >>> >>>>> thanks, >>>>> >>>>> greg k-h >>>> >>> >>> And also another question, what version of pppd daemon are you using? >>> >>> Also, are you able to dump state of ppp channels and ppp units? It is >>> needed to know to which tty device, file descriptor (or socket >>> extension) is (or should be) particular ppp channel bounded. >> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected 2021-08-11 11:10 ` Martin Zaharinov @ 2021-08-11 16:48 ` Guillaume Nault 2021-09-07 6:16 ` Martin Zaharinov 0 siblings, 1 reply; 23+ messages in thread From: Guillaume Nault @ 2021-08-11 16:48 UTC (permalink / raw) To: Martin Zaharinov; +Cc: Pali Rohár, Greg KH, netdev, Eric Dumazet On Wed, Aug 11, 2021 at 02:10:32PM +0300, Martin Zaharinov wrote: > And one more that see. > > Problem is come when accel start finishing sessions, > Now in server have 2k users and restart on one of vlans 3 Olt with 400 users and affect other vlans , > And problem is start when start destroying dead sessions from vlan with 3 Olt and this affect all other vlans. > May be kernel destroy old session slow and entrained other users by locking other sessions. > is there a way to speed up the closing of stopped/dead sessions. What are the CPU stats when that happen? Is it users space or kernel space that keeps it busy? One easy way to check is to run "mpstat 1" for a few seconds when the problem occurs. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected 2021-08-11 16:48 ` Guillaume Nault @ 2021-09-07 6:16 ` Martin Zaharinov 2021-09-07 6:42 ` Martin Zaharinov 0 siblings, 1 reply; 23+ messages in thread From: Martin Zaharinov @ 2021-09-07 6:16 UTC (permalink / raw) To: Guillaume Nault; +Cc: Pali Rohár, Greg KH, netdev, Eric Dumazet Hi Sorry for delay but not easy to catch moment . See this is mpstatl 1 : Linux 5.14.1 (demobng) 09/07/21 _x86_64_ (12 CPU) 11:12:16 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 11:12:17 all 0.17 0.00 6.66 0.00 0.00 4.13 0.00 0.00 0.00 89.05 11:12:18 all 0.25 0.00 8.36 0.00 0.00 4.88 0.00 0.00 0.00 86.51 11:12:19 all 0.26 0.00 9.62 0.00 0.00 3.91 0.00 0.00 0.00 86.21 11:12:20 all 0.85 0.00 6.00 0.00 0.00 4.31 0.00 0.00 0.00 88.84 11:12:21 all 0.08 0.00 4.45 0.00 0.00 4.79 0.00 0.00 0.00 90.67 11:12:22 all 0.17 0.00 9.50 0.00 0.00 4.58 0.00 0.00 0.00 85.75 11:12:23 all 0.00 0.00 6.92 0.00 0.00 2.48 0.00 0.00 0.00 90.61 11:12:24 all 0.17 0.00 5.45 0.00 0.00 4.27 0.00 0.00 0.00 90.11 11:12:25 all 0.25 0.00 5.38 0.00 0.00 4.79 0.00 0.00 0.00 89.58 11:12:26 all 0.60 0.00 1.45 0.00 0.00 2.65 0.00 0.00 0.00 95.30 11:12:27 all 0.42 0.00 6.91 0.00 0.00 4.47 0.00 0.00 0.00 88.20 11:12:28 all 0.00 0.00 6.75 0.00 0.00 4.18 0.00 0.00 0.00 89.07 11:12:29 all 0.17 0.00 3.52 0.00 0.00 5.11 0.00 0.00 0.00 91.20 11:12:30 all 1.45 0.00 10.14 0.00 0.00 3.49 0.00 0.00 0.00 84.92 11:12:31 all 0.09 0.00 5.11 0.00 0.00 4.77 0.00 0.00 0.00 90.03 11:12:32 all 0.25 0.00 3.11 0.00 0.00 4.46 0.00 0.00 0.00 92.17 Average: all 0.32 0.00 6.21 0.00 0.00 4.21 0.00 0.00 0.00 89.26 I attache and one screenshot from perf top (Screenshot is send on preview mail) And I see in lsmod pppoe 20480 8198 pppox 16384 1 pppoe ppp_generic 45056 16364 pppox,pppoe slhc 16384 1 ppp_generic To slow remove pppoe session . And from log : [2021-09-07 11:01:11.129] vlan3020: ebdd1c5d8b5900f6: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-09-07 11:01:53.621] vlan643: ebdd1c5d8b59014e: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-09-07 11:02:00.359] vlan1616: ebdd1c5d8b590195: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-09-07 11:02:05.859] vlan3020: ebdd1c5d8b5900d8: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-09-07 11:02:08.258] vlan3005: ebdd1c5d8b590190: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-09-07 11:02:13.820] vlan643: ebdd1c5d8b590152: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-09-07 11:02:15.839] vlan727: ebdd1c5d8b590144: ioctl(PPPIOCCONNECT): Transport endpoint is not connected [2021-09-07 11:02:20.139] vlan1693: ebdd1c5d8b59019f: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > On 11 Aug 2021, at 19:48, Guillaume Nault <gnault@redhat.com> wrote: > > On Wed, Aug 11, 2021 at 02:10:32PM +0300, Martin Zaharinov wrote: >> And one more that see. >> >> Problem is come when accel start finishing sessions, >> Now in server have 2k users and restart on one of vlans 3 Olt with 400 users and affect other vlans , >> And problem is start when start destroying dead sessions from vlan with 3 Olt and this affect all other vlans. >> May be kernel destroy old session slow and entrained other users by locking other sessions. >> is there a way to speed up the closing of stopped/dead sessions. > > What are the CPU stats when that happen? Is it users space or kernel > space that keeps it busy? 
> > One easy way to check is to run "mpstat 1" for a few seconds when the > problem occurs. > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected 2021-09-07 6:16 ` Martin Zaharinov @ 2021-09-07 6:42 ` Martin Zaharinov 2021-09-11 6:26 ` Martin Zaharinov 0 siblings, 1 reply; 23+ messages in thread From: Martin Zaharinov @ 2021-09-07 6:42 UTC (permalink / raw) To: Guillaume Nault; +Cc: Pali Rohár, Greg KH, netdev, Eric Dumazet Perf top from text PerfTop: 28391 irqs/sec kernel:98.0% exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 12 CPUs) --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 17.01% [nf_conntrack] [k] nf_ct_iterate_cleanup 9.73% [kernel] [k] mutex_spin_on_owner 9.07% [pppoe] [k] pppoe_rcv 2.77% [nf_nat] [k] device_cmp 1.66% [kernel] [k] osq_lock 1.65% [kernel] [k] _raw_spin_lock 1.61% [kernel] [k] __local_bh_enable_ip 1.35% [nf_nat] [k] inet_cmp 1.30% [kernel] [k] __netif_receive_skb_core.constprop.0 1.16% [kernel] [k] menu_select 0.99% [kernel] [k] cpuidle_enter_state 0.96% [ixgbe] [k] ixgbe_clean_rx_irq 0.86% [kernel] [k] __dev_queue_xmit 0.70% [kernel] [k] __cond_resched 0.69% [sch_cake] [k] cake_dequeue 0.67% [nf_tables] [k] nft_do_chain 0.63% [kernel] [k] rcu_all_qs 0.61% [kernel] [k] fib_table_lookup 0.57% [kernel] [k] __schedule 0.57% [kernel] [k] skb_release_data 0.54% [kernel] [k] sched_clock 0.54% [kernel] [k] __copy_skb_header 0.53% [kernel] [k] dev_queue_xmit_nit 0.53% [kernel] [k] _raw_spin_lock_irqsave 0.50% [kernel] [k] kmem_cache_free 0.48% libfrr.so.0.0.0 [.] 0x00000000000ce970 0.47% [ixgbe] [k] ixgbe_clean_tx_irq 0.45% [kernel] [k] timerqueue_add 0.45% [kernel] [k] lapic_next_deadline 0.45% [kernel] [k] csum_partial_copy_generic 0.44% [nf_flow_table] [k] nf_flow_offload_ip_hook 0.44% [kernel] [k] kmem_cache_alloc 0.44% [nf_conntrack] [k] nf_conntrack_lock > On 7 Sep 2021, at 9:16, Martin Zaharinov <micron10@gmail.com> wrote: > > Hi > Sorry for delay but not easy to catch moment . 
> > > See this is mpstatl 1 : > > Linux 5.14.1 (demobng) 09/07/21 _x86_64_ (12 CPU) > > 11:12:16 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle > 11:12:17 all 0.17 0.00 6.66 0.00 0.00 4.13 0.00 0.00 0.00 89.05 > 11:12:18 all 0.25 0.00 8.36 0.00 0.00 4.88 0.00 0.00 0.00 86.51 > 11:12:19 all 0.26 0.00 9.62 0.00 0.00 3.91 0.00 0.00 0.00 86.21 > 11:12:20 all 0.85 0.00 6.00 0.00 0.00 4.31 0.00 0.00 0.00 88.84 > 11:12:21 all 0.08 0.00 4.45 0.00 0.00 4.79 0.00 0.00 0.00 90.67 > 11:12:22 all 0.17 0.00 9.50 0.00 0.00 4.58 0.00 0.00 0.00 85.75 > 11:12:23 all 0.00 0.00 6.92 0.00 0.00 2.48 0.00 0.00 0.00 90.61 > 11:12:24 all 0.17 0.00 5.45 0.00 0.00 4.27 0.00 0.00 0.00 90.11 > 11:12:25 all 0.25 0.00 5.38 0.00 0.00 4.79 0.00 0.00 0.00 89.58 > 11:12:26 all 0.60 0.00 1.45 0.00 0.00 2.65 0.00 0.00 0.00 95.30 > 11:12:27 all 0.42 0.00 6.91 0.00 0.00 4.47 0.00 0.00 0.00 88.20 > 11:12:28 all 0.00 0.00 6.75 0.00 0.00 4.18 0.00 0.00 0.00 89.07 > 11:12:29 all 0.17 0.00 3.52 0.00 0.00 5.11 0.00 0.00 0.00 91.20 > 11:12:30 all 1.45 0.00 10.14 0.00 0.00 3.49 0.00 0.00 0.00 84.92 > 11:12:31 all 0.09 0.00 5.11 0.00 0.00 4.77 0.00 0.00 0.00 90.03 > 11:12:32 all 0.25 0.00 3.11 0.00 0.00 4.46 0.00 0.00 0.00 92.17 > Average: all 0.32 0.00 6.21 0.00 0.00 4.21 0.00 0.00 0.00 89.26 > > > I attache and one screenshot from perf top (Screenshot is send on preview mail) > > And I see in lsmod > > pppoe 20480 8198 > pppox 16384 1 pppoe > ppp_generic 45056 16364 pppox,pppoe > slhc 16384 1 ppp_generic > > To slow remove pppoe session . > > And from log : > > [2021-09-07 11:01:11.129] vlan3020: ebdd1c5d8b5900f6: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-09-07 11:01:53.621] vlan643: ebdd1c5d8b59014e: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-09-07 11:02:00.359] vlan1616: ebdd1c5d8b590195: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-09-07 11:02:05.859] vlan3020: ebdd1c5d8b5900d8: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-09-07 11:02:08.258] vlan3005: ebdd1c5d8b590190: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-09-07 11:02:13.820] vlan643: ebdd1c5d8b590152: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-09-07 11:02:15.839] vlan727: ebdd1c5d8b590144: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > [2021-09-07 11:02:20.139] vlan1693: ebdd1c5d8b59019f: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >> On 11 Aug 2021, at 19:48, Guillaume Nault <gnault@redhat.com> wrote: >> >> On Wed, Aug 11, 2021 at 02:10:32PM +0300, Martin Zaharinov wrote: >>> And one more that see. >>> >>> Problem is come when accel start finishing sessions, >>> Now in server have 2k users and restart on one of vlans 3 Olt with 400 users and affect other vlans , >>> And problem is start when start destroying dead sessions from vlan with 3 Olt and this affect all other vlans. >>> May be kernel destroy old session slow and entrained other users by locking other sessions. >>> is there a way to speed up the closing of stopped/dead sessions. >> >> What are the CPU stats when that happen? Is it users space or kernel >> space that keeps it busy? >> >> One easy way to check is to run "mpstat 1" for a few seconds when the >> problem occurs. >> > ^ permalink raw reply [flat|nested] 23+ messages in thread
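The perf profile above is dominated by nf_ct_iterate_cleanup, with device_cmp and inet_cmp from nf_nat close behind; those callbacks are used by the masquerade code to flush conntrack entries when an interface goes down or loses its address, so each ppp interface teardown appears to walk the connection-tracking table, and mutex_spin_on_owner points at heavy contention on some mutex (RTNL or conntrack related; the profile alone cannot tell which). That makes the conntrack table size directly relevant to how fast dead sessions can be removed. A small self-contained check of that size (the /proc/sys/net/netfilter paths are the standard conntrack sysctls; that they explain the stall here is only a hypothesis):

#include <stdio.h>

int main(void)
{
    unsigned long count = 0, max = 0;
    FILE *f;

    /* how many conntrack entries exist right now */
    if ((f = fopen("/proc/sys/net/netfilter/nf_conntrack_count", "r"))) {
        if (fscanf(f, "%lu", &count) != 1)
            count = 0;
        fclose(f);
    }
    /* the configured table limit */
    if ((f = fopen("/proc/sys/net/netfilter/nf_conntrack_max", "r"))) {
        if (fscanf(f, "%lu", &max) != 1)
            max = 0;
        fclose(f);
    }
    printf("conntrack entries: %lu / %lu\n", count, max);
    return 0;
}

Each interface removal that triggers such a cleanup pays a cost roughly proportional to that count, so hundreds of sessions disconnecting at once multiply it.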
* Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected 2021-09-07 6:42 ` Martin Zaharinov @ 2021-09-11 6:26 ` Martin Zaharinov 2021-09-14 6:16 ` Martin Zaharinov 0 siblings, 1 reply; 23+ messages in thread From: Martin Zaharinov @ 2021-09-11 6:26 UTC (permalink / raw) To: Guillaume Nault; +Cc: Pali Rohár, Greg KH, netdev, Eric Dumazet Hi Guillaume Main problem is overload of service because have many finishing ppp (customer) last two day down from 40-50 to 100-200 users and make problem when is happen if try to type : ip a wait 10-20 sec to start list interface . But how to find where is a problem any locking or other. And is there options to make fast remove ppp interface from kernel to reduce this load. Martin > On 7 Sep 2021, at 9:42, Martin Zaharinov <micron10@gmail.com> wrote: > > Perf top from text > > > PerfTop: 28391 irqs/sec kernel:98.0% exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 12 CPUs) > --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > 17.01% [nf_conntrack] [k] nf_ct_iterate_cleanup > 9.73% [kernel] [k] mutex_spin_on_owner > 9.07% [pppoe] [k] pppoe_rcv > 2.77% [nf_nat] [k] device_cmp > 1.66% [kernel] [k] osq_lock > 1.65% [kernel] [k] _raw_spin_lock > 1.61% [kernel] [k] __local_bh_enable_ip > 1.35% [nf_nat] [k] inet_cmp > 1.30% [kernel] [k] __netif_receive_skb_core.constprop.0 > 1.16% [kernel] [k] menu_select > 0.99% [kernel] [k] cpuidle_enter_state > 0.96% [ixgbe] [k] ixgbe_clean_rx_irq > 0.86% [kernel] [k] __dev_queue_xmit > 0.70% [kernel] [k] __cond_resched > 0.69% [sch_cake] [k] cake_dequeue > 0.67% [nf_tables] [k] nft_do_chain > 0.63% [kernel] [k] rcu_all_qs > 0.61% [kernel] [k] fib_table_lookup > 0.57% [kernel] [k] __schedule > 0.57% [kernel] [k] skb_release_data > 0.54% [kernel] [k] sched_clock > 0.54% [kernel] [k] __copy_skb_header > 0.53% [kernel] [k] dev_queue_xmit_nit > 0.53% [kernel] [k] _raw_spin_lock_irqsave > 0.50% [kernel] [k] kmem_cache_free > 0.48% libfrr.so.0.0.0 [.] 0x00000000000ce970 > 0.47% [ixgbe] [k] ixgbe_clean_tx_irq > 0.45% [kernel] [k] timerqueue_add > 0.45% [kernel] [k] lapic_next_deadline > 0.45% [kernel] [k] csum_partial_copy_generic > 0.44% [nf_flow_table] [k] nf_flow_offload_ip_hook > 0.44% [kernel] [k] kmem_cache_alloc > 0.44% [nf_conntrack] [k] nf_conntrack_lock > >> On 7 Sep 2021, at 9:16, Martin Zaharinov <micron10@gmail.com> wrote: >> >> Hi >> Sorry for delay but not easy to catch moment . 
>> >> >> See this is mpstatl 1 : >> >> Linux 5.14.1 (demobng) 09/07/21 _x86_64_ (12 CPU) >> >> 11:12:16 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle >> 11:12:17 all 0.17 0.00 6.66 0.00 0.00 4.13 0.00 0.00 0.00 89.05 >> 11:12:18 all 0.25 0.00 8.36 0.00 0.00 4.88 0.00 0.00 0.00 86.51 >> 11:12:19 all 0.26 0.00 9.62 0.00 0.00 3.91 0.00 0.00 0.00 86.21 >> 11:12:20 all 0.85 0.00 6.00 0.00 0.00 4.31 0.00 0.00 0.00 88.84 >> 11:12:21 all 0.08 0.00 4.45 0.00 0.00 4.79 0.00 0.00 0.00 90.67 >> 11:12:22 all 0.17 0.00 9.50 0.00 0.00 4.58 0.00 0.00 0.00 85.75 >> 11:12:23 all 0.00 0.00 6.92 0.00 0.00 2.48 0.00 0.00 0.00 90.61 >> 11:12:24 all 0.17 0.00 5.45 0.00 0.00 4.27 0.00 0.00 0.00 90.11 >> 11:12:25 all 0.25 0.00 5.38 0.00 0.00 4.79 0.00 0.00 0.00 89.58 >> 11:12:26 all 0.60 0.00 1.45 0.00 0.00 2.65 0.00 0.00 0.00 95.30 >> 11:12:27 all 0.42 0.00 6.91 0.00 0.00 4.47 0.00 0.00 0.00 88.20 >> 11:12:28 all 0.00 0.00 6.75 0.00 0.00 4.18 0.00 0.00 0.00 89.07 >> 11:12:29 all 0.17 0.00 3.52 0.00 0.00 5.11 0.00 0.00 0.00 91.20 >> 11:12:30 all 1.45 0.00 10.14 0.00 0.00 3.49 0.00 0.00 0.00 84.92 >> 11:12:31 all 0.09 0.00 5.11 0.00 0.00 4.77 0.00 0.00 0.00 90.03 >> 11:12:32 all 0.25 0.00 3.11 0.00 0.00 4.46 0.00 0.00 0.00 92.17 >> Average: all 0.32 0.00 6.21 0.00 0.00 4.21 0.00 0.00 0.00 89.26 >> >> >> I attache and one screenshot from perf top (Screenshot is send on preview mail) >> >> And I see in lsmod >> >> pppoe 20480 8198 >> pppox 16384 1 pppoe >> ppp_generic 45056 16364 pppox,pppoe >> slhc 16384 1 ppp_generic >> >> To slow remove pppoe session . >> >> And from log : >> >> [2021-09-07 11:01:11.129] vlan3020: ebdd1c5d8b5900f6: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-09-07 11:01:53.621] vlan643: ebdd1c5d8b59014e: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-09-07 11:02:00.359] vlan1616: ebdd1c5d8b590195: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-09-07 11:02:05.859] vlan3020: ebdd1c5d8b5900d8: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-09-07 11:02:08.258] vlan3005: ebdd1c5d8b590190: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-09-07 11:02:13.820] vlan643: ebdd1c5d8b590152: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-09-07 11:02:15.839] vlan727: ebdd1c5d8b590144: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> [2021-09-07 11:02:20.139] vlan1693: ebdd1c5d8b59019f: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >> >>> On 11 Aug 2021, at 19:48, Guillaume Nault <gnault@redhat.com> wrote: >>> >>> On Wed, Aug 11, 2021 at 02:10:32PM +0300, Martin Zaharinov wrote: >>>> And one more that see. >>>> >>>> Problem is come when accel start finishing sessions, >>>> Now in server have 2k users and restart on one of vlans 3 Olt with 400 users and affect other vlans , >>>> And problem is start when start destroying dead sessions from vlan with 3 Olt and this affect all other vlans. >>>> May be kernel destroy old session slow and entrained other users by locking other sessions. >>>> is there a way to speed up the closing of stopped/dead sessions. >>> >>> What are the CPU stats when that happen? Is it users space or kernel >>> space that keeps it busy? >>> >>> One easy way to check is to run "mpstat 1" for a few seconds when the >>> problem occurs. >>> >> > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected 2021-09-11 6:26 ` Martin Zaharinov @ 2021-09-14 6:16 ` Martin Zaharinov 2021-09-14 8:02 ` Guillaume Nault 0 siblings, 1 reply; 23+ messages in thread From: Martin Zaharinov @ 2021-09-14 6:16 UTC (permalink / raw) To: Guillaume Nault; +Cc: Pali Rohár, Greg KH, netdev, Eric Dumazet Hi Nault See this stats : Linux 5.14.2 (testb) 09/14/21 _x86_64_ (12 CPU) 11:33:44 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle 11:33:45 all 1.75 0.00 18.85 0.00 0.00 5.00 0.00 0.00 0.00 74.40 11:33:46 all 1.74 0.00 17.88 0.00 0.00 4.72 0.00 0.00 0.00 75.66 11:33:47 all 2.23 0.00 17.62 0.00 0.00 5.05 0.00 0.00 0.00 75.10 11:33:48 all 1.82 0.00 13.64 0.00 0.00 5.70 0.00 0.00 0.00 78.84 11:33:49 all 1.50 0.00 13.46 0.00 0.00 5.15 0.00 0.00 0.00 79.90 11:33:50 all 3.06 0.00 13.96 0.00 0.00 4.79 0.00 0.00 0.00 78.20 11:33:51 all 1.40 0.00 16.53 0.00 0.00 5.21 0.00 0.00 0.00 76.86 11:33:52 all 4.43 0.00 19.44 0.00 0.00 6.56 0.00 0.00 0.00 69.57 11:33:53 all 1.51 0.00 16.40 0.00 0.00 4.77 0.00 0.00 0.00 77.32 11:33:54 all 1.51 0.00 16.55 0.00 0.00 4.71 0.00 0.00 0.00 77.23 11:33:55 all 1.00 0.00 13.21 0.00 0.00 5.90 0.00 0.00 0.00 79.90 Average: all 2.00 0.00 16.14 0.00 0.00 5.23 0.00 0.00 0.00 76.63 PerfTop: 28046 irqs/sec kernel:96.3% exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 12 CPUs) --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 23.37% [nf_conntrack] [k] nf_ct_iterate_cleanup 17.76% [kernel] [k] mutex_spin_on_owner 9.47% [pppoe] [k] pppoe_rcv 7.71% [kernel] [k] osq_lock 2.77% [nf_nat] [k] inet_cmp 2.59% [nf_nat] [k] device_cmp 2.55% [kernel] [k] __local_bh_enable_ip 2.04% [kernel] [k] _raw_spin_lock 1.23% [kernel] [k] __cond_resched 1.16% [kernel] [k] rcu_all_qs 1.13% libfrr.so.0.0.0 [.] 0x00000000000ce970 0.79% [nf_conntrack] [k] nf_conntrack_lock 0.75% libfrr.so.0.0.0 [.] 0x00000000000ce94e 0.53% [kernel] [k] __netif_receive_skb_core.constprop.0 0.46% [kernel] [k] fib_table_lookup 0.46% [ip_tables] [k] ipt_do_table 0.45% [ixgbe] [k] ixgbe_clean_rx_irq 0.37% [kernel] [k] __dev_queue_xmit 0.34% [nf_conntrack] [k] __nf_conntrack_find_get.isra.0 0.33% [ixgbe] [k] ixgbe_clean_tx_irq 0.30% [kernel] [k] menu_select 0.25% [kernel] [k] vlan_do_receive 0.21% [kernel] [k] ip_finish_output2 0.21% [ixgbe] [k] ixgbe_poll 0.20% [kernel] [k] _raw_spin_lock_irqsave 0.19% [kernel] [k] get_rps_cpu 0.19% libc.so.6 [.] 0x0000000000186afa 0.19% [kernel] [k] queued_read_lock_slowpath 0.19% [kernel] [k] do_poll.constprop.0 0.19% [kernel] [k] cpuidle_enter_state 0.18% [kernel] [k] dev_hard_start_xmit 0.18% [kernel] [k] ___slab_alloc.constprop.0 0.17% zebra [.] 0x00000000000b9271 0.16% [kernel] [k] csum_partial_copy_generic 0.16% zebra [.] 0x00000000000b91f1 0.16% [kernel] [k] page_frag_free 0.16% [kernel] [k] kmem_cache_alloc 0.15% [kernel] [k] __skb_flow_dissect 0.15% [kernel] [k] sched_clock 0.15% libc.so.6 [.] 
0x00000000000965a2 0.15% [kernel] [k] kmem_cache_free_bulk.part.0 0.15% [pppoe] [k] pppoe_flush_dev 0.15% [ixgbe] [k] ixgbe_tx_map 0.14% [kernel] [k] _raw_spin_lock_bh 0.14% [kernel] [k] fib_table_flush 0.14% [kernel] [k] native_irq_return_iret 0.14% [kernel] [k] __dev_xmit_skb 0.13% [kernel] [k] nf_hook_slow 0.13% [kernel] [k] fib_lookup_good_nhc 0.12% [kernel] [k] __fget_files 0.12% [kernel] [k] process_backlog 0.12% [xt_dtvqos] [k] 0x00000000000008d1 0.12% [kernel] [k] __list_del_entry_valid 0.12% [kernel] [k] skb_release_data 0.12% [kernel] [k] ip_route_input_slow 0.11% [kernel] [k] netif_skb_features 0.11% [kernel] [k] sock_poll 0.11% [kernel] [k] __schedule 0.11% [kernel] [k] __softirqentry_text_start And on time of problem when try to write : ip a to list interface wait 15-20 sec i finaly have options to simulate but users is angry when down internet. In case need to know why system is overloaded when deconfig ppp interface. Best regards, Martin > On 11 Sep 2021, at 9:26, Martin Zaharinov <micron10@gmail.com> wrote: > > Hi Guillaume > > Main problem is overload of service because have many finishing ppp (customer) last two day down from 40-50 to 100-200 users and make problem when is happen if try to type : ip a wait 10-20 sec to start list interface . > But how to find where is a problem any locking or other. > And is there options to make fast remove ppp interface from kernel to reduce this load. > > > Martin > >> On 7 Sep 2021, at 9:42, Martin Zaharinov <micron10@gmail.com> wrote: >> >> Perf top from text >> >> >> PerfTop: 28391 irqs/sec kernel:98.0% exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 12 CPUs) >> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- >> >> 17.01% [nf_conntrack] [k] nf_ct_iterate_cleanup >> 9.73% [kernel] [k] mutex_spin_on_owner >> 9.07% [pppoe] [k] pppoe_rcv >> 2.77% [nf_nat] [k] device_cmp >> 1.66% [kernel] [k] osq_lock >> 1.65% [kernel] [k] _raw_spin_lock >> 1.61% [kernel] [k] __local_bh_enable_ip >> 1.35% [nf_nat] [k] inet_cmp >> 1.30% [kernel] [k] __netif_receive_skb_core.constprop.0 >> 1.16% [kernel] [k] menu_select >> 0.99% [kernel] [k] cpuidle_enter_state >> 0.96% [ixgbe] [k] ixgbe_clean_rx_irq >> 0.86% [kernel] [k] __dev_queue_xmit >> 0.70% [kernel] [k] __cond_resched >> 0.69% [sch_cake] [k] cake_dequeue >> 0.67% [nf_tables] [k] nft_do_chain >> 0.63% [kernel] [k] rcu_all_qs >> 0.61% [kernel] [k] fib_table_lookup >> 0.57% [kernel] [k] __schedule >> 0.57% [kernel] [k] skb_release_data >> 0.54% [kernel] [k] sched_clock >> 0.54% [kernel] [k] __copy_skb_header >> 0.53% [kernel] [k] dev_queue_xmit_nit >> 0.53% [kernel] [k] _raw_spin_lock_irqsave >> 0.50% [kernel] [k] kmem_cache_free >> 0.48% libfrr.so.0.0.0 [.] 0x00000000000ce970 >> 0.47% [ixgbe] [k] ixgbe_clean_tx_irq >> 0.45% [kernel] [k] timerqueue_add >> 0.45% [kernel] [k] lapic_next_deadline >> 0.45% [kernel] [k] csum_partial_copy_generic >> 0.44% [nf_flow_table] [k] nf_flow_offload_ip_hook >> 0.44% [kernel] [k] kmem_cache_alloc >> 0.44% [nf_conntrack] [k] nf_conntrack_lock >> >>> On 7 Sep 2021, at 9:16, Martin Zaharinov <micron10@gmail.com> wrote: >>> >>> Hi >>> Sorry for delay but not easy to catch moment . 
>>> >>> >>> See this is mpstatl 1 : >>> >>> Linux 5.14.1 (demobng) 09/07/21 _x86_64_ (12 CPU) >>> >>> 11:12:16 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle >>> 11:12:17 all 0.17 0.00 6.66 0.00 0.00 4.13 0.00 0.00 0.00 89.05 >>> 11:12:18 all 0.25 0.00 8.36 0.00 0.00 4.88 0.00 0.00 0.00 86.51 >>> 11:12:19 all 0.26 0.00 9.62 0.00 0.00 3.91 0.00 0.00 0.00 86.21 >>> 11:12:20 all 0.85 0.00 6.00 0.00 0.00 4.31 0.00 0.00 0.00 88.84 >>> 11:12:21 all 0.08 0.00 4.45 0.00 0.00 4.79 0.00 0.00 0.00 90.67 >>> 11:12:22 all 0.17 0.00 9.50 0.00 0.00 4.58 0.00 0.00 0.00 85.75 >>> 11:12:23 all 0.00 0.00 6.92 0.00 0.00 2.48 0.00 0.00 0.00 90.61 >>> 11:12:24 all 0.17 0.00 5.45 0.00 0.00 4.27 0.00 0.00 0.00 90.11 >>> 11:12:25 all 0.25 0.00 5.38 0.00 0.00 4.79 0.00 0.00 0.00 89.58 >>> 11:12:26 all 0.60 0.00 1.45 0.00 0.00 2.65 0.00 0.00 0.00 95.30 >>> 11:12:27 all 0.42 0.00 6.91 0.00 0.00 4.47 0.00 0.00 0.00 88.20 >>> 11:12:28 all 0.00 0.00 6.75 0.00 0.00 4.18 0.00 0.00 0.00 89.07 >>> 11:12:29 all 0.17 0.00 3.52 0.00 0.00 5.11 0.00 0.00 0.00 91.20 >>> 11:12:30 all 1.45 0.00 10.14 0.00 0.00 3.49 0.00 0.00 0.00 84.92 >>> 11:12:31 all 0.09 0.00 5.11 0.00 0.00 4.77 0.00 0.00 0.00 90.03 >>> 11:12:32 all 0.25 0.00 3.11 0.00 0.00 4.46 0.00 0.00 0.00 92.17 >>> Average: all 0.32 0.00 6.21 0.00 0.00 4.21 0.00 0.00 0.00 89.26 >>> >>> >>> I attache and one screenshot from perf top (Screenshot is send on preview mail) >>> >>> And I see in lsmod >>> >>> pppoe 20480 8198 >>> pppox 16384 1 pppoe >>> ppp_generic 45056 16364 pppox,pppoe >>> slhc 16384 1 ppp_generic >>> >>> To slow remove pppoe session . >>> >>> And from log : >>> >>> [2021-09-07 11:01:11.129] vlan3020: ebdd1c5d8b5900f6: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>> [2021-09-07 11:01:53.621] vlan643: ebdd1c5d8b59014e: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>> [2021-09-07 11:02:00.359] vlan1616: ebdd1c5d8b590195: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>> [2021-09-07 11:02:05.859] vlan3020: ebdd1c5d8b5900d8: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>> [2021-09-07 11:02:08.258] vlan3005: ebdd1c5d8b590190: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>> [2021-09-07 11:02:13.820] vlan643: ebdd1c5d8b590152: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>> [2021-09-07 11:02:15.839] vlan727: ebdd1c5d8b590144: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>> [2021-09-07 11:02:20.139] vlan1693: ebdd1c5d8b59019f: ioctl(PPPIOCCONNECT): Transport endpoint is not connected >>> >>>> On 11 Aug 2021, at 19:48, Guillaume Nault <gnault@redhat.com> wrote: >>>> >>>> On Wed, Aug 11, 2021 at 02:10:32PM +0300, Martin Zaharinov wrote: >>>>> And one more that see. >>>>> >>>>> Problem is come when accel start finishing sessions, >>>>> Now in server have 2k users and restart on one of vlans 3 Olt with 400 users and affect other vlans , >>>>> And problem is start when start destroying dead sessions from vlan with 3 Olt and this affect all other vlans. >>>>> May be kernel destroy old session slow and entrained other users by locking other sessions. >>>>> is there a way to speed up the closing of stopped/dead sessions. >>>> >>>> What are the CPU stats when that happen? Is it users space or kernel >>>> space that keeps it busy? >>>> >>>> One easy way to check is to run "mpstat 1" for a few seconds when the >>>> problem occurs. >>>> >>> >> > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected 2021-09-14 6:16 ` Martin Zaharinov @ 2021-09-14 8:02 ` Guillaume Nault 2021-09-14 9:50 ` Florian Westphal 0 siblings, 1 reply; 23+ messages in thread From: Guillaume Nault @ 2021-09-14 8:02 UTC (permalink / raw) To: Martin Zaharinov; +Cc: Pali Rohár, Greg KH, netdev, Eric Dumazet On Tue, Sep 14, 2021 at 09:16:55AM +0300, Martin Zaharinov wrote: > Hi Nault > > See this stats : > > Linux 5.14.2 (testb) 09/14/21 _x86_64_ (12 CPU) > > 11:33:44 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle > 11:33:45 all 1.75 0.00 18.85 0.00 0.00 5.00 0.00 0.00 0.00 74.40 > 11:33:46 all 1.74 0.00 17.88 0.00 0.00 4.72 0.00 0.00 0.00 75.66 > 11:33:47 all 2.23 0.00 17.62 0.00 0.00 5.05 0.00 0.00 0.00 75.10 > 11:33:48 all 1.82 0.00 13.64 0.00 0.00 5.70 0.00 0.00 0.00 78.84 > 11:33:49 all 1.50 0.00 13.46 0.00 0.00 5.15 0.00 0.00 0.00 79.90 > 11:33:50 all 3.06 0.00 13.96 0.00 0.00 4.79 0.00 0.00 0.00 78.20 > 11:33:51 all 1.40 0.00 16.53 0.00 0.00 5.21 0.00 0.00 0.00 76.86 > 11:33:52 all 4.43 0.00 19.44 0.00 0.00 6.56 0.00 0.00 0.00 69.57 > 11:33:53 all 1.51 0.00 16.40 0.00 0.00 4.77 0.00 0.00 0.00 77.32 > 11:33:54 all 1.51 0.00 16.55 0.00 0.00 4.71 0.00 0.00 0.00 77.23 > 11:33:55 all 1.00 0.00 13.21 0.00 0.00 5.90 0.00 0.00 0.00 79.90 > Average: all 2.00 0.00 16.14 0.00 0.00 5.23 0.00 0.00 0.00 76.63 > > > PerfTop: 28046 irqs/sec kernel:96.3% exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 12 CPUs) > --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > 23.37% [nf_conntrack] [k] nf_ct_iterate_cleanup > 17.76% [kernel] [k] mutex_spin_on_owner > 9.47% [pppoe] [k] pppoe_rcv > 7.71% [kernel] [k] osq_lock > 2.77% [nf_nat] [k] inet_cmp > 2.59% [nf_nat] [k] device_cmp > 2.55% [kernel] [k] __local_bh_enable_ip > 2.04% [kernel] [k] _raw_spin_lock > 1.23% [kernel] [k] __cond_resched > 1.16% [kernel] [k] rcu_all_qs > 1.13% libfrr.so.0.0.0 [.] 0x00000000000ce970 > 0.79% [nf_conntrack] [k] nf_conntrack_lock > 0.75% libfrr.so.0.0.0 [.] 0x00000000000ce94e > 0.53% [kernel] [k] __netif_receive_skb_core.constprop.0 > 0.46% [kernel] [k] fib_table_lookup > 0.46% [ip_tables] [k] ipt_do_table > 0.45% [ixgbe] [k] ixgbe_clean_rx_irq > 0.37% [kernel] [k] __dev_queue_xmit > 0.34% [nf_conntrack] [k] __nf_conntrack_find_get.isra.0 > 0.33% [ixgbe] [k] ixgbe_clean_tx_irq > 0.30% [kernel] [k] menu_select > 0.25% [kernel] [k] vlan_do_receive > 0.21% [kernel] [k] ip_finish_output2 > 0.21% [ixgbe] [k] ixgbe_poll > 0.20% [kernel] [k] _raw_spin_lock_irqsave > 0.19% [kernel] [k] get_rps_cpu > 0.19% libc.so.6 [.] 0x0000000000186afa > 0.19% [kernel] [k] queued_read_lock_slowpath > 0.19% [kernel] [k] do_poll.constprop.0 > 0.19% [kernel] [k] cpuidle_enter_state > 0.18% [kernel] [k] dev_hard_start_xmit > 0.18% [kernel] [k] ___slab_alloc.constprop.0 > 0.17% zebra [.] 0x00000000000b9271 > 0.16% [kernel] [k] csum_partial_copy_generic > 0.16% zebra [.] 0x00000000000b91f1 > 0.16% [kernel] [k] page_frag_free > 0.16% [kernel] [k] kmem_cache_alloc > 0.15% [kernel] [k] __skb_flow_dissect > 0.15% [kernel] [k] sched_clock > 0.15% libc.so.6 [.] 
0x00000000000965a2 > 0.15% [kernel] [k] kmem_cache_free_bulk.part.0 > 0.15% [pppoe] [k] pppoe_flush_dev > 0.15% [ixgbe] [k] ixgbe_tx_map > 0.14% [kernel] [k] _raw_spin_lock_bh > 0.14% [kernel] [k] fib_table_flush > 0.14% [kernel] [k] native_irq_return_iret > 0.14% [kernel] [k] __dev_xmit_skb > 0.13% [kernel] [k] nf_hook_slow > 0.13% [kernel] [k] fib_lookup_good_nhc > 0.12% [kernel] [k] __fget_files > 0.12% [kernel] [k] process_backlog > 0.12% [xt_dtvqos] [k] 0x00000000000008d1 > 0.12% [kernel] [k] __list_del_entry_valid > 0.12% [kernel] [k] skb_release_data > 0.12% [kernel] [k] ip_route_input_slow > 0.11% [kernel] [k] netif_skb_features > 0.11% [kernel] [k] sock_poll > 0.11% [kernel] [k] __schedule > 0.11% [kernel] [k] __softirqentry_text_start > > > And on time of problem when try to write : ip a > to list interface wait 15-20 sec i finaly have options to simulate but users is angry when down internet. Probably some contention on the rtnl lock. > In case need to know why system is overloaded when deconfig ppp interface. Does it help if you disable conntrack? > > Best regards, > Martin > > > > > > On 11 Sep 2021, at 9:26, Martin Zaharinov <micron10@gmail.com> wrote: > > > > Hi Guillaume > > > > Main problem is overload of service because have many finishing ppp (customer) last two day down from 40-50 to 100-200 users and make problem when is happen if try to type : ip a wait 10-20 sec to start list interface . > > But how to find where is a problem any locking or other. > > And is there options to make fast remove ppp interface from kernel to reduce this load. > > > > > > Martin > > > >> On 7 Sep 2021, at 9:42, Martin Zaharinov <micron10@gmail.com> wrote: > >> > >> Perf top from text > >> > >> > >> PerfTop: 28391 irqs/sec kernel:98.0% exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 12 CPUs) > >> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > >> > >> 17.01% [nf_conntrack] [k] nf_ct_iterate_cleanup > >> 9.73% [kernel] [k] mutex_spin_on_owner > >> 9.07% [pppoe] [k] pppoe_rcv > >> 2.77% [nf_nat] [k] device_cmp > >> 1.66% [kernel] [k] osq_lock > >> 1.65% [kernel] [k] _raw_spin_lock > >> 1.61% [kernel] [k] __local_bh_enable_ip > >> 1.35% [nf_nat] [k] inet_cmp > >> 1.30% [kernel] [k] __netif_receive_skb_core.constprop.0 > >> 1.16% [kernel] [k] menu_select > >> 0.99% [kernel] [k] cpuidle_enter_state > >> 0.96% [ixgbe] [k] ixgbe_clean_rx_irq > >> 0.86% [kernel] [k] __dev_queue_xmit > >> 0.70% [kernel] [k] __cond_resched > >> 0.69% [sch_cake] [k] cake_dequeue > >> 0.67% [nf_tables] [k] nft_do_chain > >> 0.63% [kernel] [k] rcu_all_qs > >> 0.61% [kernel] [k] fib_table_lookup > >> 0.57% [kernel] [k] __schedule > >> 0.57% [kernel] [k] skb_release_data > >> 0.54% [kernel] [k] sched_clock > >> 0.54% [kernel] [k] __copy_skb_header > >> 0.53% [kernel] [k] dev_queue_xmit_nit > >> 0.53% [kernel] [k] _raw_spin_lock_irqsave > >> 0.50% [kernel] [k] kmem_cache_free > >> 0.48% libfrr.so.0.0.0 [.] 
0x00000000000ce970 > >> 0.47% [ixgbe] [k] ixgbe_clean_tx_irq > >> 0.45% [kernel] [k] timerqueue_add > >> 0.45% [kernel] [k] lapic_next_deadline > >> 0.45% [kernel] [k] csum_partial_copy_generic > >> 0.44% [nf_flow_table] [k] nf_flow_offload_ip_hook > >> 0.44% [kernel] [k] kmem_cache_alloc > >> 0.44% [nf_conntrack] [k] nf_conntrack_lock > >> > >>> On 7 Sep 2021, at 9:16, Martin Zaharinov <micron10@gmail.com> wrote: > >>> > >>> Hi > >>> Sorry for delay but not easy to catch moment . > >>> > >>> > >>> See this is mpstatl 1 : > >>> > >>> Linux 5.14.1 (demobng) 09/07/21 _x86_64_ (12 CPU) > >>> > >>> 11:12:16 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle > >>> 11:12:17 all 0.17 0.00 6.66 0.00 0.00 4.13 0.00 0.00 0.00 89.05 > >>> 11:12:18 all 0.25 0.00 8.36 0.00 0.00 4.88 0.00 0.00 0.00 86.51 > >>> 11:12:19 all 0.26 0.00 9.62 0.00 0.00 3.91 0.00 0.00 0.00 86.21 > >>> 11:12:20 all 0.85 0.00 6.00 0.00 0.00 4.31 0.00 0.00 0.00 88.84 > >>> 11:12:21 all 0.08 0.00 4.45 0.00 0.00 4.79 0.00 0.00 0.00 90.67 > >>> 11:12:22 all 0.17 0.00 9.50 0.00 0.00 4.58 0.00 0.00 0.00 85.75 > >>> 11:12:23 all 0.00 0.00 6.92 0.00 0.00 2.48 0.00 0.00 0.00 90.61 > >>> 11:12:24 all 0.17 0.00 5.45 0.00 0.00 4.27 0.00 0.00 0.00 90.11 > >>> 11:12:25 all 0.25 0.00 5.38 0.00 0.00 4.79 0.00 0.00 0.00 89.58 > >>> 11:12:26 all 0.60 0.00 1.45 0.00 0.00 2.65 0.00 0.00 0.00 95.30 > >>> 11:12:27 all 0.42 0.00 6.91 0.00 0.00 4.47 0.00 0.00 0.00 88.20 > >>> 11:12:28 all 0.00 0.00 6.75 0.00 0.00 4.18 0.00 0.00 0.00 89.07 > >>> 11:12:29 all 0.17 0.00 3.52 0.00 0.00 5.11 0.00 0.00 0.00 91.20 > >>> 11:12:30 all 1.45 0.00 10.14 0.00 0.00 3.49 0.00 0.00 0.00 84.92 > >>> 11:12:31 all 0.09 0.00 5.11 0.00 0.00 4.77 0.00 0.00 0.00 90.03 > >>> 11:12:32 all 0.25 0.00 3.11 0.00 0.00 4.46 0.00 0.00 0.00 92.17 > >>> Average: all 0.32 0.00 6.21 0.00 0.00 4.21 0.00 0.00 0.00 89.26 > >>> > >>> > >>> I attache and one screenshot from perf top (Screenshot is send on preview mail) > >>> > >>> And I see in lsmod > >>> > >>> pppoe 20480 8198 > >>> pppox 16384 1 pppoe > >>> ppp_generic 45056 16364 pppox,pppoe > >>> slhc 16384 1 ppp_generic > >>> > >>> To slow remove pppoe session . > >>> > >>> And from log : > >>> > >>> [2021-09-07 11:01:11.129] vlan3020: ebdd1c5d8b5900f6: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>> [2021-09-07 11:01:53.621] vlan643: ebdd1c5d8b59014e: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>> [2021-09-07 11:02:00.359] vlan1616: ebdd1c5d8b590195: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>> [2021-09-07 11:02:05.859] vlan3020: ebdd1c5d8b5900d8: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>> [2021-09-07 11:02:08.258] vlan3005: ebdd1c5d8b590190: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>> [2021-09-07 11:02:13.820] vlan643: ebdd1c5d8b590152: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>> [2021-09-07 11:02:15.839] vlan727: ebdd1c5d8b590144: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>> [2021-09-07 11:02:20.139] vlan1693: ebdd1c5d8b59019f: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>> > >>>> On 11 Aug 2021, at 19:48, Guillaume Nault <gnault@redhat.com> wrote: > >>>> > >>>> On Wed, Aug 11, 2021 at 02:10:32PM +0300, Martin Zaharinov wrote: > >>>>> And one more that see. 
> >>>>> > >>>>> Problem is come when accel start finishing sessions, > >>>>> Now in server have 2k users and restart on one of vlans 3 Olt with 400 users and affect other vlans , > >>>>> And problem is start when start destroying dead sessions from vlan with 3 Olt and this affect all other vlans. > >>>>> May be kernel destroy old session slow and entrained other users by locking other sessions. > >>>>> is there a way to speed up the closing of stopped/dead sessions. > >>>> > >>>> What are the CPU stats when that happen? Is it users space or kernel > >>>> space that keeps it busy? > >>>> > >>>> One easy way to check is to run "mpstat 1" for a few seconds when the > >>>> problem occurs. > >>>> > >>> > >> > > > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected 2021-09-14 8:02 ` Guillaume Nault @ 2021-09-14 9:50 ` Florian Westphal 2021-09-14 10:01 ` Martin Zaharinov 2021-09-14 10:53 ` Martin Zaharinov 0 siblings, 2 replies; 23+ messages in thread From: Florian Westphal @ 2021-09-14 9:50 UTC (permalink / raw) To: Guillaume Nault Cc: Martin Zaharinov, Pali Rohár, Greg KH, netdev, Eric Dumazet Guillaume Nault <gnault@redhat.com> wrote: > > And on time of problem when try to write : ip a > > to list interface wait 15-20 sec i finaly have options to simulate but users is angry when down internet. > > Probably some contention on the rtnl lock. Yes, I'll create a patch. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected 2021-09-14 9:50 ` Florian Westphal @ 2021-09-14 10:01 ` Martin Zaharinov 2021-09-14 11:00 ` Florian Westphal 2021-09-14 10:53 ` Martin Zaharinov 1 sibling, 1 reply; 23+ messages in thread From: Martin Zaharinov @ 2021-09-14 10:01 UTC (permalink / raw) To: Florian Westphal Cc: Guillaume Nault, Pali Rohár, Greg KH, netdev, Eric Dumazet Hi Nault and Florian Nault : No not test need conntrack to log user traffic. Florian: If you make patch send to test please. Martin > On 14 Sep 2021, at 12:50, Florian Westphal <fw@strlen.de> wrote: > > Guillaume Nault <gnault@redhat.com> wrote: >>> And on time of problem when try to write : ip a >>> to list interface wait 15-20 sec i finaly have options to simulate but users is angry when down internet. >> >> Probably some contention on the rtnl lock. > > Yes, I'll create a patch. ^ permalink raw reply [flat|nested] 23+ messages in thread
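A note on the mechanics before the patch: netdevice and IPv4 address notifiers run with the RTNL mutex held, and up to this point the masquerade code still called nf_ct_iterate_cleanup_net() directly from those notifiers (only the IPv6 path deferred the walk to a work item). With a large conntrack table and hundreds of ppp/vlan interfaces going away at once, that synchronous full-table walk keeps RTNL held for long stretches, which matches Guillaume's rtnl-contention diagnosis, the mutex_spin_on_owner/osq_lock samples in the profiles above, and the stalled "ip a" listing. The patch Florian attaches in the next message moves the walk onto the system workqueue. Below is a minimal sketch of that pattern; the demo_* names are invented for illustration and this is not the netfilter code itself.

/*
 * Sketch only: a hypothetical module showing the "defer heavy work from a
 * netdevice notifier (which runs under RTNL) to the system workqueue"
 * pattern that the patch below applies to the masquerade cleanup.
 * All demo_* names are invented.
 */
#include <linux/module.h>
#include <linux/netdevice.h>
#include <linux/workqueue.h>
#include <linux/slab.h>

struct demo_cleanup_work {
        struct work_struct work;
        int ifindex;
};

static void demo_do_cleanup(struct work_struct *work)
{
        struct demo_cleanup_work *w =
                container_of(work, struct demo_cleanup_work, work);

        /* The expensive table walk would happen here, outside the
         * notifier, so RTNL is not held while it runs.
         */
        pr_info("cleaning up state for ifindex %d\n", w->ifindex);
        kfree(w);
}

static int demo_device_event(struct notifier_block *nb,
                             unsigned long event, void *ptr)
{
        struct net_device *dev = netdev_notifier_info_to_dev(ptr);
        struct demo_cleanup_work *w;

        if (event != NETDEV_DOWN)
                return NOTIFY_DONE;

        /* Called with RTNL held: just queue the work and return. */
        w = kzalloc(sizeof(*w), GFP_KERNEL);
        if (!w)
                return NOTIFY_DONE;     /* skip; stale state times out later */

        INIT_WORK(&w->work, demo_do_cleanup);
        w->ifindex = dev->ifindex;
        schedule_work(&w->work);

        return NOTIFY_DONE;
}

static struct notifier_block demo_notifier = {
        .notifier_call = demo_device_event,
};

static int __init demo_init(void)
{
        return register_netdevice_notifier(&demo_notifier);
}

static void __exit demo_exit(void)
{
        unregister_netdevice_notifier(&demo_notifier);
        flush_scheduled_work();         /* don't leave queued work behind */
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");

With this shape the notifier only allocates a small work item and returns, so nothing heavy runs while RTNL is held; the expensive iteration happens later in process context, off the notifier path.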
* Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected 2021-09-14 10:01 ` Martin Zaharinov @ 2021-09-14 11:00 ` Florian Westphal 2021-09-15 14:25 ` Martin Zaharinov 2021-09-16 20:00 ` Martin Zaharinov 0 siblings, 2 replies; 23+ messages in thread From: Florian Westphal @ 2021-09-14 11:00 UTC (permalink / raw) To: Martin Zaharinov; +Cc: Florian Westphal, Guillaume Nault, netdev [-- Attachment #1: Type: text/plain, Size: 238 bytes --] Martin Zaharinov <micron10@gmail.com> wrote: [ Trimming CC list ] > Florian: > > If you make patch send to test please. Attached. No idea if it helps, but 'ip' should stay responsive even when masquerade processes netdevice events. [-- Attachment #2: defer_masq_work.diff --] [-- Type: text/x-diff, Size: 6674 bytes --] diff --git a/net/netfilter/nf_nat_masquerade.c b/net/netfilter/nf_nat_masquerade.c index 8e8a65d46345..50c6d6992ed6 100644 --- a/net/netfilter/nf_nat_masquerade.c +++ b/net/netfilter/nf_nat_masquerade.c @@ -9,8 +9,19 @@ #include <net/netfilter/nf_nat_masquerade.h> +struct masq_dev_work { + struct work_struct work; + struct net *net; + union nf_inet_addr addr; + int ifindex; + int (*iter)(struct nf_conn *i, void *data); +}; + +#define MAX_MASQ_WORKER_COUNT 16 + static DEFINE_MUTEX(masq_mutex); static unsigned int masq_refcnt __read_mostly; +static atomic_t masq_worker_count __read_mostly; unsigned int nf_nat_masquerade_ipv4(struct sk_buff *skb, unsigned int hooknum, @@ -63,13 +74,68 @@ nf_nat_masquerade_ipv4(struct sk_buff *skb, unsigned int hooknum, } EXPORT_SYMBOL_GPL(nf_nat_masquerade_ipv4); -static int device_cmp(struct nf_conn *i, void *ifindex) +static void iterate_cleanup_work(struct work_struct *work) +{ + struct masq_dev_work *w; + + w = container_of(work, struct masq_dev_work, work); + + nf_ct_iterate_cleanup_net(w->net, w->iter, (void *)w, 0, 0); + + put_net(w->net); + kfree(w); + atomic_dec(&masq_worker_count); + module_put(THIS_MODULE); +} + +/* Iterate conntrack table in the background and remove conntrack entries + * that use the device/address being removed. + * + * In case too many work items have been queued already or memory allocation + * fails iteration is skipped, conntrack entries will time out eventually. + */ +static void nf_nat_masq_schedule(struct net *net, union nf_inet_addr *addr, + int ifindex, + int (*iter)(struct nf_conn *i, void *data), + gfp_t gfp_flags) +{ + struct masq_dev_work *w; + + net = maybe_get_net(net); + if (!net) + return; + + if (!try_module_get(THIS_MODULE)) + goto err_module; + + w = kzalloc(sizeof(*w), gfp_flags); + if (w) { + /* We can overshoot MAX_MASQ_WORKER_COUNT, no big deal */ + atomic_inc(&masq_worker_count); + + INIT_WORK(&w->work, iterate_cleanup_work); + w->ifindex = ifindex; + w->net = net; + w->iter = iter; + if (addr) + w->addr = *addr; + schedule_work(&w->work); + return; + } + + module_put(THIS_MODULE); + err_module: + put_net(net); +} + +static int device_cmp(struct nf_conn *i, void *arg) { const struct nf_conn_nat *nat = nfct_nat(i); + const struct masq_dev_work *w = arg; if (!nat) return 0; - return nat->masq_index == (int)(long)ifindex; + return nat->masq_index == w->ifindex; } static int masq_device_event(struct notifier_block *this, @@ -85,8 +151,8 @@ static int masq_device_event(struct notifier_block *this, * and forget them. 
*/ - nf_ct_iterate_cleanup_net(net, device_cmp, - (void *)(long)dev->ifindex, 0, 0); + nf_nat_masq_schedule(net, NULL, dev->ifindex, + device_cmp, GFP_KERNEL); } return NOTIFY_DONE; @@ -94,35 +160,45 @@ static int masq_device_event(struct notifier_block *this, static int inet_cmp(struct nf_conn *ct, void *ptr) { - struct in_ifaddr *ifa = (struct in_ifaddr *)ptr; - struct net_device *dev = ifa->ifa_dev->dev; struct nf_conntrack_tuple *tuple; + struct masq_dev_work *w = ptr; - if (!device_cmp(ct, (void *)(long)dev->ifindex)) + if (!device_cmp(ct, ptr)) return 0; tuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple; - return ifa->ifa_address == tuple->dst.u3.ip; + return nf_inet_addr_cmp(&w->addr, &tuple->dst.u3); } static int masq_inet_event(struct notifier_block *this, unsigned long event, void *ptr) { - struct in_device *idev = ((struct in_ifaddr *)ptr)->ifa_dev; - struct net *net = dev_net(idev->dev); + const struct in_ifaddr *ifa = ptr; + const struct in_device *idev; + const struct net_device *dev; + union nf_inet_addr addr; + + if (event != NETDEV_DOWN) + return NOTIFY_DONE; /* The masq_dev_notifier will catch the case of the device going * down. So if the inetdev is dead and being destroyed we have * no work to do. Otherwise this is an individual address removal * and we have to perform the flush. */ + idev = ifa->ifa_dev; if (idev->dead) return NOTIFY_DONE; - if (event == NETDEV_DOWN) - nf_ct_iterate_cleanup_net(net, inet_cmp, ptr, 0, 0); + memset(&addr, 0, sizeof(addr)); + + addr.ip = ifa->ifa_address; + + dev = idev->dev; + nf_nat_masq_schedule(dev_net(idev->dev), &addr, dev->ifindex, + inet_cmp, GFP_KERNEL); return NOTIFY_DONE; } @@ -136,8 +212,6 @@ static struct notifier_block masq_inet_notifier = { }; #if IS_ENABLED(CONFIG_IPV6) -static atomic_t v6_worker_count __read_mostly; - static int nat_ipv6_dev_get_saddr(struct net *net, const struct net_device *dev, const struct in6_addr *daddr, unsigned int srcprefs, @@ -187,40 +261,6 @@ nf_nat_masquerade_ipv6(struct sk_buff *skb, const struct nf_nat_range2 *range, } EXPORT_SYMBOL_GPL(nf_nat_masquerade_ipv6); -struct masq_dev_work { - struct work_struct work; - struct net *net; - struct in6_addr addr; - int ifindex; -}; - -static int inet6_cmp(struct nf_conn *ct, void *work) -{ - struct masq_dev_work *w = (struct masq_dev_work *)work; - struct nf_conntrack_tuple *tuple; - - if (!device_cmp(ct, (void *)(long)w->ifindex)) - return 0; - - tuple = &ct->tuplehash[IP_CT_DIR_REPLY].tuple; - - return ipv6_addr_equal(&w->addr, &tuple->dst.u3.in6); -} - -static void iterate_cleanup_work(struct work_struct *work) -{ - struct masq_dev_work *w; - - w = container_of(work, struct masq_dev_work, work); - - nf_ct_iterate_cleanup_net(w->net, inet6_cmp, (void *)w, 0, 0); - - put_net(w->net); - kfree(w); - atomic_dec(&v6_worker_count); - module_put(THIS_MODULE); -} - /* atomic notifier; can't call nf_ct_iterate_cleanup_net (it can sleep). * * Defer it to the system workqueue. 
@@ -233,36 +273,19 @@ static int masq_inet6_event(struct notifier_block *this, { struct inet6_ifaddr *ifa = ptr; const struct net_device *dev; - struct masq_dev_work *w; - struct net *net; + union nf_inet_addr addr; - if (event != NETDEV_DOWN || atomic_read(&v6_worker_count) >= 16) + if (event != NETDEV_DOWN) return NOTIFY_DONE; dev = ifa->idev->dev; - net = maybe_get_net(dev_net(dev)); - if (!net) - return NOTIFY_DONE; - - if (!try_module_get(THIS_MODULE)) - goto err_module; - w = kmalloc(sizeof(*w), GFP_ATOMIC); - if (w) { - atomic_inc(&v6_worker_count); + memset(&addr, 0, sizeof(addr)); - INIT_WORK(&w->work, iterate_cleanup_work); - w->ifindex = dev->ifindex; - w->net = net; - w->addr = ifa->addr; - schedule_work(&w->work); + addr.in6 = ifa->addr; - return NOTIFY_DONE; - } - - module_put(THIS_MODULE); - err_module: - put_net(net); + nf_nat_masq_schedule(dev_net(dev), &addr, dev->ifindex, inet_cmp, + GFP_ATOMIC); return NOTIFY_DONE; } ^ permalink raw reply related [flat|nested] 23+ messages in thread
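A short note on what the patch changes: the deferral that already existed for IPv6 address removal is generalized and reused for the IPv4 device and address events, all through one helper (nf_nat_masq_schedule) that queues a masq_dev_work item on the system workqueue, using GFP_KERNEL from the blocking notifiers and GFP_ATOMIC from the atomic inet6 notifier. The notifier now only allocates and queues, so the conntrack walk no longer runs on the RTNL-held notifier path. The trade-off is the one stated in the patch's own comment: if too many work items are already queued or the allocation fails, the iteration is skipped and the affected conntrack entries are left to time out, so entries tied to a removed address can linger for a while instead of being flushed immediately. Watching /proc/sys/net/netfilter/nf_conntrack_count while interfaces go down is a cheap way to see the deferred cleanup drain.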
* Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected 2021-09-14 11:00 ` Florian Westphal @ 2021-09-15 14:25 ` Martin Zaharinov 2021-09-15 14:37 ` Martin Zaharinov 2021-09-16 20:00 ` Martin Zaharinov 1 sibling, 1 reply; 23+ messages in thread From: Martin Zaharinov @ 2021-09-15 14:25 UTC (permalink / raw) To: Florian Westphal; +Cc: Guillaume Nault, netdev Hey Florian make test in lab and look much better that before. see this perf PerfTop: 6551 irqs/sec kernel:77.8% exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 12 CPUs) --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 15.70% [ixgbe] [k] ixgbe_read_reg 13.33% [kernel] [k] mutex_spin_on_owner 7.65% [kernel] [k] osq_lock 2.85% libfrr.so.0.0.0 [.] 0x00000000000ce970 1.94% libfrr.so.0.0.0 [.] 0x00000000000ce94e 1.19% libc.so.6 [.] 0x0000000000186afa 1.15% [kernel] [k] do_poll.constprop.0 0.99% [kernel] [k] inet_dump_ifaddr 0.94% libteam.so.5.6.1 [.] 0x0000000000006470 0.79% libc.so.6 [.] 0x0000000000186e57 0.71% [ixgbe] [k] ixgbe_update_mc_addr_list_generic 0.65% [kernel] [k] __fget_files 0.61% [kernel] [k] sock_poll 0.57% libteam.so.5.6.1 [.] 0x0000000000009e7d 0.51% perf [.] 0x00000000000bc7b3 0.51% libteam.so.5.6.1 [.] 0x0000000000006501 0.48% [kernel] [k] next_uptodate_page 0.46% [kernel] [k] _raw_read_lock_bh 0.43% libc.so.6 [.] 0x0000000000186eac 0.42% bgpd [.] 0x0000000000070a46 0.41% [pppoe] [k] pppoe_flush_dev 0.39% [kernel] [k] zap_pte_range This happened when remove and add new interface on time of drop and reconnect users. now : ip a command work fine ! Martin > On 14 Sep 2021, at 14:00, Florian Westphal <fw@strlen.de> wrote: > > Martin Zaharinov <micron10@gmail.com> wrote: > > [ Trimming CC list ] > >> Florian: >> >> If you make patch send to test please. > > Attached. No idea if it helps, but 'ip' should stay responsive > even when masquerade processes netdevice events. > <defer_masq_work.diff> ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected 2021-09-15 14:25 ` Martin Zaharinov @ 2021-09-15 14:37 ` Martin Zaharinov 0 siblings, 0 replies; 23+ messages in thread From: Martin Zaharinov @ 2021-09-15 14:37 UTC (permalink / raw) To: Florian Westphal; +Cc: Guillaume Nault, netdev and this : PerfTop: 26378 irqs/sec kernel:61.4% exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 12 CPUs) --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 5.65% libfrr.so.0.0.0 [.] 0x00000000000ce970 5.56% [kernel] [k] osq_lock 5.22% [kernel] [k] mutex_spin_on_owner 3.66% [pppoe] [k] pppoe_flush_dev 3.01% libfrr.so.0.0.0 [.] 0x00000000000ce94e 1.98% libc.so.6 [.] 0x00000000000965a2 1.84% libc.so.6 [.] 0x0000000000186afa 1.55% libc.so.6 [.] 0x0000000000186e57 1.54% zebra [.] 0x00000000000b9271 1.46% zebra [.] 0x00000000000b91f1 1.46% libteam.so.5.6.1 [.] 0x0000000000006470 1.44% libc.so.6 [.] 0x00000000000965a0 1.30% libteam.so.5.6.1 [.] 0x0000000000009e7d 1.08% [kernel] [k] fib_table_flush 1.02% libc.so.6 [.] 0x0000000000186eac 0.93% [kernel] [k] do_poll.constprop.0 0.85% libc.so.6 [.] 0x0000000000186afe 0.80% dtvbras [.] 0x0000000000014be8 0.78% [kernel] [k] queued_read_lock_slowpath 0.72% [kernel] [k] next_uptodate_page 0.64% [kernel] [k] zap_pte_range 0.64% bgpd [.] 0x0000000000070a46 0.61% [kernel] [k] fib_table_insert > On 15 Sep 2021, at 17:25, Martin Zaharinov <micron10@gmail.com> wrote: > > Hey Florian > > make test in lab and look much better that before. > > see this perf > > PerfTop: 6551 irqs/sec kernel:77.8% exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 12 CPUs) > --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > > 15.70% [ixgbe] [k] ixgbe_read_reg > 13.33% [kernel] [k] mutex_spin_on_owner > 7.65% [kernel] [k] osq_lock > 2.85% libfrr.so.0.0.0 [.] 0x00000000000ce970 > 1.94% libfrr.so.0.0.0 [.] 0x00000000000ce94e > 1.19% libc.so.6 [.] 0x0000000000186afa > 1.15% [kernel] [k] do_poll.constprop.0 > 0.99% [kernel] [k] inet_dump_ifaddr > 0.94% libteam.so.5.6.1 [.] 0x0000000000006470 > 0.79% libc.so.6 [.] 0x0000000000186e57 > 0.71% [ixgbe] [k] ixgbe_update_mc_addr_list_generic > 0.65% [kernel] [k] __fget_files > 0.61% [kernel] [k] sock_poll > 0.57% libteam.so.5.6.1 [.] 0x0000000000009e7d > 0.51% perf [.] 0x00000000000bc7b3 > 0.51% libteam.so.5.6.1 [.] 0x0000000000006501 > 0.48% [kernel] [k] next_uptodate_page > 0.46% [kernel] [k] _raw_read_lock_bh > 0.43% libc.so.6 [.] 0x0000000000186eac > 0.42% bgpd [.] 0x0000000000070a46 > 0.41% [pppoe] [k] pppoe_flush_dev > 0.39% [kernel] [k] zap_pte_range > > > This happened when remove and add new interface on time of drop and reconnect users. > > > now : ip a command work fine ! > > > Martin > > >> On 14 Sep 2021, at 14:00, Florian Westphal <fw@strlen.de> wrote: >> >> Martin Zaharinov <micron10@gmail.com> wrote: >> >> [ Trimming CC list ] >> >>> Florian: >>> >>> If you make patch send to test please. >> >> Attached. No idea if it helps, but 'ip' should stay responsive >> even when masquerade processes netdevice events. >> <defer_masq_work.diff> > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected 2021-09-14 11:00 ` Florian Westphal 2021-09-15 14:25 ` Martin Zaharinov @ 2021-09-16 20:00 ` Martin Zaharinov 1 sibling, 0 replies; 23+ messages in thread From: Martin Zaharinov @ 2021-09-16 20:00 UTC (permalink / raw) To: Florian Westphal; +Cc: Guillaume Nault, netdev Small update: after switching BGP from frr to bird, the frr load is reduced, but when 5k+ users disconnect, pppoe_flush_dev is still slow. PerfTop: 15606 irqs/sec kernel:77.7% exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 12 CPUs) --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 8.24% [kernel] [k] osq_lock 7.55% [kernel] [k] mutex_spin_on_owner 7.04% [pppoe] [k] pppoe_flush_dev 2.77% libteam.so.5.6.1 [.] 0x0000000000009e7d 2.67% libteam.so.5.6.1 [.] 0x0000000000006470 1.90% [kernel] [k] fib_table_flush 1.73% [kernel] [k] queued_read_lock_slowpath 1.68% [kernel] [k] next_uptodate_page 1.36% ip [.] 0x0000000000011b74 1.23% ip [.] 0x00000000000121b0 1.09% [kernel] [k] zap_pte_range 0.99% libteam.so.5.6.1 [.] 0x0000000000006501 0.88% dtvbras [.] 0x0000000000014be8 0.87% [kernel] [k] inet_dump_ifaddr 0.74% [kernel] [k] filemap_map_pages 0.72% [kernel] [k] neigh_flush_dev.isra.0 0.66% [kernel] [k] snmp_get_cpu_field 0.65% [kernel] [k] fib_table_insert 0.63% [kernel] [k] native_irq_return_iret 0.63% libteam.so.5.6.1 [.] 0x0000000000005c78 0.60% [kernel] [k] copy_page 0.52% libteam.so.5.6.1 [.] 0x000000000000647f 0.50% [kernel] [k] _raw_spin_lock 0.48% libc.so.6 [.] 0x00000000000965a2 0.45% [kernel] [k] _raw_read_lock_bh 0.44% [kernel] [k] release_pages 0.42% [kernel] [k] clear_page_erms 0.42% [kernel] [k] page_remove_rmap 0.41% [kernel] [k] queued_spin_lock_slowpath 0.38% [kernel] [k] kmem_cache_alloc 0.36% [kernel] [k] vma_interval_tree_insert 0.36% libteam.so.5.6.1 [.] 0x0000000000009e6f 0.36% [kernel] [k] do_set_pte sessions: starting: 296 active: 3868 finishing: 6748 > On 14 Sep 2021, at 14:00, Florian Westphal <fw@strlen.de> wrote: > > Martin Zaharinov <micron10@gmail.com> wrote: > > [ Trimming CC list ] > >> Florian: >> >> If you make patch send to test please. > > Attached. No idea if it helps, but 'ip' should stay responsive > even when masquerade processes netdevice events. > <defer_masq_work.diff> ^ permalink raw reply [flat|nested] 23+ messages in thread
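What is left in the profile points at the PPPoE side of interface teardown rather than at conntrack: pppoe registers its own netdevice notifier, and when a device goes away pppoe_flush_dev sweeps the PPPoE session table to unbind every session attached to that device, work that also runs from the notifier chain, i.e. with RTNL held. With thousands of sessions in the finishing state (6748 in the stats above) and a pppoe use count above 8000 in the earlier lsmod output, one sweep per disappearing interface adds up, which would explain why mutex_spin_on_owner/osq_lock still sit near the top even after the masquerade cleanup was deferred. Below is a simplified, hypothetical sketch of the shape of such a per-device flush; the demo_* names and structures are invented and this is not the in-tree pppoe code.

/*
 * Simplified illustration (not the in-tree pppoe code): the general shape
 * of flushing every session bound to a device that is going away.  The
 * demo_* structures, table and lock are invented for the sketch.
 */
#include <linux/list.h>
#include <linux/netdevice.h>
#include <linux/spinlock.h>

#define DEMO_HASH_SIZE 16

struct demo_session {
        struct hlist_node node;
        struct net_device *dev;         /* device the session is bound to */
        bool dead;
};

static struct hlist_head demo_hash[DEMO_HASH_SIZE];
static DEFINE_RWLOCK(demo_hash_lock);

/* Runs from a netdevice notifier, i.e. with RTNL held: every interface
 * that goes down pays for a scan of the whole table, not just of its own
 * sessions.
 */
static void demo_flush_dev(struct net_device *dev)
{
        struct demo_session *s;
        int i;

        write_lock_bh(&demo_hash_lock);
        for (i = 0; i < DEMO_HASH_SIZE; i++) {
                hlist_for_each_entry(s, &demo_hash[i], node) {
                        if (s->dev != dev)
                                continue;
                        /* The real code also unbinds and wakes the socket
                         * and drops its device reference.
                         */
                        s->dead = true;
                        s->dev = NULL;
                }
        }
        write_unlock_bh(&demo_hash_lock);
}

Because the table is filtered by device on every pass, tearing down N interfaces costs roughly N scans over all sessions, serialized behind the same RTNL-held notifier path.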
* Re: Urgent Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected 2021-09-14 9:50 ` Florian Westphal 2021-09-14 10:01 ` Martin Zaharinov @ 2021-09-14 10:53 ` Martin Zaharinov 1 sibling, 0 replies; 23+ messages in thread From: Martin Zaharinov @ 2021-09-14 10:53 UTC (permalink / raw) To: Florian Westphal Cc: Guillaume Nault, Pali Rohár, Greg KH, netdev, Eric Dumazet Hi Florian, one more thing: I tried to remove nf_nat and xt_MASQUERADE, and at the time of the problem the removal takes 50-80 seconds and overloads the system. See perf from that moment: PerfTop: 1738 irqs/sec kernel:85.0% exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 12 CPUs) --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 40.63% [nf_conntrack] [k] nf_ct_iterate_cleanup 21.23% [kernel] [k] __local_bh_enable_ip 10.93% [kernel] [k] __cond_resched 9.20% [kernel] [k] _raw_spin_lock 8.91% [kernel] [k] rcu_all_qs 5.83% [nf_conntrack] [k] nf_conntrack_lock 0.10% [kernel] [k] mutex_spin_on_owner 0.08% telegraf [.] 0x0000000000021bf0 0.06% [kernel] [k] osq_lock 0.06% [kernel] [k] kallsyms_expand_symbol.constprop.0 0.05% [kernel] [k] format_decode 0.04% [kernel] [k] rtnl_fill_ifinfo.constprop.0.isra.0 0.04% perf [.] 0x00000000000bc7b3 0.04% [kernel] [k] memcpy_erms 0.03% [kernel] [k] string 0.03% [kernel] [k] menu_select 0.03% [kernel] [k] nla_put 0.03% [kernel] [k] vsnprintf Martin > On 14 Sep 2021, at 12:50, Florian Westphal <fw@strlen.de> wrote: > > Guillaume Nault <gnault@redhat.com> wrote: >>> And on time of problem when try to write : ip a >>> to list interface wait 15-20 sec i finaly have options to simulate but users is angry when down internet. >> >> Probably some contention on the rtnl lock. > > Yes, I'll create a patch. ^ permalink raw reply [flat|nested] 23+ messages in thread
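The 50-80 second unload fits the same picture: the profile is dominated by nf_ct_iterate_cleanup together with the conntrack lock, i.e. removing the NAT/masquerade modules also ends up walking the entire conntrack table, presumably to clear the per-entry NAT state those modules own. That walk scales with the number of conntrack entries no matter when it runs, so unloading nf_nat/xt_MASQUERADE in the middle of an event does not sidestep the cost; it performs much the same full-table iteration at module-removal time, on a box that is already busy tearing sessions down.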