Netdev Archive on lore.kernel.org
From: Martin Zaharinov <micron10@gmail.com>
To: Guillaume Nault <gnault@redhat.com>
Cc: "Pali Rohár" <pali@kernel.org>,
	"Greg KH" <gregkh@linuxfoundation.org>,
	netdev <netdev@vger.kernel.org>,
	"Eric Dumazet" <eric.dumazet@gmail.com>
Subject: Re: Urgent  Bug report: PPPoE ioctl(PPPIOCCONNECT): Transport endpoint is not connected
Date: Sat, 11 Sep 2021 09:26:45 +0300
Message-ID: <A16DCD3E-43AA-4D50-97FC-EBB776481840@gmail.com>
In-Reply-To: <6A3B4C11-EF48-4CE9-9EC7-5882E330D7EA@gmail.com>

Hi Guillaume

The main problem is that the service gets overloaded when many PPP (customer) sessions are terminating. Over the last two days, anywhere from 40-50 up to 100-200 users have gone down, and when that happens, running "ip a" takes 10-20 seconds before it starts listing interfaces.
How can I find where the problem is, whether it is locking or something else?
Also, is there an option to remove PPP interfaces from the kernel faster, to reduce this load?


Martin
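
One way to narrow down where the time goes is to total the perf top samples per module. The sketch below is only an illustration in Python: it uses a few of the top entries from the perf top listing quoted further down, and the helper name share_by_object is made up for this example.

```python
import re
from collections import defaultdict

# Excerpt of the perf top listing quoted in this thread (top entries only).
PERF_TOP = """\
   17.01%  [nf_conntrack]           [k] nf_ct_iterate_cleanup
    9.73%  [kernel]                 [k] mutex_spin_on_owner
    9.07%  [pppoe]                  [k] pppoe_rcv
    2.77%  [nf_nat]                 [k] device_cmp
    1.66%  [kernel]                 [k] osq_lock
    1.65%  [kernel]                 [k] _raw_spin_lock
    1.35%  [nf_nat]                 [k] inet_cmp
    0.44%  [nf_conntrack]           [k] nf_conntrack_lock
"""

# percentage, object (module or library), [k]/[.] marker, symbol
LINE_RE = re.compile(r"^\s*(\d+\.\d+)%\s+(\S+)\s+\[([k.])\]\s+(\S+)\s*$")


def share_by_object(text):
    """Sum the sample percentages per module or shared object."""
    totals = defaultdict(float)
    for line in text.splitlines():
        # Tolerate mail quote prefixes such as "> " in front of each line.
        match = LINE_RE.match(line.lstrip("> "))
        if match is None:
            continue  # headers, separators, blank lines
        pct, obj, _marker, _symbol = match.groups()
        totals[obj.strip("[]")] += float(pct)
    return dict(totals)


if __name__ == "__main__":
    totals = share_by_object(PERF_TOP)
    for obj, pct in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{pct:6.2f}%  {obj}")
```

On this excerpt nf_conntrack comes out on top, ahead of the core kernel locking symbols and pppoe. That points toward conntrack cleanup rather than the PPP code itself, but a sampled profile alone does not prove which lock is contended.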

> On 7 Sep 2021, at 9:42, Martin Zaharinov <micron10@gmail.com> wrote:
> 
> Perf top output as text:
> 
> 
> PerfTop:   28391 irqs/sec  kernel:98.0%  exact: 100.0% lost: 0/0 drop: 0/0 [4000Hz cycles],  (all, 12 CPUs)
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 
>    17.01%  [nf_conntrack]           [k] nf_ct_iterate_cleanup
>     9.73%  [kernel]                 [k] mutex_spin_on_owner
>     9.07%  [pppoe]                  [k] pppoe_rcv
>     2.77%  [nf_nat]                 [k] device_cmp
>     1.66%  [kernel]                 [k] osq_lock
>     1.65%  [kernel]                 [k] _raw_spin_lock
>     1.61%  [kernel]                 [k] __local_bh_enable_ip
>     1.35%  [nf_nat]                 [k] inet_cmp
>     1.30%  [kernel]                 [k] __netif_receive_skb_core.constprop.0
>     1.16%  [kernel]                 [k] menu_select
>     0.99%  [kernel]                 [k] cpuidle_enter_state
>     0.96%  [ixgbe]                  [k] ixgbe_clean_rx_irq
>     0.86%  [kernel]                 [k] __dev_queue_xmit
>     0.70%  [kernel]                 [k] __cond_resched
>     0.69%  [sch_cake]               [k] cake_dequeue
>     0.67%  [nf_tables]              [k] nft_do_chain
>     0.63%  [kernel]                 [k] rcu_all_qs
>     0.61%  [kernel]                 [k] fib_table_lookup
>     0.57%  [kernel]                 [k] __schedule
>     0.57%  [kernel]                 [k] skb_release_data
>     0.54%  [kernel]                 [k] sched_clock
>     0.54%  [kernel]                 [k] __copy_skb_header
>     0.53%  [kernel]                 [k] dev_queue_xmit_nit
>     0.53%  [kernel]                 [k] _raw_spin_lock_irqsave
>     0.50%  [kernel]                 [k] kmem_cache_free
>     0.48%  libfrr.so.0.0.0          [.] 0x00000000000ce970
>     0.47%  [ixgbe]                  [k] ixgbe_clean_tx_irq
>     0.45%  [kernel]                 [k] timerqueue_add
>     0.45%  [kernel]                 [k] lapic_next_deadline
>     0.45%  [kernel]                 [k] csum_partial_copy_generic
>     0.44%  [nf_flow_table]          [k] nf_flow_offload_ip_hook
>     0.44%  [kernel]                 [k] kmem_cache_alloc
>     0.44%  [nf_conntrack]           [k] nf_conntrack_lock
> 
>> On 7 Sep 2021, at 9:16, Martin Zaharinov <micron10@gmail.com> wrote:
>> 
>> Hi 
>> Sorry for the delay, but it is not easy to catch the moment.
>> 
>> 
>> This is the output of mpstat 1:
>> 
>> Linux 5.14.1 (demobng) 	09/07/21 	_x86_64_	(12 CPU)
>> 
>> 11:12:16     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
>> 11:12:17     all    0.17    0.00    6.66    0.00    0.00    4.13    0.00    0.00    0.00   89.05
>> 11:12:18     all    0.25    0.00    8.36    0.00    0.00    4.88    0.00    0.00    0.00   86.51
>> 11:12:19     all    0.26    0.00    9.62    0.00    0.00    3.91    0.00    0.00    0.00   86.21
>> 11:12:20     all    0.85    0.00    6.00    0.00    0.00    4.31    0.00    0.00    0.00   88.84
>> 11:12:21     all    0.08    0.00    4.45    0.00    0.00    4.79    0.00    0.00    0.00   90.67
>> 11:12:22     all    0.17    0.00    9.50    0.00    0.00    4.58    0.00    0.00    0.00   85.75
>> 11:12:23     all    0.00    0.00    6.92    0.00    0.00    2.48    0.00    0.00    0.00   90.61
>> 11:12:24     all    0.17    0.00    5.45    0.00    0.00    4.27    0.00    0.00    0.00   90.11
>> 11:12:25     all    0.25    0.00    5.38    0.00    0.00    4.79    0.00    0.00    0.00   89.58
>> 11:12:26     all    0.60    0.00    1.45    0.00    0.00    2.65    0.00    0.00    0.00   95.30
>> 11:12:27     all    0.42    0.00    6.91    0.00    0.00    4.47    0.00    0.00    0.00   88.20
>> 11:12:28     all    0.00    0.00    6.75    0.00    0.00    4.18    0.00    0.00    0.00   89.07
>> 11:12:29     all    0.17    0.00    3.52    0.00    0.00    5.11    0.00    0.00    0.00   91.20
>> 11:12:30     all    1.45    0.00   10.14    0.00    0.00    3.49    0.00    0.00    0.00   84.92
>> 11:12:31     all    0.09    0.00    5.11    0.00    0.00    4.77    0.00    0.00    0.00   90.03
>> 11:12:32     all    0.25    0.00    3.11    0.00    0.00    4.46    0.00    0.00    0.00   92.17
>> Average:     all    0.32    0.00    6.21    0.00    0.00    4.21    0.00    0.00    0.00   89.26
>> 
>> 
>> I am also attaching one screenshot from perf top (the screenshot was sent in the previous mail).
>> 
>> And I see in lsmod:
>> 
>> pppoe                  20480  8198
>> pppox                  16384  1 pppoe
>> ppp_generic            45056  16364 pppox,pppoe
>> slhc                   16384  1 ppp_generic
>> 
>> PPPoE sessions are removed too slowly.
>> 
>> And from the log:
>> 
>> [2021-09-07 11:01:11.129] vlan3020: ebdd1c5d8b5900f6: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
>> [2021-09-07 11:01:53.621] vlan643: ebdd1c5d8b59014e: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
>> [2021-09-07 11:02:00.359] vlan1616: ebdd1c5d8b590195: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
>> [2021-09-07 11:02:05.859] vlan3020: ebdd1c5d8b5900d8: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
>> [2021-09-07 11:02:08.258] vlan3005: ebdd1c5d8b590190: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
>> [2021-09-07 11:02:13.820] vlan643: ebdd1c5d8b590152: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
>> [2021-09-07 11:02:15.839] vlan727: ebdd1c5d8b590144: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
>> [2021-09-07 11:02:20.139] vlan1693: ebdd1c5d8b59019f: ioctl(PPPIOCCONNECT): Transport endpoint is not connected
>> 
>>> On 11 Aug 2021, at 19:48, Guillaume Nault <gnault@redhat.com> wrote:
>>> 
>>> On Wed, Aug 11, 2021 at 02:10:32PM +0300, Martin Zaharinov wrote:
>>>> One more thing that I see.
>>>> 
>>>> The problem comes when accel starts finishing sessions.
>>>> The server now has 2k users; restarting 3 OLTs with 400 users on one of the VLANs affects the other VLANs.
>>>> The problem starts when the dead sessions from the VLAN with the 3 OLTs begin to be destroyed, and this affects all other VLANs.
>>>> Maybe the kernel destroys old sessions slowly and drags other users along by locking other sessions.
>>>> Is there a way to speed up the closing of stopped/dead sessions?
>>> 
>>> What are the CPU stats when that happens? Is it user space or kernel
>>> space that keeps it busy?
>>> 
>>> One easy way to check is to run "mpstat 1" for a few seconds when the
>>> problem occurs.
>>> 
>> 
> 
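
To answer the user space versus kernel space question from the mpstat samples quoted above, the per-column means can be computed directly. This is a minimal Python sketch, assuming 24-hour timestamps as in this output; parse_mpstat and mean are hypothetical helper names.

```python
# The "all" rows from the mpstat 1 output quoted in this thread.
MPSTAT = """\
11:12:16     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
11:12:17     all    0.17    0.00    6.66    0.00    0.00    4.13    0.00    0.00    0.00   89.05
11:12:18     all    0.25    0.00    8.36    0.00    0.00    4.88    0.00    0.00    0.00   86.51
11:12:19     all    0.26    0.00    9.62    0.00    0.00    3.91    0.00    0.00    0.00   86.21
11:12:20     all    0.85    0.00    6.00    0.00    0.00    4.31    0.00    0.00    0.00   88.84
11:12:21     all    0.08    0.00    4.45    0.00    0.00    4.79    0.00    0.00    0.00   90.67
11:12:22     all    0.17    0.00    9.50    0.00    0.00    4.58    0.00    0.00    0.00   85.75
11:12:23     all    0.00    0.00    6.92    0.00    0.00    2.48    0.00    0.00    0.00   90.61
11:12:24     all    0.17    0.00    5.45    0.00    0.00    4.27    0.00    0.00    0.00   90.11
11:12:25     all    0.25    0.00    5.38    0.00    0.00    4.79    0.00    0.00    0.00   89.58
11:12:26     all    0.60    0.00    1.45    0.00    0.00    2.65    0.00    0.00    0.00   95.30
11:12:27     all    0.42    0.00    6.91    0.00    0.00    4.47    0.00    0.00    0.00   88.20
11:12:28     all    0.00    0.00    6.75    0.00    0.00    4.18    0.00    0.00    0.00   89.07
11:12:29     all    0.17    0.00    3.52    0.00    0.00    5.11    0.00    0.00    0.00   91.20
11:12:30     all    1.45    0.00   10.14    0.00    0.00    3.49    0.00    0.00    0.00   84.92
11:12:31     all    0.09    0.00    5.11    0.00    0.00    4.77    0.00    0.00    0.00   90.03
11:12:32     all    0.25    0.00    3.11    0.00    0.00    4.46    0.00    0.00    0.00   92.17
Average:     all    0.32    0.00    6.21    0.00    0.00    4.21    0.00    0.00    0.00   89.26
"""


def parse_mpstat(text):
    """Return one dict per 'all' sample row, keyed by column name."""
    columns = None
    samples = []
    for line in text.splitlines():
        fields = line.lstrip("> ").split()  # tolerate mail quote prefixes
        if len(fields) < 3:
            continue
        if fields[1] == "CPU":
            columns = fields[2:]  # header row: %usr, %nice, %sys, ...
        elif columns and fields[1] == "all" and fields[0] != "Average:":
            samples.append(dict(zip(columns, map(float, fields[2:]))))
    return samples


def mean(samples, column):
    """Arithmetic mean of one column across all samples."""
    return sum(s[column] for s in samples) / len(samples)


if __name__ == "__main__":
    samples = parse_mpstat(MPSTAT)
    user = mean(samples, "%usr") + mean(samples, "%nice")
    kernel = sum(mean(samples, c) for c in ("%sys", "%irq", "%soft"))
    print(f"user: {user:.2f}%  kernel: {kernel:.2f}%")
```

For these samples the kernel side (%sys plus %soft) averages roughly 10%, against well under 1% for %usr, so the load is in kernel space.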

