On Fri, Jan 28, 2022 at 12:00:35PM +0800, Yunsheng Lin wrote:
> On 2022/1/26 22:30, Jean-Philippe Brucker wrote:
> > Hi,
> >
> > On Fri, Aug 06, 2021 at 10:46:22AM +0800, Yunsheng Lin wrote:
> >> This patch adds skb's frag page recycling support based on
> >> the frag page support in page pool.
> >>
> >> The performance improves above 10~20% for single thread iperf
> >> TCP flow with IOMMU disabled when iperf server and irq/NAPI
> >> have a different CPU.
> >>
> >> The performance improves about 135% (14Gbit to 33Gbit) for single
> >> thread iperf TCP flow when IOMMU is in strict mode and iperf
> >> server shares the same CPU with irq/NAPI.
> >>
> >> Signed-off-by: Yunsheng Lin
> >
> > This commit is giving me some trouble, but I haven't managed to
> > pinpoint the exact problem.
>
> Hi,
> Thanks for reporting the problem.
>
> We also hit a similar problem during internal CI testing, but were not
> able to trigger it manually, so we have not found the root cause yet.
>
> Is your test case more likely to trigger the problem?

The problem shows up shortly after boot for me, when I'm not doing
anything special, just ssh'ing into the server. I did manage to trigger
it faster with a "netperf -t TCP_MAERTS" job. Maybe I have something
enabled in my config that makes it easier to trigger? I attached the
.config to this reply, but I think it corresponds pretty much to
debian's config.

> > Symptoms are:
> > * A page gets unmapped twice from page_pool_release_page(). The
> >   second time, dma-iommu.c warns about the empty PTE [1]
> > * The rx ring still accesses the page after the first unmap, causing
> >   SMMU translation faults [2]
> > * That leads to APEI errors and a reset of the device, at which time
> >   page_pool_inflight() complains about "Negative(-x) inflight
> >   packet-pages".
> >
> > After some debugging, it looks like the page gets released three
> > times instead of two:
> >
> > (1) first in page_pool_drain_frag():
> >
> >  page_pool_alloc_frag+0x1fc/0x248
> >  hns3_alloc_and_map_buffer+0x30/0x170
> >  hns3_nic_alloc_rx_buffers+0x9c/0x170
> >  hns3_clean_rx_ring+0x854/0x950
> >  hns3_nic_common_poll+0xa0/0x218
> >  __napi_poll+0x38/0x1b0
> >  net_rx_action+0xe8/0x248
> >  __do_softirq+0x120/0x284
> >
> > (2) Then later by page_pool_return_skb_page(), which (I guess)
> > unmaps the page:
> >
> >  page_pool_put_page+0x214/0x308
> >  page_pool_return_skb_page+0x48/0x60
> >  skb_release_data+0x168/0x188
> >  skb_release_all+0x28/0x38
> >  kfree_skb+0x30/0x90
> >  packet_rcv+0x4c/0x410
> >  __netif_receive_skb_list_core+0x1f4/0x218
> >  netif_receive_skb_list_internal+0x18c/0x2a8
> >
> > (3) And finally, soon after, by clean_rx_ring(), which causes a
> > pp_frag_count underflow (seen after removing the optimization in
> > page_pool_atomic_sub_frag_count_return()):
> >
> >  page_pool_put_page+0x2a0/0x308
> >  page_pool_put_full_page
> >  hns3_alloc_skb
> >  hns3_handle_rx_bd
> >  hns3_clean_rx_ring+0x744/0x950
> >  hns3_nic_common_poll+0xa0/0x218
> >  __napi_poll+0x38/0x1b0
> >  net_rx_action+0xe8/0x248
> >
> > So I'm guessing (2) happens too early, while the RX ring is still
> > using the page, but I don't know more. I'd be happy to add more
> > debug and to test
>
> If the reference counting or pp_frag_count of the page is manipulated
> correctly, I think steps 2 & 3 do not have any dependency on each
> other.
>
> > fixes if you have any suggestions.
>
> My initial thinking is to track whether the reference counting or
> pp_frag_count of the page is manipulated correctly.

It looks like pp_frag_count is dropped too many times: after (1),
pp_frag_count only has 1 ref left, so (2) drops it to 0 and (3)
results in an underflow. I turned
page_pool_atomic_sub_frag_count_return() into a plain
"atomic_long_sub_return(nr, &page->pp_frag_count)" to confirm this
(the atomic_long_read() fast path normally hides the underflow). I
wasn't entirely sure if this is expected behavior, though.

Thanks,
Jean

> > Perhaps using the newly added reference counting tracking
> > infrastructure?
>
> Will look into how to use the reference counting tracking
> infrastructure for the above problem.
>
> >
> > Thanks,
> > Jean
> >
> >
> > [1] ------------[ cut here ]------------
> > WARNING: CPU: 71 PID: 0 at drivers/iommu/dma-iommu.c:848 iommu_dma_unmap_page+0xbc/0xd8
> > Modules linked in: fuse overlay ipmi_si hisi_hpre hisi_zip ecdh_generic hisi_trng_v2 ecc ipmi_d>
> > CPU: 71 PID: 0 Comm: swapper/71 Not tainted 5.16.0-g3813c61fbaad #22
> > Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V5.B133.01 03/25/2021
> > pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > pc : iommu_dma_unmap_page+0xbc/0xd8
> > lr : iommu_dma_unmap_page+0x38/0xd8
> > sp : ffff800010abb8d0
> > x29: ffff800010abb8d0 x28: ffff20200ee80000 x27: 0000000000000042
> > x26: ffff20201a7ed800 x25: ffff20200be7a5c0 x24: 0000000000000002
> > x23: 0000000000000020 x22: 0000000000001000 x21: 0000000000000000
> > x20: 0000000000000002 x19: ffff002086b730c8 x18: 0000000000000001
> > x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> > x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000008
> > x11: 000000000000ffff x10: 0000000000000001 x9 : 0000000000000004
> > x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff202006274800
> > x5 : 0000000000000009 x4 : 0000000000000001 x3 : 000000000000001e
> > x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
> > Call trace:
> >  iommu_dma_unmap_page+0xbc/0xd8
> >  dma_unmap_page_attrs+0x30/0x1d0
> >  page_pool_release_page+0x40/0x88
> >  page_pool_return_page+0x18/0x80
> >  page_pool_put_page+0x248/0x288
> >  hns3_clean_rx_ring+0x744/0x950
> >  hns3_nic_common_poll+0xa0/0x218
> >  __napi_poll+0x38/0x1b0
> >  net_rx_action+0xe8/0x248
> >  __do_softirq+0x120/0x284
> >  irq_exit_rcu+0xe0/0x100
> >  el1_interrupt+0x3c/0x88
> >  el1h_64_irq_handler+0x18/0x28
> >  el1h_64_irq+0x78/0x7c
> >  arch_cpu_idle+0x18/0x28
> >  default_idle_call+0x20/0x68
> >  do_idle+0x214/0x260
> >  cpu_startup_entry+0x28/0x70
> >  secondary_start_kernel+0x160/0x170
> >  __secondary_switched+0x90/0x94
> > ---[ end trace 432d1737b4b96ed9 ]---
> >
> > (please ignore the kernel version, I can reproduce this with v5.14
> > and v5.17-rc1, and bisected to this commit.)
> >
> > [2] arm-smmu-v3 arm-smmu-v3.6.auto: event 0x10 received:
> > arm-smmu-v3 arm-smmu-v3.6.auto: 0x0000bd0000000010
> > arm-smmu-v3 arm-smmu-v3.6.auto: 0x000012000000007c
> > arm-smmu-v3 arm-smmu-v3.6.auto: 0x00000000ff905800
> > arm-smmu-v3 arm-smmu-v3.6.auto: 0x00000000ff905000
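
For reference, the debug change described above would look roughly like
this (a minimal sketch, assuming the v5.16-era helper in
include/net/page_pool.h; the WARN_ON() below is the one already present
in that function, not something new):

	/* Debug sketch, not the upstream code: drop the
	 * atomic_long_read() fast path that returns early when
	 * pp_frag_count already equals nr, so that every release really
	 * performs the subtraction, and an over-released page shows up
	 * as a negative count in the WARN_ON() instead of being
	 * silently absorbed by the early return.
	 */
	static inline long
	page_pool_atomic_sub_frag_count_return(struct page *page, long nr)
	{
		long ret = atomic_long_sub_return(nr, &page->pp_frag_count);

		WARN_ON(ret < 0);	/* pp_frag_count underflow */
		return ret;
	}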