LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: "Leizhen (ThunderTown)" <thunder.leizhen@huawei.com>
To: Saravana Kannan <saravanak@google.com>,
	Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Rob Herring <robh+dt@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Frank Rowand <frowand.list@gmail.com>,
	<devicetree@vger.kernel.org>,
	Russell King <linux@armlinux.org.uk>,
	"Linus Walleij" <linus.walleij@linaro.org>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	Dmitry Torokhov <dmitry.torokhov@gmail.com>,
	<linux-input@vger.kernel.org>
Subject: Re: [BUG] amba: Remove deferred device addition
Date: Sat, 27 Aug 2022 18:26:32 +0800	[thread overview]
Message-ID: <051f1eb5-67f1-b3f9-cc4e-c5902068532f@huawei.com> (raw)
In-Reply-To: <CAGETcx8RLor0JcboBuMrB96xUot14P1CAcqoen7ZHnYRi7KMEQ@mail.gmail.com>



On 2022/7/6 3:25, Saravana Kannan wrote:
> On Fri, Sep 10, 2021 at 12:59 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>
>>
>> On 2021/9/9 11:30, Saravana Kannan wrote:
>>> On Fri, Aug 27, 2021 at 6:09 PM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>>>
>>>> On 2021/8/28 3:09, Saravana Kannan wrote:
>>>>> On Fri, Aug 27, 2021 at 7:38 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>>>>> On 2021/8/27 8:04, Saravana Kannan wrote:
>>>>>>> On Thu, Aug 26, 2021 at 1:22 AM Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>>>>>>>>>>> Btw, I've been working on [1] cleaning up the one-off deferred probe
>>>>>>>>>>> solution that we have for amba devices. That causes a bunch of other
>>>>>>>>>>> headaches. Your patch 3/3 takes us further in the wrong direction by
>>>>>>>>>>> adding more reasons for delaying the addition of the device.
>>>>>>>> Hi Saravana, I try the link[1], but with it, there is a crash when boot
>>>>>>>> (qemu-system-arm -M vexpress-a15),
>>>>> I'm assuming it's this one?
>>>>> arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts
>>>> I use arch/arm/boot/dts/vexpress-v2p-ca15-tc1.dts.
>>>>
>>>> qemu-system-arm -M vexpress-a15 -dtb vexpress-v2p-ca15-tc1.dtb -cpu
>>>> cortex-a15 -smp 2 -m size=3G -kernel zImage -rtc base=localtime -initrd
>>>> initrd-arm32 -append 'console=ttyAMA0 cma=0 kfence.sample_interval=0
>>>> earlyprintk debug ' -device virtio-net-device,netdev=net8 -netdev
>>>> type=tap,id=net8,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown
>>>> -nographic
>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> It's hard to make sense of the logs. Looks like two different threads
>>>>>>> might be printing to the log at the same time? Can you please enable
>>>>>>> the config that prints the thread ID (forgot what it's called) and
>>>>>>> collect this again? With what I could tell the crash seems to be
>>>>>>> happening somewhere in platform_match(), but that's not related to
>>>>>>> this patch at all?
>>>>>> Can you reproduce it? it is very likely related(without your patch, the
>>>>>> boot is fine),
>>>>> Sorry, I haven't ever setup qemu and booted vexpress. Thanks for your help.
>>>>>
>>>>>> the NULL ptr is about serio, it is registed from amba driver.
>>>>>>
>>>>>> ambakmi_driver_init
>>>>>>
>>>>>>     -- amba_kmi_probe
>>>>>>
>>>>>>       -- __serio_register_port
>>>>> Thanks for the pointer. I took a look at the logs and the code. It's
>>>>> very strange. As you can see from the backtrace, platform_match() is
>>>>> being called for the device_add() from serio_handle_event(). But the
>>>>> device that gets added there is on the serio_bus which obviously
>>>>> should be using the serio_bus_match.
>>>> Yes, I am confused too.
>>>>>> +Dmitry and input maillist, is there some known issue about serio ?
>>>>>>
>>>>>> I add some debug, the full log is attached.
>>>>>>
>>>>>> [    2.958355][   T41] input: AT Raw Set 2 keyboard as
>>>>>> /devices/platform/bus@8000000/bus@8000000:motherboard-bus/bus@8000000:motherboard-bus:iofpga-bus@300000000/1c060000.kmi/serio0/input/input0
>>>>>> [    2.977441][   T41] serio serio1: pdev c1e05508, pdev->name (null),
>>>>>> drv c1090fc0, drv->name vexpress-reset
>>>>> Based on the logs you added, it's pretty clear we are getting to
>>>>> platform_match(). It's also strange that the drv->name is
>>>>> vexpress-reset
>>>> ...
>>>>>> [    3.003113][   T41] Backtrace:
>>>>>> [    3.003451][   T41] [<c0560bb4>] (strcmp) from [<c0646358>] (platform_match+0xdc/0xf0)
>>>>>> [    3.003963][   T41] [<c064627c>] (platform_match) from [<c06437d4>] (__device_attach_driver+0x3c/0xf4)
>>>>>> [    3.004769][   T41] [<c0643798>] (__device_attach_driver) from [<c0641180>] (bus_for_each_drv+0x68/0xc8)
>>>>>> [    3.005481][   T41] [<c0641118>] (bus_for_each_drv) from [<c0642f40>] (__device_attach+0xf0/0x16c)
>>>>>> [    3.006152][   T41] [<c0642e50>] (__device_attach) from [<c06439d4>] (device_initial_probe+0x1c/0x20)
>>>>>> [    3.006853][   T41] [<c06439b8>] (device_initial_probe) from [<c0642030>] (bus_probe_device+0x94/0x9c)
>>>>>> [    3.007259][   T41] [<c0641f9c>] (bus_probe_device) from [<c063f9cc>] (device_add+0x408/0x8b8)
>>>>>> [    3.007900][   T41] [<c063f5c4>] (device_add) from [<c071c1cc>] (serio_handle_event+0x1b8/0x234)
>>>>>> [    3.008824][   T41] [<c071c014>] (serio_handle_event) from [<c01475a4>] (process_one_work+0x238/0x594)
>>>>>> [    3.009737][   T41] [<c014736c>] (process_one_work) from [<c014795c>] (worker_thread+0x5c/0x5f4)
>>>>>> [    3.010638][   T41] [<c0147900>] (worker_thread) from [<c014feb4>] (kthread+0x178/0x194)
>>>>>> [    3.011496][   T41] [<c014fd3c>] (kthread) from [<c0100150>] (ret_from_fork+0x14/0x24)
>>>>>> [    3.011860][   T41] Exception stack(0xc1675fb0 to 0xc1675ff8)
>>>>> But the platform_match() is happening for the device_add() from
>>>>> serio_event_handle() that's adding a device to the serio_bus and it
>>>>> should be using serio_bus_match().
>>>>>
>>>>> I haven't reached any conclusion yet, but my current thought process
>>>>> is that it's either:
>>>>> 1. My patch is somehow causing list corruption. But I don't directly
>>>>> touch any list in my change (other than deleting a list entirely), so
>>>>> it's not clear how that would be happening.
>>>> Maybe some concurrent driver load?
>>>>
>>>>> 2. Without my patch, these AMBA device's probe would be delayed at
>>>>> least until 5 seconds or possibly later. I'm wondering if my patch is
>>>>> catching some bad timing assumptions in other code.
>>>> After Rob's patch, It will retry soon.
>>>>
>>>> commit 039599c92d3b2e73689e8b6e519d653fd4770abb
>>>> Author: Rob Herring <robh@kernel.org>
>>>> Date:   Wed Apr 29 15:58:12 2020 -0500
>>>>
>>>>       amba: Retry adding deferred devices at late_initcall
>>>>
>>>>       If amba bus devices defer when adding, the amba bus code simply retries
>>>>       adding the devices every 5 seconds. This doesn't work well as it
>>>>       completely unsynchronized with starting the init process which can
>>>>       happen in less than 5 secs. Add a retry during late_initcall. If the
>>>>       amba devices are added, then deferred probe takes over. If the
>>>>       dependencies have not probed at this point, then there's no improvement
>>>>       over previous behavior. To completely solve this, we'd need to retry
>>>>       after every successful probe as deferred probe does.
>>>>
>>>>       The list_empty() check now happens outside the mutex, but the mutex
>>>>       wasn't necessary in the first place.
>>>>
>>>>       This needed to use deferred probe instead of fragile initcall ordering
>>>>       on 32-bit VExpress systems where the apb_pclk has a number of probe
>>>>       dependencies (vexpress-sysregs, vexpress-config).
>>>>
>>>>
>>>>> You might be able to test out theory (2) by DEFERRED_DEVICE_TIMEOUT to
>>>>> a much smaller number. Say 500ms or 100ms. If it doesn't crash, it
>>>>> doesn't mean it's not (2), but if it does, then we know for sure it's
>>>>> (2).
>>>> ok, I will try this one, but due to above patch, it may not work.
>>> Were you able to find anything more?
>> I can't find any clue, and have no time to check this for now, is there
>> any news from your side?

Hi, Saravana and Kefeng:
  I've spent the whole afternoon trying to figure this out, and the fix
patch has been cc you two.

> 
> To close out this thread, the issue was due to a UAF bug in driver
> core that was fixed by:
> https://lore.kernel.org/all/20220513112444.45112-1-schspa@gmail.com/
> 
> With that fix, there wouldn't have been a crash, but amba driver
> registration would have failed (because match returned
> non-EPROBE_DEFER error).
> 
> -Saravana
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 

-- 
Regards,
  Zhen Lei

  reply	other threads:[~2022-08-27 10:27 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-16  7:46 [PATCH 0/3] amba: Properly handle device probe without IRQ domain Kefeng Wang
2021-08-16  7:46 ` [PATCH 1/3] amba: Drop unused functions about APB/AHB devices add Kefeng Wang
2021-08-16  7:46 ` [PATCH 2/3] Revert "ARM: amba: make use of -1 IRQs warn" Kefeng Wang
2021-08-16  7:46 ` [PATCH 3/3] amba: Properly handle device probe without IRQ domain Kefeng Wang
2021-08-24 20:05   ` Rob Herring
2021-08-24 20:08     ` Saravana Kannan
2021-08-25  4:05       ` Kefeng Wang
2021-08-25  8:04         ` Saravana Kannan
2021-08-25  8:39           ` Kefeng Wang
2021-08-26  2:45           ` Kefeng Wang
2021-08-26  4:45             ` Saravana Kannan
2021-08-26  6:22               ` Kefeng Wang
2021-08-26  8:22               ` [BUG] amba: Remove deferred device addition Kefeng Wang
2021-08-27  0:04                 ` Saravana Kannan
2021-08-27 14:38                   ` Kefeng Wang
2021-08-27 19:09                     ` Saravana Kannan
2021-08-28  1:09                       ` Kefeng Wang
2021-09-09  3:30                         ` Saravana Kannan
2021-09-10  7:59                           ` Kefeng Wang
2022-07-05 19:25                             ` Saravana Kannan
2022-08-27 10:26                               ` Leizhen (ThunderTown) [this message]
2021-08-25 12:33         ` [PATCH 3/3] amba: Properly handle device probe without IRQ domain Rob Herring
2021-08-25 14:41           ` Kefeng Wang
2021-08-17 22:27 ` [PATCH 0/3] " Rob Herring
2021-08-23  2:19   ` Kefeng Wang
2021-08-23  9:05     ` Russell King (Oracle)
2021-08-23 10:57       ` Kefeng Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=051f1eb5-67f1-b3f9-cc4e-c5902068532f@huawei.com \
    --to=thunder.leizhen@huawei.com \
    --cc=devicetree@vger.kernel.org \
    --cc=dmitry.torokhov@gmail.com \
    --cc=frowand.list@gmail.com \
    --cc=linus.walleij@linaro.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-input@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=robh+dt@kernel.org \
    --cc=saravanak@google.com \
    --cc=wangkefeng.wang@huawei.com \
    --subject='Re: [BUG] amba: Remove deferred device addition' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).