LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* Linux 5.15-rc1
@ 2021-09-12 23:58 Linus Torvalds
  2021-09-13  3:57 ` Guenter Roeck
  2021-09-13 14:18 ` Dave Jones
  0 siblings, 2 replies; 38+ messages in thread
From: Linus Torvalds @ 2021-09-12 23:58 UTC (permalink / raw)
  To: Linux Kernel Mailing List

So 5.15 isn't shaping up to be a particularly large release, at least
in number of commits. At only just over 10k non-merge commits, this is
in fact the smallest rc1 we have had in the 5.x series. We're usually
hovering in the 12-14k commit range.

That said, counting commits isn't necessarily the best measure, and
that might be particularly true this time around. We have a few new
subsystems, with NTFSv3 and ksmbd standing out. And as a result, when
you look at the stats on a "lines changed" basis, 5.15-rc1 ends up
looking much more middle-of-the-road. It still doesn't look like a
particularly _big_ merge window, but also not remotely the smallest
one.

And while this is not up there with some larger releases, it's
actually been one of the messier merge windows. Part of it was
self-inflicted damage from me trying to enable -Werror much more
aggressively, but I also ended up having to push back a lot more on
some of the patch series and had a number o full requests where I went
"ok, I've pulled this, but XYZ is wrong".

So we've had merge windows that went much more smoothly. In fact, I
have a pull request or two that I just didn't feel like going through
fully, and I might still pull the upcoming week, but I got a bit fed
up with how I ended up seeing new pull requests - and not for fixes -
coming in fairly late in the merge window. Yes, the merge window is
two weeks, but part of that is very literally to give _me_ time to
actually look things through, not for people to send me new requests
up until the very end of the merge window.

Anyway, I'm hoping that things calm down, and I'll take a look at a
few things still in my inbox, but on the whole you should expect that
"that's it" and send me fixes only.

And in order to get those fixes going, please go out and test this.

Appended, as always, is my "mergelog" - since even at "only" 10k+
commits, the shortlog is not really realistically readable or useful
as a summary. And as always, the mergelog credits the person I pulled
from, which is not the same as the actual author of all the changes.
There's just over a hundred people listed below that I've pulled from,
but over 1500 people with authorship credit in the git tree. So that's
where you'd need to dig for all the details.

Thanks,
            Linus

---

Al Viro (4):
    iov_iter fixes
    root filesystem type handling updates
    gfs2 setattr updates
    namei updates

Alex Williamson (1):
    VFIO updates

Alexandre Belloni (1):
    RTC updates

Andreas Gruenbacher (1):
    gfs2 updates

Andrew Morton (3):
    misc updates
    more updates
    yet more updates and hotfixes

Anna Schumaker (1):
    NFS client updates

Arnaldo Carvalho de Melo (2):
    perf tool updates
    more perf tools updates

Arnd Bergmann (5):
    asm-generic updates
    ARM SoC updates
    ARM SoC driver updates
    ARM defconfig updates
    ARM SoC DT updates

Bartosz Golaszewski (1):
    gpio updates

Benson Leung (1):
    chrome platform updates

Bjorn Andersson (1):
    remoteproc updates

Bjorn Helgaas (1):
    PCI updates

Borislav Petkov (8):
    EDAC updates
    RAS update
    x86 build updates
    x86 resource control updates
    x86 cleanups
    timer fix
    locking fixes
    scheduler fixes

Casey Schaufler (1):
    smack updates

Catalin Marinas (2):
    arm64 updates
    arm64 fixes

Christian Brauner (4):
    move_mount updates
    close_range() cleanup
    idmapping documentation updates
    set_user()  update

Christoph Hellwig (2):
    configfs updates
    dma-mapping updates

Chuck Lever (2):
    nfsd updates
    nfsd fixes

Corey Minyard (1):
    IPMI updates

Dan Williams (2):
    libnvdimm updates
    CXL (Compute Express Link) updates

Daniel Lezcano (1):
    thermal updates

Daniel Thompson (1):
    kgdb updates

Darrick Wong (3):
    project quota update
    iomap updates
    xfs updates

Dave Airlie (2):
    drm updates
    drm fixes

David Hildenbrand (1):
    MAP_DENYWRITE removal

David Howells (1):
    fscache updates

David Sterba (2):
    btrfs updates
    btrfs fixes

David Teigland (1):
    dlm updates

Dmitry Torokhov (1):
    input updates

Dominique Martinet (1):
    9p updates

Eric Biederman (2):
    siginfo si_trapno updates
    exit cleanups

Eric Biggers (1):
    fscrypt updates

Gao Xiang (1):
    erofs updates

Geert Uytterhoeven (1):
    m68k updates

Greg KH (8):
    char / misc driver updates
    driver core updates
    IIO and staging driver updates
    tty / serial updates
    USB / Thunderbolt updates
    more USB updates
    habanalabs updates
    misc driver fix

Greg Ungerer (1):
    m68knommu updates

Guenter Roeck (1):
    hwmon updates

Hans de Goede (1):
    x86 platform driver updates

Heiko Carstens (2):
    s390 updates
    more s390 updates

Helge Deller (3):
    parisc architecture updates
    parisc architecture fixes
    parisc fixes

Herbert Xu (1):
    crypto updates

Ilya Dryomov (1):
    ceph updates

Ingo Molnar (4):
    scheduler updates
    x86 perf event updates
    EFI updates
    memory model updates

Jaegeuk Kim (1):
    f2fs updates

Jakub Kicinski (2):
    networking updates
    networking fixes and stragglers

James Bottomley (1):
    SCSI updates

Jan Kara (4):
    fsnotify updates
    FIEMAP cleanups
    UDF and isofs updates
    fs hole punching vs cache filling race fixes

Jarkko Sakkinen (1):
    tpm driver updates

Jason Gunthorpe (2):
    rdma updates
    rdma fixes

Jassi Brar (1):
    mailbox updates

Jean Delvare (1):
    dmi fix

Jeff Layton (1):
    file locking updates

Jens Axboe (13):
    block updates
    block driver updates
    libata updates
    io_uring updates
    support for struct bio recycling
    io_uring mkdirat/symlinkat/linkat support
    io_uring fixes
    libata fixes
    CDROM maintainer update
    block fixes
    libata maintainer update
    block fixes
    io_uring fixes

Jessica Yu (1):
    module updates

Jiri Kosina (1):
    HID updates

Joerg Roedel (2):
    iommu updates
    iommu fixes

Jon Mason (1):
    NTB updates

Jonathan Corbet (2):
    documentation updates
    more documentation updates

Juergen Gross (1):
    xen updates

Julia Lawall (1):
    coccinelle updates

Kees Cook (1):
    hardening updates

Konrad Rzeszutek Wilk (3):
    ibft updates
    swiotlb updates
    ibft fix

Konstantin Komarov (1):
    NTFSv3 filesystem

Lee Jones (2):
    MFD updates
    backlight updates

Linus Walleij (1):
    pin control updates

Mark Brown (3):
    regmap updates
    regulator updates
    spi updates

Masahiro Yamada (1):
    Kbuild updates

Mauro Carvalho Chehab (1):
    media updates

Max Filippov (1):
    Xtensa updates

Michael Ellerman (1):
    powerpc updates

Michael Tsirkin (1):
    virtio updates

Michal Simek (1):
    microblaze update

Miguel Ojeda (2):
    auxdisplay updates
    compiler attributes updates

Mike Rapoport (1):
    memblock updates

Mike Snitzer (1):
    device mapper updates

Miklos Szeredi (2):
    overlayfs update
    fuse updates

Mimi Zohar (1):
    integrity subsystem updates

Miquel Raynal (1):
    MTD updates

Palmer Dabbelt (2):
    RISC-V updates
    more RISC-V updates

Paolo Bonzini (1):
    KVM updates

Paul McKenney (1):
    RCU updates

Paul Moore (2):
    selinux update
    audit updates

Pavel Machek (1):
    LED updates

Petr Mladek (2):
    printk updates
    livepatching update

Rafael Wysocki (7):
    power management updates
    ACPI updates
    device properties framework updates
    more ACPI updates
    more power management updates
    more power management updates
    more ACPI updates

Richard Weinberger (1):
    UML updates

Rob Herring (2):
    devicetree updates
    devicetree fixes

Russell King (1):
    ARM development updates

Sebastian Reichel (1):
    power supply and reset updates

Shuah Khan (2):
    KUnit updates
    Kselftest updates

Stafford Horne (1):
    OpenRISC updates

Stefan Richter (1):
    firewire updates

Stephen Boyd (2):
    clk updates
    clk fix

Steve French (4):
    initial ksmbd implementation
    cifs client updates
    ksmbd fixes
    smbfs updates

Steven Rostedt (3):
    tracing updates
    more tracing updates
    tracing fixes

Takashi Iwai (2):
    sound updates
    sound fixes

Ted Ts'o (1):
    ext4 updates

Tejun Heo (2):
    cgroup updates
    workqueue updates

Thierry Reding (1):
    pwm updates

Thomas Bogendoerfer (1):
    MIPS updates

Thomas Gleixner (9):
    debugobjects update
    SMP core updates
    locking and atomics updates
    irq updates
    x86 cache flush updates
    x86 PIRQ updates
    misc x86 updates
    timer updates
    CPU hotplug updates

Ulf Hansson (1):
    MMC and MEMSTICK updates

Vineet Gupta (1):
    ARC updates

Vinod Koul (1):
    dmaengine updates

Vlastimil Babka (1):
    SLUB updates

Wei Liu (1):
    hyperv updates

Wim Van Sebroeck (1):
    watchdog updates

Wolfram Sang (1):
    i2c updates

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-12 23:58 Linux 5.15-rc1 Linus Torvalds
@ 2021-09-13  3:57 ` Guenter Roeck
  2021-09-28 23:18   ` Michael S. Tsirkin
  2021-09-13 14:18 ` Dave Jones
  1 sibling, 1 reply; 38+ messages in thread
From: Guenter Roeck @ 2021-09-13  3:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

On Sun, Sep 12, 2021 at 04:58:27PM -0700, Linus Torvalds wrote:
[ ... ]
> 
> And in order to get those fixes going, please go out and test this.
> 

Build results:
	total: 153 pass: 143 fail: 10
Failed builds:
	alpha:allmodconfig (15)
	arm:allmodconfig (1)
	m68k:allmodconfig (47)
	mips:allmodconfig (2)
	parisc:allmodconfig (8)
	riscv32:allmodconfig (1)
	riscv:allmodconfig (1)
	s390:allmodconfig (6)
	sparc64:allmodconfig (1)
	xtensa:allmodconfig (53)
Qemu test results:
	total: 479 pass: 478 fail: 1
Failed tests:
	sparc64:sun4u:nodebug:smp:virtio-pci:net,i82559er:hd

The allmodconfig build failures are mostly if not all the result of
enabling -Werror on test builds. The individual errors are too many
to list; the number of errors is listed in () after the build above.
Anyone interested can find build logs at https://kerneltests.org/builders.
I am not entirely sure what to do about that - if history teaches a
lesson, new build failures may pile up faster than existing ones get
fixed, but then I am aware that several fixes are queued already. We'll
see how it goes. I'll monitor the situation until maybe -rc4 or -rc5 and
then decide if I disable -Werror in my test builds to catch any new "real"
problems.

The qemu runtime failure bisects to commit 694a1116b405 ("virtio: Bind
virtio device to device-tree node"), and reverting that commit fixes the
problem.  With that patch applied, the virtio block device does not
instantiate on sparc64. This results in a crash since that is where the
test is trying to boot from.

Good news is that I don't see any new runtime warnings.

Guenter

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-12 23:58 Linux 5.15-rc1 Linus Torvalds
  2021-09-13  3:57 ` Guenter Roeck
@ 2021-09-13 14:18 ` Dave Jones
  2021-09-13 18:59   ` Heiner Kallweit
  2021-09-13 23:46   ` Bjorn Helgaas
  1 sibling, 2 replies; 38+ messages in thread
From: Dave Jones @ 2021-09-13 14:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List, Heiner Kallweit

On Sun, Sep 12, 2021 at 04:58:27PM -0700, Linus Torvalds wrote:
 > So 5.15 isn't shaping up to be a particularly large release, at least
 > in number of commits. At only just over 10k non-merge commits, this is
 > in fact the smallest rc1 we have had in the 5.x series. We're usually
 > hovering in the 12-14k commit range.

This release takes over two minutes longer to boot on one my machines than 5.14.
The time just seems to be unaccounted for, even with initcall_debug

[    2.190051] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe])
[    2.190057] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
[    2.190239] acpi PNP0A08:00: _OSC: platform does not support [PME]
[    2.190360] acpi PNP0A08:00: _OSC: OS now controls [AER PCIeCapability LTR]
[    2.190362] acpi PNP0A08:00: FADT indicates ASPM is unsupported, using BIOS configuration
[    2.190783] PCI host bridge to bus 0000:00
[    2.190785] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
[    2.190787] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
[    2.190789] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[    2.190790] pci_bus 0000:00: root bus resource [mem 0x000d0000-0x000d3fff window]
[    2.190792] pci_bus 0000:00: root bus resource [mem 0x000d4000-0x000d7fff window]
[    2.190794] pci_bus 0000:00: root bus resource [mem 0x000d8000-0x000dbfff window]
[    2.190795] pci_bus 0000:00: root bus resource [mem 0x000dc000-0x000dffff window]
[    2.190797] pci_bus 0000:00: root bus resource [mem 0xbf200000-0xfeafffff window]
[    2.190799] pci_bus 0000:00: root bus resource [bus 00-fe]
[    2.190828] pci 0000:00:00.0: calling  quirk_mmio_always_on+0x0/0x10 @ 1
[    2.190833] pci 0000:00:00.0: quirk_mmio_always_on+0x0/0x10 took 0 usecs
[    2.190836] pci 0000:00:00.0: [8086:0c00] type 00 class 0x060000
[    2.190851] pci 0000:00:00.0: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
[    2.190855] pci 0000:00:00.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
[    2.190950] pci 0000:00:01.0: calling  quirk_no_aersid+0x0/0x10 @ 1
[    2.190953] pci 0000:00:01.0: quirk_no_aersid+0x0/0x10 took 0 usecs
[    2.190955] pci 0000:00:01.0: [8086:0c01] type 01 class 0x060400
[    2.190972] pci 0000:00:01.0: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
[    2.190975] pci 0000:00:01.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
[    2.190978] pci 0000:00:01.0: calling  pci_fixup_transparent_bridge+0x0/0x20 @ 1
[    2.190981] pci 0000:00:01.0: pci_fixup_transparent_bridge+0x0/0x20 took 0 usecs
[    2.190993] pci 0000:00:01.0: PME# supported from D0 D3hot D3cold
[    2.191233] pci 0000:00:02.0: [8086:0412] type 00 class 0x030000
[    2.191240] pci 0000:00:02.0: reg 0x10: [mem 0xdf000000-0xdf3fffff 64bit]
[    2.191244] pci 0000:00:02.0: reg 0x18: [mem 0xc0000000-0xcfffffff 64bit pref]
[    2.191248] pci 0000:00:02.0: reg 0x20: [io  0xf000-0xf03f]
[    2.191256] pci 0000:00:02.0: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
[    2.191260] pci 0000:00:02.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
[    2.191339] pci 0000:00:03.0: [8086:0c0c] type 00 class 0x040300
[    2.191345] pci 0000:00:03.0: reg 0x10: [mem 0xdfe34000-0xdfe37fff 64bit]
[    2.191356] pci 0000:00:03.0: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
[    2.191359] pci 0000:00:03.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
[    2.191450] pci 0000:00:14.0: [8086:8cb1] type 00 class 0x0c0330
[    2.191463] pci 0000:00:14.0: reg 0x10: [mem 0xdfe20000-0xdfe2ffff 64bit]
[    2.191492] pci 0000:00:14.0: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
[    2.191495] pci 0000:00:14.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
[    2.191514] pci 0000:00:14.0: PME# supported from D3hot D3cold
[    2.191592] pci 0000:00:16.0: [8086:8cba] type 00 class 0x078000
[    2.191606] pci 0000:00:16.0: reg 0x10: [mem 0xdfe3e000-0xdfe3e00f 64bit]
[    2.191637] pci 0000:00:16.0: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
[    2.191640] pci 0000:00:16.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
[    2.191660] pci 0000:00:16.0: PME# supported from D0 D3hot D3cold
[    2.191790] pci 0000:00:19.0: calling  quirk_f0_vpd_link+0x0/0x60 @ 1
[    2.191794] pci 0000:00:19.0: quirk_f0_vpd_link+0x0/0x60 took 0 usecs
[    2.191797] pci 0000:00:19.0: [8086:15a1] type 00 class 0x020000
[    2.191807] pci 0000:00:19.0: reg 0x10: [mem 0xdfe00000-0xdfe1ffff]
[    2.191814] pci 0000:00:19.0: reg 0x14: [mem 0xdfe3c000-0xdfe3cfff]
[    2.191820] pci 0000:00:19.0: reg 0x18: [io  0xf080-0xf09f]
[    2.191844] pci 0000:00:19.0: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
[    2.191847] pci 0000:00:19.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
[    2.191869] pci 0000:00:19.0: PME# supported from D0 D3hot D3cold
[    2.191953] pci 0000:00:1a.0: [8086:8cad] type 00 class 0x0c0320
[    2.191966] pci 0000:00:1a.0: reg 0x10: [mem 0xdfe3b000-0xdfe3b3ff]
[    2.192007] pci 0000:00:1a.0: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
[    2.192010] pci 0000:00:1a.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
[    2.192038] pci 0000:00:1a.0: PME# supported from D0 D3hot D3cold
[    2.192121] pci 0000:00:1b.0: [8086:8ca0] type 00 class 0x040300
[    2.192132] pci 0000:00:1b.0: reg 0x10: [mem 0xdfe30000-0xdfe33fff 64bit]
[    2.192158] pci 0000:00:1b.0: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
[    2.192161] pci 0000:00:1b.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
[    2.192187] pci 0000:00:1b.0: PME# supported from D0 D3hot D3cold
[    2.192313] pci 0000:00:1c.0: calling  quirk_no_aersid+0x0/0x10 @ 1
[    2.192316] pci 0000:00:1c.0: quirk_no_aersid+0x0/0x10 took 0 usecs
[    2.192318] pci 0000:00:1c.0: [8086:8c90] type 01 class 0x060400
[    2.192353] pci 0000:00:1c.0: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
[    2.192356] pci 0000:00:1c.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
[    2.192359] pci 0000:00:1c.0: calling  pci_fixup_transparent_bridge+0x0/0x20 @ 1
[    2.192362] pci 0000:00:1c.0: pci_fixup_transparent_bridge+0x0/0x20 took 0 usecs
[    2.192387] pci 0000:00:1c.0: PME# supported from D0 D3hot D3cold
[    2.192615] pci 0000:00:1c.3: calling  quirk_no_aersid+0x0/0x10 @ 1
[    2.192618] pci 0000:00:1c.3: quirk_no_aersid+0x0/0x10 took 0 usecs
[    2.192620] pci 0000:00:1c.3: [8086:8c96] type 01 class 0x060400
[    2.192656] pci 0000:00:1c.3: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
[    2.192659] pci 0000:00:1c.3: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
[    2.192662] pci 0000:00:1c.3: calling  pci_fixup_transparent_bridge+0x0/0x20 @ 1
[    2.192664] pci 0000:00:1c.3: pci_fixup_transparent_bridge+0x0/0x20 took 0 usecs
[    2.192689] pci 0000:00:1c.3: PME# supported from D0 D3hot D3cold
[    2.192914] pci 0000:00:1c.4: calling  quirk_no_aersid+0x0/0x10 @ 1
[    2.192917] pci 0000:00:1c.4: quirk_no_aersid+0x0/0x10 took 0 usecs
[    2.192919] pci 0000:00:1c.4: [8086:8c98] type 01 class 0x060400
[    2.192954] pci 0000:00:1c.4: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
[    2.192957] pci 0000:00:1c.4: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
[    2.192960] pci 0000:00:1c.4: calling  pci_fixup_transparent_bridge+0x0/0x20 @ 1
[    2.192962] pci 0000:00:1c.4: pci_fixup_transparent_bridge+0x0/0x20 took 0 usecs
[    2.192988] pci 0000:00:1c.4: PME# supported from D0 D3hot D3cold
[    2.193213] pci 0000:00:1c.6: calling  quirk_no_aersid+0x0/0x10 @ 1
[    2.193215] pci 0000:00:1c.6: quirk_no_aersid+0x0/0x10 took 0 usecs
[    2.193217] pci 0000:00:1c.6: [8086:8c9c] type 01 class 0x060400
[    2.193253] pci 0000:00:1c.6: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
[    2.193256] pci 0000:00:1c.6: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
[    2.193259] pci 0000:00:1c.6: calling  pci_fixup_transparent_bridge+0x0/0x20 @ 1
[    2.193261] pci 0000:00:1c.6: pci_fixup_transparent_bridge+0x0/0x20 took 0 usecs
[    2.193286] pci 0000:00:1c.6: PME# supported from D0 D3hot D3cold
[    2.193515] pci 0000:00:1d.0: [8086:8ca6] type 00 class 0x0c0320
[    2.193528] pci 0000:00:1d.0: reg 0x10: [mem 0xdfe3a000-0xdfe3a3ff]
[    2.193570] pci 0000:00:1d.0: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
[    2.193573] pci 0000:00:1d.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
[    2.193601] pci 0000:00:1d.0: PME# supported from D0 D3hot D3cold
[    2.193688] pci 0000:00:1f.0: [8086:8cc4] type 00 class 0x060100
[    2.193754] pci 0000:00:1f.0: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
[    2.193757] pci 0000:00:1f.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
[    2.193855] pci 0000:00:1f.2: [8086:8c82] type 00 class 0x010601
[    2.193864] pci 0000:00:1f.2: reg 0x10: [io  0xf0d0-0xf0d7]
[    2.193870] pci 0000:00:1f.2: reg 0x14: [io  0xf0c0-0xf0c3]
[    2.193875] pci 0000:00:1f.2: reg 0x18: [io  0xf0b0-0xf0b7]
[    2.193881] pci 0000:00:1f.2: reg 0x1c: [io  0xf0a0-0xf0a3]
[    2.193886] pci 0000:00:1f.2: reg 0x20: [io  0xf060-0xf07f]
[    2.193892] pci 0000:00:1f.2: reg 0x24: [mem 0xdfe39000-0xdfe397ff]
[    2.193901] pci 0000:00:1f.2: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
[    2.193904] pci 0000:00:1f.2: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
[    2.193923] pci 0000:00:1f.2: PME# supported from D3hot
[    2.193997] pci 0000:00:1f.3: [8086:8ca2] type 00 class 0x0c0500
[    2.194010] pci 0000:00:1f.3: reg 0x10: [mem 0xdfe38000-0xdfe380ff 64bit]
[    2.194025] pci 0000:00:1f.3: reg 0x20: [io  0xf040-0xf05f]
[    2.194039] pci 0000:00:1f.3: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
[    2.194042] pci 0000:00:1f.3: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
[    2.194093] pci 0000:01:00.0: calling  quirk_f0_vpd_link+0x0/0x60 @ 1
[    2.194097] pci 0000:01:00.0: quirk_f0_vpd_link+0x0/0x60 took 0 usecs
[    2.194100] pci 0000:01:00.0: [8086:10fb] type 00 class 0x020000
[    2.194109] pci 0000:01:00.0: reg 0x10: [mem 0xd0080000-0xd00fffff 64bit pref]
[    2.194113] pci 0000:01:00.0: reg 0x18: [io  0xe020-0xe03f]
[    2.194121] pci 0000:01:00.0: reg 0x20: [mem 0xd0104000-0xd0107fff 64bit pref]
[    2.194126] pci 0000:01:00.0: reg 0x30: [mem 0xdfd80000-0xdfdfffff pref]
[    2.194136] pci 0000:01:00.0: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
[    2.194139] pci 0000:01:00.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
[    2.194164] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold

* stall here for 86 seconds *

[   88.675114] pci 0000:01:00.0: reg 0x184: [mem 0x00000000-0x00003fff 64bit pref]
[   88.675118] pci 0000:01:00.0: VF(n) BAR0 space: [mem 0x00000000-0x000fffff 64bit pref] (contains BAR0 for 64 VFs)
[   88.675125] pci 0000:01:00.0: reg 0x190: [mem 0x00000000-0x00003fff 64bit pref]
[   88.675127] pci 0000:01:00.0: VF(n) BAR3 space: [mem 0x00000000-0x000fffff 64bit pref] (contains BAR3 for 64 VFs)
[   88.675252] pci 0000:01:00.1: calling  quirk_f0_vpd_link+0x0/0x60 @ 1
[   88.675255] pci 0000:01:00.1: quirk_f0_vpd_link+0x0/0x60 took 0 usecs
[   88.675258] pci 0000:01:00.1: [8086:10fb] type 00 class 0x020000
[   88.675267] pci 0000:01:00.1: reg 0x10: [mem 0xd0000000-0xd007ffff 64bit pref]
[   88.675272] pci 0000:01:00.1: reg 0x18: [io  0xe000-0xe01f]
[   88.675280] pci 0000:01:00.1: reg 0x20: [mem 0xd0100000-0xd0103fff 64bit pref]
[   88.675284] pci 0000:01:00.1: reg 0x30: [mem 0xdfd00000-0xdfd7ffff pref]
[   88.675295] pci 0000:01:00.1: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
[   88.675298] pci 0000:01:00.1: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
[   88.675324] pci 0000:01:00.1: PME# supported from D0 D3hot D3cold

* stall again for 98 seconds *

[  186.595114] pci 0000:01:00.1: reg 0x184: [mem 0x00000000-0x00003fff 64bit pref]
[  186.595117] pci 0000:01:00.1: VF(n) BAR0 space: [mem 0x00000000-0x000fffff 64bit pref] (contains BAR0 for 64 VFs)
[  186.595124] pci 0000:01:00.1: reg 0x190: [mem 0x00000000-0x00003fff 64bit pref]
[  186.595126] pci 0000:01:00.1: VF(n) BAR3 space: [mem 0x00000000-0x000fffff 64bit pref] (contains BAR3 for 64 VFs)
[  186.595242] pci 0000:00:01.0: PCI bridge to [bus 01]
[  186.595245] pci 0000:00:01.0:   bridge window [io  0xe000-0xefff]
[  186.595247] pci 0000:00:01.0:   bridge window [mem 0xdfd00000-0xdfdfffff]
[  186.595249] pci 0000:00:01.0:   bridge window [mem 0xd0000000-0xd01fffff 64bit pref]
[  186.595251] pci 0000:00:01.0: bridge has subordinate 01 but max busn 02
[  186.595296] pci 0000:02:00.0: [144d:a800] type 00 class 0x010601
[  186.595351] pci 0000:02:00.0: reg 0x24: [mem 0xdfc10000-0xdfc11fff]
[  186.595361] pci 0000:02:00.0: reg 0x30: [mem 0xdfc00000-0xdfc0ffff pref]
[  186.595425] pci 0000:02:00.0: PME# supported from D3hot D3cold
[  186.735107] pci 0000:02:00.0: VPD access failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update
[  186.735140] pci 0000:02:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 5.0 GT/s PCIe x2 link at 0000:00:1c.0 (capable of 16.000 Gb/s with 5.0 GT/s PCIe x4 link)
[  186.735255] pci 0000:00:1c.0: PCI bridge to [bus 02]
[  186.735260] pci 0000:00:1c.0:   bridge window [mem 0xdfc00000-0xdfcfffff]
[  186.735310] pci 0000:03:00.0: [1b21:1187] type 01 class 0x060400
[  186.735360] pci 0000:03:00.0: enabling Extended Tags
[  186.735422] pci 0000:03:00.0: PME# supported from D0 D3hot D3cold
[  186.735567] pci 0000:00:1c.3: PCI bridge to [bus 03-0b]
[  186.735570] pci 0000:00:1c.3:   bridge window [io  0xb000-0xcfff]
[  186.735573] pci 0000:00:1c.3:   bridge window [mem 0xdf400000-0xdf9fffff]
[  186.735649] pci 0000:04:01.0: [1b21:1187] type 01 class 0x060400
[  186.735702] pci 0000:04:01.0: enabling Extended Tags
[  186.735762] pci 0000:04:01.0: PME# supported from D0 D3hot D3cold
[  186.735848] pci 0000:04:02.0: [1b21:1187] type 01 class 0x060400
[  186.735901] pci 0000:04:02.0: enabling Extended Tags
[  186.735961] pci 0000:04:02.0: PME# supported from D0 D3hot D3cold
[  186.736048] pci 0000:04:03.0: [1b21:1187] type 01 class 0x060400
[  186.736100] pci 0000:04:03.0: enabling Extended Tags
[  186.736160] pci 0000:04:03.0: PME# supported from D0 D3hot D3cold
[  186.736243] pci 0000:04:04.0: [1b21:1187] type 01 class 0x060400
[  186.736296] pci 0000:04:04.0: enabling Extended Tags
[  186.736355] pci 0000:04:04.0: PME# supported from D0 D3hot D3cold
[  186.736439] pci 0000:04:05.0: [1b21:1187] type 01 class 0x060400
[  186.736492] pci 0000:04:05.0: enabling Extended Tags
[  186.736551] pci 0000:04:05.0: PME# supported from D0 D3hot D3cold
[  186.736635] pci 0000:04:06.0: [1b21:1187] type 01 class 0x060400
[  186.736688] pci 0000:04:06.0: enabling Extended Tags
[  186.736747] pci 0000:04:06.0: PME# supported from D0 D3hot D3cold
[  186.736830] pci 0000:04:07.0: [1b21:1187] type 01 class 0x060400
[  186.736883] pci 0000:04:07.0: enabling Extended Tags
[  186.736943] pci 0000:04:07.0: PME# supported from D0 D3hot D3cold
[  186.737024] pci 0000:03:00.0: PCI bridge to [bus 04-0b]
[  186.737030] pci 0000:03:00.0:   bridge window [io  0xb000-0xcfff]
[  186.737034] pci 0000:03:00.0:   bridge window [mem 0xdf400000-0xdf9fffff]
[  186.737071] pci 0000:04:01.0: PCI bridge to [bus 05]
[  186.737153] pci 0000:06:00.0: [14e4:43b1] type 00 class 0x028000
[  186.737183] pci 0000:06:00.0: reg 0x10: [mem 0xdf600000-0xdf607fff 64bit]
[  186.737202] pci 0000:06:00.0: reg 0x18: [mem 0xdf400000-0xdf5fffff 64bit]
[  186.737372] pci 0000:06:00.0: supports D1 D2
[  186.737374] pci 0000:06:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[  186.737516] pci 0000:04:02.0: PCI bridge to [bus 06]
[  186.737524] pci 0000:04:02.0:   bridge window [mem 0xdf400000-0xdf6fffff]
[  186.737586] pci 0000:07:00.0: calling  fixup_mpss_256+0x0/0x20 @ 1
[  186.737590] pci 0000:07:00.0: fixup_mpss_256+0x0/0x20 took 0 usecs
[  186.737593] pci 0000:07:00.0: [1b21:0612] type 00 class 0x010601
[  186.737617] pci 0000:07:00.0: reg 0x10: [io  0xc050-0xc057]
[  186.737630] pci 0000:07:00.0: reg 0x14: [io  0xc040-0xc043]
[  186.737643] pci 0000:07:00.0: reg 0x18: [io  0xc030-0xc037]
[  186.737657] pci 0000:07:00.0: reg 0x1c: [io  0xc020-0xc023]
[  186.737670] pci 0000:07:00.0: reg 0x20: [io  0xc000-0xc01f]
[  186.737683] pci 0000:07:00.0: reg 0x24: [mem 0xdf900000-0xdf9001ff]
[  186.737851] pci 0000:04:03.0: PCI bridge to [bus 07]
[  186.737857] pci 0000:04:03.0:   bridge window [io  0xc000-0xcfff]
[  186.737860] pci 0000:04:03.0:   bridge window [mem 0xdf900000-0xdf9fffff]
[  186.737899] pci 0000:04:04.0: PCI bridge to [bus 08]
[  186.737943] pci 0000:04:05.0: PCI bridge to [bus 09]
[  186.738025] pci 0000:0a:00.0: calling  quirk_f0_vpd_link+0x0/0x60 @ 1
[  186.738028] pci 0000:0a:00.0: quirk_f0_vpd_link+0x0/0x60 took 0 usecs
[  186.738031] pci 0000:0a:00.0: [8086:1539] type 00 class 0x020000
[  186.738058] pci 0000:0a:00.0: reg 0x10: [mem 0xdf800000-0xdf81ffff]
[  186.738082] pci 0000:0a:00.0: reg 0x18: [io  0xb000-0xb01f]
[  186.738095] pci 0000:0a:00.0: reg 0x1c: [mem 0xdf820000-0xdf823fff]
[  186.738159] pci 0000:0a:00.0: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
[  186.738162] pci 0000:0a:00.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
[  186.738265] pci 0000:0a:00.0: PME# supported from D0 D3hot D3cold
[  186.738397] pci 0000:04:06.0: PCI bridge to [bus 0a]
[  186.738403] pci 0000:04:06.0:   bridge window [io  0xb000-0xbfff]
[  186.738407] pci 0000:04:06.0:   bridge window [mem 0xdf800000-0xdf8fffff]
[  186.738448] pci 0000:04:07.0: PCI bridge to [bus 0b]
[  186.738540] pci 0000:0c:00.0: calling  fixup_mpss_256+0x0/0x20 @ 1
[  186.738543] pci 0000:0c:00.0: fixup_mpss_256+0x0/0x20 took 0 usecs
[  186.738546] pci 0000:0c:00.0: [1b21:0612] type 00 class 0x010601
[  186.738563] pci 0000:0c:00.0: reg 0x10: [io  0xd050-0xd057]
[  186.738572] pci 0000:0c:00.0: reg 0x14: [io  0xd040-0xd043]
[  186.738582] pci 0000:0c:00.0: reg 0x18: [io  0xd030-0xd037]
[  186.738591] pci 0000:0c:00.0: reg 0x1c: [io  0xd020-0xd023]
[  186.738601] pci 0000:0c:00.0: reg 0x20: [io  0xd000-0xd01f]
[  186.738610] pci 0000:0c:00.0: reg 0x24: [mem 0xdfb00000-0xdfb001ff]
[  186.738799] pci 0000:00:1c.4: PCI bridge to [bus 0c]
[  186.738802] pci 0000:00:1c.4:   bridge window [io  0xd000-0xdfff]
[  186.738805] pci 0000:00:1c.4:   bridge window [mem 0xdfb00000-0xdfbfffff]
[  186.738855] pci 0000:0d:00.0: [1b21:1142] type 00 class 0x0c0330
[  186.738881] pci 0000:0d:00.0: reg 0x10: [mem 0xdfa00000-0xdfa07fff 64bit]
[  186.739001] pci 0000:0d:00.0: PME# supported from D3cold
[  186.739136] pci 0000:00:1c.6: PCI bridge to [bus 0d]
[  186.739140] pci 0000:00:1c.6:   bridge window [mem 0xdfa00000-0xdfafffff]
[  186.739160] pci_bus 0000:00: on NUMA node 0


* boot seems to continue as normal from here *


I did a bisect, which landed in the VPD rework that Heiner landed in
this merge window.   Specifically it ended up on ...


git bisect start
# good: [7d2a07b769330c34b4deabeed939325c77a7ec2f] Linux 5.14
git bisect good 7d2a07b769330c34b4deabeed939325c77a7ec2f
# bad: [6880fa6c56601bb8ed59df6c30fd390cc5f6dd8f] Linux 5.15-rc1
git bisect bad 6880fa6c56601bb8ed59df6c30fd390cc5f6dd8f
# good: [1b4f3dfb4792f03b139edf10124fcbeb44e608e6] Merge tag 'usb-serial-5.15-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-next                                              
git bisect good 1b4f3dfb4792f03b139edf10124fcbeb44e608e6
# good: [c793011242d182e5f12800c12dbaf37af80be735] Merge tag 'pinctrl-v5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl                                                              
git bisect good c793011242d182e5f12800c12dbaf37af80be735
# good: [5e6a5845dd651b00754a62edec2f0a439182024d] Merge tag 'gpio-updates-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux                                                                 
git bisect good 5e6a5845dd651b00754a62edec2f0a439182024d
# bad: [0f4b9289bad354b606190a4cd54d5222b4e41d98] Merge tag 'docs-5.15-2' of git://git.lwn.net/linux
git bisect bad 0f4b9289bad354b606190a4cd54d5222b4e41d98
# good: [626bf91a292e2035af5b9d9cce35c5c138dfe06d] Merge tag 'net-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net                                                                           
git bisect good 626bf91a292e2035af5b9d9cce35c5c138dfe06d
# bad: [49832c819ab85b33b7a2a1429c8d067e82be2977] Makefile: use -Wno-main in the full kernel tree
git bisect bad 49832c819ab85b33b7a2a1429c8d067e82be2977
# bad: [739c4747a25af3a2571f69e4709f2b476a17d5d4] Merge branch 'pci/misc'
git bisect bad 739c4747a25af3a2571f69e4709f2b476a17d5d4
# bad: [2c208abd4f9efac02622d8f3c9989f4b7b1ad973] PCI/VPD: Use unaligned access helpers
git bisect bad 2c208abd4f9efac02622d8f3c9989f4b7b1ad973
# bad: [f240e15097c5004811a58f2cbc170bf90d06d0a9] tg3: Read VPD with pci_vpd_alloc()
git bisect bad f240e15097c5004811a58f2cbc170bf90d06d0a9
# good: [d27f7344ba89897d0ce6ebe0c9eecbe214f9bb8f] PCI/VPD: Reorder pci_read_vpd(), pci_write_vpd()
git bisect good d27f7344ba89897d0ce6ebe0c9eecbe214f9bb8f
# bad: [fe7568cf2f2dc3a0783f6ffdb3802c1ce2085466] PCI/VPD: Treat invalid VPD like missing VPD capability
git bisect bad fe7568cf2f2dc3a0783f6ffdb3802c1ce2085466
# good: [22ff2bcec704a7a8c43a998251e0757cd2de66e1] PCI/VPD: Remove struct pci_vpd.valid member
git bisect good 22ff2bcec704a7a8c43a998251e0757cd2de66e1
# bad: [7bac54497c3e3b2ca37b7043f1fa78586540f10e] PCI/VPD: Determine VPD size in pci_vpd_init()
git bisect bad 7bac54497c3e3b2ca37b7043f1fa78586540f10e
# good: [fd00faa375fbb9d46ae0730d0faf4a3006301005] PCI/VPD: Embed struct pci_vpd in struct pci_dev
git bisect good fd00faa375fbb9d46ae0730d0faf4a3006301005
# first bad commit: [7bac54497c3e3b2ca37b7043f1fa78586540f10e] PCI/VPD: Determine VPD size in pci_vpd_init()   

7bac54497c3e3b2ca37b7043f1fa78586540f10e is the first bad commit
commit 7bac54497c3e3b2ca37b7043f1fa78586540f10e
Author: Heiner Kallweit <hkallweit1@gmail.com>
Date:   Sun Aug 8 19:22:52 2021 +0200

    PCI/VPD: Determine VPD size in pci_vpd_init()

    Determine VPD size in pci_vpd_init().

    Quirks set dev->vpd.len to a non-zero value, so they cause us to skip the
    dynamic size calculation.  Prerequisite is that we move the quirks from
    FINAL to HEADER so they are run before pci_vpd_init().

    Link: https://lore.kernel.org/r/cc4a6538-557a-294d-4f94-e6d1d3c91589@gmail.com
    Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>


Which unfortunately doesn't revert cleanly I can't test it reverted in
isolation.

My guess is there's something quirky about the PCI bus on this machine
that causes stalls until we hit timeout, but I'm not sure where to begin
debugging this.

There's nothing too exotic on this machine:

00:00.0 Host bridge: Intel Corporation 4th Gen Core Processor DRAM Controller (rev 06)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller (rev 06)
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
00:03.0 Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor HD Audio Controller (rev 06)
00:14.0 USB controller: Intel Corporation 9 Series Chipset Family USB xHCI Controller
00:16.0 Communication controller: Intel Corporation 9 Series Chipset Family ME Interface #1
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V
00:1a.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #2
00:1b.0 Audio device: Intel Corporation 9 Series Chipset Family HD Audio Controller
00:1c.0 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 1 (rev d0)
00:1c.3 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 4 (rev d0)
00:1c.4 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 5 (rev d0)
00:1c.6 PCI bridge: Intel Corporation 9 Series Chipset Family PCI Express Root Port 7 (rev d0)
00:1d.0 USB controller: Intel Corporation 9 Series Chipset Family USB EHCI Controller #1
00:1f.0 ISA bridge: Intel Corporation Z97 Chipset LPC Controller
00:1f.2 SATA controller: Intel Corporation 9 Series Chipset Family SATA Controller [AHCI Mode]
00:1f.3 SMBus: Intel Corporation 9 Series Chipset Family SMBus Controller
01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
02:00.0 SATA controller: Samsung Electronics Co Ltd XP941 PCIe SSD (rev 01)
03:00.0 PCI bridge: ASMedia Technology Inc. Device 1187
04:01.0 PCI bridge: ASMedia Technology Inc. Device 1187
04:02.0 PCI bridge: ASMedia Technology Inc. Device 1187
04:03.0 PCI bridge: ASMedia Technology Inc. Device 1187
04:04.0 PCI bridge: ASMedia Technology Inc. Device 1187
04:05.0 PCI bridge: ASMedia Technology Inc. Device 1187
04:06.0 PCI bridge: ASMedia Technology Inc. Device 1187
04:07.0 PCI bridge: ASMedia Technology Inc. Device 1187
06:00.0 Network controller: Broadcom Inc. and subsidiaries BCM4352 802.11ac Wireless Network Adapter (rev 03)
07:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02)
0a:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
0c:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 02)
0d:00.0 USB controller: ASMedia Technology Inc. ASM1042A USB 3.0 Host Controller


	Dave


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 14:18 ` Dave Jones
@ 2021-09-13 18:59   ` Heiner Kallweit
  2021-09-13 19:51     ` Linus Torvalds
  2021-09-13 20:15     ` Dave Jones
  2021-09-13 23:46   ` Bjorn Helgaas
  1 sibling, 2 replies; 38+ messages in thread
From: Heiner Kallweit @ 2021-09-13 18:59 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Linux Kernel Mailing List, Bjorn Helgaas
  Cc: linux-pci

On 13.09.2021 16:18, Dave Jones wrote:
> [  186.595296] pci 0000:02:00.0: [144d:a800] type 00 class 0x010601
> [  186.595351] pci 0000:02:00.0: reg 0x24: [mem 0xdfc10000-0xdfc11fff]
> [  186.595361] pci 0000:02:00.0: reg 0x30: [mem 0xdfc00000-0xdfc0ffff pref]
> [  186.595425] pci 0000:02:00.0: PME# supported from D3hot D3cold
> [  186.735107] pci 0000:02:00.0: VPD access failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update

Thanks for the report! The stalls may be related to this one. Device is:
02:00.0 SATA controller: Samsung Electronics Co Ltd XP941 PCIe SSD (rev 01)

With an older kernel you may experience the stall when accessing the vpd
attribute of this device in sysfs.

Maybe the device indicates VPD capability but doesn't actually support it.
Could you please provide the "lspci -vv" output for this device?

And could you please test with the following applied to verify the
assumption? It disables VPD access for this device.

---
 drivers/pci/vpd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
index 517789205..fc92e880e 100644
--- a/drivers/pci/vpd.c
+++ b/drivers/pci/vpd.c
@@ -540,6 +540,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SAMSUNG, 0xa800, quirk_blacklist_vpd);
 /*
  * The Amazon Annapurna Labs 0x0031 device id is reused for other non Root Port
  * device types, so the quirk is registered for the PCI_CLASS_BRIDGE_PCI class.
-- 
2.33.0







^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 18:59   ` Heiner Kallweit
@ 2021-09-13 19:51     ` Linus Torvalds
  2021-09-13 20:09       ` Bjorn Helgaas
  2021-09-13 20:11       ` Heiner Kallweit
  2021-09-13 20:15     ` Dave Jones
  1 sibling, 2 replies; 38+ messages in thread
From: Linus Torvalds @ 2021-09-13 19:51 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Dave Jones, Linux Kernel Mailing List, Bjorn Helgaas, linux-pci

On Mon, Sep 13, 2021 at 12:00 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> With an older kernel you may experience the stall when accessing the vpd
> attribute of this device in sysfs.

Honestly, that old behavior seems to be the *much* better behavior.

A synchronous stall at boot time is truly annoying, and a pain to deal
with (and debug).

That pci_vpd_read() function is clearly NOT designed to deal with
boot-time callers in the first place, so I think that commit is simply
wrong.

And yes, I see that "128ms timeout". If it was _one_ timeout, that
would be one thing,. But it looks like it's repeated over and over.

Not acceptable at boot time. Not at all.

Bjorn. Please revert. Or I can do it.

            Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 19:51     ` Linus Torvalds
@ 2021-09-13 20:09       ` Bjorn Helgaas
  2021-09-13 20:11       ` Heiner Kallweit
  1 sibling, 0 replies; 38+ messages in thread
From: Bjorn Helgaas @ 2021-09-13 20:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Heiner Kallweit, Dave Jones, Linux Kernel Mailing List,
	Bjorn Helgaas, linux-pci

On Mon, Sep 13, 2021 at 12:51:30PM -0700, Linus Torvalds wrote:
> On Mon, Sep 13, 2021 at 12:00 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
> >
> > With an older kernel you may experience the stall when accessing the vpd
> > attribute of this device in sysfs.
> 
> Honestly, that old behavior seems to be the *much* better behavior.
> 
> A synchronous stall at boot time is truly annoying, and a pain to deal
> with (and debug).
> 
> That pci_vpd_read() function is clearly NOT designed to deal with
> boot-time callers in the first place, so I think that commit is simply
> wrong.
> 
> And yes, I see that "128ms timeout". If it was _one_ timeout, that
> would be one thing,. But it looks like it's repeated over and over.
> 
> Not acceptable at boot time. Not at all.
> 
> Bjorn. Please revert. Or I can do it.

Sorry about this.

Dave said it wasn't a trivial revert, but I'll be happy to work with
Heiner and Dave to revert and test it.

I agree that we shouldn't read VPD at boot-time unless we actually
need the data then.

Bjorn

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 19:51     ` Linus Torvalds
  2021-09-13 20:09       ` Bjorn Helgaas
@ 2021-09-13 20:11       ` Heiner Kallweit
  2021-09-13 20:15         ` Linus Torvalds
  1 sibling, 1 reply; 38+ messages in thread
From: Heiner Kallweit @ 2021-09-13 20:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Linux Kernel Mailing List, Bjorn Helgaas, linux-pci

On 13.09.2021 21:51, Linus Torvalds wrote:
> On Mon, Sep 13, 2021 at 12:00 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>> With an older kernel you may experience the stall when accessing the vpd
>> attribute of this device in sysfs.
> 
> Honestly, that old behavior seems to be the *much* better behavior.
> 
> A synchronous stall at boot time is truly annoying, and a pain to deal
> with (and debug).
> 
> That pci_vpd_read() function is clearly NOT designed to deal with
> boot-time callers in the first place, so I think that commit is simply
> wrong.
> 
> And yes, I see that "128ms timeout". If it was _one_ timeout, that
> would be one thing,. But it looks like it's repeated over and over.
> 
No. The timeout is not the issue, otherwise you would see the message
"VPD access failed.."  over and over again. The issue here seems to be
that this call in PCI config space access to adress
vpd->cap + PCI_VPD_ADDR stalls.

In a first place this seems to be due to a buggy device. We'll know
for sure once Dave checks with the test patch applied. To deal with
such buggy devices we have the VPD blacklist quirk.

Secondly you could blame the PCI subsystem for not detecting stalled
access to a buggy device. However I don't know the PCIe spec good
enough to really comment on this.

> Not acceptable at boot time. Not at all.
> 
> Bjorn. Please revert. Or I can do it.
> 
>             Linus
> 
Heiner

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 18:59   ` Heiner Kallweit
  2021-09-13 19:51     ` Linus Torvalds
@ 2021-09-13 20:15     ` Dave Jones
  2021-09-13 20:22       ` Heiner Kallweit
  2021-09-13 20:32       ` Linux 5.15-rc1 Heiner Kallweit
  1 sibling, 2 replies; 38+ messages in thread
From: Dave Jones @ 2021-09-13 20:15 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Linus Torvalds, Linux Kernel Mailing List, Bjorn Helgaas, linux-pci

On Mon, Sep 13, 2021 at 08:59:49PM +0200, Heiner Kallweit wrote:
 > On 13.09.2021 16:18, Dave Jones wrote:
 > > [  186.595296] pci 0000:02:00.0: [144d:a800] type 00 class 0x010601
 > > [  186.595351] pci 0000:02:00.0: reg 0x24: [mem 0xdfc10000-0xdfc11fff]
 > > [  186.595361] pci 0000:02:00.0: reg 0x30: [mem 0xdfc00000-0xdfc0ffff pref]
 > > [  186.595425] pci 0000:02:00.0: PME# supported from D3hot D3cold
 > > [  186.735107] pci 0000:02:00.0: VPD access failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update
 > 
 > Thanks for the report! The stalls may be related to this one. Device is:
 > 02:00.0 SATA controller: Samsung Electronics Co Ltd XP941 PCIe SSD (rev 01)
 > 
 > With an older kernel you may experience the stall when accessing the vpd
 > attribute of this device in sysfs.
 > 
 > Maybe the device indicates VPD capability but doesn't actually support it.
 > Could you please provide the "lspci -vv" output for this device?

02:00.0 SATA controller: Samsung Electronics Co Ltd XP941 PCIe SSD (rev 01) (prog-if 01 [AHCI 1.0])
        Subsystem: Samsung Electronics Co Ltd XP941 PCIe SSD
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 16
        Region 5: Memory at dfc10000 (32-bit, non-prefetchable) [size=8K]
        Expansion ROM at dfc00000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s (ok), Width x2 (downgraded)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
                         10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS- TPHComp- ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
                LnkCap2: Supported Link Speeds: 2.5-5GT/s, Crosslink- Retimer- 2Retimers- DRS-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
                         EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
                         Retimer- 2Retimers- CrosslinkRes: unsupported
        Capabilities: [d0] Vital Product Data
                Not readable
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [140 v1] Device Serial Number 00-00-00-00-00-00-00-00
        Capabilities: [150 v1] Power Budgeting <?>
        Capabilities: [160 v1] Latency Tolerance Reporting
                Max snoop latency: 71680ns
                Max no snoop latency: 71680ns
        Kernel driver in use: ahci


 > And could you please test with the following applied to verify the
 > assumption? It disables VPD access for this device.
 > 
 > ---
 >  drivers/pci/vpd.c | 1 +
 >  1 file changed, 1 insertion(+)
 > 
 > diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
 > index 517789205..fc92e880e 100644
 > --- a/drivers/pci/vpd.c
 > +++ b/drivers/pci/vpd.c
 > @@ -540,6 +540,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd);
 >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd);
 >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd);
 >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
 > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SAMSUNG, 0xa800, quirk_blacklist_vpd);
 >  /*
 >   * The Amazon Annapurna Labs 0x0031 device id is reused for other non Root Port
 >   * device types, so the quirk is registered for the PCI_CLASS_BRIDGE_PCI class.


This didn't help I'm afraid :(
It changed the VPD warning, but that's about it...

[  184.235496] pci 0000:02:00.0: calling  quirk_blacklist_vpd+0x0/0x22 @ 1
[  184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format)                                                                                           
[  184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs


	Dave


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 20:11       ` Heiner Kallweit
@ 2021-09-13 20:15         ` Linus Torvalds
  0 siblings, 0 replies; 38+ messages in thread
From: Linus Torvalds @ 2021-09-13 20:15 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Dave Jones, Linux Kernel Mailing List, Bjorn Helgaas, linux-pci

On Mon, Sep 13, 2021 at 1:11 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> No. The timeout is not the issue, otherwise you would see the message
> "VPD access failed.."  over and over again.

Ahh, I did check that that error did exist, but you're right, it's not
there all the time.

> The issue here seems to be
> that this call in PCI config space access to adress
> vpd->cap + PCI_VPD_ADDR stalls.

Ok. Regardless, we shouldn't do this since the boot code shouldn't
need any of the VPD information.

        Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 20:15     ` Dave Jones
@ 2021-09-13 20:22       ` Heiner Kallweit
  2021-09-13 20:32         ` Dave Jones
  2021-09-13 20:32       ` Linux 5.15-rc1 Heiner Kallweit
  1 sibling, 1 reply; 38+ messages in thread
From: Heiner Kallweit @ 2021-09-13 20:22 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Linux Kernel Mailing List,
	Bjorn Helgaas, linux-pci

On 13.09.2021 22:15, Dave Jones wrote:
> On Mon, Sep 13, 2021 at 08:59:49PM +0200, Heiner Kallweit wrote:
>  > On 13.09.2021 16:18, Dave Jones wrote:
>  > > [  186.595296] pci 0000:02:00.0: [144d:a800] type 00 class 0x010601
>  > > [  186.595351] pci 0000:02:00.0: reg 0x24: [mem 0xdfc10000-0xdfc11fff]
>  > > [  186.595361] pci 0000:02:00.0: reg 0x30: [mem 0xdfc00000-0xdfc0ffff pref]
>  > > [  186.595425] pci 0000:02:00.0: PME# supported from D3hot D3cold
>  > > [  186.735107] pci 0000:02:00.0: VPD access failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update
>  > 
>  > Thanks for the report! The stalls may be related to this one. Device is:
>  > 02:00.0 SATA controller: Samsung Electronics Co Ltd XP941 PCIe SSD (rev 01)
>  > 
>  > With an older kernel you may experience the stall when accessing the vpd
>  > attribute of this device in sysfs.
>  > 
>  > Maybe the device indicates VPD capability but doesn't actually support it.
>  > Could you please provide the "lspci -vv" output for this device?
> 
> 02:00.0 SATA controller: Samsung Electronics Co Ltd XP941 PCIe SSD (rev 01) (prog-if 01 [AHCI 1.0])
>         Subsystem: Samsung Electronics Co Ltd XP941 PCIe SSD
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 16
>         Region 5: Memory at dfc10000 (32-bit, non-prefetchable) [size=8K]
>         Expansion ROM at dfc00000 [disabled] [size=64K]
>         Capabilities: [40] Power Management version 3
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [70] Express (v2) Endpoint, MSI 00
>                 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
>                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
>                 DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
>                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
>                         MaxPayload 128 bytes, MaxReadReq 512 bytes
>                 DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>                 LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us
>                         ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>                 LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 5GT/s (ok), Width x2 (downgraded)
>                         TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>                 DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
>                          10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
>                          EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
>                          FRS- TPHComp- ExtTPHComp-
>                          AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,
>                          AtomicOpsCtl: ReqEn-
>                 LnkCap2: Supported Link Speeds: 2.5-5GT/s, Crosslink- Retimer- 2Retimers- DRS-
>                 LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
>                          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>                          Compliance De-emphasis: -6dB
>                 LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
>                          EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
>                          Retimer- 2Retimers- CrosslinkRes: unsupported
>         Capabilities: [d0] Vital Product Data
>                 Not readable
>         Capabilities: [100 v2] Advanced Error Reporting
>                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
>                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>                 AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
>                         MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>                 HeaderLog: 00000000 00000000 00000000 00000000
>         Capabilities: [140 v1] Device Serial Number 00-00-00-00-00-00-00-00
>         Capabilities: [150 v1] Power Budgeting <?>
>         Capabilities: [160 v1] Latency Tolerance Reporting
>                 Max snoop latency: 71680ns
>                 Max no snoop latency: 71680ns
>         Kernel driver in use: ahci
> 
> 
>  > And could you please test with the following applied to verify the
>  > assumption? It disables VPD access for this device.
>  > 
>  > ---
>  >  drivers/pci/vpd.c | 1 +
>  >  1 file changed, 1 insertion(+)
>  > 
>  > diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
>  > index 517789205..fc92e880e 100644
>  > --- a/drivers/pci/vpd.c
>  > +++ b/drivers/pci/vpd.c
>  > @@ -540,6 +540,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd);
>  >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd);
>  >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd);
>  >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
>  > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SAMSUNG, 0xa800, quirk_blacklist_vpd);
>  >  /*
>  >   * The Amazon Annapurna Labs 0x0031 device id is reused for other non Root Port
>  >   * device types, so the quirk is registered for the PCI_CLASS_BRIDGE_PCI class.
> 
> 
> This didn't help I'm afraid :(
> It changed the VPD warning, but that's about it...
> 
> [  184.235496] pci 0000:02:00.0: calling  quirk_blacklist_vpd+0x0/0x22 @ 1
> [  184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format)                                                                                           
> [  184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs
> 
With this patch there's no VPD access to this device any longer. So this can't be
the root cause. Do you have any other PCI device that has VPD capability?
-> Capabilities: [...] Vital Product Data

> 
> 	Dave
> 
Heiner

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 20:22       ` Heiner Kallweit
@ 2021-09-13 20:32         ` Dave Jones
  2021-09-13 20:44           ` Linux 5.15-rc1 - 82599ES VPD access isue Heiner Kallweit
  0 siblings, 1 reply; 38+ messages in thread
From: Dave Jones @ 2021-09-13 20:32 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Linus Torvalds, Linux Kernel Mailing List, Bjorn Helgaas, linux-pci

On Mon, Sep 13, 2021 at 10:22:57PM +0200, Heiner Kallweit wrote:

 > > This didn't help I'm afraid :(
 > > It changed the VPD warning, but that's about it...
 > > 
 > > [  184.235496] pci 0000:02:00.0: calling  quirk_blacklist_vpd+0x0/0x22 @ 1
 > > [  184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format)                                                                                           
 > > [  184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs
 > > 
 > With this patch there's no VPD access to this device any longer. So this can't be
 > the root cause. Do you have any other PCI device that has VPD capability?
 > -> Capabilities: [...] Vital Product Data


01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
        Subsystem: Device 1dcf:030a
	...
	        Capabilities: [e0] Vital Product Data
                Unknown small resource type 06, will not decode more.


I'll add that to the quirk list and see if that helps.

	Dave

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 20:15     ` Dave Jones
  2021-09-13 20:22       ` Heiner Kallweit
@ 2021-09-13 20:32       ` Heiner Kallweit
  2021-09-13 20:36         ` Linus Torvalds
  2021-09-13 20:41         ` Dave Jones
  1 sibling, 2 replies; 38+ messages in thread
From: Heiner Kallweit @ 2021-09-13 20:32 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Linux Kernel Mailing List,
	Bjorn Helgaas, linux-pci

On 13.09.2021 22:15, Dave Jones wrote:
> On Mon, Sep 13, 2021 at 08:59:49PM +0200, Heiner Kallweit wrote:
>  > On 13.09.2021 16:18, Dave Jones wrote:
>  > > [  186.595296] pci 0000:02:00.0: [144d:a800] type 00 class 0x010601
>  > > [  186.595351] pci 0000:02:00.0: reg 0x24: [mem 0xdfc10000-0xdfc11fff]
>  > > [  186.595361] pci 0000:02:00.0: reg 0x30: [mem 0xdfc00000-0xdfc0ffff pref]
>  > > [  186.595425] pci 0000:02:00.0: PME# supported from D3hot D3cold
>  > > [  186.735107] pci 0000:02:00.0: VPD access failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update
>  > 
>  > Thanks for the report! The stalls may be related to this one. Device is:
>  > 02:00.0 SATA controller: Samsung Electronics Co Ltd XP941 PCIe SSD (rev 01)
>  > 
>  > With an older kernel you may experience the stall when accessing the vpd
>  > attribute of this device in sysfs.
>  > 
>  > Maybe the device indicates VPD capability but doesn't actually support it.
>  > Could you please provide the "lspci -vv" output for this device?
> 
> 02:00.0 SATA controller: Samsung Electronics Co Ltd XP941 PCIe SSD (rev 01) (prog-if 01 [AHCI 1.0])
>         Subsystem: Samsung Electronics Co Ltd XP941 PCIe SSD
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 16
>         Region 5: Memory at dfc10000 (32-bit, non-prefetchable) [size=8K]
>         Expansion ROM at dfc00000 [disabled] [size=64K]
>         Capabilities: [40] Power Management version 3
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [70] Express (v2) Endpoint, MSI 00
>                 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
>                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 25.000W
>                 DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
>                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
>                         MaxPayload 128 bytes, MaxReadReq 512 bytes
>                 DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
>                 LnkCap: Port #0, Speed 5GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us
>                         ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
>                 LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 5GT/s (ok), Width x2 (downgraded)
>                         TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>                 DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
>                          10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
>                          EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
>                          FRS- TPHComp- ExtTPHComp-
>                          AtomicOpsCap: 32bit- 64bit- 128bitCAS-
>                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,
>                          AtomicOpsCtl: ReqEn-
>                 LnkCap2: Supported Link Speeds: 2.5-5GT/s, Crosslink- Retimer- 2Retimers- DRS-
>                 LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
>                          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>                          Compliance De-emphasis: -6dB
>                 LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
>                          EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
>                          Retimer- 2Retimers- CrosslinkRes: unsupported
>         Capabilities: [d0] Vital Product Data
>                 Not readable
>         Capabilities: [100 v2] Advanced Error Reporting
>                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
>                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>                 AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
>                         MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>                 HeaderLog: 00000000 00000000 00000000 00000000
>         Capabilities: [140 v1] Device Serial Number 00-00-00-00-00-00-00-00
>         Capabilities: [150 v1] Power Budgeting <?>
>         Capabilities: [160 v1] Latency Tolerance Reporting
>                 Max snoop latency: 71680ns
>                 Max no snoop latency: 71680ns
>         Kernel driver in use: ahci
> 
> 
>  > And could you please test with the following applied to verify the
>  > assumption? It disables VPD access for this device.
>  > 
>  > ---
>  >  drivers/pci/vpd.c | 1 +
>  >  1 file changed, 1 insertion(+)
>  > 
>  > diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
>  > index 517789205..fc92e880e 100644
>  > --- a/drivers/pci/vpd.c
>  > +++ b/drivers/pci/vpd.c
>  > @@ -540,6 +540,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd);
>  >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd);
>  >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd);
>  >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
>  > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_SAMSUNG, 0xa800, quirk_blacklist_vpd);
>  >  /*
>  >   * The Amazon Annapurna Labs 0x0031 device id is reused for other non Root Port
>  >   * device types, so the quirk is registered for the PCI_CLASS_BRIDGE_PCI class.
> 
> 
> This didn't help I'm afraid :(
> It changed the VPD warning, but that's about it...
> 
> [  184.235496] pci 0000:02:00.0: calling  quirk_blacklist_vpd+0x0/0x22 @ 1
> [  184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format)                                                                                           
> [  184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs
> 

OK, so this device is buggy too but not the root cause. After checking again the
stalls happen for VPD access to both ports of the Intel network adapter.

01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)

I modified the test patch accordingly, could you please test again?

---
 drivers/pci/vpd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
index 517789205..fc92e880e 100644
--- a/drivers/pci/vpd.c
+++ b/drivers/pci/vpd.c
@@ -540,6 +540,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x10fb, quirk_blacklist_vpd);
 /*
  * The Amazon Annapurna Labs 0x0031 device id is reused for other non Root Port
  * device types, so the quirk is registered for the PCI_CLASS_BRIDGE_PCI class.
-- 
2.33.0


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 20:32       ` Linux 5.15-rc1 Heiner Kallweit
@ 2021-09-13 20:36         ` Linus Torvalds
  2021-09-13 20:41         ` Dave Jones
  1 sibling, 0 replies; 38+ messages in thread
From: Linus Torvalds @ 2021-09-13 20:36 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Dave Jones, Linux Kernel Mailing List, Bjorn Helgaas, linux-pci

On Mon, Sep 13, 2021 at 1:33 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> OK, so this device is buggy too but not the root cause. After checking again the
> stalls happen for VPD access to both ports of the Intel network adapter.

I really don't want to add quirks like this.

If this happens to a major manufacturer (whoe _defined_ the PCI
standard, for chrissake), then this is not something where we want a
quirk to avoid a boot-time delay.

That commit needs to be reverted. If reverting it is hard, then we
need to revert everything it depends on.

              Linus

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 20:32       ` Linux 5.15-rc1 Heiner Kallweit
  2021-09-13 20:36         ` Linus Torvalds
@ 2021-09-13 20:41         ` Dave Jones
  1 sibling, 0 replies; 38+ messages in thread
From: Dave Jones @ 2021-09-13 20:41 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Linus Torvalds, Linux Kernel Mailing List, Bjorn Helgaas, linux-pci

On Mon, Sep 13, 2021 at 10:32:56PM +0200, Heiner Kallweit wrote:

 > > [  184.235496] pci 0000:02:00.0: calling  quirk_blacklist_vpd+0x0/0x22 @ 1
 > > [  184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format)                                                                                           
 > > [  184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs
 > > 
 > 
 > OK, so this device is buggy too but not the root cause. After checking again the
 > stalls happen for VPD access to both ports of the Intel network adapter.
 > 
 > 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
 > 01:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
 > 
 > I modified the test patch accordingly, could you please test again?

That does avoid the stall.

	Dave

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-13 20:32         ` Dave Jones
@ 2021-09-13 20:44           ` Heiner Kallweit
  2021-09-13 23:32             ` [Intel-wired-lan] " Hisashi T Fujinaka
  0 siblings, 1 reply; 38+ messages in thread
From: Heiner Kallweit @ 2021-09-13 20:44 UTC (permalink / raw)
  To: Jesse Brandeburg, Tony Nguyen
  Cc: linux-pci, Linus Torvalds, Linux Kernel Mailing List,
	Bjorn Helgaas, Dave Jones, intel-wired-lan

On 13.09.2021 22:32, Dave Jones wrote:

+ Jesse and Tony as Intel NIC maintainers

> On Mon, Sep 13, 2021 at 10:22:57PM +0200, Heiner Kallweit wrote:
> 
>  > > This didn't help I'm afraid :(
>  > > It changed the VPD warning, but that's about it...
>  > > 
>  > > [  184.235496] pci 0000:02:00.0: calling  quirk_blacklist_vpd+0x0/0x22 @ 1
>  > > [  184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format)                                                                                           
>  > > [  184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs
>  > > 
>  > With this patch there's no VPD access to this device any longer. So this can't be
>  > the root cause. Do you have any other PCI device that has VPD capability?
>  > -> Capabilities: [...] Vital Product Data
> 
> 
> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>         Subsystem: Device 1dcf:030a
> 	...
> 	        Capabilities: [e0] Vital Product Data
>                 Unknown small resource type 06, will not decode more.
> 

When searching I found the same symptom of invalid VPD data for 82599EB.
Do these adapters have non-VPD data in VPD address space? Or is the actual
VPD data at another offset than 0? I know that few Chelsio devices have
such a non-standard VPD structure.

> 
> I'll add that to the quirk list and see if that helps.
> 
> 	Dave
> 
Heiner

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-13 20:44           ` Linux 5.15-rc1 - 82599ES VPD access isue Heiner Kallweit
@ 2021-09-13 23:32             ` Hisashi T Fujinaka
  2021-09-14  5:51               ` Heiner Kallweit
  0 siblings, 1 reply; 38+ messages in thread
From: Hisashi T Fujinaka @ 2021-09-13 23:32 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Jesse Brandeburg, Tony Nguyen, Dave Jones, linux-pci,
	Linux Kernel Mailing List, intel-wired-lan, Bjorn Helgaas,
	Linus Torvalds, todd.fujinaka

On Mon, 13 Sep 2021, Heiner Kallweit wrote:

> On 13.09.2021 22:32, Dave Jones wrote:
>
> + Jesse and Tony as Intel NIC maintainers
>
>> On Mon, Sep 13, 2021 at 10:22:57PM +0200, Heiner Kallweit wrote:
>>
>> >> This didn't help I'm afraid :(
>> >> It changed the VPD warning, but that's about it...
>> >>
>> >> [  184.235496] pci 0000:02:00.0: calling  quirk_blacklist_vpd+0x0/0x22 @ 1
>> >> [  184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format)
>> >> [  184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs
>> >>
>> > With this patch there's no VPD access to this device any longer. So this can't be
>> > the root cause. Do you have any other PCI device that has VPD capability?
>> > -> Capabilities: [...] Vital Product Data
>>
>>
>> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>>         Subsystem: Device 1dcf:030a
>> 	...
>> 	        Capabilities: [e0] Vital Product Data
>>                 Unknown small resource type 06, will not decode more.
>>
>
> When searching I found the same symptom of invalid VPD data for 82599EB.
> Do these adapters have non-VPD data in VPD address space? Or is the actual
> VPD data at another offset than 0? I know that few Chelsio devices have
> such a non-standard VPD structure.
>
>>
>> I'll add that to the quirk list and see if that helps.
>>
>> 	Dave
>>
> Heiner

Sorry to reply from my personal account. If I did it from my work
account I'd be top-posting because of Outlook and that goes over like a
lead balloon.

Anyway, can you send us a dump of your eeprom using ethtool -e? You can
either send it via a bug on e1000.sourceforge.net or try sending it to
todd.fujinaka@intel.com

The other thing is I'm wondering is what the subvendor device ID you
have is referring to because it's not in the pci database. Some ODMs
like getting creative with what they put in the NVM.

Todd Fujinaka (todd.fujinaka@intel.com)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 14:18 ` Dave Jones
  2021-09-13 18:59   ` Heiner Kallweit
@ 2021-09-13 23:46   ` Bjorn Helgaas
  2021-09-14  0:39     ` Dave Jones
  2021-09-14  6:21     ` Heiner Kallweit
  1 sibling, 2 replies; 38+ messages in thread
From: Bjorn Helgaas @ 2021-09-13 23:46 UTC (permalink / raw)
  To: Dave Jones, Linus Torvalds, Heiner Kallweit; +Cc: linux-kernel, linux-pci

On Mon, Sep 13, 2021 at 10:18:18AM -0400, Dave Jones wrote:
> On Sun, Sep 12, 2021 at 04:58:27PM -0700, Linus Torvalds wrote:
>  > So 5.15 isn't shaping up to be a particularly large release, at least
>  > in number of commits. At only just over 10k non-merge commits, this is
>  > in fact the smallest rc1 we have had in the 5.x series. We're usually
>  > hovering in the 12-14k commit range.
> 
> This release takes over two minutes longer to boot on one my
> machines than 5.14.  The time just seems to be unaccounted for, even
> with initcall_debug

> ...
> [    2.194093] pci 0000:01:00.0: calling  quirk_f0_vpd_link+0x0/0x60 @ 1
> [    2.194097] pci 0000:01:00.0: quirk_f0_vpd_link+0x0/0x60 took 0 usecs
> [    2.194100] pci 0000:01:00.0: [8086:10fb] type 00 class 0x020000
> [    2.194109] pci 0000:01:00.0: reg 0x10: [mem 0xd0080000-0xd00fffff 64bit pref]
> [    2.194113] pci 0000:01:00.0: reg 0x18: [io  0xe020-0xe03f]
> [    2.194121] pci 0000:01:00.0: reg 0x20: [mem 0xd0104000-0xd0107fff 64bit pref]
> [    2.194126] pci 0000:01:00.0: reg 0x30: [mem 0xdfd80000-0xdfdfffff pref]
> [    2.194136] pci 0000:01:00.0: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
> [    2.194139] pci 0000:01:00.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
> [    2.194164] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
> 
> * stall here for 86 seconds *
> 
> [   88.675114] pci 0000:01:00.0: reg 0x184: [mem 0x00000000-0x00003fff 64bit pref]
> ...


> 7bac54497c3e3b2ca37b7043f1fa78586540f10e is the first bad commit
> commit 7bac54497c3e3b2ca37b7043f1fa78586540f10e
> Author: Heiner Kallweit <hkallweit1@gmail.com>
> Date:   Sun Aug 8 19:22:52 2021 +0200
> 
>     PCI/VPD: Determine VPD size in pci_vpd_init()
> 
>     Determine VPD size in pci_vpd_init().
> 
>     Quirks set dev->vpd.len to a non-zero value, so they cause us to skip the
>     dynamic size calculation.  Prerequisite is that we move the quirks from
>     FINAL to HEADER so they are run before pci_vpd_init().
> 
>     Link: https://lore.kernel.org/r/cc4a6538-557a-294d-4f94-e6d1d3c91589@gmail.com
>     Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> 
> 
> Which unfortunately doesn't revert cleanly I can't test it reverted in
> isolation.
> 
> My guess is there's something quirky about the PCI bus on this machine
> that causes stalls until we hit timeout, but I'm not sure where to begin
> debugging this.

Sorry for the inconvenience of this, and thank you very much for doing
the bisection to track it down.

We *could* revert 7bac54497c3e, but it'd be messy because a bunch of
follow-up stuff depends on it.

I propose something like the patch below.  Would you mind trying it
out?


commit 4ede9949b93c ("PCI/VPD: Defer VPD sizing until first access")
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Mon Sep 13 16:13:26 2021 -0500

    PCI/VPD: Defer VPD sizing until first access
    
    7bac54497c3e ("PCI/VPD: Determine VPD size in pci_vpd_init()") reads VPD at
    enumeration-time to find the size.  But this is quite slow, and we don't
    need the size until we actually need data from VPD.  Dave reported a boot
    slowdown of more than two minutes [1].
    
    Defer the VPD sizing until a driver or the user requests information from
    VPD.  If devices are quirked because VPD is known not to work, clear the
    vpd.cap pointer so we don't access it at all.
    
    [1] https://lore.kernel.org/r/20210913141818.GA27911@codemonkey.org.uk/
    Fixes: 7bac54497c3e ("PCI/VPD: Determine VPD size in pci_vpd_init()")
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
index 25557b272a4f..ca823ceee10c 100644
--- a/drivers/pci/vpd.c
+++ b/drivers/pci/vpd.c
@@ -46,13 +46,12 @@ static struct pci_dev *pci_get_func0_dev(struct pci_dev *dev)
 }
 
 #define PCI_VPD_MAX_SIZE	(PCI_VPD_ADDR_MASK + 1)
-#define PCI_VPD_SZ_INVALID	UINT_MAX
 
 /**
  * pci_vpd_size - determine actual size of Vital Product Data
  * @dev:	pci device struct
  */
-static size_t pci_vpd_size(struct pci_dev *dev)
+static void pci_vpd_size(struct pci_dev *dev)
 {
 	size_t off = 0, size;
 	unsigned char tag, header[1+2];	/* 1 byte tag, 2 bytes length */
@@ -71,7 +70,7 @@ static size_t pci_vpd_size(struct pci_dev *dev)
 			if (pci_read_vpd(dev, off + 1, 2, &header[1]) != 2) {
 				pci_warn(dev, "failed VPD read at offset %zu\n",
 					 off + 1);
-				return off ?: PCI_VPD_SZ_INVALID;
+				goto finish;
 			}
 			size = pci_vpd_lrdt_size(header);
 			if (off + size > PCI_VPD_MAX_SIZE)
@@ -87,16 +86,19 @@ static size_t pci_vpd_size(struct pci_dev *dev)
 
 			off += PCI_VPD_SRDT_TAG_SIZE + size;
 			if (tag == PCI_VPD_STIN_END)	/* End tag descriptor */
-				return off;
+				goto finish;
 		}
 	}
-	return off;
+	goto finish;
 
 error:
 	pci_info(dev, "invalid VPD tag %#04x (size %zu) at offset %zu%s\n",
 		 header[0], size, off, off == 0 ?
 		 "; assume missing optional EEPROM" : "");
-	return off ?: PCI_VPD_SZ_INVALID;
+finish:
+	dev->vpd.len = off;
+	if (off == 0)
+		dev->vpd.cap = 0;		/* No VPD at all */
 }
 
 /*
@@ -145,9 +147,6 @@ static ssize_t pci_vpd_read(struct pci_dev *dev, loff_t pos, size_t count,
 	loff_t end = pos + count;
 	u8 *buf = arg;
 
-	if (!vpd->cap)
-		return -ENODEV;
-
 	if (pos < 0)
 		return -EINVAL;
 
@@ -206,9 +205,6 @@ static ssize_t pci_vpd_write(struct pci_dev *dev, loff_t pos, size_t count,
 	loff_t end = pos + count;
 	int ret = 0;
 
-	if (!vpd->cap)
-		return -ENODEV;
-
 	if (pos < 0 || (pos & 3) || (count & 3))
 		return -EINVAL;
 
@@ -244,12 +240,6 @@ void pci_vpd_init(struct pci_dev *dev)
 {
 	dev->vpd.cap = pci_find_capability(dev, PCI_CAP_ID_VPD);
 	mutex_init(&dev->vpd.lock);
-
-	if (!dev->vpd.len)
-		dev->vpd.len = pci_vpd_size(dev);
-
-	if (dev->vpd.len == PCI_VPD_SZ_INVALID)
-		dev->vpd.cap = 0;
 }
 
 static ssize_t vpd_read(struct file *filp, struct kobject *kobj,
@@ -294,25 +284,29 @@ const struct attribute_group pci_dev_vpd_attr_group = {
 
 void *pci_vpd_alloc(struct pci_dev *dev, unsigned int *size)
 {
-	unsigned int len = dev->vpd.len;
+	struct pci_vpd *vpd = &dev->vpd;
+	unsigned int len;
 	void *buf;
 	int cnt;
 
-	if (!dev->vpd.cap)
+	if (!vpd->cap)
 		return ERR_PTR(-ENODEV);
 
 	buf = kmalloc(len, GFP_KERNEL);
 	if (!buf)
 		return ERR_PTR(-ENOMEM);
 
-	cnt = pci_read_vpd(dev, 0, len, buf);
+	if (vpd->len == 0)
+		pci_vpd_size(dev);
+
+	cnt = pci_read_vpd(dev, 0, vpd->len, buf);
 	if (cnt != len) {
 		kfree(buf);
 		return ERR_PTR(-EIO);
 	}
 
 	if (size)
-		*size = len;
+		*size = vpd->len;
 
 	return buf;
 }
@@ -374,6 +368,7 @@ static int pci_vpd_find_info_keyword(const u8 *buf, unsigned int off,
  */
 ssize_t pci_read_vpd(struct pci_dev *dev, loff_t pos, size_t count, void *buf)
 {
+	struct pci_vpd *vpd;
 	ssize_t ret;
 
 	if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
@@ -381,11 +376,27 @@ ssize_t pci_read_vpd(struct pci_dev *dev, loff_t pos, size_t count, void *buf)
 		if (!dev)
 			return -ENODEV;
 
+		vpd = &dev->vpd;
+		if (!vpd->cap) {
+			pci_dev_put(dev);
+			return -ENODEV;
+		}
+
+		if (vpd->len == 0)
+			pci_vpd_size(dev);
+
 		ret = pci_vpd_read(dev, pos, count, buf);
 		pci_dev_put(dev);
 		return ret;
 	}
 
+	vpd = &dev->vpd;
+	if (!vpd->cap)
+		return -ENODEV;
+
+	if (vpd->len == 0)
+		pci_vpd_size(dev);
+
 	return pci_vpd_read(dev, pos, count, buf);
 }
 EXPORT_SYMBOL(pci_read_vpd);
@@ -399,6 +410,7 @@ EXPORT_SYMBOL(pci_read_vpd);
  */
 ssize_t pci_write_vpd(struct pci_dev *dev, loff_t pos, size_t count, const void *buf)
 {
+	struct pci_vpd *vpd;
 	ssize_t ret;
 
 	if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
@@ -406,11 +418,26 @@ ssize_t pci_write_vpd(struct pci_dev *dev, loff_t pos, size_t count, const void
 		if (!dev)
 			return -ENODEV;
 
+		vpd = &dev->vpd;
+		if (!vpd->cap) {
+			pci_dev_put(dev);
+			return -ENODEV;
+		}
+
+		if (vpd->len == 0)
+			pci_vpd_size(dev);
+
 		ret = pci_vpd_write(dev, pos, count, buf);
 		pci_dev_put(dev);
 		return ret;
 	}
 
+	if (!vpd->cap)
+		return -ENODEV;
+
+	if (vpd->len == 0)
+		pci_vpd_size(dev);
+
 	return pci_vpd_write(dev, pos, count, buf);
 }
 EXPORT_SYMBOL(pci_write_vpd);
@@ -500,27 +527,27 @@ DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
  */
 static void quirk_blacklist_vpd(struct pci_dev *dev)
 {
-	dev->vpd.len = PCI_VPD_SZ_INVALID;
+	dev->vpd.cap = 0;
 	pci_warn(dev, FW_BUG "disabling VPD access (can't determine size of non-standard VPD format)\n");
 }
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0060, quirk_blacklist_vpd);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x007c, quirk_blacklist_vpd);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0413, quirk_blacklist_vpd);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0078, quirk_blacklist_vpd);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0079, quirk_blacklist_vpd);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0073, quirk_blacklist_vpd);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0071, quirk_blacklist_vpd);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005b, quirk_blacklist_vpd);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0060, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x007c, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0413, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0078, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0079, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0073, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0071, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x005b, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
 /*
  * The Amazon Annapurna Labs 0x0031 device id is reused for other non Root Port
  * device types, so the quirk is registered for the PCI_CLASS_BRIDGE_PCI class.
  */
-DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_AMAZON_ANNAPURNA_LABS, 0x0031,
-			       PCI_CLASS_BRIDGE_PCI, 8, quirk_blacklist_vpd);
+DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_AMAZON_ANNAPURNA_LABS, 0x0031,
+			      PCI_CLASS_BRIDGE_PCI, 8, quirk_blacklist_vpd);
 
 static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
 {
@@ -545,7 +572,7 @@ static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
 		dev->vpd.len = 2048;
 }
 
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
-			 quirk_chelsio_extend_vpd);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
+			quirk_chelsio_extend_vpd);
 
 #endif

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 23:46   ` Bjorn Helgaas
@ 2021-09-14  0:39     ` Dave Jones
  2021-09-14  6:21     ` Heiner Kallweit
  1 sibling, 0 replies; 38+ messages in thread
From: Dave Jones @ 2021-09-14  0:39 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Linus Torvalds, Heiner Kallweit, linux-kernel, linux-pci

On Mon, Sep 13, 2021 at 06:46:08PM -0500, Bjorn Helgaas wrote:
 > On Mon, Sep 13, 2021 at 10:18:18AM -0400, Dave Jones wrote:
 > > On Sun, Sep 12, 2021 at 04:58:27PM -0700, Linus Torvalds wrote:
 > >  > So 5.15 isn't shaping up to be a particularly large release, at least
 > >  > in number of commits. At only just over 10k non-merge commits, this is
 > >  > in fact the smallest rc1 we have had in the 5.x series. We're usually
 > >  > hovering in the 12-14k commit range.
 > > 
 > > This release takes over two minutes longer to boot on one my
 > > machines than 5.14.  The time just seems to be unaccounted for, even
 > > with initcall_debug
 > 
 > Sorry for the inconvenience of this, and thank you very much for doing
 > the bisection to track it down.
 > 
 > We *could* revert 7bac54497c3e, but it'd be messy because a bunch of
 > follow-up stuff depends on it.
 > 
 > I propose something like the patch below.  Would you mind trying it
 > out?

This also fixes the problem for me.

	Dave

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-13 23:32             ` [Intel-wired-lan] " Hisashi T Fujinaka
@ 2021-09-14  5:51               ` Heiner Kallweit
  2021-09-14 14:24                 ` Dave Jones
  0 siblings, 1 reply; 38+ messages in thread
From: Heiner Kallweit @ 2021-09-14  5:51 UTC (permalink / raw)
  To: Dave Jones
  Cc: Jesse Brandeburg, Tony Nguyen, linux-pci,
	Linux Kernel Mailing List, intel-wired-lan, Bjorn Helgaas,
	todd.fujinaka, Hisashi T Fujinaka

On 14.09.2021 01:32, Hisashi T Fujinaka wrote:
> On Mon, 13 Sep 2021, Heiner Kallweit wrote:
> 
>> On 13.09.2021 22:32, Dave Jones wrote:
>>
>> + Jesse and Tony as Intel NIC maintainers
>>
>>> On Mon, Sep 13, 2021 at 10:22:57PM +0200, Heiner Kallweit wrote:
>>>
>>> >> This didn't help I'm afraid :(
>>> >> It changed the VPD warning, but that's about it...
>>> >>
>>> >> [  184.235496] pci 0000:02:00.0: calling  quirk_blacklist_vpd+0x0/0x22 @ 1
>>> >> [  184.235499] pci 0000:02:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format)
>>> >> [  184.235501] pci 0000:02:00.0: quirk_blacklist_vpd+0x0/0x22 took 0 usecs
>>> >>
>>> > With this patch there's no VPD access to this device any longer. So this can't be
>>> > the root cause. Do you have any other PCI device that has VPD capability?
>>> > -> Capabilities: [...] Vital Product Data
>>>
>>>
>>> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>>>         Subsystem: Device 1dcf:030a
>>>     ...
>>>             Capabilities: [e0] Vital Product Data
>>>                 Unknown small resource type 06, will not decode more.
>>>
>>
>> When searching I found the same symptom of invalid VPD data for 82599EB.
>> Do these adapters have non-VPD data in VPD address space? Or is the actual
>> VPD data at another offset than 0? I know that few Chelsio devices have
>> such a non-standard VPD structure.
>>
>>>
>>> I'll add that to the quirk list and see if that helps.
>>>
>>>     Dave
>>>
>> Heiner
> 
> Sorry to reply from my personal account. If I did it from my work
> account I'd be top-posting because of Outlook and that goes over like a
> lead balloon.
> 
> Anyway, can you send us a dump of your eeprom using ethtool -e? You can
> either send it via a bug on e1000.sourceforge.net or try sending it to
> todd.fujinaka@intel.com
> 
> The other thing is I'm wondering is what the subvendor device ID you
> have is referring to because it's not in the pci database. Some ODMs
> like getting creative with what they put in the NVM.
> 
> Todd Fujinaka (todd.fujinaka@intel.com)

Thanks for the prompt reply. Dave, could you please provide the requested
information?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13 23:46   ` Bjorn Helgaas
  2021-09-14  0:39     ` Dave Jones
@ 2021-09-14  6:21     ` Heiner Kallweit
  2021-09-14 11:26       ` Bjorn Helgaas
  1 sibling, 1 reply; 38+ messages in thread
From: Heiner Kallweit @ 2021-09-14  6:21 UTC (permalink / raw)
  To: Bjorn Helgaas, Dave Jones, Linus Torvalds; +Cc: linux-kernel, linux-pci

On 14.09.2021 01:46, Bjorn Helgaas wrote:
> On Mon, Sep 13, 2021 at 10:18:18AM -0400, Dave Jones wrote:
>> On Sun, Sep 12, 2021 at 04:58:27PM -0700, Linus Torvalds wrote:
>>  > So 5.15 isn't shaping up to be a particularly large release, at least
>>  > in number of commits. At only just over 10k non-merge commits, this is
>>  > in fact the smallest rc1 we have had in the 5.x series. We're usually
>>  > hovering in the 12-14k commit range.
>>
>> This release takes over two minutes longer to boot on one my
>> machines than 5.14.  The time just seems to be unaccounted for, even
>> with initcall_debug
> 
>> ...
>> [    2.194093] pci 0000:01:00.0: calling  quirk_f0_vpd_link+0x0/0x60 @ 1
>> [    2.194097] pci 0000:01:00.0: quirk_f0_vpd_link+0x0/0x60 took 0 usecs
>> [    2.194100] pci 0000:01:00.0: [8086:10fb] type 00 class 0x020000
>> [    2.194109] pci 0000:01:00.0: reg 0x10: [mem 0xd0080000-0xd00fffff 64bit pref]
>> [    2.194113] pci 0000:01:00.0: reg 0x18: [io  0xe020-0xe03f]
>> [    2.194121] pci 0000:01:00.0: reg 0x20: [mem 0xd0104000-0xd0107fff 64bit pref]
>> [    2.194126] pci 0000:01:00.0: reg 0x30: [mem 0xdfd80000-0xdfdfffff pref]
>> [    2.194136] pci 0000:01:00.0: calling  quirk_igfx_skip_te_disable+0x0/0x50 @ 1
>> [    2.194139] pci 0000:01:00.0: quirk_igfx_skip_te_disable+0x0/0x50 took 0 usecs
>> [    2.194164] pci 0000:01:00.0: PME# supported from D0 D3hot D3cold
>>
>> * stall here for 86 seconds *
>>
>> [   88.675114] pci 0000:01:00.0: reg 0x184: [mem 0x00000000-0x00003fff 64bit pref]
>> ...
> 
> 
>> 7bac54497c3e3b2ca37b7043f1fa78586540f10e is the first bad commit
>> commit 7bac54497c3e3b2ca37b7043f1fa78586540f10e
>> Author: Heiner Kallweit <hkallweit1@gmail.com>
>> Date:   Sun Aug 8 19:22:52 2021 +0200
>>
>>     PCI/VPD: Determine VPD size in pci_vpd_init()
>>
>>     Determine VPD size in pci_vpd_init().
>>
>>     Quirks set dev->vpd.len to a non-zero value, so they cause us to skip the
>>     dynamic size calculation.  Prerequisite is that we move the quirks from
>>     FINAL to HEADER so they are run before pci_vpd_init().
>>
>>     Link: https://lore.kernel.org/r/cc4a6538-557a-294d-4f94-e6d1d3c91589@gmail.com
>>     Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>>
>>
>> Which unfortunately doesn't revert cleanly I can't test it reverted in
>> isolation.
>>
>> My guess is there's something quirky about the PCI bus on this machine
>> that causes stalls until we hit timeout, but I'm not sure where to begin
>> debugging this.
> 
> Sorry for the inconvenience of this, and thank you very much for doing
> the bisection to track it down.
> 
> We *could* revert 7bac54497c3e, but it'd be messy because a bunch of
> follow-up stuff depends on it.
> 
> I propose something like the patch below.  Would you mind trying it
> out?
> 
> 
> commit 4ede9949b93c ("PCI/VPD: Defer VPD sizing until first access")
> Author: Bjorn Helgaas <bhelgaas@google.com>
> Date:   Mon Sep 13 16:13:26 2021 -0500
> 
>     PCI/VPD: Defer VPD sizing until first access
>     
>     7bac54497c3e ("PCI/VPD: Determine VPD size in pci_vpd_init()") reads VPD at
>     enumeration-time to find the size.  But this is quite slow, and we don't
>     need the size until we actually need data from VPD.  Dave reported a boot
>     slowdown of more than two minutes [1].
>     
>     Defer the VPD sizing until a driver or the user requests information from
>     VPD.  If devices are quirked because VPD is known not to work, clear the
>     vpd.cap pointer so we don't access it at all.
>     
>     [1] https://lore.kernel.org/r/20210913141818.GA27911@codemonkey.org.uk/
>     Fixes: 7bac54497c3e ("PCI/VPD: Determine VPD size in pci_vpd_init()")
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> 
> diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
> index 25557b272a4f..ca823ceee10c 100644
> --- a/drivers/pci/vpd.c
> +++ b/drivers/pci/vpd.c
> @@ -46,13 +46,12 @@ static struct pci_dev *pci_get_func0_dev(struct pci_dev *dev)
>  }
>  
>  #define PCI_VPD_MAX_SIZE	(PCI_VPD_ADDR_MASK + 1)
> -#define PCI_VPD_SZ_INVALID	UINT_MAX
>  
>  /**
>   * pci_vpd_size - determine actual size of Vital Product Data
>   * @dev:	pci device struct
>   */
> -static size_t pci_vpd_size(struct pci_dev *dev)
> +static void pci_vpd_size(struct pci_dev *dev)
>  {
>  	size_t off = 0, size;
>  	unsigned char tag, header[1+2];	/* 1 byte tag, 2 bytes length */
> @@ -71,7 +70,7 @@ static size_t pci_vpd_size(struct pci_dev *dev)
>  			if (pci_read_vpd(dev, off + 1, 2, &header[1]) != 2) {
>  				pci_warn(dev, "failed VPD read at offset %zu\n",
>  					 off + 1);
> -				return off ?: PCI_VPD_SZ_INVALID;
> +				goto finish;
>  			}
>  			size = pci_vpd_lrdt_size(header);
>  			if (off + size > PCI_VPD_MAX_SIZE)
> @@ -87,16 +86,19 @@ static size_t pci_vpd_size(struct pci_dev *dev)
>  
>  			off += PCI_VPD_SRDT_TAG_SIZE + size;
>  			if (tag == PCI_VPD_STIN_END)	/* End tag descriptor */
> -				return off;
> +				goto finish;
>  		}
>  	}
> -	return off;
> +	goto finish;
>  
>  error:
>  	pci_info(dev, "invalid VPD tag %#04x (size %zu) at offset %zu%s\n",
>  		 header[0], size, off, off == 0 ?
>  		 "; assume missing optional EEPROM" : "");
> -	return off ?: PCI_VPD_SZ_INVALID;
> +finish:
> +	dev->vpd.len = off;
> +	if (off == 0)
> +		dev->vpd.cap = 0;		/* No VPD at all */
>  }
>  
>  /*
> @@ -145,9 +147,6 @@ static ssize_t pci_vpd_read(struct pci_dev *dev, loff_t pos, size_t count,
>  	loff_t end = pos + count;
>  	u8 *buf = arg;
>  
> -	if (!vpd->cap)
> -		return -ENODEV;
> -
>  	if (pos < 0)
>  		return -EINVAL;
>  
> @@ -206,9 +205,6 @@ static ssize_t pci_vpd_write(struct pci_dev *dev, loff_t pos, size_t count,
>  	loff_t end = pos + count;
>  	int ret = 0;
>  
> -	if (!vpd->cap)
> -		return -ENODEV;
> -
>  	if (pos < 0 || (pos & 3) || (count & 3))
>  		return -EINVAL;
>  
> @@ -244,12 +240,6 @@ void pci_vpd_init(struct pci_dev *dev)
>  {
>  	dev->vpd.cap = pci_find_capability(dev, PCI_CAP_ID_VPD);
>  	mutex_init(&dev->vpd.lock);
> -
> -	if (!dev->vpd.len)
> -		dev->vpd.len = pci_vpd_size(dev);
> -
> -	if (dev->vpd.len == PCI_VPD_SZ_INVALID)
> -		dev->vpd.cap = 0;
>  }
>  
>  static ssize_t vpd_read(struct file *filp, struct kobject *kobj,
> @@ -294,25 +284,29 @@ const struct attribute_group pci_dev_vpd_attr_group = {
>  
>  void *pci_vpd_alloc(struct pci_dev *dev, unsigned int *size)
>  {
> -	unsigned int len = dev->vpd.len;
> +	struct pci_vpd *vpd = &dev->vpd;
> +	unsigned int len;
>  	void *buf;
>  	int cnt;
>  
> -	if (!dev->vpd.cap)
> +	if (!vpd->cap)
>  		return ERR_PTR(-ENODEV);
>  
>  	buf = kmalloc(len, GFP_KERNEL);
>  	if (!buf)
>  		return ERR_PTR(-ENOMEM);
>  
> -	cnt = pci_read_vpd(dev, 0, len, buf);
> +	if (vpd->len == 0)
> +		pci_vpd_size(dev);
> +
> +	cnt = pci_read_vpd(dev, 0, vpd->len, buf);
>  	if (cnt != len) {
>  		kfree(buf);
>  		return ERR_PTR(-EIO);
>  	}
>  
>  	if (size)
> -		*size = len;
> +		*size = vpd->len;
>  
>  	return buf;
>  }
> @@ -374,6 +368,7 @@ static int pci_vpd_find_info_keyword(const u8 *buf, unsigned int off,
>   */
>  ssize_t pci_read_vpd(struct pci_dev *dev, loff_t pos, size_t count, void *buf)
>  {
> +	struct pci_vpd *vpd;
>  	ssize_t ret;
>  
>  	if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
> @@ -381,11 +376,27 @@ ssize_t pci_read_vpd(struct pci_dev *dev, loff_t pos, size_t count, void *buf)
>  		if (!dev)
>  			return -ENODEV;
>  
> +		vpd = &dev->vpd;
> +		if (!vpd->cap) {
> +			pci_dev_put(dev);
> +			return -ENODEV;
> +		}
> +
> +		if (vpd->len == 0)
> +			pci_vpd_size(dev);
> +
>  		ret = pci_vpd_read(dev, pos, count, buf);
>  		pci_dev_put(dev);
>  		return ret;
>  	}
>  
> +	vpd = &dev->vpd;
> +	if (!vpd->cap)
> +		return -ENODEV;
> +
> +	if (vpd->len == 0)
> +		pci_vpd_size(dev);
> +
>  	return pci_vpd_read(dev, pos, count, buf);
>  }
>  EXPORT_SYMBOL(pci_read_vpd);
> @@ -399,6 +410,7 @@ EXPORT_SYMBOL(pci_read_vpd);
>   */
>  ssize_t pci_write_vpd(struct pci_dev *dev, loff_t pos, size_t count, const void *buf)
>  {
> +	struct pci_vpd *vpd;
>  	ssize_t ret;
>  
>  	if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
> @@ -406,11 +418,26 @@ ssize_t pci_write_vpd(struct pci_dev *dev, loff_t pos, size_t count, const void
>  		if (!dev)
>  			return -ENODEV;
>  
> +		vpd = &dev->vpd;
> +		if (!vpd->cap) {
> +			pci_dev_put(dev);
> +			return -ENODEV;
> +		}
> +
> +		if (vpd->len == 0)
> +			pci_vpd_size(dev);
> +
>  		ret = pci_vpd_write(dev, pos, count, buf);
>  		pci_dev_put(dev);
>  		return ret;
>  	}
>  
> +	if (!vpd->cap)
> +		return -ENODEV;
> +
> +	if (vpd->len == 0)
> +		pci_vpd_size(dev);
> +
>  	return pci_vpd_write(dev, pos, count, buf);
>  }
>  EXPORT_SYMBOL(pci_write_vpd);
> @@ -500,27 +527,27 @@ DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
>   */
>  static void quirk_blacklist_vpd(struct pci_dev *dev)
>  {
> -	dev->vpd.len = PCI_VPD_SZ_INVALID;
> +	dev->vpd.cap = 0;
>  	pci_warn(dev, FW_BUG "disabling VPD access (can't determine size of non-standard VPD format)\n");
>  }
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0060, quirk_blacklist_vpd);
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x007c, quirk_blacklist_vpd);
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0413, quirk_blacklist_vpd);
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0078, quirk_blacklist_vpd);
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0079, quirk_blacklist_vpd);
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0073, quirk_blacklist_vpd);
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x0071, quirk_blacklist_vpd);
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005b, quirk_blacklist_vpd);
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd);
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd);
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd);
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0060, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x007c, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0413, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0078, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0079, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0073, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x0071, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x005b, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x002f, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x005d, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_LSI_LOGIC, 0x005f, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
>  /*

Leaving the quirks in FIXUP_HEADER stage would have the advantage that for
blacklisted devices the vpd sysfs attribute isn't visibale. The needed
changes to the patch are minimal.

>   * The Amazon Annapurna Labs 0x0031 device id is reused for other non Root Port
>   * device types, so the quirk is registered for the PCI_CLASS_BRIDGE_PCI class.
>   */
> -DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_AMAZON_ANNAPURNA_LABS, 0x0031,
> -			       PCI_CLASS_BRIDGE_PCI, 8, quirk_blacklist_vpd);
> +DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_AMAZON_ANNAPURNA_LABS, 0x0031,
> +			      PCI_CLASS_BRIDGE_PCI, 8, quirk_blacklist_vpd);
>  
>  static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
>  {
> @@ -545,7 +572,7 @@ static void quirk_chelsio_extend_vpd(struct pci_dev *dev)
>  		dev->vpd.len = 2048;
>  }
>  
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
> -			 quirk_chelsio_extend_vpd);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
> +			quirk_chelsio_extend_vpd);
>  
>  #endif
> 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-14  6:21     ` Heiner Kallweit
@ 2021-09-14 11:26       ` Bjorn Helgaas
  2021-09-14 17:07         ` Heiner Kallweit
  0 siblings, 1 reply; 38+ messages in thread
From: Bjorn Helgaas @ 2021-09-14 11:26 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Dave Jones, Linus Torvalds, linux-kernel, linux-pci

On Tue, Sep 14, 2021 at 08:21:46AM +0200, Heiner Kallweit wrote:
> On 14.09.2021 01:46, Bjorn Helgaas wrote:

> > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
> >  /*
> 
> Leaving the quirks in FIXUP_HEADER stage would have the advantage that for
> blacklisted devices the vpd sysfs attribute isn't visibale. The needed
> changes to the patch are minimal.

What do you have in mind?  The only thing I can think of would be to
add a "pci_dev.no_vpd" bit.  "vpd.cap == 0" means the device has no
VPD, and "vpd.len == 0" means we haven't determined the size yet.  All
devices start off with vpd.cap == 0 and vpd.len == 0, so a
FIXUP_HEADER quirk would have to set a sentinel value or some other
bit.

Bjorn

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-14  5:51               ` Heiner Kallweit
@ 2021-09-14 14:24                 ` Dave Jones
  2021-09-14 18:28                   ` Fujinaka, Todd
  2021-09-14 20:00                   ` Hisashi T Fujinaka
  0 siblings, 2 replies; 38+ messages in thread
From: Dave Jones @ 2021-09-14 14:24 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Jesse Brandeburg, Tony Nguyen, linux-pci,
	Linux Kernel Mailing List, intel-wired-lan, Bjorn Helgaas,
	todd.fujinaka, Hisashi T Fujinaka

On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote:
 
 > > Sorry to reply from my personal account. If I did it from my work
 > > account I'd be top-posting because of Outlook and that goes over like a
 > > lead balloon.
 > > 
 > > Anyway, can you send us a dump of your eeprom using ethtool -e? You can
 > > either send it via a bug on e1000.sourceforge.net or try sending it to
 > > todd.fujinaka@intel.com
 > > 
 > > The other thing is I'm wondering is what the subvendor device ID you
 > > have is referring to because it's not in the pci database. Some ODMs
 > > like getting creative with what they put in the NVM.
 > > 
 > > Todd Fujinaka (todd.fujinaka@intel.com)
 > 
 > Thanks for the prompt reply. Dave, could you please provide the requested
 > information?

sent off-list.

	Dave

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-14 11:26       ` Bjorn Helgaas
@ 2021-09-14 17:07         ` Heiner Kallweit
  2021-09-14 21:55           ` Bjorn Helgaas
  0 siblings, 1 reply; 38+ messages in thread
From: Heiner Kallweit @ 2021-09-14 17:07 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Dave Jones, Linus Torvalds, linux-kernel, linux-pci

On 14.09.2021 13:26, Bjorn Helgaas wrote:
> On Tue, Sep 14, 2021 at 08:21:46AM +0200, Heiner Kallweit wrote:
>> On 14.09.2021 01:46, Bjorn Helgaas wrote:
> 
>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
>>>  /*
>>
>> Leaving the quirks in FIXUP_HEADER stage would have the advantage that for
>> blacklisted devices the vpd sysfs attribute isn't visibale. The needed
>> changes to the patch are minimal.
> 
> What do you have in mind?  The only thing I can think of would be to
> add a "pci_dev.no_vpd" bit.  "vpd.cap == 0" means the device has no
> VPD, and "vpd.len == 0" means we haven't determined the size yet.  All
> devices start off with vpd.cap == 0 and vpd.len == 0, so a
> FIXUP_HEADER quirk would have to set a sentinel value or some other
> bit.
> 
> Bjorn
> 

Why not leave vpd.len == PCI_VPD_SZ_INVALID as sentinel?

And one more question: Why do you move the "if (!vpd->cap)" check from
pci_vpd_read() to pci_read_vpd()? At a first glance I see no benefit.

Here comes my version. Your changes to pci_vpd_size() I left as-is.
I tested the positive case and it works as expected.


diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
index 25557b272..04b14c488 100644
--- a/drivers/pci/vpd.c
+++ b/drivers/pci/vpd.c
@@ -52,7 +52,7 @@ static struct pci_dev *pci_get_func0_dev(struct pci_dev *dev)
  * pci_vpd_size - determine actual size of Vital Product Data
  * @dev:	pci device struct
  */
-static size_t pci_vpd_size(struct pci_dev *dev)
+static void pci_vpd_size(struct pci_dev *dev)
 {
 	size_t off = 0, size;
 	unsigned char tag, header[1+2];	/* 1 byte tag, 2 bytes length */
@@ -71,7 +71,7 @@ static size_t pci_vpd_size(struct pci_dev *dev)
 			if (pci_read_vpd(dev, off + 1, 2, &header[1]) != 2) {
 				pci_warn(dev, "failed VPD read at offset %zu\n",
 					 off + 1);
-				return off ?: PCI_VPD_SZ_INVALID;
+				goto finish;
 			}
 			size = pci_vpd_lrdt_size(header);
 			if (off + size > PCI_VPD_MAX_SIZE)
@@ -87,16 +87,19 @@ static size_t pci_vpd_size(struct pci_dev *dev)
 
 			off += PCI_VPD_SRDT_TAG_SIZE + size;
 			if (tag == PCI_VPD_STIN_END)	/* End tag descriptor */
-				return off;
+				goto finish;
 		}
 	}
-	return off;
+	goto finish;
 
 error:
 	pci_info(dev, "invalid VPD tag %#04x (size %zu) at offset %zu%s\n",
 		 header[0], size, off, off == 0 ?
 		 "; assume missing optional EEPROM" : "");
-	return off ?: PCI_VPD_SZ_INVALID;
+finish:
+	dev->vpd.len = off;
+	if (off == 0)
+		dev->vpd.cap = 0;		/* No VPD at all */
 }
 
 /*
@@ -145,6 +148,8 @@ static ssize_t pci_vpd_read(struct pci_dev *dev, loff_t pos, size_t count,
 	loff_t end = pos + count;
 	u8 *buf = arg;
 
+	if (vpd->len == 0 && vpd->cap)
+		pci_vpd_size(dev);
 	if (!vpd->cap)
 		return -ENODEV;
 
@@ -206,6 +211,8 @@ static ssize_t pci_vpd_write(struct pci_dev *dev, loff_t pos, size_t count,
 	loff_t end = pos + count;
 	int ret = 0;
 
+	if (vpd->len == 0 && vpd->cap)
+		pci_vpd_size(dev);
 	if (!vpd->cap)
 		return -ENODEV;
 
@@ -245,9 +252,6 @@ void pci_vpd_init(struct pci_dev *dev)
 	dev->vpd.cap = pci_find_capability(dev, PCI_CAP_ID_VPD);
 	mutex_init(&dev->vpd.lock);
 
-	if (!dev->vpd.len)
-		dev->vpd.len = pci_vpd_size(dev);
-
 	if (dev->vpd.len == PCI_VPD_SZ_INVALID)
 		dev->vpd.cap = 0;
 }
@@ -294,25 +298,27 @@ const struct attribute_group pci_dev_vpd_attr_group = {
 
 void *pci_vpd_alloc(struct pci_dev *dev, unsigned int *size)
 {
-	unsigned int len = dev->vpd.len;
+	struct pci_vpd *vpd = &dev->vpd;
 	void *buf;
 	int cnt;
 
-	if (!dev->vpd.cap)
+	if (vpd->len == 0 && vpd->cap)
+		pci_vpd_size(dev);
+	if (!vpd->cap)
 		return ERR_PTR(-ENODEV);
 
-	buf = kmalloc(len, GFP_KERNEL);
+	buf = kmalloc(vpd->len, GFP_KERNEL);
 	if (!buf)
 		return ERR_PTR(-ENOMEM);
 
-	cnt = pci_read_vpd(dev, 0, len, buf);
-	if (cnt != len) {
+	cnt = pci_read_vpd(dev, 0, vpd->len, buf);
+	if (cnt != vpd->len) {
 		kfree(buf);
 		return ERR_PTR(-EIO);
 	}
 
 	if (size)
-		*size = len;
+		*size = vpd->len;
 
 	return buf;
 }
-- 
2.33.0





^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-14 14:24                 ` Dave Jones
@ 2021-09-14 18:28                   ` Fujinaka, Todd
  2021-09-14 20:00                   ` Hisashi T Fujinaka
  1 sibling, 0 replies; 38+ messages in thread
From: Fujinaka, Todd @ 2021-09-14 18:28 UTC (permalink / raw)
  To: Dave Jones, Heiner Kallweit
  Cc: Brandeburg, Jesse, Nguyen, Anthony L, linux-pci,
	Linux Kernel Mailing List, intel-wired-lan, Bjorn Helgaas,
	Hisashi T Fujinaka

Wow, I'm waiting for the hardware guy to look at this but this is an off-brand 10Gtek NIC from Amazon that just has nonsense data in the VPD as far as I can tell (3's).

I'll let you know if I find out that the critical settings are bad, but I would just ignore any VPD errors.

Todd Fujinaka
Software Application Engineer
Ethernet Products Group
Intel Corporation
todd.fujinaka@intel.com

-----Original Message-----
From: Dave Jones <davej@codemonkey.org.uk> 
Sent: Tuesday, September 14, 2021 7:24 AM
To: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Brandeburg, Jesse <jesse.brandeburg@intel.com>; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; linux-pci@vger.kernel.org; Linux Kernel Mailing List <linux-kernel@vger.kernel.org>; intel-wired-lan <intel-wired-lan@lists.osuosl.org>; Bjorn Helgaas <bhelgaas@google.com>; Fujinaka, Todd <todd.fujinaka@intel.com>; Hisashi T Fujinaka <htodd@twofifty.com>
Subject: Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue

On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote:
 
 > > Sorry to reply from my personal account. If I did it from my work  > > account I'd be top-posting because of Outlook and that goes over like a  > > lead balloon.
 > >
 > > Anyway, can you send us a dump of your eeprom using ethtool -e? You can  > > either send it via a bug on e1000.sourceforge.net or try sending it to  > > todd.fujinaka@intel.com  > >  > > The other thing is I'm wondering is what the subvendor device ID you  > > have is referring to because it's not in the pci database. Some ODMs  > > like getting creative with what they put in the NVM.
 > >
 > > Todd Fujinaka (todd.fujinaka@intel.com)  >  > Thanks for the prompt reply. Dave, could you please provide the requested  > information?

sent off-list.

	Dave

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-14 14:24                 ` Dave Jones
  2021-09-14 18:28                   ` Fujinaka, Todd
@ 2021-09-14 20:00                   ` Hisashi T Fujinaka
  2021-09-14 21:51                     ` Heiner Kallweit
  1 sibling, 1 reply; 38+ messages in thread
From: Hisashi T Fujinaka @ 2021-09-14 20:00 UTC (permalink / raw)
  To: Dave Jones
  Cc: Heiner Kallweit, linux-pci, Linux Kernel Mailing List,
	intel-wired-lan, Bjorn Helgaas

On Tue, 14 Sep 2021, Dave Jones wrote:

> On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote:
>
> > > Sorry to reply from my personal account. If I did it from my work
> > > account I'd be top-posting because of Outlook and that goes over like a
> > > lead balloon.
> > >
> > > Anyway, can you send us a dump of your eeprom using ethtool -e? You can
> > > either send it via a bug on e1000.sourceforge.net or try sending it to
> > > todd.fujinaka@intel.com
> > >
> > > The other thing is I'm wondering is what the subvendor device ID you
> > > have is referring to because it's not in the pci database. Some ODMs
> > > like getting creative with what they put in the NVM.
> > >
> > > Todd Fujinaka (todd.fujinaka@intel.com)
> >
> > Thanks for the prompt reply. Dave, could you please provide the requested
> > information?
>
> sent off-list.
>
> 	Dave

Whoops. I replied from outlook again.

I have confirmation that this should be a valid image. The VPD is just a
series of 3's. There are changes to preboot header, flash and BAR size,
and as far as I can tell, a nonsense subdevice ID, but this should work.

What was the original question?

Todd Fujinaka <todd.fujinaka@intel.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-14 20:00                   ` Hisashi T Fujinaka
@ 2021-09-14 21:51                     ` Heiner Kallweit
  2021-09-15 14:18                       ` Hisashi T Fujinaka
  0 siblings, 1 reply; 38+ messages in thread
From: Heiner Kallweit @ 2021-09-14 21:51 UTC (permalink / raw)
  To: Hisashi T Fujinaka, Dave Jones
  Cc: linux-pci, Linux Kernel Mailing List, intel-wired-lan, Bjorn Helgaas

On 14.09.2021 22:00, Hisashi T Fujinaka wrote:
> On Tue, 14 Sep 2021, Dave Jones wrote:
> 
>> On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote:
>>
>> > > Sorry to reply from my personal account. If I did it from my work
>> > > account I'd be top-posting because of Outlook and that goes over like a
>> > > lead balloon.
>> > >
>> > > Anyway, can you send us a dump of your eeprom using ethtool -e? You can
>> > > either send it via a bug on e1000.sourceforge.net or try sending it to
>> > > todd.fujinaka@intel.com
>> > >
>> > > The other thing is I'm wondering is what the subvendor device ID you
>> > > have is referring to because it's not in the pci database. Some ODMs
>> > > like getting creative with what they put in the NVM.
>> > >
>> > > Todd Fujinaka (todd.fujinaka@intel.com)
>> >
>> > Thanks for the prompt reply. Dave, could you please provide the requested
>> > information?
>>
>> sent off-list.
>>
>>     Dave
> 
> Whoops. I replied from outlook again.
> 
> I have confirmation that this should be a valid image. The VPD is just a
> series of 3's. There are changes to preboot header, flash and BAR size,
> and as far as I can tell, a nonsense subdevice ID, but this should work.
> 
> What was the original question?
> 
"lspci -vv" complains about an invalid short tag 0x06 and the PCI VPD
code resulted in a stall. So it seems the data doesn't have valid VPD
format as defined in PCI specification.

01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
        Subsystem: Device 1dcf:030a
	...
	        Capabilities: [e0] Vital Product Data
                *Unknown small resource type 06, will not decode more.*

Not sure which method is used by the driver to get the EEPROM content.
For the issue here is relevant what is exposed via PCI VPD.

The related kernel error message has been reported few times, e.g. here:
https://access.redhat.com/solutions/3001451
Only due to a change in kernel code this became a more prominent
issue now.

You say that VPD is just a series of 3's. This may explain why kernel and
tools complain about an invalid VPD format. VPD misses the tag structure.

> Todd Fujinaka <todd.fujinaka@intel.com>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-14 17:07         ` Heiner Kallweit
@ 2021-09-14 21:55           ` Bjorn Helgaas
  2021-09-14 22:06             ` Heiner Kallweit
  0 siblings, 1 reply; 38+ messages in thread
From: Bjorn Helgaas @ 2021-09-14 21:55 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Dave Jones, Linus Torvalds, linux-kernel, linux-pci

On Tue, Sep 14, 2021 at 07:07:40PM +0200, Heiner Kallweit wrote:
> On 14.09.2021 13:26, Bjorn Helgaas wrote:
> > On Tue, Sep 14, 2021 at 08:21:46AM +0200, Heiner Kallweit wrote:
> >> On 14.09.2021 01:46, Bjorn Helgaas wrote:
> > 
> >>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
> >>>  /*
> >>
> >> Leaving the quirks in FIXUP_HEADER stage would have the advantage that for
> >> blacklisted devices the vpd sysfs attribute isn't visibale. The needed
> >> changes to the patch are minimal.
> > 
> > What do you have in mind?  The only thing I can think of would be to
> > add a "pci_dev.no_vpd" bit.  "vpd.cap == 0" means the device has no
> > VPD, and "vpd.len == 0" means we haven't determined the size yet.  All
> > devices start off with vpd.cap == 0 and vpd.len == 0, so a
> > FIXUP_HEADER quirk would have to set a sentinel value or some other
> > bit.
> 
> Why not leave vpd.len == PCI_VPD_SZ_INVALID as sentinel?

Sentinel values aren't really my favorite thing, but it certainly does
have the advantage of hiding the sysfs attribute.

> And one more question: Why do you move the "if (!vpd->cap)" check from
> pci_vpd_read() to pci_read_vpd()? At a first glance I see no benefit.

I'm pretty sure I *had* a reason, but I can't remember right now :(
Moving it sure does uglify pci_read_vpd() and pci_write_vpd(), though.

What do you think of the following?  (This is a diff from v5.15-rc1.)

diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
index 25557b272a4f..4be24890132e 100644
--- a/drivers/pci/vpd.c
+++ b/drivers/pci/vpd.c
@@ -99,6 +99,24 @@ static size_t pci_vpd_size(struct pci_dev *dev)
 	return off ?: PCI_VPD_SZ_INVALID;
 }
 
+static bool pci_vpd_available(struct pci_dev *dev)
+{
+	struct pci_vpd *vpd = &dev->vpd;
+
+	if (!vpd->cap)
+		return false;
+
+	if (vpd->len == 0) {
+		vpd->len = pci_vpd_size(dev);
+		if (vpd->len == PCI_VPD_SZ_INVALID) {
+			vpd->cap = 0;
+			return false;
+		}
+	}
+
+	return true;
+}
+
 /*
  * Wait for last operation to complete.
  * This code has to spin since there is no other notification from the PCI
@@ -145,7 +163,7 @@ static ssize_t pci_vpd_read(struct pci_dev *dev, loff_t pos, size_t count,
 	loff_t end = pos + count;
 	u8 *buf = arg;
 
-	if (!vpd->cap)
+	if (!pci_vpd_available(dev))
 		return -ENODEV;
 
 	if (pos < 0)
@@ -206,7 +224,7 @@ static ssize_t pci_vpd_write(struct pci_dev *dev, loff_t pos, size_t count,
 	loff_t end = pos + count;
 	int ret = 0;
 
-	if (!vpd->cap)
+	if (!pci_vpd_available(dev))
 		return -ENODEV;
 
 	if (pos < 0 || (pos & 3) || (count & 3))
@@ -242,14 +260,11 @@ static ssize_t pci_vpd_write(struct pci_dev *dev, loff_t pos, size_t count,
 
 void pci_vpd_init(struct pci_dev *dev)
 {
+	if (dev->vpd.len == PCI_VPD_SZ_INVALID)
+		return;
+
 	dev->vpd.cap = pci_find_capability(dev, PCI_CAP_ID_VPD);
 	mutex_init(&dev->vpd.lock);
-
-	if (!dev->vpd.len)
-		dev->vpd.len = pci_vpd_size(dev);
-
-	if (dev->vpd.len == PCI_VPD_SZ_INVALID)
-		dev->vpd.cap = 0;
 }
 
 static ssize_t vpd_read(struct file *filp, struct kobject *kobj,
@@ -294,13 +309,14 @@ const struct attribute_group pci_dev_vpd_attr_group = {
 
 void *pci_vpd_alloc(struct pci_dev *dev, unsigned int *size)
 {
-	unsigned int len = dev->vpd.len;
+	unsigned int len;
 	void *buf;
 	int cnt;
 
-	if (!dev->vpd.cap)
+	if (!pci_vpd_available(dev))
 		return ERR_PTR(-ENODEV);
 
+	len = dev->vpd.len;
 	buf = kmalloc(len, GFP_KERNEL);
 	if (!buf)
 		return ERR_PTR(-ENOMEM);

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-14 21:55           ` Bjorn Helgaas
@ 2021-09-14 22:06             ` Heiner Kallweit
  2021-09-14 22:33               ` Dave Jones
  0 siblings, 1 reply; 38+ messages in thread
From: Heiner Kallweit @ 2021-09-14 22:06 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Dave Jones, Linus Torvalds, linux-kernel, linux-pci

On 14.09.2021 23:55, Bjorn Helgaas wrote:
> On Tue, Sep 14, 2021 at 07:07:40PM +0200, Heiner Kallweit wrote:
>> On 14.09.2021 13:26, Bjorn Helgaas wrote:
>>> On Tue, Sep 14, 2021 at 08:21:46AM +0200, Heiner Kallweit wrote:
>>>> On 14.09.2021 01:46, Bjorn Helgaas wrote:
>>>
>>>>> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATTANSIC, PCI_ANY_ID, quirk_blacklist_vpd);
>>>>>  /*
>>>>
>>>> Leaving the quirks in FIXUP_HEADER stage would have the advantage that for
>>>> blacklisted devices the vpd sysfs attribute isn't visibale. The needed
>>>> changes to the patch are minimal.
>>>
>>> What do you have in mind?  The only thing I can think of would be to
>>> add a "pci_dev.no_vpd" bit.  "vpd.cap == 0" means the device has no
>>> VPD, and "vpd.len == 0" means we haven't determined the size yet.  All
>>> devices start off with vpd.cap == 0 and vpd.len == 0, so a
>>> FIXUP_HEADER quirk would have to set a sentinel value or some other
>>> bit.
>>
>> Why not leave vpd.len == PCI_VPD_SZ_INVALID as sentinel?
> 
> Sentinel values aren't really my favorite thing, but it certainly does
> have the advantage of hiding the sysfs attribute.
> 
>> And one more question: Why do you move the "if (!vpd->cap)" check from
>> pci_vpd_read() to pci_read_vpd()? At a first glance I see no benefit.
> 
> I'm pretty sure I *had* a reason, but I can't remember right now :(
> Moving it sure does uglify pci_read_vpd() and pci_write_vpd(), though.
> 
> What do you think of the following?  (This is a diff from v5.15-rc1.)
> 
> diff --git a/drivers/pci/vpd.c b/drivers/pci/vpd.c
> index 25557b272a4f..4be24890132e 100644
> --- a/drivers/pci/vpd.c
> +++ b/drivers/pci/vpd.c
> @@ -99,6 +99,24 @@ static size_t pci_vpd_size(struct pci_dev *dev)
>  	return off ?: PCI_VPD_SZ_INVALID;
>  }
>  
> +static bool pci_vpd_available(struct pci_dev *dev)
> +{
> +	struct pci_vpd *vpd = &dev->vpd;
> +
> +	if (!vpd->cap)
> +		return false;
> +
> +	if (vpd->len == 0) {
> +		vpd->len = pci_vpd_size(dev);
> +		if (vpd->len == PCI_VPD_SZ_INVALID) {
> +			vpd->cap = 0;
> +			return false;
> +		}
> +	}
> +
> +	return true;
> +}
> +
>  /*
>   * Wait for last operation to complete.
>   * This code has to spin since there is no other notification from the PCI
> @@ -145,7 +163,7 @@ static ssize_t pci_vpd_read(struct pci_dev *dev, loff_t pos, size_t count,
>  	loff_t end = pos + count;
>  	u8 *buf = arg;
>  
> -	if (!vpd->cap)
> +	if (!pci_vpd_available(dev))
>  		return -ENODEV;
>  
>  	if (pos < 0)
> @@ -206,7 +224,7 @@ static ssize_t pci_vpd_write(struct pci_dev *dev, loff_t pos, size_t count,
>  	loff_t end = pos + count;
>  	int ret = 0;
>  
> -	if (!vpd->cap)
> +	if (!pci_vpd_available(dev))
>  		return -ENODEV;
>  
>  	if (pos < 0 || (pos & 3) || (count & 3))
> @@ -242,14 +260,11 @@ static ssize_t pci_vpd_write(struct pci_dev *dev, loff_t pos, size_t count,
>  
>  void pci_vpd_init(struct pci_dev *dev)
>  {
> +	if (dev->vpd.len == PCI_VPD_SZ_INVALID)
> +		return;
> +
>  	dev->vpd.cap = pci_find_capability(dev, PCI_CAP_ID_VPD);
>  	mutex_init(&dev->vpd.lock);
> -
> -	if (!dev->vpd.len)
> -		dev->vpd.len = pci_vpd_size(dev);
> -
> -	if (dev->vpd.len == PCI_VPD_SZ_INVALID)
> -		dev->vpd.cap = 0;
>  }
>  
>  static ssize_t vpd_read(struct file *filp, struct kobject *kobj,
> @@ -294,13 +309,14 @@ const struct attribute_group pci_dev_vpd_attr_group = {
>  
>  void *pci_vpd_alloc(struct pci_dev *dev, unsigned int *size)
>  {
> -	unsigned int len = dev->vpd.len;
> +	unsigned int len;
>  	void *buf;
>  	int cnt;
>  
> -	if (!dev->vpd.cap)
> +	if (!pci_vpd_available(dev))
>  		return ERR_PTR(-ENODEV);
>  
> +	len = dev->vpd.len;
>  	buf = kmalloc(len, GFP_KERNEL);
>  	if (!buf)
>  		return ERR_PTR(-ENOMEM);
> 

This looks very good to me.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-14 22:06             ` Heiner Kallweit
@ 2021-09-14 22:33               ` Dave Jones
  2021-09-15 21:31                 ` Bjorn Helgaas
  0 siblings, 1 reply; 38+ messages in thread
From: Dave Jones @ 2021-09-14 22:33 UTC (permalink / raw)
  To: Heiner Kallweit; +Cc: Bjorn Helgaas, Linus Torvalds, linux-kernel, linux-pci

On Wed, Sep 15, 2021 at 12:06:33AM +0200, Heiner Kallweit wrote:

 > > What do you think of the following?  (This is a diff from v5.15-rc1.)
 > > 
 > 
 > This looks very good to me.

fwiw, I tested this too, and it still works.

	Dave

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-14 21:51                     ` Heiner Kallweit
@ 2021-09-15 14:18                       ` Hisashi T Fujinaka
  2021-09-15 16:05                         ` Heiner Kallweit
  0 siblings, 1 reply; 38+ messages in thread
From: Hisashi T Fujinaka @ 2021-09-15 14:18 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Dave Jones, linux-pci, Linux Kernel Mailing List,
	intel-wired-lan, Bjorn Helgaas

On Tue, 14 Sep 2021, Heiner Kallweit wrote:

> On 14.09.2021 22:00, Hisashi T Fujinaka wrote:
>> On Tue, 14 Sep 2021, Dave Jones wrote:
>>
>>> On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote:
>>>
>>>>> Sorry to reply from my personal account. If I did it from my work
>>>>> account I'd be top-posting because of Outlook and that goes over like a
>>>>> lead balloon.
>>>>>
>>>>> Anyway, can you send us a dump of your eeprom using ethtool -e? You can
>>>>> either send it via a bug on e1000.sourceforge.net or try sending it to
>>>>> todd.fujinaka@intel.com
>>>>>
>>>>> The other thing is I'm wondering is what the subvendor device ID you
>>>>> have is referring to because it's not in the pci database. Some ODMs
>>>>> like getting creative with what they put in the NVM.
>>>>>
>>>>> Todd Fujinaka (todd.fujinaka@intel.com)
>>>>
>>>> Thanks for the prompt reply. Dave, could you please provide the requested
>>>> information?
>>>
>>> sent off-list.
>>>
>>>     Dave
>>
>> Whoops. I replied from outlook again.
>>
>> I have confirmation that this should be a valid image. The VPD is just a
>> series of 3's. There are changes to preboot header, flash and BAR size,
>> and as far as I can tell, a nonsense subdevice ID, but this should work.
>>
>> What was the original question?
>>
> "lspci -vv" complains about an invalid short tag 0x06 and the PCI VPD
> code resulted in a stall. So it seems the data doesn't have valid VPD
> format as defined in PCI specification.
>
> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>        Subsystem: Device 1dcf:030a
> 	...
> 	        Capabilities: [e0] Vital Product Data
>                *Unknown small resource type 06, will not decode more.*
>
> Not sure which method is used by the driver to get the EEPROM content.
> For the issue here is relevant what is exposed via PCI VPD.
>
> The related kernel error message has been reported few times, e.g. here:
> https://access.redhat.com/solutions/3001451
> Only due to a change in kernel code this became a more prominent
> issue now.
>
> You say that VPD is just a series of 3's. This may explain why kernel and
> tools complain about an invalid VPD format. VPD misses the tag structure.

I think I conflated two issues and yours may not be the one with the
weird Amazon NIC. In any case, the VPD does not match the spec and two
people have confirmed it's just full of 3's. With the bogus subvendor
ID, I'm thinking this is not an Intel NIC.

Next step is to contact whoever made the NIC and ask them for guidance.

Todd Fujinaka <todd.fujinaka@intel.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-15 14:18                       ` Hisashi T Fujinaka
@ 2021-09-15 16:05                         ` Heiner Kallweit
  2021-09-15 16:16                           ` Hisashi T Fujinaka
  0 siblings, 1 reply; 38+ messages in thread
From: Heiner Kallweit @ 2021-09-15 16:05 UTC (permalink / raw)
  To: Hisashi T Fujinaka
  Cc: Dave Jones, linux-pci, Linux Kernel Mailing List,
	intel-wired-lan, Bjorn Helgaas

On 15.09.2021 16:18, Hisashi T Fujinaka wrote:
> On Tue, 14 Sep 2021, Heiner Kallweit wrote:
> 
>> On 14.09.2021 22:00, Hisashi T Fujinaka wrote:
>>> On Tue, 14 Sep 2021, Dave Jones wrote:
>>>
>>>> On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote:
>>>>
>>>>>> Sorry to reply from my personal account. If I did it from my work
>>>>>> account I'd be top-posting because of Outlook and that goes over like a
>>>>>> lead balloon.
>>>>>>
>>>>>> Anyway, can you send us a dump of your eeprom using ethtool -e? You can
>>>>>> either send it via a bug on e1000.sourceforge.net or try sending it to
>>>>>> todd.fujinaka@intel.com
>>>>>>
>>>>>> The other thing is I'm wondering is what the subvendor device ID you
>>>>>> have is referring to because it's not in the pci database. Some ODMs
>>>>>> like getting creative with what they put in the NVM.
>>>>>>
>>>>>> Todd Fujinaka (todd.fujinaka@intel.com)
>>>>>
>>>>> Thanks for the prompt reply. Dave, could you please provide the requested
>>>>> information?
>>>>
>>>> sent off-list.
>>>>
>>>>     Dave
>>>
>>> Whoops. I replied from outlook again.
>>>
>>> I have confirmation that this should be a valid image. The VPD is just a
>>> series of 3's. There are changes to preboot header, flash and BAR size,
>>> and as far as I can tell, a nonsense subdevice ID, but this should work.
>>>
>>> What was the original question?
>>>
>> "lspci -vv" complains about an invalid short tag 0x06 and the PCI VPD
>> code resulted in a stall. So it seems the data doesn't have valid VPD
>> format as defined in PCI specification.
>>
>> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>>        Subsystem: Device 1dcf:030a
>>     ...
>>             Capabilities: [e0] Vital Product Data
>>                *Unknown small resource type 06, will not decode more.*
>>
>> Not sure which method is used by the driver to get the EEPROM content.
>> For the issue here is relevant what is exposed via PCI VPD.
>>
>> The related kernel error message has been reported few times, e.g. here:
>> https://access.redhat.com/solutions/3001451
>> Only due to a change in kernel code this became a more prominent
>> issue now.
>>
>> You say that VPD is just a series of 3's. This may explain why kernel and
>> tools complain about an invalid VPD format. VPD misses the tag structure.
> 
> I think I conflated two issues and yours may not be the one with the
> weird Amazon NIC. In any case, the VPD does not match the spec and two
> people have confirmed it's just full of 3's. With the bogus subvendor
> ID, I'm thinking this is not an Intel NIC.
> 
> Next step is to contact whoever made the NIC and ask them for guidance.
> 
In an earlier mail in this thread was stated that subvendor id is unknown.
Checking here https://pcisig.com/membership/member-companies?combine=1dcf
it says: Beijing Sinead Technology Co., Ltd.

> Todd Fujinaka <todd.fujinaka@intel.com>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-15 16:05                         ` Heiner Kallweit
@ 2021-09-15 16:16                           ` Hisashi T Fujinaka
  2021-09-15 22:32                             ` Bjorn Helgaas
  0 siblings, 1 reply; 38+ messages in thread
From: Hisashi T Fujinaka @ 2021-09-15 16:16 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Dave Jones, linux-pci, Linux Kernel Mailing List,
	intel-wired-lan, Bjorn Helgaas

On Wed, 15 Sep 2021, Heiner Kallweit wrote:

> On 15.09.2021 16:18, Hisashi T Fujinaka wrote:
>> On Tue, 14 Sep 2021, Heiner Kallweit wrote:
>>
>>> On 14.09.2021 22:00, Hisashi T Fujinaka wrote:
>>>> On Tue, 14 Sep 2021, Dave Jones wrote:
>>>>
>>>>> On Tue, Sep 14, 2021 at 07:51:22AM +0200, Heiner Kallweit wrote:
>>>>>
>>>>>>> Sorry to reply from my personal account. If I did it from my work
>>>>>>> account I'd be top-posting because of Outlook and that goes over like a
>>>>>>> lead balloon.
>>>>>>>
>>>>>>> Anyway, can you send us a dump of your eeprom using ethtool -e? You can
>>>>>>> either send it via a bug on e1000.sourceforge.net or try sending it to
>>>>>>> todd.fujinaka@intel.com
>>>>>>>
>>>>>>> The other thing is I'm wondering is what the subvendor device ID you
>>>>>>> have is referring to because it's not in the pci database. Some ODMs
>>>>>>> like getting creative with what they put in the NVM.
>>>>>>>
>>>>>>> Todd Fujinaka (todd.fujinaka@intel.com)
>>>>>>
>>>>>> Thanks for the prompt reply. Dave, could you please provide the requested
>>>>>> information?
>>>>>
>>>>> sent off-list.
>>>>>
>>>>>     Dave
>>>>
>>>> Whoops. I replied from outlook again.
>>>>
>>>> I have confirmation that this should be a valid image. The VPD is just a
>>>> series of 3's. There are changes to preboot header, flash and BAR size,
>>>> and as far as I can tell, a nonsense subdevice ID, but this should work.
>>>>
>>>> What was the original question?
>>>>
>>> "lspci -vv" complains about an invalid short tag 0x06 and the PCI VPD
>>> code resulted in a stall. So it seems the data doesn't have valid VPD
>>> format as defined in PCI specification.
>>>
>>> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>>>        Subsystem: Device 1dcf:030a
>>>     ...
>>>             Capabilities: [e0] Vital Product Data
>>>                *Unknown small resource type 06, will not decode more.*
>>>
>>> Not sure which method is used by the driver to get the EEPROM content.
>>> For the issue here is relevant what is exposed via PCI VPD.
>>>
>>> The related kernel error message has been reported few times, e.g. here:
>>> https://access.redhat.com/solutions/3001451
>>> Only due to a change in kernel code this became a more prominent
>>> issue now.
>>>
>>> You say that VPD is just a series of 3's. This may explain why kernel and
>>> tools complain about an invalid VPD format. VPD misses the tag structure.
>>
>> I think I conflated two issues and yours may not be the one with the
>> weird Amazon NIC. In any case, the VPD does not match the spec and two
>> people have confirmed it's just full of 3's. With the bogus subvendor
>> ID, I'm thinking this is not an Intel NIC.
>>
>> Next step is to contact whoever made the NIC and ask them for guidance.
>>
> In an earlier mail in this thread was stated that subvendor id is unknown.
> Checking here https://pcisig.com/membership/member-companies?combine=1dcf
> it says: Beijing Sinead Technology Co., Ltd.

Huh. I didn't realize there was an official list beyond pciids.ucw.cz.

In any case, that's who you need to talk to about the unlisted (to
Linux) vendor ID and also the odd VPD data.

-- 
Hisashi T Fujinaka - htodd@twofifty.com
BSEE + BSChem + BAEnglish + MSCS + $2.50 = coffee

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-14 22:33               ` Dave Jones
@ 2021-09-15 21:31                 ` Bjorn Helgaas
  0 siblings, 0 replies; 38+ messages in thread
From: Bjorn Helgaas @ 2021-09-15 21:31 UTC (permalink / raw)
  To: Dave Jones, Heiner Kallweit, Linus Torvalds, linux-kernel, linux-pci

On Tue, Sep 14, 2021 at 06:33:27PM -0400, Dave Jones wrote:
> On Wed, Sep 15, 2021 at 12:06:33AM +0200, Heiner Kallweit wrote:
> 
>  > > What do you think of the following?  (This is a diff from v5.15-rc1.)
>  > > 
>  > 
>  > This looks very good to me.
> 
> fwiw, I tested this too, and it still works.

Thanks very much for proactively testing this.  I hated to burden you
before anybody else had looked at it.

I added this to for-linus for v5.15.

Bjorn

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-15 16:16                           ` Hisashi T Fujinaka
@ 2021-09-15 22:32                             ` Bjorn Helgaas
  2021-09-15 23:46                               ` Hisashi T Fujinaka
  0 siblings, 1 reply; 38+ messages in thread
From: Bjorn Helgaas @ 2021-09-15 22:32 UTC (permalink / raw)
  To: Hisashi T Fujinaka
  Cc: Heiner Kallweit, Dave Jones, linux-pci,
	Linux Kernel Mailing List, intel-wired-lan, Bjorn Helgaas

On Wed, Sep 15, 2021 at 09:16:47AM -0700, Hisashi T Fujinaka wrote:
> On Wed, 15 Sep 2021, Heiner Kallweit wrote:
> > On 15.09.2021 16:18, Hisashi T Fujinaka wrote:
> > > On Tue, 14 Sep 2021, Heiner Kallweit wrote:
> > > > On 14.09.2021 22:00, Hisashi T Fujinaka wrote:

> > > > > I have confirmation that this should be a valid image. The
> > > > > VPD is just a series of 3's. There are changes to preboot
> > > > > header, flash and BAR size, and as far as I can tell, a
> > > > > nonsense subdevice ID, but this should work.
> > > > > 
> > > > > What was the original question?
> > > > > 
> > > > "lspci -vv" complains about an invalid short tag 0x06 and the
> > > > PCI VPD code resulted in a stall. So it seems the data doesn't
> > > > have valid VPD format as defined in PCI specification.
> > > > 
> > > > 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
> > > >        Subsystem: Device 1dcf:030a
> > > >     ...
> > > >             Capabilities: [e0] Vital Product Data
> > > >                *Unknown small resource type 06, will not decode more.*
> > > > 
> > > > Not sure which method is used by the driver to get the EEPROM
> > > > content.  For the issue here is relevant what is exposed via
> > > > PCI VPD.
> > > > 
> > > > The related kernel error message has been reported few times,
> > > > e.g. here: https://access.redhat.com/solutions/3001451 Only
> > > > due to a change in kernel code this became a more prominent
> > > > issue now.
> > > > 
> > > > You say that VPD is just a series of 3's. This may explain why
> > > > kernel and tools complain about an invalid VPD format. VPD
> > > > misses the tag structure.
> > > 
> > > I think I conflated two issues and yours may not be the one with the
> > > weird Amazon NIC. In any case, the VPD does not match the spec and two
> > > people have confirmed it's just full of 3's. With the bogus subvendor
> > > ID, I'm thinking this is not an Intel NIC.

A series of 0x03 0x03 0x03 ... bytes would decode as "small items of
type 00", so I assume the VPD contains a series of 0x33 0x33 0x33 ...
bytes.  That would decode to a series of "small items of type 06",
each of length four (one byte for the tag, three bytes of data).

Prior to v5.15, we would complain "invalid short VPD tag 06" and stop
reading.  As of v5.15, I think we'll just keep reading looking for a
valid "end" tag, but we'll never find one.

I think in v5.15 there will be no error message because the series of
four-byte small data items happens to fit exactly in the maximum 32KB
size of VPD and is technically legal syntactic structure, even though
it makes no sense.

But it will be much slower and might account for the boot slowdown
Dave reported.

> > In an earlier mail in this thread was stated that subvendor id is unknown.
> > Checking here https://pcisig.com/membership/member-companies?combine=1dcf
> > it says: Beijing Sinead Technology Co., Ltd.
> 
> Huh. I didn't realize there was an official list beyond pciids.ucw.cz.
> 
> In any case, that's who you need to talk to about the unlisted (to
> Linux) vendor ID and also the odd VPD data.

https://pci-ids.ucw.cz/ is definitely unofficial in the sense that it
is basically crowd-sourced data, not the "official" Vendor IDs
controlled by the PCI SIG.

I submitted an addition to https://pci-ids.ucw.cz/

Bjorn

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-15 22:32                             ` Bjorn Helgaas
@ 2021-09-15 23:46                               ` Hisashi T Fujinaka
  2021-09-17 15:09                                 ` Bjorn Helgaas
  0 siblings, 1 reply; 38+ messages in thread
From: Hisashi T Fujinaka @ 2021-09-15 23:46 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Dave Jones, linux-pci, Linux Kernel Mailing List,
	intel-wired-lan, Bjorn Helgaas, Heiner Kallweit

On Wed, 15 Sep 2021, Bjorn Helgaas wrote:

> On Wed, Sep 15, 2021 at 09:16:47AM -0700, Hisashi T Fujinaka wrote:
>> On Wed, 15 Sep 2021, Heiner Kallweit wrote:
>>> On 15.09.2021 16:18, Hisashi T Fujinaka wrote:
>>>> On Tue, 14 Sep 2021, Heiner Kallweit wrote:
>>>>> On 14.09.2021 22:00, Hisashi T Fujinaka wrote:
>
>>>>>> I have confirmation that this should be a valid image. The
>>>>>> VPD is just a series of 3's. There are changes to preboot
>>>>>> header, flash and BAR size, and as far as I can tell, a
>>>>>> nonsense subdevice ID, but this should work.
>>>>>>
>>>>>> What was the original question?
>>>>>>
>>>>> "lspci -vv" complains about an invalid short tag 0x06 and the
>>>>> PCI VPD code resulted in a stall. So it seems the data doesn't
>>>>> have valid VPD format as defined in PCI specification.
>>>>>
>>>>> 01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>>>>>        Subsystem: Device 1dcf:030a
>>>>>     ...
>>>>>             Capabilities: [e0] Vital Product Data
>>>>>                *Unknown small resource type 06, will not decode more.*
>>>>>
>>>>> Not sure which method is used by the driver to get the EEPROM
>>>>> content.  For the issue here is relevant what is exposed via
>>>>> PCI VPD.
>>>>>
>>>>> The related kernel error message has been reported few times,
>>>>> e.g. here: https://access.redhat.com/solutions/3001451 Only
>>>>> due to a change in kernel code this became a more prominent
>>>>> issue now.
>>>>>
>>>>> You say that VPD is just a series of 3's. This may explain why
>>>>> kernel and tools complain about an invalid VPD format. VPD
>>>>> misses the tag structure.
>>>>
>>>> I think I conflated two issues and yours may not be the one with the
>>>> weird Amazon NIC. In any case, the VPD does not match the spec and two
>>>> people have confirmed it's just full of 3's. With the bogus subvendor
>>>> ID, I'm thinking this is not an Intel NIC.
>
> A series of 0x03 0x03 0x03 ... bytes would decode as "small items of
> type 00", so I assume the VPD contains a series of 0x33 0x33 0x33 ...
> bytes.  That would decode to a series of "small items of type 06",
> each of length four (one byte for the tag, three bytes of data).
>
> Prior to v5.15, we would complain "invalid short VPD tag 06" and stop
> reading.  As of v5.15, I think we'll just keep reading looking for a
> valid "end" tag, but we'll never find one.
>
> I think in v5.15 there will be no error message because the series of
> four-byte small data items happens to fit exactly in the maximum 32KB
> size of VPD and is technically legal syntactic structure, even though
> it makes no sense.
>
> But it will be much slower and might account for the boot slowdown
> Dave reported.
>
>>> In an earlier mail in this thread was stated that subvendor id is unknown.
>>> Checking here https://pcisig.com/membership/member-companies?combine=1dcf
>>> it says: Beijing Sinead Technology Co., Ltd.
>>
>> Huh. I didn't realize there was an official list beyond pciids.ucw.cz.
>>
>> In any case, that's who you need to talk to about the unlisted (to
>> Linux) vendor ID and also the odd VPD data.
>
> https://pci-ids.ucw.cz/ is definitely unofficial in the sense that it
> is basically crowd-sourced data, not the "official" Vendor IDs
> controlled by the PCI SIG.
>
> I submitted an addition to https://pci-ids.ucw.cz/
>
> Bjorn

Just for my edumacation, do they keep track of device IDs, subvendor IDs
(which are probably just the same as vendor IDs), and subdevice IDs in
the PCI SIG? Or even the branding strings?

Todd Fujinaka <todd.fujinaka@intel.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [Intel-wired-lan] Linux 5.15-rc1 - 82599ES VPD access isue
  2021-09-15 23:46                               ` Hisashi T Fujinaka
@ 2021-09-17 15:09                                 ` Bjorn Helgaas
  0 siblings, 0 replies; 38+ messages in thread
From: Bjorn Helgaas @ 2021-09-17 15:09 UTC (permalink / raw)
  To: Hisashi T Fujinaka
  Cc: Dave Jones, linux-pci, Linux Kernel Mailing List,
	intel-wired-lan, Bjorn Helgaas, Heiner Kallweit

On Wed, Sep 15, 2021 at 04:46:54PM -0700, Hisashi T Fujinaka wrote:
> On Wed, 15 Sep 2021, Bjorn Helgaas wrote:
> 
> > On Wed, Sep 15, 2021 at 09:16:47AM -0700, Hisashi T Fujinaka wrote:
> > > On Wed, 15 Sep 2021, Heiner Kallweit wrote:

> > > > In an earlier mail in this thread was stated that subvendor id is unknown.
> > > > Checking here https://pcisig.com/membership/member-companies?combine=1dcf
> > > > it says: Beijing Sinead Technology Co., Ltd.
> > > 
> > > Huh. I didn't realize there was an official list beyond pciids.ucw.cz.
> > > 
> > > In any case, that's who you need to talk to about the unlisted (to
> > > Linux) vendor ID and also the odd VPD data.
> > 
> > https://pci-ids.ucw.cz/ is definitely unofficial in the sense that it
> > is basically crowd-sourced data, not the "official" Vendor IDs
> > controlled by the PCI SIG.
> > 
> > I submitted an addition to https://pci-ids.ucw.cz/
> > 
> > Bjorn
> 
> Just for my edumacation, do they keep track of device IDs, subvendor IDs
> (which are probably just the same as vendor IDs), and subdevice IDs in
> the PCI SIG? Or even the branding strings?

The PCI SIG does not manage Device IDs.  That's the responsibility of
the vendor.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-13  3:57 ` Guenter Roeck
@ 2021-09-28 23:18   ` Michael S. Tsirkin
  2021-09-30 13:44     ` Guenter Roeck
  0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2021-09-28 23:18 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: Linux Kernel Mailing List

On Sun, Sep 12, 2021 at 08:57:50PM -0700, Guenter Roeck wrote:
> The qemu runtime failure bisects to commit 694a1116b405 ("virtio: Bind
> virtio device to device-tree node"), and reverting that commit fixes the
> problem.  With that patch applied, the virtio block device does not
> instantiate on sparc64. This results in a crash since that is where the
> test is trying to boot from.
> 
> Good news is that I don't see any new runtime warnings.
> 
> Guenter

I think the fix is now merged by Linus - could you please try it out and
confirm that it's ok?

Thanks a lot for the testing!

-- 
MST


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: Linux 5.15-rc1
  2021-09-28 23:18   ` Michael S. Tsirkin
@ 2021-09-30 13:44     ` Guenter Roeck
  0 siblings, 0 replies; 38+ messages in thread
From: Guenter Roeck @ 2021-09-30 13:44 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Linux Kernel Mailing List

On Tue, Sep 28, 2021 at 07:18:41PM -0400, Michael S. Tsirkin wrote:
> On Sun, Sep 12, 2021 at 08:57:50PM -0700, Guenter Roeck wrote:
> > The qemu runtime failure bisects to commit 694a1116b405 ("virtio: Bind
> > virtio device to device-tree node"), and reverting that commit fixes the
> > problem.  With that patch applied, the virtio block device does not
> > instantiate on sparc64. This results in a crash since that is where the
> > test is trying to boot from.
> > 
> > Good news is that I don't see any new runtime warnings.
> > 
> > Guenter
> 
> I think the fix is now merged by Linus - could you please try it out and
> confirm that it's ok?

Yes, it is.

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2021-09-30 13:44 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-12 23:58 Linux 5.15-rc1 Linus Torvalds
2021-09-13  3:57 ` Guenter Roeck
2021-09-28 23:18   ` Michael S. Tsirkin
2021-09-30 13:44     ` Guenter Roeck
2021-09-13 14:18 ` Dave Jones
2021-09-13 18:59   ` Heiner Kallweit
2021-09-13 19:51     ` Linus Torvalds
2021-09-13 20:09       ` Bjorn Helgaas
2021-09-13 20:11       ` Heiner Kallweit
2021-09-13 20:15         ` Linus Torvalds
2021-09-13 20:15     ` Dave Jones
2021-09-13 20:22       ` Heiner Kallweit
2021-09-13 20:32         ` Dave Jones
2021-09-13 20:44           ` Linux 5.15-rc1 - 82599ES VPD access isue Heiner Kallweit
2021-09-13 23:32             ` [Intel-wired-lan] " Hisashi T Fujinaka
2021-09-14  5:51               ` Heiner Kallweit
2021-09-14 14:24                 ` Dave Jones
2021-09-14 18:28                   ` Fujinaka, Todd
2021-09-14 20:00                   ` Hisashi T Fujinaka
2021-09-14 21:51                     ` Heiner Kallweit
2021-09-15 14:18                       ` Hisashi T Fujinaka
2021-09-15 16:05                         ` Heiner Kallweit
2021-09-15 16:16                           ` Hisashi T Fujinaka
2021-09-15 22:32                             ` Bjorn Helgaas
2021-09-15 23:46                               ` Hisashi T Fujinaka
2021-09-17 15:09                                 ` Bjorn Helgaas
2021-09-13 20:32       ` Linux 5.15-rc1 Heiner Kallweit
2021-09-13 20:36         ` Linus Torvalds
2021-09-13 20:41         ` Dave Jones
2021-09-13 23:46   ` Bjorn Helgaas
2021-09-14  0:39     ` Dave Jones
2021-09-14  6:21     ` Heiner Kallweit
2021-09-14 11:26       ` Bjorn Helgaas
2021-09-14 17:07         ` Heiner Kallweit
2021-09-14 21:55           ` Bjorn Helgaas
2021-09-14 22:06             ` Heiner Kallweit
2021-09-14 22:33               ` Dave Jones
2021-09-15 21:31                 ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).