From: Yongji Xie <email@example.com>
To: Robin Murphy <firstname.lastname@example.org>
Cc: kvm <email@example.com>, "Michael S. Tsirkin" <firstname.lastname@example.org>,
	"Jason Wang" <email@example.com>, virtualization <firstname.lastname@example.org>,
	"Christian Brauner" <email@example.com>, "Jonathan Corbet" <firstname.lastname@example.org>,
	"Matthew Wilcox" <email@example.com>, "Christoph Hellwig" <firstname.lastname@example.org>,
	"Dan Carpenter" <email@example.com>, "Stefano Garzarella" <firstname.lastname@example.org>,
	"Liu Xiaodong" <email@example.com>, "Joe Perches" <firstname.lastname@example.org>,
	"Al Viro" <email@example.com>, "Stefan Hajnoczi" <firstname.lastname@example.org>,
	email@example.com, "Jens Axboe" <firstname.lastname@example.org>, "He Zhe" <email@example.com>,
	"Greg KH" <firstname.lastname@example.org>, "Randy Dunlap" <email@example.com>,
	linux-kernel <firstname.lastname@example.org>, email@example.com,
	firstname.lastname@example.org, email@example.com, firstname.lastname@example.org,
	"Mika Penttilä" <email@example.com>
Subject: Re: [PATCH v10 01/17] iova: Export alloc_iova_fast() and free_iova_fast()
Date: Thu, 5 Aug 2021 20:34:01 +0800
Message-ID: <CACycT3vARzvd4-dkZhDHqUkeYoSxTa2ty0z0ivE1znGtifirstname.lastname@example.org>
In-Reply-To: <email@example.com>

On Wed, Aug 4, 2021 at 11:43 PM Robin Murphy <firstname.lastname@example.org> wrote:
>
> On 2021-08-04 06:02, Yongji Xie wrote:
> > On Tue, Aug 3, 2021 at 6:54 PM Robin Murphy <email@example.com> wrote:
> >>
> >> On 2021-08-03 09:54, Yongji Xie wrote:
> >>> On Tue, Aug 3, 2021 at 3:41 PM Jason Wang <firstname.lastname@example.org> wrote:
> >>>>
> >>>> On 2021/7/29 3:34 PM, Xie Yongji wrote:
> >>>>> Export alloc_iova_fast() and free_iova_fast() so that
> >>>>> some modules can use them to improve IOVA allocation
> >>>>> efficiency.
> >>>>
> >>>> It's better to explain why alloc_iova() is not sufficient here.
> >>>
> >>> Fine.
> >>
> >> What I fail to understand from the later patches is what the IOVA
> >> domain actually represents. If the "device" is a userspace process
> >> then logically the "IOVA" would be the userspace address, so
> >> presumably somewhere you're having to translate between this
> >> arbitrary address space and actual usable addresses - if you're
> >> worried about efficiency, surely it would be even better not to do
> >> that?
> >
> > Yes, the userspace daemon needs to translate the "IOVA" in a DMA
> > descriptor to a VA (from mmap(2)). But this doesn't actually affect
> > performance, since it's an identity mapping in most cases.
>
> I'm not familiar with the vhost_iotlb stuff, but it looks
> suspiciously like you're walking yet another tree to make those
> translations. Even if the buffer can be mapped all at once with a
> fixed offset such that each DMA mapping call doesn't need a lookup
> for each individual "IOVA" - that might be what's happening already,
> but it's a bit hard to follow just reading the patches in my mail
> client - vhost_iotlb_add_range() doesn't look like it's super-cheap
> to call, and you're serialising on a lock for that.

Yes, that's true. Since the software IOTLB is not used in the VM case,
we need a unified way (vhost_iotlb) to manage the IOVA mapping for
both the VM and container cases.

> My main point, though, is that if you've already got something else
> keeping track of the actual addresses, then the way you're using an
> iova_domain appears to be something you could do with a trivial
> bitmap allocator. That's why I don't buy the efficiency argument.
> The main design points of the IOVA allocator are to manage large
> address spaces while trying to maximise spatial locality to minimise
> the underlying pagetable usage, and allocating with a flexible limit
> to support multiple devices with different addressing capabilities
> in the same address space. If none of those aspects is relevant to
> the use-case - which AFAICS appears to be true here - then as a
> general-purpose resource allocator it's rubbish, with an
> unreasonably massive memory overhead, and there are many, many
> better choices.
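As a purely illustrative aside, a minimal sketch of the "trivial
bitmap allocator" Robin describes - one bit per page-sized slot in the
bounce buffer. All names here are hypothetical, not taken from the
VDUSE series:

#include <linux/bitmap.h>
#include <linux/spinlock.h>

/* Hypothetical slot allocator: one bit per PAGE_SIZE slot. */
struct bounce_slots {
	unsigned long *map;	/* allocation bitmap */
	unsigned long nr_slots;	/* total slots in the buffer */
	spinlock_t lock;
};

/* Find and claim @nr contiguous free slots; returns first index or -1. */
static long bounce_slots_alloc(struct bounce_slots *bs, unsigned int nr)
{
	unsigned long start;

	spin_lock(&bs->lock);
	start = bitmap_find_next_zero_area(bs->map, bs->nr_slots, 0, nr, 0);
	if (start >= bs->nr_slots) {
		spin_unlock(&bs->lock);
		return -1;
	}
	bitmap_set(bs->map, start, nr);
	spin_unlock(&bs->lock);
	return start;
}

static void bounce_slots_free(struct bounce_slots *bs,
			      unsigned long start, unsigned int nr)
{
	spin_lock(&bs->lock);
	bitmap_clear(bs->map, start, nr);
	spin_unlock(&bs->lock);
}

The entire allocator state is one bit per slot, which is the kind of
memory-overhead difference Robin is pointing at.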
OK, I get your point. Actually we used the genpool allocator in an
early version. Maybe we can fall back to using it.

> FWIW I've recently started thinking about moving all the caching
> stuff out of iova_domain and into the iommu-dma layer, since it's
> now a giant waste of space for all the other current IOVA users.
>
> >> Presumably userspace doesn't have any concern about alignment and
> >> the things we have to worry about for the DMA API in general, so
> >> it's pretty much just allocating slots in a buffer, and there are
> >> far more effective ways to do that than a full-blown address
> >> space manager.
> >
> > Considering IOVA allocation efficiency, I think the IOVA allocator
> > is better here. In most cases, we don't even need to hold a spin
> > lock during IOVA allocation.
> >
> >> If you're going to reuse any infrastructure I'd have expected it
> >> to be SWIOTLB rather than the IOVA allocator. Because, y'know,
> >> you're *literally implementing a software I/O TLB* ;)
> >
> > But actually what we can reuse in SWIOTLB is the IOVA allocator.
>
> Huh? Those are completely unrelated and orthogonal things - SWIOTLB
> does not use an external allocator (see find_slots()). By SWIOTLB I
> mean specifically the library itself, not dma-direct or any of the
> other users built around it. The functionality for managing slots in
> a buffer and bouncing data in and out can absolutely be reused -
> that's why users like the Xen and iommu-dma code *are* reusing it
> instead of open-coding their own versions.

I see. The slot management in SWIOTLB is actually what I meant by
"IOVA allocator".

> > And the IOVA management in SWIOTLB is not what we want. For
> > example, SWIOTLB allocates and uses contiguous memory for
> > bouncing, which is not necessary in the VDUSE case.
>
> alloc_iova() allocates a contiguous (in IOVA address) region of
> space. In vduse_domain_map_page() you use it to allocate a
> contiguous region of space from your bounce buffer. Can you clarify
> how that is fundamentally different from allocating a contiguous
> region of space from a bounce buffer? Nobody's saying the underlying
> implementation details of where the buffer itself comes from can't
> be tweaked.

I mean physically contiguous memory here. We currently allocate the
bounce pages one by one rather than allocating a bunch of physically
contiguous memory at once, which is not friendly to a userspace
device.
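To make the "bounce pages one by one" point concrete, a hypothetical
sketch (names invented for illustration, not the actual VDUSE code):
the per-domain bounce buffer can be a sparse array of order-0 pages
populated on first use, so no physically contiguous allocation is
ever required:

#include <linux/gfp.h>
#include <linux/mm_types.h>

/* Hypothetical per-domain state: one entry per bounce-buffer page. */
struct bounce_domain {
	struct page **bounce_pages;	/* sparse; NULL until first use */
};

/*
 * Look up, or lazily allocate, the bounce page backing page index
 * @pg_idx. Locking is elided for brevity; a real version would need
 * to guard against concurrent populate.
 */
static struct page *bounce_page_get(struct bounce_domain *dom,
				    unsigned long pg_idx)
{
	struct page *page = dom->bounce_pages[pg_idx];

	if (!page) {
		/* Order-0: any free page will do, no contiguity needed. */
		page = alloc_page(GFP_ATOMIC);
		if (!page)
			return NULL;
		dom->bounce_pages[pg_idx] = page;
	}
	return page;
}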
> > And VDUSE needs coherent mapping, which is not supported by the
> > SWIOTLB. Besides, the SWIOTLB works in singleton mode (designed
> > for a platform IOMMU), but VDUSE is based on an on-chip IOMMU
> > (which supports multiple instances).
>
> That's not entirely true - the IOMMU bounce buffering scheme
> introduced in intel-iommu and now moved into the iommu-dma layer was
> already a step towards something conceptually similar. It does still
> rely on stealing the underlying pages from the global SWIOTLB pool
> at the moment, but the bouncing is effectively done in a
> per-IOMMU-domain context.
>
> The next step is currently queued in linux-next, wherein we can now
> have individual per-device SWIOTLB pools. In fact at that point I
> think you might actually be able to do your thing without
> implementing any special DMA ops at all - you'd need to set up a
> pool for your "device" with force_bounce set, then when you mmap()
> that to userspace, set up dev->dma_range_map to describe an offset
> from the physical address of the buffer to the userspace address,
> and I think dma-direct would be tricked into doing the right thing.
> It's a bit wacky, but it could stand to save a hell of a lot of
> bother.

Cool! I missed this work, sorry. But it looks like its current
version can't meet our needs (e.g. avoiding the use of physically
contiguous memory). So I'd like to consider it as a follow-up
optimization and use a general IOVA allocator in this initial
version. The IOVA allocator would still be needed for coherent
mappings (vduse_domain_alloc_coherent() and
vduse_domain_free_coherent()) after we reuse the SWIOTLB.

> Finally, enhancing SWIOTLB to cope with virtually-mapped buffers
> that don't have to be physically contiguous is a future improvement
> which I think could benefit various use-cases - indeed it's possibly
> already on the table for IOMMU bounce pages - so it would probably
> be welcome in general.

Yes, it's indeed needed by VDUSE. But I'm not sure whether it would
be needed by other drivers. It looks like we'd need
swiotlb_tbl_map_single() to return a virtual address and introduce
some way to let the caller do the translation between VA and PA.

Thanks,
Yongji
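As a rough sketch of Robin's dev->dma_range_map suggestion above -
hedged heavily, since it assumes the per-device SWIOTLB pool work he
says was queued in linux-next at the time; the helper name and its
caller-supplied values are illustrative only:

#include <linux/dma-map-ops.h>

/*
 * Hypothetical setup: install a single-entry dma_range_map so that
 * dma-direct translates between the bounce buffer's physical range
 * [buf_phys, buf_phys + size) and the userspace-visible "IOVA" range
 * [user_base, user_base + size) with a fixed offset.
 */
static int vduse_set_user_offset(struct device *dev, phys_addr_t buf_phys,
				 dma_addr_t user_base, u64 size)
{
	return dma_direct_set_offset(dev, buf_phys, user_base, size);
}

With force_bounce set on the device's SWIOTLB pool, dma-direct would
then bounce into that buffer and report addresses already shifted
into the userspace range - the "tricked into doing the right thing"
part of Robin's suggestion.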
Thread overview: 55+ messages

2021-07-29  7:34 [PATCH v10 00/17] Introduce VDUSE - vDPA Device in Userspace Xie Yongji
2021-07-29  7:34 ` [PATCH v10 01/17] iova: Export alloc_iova_fast() and free_iova_fast() Xie Yongji
2021-08-03  7:41   ` Jason Wang
2021-08-03  7:41   ` Jason Wang
2021-08-03  8:54   ` Yongji Xie
2021-08-03 10:53   ` Robin Murphy
2021-08-04  5:02   ` Yongji Xie
2021-08-04 15:43   ` Robin Murphy
2021-08-05 12:34   ` Yongji Xie [this message]
2021-08-05 13:31   ` Jason Wang
2021-08-09  5:56   ` Yongji Xie
2021-08-10  3:02   ` Jason Wang
2021-08-10  7:43   ` Yongji Xie
2021-07-29  7:34 ` [PATCH v10 02/17] file: Export receive_fd() to modules Xie Yongji
2021-08-03  7:45   ` Jason Wang
2021-08-03  9:01   ` Yongji Xie
2021-08-04  8:27   ` Jason Wang
2021-07-29  7:34 ` [PATCH v10 03/17] vdpa: Fix code indentation Xie Yongji
2021-08-03  7:50   ` Jason Wang
2021-08-03  9:13   ` Yongji Xie
2021-07-29  7:34 ` [PATCH v10 04/17] vdpa: Fail the vdpa_reset() if fail to set device status to zero Xie Yongji
2021-08-03  7:58   ` Jason Wang
2021-08-03  9:31   ` Yongji Xie
2021-08-04  8:30   ` Jason Wang
2021-07-29  7:34 ` [PATCH v10 05/17] vhost-vdpa: Fail the vhost_vdpa_set_status() on reset failure Xie Yongji
2021-08-03  8:10   ` Jason Wang
2021-08-03  9:50   ` Yongji Xie
2021-08-04  8:33   ` Jason Wang
2021-07-29  7:34 ` [PATCH v10 06/17] vhost-vdpa: Handle the failure of vdpa_reset() Xie Yongji
2021-07-29  7:34 ` [PATCH v10 07/17] virtio: Don't set FAILED status bit on device index allocation failure Xie Yongji
2021-08-03  8:02   ` Jason Wang
2021-08-03  9:17   ` Yongji Xie
2021-07-29  7:34 ` [PATCH v10 08/17] virtio_config: Add a return value to reset function Xie Yongji
2021-07-29  7:34 ` [PATCH v10 09/17] virtio-vdpa: Handle the failure of vdpa_reset() Xie Yongji
2021-07-29  7:34 ` [PATCH v10 10/17] virtio: Handle device reset failure in register_virtio_device() Xie Yongji
2021-08-03  8:09   ` Jason Wang
2021-08-03  9:38   ` Yongji Xie
2021-08-04  8:32   ` Jason Wang
2021-08-04  8:50   ` Yongji Xie
2021-08-04  8:54   ` Jason Wang
2021-08-04  9:07   ` Yongji Xie
2021-08-05  7:12   ` Jason Wang
2021-07-29  7:34 ` [PATCH v10 11/17] vhost-iotlb: Add an opaque pointer for vhost IOTLB Xie Yongji
2021-07-29  7:34 ` [PATCH v10 12/17] vdpa: Add an opaque pointer for vdpa_config_ops.dma_map() Xie Yongji
2021-07-29  7:34 ` [PATCH v10 13/17] vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap() Xie Yongji
2021-07-29  7:35 ` [PATCH v10 14/17] vdpa: Support transferring virtual addressing during DMA mapping Xie Yongji
2021-07-29  7:35 ` [PATCH v10 15/17] vduse: Implement an MMU-based software IOTLB Xie Yongji
2021-07-29  7:35 ` [PATCH v10 16/17] vduse: Introduce VDUSE - vDPA Device in Userspace Xie Yongji
2021-07-29  9:00   ` Greg KH
2021-07-29  9:57   ` Yongji Xie
2021-08-03  7:30   ` Jason Wang
2021-08-03  8:39   ` Yongji Xie
2021-07-29  7:35 ` [PATCH v10 17/17] Documentation: Add documentation for VDUSE Xie Yongji
2021-08-03  7:35   ` Jason Wang
2021-08-03  8:52   ` Yongji Xie