LKML Archive on
help / color / mirror / Atom feed
From: David Hildenbrand <>
To: Dan Williams <>
Cc: Linux Kernel Mailing List <>,
	Arnd Bergmann <>,
	Greg Kroah-Hartman <>,
	"Michael S. Tsirkin" <>,
	Jason Wang <>,
	"Rafael J. Wysocki" <>,
	Andrew Morton <>,
	Hanjun Guo <>,
	Andy Shevchenko <>,,
	Linux MM <>
Subject: Re: [PATCH v2 1/3] /dev/mem: disallow access to explicitly excluded system RAM regions
Date: Wed, 25 Aug 2021 19:27:20 +0200	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

On 25.08.21 19:07, Dan Williams wrote:
> On Wed, Aug 25, 2021 at 12:23 AM David Hildenbrand <> wrote:
>> On 25.08.21 02:58, Dan Williams wrote:
>>> On Mon, Aug 16, 2021 at 7:25 AM David Hildenbrand <> wrote:
>>>> virtio-mem dynamically exposes memory inside a device memory region as
>>>> system RAM to Linux, coordinating with the hypervisor which parts are
>>>> actually "plugged" and consequently usable/accessible. On the one hand, the
>>>> virtio-mem driver adds/removes whole memory blocks, creating/removing busy
>>>> IORESOURCE_SYSTEM_RAM resources, on the other hand, it logically (un)plugs
>>>> memory inside added memory blocks, dynamically either exposing them to
>>>> the buddy or hiding them from the buddy and marking them PG_offline.
>>>> virtio-mem wants to make sure that in a sane environment, nobody
>>>> "accidentially" accesses unplugged memory inside the device managed
>>>> region. After /proc/kcore has been sanitized and /dev/kmem has been
>>>> removed, /dev/mem is the remaining interface that still allows uncontrolled
>>>> access to the device-managed region of virtio-mem devices from user
>>>> space.
>>>> There is no known sane use case for mapping virtio-mem device memory
>>>> via /dev/mem while virtio-mem driver concurrently (un)plugs memory inside
>>>> that region. So once the driver was loaded and detected the device
>>>> along the device-managed region, we just want to disallow any access via
>>>> /dev/mem to it.
>>>> Let's add the basic infrastructure to exclude some physical memory
>>>> regions completely from /dev/mem access, on any architecture and under
>>>> any system configuration (independent of CONFIG_STRICT_DEVMEM and
>>>> independent of "iomem=").
>>> I'm certainly on team "/dev/mem considered harmful", but this approach
>>> feels awkward. It feels wrong for being non-committal about whether
>>> CONFIG_STRICT_DEVMEM is in wide enough use that the safety can be
>>> turned on all the time, and the configuration option dropped, or there
>>> are users clinging onto /dev/mem where they expect to be able to build
>>> a debug kernel to turn all of these restrictions off, even the
>>> virtio-mem ones. This splits the difference and says some /dev/mem
>>> accesses are always disallowed for "reasons", but I could say the same
>>> thing about pmem, there's no sane reason to allow /dev/mem which has
>>> no idea about the responsibilities of properly touching pmem to get
>>> access to it.
>> For virtio-mem, there is no use case *and* access could be harmful; I
>> don't even want to allow if for debugging purposes. If you want to
>> inspect virtio-mem device memory content, use /proc/kcore, which
>> performs proper synchronized access checks. Modifying random virtio-mem
>> memory via /dev/mem in a debug kernel will not be possible: if you
>> really have to play with fire, use kdb or better don't load the
>> virtio-mem driver during boot, such that the kernel won't even be making
>> use of device memory.
>> I don't want people disabling CONFIG_STRICT_DEVMEM, or booting with
>> "iomem=relaxed", and "accidentally" accessing any of virtio-mem memory
>> via /dev/mem, while it gets concurrently plugged/unplugged by the
>> virtio-mem driver. Not even for debugging purposes.
> That sounds more an argument that all of the existing "kernel is using
> this region" cases should become mandatory exclusions. If unloading
> the driver removes the exclusion then that's precisely
> CONFIG_IO_STRICT_DEVMEM. Why is the virtio-mem driver more special
> than any other driver that expects this integrity guarantee?

Unloading the driver will only remove exclusion if the driver can be 
unloaded cleanly -- if there is no memory added to Linux. Similar to 
force-unbinding dax/kmem without offlining memory, the whole device 
range will remain excluded.

(unloading the driver is only even implemented because there is no way 
to not implement it; there is no sane use case for virtio-mem to do that)

There are 2 things that are relevant for virtio-mem memory in regards of 
this series:

1. Kernel is currently using it (added virtio-mem memory). Don't allow 
access. Pretty much like most other things we want to exclude, I agree.

2. Kernel is currently not using it (not yet added virtio-mem memory), 
or not using it right now any more (removed virtio-mem memory). In 
contrast to other devices (DIMM, PMEM, ...) there is no sane use case 
for this memory, because the VM must not use it (as defined in the 

I care about 2) a lot because I don't want people looking at 
/proc/iomem, figuring out that there is something to map. And by the 
time they try to map it via /dev/mem, virtio-mem emoved that memory, yet 
a /dev/mem mapping happened and we have invalid memory access.

Mapping /dev/mem and accidentally being able to read/write virtio-mem 
memory has to be forbidden in sane environments. Force unloading a 
driver or preventing it from loading just to touch virtio-mem memory via 
/dev/mem is not a sane environment, someone is explicitly is asking for 
trouble, which is fine.

>> We disallow mapping to some other regions independent of
>> CONFIG_STRICT_DEVMEM already, so the idea to ignore CONFIG_STRICT_DEVMEM
>> is not completely new:
>> "Note that with PAT support enabled, even in this case there are
>> restrictions on /dev/mem use due to the cache aliasing requirements."
>> Maybe you even want to do something similar with PMEM now that there is
>> infrastructure for it and just avoid having to deal with revoking
>> /dev/mem mappings later.
> That would be like blocking writes to /dev/sda just because a
> filesytem might later be mounted on it. If the /dev/mem access is not
> actively colliding with other kernel operations what business does the
> kernel have saying no?

That the spec defines that that memory must not be read/written, because 
there might not be any memory after all anymore backing the virtio-mem 
device, or there is and the hypervisor will flag you as "malicious" and 
eventually zap the VM. That's different to most physical devices I am 
aware of.

> I'm pushing on this topic because I am also considering an exclusion
> on PCI configuration access to the "DOE mailbox" since it can disrupt
> the kernel's operation, at the same time, root can go change PCI BARs
> to nonsensical values whenever it wants which is also in the category
> of "has no use case && could be harmful".


>> I think there are weird debugging/educational setups [1] that still
>> require CONFIG_STRICT_DEVMEM=n even with iomem=relaxed. Take a look at
>> lib/devmem_is_allowed.c:devmem_is_allowed(), it disallows any access to
>> (what's currently added as) System RAM. It might just do what people
>> want when dealing with system RAM that doesn't suddenly vanish , so I
>> don't ultimately see why we should remove CONFIG_STRICT_DEVMEM=n.
> Yes, I wanted to tease out more of your rationale on where the line
> should be drawn, I think a mostly unfettered /dev/mem mode is here to
> stay.

I could most certainly be convinced to

a) Leave CONFIG_STRICT_DEVMEM=n untouched
b) Restrict what I propose to CONFIG_STRICT_DEVMEM=y.

I could even go ahead and require CONFIG_STRICT_DEVMEM for virtio-mem.


David / dhildenb

  reply	other threads:[~2021-08-25 17:27 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-16 14:25 [PATCH v2 0/3] virtio-mem: disallow mapping virtio-mem memory via /dev/mem David Hildenbrand
2021-08-16 14:25 ` [PATCH v2 1/3] /dev/mem: disallow access to explicitly excluded system RAM regions David Hildenbrand
2021-08-25  0:58   ` Dan Williams
2021-08-25  7:23     ` David Hildenbrand
2021-08-25 17:07       ` Dan Williams
2021-08-25 17:27         ` David Hildenbrand [this message]
2021-08-16 14:25 ` [PATCH v2 2/3] virtio-mem: disallow mapping virtio-mem memory via /dev/mem David Hildenbrand
2021-08-16 14:25 ` [PATCH v2 3/3] kernel/resource: cleanup and optimize iomem_is_exclusive() David Hildenbrand
2021-08-23 19:14 ` [PATCH v2 0/3] virtio-mem: disallow mapping virtio-mem memory via /dev/mem David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \ \ \ \ \
    --subject='Re: [PATCH v2 1/3] /dev/mem: disallow access to explicitly excluded system RAM regions' \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).