LKML Archive on lore.kernel.org
From: Steven Rostedt <rostedt@goodmis.org>
To: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
	"Yordan Karadzhov (VMware)" <y.karadz@gmail.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	viro@zeniv.linux.org.uk, mingo@redhat.com, hagen@jauu.net,
	rppt@kernel.org, akpm@linux-foundation.org, vvs@virtuozzo.com,
	shakeelb@google.com, christian.brauner@ubuntu.com,
	mkoutny@suse.com,
	Linux Containers <containers@lists.linux.dev>
Subject: Re: [RFC PATCH 0/4] namespacefs: Proof-of-Concept
Date: Fri, 19 Nov 2021 11:47:36 -0500
Message-ID: <20211119114736.5d9dcf6c@gandalf.local.home>
In-Reply-To: <f6ca1f5bdb3b516688f291d9685a6a59f49f1393.camel@HansenPartnership.com>

[ Fixed strange email header ]

On Fri, 19 Nov 2021 11:30:43 -0500
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> On Fri, 2021-11-19 at 09:27 -0500, Steven Rostedt wrote:
> > On Fri, 19 Nov 2021 07:45:01 -0500
> > James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> >   
> > > On Thu, 2021-11-18 at 14:24 -0500, Steven Rostedt wrote:  
> > > > On Thu, 18 Nov 2021 12:55:07 -0600
> > > > ebiederm@xmission.com (Eric W. Biederman) wrote:
> > > >     
> > > > > It is not correct to use inode numbers as the actual names for
> > > > > namespaces.
> > > > > 
> > > > > I can not see anything else you can possibly use as names for
> > > > > namespaces.
> > > > 
> > > > This is why we used inode numbers.
> > > >     
> > > > > To allow container migration between machines and similar
> > > > > things, you wind up needing a namespace for your names of
> > > > > namespaces.
> > > > 
> > > > Is this why you say inode numbers are incorrect?    
> > > 
> > > The problem is you seem to have picked on one orchestration system
> > > without considering all the uses of namespaces and how this would
> > > impact them.  So let me explain why inode numbers are incorrect and
> > > it will possibly illuminate some of the cans of worms you're
> > > opening.
> > > 
> > > We have a container checkpoint/restore system called CRIU that can
> > > be used to snapshot the state of a pid subtree and restore it.  It
> > > > can be used for the entire system or a piece of it.  It is also used
> > > by some orchestration systems to live migrate containers.  Any
> > > property of a container system that has meaning must be saved and
> > > restored by CRIU.
> > > 
> > > > The inode number is simply a semi-random number assigned to the
> > > > namespace.  It shows up in /proc/<pid>/ns but nowhere else and
> > > isn't used by anything.  When CRIU migrates or restores containers,
> > > all the namespaces that compose them get different inode values on
> > > the restore.  If you want to make the inode number equivalent to
> > > the container name, they'd have to restore to the previous number
> > > because you've made it a property of the namespace.  The way
> > > everything is set up now, that's just not possible and never will
> > > > be.  Inode numbers are a 32-bit space and can't be globally
> > > unique.  If you want a container name, it will have to be something
> > > like a new UUID and that's the first problem you should tackle.  
> > 
> > > So everyone seems to be all upset about using inode numbers. We could
> > do what Kirill suggested and just create some random UUID and use
> > that. We could have a file in the directory called inode that has the
> > inode number (as that's what both docker and podman use to identify
> > their containers, and it's nice to have something to map back to
> > them).
> > 
> > On checkpoint restore, only the directories that represent the
> > container that migrated matter, so as Kirill said, make sure they get
> > the old UUID name, and expose that as the directory.
> > 
> > > If a container is looking at directories of other containers on the
> > > system, and then it gets migrated to another system, it should be
> > > treated as though those directories were deleted out from under it.
> > 
> > I still do not see what the issue is here.  
> 
> > The issue is you're introducing a new core property for namespaces that
> > they didn't have before.  Everyone has different use cases for containers
> and we need to make sure the new property works with all of them.

What new core property is this? We simply want a way to see what namespaces
are defined in the kernel from a systems point of view. This just defines a
file system to show that.

> 
> Having a "name" for a namespace has been discussed before which is the
> landmine you stepped on when you advocated using the inode number as
> the name, because that's already known to be unworkable.

We don't care what name it is, or if it is a name at all. We just want to
know what is there, without it being hidden behind the tasks that create
namespaces.

> 
> Can we back up and ask what problem you're trying to solve before we
> start introducing new objects like namespace name?  The problem

Again, this has nothing to do with naming namespaces.

> statement just seems to be "Being able to see the structure of the
> namespaces can be very useful in the context of the containerized
> workloads."  which you later expanded on as "trying to add more
> visibility into the working of things like kubernetes".  If you just
> want to see the namespace "tree" you can script that (as root) by
> matching the process tree and the /proc/<pid>/ns changes without
> actually needing to construct it in the kernel.  This can also be done
> without introducing the concept of a namespace name.  However, there is
> a subtlety of doing this matching in the way I described in that you
> don't get proper parenting to the user namespace ownership ... but that
> seems to be something you don't want anyway?

Would a privileged container be able to build a full tree of all the
namespaces currently defined in the system, just by being installed and
reading the host's procfs?  If so, then that's all we want, and we will look
at doing that. But from our initial approach it's not obvious how to do so.
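
For illustration only, the procfs-based matching James describes above
could be sketched roughly as follows. This is not code from this thread;
the function names (parse_ns_link, read_ns, read_ppid, namespace_tree,
scan) are made up for the example. It assumes it runs as root with the
host's procfs visible at /proc, and it infers a namespace's "parent" only
from where the namespace changes along the process tree. That means it
shares the subtlety James mentions (no proper user namespace ownership),
it can be misled by setns() or by re-parented tasks, and it cannot see
namespaces that have no member process (for example ones kept alive only
by a bind mount or an open file descriptor).

```python
#!/usr/bin/env python3
"""Infer a namespace tree from procfs alone (sketch, see caveats above)."""
import os
import re
from collections import defaultdict

# Link targets under /proc/<pid>/ns/ look like "pid:[4026531836]".
NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (type, inode number)."""
    m = NS_LINK.match(target)
    if m is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return m.group(1), int(m.group(2))


def read_ns(pid, ns_type, proc="/proc"):
    """Return the namespace inode of a task, or None if it is unreadable."""
    try:
        target = os.readlink(os.path.join(proc, str(pid), "ns", ns_type))
    except OSError:
        return None  # task exited, or we lack permission
    return parse_ns_link(target)[1]


def read_ppid(pid, proc="/proc"):
    """Return the parent PID from /proc/<pid>/stat, or None."""
    try:
        with open(os.path.join(proc, str(pid), "stat")) as f:
            data = f.read()
    except OSError:
        return None
    # The comm field may contain spaces and ')', so split on the last ')'.
    # What follows is: state ppid pgrp ...
    return int(data.rsplit(")", 1)[1].split()[1])


def namespace_tree(procs):
    """Group tasks by namespace and guess each namespace's parent.

    procs maps pid -> (ppid, ns_inode).  A namespace's parent is taken to
    be the namespace of the parent task at the point where the two differ.
    """
    members = defaultdict(list)
    parent_of = {}
    for pid, (ppid, ns) in sorted(procs.items()):
        members[ns].append(pid)
        parent = procs.get(ppid)
        if parent is not None and parent[1] != ns:
            parent_of.setdefault(ns, parent[1])
    return dict(members), parent_of


def scan(ns_type="pid", proc="/proc"):
    """Collect pid -> (ppid, ns_inode) for every task visible in procfs."""
    try:
        entries = os.listdir(proc)
    except OSError:
        return {}
    procs = {}
    for entry in entries:
        if not entry.isdigit():
            continue
        pid = int(entry)
        ns = read_ns(pid, ns_type, proc)
        ppid = read_ppid(pid, proc)
        if ns is not None and ppid is not None:
            procs[pid] = (ppid, ns)
    return procs


if __name__ == "__main__":
    members, parent_of = namespace_tree(scan())
    for ns, pids in members.items():
        print(f"pid:[{ns}] parent={parent_of.get(ns)} pids={pids}")
```

Passing ns_type="uts" (or any other entry found under /proc/<pid>/ns) to
scan() gives the same grouping for that namespace type.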

-- Steve
