LKML Archive on
help / color / mirror / Atom feed
From: Pierre Peiffer <>
To: "Serge E. Hallyn" <>,
	Andrew Morton <>
Cc: Oren Laadan <>,,,
	Alexey Dobriyan <>
Subject: Re: [PATCH 2.6.24-rc8-mm1 09/15] (RFC) IPC: new kernel API to	change an ID
Date: Fri, 08 Feb 2008 11:12:33 +0100	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

Serge E. Hallyn wrote:
> But note that in either case we need to deal with a bunch of locking.
> So getting back to Pierre's patchset, IIRC 1-8 are cleanups worth
> doing no matter 1.  9-11 sound like they are contentuous until
> we decide whether we want to go with a create_with_id() type approach
> or a set_id().  12 is IMO a good locking cleanup regardless.  13 and
> 15 are contentous until we decide whether we want userspace-controlled
> checkpoint or a one-shot fs.  14 IMO is useful for both c/r approaches.
> Is that pretty accurate?

Ok, so, so far, the discussion stays opened about the new functionalities for c/r.

As there were no objection about the first patches, which rewrite/enhance the
existing code, Andrew, could you consider them (ie patches 1 to 8 of this
series) for inclusion in -mm ? (I mean, as soon as it is possible, as I guess
you're pretty busy for now with the merge for 2.6.25)

If you prefer, I can resend them separately ?



>> It isn't strictly necessary to export a new interface in order to
>> support checkpoint/restart. **. Hence, I think that the speculation
>> "we may need it in the future" is too abstract and isn't a good
>> excuse to commit to a new, currently unneeded, interface.
> OTOH it did succeed in starting some conversation :)
>> Should the
>> need arise in the future, it will be easy to design a new interface
>> (also based on aggregated experience until then).
> What aggregated experience?  We have to start somewhere...
>> ** In fact, the suggested interface may prove problematic (as noted
>> earlier in this thread): if you first create the resource with some
>> arbitrary identifier and then modify the identifier (in our case,
>> IPC id), then the restart procedure is bound to execute sequentially,
>> because of lack of atomicity.
> Hmm?  Lack of atomicity wrt what?  All the tasks being restarted were
> checkpointed at the same time so there will be no conflict in the
> requested IDs, so I don't know what you're referring to.
>> That said, I suggest the following method instead (this is the method
>> we use in Zap to determine the desired resource identifier when a new
>> resource is allocated; I recall that we had discussed it in the past,
>> perhaps the mini-summit in september ?):
>> 1) The process/thread tells the kernel that it wishes to pre-determine
>> the resource identifier of a subsequent call (this can be done via a
>> new syscall, or by writing to /proc/self/...).
>> 2) Each system call that allocates a resource and assigns an identifier
>> is modified to check this per-thread field first; if it is set then
>> it will attempt to allocate that particular value (if already taken,
>> return an error, eg. EBUSY). Otherwise it will proceed as it is today.
> But I thought you were just advocating a one-shot filesystem approach
> for c/r, so we wouldn't be creating the resources piecemeal?
> The /proc/self approach is one way to go, it has been working for LSMs
> this long.  I'd agree that it would be nice if we could have a
> consistent interface to the create_with_id()/set_id() problem.  A first
> shot addressing ipcs and pids would be a great start.
>> (I left out some details - eg. the kernel will keep the desire value
>> on a per-thread field, when it will be reset, whether we want to also
>> tag the field with its type and so on, but the idea is now clear).
>> The main two advantages are that first, we don't need to devise a new
>> method for every syscall that allocates said resources (sigh... just
> Agreed.
>> think of clone() nightmare to add a new argument);
> Yes, and then there will need to be the clone_with_pid() extension on
> top of that.
>> second, the change
>> is incremental: first code the mechanism to set the field, then add
>> support in the IPC subsystem, later in the DEVPTS, then in clone and
>> so forth.
>> Oren.
>> Pierre Peiffer wrote:
>>> Kirill Korotaev wrote:
>>>> Why user space can need this API? for checkpointing only?
>>> I would say "at least for checkpointing"... ;) May be someone else may 
>>> find an
>>> interest about this for something else.
>>> In fact, I'm sure that you have some interest in checkpointing; and thus, 
>>> you
>>> have probably some ideas in mind; but whatever the solution you will 
>>> propose,
>>> I'm pretty sure that I could say the same thing for your solution.
>>> And what I finally think is: even if it's for "checkpointing only", if 
>>> many
>>> people are interested by this, it may be sufficient to push this ?
>>>> Then I would not consider it for inclusion until it is clear how to 
>>>> implement checkpointing.
>>>> As for me personally - I'm against exporting such APIs, since they are 
>>>> not needed in real-life user space applications and maintaining it 
>>>> forever for compatibility doesn't worth it.
>>> Maintaining these patches is not a big deal, really, but this is not the 
>>> main
>>> point; the "need in real life" (1) is in fact the main one, and then, the 
>>> "is
>>> this solution the best one ?" (2) the second one.
>>> About (1), as said in my first mail, as the namespaces and containers are 
>>> being
>>> integrated into the mainline kernel, checkpoint/restart is (or will be) 
>>> the next
>>> need.
>>> About (2), my solution propose to do that, as much as possible from 
>>> userspace,
>>> to minimize the kernel impact. Of course, this is subject to discussion. 
>>> My
>>> opinion is that doing a full checkpoint/restart from kernel space will 
>>> need lot
>>> of new specific and intrusive code; I'm not sure that this will be 
>>> acceptable by
>>> the community. But this is my opinion only. Discusion is opened.
>>>> Also such APIs allow creation of non-GPL checkpointing in user-space, 
>>>> which can be of concern as well.
>>> Honestly, I don't think this really a concern at all. I mean: I've never 
>>> seen
>>> "this allows non-GPL binary and thus, this is bad" as an argument to 
>>> reject a
>>> functionality, but I may be wrong, and thus, it can be discussed as well.
>>> I think the points (1) and (2) as stated above are the key ones.
>>> Pierre
>>>> Kirill
>>>> Pierre Peiffer wrote:
>>>>> Hi again,
>>>>> 	Thinking more about this, I think I must clarify why I choose this way.
>>>>> In fact, the idea of these patches is to provide the missing user APIs 
>>>>> (or
>>>>> extend the existing ones) that allow to set or update _all_ properties 
>>>>> of all
>>>>> IPCs, as needed in the case of the checkpoint/restart of an application 
>>>>> (the
>>>>> current user API does not allow to specify an ID for a created IPC, for
>>>>> example). And this, without changing the existing API of course.
>>>>> 	And msgget(), semget() and shmget() does not have any parameter we can 
>>>>> use to
>>>>> specify an ID.
>>>>> 	That's why I've decided to not change these routines and add a new 
>>>>> control
>>>>> command, IP_SETID, with which we can can change the ID of an IPC. (that 
>>>>> looks to
>>>>> me more straightforward and logical)
>>>>> 	Now, this patch is, in fact, only a preparation for the patch 10/15 
>>>>> which
>>>>> really complete the user API by adding this IPC_SETID command.
>>>>> (... continuing below ...)
>>>>> Alexey Dobriyan wrote:
>>>>>> On Tue, Jan 29, 2008 at 05:02:38PM +0100, 
>>>>>> wrote:
>>>>>>> This patch provides three new API to change the ID of an existing
>>>>>>> System V IPCs.
>>>>>>> These APIs are:
>>>>>>> 	long msg_chid(struct ipc_namespace *ns, int id, int newid);
>>>>>>> 	long sem_chid(struct ipc_namespace *ns, int id, int newid);
>>>>>>> 	long shm_chid(struct ipc_namespace *ns, int id, int newid);
>>>>>>> They return 0 or an error code in case of failure.
>>>>>>> They may be useful for setting a specific ID for an IPC when preparing
>>>>>>> a restart operation.
>>>>>>> To be successful, the following rules must be respected:
>>>>>>> - the IPC exists (of course...)
>>>>>>> - the new ID must satisfy the ID computation rule.
>>>>>>> - the entry in the idr corresponding to the new ID must be free.
>>>>>>>  ipc/util.c          |   48 
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>  ipc/util.h          |    1 +
>>>>>>>  8 files changed, 197 insertions(+)
>>>>>> For the record, OpenVZ uses "create with predefined ID" method which
>>>>>> leads to less code. For example, change at the end is all we want from
>>>>>> ipc/util.c .
>>>>> And in fact, you do that from kernel space, you don't have the 
>>>>> constraint to fit
>>>>> the existing user API.
>>>>> Again, this patch, even if it presents a new kernel API, is in fact a
>>>>> preparation for the next patch which introduces a new user API.
>>>>> Do you think that this could fit your need ?
>> _______________________________________________
>> Containers mailing list

Pierre Peiffer

  parent reply	other threads:[~2008-02-08 10:12 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-29 16:02 [PATCH 2.6.24-rc8-mm1 00/15] IPC: code rewrite + new functionalities pierre.peiffer
2008-01-29 16:02 ` [PATCH 2.6.24-rc8-mm1 01/15] IPC/semaphores: code factorisation pierre.peiffer
2008-01-29 16:02 ` [PATCH 2.6.24-rc8-mm1 02/15] IPC/shared memory: introduce shmctl_down pierre.peiffer
2008-01-29 16:02 ` [PATCH 2.6.24-rc8-mm1 03/15] IPC/message queues: introduce msgctl_down pierre.peiffer
2008-01-29 16:02 ` [PATCH 2.6.24-rc8-mm1 04/15] IPC/semaphores: move the rwmutex handling inside semctl_down pierre.peiffer
2008-01-29 16:02 ` [PATCH 2.6.24-rc8-mm1 05/15] IPC/semaphores: remove one unused parameter from semctl_down() pierre.peiffer
2008-01-31  8:32   ` Nadia Derbey
2008-01-31 10:18     ` Pierre Peiffer
2008-01-31 11:30       ` Nadia Derbey
2008-01-29 16:02 ` [PATCH 2.6.24-rc8-mm1 06/15] IPC: get rid of the use *_setbuf structure pierre.peiffer
2008-01-29 16:02 ` [PATCH 2.6.24-rc8-mm1 07/15] IPC: introduce ipc_update_perm() pierre.peiffer
2008-01-29 16:02 ` [PATCH 2.6.24-rc8-mm1 08/15] IPC: consolidate all xxxctl_down() functions pierre.peiffer
2008-01-29 16:02 ` [PATCH 2.6.24-rc8-mm1 09/15] (RFC) IPC: new kernel API to change an ID pierre.peiffer
2008-01-29 21:06   ` Alexey Dobriyan
2008-01-30  9:52     ` Pierre Peiffer
2008-01-31  9:00     ` Pierre Peiffer
2008-01-31  9:54       ` Kirill Korotaev
2008-01-31 11:57         ` Pierre Peiffer
2008-01-31 13:11           ` Kirill Korotaev
2008-01-31 16:10             ` Cedric Le Goater
2008-02-04 13:41               ` Kirill Korotaev
2008-02-04 14:06                 ` [Devel] " Pavel Emelyanov
2008-02-04 15:00                   ` Daniel Lezcano
2008-02-04 15:16                     ` Pavel Emelyanov
2008-02-05  9:51           ` Oren Laadan
2008-02-05 18:00             ` Dave Hansen
2008-02-05 18:42             ` Serge E. Hallyn
2008-02-06  2:07               ` Oren Laadan
2008-02-06  5:00                 ` Serge E. Hallyn
2008-02-08 10:12               ` Pierre Peiffer [this message]
2008-01-29 16:02 ` [PATCH 2.6.24-rc8-mm1 10/15] (RFC) IPC: new IPC_SETID command to modify " pierre.peiffer
2008-01-29 16:02 ` [PATCH 2.6.24-rc8-mm1 11/15] (RFC) IPC: new IPC_SETALL command to modify all settings pierre.peiffer
2008-01-29 16:02 ` [PATCH 2.6.24-rc8-mm1 12/15] (RFC) IPC/semaphores: make use of RCU to free the sem_undo_list pierre.peiffer
2008-01-30 21:26   ` Serge E. Hallyn
2008-01-31  9:52     ` Pierre Peiffer
2008-01-29 16:02 ` [PATCH 2.6.24-rc8-mm1 13/15] (RFC) IPC/semaphores: per <pid> semundo file in procfs pierre.peiffer
2008-01-29 16:02 ` [PATCH 2.6.24-rc8-mm1 14/15] (RFC) IPC/semaphores: prepare semundo code to work on another task than current pierre.peiffer
2008-01-30 21:44   ` Serge E. Hallyn
2008-01-31  9:48     ` Pierre Peiffer
2008-01-31 18:01       ` Serge E. Hallyn
2008-02-01 12:09         ` Pierre Peiffer
2008-01-29 16:02 ` [PATCH 2.6.24-rc8-mm1 15/15] (RFC) IPC/semaphores: add write() operation to semundo file in procfs pierre.peiffer
2008-02-02 18:23 ` [PATCH 2.6.24-rc8-mm1 00/15] IPC: code rewrite + new functionalities Pavel Machek
2008-02-04 13:52   ` Pierre Peiffer
2008-02-04 15:44   ` Benjamin Thery
2008-02-04 19:51     ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \
    --subject='Re: [PATCH 2.6.24-rc8-mm1 09/15] (RFC) IPC: new kernel API to	change an ID' \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).