From: Miklos Szeredi <miklos@szeredi.hu>
To: staubach@redhat.com
Cc: miklos@szeredi.hu, akpm@linux-foundation.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [patch 01/22] update ctime and mtime for mmaped write
Date: Wed, 28 Feb 2007 21:35:27 +0100
Message-ID: <E1HMVWJ-00009K-00@dorka.pomaz.szeredi.hu>
In-Reply-To: <45E5DF8E.80109@redhat.com> (message from Peter Staubach on Wed, 28 Feb 2007 15:01:18 -0500)

> >> While these entry points do not actually modify the file itself,
> >> as was pointed out, they are handy points at which the kernel gains
> >> control and could actually notice that the contents of the file are
> >> no longer the same as they were, ie. modified.
> >>
> >> From the operating system viewpoint, this is where the semantics of
> >> modification to file contents via mmap differs from the semantics of
> >> modification to file contents via write(2).
> >>
> >> It is desirable for the file times to be updated as quickly as
> >> possible after the actual modification has occurred.
> >>     
> >
> > I disagree.
> >
> > You don't worry about the timestamp being updated _during_ a large
> > write() call, even though the file is constantly being modified.
> >
> >   
> 
> No, but you do worry about the timestamps being updated after
> every write() call, no matter how large or small.

Right.  All I'm saying is that just writing to a shared mapping
without calling msync() is similar to a write() which hasn't yet
finished.  In both cases, you have a modified file, without a modified
timestamp.

> > You think of write() as something instantaneous, while you think of
> > writing to a shared mapping, then doing msync() as something taking a
> > long time.  In actual fact both of these are basically equivalent
> > operations, the differences being, that you can easily modify
> > non-contiguous parts of a file with mmap, while you can't do that with
> > write.  The disadvantage from mmap comes from the cost of setting up
> > the page tables and handling the faults.
> >
> > Think of it this way:
> >
> >   shared mmap write + msync(MS_ASYNC)  ==  write()
> >   msync(MS_ASYNC) + fsync()  ==  msync(MS_SYNC)
> >
> >   
> 
> I don't believe that this is a valid characterization because the
> changes to the contents of the file, made through the mmap'd region,
> are immediately visible to any and all other applications accessing
> the file.  Since the contents of the file are changing, then so
> should the timestamps to reflect this.

Same case with a large write().  Nothing prevents you from reading a
file, while a huge write is taking place to it, and yet, the
modification time isn't updated.

> I think that we are going to have to agree to disagree because
> I don't agree either with your characterizations of the desirable
> semantics associated with shared mmap or that maintaining the
> correctness in the system is a waste of CPU.

I didn't quite say _that_ in so many words :).  I said that updating
the timestamp on a per-page first dirtying basis, or a per-inode first
dirtying basis, is a waste of effort.  Why?

What happens if, some time later, the application overwrites what it
had written?  Nothing.  The page is already read-write and the pte is
already dirty, so even though the file was clearly modified, there's
absolutely no way in which this can be used to force an update to the
timestamp.

Is there anything special about the _first_ modification?  I don't
think so.  From an external application's point of view it doesn't
matter one whit whether a modification was made through write(), after
a page fault, or on an already present read-write page.
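
For anyone who wants to look at this on their own kernel, here is a
rough sketch of a probe.  It records st_mtime before any store, after
the first store (which takes the write fault), after a second store to
the same, already dirty page, and after msync(MS_ASYNC).  The names
mtime_probe, read_mtime() and probe_mmap_mtime() are made up for this
illustration and are not part of the patch.  Which of the four
timestamps differ depends on the kernel and the filesystem, so no
particular result is claimed here.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical record of st_mtime at each step of the experiment. */
struct mtime_probe {
    struct timespec start;        /* before touching the mapping */
    struct timespec first_store;  /* after the store that takes the write fault */
    struct timespec second_store; /* after a store to the already dirty page */
    struct timespec synced;       /* after msync(MS_ASYNC) */
};

/* Fetch the current modification time of the file behind fd. */
static int read_mtime(int fd, struct timespec *ts)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    *ts = st.st_mtim;
    return 0;
}

/* fd must be open read-write.  Returns 0 on success, -1 on any error. */
int probe_mmap_mtime(int fd, struct mtime_probe *out)
{
    const size_t len = 4096;
    int rc = -1;

    if (ftruncate(fd, (off_t)len) != 0 || read_mtime(fd, &out->start) != 0)
        return -1;

    /* volatile keeps the compiler from merging the two stores below */
    volatile char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = 1;   /* first modification: page fault, page becomes dirty */
    if (read_mtime(fd, &out->first_store) != 0)
        goto out;

    p[0] = 2;   /* second modification: page already writable, no fault */
    if (read_mtime(fd, &out->second_store) != 0)
        goto out;

    if (msync((void *)p, len, MS_ASYNC) != 0 ||
        read_mtime(fd, &out->synced) != 0)
        goto out;
    rc = 0;
out:
    munmap((void *)p, len);
    return rc;
}
```

Call it on a scratch file opened O_RDWR and compare the four timespec
values to see at which step, if any, the modification time moved.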

So what exactly _are_ the semantics we are trying to achieve?

> I view mmap as a way for an application to treat the contents of a
> file as another segment in its address space.  This allows it to
> manipulate the contents of a file without incurring the overhead of
> the read and write system calls and the double buffering that
> naturally occurs with those system calls.  I think that:
> 
>     char *p = mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
>     *p = 1;
>     *(p + 4096) = 2;
> 
> should have the same effect as:
> 
>     char c = 1;
>     pwrite(fd, &c, 1, 0);
>     c = 2;
>     pwrite(fd, &c, 1, 4096);

Not necessarily.  This is the equivalent _portable_ call sequence:

     char *p = mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
     *p = 1;
     *(p + 4096) = 2;
     msync(p, 4097, MS_ASYNC);

Yes, on Linux the former would work too, but there's really no point in
allowing applications to be lax and not do it properly.  But we've
been over this.
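
For completeness, the portable sequence above could be wrapped up with
error checking roughly as follows.  This is only a sketch: the helper
names mmap_store_two() and mmap_store_demo(), the scratch file under
/tmp, and the use of ftruncate() to size the file are assumptions made
for the example.  The demo reads the two bytes back with pread() to
show that, data-wise, the result matches the pwrite() version quoted
above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: store `a` at offset 0 and `b` at offset 4096 of
 * the file behind fd through a shared mapping, then call msync().
 * fd must be open read-write.  Returns 0 on success, -1 on error.
 */
int mmap_store_two(int fd, char a, char b)
{
    const size_t len = 8192;

    /* The file must be at least as long as the range that is stored to. */
    if (ftruncate(fd, (off_t)len) != 0)
        return -1;

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = a;
    p[4096] = b;

    /* Cover everything from the first to the last modified byte. */
    int rc = msync(p, 4097, MS_ASYNC);

    if (munmap(p, len) != 0)
        rc = -1;
    return rc;
}

/* Demo on an unlinked scratch file.  Returns 0 if both bytes read back. */
int mmap_store_demo(void)
{
    char path[] = "/tmp/mmap-store-XXXXXX";
    int fd = mkstemp(path);
    int ok = -1;
    char c0 = 0, c1 = 0;

    if (fd < 0)
        return -1;
    unlink(path);   /* the file goes away once fd is closed */

    if (mmap_store_two(fd, 1, 2) == 0 &&
        pread(fd, &c0, 1, 0) == 1 &&
        pread(fd, &c1, 1, 4096) == 1 &&
        c0 == 1 && c1 == 2)
        ok = 0;

    close(fd);
    return ok;
}
```

The msync() length of 4097 is taken straight from the snippet above; it
spans offset 0 through offset 4096 inclusive.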

Miklos
