LKML Archive on
help / color / mirror / Atom feed
From: David Chinner <>
To: Takashi Sato <>
Cc: David Chinner <>,
	"" <>,
	"" <>,
	"" <>,
	"" <>,
	"" <>
Subject: Re: [RFC PATCH] freeze feature ver 1.0
Date: Mon, 31 Mar 2008 08:31:02 +1000	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

On Fri, Mar 28, 2008 at 06:01:45PM +0900, Takashi Sato wrote:
> Hi,
> David Chinner wrote:
> > Can you please split this into two patches - one which introduces the
> > generic functionality *without* the timeout stuff, and a second patch
> > that introduces the timeouts.
> OK.
> I will send the split patches in subsequent mails. 
> > I think this timeout stuff is dangerous - it adds significant
> > complexity and really does not protect against anything that can't
> > be done in userspace.  i.e. If your system is running well enough
> > for the timer to fire and unfreeze the filesystem, it's running well
> > enough for you to do "freeze X; sleep Y; unfreeze X".  
> If the process is terminated at "sleep Y" by an unexpected
> accident (e.g. signals),  the filesystem will be left frozen.

At which point you run "unfreeze /mnt/fs" and unfreeze it.

If you've got a script that fails in the middle of an operation that
freezes the filesystem, then the error handling of that script
needs to unfreeze the filesystem. The kernel does not need to do

> So, I think the timeout is needed to unfreeze more definitely.

No, it can be handled by userspace perfectly well.

> > FWIW, there is nothing to guarantee that the filesystem has finished
> > freezing when the timeout fires (it's not uncommon to see
> > freeze_bdev() taking *minutes*) and unfreezing in the middle of a
> > freeze operation will cause problems - either for the filesystem
> > in the middle of a freeze operation, or for whatever is freezing the
> > filesystem to get a consistent image.....
> Do you mention the freeze_bdev()'s hang?

If freeze_bdev() hangs, then you've got a buggy filesystem and far
more problems to worry about than undoing the freeze. It's likely
you're going to need a reboot to unwedge then hung filesystem.....

> The salvage target of my timeout is freeze process's accident as below.
> - It is killed before calling the unfreeze ioctl

Can be fixed from userspace.

> - It causes a deadlock by accessing the frozen filesystem

Application bug. Undeadlock it by running "unfreeze /mnt/fs"....

FWIW, DM is quite capable of freezing the filesystem, snapshotting
it and then unfreezing it without hanging, crashing or having nasty
stuff in general happen. We've used 'xfs_freeze -f /mnt/fs;
do_something; xfs_freeze -u /mnt/fs' for years without having
problems with freeze hanging, application deadlocks, etc.....

... And if something has gone wrong during the freeze, it is far,
far better to leave the filesystem stuck in a frozen state than to
unfreeze it and allow it to be damaged further. If you get stuck or
a script gets killed in the middle of execution, then an admin needs
to look at the problem immediately. Just timing out and unfreezing
is about the worst thing you can do because it allows problems
(corruptions, errors, etc) to be propagated and potentially make
things worse before an admin can intervene and fix things up....

Basically, I don't want to have to deal with the "snapshot
image corrupt" bug reports that will come from user
misuse/misunderstanding of the "freeze timeout". It's hard enough
tracking down these sorts of problems without throwing in the
"freeze timed out before completion" possibility that guarantees
a non-consistent snapshot image.....

/me points to the ASSERT_ALWAYS() in xfs_attr_quiesce() that ensures
we get bug reports when the filesystem is still being actively
modified when the freeze "completes".

> So the delayed work for the timeout is set after all of freeze operations
> in freeze_bdev() in my patches.
> I think the filesystem dependent code (write_super_lockfs operation)
> should be implemented not to cause a hang.

And that should already be the case. If write_super_lockfs() hangs,
then you've got a filesystem bug ;)


Dave Chinner
Principal Engineer
SGI Australian Software Group

  reply	other threads:[~2008-03-30 22:31 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-03-28  9:01 Takashi Sato
2008-03-30 22:31 ` David Chinner [this message]
  -- strict thread matches above, loose matches on Subject: below --
2008-03-24 11:11 Takashi Sato
2008-03-25  1:33 ` David Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \
    --subject='Re: [RFC PATCH] freeze feature ver 1.0' \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).