LKML Archive on lore.kernel.org
* Finding hardlinks
@ 2006-12-20  9:03 Mikulas Patocka
  2006-12-20 11:44 ` Miklos Szeredi
  0 siblings, 1 reply; 100+ messages in thread
From: Mikulas Patocka @ 2006-12-20  9:03 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-fsdevel

Hi

I've come across this problem: how can a userspace program (such as, for 
example, "cp -a") tell that two files form a hardlink? Comparing inode 
numbers will break on filesystems that can have more than 2^32 files (NFS3, 
OCFS, SpadFS; kernel developers already implemented iget5_locked for the 
case of colliding inode numbers). Other possibilities:

--- compare not only ino, but all stat entries and make sure that
 	i_nlink > 1?
 	--- is not 100% reliable either, only lowers failure probability
--- create a hardlink and watch if i_nlink is increased on both files?
 	--- doesn't work on read-only filesystems
--- compare file content?
 	--- "cp -a" won't then corrupt data at least, but will create
 	hardlinks where they shouldn't be.

Is there some reliable way for the "cp -a" command to determine that? 
Finding in the kernel whether two dentries point to the same inode is 
trivial, but I am not sure how to let userspace know ... am I missing 
something?

Mikulas

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2006-12-20  9:03 Finding hardlinks Mikulas Patocka
@ 2006-12-20 11:44 ` Miklos Szeredi
  2006-12-20 16:36   ` Mikulas Patocka
  2006-12-21 18:58   ` Jan Harkes
  0 siblings, 2 replies; 100+ messages in thread
From: Miklos Szeredi @ 2006-12-20 11:44 UTC (permalink / raw)
  To: mikulas; +Cc: linux-kernel, linux-fsdevel

> I've come across this problem: how can a userspace program (such as, for 
> example, "cp -a") tell that two files form a hardlink? Comparing inode 
> numbers will break on filesystems that can have more than 2^32 files (NFS3, 
> OCFS, SpadFS; kernel developers already implemented iget5_locked for the 
> case of colliding inode numbers). Other possibilities:
> 
> --- compare not only ino, but all stat entries and make sure that
>  	i_nlink > 1?
>  	--- is not 100% reliable either, only lowers failure probability
> --- create a hardlink and watch if i_nlink is increased on both files?
>  	--- doesn't work on read-only filesystems
> --- compare file content?
>  	--- "cp -a" won't then corrupt data at least, but will create
>  	hardlinks where they shouldn't be.
> 
> Is there some reliable way for the "cp -a" command to determine that? 
> Finding in the kernel whether two dentries point to the same inode is 
> trivial, but I am not sure how to let userspace know ... am I missing 
> something?

The stat64.st_ino field is 64bit, so AFAICS you'd only need to extend
the kstat.ino field to 64bit and fix those filesystems to fill in
kstat correctly.

SUSv3 requires st_ino/st_dev to be unique within a system so the
application shouldn't need to bend over backwards.

Miklos


* Re: Finding hardlinks
  2006-12-20 11:44 ` Miklos Szeredi
@ 2006-12-20 16:36   ` Mikulas Patocka
  2006-12-20 16:50     ` Miklos Szeredi
  2006-12-21 18:58   ` Jan Harkes
  1 sibling, 1 reply; 100+ messages in thread
From: Mikulas Patocka @ 2006-12-20 16:36 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-kernel, linux-fsdevel

>> I've come across this problem: how can a userspace program (such as, for
>> example, "cp -a") tell that two files form a hardlink? Comparing inode
>> numbers will break on filesystems that can have more than 2^32 files (NFS3,
>> OCFS, SpadFS; kernel developers already implemented iget5_locked for the
>> case of colliding inode numbers). Other possibilities:
>>
>> --- compare not only ino, but all stat entries and make sure that
>>  	i_nlink > 1?
>>  	--- is not 100% reliable either, only lowers failure probability
>> --- create a hardlink and watch if i_nlink is increased on both files?
>>  	--- doesn't work on read-only filesystems
>> --- compare file content?
>>  	--- "cp -a" won't then corrupt data at least, but will create
>>  	hardlinks where they shouldn't be.
>>
>> Is there some reliable way for the "cp -a" command to determine that?
>> Finding in the kernel whether two dentries point to the same inode is
>> trivial, but I am not sure how to let userspace know ... am I missing
>> something?
>
> The stat64.st_ino field is 64bit, so AFAICS you'd only need to extend
> the kstat.ino field to 64bit and fix those filesystems to fill in
> kstat correctly.

There are a 32-bit __st_ino and a 64-bit st_ino --- what is their purpose? 
Some old compatibility code?

> SUSv3 requires st_ino/st_dev to be unique within a system so the
> application shouldn't need to bend over backwards.

I see, but the kernel needs to be fixed for that. Would patches changing 
kstat be accepted?

Mikulas

> Miklos
>


* Re: Finding hardlinks
  2006-12-20 16:36   ` Mikulas Patocka
@ 2006-12-20 16:50     ` Miklos Szeredi
  2006-12-20 19:54       ` Al Viro
  0 siblings, 1 reply; 100+ messages in thread
From: Miklos Szeredi @ 2006-12-20 16:50 UTC (permalink / raw)
  To: mikulas; +Cc: linux-kernel, linux-fsdevel

> >> I've come across this problem: how can a userspace program (such as, for
> >> example, "cp -a") tell that two files form a hardlink? Comparing inode
> >> numbers will break on filesystems that can have more than 2^32 files (NFS3,
> >> OCFS, SpadFS; kernel developers already implemented iget5_locked for the
> >> case of colliding inode numbers). Other possibilities:
> >>
> >> --- compare not only ino, but all stat entries and make sure that
> >>  	i_nlink > 1?
> >>  	--- is not 100% reliable either, only lowers failure probability
> >> --- create a hardlink and watch if i_nlink is increased on both files?
> >>  	--- doesn't work on read-only filesystems
> >> --- compare file content?
> >>  	--- "cp -a" won't then corrupt data at least, but will create
> >>  	hardlinks where they shouldn't be.
> >>
> >> Is there some reliable way for the "cp -a" command to determine that?
> >> Finding in the kernel whether two dentries point to the same inode is
> >> trivial, but I am not sure how to let userspace know ... am I missing
> >> something?
> >
> > The stat64.st_ino field is 64bit, so AFAICS you'd only need to extend
> > the kstat.ino field to 64bit and fix those filesystems to fill in
> > kstat correctly.
> 
> There are a 32-bit __st_ino and a 64-bit st_ino --- what is their purpose? 
> Some old compatibility code?

Yes.

> > SUSv3 requires st_ino/st_dev to be unique within a system so the
> > application shouldn't need to bend over backwards.
> 
> I see, but the kernel needs to be fixed for that. Would patches changing 
> kstat be accepted?

I don't see any problems with changing struct kstat.  There would be
reservations against changing inode.i_ino though.

So filesystems that have 64bit inodes will need a specialized
getattr() method instead of generic_fillattr().
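Roughly like this (a standalone sketch with mocked-up types --- the real 
kernel structures differ, and the myfs_* names are invented):

```c
#include <stdint.h>

typedef uint64_t u64;

/* Mocked-up types: the real kernel structs differ; this only shows the shape. */
struct kstat { u64 ino; };
struct inode { uint32_t i_ino; /* models a 32-bit ino_t, as on 32-bit archs */ };
struct myfs_inode {
	struct inode vfs_inode;
	u64 full_ino;	/* the filesystem's private full-width identifier */
};

/* What generic_fillattr() effectively does: copy the (truncated) i_ino. */
static void generic_fillattr_mock(const struct inode *inode, struct kstat *stat)
{
	stat->ino = inode->i_ino;
}

/*
 * A filesystem with 64-bit inode numbers supplies its own ->getattr()
 * and overwrites kstat.ino with the full-width identifier, so userspace
 * sees the untruncated number in stat64.st_ino.
 */
static void myfs_getattr(const struct myfs_inode *mi, struct kstat *stat)
{
	generic_fillattr_mock(&mi->vfs_inode, stat);
	stat->ino = mi->full_ino;
}
```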

Miklos


* Re: Finding hardlinks
  2006-12-20 16:50     ` Miklos Szeredi
@ 2006-12-20 19:54       ` Al Viro
  2006-12-20 20:12         ` Mikulas Patocka
  2006-12-31 15:02         ` Mikulas Patocka
  0 siblings, 2 replies; 100+ messages in thread
From: Al Viro @ 2006-12-20 19:54 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: mikulas, linux-kernel, linux-fsdevel

On Wed, Dec 20, 2006 at 05:50:11PM +0100, Miklos Szeredi wrote:
> I don't see any problems with changing struct kstat.  There would be
> reservations against changing inode.i_ino though.
> 
> So filesystems that have 64bit inodes will need a specialized
> getattr() method instead of generic_fillattr().

And they are already free to do so.  And no, struct kstat doesn't need
to be changed - it has u64 ino already.


* Re: Finding hardlinks
  2006-12-20 19:54       ` Al Viro
@ 2006-12-20 20:12         ` Mikulas Patocka
  2006-12-31 15:02         ` Mikulas Patocka
  1 sibling, 0 replies; 100+ messages in thread
From: Mikulas Patocka @ 2006-12-20 20:12 UTC (permalink / raw)
  To: Al Viro; +Cc: Miklos Szeredi, linux-kernel, linux-fsdevel



On Wed, 20 Dec 2006, Al Viro wrote:

> On Wed, Dec 20, 2006 at 05:50:11PM +0100, Miklos Szeredi wrote:
>> I don't see any problems with changing struct kstat.  There would be
>> reservations against changing inode.i_ino though.
>>
>> So filesystems that have 64bit inodes will need a specialized
>> getattr() method instead of generic_fillattr().
>
> And they are already free to do so.  And no, struct kstat doesn't need
> to be changed - it has u64 ino already.

I see, I should have checked a recent kernel.

Mikulas


* Re: Finding hardlinks
  2006-12-20 11:44 ` Miklos Szeredi
  2006-12-20 16:36   ` Mikulas Patocka
@ 2006-12-21 18:58   ` Jan Harkes
  2006-12-21 23:49     ` Mikulas Patocka
  1 sibling, 1 reply; 100+ messages in thread
From: Jan Harkes @ 2006-12-21 18:58 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: mikulas, linux-kernel, linux-fsdevel

On Wed, Dec 20, 2006 at 12:44:42PM +0100, Miklos Szeredi wrote:
> The stat64.st_ino field is 64bit, so AFAICS you'd only need to extend
> the kstat.ino field to 64bit and fix those filesystems to fill in
> kstat correctly.

Coda actually uses 128-bit file identifiers internally, so 64 bits 
really doesn't cut it. Since the 128-bit space is used pretty sparsely 
there is a hash which avoids most collisions in 32-bit i_ino space, but 
not completely. I can also imagine that at some point someone wants to 
implement a git-based filesystem where it would be more natural to use 
160-bit SHA1 hashes as unique object identifiers.
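For illustration only (this is not Coda's actual hash), folding a 128-bit 
identifier into the 32-bit i_ino space could look like:

```c
#include <stdint.h>

/* A 128-bit file identifier, as four 32-bit words. */
struct fid128 { uint32_t w[4]; };

/*
 * Illustrative only -- not Coda's real hash.  Mixes all 128 bits down
 * into a 32-bit i_ino.  Distinct identifiers can still collide, which
 * is exactly the residual problem discussed above.
 */
static uint32_t fid_to_ino(const struct fid128 *f)
{
	uint32_t h = 0;

	for (int i = 0; i < 4; i++) {
		h ^= f->w[i];
		h = h * 2654435761u + 1;	/* Knuth multiplicative mix */
	}
	return h;
}
```

Since multiplication by an odd constant is a bijection mod 2^32, identifiers 
differing only in the last word are guaranteed to hash differently here; 
collisions come from the earlier folding steps.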

But Coda only allows hardlinks within a single directory, and if someone 
renames a hardlinked file and one of the names ends up in a different 
directory, we implicitly create a copy of the object. This actually 
leverages the way we handle volume snapshots and the fact that we use 
whole-file caching and writes, so we only copy the metadata while the 
data is 'copy-on-write'.

I'm considering changing the way we handle hardlinks by having link(2)
always create a new object with copy-on-write semantics (i.e. replacing
link with some sort of a copyfile operation). This way we can get rid of
several special cases like the cross-directory rename. It also avoids
problems when the various replicas of an object are found to be
inconsistent and we allow the user to expand the file. On expansion a
file becomes a directory that contains all the objects on individual
replicas. Handling the expansion in a dcache-friendly way is nasty 
enough as is, and it is complicated by the fact that we really don't want 
such an expansion to result in hard-linked directories, so we are forced 
to invent new unique object identifiers, etc. Again, not having
hardlinks would simplify things somewhat here.

Any application that tries to be smart enough to keep track of which
files are hardlinked should (in my opinion) also have a way to disable
this behaviour.

Jan



* Re: Finding hardlinks
  2006-12-21 18:58   ` Jan Harkes
@ 2006-12-21 23:49     ` Mikulas Patocka
  2006-12-22  5:05       ` Jan Harkes
  2006-12-23 10:18       ` Arjan van de Ven
  0 siblings, 2 replies; 100+ messages in thread
From: Mikulas Patocka @ 2006-12-21 23:49 UTC (permalink / raw)
  To: Jan Harkes; +Cc: Miklos Szeredi, linux-kernel, linux-fsdevel

On Thu, 21 Dec 2006, Jan Harkes wrote:

> On Wed, Dec 20, 2006 at 12:44:42PM +0100, Miklos Szeredi wrote:
>> The stat64.st_ino field is 64bit, so AFAICS you'd only need to extend
>> the kstat.ino field to 64bit and fix those filesystems to fill in
>> kstat correctly.
>
> >Coda actually uses 128-bit file identifiers internally, so 64 bits
> >really doesn't cut it. Since the 128-bit space is used pretty sparsely
> >there is a hash which avoids most collisions in 32-bit i_ino space, but
> >not completely. I can also imagine that at some point someone wants to
> >implement a git-based filesystem where it would be more natural to use
> >160-bit SHA1 hashes as unique object identifiers.
>
> But Coda only allows hardlinks within a single directory, and if someone
> renames a hardlinked file and one of the names ends up in a different
> directory, we implicitly create a copy of the object. This actually
> leverages the way we handle volume snapshots and the fact that we use
> whole-file caching and writes, so we only copy the metadata while the
> data is 'copy-on-write'.

The problem is that if an inode number collision happens occasionally, you 
get data corruption with the "cp -a" command --- it will just copy one 
file and hardlink the other.

> Any application that tries to be smart enough to keep track of which
> files are hardlinked should (in my opinion) also have a way to disable
> this behaviour.

If the user (or script) doesn't specify that flag, it doesn't help. I think 
the best solution for these filesystems would be either to add a new syscall
 	int is_hardlink(char *filename1, char *filename2)
(but I know adding syscall bloat may be objectionable)
or to add a new field in statvfs, ST_HAS_BROKEN_INO_T, that applications 
can test to disable hardlink processing.

Mikulas

> Jan
>


* Re: Finding hardlinks
  2006-12-21 23:49     ` Mikulas Patocka
@ 2006-12-22  5:05       ` Jan Harkes
  2006-12-23 10:18       ` Arjan van de Ven
  1 sibling, 0 replies; 100+ messages in thread
From: Jan Harkes @ 2006-12-22  5:05 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: Miklos Szeredi, linux-kernel, linux-fsdevel

On Fri, Dec 22, 2006 at 12:49:42AM +0100, Mikulas Patocka wrote:
> On Thu, 21 Dec 2006, Jan Harkes wrote:
> >On Wed, Dec 20, 2006 at 12:44:42PM +0100, Miklos Szeredi wrote:
> >>The stat64.st_ino field is 64bit, so AFAICS you'd only need to extend
> >>the kstat.ino field to 64bit and fix those filesystems to fill in
> >>kstat correctly.
> >
> >Coda actually uses 128-bit file identifiers internally, so 64 bits
> >really doesn't cut it. Since the 128-bit space is used pretty sparsely
> 
> The problem is that if an inode number collision happens occasionally, you 
> get data corruption with the "cp -a" command --- it will just copy one 
> file and hardlink the other.

Our 128-bit space is fairly sparse and there is some regularity, so we 
optimized the hash to minimize the chance of collisions. This is also 
useful for iget5_locked: each 32-bit ino_t is effectively a hash bucket 
in our case, and avoiding collisions makes the lookup in the inode cache 
more efficient.

Another part is that only a few applications actually care about hardlinks 
(cp -a, rsync, tar/afio). All of these could already miss some files or 
create false hardlinks when files in the tree are renamed during the 
copy. We also have a special atomic volume snapshot function that is 
used to create a backup, which backs up additional attributes that are 
not visible through the standard POSIX/vfs API (directory ACLs, 
creator/owner information, version-vector information for conflict 
detection and resolution).

I've also found that most applications that care about hardlinks already 
have a check whether the link count is greater than one and the object 
is not a directory. This is probably done more for efficiency; it would 
be a waste of memory to track every object as a possible hardlink.

And because Coda already restricts hardlinks in many cases, they end up 
not being used very much, or they get converted by a cross-directory 
rename to COW objects, which of course have nlink == 1.

> If the user (or script) doesn't specify that flag, it doesn't help. I think 
> the best solution for these filesystems would be either to add a new syscall
> 	int is_hardlink(char *filename1, char *filename2)
> (but I know adding syscall bloat may be objectionable)
> or to add a new field in statvfs, ST_HAS_BROKEN_INO_T, that applications 
> can test to disable hardlink processing.

BROKEN_INO_T sounds a bit harsh, and something like that would really
have to be incorporated in the SuS standard for it to be useful. It also
would require all applications to be updated to check for this flag. On 
the other hand, if we don't worry about this flag, we just have to fix the 
few applications that do not yet check that nlink>1 && !IS_DIR. Those
applications would probably appreciate the resulting reduced memory
requirements and performance increase because they end up with
considerably fewer candidates in their internal list of potential
hardlinked objects.

Of course this doesn't solve the problem for some filesystem with
larger-than-64-bit object identifiers that wants to support normal 
hardlinked files. But adding a BROKEN_INO_T flag doesn't solve it 
either, since the backup/copy would not perform hardlink processing, in 
which case such a file system can just as well pretend that i_nlink for 
files is always one.

Jan



* Re: Finding hardlinks
  2006-12-21 23:49     ` Mikulas Patocka
  2006-12-22  5:05       ` Jan Harkes
@ 2006-12-23 10:18       ` Arjan van de Ven
  2006-12-23 14:00         ` Mikulas Patocka
  1 sibling, 1 reply; 100+ messages in thread
From: Arjan van de Ven @ 2006-12-23 10:18 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: Jan Harkes, Miklos Szeredi, linux-kernel, linux-fsdevel


> 
> If the user (or script) doesn't specify that flag, it doesn't help. I think 
> the best solution for these filesystems would be either to add a new syscall
>  	int is_hardlink(char *filename1, char *filename2)
> (but I know adding syscall bloat may be objectionable)

it's also the wrong api; the filenames may have been changed under you
just as you return from this call, so it really is a
"was_hardlink_at_some_point()" as you specify it.
If you make it work on fd's.. it has a chance at least.

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org



* Re: Finding hardlinks
  2006-12-23 10:18       ` Arjan van de Ven
@ 2006-12-23 14:00         ` Mikulas Patocka
  2006-12-28  9:06           ` Benny Halevy
  2006-12-29 10:02           ` Pavel Machek
  0 siblings, 2 replies; 100+ messages in thread
From: Mikulas Patocka @ 2006-12-23 14:00 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Jan Harkes, Miklos Szeredi, linux-kernel, linux-fsdevel

>> If the user (or script) doesn't specify that flag, it doesn't help. I think
>> the best solution for these filesystems would be either to add a new syscall
>>  	int is_hardlink(char *filename1, char *filename2)
>> (but I know adding syscall bloat may be objectionable)
>
> it's also the wrong api; the filenames may have been changed under you
> just as you return from this call, so it really is a
> "was_hardlink_at_some_point()" as you specify it.
> If you make it work on fd's.. it has a chance at least.

Yes, but it doesn't matter --- if the tree changes under the "cp -a" 
command, no one guarantees what you get.
 	int fis_hardlink(int handle1, int handle2);
is another possibility, but it can't detect hardlinked symlinks.
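For objects you can actually open, the fd-based variant reduces to fstat() 
(a sketch; fds_same_file() is an invented name):

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * fd-based variant: immune to renames after the open, but limited to
 * objects you can open() -- which is why it cannot compare symlinks
 * themselves.  Returns 1 for "same object", 0 for "different", -1 on
 * error, with the same st_ino-collision caveat as the path version.
 */
static int fds_same_file(int fd1, int fd2)
{
	struct stat s1, s2;

	if (fstat(fd1, &s1) != 0 || fstat(fd2, &s2) != 0)
		return -1;
	return s1.st_dev == s2.st_dev && s1.st_ino == s2.st_ino;
}
```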

Mikulas


* Re: Finding hardlinks
  2006-12-23 14:00         ` Mikulas Patocka
@ 2006-12-28  9:06           ` Benny Halevy
  2006-12-28 10:05             ` Arjan van de Ven
                               ` (2 more replies)
  2006-12-29 10:02           ` Pavel Machek
  1 sibling, 3 replies; 100+ messages in thread
From: Benny Halevy @ 2006-12-28  9:06 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Arjan van de Ven, Jan Harkes, Miklos Szeredi, linux-kernel,
	linux-fsdevel, nfsv4

Mikulas Patocka wrote:
>>> If the user (or script) doesn't specify that flag, it doesn't help. I think
>>> the best solution for these filesystems would be either to add a new syscall
>>>  	int is_hardlink(char *filename1, char *filename2)
>>> (but I know adding syscall bloat may be objectionable)
>> it's also the wrong api; the filenames may have been changed under you
>> just as you return from this call, so it really is a
>> "was_hardlink_at_some_point()" as you specify it.
>> If you make it work on fd's.. it has a chance at least.
> 
> Yes, but it doesn't matter --- if the tree changes under the "cp -a" 
> command, no one guarantees what you get.
>  	int fis_hardlink(int handle1, int handle2);
> is another possibility, but it can't detect hardlinked symlinks.

It seems like the posix idea of unique <st_dev, st_ino> doesn't
hold water for modern file systems and that creates real problems for
backup apps which rely on that to detect hard links.

Adding a vfs call to check for file equivalence seems like a good idea to me.
A syscall exposing it to user mode apps can look like what you sketched above,
and another variant of it can maybe take two paths and possibly a flags field
(for e.g. don't follow symlinks).

I'm cross-posting this also to nfsv4@ietf. NFS has exactly the same problem 
with <fsid, fileid> as fileid is 64 bit wide. Although the nfs client can 
determine that two filesystem objects are hard linked if they have the same 
filehandle, there are cases where two distinct filehandles can still refer to 
the same filesystem object.  Letting the nfs client determine file equivalency
based on filehandles will probably satisfy most users but if the exported
fs supports the new call discussed above, exporting it over NFS makes a
lot of sense to me... What do you guys think about adding such an operation
to NFS?

Benny

> 
> Mikulas



* Re: Finding hardlinks
  2006-12-28  9:06           ` Benny Halevy
@ 2006-12-28 10:05             ` Arjan van de Ven
  2006-12-28 15:24               ` Benny Halevy
  2006-12-28 18:14               ` Mikulas Patocka
  2006-12-28 13:22             ` Jeff Layton
  2007-01-11 23:35             ` Denis Vlasenko
  2 siblings, 2 replies; 100+ messages in thread
From: Arjan van de Ven @ 2006-12-28 10:05 UTC (permalink / raw)
  To: Benny Halevy
  Cc: Mikulas Patocka, Jan Harkes, Miklos Szeredi, linux-kernel,
	linux-fsdevel, nfsv4


> It seems like the posix idea of unique <st_dev, st_ino> doesn't
> hold water for modern file systems 

are you really sure?
and if so, why don't we fix *THAT* instead, rather than adding racy
syscalls and such that just can't really be used right...


-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org



* Re: Finding hardlinks
  2006-12-28  9:06           ` Benny Halevy
  2006-12-28 10:05             ` Arjan van de Ven
@ 2006-12-28 13:22             ` Jeff Layton
  2006-12-28 15:12               ` Benny Halevy
  2007-01-11 23:35             ` Denis Vlasenko
  2 siblings, 1 reply; 100+ messages in thread
From: Jeff Layton @ 2006-12-28 13:22 UTC (permalink / raw)
  To: Benny Halevy
  Cc: Mikulas Patocka, Arjan van de Ven, Jan Harkes, Miklos Szeredi,
	linux-kernel, linux-fsdevel, nfsv4

Benny Halevy wrote:
> 
> It seems like the posix idea of unique <st_dev, st_ino> doesn't
> hold water for modern file systems and that creates real problems for
> backup apps which rely on that to detect hard links.
> 

Why not? Granted, many of the filesystems in the Linux kernel don't enforce that 
they have unique st_ino values, but I'm working on a set of patches to try and 
fix that.

> Adding a vfs call to check for file equivalence seems like a good idea to me.
> A syscall exposing it to user mode apps can look like what you sketched above,
> and another variant of it can maybe take two paths and possibly a flags field
> (for e.g. don't follow symlinks).
> 
> I'm cross-posting this also to nfsv4@ietf. NFS has exactly the same problem
> with <fsid, fileid> as fileid is 64 bit wide. Although the nfs client can
> determine that two filesystem objects are hard linked if they have the same
> filehandle, there are cases where two distinct filehandles can still refer to
> the same filesystem object.  Letting the nfs client determine file equivalency
> based on filehandles will probably satisfy most users but if the exported
> fs supports the new call discussed above, exporting it over NFS makes a
> lot of sense to me... What do you guys think about adding such an operation
> to NFS?
> 

This sounds like a bug to me. It seems like we should have a one to one 
correspondence of filehandle -> inode. In what situations would this not be the 
case?

-- Jeff



* Re: Finding hardlinks
  2006-12-28 13:22             ` Jeff Layton
@ 2006-12-28 15:12               ` Benny Halevy
  2006-12-28 15:54                 ` Jeff Layton
                                   ` (2 more replies)
  0 siblings, 3 replies; 100+ messages in thread
From: Benny Halevy @ 2006-12-28 15:12 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Mikulas Patocka, Arjan van de Ven, Jan Harkes, Miklos Szeredi,
	linux-kernel, linux-fsdevel, nfsv4


Jeff Layton wrote:
> Benny Halevy wrote:
>> It seems like the posix idea of unique <st_dev, st_ino> doesn't
>> hold water for modern file systems and that creates real problems for
>> backup apps which rely on that to detect hard links.
>>
> 
> Why not? Granted, many of the filesystems in the Linux kernel don't enforce that 
> they have unique st_ino values, but I'm working on a set of patches to try and 
> fix that.

That's great and will surely help most file systems (apparently not Coda as
Jan says they use 128 bit internal file identifiers).

What about 32 bit architectures? Is ino_t going to be 64 bit
there too?

> 
>> Adding a vfs call to check for file equivalence seems like a good idea to me.
>> A syscall exposing it to user mode apps can look like what you sketched above,
>> and another variant of it can maybe take two paths and possibly a flags field
>> (for e.g. don't follow symlinks).
>>
>> I'm cross-posting this also to nfsv4@ietf. NFS has exactly the same problem
>> with <fsid, fileid> as fileid is 64 bit wide. Although the nfs client can
>> determine that two filesystem objects are hard linked if they have the same
>> filehandle, there are cases where two distinct filehandles can still refer to
>> the same filesystem object.  Letting the nfs client determine file equivalency
>> based on filehandles will probably satisfy most users but if the exported
>> fs supports the new call discussed above, exporting it over NFS makes a
>> lot of sense to me... What do you guys think about adding such an operation
>> to NFS?
>>
> 
> This sounds like a bug to me. It seems like we should have a one to one 
> correspondence of filehandle -> inode. In what situations would this not be the 
> case?

Well, the NFS protocol allows that [see rfc1813, p. 21: "If two file handles from
the same server are equal, they must refer to the same file, but if they are not
equal, no conclusions can be drawn."]

As an example, some file systems encode hint information into the filehandle 
and the hints may change over time; another example is encoding parent 
information into the filehandle, in which case handles representing hard 
links to the same file from different directories will differ.
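In other words, filehandle comparison can only ever give a tri-state answer 
(a sketch of the RFC 1813 rule; the enum and function names are invented):

```c
#include <stddef.h>
#include <string.h>

/* Possible conclusions from comparing two opaque NFS filehandles. */
enum fh_cmp { FH_SAME, FH_UNKNOWN };

/*
 * Per RFC 1813 (p. 21): equal filehandles from the same server must
 * refer to the same file, but unequal filehandles allow no conclusion
 * -- they may still be two handles for one object (e.g. with parent
 * info encoded in the handle).  So "not equal" is never "different".
 */
static enum fh_cmp fh_compare(const unsigned char *a, size_t alen,
			      const unsigned char *b, size_t blen)
{
	if (alen == blen && memcmp(a, b, alen) == 0)
		return FH_SAME;
	return FH_UNKNOWN;	/* NOT "different files" */
}
```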

> 
> -- Jeff
> 



* Re: Finding hardlinks
  2006-12-28 10:05             ` Arjan van de Ven
@ 2006-12-28 15:24               ` Benny Halevy
  2006-12-28 19:58                 ` Miklos Szeredi
  2006-12-28 18:14               ` Mikulas Patocka
  1 sibling, 1 reply; 100+ messages in thread
From: Benny Halevy @ 2006-12-28 15:24 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Mikulas Patocka, Jan Harkes, Miklos Szeredi, linux-kernel,
	linux-fsdevel, nfsv4

Arjan van de Ven wrote:
>> It seems like the posix idea of unique <st_dev, st_ino> doesn't
>> hold water for modern file systems 
> 
> are you really sure?

Well, Jan's example was Coda, which uses 128-bit internal file ids.

> and if so, why don't we fix *THAT* instead

Hmm, sometimes you can't fix the world, especially if the filesystem
is exported over NFS and has a problem with fitting its file IDs uniquely
into a 64-bit identifier.

> rather than adding racy
> syscalls and such that just can't really be used right...
> 

If the syscall works on two pathnames, I agree there might be a race, 
though it isn't different from calling lstat() on each of these names 
before opening them. But I'm not sure I see a race if you operate on two 
open file descriptors (compared to fstat()ing both of them).

On the nfs side, if the client looked up two names (or opened them over nfsv4)
and has two filehandles in hand, asking the server whether they refer to the
same object isn't racy.



* Re: Finding hardlinks
  2006-12-28 15:12               ` Benny Halevy
@ 2006-12-28 15:54                 ` Jeff Layton
  2006-12-28 16:26                   ` Jan Engelhardt
  2006-12-28 18:17                 ` Mikulas Patocka
  2006-12-29 10:12                 ` Trond Myklebust
  2 siblings, 1 reply; 100+ messages in thread
From: Jeff Layton @ 2006-12-28 15:54 UTC (permalink / raw)
  To: Benny Halevy
  Cc: Mikulas Patocka, Arjan van de Ven, Jan Harkes, Miklos Szeredi,
	linux-kernel, linux-fsdevel, nfsv4

Benny Halevy wrote:
> Jeff Layton wrote:
>> Benny Halevy wrote:
>>> It seems like the posix idea of unique <st_dev, st_ino> doesn't
>>> hold water for modern file systems and that creates real problems for
>>> backup apps which rely on that to detect hard links.
>>>
>> Why not? Granted, many of the filesystems in the Linux kernel don't enforce that 
>> they have unique st_ino values, but I'm working on a set of patches to try and 
>> fix that.
> 
> That's great and will surely help most file systems (apparently not Coda as
> Jan says they use 128 bit internal file identifiers).
> 
> What about 32 bit architectures? Is ino_t going to be 64 bit
> there too?
> 

Sorry, I should qualify that statement. A lot of filesystems don't have 
permanent i_ino values (mostly pseudo filesystems -- pipefs, sockfs, /proc 
stuff, etc). For those, the idea is to try to make sure we use 32 bit values for 
them and to ensure that they are uniquely assigned. I unfortunately can't do 
much about filesystems that do have permanent inode numbers.

>>> Adding a vfs call to check for file equivalence seems like a good idea to me.
>>> A syscall exposing it to user mode apps can look like what you sketched above,
>>> and another variant of it can maybe take two paths and possibly a flags field
>>> (for e.g. don't follow symlinks).
>>>
>>> I'm cross-posting this also to nfsv4@ietf. NFS has exactly the same problem
>>> with <fsid, fileid> as fileid is 64 bit wide. Although the nfs client can
>>> determine that two filesystem objects are hard linked if they have the same
>>> filehandle, there are cases where two distinct filehandles can still refer to
>>> the same filesystem object.  Letting the nfs client determine file equivalency
>>> based on filehandles will probably satisfy most users but if the exported
>>> fs supports the new call discussed above, exporting it over NFS makes a
>>> lot of sense to me... What do you guys think about adding such an operation
>>> to NFS?
>>>
>> This sounds like a bug to me. It seems like we should have a one to one 
>> correspondence of filehandle -> inode. In what situations would this not be the 
>> case?
> 
> Well, the NFS protocol allows that [see rfc1813, p. 21: "If two file handles from
> the same server are equal, they must refer to the same file, but if they are not
> equal, no conclusions can be drawn."]
> 
> As an example, some file systems encode hint information into the filehandle
> and the hints may change over time, another example is encoding parent
> information into the filehandle and then handles representing hard links
> to the same file from different directories will differ.
> 

Interesting. That does seem to break the method of st_dev/st_ino for finding 
hardlinks. For Linux fileservers I think we generally do have 1:1 correspondence 
so that's not generally an issue.

If we're getting into changing specs, though, I think it would be better to 
change it to enforce a 1:1 filehandle to inode correspondence rather than making 
new NFS ops. That does mean you can't use the filehandle for carrying other 
info, but it seems like there ought to be better mechanisms for that.

-- Jeff

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2006-12-28 15:54                 ` Jeff Layton
@ 2006-12-28 16:26                   ` Jan Engelhardt
  0 siblings, 0 replies; 100+ messages in thread
From: Jan Engelhardt @ 2006-12-28 16:26 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Benny Halevy, Mikulas Patocka, Arjan van de Ven, Jan Harkes,
	Miklos Szeredi, linux-kernel, linux-fsdevel, nfsv4


On Dec 28 2006 10:54, Jeff Layton wrote:
>
> Sorry, I should qualify that statement. A lot of filesystems don't have
> permanent i_ino values (mostly pseudo filesystems -- pipefs, sockfs, /proc
> stuff, etc). For those, the idea is to try to make sure we use 32 bit values
> for them and to ensure that they are uniquely assigned. I unfortunately can't
> do much about filesystems that do have permanent inode numbers.

Anyway, this could probably come in handy for unionfs too.


	-`J'
-- 


* Re: Finding hardlinks
  2006-12-28 10:05             ` Arjan van de Ven
  2006-12-28 15:24               ` Benny Halevy
@ 2006-12-28 18:14               ` Mikulas Patocka
  2006-12-29 10:34                 ` Trond Myklebust
  1 sibling, 1 reply; 100+ messages in thread
From: Mikulas Patocka @ 2006-12-28 18:14 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Benny Halevy, Jan Harkes, Miklos Szeredi, linux-kernel,
	linux-fsdevel, nfsv4

On Thu, 28 Dec 2006, Arjan van de Ven wrote:

>
>> It seems like the posix idea of unique <st_dev, st_ino> doesn't
>> hold water for modern file systems
>
> are you really sure?
> and if so, why don't we fix *THAT* instead, rather than adding racy
> syscalls and such that just can't really be used right...

Why don't you rip out support for colliding inode numbers from the 
kernel entirely (i.e. remove iget5_locked)?

It's reasonable to have either no support for colliding ino_t or full 
support for it (including syscalls that userspace can use to work with 
such filesystems) --- but I don't see any point in having the half-way 
support in the kernel as it is right now.

As for syscall races --- if you pack something with tar and the directory 
changes underneath, you can't expect sane output anyway.

Mikulas


* Re: Finding hardlinks
  2006-12-28 15:12               ` Benny Halevy
  2006-12-28 15:54                 ` Jeff Layton
@ 2006-12-28 18:17                 ` Mikulas Patocka
  2006-12-28 20:07                   ` Halevy, Benny
  2006-12-29 10:12                 ` Trond Myklebust
  2 siblings, 1 reply; 100+ messages in thread
From: Mikulas Patocka @ 2006-12-28 18:17 UTC (permalink / raw)
  To: Benny Halevy
  Cc: Jeff Layton, Arjan van de Ven, Jan Harkes, Miklos Szeredi,
	linux-kernel, linux-fsdevel, nfsv4

>> This sounds like a bug to me. It seems like we should have a one to one
>> correspondence of filehandle -> inode. In what situations would this not be the
>> case?
>
> Well, the NFS protocol allows that [see rfc1813, p. 21: "If two file handles from
> the same server are equal, they must refer to the same file, but if they are not
> equal, no conclusions can be drawn."]
>
> As an example, some file systems encode hint information into the filehandle
> and the hints may change over time, another example is encoding parent
> information into the filehandle and then handles representing hard links
> to the same file from different directories will differ.

BTW, how does (or how should) the NFS client deal with cache coherency if 
filehandles for the same file differ?

Mikulas


* Re: Finding hardlinks
  2006-12-28 15:24               ` Benny Halevy
@ 2006-12-28 19:58                 ` Miklos Szeredi
  2007-01-02 19:15                   ` Pavel Machek
  0 siblings, 1 reply; 100+ messages in thread
From: Miklos Szeredi @ 2006-12-28 19:58 UTC (permalink / raw)
  To: bhalevy; +Cc: arjan, mikulas, jaharkes, linux-kernel, linux-fsdevel, nfsv4

> >> It seems like the posix idea of unique <st_dev, st_ino> doesn't
> >> hold water for modern file systems 
> > 
> > are you really sure?
> 
> Well Jan's example was of Coda that uses 128-bit internal file ids.
> 
> > and if so, why don't we fix *THAT* instead
> 
> Hmm, sometimes you can't fix the world, especially if the filesystem
> is exported over NFS and has a problem with fitting its file IDs uniquely
> into a 64-bit identifier.

Note, it's pretty easy to fit _anything_ into a 64-bit identifier with
the use of a good hash function.  The chance of an accidental
collision is infinitesimally small.  For a set of 

         100 files: 0.00000000000003%
   1,000,000 files: 0.000003%

And the usual tools (tar, diff, cp -a, etc.) work with a very limited set
of st_ino values.  An app that stored a million st_ino values and compared
each new one against all the existing ones would have severe performance
problems and yet would _almost never_ come across a false positive.
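
A quick sanity check of those figures --- the birthday bound for n uniformly
random 64-bit hashes, in the small-probability approximation:

```python
# P(at least one collision among n random `bits`-bit values) ~ n(n-1)/2 / 2**bits
# for small probabilities; this reproduces the percentages quoted above.
def collision_probability(n, bits=64):
    return n * (n - 1) / 2 / 2**bits

for n in (100, 1_000_000):
    p = collision_probability(n)
    print(f"{n:>9} files: {p * 100:.2e} %")
```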

Miklos


* RE: Finding hardlinks
  2006-12-28 18:17                 ` Mikulas Patocka
@ 2006-12-28 20:07                   ` Halevy, Benny
  2006-12-29 10:28                     ` [nfsv4] " Trond Myklebust
  0 siblings, 1 reply; 100+ messages in thread
From: Halevy, Benny @ 2006-12-28 20:07 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Jeff Layton, Arjan van de Ven, Jan Harkes, Miklos Szeredi,
	linux-kernel, linux-fsdevel, nfsv4

Mikulas Patocka wrote:
> 
>>> This sounds like a bug to me. It seems like we should have a one to one
>>> correspondence of filehandle -> inode. In what situations would this not be the
>>> case?
>>
>> Well, the NFS protocol allows that [see rfc1813, p. 21: "If two file handles from
>> the same server are equal, they must refer to the same file, but if they are not
>> equal, no conclusions can be drawn."]
>>
>> As an example, some file systems encode hint information into the filehandle
>> and the hints may change over time, another example is encoding parent
>> information into the filehandle and then handles representing hard links
>> to the same file from different directories will differ.
>
>BTW. how does (or how should?) NFS client deal with cache coherency if 
>filehandles for the same file differ?
>

Trond can probably answer this better than me...
As I read it, currently the nfs client matches both the fileid and the
filehandle (in nfs_find_actor). This means that different filehandles
for the same file would result in different inodes :(.
Strictly following the nfs protocol, comparing only the fileid should
be enough IF fileids are indeed unique within the filesystem.
Comparing the filehandle works as a workaround when the exported filesystem
(or the nfs server) violates that.  From a user standpoint I think that
this should be configurable, probably per mount point.

>Mikulas
>


* Re: Finding hardlinks
  2006-12-23 14:00         ` Mikulas Patocka
  2006-12-28  9:06           ` Benny Halevy
@ 2006-12-29 10:02           ` Pavel Machek
  2007-01-01 22:47             ` Mikulas Patocka
  1 sibling, 1 reply; 100+ messages in thread
From: Pavel Machek @ 2006-12-29 10:02 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Arjan van de Ven, Jan Harkes, Miklos Szeredi, linux-kernel,
	linux-fsdevel

Hi!

> >>If user (or script) doesn't specify that flag, it 
> >>doesn't help. I think
> >>the best solution for these filesystems would be 
> >>either to add new syscall
> >> 	int is_hardlink(char *filename1, char *filename2)
> >>(but I know adding syscall bloat may be objectionable)
> >
> >it's also the wrong api; the filenames may have been 
> >changed under you
> >just as you return from this call, so it really is a
> >"was_hardlink_at_some_point()" as you specify it.
> >If you make it work on fd's.. it has a chance at least.
> 
> Yes, but it doesn't matter --- if the tree changes under 
> "cp -a" command, no one guarantees you what you get.
> 	int fis_hardlink(int handle1, int handle 2);
> Is another possibility but it can't detect hardlinked 
> symlinks.

Ugh. Is it even legal to hardlink symlinks?

Anyway, cp -a is not the only application that wants to do hardlink
detection.
						Pavel
-- 
Thanks for all the (sleeping) penguins.


* Re: Finding hardlinks
  2006-12-28 15:12               ` Benny Halevy
  2006-12-28 15:54                 ` Jeff Layton
  2006-12-28 18:17                 ` Mikulas Patocka
@ 2006-12-29 10:12                 ` Trond Myklebust
  2006-12-31 21:19                   ` Halevy, Benny
  2 siblings, 1 reply; 100+ messages in thread
From: Trond Myklebust @ 2006-12-29 10:12 UTC (permalink / raw)
  To: Benny Halevy
  Cc: Jeff Layton, Mikulas Patocka, Arjan van de Ven, Jan Harkes,
	Miklos Szeredi, linux-kernel, linux-fsdevel, nfsv4

On Thu, 2006-12-28 at 17:12 +0200, Benny Halevy wrote:

> As an example, some file systems encode hint information into the filehandle
> and the hints may change over time, another example is encoding parent
> information into the filehandle and then handles representing hard links
> to the same file from different directories will differ.

Both these examples are bogus. Filehandle information should not change
over time (except in the special case of NFSv4 "volatile filehandles")
and they should definitely not encode parent directory information that
can change over time (think rename()!).

Cheers
  Trond



* Re: [nfsv4] RE: Finding hardlinks
  2006-12-28 20:07                   ` Halevy, Benny
@ 2006-12-29 10:28                     ` Trond Myklebust
  2006-12-31 21:25                       ` Halevy, Benny
  0 siblings, 1 reply; 100+ messages in thread
From: Trond Myklebust @ 2006-12-29 10:28 UTC (permalink / raw)
  To: Halevy, Benny
  Cc: Mikulas Patocka, Jan Harkes, Miklos Szeredi, linux-kernel, nfsv4,
	linux-fsdevel, Jeff Layton, Arjan van de Ven

On Thu, 2006-12-28 at 15:07 -0500, Halevy, Benny wrote:
> Mikulas Patocka wrote:

> >BTW. how does (or how should?) NFS client deal with cache coherency if 
> >filehandles for the same file differ?
> >
> 
> Trond can probably answer this better than me...
> As I read it, currently the nfs client matches both the fileid and the
> filehandle (in nfs_find_actor). This means that different filehandles
> for the same file would result in different inodes :(.
> Strictly following the nfs protocol, comparing only the fileid should
> be enough IF fileids are indeed unique within the filesystem.
> Comparing the filehandle works as a workaround when the exported filesystem
> (or the nfs server) violates that.  From a user stand point I think that
> this should be configurable, probably per mount point.

Matching files by fileid instead of filehandle is a lot more trouble
since fileids may be reused after a file has been deleted. Every time
you look up a file, and get a new filehandle for the same fileid, you
would at the very least have to do another GETATTR using one of the
'old' filehandles in order to ensure that the file is the same object as
the one you have cached. Then there is the issue of what to do when you
open(), read() or write() to the file: which filehandle do you use, are
the access permissions the same for all filehandles, ...

All in all, much pain for little or no gain.

Most servers therefore take great pains to ensure that clients can use
filehandles to identify inodes. The exceptions tend to be broken in
other ways (Note: knfsd without the no_subtree_check option is one of
these exceptions - it can break in the case of cross-directory renames).

Cheers,
  Trond



* Re: Finding hardlinks
  2006-12-28 18:14               ` Mikulas Patocka
@ 2006-12-29 10:34                 ` Trond Myklebust
  2006-12-30  1:04                   ` Mikulas Patocka
  0 siblings, 1 reply; 100+ messages in thread
From: Trond Myklebust @ 2006-12-29 10:34 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Arjan van de Ven, Benny Halevy, Jan Harkes, Miklos Szeredi,
	linux-kernel, linux-fsdevel, nfsv4

On Thu, 2006-12-28 at 19:14 +0100, Mikulas Patocka wrote:
> Why don't you rip off the support for colliding inode number from the 
> kernel at all (i.e. remove iget5_locked)?
> 
> It's reasonable to have either no support for colliding ino_t or full 
> support for that (including syscalls that userspace can use to work with 
> such filesystem) --- but I don't see any point in having half-way support 
> in kernel as is right now.

What would ino_t have to do with inode numbers? It is only used as a
hash table lookup. The inode number is set in the ->getattr() callback.

Cheers
  Trond



* Re: Finding hardlinks
  2006-12-29 10:34                 ` Trond Myklebust
@ 2006-12-30  1:04                   ` Mikulas Patocka
  2007-01-01  2:30                     ` Nikita Danilov
  2007-01-02 23:14                     ` Trond Myklebust
  0 siblings, 2 replies; 100+ messages in thread
From: Mikulas Patocka @ 2006-12-30  1:04 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Arjan van de Ven, Benny Halevy, Jan Harkes, Miklos Szeredi,
	linux-kernel, linux-fsdevel, nfsv4



On Fri, 29 Dec 2006, Trond Myklebust wrote:

> On Thu, 2006-12-28 at 19:14 +0100, Mikulas Patocka wrote:
>> Why don't you rip off the support for colliding inode number from the
>> kernel at all (i.e. remove iget5_locked)?
>>
>> It's reasonable to have either no support for colliding ino_t or full
>> support for that (including syscalls that userspace can use to work with
>> such filesystem) --- but I don't see any point in having half-way support
>> in kernel as is right now.
>
> What would ino_t have to do with inode numbers? It is only used as a
> hash table lookup. The inode number is set in the ->getattr() callback.

The question is: why does the kernel contain the iget5 function, which looks 
up inodes via a callback, if a filesystem cannot have an inode identifier 
wider than 64 bits?

This lookup callback just invites writing bad filesystems with colliding 
inode numbers. Either remove coda, smb (and possibly other) filesystems 
from the kernel or add proper userspace support for them.

The situation is that current coreutils 6.7 fail to recursively copy 
directories if some two directories in the tree have colliding inode 
numbers, so you get random data corruption with these filesystems.

Mikulas


* Re: Finding hardlinks
  2006-12-20 19:54       ` Al Viro
  2006-12-20 20:12         ` Mikulas Patocka
@ 2006-12-31 15:02         ` Mikulas Patocka
  1 sibling, 0 replies; 100+ messages in thread
From: Mikulas Patocka @ 2006-12-31 15:02 UTC (permalink / raw)
  To: Al Viro; +Cc: Miklos Szeredi, linux-kernel, linux-fsdevel

On Wed, 20 Dec 2006, Al Viro wrote:

> On Wed, Dec 20, 2006 at 05:50:11PM +0100, Miklos Szeredi wrote:
>> I don't see any problems with changing struct kstat.  There would be
>> reservations against changing inode.i_ino though.
>>
>> So filesystems that have 64bit inodes will need a specialized
>> getattr() method instead of generic_fillattr().
>
> And they are already free to do so.  And no, struct kstat doesn't need
> to be changed - it has u64 ino already.

If I return 64-bit values as ino_t, 32-bit programs will get EOVERFLOW on a 
stat attempt (even if they are not going to use st_ino in any way) --- I 
know that POSIX specifies it, but the question is whether it is useful.

What is the correct solution? A mount option that can differentiate between 
32-bit colliding inode numbers and 64-bit non-colliding inode numbers? Or 
is there a better idea?

Given the fact that glibc compiles everything by default with 32-bit ino_t, 
I wonder if returning a 64-bit inode number is possible at all.

Mikulas


* RE: Finding hardlinks
  2006-12-29 10:12                 ` Trond Myklebust
@ 2006-12-31 21:19                   ` Halevy, Benny
  2007-01-02 23:20                     ` Trond Myklebust
  2007-01-02 23:46                     ` Trond Myklebust
  0 siblings, 2 replies; 100+ messages in thread
From: Halevy, Benny @ 2006-12-31 21:19 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Jeff Layton, Mikulas Patocka, Arjan van de Ven, Jan Harkes,
	Miklos Szeredi, linux-kernel, linux-fsdevel, nfsv4

Trond Myklebust wrote:
>  
> On Thu, 2006-12-28 at 17:12 +0200, Benny Halevy wrote:
> 
> > As an example, some file systems encode hint information into the filehandle
> > and the hints may change over time, another example is encoding parent
> > information into the filehandle and then handles representing hard links
> > to the same file from different directories will differ.
> 
> Both these examples are bogus. Filehandle information should not change
> over time (except in the special case of NFSv4 "volatile filehandles")
> and they should definitely not encode parent directory information that
> can change over time (think rename()!).
> 
> Cheers
>   Trond
> 

The first one is a real-life example.  Hints in the filehandle change
over time.  The old filehandles are valid indefinitely, until the file
is deleted, and hence can be considered permanent and not volatile.
What you say above, however, contradicts the NFS protocol as I understand
it. Here's some relevant text from the latest NFSv4.1 draft (the text in
4.2.1 below exists in a similar form also in NFSv3, rfc1813)

| 4.2.1.  General Properties of a Filehandle
| ...
| If two filehandles from the same server are equal, they MUST refer to
| the same file. Servers SHOULD try to maintain a one-to-one correspondence
| between filehandles and files but this is not required. Clients MUST use
| filehandle comparisons only to improve performance, not for correct behavior.
| All clients need to be prepared for situations in which it cannot be
| determined whether two filehandles denote the same object and in such cases,
| avoid making invalid assumptions which might cause incorrect behavior.
| Further discussion of filehandle and attribute comparison in the context of
| data caching is presented in the section "Data Caching and File Identity".
...
| 9.3.4.  Data Caching and File Identity
| ...
|  When clients cache data, the file data needs to be organized according to
| the file system object to which the data belongs. For NFS version 3 clients,
| the typical practice has been to assume for the purpose of caching that
| distinct filehandles represent distinct file system objects. The client then
| has the choice to organize and maintain the data cache on this basis.
|
| In the NFS version 4 protocol, there is now the possibility to have
| significant deviations from a "one filehandle per object" model because a
| filehandle may be constructed on the basis of the object's pathname.
| Therefore, clients need a reliable method to determine if two filehandles
| designate the same file system object. If clients were simply to assume that
| all distinct filehandles denote distinct objects and proceed to do data
| caching on this basis, caching inconsistencies would arise between the
| distinct client side objects which mapped to the same server side object.
|
| By providing a method to differentiate filehandles, the NFS version 4
| protocol alleviates a potential functional regression in comparison with the
| NFS version 3 protocol. Without this method, caching inconsistencies within
| the same client could occur and this has not been present in previous
| versions of the NFS protocol. Note that it is possible to have such
| inconsistencies with applications executing on multiple clients but that is
| not the issue being addressed here.
|
| For the purposes of data caching, the following steps allow an NFS version 4
| client to determine whether two distinct filehandles denote the same server
| side object:
|
|     * If GETATTR directed to two filehandles returns different values of the
|       fsid attribute, then the filehandles represent distinct objects.
|     * If GETATTR for any file with an fsid that matches the fsid of the two
|       filehandles in question returns a unique_handles attribute with a value
|       of TRUE, then the two objects are distinct.
|     * If GETATTR directed to the two filehandles does not return the fileid
|       attribute for both of the handles, then it cannot be determined whether
|       the two objects are the same. Therefore, operations which depend on that
|       knowledge (e.g. client side data caching) cannot be done reliably.
|     * If GETATTR directed to the two filehandles returns different values for
|       the fileid attribute, then they are distinct objects.
|     * Otherwise they are the same object.

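Those steps translate fairly directly into code. A sketch, for two distinct
filehandles on the same server (GETATTR results modeled as plain dicts; the
attribute names fsid, fileid and unique_handles follow the draft text,
everything else is invented for illustration):

```python
# Decision procedure from the draft: given the GETATTR results for two
# distinct filehandles, decide whether they denote the same object.
def same_object(attrs1, attrs2, unique_handles):
    """Return True/False when determinable, None when it cannot be determined."""
    if attrs1["fsid"] != attrs2["fsid"]:
        return False        # different fsid => distinct objects
    if unique_handles:
        return False        # distinct handles on this fs => distinct objects
    if "fileid" not in attrs1 or "fileid" not in attrs2:
        return None         # client-side data caching cannot be done reliably
    if attrs1["fileid"] != attrs2["fileid"]:
        return False        # different fileids => distinct objects
    return True             # otherwise they are the same object
```
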
Even for NFSv3 (which doesn't have the unique_handles attribute), I think
that the Linux nfs client can do a better job.  If you had a filehandle
cache that points at inodes you could maintain a many to one relationship
from multiple filehandles into one inode.  When you discover a new filehandle
you can look up the inode cache for the same fileid and if one is found you
can do a getattr on the old filehandle (without loss of generality you should 
always use the latest filehandle that was returned for that filesystem object,
although any filehandle that refers to it can be used).
If the getattr succeeded then the filehandles refer to the same fs object and
you can create a new entry in the filehandle cache pointing at that inode.
Otherwise, if getattr says that the old filehandle is stale I think you should
mark the inode as stale and keep it around so that applications can get an
appropriate error until last close, before you clean up the fh cache from the
stale filehandles. A new inode structure should be created for the new filehandle.
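
A sketch of that many-to-one filehandle cache (getattr_fh stands in for an
NFS GETATTR round trip, Stale for an ESTALE-style reply; all names are
illustrative, not the actual client code):

```python
# Many filehandles map to one inode; fileids are used to discover the
# mapping, and a GETATTR on the old filehandle confirms it.
class Stale(Exception):
    pass

class Inode:
    def __init__(self, fh, fileid):
        self.fh = fh              # latest known filehandle for this object
        self.fileid = fileid
        self.stale = False

class FhCache:
    def __init__(self, getattr_fh):
        self.getattr_fh = getattr_fh   # fh -> attrs, raises Stale if invalid
        self.by_fh = {}                # filehandle -> inode (many-to-one)
        self.by_fileid = {}            # fileid -> inode

    def lookup(self, fh, fileid):
        inode = self.by_fh.get(fh)
        if inode is not None:
            return inode
        old = self.by_fileid.get(fileid)
        if old is not None and not old.stale:
            try:
                self.getattr_fh(old.fh)    # old fh still valid => same object
                self.by_fh[fh] = old
                old.fh = fh                # prefer the latest filehandle
                return old
            except Stale:
                old.stale = True           # keep around until last close
        inode = Inode(fh, fileid)          # new inode for the new filehandle
        self.by_fh[fh] = inode
        self.by_fileid[fileid] = inode
        return inode
```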

Benny


* RE: [nfsv4] RE: Finding hardlinks
  2006-12-29 10:28                     ` [nfsv4] " Trond Myklebust
@ 2006-12-31 21:25                       ` Halevy, Benny
  2007-01-02 23:21                         ` Trond Myklebust
  0 siblings, 1 reply; 100+ messages in thread
From: Halevy, Benny @ 2006-12-31 21:25 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Mikulas Patocka, Jan Harkes, Miklos Szeredi, linux-kernel, nfsv4,
	linux-fsdevel, Jeff Layton, Arjan van de Ven

Trond Myklebust wrote:
>  
> On Thu, 2006-12-28 at 15:07 -0500, Halevy, Benny wrote:
> > Mikulas Patocka wrote:
> 
> > >BTW. how does (or how should?) NFS client deal with cache coherency if 
> > >filehandles for the same file differ?
> > >
> > 
> > Trond can probably answer this better than me...
> > As I read it, currently the nfs client matches both the fileid and the
> > filehandle (in nfs_find_actor). This means that different filehandles
> > for the same file would result in different inodes :(.
> > Strictly following the nfs protocol, comparing only the fileid should
> > be enough IF fileids are indeed unique within the filesystem.
> > Comparing the filehandle works as a workaround when the exported filesystem
> > (or the nfs server) violates that.  From a user stand point I think that
> > this should be configurable, probably per mount point.
> 
> Matching files by fileid instead of filehandle is a lot more trouble
> since fileids may be reused after a file has been deleted. Every time
> you look up a file, and get a new filehandle for the same fileid, you
> would at the very least have to do another GETATTR using one of the
> 'old' filehandles in order to ensure that the file is the same object as
> the one you have cached. Then there is the issue of what to do when you
> open(), read() or write() to the file: which filehandle do you use, are
> the access permissions the same for all filehandles, ...
> 
> All in all, much pain for little or no gain.

See my answer to your previous reply.  It seems like the current
implementation is in violation of the nfs protocol and the extra pain
is required.

> 
> Most servers therefore take great pains to ensure that clients can use
> filehandles to identify inodes. The exceptions tend to be broken in
> other ways

This may be true in Linux, but not necessarily for non-Linux based nfs
servers.

> (Note: knfsd without the no_subtree_check option is one of
> these exceptions - it can break in the case of cross-directory renames).
> 
> Cheers,
>   Trond




* Re: Finding hardlinks
  2006-12-30  1:04                   ` Mikulas Patocka
@ 2007-01-01  2:30                     ` Nikita Danilov
  2007-01-01 22:58                       ` Mikulas Patocka
  2007-01-02 23:14                     ` Trond Myklebust
  1 sibling, 1 reply; 100+ messages in thread
From: Nikita Danilov @ 2007-01-01  2:30 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Arjan van de Ven, Benny Halevy, Jan Harkes, Miklos Szeredi,
	linux-kernel, linux-fsdevel, nfsv4

Mikulas Patocka writes:
 > 
 > 
 > On Fri, 29 Dec 2006, Trond Myklebust wrote:
 > 
 > > On Thu, 2006-12-28 at 19:14 +0100, Mikulas Patocka wrote:
 > >> Why don't you rip off the support for colliding inode number from the
 > >> kernel at all (i.e. remove iget5_locked)?
 > >>
 > >> It's reasonable to have either no support for colliding ino_t or full
 > >> support for that (including syscalls that userspace can use to work with
 > >> such filesystem) --- but I don't see any point in having half-way support
 > >> in kernel as is right now.
 > >
 > > What would ino_t have to do with inode numbers? It is only used as a
 > > hash table lookup. The inode number is set in the ->getattr() callback.
 > 
 > The question is: why does the kernel contain iget5 function that looks up 
 > according to callback, if the filesystem cannot have more than 64-bit 
 > inode identifier?

Generally speaking, a file system might have two different identifiers for
files:

 - one that makes it easy to tell whether two files are the same one;

 - one that makes it easy to locate file on the storage.

According to POSIX, the inode number should always work as an identifier of
the first class, but not necessarily as one of the second. For example, in
reiserfs something called "a key" is used to locate the on-disk inode, which,
in turn, contains the inode number. Identifiers of the second class tend to
live in directory entries, and during lookup we want to consult the inode
cache _before_ reading the inode from the disk (otherwise the cache is mostly
useless), right? This means that some file systems want to index inodes
in a cache by something different than the inode number.
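
A sketch of what such a cache looks like --- indexed by an opaque,
filesystem-chosen key and consulted before the disk, which is what
iget5_locked enables in the kernel (read_inode stands in for the slow
storage path; names are illustrative):

```python
# Inode cache keyed by an opaque key (e.g. a reiserfs-style "key")
# rather than by inode number.
class InodeCache:
    def __init__(self, read_inode):
        self.read_inode = read_inode   # key -> inode, the slow disk path
        self._by_key = {}              # opaque key -> cached inode

    def iget5(self, key):
        # Consult the cache *before* touching storage, as during lookup.
        inode = self._by_key.get(key)
        if inode is None:
            inode = self.read_inode(key)
            self._by_key[key] = inode
        return inode
```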

There is another reason, why I, personally, would like to have an
ability to index inodes by things other than inode numbers: delayed
inode number allocation. Strictly speaking, file system has to assign
inode number to the file only when it is just about to report it to the
user space (either though stat, or, ugh... readdir). If location of
inode on disk depends on its inode number (like it is in inode-table
based file systems like ext[23]) then delayed inode number allocation
has to same advantages as delayed block allocation.

 > 
 > This lookup callback just induces writing bad filesystems with coliding 
 > inode numbers. Either remove coda, smb (and possibly other) filesystems 
 > from the kernel or make a proper support for userspace for them.
 > 
 > The situation is that current coreutils 6.7 fail to recursively copy 
 > directories if some two directories in the tree have coliding inode 
 > number, so you get random data corruption with these filesystems.
 > 
 > Mikulas

Nikita.



* Re: Finding hardlinks
  2006-12-29 10:02           ` Pavel Machek
@ 2007-01-01 22:47             ` Mikulas Patocka
  2007-01-01 23:53               ` Jan Harkes
  0 siblings, 1 reply; 100+ messages in thread
From: Mikulas Patocka @ 2007-01-01 22:47 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Arjan van de Ven, Jan Harkes, Miklos Szeredi, linux-kernel,
	linux-fsdevel

Hi!

>>>> If user (or script) doesn't specify that flag, it
>>>> doesn't help. I think
>>>> the best solution for these filesystems would be
>>>> either to add new syscall
>>>> 	int is_hardlink(char *filename1, char *filename2)
>>>> (but I know adding syscall bloat may be objectionable)
>>>
>>> it's also the wrong api; the filenames may have been
>>> changed under you
>>> just as you return from this call, so it really is a
>>> "was_hardlink_at_some_point()" as you specify it.
>>> If you make it work on fd's.. it has a chance at least.
>>
>> Yes, but it doesn't matter --- if the tree changes under
>> "cp -a" command, no one guarantees you what you get.
>> 	int fis_hardlink(int handle1, int handle 2);
>> Is another possibility but it can't detect hardlinked
>> symlinks.
>
> Ugh. Is it even legal to hardlink symlinks?

Why shouldn't it be? It seems to work just fine in Linux.

> Anyway, cp -a is not the only application that wants to do hardlink
> detection.

I tested programs for ino_t collisions (I intentionally injected them) and 
found that cp from coreutils 6.7 fails to copy directories but displays 
error messages (coreutils 5 works fine). mc and arj skip directories with 
colliding ino_t and pretend that the operation completed successfully. The 
FTS library fails to walk directories, returning the FTS_DC error. Diffutils, 
find and grep fail to search directories with colliding inode numbers. Tar 
seems tolerant, except for incremental backup (which I didn't try). All 
programs except diff were tolerant of colliding ino_t on files.

ino_t is no longer unique in many filesystems; this looks like a quite 
serious data-corruption possibility.
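
A sketch of the (st_dev, st_ino) bookkeeping these tools do, which shows
exactly why a collision corrupts the copy (a simplified model, not the
actual coreutils code):

```python
# cp -a / tar style hardlink detection: any two paths with st_nlink > 1
# and the same (st_dev, st_ino) key are treated as one file -- so two
# unrelated files with colliding inode numbers get "hardlinked" wrongly.
import os

def find_hardlink_groups(paths):
    seen = {}     # (st_dev, st_ino) -> first path seen with that identity
    links = []    # (original, duplicate) pairs the tool would hardlink
    for p in paths:
        st = os.lstat(p)
        if st.st_nlink > 1:
            key = (st.st_dev, st.st_ino)
            if key in seen:
                links.append((seen[key], p))   # assumed to be the same file
                continue
            seen[key] = p
        # ...otherwise the file contents would be copied normally...
    return links
```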

Mikulas

> 						Pavel


* Re: Finding hardlinks
  2007-01-01  2:30                     ` Nikita Danilov
@ 2007-01-01 22:58                       ` Mikulas Patocka
  2007-01-01 23:05                         ` Nikita Danilov
  0 siblings, 1 reply; 100+ messages in thread
From: Mikulas Patocka @ 2007-01-01 22:58 UTC (permalink / raw)
  To: Nikita Danilov
  Cc: Arjan van de Ven, Benny Halevy, Jan Harkes, Miklos Szeredi,
	linux-kernel, linux-fsdevel, nfsv4

> > The question is: why does the kernel contain iget5 function that looks up
> > according to callback, if the filesystem cannot have more than 64-bit
> > inode identifier?
>
> Generally speaking, file system might have two different identifiers for
> files:
>
> - one that makes it easy to tell whether two files are the same one;
>
> - one that makes it easy to locate file on the storage.
>
> According to POSIX, inode number should always work as identifier of the
> first class, but not necessary as one of the second. For example, in
> reiserfs something called "a key" is used to locate on-disk inode, which
> in turn, contains inode number. Identifiers of the second class tend to

BTW. How does ReiserFS find that a given inode number (or object ID in 
ReiserFS terminology) is free before assigning it to a new file/directory?

Mikulas

> live in directory entries, and during lookup we want to consult inode
> cache _before_ reading inode from the disk (otherwise cache is mostly
> useless), right? This means that some file systems want to index inodes
> in a cache by something different than inode number.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-01 22:58                       ` Mikulas Patocka
@ 2007-01-01 23:05                         ` Nikita Danilov
  2007-01-01 23:22                           ` Mikulas Patocka
  0 siblings, 1 reply; 100+ messages in thread
From: Nikita Danilov @ 2007-01-01 23:05 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Arjan van de Ven, Benny Halevy, Jan Harkes, Miklos Szeredi,
	linux-kernel, linux-fsdevel, nfsv4

Mikulas Patocka writes:

[...]

 > 
 > BTW. How does ReiserFS find that a given inode number (or object ID in 
 > ReiserFS terminology) is free before assigning it to a new file/directory?

reiserfs v3 has an extent map of free object identifiers in the
super-block. reiser4 used 64-bit object identifiers without reuse.

 > 
 > Mikulas

Nikita.


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-01 23:05                         ` Nikita Danilov
@ 2007-01-01 23:22                           ` Mikulas Patocka
  2007-01-04 13:59                             ` Nikita Danilov
  0 siblings, 1 reply; 100+ messages in thread
From: Mikulas Patocka @ 2007-01-01 23:22 UTC (permalink / raw)
  To: Nikita Danilov
  Cc: Arjan van de Ven, Benny Halevy, Jan Harkes, Miklos Szeredi,
	linux-kernel, linux-fsdevel, nfsv4

> > BTW. How does ReiserFS find that a given inode number (or object ID in
> > ReiserFS terminology) is free before assigning it to a new file/directory?
>
> reiserfs v3 has an extent map of free object identifiers in the
> super-block.

The free inode number space can have at most 2^31 extents --- if inode 
numbers alternate between "allocated" and "free". How do you pack that 
into the super-block?

> reiser4 used 64-bit object identifiers without reuse.

So you are going to hit the same problem as I did with SpadFS --- you 
can't export a 64-bit inode number to userspace (programs built without 
-D_FILE_OFFSET_BITS=64 will then have stat() randomly failing with 
EOVERFLOW), and if you export only a 32-bit number, it will eventually 
wrap around, and colliding st_ino will cause data corruption with many 
userspace programs.

Mikulas

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-01 22:47             ` Mikulas Patocka
@ 2007-01-01 23:53               ` Jan Harkes
  2007-01-02  0:04                 ` Mikulas Patocka
  0 siblings, 1 reply; 100+ messages in thread
From: Jan Harkes @ 2007-01-01 23:53 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Pavel Machek, Arjan van de Ven, Miklos Szeredi, linux-kernel,
	linux-fsdevel

On Mon, Jan 01, 2007 at 11:47:06PM +0100, Mikulas Patocka wrote:
> >Anyway, cp -a is not the only application that wants to do hardlink
> >detection.
> 
> I tested programs for ino_t collisions (I intentionally injected them) and 
> found that CP from coreutils 6.7 fails to copy directories but displays 
> error messages (coreutils 5 works fine). MC and ARJ skip directories with 
> colliding ino_t and pretend that the operation completed successfully. The 
> FTS library fails to walk directories, returning the FTS_DC error. 
> Diffutils, find and grep fail to search directories with colliding inode 
> numbers. Tar seems tolerant except for incremental backup (which I didn't 
> try). All programs except diff were tolerant of colliding ino_t on files.

Thanks for testing so many programs, but... did the files/symlinks with
colliding inode numbers have i_nlink > 1? Or did you also have directories
with colliding inode numbers? It looks like you've introduced hardlinked
directories in your test, which are definitely not supported; in fact, this
will probably cause not only issues for userspace programs, but also
locking and garbage collection issues in the kernel's dcache.

I'm surprised you're seeing so many problems. The only find problem that
I am aware of is the one where it assumes that there will be only
i_nlink-2 subdirectories in a given directory; this optimization can be
disabled with -noleaf. The only problems I've encountered with ino_t
collisions are archivers and other programs that recursively try to copy
a tree while preserving hardlinks. And in all cases these seem to have
no problem with such collisions as long as i_nlink == 1.

Jan

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-01 23:53               ` Jan Harkes
@ 2007-01-02  0:04                 ` Mikulas Patocka
  2007-01-03 18:58                   ` Frank van Maarseveen
  0 siblings, 1 reply; 100+ messages in thread
From: Mikulas Patocka @ 2007-01-02  0:04 UTC (permalink / raw)
  To: Jan Harkes
  Cc: Pavel Machek, Arjan van de Ven, Miklos Szeredi, linux-kernel,
	linux-fsdevel

On Mon, 1 Jan 2007, Jan Harkes wrote:

> On Mon, Jan 01, 2007 at 11:47:06PM +0100, Mikulas Patocka wrote:
>>> Anyway, cp -a is not the only application that wants to do hardlink
>>> detection.
>>
>> I tested programs for ino_t collisions (I intentionally injected them) and
>> found that CP from coreutils 6.7 fails to copy directories but displays
>> error messages (coreutils 5 works fine). MC and ARJ skip directories with
>> colliding ino_t and pretend that the operation completed successfully. The
>> FTS library fails to walk directories, returning the FTS_DC error.
>> Diffutils, find and grep fail to search directories with colliding inode
>> numbers. Tar seems tolerant except for incremental backup (which I didn't
>> try). All programs except diff were tolerant of colliding ino_t on files.
>
> Thanks for testing so many programs, but... did the files/symlinks with
> colliding inode number have i_nlink > 1? Or did you also have directories
> with colliding inode numbers. It looks like you've introduced hardlinked
> directories in your test which are definitely not supported, in fact it
> will probably cause not only issues for userspace programs, but also
> locking and garbage collection issues in the kernel's dcache.

I tested it only on files without hardlinks (with i_nlink == 1) --- most 
programs (except diff) are tolerant of collisions; they won't store st_ino 
in memory unless i_nlink > 1.

I didn't hardlink directories; I just patched stat, lstat and fstat to 
always return st_ino == 0 --- and I've seen those failures. These failures 
are going to happen on non-POSIX filesystems in the real world too, albeit 
very rarely.

BTW, POSIX supports (optionally) hardlinked directories but doesn't 
support colliding st_ino --- so programs act according to POSIX --- but 
the problem is that this POSIX requirement no longer represents the 
real-world situation.

> I'm surprised you're seeing so many problems. The only find problem that
> I am aware of is the one where it assumes that there will be only
> i_nlink-2 subdirectories in a given directory, this optimization can be
> disabled with -noleaf.

This is not a bug but a feature. If a filesystem doesn't count 
subdirectories, it should set the directory's i_nlink to 1 and find will 
be OK.

> The only problems I've encountered with ino_t collisions are archivers 
> and other programs that recursively try to copy a tree while preserving 
> hardlinks. And in all cases these seem to have no problem with such 
> collisions as long as i_nlink == 1.

Yes, but they have big problems with directory ino_t collisions. They 
think that directories are hardlinked and skip processing them.

Mikulas

> Jan
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2006-12-28 19:58                 ` Miklos Szeredi
@ 2007-01-02 19:15                   ` Pavel Machek
  2007-01-02 20:41                     ` Miklos Szeredi
  0 siblings, 1 reply; 100+ messages in thread
From: Pavel Machek @ 2007-01-02 19:15 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: bhalevy, arjan, mikulas, jaharkes, linux-kernel, linux-fsdevel, nfsv4

Hi!

> > >> It seems like the posix idea of unique <st_dev, st_ino> doesn't
> > >> hold water for modern file systems 
> > > 
> > > are you really sure?
> > 
> > Well Jan's example was of Coda that uses 128-bit internal file ids.
> > 
> > > and if so, why don't we fix *THAT* instead
> > 
> > Hmm, sometimes you can't fix the world, especially if the filesystem
> > is exported over NFS and has a problem with fitting its file IDs uniquely
> > into a 64-bit identifier.
> 
> Note, it's pretty easy to fit _anything_ into a 64-bit identifier with
> the use of a good hash function.  The chance of an accidental
> collision is infinitesimally small.  For a set of 
> 
>          100 files: 0.00000000000003%
>    1,000,000 files: 0.000003%

I do not think we want to play with probability like this. I mean...
imagine 4G files, 1KB each. That's 4TB disk space, not _completely_
unreasonable, and collision probability is going to be ~100% due to
birthday paradox.

You'll still want to back up your 4TB server...

							Pavel
-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-02 19:15                   ` Pavel Machek
@ 2007-01-02 20:41                     ` Miklos Szeredi
  2007-01-02 20:50                       ` Mikulas Patocka
  2007-01-03 11:56                       ` Pavel Machek
  0 siblings, 2 replies; 100+ messages in thread
From: Miklos Szeredi @ 2007-01-02 20:41 UTC (permalink / raw)
  To: pavel
  Cc: bhalevy, arjan, mikulas, jaharkes, linux-kernel, linux-fsdevel, nfsv4

> > > >> It seems like the posix idea of unique <st_dev, st_ino> doesn't
> > > >> hold water for modern file systems 
> > > > 
> > > > are you really sure?
> > > 
> > > Well Jan's example was of Coda that uses 128-bit internal file ids.
> > > 
> > > > and if so, why don't we fix *THAT* instead
> > > 
> > > Hmm, sometimes you can't fix the world, especially if the filesystem
> > > is exported over NFS and has a problem with fitting its file IDs uniquely
> > > into a 64-bit identifier.
> > 
> > Note, it's pretty easy to fit _anything_ into a 64-bit identifier with
> > the use of a good hash function.  The chance of an accidental
> > collision is infinitesimally small.  For a set of 
> > 
> >          100 files: 0.00000000000003%
> >    1,000,000 files: 0.000003%
> 
> I do not think we want to play with probability like this. I mean...
> imagine 4G files, 1KB each. That's 4TB disk space, not _completely_
> unreasonable, and collision probability is going to be ~100% due to
> birthday paradox.
> 
> You'll still want to back up your 4TB server...

Certainly, but tar isn't going to remember all the inode numbers.
Even if you solve the storage requirements (not impossible) it would
have to do (4e9^2)/2=8e18 comparisons, which computers don't have
enough CPU power just yet.

It doesn't matter if there are collisions within the filesystem, as
long as there are no collisions between the set of files an
application is working on at the same time.

Miklos

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-02 20:41                     ` Miklos Szeredi
@ 2007-01-02 20:50                       ` Mikulas Patocka
  2007-01-02 21:10                         ` Miklos Szeredi
  2007-01-03 11:56                       ` Pavel Machek
  1 sibling, 1 reply; 100+ messages in thread
From: Mikulas Patocka @ 2007-01-02 20:50 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: pavel, bhalevy, arjan, jaharkes, linux-kernel, linux-fsdevel, nfsv4



On Tue, 2 Jan 2007, Miklos Szeredi wrote:

>>>>>> It seems like the posix idea of unique <st_dev, st_ino> doesn't
>>>>>> hold water for modern file systems
>>>>>
>>>>> are you really sure?
>>>>
>>>> Well Jan's example was of Coda that uses 128-bit internal file ids.
>>>>
>>>>> and if so, why don't we fix *THAT* instead
>>>>
>>>> Hmm, sometimes you can't fix the world, especially if the filesystem
>>>> is exported over NFS and has a problem with fitting its file IDs uniquely
>>>> into a 64-bit identifier.
>>>
>>> Note, it's pretty easy to fit _anything_ into a 64-bit identifier with
>>> the use of a good hash function.  The chance of an accidental
>>> collision is infinitesimally small.  For a set of
>>>
>>>          100 files: 0.00000000000003%
>>>    1,000,000 files: 0.000003%
>>
>> I do not think we want to play with probability like this. I mean...
>> imagine 4G files, 1KB each. That's 4TB disk space, not _completely_
>> unreasonable, and collision probability is going to be ~100% due to
>> birthday paradox.
>>
>> You'll still want to back up your 4TB server...
>
> Certainly, but tar isn't going to remember all the inode numbers.
> Even if you solve the storage requirements (not impossible) it would
> have to do (4e9^2)/2=8e18 comparisons, which computers don't have
> enough CPU power just yet.

It remembers all inode numbers with nlink > 1, and many other tools 
remember all directory inode numbers (see my other post on this 
topic). Of course it doesn't compare each number with all the others; it 
uses hashing.

> It doesn't matter if there are collisions within the filesystem, as
> long as there are no collisions between the set of files an
> application is working on at the same time.

--- which is all files in the case of a backup.

> Miklos

Mikulas

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-02 20:50                       ` Mikulas Patocka
@ 2007-01-02 21:10                         ` Miklos Szeredi
  2007-01-02 21:37                           ` Mikulas Patocka
  0 siblings, 1 reply; 100+ messages in thread
From: Miklos Szeredi @ 2007-01-02 21:10 UTC (permalink / raw)
  To: mikulas
  Cc: pavel, bhalevy, arjan, jaharkes, linux-kernel, linux-fsdevel, nfsv4

> > Certainly, but tar isn't going to remember all the inode numbers.
> > Even if you solve the storage requirements (not impossible) it would
> > have to do (4e9^2)/2=8e18 comparisons, which computers don't have
> > enough CPU power just yet.
> 
> It is remembering all inode numbers with nlink > 1 and many other tools 
> are remembering all directory inode numbers (see my other post on this 
> topic).

Don't you mean they are remembering all the inode numbers of the
directories _above_ the one they are currently working on?  I'm quite
sure they aren't remembering all the directories they have processed.

> It of course doesn't compare each number with all others, it is
> using hashing.

Yes, I didn't think of that.

> > It doesn't matter if there are collisions within the filesystem, as
> > long as there are no collisions between the set of files an
> > application is working on at the same time.
> 
> --- that are all files in case of backup.

No, it's usually working with a _single_ file at a time.  It will
remember inode numbers of files with nlink > 1, but it won't remember
all the other inode numbers.

You could have a filesystem with 4 billion files, each one having two
links.  Not a likely scenario though.

Miklos

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-02 21:10                         ` Miklos Szeredi
@ 2007-01-02 21:37                           ` Mikulas Patocka
  0 siblings, 0 replies; 100+ messages in thread
From: Mikulas Patocka @ 2007-01-02 21:37 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: pavel, bhalevy, arjan, jaharkes, linux-kernel, linux-fsdevel, nfsv4

>>> Certainly, but tar isn't going to remember all the inode numbers.
>>> Even if you solve the storage requirements (not impossible) it would
>>> have to do (4e9^2)/2=8e18 comparisons, which computers don't have
>>> enough CPU power just yet.
>>
>> It is remembering all inode numbers with nlink > 1 and many other tools
>> are remembering all directory inode numbers (see my other post on this
>> topic).
>
> Don't you mean they are remembering all the inode numbers of the
> directories _above_ the one they are currently working on?  I'm quite
> sure they aren't remembering all the directories they have processed.

cp -a remembers all directory inodes it has visited, not just the path 
from the root. If you have two directories with the same inode number 
anywhere in the tree, it will skip one of them.

Mikulas

>> It of course doesn't compare each number with all others, it is
>> using hashing.
>
> Yes, I didn't think of that.
>
>>> It doesn't matter if there are collisions within the filesystem, as
>>> long as there are no collisions between the set of files an
>>> application is working on at the same time.
>>
>> --- that are all files in case of backup.
>
> No, it's usually working with a _single_ file at a time.  It will
> remember inode numbers of files with nlink > 1, but it won't remember
> all the other inode numbers.
>
> You could have a filesystem with 4billion files, each one having two
> links.  Not a likely scenario though.
>
> Miklos
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2006-12-30  1:04                   ` Mikulas Patocka
  2007-01-01  2:30                     ` Nikita Danilov
@ 2007-01-02 23:14                     ` Trond Myklebust
  2007-01-02 23:50                       ` Mikulas Patocka
  1 sibling, 1 reply; 100+ messages in thread
From: Trond Myklebust @ 2007-01-02 23:14 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Arjan van de Ven, Benny Halevy, Jan Harkes, Miklos Szeredi,
	linux-kernel, linux-fsdevel, nfsv4

On Sat, 2006-12-30 at 02:04 +0100, Mikulas Patocka wrote:
> 
> On Fri, 29 Dec 2006, Trond Myklebust wrote:
> 
> > On Thu, 2006-12-28 at 19:14 +0100, Mikulas Patocka wrote:
> >> Why don't you rip off the support for colliding inode number from the
> >> kernel at all (i.e. remove iget5_locked)?
> >>
> >> It's reasonable to have either no support for colliding ino_t or full
> >> support for that (including syscalls that userspace can use to work with
> >> such filesystem) --- but I don't see any point in having half-way support
> >> in kernel as is right now.
> >
> > What would ino_t have to do with inode numbers? It is only used as a
> > hash table lookup. The inode number is set in the ->getattr() callback.
> 
> The question is: why does the kernel contain iget5 function that looks up 
> according to callback, if the filesystem cannot have more than 64-bit 
> inode identifier?

Huh? The filesystem can have as large a damned identifier as it likes.
NFSv4 uses 128-byte filehandles, for instance.

POSIX filesystems are another matter. They can only have 64-bit
identifiers thanks to the requirement that inode numbers be 64-bit
unique and permanently stored, however Linux caters for a whole
truckload of filesystems which will never fit that label: look at all
those users of iunique(), for one...

Trond


^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: Finding hardlinks
  2006-12-31 21:19                   ` Halevy, Benny
@ 2007-01-02 23:20                     ` Trond Myklebust
  2007-01-02 23:46                     ` Trond Myklebust
  1 sibling, 0 replies; 100+ messages in thread
From: Trond Myklebust @ 2007-01-02 23:20 UTC (permalink / raw)
  To: Halevy, Benny
  Cc: Jeff Layton, Mikulas Patocka, Arjan van de Ven, Jan Harkes,
	Miklos Szeredi, linux-kernel, linux-fsdevel, nfsv4

On Sun, 2006-12-31 at 16:19 -0500, Halevy, Benny wrote:
> Trond Myklebust wrote:
> >  
> > On Thu, 2006-12-28 at 17:12 +0200, Benny Halevy wrote:
> > 
> > > As an example, some file systems encode hint information into the filehandle
> > > and the hints may change over time, another example is encoding parent
> > > information into the filehandle and then handles representing hard links
> > > to the same file from different directories will differ.
> > 
> > Both these examples are bogus. Filehandle information should not change
> > over time (except in the special case of NFSv4 "volatile filehandles")
> > and they should definitely not encode parent directory information that
> > can change over time (think rename()!).
> > 
> > Cheers
> >   Trond
> > 
> 
> The first one is a real life example.  Hints in the filehandle change
> over time.  The old filehandles are valid indefinitely, until the file
> is deleted and hence can be considered permanent and not volatile.
> What you say above, however, contradicts the NFS protocol as I understand
> it. Here's some relevant text from the latest NFSv4.1 draft (the text in
> 4.2.1 below exists in in a similar form also in NFSv3, rfc1813)
> 
> | 4.2.1.  General Properties of a Filehandle
> | ...
> | If two filehandles from the same server are equal, they MUST refer to
> | the same file. Servers SHOULD try to maintain a one-to-one correspondence
> | between filehandles and files but this is not required. Clients MUST use
> | filehandle comparisons only to improve performance, not for correct behavior.
> | All clients need to be prepared for situations in which it cannot be
> | determined whether two filehandles denote the same object and in such cases,
> | avoid making invalid assumptions which might cause incorrect behavior.
> | Further discussion of filehandle and attribute comparison in the context of
> | data caching is presented in the section "Data Caching and File Identity".
> ...
> | 9.3.4.  Data Caching and File Identity
> | ...
> |  When clients cache data, the file data needs to be organized according to
> | the file system object to which the data belongs. For NFS version 3 clients,
> | the typical practice has been to assume for the purpose of caching that
> | distinct filehandles represent distinct file system objects. The client then
> | has the choice to organize and maintain the data cache on this basis.
> |
> | In the NFS version 4 protocol, there is now the possibility to have
> | significant deviations from a "one filehandle per object" model because a
> | filehandle may be constructed on the basis of the object's pathname.
> | Therefore, clients need a reliable method to determine if two filehandles
> | designate the same file system object. If clients were simply to assume that
> | all distinct filehandles denote distinct objects and proceed to do data
> | caching on this basis, caching inconsistencies would arise between the
> | distinct client side objects which mapped to the same server side object.
> |
> | By providing a method to differentiate filehandles, the NFS version 4
> | protocol alleviates a potential functional regression in comparison with the
> | NFS version 3 protocol. Without this method, caching inconsistencies within
> | the same client could occur and this has not been present in previous
> | versions of the NFS protocol. Note that it is possible to have such
> | inconsistencies with applications executing on multiple clients but that is
> | not the issue being addressed here.
> |
> | For the purposes of data caching, the following steps allow an NFS version 4
> | client to determine whether two distinct filehandles denote the same server
> | side object:
> |
> |     * If GETATTR directed to two filehandles returns different values of the
> |       fsid attribute, then the filehandles represent distinct objects.
> |     * If GETATTR for any file with an fsid that matches the fsid of the two
> |       filehandles in question returns a unique_handles attribute with a value
> |       of TRUE, then the two objects are distinct.
> |     * If GETATTR directed to the two filehandles does not return the fileid
> |       attribute for both of the handles, then it cannot be determined whether
> |       the two objects are the same. Therefore, operations which depend on that
> |       knowledge (e.g. client side data caching) cannot be done reliably.
> |     * If GETATTR directed to the two filehandles returns different values for
> |       the fileid attribute, then they are distinct objects.
> |     * Otherwise they are the same object.

Nobody promised you that NFSv4 would be consistent. The above is a crock
of shit that carries over from RFC1813. fileid isn't even a mandatory
attribute in NFSv4.

> Even for NFSv3 (which doesn't have the unique_handles attribute) I think
> that the linux nfs client can do a better job.  If you'd have a filehandle
> cache that points at inodes you could maintain a many to one relationship
> from multiple filehandles into one inode.  When you discover a new filehandle
> you can look up the inode cache for the same fileid and if one is found you
> can do a getattr on the old filehandle (without loss of generality you should 
> always use the latest filehandle that was returned for that filesystem object,
> although any filehandle that refers to it can be used).
> If the getattr succeeded then the filehandles refer to the same fs object and
> you can create a new entry in the filehandle cache pointing at that inode.
> Otherwise, if getattr says that the old filehandle is stale I think you should
> mark the inode as stale and keep it around so that applications can get an
> appropriate error until last close, before you clean up the fh cache from the
> stale filehandles. A new inode structure should be created for the new filehandle.

No! Read what I said in that last email again. The above crap is
precisely the algorithm I said is NOT wanted in the Linux client. We're
NOT going to do a bunch of totally unnecessary extra GETATTR calls in
order to cater to 1 or 2 servers out there that actually think the above
is a good idea. Not even Solaris does that IIRC.

Trond


^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [nfsv4] RE: Finding hardlinks
  2006-12-31 21:25                       ` Halevy, Benny
@ 2007-01-02 23:21                         ` Trond Myklebust
  2007-01-03 12:35                           ` Benny Halevy
  0 siblings, 1 reply; 100+ messages in thread
From: Trond Myklebust @ 2007-01-02 23:21 UTC (permalink / raw)
  To: Halevy, Benny
  Cc: Mikulas Patocka, Jan Harkes, Miklos Szeredi, linux-kernel, nfsv4,
	linux-fsdevel, Jeff Layton, Arjan van de Ven

On Sun, 2006-12-31 at 16:25 -0500, Halevy, Benny wrote:
> Trond Myklebust wrote:
> >  
> > On Thu, 2006-12-28 at 15:07 -0500, Halevy, Benny wrote:
> > > Mikulas Patocka wrote:
> > 
> > > >BTW. how does (or how should?) NFS client deal with cache coherency if 
> > > >filehandles for the same file differ?
> > > >
> > > 
> > > Trond can probably answer this better than me...
> > > As I read it, currently the nfs client matches both the fileid and the
> > > filehandle (in nfs_find_actor). This means that different filehandles
> > > for the same file would result in different inodes :(.
> > > Strictly following the nfs protocol, comparing only the fileid should
> > > be enough IF fileids are indeed unique within the filesystem.
> > > Comparing the filehandle works as a workaround when the exported filesystem
> > > (or the nfs server) violates that.  From a user stand point I think that
> > > this should be configurable, probably per mount point.
> > 
> > Matching files by fileid instead of filehandle is a lot more trouble
> > since fileids may be reused after a file has been deleted. Every time
> > you look up a file, and get a new filehandle for the same fileid, you
> > would at the very least have to do another GETATTR using one of the
> > 'old' filehandles in order to ensure that the file is the same object as
> > the one you have cached. Then there is the issue of what to do when you
> > open(), read() or write() to the file: which filehandle do you use, are
> > the access permissions the same for all filehandles, ...
> > 
> > All in all, much pain for little or no gain.
> 
> See my answer to your previous reply.  It seems like the current
> implementation is in violation of the nfs protocol and the extra pain
> is required.

...and we should care because...?

Trond


^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: Finding hardlinks
  2006-12-31 21:19                   ` Halevy, Benny
  2007-01-02 23:20                     ` Trond Myklebust
@ 2007-01-02 23:46                     ` Trond Myklebust
  1 sibling, 0 replies; 100+ messages in thread
From: Trond Myklebust @ 2007-01-02 23:46 UTC (permalink / raw)
  To: Halevy, Benny
  Cc: Jeff Layton, Mikulas Patocka, Arjan van de Ven, Jan Harkes,
	Miklos Szeredi, linux-kernel, linux-fsdevel, nfsv4

On Sun, 2006-12-31 at 16:19 -0500, Halevy, Benny wrote:

> Even for NFSv3 (which doesn't have the unique_handles attribute) I think
> that the linux nfs client can do a better job.  If you'd have a filehandle
> cache that points at inodes you could maintain a many to one relationship
> from multiple filehandles into one inode.  When you discover a new filehandle
> you can look up the inode cache for the same fileid and if one is found you
> can do a getattr on the old filehandle (without loss of generality you should 
> always use the latest filehandle that was returned for that filesystem object,
> although any filehandle that refers to it can be used).
> If the getattr succeeded then the filehandles refer to the same fs object and
> you can create a new entry in the filehandle cache pointing at that inode.
> Otherwise, if getattr says that the old filehandle is stale I think you should
> mark the inode as stale and keep it around so that applications can get an
> appropriate error until last close, before you clean up the fh cache from the
> stale filehandles. A new inode structure should be created for the new filehandle.

There are, BTW, other reasons why the above is a bad idea: it breaks on
a bunch of well known servers. Look back at the 2.2.x kernels and the
kind of hacks they had in order to deal with crap like the Netapp
'.snapshot' directories which contain files with duplicate fileids that
do not represent hard links, but rather represent previous revisions of
the same file.

That kind of hackery was part of the reason why I ripped out that code.
The other reasons were
        - that you end up playing unnecessary getattr games like the
        above for little gain.
        - the only servers that implemented the above were borken pieces
        of crap that encoded parent directories in the filehandle, and
        which end up breaking anyway under cross-directory renames.
        - the world is filled with non-posix filesystems that frequently
        don't have real fileids. They are often just generated on the
        fly and can change at the drop of a hat.

Trond


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-02 23:14                     ` Trond Myklebust
@ 2007-01-02 23:50                       ` Mikulas Patocka
  0 siblings, 0 replies; 100+ messages in thread
From: Mikulas Patocka @ 2007-01-02 23:50 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Arjan van de Ven, Benny Halevy, Jan Harkes, Miklos Szeredi,
	linux-kernel, linux-fsdevel, nfsv4

On Wed, 3 Jan 2007, Trond Myklebust wrote:

> On Sat, 2006-12-30 at 02:04 +0100, Mikulas Patocka wrote:
>>
>> On Fri, 29 Dec 2006, Trond Myklebust wrote:
>>
>>> On Thu, 2006-12-28 at 19:14 +0100, Mikulas Patocka wrote:
>>>> Why don't you rip off the support for colliding inode number from the
>>>> kernel at all (i.e. remove iget5_locked)?
>>>>
>>>> It's reasonable to have either no support for colliding ino_t or full
>>>> support for that (including syscalls that userspace can use to work with
>>>> such filesystem) --- but I don't see any point in having half-way support
>>>> in kernel as is right now.
>>>
>>> What would ino_t have to do with inode numbers? It is only used as a
>>> hash table lookup. The inode number is set in the ->getattr() callback.
>>
>> The question is: why does the kernel contain iget5 function that looks up
>> according to callback, if the filesystem cannot have more than 64-bit
>> inode identifier?
>
> Huh? The filesystem can have as large a damned identifier as it likes.
> NFSv4 uses 128-byte filehandles, for instance.

But then it needs some other syscall to let applications determine 
hardlinks --- which was the initial topic in this thread.

> POSIX filesystems are another matter. They can only have 64-bit
> identifiers thanks to the requirement that inode numbers be 64-bit
> unique and permanently stored, however Linux caters for a whole
> truckload of filesystems which will never fit that label: look at all
> those users of iunique(), for one...

I see them. The bad thing is that many programmers read POSIX, write 
programs as if the POSIX specification were true, and these programs break 
randomly on non-POSIX filesystems. Each non-POSIX filesystem invents st_ino 
on its own, trying to minimize hash collisions, making the failure even 
less probable and harder to find.

The current situation is (for example) that cp does stat(), open(), 
fstat() and compares st_ino/st_dev --- if they mismatch, it writes an error 
and doesn't copy the file --- so if the kernel evicts the inode from its cache 
between stat() and open() and the filesystem uses iunique(), cp will fail.

What utilities should the user use on those non-POSIX filesystems, if not 
cp?

Probably some file-handling guidelines should be specified and written to 
Documentation/ as a form of standard that application programmers can use.

Mikulas

> Trond
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-02 20:41                     ` Miklos Szeredi
  2007-01-02 20:50                       ` Mikulas Patocka
@ 2007-01-03 11:56                       ` Pavel Machek
  2007-01-03 12:33                         ` Miklos Szeredi
  1 sibling, 1 reply; 100+ messages in thread
From: Pavel Machek @ 2007-01-03 11:56 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: bhalevy, arjan, mikulas, jaharkes, linux-kernel, linux-fsdevel, nfsv4

Hi!

> > > the use of a good hash function.  The chance of an accidental
> > > collision is infinitesimally small.  For a set of 
> > > 
> > >          100 files: 0.00000000000003%
> > >    1,000,000 files: 0.000003%
> > 
> > I do not think we want to play with probability like this. I mean...
> > imagine 4G files, 1KB each. That's 4TB disk space, not _completely_
> > unreasonable, and collision probability is going to be ~100% due to
> > birthday paradox.
> > 
> > You'll still want to back up your 4TB server...
> 
> Certainly, but tar isn't going to remember all the inode numbers.
> Even if you solve the storage requirements (not impossible) it would
> have to do (4e9^2)/2=8e18 comparisons, which computers don't have
> enough CPU power just yet.

Storage requirements would be 16GB of RAM... that's small enough. If
you sort, you'll only need 32*2^32 comparisons, and that's doable.

I do not claim it is _likely_. You'd need hardlinks, as you
noticed. But a system should work, not "work with high probability", and
I believe we should solve this in the long term.
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-03 11:56                       ` Pavel Machek
@ 2007-01-03 12:33                         ` Miklos Szeredi
  2007-01-03 12:42                           ` Pavel Machek
                                             ` (2 more replies)
  0 siblings, 3 replies; 100+ messages in thread
From: Miklos Szeredi @ 2007-01-03 12:33 UTC (permalink / raw)
  To: pavel
  Cc: bhalevy, arjan, mikulas, jaharkes, linux-kernel, linux-fsdevel, nfsv4

> > > > the use of a good hash function.  The chance of an accidental
> > > > collision is infinitesimally small.  For a set of 
> > > > 
> > > >          100 files: 0.00000000000003%
> > > >    1,000,000 files: 0.000003%
> > > 
> > > I do not think we want to play with probability like this. I mean...
> > > imagine 4G files, 1KB each. That's 4TB disk space, not _completely_
> > > unreasonable, and collision probability is going to be ~100% due to
> > > birthday paradox.
> > > 
> > > You'll still want to back up your 4TB server...
> > 
> > Certainly, but tar isn't going to remember all the inode numbers.
> > Even if you solve the storage requirements (not impossible) it would
> > have to do (4e9^2)/2=8e18 comparisons, which computers don't have
> > enough CPU power just yet.
> 
> Storage requirements would be 16GB of RAM... that's small enough. If
> you sort, you'll only need 32*2^32 comparisons, and that's doable.
> 
> I do not claim it is _likely_. You'd need hardlinks, as you
> noticed. But system should work, not "work with high probability", and
> I believe we should solve this in long term.

High probability is all you have.  Cosmic radiation hitting your
computer will more likely cause problems than colliding 64-bit inode
numbers ;)

But you could add a new interface for the extra paranoid.  The
proposed 'samefile(fd1, fd2)' syscall is severely limited by the heavy
weight of file descriptors.

Another idea is to export the filesystem internal ID as an arbitrary
length cookie through the extended attribute interface.  That could be
stored/compared by the filesystem quite efficiently.

But I think most apps will still opt for the portable interfaces which,
while not perfect, are "good enough".

Miklos

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [nfsv4] RE: Finding hardlinks
  2007-01-02 23:21                         ` Trond Myklebust
@ 2007-01-03 12:35                           ` Benny Halevy
  2007-01-04  0:43                             ` Trond Myklebust
  2007-01-04  8:36                             ` Trond Myklebust
  0 siblings, 2 replies; 100+ messages in thread
From: Benny Halevy @ 2007-01-03 12:35 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Mikulas Patocka, Jan Harkes, Miklos Szeredi, linux-kernel, nfsv4,
	linux-fsdevel, Jeff Layton, Arjan van de Ven

Trond Myklebust wrote:
> On Sun, 2006-12-31 at 16:25 -0500, Halevy, Benny wrote:
>> Trond Myklebust wrote:
>>>  
>>> On Thu, 2006-12-28 at 15:07 -0500, Halevy, Benny wrote:
>>>> Mikulas Patocka wrote:
>>>>> BTW. how does (or how should?) NFS client deal with cache coherency if 
>>>>> filehandles for the same file differ?
>>>>>
>>>> Trond can probably answer this better than me...
>>>> As I read it, currently the nfs client matches both the fileid and the
>>>> filehandle (in nfs_find_actor). This means that different filehandles
>>>> for the same file would result in different inodes :(.
>>>> Strictly following the nfs protocol, comparing only the fileid should
>>>> be enough IF fileids are indeed unique within the filesystem.
>>>> Comparing the filehandle works as a workaround when the exported filesystem
>>>> (or the nfs server) violates that.  From a user stand point I think that
>>>> this should be configurable, probably per mount point.
>>> Matching files by fileid instead of filehandle is a lot more trouble
>>> since fileids may be reused after a file has been deleted. Every time
>>> you look up a file, and get a new filehandle for the same fileid, you
>>> would at the very least have to do another GETATTR using one of the
>>> 'old' filehandles in order to ensure that the file is the same object as
>>> the one you have cached. Then there is the issue of what to do when you
>>> open(), read() or write() to the file: which filehandle do you use, are
>>> the access permissions the same for all filehandles, ...
>>>
>>> All in all, much pain for little or no gain.
>> See my answer to your previous reply.  It seems like the current
>> implementation is in violation of the nfs protocol and the extra pain
>> is required.
> 
> ...and we should care because...?
> 
> Trond
> 

Believe it or not, server companies like Panasas try to follow the standard
when designing and implementing their products, while relying on client vendors
to do the same.

I sincerely expect you, or anybody else for that matter, to provide
feedback and object to the protocol specification if you disagree
with it (or think it's ambiguous or self-contradicting) rather than ignore
it and implement something else. We're shooting ourselves in the
foot when doing so, and it is in our common interest to strive to reach a
realistic standard we can all comply with and interoperate with each other.

Benny


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-03 12:33                         ` Miklos Szeredi
@ 2007-01-03 12:42                           ` Pavel Machek
  2007-01-11 23:43                             ` Denis Vlasenko
  2007-01-03 12:45                           ` Martin Mares
  2007-01-03 13:54                           ` Matthew Wilcox
  2 siblings, 1 reply; 100+ messages in thread
From: Pavel Machek @ 2007-01-03 12:42 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: bhalevy, arjan, mikulas, jaharkes, linux-kernel, linux-fsdevel, nfsv4

Hi!

> > > > > the use of a good hash function.  The chance of an accidental
> > > > > collision is infinitesimally small.  For a set of 
> > > > > 
> > > > >          100 files: 0.00000000000003%
> > > > >    1,000,000 files: 0.000003%
> > > > 
> > > > I do not think we want to play with probability like this. I mean...
> > > > imagine 4G files, 1KB each. That's 4TB disk space, not _completely_
> > > > unreasonable, and collision probability is going to be ~100% due to
> > > > birthday paradox.
> > > > 
> > > > You'll still want to back up your 4TB server...
> > > 
> > > Certainly, but tar isn't going to remember all the inode numbers.
> > > Even if you solve the storage requirements (not impossible) it would
> > > have to do (4e9^2)/2=8e18 comparisons, which computers don't have
> > > enough CPU power just yet.
> > 
> > Storage requirements would be 16GB of RAM... that's small enough. If
> > you sort, you'll only need 32*2^32 comparisons, and that's doable.
> > 
> > I do not claim it is _likely_. You'd need hardlinks, as you
> > noticed. But system should work, not "work with high probability", and
> > I believe we should solve this in long term.
> 
> High probability is all you have.  Cosmic radiation hitting your
> computer will more likely cause problems than colliding 64-bit inode
> numbers ;)

As I have shown... no, that's not right. 32*2^32 operations is small
enough not to have problems with cosmic radiation.

> But you could add a new interface for the extra paranoid.  The
> proposed 'samefile(fd1, fd2)' syscall is severely limited by the heavy
> weight of file descriptors.

I guess that is the way to go. samefile(path1, path2) is unfortunately
inherently racy.

> Another idea is to export the filesystem internal ID as an arbitray
> length cookie through the extended attribute interface.  That could be
> stored/compared by the filesystem quite efficiently.

How will that work for FAT?

Or maybe we can relax that "inode may not change over rename" and
"zero length files need unique inode numbers"...

								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-03 12:33                         ` Miklos Szeredi
  2007-01-03 12:42                           ` Pavel Machek
@ 2007-01-03 12:45                           ` Martin Mares
  2007-01-03 13:54                           ` Matthew Wilcox
  2 siblings, 0 replies; 100+ messages in thread
From: Martin Mares @ 2007-01-03 12:45 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: pavel, bhalevy, arjan, mikulas, jaharkes, linux-kernel,
	linux-fsdevel, nfsv4

Hello!

> High probability is all you have.  Cosmic radiation hitting your
> computer will more likely cause problems than colliding 64-bit inode
> numbers ;)

No.

If you assign 64-bit inode numbers randomly, 2^32 of them are sufficient
to generate a collision with probability around 50%.

				Have a nice fortnight
-- 
Martin `MJ' Mares                          <mj@ucw.cz>   http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
A Bash poem: time for echo in canyon; do echo $echo $echo; done

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-03 12:33                         ` Miklos Szeredi
  2007-01-03 12:42                           ` Pavel Machek
  2007-01-03 12:45                           ` Martin Mares
@ 2007-01-03 13:54                           ` Matthew Wilcox
  2007-01-03 15:51                             ` Miklos Szeredi
  2 siblings, 1 reply; 100+ messages in thread
From: Matthew Wilcox @ 2007-01-03 13:54 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: pavel, bhalevy, arjan, mikulas, jaharkes, linux-kernel,
	linux-fsdevel, nfsv4

On Wed, Jan 03, 2007 at 01:33:31PM +0100, Miklos Szeredi wrote:
> High probability is all you have.  Cosmic radiation hitting your
> computer will more likely cause problems than colliding 64-bit inode
> numbers ;)

Some of us have machines designed to cope with cosmic rays, and would be
unimpressed with a decrease in reliability.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-03 13:54                           ` Matthew Wilcox
@ 2007-01-03 15:51                             ` Miklos Szeredi
  2007-01-03 19:04                               ` Mikulas Patocka
  2007-01-04 22:59                               ` Pavel Machek
  0 siblings, 2 replies; 100+ messages in thread
From: Miklos Szeredi @ 2007-01-03 15:51 UTC (permalink / raw)
  To: matthew
  Cc: pavel, bhalevy, arjan, mikulas, jaharkes, linux-kernel,
	linux-fsdevel, nfsv4

> > High probability is all you have.  Cosmic radiation hitting your
> > computer will more likely cause problems than colliding 64-bit inode
> > numbers ;)
> 
> Some of us have machines designed to cope with cosmic rays, and would be
> unimpressed with a decrease in reliability.

With the suggested samefile() interface you'd get a failure with just
about 100% reliability for any application which needs to compare
more than a few files.  The fact is that open files are _very_ expensive;
no wonder they are limited in various ways.

What should 'tar' do when it runs out of open files, while searching
for hardlinks?  Should it just give up?  Then the samefile() interface
would be _less_ reliable than the st_ino one by a significant margin.

Miklos

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-02  0:04                 ` Mikulas Patocka
@ 2007-01-03 18:58                   ` Frank van Maarseveen
  2007-01-03 19:17                     ` Mikulas Patocka
  2007-01-03 21:09                     ` Bryan Henderson
  0 siblings, 2 replies; 100+ messages in thread
From: Frank van Maarseveen @ 2007-01-03 18:58 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Jan Harkes, Pavel Machek, Arjan van de Ven, Miklos Szeredi,
	linux-kernel, linux-fsdevel

On Tue, Jan 02, 2007 at 01:04:06AM +0100, Mikulas Patocka wrote:
> 
> I didn't hardlink directories, I just patched stat, lstat and fstat to 
> always return st_ino == 0 --- and I've seen those failures. These failures 
> are going to happen on non-POSIX filesystems in real world too, very 
> rarely.

I don't want to spoil your day but testing with st_ino==0 is a bad choice
because it is a special number. Anyway, one can only find breakage,
not prove that all the other programs handle this correctly so this is
kind of pointless.

On any decent filesystem st_ino should uniquely identify an object and
reliably provide hardlink information. The UNIX world has relied upon this
for decades. A filesystem with st_ino collisions without being hardlinked
(or the other way around) needs a fix.

Synthetic filesystems such as /proc are special due to their dynamic
nature and I think st_ino uniqueness is far more important than being able
to provide hardlinks there. Most tree handling programs ("cp", "rm", ...)
break horribly when the tree underneath changes at the same time.

-- 
Frank

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-03 15:51                             ` Miklos Szeredi
@ 2007-01-03 19:04                               ` Mikulas Patocka
  2007-01-04 22:59                               ` Pavel Machek
  1 sibling, 0 replies; 100+ messages in thread
From: Mikulas Patocka @ 2007-01-03 19:04 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: matthew, pavel, bhalevy, arjan, jaharkes, linux-kernel,
	linux-fsdevel, nfsv4



On Wed, 3 Jan 2007, Miklos Szeredi wrote:

>>> High probability is all you have.  Cosmic radiation hitting your
>>> computer will more likely cause problems than colliding 64-bit inode
>>> numbers ;)
>>
>> Some of us have machines designed to cope with cosmic rays, and would be
>> unimpressed with a decrease in reliability.
>
> With the suggested samefile() interface you'd get a failure with just
> about 100% reliability for any application which needs to compare
> more than a few files.  The fact is open files are _very_ expensive,
> no wonder they are limited in various ways.
>
> What should 'tar' do when it runs out of open files, while searching
> for hardlinks?  Should it just give up?  Then the samefile() interface
> would be _less_ reliable than the st_ino one by a significant margin.

You could do samefile() for paths --- and as for races, they don't matter 
in this scenario; it is no more racy than stat or lstat.

Mikulas

> Miklos
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-03 18:58                   ` Frank van Maarseveen
@ 2007-01-03 19:17                     ` Mikulas Patocka
  2007-01-03 19:26                       ` Frank van Maarseveen
  2007-01-03 21:09                     ` Bryan Henderson
  1 sibling, 1 reply; 100+ messages in thread
From: Mikulas Patocka @ 2007-01-03 19:17 UTC (permalink / raw)
  To: Frank van Maarseveen
  Cc: Jan Harkes, Pavel Machek, Arjan van de Ven, Miklos Szeredi,
	linux-kernel, linux-fsdevel



On Wed, 3 Jan 2007, Frank van Maarseveen wrote:

> On Tue, Jan 02, 2007 at 01:04:06AM +0100, Mikulas Patocka wrote:
>>
>> I didn't hardlink directories, I just patched stat, lstat and fstat to
>> always return st_ino == 0 --- and I've seen those failures. These failures
>> are going to happen on non-POSIX filesystems in real world too, very
>> rarely.
>
> I don't want to spoil your day but testing with st_ino==0 is a bad choice
> because it is a special number. Anyway, one can only find breakage,
> not prove that all the other programs handle this correctly so this is
> kind of pointless.
>
> On any decent filesystem st_ino should uniquely identify an object and
> reliably provide hardlink information. The UNIX world has relied upon this
> for decades. A filesystem with st_ino collisions without being hardlinked
> (or the other way around) needs a fix.

... and that's the problem --- the UNIX world specified something that 
isn't implementable in the real world.

You can take a closed box and say "this is POSIX certified" --- but how 
useful would such a box be if you can't access CDs, diskettes and USB 
sticks with it?

Mikulas

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-03 19:17                     ` Mikulas Patocka
@ 2007-01-03 19:26                       ` Frank van Maarseveen
  2007-01-03 19:31                         ` Mikulas Patocka
  0 siblings, 1 reply; 100+ messages in thread
From: Frank van Maarseveen @ 2007-01-03 19:26 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Jan Harkes, Pavel Machek, Arjan van de Ven, Miklos Szeredi,
	linux-kernel, linux-fsdevel

On Wed, Jan 03, 2007 at 08:17:34PM +0100, Mikulas Patocka wrote:
> 
> On Wed, 3 Jan 2007, Frank van Maarseveen wrote:
> 
> >On Tue, Jan 02, 2007 at 01:04:06AM +0100, Mikulas Patocka wrote:
> >>
> >>I didn't hardlink directories, I just patched stat, lstat and fstat to
> >>always return st_ino == 0 --- and I've seen those failures. These failures
> >>are going to happen on non-POSIX filesystems in real world too, very
> >>rarely.
> >
> >I don't want to spoil your day but testing with st_ino==0 is a bad choice
> >because it is a special number. Anyway, one can only find breakage,
> >not prove that all the other programs handle this correctly so this is
> >kind of pointless.
> >
> >On any decent filesystem st_ino should uniquely identify an object and
> >reliably provide hardlink information. The UNIX world has relied upon this
> >for decades. A filesystem with st_ino collisions without being hardlinked
> >(or the other way around) needs a fix.
> 
> ... and that's the problem --- the UNIX world specified something that 
> isn't implementable in the real world.

Sure it is. Numerous popular POSIX filesystems do that. There is a lot of
inode number space in 64 bits (of course it is only a matter of time for it
to jump to 128 bits and more).

-- 
Frank

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-03 19:26                       ` Frank van Maarseveen
@ 2007-01-03 19:31                         ` Mikulas Patocka
  2007-01-03 20:26                           ` Frank van Maarseveen
  2007-01-03 22:30                           ` Pavel Machek
  0 siblings, 2 replies; 100+ messages in thread
From: Mikulas Patocka @ 2007-01-03 19:31 UTC (permalink / raw)
  To: Frank van Maarseveen
  Cc: Jan Harkes, Pavel Machek, Arjan van de Ven, Miklos Szeredi,
	linux-kernel, linux-fsdevel

>>>> I didn't hardlink directories, I just patched stat, lstat and fstat to
>>>> always return st_ino == 0 --- and I've seen those failures. These failures
>>>> are going to happen on non-POSIX filesystems in real world too, very
>>>> rarely.
>>>
>>> I don't want to spoil your day but testing with st_ino==0 is a bad choice
>>> because it is a special number. Anyway, one can only find breakage,
>>> not prove that all the other programs handle this correctly so this is
>>> kind of pointless.
>>>
>>> On any decent filesystem st_ino should uniquely identify an object and
>>> reliably provide hardlink information. The UNIX world has relied upon this
>>> for decades. A filesystem with st_ino collisions without being hardlinked
>>> (or the other way around) needs a fix.
>>
>> ... and that's the problem --- the UNIX world specified something that
>> isn't implementable in the real world.
>
> Sure it is. Numerous popular POSIX filesystems do that. There is a lot of
> inode number space in 64 bit (of course it is a matter of time for it to
> jump to 128 bit and more)

If the filesystem was designed by someone not from Unix world (FAT, SMB, 
...), then not. And users still want to access these filesystems.

A 64-bit inode number space is not yet implemented on Linux --- the problem 
is that if you return ino >= 2^32, programs compiled without 
-D_FILE_OFFSET_BITS=64 will fail with stat() returning -EOVERFLOW --- this 
failure is specified in POSIX, but not very useful.

Mikulas

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-03 19:31                         ` Mikulas Patocka
@ 2007-01-03 20:26                           ` Frank van Maarseveen
  2007-01-12  0:00                             ` Denis Vlasenko
  2007-01-03 22:30                           ` Pavel Machek
  1 sibling, 1 reply; 100+ messages in thread
From: Frank van Maarseveen @ 2007-01-03 20:26 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Jan Harkes, Pavel Machek, Arjan van de Ven, Miklos Szeredi,
	linux-kernel, linux-fsdevel

On Wed, Jan 03, 2007 at 08:31:32PM +0100, Mikulas Patocka wrote:
> >>>>I didn't hardlink directories, I just patched stat, lstat and fstat to
> >>>>always return st_ino == 0 --- and I've seen those failures. These 
> >>>>failures
> >>>>are going to happen on non-POSIX filesystems in real world too, very
> >>>>rarely.
> >>>
> >>>I don't want to spoil your day but testing with st_ino==0 is a bad choice
> >>>because it is a special number. Anyway, one can only find breakage,
> >>>not prove that all the other programs handle this correctly so this is
> >>>kind of pointless.
> >>>
> >>>On any decent filesystem st_ino should uniquely identify an object and
> >>>reliably provide hardlink information. The UNIX world has relied upon 
> >>>this
> >>>for decades. A filesystem with st_ino collisions without being hardlinked
> >>>(or the other way around) needs a fix.
> >>
> >>... and that's the problem --- the UNIX world specified something that
> >>isn't implementable in the real world.
> >
> >Sure it is. Numerous popular POSIX filesystems do that. There is a lot of
> >inode number space in 64 bit (of course it is a matter of time for it to
> >jump to 128 bit and more)
> 
> If the filesystem was designed by someone not from Unix world (FAT, SMB, 
> ...), then not. And users still want to access these filesystems.

They can. Hey, it's not perfect but who expects FAT/SMB to be "perfect" anyway?

> 
> 64-bit inode numbers space is not yet implemented on Linux --- the problem 
> is that if you return ino >= 2^32, programs compiled without 
> -D_FILE_OFFSET_BITS=64 will fail with stat() returning -EOVERFLOW --- this 
> failure is specified in POSIX, but not very useful.

hmm, checking iunique(), ino_t, __kernel_ino_t... I see. Pity. So at
some point in time we may need a sort of "ino64" mount option to be
able to switch to a 64-bit number space on a per-mount basis. Or (conversely)
refuse to mount without that option if we know there are >32-bit st_ino
out there. And invent iunique64() and use that when "ino64" is specified
for FAT/SMB/... if those filesystems haven't been replaced by a
successor by that time.

By that time, probably all programs will be either compiled with
-D_FILE_OFFSET_BITS=64 (most already are because of files bigger than 2G)
or completely 64-bit.

-- 
Frank

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-03 18:58                   ` Frank van Maarseveen
  2007-01-03 19:17                     ` Mikulas Patocka
@ 2007-01-03 21:09                     ` Bryan Henderson
  2007-01-03 22:01                       ` Frank van Maarseveen
  1 sibling, 1 reply; 100+ messages in thread
From: Bryan Henderson @ 2007-01-03 21:09 UTC (permalink / raw)
  To: Frank van Maarseveen
  Cc: Arjan van de Ven, Jan Harkes, linux-fsdevel, linux-kernel,
	Miklos Szeredi, Mikulas Patocka, Pavel Machek

>On any decent filesystem st_ino should uniquely identify an object and
>reliably provide hardlink information. The UNIX world has relied upon 
this
>for decades. A filesystem with st_ino collisions without being hardlinked
>(or the other way around) needs a fix.

But for at least the last of those decades, filesystems that could not do 
that were not uncommon.  They had to present 32 bit inode numbers and 
either allowed more than 4G files or just didn't have the means of 
assigning inode numbers with the proper uniqueness to files.  And the sky 
did not fall.  I don't have an explanation why, but it makes it look to me 
like there are worse things than not having total one-to-one correspondence 
between inode numbers and files.  Having a stat or mount fail because 
inodes are too big, having fewer than 4G files, and waiting for the 
filesystem to generate a suitable inode number might fall in that 
category.

I fully agree that much effort should be put into making inode numbers 
work the way POSIX demands, but I also know that that sometimes requires 
more than just writing some code.

--
Bryan Henderson                               San Jose California
IBM Almaden Research Center                   Filesystems


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-03 21:09                     ` Bryan Henderson
@ 2007-01-03 22:01                       ` Frank van Maarseveen
  2007-01-03 23:43                         ` Mikulas Patocka
  0 siblings, 1 reply; 100+ messages in thread
From: Frank van Maarseveen @ 2007-01-03 22:01 UTC (permalink / raw)
  To: Bryan Henderson
  Cc: Arjan van de Ven, Jan Harkes, linux-fsdevel, linux-kernel,
	Miklos Szeredi, Mikulas Patocka, Pavel Machek

On Wed, Jan 03, 2007 at 01:09:41PM -0800, Bryan Henderson wrote:
> >On any decent filesystem st_ino should uniquely identify an object and
> >reliably provide hardlink information. The UNIX world has relied upon 
> this
> >for decades. A filesystem with st_ino collisions without being hardlinked
> >(or the other way around) needs a fix.
> 
> But for at least the last of those decades, filesystems that could not do 
> that were not uncommon.  They had to present 32 bit inode numbers and 
> either allowed more than 4G files or just didn't have the means of 
> assigning inode numbers with the proper uniqueness to files.  And the sky 
> did not fall.  I don't have an explanation why,

I think it's mostly high end use and high end users tend to understand
more. But we're going to see more really large filesystems in "normal"
use so..

Currently, large file support is already necessary to handle dvd and
video. It's also useful for images for virtualization. So the failing stat()
calls should already be a thing of the past with modern distributions.

-- 
Frank

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-03 19:31                         ` Mikulas Patocka
  2007-01-03 20:26                           ` Frank van Maarseveen
@ 2007-01-03 22:30                           ` Pavel Machek
  1 sibling, 0 replies; 100+ messages in thread
From: Pavel Machek @ 2007-01-03 22:30 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Frank van Maarseveen, Jan Harkes, Arjan van de Ven,
	Miklos Szeredi, linux-kernel, linux-fsdevel

Hi!

> >Sure it is. Numerous popular POSIX filesystems do that. There is a lot of
> >inode number space in 64 bit (of course it is a matter of time for it to
> >jump to 128 bit and more)
> 
> If the filesystem was designed by someone not from Unix world (FAT, SMB, 
> ...), then not. And users still want to access these filesystems.
> 
> 64-bit inode numbers space is not yet implemented on Linux --- the problem 
> is that if you return ino >= 2^32, programs compiled without 
> -D_FILE_OFFSET_BITS=64 will fail with stat() returning -EOVERFLOW --- this 
> failure is specified in POSIX, but not very useful.

Hehe, can we simply return -EOVERFLOW on VFAT all the time? ...probably not
useful :-(. But the ability to say "unknown" in the st_ino field would
help....

								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-03 22:01                       ` Frank van Maarseveen
@ 2007-01-03 23:43                         ` Mikulas Patocka
  2007-01-04  0:12                           ` Frank van Maarseveen
  0 siblings, 1 reply; 100+ messages in thread
From: Mikulas Patocka @ 2007-01-03 23:43 UTC (permalink / raw)
  To: Frank van Maarseveen
  Cc: Bryan Henderson, Arjan van de Ven, Jan Harkes, linux-fsdevel,
	linux-kernel, Miklos Szeredi, Pavel Machek

On Wed, 3 Jan 2007, Frank van Maarseveen wrote:

> On Wed, Jan 03, 2007 at 01:09:41PM -0800, Bryan Henderson wrote:
>>> On any decent filesystem st_ino should uniquely identify an object and
>>> reliably provide hardlink information. The UNIX world has relied upon
>> this
>>> for decades. A filesystem with st_ino collisions without being hardlinked
>>> (or the other way around) needs a fix.
>>
>> But for at least the last of those decades, filesystems that could not do
>> that were not uncommon.  They had to present 32 bit inode numbers and
>> either allowed more than 4G files or just didn't have the means of
>> assigning inode numbers with the proper uniqueness to files.  And the sky
>> did not fall.  I don't have an explanation why,
>
> I think it's mostly high end use and high end users tend to understand
> more. But we're going to see more really large filesystems in "normal"
> use so..
>
> Currently, large file support is already necessary to handle dvd and
> video. It's also useful for images for virtualization. So the failing stat()
> calls should already be a thing of the past with modern distributions.

As long as glibc compiles by default with 32-bit ino_t, the problem exists 
and is severe --- programs handling large files, such as coreutils, tar, 
mc, mplayer, already compile with 64-bit ino_t and off_t, but the user (or 
script) may type something like:

cat >file.c <<EOF
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(void)
{
 	int h;
 	struct stat st;
 	if ((h = creat("foo", 0600)) < 0) perror("creat"), exit(1);
 	if (fstat(h, &st)) perror("stat"), exit(1);
 	close(h);
 	return 0;
}
EOF
gcc file.c; ./a.out

--- and you certainly do not want this to fail (unless you are out of disk 
space).

The difference is that with a 32-bit program and 64-bit off_t, you get 
a deterministic failure on large files; with a 32-bit program and 64-bit 
ino_t, you get random failures.

Mikulas

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-03 23:43                         ` Mikulas Patocka
@ 2007-01-04  0:12                           ` Frank van Maarseveen
  2007-01-08  6:19                             ` Mikulas Patocka
  0 siblings, 1 reply; 100+ messages in thread
From: Frank van Maarseveen @ 2007-01-04  0:12 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Bryan Henderson, Arjan van de Ven, Jan Harkes, linux-fsdevel,
	linux-kernel, Miklos Szeredi, Pavel Machek

On Thu, Jan 04, 2007 at 12:43:20AM +0100, Mikulas Patocka wrote:
> On Wed, 3 Jan 2007, Frank van Maarseveen wrote:
> >Currently, large file support is already necessary to handle dvd and
> >video. It's also useful for images for virtualization. So the failing 
> >stat()
> >calls should already be a thing of the past with modern distributions.
> 
> As long as glibc compiles by default with 32-bit ino_t, the problem exists 
> and is severe --- programs handling large files, such as coreutils, tar, 
> mc, mplayer, already compile with 64-bit ino_t and off_t, but the user (or 
> script) may type something like:
> 
> cat >file.c <<EOF
> #include <sys/types.h>
> #include <sys/stat.h>
> main()
> {
> 	int h;
> 	struct stat st;
> 	if ((h = creat("foo", 0600)) < 0) perror("creat"), exit(1);
> 	if (fstat(h, &st)) perror("stat"), exit(1);
> 	close(h);
> 	return 0;
> }
> EOF
> gcc file.c; ./a.out
> 
> --- and you certainly do not want this to fail (unless you are out of disk 
> space).
> 
> The difference is, that with 32-bit program and 64-bit off_t, you get 
> deterministic failure on large files, with 32-bit program and 64-bit 
> ino_t, you get random failures.

What's (technically) the problem with changing the gcc default?

Alternatively we could make the error deterministic in various ways. Start
st_ino numbering from 4G (except for a few special ones maybe such
as root/mounts). Or make old and new programs look differently at the
ELF level or by sys_personality() and/or check against a "ino64" mount
flag/filesystem feature. Lots of possibilities.

-- 
Frank

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [nfsv4] RE: Finding hardlinks
  2007-01-03 12:35                           ` Benny Halevy
@ 2007-01-04  0:43                             ` Trond Myklebust
  2007-01-04  8:36                             ` Trond Myklebust
  1 sibling, 0 replies; 100+ messages in thread
From: Trond Myklebust @ 2007-01-04  0:43 UTC (permalink / raw)
  To: Benny Halevy
  Cc: Mikulas Patocka, Jan Harkes, Miklos Szeredi, linux-kernel, nfsv4,
	linux-fsdevel, Jeff Layton, Arjan van de Ven

On Wed, 2007-01-03 at 14:35 +0200, Benny Halevy wrote:
> Believe it or not, but server companies like Panasas try to follow the standard
> when designing and implementing their products while relying on client vendors
> to do the same.

I personally have never given a rats arse about "standards" if they make
no sense to me. If the server is capable of knowing about hard links,
then why does it need all this extra crap in the filehandle that just
obfuscates the hard link info?

The bottom line is that nothing in our implementation will result in
such a server performing sub-optimally w.r.t. the client. The only
result is that we will conform to close-to-open semantics instead of
strict POSIX caching semantics when two processes have opened the same
file via different hard links.

> I sincerely expect you or anybody else for this matter to try to provide
> feedback and object to the protocol specification in case they disagree
> with it (or think it's ambiguous or self contradicting) rather than ignoring
> it and implementing something else. I think we're shooting ourselves in the
> foot when doing so and it is in our common interest to strive to reach a
> realistic standard we can all comply with and interoperate with each other.

This has nothing to do with the protocol itself: it has only to do with
caching semantics. As far as caching goes, the only guarantees that NFS
clients give are the close-to-open semantics, and this should indeed be
respected by the implementation in question.

Trond


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [nfsv4] RE: Finding hardlinks
  2007-01-03 12:35                           ` Benny Halevy
  2007-01-04  0:43                             ` Trond Myklebust
@ 2007-01-04  8:36                             ` Trond Myklebust
  2007-01-04 10:04                               ` Benny Halevy
  1 sibling, 1 reply; 100+ messages in thread
From: Trond Myklebust @ 2007-01-04  8:36 UTC (permalink / raw)
  To: Benny Halevy
  Cc: Mikulas Patocka, Jan Harkes, Miklos Szeredi, linux-kernel, nfsv4,
	linux-fsdevel, Jeff Layton, Arjan van de Ven

On Wed, 2007-01-03 at 14:35 +0200, Benny Halevy wrote:
> I sincerely expect you or anybody else for this matter to try to provide
> feedback and object to the protocol specification in case they disagree
> with it (or think it's ambiguous or self contradicting) rather than ignoring
> it and implementing something else. I think we're shooting ourselves in the
> foot when doing so and it is in our common interest to strive to reach a
> realistic standard we can all comply with and interoperate with each other.

You are reading the protocol wrong in this case.

While the protocol does allow the server to implement the behaviour that
you've been advocating, it in no way mandates it. Nor does it mandate
that the client should gather files with the same (fsid,fileid) and
cache them together. Those are issues to do with _implementation_, and
are thus beyond the scope of the IETF.

In our case, the client will ignore the unique_handles attribute. It
will use filehandles as our inode cache identifier. It will not jump
through hoops to provide caching semantics that go beyond close-to-open
for servers that set unique_handles to "false".

Trond


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [nfsv4] RE: Finding hardlinks
  2007-01-04  8:36                             ` Trond Myklebust
@ 2007-01-04 10:04                               ` Benny Halevy
  2007-01-04 10:47                                 ` Trond Myklebust
  2007-01-05 16:40                                 ` Nicolas Williams
  0 siblings, 2 replies; 100+ messages in thread
From: Benny Halevy @ 2007-01-04 10:04 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Mikulas Patocka, Jan Harkes, Miklos Szeredi, linux-kernel, nfsv4,
	linux-fsdevel, Jeff Layton, Arjan van de Ven


Trond Myklebust wrote:
> On Wed, 2007-01-03 at 14:35 +0200, Benny Halevy wrote:
>> I sincerely expect you or anybody else for this matter to try to provide
>> feedback and object to the protocol specification in case they disagree
>> with it (or think it's ambiguous or self contradicting) rather than ignoring
>> it and implementing something else. I think we're shooting ourselves in the
>> foot when doing so and it is in our common interest to strive to reach a
>> realistic standard we can all comply with and interoperate with each other.
> 
> You are reading the protocol wrong in this case.

Obviously we interpret it differently and that by itself calls for considering
clarification of the text :)

> 
> While the protocol does allow the server to implement the behaviour that
> you've been advocating, it in no way mandates it. Nor does it mandate
> that the client should gather files with the same (fsid,fileid) and
> cache them together. Those are issues to do with _implementation_, and
> are thus beyond the scope of the IETF.
> 
> In our case, the client will ignore the unique_handles attribute. It
> will use filehandles as our inode cache identifier. It will not jump
> through hoops to provide caching semantics that go beyond close-to-open
> for servers that set unique_handles to "false".

I agree that the way the client implements its cache is out of the protocol
scope. But how do you interpret "correct behavior" in section 4.2.1?
 "Clients MUST use filehandle comparisons only to improve performance, not for correct behavior. All clients need to be prepared for situations in which it cannot be determined whether two filehandles denote the same object and in such cases, avoid making invalid assumptions which might cause incorrect behavior."
Don't you consider data corruption due to cache inconsistency an incorrect behavior?

Benny

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [nfsv4] RE: Finding hardlinks
  2007-01-04 10:04                               ` Benny Halevy
@ 2007-01-04 10:47                                 ` Trond Myklebust
  2007-01-05  8:28                                   ` Benny Halevy
  2007-01-05 16:40                                 ` Nicolas Williams
  1 sibling, 1 reply; 100+ messages in thread
From: Trond Myklebust @ 2007-01-04 10:47 UTC (permalink / raw)
  To: Benny Halevy
  Cc: Mikulas Patocka, Jan Harkes, Miklos Szeredi, linux-kernel, nfsv4,
	linux-fsdevel, Jeff Layton, Arjan van de Ven

On Thu, 2007-01-04 at 12:04 +0200, Benny Halevy wrote:
> I agree that the way the client implements its cache is out of the protocol
> scope. But how do you interpret "correct behavior" in section 4.2.1?
>  "Clients MUST use filehandle comparisons only to improve performance, not for correct behavior. All clients need to be prepared for situations in which it cannot be determined whether two filehandles denote the same object and in such cases, avoid making invalid assumptions which might cause incorrect behavior."
> Don't you consider data corruption due to cache inconsistency an incorrect behavior?

Exactly where do you see us violating the close-to-open cache
consistency guarantees?

Trond


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-01 23:22                           ` Mikulas Patocka
@ 2007-01-04 13:59                             ` Nikita Danilov
  0 siblings, 0 replies; 100+ messages in thread
From: Nikita Danilov @ 2007-01-04 13:59 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Arjan van de Ven, Benny Halevy, Jan Harkes, Miklos Szeredi,
	linux-kernel, linux-fsdevel, nfsv4

Mikulas Patocka writes:
 > > > BTW. How does ReiserFS find that a given inode number (or object ID in
 > > > ReiserFS terminology) is free before assigning it to new file/directory?
 > >
 > > reiserfs v3 has an extent map of free object identifiers in
 > > super-block.
 > 
 > Inode free space can have at most 2^31 extents --- if inode numbers 
 > alternate between "allocated", "free". How do you pack it to superblock?

In the worst case, when free/used extents are small, some free oids are
"leaked", but this has never been a problem in practice. In fact, there
was a patch for reiserfs v3 to store this map in a special hidden file, but
it wasn't included in mainline, as nobody ever complained about oid map
fragmentation.

 > 
 > > reiser4 used 64 bit object identifiers without reuse.
 > 
 > So you are going to hit the same problem as I did with SpadFS --- you 
 > can't export 64-bit inode number to userspace (programs without 
 > -D_FILE_OFFSET_BITS=64 will have stat() randomly failing with EOVERFLOW 
 > then) and if you export only 32-bit number, it will eventually wrap-around 
 > and colliding st_ino will cause data corruption with many userspace 
 > programs.

Indeed, this is a fundamental problem. Reiser4 tries to ameliorate it by
using a hash function that starts colliding only when there are billions
of files, in which case a 32-bit inode number is screwed anyway.

Note that none of the above problems invalidates the reasons for having
long in-kernel inode identifiers that I outlined in another message.
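
To make the idea concrete, here is a hypothetical sketch (not reiser4's
actual code; the function name and the mixing constant are invented for
illustration) of folding a 64-bit object id down to a 32-bit st_ino so
that values are well spread and collisions become likely only as the
object count approaches 2^32:

```c
#include <stdint.h>

/* Hypothetical illustration only -- not reiser4's real algorithm.
 * Mix the 64-bit oid before truncating, so that nearby oids map to
 * scattered st_ino values. The transform is a bijection on 64 bits
 * (xorshift and odd multiply are both invertible); only the final
 * truncation can collide. */
static uint32_t fold_oid(uint64_t oid)
{
	oid ^= oid >> 33;
	oid *= 0xff51afd7ed558ccdULL;	/* invented 64-bit mix constant */
	oid ^= oid >> 33;
	return (uint32_t)oid;
}
```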

 > 
 > Mikulas

Nikita.


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-03 15:51                             ` Miklos Szeredi
  2007-01-03 19:04                               ` Mikulas Patocka
@ 2007-01-04 22:59                               ` Pavel Machek
  2007-01-05  8:43                                 ` Miklos Szeredi
  1 sibling, 1 reply; 100+ messages in thread
From: Pavel Machek @ 2007-01-04 22:59 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: matthew, bhalevy, arjan, mikulas, jaharkes, linux-kernel,
	linux-fsdevel, nfsv4

Hi!

> > > High probability is all you have.  Cosmic radiation hitting your
> > > computer will more likly cause problems, than colliding 64bit inode
> > > numbers ;)
> > 
> > Some of us have machines designed to cope with cosmic rays, and would be
> > unimpressed with a decrease in reliability.
> 
> With the suggested samefile() interface you'd get a failure with just
> about 100% reliability for any application which needs to compare a
> more than a few files.  The fact is open files are _very_ expensive,
> no wonder they are limited in various ways.
> 
> What should 'tar' do when it runs out of open files, while searching
> for hardlinks?  Should it just give up?  Then the samefile() interface
> would be _less_ reliable than the st_ino one by a significant margin.

You need at most two simultaneously open files for examining any
number of hardlinks. So yes, you can make it reliable.
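
A minimal sketch of that fd budget, with the proposed is_samefile()
stood in by an fstat() comparison (the syscall does not exist; the
stand-in and the helper names here are invented for illustration):

```c
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Stand-in for the proposed is_samefile() syscall: compare two
 * already-open files by (st_dev, st_ino). */
static int is_samefile(int fd1, int fd2)
{
	struct stat a, b;
	if (fstat(fd1, &a) || fstat(fd2, &b))
		return -1;
	return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}

/* Check whether 'path' names the same object as the hardlink-group
 * member 'ref_path'.  The point of the sketch is the descriptor
 * budget: verifying one candidate pair never needs more than two
 * files open at once, whatever the size of the group. */
static int in_group(const char *ref_path, const char *path)
{
	int same = -1;
	int fd1 = open(ref_path, O_RDONLY);
	int fd2 = open(path, O_RDONLY);

	if (fd1 >= 0 && fd2 >= 0)
		same = is_samefile(fd1, fd2);
	if (fd1 >= 0)
		close(fd1);
	if (fd2 >= 0)
		close(fd2);
	return same;
}
```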
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [nfsv4] RE: Finding hardlinks
  2007-01-04 10:47                                 ` Trond Myklebust
@ 2007-01-05  8:28                                   ` Benny Halevy
  2007-01-05 10:29                                     ` Trond Myklebust
  0 siblings, 1 reply; 100+ messages in thread
From: Benny Halevy @ 2007-01-05  8:28 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Mikulas Patocka, Jan Harkes, Miklos Szeredi, linux-kernel, nfsv4,
	linux-fsdevel, Jeff Layton, Arjan van de Ven

Trond Myklebust wrote:
> On Thu, 2007-01-04 at 12:04 +0200, Benny Halevy wrote:
>> I agree that the way the client implements its cache is out of the protocol
>> scope. But how do you interpret "correct behavior" in section 4.2.1?
>>  "Clients MUST use filehandle comparisons only to improve performance, not for correct behavior. All clients need to be prepared for situations in which it cannot be determined whether two filehandles denote the same object and in such cases, avoid making invalid assumptions which might cause incorrect behavior."
>> Don't you consider data corruption due to cache inconsistency an incorrect behavior?
> 
> Exactly where do you see us violating the close-to-open cache
> consistency guarantees?
> 

I haven't seen that. What I did see is cache inconsistency when opening
the same file through different file descriptors while the filehandle changes.
My testing shows that at least fsync and close fail with EIO when the filehandle
changed while there was dirty data in the cache, and that's good. Still,
not sharing the cache while the file is open (even on different file
descriptors by the same process) seems impractical.

Benny

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-04 22:59                               ` Pavel Machek
@ 2007-01-05  8:43                                 ` Miklos Szeredi
  2007-01-05 13:12                                   ` Pavel Machek
  2007-01-05 17:30                                   ` Frank van Maarseveen
  0 siblings, 2 replies; 100+ messages in thread
From: Miklos Szeredi @ 2007-01-05  8:43 UTC (permalink / raw)
  To: pavel
  Cc: matthew, bhalevy, arjan, mikulas, jaharkes, linux-kernel,
	linux-fsdevel, nfsv4

> > > > High probability is all you have.  Cosmic radiation hitting your
> > > > computer will more likly cause problems, than colliding 64bit inode
> > > > numbers ;)
> > > 
> > > Some of us have machines designed to cope with cosmic rays, and would be
> > > unimpressed with a decrease in reliability.
> > 
> > With the suggested samefile() interface you'd get a failure with just
> > about 100% reliability for any application which needs to compare a
> > more than a few files.  The fact is open files are _very_ expensive,
> > no wonder they are limited in various ways.
> > 
> > What should 'tar' do when it runs out of open files, while searching
> > for hardlinks?  Should it just give up?  Then the samefile() interface
> > would be _less_ reliable than the st_ino one by a significant margin.
> 
> You need at most two simultenaously open files for examining any
> number of hardlinks. So yes, you can make it reliable.

Well, sort of.  Samefile without keeping fds open doesn't have any
protection against the tree changing underneath between first
registering a file and later opening it.  The inode number is more
useful in this respect.  In fact inode number + generation number will
give you a unique identifier in time as well, which is a _lot_ more
useful to determine if the file you are checking is actually the same
as one that you've come across previously.

So instead of samefile() I'd still suggest an extended attribute
interface which exports the file's unique (in space and time)
identifier as an opaque cookie.

For filesystems like FAT you can basically only guarantee that two
files are the same as long as those files are in the icache, no matter
if you use samefile() or inode numbers.  Userpace _can_ make the
inodes stay in the cache by keeping the files open, which works for
samefile as well as checking by inode number.
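
As a rough illustration of the opaque-cookie idea (the layout below is
invented; a real cookie would be produced by the filesystem and would
actually fill in the generation field):

```c
#include <sys/stat.h>
#include <string.h>
#include <stdint.h>

/* Invented cookie layout, for illustration only: device, inode and a
 * generation field packed into an opaque byte string.  Userspace would
 * only ever memcmp() cookies, never interpret the contents. */
struct file_cookie {
	uint64_t dev;
	uint64_t ino;
	uint64_t gen;	/* left 0 here; a real fs would fill this in */
};

static int cookie_from_path(const char *path, struct file_cookie *c)
{
	struct stat st;

	if (stat(path, &st))
		return -1;
	memset(c, 0, sizeof(*c));
	c->dev = st.st_dev;
	c->ino = st.st_ino;
	return 0;
}

static int cookie_equal(const struct file_cookie *a,
			const struct file_cookie *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```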

Miklos

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [nfsv4] RE: Finding hardlinks
  2007-01-05  8:28                                   ` Benny Halevy
@ 2007-01-05 10:29                                     ` Trond Myklebust
  0 siblings, 0 replies; 100+ messages in thread
From: Trond Myklebust @ 2007-01-05 10:29 UTC (permalink / raw)
  To: Benny Halevy
  Cc: Mikulas Patocka, Jan Harkes, Miklos Szeredi, linux-kernel, nfsv4,
	linux-fsdevel, Jeff Layton, Arjan van de Ven

On Fri, 2007-01-05 at 10:28 +0200, Benny Halevy wrote:
> Trond Myklebust wrote:
> > Exactly where do you see us violating the close-to-open cache
> > consistency guarantees?
> > 
> 
> I haven't seen that. What I did see is cache inconsistency when opening
> the same file with different file descriptors when the filehandle changes.
> My testing shows that at least fsync and close fail with EIO when the filehandle
> changed while there was dirty data in the cache and that's good. Still,
> not sharing the cache while the file is opened (even on a different file
> descriptors by the same process) seems impractical.

Tough. I'm not going to commit to adding support for multiple
filehandles. The fact is that RFC3530 contains masses of rope with which
to allow server and client vendors to hang themselves. The fact that the
protocol claims support for servers that use multiple filehandles per
inode does not mean it is necessarily a good idea. It adds unnecessary
code complexity, it screws with server scalability (extra GETATTR calls
just in order to probe existing filehandles), and it is insufficiently
well documented in the RFC: SECINFO information is, for instance, given
out on a per-filehandle basis, does that mean that the server will have
different security policies? In some places, people haven't even started
to think about the consequences:

      If GETATTR directed to the two filehandles does not return the
      fileid attribute for both of the handles, then it cannot be
      determined whether the two objects are the same.  Therefore,
      operations which depend on that knowledge (e.g., client side data
      caching) cannot be done reliably.

This implies the combination is legal, but offers no indication as to
how you would match OPEN/CLOSE requests via different paths. AFAICS you
would have to do non-cached I/O with no share modes (i.e. NFSv3-style
"special" stateids). There is no way in hell we will ever support
non-cached I/O in NFS other than the special case of O_DIRECT.


...and no, I'm certainly not interested in "fixing" the RFC on this
point in any way other than getting this crap dropped from the spec. I
see no use for it at all.

Trond


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-05  8:43                                 ` Miklos Szeredi
@ 2007-01-05 13:12                                   ` Pavel Machek
  2007-01-05 13:55                                     ` Miklos Szeredi
  2007-01-05 17:30                                   ` Frank van Maarseveen
  1 sibling, 1 reply; 100+ messages in thread
From: Pavel Machek @ 2007-01-05 13:12 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: matthew, bhalevy, arjan, mikulas, jaharkes, linux-kernel,
	linux-fsdevel, nfsv4

Hi!

> > > > Some of us have machines designed to cope with cosmic rays, and would be
> > > > unimpressed with a decrease in reliability.
> > > 
> > > With the suggested samefile() interface you'd get a failure with just
> > > about 100% reliability for any application which needs to compare a
> > > more than a few files.  The fact is open files are _very_ expensive,
> > > no wonder they are limited in various ways.
> > > 
> > > What should 'tar' do when it runs out of open files, while searching
> > > for hardlinks?  Should it just give up?  Then the samefile() interface
> > > would be _less_ reliable than the st_ino one by a significant margin.
> > 
> > You need at most two simultenaously open files for examining any
> > number of hardlinks. So yes, you can make it reliable.
> 
> Well, sort of.  Samefile without keeping fds open doesn't have any
> protection against the tree changing underneath between first
> registering a file and later opening it.  The inode number is more

You only need to keep one-file-per-hardlink-group open during final
verification, checking that inode hashing produced reasonable results.

							Pavel
-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-05 13:12                                   ` Pavel Machek
@ 2007-01-05 13:55                                     ` Miklos Szeredi
  2007-01-05 14:08                                       ` Mikulas Patocka
  0 siblings, 1 reply; 100+ messages in thread
From: Miklos Szeredi @ 2007-01-05 13:55 UTC (permalink / raw)
  To: pavel
  Cc: matthew, bhalevy, arjan, mikulas, jaharkes, linux-kernel,
	linux-fsdevel, nfsv4

> > Well, sort of.  Samefile without keeping fds open doesn't have any
> > protection against the tree changing underneath between first
> > registering a file and later opening it.  The inode number is more
> 
> You only need to keep one-file-per-hardlink-group open during final
> verification, checking that inode hashing produced reasonable results.

What final verification?  I wasn't just talking about 'tar' but all
cases where st_ino might be used to check the identity of two files at
possibly different points in time.

Time A:    remember identity of file X
Time B:    check if identity of file Y matches that of file X

With samefile() if you open X at A, and keep it open till B, you can
accumulate large numbers of open files and the application can fail.

If you don't keep an open file, just remember the path, then renaming
X will foil the later identity check.  Changing the file at this path
between A and B can even give you a false positive.  This applies to
'tar' as well as the other uses.

Miklos

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-05 13:55                                     ` Miklos Szeredi
@ 2007-01-05 14:08                                       ` Mikulas Patocka
  2007-01-05 15:09                                         ` Miklos Szeredi
  0 siblings, 1 reply; 100+ messages in thread
From: Mikulas Patocka @ 2007-01-05 14:08 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: pavel, matthew, bhalevy, arjan, jaharkes, linux-kernel,
	linux-fsdevel, nfsv4

>>> Well, sort of.  Samefile without keeping fds open doesn't have any
>>> protection against the tree changing underneath between first
>>> registering a file and later opening it.  The inode number is more
>>
>> You only need to keep one-file-per-hardlink-group open during final
>> verification, checking that inode hashing produced reasonable results.
>
> What final verification?  I wasn't just talking about 'tar' but all
> cases where st_ino might be used to check the identity of two files at
> possibly different points in time.
>
> Time A:    remember identity of file X
> Time B:    check if identity of file Y matches that of file X
>
> With samefile() if you open X at A, and keep it open till B, you can
> accumulate large numbers of open files and the application can fail.
>
> If you don't keep an open file, just remember the path, then renaming
> X will foil the later identity check.  Changing the file at this path
> between A and B can even give you a false positive.  This applies to
> 'tar' as well as the other uses.

And does it matter? If you rename a file, tar might skip it regardless of 
hardlink detection (if readdir races with rename, you may see neither of the 
file's names, one of them, or both --- all of these are possible).

If you have "dir1/a" hardlinked to "dir1/b" and while tar runs you delete 
both "a" and "b" and create totally new files "dir2/c" linked to "dir2/d", 
tar might hardlink both "c" and "d" to "a" and "b".

No one guarantees you a sane result from tar or cp -a while the tree is 
changing. I don't see how is_samefile() could make it worse.

Mikulas

> Miklos
>

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-05 14:08                                       ` Mikulas Patocka
@ 2007-01-05 15:09                                         ` Miklos Szeredi
  2007-01-05 15:15                                           ` Miklos Szeredi
  2007-01-08  5:57                                           ` Mikulas Patocka
  0 siblings, 2 replies; 100+ messages in thread
From: Miklos Szeredi @ 2007-01-05 15:09 UTC (permalink / raw)
  To: mikulas
  Cc: pavel, matthew, bhalevy, arjan, jaharkes, linux-kernel,
	linux-fsdevel, nfsv4

> And does it matter? If you rename a file, tar might skip it no matter of 
> hardlink detection (if readdir races with rename, you can read none of the 
> names of file, one or both --- all these are possible).
> 
> If you have "dir1/a" hardlinked to "dir1/b" and while tar runs you delete 
> both "a" and "b" and create totally new files "dir2/c" linked to "dir2/d", 
> tar might hardlink both "c" and "d" to "a" and "b".
> 
> No one guarantees you sane result of tar or cp -a while changing the tree. 
> I don't see how is_samefile() could make it worse.

There are several cases where changing the tree doesn't affect the
correctness of the tar or cp -a result.  In some of these cases using
samefile() instead of st_ino _will_ result in a corrupted result.

Generally samefile() is _weaker_ than the st_ino interface in
comparing the identity of two files without using massive amounts of
memory.  You're searching for a better solution, not one that is
broken in a different way, aren't you?

Miklos

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-05 15:09                                         ` Miklos Szeredi
@ 2007-01-05 15:15                                           ` Miklos Szeredi
  2007-01-08 11:27                                             ` Pavel Machek
  2007-01-08  5:57                                           ` Mikulas Patocka
  1 sibling, 1 reply; 100+ messages in thread
From: Miklos Szeredi @ 2007-01-05 15:15 UTC (permalink / raw)
  To: mikulas
  Cc: pavel, matthew, bhalevy, arjan, jaharkes, linux-kernel,
	linux-fsdevel, nfsv4

> > And does it matter? If you rename a file, tar might skip it no matter of 
> > hardlink detection (if readdir races with rename, you can read none of the 
> > names of file, one or both --- all these are possible).
> > 
> > If you have "dir1/a" hardlinked to "dir1/b" and while tar runs you delete 
> > both "a" and "b" and create totally new files "dir2/c" linked to "dir2/d", 
> > tar might hardlink both "c" and "d" to "a" and "b".
> > 
> > No one guarantees you sane result of tar or cp -a while changing the tree. 
> > I don't see how is_samefile() could make it worse.
> 
> There are several cases where changing the tree doesn't affect the
> correctness of the tar or cp -a result.  In some of these cases using
> samefile() instead of st_ino _will_ result in a corrupted result.

Also note that using st_ino in combination with samefile() doesn't
make the result much better: it eliminates false positives, but cannot
fix false negatives.

Miklos

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [nfsv4] RE: Finding hardlinks
  2007-01-04 10:04                               ` Benny Halevy
  2007-01-04 10:47                                 ` Trond Myklebust
@ 2007-01-05 16:40                                 ` Nicolas Williams
  2007-01-05 16:56                                   ` Trond Myklebust
                                                     ` (2 more replies)
  1 sibling, 3 replies; 100+ messages in thread
From: Nicolas Williams @ 2007-01-05 16:40 UTC (permalink / raw)
  To: Benny Halevy
  Cc: Trond Myklebust, Jan Harkes, Miklos Szeredi, nfsv4, linux-kernel,
	Mikulas Patocka, linux-fsdevel, Jeff Layton, Arjan van de Ven

On Thu, Jan 04, 2007 at 12:04:14PM +0200, Benny Halevy wrote:
> I agree that the way the client implements its cache is out of the protocol
> scope. But how do you interpret "correct behavior" in section 4.2.1?
>  "Clients MUST use filehandle comparisons only to improve performance, not for correct behavior. All clients need to be prepared for situations in which it cannot be determined whether two filehandles denote the same object and in such cases, avoid making invalid assumptions which might cause incorrect behavior."
> Don't you consider data corruption due to cache inconsistency an incorrect behavior?

If a file with multiple hardlinks appears to have multiple distinct
filehandles then a client like Trond's will treat it as multiple
distinct files (with the same hardlink count, and you won't be able to
find the other links to them -- oh well).  Can this cause data
corruption?  Yes, but only if there are applications that rely on the
different file names referencing the same file, and backup apps on the
client won't get the hardlinks right either.

What I don't understand is why getting the fileid is so hard -- always
GETATTR when you GETFH and you'll be fine.  I'm guessing that's not as
difficult as it is to maintain a hash table of fileids.

Nico
-- 

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [nfsv4] RE: Finding hardlinks
  2007-01-05 16:40                                 ` Nicolas Williams
@ 2007-01-05 16:56                                   ` Trond Myklebust
  2007-01-06  7:44                                   ` Halevy, Benny
  2007-01-10 13:04                                   ` Benny Halevy
  2 siblings, 0 replies; 100+ messages in thread
From: Trond Myklebust @ 2007-01-05 16:56 UTC (permalink / raw)
  To: Nicolas Williams
  Cc: Benny Halevy, Jan Harkes, Miklos Szeredi, nfsv4, linux-kernel,
	Mikulas Patocka, linux-fsdevel, Jeff Layton, Arjan van de Ven

On Fri, 2007-01-05 at 10:40 -0600, Nicolas Williams wrote:
> What I don't understand is why getting the fileid is so hard -- always
> GETATTR when you GETFH and you'll be fine.  I'm guessing that's not as
> difficult as it is to maintain a hash table of fileids.

You've been sleeping in class. We always try to get the fileid together
with the GETFH. The irritating bit is having to redo a GETATTR using the
old filehandle in order to figure out if the 2 filehandles refer to the
same file. Unlike filehandles, fileids can be reused.

Then there is the matter of dealing with the fact that servers can (and
do!) actually lie to you.

Trond


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-05  8:43                                 ` Miklos Szeredi
  2007-01-05 13:12                                   ` Pavel Machek
@ 2007-01-05 17:30                                   ` Frank van Maarseveen
  1 sibling, 0 replies; 100+ messages in thread
From: Frank van Maarseveen @ 2007-01-05 17:30 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: pavel, matthew, bhalevy, arjan, mikulas, jaharkes, linux-kernel,
	linux-fsdevel, nfsv4

On Fri, Jan 05, 2007 at 09:43:22AM +0100, Miklos Szeredi wrote:
> > > > > High probability is all you have.  Cosmic radiation hitting your
> > > > > computer will more likly cause problems, than colliding 64bit inode
> > > > > numbers ;)
> > > > 
> > > > Some of us have machines designed to cope with cosmic rays, and would be
> > > > unimpressed with a decrease in reliability.
> > > 
> > > With the suggested samefile() interface you'd get a failure with just
> > > about 100% reliability for any application which needs to compare a
> > > more than a few files.  The fact is open files are _very_ expensive,
> > > no wonder they are limited in various ways.
> > > 
> > > What should 'tar' do when it runs out of open files, while searching
> > > for hardlinks?  Should it just give up?  Then the samefile() interface
> > > would be _less_ reliable than the st_ino one by a significant margin.
> > 
> > You need at most two simultaneously open files for examining any
> > number of hardlinks. So yes, you can make it reliable.
> 
> Well, sort of.  Samefile without keeping fds open doesn't have any
> protection against the tree changing underneath between first
> registering a file and later opening it.  The inode number is more
> useful in this respect.  In fact inode number + generation number will
> give you a unique identifier in time as well, which is a _lot_ more
> useful to determine if the file you are checking is actually the same
> as one that you've come across previously.

Samefile with keeping fds open doesn't buy you much anyway. What exactly
would be the value of a directory tree seen by operating only on fds
(even for directories) when some rogue process is renaming, moving,
updating stuff underneath?  One ends up with a tree which misses a lot
of files and hardly bears any resemblance to the actual tree at any
point in time, and I'm not even talking about file data.

It is futile to try to get a consistent tree view on a live filesystem,
with or without using fds. It just doesn't work without fundamental
support for some kind of "freezing" or time-travel inside the
kernel. Snapshots at the block device level are problematic too.

> 
> So instead of samefile() I'd still suggest an extended attribute
> interface which exports the file's unique (in space and time)
> identifier as an opaque cookie.

But then you're just _shifting_ the problem instead of fixing it:
st_ino/st_mtime (st_ctime?) are designed for this purpose. If the
filesystem doesn't support it properly, live with the consequences,
which are mostly minor. Notable exceptions are of course backup tools,
but backups _must_ be verified anyway, so you'll discover it soon.

(btw, that's what I noticed after restoring a system from a CD (iso9660
 with RR): all hardlinks were gone)

-- 
Frank

^ permalink raw reply	[flat|nested] 100+ messages in thread

* RE: [nfsv4] RE: Finding hardlinks
  2007-01-05 16:40                                 ` Nicolas Williams
  2007-01-05 16:56                                   ` Trond Myklebust
@ 2007-01-06  7:44                                   ` Halevy, Benny
  2007-01-10 13:04                                   ` Benny Halevy
  2 siblings, 0 replies; 100+ messages in thread
From: Halevy, Benny @ 2007-01-06  7:44 UTC (permalink / raw)
  To: Nicolas Williams
  Cc: Trond Myklebust, Jan Harkes, Miklos Szeredi, nfsv4, linux-kernel,
	Mikulas Patocka, linux-fsdevel, Jeff Layton, Arjan van de Ven

> From: linux-fsdevel-owner@vger.kernel.org on behalf of Nicolas Williams
> Sent: Fri 1/5/2007 18:40
> To: Halevy, Benny
> Cc: Trond Myklebust; Jan Harkes; Miklos Szeredi; nfsv4@ietf.org; linux-kernel@vger.kernel.org; Mikulas Patocka; linux-fsdevel@vger.kernel.org; Jeff Layton; Arjan van de Ven
> Subject: Re: [nfsv4] RE: Finding hardlinks
> 
> On Thu, Jan 04, 2007 at 12:04:14PM +0200, Benny Halevy wrote:
> > I agree that the way the client implements its cache is out of the protocol
> > scope. But how do you interpret "correct behavior" in section 4.2.1?
> >  "Clients MUST use filehandle comparisons only to improve performance, not for correct behavior. All clients > need to be prepared for situations in which it cannot be determined whether two filehandles denote the same > object and in such cases, avoid making invalid assumptions which might cause incorrect behavior."
> > Don't you consider data corruption due to cache inconsistency an incorrect behavior?
> 
> If a file with multiple hardlinks appears to have multiple distinct
> filehandles then a client like Trond's will treat it as multiple
> distinct files (with the same hardlink count, and you won't be able to
> find the other links to them -- oh well).  Can this cause data
> corruption?  Yes, but only if there are applications that rely on the
> different file names referencing the same file, and backup apps on the
> client won't get the hardlinks right either.

Well, this is why the hard links were made, no?
FWIW, I believe that rename of an open file might also produce this problem.


> 
> What I don't understand is why getting the fileid is so hard -- always
> GETATTR when you GETFH and you'll be fine.  I'm guessing that's not as
> difficult as it is to maintain a hash table of fileids.


The problem with NFS is that fileid isn't enough because the client doesn't
know about removes by other clients until it uses the stale filehandle.
Also, quite a few filesystems do not keep fileids unique (which is what
triggered this thread).
 
> 
> Nico
> --


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-05 15:09                                         ` Miklos Szeredi
  2007-01-05 15:15                                           ` Miklos Szeredi
@ 2007-01-08  5:57                                           ` Mikulas Patocka
  2007-01-08  8:49                                             ` Miklos Szeredi
  1 sibling, 1 reply; 100+ messages in thread
From: Mikulas Patocka @ 2007-01-08  5:57 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: pavel, matthew, bhalevy, arjan, jaharkes, linux-kernel,
	linux-fsdevel, nfsv4

>> And does it matter? If you rename a file, tar might skip it regardless of
>> hardlink detection (if readdir races with rename, you can read none, one,
>> or both of the file's names --- all these are possible).
>>
>> If you have "dir1/a" hardlinked to "dir1/b" and while tar runs you delete
>> both "a" and "b" and create totally new files "dir2/c" linked to "dir2/d",
>> tar might hardlink both "c" and "d" to "a" and "b".
>>
>> No one guarantees you sane result of tar or cp -a while changing the tree.
>> I don't see how is_samefile() could make it worse.
>
> There are several cases where changing the tree doesn't affect the
> correctness of the tar or cp -a result.  In some of these cases using
> samefile() instead of st_ino _will_ result in a corrupted result.

... and those are what? If you create hardlinks while copying, you may 
have files duplicated instead of hardlinked in the backup. If you unlink 
hardlinks, cp will miss hardlinks too and create two copies of the same 
file (it searches the hash only for files with i_nlink > 1). If you rename 
files, the archive will be completely fscked up (either missing or 
duplicate files).

> Generally samefile() is _weaker_ than the st_ino interface in
> comparing the identity of two files without using massive amounts of
> memory.  You're searching for a better solution, not one that is
> broken in a different way, aren't you?

What is the relevant case where st_ino/st_dev works and samefile(char 
*path1, char *path2) doesn't?

> Miklos

Mikulas

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-04  0:12                           ` Frank van Maarseveen
@ 2007-01-08  6:19                             ` Mikulas Patocka
  0 siblings, 0 replies; 100+ messages in thread
From: Mikulas Patocka @ 2007-01-08  6:19 UTC (permalink / raw)
  To: Frank van Maarseveen
  Cc: Bryan Henderson, Arjan van de Ven, Jan Harkes, linux-fsdevel,
	linux-kernel, Miklos Szeredi, Pavel Machek

>>> Currently, large file support is already necessary to handle dvd and
>>> video. It's also useful for images for virtualization. So the failing
>>> stat()
>>> calls should already be a thing of the past with modern distributions.
>>
>> As long as glibc compiles by default with 32-bit ino_t, the problem exists
>> and is severe --- programs handling large files, such as coreutils, tar,
>> mc, mplayer, already compile with 64-bit ino_t and off_t, but the user (or
>> script) may type something like:
>>
>> cat >file.c <<EOF
>> #include <sys/types.h>
>> #include <sys/stat.h>
>> #include <fcntl.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>> int main(void)
>> {
>> 	int h;
>> 	struct stat st;
>> 	if ((h = creat("foo", 0600)) < 0) perror("creat"), exit(1);
>> 	if (fstat(h, &st)) perror("stat"), exit(1);
>> 	close(h);
>> 	return 0;
>> }
>> EOF
>> gcc file.c; ./a.out
>>
>> --- and you certainly do not want this to fail (unless you are out of disk
>> space).
>>
>> The difference is, that with 32-bit program and 64-bit off_t, you get
>> deterministic failure on large files, with 32-bit program and 64-bit
>> ino_t, you get random failures.
>
> What's (technically) the problem with changing the gcc default?

Technically none (i.e. edit the gcc specs or glibc headers). But
persuading all distribution builders to use such a version is
impossible. Plus there are many binary programs that are unchangeable.

> Alternatively we could make the error deterministic in various ways. Start
> st_ino numbering from 4G (except for a few special ones maybe such
> as root/mounts). Or make old and new programs look differently at the
> ELF level or by sys_personality() and/or check against a "ino64" mount
> flag/filesystem feature. Lots of possibilities.

I think the best solution would be to drop -EOVERFLOW on st_ino and let
legacy 32-bit programs live with colliding inodes. They'll have them
anyway.

Mikulas

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-08  5:57                                           ` Mikulas Patocka
@ 2007-01-08  8:49                                             ` Miklos Szeredi
  2007-01-08 11:29                                               ` Pavel Machek
  0 siblings, 1 reply; 100+ messages in thread
From: Miklos Szeredi @ 2007-01-08  8:49 UTC (permalink / raw)
  To: mikulas
  Cc: pavel, matthew, bhalevy, arjan, jaharkes, linux-kernel,
	linux-fsdevel, nfsv4

> >> No one guarantees you sane result of tar or cp -a while changing the tree.
> >> I don't see how is_samefile() could make it worse.
> >
> > There are several cases where changing the tree doesn't affect the
> > correctness of the tar or cp -a result.  In some of these cases using
> > samefile() instead of st_ino _will_ result in a corrupted result.
> 
> ... and those are what?

  - /a/p/x and /a/q/x are links to the same file

  - /b/y and /a/q/y are links to the same file

  - tar is running on /a

  - meanwhile the following commands are executed:

     mv /a/p/x /b/x
     mv /b/y /a/p/x

With st_ino checking you'll get a perfectly consistent archive,
regardless of the timing.  With samefile() you could get an archive
where the data in /a/q/y is not stored, instead it will contain the
data of /a/q/x.

Note, this is far nastier than the "normal" corruption you usually get
with changing the tree under tar, the file is not just duplicated or
missing, it becomes a completely different file, even though it hasn't
been touched at all during the archiving.

The basic problem with samefile() is that it can only compare files at
a single snapshot in time, and cannot take into account any changes in
the tree (unless keeping files open, which is impractical).

There's really no point trying to push for such an inferior interface
when the problems which samefile is trying to address are purely
theoretical.

Currently linux is living with 32bit st_ino because of legacy apps,
and people are not constantly agonizing about it.  Fixing the
EOVERFLOW problem will enable filesystems to slowly move towards 64bit
st_ino, which should be more than enough.

Miklos

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-05 15:15                                           ` Miklos Szeredi
@ 2007-01-08 11:27                                             ` Pavel Machek
  0 siblings, 0 replies; 100+ messages in thread
From: Pavel Machek @ 2007-01-08 11:27 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: mikulas, matthew, bhalevy, arjan, jaharkes, linux-kernel,
	linux-fsdevel, nfsv4

On Fri 2007-01-05 16:15:41, Miklos Szeredi wrote:
> > > And does it matter? If you rename a file, tar might skip it regardless of
> > > hardlink detection (if readdir races with rename, you can read none, one,
> > > or both of the file's names --- all these are possible).
> > > 
> > > If you have "dir1/a" hardlinked to "dir1/b" and while tar runs you delete 
> > > both "a" and "b" and create totally new files "dir2/c" linked to "dir2/d", 
> > > tar might hardlink both "c" and "d" to "a" and "b".
> > > 
> > > No one guarantees you sane result of tar or cp -a while changing the tree. 
> > > I don't see how is_samefile() could make it worse.
> > 
> > There are several cases where changing the tree doesn't affect the
> > correctness of the tar or cp -a result.  In some of these cases using
> > samefile() instead of st_ino _will_ result in a corrupted result.
> 
> Also note, that using st_ino in combination with samefile() doesn't
> make the result much better, it eliminates false positives, but cannot
> fix false negatives.

I'd argue false negatives are not as severe.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-08  8:49                                             ` Miklos Szeredi
@ 2007-01-08 11:29                                               ` Pavel Machek
  2007-01-08 12:00                                                 ` Miklos Szeredi
  0 siblings, 1 reply; 100+ messages in thread
From: Pavel Machek @ 2007-01-08 11:29 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: mikulas, matthew, bhalevy, arjan, jaharkes, linux-kernel,
	linux-fsdevel, nfsv4

Hi!

> > >> No one guarantees you sane result of tar or cp -a while changing the tree.
> > >> I don't see how is_samefile() could make it worse.
> > >
> > > There are several cases where changing the tree doesn't affect the
> > > correctness of the tar or cp -a result.  In some of these cases using
> > > samefile() instead of st_ino _will_ result in a corrupted result.
> > 
> > ... and those are what?
> 
>   - /a/p/x and /a/q/x are links to the same file
> 
>   - /b/y and /a/q/y are links to the same file
> 
>   - tar is running on /a
> 
>   - meanwhile the following commands are executed:
> 
>      mv /a/p/x /b/x
>      mv /b/y /a/p/x
> 
> With st_ino checking you'll get a perfectly consistent archive,
> regardless of the timing.  With samefile() you could get an archive
> where the data in /a/q/y is not stored, instead it will contain the
> data of /a/q/x.
> 
> Note, this is far nastier than the "normal" corruption you usually get
> with changing the tree under tar, the file is not just duplicated or
> missing, it becomes a completely different file, even though it hasn't
> been touched at all during the archiving.
> 
> The basic problem with samefile() is that it can only compare files at
> a single snapshot in time, and cannot take into account any changes in
> the tree (unless keeping files open, which is impractical).

> There's really no point trying to push for such an inferior interface
> when the problems which samefile is trying to address are purely
> theoretical.

Oh yes, there is. st_ino is powerful, *but impossible to implement*
on many filesystems. You are of course welcome to combine st_ino with
samefile.

> Currently linux is living with 32bit st_ino because of legacy apps,
> and people are not constantly agonizing about it.  Fixing the
> EOVERFLOW problem will enable filesystems to slowly move towards 64bit
> st_ino, which should be more than enough.

50% probability of false positive on 4G files seems like very ugly
design problem to me.
								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-08 11:29                                               ` Pavel Machek
@ 2007-01-08 12:00                                                 ` Miklos Szeredi
  2007-01-08 13:26                                                   ` Martin Mares
  2007-01-09 16:26                                                   ` Steven Rostedt
  0 siblings, 2 replies; 100+ messages in thread
From: Miklos Szeredi @ 2007-01-08 12:00 UTC (permalink / raw)
  To: pavel
  Cc: mikulas, matthew, bhalevy, arjan, jaharkes, linux-kernel,
	linux-fsdevel, nfsv4

> > There's really no point trying to push for such an inferior interface
> > when the problems which samefile is trying to address are purely
> > theoretical.
> 
> Oh yes, there is. st_ino is powerful, *but impossible to implement*
> on many filesystems.

You mean POSIX compliance is impossible?  So what?  It is possible to
implement an approximation that is _at least_ as good as samefile().
One really dumb way is to set st_ino to the 'struct inode' pointer for
example.  That will sure as hell fit into 64bits and will give a
unique (alas not stable) identifier for each file.  Opening two files,
doing fstat() on them and comparing st_ino will give exactly the same
guarantees as samefile().

> > Currently linux is living with 32bit st_ino because of legacy apps,
> > and people are not constantly agonizing about it.  Fixing the
> > EOVERFLOW problem will enable filesystems to slowly move towards 64bit
> > st_ino, which should be more than enough.
> 
> 50% probability of false positive on 4G files seems like very ugly
> design problem to me.

4 billion files, each with more than one link, is pretty far-fetched.
And anyway, filesystems can take steps to prevent collisions, as they
do currently for 32bit st_ino, apparently without serious
difficulties.

Miklos

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-08 12:00                                                 ` Miklos Szeredi
@ 2007-01-08 13:26                                                   ` Martin Mares
  2007-01-08 13:39                                                     ` Miklos Szeredi
  2007-01-09 16:26                                                   ` Steven Rostedt
  1 sibling, 1 reply; 100+ messages in thread
From: Martin Mares @ 2007-01-08 13:26 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: pavel, mikulas, matthew, bhalevy, arjan, jaharkes, linux-kernel,
	linux-fsdevel, nfsv4

Hello!

> You mean POSIX compliance is impossible?  So what?  It is possible to
> implement an approximation that is _at least_ as good as samefile().
> One really dumb way is to set st_ino to the 'struct inode' pointer for
> example.  That will sure as hell fit into 64bits and will give a
> unique (alas not stable) identifier for each file.  Opening two files,
> doing fstat() on them and comparing st_ino will give exactly the same
> guarantees as samefile().

Good, ... except that it doesn't work. AFAIK, POSIX mandates inodes
to be unique until umount, not until inode cache expires :-)

IOW, if you have such implementation of st_ino, you can emulate samefile()
with it, but you cannot have it without violating POSIX.

> 4 billion files, each with more than one link is pretty far fetched.

Not on terabyte scale disk arrays, which are getting quite common these days.

> And anyway, filesystems can take steps to prevent collisions, as they
> do currently for 32bit st_ino, without serious difficulties
> apparently.

They currently do that usually by not supporting more than 4G files
in a single FS.

				Have a nice fortnight
-- 
Martin `MJ' Mares                          <mj@ucw.cz>   http://mj.ucw.cz/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
"Oh no, not again!"  -- The bowl of petunias

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-08 13:26                                                   ` Martin Mares
@ 2007-01-08 13:39                                                     ` Miklos Szeredi
  0 siblings, 0 replies; 100+ messages in thread
From: Miklos Szeredi @ 2007-01-08 13:39 UTC (permalink / raw)
  To: mj
  Cc: pavel, mikulas, matthew, bhalevy, arjan, jaharkes, linux-kernel,
	linux-fsdevel, nfsv4

> > You mean POSIX compliance is impossible?  So what?  It is possible to
> > implement an approximation that is _at least_ as good as samefile().
> > One really dumb way is to set st_ino to the 'struct inode' pointer for
> > example.  That will sure as hell fit into 64bits and will give a
> > unique (alas not stable) identifier for each file.  Opening two files,
> > doing fstat() on them and comparing st_ino will give exactly the same
> > guarantees as samefile().
> 
> Good, ... except that it doesn't work. AFAIK, POSIX mandates inodes
> to be unique until umount, not until inode cache expires :-)
> 
> IOW, if you have such implementation of st_ino, you can emulate samefile()
> with it, but you cannot have it without violating POSIX.

The whole discussion started out from the premise, that some
filesystems can't support stable unique inode numbers, i.e. they don't
conform to POSIX.

Filesystems which do conform to POSIX have _no need_ for samefile().
Ones that don't conform can choose a scheme that is best suited to
applications' needs, balancing uniqueness and stability in various ways.

> > 4 billion files, each with more than one link is pretty far fetched.
> 
> Not on terabyte scale disk arrays, which are getting quite common these days.
> 
> > And anyway, filesystems can take steps to prevent collisions, as they
> > do currently for 32bit st_ino, without serious difficulties
> > apparently.
> 
> They currently do that usually by not supporting more than 4G files
> in a single FS.

And with 64bit st_ino, they'll have to live with the limitation of not
more than 2^64 files.  Tough luck ;)

Miklos

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-08 12:00                                                 ` Miklos Szeredi
  2007-01-08 13:26                                                   ` Martin Mares
@ 2007-01-09 16:26                                                   ` Steven Rostedt
  2007-01-09 19:53                                                     ` Frank van Maarseveen
  1 sibling, 1 reply; 100+ messages in thread
From: Steven Rostedt @ 2007-01-09 16:26 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: pavel, mikulas, matthew, bhalevy, arjan, jaharkes, linux-kernel,
	linux-fsdevel, nfsv4

On Mon, 2007-01-08 at 13:00 +0100, Miklos Szeredi wrote:

> > 50% probability of false positive on 4G files seems like very ugly
> > design problem to me.
> 
> 4 billion files, each with more than one link is pretty far fetched.
> And anyway, filesystems can take steps to prevent collisions, as they
> do currently for 32bit st_ino, without serious difficulties
> apparently.

Maybe not 4 billion files, but you can get a large number of >1 linked
files when you copy full directories with "cp -rl", which I do a lot
when developing. I've done that a few times with the Linux tree.  Given
other utils that copy as hard links, 4 billion files with >1 link is
perhaps possible, and perhaps likely in the near future.

-- Steve



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-09 16:26                                                   ` Steven Rostedt
@ 2007-01-09 19:53                                                     ` Frank van Maarseveen
  2007-01-09 20:11                                                       ` Steven Rostedt
  2007-01-11 10:07                                                       ` Pádraig Brady
  0 siblings, 2 replies; 100+ messages in thread
From: Frank van Maarseveen @ 2007-01-09 19:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Miklos Szeredi, pavel, mikulas, matthew, bhalevy, arjan,
	jaharkes, linux-kernel, linux-fsdevel, nfsv4

On Tue, Jan 09, 2007 at 11:26:25AM -0500, Steven Rostedt wrote:
> On Mon, 2007-01-08 at 13:00 +0100, Miklos Szeredi wrote:
> 
> > > 50% probability of false positive on 4G files seems like very ugly
> > > design problem to me.
> > 
> > 4 billion files, each with more than one link is pretty far fetched.
> > And anyway, filesystems can take steps to prevent collisions, as they
> > do currently for 32bit st_ino, without serious difficulties
> > apparently.
> 
> Maybe not 4 billion files, but you can get a large number of >1 linked
> files, when you copy full directories with "cp -rl".

Yes but "cp -rl" is typically done by _developers_ and they tend to
have a better understanding of this (uh, at least within linux context
I hope so).

Also, just adding hard-links doesn't increase the number of inodes.

-- 
Frank

^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-09 19:53                                                     ` Frank van Maarseveen
@ 2007-01-09 20:11                                                       ` Steven Rostedt
  2007-01-11 10:07                                                       ` Pádraig Brady
  1 sibling, 0 replies; 100+ messages in thread
From: Steven Rostedt @ 2007-01-09 20:11 UTC (permalink / raw)
  To: Frank van Maarseveen
  Cc: Miklos Szeredi, pavel, mikulas, matthew, bhalevy, arjan,
	jaharkes, linux-kernel, linux-fsdevel, nfsv4


On Tue, 9 Jan 2007, Frank van Maarseveen wrote:

>
> Yes but "cp -rl" is typically done by _developers_ and they tend to
> have a better understanding of this (uh, at least within linux context
> I hope so).
>
> Also, just adding hard-links doesn't increase the number of inodes.

No, but it increases the number of inodes that have link >1. :)
-- Steve


^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: [nfsv4] RE: Finding hardlinks
  2007-01-05 16:40                                 ` Nicolas Williams
  2007-01-05 16:56                                   ` Trond Myklebust
  2007-01-06  7:44                                   ` Halevy, Benny
@ 2007-01-10 13:04                                   ` Benny Halevy
  2 siblings, 0 replies; 100+ messages in thread
From: Benny Halevy @ 2007-01-10 13:04 UTC (permalink / raw)
  To: Benny Halevy, Trond Myklebust, Jan Harkes, Miklos Szeredi, nfsv4,
	linux-kernel, Mikulas Patocka, linux-fsdevel, Jeff Layton,
	Arjan van de Ven

Nicolas Williams wrote:
> On Thu, Jan 04, 2007 at 12:04:14PM +0200, Benny Halevy wrote:
>> I agree that the way the client implements its cache is out of the protocol
>> scope. But how do you interpret "correct behavior" in section 4.2.1?
>>  "Clients MUST use filehandle comparisons only to improve performance, not for correct behavior. All clients need to be prepared for situations in which it cannot be determined whether two filehandles denote the same object and in such cases, avoid making invalid assumptions which might cause incorrect behavior."
>> Don't you consider data corruption due to cache inconsistency an incorrect behavior?
> 
> If a file with multiple hardlinks appears to have multiple distinct
> filehandles then a client like Trond's will treat it as multiple
> distinct files (with the same hardlink count, and you won't be able to
> find the other links to them -- oh well).  Can this cause data
> corruption?  Yes, but only if there are applications that rely on the
> different file names referencing the same file, and backup apps on the
> client won't get the hardlinks right either.

The case I'm discussing is multiple filehandles for the same name,
not even for different hardlinks.  This causes spurious EIO errors
on the client when the filehandle changes and cache inconsistency
when opening the file multiple times in parallel.

> 
> What I don't understand is why getting the fileid is so hard -- always
> GETATTR when you GETFH and you'll be fine.  I'm guessing that's not as
> difficult as it is to maintain a hash table of fileids.

It's not difficult at all, just that the client can't rely on the fileids to be
unique in both space and time because of server non-compliance (e.g. netapp's
snapshots) and fileid reuse after delete.



^ permalink raw reply	[flat|nested] 100+ messages in thread

* Re: Finding hardlinks
  2007-01-09 19:53                                                     ` Frank van Maarseveen
  2007-01-09 20:11                                                       ` Steven Rostedt
@ 2007-01-11 10:07                                                       ` Pádraig Brady
  1 sibling, 0 replies; 100+ messages in thread
From: Pádraig Brady @ 2007-01-11 10:07 UTC (permalink / raw)
  To: Frank van Maarseveen
  Cc: Steven Rostedt, Miklos Szeredi, pavel, mikulas, matthew, bhalevy,
	arjan, jaharkes, linux-kernel, linux-fsdevel, nfsv4

Frank van Maarseveen wrote:
> On Tue, Jan 09, 2007 at 11:26:25AM -0500, Steven Rostedt wrote:
>> On Mon, 2007-01-08 at 13:00 +0100, Miklos Szeredi wrote:
>>
>>>> 50% probability of false positive on 4G files seems like very ugly
>>>> design problem to me.
>>> 4 billion files, each with more than one link is pretty far fetched.
>>> And anyway, filesystems can take steps to prevent collisions, as they
>>> do currently for 32bit st_ino, without serious difficulties
>>> apparently.
>> Maybe not 4 billion files, but you can get a large number of >1 linked
>> files, when you copy full directories with "cp -rl".
> 
> Yes but "cp -rl" is typically done by _developers_ and they tend to
> have a better understanding of this (uh, at least within linux context
> I hope so).

I'm not really following this thread, but that's wrong.
A lot of people use hardlinks to provide snapshot functionality.
I.e., the following can be used to make snapshots efficiently:

rsync /src/ /backup/today
cp -al /backup/today /backup/$Date

See also:

http://www.dirvish.org/
http://www.rsnapshot.org/
http://igmus.org/code/

> Also, just adding hard-links doesn't increase the number of inodes.

I don't think that was the point.

Pádraig.


* Re: Finding hardlinks
  2006-12-28  9:06           ` Benny Halevy
  2006-12-28 10:05             ` Arjan van de Ven
  2006-12-28 13:22             ` Jeff Layton
@ 2007-01-11 23:35             ` Denis Vlasenko
  2 siblings, 0 replies; 100+ messages in thread
From: Denis Vlasenko @ 2007-01-11 23:35 UTC (permalink / raw)
  To: Benny Halevy
  Cc: Mikulas Patocka, Arjan van de Ven, Jan Harkes, Miklos Szeredi,
	linux-kernel, linux-fsdevel, nfsv4

On Thursday 28 December 2006 10:06, Benny Halevy wrote:
> Mikulas Patocka wrote:
> >>> If user (or script) doesn't specify that flag, it doesn't help. I think
> >>> the best solution for these filesystems would be either to add new syscall
> >>>  	int is_hardlink(char *filename1, char *filename2)
> >>> (but I know adding syscall bloat may be objectionable)
> >> it's also the wrong api; the filenames may have been changed under you
> >> just as you return from this call, so it really is a
> >> "was_hardlink_at_some_point()" as you specify it.
> >> If you make it work on fd's.. it has a chance at least.
> > 
> > Yes, but it doesn't matter --- if the tree changes under "cp -a" command, 
> > no one guarantees you what you get.
> >  	int fis_hardlink(int handle1, int handle 2);
> > Is another possibility but it can't detect hardlinked symlinks.

It also suffers from combinatorial explosion.
cp -a on 10^6 files will require ~0.5 * 10^12 compares...
 
> It seems like the posix idea of unique <st_dev, st_ino> doesn't
> hold water for modern file systems and that creates real problems for
> backup apps which rely on that to detect hard links.

Yes, and it should have been obvious at the 32->64bit inode# transition.
Unfortunately people tend to think "ok, NOW this new shiny BIGNUM-bit
field is big enough for everybody". Then the cycle repeats in five years...

I think the solution is that inode "numbers" should become
opaque _variable-length_ hashes. They are already just hash values,
so this is nothing new. All the problems stem from the fixed width
of the inode number.

--
vda


* Re: Finding hardlinks
  2007-01-03 12:42                           ` Pavel Machek
@ 2007-01-11 23:43                             ` Denis Vlasenko
  0 siblings, 0 replies; 100+ messages in thread
From: Denis Vlasenko @ 2007-01-11 23:43 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Miklos Szeredi, bhalevy, arjan, mikulas, jaharkes, linux-kernel,
	linux-fsdevel, nfsv4

On Wednesday 03 January 2007 13:42, Pavel Machek wrote:
> I guess that is the way to go. samefile(path1, path2) is unfortunately
> inherently racy.

Not a problem in practice. You don't expect cp -a
to reliably copy a tree that something else is modifying
at the same time.

Thus we assume that the tree we operate on is not modified.
--
vda


* Re: Finding hardlinks
  2007-01-03 20:26                           ` Frank van Maarseveen
@ 2007-01-12  0:00                             ` Denis Vlasenko
  0 siblings, 0 replies; 100+ messages in thread
From: Denis Vlasenko @ 2007-01-12  0:00 UTC (permalink / raw)
  To: Frank van Maarseveen
  Cc: Mikulas Patocka, Jan Harkes, Pavel Machek, Arjan van de Ven,
	Miklos Szeredi, linux-kernel, linux-fsdevel

On Wednesday 03 January 2007 21:26, Frank van Maarseveen wrote:
> On Wed, Jan 03, 2007 at 08:31:32PM +0100, Mikulas Patocka wrote:
> > 64-bit inode numbers space is not yet implemented on Linux --- the problem 
> > is that if you return ino >= 2^32, programs compiled without 
> > -D_FILE_OFFSET_BITS=64 will fail with stat() returning -EOVERFLOW --- this 
> > failure is specified in POSIX, but not very useful.
> 
> hmm, checking iunique(), ino_t, __kernel_ino_t... I see. Pity. So at
> some point in time we may need a sort of "ino64" mount option to be
> able to switch to a 64 bit number space on mount basis. Or (conversely)
> refuse to mount without that option if we know there are >32 bit st_ino
> out there. And invent iunique64() and use that when "ino64" specified
> for FAT/SMB/...  when those filesystems haven't been replaced by a
> successor by that time.
> 
> At that time probably all programs are either compiled with
> -D_FILE_OFFSET_BITS=64 (most already are because of files bigger than 2G)
> or completely 64 bit. 

Good plan. Be prepared to redo it when 64 bits starts to feel "small" too,
and then again when 128 bits does. Don't tell me this won't happen:
15 years ago people would have laughed at 32-bit inode numbers not being enough.
--
vda


* Re: Finding hardlinks
       [not found]                     ` <7zXMb-5g5-27@gated-at.bofh.it>
@ 2007-01-05 23:54                       ` Bodo Eggert
  0 siblings, 0 replies; 100+ messages in thread
From: Bodo Eggert @ 2007-01-05 23:54 UTC (permalink / raw)
  To: Miklos Szeredi, matthew, bhalevy, arjan, mikulas, jaharkes,
	linux-kernel, linux-fsdevel, nfsv4, pavel

Miklos Szeredi <miklos@szeredi.hu> wrote:

>> > Well, sort of.  Samefile without keeping fds open doesn't have any
>> > protection against the tree changing underneath between first
>> > registering a file and later opening it.  The inode number is more
>> 
>> You only need to keep one-file-per-hardlink-group open during final
>> verification, checking that inode hashing produced reasonable results.
> 
> What final verification?  I wasn't just talking about 'tar' but all
> cases where st_ino might be used to check the identity of two files at
> possibly different points in time.
> 
> Time A:    remember identity of file X
> Time B:    check if identity of file Y matches that of file X
> 
> With samefile() if you open X at A, and keep it open till B, you can
> accumulate large numbers of open files and the application can fail.
> 
> If you don't keep an open file, just remember the path, then renaming
> X will foil the later identity check.  Changing the file at this path
> between A and B can even give you a false positive.  This applies to
> 'tar' as well as the other uses.

If you open Y, this open file descriptor guarantees that no distinct
file can have the same inode number, while all hardlinked files must have
the same inode number. (AFAIK)

Now you will check this against the list of hardlink candidates using the
stored inode number. If the inode number has changed, this will result in
a false negative. If you removed X, recreated it with the same inode number
and linked that to Y, you'll get a false positive (which could be identified
by the [mc]time changes).

Samefile without keeping the files open will result in the same false
positives as open+fstat+stat, while samefile with keeping the files open
will occasionally overflow the files table. Therefore I think it's not
worthwhile introducing samefile as long as the inode number is unique for
open files. OTOH you'll want to keep the inode number as stable as
possible, since it's the only sane way to find sets of hardlinked files
and some important programs may depend on it.
-- 
I thank GMX for sabotaging the use of my addresses by means of
lies spread via SPF.

http://david.woodhou.se/why-not-spf.html


end of thread, other threads:[~2007-01-12  0:02 UTC | newest]

Thread overview: 100+ messages
2006-12-20  9:03 Finding hardlinks Mikulas Patocka
2006-12-20 11:44 ` Miklos Szeredi
2006-12-20 16:36   ` Mikulas Patocka
2006-12-20 16:50     ` Miklos Szeredi
2006-12-20 19:54       ` Al Viro
2006-12-20 20:12         ` Mikulas Patocka
2006-12-31 15:02         ` Mikulas Patocka
2006-12-21 18:58   ` Jan Harkes
2006-12-21 23:49     ` Mikulas Patocka
2006-12-22  5:05       ` Jan Harkes
2006-12-23 10:18       ` Arjan van de Ven
2006-12-23 14:00         ` Mikulas Patocka
2006-12-28  9:06           ` Benny Halevy
2006-12-28 10:05             ` Arjan van de Ven
2006-12-28 15:24               ` Benny Halevy
2006-12-28 19:58                 ` Miklos Szeredi
2007-01-02 19:15                   ` Pavel Machek
2007-01-02 20:41                     ` Miklos Szeredi
2007-01-02 20:50                       ` Mikulas Patocka
2007-01-02 21:10                         ` Miklos Szeredi
2007-01-02 21:37                           ` Mikulas Patocka
2007-01-03 11:56                       ` Pavel Machek
2007-01-03 12:33                         ` Miklos Szeredi
2007-01-03 12:42                           ` Pavel Machek
2007-01-11 23:43                             ` Denis Vlasenko
2007-01-03 12:45                           ` Martin Mares
2007-01-03 13:54                           ` Matthew Wilcox
2007-01-03 15:51                             ` Miklos Szeredi
2007-01-03 19:04                               ` Mikulas Patocka
2007-01-04 22:59                               ` Pavel Machek
2007-01-05  8:43                                 ` Miklos Szeredi
2007-01-05 13:12                                   ` Pavel Machek
2007-01-05 13:55                                     ` Miklos Szeredi
2007-01-05 14:08                                       ` Mikulas Patocka
2007-01-05 15:09                                         ` Miklos Szeredi
2007-01-05 15:15                                           ` Miklos Szeredi
2007-01-08 11:27                                             ` Pavel Machek
2007-01-08  5:57                                           ` Mikulas Patocka
2007-01-08  8:49                                             ` Miklos Szeredi
2007-01-08 11:29                                               ` Pavel Machek
2007-01-08 12:00                                                 ` Miklos Szeredi
2007-01-08 13:26                                                   ` Martin Mares
2007-01-08 13:39                                                     ` Miklos Szeredi
2007-01-09 16:26                                                   ` Steven Rostedt
2007-01-09 19:53                                                     ` Frank van Maarseveen
2007-01-09 20:11                                                       ` Steven Rostedt
2007-01-11 10:07                                                       ` Pádraig Brady
2007-01-05 17:30                                   ` Frank van Maarseveen
2006-12-28 18:14               ` Mikulas Patocka
2006-12-29 10:34                 ` Trond Myklebust
2006-12-30  1:04                   ` Mikulas Patocka
2007-01-01  2:30                     ` Nikita Danilov
2007-01-01 22:58                       ` Mikulas Patocka
2007-01-01 23:05                         ` Nikita Danilov
2007-01-01 23:22                           ` Mikulas Patocka
2007-01-04 13:59                             ` Nikita Danilov
2007-01-02 23:14                     ` Trond Myklebust
2007-01-02 23:50                       ` Mikulas Patocka
2006-12-28 13:22             ` Jeff Layton
2006-12-28 15:12               ` Benny Halevy
2006-12-28 15:54                 ` Jeff Layton
2006-12-28 16:26                   ` Jan Engelhardt
2006-12-28 18:17                 ` Mikulas Patocka
2006-12-28 20:07                   ` Halevy, Benny
2006-12-29 10:28                     ` [nfsv4] " Trond Myklebust
2006-12-31 21:25                       ` Halevy, Benny
2007-01-02 23:21                         ` Trond Myklebust
2007-01-03 12:35                           ` Benny Halevy
2007-01-04  0:43                             ` Trond Myklebust
2007-01-04  8:36                             ` Trond Myklebust
2007-01-04 10:04                               ` Benny Halevy
2007-01-04 10:47                                 ` Trond Myklebust
2007-01-05  8:28                                   ` Benny Halevy
2007-01-05 10:29                                     ` Trond Myklebust
2007-01-05 16:40                                 ` Nicolas Williams
2007-01-05 16:56                                   ` Trond Myklebust
2007-01-06  7:44                                   ` Halevy, Benny
2007-01-10 13:04                                   ` Benny Halevy
2006-12-29 10:12                 ` Trond Myklebust
2006-12-31 21:19                   ` Halevy, Benny
2007-01-02 23:20                     ` Trond Myklebust
2007-01-02 23:46                     ` Trond Myklebust
2007-01-11 23:35             ` Denis Vlasenko
2006-12-29 10:02           ` Pavel Machek
2007-01-01 22:47             ` Mikulas Patocka
2007-01-01 23:53               ` Jan Harkes
2007-01-02  0:04                 ` Mikulas Patocka
2007-01-03 18:58                   ` Frank van Maarseveen
2007-01-03 19:17                     ` Mikulas Patocka
2007-01-03 19:26                       ` Frank van Maarseveen
2007-01-03 19:31                         ` Mikulas Patocka
2007-01-03 20:26                           ` Frank van Maarseveen
2007-01-12  0:00                             ` Denis Vlasenko
2007-01-03 22:30                           ` Pavel Machek
2007-01-03 21:09                     ` Bryan Henderson
2007-01-03 22:01                       ` Frank van Maarseveen
2007-01-03 23:43                         ` Mikulas Patocka
2007-01-04  0:12                           ` Frank van Maarseveen
2007-01-08  6:19                             ` Mikulas Patocka
     [not found] <7x5mR-2wX-3@gated-at.bofh.it>
     [not found] ` <7x9Ad-18O-35@gated-at.bofh.it>
     [not found]   ` <7yXEy-UI-39@gated-at.bofh.it>
     [not found]     ` <7yYKa-2Ds-3@gated-at.bofh.it>
     [not found]       ` <7zcWP-7ET-5@gated-at.bofh.it>
     [not found]         ` <7zdzA-jc-27@gated-at.bofh.it>
     [not found]           ` <7zeP5-2ic-15@gated-at.bofh.it>
     [not found]             ` <7zgH9-5my-17@gated-at.bofh.it>
     [not found]               ` <7zJSM-14t-9@gated-at.bofh.it>
     [not found]                 ` <7zSW5-6cj-9@gated-at.bofh.it>
     [not found]                   ` <7zX9l-4rS-7@gated-at.bofh.it>
     [not found]                     ` <7zXMb-5g5-27@gated-at.bofh.it>
2007-01-05 23:54                       ` Bodo Eggert
