LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] Sanitize filesystem NLS handling
@ 2007-03-18 16:55 Alexander E. Patrakov
  2007-03-18 22:01 ` OGAWA Hirofumi
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-18 16:55 UTC (permalink / raw)
  To: LKML; +Cc: Andrew Morton, agalakhov, Kay Sievers

Let me quote from the current description of CONFIG_FAT_DEFAULT_CODEPAGE:

> This option should be set to the codepage of your FAT filesystems.

Do you see the word "your"? Compile-time settings with the word "your" 
in their description are always a design bug.

As documented, this can be changed on per-mount basis with the 
"codepage=..." mount option, and this works in /etc/fstab. The same 
consideration (including the "works in /etc/fstab" note) applies to the 
iocharset parameter. However, this becomes a big problem with HAL-based 
automounting, because many agents (e.g., pmount and exo_mount from 
xfce-4.4.0) have NO means of changing the mount options. This makes it 
completely impossible to use non-English filenames with HAL-based 
automounting and distro kernels. There are a lot of threads about this 
in Russian forums, and the solution usually recommended is to uninstall HAL.

One could say that every backend must add the iocharset and codepage 
mount option by itself, but code duplication between such backends is 
never a good thing. The solution is to make the default iocharset and 
codepage adjustable at runtime on the global basis, so that no explicit 
mount options are needed. That's what the appended patch does.

More details on what the patch does:

 * Rewords the description of CONFIG_NLS_DEFAULT, because at some point 
in the past it confused some people
 * Removes CONFIG_FAT_DEFAULT_IOCHARSET, now CONFIG_NLS_DEFAULT is used 
for this purpose. This is because the correct setting of both must match 
the user's locale
 * Merges the two CONFIG_SMB_NLS_REMOTE and CONFIG_FAT_DEFAULT_CODEPAGE 
options into one, named CONFIG_CODEPAGE_DEFAULT. This is because the 
correct setting of both must match the code page used by MS-DOS in the 
user's country. For the same reason, CONFIG_SMB_NLS_DEFAULT is removed 
(the only sane choice is "y")
 * Makes the FAT filesystem accept both the old-style "codepage=866" 
mount option (which is inconsistent with other filesystems requiring a 
codepage option) and the new-style "codepage=cp866" option. This is 
necessary because CONFIG_CODEPAGE_DEFAULT must work for all filesystems 
that use it
 * Downgrades the UTF-8 FAT warning to a note, because, while using the 
utf8 iocharset produces a case-sensitive FAT filesystem, other 
iocharsets simply produce wrong characters, which is much worse
 * Renames SMB_NLS_MAXNAMELEN to NLS_MAXNAMELEN, because it is also 
useful outside smbfs
 * Makes smbfs always output iocharset and codepage in /proc/mounts, as 
FAT does
 * Makes CONFIG_NLS_DEFAULT and CONFIG_CODEPAGE_DEFAULT adjustable at 
runtime via the following mechanisms:

   * If the nls_base.ko module is really a module, it is sufficient to 
write in modprobe.conf something like this:

options nls_base iocharset=koi8-r codepage=cp866

   * If it is not a module, add this to the kernel command line in GRUB:

nls_base.iocharset=koi8-r nls_base.codepage=cp866

   * In both cases, writing into sysfs also works:

echo -n koi8-r >/sys/module/nls_base/parameters/iocharset
echo -n cp866 >/sys/module/nls_base/parameters/codepage

Target: 2.6.22
Signed-off-by: Alexander E. Patrakov

After applying, fix the defconfigs by removing 
CONFIG_FAT_DEFAULT_IOCHARSET, CONFIG_FAT_DEFAULT_CODEPAGE, 
CONFIG_SMB_NLS_REMOTE, CONFIG_SMB_NLS_DEFAULT, and adding 
CONFIG_CODEPAGE_DEFAULT="cp437" (except for defconfig.m32104ut, where 
this should be cp932). Also make sure that CONFIG_NLS_CODEPAGE_437 is 
not disabled.

diff -ur linux-2.6.20.1/include/linux/msdos_fs.h linux-2.6.20.1.new/include/linux/msdos_fs.h
--- linux-2.6.20.1/include/linux/msdos_fs.h	2007-02-20 11:34:32.000000000 +0500
+++ linux-2.6.20.1.new/include/linux/msdos_fs.h	2007-03-15 20:58:37.000000000 +0500
@@ -191,7 +191,7 @@
 	gid_t fs_gid;
 	unsigned short fs_fmask;
 	unsigned short fs_dmask;
-	unsigned short codepage;  /* Codepage for shortname conversions */
+	char *codepage;           /* Codepage for shortname conversions */
 	char *iocharset;          /* Charset used for filename input/display */
 	unsigned short shortname; /* flags for shortname display/create rule */
 	unsigned char name_check; /* r = relaxed, n = normal, s = strict */
diff -ur linux-2.6.20.1/include/linux/nls.h linux-2.6.20.1.new/include/linux/nls.h
--- linux-2.6.20.1/include/linux/nls.h	2007-02-20 11:34:32.000000000 +0500
+++ linux-2.6.20.1.new/include/linux/nls.h	2007-03-15 19:44:40.000000000 +0500
@@ -3,6 +3,11 @@
 
 #include <linux/init.h>
 
+#define NLS_MAXNAMELEN 20
+
+extern char default_iocharset[NLS_MAXNAMELEN];
+extern char default_codepage[NLS_MAXNAMELEN];
+
 /* unicode character */
 typedef __u16 wchar_t;
 
diff -ur linux-2.6.20.1/include/linux/smb.h linux-2.6.20.1.new/include/linux/smb.h
--- linux-2.6.20.1/include/linux/smb.h	2007-02-20 11:34:32.000000000 +0500
+++ linux-2.6.20.1.new/include/linux/smb.h	2007-03-15 19:44:16.000000000 +0500
@@ -11,6 +11,7 @@
 
 #include <linux/types.h>
 #include <linux/magic.h>
+#include <linux/nls.h>
 
 enum smb_protocol { 
 	SMB_PROTOCOL_NONE, 
@@ -63,10 +64,9 @@
 
 #ifdef __KERNEL__
 
-#define SMB_NLS_MAXNAMELEN 20
 struct smb_nls_codepage {
-	char local_name[SMB_NLS_MAXNAMELEN];
-	char remote_name[SMB_NLS_MAXNAMELEN];
+	char local_name[NLS_MAXNAMELEN];
+	char remote_name[NLS_MAXNAMELEN];
 };
 
 
diff -ur linux-2.6.20.1/fs/fat/inode.c linux-2.6.20.1.new/fs/fat/inode.c
--- linux-2.6.20.1/fs/fat/inode.c	2007-02-20 11:34:32.000000000 +0500
+++ linux-2.6.20.1.new/fs/fat/inode.c	2007-03-15 21:49:09.000000000 +0500
@@ -27,15 +27,6 @@
 #include <linux/writeback.h>
 #include <asm/unaligned.h>
 
-#ifndef CONFIG_FAT_DEFAULT_IOCHARSET
-/* if user don't select VFAT, this is undefined. */
-#define CONFIG_FAT_DEFAULT_IOCHARSET	""
-#endif
-
-static int fat_default_codepage = CONFIG_FAT_DEFAULT_CODEPAGE;
-static char fat_default_iocharset[] = CONFIG_FAT_DEFAULT_IOCHARSET;
-
-
 static int fat_add_cluster(struct inode *inode)
 {
 	int err, cluster;
@@ -462,15 +453,18 @@
 	if (sbi->nls_disk) {
 		unload_nls(sbi->nls_disk);
 		sbi->nls_disk = NULL;
-		sbi->options.codepage = fat_default_codepage;
 	}
 	if (sbi->nls_io) {
 		unload_nls(sbi->nls_io);
 		sbi->nls_io = NULL;
 	}
-	if (sbi->options.iocharset != fat_default_iocharset) {
+	if (sbi->options.iocharset != default_iocharset) {
 		kfree(sbi->options.iocharset);
-		sbi->options.iocharset = fat_default_iocharset;
+		sbi->options.iocharset = default_iocharset;
+	}
+	if (sbi->options.codepage != default_codepage) {
+		kfree(sbi->options.codepage);
+		sbi->options.codepage = default_codepage;
 	}
 
 	sb->s_fs_info = NULL;
@@ -869,7 +863,7 @@
 	{Opt_umask, "umask=%o"},
 	{Opt_dmask, "dmask=%o"},
 	{Opt_fmask, "fmask=%o"},
-	{Opt_codepage, "codepage=%u"},
+	{Opt_codepage, "codepage=%s"},
 	{Opt_nocase, "nocase"},
 	{Opt_quiet, "quiet"},
 	{Opt_showexec, "showexec"},
@@ -933,14 +927,15 @@
 	substring_t args[MAX_OPT_ARGS];
 	int option;
 	char *iocharset;
+	char *codepage;
 
 	opts->isvfat = is_vfat;
 
 	opts->fs_uid = current->uid;
 	opts->fs_gid = current->gid;
 	opts->fs_fmask = opts->fs_dmask = current->fs->umask;
-	opts->codepage = fat_default_codepage;
-	opts->iocharset = fat_default_iocharset;
+	opts->codepage = default_codepage;
+	opts->iocharset = default_iocharset;
 	if (is_vfat)
 		opts->shortname = VFAT_SFN_DISPLAY_LOWER|VFAT_SFN_CREATE_WIN95;
 	else
@@ -1024,9 +1019,12 @@
 			opts->fs_fmask = option;
 			break;
 		case Opt_codepage:
-			if (match_int(&args[0], &option))
-				return 0;
-			opts->codepage = option;
+			if (opts->codepage != default_codepage)
+				kfree(opts->codepage);
+			codepage = match_strdup(&args[0]);
+			if (!codepage)
+				return -ENOMEM;
+			opts->codepage = codepage;
 			break;
 		case Opt_flush:
 			opts->flush = 1;
@@ -1042,7 +1040,7 @@
 
 		/* vfat specific */
 		case Opt_charset:
-			if (opts->iocharset != fat_default_iocharset)
+			if (opts->iocharset != default_iocharset)
 				kfree(opts->iocharset);
 			iocharset = match_strdup(&args[0]);
 			if (!iocharset)
@@ -1101,8 +1099,8 @@
 	}
 	/* UTF-8 doesn't provide FAT semantics */
 	if (!strcmp(opts->iocharset, "utf8")) {
-		printk(KERN_ERR "FAT: utf8 is not a recommended IO charset"
-		       " for FAT filesystems, filesystem will be case sensitive!\n");
+		printk(KERN_INFO "FAT: utf8 IO charset"
+		       " is used, filesystem will be case sensitive!\n");
 	}
 
 	if (opts->unicode_xlate)
@@ -1162,7 +1160,6 @@
 	int debug;
 	unsigned int media;
 	long error;
-	char buf[50];
 
 	sbi = kzalloc(sizeof(struct msdos_sb_info), GFP_KERNEL);
 	if (!sbi)
@@ -1372,10 +1369,18 @@
 	 */
 
 	error = -EINVAL;
-	sprintf(buf, "cp%d", sbi->options.codepage);
-	sbi->nls_disk = load_nls(buf);
+	
+	sbi->nls_disk = load_nls(sbi->options.codepage);
+	if (!sbi->nls_disk) {
+		/* Maybe the old non-standard codepage specification is used */
+		char buf[NLS_MAXNAMELEN];
+		snprintf(buf, sizeof(buf), "cp%s", sbi->options.codepage);
+		buf[sizeof(buf) - 1] = 0;
+		sbi->nls_disk = load_nls(buf);
+	}
+
 	if (!sbi->nls_disk) {
-		printk(KERN_ERR "FAT: codepage %s not found\n", buf);
+		printk(KERN_ERR "FAT: codepage %s not found\n", sbi->options.codepage);
 		goto out_fail;
 	}
 
@@ -1421,8 +1426,10 @@
 		unload_nls(sbi->nls_io);
 	if (sbi->nls_disk)
 		unload_nls(sbi->nls_disk);
-	if (sbi->options.iocharset != fat_default_iocharset)
+	if (sbi->options.iocharset != default_iocharset)
 		kfree(sbi->options.iocharset);
+	if (sbi->options.codepage != default_codepage)
+		kfree(sbi->options.codepage);
 	sb->s_fs_info = NULL;
 	kfree(sbi);
 	return error;
diff -ur linux-2.6.20.1/fs/isofs/inode.c linux-2.6.20.1.new/fs/isofs/inode.c
--- linux-2.6.20.1/fs/isofs/inode.c	2007-02-20 11:34:32.000000000 +0500
+++ linux-2.6.20.1.new/fs/isofs/inode.c	2007-03-15 19:34:11.000000000 +0500
@@ -773,7 +773,7 @@
 
 #ifdef CONFIG_JOLIET
 	if (joliet_level && opt.utf8 == 0) {
-		char * p = opt.iocharset ? opt.iocharset : CONFIG_NLS_DEFAULT;
+		char * p = opt.iocharset ? opt.iocharset : default_iocharset;
 		sbi->s_nls_iocharset = load_nls(p);
 		if (! sbi->s_nls_iocharset) {
 			/* Fail only if explicit charset specified */
diff -ur linux-2.6.20.1/fs/Kconfig linux-2.6.20.1.new/fs/Kconfig
--- linux-2.6.20.1/fs/Kconfig	2007-02-20 11:34:32.000000000 +0500
+++ linux-2.6.20.1.new/fs/Kconfig	2007-03-16 14:03:57.000000000 +0500
@@ -785,28 +785,6 @@
 	  To compile this as a module, choose M here: the module will be called
 	  vfat.
 
-config FAT_DEFAULT_CODEPAGE
-	int "Default codepage for FAT"
-	depends on MSDOS_FS || VFAT_FS
-	default 437
-	help
-	  This option should be set to the codepage of your FAT filesystems.
-	  It can be overridden with the "codepage" mount option.
-	  See <file:Documentation/filesystems/vfat.txt> for more information.
-
-config FAT_DEFAULT_IOCHARSET
-	string "Default iocharset for FAT"
-	depends on VFAT_FS
-	default "iso8859-1"
-	help
-	  Set this to the default input/output character set you'd
-	  like FAT to use. It should probably match the character set
-	  that most of your FAT filesystems use, and can be overridden
-	  with the "iocharset" mount option for FAT filesystems.
-	  Note that "utf8" is not recommended for FAT filesystems.
-	  If unsure, you shouldn't set "utf8" here.
-	  See <file:Documentation/filesystems/vfat.txt> for more information.
-
 config NTFS_FS
 	tristate "NTFS file system support"
 	select NLS
@@ -1830,35 +1808,6 @@
 	  To compile the SMB support as a module, choose M here: the module will
 	  be called smbfs.  Most people say N, however.
 
-config SMB_NLS_DEFAULT
-	bool "Use a default NLS"
-	depends on SMB_FS
-	help
-	  Enabling this will make smbfs use nls translations by default. You
-	  need to specify the local charset (CONFIG_NLS_DEFAULT) in the nls
-	  settings and you need to give the default nls for the SMB server as
-	  CONFIG_SMB_NLS_REMOTE.
-
-	  The nls settings can be changed at mount time, if your smbmount
-	  supports that, using the codepage and iocharset parameters.
-
-	  smbmount from samba 2.2.0 or later supports this.
-
-config SMB_NLS_REMOTE
-	string "Default Remote NLS Option"
-	depends on SMB_NLS_DEFAULT
-	default "cp437"
-	help
-	  This setting allows you to specify a default value for which
-	  codepage the server uses. If this field is left blank no
-	  translations will be done by default. The local codepage/charset
-	  default to CONFIG_NLS_DEFAULT.
-
-	  The nls settings can be changed at mount time, if your smbmount
-	  supports that, using the codepage and iocharset parameters.
-
-	  smbmount from samba 2.2.0 or later supports this.
-
 config CIFS
 	tristate "CIFS support (advanced network filesystem for Samba, Window and other CIFS compliant servers)"
 	depends on INET
diff -ur linux-2.6.20.1/fs/nls/Kconfig linux-2.6.20.1.new/fs/nls/Kconfig
--- linux-2.6.20.1/fs/nls/Kconfig	2007-02-20 11:34:32.000000000 +0500
+++ linux-2.6.20.1.new/fs/nls/Kconfig	2007-03-15 20:04:38.000000000 +0500
@@ -18,13 +18,13 @@
 	  will be called nls_base.
 
 config NLS_DEFAULT
-	string "Default NLS Option"
+	string "Default UNIX NLS Option"
 	depends on NLS
 	default "iso8859-1"
 	---help---
-	  The default NLS used when mounting file system. Note, that this is
-	  the NLS used by your console, not the NLS used by a specific file
-	  system (if different) to store data (filenames) on a disk.
+	  The default UNIX NLS used when mounting file system. Note, that
+	  this is the NLS expected by the "ls" command, ot the NLS used
+	  internally by a specific file system to store filenames on a disk.
 	  Currently, the valid values are:
 	  big5, cp437, cp737, cp775, cp850, cp852, cp855, cp857, cp860, cp861,
 	  cp862, cp863, cp864, cp865, cp866, cp869, cp874, cp932, cp936,
@@ -37,6 +37,26 @@
 
 	  If unsure, specify it as "iso8859-1".
 
+config CODEPAGE_DEFAULT
+	string "Default MS-DOS NLS Option"
+	depends on NLS
+	default "cp437"
+	---help---
+	  The default MS-DOS NLS used when mounting file systems with MS-DOS
+	  origin. Note, that this is the NLS used internally by these file
+	  systems to store filenames on a disk or for transmission over a
+	  network, not the NLS expected by the "ls"  command.
+	  Currently, the valid values are:
+	  big5, cp437, cp737, cp775, cp850, cp852, cp855, cp857, cp860, cp861,
+	  cp862, cp863, cp864, cp865, cp866, cp869, cp874, cp932, cp936,
+	  cp949, cp950, cp1251, cp1255, euc-jp, euc-kr, gb2312, iso8859-1,
+	  iso8859-2, iso8859-3, iso8859-4, iso8859-5, iso8859-6, iso8859-7,
+	  iso8859-8, iso8859-9, iso8859-13, iso8859-14, iso8859-15,
+	  koi8-r, koi8-ru, koi8-u, sjis, tis-620, utf8.
+	  If you specify a wrong value, mounting will fail.
+
+	  If unsure, specify it as "cp437".
+
 config NLS_CODEPAGE_437
 	tristate "Codepage 437 (United States, Canada)"
 	depends on NLS
diff -ur linux-2.6.20.1/fs/nls/nls_base.c linux-2.6.20.1.new/fs/nls/nls_base.c
--- linux-2.6.20.1/fs/nls/nls_base.c	2007-02-20 11:34:32.000000000 +0500
+++ linux-2.6.20.1.new/fs/nls/nls_base.c	2007-03-15 19:45:59.000000000 +0500
@@ -18,6 +18,9 @@
 #endif
 #include <linux/spinlock.h>
 
+char default_iocharset[NLS_MAXNAMELEN] = CONFIG_NLS_DEFAULT;
+char default_codepage[NLS_MAXNAMELEN] = CONFIG_CODEPAGE_DEFAULT;
+
 static struct nls_table default_table;
 static struct nls_table *tables = &default_table;
 static DEFINE_SPINLOCK(nls_lock);
@@ -474,7 +477,7 @@
 {
 	struct nls_table *default_nls;
 	
-	default_nls = load_nls(CONFIG_NLS_DEFAULT);
+	default_nls = load_nls(default_iocharset);
 	if (default_nls != NULL)
 		return default_nls;
 	else
@@ -490,5 +507,13 @@
 EXPORT_SYMBOL(utf8_mbstowcs);
 EXPORT_SYMBOL(utf8_wctomb);
 EXPORT_SYMBOL(utf8_wcstombs);
+EXPORT_SYMBOL(default_iocharset);
+EXPORT_SYMBOL(default_codepage);
+
+module_param_string(iocharset, default_iocharset, NLS_MAXNAMELEN, 0644);
+MODULE_PARM_DESC(iocharset, "Default UNIX filename charset");
+
+module_param_string(codepage, default_codepage, NLS_MAXNAMELEN, 0644);
+MODULE_PARM_DESC(codepage, "Default MS-DOS filename charset");
 
 MODULE_LICENSE("Dual BSD/GPL");
diff -ur linux-2.6.20.1/fs/smbfs/inode.c linux-2.6.20.1.new/fs/smbfs/inode.c
--- linux-2.6.20.1/fs/smbfs/inode.c	2007-02-20 11:34:32.000000000 +0500
+++ linux-2.6.20.1.new/fs/smbfs/inode.c	2007-03-15 19:57:42.000000000 +0500
@@ -36,13 +36,6 @@
 #include "getopt.h"
 #include "proto.h"
 
-/* Always pick a default string */
-#ifdef CONFIG_SMB_NLS_REMOTE
-#define SMB_NLS_REMOTE CONFIG_SMB_NLS_REMOTE
-#else
-#define SMB_NLS_REMOTE ""
-#endif
-
 #define SMB_TTL_DEFAULT 1000
 
 static void smb_delete_inode(struct inode *);
@@ -396,11 +389,11 @@
 			break;
 		case 'i':
 			strlcpy(mnt->codepage.local_name, optarg, 
-				SMB_NLS_MAXNAMELEN);
+				NLS_MAXNAMELEN);
 			break;
 		case 'c':
 			strlcpy(mnt->codepage.remote_name, optarg,
-				SMB_NLS_MAXNAMELEN);
+				NLS_MAXNAMELEN);
 			break;
 		case 't':
 			mnt->ttl = value;
@@ -446,10 +439,8 @@
 	if (mnt->flags & SMB_MOUNT_DMODE)
 		seq_printf(s, ",dir_mode=%04o", mnt->dir_mode & S_IRWXUGO);
 
-	if (strcmp(mnt->codepage.local_name, CONFIG_NLS_DEFAULT))
-		seq_printf(s, ",iocharset=%s", mnt->codepage.local_name);
-	if (strcmp(mnt->codepage.remote_name, SMB_NLS_REMOTE))
-		seq_printf(s, ",codepage=%s", mnt->codepage.remote_name);
+	seq_printf(s, ",iocharset=%s", mnt->codepage.local_name);
+	seq_printf(s, ",codepage=%s", mnt->codepage.remote_name);
 
 	if (mnt->ttl != SMB_TTL_DEFAULT)
 		seq_printf(s, ",ttl=%d", mnt->ttl);
@@ -555,10 +546,10 @@
 	mnt = server->mnt;
 
 	memset(mnt, 0, sizeof(struct smb_mount_data_kernel));
-	strlcpy(mnt->codepage.local_name, CONFIG_NLS_DEFAULT,
-		SMB_NLS_MAXNAMELEN);
-	strlcpy(mnt->codepage.remote_name, SMB_NLS_REMOTE,
-		SMB_NLS_MAXNAMELEN);
+	strlcpy(mnt->codepage.local_name, default_iocharset,
+		NLS_MAXNAMELEN);
+	strlcpy(mnt->codepage.remote_name, default_codepage,
+		NLS_MAXNAMELEN);
 
 	mnt->ttl = SMB_TTL_DEFAULT;
 	if (ver == SMB_MOUNT_OLDVERSION) {



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Sanitize filesystem NLS handling
  2007-03-18 16:55 [PATCH] Sanitize filesystem NLS handling Alexander E. Patrakov
@ 2007-03-18 22:01 ` OGAWA Hirofumi
  2007-03-19  2:58   ` Alexander E. Patrakov
  2007-03-22 12:59 ` [PATCH] Sanitize filesystem NLS handling Roman Zippel
  2007-03-22 16:07 ` Alexander E. Patrakov
  2 siblings, 1 reply; 23+ messages in thread
From: OGAWA Hirofumi @ 2007-03-18 22:01 UTC (permalink / raw)
  To: Alexander E. Patrakov; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers

"Alexander E. Patrakov" <patrakov@ums.usu.ru> writes:

>  * Removes CONFIG_FAT_DEFAULT_IOCHARSET, now CONFIG_NLS_DEFAULT is used 
> for this purpose. This is because the correct setting of both must match 
> the user's locale

The some filesystems want to use utf-8, and others don't want to use
utf-8, no?  And is it also true about some devices using vfat?

>  * Merges the two CONFIG_SMB_NLS_REMOTE and CONFIG_FAT_DEFAULT_CODEPAGE 
> options into one, named CONFIG_CODEPAGE_DEFAULT. This is because the 
> correct setting of both must match the code page used by MS-DOS in the 
> user's country. For the same reason, CONFIG_SMB_NLS_DEFAULT is removed 
> (the only sane choice is "y")

No. Unfortunately the real is not simple like it in some case.

>  * Makes the FAT filesystem accept both the old-style "codepage=866" 
> mount option (which is inconsistent with other filesystems requiring a 
> codepage option) and the new-style "codepage=cp866" option. This is 
> necessary because CONFIG_CODEPAGE_DEFAULT must work for all filesystems 
> that use it

You allow to set any nls to codepage? If so, it is not good.

>  * Downgrades the UTF-8 FAT warning to a note, because, while using the 
> utf8 iocharset produces a case-sensitive FAT filesystem, other 
> iocharsets simply produce wrong characters, which is much worse

No, utf-8 makes completely wrong entry. It's more wrong than other nls.

>  * Makes CONFIG_NLS_DEFAULT and CONFIG_CODEPAGE_DEFAULT adjustable at 
> runtime via the following mechanisms:

The configurable sounds sane, and it may help some case. But, it should
not be system global. At least, I think the default would be per-filesystem,
otherwise some configs seems to be needed for other filesystem after all.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Sanitize filesystem NLS handling
  2007-03-18 22:01 ` OGAWA Hirofumi
@ 2007-03-19  2:58   ` Alexander E. Patrakov
  2007-03-19  3:46     ` OGAWA Hirofumi
  2007-03-19  6:38     ` yes --help (was: [PATCH] Sanitize filesystem NLS handling) Bernd Eckenfels
  0 siblings, 2 replies; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-19  2:58 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers

OGAWA Hirofumi wrote:
> "Alexander E. Patrakov" <patrakov@ums.usu.ru> writes:
> 
>>  * Removes CONFIG_FAT_DEFAULT_IOCHARSET, now CONFIG_NLS_DEFAULT is used 
>> for this purpose. This is because the correct setting of both must match 
>> the user's locale
> 
> The some filesystems want to use utf-8, and others don't want to use
> utf-8, no?  And is it also true about some devices using vfat?

Sorry, I can't parse this. Linux programs see filenames in the charset 
specified by the "iocharset" mount option or this default. If some filenames 
are in UTF-8 and some aren't, the "ls" command cannot show them all 
correctly on a properly configured (aka: correctly displays the output of 
"yes --help") terminal (because it assumes that all filenames are in the 
same charset as indicated by the output of "locale charmap"). IMHO, this is 
insane enough, and it makes sense  to disable this by default.

>>  * Merges the two CONFIG_SMB_NLS_REMOTE and CONFIG_FAT_DEFAULT_CODEPAGE 
>> options into one, named CONFIG_CODEPAGE_DEFAULT. This is because the 
>> correct setting of both must match the code page used by MS-DOS in the 
>> user's country. For the same reason, CONFIG_SMB_NLS_DEFAULT is removed 
>> (the only sane choice is "y")
> 
> No. Unfortunately the real is not simple like it in some case.

More details please. Are you saying that for Japanese people the codepage 
for FAT and SMB filesystems is not the same? How do Microsoft products work 
then? What do they send over the wire?

>>  * Makes the FAT filesystem accept both the old-style "codepage=866" 
>> mount option (which is inconsistent with other filesystems requiring a 
>> codepage option) and the new-style "codepage=cp866" option. This is 
>> necessary because CONFIG_CODEPAGE_DEFAULT must work for all filesystems 
>> that use it
> 
> You allow to set any nls to codepage? If so, it is not good.

I did this because it involved less changes. Only FAT treats codepage as a 
number. All other filesystems already allow arbitrary NLS as a codepage 
mount parameter.

>>  * Downgrades the UTF-8 FAT warning to a note, because, while using the 
>> utf8 iocharset produces a case-sensitive FAT filesystem, other 
>> iocharsets simply produce wrong characters, which is much worse
> 
> No, utf-8 makes completely wrong entry. It's more wrong than other nls.

For any non-UTF-8 based locales, the other NLS is correct and utf8 indeed 
would produce completely wrong characters. But for UTF-8 based locales, utf8 
is the only correct iocharset.

And the downgraded warning is not for those who mis-use the utf8 iocharset 
in non-UTF-8 locales, they need a completely different wording: "your 
iocharset doesn't match the locale settings, non-ASCII characters will be 
completely wrong in filenames". Unfortunately, this condition is impossible 
to detect from within the kernel.

>>  * Makes CONFIG_NLS_DEFAULT and CONFIG_CODEPAGE_DEFAULT adjustable at 
>> runtime via the following mechanisms:
> 
> The configurable sounds sane, and it may help some case. But, it should
> not be system global. At least, I think the default would be per-filesystem,
> otherwise some configs seems to be needed for other filesystem after all.

OK, now I see that your primary objection is to merging options, and 
disagree (incorrect locale setup on your side is suspected). For meaningful 
discussion, I want to see the following:

1) Output of "locale -a"
2) Output of "yes --help" from the same terminal
3) The correct iocharset and codepage for mounting FAT filesystems on USB 
flash drives that are known readable under Windows (here "correct" = "ls in 
this terminal shows filenames correctly").
4) The same for SMB filesystems.

-- 
Alexander E. Patrakov

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Sanitize filesystem NLS handling
  2007-03-19  2:58   ` Alexander E. Patrakov
@ 2007-03-19  3:46     ` OGAWA Hirofumi
  2007-03-19  4:38       ` Alexander E. Patrakov
  2007-03-19  6:38     ` yes --help (was: [PATCH] Sanitize filesystem NLS handling) Bernd Eckenfels
  1 sibling, 1 reply; 23+ messages in thread
From: OGAWA Hirofumi @ 2007-03-19  3:46 UTC (permalink / raw)
  To: Alexander E. Patrakov; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers

"Alexander E. Patrakov" <patrakov@ums.usu.ru> writes:

>> You allow to set any nls to codepage? If so, it is not good.
>
> I did this because it involved less changes. Only FAT treats codepage as a 
> number. All other filesystems already allow arbitrary NLS as a codepage 
> mount parameter.

I'm saying here, it is not good for vfat.

>> No, utf-8 makes completely wrong entry. It's more wrong than other nls.
>
> For any non-UTF-8 based locales, the other NLS is correct and utf8 indeed 
> would produce completely wrong characters. But for UTF-8 based locales, utf8 
> is the only correct iocharset.

No, iocharset=utf8 is wrong always.

>>>  * Makes CONFIG_NLS_DEFAULT and CONFIG_CODEPAGE_DEFAULT adjustable at 
>>> runtime via the following mechanisms:
>> 
>> The configurable sounds sane, and it may help some case. But, it should
>> not be system global. At least, I think the default would be per-filesystem,
>> otherwise some configs seems to be needed for other filesystem after all.
>
> OK, now I see that your primary objection is to merging options, and 
> disagree (incorrect locale setup on your side is suspected). For meaningful 
> discussion, I want to see the following:
>
> 1) Output of "locale -a"
> 2) Output of "yes --help" from the same terminal
> 3) The correct iocharset and codepage for mounting FAT filesystems on USB 
> flash drives that are known readable under Windows (here "correct" = "ls in 
> this terminal shows filenames correctly").
> 4) The same for SMB filesystems.

Ah, ok. I'm thinking one locale is not enough, at least for now.
Why I can't use utf8 for jfs or something, and use other nls for vfat?
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Sanitize filesystem NLS handling
  2007-03-19  3:46     ` OGAWA Hirofumi
@ 2007-03-19  4:38       ` Alexander E. Patrakov
  2007-03-19  5:38         ` OGAWA Hirofumi
  0 siblings, 1 reply; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-19  4:38 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers

OGAWA Hirofumi wrote:
> "Alexander E. Patrakov" <patrakov@ums.usu.ru> writes:
> 
>>> You allow to set any nls to codepage? If so, it is not good.
>> I did this because it involved less changes. Only FAT treats codepage as a 
>> number. All other filesystems already allow arbitrary NLS as a codepage 
>> mount parameter.
> 
> I'm saying here, it is not good for vfat.

All valid options (i.e. numbers) are still available. Prohibiting 
non-numbers is a lot of work (changing module_param_string to 
module_param_call and validating the passed string, and also changing the 
code for all filesystems so that they also validate manually-passed codepage 
string).

>>> No, utf-8 makes completely wrong entry. It's more wrong than other nls.
>> For any non-UTF-8 based locales, the other NLS is correct and utf8 indeed 
>> would produce completely wrong characters. But for UTF-8 based locales, utf8 
>> is the only correct iocharset.
> 
> No, iocharset=utf8 is wrong always.

Here we just disagree, but I think I can change your opinion by giving you a 
correctly configured UTF-8 Japanese system in the form of a LiveCD (note: 
the kernel doesn't have this patch there). If you can afford ~500 MB of 
downloads, please do this:

1) download http://ftp.lfs-matrix.net/pub/lfs-livecd/lfslivecd-x86-6.2-5.iso 
and burn it
2) write (from Windows, because it cannot be misconfigured) some files with 
Japanese filenames to your flash drive
3) boot the CD, type the following at the boot prompt: linux 
LANG=ja_JP.UTF-8 TZ=Asia/Tokyo
4) edit /etc/X11/xorg.conf (should be probably unneeded, but I ask just in 
case. If you have to edit xorg.conf, report a bug to me privately)
5) startx (to get a Japanese environment)
6) open the terminal
7) yes --help (are the proper Japanese characters displayed? If not, change 
the font in the menu to some Kochi family and report this to me as a bug - 
yes I know about the Japanese opposition to Unicode because of Han Unification)
8) mount -o iocharset=utf8,codepage=932 /dev/sda1 /mnt (here I assume that 
your flash drive appears as sda1, adjust as needed)
9) ls -l /mnt (do Japanese filenames appear correctly?)
10) umount /mnt

If you want to experiment with the more traditional ja_JP.eucjp locale, you 
are also welcome to do so (but then iocharset=euc-jp will be needed to show 
filenames correctly).

> Why I can't use utf8 for jfs or something, and use other nls for vfat?

Because you'll get a mismatch between the userspace locale and the iocharset 
used by the kernel in at least one of these two cases. The mismatch will 
result in the following:

1) Your system (e.g., the "ls" command) will not correctly interpret 
filenames written by known-good setups.
2) Known-good setups will not correctly interpret filenames written by your 
system.

What probably happened in your case is that you have no known-good jfs-based 
setup and are the only user of your jfs disk, and thus can't see (2). It 
looks like you don't use UTF-8 userspace locale. Then, the on-disk filenames 
on your jfs filesystem are wrong, but your system misinterprets them, and 
two errors seem to cancel each other (but this still doesn't make it a 
correct setup).

-- 
Alexander E. Patrakov

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Sanitize filesystem NLS handling
  2007-03-19  4:38       ` Alexander E. Patrakov
@ 2007-03-19  5:38         ` OGAWA Hirofumi
  2007-03-19  5:49           ` Alexander E. Patrakov
  0 siblings, 1 reply; 23+ messages in thread
From: OGAWA Hirofumi @ 2007-03-19  5:38 UTC (permalink / raw)
  To: Alexander E. Patrakov; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers

"Alexander E. Patrakov" <patrakov@ums.usu.ru> writes:

> OGAWA Hirofumi wrote:
>> "Alexander E. Patrakov" <patrakov@ums.usu.ru> writes:
>> 
>>>> You allow to set any nls to codepage? If so, it is not good.
>>> I did this because it involved less changes. Only FAT treats codepage as a 
>>> number. All other filesystems already allow arbitrary NLS as a codepage 
>>> mount parameter.
>> 
>> I'm saying here, it is not good for vfat.
>
> All valid options (i.e. numbers) are still available. Prohibiting 
> non-numbers is a lot of work (changing module_param_string to 
> module_param_call and validating the passed string, and also changing the 
> code for all filesystems so that they also validate manually-passed codepage 
> string).

Ok.

>>>> No, utf-8 makes completely wrong entry. It's more wrong than other nls.
>>> For any non-UTF-8 based locales, the other NLS is correct and utf8 indeed 
>>> would produce completely wrong characters. But for UTF-8 based locales, utf8 
>>> is the only correct iocharset.
>> 
>> No, iocharset=utf8 is wrong always.
>
> Here we just disagree, but I think I can change your opinion by giving you a 
> correctly configured UTF-8 Japanese system in the form of a LiveCD (note: 
> the kernel doesn't have this patch there). If you can afford ~500 MB of 
> downloads, please do this:

I don't care about "read", because it doesn't corrupt filesystem. I care
about only "write", because it can corrupt filesystem.

If it's read-only, I'll not care at all, and will agree.

>> Why I can't use utf8 for jfs or something, and use other nls for vfat?
>
> Because you'll get a mismatch between the userspace locale and the iocharset 
> used by the kernel in at least one of these two cases. The mismatch will 
> result in the following:
>
> 1) Your system (e.g., the "ls" command) will not correctly interpret 
> filenames written by known-good setups.
> 2) Known-good setups will not correctly interpret filenames written by your 
> system.
>
> What probably happened in your case is that you have no known-good jfs-based 
> setup and are the only user of your jfs disk, and thus can't see (2). It 
> looks like you don't use UTF-8 userspace locale. Then, the on-disk filenames 
> on your jfs filesystem are wrong, but your system misinterprets them, and 
> two errors seem to cancel each other (but this still doesn't make it a 
> correct setup).

All are user policy. The users can switch locale and
G_FILENAME_ENCODING and something else, some app can switch it even
runtime, and I think kernel shouldn't have user policy, right?
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Sanitize filesystem NLS handling
  2007-03-19  5:38         ` OGAWA Hirofumi
@ 2007-03-19  5:49           ` Alexander E. Patrakov
  2007-03-19 14:25             ` OGAWA Hirofumi
  0 siblings, 1 reply; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-19  5:49 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers

OGAWA Hirofumi wrote:

> I don't care about "read", because it doesn't corrupt filesystem. I care
> about only "write", because it can corrupt filesystem.
> 
> If it's read-only, I'll not care at all, and will agree.

Here you are right, but please tell RedHat about this (and you'll be at 
least called an old-fashioned person). They (and from their initiative, 
almost everyone else) use UTF-8 based locales by default, and ONLY utf8 NLS 
gives correct characters in filenames in this case.

Besides, FS corruption is possible only if the user intentionally writes two 
files with names differing only in their case.


> All are user policy. The users can switch locale and
> G_FILENAME_ENCODING and something else, some app can switch it even
> runtime, and I think kernel shouldn't have user policy, right?

G_FILENAME_ENCODING is a Glib2-only heresy, please ignore it. The "ls" tool 
always assumes that file names are in the same encoding as the output of 
"locale charmap" command. The fact that the filenames (contrary to what 
Glib2 developers say) should be in the locale encoding becomes very obvious 
if one reads POSIX specifications for tar and cpio programs (otherwise, they 
won't talk about conversion errors).

-- 
Alexander E. Patrakov

^ permalink raw reply	[flat|nested] 23+ messages in thread

* yes --help (was: [PATCH] Sanitize filesystem NLS handling)
  2007-03-19  2:58   ` Alexander E. Patrakov
  2007-03-19  3:46     ` OGAWA Hirofumi
@ 2007-03-19  6:38     ` Bernd Eckenfels
  2007-03-19  8:14       ` Alexander E. Patrakov
  1 sibling, 1 reply; 23+ messages in thread
From: Bernd Eckenfels @ 2007-03-19  6:38 UTC (permalink / raw)
  To: linux-kernel

In article <45FDFC69.6020605@ums.usu.ru> you wrote:
> 2) Output of "yes --help" from the same terminal

Question: what do you expect?

#> yes --version
#yes (GNU coreutils) 5.2.1
#Written by David MacKenzie.
#
#Copyright (C) 2004 Free Software Foundation, Inc.
#This is free software; see the source for copying conditions.  There is NO
#warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Greetings
Bernd

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: yes --help (was: [PATCH] Sanitize filesystem NLS handling)
  2007-03-19  6:38     ` yes --help (was: [PATCH] Sanitize filesystem NLS handling) Bernd Eckenfels
@ 2007-03-19  8:14       ` Alexander E. Patrakov
  0 siblings, 0 replies; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-19  8:14 UTC (permalink / raw)
  To: Bernd Eckenfels; +Cc: linux-kernel

Bernd Eckenfels wrote:
> In article <45FDFC69.6020605@ums.usu.ru> you wrote:
> > 2) Output of "yes --help" from the same terminal
>
> Question: what do you expect?

A Japanese version of the help text. Unfortunately, LKML rejected it as SPAM, 
so I'll redirect you to the graphical rendering of it:

http://www.linuxfromscratch.org/~alexander/test-results/yes--help-ja.png

-- 
Alexander E. Patrakov

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Sanitize filesystem NLS handling
  2007-03-19  5:49           ` Alexander E. Patrakov
@ 2007-03-19 14:25             ` OGAWA Hirofumi
  2007-03-19 15:04               ` Alexander E. Patrakov
  2007-03-19 15:10               ` Alexander E. Patrakov
  0 siblings, 2 replies; 23+ messages in thread
From: OGAWA Hirofumi @ 2007-03-19 14:25 UTC (permalink / raw)
  To: Alexander E. Patrakov; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers

"Alexander E. Patrakov" <patrakov@ums.usu.ru> writes:

> OGAWA Hirofumi wrote:
>
>> I don't care about "read", because it doesn't corrupt filesystem. I care
>> about only "write", because it can corrupt filesystem.
>> 
>> If it's read-only, I'll not care at all, and will agree.
>
> Here you are right, but please tell RedHat about this (and you'll be at 
> least called an old-fashioned person). They (and from their initiative, 
> almost everyone else) use UTF-8 based locales by default, and ONLY utf8 NLS 
> gives correct characters in filenames in this case.
>
> Besides, FS corruption is possible only if the user intentionally writes two 
> files with names differing only in their case.

No, the utf8 support of vfat is wrong. This is implementation
thing, and it is not recommended until it is fixed.

>> All are user policy. The users can switch locale and
>> G_FILENAME_ENCODING and something else, some app can switch it even
>> runtime, and I think kernel shouldn't have user policy, right?
>
> G_FILENAME_ENCODING is a Glib2-only heresy, please ignore it. The "ls" tool 
> always assumes that file names are in the same encoding as the output of 
> "locale charmap" command. The fact that the filenames (contrary to what 
> Glib2 developers say) should be in the locale encoding becomes very obvious 
> if one reads POSIX specifications for tar and cpio programs (otherwise, they 
> won't talk about conversion errors).

I'm talking about two filesystems on a system here, not two encoding
on one filesystem. You can change locale on each filesystems, or each
directory, of course if it's not vfat.

It's wrong? May or may not be true. It's the user policy.
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Sanitize filesystem NLS handling
  2007-03-19 14:25             ` OGAWA Hirofumi
@ 2007-03-19 15:04               ` Alexander E. Patrakov
  2007-03-19 17:26                 ` OGAWA Hirofumi
  2007-03-19 15:10               ` Alexander E. Patrakov
  1 sibling, 1 reply; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-19 15:04 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers

OGAWA Hirofumi wrote:

> No, the utf8 support of vfat is wrong. This is implementation
> thing, and it is not recommended until it is fixed.

Could you please explain your specific problem with screenshots, preferrably 
by running my LiveCD or Debian Etch in an emulator such as qemu or VMware?

But, anyway, this is a separate issue that my patch doesn't attempt to 
correct. The conclusion so far is that we disagree, and that there are 
situations where using utf8 iocharset is the least of all evils, so the 
warning is not justified enough. Reproducible testcase:

1) Install MS Windows 2000 or XP (English aka Pan-European version is just fine)
2) When the installer asks about locale settings, choose that you (a) want 
to have Cyrillic group of languages, (b) that your locale/location is 
Russia, (c) use Russian as a language for non-Unicode programs, and (d) add 
Russian keyboard layout in addition to the existing English layout if the 
system doesn't do this for you.
3) Using Windows, create a file on a floppy or flash drive. Press F2 to 
rename it. Press Alt+Shift to switch to Russian keyboard layout. Type a new 
file name. Remember (or copy to paper) what the letters that make the file 
name look like in Windows.
4) Boot my LiveCD and choose "Russian (UTF-8)" from the locale dialog (this 
matches the setup found in all modern distros such as Fedora). On the second 
screen, confirm the default settings. (Alternatively, install Debian Etch 
and select Russian during installation).
5) Try to mount the floppy or flash drive to /mnt using various iocharsets. 
List the directory contents with "ls -l". You will find that utf8 is the 
only iocharset that gives the correct letters.

OTOH, if you select Russian (KOI8-R) or Russian (CP1251), you will find that 
the only iocharset giving the correct letters will be koi8-r or cp1251, 
respectively.

> I'm talking about two filesystems on a system here, not two encoding
> on one filesystem.

I am also talking about this. Mounting two filesystems with different 
iocharsets is insane, because this will result in one of the following outcomes:

1) "ls" will show wrong characters in filenames on one of the filesystems
2) one of the two filesystems will contain wrong on-disk data for filenames, 
that, when misinterpreted by mounting with wrong iocharset, results in 
seemingly-correct output, but is misunderstood by the properly set up 
reference implementation (that's what is likely to happen with jfs in your 
example).

Testcase for (2), to be performed with my LiveCD:

1) Boot the CD, select "Russian (KOI8-R)" from the locale dialog. Agree with 
the default settings on the second screen.
2) Mount an empty floppy or flash drive with the (wrong) "cp1251" iocharset.
3) Using the right Ctrl key to switch between between English and Russian 
keyboard layouts, create a file with Russian letters in its name under /mnt.
4) umount /mnt
5) Install Windows 2000 or XP with Russian-specific settings, as described 
above.
6) Look what Windows finds on the floppy or flash. It is not the same 
filename you typed, but a sort of garbage (mojibake).

-- 
Alexander E. Patrakov

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Sanitize filesystem NLS handling
  2007-03-19 14:25             ` OGAWA Hirofumi
  2007-03-19 15:04               ` Alexander E. Patrakov
@ 2007-03-19 15:10               ` Alexander E. Patrakov
  2007-03-19 17:36                 ` OGAWA Hirofumi
  1 sibling, 1 reply; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-19 15:10 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers

OGAWA Hirofumi wrote:

> I'm talking about two filesystems on a system here, not two encoding
> on one filesystem. You can change locale on each filesystems, or each
> directory, of course if it's not vfat.

Note that you can still achieve this insane result by specifying iocharset 
manually for each mount. Only the defaults are changed, but many distros set 
the default iocharset to either iso8859-1 or utf8, both of which are wrong 
for you. So you won't notice any regressions after my patch :)

-- 
Alexander E. Patrakov

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Sanitize filesystem NLS handling
  2007-03-19 15:04               ` Alexander E. Patrakov
@ 2007-03-19 17:26                 ` OGAWA Hirofumi
  2007-03-20  6:25                   ` Alexander E. Patrakov
  0 siblings, 1 reply; 23+ messages in thread
From: OGAWA Hirofumi @ 2007-03-19 17:26 UTC (permalink / raw)
  To: Alexander E. Patrakov; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers

"Alexander E. Patrakov" <patrakov@ums.usu.ru> writes:

> But, anyway, this is a separate issue that my patch doesn't attempt to 
> correct. The conclusion so far is that we disagree, and that there are 
> situations where using utf8 iocharset is the least of all evils, so the 
> warning is not justified enough. Reproducible testcase:

Again, I don't care about read at all. And why don't you use "utf8"
option, instead of "iocharset=utf8". "iocharset=utf8" is warned until
it is fixed. The "utf8" also doesn't work correctly in some case though.

>> I'm talking about two filesystems on a system here, not two encoding
>> on one filesystem.
>
> I am also talking about this. Mounting two filesystems with different 
> iocharsets is insane, because this will result in one of the following outcomes:
>
> 1) "ls" will show wrong characters in filenames on one of the filesystems
> 2) one of the two filesystems will contain wrong on-disk data for filenames, 
> that, when misinterpreted by mounting with wrong iocharset, results in 
> seemingly-correct output, but is misunderstood by the properly set up 
> reference implementation (that's what is likely to happen with jfs in your 
> example).

Because you didn't change the locale. And it is your policy, right?
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Sanitize filesystem NLS handling
  2007-03-19 15:10               ` Alexander E. Patrakov
@ 2007-03-19 17:36                 ` OGAWA Hirofumi
  0 siblings, 0 replies; 23+ messages in thread
From: OGAWA Hirofumi @ 2007-03-19 17:36 UTC (permalink / raw)
  To: Alexander E. Patrakov; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers

"Alexander E. Patrakov" <patrakov@ums.usu.ru> writes:

> Note that you can still achieve this insane result by specifying iocharset 
> manually for each mount. Only the defaults are changed, but many distros set 
> the default iocharset to either iso8859-1 or utf8, both of which are wrong 
> for you. So you won't notice any regressions after my patch :)

Nope. In fedora, it uses NLS_DEFAULT="utf8" and FAT_DEFAULT_IOCHARSET="ascii".
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Sanitize filesystem NLS handling
  2007-03-19 17:26                 ` OGAWA Hirofumi
@ 2007-03-20  6:25                   ` Alexander E. Patrakov
  2007-03-21 16:50                     ` OGAWA Hirofumi
  0 siblings, 1 reply; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-20  6:25 UTC (permalink / raw)
  To: OGAWA Hirofumi; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers

OGAWA Hirofumi wrote:
> "Alexander E. Patrakov" <patrakov@ums.usu.ru> writes:
> 
>> But, anyway, this is a separate issue that my patch doesn't attempt to 
>> correct. The conclusion so far is that we disagree, and that there are 
>> situations where using utf8 iocharset is the least of all evils, so the 
>> warning is not justified enough. Reproducible testcase:
> 
> Again, I don't care about read at all. And why don't you use "utf8"
> option, instead of "iocharset=utf8". "iocharset=utf8" is warned until
> it is fixed. The "utf8" also doesn't work correctly in some case though.

Would it be OK for you if I add the mount-time check for iocharset=utf8 to 
the fat filesystem and silently replace this with the "utf8" option, instead 
of overly actively warning the users? This way, the sysfs option and the 
nls_base.iocharset module parameter will still work as I want.

>>> I'm talking about two filesystems on a system here, not two encoding
>>> on one filesystem.
>> I am also talking about this. Mounting two filesystems with different 
>> iocharsets is insane, because this will result in one of the following outcomes:
>>
>> 1) "ls" will show wrong characters in filenames on one of the filesystems
>> 2) one of the two filesystems will contain wrong on-disk data for filenames, 
>> that, when misinterpreted by mounting with wrong iocharset, results in 
>> seemingly-correct output, but is misunderstood by the properly set up 
>> reference implementation (that's what is likely to happen with jfs in your 
>> example).
> 
> Because you didn't change the locale. And it is your policy, right?

Yes. This is because I have some files with non-ASCII names in my home 
directory. Changing the locale would make these filenames look wrong until I 
change it back.

-- 
Alexander E. Patrakov

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Sanitize filesystem NLS handling
  2007-03-20  6:25                   ` Alexander E. Patrakov
@ 2007-03-21 16:50                     ` OGAWA Hirofumi
  0 siblings, 0 replies; 23+ messages in thread
From: OGAWA Hirofumi @ 2007-03-21 16:50 UTC (permalink / raw)
  To: Alexander E. Patrakov; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers

"Alexander E. Patrakov" <patrakov@ums.usu.ru> writes:

> Would it be OK for you if I add the mount-time check for iocharset=utf8 to 
> the fat filesystem and silently replace this with the "utf8" option, instead 
> of overly actively warning the users? This way, the sysfs option and the 
> nls_base.iocharset module parameter will still work as I want.

Sounds good. But, how do you specify the iocharset's nls for utf8?

>> Because you didn't change the locale. And it is your policy, right?
>
> Yes. This is because I have some files with non-ASCII names in my home 
> directory. Changing the locale would make these filenames look wrong until I 
> change it back.

The _kernel_ has no reason to force one policy, and that's why I
suggested to add per-filesystem driver option (not per-super_block).
(e.g. nls provides interface for it, etc...)
-- 
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Sanitize filesystem NLS handling
  2007-03-18 16:55 [PATCH] Sanitize filesystem NLS handling Alexander E. Patrakov
  2007-03-18 22:01 ` OGAWA Hirofumi
@ 2007-03-22 12:59 ` Roman Zippel
  2007-03-22 13:22   ` Alexander E. Patrakov
  2007-03-22 16:07 ` Alexander E. Patrakov
  2 siblings, 1 reply; 23+ messages in thread
From: Roman Zippel @ 2007-03-22 12:59 UTC (permalink / raw)
  To: Alexander E. Patrakov; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers

Hi,

On Sun, 18 Mar 2007, Alexander E. Patrakov wrote:

> More details on what the patch does:
> 
> * Rewords the description of CONFIG_NLS_DEFAULT, because at some point in the
> past it confused some people
> * Removes CONFIG_FAT_DEFAULT_IOCHARSET, now CONFIG_NLS_DEFAULT is used for
> this purpose. This is because the correct setting of both must match the
> user's locale
> * Merges the two CONFIG_SMB_NLS_REMOTE and CONFIG_FAT_DEFAULT_CODEPAGE options
> into one, named CONFIG_CODEPAGE_DEFAULT. This is because the correct setting
> of both must match the code page used by MS-DOS in the user's country. For the
> same reason, CONFIG_SMB_NLS_DEFAULT is removed (the only sane choice is "y")
> * Makes the FAT filesystem accept both the old-style "codepage=866" mount
> option (which is inconsistent with other filesystems requiring a codepage
> option) and the new-style "codepage=cp866" option. This is necessary because
> CONFIG_CODEPAGE_DEFAULT must work for all filesystems that use it
> * Downgrades the UTF-8 FAT warning to a note, because, while using the utf8
> iocharset produces a case-sensitive FAT filesystem, other iocharsets simply
> produce wrong characters, which is much worse
> * Renames SMB_NLS_MAXNAMELEN to NLS_MAXNAMELEN, because it is also useful
> outside smbfs
> * Makes smbfs always output iocharset and codepage in /proc/mounts, as FAT
> does
> * Makes CONFIG_NLS_DEFAULT and CONFIG_CODEPAGE_DEFAULT adjustable at runtime
> via the following mechanisms:

I agree that these parameters should be changable not just at compile time 
and the iocharset should be a global default, but the on disk encoding is 
often filesystem specific, so I'd rather keep this option per filesystem.

bye, Roman

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Sanitize filesystem NLS handling
  2007-03-22 12:59 ` [PATCH] Sanitize filesystem NLS handling Roman Zippel
@ 2007-03-22 13:22   ` Alexander E. Patrakov
  2007-03-22 13:49     ` Roman Zippel
  0 siblings, 1 reply; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-22 13:22 UTC (permalink / raw)
  To: Roman Zippel; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers

Roman Zippel wrote:

> I agree that these parameters should be changable not just at compile time 
> and the iocharset should be a global default, but the on disk encoding is 
> often filesystem specific, so I'd rather keep this option per filesystem.

You are right, the on-disk encoding is filesystem specific. E.g., it is 
always UCS-2 for CIFS, NTFS and VFAT (and my patch doesn't change this).

It is exposed as a mount parameter and kernel configuration option only for 
fat and smbfs (the two filesystems that my patch touches for this matter), 
and both of these filesystems come from DOS days, where there was one 
codepage for a given country. So, for me, it made sense to glue these two 
parameters, because they both should be set to this codepage. The contrary 
would (AFAIK) only make sense if one gets floppies from France, but 
communicates with Polish SMB servers. It is a very rare situation, isn't it? 
Or do you know any other example where it would clearly make sense to have 
different defaults for fat and smbfs?

-- 
Alexander E. Patrakov

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Sanitize filesystem NLS handling
  2007-03-22 13:22   ` Alexander E. Patrakov
@ 2007-03-22 13:49     ` Roman Zippel
  2007-03-22 13:59       ` Alexander E. Patrakov
  2007-03-22 14:58       ` Alexander E. Patrakov
  0 siblings, 2 replies; 23+ messages in thread
From: Roman Zippel @ 2007-03-22 13:49 UTC (permalink / raw)
  To: Alexander E. Patrakov; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers

Hi,

On Thu, 22 Mar 2007, Alexander E. Patrakov wrote:

> It is exposed as a mount parameter and kernel configuration option only for
> fat and smbfs (the two filesystems that my patch touches for this matter), and
> both of these filesystems come from DOS days, where there was one codepage for
> a given country. So, for me, it made sense to glue these two parameters,
> because they both should be set to this codepage. The contrary would (AFAIK)
> only make sense if one gets floppies from France, but communicates with Polish
> SMB servers. It is a very rare situation, isn't it? Or do you know any other
> example where it would clearly make sense to have different defaults for fat
> and smbfs?

hfs has a codepage option as well, but I don't know its default in the 
various countries, but it could be different from DOS.

bye, Roman

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Sanitize filesystem NLS handling
  2007-03-22 13:49     ` Roman Zippel
@ 2007-03-22 13:59       ` Alexander E. Patrakov
  2007-03-22 15:45         ` Roman Zippel
  2007-03-22 14:58       ` Alexander E. Patrakov
  1 sibling, 1 reply; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-22 13:59 UTC (permalink / raw)
  To: Roman Zippel; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers

Roman Zippel wrote:

> hfs has a codepage option as well, but I don't know its default in the 
> various countries, but it could be different from DOS.

Yes. Since this comes from Mac world, it definitely makes sense to add 
CONFIG_MAC_CODEPAGE_DEFAULT, the corresponding module parameter, and make 
hfs (FIXME: anything else?) use this. However, I have no hfs filesystems, 
and thus can't test my changes.

Help from anyone who understands hfs code is welcome. In particular, what 
currently happens by default WRT the codepage? Where is the code responsible 
for this?

-- 
Alexander E. Patrakov

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Sanitize filesystem NLS handling
  2007-03-22 13:49     ` Roman Zippel
  2007-03-22 13:59       ` Alexander E. Patrakov
@ 2007-03-22 14:58       ` Alexander E. Patrakov
  1 sibling, 0 replies; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-22 14:58 UTC (permalink / raw)
  To: Roman Zippel; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers

Roman Zippel wrote:

> hfs has a codepage option as well

Is it _currently_ useful at all? The problem is that the kernel has no nls 
modules for Mac codepages (e.g., MacRoman which is cp10000).

-- 
Alexander E. Patrakov

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Sanitize filesystem NLS handling
  2007-03-22 13:59       ` Alexander E. Patrakov
@ 2007-03-22 15:45         ` Roman Zippel
  0 siblings, 0 replies; 23+ messages in thread
From: Roman Zippel @ 2007-03-22 15:45 UTC (permalink / raw)
  To: Alexander E. Patrakov; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers

Hi,

On Thu, 22 Mar 2007, Alexander E. Patrakov wrote:

> > hfs has a codepage option as well, but I don't know its default in the
> > various countries, but it could be different from DOS.
> 
> Yes. Since this comes from Mac world, it definitely makes sense to add
> CONFIG_MAC_CODEPAGE_DEFAULT, the corresponding module parameter, and make hfs
> (FIXME: anything else?) use this. However, I have no hfs filesystems, and thus
> can't test my changes.
> 
> Help from anyone who understands hfs code is welcome. In particular, what
> currently happens by default WRT the codepage? Where is the code responsible
> for this?

grep for nls_disk, the default simply does a direct mapping, so I don't 
really want to add another config option, but a module parameter would be 
useful.

bye, Roman

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Sanitize filesystem NLS handling
  2007-03-18 16:55 [PATCH] Sanitize filesystem NLS handling Alexander E. Patrakov
  2007-03-18 22:01 ` OGAWA Hirofumi
  2007-03-22 12:59 ` [PATCH] Sanitize filesystem NLS handling Roman Zippel
@ 2007-03-22 16:07 ` Alexander E. Patrakov
  2 siblings, 0 replies; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-22 16:07 UTC (permalink / raw)
  To: LKML; +Cc: Andrew Morton, agalakhov, Kay Sievers

I wrote:

> More details on what the patch does:
> 
> * Rewords the description of CONFIG_NLS_DEFAULT, because at some point 
> in the past it confused some people
> * Removes CONFIG_FAT_DEFAULT_IOCHARSET, now CONFIG_NLS_DEFAULT is used 
> for this purpose. This is because the correct setting of both must match 
> the user's locale
> * Merges the two CONFIG_SMB_NLS_REMOTE and CONFIG_FAT_DEFAULT_CODEPAGE 
> options into one, named CONFIG_CODEPAGE_DEFAULT. This is because the 
> correct setting of both must match the code page used by MS-DOS in the 
> user's country. For the same reason, CONFIG_SMB_NLS_DEFAULT is removed 
> (the only sane choice is "y")
> * Makes the FAT filesystem accept both the old-style "codepage=866" 
> mount option (which is inconsistent with other filesystems requiring a 
> codepage option) and the new-style "codepage=cp866" option. This is 
> necessary because CONFIG_CODEPAGE_DEFAULT must work for all filesystems 
> that use it
> * Downgrades the UTF-8 FAT warning to a note, because, while using the 
> utf8 iocharset produces a case-sensitive FAT filesystem, other 
> iocharsets simply produce wrong characters, which is much worse
> * Renames SMB_NLS_MAXNAMELEN to NLS_MAXNAMELEN, because it is also 
> useful outside smbfs
> * Makes smbfs always output iocharset and codepage in /proc/mounts, as 
> FAT does
> * Makes CONFIG_NLS_DEFAULT and CONFIG_CODEPAGE_DEFAULT adjustable at 
> runtime via the following mechanisms:

I take the patch back, please expect V2. It needs some work WRT hfs and 
ncpfs filesystems, as well as an improvement in handling of iocharset=utf8 
in the fat filesystem.

-- 
Alexander E. Patrakov

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2007-03-22 16:07 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-18 16:55 [PATCH] Sanitize filesystem NLS handling Alexander E. Patrakov
2007-03-18 22:01 ` OGAWA Hirofumi
2007-03-19  2:58   ` Alexander E. Patrakov
2007-03-19  3:46     ` OGAWA Hirofumi
2007-03-19  4:38       ` Alexander E. Patrakov
2007-03-19  5:38         ` OGAWA Hirofumi
2007-03-19  5:49           ` Alexander E. Patrakov
2007-03-19 14:25             ` OGAWA Hirofumi
2007-03-19 15:04               ` Alexander E. Patrakov
2007-03-19 17:26                 ` OGAWA Hirofumi
2007-03-20  6:25                   ` Alexander E. Patrakov
2007-03-21 16:50                     ` OGAWA Hirofumi
2007-03-19 15:10               ` Alexander E. Patrakov
2007-03-19 17:36                 ` OGAWA Hirofumi
2007-03-19  6:38     ` yes --help (was: [PATCH] Sanitize filesystem NLS handling) Bernd Eckenfels
2007-03-19  8:14       ` Alexander E. Patrakov
2007-03-22 12:59 ` [PATCH] Sanitize filesystem NLS handling Roman Zippel
2007-03-22 13:22   ` Alexander E. Patrakov
2007-03-22 13:49     ` Roman Zippel
2007-03-22 13:59       ` Alexander E. Patrakov
2007-03-22 15:45         ` Roman Zippel
2007-03-22 14:58       ` Alexander E. Patrakov
2007-03-22 16:07 ` Alexander E. Patrakov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).