LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] Sanitize filesystem NLS handling
@ 2007-03-18 16:55 Alexander E. Patrakov
2007-03-18 22:01 ` OGAWA Hirofumi
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-18 16:55 UTC (permalink / raw)
To: LKML; +Cc: Andrew Morton, agalakhov, Kay Sievers
Let me quote from the current description of CONFIG_FAT_DEFAULT_CODEPAGE:
> This option should be set to the codepage of your FAT filesystems.
Do you see the word "your"? Compile-time settings with the word "your"
in their description are always a design bug.
As documented, this can be changed on per-mount basis with the
"codepage=..." mount option, and this works in /etc/fstab. The same
consideration (including the "works in /etc/fstab" note) applies to the
iocharset parameter. However, this becomes a big problem with HAL-based
automounting, because many agents (e.g., pmount and exo_mount from
xfce-4.4.0) have NO means of changing the mount options. This makes it
completely impossible to use non-English filenames with HAL-based
automounting and distro kernels. There are a lot of threads about this
in Russian forums, and the solution usually recommended is to uninstall HAL.
One could say that every backend must add the iocharset and codepage
mount option by itself, but code duplication between such backends is
never a good thing. The solution is to make the default iocharset and
codepage adjustable at runtime on the global basis, so that no explicit
mount options are needed. That's what the appended patch does.
More details on what the patch does:
* Rewords the description of CONFIG_NLS_DEFAULT, because at some point
in the past it confused some people
* Removes CONFIG_FAT_DEFAULT_IOCHARSET, now CONFIG_NLS_DEFAULT is used
for this purpose. This is because the correct setting of both must match
the user's locale
* Merges the two CONFIG_SMB_NLS_REMOTE and CONFIG_FAT_DEFAULT_CODEPAGE
options into one, named CONFIG_CODEPAGE_DEFAULT. This is because the
correct setting of both must match the code page used by MS-DOS in the
user's country. For the same reason, CONFIG_SMB_NLS_DEFAULT is removed
(the only sane choice is "y")
* Makes the FAT filesystem accept both the old-style "codepage=866"
mount option (which is inconsistent with other filesystems requiring a
codepage option) and the new-style "codepage=cp866" option. This is
necessary because CONFIG_CODEPAGE_DEFAULT must work for all filesystems
that use it
* Downgrades the UTF-8 FAT warning to a note, because, while using the
utf8 iocharset produces a case-sensitive FAT filesystem, other
iocharsets simply produce wrong characters, which is much worse
* Renames SMB_NLS_MAXNAMELEN to NLS_MAXNAMELEN, because it is also
useful outside smbfs
* Makes smbfs always output iocharset and codepage in /proc/mounts, as
FAT does
* Makes CONFIG_NLS_DEFAULT and CONFIG_CODEPAGE_DEFAULT adjustable at
runtime via the following mechanisms:
* If the nls_base.ko module is really a module, it is sufficient to
write in modprobe.conf something like this:
options nls_base iocharset=koi8-r codepage=cp866
* If it is not a module, add this to the kernel command line in GRUB:
nls_base.iocharset=koi8-r nls_base.codepage=cp866
* In both cases, writing into sysfs also works:
echo -n koi8-r >/sys/module/nls_base/parameters/iocharset
echo -n cp866 >/sys/module/nls_base/parameters/codepage
Target: 2.6.22
Signed-off-by: Alexander E. Patrakov
After applying, fix the defconfigs by removing
CONFIG_FAT_DEFAULT_IOCHARSET, CONFIG_FAT_DEFAULT_CODEPAGE,
CONFIG_SMB_NLS_REMOTE, CONFIG_SMB_NLS_DEFAULT, and adding
CONFIG_CODEPAGE_DEFAULT="cp437" (except for defconfig.m32104ut, where
this should be cp932). Also make sure that CONFIG_NLS_CODEPAGE_437 is
not disabled.
diff -ur linux-2.6.20.1/include/linux/msdos_fs.h linux-2.6.20.1.new/include/linux/msdos_fs.h
--- linux-2.6.20.1/include/linux/msdos_fs.h 2007-02-20 11:34:32.000000000 +0500
+++ linux-2.6.20.1.new/include/linux/msdos_fs.h 2007-03-15 20:58:37.000000000 +0500
@@ -191,7 +191,7 @@
gid_t fs_gid;
unsigned short fs_fmask;
unsigned short fs_dmask;
- unsigned short codepage; /* Codepage for shortname conversions */
+ char *codepage; /* Codepage for shortname conversions */
char *iocharset; /* Charset used for filename input/display */
unsigned short shortname; /* flags for shortname display/create rule */
unsigned char name_check; /* r = relaxed, n = normal, s = strict */
diff -ur linux-2.6.20.1/include/linux/nls.h linux-2.6.20.1.new/include/linux/nls.h
--- linux-2.6.20.1/include/linux/nls.h 2007-02-20 11:34:32.000000000 +0500
+++ linux-2.6.20.1.new/include/linux/nls.h 2007-03-15 19:44:40.000000000 +0500
@@ -3,6 +3,11 @@
#include <linux/init.h>
+#define NLS_MAXNAMELEN 20
+
+extern char default_iocharset[NLS_MAXNAMELEN];
+extern char default_codepage[NLS_MAXNAMELEN];
+
/* unicode character */
typedef __u16 wchar_t;
diff -ur linux-2.6.20.1/include/linux/smb.h linux-2.6.20.1.new/include/linux/smb.h
--- linux-2.6.20.1/include/linux/smb.h 2007-02-20 11:34:32.000000000 +0500
+++ linux-2.6.20.1.new/include/linux/smb.h 2007-03-15 19:44:16.000000000 +0500
@@ -11,6 +11,7 @@
#include <linux/types.h>
#include <linux/magic.h>
+#include <linux/nls.h>
enum smb_protocol {
SMB_PROTOCOL_NONE,
@@ -63,10 +64,9 @@
#ifdef __KERNEL__
-#define SMB_NLS_MAXNAMELEN 20
struct smb_nls_codepage {
- char local_name[SMB_NLS_MAXNAMELEN];
- char remote_name[SMB_NLS_MAXNAMELEN];
+ char local_name[NLS_MAXNAMELEN];
+ char remote_name[NLS_MAXNAMELEN];
};
diff -ur linux-2.6.20.1/fs/fat/inode.c linux-2.6.20.1.new/fs/fat/inode.c
--- linux-2.6.20.1/fs/fat/inode.c 2007-02-20 11:34:32.000000000 +0500
+++ linux-2.6.20.1.new/fs/fat/inode.c 2007-03-15 21:49:09.000000000 +0500
@@ -27,15 +27,6 @@
#include <linux/writeback.h>
#include <asm/unaligned.h>
-#ifndef CONFIG_FAT_DEFAULT_IOCHARSET
-/* if user don't select VFAT, this is undefined. */
-#define CONFIG_FAT_DEFAULT_IOCHARSET ""
-#endif
-
-static int fat_default_codepage = CONFIG_FAT_DEFAULT_CODEPAGE;
-static char fat_default_iocharset[] = CONFIG_FAT_DEFAULT_IOCHARSET;
-
-
static int fat_add_cluster(struct inode *inode)
{
int err, cluster;
@@ -462,15 +453,18 @@
if (sbi->nls_disk) {
unload_nls(sbi->nls_disk);
sbi->nls_disk = NULL;
- sbi->options.codepage = fat_default_codepage;
}
if (sbi->nls_io) {
unload_nls(sbi->nls_io);
sbi->nls_io = NULL;
}
- if (sbi->options.iocharset != fat_default_iocharset) {
+ if (sbi->options.iocharset != default_iocharset) {
kfree(sbi->options.iocharset);
- sbi->options.iocharset = fat_default_iocharset;
+ sbi->options.iocharset = default_iocharset;
+ }
+ if (sbi->options.codepage != default_codepage) {
+ kfree(sbi->options.codepage);
+ sbi->options.codepage = default_codepage;
}
sb->s_fs_info = NULL;
@@ -869,7 +863,7 @@
{Opt_umask, "umask=%o"},
{Opt_dmask, "dmask=%o"},
{Opt_fmask, "fmask=%o"},
- {Opt_codepage, "codepage=%u"},
+ {Opt_codepage, "codepage=%s"},
{Opt_nocase, "nocase"},
{Opt_quiet, "quiet"},
{Opt_showexec, "showexec"},
@@ -933,14 +927,15 @@
substring_t args[MAX_OPT_ARGS];
int option;
char *iocharset;
+ char *codepage;
opts->isvfat = is_vfat;
opts->fs_uid = current->uid;
opts->fs_gid = current->gid;
opts->fs_fmask = opts->fs_dmask = current->fs->umask;
- opts->codepage = fat_default_codepage;
- opts->iocharset = fat_default_iocharset;
+ opts->codepage = default_codepage;
+ opts->iocharset = default_iocharset;
if (is_vfat)
opts->shortname = VFAT_SFN_DISPLAY_LOWER|VFAT_SFN_CREATE_WIN95;
else
@@ -1024,9 +1019,12 @@
opts->fs_fmask = option;
break;
case Opt_codepage:
- if (match_int(&args[0], &option))
- return 0;
- opts->codepage = option;
+ if (opts->codepage != default_codepage)
+ kfree(opts->codepage);
+ codepage = match_strdup(&args[0]);
+ if (!codepage)
+ return -ENOMEM;
+ opts->codepage = codepage;
break;
case Opt_flush:
opts->flush = 1;
@@ -1042,7 +1040,7 @@
/* vfat specific */
case Opt_charset:
- if (opts->iocharset != fat_default_iocharset)
+ if (opts->iocharset != default_iocharset)
kfree(opts->iocharset);
iocharset = match_strdup(&args[0]);
if (!iocharset)
@@ -1101,8 +1099,8 @@
}
/* UTF-8 doesn't provide FAT semantics */
if (!strcmp(opts->iocharset, "utf8")) {
- printk(KERN_ERR "FAT: utf8 is not a recommended IO charset"
- " for FAT filesystems, filesystem will be case sensitive!\n");
+ printk(KERN_INFO "FAT: utf8 IO charset"
+ " is used, filesystem will be case sensitive!\n");
}
if (opts->unicode_xlate)
@@ -1162,7 +1160,6 @@
int debug;
unsigned int media;
long error;
- char buf[50];
sbi = kzalloc(sizeof(struct msdos_sb_info), GFP_KERNEL);
if (!sbi)
@@ -1372,10 +1369,18 @@
*/
error = -EINVAL;
- sprintf(buf, "cp%d", sbi->options.codepage);
- sbi->nls_disk = load_nls(buf);
+
+ sbi->nls_disk = load_nls(sbi->options.codepage);
+ if (!sbi->nls_disk) {
+ /* Maybe the old non-standard codepage specification is used */
+ char buf[NLS_MAXNAMELEN];
+ snprintf(buf, sizeof(buf), "cp%s", sbi->options.codepage);
+ buf[sizeof(buf) - 1] = 0;
+ sbi->nls_disk = load_nls(buf);
+ }
+
if (!sbi->nls_disk) {
- printk(KERN_ERR "FAT: codepage %s not found\n", buf);
+ printk(KERN_ERR "FAT: codepage %s not found\n", sbi->options.codepage);
goto out_fail;
}
@@ -1421,8 +1426,10 @@
unload_nls(sbi->nls_io);
if (sbi->nls_disk)
unload_nls(sbi->nls_disk);
- if (sbi->options.iocharset != fat_default_iocharset)
+ if (sbi->options.iocharset != default_iocharset)
kfree(sbi->options.iocharset);
+ if (sbi->options.codepage != default_codepage)
+ kfree(sbi->options.codepage);
sb->s_fs_info = NULL;
kfree(sbi);
return error;
diff -ur linux-2.6.20.1/fs/isofs/inode.c linux-2.6.20.1.new/fs/isofs/inode.c
--- linux-2.6.20.1/fs/isofs/inode.c 2007-02-20 11:34:32.000000000 +0500
+++ linux-2.6.20.1.new/fs/isofs/inode.c 2007-03-15 19:34:11.000000000 +0500
@@ -773,7 +773,7 @@
#ifdef CONFIG_JOLIET
if (joliet_level && opt.utf8 == 0) {
- char * p = opt.iocharset ? opt.iocharset : CONFIG_NLS_DEFAULT;
+ char * p = opt.iocharset ? opt.iocharset : default_iocharset;
sbi->s_nls_iocharset = load_nls(p);
if (! sbi->s_nls_iocharset) {
/* Fail only if explicit charset specified */
diff -ur linux-2.6.20.1/fs/Kconfig linux-2.6.20.1.new/fs/Kconfig
--- linux-2.6.20.1/fs/Kconfig 2007-02-20 11:34:32.000000000 +0500
+++ linux-2.6.20.1.new/fs/Kconfig 2007-03-16 14:03:57.000000000 +0500
@@ -785,28 +785,6 @@
To compile this as a module, choose M here: the module will be called
vfat.
-config FAT_DEFAULT_CODEPAGE
- int "Default codepage for FAT"
- depends on MSDOS_FS || VFAT_FS
- default 437
- help
- This option should be set to the codepage of your FAT filesystems.
- It can be overridden with the "codepage" mount option.
- See <file:Documentation/filesystems/vfat.txt> for more information.
-
-config FAT_DEFAULT_IOCHARSET
- string "Default iocharset for FAT"
- depends on VFAT_FS
- default "iso8859-1"
- help
- Set this to the default input/output character set you'd
- like FAT to use. It should probably match the character set
- that most of your FAT filesystems use, and can be overridden
- with the "iocharset" mount option for FAT filesystems.
- Note that "utf8" is not recommended for FAT filesystems.
- If unsure, you shouldn't set "utf8" here.
- See <file:Documentation/filesystems/vfat.txt> for more information.
-
config NTFS_FS
tristate "NTFS file system support"
select NLS
@@ -1830,35 +1808,6 @@
To compile the SMB support as a module, choose M here: the module will
be called smbfs. Most people say N, however.
-config SMB_NLS_DEFAULT
- bool "Use a default NLS"
- depends on SMB_FS
- help
- Enabling this will make smbfs use nls translations by default. You
- need to specify the local charset (CONFIG_NLS_DEFAULT) in the nls
- settings and you need to give the default nls for the SMB server as
- CONFIG_SMB_NLS_REMOTE.
-
- The nls settings can be changed at mount time, if your smbmount
- supports that, using the codepage and iocharset parameters.
-
- smbmount from samba 2.2.0 or later supports this.
-
-config SMB_NLS_REMOTE
- string "Default Remote NLS Option"
- depends on SMB_NLS_DEFAULT
- default "cp437"
- help
- This setting allows you to specify a default value for which
- codepage the server uses. If this field is left blank no
- translations will be done by default. The local codepage/charset
- default to CONFIG_NLS_DEFAULT.
-
- The nls settings can be changed at mount time, if your smbmount
- supports that, using the codepage and iocharset parameters.
-
- smbmount from samba 2.2.0 or later supports this.
-
config CIFS
tristate "CIFS support (advanced network filesystem for Samba, Window and other CIFS compliant servers)"
depends on INET
diff -ur linux-2.6.20.1/fs/nls/Kconfig linux-2.6.20.1.new/fs/nls/Kconfig
--- linux-2.6.20.1/fs/nls/Kconfig 2007-02-20 11:34:32.000000000 +0500
+++ linux-2.6.20.1.new/fs/nls/Kconfig 2007-03-15 20:04:38.000000000 +0500
@@ -18,13 +18,13 @@
will be called nls_base.
config NLS_DEFAULT
- string "Default NLS Option"
+ string "Default UNIX NLS Option"
depends on NLS
default "iso8859-1"
---help---
- The default NLS used when mounting file system. Note, that this is
- the NLS used by your console, not the NLS used by a specific file
- system (if different) to store data (filenames) on a disk.
+ The default UNIX NLS used when mounting file system. Note, that
+ this is the NLS expected by the "ls" command, ot the NLS used
+ internally by a specific file system to store filenames on a disk.
Currently, the valid values are:
big5, cp437, cp737, cp775, cp850, cp852, cp855, cp857, cp860, cp861,
cp862, cp863, cp864, cp865, cp866, cp869, cp874, cp932, cp936,
@@ -37,6 +37,26 @@
If unsure, specify it as "iso8859-1".
+config CODEPAGE_DEFAULT
+ string "Default MS-DOS NLS Option"
+ depends on NLS
+ default "cp437"
+ ---help---
+ The default MS-DOS NLS used when mounting file systems with MS-DOS
+ origin. Note, that this is the NLS used internally by these file
+ systems to store filenames on a disk or for transmission over a
+ network, not the NLS expected by the "ls" command.
+ Currently, the valid values are:
+ big5, cp437, cp737, cp775, cp850, cp852, cp855, cp857, cp860, cp861,
+ cp862, cp863, cp864, cp865, cp866, cp869, cp874, cp932, cp936,
+ cp949, cp950, cp1251, cp1255, euc-jp, euc-kr, gb2312, iso8859-1,
+ iso8859-2, iso8859-3, iso8859-4, iso8859-5, iso8859-6, iso8859-7,
+ iso8859-8, iso8859-9, iso8859-13, iso8859-14, iso8859-15,
+ koi8-r, koi8-ru, koi8-u, sjis, tis-620, utf8.
+ If you specify a wrong value, mounting will fail.
+
+ If unsure, specify it as "cp437".
+
config NLS_CODEPAGE_437
tristate "Codepage 437 (United States, Canada)"
depends on NLS
diff -ur linux-2.6.20.1/fs/nls/nls_base.c linux-2.6.20.1.new/fs/nls/nls_base.c
--- linux-2.6.20.1/fs/nls/nls_base.c 2007-02-20 11:34:32.000000000 +0500
+++ linux-2.6.20.1.new/fs/nls/nls_base.c 2007-03-15 19:45:59.000000000 +0500
@@ -18,6 +18,9 @@
#endif
#include <linux/spinlock.h>
+char default_iocharset[NLS_MAXNAMELEN] = CONFIG_NLS_DEFAULT;
+char default_codepage[NLS_MAXNAMELEN] = CONFIG_CODEPAGE_DEFAULT;
+
static struct nls_table default_table;
static struct nls_table *tables = &default_table;
static DEFINE_SPINLOCK(nls_lock);
@@ -474,7 +477,7 @@
{
struct nls_table *default_nls;
- default_nls = load_nls(CONFIG_NLS_DEFAULT);
+ default_nls = load_nls(default_iocharset);
if (default_nls != NULL)
return default_nls;
else
@@ -490,5 +507,13 @@
EXPORT_SYMBOL(utf8_mbstowcs);
EXPORT_SYMBOL(utf8_wctomb);
EXPORT_SYMBOL(utf8_wcstombs);
+EXPORT_SYMBOL(default_iocharset);
+EXPORT_SYMBOL(default_codepage);
+
+module_param_string(iocharset, default_iocharset, NLS_MAXNAMELEN, 0644);
+MODULE_PARM_DESC(iocharset, "Default UNIX filename charset");
+
+module_param_string(codepage, default_codepage, NLS_MAXNAMELEN, 0644);
+MODULE_PARM_DESC(codepage, "Default MS-DOS filename charset");
MODULE_LICENSE("Dual BSD/GPL");
diff -ur linux-2.6.20.1/fs/smbfs/inode.c linux-2.6.20.1.new/fs/smbfs/inode.c
--- linux-2.6.20.1/fs/smbfs/inode.c 2007-02-20 11:34:32.000000000 +0500
+++ linux-2.6.20.1.new/fs/smbfs/inode.c 2007-03-15 19:57:42.000000000 +0500
@@ -36,13 +36,6 @@
#include "getopt.h"
#include "proto.h"
-/* Always pick a default string */
-#ifdef CONFIG_SMB_NLS_REMOTE
-#define SMB_NLS_REMOTE CONFIG_SMB_NLS_REMOTE
-#else
-#define SMB_NLS_REMOTE ""
-#endif
-
#define SMB_TTL_DEFAULT 1000
static void smb_delete_inode(struct inode *);
@@ -396,11 +389,11 @@
break;
case 'i':
strlcpy(mnt->codepage.local_name, optarg,
- SMB_NLS_MAXNAMELEN);
+ NLS_MAXNAMELEN);
break;
case 'c':
strlcpy(mnt->codepage.remote_name, optarg,
- SMB_NLS_MAXNAMELEN);
+ NLS_MAXNAMELEN);
break;
case 't':
mnt->ttl = value;
@@ -446,10 +439,8 @@
if (mnt->flags & SMB_MOUNT_DMODE)
seq_printf(s, ",dir_mode=%04o", mnt->dir_mode & S_IRWXUGO);
- if (strcmp(mnt->codepage.local_name, CONFIG_NLS_DEFAULT))
- seq_printf(s, ",iocharset=%s", mnt->codepage.local_name);
- if (strcmp(mnt->codepage.remote_name, SMB_NLS_REMOTE))
- seq_printf(s, ",codepage=%s", mnt->codepage.remote_name);
+ seq_printf(s, ",iocharset=%s", mnt->codepage.local_name);
+ seq_printf(s, ",codepage=%s", mnt->codepage.remote_name);
if (mnt->ttl != SMB_TTL_DEFAULT)
seq_printf(s, ",ttl=%d", mnt->ttl);
@@ -555,10 +546,10 @@
mnt = server->mnt;
memset(mnt, 0, sizeof(struct smb_mount_data_kernel));
- strlcpy(mnt->codepage.local_name, CONFIG_NLS_DEFAULT,
- SMB_NLS_MAXNAMELEN);
- strlcpy(mnt->codepage.remote_name, SMB_NLS_REMOTE,
- SMB_NLS_MAXNAMELEN);
+ strlcpy(mnt->codepage.local_name, default_iocharset,
+ NLS_MAXNAMELEN);
+ strlcpy(mnt->codepage.remote_name, default_codepage,
+ NLS_MAXNAMELEN);
mnt->ttl = SMB_TTL_DEFAULT;
if (ver == SMB_MOUNT_OLDVERSION) {
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Sanitize filesystem NLS handling
2007-03-18 16:55 [PATCH] Sanitize filesystem NLS handling Alexander E. Patrakov
@ 2007-03-18 22:01 ` OGAWA Hirofumi
2007-03-19 2:58 ` Alexander E. Patrakov
2007-03-22 12:59 ` [PATCH] Sanitize filesystem NLS handling Roman Zippel
2007-03-22 16:07 ` Alexander E. Patrakov
2 siblings, 1 reply; 23+ messages in thread
From: OGAWA Hirofumi @ 2007-03-18 22:01 UTC (permalink / raw)
To: Alexander E. Patrakov; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers
"Alexander E. Patrakov" <patrakov@ums.usu.ru> writes:
> * Removes CONFIG_FAT_DEFAULT_IOCHARSET, now CONFIG_NLS_DEFAULT is used
> for this purpose. This is because the correct setting of both must match
> the user's locale
The some filesystems want to use utf-8, and others don't want to use
utf-8, no? And is it also true about some devices using vfat?
> * Merges the two CONFIG_SMB_NLS_REMOTE and CONFIG_FAT_DEFAULT_CODEPAGE
> options into one, named CONFIG_CODEPAGE_DEFAULT. This is because the
> correct setting of both must match the code page used by MS-DOS in the
> user's country. For the same reason, CONFIG_SMB_NLS_DEFAULT is removed
> (the only sane choice is "y")
No. Unfortunately the real is not simple like it in some case.
> * Makes the FAT filesystem accept both the old-style "codepage=866"
> mount option (which is inconsistent with other filesystems requiring a
> codepage option) and the new-style "codepage=cp866" option. This is
> necessary because CONFIG_CODEPAGE_DEFAULT must work for all filesystems
> that use it
You allow to set any nls to codepage? If so, it is not good.
> * Downgrades the UTF-8 FAT warning to a note, because, while using the
> utf8 iocharset produces a case-sensitive FAT filesystem, other
> iocharsets simply produce wrong characters, which is much worse
No, utf-8 makes completely wrong entry. It's more wrong than other nls.
> * Makes CONFIG_NLS_DEFAULT and CONFIG_CODEPAGE_DEFAULT adjustable at
> runtime via the following mechanisms:
The configurable sounds sane, and it may help some case. But, it should
not be system global. At least, I think the default would be per-filesystem,
otherwise some configs seems to be needed for other filesystem after all.
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Sanitize filesystem NLS handling
2007-03-18 22:01 ` OGAWA Hirofumi
@ 2007-03-19 2:58 ` Alexander E. Patrakov
2007-03-19 3:46 ` OGAWA Hirofumi
2007-03-19 6:38 ` yes --help (was: [PATCH] Sanitize filesystem NLS handling) Bernd Eckenfels
0 siblings, 2 replies; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-19 2:58 UTC (permalink / raw)
To: OGAWA Hirofumi; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers
OGAWA Hirofumi wrote:
> "Alexander E. Patrakov" <patrakov@ums.usu.ru> writes:
>
>> * Removes CONFIG_FAT_DEFAULT_IOCHARSET, now CONFIG_NLS_DEFAULT is used
>> for this purpose. This is because the correct setting of both must match
>> the user's locale
>
> The some filesystems want to use utf-8, and others don't want to use
> utf-8, no? And is it also true about some devices using vfat?
Sorry, I can't parse this. Linux programs see filenames in the charset
specified by the "iocharset" mount option or this default. If some filenames
are in UTF-8 and some aren't, the "ls" command cannot show them all
correctly on a properly configured (aka: correctly displays the output of
"yes --help") terminal (because it assumes that all filenames are in the
same charset as indicated by the output of "locale charmap"). IMHO, this is
insane enough, and it makes sense to disable this by default.
>> * Merges the two CONFIG_SMB_NLS_REMOTE and CONFIG_FAT_DEFAULT_CODEPAGE
>> options into one, named CONFIG_CODEPAGE_DEFAULT. This is because the
>> correct setting of both must match the code page used by MS-DOS in the
>> user's country. For the same reason, CONFIG_SMB_NLS_DEFAULT is removed
>> (the only sane choice is "y")
>
> No. Unfortunately the real is not simple like it in some case.
More details please. Are you saying that for Japanese people the codepage
for FAT and SMB filesystems is not the same? How do Microsoft products work
then? What do they send over the wire?
>> * Makes the FAT filesystem accept both the old-style "codepage=866"
>> mount option (which is inconsistent with other filesystems requiring a
>> codepage option) and the new-style "codepage=cp866" option. This is
>> necessary because CONFIG_CODEPAGE_DEFAULT must work for all filesystems
>> that use it
>
> You allow to set any nls to codepage? If so, it is not good.
I did this because it involved less changes. Only FAT treats codepage as a
number. All other filesystems already allow arbitrary NLS as a codepage
mount parameter.
>> * Downgrades the UTF-8 FAT warning to a note, because, while using the
>> utf8 iocharset produces a case-sensitive FAT filesystem, other
>> iocharsets simply produce wrong characters, which is much worse
>
> No, utf-8 makes completely wrong entry. It's more wrong than other nls.
For any non-UTF-8 based locales, the other NLS is correct and utf8 indeed
would produce completely wrong characters. But for UTF-8 based locales, utf8
is the only correct iocharset.
And the downgraded warning is not for those who mis-use the utf8 iocharset
in non-UTF-8 locales, they need a completely different wording: "your
iocharset doesn't match the locale settings, non-ASCII characters will be
completely wrong in filenames". Unfortunately, this condition is impossible
to detect from within the kernel.
>> * Makes CONFIG_NLS_DEFAULT and CONFIG_CODEPAGE_DEFAULT adjustable at
>> runtime via the following mechanisms:
>
> The configurable sounds sane, and it may help some case. But, it should
> not be system global. At least, I think the default would be per-filesystem,
> otherwise some configs seems to be needed for other filesystem after all.
OK, now I see that your primary objection is to merging options, and
disagree (incorrect locale setup on your side is suspected). For meaningful
discussion, I want to see the following:
1) Output of "locale -a"
2) Output of "yes --help" from the same terminal
3) The correct iocharset and codepage for mounting FAT filesystems on USB
flash drives that are known readable under Windows (here "correct" = "ls in
this terminal shows filenames correctly").
4) The same for SMB filesystems.
--
Alexander E. Patrakov
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Sanitize filesystem NLS handling
2007-03-19 2:58 ` Alexander E. Patrakov
@ 2007-03-19 3:46 ` OGAWA Hirofumi
2007-03-19 4:38 ` Alexander E. Patrakov
2007-03-19 6:38 ` yes --help (was: [PATCH] Sanitize filesystem NLS handling) Bernd Eckenfels
1 sibling, 1 reply; 23+ messages in thread
From: OGAWA Hirofumi @ 2007-03-19 3:46 UTC (permalink / raw)
To: Alexander E. Patrakov; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers
"Alexander E. Patrakov" <patrakov@ums.usu.ru> writes:
>> You allow to set any nls to codepage? If so, it is not good.
>
> I did this because it involved less changes. Only FAT treats codepage as a
> number. All other filesystems already allow arbitrary NLS as a codepage
> mount parameter.
I'm saying here, it is not good for vfat.
>> No, utf-8 makes completely wrong entry. It's more wrong than other nls.
>
> For any non-UTF-8 based locales, the other NLS is correct and utf8 indeed
> would produce completely wrong characters. But for UTF-8 based locales, utf8
> is the only correct iocharset.
No, iocharset=utf8 is wrong always.
>>> * Makes CONFIG_NLS_DEFAULT and CONFIG_CODEPAGE_DEFAULT adjustable at
>>> runtime via the following mechanisms:
>>
>> The configurable sounds sane, and it may help some case. But, it should
>> not be system global. At least, I think the default would be per-filesystem,
>> otherwise some configs seems to be needed for other filesystem after all.
>
> OK, now I see that your primary objection is to merging options, and
> disagree (incorrect locale setup on your side is suspected). For meaningful
> discussion, I want to see the following:
>
> 1) Output of "locale -a"
> 2) Output of "yes --help" from the same terminal
> 3) The correct iocharset and codepage for mounting FAT filesystems on USB
> flash drives that are known readable under Windows (here "correct" = "ls in
> this terminal shows filenames correctly").
> 4) The same for SMB filesystems.
Ah, ok. I'm thinking one locale is not enough, at least for now.
Why I can't use utf8 for jfs or something, and use other nls for vfat?
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Sanitize filesystem NLS handling
2007-03-19 3:46 ` OGAWA Hirofumi
@ 2007-03-19 4:38 ` Alexander E. Patrakov
2007-03-19 5:38 ` OGAWA Hirofumi
0 siblings, 1 reply; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-19 4:38 UTC (permalink / raw)
To: OGAWA Hirofumi; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers
OGAWA Hirofumi wrote:
> "Alexander E. Patrakov" <patrakov@ums.usu.ru> writes:
>
>>> You allow to set any nls to codepage? If so, it is not good.
>> I did this because it involved less changes. Only FAT treats codepage as a
>> number. All other filesystems already allow arbitrary NLS as a codepage
>> mount parameter.
>
> I'm saying here, it is not good for vfat.
All valid options (i.e. numbers) are still available. Prohibiting
non-numbers is a lot of work (changing module_param_string to
module_param_call and validating the passed string, and also changing the
code for all filesystems so that they also validate manually-passed codepage
string).
>>> No, utf-8 makes completely wrong entry. It's more wrong than other nls.
>> For any non-UTF-8 based locales, the other NLS is correct and utf8 indeed
>> would produce completely wrong characters. But for UTF-8 based locales, utf8
>> is the only correct iocharset.
>
> No, iocharset=utf8 is wrong always.
Here we just disagree, but I think I can change your opinion by giving you a
correctly configured UTF-8 Japanese system in the form of a LiveCD (note:
the kernel doesn't have this patch there). If you can afford ~500 MB of
downloads, please do this:
1) download http://ftp.lfs-matrix.net/pub/lfs-livecd/lfslivecd-x86-6.2-5.iso
and burn it
2) write (from Windows, because it cannot be misconfigured) some files with
Japanese filenames to your flash drive
3) boot the CD, type the following at the boot prompt: linux
LANG=ja_JP.UTF-8 TZ=Asia/Tokyo
4) edit /etc/X11/xorg.conf (should be probably unneeded, but I ask just in
case. If you have to edit xorg.conf, report a bug to me privately)
5) startx (to get a Japanese environment)
6) open the terminal
7) yes --help (are the proper Japanese characters displayed? If not, change
the font in the menu to some Kochi family and report this to me as a bug -
yes I know about the Japanese opposition to Unicode because of Han Unification)
8) mount -o iocharset=utf8,codepage=932 /dev/sda1 /mnt (here I assume that
your flash drive appears as sda1, adjust as needed)
9) ls -l /mnt (do Japanese filenames appear correctly?)
10) umount /mnt
If you want to experiment with the more traditional ja_JP.eucjp locale, you
are also welcome to do so (but then iocharset=euc-jp will be needed to show
filenames correctly).
> Why I can't use utf8 for jfs or something, and use other nls for vfat?
Because you'll get a mismatch between the userspace locale and the iocharset
used by the kernel in at least one of these two cases. The mismatch will
result in the following:
1) Your system (e.g., the "ls" command) will not correctly interpret
filenames written by known-good setups.
2) Known-good setups will not correctly interpret filenames written by your
system.
What probably happened in your case is that you have no known-good jfs-based
setup and are the only user of your jfs disk, and thus can't see (2). It
looks like you don't use UTF-8 userspace locale. Then, the on-disk filenames
on your jfs filesystem are wrong, but your system misinterprets them, and
two errors seem to cancel each other (but this still doesn't make it a
correct setup).
--
Alexander E. Patrakov
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Sanitize filesystem NLS handling
2007-03-19 4:38 ` Alexander E. Patrakov
@ 2007-03-19 5:38 ` OGAWA Hirofumi
2007-03-19 5:49 ` Alexander E. Patrakov
0 siblings, 1 reply; 23+ messages in thread
From: OGAWA Hirofumi @ 2007-03-19 5:38 UTC (permalink / raw)
To: Alexander E. Patrakov; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers
"Alexander E. Patrakov" <patrakov@ums.usu.ru> writes:
> OGAWA Hirofumi wrote:
>> "Alexander E. Patrakov" <patrakov@ums.usu.ru> writes:
>>
>>>> You allow to set any nls to codepage? If so, it is not good.
>>> I did this because it involved less changes. Only FAT treats codepage as a
>>> number. All other filesystems already allow arbitrary NLS as a codepage
>>> mount parameter.
>>
>> I'm saying here, it is not good for vfat.
>
> All valid options (i.e. numbers) are still available. Prohibiting
> non-numbers is a lot of work (changing module_param_string to
> module_param_call and validating the passed string, and also changing the
> code for all filesystems so that they also validate manually-passed codepage
> string).
Ok.
>>>> No, utf-8 makes completely wrong entry. It's more wrong than other nls.
>>> For any non-UTF-8 based locales, the other NLS is correct and utf8 indeed
>>> would produce completely wrong characters. But for UTF-8 based locales, utf8
>>> is the only correct iocharset.
>>
>> No, iocharset=utf8 is wrong always.
>
> Here we just disagree, but I think I can change your opinion by giving you a
> correctly configured UTF-8 Japanese system in the form of a LiveCD (note:
> the kernel doesn't have this patch there). If you can afford ~500 MB of
> downloads, please do this:
I don't care about "read", because it doesn't corrupt filesystem. I care
about only "write", because it can corrupt filesystem.
If it's read-only, I'll not care at all, and will agree.
>> Why I can't use utf8 for jfs or something, and use other nls for vfat?
>
> Because you'll get a mismatch between the userspace locale and the iocharset
> used by the kernel in at least one of these two cases. The mismatch will
> result in the following:
>
> 1) Your system (e.g., the "ls" command) will not correctly interpret
> filenames written by known-good setups.
> 2) Known-good setups will not correctly interpret filenames written by your
> system.
>
> What probably happened in your case is that you have no known-good jfs-based
> setup and are the only user of your jfs disk, and thus can't see (2). It
> looks like you don't use UTF-8 userspace locale. Then, the on-disk filenames
> on your jfs filesystem are wrong, but your system misinterprets them, and
> two errors seem to cancel each other (but this still doesn't make it a
> correct setup).
All are user policy. The users can switch locale and
G_FILENAME_ENCODING and something else, some app can switch it even
runtime, and I think kernel shouldn't have user policy, right?
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Sanitize filesystem NLS handling
2007-03-19 5:38 ` OGAWA Hirofumi
@ 2007-03-19 5:49 ` Alexander E. Patrakov
2007-03-19 14:25 ` OGAWA Hirofumi
0 siblings, 1 reply; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-19 5:49 UTC (permalink / raw)
To: OGAWA Hirofumi; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers
OGAWA Hirofumi wrote:
> I don't care about "read", because it doesn't corrupt filesystem. I care
> about only "write", because it can corrupt filesystem.
>
> If it's read-only, I'll not care at all, and will agree.
Here you are right, but please tell RedHat about this (and you'll be at
least called an old-fashioned person). They (and from their initiative,
almost everyone else) use UTF-8 based locales by default, and ONLY utf8 NLS
gives correct characters in filenames in this case.
Besides, FS corruption is possible only if the user intentionally writes two
files with names differing only in their case.
> All are user policy. The users can switch locale and
> G_FILENAME_ENCODING and something else, some app can switch it even
> runtime, and I think kernel shouldn't have user policy, right?
G_FILENAME_ENCODING is a Glib2-only heresy, please ignore it. The "ls" tool
always assumes that file names are in the same encoding as the output of
"locale charmap" command. The fact that the filenames (contrary to what
Glib2 developers say) should be in the locale encoding becomes very obvious
if one reads POSIX specifications for tar and cpio programs (otherwise, they
won't talk about conversion errors).
--
Alexander E. Patrakov
^ permalink raw reply [flat|nested] 23+ messages in thread
* yes --help (was: [PATCH] Sanitize filesystem NLS handling)
2007-03-19 2:58 ` Alexander E. Patrakov
2007-03-19 3:46 ` OGAWA Hirofumi
@ 2007-03-19 6:38 ` Bernd Eckenfels
2007-03-19 8:14 ` Alexander E. Patrakov
1 sibling, 1 reply; 23+ messages in thread
From: Bernd Eckenfels @ 2007-03-19 6:38 UTC (permalink / raw)
To: linux-kernel
In article <45FDFC69.6020605@ums.usu.ru> you wrote:
> 2) Output of "yes --help" from the same terminal
Question: what do you expect?
#> yes --version
#yes (GNU coreutils) 5.2.1
#Written by David MacKenzie.
#
#Copyright (C) 2004 Free Software Foundation, Inc.
#This is free software; see the source for copying conditions. There is NO
#warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Greetings
Bernd
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: yes --help (was: [PATCH] Sanitize filesystem NLS handling)
2007-03-19 6:38 ` yes --help (was: [PATCH] Sanitize filesystem NLS handling) Bernd Eckenfels
@ 2007-03-19 8:14 ` Alexander E. Patrakov
0 siblings, 0 replies; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-19 8:14 UTC (permalink / raw)
To: Bernd Eckenfels; +Cc: linux-kernel
Bernd Eckenfels wrote:
> In article <45FDFC69.6020605@ums.usu.ru> you wrote:
> > 2) Output of "yes --help" from the same terminal
>
> Question: what do you expect?
A Japanese version of the help text. Unfortunately, LKML rejected it as SPAM,
so I'll redirect you to the graphical rendering of it:
http://www.linuxfromscratch.org/~alexander/test-results/yes--help-ja.png
--
Alexander E. Patrakov
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Sanitize filesystem NLS handling
2007-03-19 5:49 ` Alexander E. Patrakov
@ 2007-03-19 14:25 ` OGAWA Hirofumi
2007-03-19 15:04 ` Alexander E. Patrakov
2007-03-19 15:10 ` Alexander E. Patrakov
0 siblings, 2 replies; 23+ messages in thread
From: OGAWA Hirofumi @ 2007-03-19 14:25 UTC (permalink / raw)
To: Alexander E. Patrakov; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers
"Alexander E. Patrakov" <patrakov@ums.usu.ru> writes:
> OGAWA Hirofumi wrote:
>
>> I don't care about "read", because it doesn't corrupt filesystem. I care
>> about only "write", because it can corrupt filesystem.
>>
>> If it's read-only, I'll not care at all, and will agree.
>
> Here you are right, but please tell RedHat about this (and you'll be at
> least called an old-fashioned person). They (and from their initiative,
> almost everyone else) use UTF-8 based locales by default, and ONLY utf8 NLS
> gives correct characters in filenames in this case.
>
> Besides, FS corruption is possible only if the user intentionally writes two
> files with names differing only in their case.
No, the utf8 support of vfat is wrong. This is implementation
thing, and it is not recommended until it is fixed.
>> All are user policy. The users can switch locale and
>> G_FILENAME_ENCODING and something else, some app can switch it even
>> runtime, and I think kernel shouldn't have user policy, right?
>
> G_FILENAME_ENCODING is a Glib2-only heresy, please ignore it. The "ls" tool
> always assumes that file names are in the same encoding as the output of
> "locale charmap" command. The fact that the filenames (contrary to what
> Glib2 developers say) should be in the locale encoding becomes very obvious
> if one reads POSIX specifications for tar and cpio programs (otherwise, they
> won't talk about conversion errors).
I'm talking about two filesystems on a system here, not two encoding
on one filesystem. You can change locale on each filesystems, or each
directory, of course if it's not vfat.
It's wrong? May or may not be true. It's the user policy.
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Sanitize filesystem NLS handling
2007-03-19 14:25 ` OGAWA Hirofumi
@ 2007-03-19 15:04 ` Alexander E. Patrakov
2007-03-19 17:26 ` OGAWA Hirofumi
2007-03-19 15:10 ` Alexander E. Patrakov
1 sibling, 1 reply; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-19 15:04 UTC (permalink / raw)
To: OGAWA Hirofumi; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers
OGAWA Hirofumi wrote:
> No, the utf8 support of vfat is wrong. This is implementation
> thing, and it is not recommended until it is fixed.
Could you please explain your specific problem with screenshots, preferrably
by running my LiveCD or Debian Etch in an emulator such as qemu or VMware?
But, anyway, this is a separate issue that my patch doesn't attempt to
correct. The conclusion so far is that we disagree, and that there are
situations where using utf8 iocharset is the least of all evils, so the
warning is not justified enough. Reproducible testcase:
1) Install MS Windows 2000 or XP (English aka Pan-European version is just fine)
2) When the installer asks about locale settings, choose that you (a) want
to have Cyrillic group of languages, (b) that your locale/location is
Russia, (c) use Russian as a language for non-Unicode programs, and (d) add
Russian keyboard layout in addition to the existing English layout if the
system doesn't do this for you.
3) Using Windows, create a file on a floppy or flash drive. Press F2 to
rename it. Press Alt+Shift to switch to Russian keyboard layout. Type a new
file name. Remember (or copy to paper) what the letters that make the file
name look like in Windows.
4) Boot my LiveCD and choose "Russian (UTF-8)" from the locale dialog (this
matches the setup found in all modern distros such as Fedora). On the second
screen, confirm the default settings. (Alternatively, install Debian Etch
and select Russian during installation).
5) Try to mount the floppy or flash drive to /mnt using various iocharsets.
List the directory contents with "ls -l". You will find that utf8 is the
only iocharset that gives the correct letters.
OTOH, if you select Russian (KOI8-R) or Russian (CP1251), you will find that
the only iocharset giving the correct letters will be koi8-r or cp1251,
respectively.
> I'm talking about two filesystems on a system here, not two encoding
> on one filesystem.
I am also talking about this. Mounting two filesystems with different
iocharsets is insane, because this will result in one of the following outcomes:
1) "ls" will show wrong characters in filenames on one of the filesystems
2) one of the two filesystems will contain wrong on-disk data for filenames,
that, when misinterpreted by mounting with wrong iocharset, results in
seemingly-correct output, but is misunderstood by the properly set up
reference implementation (that's what is likely to happen with jfs in your
example).
Testcase for (2), to be performed with my LiveCD:
1) Boot the CD, select "Russian (KOI8-R)" from the locale dialog. Agree with
the default settings on the second screen.
2) Mount an empty floppy or flash drive with the (wrong) "cp1251" iocharset.
3) Using the right Ctrl key to switch between between English and Russian
keyboard layouts, create a file with Russian letters in its name under /mnt.
4) umount /mnt
5) Install Windows 2000 or XP with Russian-specific settings, as described
above.
6) Look what Windows finds on the floppy or flash. It is not the same
filename you typed, but a sort of garbage (mojibake).
--
Alexander E. Patrakov
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Sanitize filesystem NLS handling
2007-03-19 14:25 ` OGAWA Hirofumi
2007-03-19 15:04 ` Alexander E. Patrakov
@ 2007-03-19 15:10 ` Alexander E. Patrakov
2007-03-19 17:36 ` OGAWA Hirofumi
1 sibling, 1 reply; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-19 15:10 UTC (permalink / raw)
To: OGAWA Hirofumi; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers
OGAWA Hirofumi wrote:
> I'm talking about two filesystems on a system here, not two encoding
> on one filesystem. You can change locale on each filesystems, or each
> directory, of course if it's not vfat.
Note that you can still achieve this insane result by specifying iocharset
manually for each mount. Only the defaults are changed, but many distros set
the default iocharset to either iso8859-1 or utf8, both of which are wrong
for you. So you won't notice any regressions after my patch :)
--
Alexander E. Patrakov
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Sanitize filesystem NLS handling
2007-03-19 15:04 ` Alexander E. Patrakov
@ 2007-03-19 17:26 ` OGAWA Hirofumi
2007-03-20 6:25 ` Alexander E. Patrakov
0 siblings, 1 reply; 23+ messages in thread
From: OGAWA Hirofumi @ 2007-03-19 17:26 UTC (permalink / raw)
To: Alexander E. Patrakov; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers
"Alexander E. Patrakov" <patrakov@ums.usu.ru> writes:
> But, anyway, this is a separate issue that my patch doesn't attempt to
> correct. The conclusion so far is that we disagree, and that there are
> situations where using utf8 iocharset is the least of all evils, so the
> warning is not justified enough. Reproducible testcase:
Again, I don't care about read at all. And why don't you use "utf8"
option, instead of "iocharset=utf8". "iocharset=utf8" is warned until
it is fixed. The "utf8" also doesn't work correctly in some case though.
>> I'm talking about two filesystems on a system here, not two encoding
>> on one filesystem.
>
> I am also talking about this. Mounting two filesystems with different
> iocharsets is insane, because this will result in one of the following outcomes:
>
> 1) "ls" will show wrong characters in filenames on one of the filesystems
> 2) one of the two filesystems will contain wrong on-disk data for filenames,
> that, when misinterpreted by mounting with wrong iocharset, results in
> seemingly-correct output, but is misunderstood by the properly set up
> reference implementation (that's what is likely to happen with jfs in your
> example).
Because you didn't change the locale. And it is your policy, right?
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Sanitize filesystem NLS handling
2007-03-19 15:10 ` Alexander E. Patrakov
@ 2007-03-19 17:36 ` OGAWA Hirofumi
0 siblings, 0 replies; 23+ messages in thread
From: OGAWA Hirofumi @ 2007-03-19 17:36 UTC (permalink / raw)
To: Alexander E. Patrakov; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers
"Alexander E. Patrakov" <patrakov@ums.usu.ru> writes:
> Note that you can still achieve this insane result by specifying iocharset
> manually for each mount. Only the defaults are changed, but many distros set
> the default iocharset to either iso8859-1 or utf8, both of which are wrong
> for you. So you won't notice any regressions after my patch :)
Nope. In fedora, it uses NLS_DEFAULT="utf8" and FAT_DEFAULT_IOCHARSET="ascii".
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Sanitize filesystem NLS handling
2007-03-19 17:26 ` OGAWA Hirofumi
@ 2007-03-20 6:25 ` Alexander E. Patrakov
2007-03-21 16:50 ` OGAWA Hirofumi
0 siblings, 1 reply; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-20 6:25 UTC (permalink / raw)
To: OGAWA Hirofumi; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers
OGAWA Hirofumi wrote:
> "Alexander E. Patrakov" <patrakov@ums.usu.ru> writes:
>
>> But, anyway, this is a separate issue that my patch doesn't attempt to
>> correct. The conclusion so far is that we disagree, and that there are
>> situations where using utf8 iocharset is the least of all evils, so the
>> warning is not justified enough. Reproducible testcase:
>
> Again, I don't care about read at all. And why don't you use "utf8"
> option, instead of "iocharset=utf8". "iocharset=utf8" is warned until
> it is fixed. The "utf8" also doesn't work correctly in some case though.
Would it be OK for you if I add the mount-time check for iocharset=utf8 to
the fat filesystem and silently replace this with the "utf8" option, instead
of overly actively warning the users? This way, the sysfs option and the
nls_base.iocharset module parameter will still work as I want.
>>> I'm talking about two filesystems on a system here, not two encoding
>>> on one filesystem.
>> I am also talking about this. Mounting two filesystems with different
>> iocharsets is insane, because this will result in one of the following outcomes:
>>
>> 1) "ls" will show wrong characters in filenames on one of the filesystems
>> 2) one of the two filesystems will contain wrong on-disk data for filenames,
>> that, when misinterpreted by mounting with wrong iocharset, results in
>> seemingly-correct output, but is misunderstood by the properly set up
>> reference implementation (that's what is likely to happen with jfs in your
>> example).
>
> Because you didn't change the locale. And it is your policy, right?
Yes. This is because I have some files with non-ASCII names in my home
directory. Changing the locale would make these filenames look wrong until I
change it back.
--
Alexander E. Patrakov
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Sanitize filesystem NLS handling
2007-03-20 6:25 ` Alexander E. Patrakov
@ 2007-03-21 16:50 ` OGAWA Hirofumi
0 siblings, 0 replies; 23+ messages in thread
From: OGAWA Hirofumi @ 2007-03-21 16:50 UTC (permalink / raw)
To: Alexander E. Patrakov; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers
"Alexander E. Patrakov" <patrakov@ums.usu.ru> writes:
> Would it be OK for you if I add the mount-time check for iocharset=utf8 to
> the fat filesystem and silently replace this with the "utf8" option, instead
> of overly actively warning the users? This way, the sysfs option and the
> nls_base.iocharset module parameter will still work as I want.
Sounds good. But, how do you specify the iocharset's nls for utf8?
>> Because you didn't change the locale. And it is your policy, right?
>
> Yes. This is because I have some files with non-ASCII names in my home
> directory. Changing the locale would make these filenames look wrong until I
> change it back.
The _kernel_ has no reason to force one policy, and that's why I
suggested to add per-filesystem driver option (not per-super_block).
(e.g. nls provides interface for it, etc...)
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Sanitize filesystem NLS handling
2007-03-18 16:55 [PATCH] Sanitize filesystem NLS handling Alexander E. Patrakov
2007-03-18 22:01 ` OGAWA Hirofumi
@ 2007-03-22 12:59 ` Roman Zippel
2007-03-22 13:22 ` Alexander E. Patrakov
2007-03-22 16:07 ` Alexander E. Patrakov
2 siblings, 1 reply; 23+ messages in thread
From: Roman Zippel @ 2007-03-22 12:59 UTC (permalink / raw)
To: Alexander E. Patrakov; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers
Hi,
On Sun, 18 Mar 2007, Alexander E. Patrakov wrote:
> More details on what the patch does:
>
> * Rewords the description of CONFIG_NLS_DEFAULT, because at some point in the
> past it confused some people
> * Removes CONFIG_FAT_DEFAULT_IOCHARSET, now CONFIG_NLS_DEFAULT is used for
> this purpose. This is because the correct setting of both must match the
> user's locale
> * Merges the two CONFIG_SMB_NLS_REMOTE and CONFIG_FAT_DEFAULT_CODEPAGE options
> into one, named CONFIG_CODEPAGE_DEFAULT. This is because the correct setting
> of both must match the code page used by MS-DOS in the user's country. For the
> same reason, CONFIG_SMB_NLS_DEFAULT is removed (the only sane choice is "y")
> * Makes the FAT filesystem accept both the old-style "codepage=866" mount
> option (which is inconsistent with other filesystems requiring a codepage
> option) and the new-style "codepage=cp866" option. This is necessary because
> CONFIG_CODEPAGE_DEFAULT must work for all filesystems that use it
> * Downgrades the UTF-8 FAT warning to a note, because, while using the utf8
> iocharset produces a case-sensitive FAT filesystem, other iocharsets simply
> produce wrong characters, which is much worse
> * Renames SMB_NLS_MAXNAMELEN to NLS_MAXNAMELEN, because it is also useful
> outside smbfs
> * Makes smbfs always output iocharset and codepage in /proc/mounts, as FAT
> does
> * Makes CONFIG_NLS_DEFAULT and CONFIG_CODEPAGE_DEFAULT adjustable at runtime
> via the following mechanisms:
I agree that these parameters should be changable not just at compile time
and the iocharset should be a global default, but the on disk encoding is
often filesystem specific, so I'd rather keep this option per filesystem.
bye, Roman
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Sanitize filesystem NLS handling
2007-03-22 12:59 ` [PATCH] Sanitize filesystem NLS handling Roman Zippel
@ 2007-03-22 13:22 ` Alexander E. Patrakov
2007-03-22 13:49 ` Roman Zippel
0 siblings, 1 reply; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-22 13:22 UTC (permalink / raw)
To: Roman Zippel; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers
Roman Zippel wrote:
> I agree that these parameters should be changable not just at compile time
> and the iocharset should be a global default, but the on disk encoding is
> often filesystem specific, so I'd rather keep this option per filesystem.
You are right, the on-disk encoding is filesystem specific. E.g., it is
always UCS-2 for CIFS, NTFS and VFAT (and my patch doesn't change this).
It is exposed as a mount parameter and kernel configuration option only for
fat and smbfs (the two filesystems that my patch touches for this matter),
and both of these filesystems come from DOS days, where there was one
codepage for a given country. So, for me, it made sense to glue these two
parameters, because they both should be set to this codepage. The contrary
would (AFAIK) only make sense if one gets floppies from France, but
communicates with Polish SMB servers. It is a very rare situation, isn't it?
Or do you know any other example where it would clearly make sense to have
different defaults for fat and smbfs?
--
Alexander E. Patrakov
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Sanitize filesystem NLS handling
2007-03-22 13:22 ` Alexander E. Patrakov
@ 2007-03-22 13:49 ` Roman Zippel
2007-03-22 13:59 ` Alexander E. Patrakov
2007-03-22 14:58 ` Alexander E. Patrakov
0 siblings, 2 replies; 23+ messages in thread
From: Roman Zippel @ 2007-03-22 13:49 UTC (permalink / raw)
To: Alexander E. Patrakov; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers
Hi,
On Thu, 22 Mar 2007, Alexander E. Patrakov wrote:
> It is exposed as a mount parameter and kernel configuration option only for
> fat and smbfs (the two filesystems that my patch touches for this matter), and
> both of these filesystems come from DOS days, where there was one codepage for
> a given country. So, for me, it made sense to glue these two parameters,
> because they both should be set to this codepage. The contrary would (AFAIK)
> only make sense if one gets floppies from France, but communicates with Polish
> SMB servers. It is a very rare situation, isn't it? Or do you know any other
> example where it would clearly make sense to have different defaults for fat
> and smbfs?
hfs has a codepage option as well, but I don't know its default in the
various countries, but it could be different from DOS.
bye, Roman
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Sanitize filesystem NLS handling
2007-03-22 13:49 ` Roman Zippel
@ 2007-03-22 13:59 ` Alexander E. Patrakov
2007-03-22 15:45 ` Roman Zippel
2007-03-22 14:58 ` Alexander E. Patrakov
1 sibling, 1 reply; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-22 13:59 UTC (permalink / raw)
To: Roman Zippel; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers
Roman Zippel wrote:
> hfs has a codepage option as well, but I don't know its default in the
> various countries, but it could be different from DOS.
Yes. Since this comes from Mac world, it definitely makes sense to add
CONFIG_MAC_CODEPAGE_DEFAULT, the corresponding module parameter, and make
hfs (FIXME: anything else?) use this. However, I have no hfs filesystems,
and thus can't test my changes.
Help from anyone who understands hfs code is welcome. In particular, what
currently happens by default WRT the codepage? Where is the code responsible
for this?
--
Alexander E. Patrakov
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Sanitize filesystem NLS handling
2007-03-22 13:49 ` Roman Zippel
2007-03-22 13:59 ` Alexander E. Patrakov
@ 2007-03-22 14:58 ` Alexander E. Patrakov
1 sibling, 0 replies; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-22 14:58 UTC (permalink / raw)
To: Roman Zippel; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers
Roman Zippel wrote:
> hfs has a codepage option as well
Is it _currently_ useful at all? The problem is that the kernel has no nls
modules for Mac codepages (e.g., MacRoman which is cp10000).
--
Alexander E. Patrakov
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Sanitize filesystem NLS handling
2007-03-22 13:59 ` Alexander E. Patrakov
@ 2007-03-22 15:45 ` Roman Zippel
0 siblings, 0 replies; 23+ messages in thread
From: Roman Zippel @ 2007-03-22 15:45 UTC (permalink / raw)
To: Alexander E. Patrakov; +Cc: LKML, Andrew Morton, agalakhov, Kay Sievers
Hi,
On Thu, 22 Mar 2007, Alexander E. Patrakov wrote:
> > hfs has a codepage option as well, but I don't know its default in the
> > various countries, but it could be different from DOS.
>
> Yes. Since this comes from Mac world, it definitely makes sense to add
> CONFIG_MAC_CODEPAGE_DEFAULT, the corresponding module parameter, and make hfs
> (FIXME: anything else?) use this. However, I have no hfs filesystems, and thus
> can't test my changes.
>
> Help from anyone who understands hfs code is welcome. In particular, what
> currently happens by default WRT the codepage? Where is the code responsible
> for this?
grep for nls_disk, the default simply does a direct mapping, so I don't
really want to add another config option, but a module parameter would be
useful.
bye, Roman
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] Sanitize filesystem NLS handling
2007-03-18 16:55 [PATCH] Sanitize filesystem NLS handling Alexander E. Patrakov
2007-03-18 22:01 ` OGAWA Hirofumi
2007-03-22 12:59 ` [PATCH] Sanitize filesystem NLS handling Roman Zippel
@ 2007-03-22 16:07 ` Alexander E. Patrakov
2 siblings, 0 replies; 23+ messages in thread
From: Alexander E. Patrakov @ 2007-03-22 16:07 UTC (permalink / raw)
To: LKML; +Cc: Andrew Morton, agalakhov, Kay Sievers
I wrote:
> More details on what the patch does:
>
> * Rewords the description of CONFIG_NLS_DEFAULT, because at some point
> in the past it confused some people
> * Removes CONFIG_FAT_DEFAULT_IOCHARSET, now CONFIG_NLS_DEFAULT is used
> for this purpose. This is because the correct setting of both must match
> the user's locale
> * Merges the two CONFIG_SMB_NLS_REMOTE and CONFIG_FAT_DEFAULT_CODEPAGE
> options into one, named CONFIG_CODEPAGE_DEFAULT. This is because the
> correct setting of both must match the code page used by MS-DOS in the
> user's country. For the same reason, CONFIG_SMB_NLS_DEFAULT is removed
> (the only sane choice is "y")
> * Makes the FAT filesystem accept both the old-style "codepage=866"
> mount option (which is inconsistent with other filesystems requiring a
> codepage option) and the new-style "codepage=cp866" option. This is
> necessary because CONFIG_CODEPAGE_DEFAULT must work for all filesystems
> that use it
> * Downgrades the UTF-8 FAT warning to a note, because, while using the
> utf8 iocharset produces a case-sensitive FAT filesystem, other
> iocharsets simply produce wrong characters, which is much worse
> * Renames SMB_NLS_MAXNAMELEN to NLS_MAXNAMELEN, because it is also
> useful outside smbfs
> * Makes smbfs always output iocharset and codepage in /proc/mounts, as
> FAT does
> * Makes CONFIG_NLS_DEFAULT and CONFIG_CODEPAGE_DEFAULT adjustable at
> runtime via the following mechanisms:
I take the patch back, please expect V2. It needs some work WRT hfs and
ncpfs filesystems, as well as an improvement in handling of iocharset=utf8
in the fat filesystem.
--
Alexander E. Patrakov
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2007-03-22 16:07 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-18 16:55 [PATCH] Sanitize filesystem NLS handling Alexander E. Patrakov
2007-03-18 22:01 ` OGAWA Hirofumi
2007-03-19 2:58 ` Alexander E. Patrakov
2007-03-19 3:46 ` OGAWA Hirofumi
2007-03-19 4:38 ` Alexander E. Patrakov
2007-03-19 5:38 ` OGAWA Hirofumi
2007-03-19 5:49 ` Alexander E. Patrakov
2007-03-19 14:25 ` OGAWA Hirofumi
2007-03-19 15:04 ` Alexander E. Patrakov
2007-03-19 17:26 ` OGAWA Hirofumi
2007-03-20 6:25 ` Alexander E. Patrakov
2007-03-21 16:50 ` OGAWA Hirofumi
2007-03-19 15:10 ` Alexander E. Patrakov
2007-03-19 17:36 ` OGAWA Hirofumi
2007-03-19 6:38 ` yes --help (was: [PATCH] Sanitize filesystem NLS handling) Bernd Eckenfels
2007-03-19 8:14 ` Alexander E. Patrakov
2007-03-22 12:59 ` [PATCH] Sanitize filesystem NLS handling Roman Zippel
2007-03-22 13:22 ` Alexander E. Patrakov
2007-03-22 13:49 ` Roman Zippel
2007-03-22 13:59 ` Alexander E. Patrakov
2007-03-22 15:45 ` Roman Zippel
2007-03-22 14:58 ` Alexander E. Patrakov
2007-03-22 16:07 ` Alexander E. Patrakov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).