Linux-Fsdevel Archive on lore.kernel.org
From: "Weber, Olaf (HPC Data Management & Storage)" <olaf.weber@hpe.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>,
	Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Cc: "tytso@mit.edu" <tytso@mit.edu>,
	"david@fromorbit.com" <david@fromorbit.com>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"kernel@lists.collabora.co.uk" <kernel@lists.collabora.co.uk>,
	"alvaro.soliverez@collabora.co.uk"
	<alvaro.soliverez@collabora.co.uk>
Subject: RE: [PATCH RFC 03/13] charsets: utf8: Add unicode character database files
Date: Fri, 12 Jan 2018 20:29:01 +0000
Message-ID: <DF4PR8401MB081230306211258EA6D8926385170@DF4PR8401MB0812.NAMPRD84.PROD.OUTLOOK.COM>
In-Reply-To: <20180112165919.GB5594@magnolia>

> -----Original Message-----
> From: Darrick J. Wong [mailto:darrick.wong@oracle.com]
> Sent: Friday, January 12, 2018 17:59
> To: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
> Cc: tytso@mit.edu; david@fromorbit.com; bpm@sgi.com; olaf@sgi.com;
> linux-ext4@vger.kernel.org; linux-fsdevel@vger.kernel.org;
> kernel@lists.collabora.co.uk; alvaro.soliverez@collabora.co.uk
> Subject: Re: [PATCH RFC 03/13] charsets: utf8: Add unicode character database files
> 
> On Fri, Jan 12, 2018 at 05:12:24AM -0200, Gabriel Krisman Bertazi wrote:
> > From: Olaf Weber <olaf@sgi.com>
> >
> > Add files from the Unicode Character Database, version 7.0.0, to the source.
> > A helper program that generates a trie used for normalization from
> > these files is part of a separate commit.
> >
> > Signed-off-by: Olaf Weber <olaf@sgi.com>
> > Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
> >   [Move ucd directory to lib/charsets]
> > ---
> >  lib/charsets/ucd/README | 33 +++++++++++++++++++++++++++++++++
> >  1 file changed, 33 insertions(+)
> >  create mode 100644 lib/charsets/ucd/README
> >
> > diff --git a/lib/charsets/ucd/README b/lib/charsets/ucd/README
> > new file mode 100644
> > index 000000000000..d713e663cdf9
> > --- /dev/null
> > +++ b/lib/charsets/ucd/README
> > @@ -0,0 +1,33 @@
> > +The files in this directory are part of the Unicode Character
> > +Database for version 7.0.0 of the Unicode standard.
> > +
> > +The full set of files can be found here:
> > +
> > +  http://www.unicode.org/Public/7.0.0/ucd/
> > +
> > +The latest released version of the UCD can be found here:
> > +
> > +  http://www.unicode.org/Public/UCD/latest/
> > +
> > +The files in this directory are identical, except that they have been
> > +renamed with a suffix indicating the unicode version.
> > +
> > +Individual source links:
> > +
> > +  http://www.unicode.org/Public/7.0.0/ucd/CaseFolding.txt
> > +  http://www.unicode.org/Public/7.0.0/ucd/DerivedAge.txt
> > +  http://www.unicode.org/Public/7.0.0/ucd/extracted/DerivedCombiningClass.txt
> > +  http://www.unicode.org/Public/7.0.0/ucd/DerivedCoreProperties.txt
> > +  http://www.unicode.org/Public/7.0.0/ucd/NormalizationCorrections.txt
> > +  http://www.unicode.org/Public/7.0.0/ucd/NormalizationTest.txt
> > +  http://www.unicode.org/Public/7.0.0/ucd/UnicodeData.txt
> > +
> > +md5sums
> > +
> > +  9a92b2bfe56c6719def926bab524fefd  CaseFolding-7.0.0.txt
> > +  07b8b1027eb824cf0835314e94f23d2e  DerivedAge-7.0.0.txt
> > +  90c3340b16821e2f2153acdbe6fc6180  DerivedCombiningClass-7.0.0.txt
> > +  c41c0601f808116f623de47110ed4f93  DerivedCoreProperties-7.0.0.txt
> > +  522720ddfc150d8e63a2518634829bce  NormalizationCorrections-7.0.0.txt
> > +  1f35175eba4a2ad795db489f789ae352  NormalizationTest-7.0.0.txt
> > +  c8355655731d75e6a3de8c20d7e601ba  UnicodeData-7.0.0.txt
> 
> Uh... are these files supposed to be attached to this patch?

Actually, no. As was explained in the first message of the series (the 00/13 cover letter):

" Like the original submission from Ben, I excluded the commit that includes the
" generated header file and unicode files because they are too big and would
" bounce the list.  Instead, instructions on fetching and generating the files are
" documented in the commit message.

One issue we (SGI) anticipated was that we would be proposing to include a large binary blob in
the kernel, and people here do dislike opaque binary blobs. So instead we proposed including the
program that generates the blob, plus the source files it uses. On the one hand, that is a sizable
increase in the size of the kernel source tree; on the other hand, there can be no argument about
the provenance of the blob, as both its sources and its generator are right there.
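Checking the provenance of the source files can itself be scripted. Below is a minimal Python sketch, assuming the README keeps the `md5sums` section format shown in the patch (a 32-digit hex digest, whitespace, then the version-suffixed file name). The function names are hypothetical and not part of the patch series.

```python
import hashlib
import os
import re

# Matches checksum lines such as
#   "  9a92b2bfe56c6719def926bab524fefd  CaseFolding-7.0.0.txt"
# i.e. a 32-hex-digit MD5 digest followed by a file name (assumed README format).
_MD5_LINE = re.compile(r"^\s*([0-9a-f]{32})\s+(\S+)\s*$")


def parse_md5sums(readme_text):
    """Return {filename: md5hex} for every checksum line after the 'md5sums' header."""
    sums = {}
    in_section = False
    for line in readme_text.splitlines():
        if line.strip() == "md5sums":
            in_section = True
            continue
        if in_section:
            m = _MD5_LINE.match(line)
            if m:
                sums[m.group(2)] = m.group(1)
    return sums


def md5_of(path):
    """Compute the MD5 hex digest of a file, reading it in 64 KiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_ucd_dir(directory, expected):
    """Return the sorted names of files that are missing or whose MD5 does not match."""
    bad = []
    for name, digest in sorted(expected.items()):
        path = os.path.join(directory, name)
        if not os.path.isfile(path) or md5_of(path) != digest:
            bad.append(name)
    return bad
```

An empty list from `verify_ucd_dir(ucd_dir, parse_md5sums(readme_text))` would mean every listed file is present and matches its recorded checksum.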

An alternative would be to include the generated blob itself, but keep the instructions so that
people can verify it if they care to. Someone really ambitious could even automate fetching the
source files from unicode.org as part of a verification build. Someone more ambitious still could
add such a verification build as an option to the Linux kernel build system. (In other words, if
people on this list believe this to be a good idea, I am not the one who is going to implement it.)
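A verification build along those lines might look roughly like the Python sketch below. It assumes the file list and URL layout from the README (with DerivedCombiningClass.txt under `extracted/`) and the version-suffix renaming the README describes. `fetch_ucd`, `blob_is_reproducible`, and the generator command passed to the latter are hypothetical: the actual invocation of the trie generator from patch 04/13 is not shown in this message, and SHA-256 is used for the byte-for-byte comparison only as an illustrative choice.

```python
import hashlib
import subprocess
import urllib.request
from pathlib import Path

# Upstream UCD files listed in the README; DerivedCombiningClass.txt lives
# under the "extracted/" subdirectory on unicode.org.
UCD_FILES = [
    "CaseFolding.txt",
    "DerivedAge.txt",
    "extracted/DerivedCombiningClass.txt",
    "DerivedCoreProperties.txt",
    "NormalizationCorrections.txt",
    "NormalizationTest.txt",
    "UnicodeData.txt",
]


def ucd_url(version, relpath):
    """URL of one UCD file for the given Unicode version."""
    return f"http://www.unicode.org/Public/{version}/ucd/{relpath}"


def local_name(version, relpath):
    """Version-suffixed local name, e.g. CaseFolding-7.0.0.txt."""
    stem = Path(relpath).stem  # drops any "extracted/" prefix and the ".txt" suffix
    return f"{stem}-{version}.txt"


def fetch_ucd(version, dest):
    """Download every listed UCD file into dest under its versioned name (needs network)."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    for rel in UCD_FILES:
        urllib.request.urlretrieve(ucd_url(version, rel), dest / local_name(version, rel))


def sha256_of(path):
    """SHA-256 hex digest of a whole file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def blob_is_reproducible(generator_cmd, generated, committed):
    """Run the (hypothetical) generator command, then check that its output
    is byte-for-byte identical to the blob checked into the tree."""
    subprocess.run(generator_cmd, check=True)
    return sha256_of(generated) == sha256_of(committed)
```

For example, `fetch_ucd("7.0.0", "ucd")` would populate `ucd/` with `CaseFolding-7.0.0.txt` and the other versioned names listed in the README.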

Olaf


Thread overview: 24+ messages
2018-01-12  7:12 [PATCH RFC 00/13] UTF-8 case insensitive lookups for EXT4 Gabriel Krisman Bertazi
2018-01-12  7:12 ` [PATCH RFC 01/13] charsets: Introduce middle-layer for character encoding Gabriel Krisman Bertazi
2018-01-12  7:12 ` [PATCH RFC 02/13] charsets: ascii: Wrap ascii functions to charsets library Gabriel Krisman Bertazi
2018-01-12  7:12 ` [PATCH RFC 03/13] charsets: utf8: Add unicode character database files Gabriel Krisman Bertazi
2018-01-12 16:59   ` Darrick J. Wong
2018-01-12 20:29     ` Weber, Olaf (HPC Data Management & Storage) [this message]
2018-01-13  0:24   ` Theodore Ts'o
2018-01-13  4:28     ` Gabriel Krisman Bertazi
2018-01-12  7:12 ` [PATCH RFC 04/13] scripts: add trie generator for UTF-8 Gabriel Krisman Bertazi
2018-01-12  7:12 ` [PATCH RFC 05/13] charsets: utf8: Introduce code for UTF-8 normalization Gabriel Krisman Bertazi
2018-01-12  7:12 ` [PATCH RFC 06/13] charsets: utf8: reduce the size of utf8data[] Gabriel Krisman Bertazi
2018-01-12  7:12 ` [PATCH RFC 07/13] charsets: utf8: Hook-up utf-8 code to charsets library Gabriel Krisman Bertazi
2018-01-12 10:38   ` Weber, Olaf (HPC Data Management & Storage)
2018-01-16 16:50     ` Gabriel Krisman Bertazi
2018-01-16 22:19       ` Weber, Olaf (HPC Data Management & Storage)
2018-01-23  3:33         ` Gabriel Krisman Bertazi
2018-01-12  7:12 ` [PATCH RFC 08/13] charsets: utf8: Introduce test module for kernel UTF-8 implementation Gabriel Krisman Bertazi
2018-01-12  7:12 ` [PATCH RFC 09/13] ext4: Add ignorecase mount option Gabriel Krisman Bertazi
2018-01-12  7:12 ` [PATCH RFC 10/13] ext4: Include encoding information on the superblock Gabriel Krisman Bertazi
2018-01-12  7:12 ` [PATCH RFC 11/13] fscrypt: Introduce charset-based matching functions Gabriel Krisman Bertazi
2018-01-12  7:12 ` [PATCH RFC 12/13] ext4: Support charset name matching Gabriel Krisman Bertazi
2018-01-12  7:12 ` [PATCH RFC 13/13] ext4: Implement ext4 dcache hooks for custom charsets Gabriel Krisman Bertazi
2018-01-12 10:52   ` Weber, Olaf (HPC Data Management & Storage)
2018-01-12 16:56 ` [PATCH RFC 00/13] UTF-8 case insensitive lookups for EXT4 Jeremy Allison
