LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] unicode: refactor the rule for regenerating utf8data.h
@ 2019-04-27  6:24 Masahiro Yamada
  2019-04-27  6:27 ` Masahiro Yamada
  0 siblings, 1 reply; 4+ messages in thread
From: Masahiro Yamada @ 2019-04-27  6:24 UTC (permalink / raw)
  To: Olaf Weber, Gabriel Krisman Bertazi, Theodore Ts'o
  Cc: Masahiro Yamada, Gabriel Krisman Bertazi, linux-doc,
	linux-kbuild, linux-kernel, Jonathan Corbet, Michal Marek,
	linux-fsdevel

scripts/mkutf8data is used only when regenerating utf8data.h,
which never happens in the normal kernel build. However, it is
irrespectively built if CONFIG_UNICODE is enabled.

Moreover, there is no good reason for it to reside in the scripts/
directory since it is only used in fs/unicode/.

Hence, move it from scripts/ to fs/unicode/.

In some cases, we bypass build artifacts in the normal build. The
conventianl way to do so is to surround the code with ifdef REGENERATE_*.

For example,

 - 7373f4f83c71 ("kbuild: add implicit rules for parser generation")
 - 6aaf49b495b4 ("crypto: arm,arm64 - Fix random regeneration of S_shipped")

I rewrote the rule in a more kbuild'ish style.

It works like this:

$ make REGENERATE_UTF8DATA=1 fs/unicode/
  [ snip ]
  HOSTCC  fs/unicode/mkutf8data
  GEN     fs/unicode/utf8data.h
  CC      fs/unicode/utf8-norm.o
  CC      fs/unicode/utf8-core.o
  AR      fs/unicode/built-in.a

Also, I added utf8data.h to .gitignore and dontdiff.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
---

 Documentation/dontdiff               |  1 +
 fs/unicode/.gitignore                |  1 +
 fs/unicode/Makefile                  | 37 +++++++++++++++++++++++++-----------
 fs/unicode/README.utf8data           | 10 +++++-----
 {scripts => fs/unicode}/mkutf8data.c |  0
 scripts/Makefile                     |  1 -
 6 files changed, 33 insertions(+), 17 deletions(-)
 create mode 100644 fs/unicode/.gitignore
 rename {scripts => fs/unicode}/mkutf8data.c (100%)

diff --git a/Documentation/dontdiff b/Documentation/dontdiff
index ef25a06..bc353ad 100644
--- a/Documentation/dontdiff
+++ b/Documentation/dontdiff
@@ -176,6 +176,7 @@ mkprep
 mkregtable
 mktables
 mktree
+mkutf8data
 modpost
 modules.builtin
 modules.order
diff --git a/fs/unicode/.gitignore b/fs/unicode/.gitignore
new file mode 100644
index 0000000..44811fc
--- /dev/null
+++ b/fs/unicode/.gitignore
@@ -0,0 +1 @@
+mkutf8data
diff --git a/fs/unicode/Makefile b/fs/unicode/Makefile
index 671d31f..1a109b7 100644
--- a/fs/unicode/Makefile
+++ b/fs/unicode/Makefile
@@ -5,15 +5,30 @@ obj-$(CONFIG_UNICODE_NORMALIZATION_SELFTEST) += utf8-selftest.o
 
 unicode-y := utf8-norm.o utf8-core.o
 
-# This rule is not invoked during the kernel compilation.  It is used to
-# regenerate the utf8data.h header file.
-utf8data.h.new: *.txt $(objdir)/scripts/mkutf8data
-	$(objdir)/scripts/mkutf8data \
-		-a DerivedAge.txt \
-		-c DerivedCombiningClass.txt \
-		-p DerivedCoreProperties.txt \
-		-d UnicodeData.txt \
-		-f CaseFolding.txt \
-		-n NormalizationCorrections.txt \
-		-t NormalizationTest.txt \
+
+# To regenerate utf8data.h, run the following in the top directory:
+#   $ make REGENERATE_UTF8DATA=1 fs/unicode/
+ifdef REGENERATE_UTF8DATA
+
+$(obj)/utf8-norm.o: $(obj)/utf8data.h
+
+quiet_cmd_utf8data = GEN     $@
+      cmd_utf8data = $(obj)/mkutf8data \
+		-a $(src)/DerivedAge.txt \
+		-c $(src)/DerivedCombiningClass.txt \
+		-p $(src)/DerivedCoreProperties.txt \
+		-d $(src)/UnicodeData.txt \
+		-f $(src)/CaseFolding.txt \
+		-n $(src)/NormalizationCorrections.txt \
+		-t $(src)/NormalizationTest.txt \
 		-o $@
+
+$(obj)/utf8data.h: $(filter %.txt, $(cmd_utf8data)) $(obj)/mkutf8data FORCE
+	$(call if_changed,utf8data)
+
+always += utf8data.h
+no-clean-files += utf8data.h
+
+endif
+
+hostprogs-y += mkutf8data
diff --git a/fs/unicode/README.utf8data b/fs/unicode/README.utf8data
index eeb7561..155d56e 100644
--- a/fs/unicode/README.utf8data
+++ b/fs/unicode/README.utf8data
@@ -41,15 +41,15 @@ released version of the UCD can be found here:
 
   http://www.unicode.org/Public/UCD/latest/
 
-To build the utf8data.h file, from a kernel tree that has been built,
-cd to this directory (fs/unicode) and run this command:
+To regenerate utf8data.h in the build process, pass REGENERATE_UTF8DATA=1
+from the command line. The easiest command to update it is this:
 
-	make C=../.. objdir=../.. utf8data.h.new
+	make REGENERATE_UTF8DATA=1 fs/unicode/
 
-After sanity checking the newly generated utf8data.h.new file (the
+After sanity checking the newly generated utf8data.h file (the
 version generated from the 12.1.0 UCD should be 4,109 lines long, and
 have a total size of 324k) and/or comparing it with the older version
-of utf8data.h, rename it to utf8data.h.
+of utf8data.h, check it in.
 
 If you are a kernel developer updating to a newer version of the
 Unicode Character Database, please update this README.utf8data file
diff --git a/scripts/mkutf8data.c b/fs/unicode/mkutf8data.c
similarity index 100%
rename from scripts/mkutf8data.c
rename to fs/unicode/mkutf8data.c
diff --git a/scripts/Makefile b/scripts/Makefile
index b87e3e0..9d442ee 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -20,7 +20,6 @@ hostprogs-$(CONFIG_ASN1)	 += asn1_compiler
 hostprogs-$(CONFIG_MODULE_SIG)	 += sign-file
 hostprogs-$(CONFIG_SYSTEM_TRUSTED_KEYRING) += extract-cert
 hostprogs-$(CONFIG_SYSTEM_EXTRA_CERTIFICATE) += insert-sys-cert
-hostprogs-$(CONFIG_UNICODE) += mkutf8data
 
 HOSTCFLAGS_sortextable.o = -I$(srctree)/tools/include
 HOSTCFLAGS_asn1_compiler.o = -I$(srctree)/include
-- 
2.7.4


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] unicode: refactor the rule for regenerating utf8data.h
  2019-04-27  6:24 [PATCH] unicode: refactor the rule for regenerating utf8data.h Masahiro Yamada
@ 2019-04-27  6:27 ` Masahiro Yamada
  2019-04-28 17:43   ` Theodore Ts'o
  0 siblings, 1 reply; 4+ messages in thread
From: Masahiro Yamada @ 2019-04-27  6:27 UTC (permalink / raw)
  To: Olaf Weber, Gabriel Krisman Bertazi, Theodore Ts'o
  Cc: Gabriel Krisman Bertazi, open list:DOCUMENTATION,
	Linux Kbuild mailing list, Linux Kernel Mailing List,
	Jonathan Corbet, Michal Marek, linux-fsdevel

On Sat, Apr 27, 2019 at 3:24 PM Masahiro Yamada
<yamada.masahiro@socionext.com> wrote:
>
> scripts/mkutf8data is used only when regenerating utf8data.h,
> which never happens in the normal kernel build. However, it is
> irrespectively built if CONFIG_UNICODE is enabled.
>
> Moreover, there is no good reason for it to reside in the scripts/
> directory since it is only used in fs/unicode/.
>
> Hence, move it from scripts/ to fs/unicode/.
>
> In some cases, we bypass build artifacts in the normal build. The
> conventianl way to do so is to surround the code with ifdef REGENERATE_*.


This is a typo.

conventianl -> conventional



> For example,
>
>  - 7373f4f83c71 ("kbuild: add implicit rules for parser generation")
>  - 6aaf49b495b4 ("crypto: arm,arm64 - Fix random regeneration of S_shipped")
>
> I rewrote the rule in a more kbuild'ish style.
>
> It works like this:
>
> $ make REGENERATE_UTF8DATA=1 fs/unicode/
>   [ snip ]
>   HOSTCC  fs/unicode/mkutf8data
>   GEN     fs/unicode/utf8data.h
>   CC      fs/unicode/utf8-norm.o
>   CC      fs/unicode/utf8-core.o
>   AR      fs/unicode/built-in.a
>
> Also, I added utf8data.h to .gitignore and dontdiff.
>
> Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
> ---




-- 
Best Regards
Masahiro Yamada

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] unicode: refactor the rule for regenerating utf8data.h
  2019-04-27  6:27 ` Masahiro Yamada
@ 2019-04-28 17:43   ` Theodore Ts'o
  2019-04-29  3:09     ` Masahiro Yamada
  0 siblings, 1 reply; 4+ messages in thread
From: Theodore Ts'o @ 2019-04-28 17:43 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Olaf Weber, Gabriel Krisman Bertazi, Gabriel Krisman Bertazi,
	open list:DOCUMENTATION, Linux Kbuild mailing list,
	Linux Kernel Mailing List, Jonathan Corbet, Michal Marek,
	linux-fsdevel

Thanks, for the suggestion and the patch!  I agree it's much better
with your proposed change (and it gets mkutf8data out of your hair :-)

I did need to make one change to your patch in order for it to work
correctly with a build directory.  That is, to support

	make -O /build/ext4 REGENERATE_UTF8DATA=1 fs/unicode/

I'll apply it to the ext4 git tree with this change.

     	      	     	      	   - Ted

diff --git a/fs/unicode/Makefile b/fs/unicode/Makefile
index 1a109b7a1da9..45955264ac04 100644
--- a/fs/unicode/Makefile
+++ b/fs/unicode/Makefile
@@ -14,13 +14,13 @@ $(obj)/utf8-norm.o: $(obj)/utf8data.h
 
 quiet_cmd_utf8data = GEN     $@
       cmd_utf8data = $(obj)/mkutf8data \
-		-a $(src)/DerivedAge.txt \
-		-c $(src)/DerivedCombiningClass.txt \
-		-p $(src)/DerivedCoreProperties.txt \
-		-d $(src)/UnicodeData.txt \
-		-f $(src)/CaseFolding.txt \
-		-n $(src)/NormalizationCorrections.txt \
-		-t $(src)/NormalizationTest.txt \
+		-a $(srctree)/$(src)/DerivedAge.txt \
+		-c $(srctree)/$(src)/DerivedCombiningClass.txt \
+		-p $(srctree)/$(src)/DerivedCoreProperties.txt \
+		-d $(srctree)/$(src)/UnicodeData.txt \
+		-f $(srctree)/$(src)/CaseFolding.txt \
+		-n $(srctree)/$(src)/NormalizationCorrections.txt \
+		-t $(srctree)/$(src)/NormalizationTest.txt \
 		-o $@
 
 $(obj)/utf8data.h: $(filter %.txt, $(cmd_utf8data)) $(obj)/mkutf8data FORCE

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] unicode: refactor the rule for regenerating utf8data.h
  2019-04-28 17:43   ` Theodore Ts'o
@ 2019-04-29  3:09     ` Masahiro Yamada
  0 siblings, 0 replies; 4+ messages in thread
From: Masahiro Yamada @ 2019-04-29  3:09 UTC (permalink / raw)
  To: Theodore Ts'o, Masahiro Yamada, Olaf Weber,
	Gabriel Krisman Bertazi, Gabriel Krisman Bertazi,
	open list:DOCUMENTATION, Linux Kbuild mailing list,
	Linux Kernel Mailing List, Jonathan Corbet, Michal Marek,
	linux-fsdevel

On Mon, Apr 29, 2019 at 2:44 AM Theodore Ts'o <tytso@mit.edu> wrote:
>
> Thanks, for the suggestion and the patch!  I agree it's much better
> with your proposed change (and it gets mkutf8data out of your hair :-)

Yes, this is my main motivation.

>
> I did need to make one change to your patch in order for it to work
> correctly with a build directory.  That is, to support
>
>         make -O /build/ext4 REGENERATE_UTF8DATA=1 fs/unicode/
>
> I'll apply it to the ext4 git tree with this change.


Thanks for the suggestion.

My first thought was "don't do this",
but it would be better to make it work with O= option.

However, even with your fix-up, it won't work correctly.

If O= is given, the newly-generated utf8data.h will be
put in the object tree.

It will co-exist with the old check-in utf8data.h
and the old one will be included because
the include paths in the srctree are searched first.

I will send v2 shortly so that O= build will work
correctly.

Thanks.





>                                    - Ted
>
> diff --git a/fs/unicode/Makefile b/fs/unicode/Makefile
> index 1a109b7a1da9..45955264ac04 100644
> --- a/fs/unicode/Makefile
> +++ b/fs/unicode/Makefile
> @@ -14,13 +14,13 @@ $(obj)/utf8-norm.o: $(obj)/utf8data.h
>
>  quiet_cmd_utf8data = GEN     $@
>        cmd_utf8data = $(obj)/mkutf8data \
> -               -a $(src)/DerivedAge.txt \
> -               -c $(src)/DerivedCombiningClass.txt \
> -               -p $(src)/DerivedCoreProperties.txt \
> -               -d $(src)/UnicodeData.txt \
> -               -f $(src)/CaseFolding.txt \
> -               -n $(src)/NormalizationCorrections.txt \
> -               -t $(src)/NormalizationTest.txt \
> +               -a $(srctree)/$(src)/DerivedAge.txt \
> +               -c $(srctree)/$(src)/DerivedCombiningClass.txt \
> +               -p $(srctree)/$(src)/DerivedCoreProperties.txt \
> +               -d $(srctree)/$(src)/UnicodeData.txt \
> +               -f $(srctree)/$(src)/CaseFolding.txt \
> +               -n $(srctree)/$(src)/NormalizationCorrections.txt \
> +               -t $(srctree)/$(src)/NormalizationTest.txt \
>                 -o $@
>
>  $(obj)/utf8data.h: $(filter %.txt, $(cmd_utf8data)) $(obj)/mkutf8data FORCE



-- 
Best Regards
Masahiro Yamada

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-04-29  3:10 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-27  6:24 [PATCH] unicode: refactor the rule for regenerating utf8data.h Masahiro Yamada
2019-04-27  6:27 ` Masahiro Yamada
2019-04-28 17:43   ` Theodore Ts'o
2019-04-29  3:09     ` Masahiro Yamada

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).