[Solved] Bash and utf chars

1000 · Post by **1000** » Fri Dec 03, 2021 9:21 am

As a reference I am using a UTF-8 table such as https://kermitproject.org/utf8-t1.html

I am trying to print Unicode characters to the terminal from a script.
For example:

From terminal

Code: Select all

$ printf "%b\n" $(printf '\\u%x\n' $(seq "615" "617"))
ɧ
ɨ
ɩ

From script

Code: Select all

$ bash utf.test --test
\u0267
\u0268
\u0269

But it works if the script contains something like this:

Code: Select all

bash -c "printf '\u0268\n'"
bash -c "echo -e '\u0268\n'"

Or when it contains something like this:

Code: Select all

printf '\xc9\xa8\n'

Another example: this works

Code: Select all

A='\xc9\xa8\n'
printf "$A"

ɨ

But this does not work

Code: Select all

VAR=$(printf '\u0268' | od -vtx1 | awk '{$1=""; print $0}' | head -n1 | sed 's/ /\\x/g') ; printf "$VAR"

\u0268

The result is different from what I expected.

rene · Post by **rene** » Fri Dec 03, 2021 10:38 am

Works fine in bash. Note that printf is a shell built-in; I expect you are using sh rather than bash for that last example. Here:

Code: Select all

rene@hp8k:~$ bash -c "printf '\u0268\n'"
ɨ
rene@hp8k:~$ sh -c "printf '\u0268\n'"
\u0268
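One way to catch this sh/bash mix-up early is a guard near the top of the script. This is only a minimal sketch and not something taken from the posts: the function name `running_under_bash` is made up for illustration, and it relies on bash setting `BASH_VERSION` itself, which dash does not do.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds only when the interpreter is bash.
# bash sets BASH_VERSION on startup; dash (the usual /bin/sh on Mint) does not.
running_under_bash() {
    [ -n "${BASH_VERSION:-}" ]
}

if running_under_bash; then
    printf '%s\n' 'bash detected: the printf builtin understands \uXXXX'
else
    printf '%s\n' 'not bash: \uXXXX will probably be printed literally' >&2
fi
```

Keep in mind that a `#!/bin/bash` line is ignored when the script is started as `sh script`, which is why a runtime check like this can still be useful.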

1000 · Post by **1000** » Fri Dec 03, 2021 11:21 am

printf in terminal probably is from coreutils

Code: Select all

$ dpkg -S $(which printf)
coreutils: /usr/bin/printf

printf (in the script) is among the bash builtins

Code: Select all

bash -c 'compgen -b | grep pri'
printf

This also does not work:

Code: Select all

/usr/bin/printf "$VAR"

\u0268

From terminal

Code: Select all

$ ps -p $$
    PID TTY          TIME CMD
   9581 pts/0    00:00:00 bash

From script

Code: Select all

    PID TTY          TIME CMD
  11590 pts/0    00:00:00 bash

rene · Post by **rene** » Fri Dec 03, 2021 11:38 am

"printf in terminal" is the shell built-in for both /bin/sh and /bin/bash if said shell runs in said terminal. /usr/bin/printf is indeed of course the actual binary. Note that the /bin/sh printf does not support \uxxxx (man dash)

1000 · Post by **1000** » Sat Dec 04, 2021 11:27 pm

Thank you.

Example script.

Code: Select all

#!/bin/bash


DESTINY="This script can convert decimal number to unicode char."
VERSION="1"
LICENCE="GPL v3: https://www.gnu.org/licenses/gpl.html "
LC_ALL=C 


##============{
NOTE_BEGIN() {
echo "----------------------"
echo "Decimal	Unicode	char"
echo "----------------------"
}
##============}

##===================={
NOTE_ABOUT_THE_END() {
	echo " "
	echo "The end of script."
	if [ -f /usr/share/tuxtype/sounds/win.wav ] ; then
		##  Play sound :)
		aplay /usr/share/tuxtype/sounds/win.wav > /dev/null 2>&1
	fi
}
##====================}

##============================{
PRINT() {

while  [ "$NUMBER_STARTING" -le "$NUMBER_ENDING" ] ; do

	## Example:
	## Decimal 616 becomes the escape \u268. Here the decimal number is the character's code point (0-65535).
	#	A="$(printf '\\u%x\n' '616')"
	## Unicode to char
	#	bash -c "printf '$A\n'"

	VARIABLE="$(printf '\\u%x\n' $NUMBER_STARTING)"
	bash -c "printf '$NUMBER_STARTING	%s	%b\n' '$VARIABLE' '$VARIABLE'"

	# Advance the loop counter; $(( )) replaces the deprecated $[ ] form
	NUMBER_STARTING=$((NUMBER_STARTING + 1))
done

}
##============================}


case "$1" in
	"--utf"|"-u")
#		echo "$1 $2 $3"
		[ -z "$2" ] && { echo "Second argument is empty. Enter a starting number." ; exit 1 ;}
		[ -z "$3" ] && { echo "Third argument is empty. Enter an ending number." ; exit 1 ;}
		NOTE_BEGIN
		NUMBER_STARTING="$2" ; NUMBER_ENDING="$3" ; PRINT
		NOTE_ABOUT_THE_END
	;;
	"--stdin"|"-s")
		if [[ -p /dev/stdin ]] ; then
    		PIPE=$(cat -)
    		VARIABLE="$(printf '\\u%x\n' $PIPE)"
			bash -c "printf '%b\n' '$VARIABLE'"
		fi
	;;
	"--help"|"-h")
		echo "---------------------------------------------------------"
		echo "usage: $0 --option"
		echo " "
		echo " "
		echo " Main options:"
		echo " "
		echo "   --stdin               -s     This change decimal number to char"
		echo "                                from stdin using pipe"
		echo " "
		echo "                                For example:"
		echo " 				==========================="
		echo " 				$ echo 120 | bash utf.script -s"
		echo " 				x"
		echo " 				==========================="
		echo " "
		echo "   --utf                 -u     Utf-8 (~ 0-65535 0-65536)"
		echo "                                At the end this option, add starting number"
		echo "                                and add ending number."
		echo " "
		echo "                                For example:"
		echo " 				==========================="
		echo " 				$ bash utf.script -u 610 615"
		echo " 				----------------------"
		echo " 				Decimal	Unicode	char"
		echo " 				----------------------"
		echo " 				610	\u262	ɢ"
		echo " 				611	\u263	ɣ"
		echo " 				612	\u264	ɤ"
		echo " 				613	\u265	ɥ"
		echo " 				614	\u266	ɦ"
		echo " 				615	\u267	ɧ"
 		echo " "
		echo " 				The end of script."
		echo " 				==========================="
		echo " "
		echo " 				Info:"
		echo " 				1. Unicode    https://en.wikipedia.org/wiki/List_of_Unicode_characters"
		echo " 				2. Unicode    https://www.utf8-chartable.de/"
		echo " 				3. ASCII      https://en.wikipedia.org/wiki/ASCII#Control_code_chart"
		echo " "
		echo "   --help                -h     Show help"
		echo " "
		echo "---------------------------------------------------------"
		exit
	;;
	*)
		echo "	Error: unknown option"		
		echo "	Try use: $0 --help"
		exit
	;;
esac

exit

Termy · Post by **Termy** » Wed Dec 08, 2021 12:08 am

Three things pop out to me right now, in that example script:

1.

You're `exit`-ing at the end of the script, when it already will `exit` at the end of the script, so your `exit` at the end is redundant. I hope that makes sense.

2.

The line:

Code: Select all

	VARIABLE="$(printf '\\u%x\n' $NUMBER_STARTING)"

Isn't taking advantage of BASH. You can actually assign shell variables with the `printf` builtin, using the `-v` flag and a variable name, which will also remove the unneeded command substitution (which uses a subshell). Example:

Code: Select all

	printf -v VARIABLE '\\u%x\n' $NUMBER_STARTING

It goes further than that though, because you needn't have used `printf` at all to build the sequence. You can actually put the escape sequence directly into a variable, as long as what you use to recall it can interpret it, which `printf` can, with the `%b` format specification. Example, which displays the em dash Unicode character:

Code: Select all

$ Sequence='\u2014'
$ printf '%b\n' "$Sequence"
—

3.

Variables named in all capital letters are typically reserved for environment variables. There are a few common approaches, but it mostly is a stylistic choice. My preference is LikeThis.
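Point 2 could be applied to the `PRINT` loop roughly as follows. Treat it as a sketch only: `print_table` and its local variable names are invented here, and the third column still needs a UTF-8 locale to show the character rather than the escape.

```shell
#!/usr/bin/env bash

# Hypothetical rewrite of the PRINT loop: no command substitution, no "bash -c".
print_table() {
    local first=$1 last=$2 n escape
    for (( n = first; n <= last; n++ )); do
        # Build the escape text, e.g. \u0268, directly into a variable.
        printf -v escape '\\u%04x' "$n"
        # %s shows the escape as plain text, %b lets printf interpret it.
        printf '%d\t%s\t%b\n' "$n" "$escape" "$escape"
    done
}

print_table 610 615
```

Passing the values as arguments to a fixed format string also avoids embedding variables inside the format, which is what the nested `bash -c "printf '...'"` call was doing.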

1000 wrote: ⤴Fri Dec 03, 2021 11:21 am printf in terminal probably is from coreutils

Builtins take precedence over files in PATH, so as long as the `printf` builtin is enabled, as it is by default in both BASH and DASH (the /bin/sh on Debian-based systems such as Mint), it will be used both interactively and in scripts, unless you directly execute the file, such as by providing the absolute path to it.

rene wrote: ⤴Fri Dec 03, 2021 11:38 am /usr/bin/printf is indeed of course the actual binary.

To clarify, '/usr/bin/printf' is not the binary the shell uses when you run the `printf` builtin. It's one of many external (to the shell) tools which are an alternative to the builtins, likely for compatibility reasons.
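To see which `printf` a given shell would actually run, bash's `type` builtin can be asked directly. A small sketch, assuming bash; the variable name `printf_kind` is just for illustration, and the path /usr/bin/printf is the one reported by `dpkg -S` earlier in the thread.

```shell
#!/usr/bin/env bash

# "type -t" reports how bash resolves a name: alias, keyword, function, builtin or file.
printf_kind=$(type -t printf)
echo "plain printf resolves to: $printf_kind"

# "type -a" lists every candidate in lookup order, the builtin before files in PATH.
type -a printf

# An absolute path bypasses the builtin, provided the external tool is installed there.
if [ -x /usr/bin/printf ]; then
    /usr/bin/printf '%s\n' "this line came from the external printf"
fi
```

By contrast, `which printf` only searches PATH and knows nothing about builtins, so it cannot tell you what the shell will really execute.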

1000 · Post by **1000** » Wed Dec 08, 2021 1:06 pm

1. `exit`-ing at the end

- That is true, you're right. The "exit" command is unnecessary there.
It makes no difference in the script, so I left it.
- In my private copy I kept some extra internet links after it,
because code after the "exit" is not executed.
Once the script was finished,
I kept the most important links, e.g. in "--help" or in a comment,
and removed the rest.

Off topic:
I also found that wrapping code in a function that is never called is a handy way to comment it out.

Code: Select all

new code

COMMENT() {
old code
large amount of old code 
}

This is useful to me when testing new code. In public code it can be considered litter (rubbish / garbage).

In a search engine I also found https://linuxize.com/post/bash-heredoc/

- People often put an "exit" at the end
to be sure that the script has finished.
This is more like paranoia.
Fortunately it is harmless, so I can accept it.
---------------

3. Variables named in all capital letters are typically reserved for environment variables. There are a few common approaches, but it mostly is a stylistic choice. My preference is LikeThis.

It depends on the school (the good-practice guides on the Internet).

- Simple scripts usually use lowercase variable names.
For me such code is not readable, because I can't tell whether a name is a command, a variable or a function.

- Some tutorials advise using uppercase variable names.
I try to follow this approach. For me such code is more readable.

When is a name a function and when is it a variable?
If the script consists of a single file, it is not much trouble to find where the name was defined and tell.
When a script consists of many files, it becomes a problem.
- Sometimes I leave hints in the comments about where a function is defined.
- Sometimes I keep all functions in a file named "functions".
- Sometimes I add a prefix to the function name, like this:
FuncName
or
FUNC_NAME
where "Func" is short for "function".

Edited.
Sometimes, in a terminal, I search through the scripts with

Code: Select all

grep -ri word_searched

or

Code: Select all

grep -rin word_searched

When is a variable from the script and when is it an environment variable?
- I use environment variables very rarely. When I do, I try to note it in a comment.
- I assume it makes no difference, under certain conditions.
Variables are usually assigned in the script.
So I assume that if there is a conflict, the assignment in the script overrides the inherited value,
and the problem does not exist.

The real mistake is to skip
VARIABLE=""
and still expect the variable to be empty.
Just because we haven't created it doesn't mean it doesn't exist (for example, it may come from the environment).
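A small sketch of that last point, with a made-up variable name (`Report`) standing in for anything the calling environment might already export:

```shell
#!/usr/bin/env bash

# Hypothetical variable: the parent exports it, as a login environment might.
export Report="left over from the environment"

# A child script that never assigns Report still sees the inherited value.
inherited=$(bash -c 'printf "%s" "$Report"')

# Assigning an empty value first gives the script a known starting state.
initialised=$(bash -c 'Report=""; printf "%s" "$Report"')

printf 'inherited:   [%s]\n' "$inherited"
printf 'initialised: [%s]\n' "$initialised"
```

The explicit assignment costs one line and removes the dependency on whatever the caller happened to export.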

1000 · Post by **1000** » Tue Dec 21, 2021 11:29 am

When I tested the commands in the terminal,
I found what causes the problem in the script:
LC_ALL=C
With it, characters outside ASCII (above 127 in decimal) are not converted; the escape sequence is printed instead.

Code: Select all

$ printf '%b' "$(printf '\\u%x\n' {160..170})" | cat -n
     1	 
     2	¡
     3	¢
     4	£
     5	¤
     6	¥
     7	¦
     8	§
     9	¨
    10	©
    11	ª

$ printf '%b' "$(printf '\\u%x\n' {110..126})" | cat -n
     1	n
     2	o
     3	p
     4	q
     5	r
     6	s
     7	t
     8	u
     9	v
    10	w
    11	x
    12	y
    13	z
    14	{
    15	|
    16	}
    17	~

$ LC_ALL=C ; printf '%b' "$(printf '\\u%x\n' {110..126})" | cat -n
     1	n
     2	o
     3	p
     4	q
     5	r
     6	s
     7	t
     8	u
     9	v
    10	w
    11	x
    12	y
    13	z
    14	{
    15	|
    16	}
    17	~

$ LC_ALL=C ; printf '%b' "$(printf '\\u%x\n' {160..170})" | cat -n
     1	\u00A0
     2	\u00A1
     3	\u00A2
     4	\u00A3
     5	\u00A4
     6	\u00A5
     7	\u00A6
     8	\u00A7
     9	\u00A8
    10	\u00A9
    11	\u00AA

$ LC_ALL=C ; printf '%b' "$(printf '\\u%x\n' {110..126})" | cat -n
     1	n
     2	o
     3	p
     4	q
     5	r
     6	s
     7	t
     8	u
     9	v
    10	w
    11	x
    12	y
    13	z
    14	{
    15	|
    16	}
    17	~

$ LC_ALL=C ; printf '%b' "$(printf '\\u%x\n' {160..170})" | cat -n
     1	\u00A0
     2	\u00A1
     3	\u00A2
     4	\u00A3
     5	\u00A4
     6	\u00A5
     7	\u00A6
     8	\u00A7
     9	\u00A8
    10	\u00A9
    11	\u00AA

$ printf '%b' "$(printf '\\u%x\n' {160..170})" | cat -n
     1	\u00A0
     2	\u00A1
     3	\u00A2
     4	\u00A3
     5	\u00A4
     6	\u00A5
     7	\u00A6
     8	\u00A7
     9	\u00A8
    10	\u00A9
    11	\u00AA

I don't remember whether I checked,
but my mistake was probably adding "LC_ALL=C" to the script and never testing without it.

Maybe I'll be wiser in the future.

Is there any other way to print Unicode with "LC_ALL=C"?
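One locale-independent possibility, sketched here under the assumption that the terminal itself expects UTF-8: compute the UTF-8 bytes arithmetically and emit them with octal escapes, which `printf` handles in any locale. The names `emit_byte` and `codepoint_to_utf8` are made up for this example, and surrogate code points are not rejected.

```shell
#!/usr/bin/env bash

# Emit one raw byte, given its decimal value, through an octal escape.
emit_byte() {
    printf "\\$(printf '%03o' "$1")"
}

# Print the UTF-8 encoding of a decimal code point (0..1114111).
codepoint_to_utf8() {
    local cp=$1
    if (( cp < 0x80 )); then
        emit_byte "$cp"
    elif (( cp < 0x800 )); then
        emit_byte $(( 0xC0 | (cp >> 6) ))
        emit_byte $(( 0x80 | (cp & 0x3F) ))
    elif (( cp < 0x10000 )); then
        emit_byte $(( 0xE0 | (cp >> 12) ))
        emit_byte $(( 0x80 | ((cp >> 6) & 0x3F) ))
        emit_byte $(( 0x80 | (cp & 0x3F) ))
    else
        emit_byte $(( 0xF0 | (cp >> 18) ))
        emit_byte $(( 0x80 | ((cp >> 12) & 0x3F) ))
        emit_byte $(( 0x80 | ((cp >> 6) & 0x3F) ))
        emit_byte $(( 0x80 | (cp & 0x3F) ))
    fi
}

LC_ALL=C
# Emits the bytes c9 a8, the same ones as printf '\xc9\xa8' in the first post.
codepoint_to_utf8 616
echo
```

Because no `\u` escape is involved, bash never has to consult the locale's character set, so the result does not change with `LC_ALL`.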

Edited
"LC_ALL=en_US.UTF-8" appears to work, following https://perlgeek.de/en/article/set-up-a ... nvironment

Code: Select all

$ LC_ALL=en_US.UTF-8; printf '%b' "$(printf '\\u%x\n' {160..170})" | cat -n
     1	 
     2	¡
     3	¢
     4	£
     5	¤
     6	¥
     7	¦
     8	§
     9	¨
    10	©
    11	ª

I used "LC_ALL=C" in the script to improve its performance.
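If the speed-up is only wanted for particular tools, the assignment can be written as a prefix of a single command instead of on a line of its own. A sketch under that assumption, using `sort` purely as a stand-in for whatever command benefits from the C locale:

```shell
#!/usr/bin/env bash

# Remember the shell's own setting so the comparison below is meaningful.
before=${LC_ALL-}

# Prefix form: LC_ALL=C applies to this one sort invocation only.
sorted=$(printf 'b\na\nC\n' | LC_ALL=C sort)
printf '%s\n' "$sorted"

# The shell's LC_ALL, and therefore the printf builtin's locale, is unchanged.
[ "${LC_ALL-}" = "$before" ] && echo "shell locale untouched"
```

Written as `LC_ALL=C ; command`, with a semicolon, the assignment instead changes the shell itself and stays in effect for every later command.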

1000 · Post by **1000** » Tue Dec 21, 2021 2:55 pm

Off topic.
Just an observation.
When the range starts at 159, the characters after it are not displayed.

Code: Select all

$ LC_ALL=en_US.UTF-8; printf '%b' "$(printf '\\u%x\n' {159..170})" | cat -n
     1

Code: Select all

$ LC_ALL=en_US.UTF-8; printf '%b' "$(printf '\\u%x\n' {150..170})" | cat -n
     1	
     2	
     3	
     5	
             6	
	
     8

Code: Select all

$ LC_ALL=en_US.UTF-8; printf '%b' "$(printf '\\u%x\n' {160..170})" | cat -n
     1	 
     2	¡
     3	¢
     4	£
     5	¤
     6	¥
     7	¦
     8	§
     9	¨
    10	©
    11	ª
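A likely explanation, offered as an assumption rather than something verified on this system: code points 128 to 159 are the C1 control characters, not printable ones, and a terminal may act on them instead of drawing them (159, U+009F, is APC, which can make a terminal swallow the text that follows). A table script could skip them with a small check; `is_control_codepoint` is a made-up name.

```shell
#!/usr/bin/env bash

# Hypothetical helper: true for C0 controls (0-31), DEL (127) and C1 controls (128-159).
is_control_codepoint() {
    local cp=$1
    (( cp < 32 || (cp >= 127 && cp <= 159) ))
}

for cp in 126 127 159 160; do
    if is_control_codepoint "$cp"; then
        printf '%d\tcontrol, skipped\n' "$cp"
    else
        printf '%d\tprintable\n' "$cp"
    fi
done
```

With such a check in the loop, a range like 150 to 170 would print only the rows from 160 upwards.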

Linux Mint Forums
