Not UTF-8 valid

Seff
Level 3
Posts: 195
Joined: Mon Aug 31, 2015 5:40 pm

Not UTF-8 valid

Post by Seff » Mon Oct 15, 2018 5:40 pm

Hello. I have some bash scripts that don't run properly, namely scripts from gog.com. Trying to run them, I get the error "gtk: Locale not supported by C library." Opening them in Mousepad results in "This document was not UTF-8 valid." How can I find out what the proper locale is? Thank you. Also, I note that they seem to have been made with gtk 2.0.
Mint "Rafaela" Cinnamon x64
Satellite L755D

rene
Level 8
Posts: 2226
Joined: Sun Mar 27, 2016 6:58 pm

Re: Not UTF-8 valid

Post by rene » Mon Oct 15, 2018 6:36 pm

"gtk" warnings are not produced by the shell itself; they will be from whichever graphical program the script launches. The question as such perhaps does not fully fit this forum, but I mention that only as a technical comment on the issue, not to say it should be posted elsewhere...

It seems likely that you have a default LANG or LC_<foo> setting for a locale that is not in fact installed. Try locale for your current settings and locale -a for the installed locales. If you notice any missing, edit /etc/locale.gen as root, uncomment those you additionally need, save, and run sudo locale-gen.
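The comparison between the two listings can be sketched as a small script. This is only a rough illustration, assuming a glibc system where locale and locale -a are available; the function names normalize_locale, locale_installed and check_locales are made up for this example. Note that locale prints en_US.UTF-8 while locale -a prints en_US.utf8, so the names are normalized before comparing, and LANGUAGE is skipped because it holds a colon-separated language list rather than a locale name.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any standard tool.

# Make "en_US.UTF-8" and "en_US.utf8" compare equal:
# lower-case everything and drop hyphens.
normalize_locale() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# Succeeds if locale $1 matches one of the names read from stdin
# (one name per line, as printed by `locale -a`).
locale_installed() {
    want=$(normalize_locale "$1")
    while IFS= read -r have; do
        [ "$(normalize_locale "$have")" = "$want" ] && return 0
    done
    return 1
}

# Report, for every distinct value in the current settings,
# whether it appears among the installed locales.
check_locales() {
    installed=$(locale -a)
    locale | grep -v '^LANGUAGE=' | sed 's/^[A-Z_]*=//; s/"//g' | sort -u |
    while IFS= read -r name; do
        [ -n "$name" ] || continue      # e.g. an unset LC_ALL
        if printf '%s\n' "$installed" | locale_installed "$name"; then
            echo "installed: $name"
        else
            echo "MISSING:   $name"
        fi
    done
}
```

Calling check_locales after sourcing these functions prints one line per configured locale; anything reported as MISSING would be a candidate for /etc/locale.gen.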

As to mousepad complaining: hard to say anything without an example of a script for which it complains.

Seff
Level 3
Posts: 195
Joined: Mon Aug 31, 2015 5:40 pm

Re: Not UTF-8 valid

Post by Seff » Mon Oct 15, 2018 6:56 pm

Mint "Rafaela" Cinnamon x64
Satellite L755D

rene
Level 8
Posts: 2226
Joined: Sun Mar 27, 2016 6:58 pm

Re: Not UTF-8 valid

Post by rene » Mon Oct 15, 2018 7:25 pm

Seems to not be something to test stand-alone. Have you tried the things I mentioned?

xenopeek
Level 24
Posts: 23118
Joined: Wed Jul 06, 2011 3:58 am
Location: The Netherlands

Re: Not UTF-8 valid

Post by xenopeek » Sat Oct 20, 2018 2:03 pm

Indeed, it would be interesting to see the output of the locale and locale -a commands.

Isn't gog using a shell script with embedded binary content? One very big .sh file that indeed won't open in a text editor.

Failing anything else you could try running the installer from the terminal and prefix the installer command with LC_ALL=C to make it use the default C (English) locale.
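As a minimal sketch of that prefix: the wrapper below sets LC_ALL=C for a single command only, leaving the terminal's own settings untouched. The function name run_in_c_locale and the installer filename in the comment are placeholders, not anything from gog.

```shell
#!/bin/sh
# Run one command with LC_ALL=C; the calling shell's locale is unchanged.
# LC_ALL takes precedence over LANG and the individual LC_* variables.
run_in_c_locale() {
    env LC_ALL=C "$@"
}

# Example (hypothetical filename):
#   run_in_c_locale sh ./gog_installer.sh
# which is equivalent to typing:
#   LC_ALL=C sh ./gog_installer.sh
```

Using env keeps the assignment scoped to the child process regardless of which shell runs the function.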

Seff
Level 3
Posts: 195
Joined: Mon Aug 31, 2015 5:40 pm

Re: Not UTF-8 valid

Post by Seff » Thu Oct 25, 2018 12:08 pm

Code: Select all

locale
LANG=en_US.UTF-8
LANGUAGE=en_US:en_GB:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=en_US.UTF-8
LC_TIME=en_US.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=en_US.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=en_US.UTF-8
LC_NAME=en_US.UTF-8
LC_ADDRESS=en_US.UTF-8
LC_TELEPHONE=en_US.UTF-8
LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=en_US.UTF-8
LC_ALL=
I can run "Blocks That Matter", so call it a partial success. (rene is probably right about this topic being in the wrong forum.)

Code: Select all

locale -a
C
C.UTF-8
en_AG
en_AG.utf8
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IN
en_IN.utf8
en_NG
en_NG.utf8
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZM
en_ZM.utf8
en_ZW.utf8
POSIX
Mint "Rafaela" Cinnamon x64
Satellite L755D

rene
Level 8
Posts: 2226
Joined: Sun Mar 27, 2016 6:58 pm

Re: Not UTF-8 valid

Post by rene » Fri Oct 26, 2018 8:24 pm

Not seeing anything wrong; you have en_US.UTF-8 configured and installed. Sorry, no idea then what the issue is.

catweazel
Level 17
Posts: 7755
Joined: Fri Oct 12, 2012 9:44 pm
Location: Australian Antarctic Territory

Re: Not UTF-8 valid

Post by catweazel » Fri Oct 26, 2018 8:37 pm

Seff wrote:
Mon Oct 15, 2018 5:40 pm
"This document was not UTF-8 valid."
That makes me suppose that the file is missing its two byte UTF8 header.
¡uʍop ǝpısdn sı buıɥʇʎɹǝʌǝ os ɐıןɐɹʇsnɐ ɯoɹɟ ɯ,ı

rene
Level 8
Posts: 2226
Joined: Sun Mar 27, 2016 6:58 pm

Re: Not UTF-8 valid

Post by rene » Fri Oct 26, 2018 8:50 pm

Sorry having to disagree with you again but there is no such thing as a "two byte UTF8 header" on a shell script. xenopeek's suggestion of the script embedding a binary seems likely; certainly random binary data will not (all) be valid UTF-8. However, very little useful to add without an example of the script/s it/themself/ves, so, well, ...
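Whether a given file is valid UTF-8 can be checked without opening it in an editor. A rough sketch, assuming iconv is installed (it ships with glibc): converting a file from UTF-8 to UTF-8 fails on the first invalid byte sequence. The function name is_valid_utf8 and the two demo files are invented for this illustration.

```shell
#!/bin/sh
# Succeeds only if every byte sequence in file $1 is valid UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Demo: a plain script versus a script with raw binary bytes appended,
# loosely imitating an installer that embeds its payload.
tmpdir=$(mktemp -d)
printf '#!/bin/sh\necho foo\n' > "$tmpdir/plain.sh"
{ printf '#!/bin/sh\necho foo\nexit 0\n'; printf '\377\376\200'; } > "$tmpdir/payload.sh"

for f in "$tmpdir/plain.sh" "$tmpdir/payload.sh"; do
    if is_valid_utf8 "$f"; then
        echo "${f##*/}: valid UTF-8"
    else
        echo "${f##*/}: not valid UTF-8"
    fi
done
rm -rf "$tmpdir"
```

A failure here only says that the file contains non-text bytes somewhere, which is expected for a script carrying an embedded binary; it does not mean the script itself is broken.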

User avatar
catweazel
Level 17
Posts: 7755
Joined: Fri Oct 12, 2012 9:44 pm
Location: Australian Antarctic Territory

Re: Not UTF-8 valid

Post by catweazel » Fri Oct 26, 2018 8:58 pm

rene wrote:
Fri Oct 26, 2018 8:50 pm
Sorry having to disagree with you again
It's no skin off my nose. I was merely supposing anyway.

Cheers.
¡uʍop ǝpısdn sı buıɥʇʎɹǝʌǝ os ɐıןɐɹʇsnɐ ɯoɹɟ ɯ,ı

Seff
Level 3
Posts: 195
Joined: Mon Aug 31, 2015 5:40 pm

Re: Not UTF-8 valid

Post by Seff » Sat Oct 27, 2018 12:57 pm

There's little doubt the script runs binary code: it invokes MojoSetup, which installs a game.

Code: Select all

label="Blocks That Matter (GOG.com)"
script="./startmojo.sh"
scriptargs=""
licensetxt=""
targetdir="binaries"
filesizes="679302"
keep="n"
quiet="n"
It's a very long script; this is just a small sample. Might be best to let it lie.
Mint "Rafaela" Cinnamon x64
Satellite L755D

gm10
Level 12
Posts: 4147
Joined: Thu Jun 21, 2018 5:11 pm

Re: Not UTF-8 valid

Post by gm10 » Sat Oct 27, 2018 1:20 pm

Seff wrote:
Thu Oct 25, 2018 12:08 pm

Code: Select all

LANGUAGE=en_US:en_GB:en
LC_ALL=
Change the first one to en_US.UTF-8 and either remove the other one or set it to en_US.UTF-8 as well and you should be good. The file to edit should be ~/.pam_environment, or if it's not in there then add it and/or find where the original line comes from and change it there.
catweazel wrote:
Fri Oct 26, 2018 8:58 pm
rene wrote:
Fri Oct 26, 2018 8:50 pm
Sorry having to disagree with you again
It's no skin off my nose. I was merely supposing anyway.
Actually you're both wrong. There's no reason why a shell script couldn't have a UTF BOM (aka "two byte UTF8 header") but the error message comes from him loading binary data into a text editor.
Last edited by gm10 on Sat Oct 27, 2018 3:11 pm, edited 2 times in total.

rene
Level 8
Posts: 2226
Joined: Sun Mar 27, 2016 6:58 pm

Re: Not UTF-8 valid

Post by rene » Sat Oct 27, 2018 2:26 pm

gm10 wrote:
Sat Oct 27, 2018 1:20 pm
There's no reason why a shell script couldn't have a UTF BOM (aka "two byte UTF8 header")
There certainly is.

Code: Select all

rene@t5500:~$ cat foo.sh 
#!/bin/sh
echo foo
rene@t5500:~$ xxd foo.sh 
00000000: efbb bf23 212f 6269 6e2f 7368 0a65 6368  ...#!/bin/sh.ech
00000010: 6f20 666f 6f0a                           o foo.
rene@t5500:~$ ./foo.sh 
./foo.sh: line 1: #!/bin/sh: No such file or directory
foo
Note, 0xef 0xbb 0xbf is the UTF-8 encoding of the Unicode BOM (U+FEFF); the error says that the first line is no longer even recognized as a shebang; as far as the shell is concerned, the leading \ufeff is just random garbage. In other words, there is no such thing as a [ ... ] header on a shell script.

Moreover, the fact that it still ends up echoing in the specific case above is a mere historical accident: the shell invokes itself on the file when the system-based shebang invocation fails. The BOM fully breaks any other kind of script. E.g.,

Code: Select all

rene@t5500:~$ cat foo.awk 
#!/usr/bin/awk -f
BEGIN { print "foo" }
rene@t5500:~$ xxd foo.awk 
00000000: efbb bf23 212f 7573 722f 6269 6e2f 6177  ...#!/usr/bin/aw
00000010: 6b20 2d66 0a42 4547 494e 207b 2070 7269  k -f.BEGIN { pri
00000020: 6e74 2022 666f 6f22 207d 0a              nt "foo" }.
rene@t5500:~$ ./foo.awk 
./foo.awk: line 1: #!/usr/bin/awk: No such file or directory
./foo.awk: line 2: BEGIN: command not found
The LANGUAGE advice is, by the way, also wrong; that one isn't an LC_ variable and is syntactically fine as is.

gm10
Level 12
Posts: 4147
Joined: Thu Jun 21, 2018 5:11 pm

Re: Not UTF-8 valid

Post by gm10 » Sat Oct 27, 2018 3:11 pm

rene wrote:
Sat Oct 27, 2018 2:26 pm
Note, 0xef 0xbb 0xbf is the UTF-8 encoding of the Unicode BOM (U+FEFF); the error says that the first line is no longer even recognized as a shebang;
I stand corrected. Apparently despite UTF-8 support in the shell the first character must be ASCII. How interesting, I never knew. Thx.
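For anyone who suspects a script has picked up a BOM (some editors add one when saving as UTF-8), the three bytes 0xef 0xbb 0xbf shown in the xxd output earlier can be detected and dropped. A small sketch, assuming head -c is available as it is with GNU and BSD tools; the function names has_bom and strip_bom are made up here.

```shell
#!/bin/sh
# Succeeds if file $1 starts with the UTF-8 encoded BOM (ef bb bf).
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Write file $1 to stdout, minus a leading BOM if there is one.
strip_bom() {
    if has_bom "$1"; then
        tail -c +4 "$1"     # start output at the 4th byte
    else
        cat "$1"
    fi
}

# Example (hypothetical filenames):
#   strip_bom foo.sh > foo-fixed.sh
```

Writing to a new file rather than editing in place leaves the original available for comparison with xxd.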
