msort: Unicode normalization failed [SOLVED]


Post by bdantas on Wed Jan 25, 2012 12:24 pm

I am trying to use msort in Linux Mint 12 to alphabetize files containing Unicode characters, but I get "Unicode normalization failed" no matter what arguments I use.

The puzzling thing is that the command works without any problems in Ubuntu 11.04. In Linux Mint 12, no luck in either Cinnamon/gnome-terminal or MATE/mate-terminal. In Ubuntu, msort works without specifying the Unicode normalization mode (i.e., the default NFC mode is used--whatever that means), whereas in Mint I've tried all normalization modes and all of them give me the same "Unicode normalization failed" error.
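For anyone else wondering what "NFC" means: the same accented letter can be stored either as one precomposed code point or as a base letter plus a combining accent, and normalization converts between the two. Here is a minimal Python sketch of that idea. It uses Python's standard unicodedata module, not msort, and it does not explain why msort fails on Mint; it only illustrates the term.

```python
import unicodedata

composed = "\u0109"     # ĉ stored as a single precomposed code point
decomposed = "c\u0302"  # c followed by COMBINING CIRCUMFLEX ACCENT

# The two strings look identical on screen but are different sequences.
print(composed == decomposed)

# NFC composes base letter + accent into one code point where possible;
# NFD does the reverse and splits the precomposed letter apart.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)

print(nfc == composed, len(nfc))
print(nfd == decomposed, len(nfd))
```

A sorting tool that compares code points has to pick one of these forms first, otherwise two visually identical words could sort differently.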

Another puzzling thing is that the plain "sort" command does not give me an error in Mint, but unfortunately it does not quite get the job done: sort makes no distinction between c and ĉ, g and ĝ, h and ĥ, etc. I specifically want c to sort before ĉ, g before ĝ, h before ĥ, and so on. msort can handle custom sort orders, whereas sort cannot.

Here's the simple command I'm trying to execute:
Code:
msort -s file-with-sort-order unsorted-list > sorted-list

The file-with-sort-order simply contains the sort order (in this case the Esperanto alphabet: abcĉdefgĝhĥijĵklmnoprsŝtuŭvz). Obviously, both the file-with-sort-order and unsorted-list contain simple text and are saved with UTF-8 encoding.
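To make the intended ordering concrete, here is a rough Python sketch of sorting by a custom alphabet. It is only an illustration of the goal (c before ĉ, g before ĝ, and so on), not how msort works internally. The function name esperanto_key and the sample words are made up for the example, and the choice to place unknown characters after all known letters is an assumption.

```python
import unicodedata

# The Esperanto alphabet, in the order given in the sort-order file.
ALPHABET = "abcĉdefgĝhĥijĵklmnoprsŝtuŭvz"
ALPHABET = unicodedata.normalize("NFC", ALPHABET)
RANK = {ch: i for i, ch in enumerate(ALPHABET)}

def esperanto_key(word):
    """Hypothetical helper: map each letter to its rank in ALPHABET."""
    # Normalize to NFC so each accented letter is a single code point.
    word = unicodedata.normalize("NFC", word).lower()
    # Assumption: characters outside the alphabet sort after every
    # known letter, ordered among themselves by code point.
    return [(RANK.get(ch, len(ALPHABET)), ch) for ch in word]

# Sample words, invented for this example.
words = ["ĉevalo", "cent", "ĝardeno", "granda", "ĥoro", "hundo"]
sorted_words = sorted(words, key=esperanto_key)
print(sorted_words)
```

With this key, "cent" comes before "ĉevalo" and "granda" before "ĝardeno", which is the behaviour the sort-order file is meant to produce.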

Any ideas why I get the "Unicode normalization failed" error in Linux Mint 12 but not in Ubuntu 11.04? And, of course, how do I get past this error so that I can use msort in Linux Mint? This is the last task I'd like to be able to do in Mint before I completely convert from Ubuntu. Any help would be much appreciated.
Last edited by bdantas on Wed Feb 01, 2012 11:02 pm, edited 1 time in total.
bdantas
Level 2
Posts: 56
Joined: Tue Jan 03, 2012 8:42 pm


Re: msort: Unicode normalization failed

Post by xenopeek on Wed Jan 25, 2012 2:23 pm

I'm assuming you want to sort lines? Running the following test command on Linux Mint 12 succeeds without problem:
Code:
msort -lws sort-file input-file > output-file

Why didn't you need the -l and -w switches? It wouldn't even start here without the -w tacked on.
Arch Linux / 64-bit / Gnome Shell
xenopeek
Level 21
Posts: 13682
Joined: Wed Jul 06, 2011 3:58 am
Location: The Netherlands

Re: msort: Unicode normalization failed [SOLVED]

Post by bdantas on Wed Jan 25, 2012 6:03 pm

Hi, Vincent. Thanks for the reply. I tried it with your syntax and still get the same error: http://dl.dropbox.com/u/13048586/screenshot1.png
Notice the output file is empty because the error occurs when msort reads the sort order file, which is where it first encounters the Unicode characters.
Maybe the error has something to do with my system-wide language settings?
Last edited by bdantas on Thu Feb 02, 2012 12:14 pm, edited 1 time in total.

Re: msort: Unicode normalization failed

Post by xenopeek on Wed Jan 25, 2012 6:26 pm

I'm not sure this matters, but in my sort-file I had one character per line...

Re: msort: Unicode normalization failed [SOLVED]

Post by bdantas on Wed Jan 25, 2012 6:51 pm

When I turned Unicode normalization off (-u n), like this:
Code:
msort -u n -lws sort-order input-file > output-file

the error went away, but all the special characters were put at the very end of the output-file...

So I tried doing as you suggested and having only one character per line in the sort-order file AND IT WORKED!

So, to summarize, turning Unicode normalization off (-u n) got rid of the error, and having only one character per line in the sort-order file made the sorting work as intended. Very interesting that neither of these was necessary when doing this in Ubuntu, but I am happy to be learning how things work in Mint. Now I can do everything I need with my beautiful Linux Mint 12/MATE OS :D :D :D :D
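In case it saves someone the manual editing, here is a small Python sketch that turns a one-line alphabet into a file with one character per line, saved as UTF-8. The file name sort-order is just the one used in the command above, and writing it to a temporary directory is an arbitrary choice for the example. Whether msort needs this layout in general is not something this thread settles; it is simply what worked here.

```python
import os
import tempfile
import unicodedata

alphabet = "abcĉdefgĝhĥijĵklmnoprsŝtuŭvz"
# Normalize to NFC first so every accented letter is one code point
# and is never split across two lines.
alphabet = unicodedata.normalize("NFC", alphabet)

# Example location only; use whatever path you pass to msort -s.
path = os.path.join(tempfile.mkdtemp(), "sort-order")

with open(path, "w", encoding="utf-8") as f:
    for ch in alphabet:
        f.write(ch + "\n")

# Read the file back to confirm there is one letter per line.
with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()

print(len(lines), "lines written to", path)
```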

I've been struggling with this off and on for several days and it is a relief to have it figured out. Many, many thanks, Vincent.
Last edited by bdantas on Thu Feb 02, 2012 12:15 pm, edited 3 times in total.

Re: msort: Unicode normalization failed

Post by xenopeek on Wed Jan 25, 2012 7:04 pm

Okay, I learned a new sorting command, so thanks also.

