
msort: Unicode normalization failed [SOLVED]

Posted: Wed Jan 25, 2012 12:24 pm
by bdantas
I am trying to use msort in Linux Mint 12 to alphabetize files containing Unicode characters, but I get "Unicode normalization failed" no matter what arguments I use.

The puzzling thing is that the command works without any problems in Ubuntu 11.04. In Linux Mint 12 I have no luck in either Cinnamon/gnome-terminal or MATE/mate-terminal. In Ubuntu, msort works without my specifying a Unicode normalization mode (i.e., the default NFC mode is used, whatever that means), whereas in Mint I've tried every normalization mode and all of them give me the same "Unicode normalization failed" error.
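For readers wondering what "NFC" means: Unicode Normalization Form C stores a letter like ĉ as one precomposed code point (U+0109), while the decomposed form (NFD) stores it as a plain "c" followed by U+0302 COMBINING CIRCUMFLEX ACCENT. Both display identically but are different byte sequences, which is why a tool that compares characters normalizes them first. A minimal sketch using only `printf` octal escapes and `od` (the variable names are just for illustration):

```shell
#!/bin/sh
# Precomposed ĉ (NFC form): U+0109, encoded in UTF-8 as bytes C4 89
nfc=$(printf '\304\211')
# Decomposed ĉ (NFD form): "c" (0x63) + U+0302, encoded as bytes 63 CC 82
nfd=$(printf 'c\314\202')

# Dump each form as hex bytes (-An suppresses the offset column)
printf '%s' "$nfc" | od -An -tx1
printf '%s' "$nfd" | od -An -tx1

# Compared byte-wise, the two strings are not equal
if [ "$nfc" = "$nfd" ]; then
  echo "same bytes"
else
  echo "different bytes"
fi
```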

Another puzzling thing is that the plain "sort" command does not give me an error in Mint, but unfortunately it does not quite get the job done: sort makes no distinction between c and ĉ, g and ĝ, h and ĥ, etc. I specifically want c to sort before ĉ, g before ĝ, h before ĥ, and so on. msort can handle custom sort orders, whereas sort cannot.
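In a UTF-8 locale, GNU `sort` uses the locale's collation rules, which typically treat ĉ as a variant of c at the first comparison level (matching the behaviour described above). Forcing plain byte order with `LC_ALL=C` does not give Esperanto order either: ĉ starts with byte 0xC4, which is larger than every ASCII letter, so ĉ-words land after z. A small sketch with three sample words:

```shell
#!/bin/sh
# Sample words; ĉ is written as its UTF-8 bytes (\304\211) so the script
# does not depend on its own file encoding.
words=$(printf 'zoo\n\304\211u\nci\n')

# LC_ALL=C makes sort compare raw bytes instead of using locale collation.
sorted=$(printf '%s\n' "$words" | LC_ALL=C sort)
printf '%s\n' "$sorted"
# Byte order puts "ĉu" after "zoo"; Esperanto order would be ci, ĉu, zoo.
```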

Here's the simple command I'm trying to execute:

msort -s file-with-sort-order unsorted-list > sorted-list

The file-with-sort-order simply contains the sort order (in this case the Esperanto alphabet: abcĉdefgĝhĥijĵklmnoprsŝtuŭvz). Both file-with-sort-order and unsorted-list contain plain text and are saved with UTF-8 encoding.
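Rather than assuming the files are UTF-8, it is cheap to check. The sketch below relies on `iconv` exiting non-zero when it meets an invalid byte sequence, and also prints the first three bytes of each file, where a UTF-8 byte-order mark would appear as `ef bb bf`. The helper names `is_valid_utf8` and `first_bytes` are hypothetical, and nothing in the thread shows that encoding was the actual cause here:

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the file is well-formed UTF-8.
# iconv fails (non-zero exit) on an invalid input byte sequence.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Hypothetical helper: print the first 3 bytes in hex (a BOM shows as ef bb bf).
first_bytes() {
  head -c 3 "$1" | od -An -tx1
}

# Usage: sh check-utf8.sh file-with-sort-order unsorted-list
for f in "$@"; do
  if is_valid_utf8 "$f"; then
    echo "$f: valid UTF-8, first bytes:$(first_bytes "$f")"
  else
    echo "$f: NOT valid UTF-8"
  fi
done
```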

Any ideas why I get the "Unicode normalization failed" error in Linux Mint 12 but not in Ubuntu 11.04? And, of course, how do I get past this error so that I can use msort in Linux Mint? This is the last task I'd like to be able to do in Mint before I completely convert from Ubuntu. Any help would be much appreciated.

Re: msort: Unicode normalization failed

Posted: Wed Jan 25, 2012 2:23 pm
by xenopeek
I'm assuming you want to sort lines? Running the following test command on Linux Mint 12 succeeds without any problems:

msort -lws sort-file input-file > output-file

Why didn't you need the -l and -w switches? It wouldn't even start here without the -w tacked on.

Re: msort: Unicode normalization failed [SOLVED]

Posted: Wed Jan 25, 2012 6:03 pm
by bdantas
Hi, Vincent. Thanks for the reply. I tried it with your syntax and still get the same error: http://dl.dropbox.com/u/13048586/screenshot1.png
Notice the output file is empty because the error occurs when msort reads the sort order file, which is where it first encounters the Unicode characters.
Maybe the error has something to do with my system-wide language settings?

Re: msort: Unicode normalization failed

Posted: Wed Jan 25, 2012 6:26 pm
by xenopeek
I'm not sure this matters, but in my sort-file I had one character per line...

Re: msort: Unicode normalization failed [SOLVED]

Posted: Wed Jan 25, 2012 6:51 pm
by bdantas
When I turned Unicode normalization off (-u n), like this:

msort -u n -lws sort-order input-file > output-file

the error went away, but all the special characters ended up at the very end of output-file...

So I tried doing as you suggested and having only one character per line in the sort-order file AND IT WORKED!

So, to summarize: turning Unicode normalization off (-u n) got rid of the error, and putting only one character per line in the sort-order file made the sorting work as intended. It's interesting that neither of these was necessary in Ubuntu, but I am happy to be learning how things work in Mint. Now I can do everything I need with my beautiful Linux Mint 12/MATE OS :D :D :D :D
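The one-character-per-line sort-order file that worked here can be generated rather than typed by hand. A minimal sketch, assuming the script itself is saved as UTF-8; the msort invocation is copied from the thread and left commented out so the sketch runs even where msort is not installed:

```shell
#!/bin/sh
# Write the 28-letter Esperanto alphabet to "sort-order",
# one character per line (the layout that worked with msort in this thread).
printf '%s\n' a b c ĉ d e f g ĝ h ĥ i j ĵ k l m n o p r s ŝ t u ŭ v z > sort-order

# Sanity check: should report 28 lines.
wc -l < sort-order

# The working command from the thread (normalization off via -u n):
# msort -u n -lws sort-order input-file > output-file
```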

I've been struggling with this off and on for several days and it is a relief to have it figured out. Many, many thanks, Vincent.

Re: msort: Unicode normalization failed

Posted: Wed Jan 25, 2012 7:04 pm
by xenopeek
Okay :mrgreen: I learned a new sorting command, so thanks also :wink: