Page 1 of 1

how to sort on comma delimited fields enclosed in double quotes

Posted: Thu Jul 30, 2020 6:38 pm
by coffeeking
Can someone tell me how to sort this file on fields 5, then 4, then 12, then 14? Thank you.

"OH0010000000","14","7400076","LAST","FIRST","MIDDLE","","1954-04-13","1974-0
4-08","ACTIVE","R","123 ANYSTREET","","BLANCHESTER","OH","45107","","","","","",
"","","","","","","GREAT OAKS CAREER CAMPUSES","","","","15","12","SOUTHERN OHIO
ESC","","","BLANCHESTER LOCAL SD (CLINTON)","","MARION","14ABE","10","91","17",
"MARION TOWNSHIP","","","R","X","R","X","R","X","R","X","","X","","","","X","","
R","X","","","X","","","R","","X","","","","","","X","R","","","X","","","X","R"
,"X","","","","X","R","X","R","","X","R","","","X","X","","X","R","","X","","","
X","R"
"OH0010000004","14","10100929","LAST","FIRST","MIDDLE","SR","1961-10-07","2001-0
9-20","ACTIVE","R","456 ANYSTREET","","WILMINGTON","OH","45177","","","","","",
"","","","","","","GREAT OAKS CAREER CAMPUSES","","WILMINGTON CITY SD","","15","
12","","","","","","GREEN","14ABY","10","91","17","GREEN TOWNSHIP","","","","","
","","","","","","","X","","","","","","","X","X","","X","","","","","X","","","
","","","","","","","","","","","","X","","","","","","","","","","R","","","X",
"","","","","","","","","",""
"OH0010000006","14","10001812","LAST","FIRST","","","1957-09-20","2000-10-03
","ACTIVE","R","789 ANYSTREET,"","MIDLAND","OH","45148","","","","","","","",""
,"","","","GREAT OAKS CAREER CAMPUSES","","","","15","12","SOUTHERN OHIO ESC",""
,"","BLANCHESTER LOCAL SD (CLINTON)","","MARION","14ABE","10","91","17","MARION
TOWNSHIP","","","","X","","","","","","","","X","","","","","","","X","","","","
","","","","X","","","","","","X","","","","","","","X","","X","","","","","",""
,"","","X","R","","","X","","","X","","","X","","","X",""

Re: how to sort on comma delimited fields enclosed in double quotes

Posted: Thu Jul 30, 2020 6:48 pm
by freshminted
Save it all as a CSV (Comma Separated Variable) text file with your favourite text editor (Xed etc). You'll have to join up all those lines, it looks like there should really only be three lines of csv data. The load it into Libreoffice Calc (or M$ Excel) and go from there.

Re: how to sort on comma delimited fields enclosed in double quotes

Posted: Thu Jul 30, 2020 7:02 pm
by deck_luck
coffeeking wrote:
Thu Jul 30, 2020 6:38 pm
Can someone tell me how to sort this file on fields 5, then 4, then 12, then 14? Thank you.
...
Please be careful, when posting information about other people. It could be invading their privacy, and I would encourage you change the data before posting it.

There are many different was to do this. Assuming the three examples are three unique lines ending with a new line character, you can use the "sort" command to perform the tasks, and you can learn about it reading the man sort. The -t delimiter and the -k fields will be helpful.

Re: how to sort on comma delimited fields enclosed in double quotes

Posted: Thu Jul 30, 2020 7:15 pm
by MrEen
deck_luck wrote:
Thu Jul 30, 2020 7:02 pm
Please be careful, when posting information about other people. It could be invading their privacy, and I would encourage you change the data before posting it.
Removed personally identifiable information. Thanks for mentioning that. :D

Re: how to sort on comma delimited fields enclosed in double quotes

Posted: Thu Jul 30, 2020 7:35 pm
by coffeeking
Sorry, should have thought about posting personally identifiable data. Won't do that again!

I have been using the 'sort' command with the '-t' and '-k' parameters but keep getting the error message "multi-character tab '-k5,1'"

I'm really a little unclear on how to use the '-k' parameter ---- i.e., on exactly what the syntax of this parameter is.

Re: how to sort on comma delimited fields enclosed in double quotes

Posted: Thu Jul 30, 2020 8:23 pm
by coffeeking
Somewhat in my defense, all of this data is publicly available, nothing confidential. However, in future, I'll mask all personally identifiable info.

Here is an example of a command I've been trying to run

sort -t -k5,1 -k4,1 -k12,1 -k14,1 filetosort.txt > filesorted.txt

And, as I say, I am consistently getting the message "sort: multi-character tab '-k5,1'"

I have no idea what "multi-character tab" refers to (it is a straight csv file, with comma delimiters --- no tab character anywhere in the file as far as I know)

The double quote marks around each field may be the tricky part. Is there something I need in the sort command to take these double quote marks into account?

Re: how to sort on comma delimited fields enclosed in double quotes

Posted: Thu Jul 30, 2020 8:32 pm
by rene
This should do it it seems:

Code: Select all

rene@t5500:~$ cat data
x,x,x,2,2,x,x,x,x,x,x,2,x,2
x,x,x,2,2,x,x,x,x,x,x,2,x,1
x,x,x,2,2,x,x,x,x,x,x,1,x,2
x,x,x,2,2,x,x,x,x,x,x,1,x,1
x,x,x,1,2,x,x,x,x,x,x,2,x,2
x,x,x,1,2,x,x,x,x,x,x,2,x,1
x,x,x,1,2,x,x,x,x,x,x,1,x,2
x,x,x,1,2,x,x,x,x,x,x,1,x,1
x,x,x,2,1,x,x,x,x,x,x,2,x,2
x,x,x,2,1,x,x,x,x,x,x,2,x,1
x,x,x,2,1,x,x,x,x,x,x,1,x,2
x,x,x,2,1,x,x,x,x,x,x,1,x,1
x,x,x,1,1,x,x,x,x,x,x,2,x,2
x,x,x,1,1,x,x,x,x,x,x,2,x,1
x,x,x,1,1,x,x,x,x,x,x,1,x,2
x,x,x,1,1,x,x,x,x,x,x,1,x,1
rene@t5500:~$ sort -t, -k5,5 -k4,4 -k12,12 -k14,14 data
x,x,x,1,1,x,x,x,x,x,x,1,x,1
x,x,x,1,1,x,x,x,x,x,x,1,x,2
x,x,x,1,1,x,x,x,x,x,x,2,x,1
x,x,x,1,1,x,x,x,x,x,x,2,x,2
x,x,x,2,1,x,x,x,x,x,x,1,x,1
x,x,x,2,1,x,x,x,x,x,x,1,x,2
x,x,x,2,1,x,x,x,x,x,x,2,x,1
x,x,x,2,1,x,x,x,x,x,x,2,x,2
x,x,x,1,2,x,x,x,x,x,x,1,x,1
x,x,x,1,2,x,x,x,x,x,x,1,x,2
x,x,x,1,2,x,x,x,x,x,x,2,x,1
x,x,x,1,2,x,x,x,x,x,x,2,x,2
x,x,x,2,2,x,x,x,x,x,x,1,x,1
x,x,x,2,2,x,x,x,x,x,x,1,x,2
x,x,x,2,2,x,x,x,x,x,x,2,x,1
x,x,x,2,2,x,x,x,x,x,x,2,x,2
Note that quotes are not an issue if they are around every field.

Re: how to sort on comma delimited fields enclosed in double quotes

Posted: Fri Jul 31, 2020 7:55 am
by coffeeking
Hmm, yes I see that that is working.

I'm wondering now if I may not have another "end of line" problem --- i.e. are all of my lines ending with the required line feed?

This is another file I acquired from a non-unix world, so it probably doesn't have line feeds.

When I execute "file filename" it says "ASCII text with very long lines" but doesn't tell me what character is at the end of the lines (whereas with a previous file it told me the lines ended in carriage returns)

There must be end of line characters of some kind --- since "more filename" displays the records as you would expect ---- ending where they end.

How do I find out what control character is at the end of the lines?

Re: how to sort on comma delimited fields enclosed in double quotes

Posted: Fri Jul 31, 2020 8:06 am
by rene
Easiest is to simply look at the file in a(ny) hex-editor/viewer. F.e. sudo apt-get install xxd, xxd -g 1 the_file | less and look at the spot where you feel/know a newline should be.

Note; while I expect it was only a copy/paste thing, your originally pasted in file has lines that are cut short.

Re: how to sort on comma delimited fields enclosed in double quotes

Posted: Fri Jul 31, 2020 8:32 am
by rene
Oh, and by the way, didn't feel it useful to additionally remark upon after showing an actual command line, but since you didn't ack that part: I hope you are not still/again missing that , after the -t? I.e., that which tells sort that the records are comma-seperated?

Re: how to sort on comma delimited fields enclosed in double quotes

Posted: Fri Jul 31, 2020 8:53 am
by coffeeking
Thanks. This hex viewer is showing hex '0a' at the end of the records, so evidently the required line feed is there.

But now I'm stumped again. If the problem I'm having sorting this file is not a end of line problem, then I don't know what it is.

This file is very big, and the records are very big. Could I be encountering a size limit of some kind for the sort command?

I am able to sort the file, however. It is only when I attempt to sort using keys (the -k parameter) that I get the error message "sort: multi-character tab '-k5,5'"

I have no idea what this message means.

Re: how to sort on comma delimited fields enclosed in double quotes

Posted: Fri Jul 31, 2020 9:06 am
by Kadaitcha Man
coffeeking wrote:
Fri Jul 31, 2020 8:53 am
Could I be encountering a size limit of some kind for the sort command?
You were given accurate details about that only a few days ago.

Re: how to sort on comma delimited fields enclosed in double quotes

Posted: Fri Jul 31, 2020 9:13 am
by rene
coffeeking wrote:
Fri Jul 31, 2020 8:53 am
I have no idea what this message means.
Yes you do. Look one message above your own.

Re: how to sort on comma delimited fields enclosed in double quotes

Posted: Fri Jul 31, 2020 9:14 am
by coffeeking
Oh my gosh! So sorry! I was doing that correctly initially but in my haste and confusion subsequently forgot it!! What a rookie error!

Sort seems to be working now!

Thank you again for your help. I appreciate so much the help of the experts and "old hands" on this forum!!

P.S. Yes, you're right about the size question. But every time I'm having problems getting the sort to work, I keep thinking that the size of the file (millions of records) or the size of the record (several hundred bytes) must be the problem. I am amazed that linux is able to manage these very big numbers with such ease!

Re: how to sort on comma delimited fields enclosed in double quotes

Posted: Fri Jul 31, 2020 1:50 pm
by deck_luck
Been there. I call it target fixation when you keep staring at the code and focusing on what you believe is the problem and ignoring the rest. Countless times I will walk away to get a coffee or just do something else for an hour, and when I come back to the erroneous code it slaps me in the forehead. Why didn't I see that before. :lol:


Kadaitcha Man is the master of creepy disguises.