limit on sort file size?

coffeeking
Level 1
Posts: 46
Joined: Mon Dec 23, 2019 8:54 pm

limit on sort file size?

Post by coffeeking »

Is there a limit on the size of a file you can sort with the sort command (in terminal)? I've been sorting files OK, but, when I try to sort a very big file, what comes back is just the original file unsorted, no error messages.
Moonstone Man
Level 16
Posts: 6054
Joined: Mon Aug 27, 2012 10:17 pm

Re: limit on sort file size?

Post by Moonstone Man »

coffeeking wrote: Sat Jul 25, 2020 9:53 am Is there a limit on the size of a file you can sort with the sort command (in terminal)? I've been sorting files OK, but, when I try to sort a very big file, what comes back is just the original file unsorted, no error messages.
Yes, there is a limit to sort, it's called disk space.
coffeeking

Re: limit on sort file size?

Post by coffeeking »

Well, I've got boatloads of disk space, way, way more than the size of this file and the size of the sorted version, but every time I try to sort it, I get the same, unsorted file returned, nothing sorted, and no error messages. It's like the operating system is saying "Nope, this is too big to sort, I'm just going to copy the file!"
xenopeek
Level 25
Posts: 29588
Joined: Wed Jul 06, 2011 3:58 am

Re: limit on sort file size?

Post by xenopeek »

If available memory is too small to sort the file, sort will use /tmp to store temporary files. If /tmp is a memory filesystem and the file is too big, sort possibly can't complete, but I'd expect an error message in that case. You can check with findmnt /tmp; if that reports FSTYPE tmpfs or similar, your /tmp is a memory filesystem. You may then try running sort with the -T/--temporary-directory=DIR option to choose a different directory for temporary files: make a temp directory in your home directory and run sort -T /home/username/temp filename. That may work better if /tmp is mounted in memory.
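A minimal sketch of that check and the -T workaround, assuming GNU coreutils and util-linux. The file and directory names are made up for illustration, and a throwaway directory from mktemp stands in for the temp directory in your home:

```shell
# Print the filesystem type backing /tmp. No match means /tmp is not a
# separate mount and simply lives on the same filesystem as /.
findmnt -n -o FSTYPE /tmp || echo "/tmp is not a separate mount point"

# Hypothetical sample data, only for this sketch.
workdir=$(mktemp -d)
printf '%s\n' pear apple mango > "$workdir/unsorted.txt"

# Tell sort to keep its temporary files in a directory on disk
# instead of the default /tmp.
mkdir -p "$workdir/sorttmp"
sort -T "$workdir/sorttmp" -o "$workdir/sorted.txt" "$workdir/unsorted.txt"
cat "$workdir/sorted.txt"
```

For a small file like this sort never spills to disk, so -T only matters once the input no longer fits in sort's memory buffer.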
coffeeking

Re: limit on sort file size?

Post by coffeeking »

Thanks for this information.

I did the findmnt /tmp and it didn't return anything!

So I created the temp directory in my home directory and ran "sort -T /home/rod/temp -o sortedResidentList.txt OhioResidentList.txt"

Unfortunately, the result was no different ---- file simply copied, no error messages.
xenopeek

Re: limit on sort file size?

Post by xenopeek »

No response from findmnt would in this case mean /tmp is stored where / is stored, so on disk. So there's no need for the temp directory in your home directory.

I can't say what's happening. How big a file are we talking about?
coffeeking

Re: limit on sort file size?

Post by coffeeking »

7.5 million records ---- file is not quite 500MB ----- too big for the sort command?
rene
Level 20
Posts: 12212
Joined: Sun Mar 27, 2016 6:58 pm

Re: limit on sort file size?

Post by rene »

Can only report that over here sort -o sorted unsorted works fine with a 534M "unsorted" file.
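For anyone who wants to repeat this kind of test, here is a rough sketch assuming GNU coreutils. The generated file is only a stand-in (far smaller than 534M, and the names are hypothetical); the point is that sort -c checks the result for you instead of eyeballing it:

```shell
workdir=$(mktemp -d)

# Build a descending list of 200000 numbers as a stand-in for a big unsorted file.
seq 200000 -1 1 > "$workdir/unsorted"

# Sort numerically into a separate output file.
sort -n -o "$workdir/sorted" "$workdir/unsorted"

# sort -c only checks the order: exit status 0 means already sorted.
if sort -n -c "$workdir/sorted"; then echo "sorted: in order"; fi
if ! sort -n -c "$workdir/unsorted" 2>/dev/null; then echo "unsorted: out of order"; fi

# Sanity check: the line count should match the number of records you expect.
wc -l < "$workdir/sorted"
```

If wc -l reports far fewer lines than the file has records, sort is not seeing the records as separate lines.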
coffeeking

Re: limit on sort file size?

Post by coffeeking »

Thanks for running this test. Evidently the file size is not the problem. My records are comma delimited and have some kind of control character at the end of each record, a carriage return or line feed or something like that. Would either of these 2 characteristics be a problem?

I notice when I view the file (unsorted or sorted) that the records are not displayed one to a line but are just kind of spewed across the screen. Is it possible that the control character at the end of my records is not being recognized by sort as the end of the records? In other words, sort sees my file as one big record?

What kind of control character is needed at the end of each record for linux to recognize it as the end of a record?
coffeeking

Re: limit on sort file size?

Post by coffeeking »

Is there some way for me to figure out what this control character is at the end of all of my records and then perhaps to tell the sort to recognize this character as the end of a record?
rene

Re: limit on sort file size?

Post by rene »

sort as such works on text files, i.e., on records otherwise known as lines, separated by LF characters (decimal 10). cat -A <file> will show each LF as $. I don't believe sort accepts any line separator other than LF or NUL (decimal 0, via -z). If your file is more involved or separated differently, you either should not use plain sort or will at least have to preprocess it a bit before you can.
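To make the cat -A check concrete, here is a small sketch assuming GNU cat (the -A option is not available in BSD cat). The two sample files are invented for illustration: one uses LF line endings, the other uses CR only:

```shell
workdir=$(mktemp -d)
printf 'b,2\na,1\n' > "$workdir/lf.txt"        # LF after each record
printf 'b,2\ra,1\r' > "$workdir/cr_only.txt"   # CR after each record, no LF

# LF file: every line is shown with a trailing $.
cat -A "$workdir/lf.txt"

# CR-only file: each CR is shown as ^M and no $ appears at all.
cat -A "$workdir/cr_only.txt"
echo

# wc -l counts LF characters, i.e. the line ends that sort recognizes.
wc -l < "$workdir/lf.txt"        # 2
wc -l < "$workdir/cr_only.txt"   # 0
```

A count of 0 means sort treats the whole file as a single record, so there is nothing for it to reorder.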
coffeeking

Re: limit on sort file size?

Post by coffeeking »

I've tried again sorting a much smaller file with the same characteristics, comma delimited, CR at end of lines, and it sorts fine!

So the only difference between the sorts that are working and the sorts that aren't working seems to be the size of the file. ------> small file works, big one doesn't!

Am I running into some kind of file size, or other resource, limit when I try to sort this file that is 7.5 million records big?
coffeeking

Re: limit on sort file size?

Post by coffeeking »

rene, thanks for your reply. I did not see your reply before I posted my previous reply.

I did a "file filename" and this told me that there is a carriage return at the end of each line (CR)

And, when I do a "more filename" I do see the file displayed using the CR at the end of each line, i.e., each line of the file is displayed on its own line. (the less command apparently doesn't use these CR characters and therefore doesn't display each line of text on its own line)

In my last reply, I indicated that I've essentially given up on the theory that the CR character is not being recognized by the sort command.

I am able to sort small files with the CR's at the end of the line, but I am not able to sort the 7.5 million line file with the CR's at the end of the line.

At this point, it really does seem to be a matter of file size. I don't know what else to think.
coffeeking

Re: limit on sort file size?

Post by coffeeking »

OK, sorry, you're all probably getting very impatient with all of my false starts and misdirection here! Sorry!

I think rene is right on the money!

Now I see that at the end of every record in the small file for which the sort is working --------> a line feed character!

This line feed character is NOT at the end of every record in the big file for which the sort is not working!!

Therefore, I am theorizing now that it is this missing line feed character that is the problem.

Is there any way to either: (1) add the line feed character to the end of every line in the big file, or (2) tell the sort to use the carriage return (CR) to recognize the end of each line instead of the line feed?

Thank you for hanging in there with me!
hydrurga
Level 5
Posts: 746
Joined: Sun Nov 15, 2015 4:08 pm

Re: limit on sort file size?

Post by hydrurga »

Consider installing and using the dos2unix package. Man: https://linux.die.net/man/1/dos2unix
rene

Re: limit on sort file size?

Post by rene »

Pardon being late to reply each time: was doing something else...

Do I understand correctly that file reports literally CR and not CRLF? The latter would be the DOS convention and would mean dos2unix is useful indeed, but if your records are literally CR-separated you may as well do it yourself:

Code:

cat unsorted | tr '\r' '\n' | sort -o sorted
No, sort does not seem to allow line separators other than LF and NUL. It does allow any field separator, i.e., so as to sort on a custom field within a line, but if a standard lexical sort on the line itself / the first field of the line is what your data needs, the above should do it.
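A self-contained sketch of the same pipeline on a tiny CR-separated sample, assuming GNU coreutils. The names in the sample are invented; the last command shows the CRLF case, where the CR is deleted rather than translated:

```shell
workdir=$(mktemp -d)
printf 'Smith,Bob\rAdams,Ann\rJones,Cy\r' > "$workdir/unsorted"

# Without preprocessing there is no LF, so sort sees one long record and
# returns it in the original order (cat -A makes the CRs visible as ^M).
sort "$workdir/unsorted" | cat -A

# Translate every CR into LF first, then sort the resulting lines.
tr '\r' '\n' < "$workdir/unsorted" | sort -o "$workdir/sorted"
cat "$workdir/sorted"

# For DOS-style CRLF input, delete the CR and keep the LF instead.
printf 'Smith,Bob\r\nAdams,Ann\r\n' | tr -d '\r' | sort
```

Feeding the file to tr with a redirect does the same job as cat unsorted | tr, just with one process fewer.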
coffeeking

Re: limit on sort file size?

Post by coffeeking »

Wow! That piping into the 'tr' transform of \r to \n worked like a champ! Thank you!! The file is sorting fine now.

Just one more question (I hope!).

Here's the top of my sorted file:

<removed>

As you can see, the sort seems to be completely ignoring the commas.

Do I use the 'key' option for sort to get sort to look at and sort on individual fields? Would each comma represent another field I can identify using the 'key' option?
rene

Re: limit on sort file size?

Post by rene »

coffeeking wrote: Sun Jul 26, 2020 7:36 pm Do I use the 'key' option for sort to get sort to look at and sort on individual fields? Would each comma represent another field I can identify using the 'key' option?
Yes. Although I'm not in fact sure this is what you need to happen, to sort on e.g. comma-separated field 2:

Code:

sort -t, -k 2 -o sortedonfield2 <source>
See man sort for a fuller description of the key-format.
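A worked example of the key syntax, as a sketch on invented data (the file name and records are hypothetical). Note that -k 2 means "from field 2 to the end of the line", while -k2,2 restricts the key to field 2 alone. LC_ALL=C is an addition here: it forces plain byte-order comparison, whereas many non-C locales give punctuation such as commas little or no weight when comparing, which may be why a plain sort appears to ignore them:

```shell
workdir=$(mktemp -d)
printf '%s\n' 'Smith,Bob,Dayton' 'Adams,Zoe,Akron' 'Jones,Amy,Toledo' > "$workdir/people.csv"

# -t,    fields are separated by commas
# -k2,2  the sort key starts and ends at field 2
LC_ALL=C sort -t, -k2,2 -o "$workdir/by_field2.csv" "$workdir/people.csv"
cat "$workdir/by_field2.csv"

# Several keys: order by field 3 first, then by field 1 to break ties.
LC_ALL=C sort -t, -k3,3 -k1,1 "$workdir/people.csv"
```

This treats every comma as a separator, so it is not suitable for CSV files with quoted fields that contain commas.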

By the way, I edited this post minutes after posting to remove that actual names list: it might be considered privacy sensitive? If so, you may want to do the same to yours.