
Request for volunteers to vet sed error case

Posted: Wed Oct 14, 2020 11:55 am
by markfilipak
Below is an email I'm preparing to send to bug-sed@gnu.org.
Before I send it, I need someone to vet the error. Someone who is not running Mint in a VM would be preferred.
Any volunteers?

Of course, your ideas are welcome.

Thanks,
Mark.
=====
Kindly expedite this, and contact me if you have any questions.

The file "1,073,709,056 bytes" provokes an error (and zero output), but only when piped from 'tr' and only for one particular pattern: /00000100/.
The file "1,073,739,776 bytes" succeeds with identical parameters.
The pipe through 'tr' alone does not appear to be the problem.

$ sed --version
sed (GNU sed) 4.2.2
...
$ xxd -p -u "1,073,709,056 bytes" | tr -d '\n' | sed -r 's/00000100/\x0D\x0A&/g' > foo.txt
sed: couldn't re-allocate memory
$ xxd -p -u "1,073,709,056 bytes" | tr -d '\n' | sed -r 's/000001/\x0D\x0A&/g' > foo.txt
$ xxd -p -u "1,073,709,056 bytes" | sed -r 's/00000100/\x0D\x0A&/g' > foo.txt
$ xxd -p -u "1,073,739,776 bytes" | tr -d '\n' | sed -r 's/00000100/\x0D\x0A&/g' > foo.txt
$

You probably want the two source files, or perhaps only the source file that provokes the error. Kindly let me know how I can send it to you.
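[Editorial sketch, not part of the original email: a quick way to see what the 'tr' step does to the stream sed receives. xxd -p wraps its hex output into short lines; deleting the newlines leaves one huge, unterminated line, which sed must then hold in memory as a single pattern space. The filenames below and the memory explanation are assumptions, not anything confirmed by the sed maintainers.]

Code: Select all

```shell
# 'tr -d "\n"' collapses many short lines into one unterminated line.
printf 'one\ntwo\nthree\n' > demo.txt

wc -l demo.txt                  # 3 newlines -> "3 demo.txt"
tr -d '\n' < demo.txt | wc -l   # 0: no newlines survive, so one giant "line"
tr -d '\n' < demo.txt | wc -c   # 11 bytes: "onetwothree"

rm demo.txt
```

At 1 GiB of input, xxd -p doubles the size, so that single line is roughly 2 GiB of hex text, which may plausibly explain "couldn't re-allocate memory" on a 32-bit build or a memory-constrained VM.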

Re: Request for volunteers to vet sed error case

Posted: Wed Oct 14, 2020 5:02 pm
by xenopeek
I'm curious, what does this answer:

Code: Select all

wc -l "1,073,709,056 bytes"
wc -l "1,073,739,776 bytes"
From your description, and sed being line-based, I suppose the smaller file has more lines?

Re: Request for volunteers to vet sed error case

Posted: Wed Oct 14, 2020 5:07 pm
by markfilipak
xenopeek wrote:
Wed Oct 14, 2020 5:02 pm
I'm curious, what does this answer:

Code: Select all

wc -l "1,073,709,056 bytes"
wc -l "1,073,739,776 bytes"
From your description, and sed being line based, I suppose the smaller file has more lines?

Code: Select all

$ wc -l "1,073,709,056 bytes"
4758494 1,073,709,056 bytes
$ wc -l "1,073,739,776 bytes"
4647236 1,073,739,776 bytes
They are both binary files (i.e. DVD 'VOB's).
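[Editorial note: wc -l counts 0x0A (newline) bytes rather than "lines" in any textual sense, which is why it returns a meaningful number even for binary VOB files. A minimal illustration:]

Code: Select all

```shell
# Five raw bytes: NUL, LF, 0xFF, LF, 'A' (octal escapes are POSIX printf).
printf '\000\n\377\nA' > blob.bin
wc -l blob.bin   # reports 2 -- the count of 0x0A bytes in the binary data
rm blob.bin
```

Note that after xxd -p | tr -d '\n', none of those newline bytes survive into sed's input anyway; they only affect how xxd reads the file.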

Re: Request for volunteers to vet sed error case

Posted: Thu Oct 15, 2020 5:22 am
by xenopeek
I can't say whether that is the cause, but the smaller file does, as I suspected, have more lines. 111,258 more lines is only about 2.4% more in total, but perhaps that is what bottlenecks sed's memory allocation somehow.

Thinking about it some more, what happens if you split this into two steps instead of using a pipe?
1. xxd -p -u "1,073,709,056 bytes" | tr -d '\n' > temp
2. sed -r 's/00000100/\x0D\x0A&/g' temp > foo.txt
Does that give the same error at step 2?

I suppose there is one more thing to test. If you make a copy of the larger file and replace 111,258 characters in it with newline 0x0A and then retest, does this larger file now give the same error?
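[Editorial sketch, related to the retest idea above: since tr -d '\n' leaves sed a single line roughly twice the size of the file, re-wrapping the hex stream into fixed-width lines with fold keeps each pattern space small. This is illustrative only: a match can straddle a fold boundary unless it happens to be aligned, so it is not a drop-in fix for the original command.]

Code: Select all

```shell
# fold re-inserts newlines so sed works on short lines again.
# Width 8 matches the length of the pattern /00000100/; alignment of the
# match with the fold boundary is assumed here.
printf '0000010012345678' | fold -w 8
printf '0000010012345678' | fold -w 8 | sed -r 's/^00000100$/MATCH/'
```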

BTW, I'm lost as to what the output in foo.txt would be used for?