SOLVED regex works in Komodo Edit but not sed (2 questions)

Post by sadhu »

1. I'm editing some extraneous <span>...</span> tags out of a group of around 4000 web pages. As there are lots of other search/replacements to do, I'm using a bash script that contains a lot of sed commands.

The text I want to change looks something like this:

Code:

some text, more text <span class="var" title="some-title" id="note56">SAVE THIS</span> main text resumes

I want the final result to look like this: "some text, more text SAVE THIS main text resumes".

This command in the bash script doesn't work:*

Code:

sed -i -re 's,<span class="var" title=".*?">(.*)</span>,\1,g' webpage.html

* What I mean by "doesn't work": the bash script (containing only the above sed command) produces no change in the HTML file, but the same regex works fine in Komodo Edit.

2. I also tried to make the regex "non-greedy" so that it would stop at the first </span>, but nothing I tried worked (in Komodo Edit, anyway).

So could someone help me write a non-greedy sed replacement command that stops at the first instance of "</span>"?


Thanks
-sadhu
Last edited by sadhu on Mon Apr 13, 2020 2:26 am, edited 2 times in total.
sabbe sattā bhavantu sukhitattā. LN 19.3-64 Cinn 4.4.8, Mobo: ASUSTeK STRIX B250G GAMING v, Dual core Pntm G4560, Intel Gfx. Laptop:ASUS, Core i3, 4G RAM Intel Gfx

Post by xenopeek »

As far as I recall, sed doesn't support non-greedy matching.
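
A quick way to see the greedy behaviour is to run a pattern with a single `.*` over a line that holds two spans. This is only a sketch: the sample line and the variable names `line` and `greedy` are made up for illustration, and `-E` is used here as the more portable spelling of `-r`.

```shell
# Hypothetical sample line with two spans (not taken from the original pages).
line='a <span class="var" title="t1">ONE</span> b <span class="var" title="t2">TWO</span> c'

# (.*) is greedy: it runs on to the LAST </span> on the line, so only the
# first opening tag and the last closing tag are stripped.
greedy=$(printf '%s\n' "$line" | sed -E 's,<span[^>]*>(.*)</span>,\1,g')

printf '%s\n' "$greedy"
# prints: a ONE</span> b <span class="var" title="t2">TWO c
```

The inner `</span>` and the second opening tag survive because the single match swallowed everything between the first `<span ...>` and the final `</span>`.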

Assuming the title can't contain a double quote and there are no HTML elements nested inside the span elements, you can emulate non-greedy matching like this:
sed -i -re 's,<span class="var" title="[^"]*">([^<]*)</span>,\1,g' webpage.html
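
The sample line in the question carries an extra `id="..."` attribute after the title, so a pattern that expects `">` right after the title value would not match it. The sketch below is a variant, not the command above: it adds `[^>]*` to skip any remaining attributes, on the assumption that attribute values never contain `>`. The function name `strip_spans` and the second test line are made up for illustration, and nothing is edited in place.

```shell
# Sample line from the question, plus a made-up line with two spans.
line1='some text, more text <span class="var" title="some-title" id="note56">SAVE THIS</span> main text resumes'
line2='a <span class="var" title="t1">ONE</span> b <span class="var" title="t2">TWO</span> c'

# [^"]* stops at the closing quote of title, [^>]* (the addition) skips any
# further attributes, and [^<]* stops at the first "<", i.e. the first </span>.
strip_spans() {
  sed -E 's,<span class="var" title="[^"]*"[^>]*>([^<]*)</span>,\1,g'
}

out1=$(printf '%s\n' "$line1" | strip_spans)
out2=$(printf '%s\n' "$line2" | strip_spans)

printf '%s\n' "$out1" "$out2"
# prints:
# some text, more text SAVE THIS main text resumes
# a ONE b TWO c
```

Previewing the output this way before adding `-i` makes it easier to confirm the pattern matches the real pages. A span that contains another element (for example `<b>`) would be left untouched by this pattern, since `[^<]*` cannot cross a tag.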

I don't know why it doesn't work from the bash script. Try calling it with bash -x script, replacing script with the filename of your script, to run it with debug tracing enabled.

Post by sadhu »

Thank you very much! :D

Post by sadhu »

sed can also delete every line between two patterns, which I found to be VERY handy.

Code:

sed -i '/pattern1/,/pattern2/d' filename

This deletes everything from the line containing pattern1 down to and including the line that contains pattern2.
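
The range deletion can be tried safely on a throwaway file first. In this sketch the file contents and the variable names `tmp` and `remaining` are invented for illustration, and `-i` is left out so the result is printed rather than written back.

```shell
# Work on a temporary file so nothing real is modified.
tmp=$(mktemp)
printf '%s\n' 'keep 1' 'pattern1 start' 'drop me' 'pattern2 end' 'keep 2' > "$tmp"

# Without -i, sed prints the result instead of editing the file in place.
remaining=$(sed '/pattern1/,/pattern2/d' "$tmp")

printf '%s\n' "$remaining"
# prints:
# keep 1
# keep 2

rm -f "$tmp"
```

One thing to watch: if pattern2 never appears after pattern1, the range runs to the end of the file, so everything from pattern1 onward is deleted.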

-sadhu
