[SOLVED] AWK: State-Aware Pattern Matching

About writing shell scripts and making the most of your shell
Forum rules
Topics in this forum are automatically closed 6 months after creation.
Dark Owl
Level 5
Posts: 553
Joined: Sat May 02, 2020 7:43 am
Location: Brit

[SOLVED] AWK: State-Aware Pattern Matching

Post by Dark Owl »

I'm hoping there are some AWK gurus here.

For beginners: an AWK script essentially breaks down as:

/pattern1/ { operation1 }
/pattern2/ { operation2 }
...


Then the input is processed by reading the first record (a text line by default), comparing it with the pattern definitions, and running the operations for all patterns that find a match in the record. Then the next record is read, until end-of-file.
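A minimal sketch of that processing model, assuming a POSIX awk (gawk, mawk, etc.) is on the PATH. The function name demo_rules is hypothetical. Every rule is tested against every record, so a line matching two patterns runs both actions:

```shell
# Hypothetical helper: two independent pattern/action rules.
demo_rules() {
  awk '
    /foo/ { print "rule 1 matched: " $0 }   # tested first on each record
    /bar/ { print "rule 2 matched: " $0 }   # then tested on the same record
  '
}

# Usage: "foo bar" satisfies both patterns, so both actions run for it;
# "baz" matches neither and produces no output.
printf 'foo\nfoo bar\nbaz\n' | demo_rules
```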

What I want to do is have a state variable so that I can control which patterns are tested according to what section of an input file is currently being read. I know how to do that and make the operations conditional, but that means duplicating the state awareness through every operation:

/pattern1/ { if (state == "A") { operation1a } else if (state == "B") { operation1b }... }
/pattern2/ { if (state == "A") { operation2a } else if (state == "B") { operation2b }... }
...


...but then every pattern needed in any state has to carry a branch for every state.
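A runnable sketch of that "state test inside every action" layout, assuming a POSIX awk. The "state A"/"state B" header lines and the function name are hypothetical, chosen only to make the duplication visible: each pattern repeats the full list of states.

```shell
# Hypothetical helper: the state is checked inside every action.
state_in_actions() {
  awk '
    /^state / { state = $2; next }   # a header line switches the state
    /foo/ {
      if (state == "A") print "A: foo"
      else if (state == "B") print "B: foo"
    }
    /bar/ {
      if (state == "A") print "A: bar"
      else if (state == "B") print "B: bar"
    }
  '
}

printf 'state A\nfoo\nbar\nstate B\nfoo\nbar\n' | state_in_actions
```

Adding a third state here means editing every existing rule, which is the maintenance cost described above.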

What would be more efficient is to create a switch on testing the patterns in the first place:

if (state == "A") {
/pattern1A/ { operation1A }
/pattern2A/ { operation2A }
...
}
if (state == "B") {
/pattern1B/ { operation1B }
/pattern2B/ { operation2B }
...
}
...


...that way the patterns defined for State B need not be the same as for State A, and only patterns relevant to (say) State B need to be specified in the State B section.

The trouble is, AWK does not process the script in that way. It is procedural within the { operation } section, but outside that it looks at every pattern definition and applies each of them to the input record. "if" would not be valid syntax outside an operation.

Any ideas?
Last edited by LockBot on Wed Dec 28, 2022 7:16 am, edited 3 times in total.
Reason: Topic automatically closed 6 months after creation. New replies are no longer allowed.
Currently: Linux Mint 21.2 Cinnamon 64-bit 5.8.4, AMD Ryzen5 + Geforce GT 710
Previously: LM20.3 LM20.2 LM20.1, LM20, LM20β, LM18.2
rene
Level 20
Posts: 12212
Joined: Sun Mar 27, 2016 6:58 pm

Re: AWK: State Aware Pattern Matching

Post by rene »

Dark Owl wrote: Tue Jun 28, 2022 4:13 am an AWK script essentially breaks down as:
/pattern1/ { operation1 }
/pattern2/ { operation2 }
...

That is not quite correct, and in this case the distinction matters. The /re/ pattern is certainly the most commonly used, but an AWK statement is more generally

pattern { action }
where pattern can be an arbitrary expression, not just a regex match; see man awk. In other words, you can do essentially what you yourself suggest:

$ cat foo.awk
/^section / {
	section = $2;
}
section == 0 {
	if ($0 ~ /foo/) print "0: foo";
	if ($0 ~ /bar/) print "0: bar";
}
section == 1 {
	if ($0 ~ /foo/) print "1: foo";
	if ($0 ~ /baz/) print "1: baz";
}
Note: you can do this more neatly (gawk, for one, knows about switch/case), but this is only a minimal illustration.

This delimits sections with literal lines of the form "section N", for example:

rene@hp8k:~$ cat foo.txt
section 0
foo
bar
baz
section 1
foo
bar
baz
section 2
foo
bar
baz
rene@hp8k:~$ awk -f foo.awk foo.txt
0: foo
0: bar
1: foo
1: baz
So you can basically do as you suggest; awk is a fairly complete procedural language. I'd still advise not to lose yourself too deeply in it, though, because once things really get involved something like Python soon becomes more convenient.
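One subtlety with rene's foo.awk, sketched here assuming a POSIX awk: an unset variable compares equal to 0, so any lines before the first "section" header are treated as section 0. Initialising the variable in a BEGIN block avoids that; the -1 sentinel and both function names are hypothetical choices for illustration.

```shell
# Hypothetical helper: same rule shape as foo.awk, no initialisation.
sections_naive() {
  awk '
    /^section / { section = $2 }
    section == 0 { if ($0 ~ /foo/) print "0: foo" }
  '
}

# Hypothetical helper: section starts at -1, meaning "no header seen yet".
sections_guarded() {
  awk '
    BEGIN { section = -1 }
    /^section / { section = $2 }
    section == 0 { if ($0 ~ /foo/) print "0: foo" }
  '
}

printf 'foo\nsection 0\nfoo\n' | sections_naive     # the leading foo also matches
printf 'foo\nsection 0\nfoo\n' | sections_guarded   # only the foo after the header
```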
Dark Owl

Re: AWK: State Aware Pattern Matching

Post by Dark Owl »

I don't disagree with what you say, but what you've done is move the pattern matching inside the action of a rule whose pattern tests the value of "section". I was aware of that possibility; I was hoping somebody had a way to avoid the explicit pattern matching inside the action.

One idea I have is to stop processing any further pattern rules for the current record using next.
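A minimal sketch of what next does, assuming a POSIX awk; the function name is hypothetical. Once next runs, the remaining rules are skipped for that record and awk reads the following line:

```shell
# Hypothetical helper: the second rule never sees lines containing "skip".
next_demo() {
  awk '
    /skip/ { print "skipped: " $0; next }
    { print "reached last rule: " $0 }
  '
}

printf 'keep\nskip me\n' | next_demo
```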

What about something like:

STATE==n && /pattern/ { action }

?

I'll try that out later.

As to using AWK in preference to anything else, I'm sorry but I'm old-school. It's *much* easier to use something I'm familiar with than something I'm not, and anyway AWK already handles a lot of the nitty-gritty of text file processing which would have to be implemented explicitly in a general-purpose language.
rene

Re: AWK: State Aware Pattern Matching

Post by rene »

I don't understand your comment; what I did was exactly what you yourself suggested in the part of your post starting with "What would be more efficient [ ... ]".

Anyway, I need to be off...
Dark Owl

Re: AWK: State Aware Pattern Matching

Post by Dark Owl »

Yeah, well, maybe I didn't express myself as precisely as some might like. By "efficient" I meant concise, easily understood code rather than execution speed.

Anyway, I updated the previous post to indicate a line of enquiry I shall pursue next...
rene

Re: AWK: State Aware Pattern Matching

Post by rene »

I believe you may have misread my 'section' clauses as being identical; note that the second uses e.g. /baz/ rather than /bar/, i.e., it does what you describe here:
... that way the patterns defined for State B need not be the same as for State A, and only patterns relevant to (say) State B need to be specified in the State B section.
If that's not what you mean I'll give up.
Dark Owl

Re: AWK: State Aware Pattern Matching

Post by Dark Owl »

rene wrote: Tue Jun 28, 2022 7:33 am If that's not what you mean I'll give up.
Why do you think I don't understand what you wrote?

This works (expanding slightly on your example):

F:\Test>type foo2.awk
/^section / {
        section = $2;
}

# Applies the following only to lines found after "section 0"
section == 0 && /foo/ { print "0: foo" }
section == 0 && /bar/ { print "0: bar" }
section == 0 { next }

# Applies the following only to lines found after "section 1"
section == 1 && /foo/ { print "1: foo" }
section == 1 && /baz/ { print "1: baz" }
section == 1 { next }

# Applies the following only to lines where the section switch is not "0" or "1"
{ print $0 }

F:\Test>type foo.txt
section 0
foo
bar
baz
section 1
foo
bar
baz
section 2
foo
bar
baz

F:\Test>gawk -f foo2.awk foo.txt
0: foo
0: bar
1: foo
1: baz
section 2
foo
bar
baz

F:\Test>
It's just a question of which version is easiest to read.
Dark Owl

Re: [SOLVED] AWK: State-Aware Pattern Matching

Post by Dark Owl »

I often see the text "foo" and "bar" used as random sample strings... but why isn't it "fu" and "bar"?
Coggy
Level 5
Posts: 632
Joined: Thu Mar 31, 2022 10:34 am

Re: [SOLVED] AWK: State-Aware Pattern Matching

Post by Coggy »

A long list of STATE==n && /pattern/ { action } lines would be my chosen approach, unless I was really worried about performance.

You could put some next actions in there to cut processing short, but that complicates the structure and makes mistakes more likely when updating the code. So I wouldn't do that unless necessary.
rene

Re: [SOLVED] AWK: State-Aware Pattern Matching

Post by rene »

Dark Owl wrote: Thu Jun 30, 2022 5:38 pm I often see the text "foo" and "bar" used as random sample strings... but why isn't it "fu" and "bar"?
History; http://www.catb.org/jargon/html/F/foo.html
Dark Owl

Re: [SOLVED] AWK: State-Aware Pattern Matching

Post by Dark Owl »

Coggy wrote: Fri Jul 01, 2022 3:52 am You could put some next actions in there to cut processing short
I did (see above). I don't see that terminating a state section with a next complicates anything, unless you want to do some subsequent processing valid for all states.
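A sketch of one way to keep processing that applies to all states while still ending each state section with next, assuming a POSIX awk: put the all-state rules before the state sections, in the same layout as foo2.awk. Counting "bar" lines is a hypothetical stand-in for whatever common processing is needed.

```shell
# Hypothetical helper: common rule first, then state sections ending in next.
common_first() {
  awk '
    /^section / { section = $2 }
    /bar/ { bars++ }                 # runs in every section

    section == 0 && /foo/ { print "0: foo" }
    section == 0 { next }

    section == 1 && /baz/ { print "1: baz" }
    section == 1 { next }

    { print $0 }                     # any other section
    END { print "bar lines: " bars + 0 }
  '
}

printf 'section 0\nfoo\nbar\nsection 1\nbar\nbaz\n' | common_first
```

The END block still runs after the last record even though most records leave the main rules via next.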

I like Rene's approach, because it effectively vectors to the relevant state section (especially if using a case statement). I just don't much like having to make explicit match statements within the state processing.
rene wrote: Fri Jul 01, 2022 4:43 am History; http://www.catb.org/jargon/html/F/foo.html
Very good! Thanks.
Dark Owl

Re: [SOLVED] AWK: State-Aware Pattern Matching

Post by Dark Owl »

Here's a refinement: "$0 ~" is not required, because a bare /regexp/ used as a condition is shorthand for "$0 ~ /regexp/":

F:\Test>type foo3.awk
/^section / {
        section = $2;
}

{  switch (section) {

   case 0:

      if (/foo/) { print "0: foo" }
      if (/bar/) { print "0: bar" }
      break

   case 1:

      if (/foo/) { print "1: foo" }
      if (/baz/) { print "1: baz" }
      break

   default:

      print $0

   }
}

F:\Test>type foo.txt
section 0
foo
bar
baz
section 1
foo
bar
baz
section 2
foo
bar
baz

F:\Test>gawk -f foo3.awk < foo.txt
0: foo
0: bar
1: foo
1: baz
section 2
foo
bar
baz

F:\Test>
Getting close! I had an idea the ?: conditional operator might be used instead of the "if", but Gawk baulked at having anything other than an expression after the "?" (a statement such as print is not an expression).
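A sketch of where ?: does fit, assuming a POSIX awk: both branches must be values, so the pattern does the filtering and ?: only chooses the string to print. The function name is hypothetical.

```shell
# Hypothetical helper: ?: picks a label inside the print expression.
ternary_demo() {
  awk '
    /^section / { section = $2; next }
    section == 0 && /foo|bar/ { print "0: " (/foo/ ? "foo" : "bar") }
  '
}

printf 'section 0\nfoo\nbar\nbaz\n' | ternary_demo
```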

NB: The switch/case statement is not available in all versions of AWK, and Gawk won't recognise it in compatibility mode (-c / --traditional).
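For awks without switch, a portable sketch of the same logic as foo3.awk is an if / else if chain inside a single action; this assumes only POSIX awk features, and the function name is hypothetical.

```shell
# Hypothetical helper: foo3.awk rewritten without switch.
portable_sections() {
  awk '
    /^section / { section = $2 }
    {
      if (section == 0) {
        if (/foo/) print "0: foo"
        if (/bar/) print "0: bar"
      } else if (section == 1) {
        if (/foo/) print "1: foo"
        if (/baz/) print "1: baz"
      } else {
        print $0                     # any other section: pass through
      }
    }
  '
}

printf 'section 0\nfoo\nbar\nbaz\nsection 1\nfoo\nbar\nbaz\nsection 2\nfoo\n' | portable_sections
```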