Simple Gawk script not doing as I want, what am I doing wrong? Fixed the mistake I saw but it's still not working.


Post by going-bald »

The script is intended to scan the column titles of a set of CSV files and, in each file, look for two column titles, if they exist.
It seems that only one of the two columns exists in any given CSV; I have yet to find a file where both exist.
I want the output, sent to another CSV, to tell me whether the script found either column and where it was found, i.e. the field/column number. For the other, 'missing', column I want the script to write a suitable message in the appropriate cell of the output CSV.
I have written some versions of the script that work, though they are more complicated than the one in this post, but I am wondering what is wrong with this script.

Code:

file1=$(date "+1_Flightlog_summary_%Y-%m-%d_[%H-%M-%S].csv")
file2=$(date "+2_set_max_height_all_hopefully_%Y-%m-%d_[%H-%M-%S].csv")

gawk -F, '{if(NR ==1){
		      print "a ,FILENAME ,FNR ,NR ,a ,set_max_height_col_label ,set_max_height_col_nos ,a ,max_height_status ,status_clo_nos ,a ,activation_date ,activ_date_col_nos" > ("output/set_max_height_all_hopefully.csv")
	      	      print " " >> ("output/set_max_height_all_hopefully.csv")
  		      }
	   	      if (FNR ==1) {Limit=0 ;status=0 ;act_date=0 ;cCol_titles_found=0 }		      
		      if ((FNR <=2) && (cCol_titles_found <1) && ($1 !~/sep/) && ($1 !=""))     { cCol_titles_found++		      
		      	      	   for (i=1; i<=NF; i++){if  (($i  ~/HOME.heightLimit/)	      && ($i !~/HOME.heightLimitStatus/))	{Limit = i }
 				   	    		 if  (($i  ~/HOME.heightLimitStatus/) || ($i ~/HOME.isReachedLimitHeight/))	{status = i }
							 if  (($i  ~/RECOVER.activeTimestamp/)|| ($i ~/DETAILS.activeTimestamp/))	{act_date = i }
					      	 	 }				   
		      		   if (Limit ==0)  	       {$(Limit)    =1}
		      		   if (status ==0) 	       {$(status)   =2}
		      		   if (act_date ==0) 	       {$(act_date) =3}
		      		   print "  c,"FILENAME","FNR","NR", c ,"$(Limit)","Limit", c ,"$(status)","status",  c  ,"$(act_date)","act_date >> ("output/set_max_height_all_hopefully.csv")
				   
# close ------------------------------------------------------------------------------------(FNR <=2) && ......
												 }

			 }' *.csv > output/$file1

			 pwd
			 mv output/set_max_height_all_hopefully.csv        output/$file2
			 rm output/1*.csv

I know there is something wrong with the naming of the output file, but I haven't looked at that yet, apart from the "mv" and "rm" bodge at the end of the script; that is not what is causing the problem.
The script has oddities left over from the bigger script from which it was taken, e.g. "Col_titles_found" and "cshl", but I do not think they cause a problem.
The output of the above is shown in the second attachment. The first attachment shows the output of one of the other versions, which is getting closer to what I want.
Thanks for any assistance.


PS: As far as I can see from the tests I have made, the MODIFIED script works correctly up to the point where the "for (i=1; i<=NF; i++)" loop is closed. I get the correct data in the output CSV if the print statements immediately after the end of the "for i" loop do not use "$(Limit)" and "$(status)" in the cases where $(Limit) and $(status) would equate to $0.

I CANNOT see what is wrong with the following 4 lines:

Code:

if (Limit ==0)  	       {$(Limit)    =1}
if (status ==0) 	       {$(status)   =2}
if (act_date ==0) 	       {$(act_date) =3}
print "  c,"FILENAME","FNR","NR", c ,"$(Limit)","Limit", c ,"$(status)","status",  c  ,"$(act_date)","act_date >> ("output/set_max_height_all_hopefully.csv")
Please help.

PPS: I added the search for the third column, "activeTimestamp", just to see whether the problem was due to "HOME.heightLimit" being a substring of "HOME.heightLimitStatus", but the output for $(act_date) is wrong too.
Attachments
Screenshot from 2023-03-11 02-40-16.png
Screenshot from 2023-03-11 02-13-01.png

Re: Simple Gawk script not doing as I want, what am I doing wrong? Fixed the mistake I saw but it's still not working.

Post by Termy »

I've had to address the formatting of the bulk of it, because it was very difficult to read.

Code:

gawk -F, '
	{
		if (NR == 1) {
			print "a ,FILENAME ,FNR ,NR ,a ,set_max_height_col_label ,set_max_height_col_nos ,a ,max_height_status ,status_clo_nos ,a ,activation_date ,activ_date_col_nos" > "output/set_max_height_all_hopefully.csv"
			print " " >> "output/set_max_height_all_hopefully.csv"
		}

		if (FNR == 1) {
			Limit=0
			status=0
			act_date=0
			cCol_titles_found=0
		}		      

		if (FNR <= 2 && cCol_titles_found < 1 && $1 !~ /sep/ && $1 != "") {
			cCol_titles_found++		      
			for (i = 1; i <= NF; i++) {
				if ($i ~ /HOME.heightLimit/ && $i !~ /HOME.heightLimitStatus/) {
					Limit = i
				}

				if ($i ~ /HOME.heightLimitStatus/ || $i ~ /HOME.isReachedLimitHeight/) {
					status = i
				}
				
				if ($i ~ /RECOVER.activeTimestamp/ || $i ~ /DETAILS.activeTimestamp/) {
					act_date = i
				}
			}
				   
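			# A column number of 0 means the title was not found. Do not use 0
			# as a field index: $0 is the whole record, and assigning to it
			# replaces the record. "not found" is only a placeholder message.
			Limit_label = Limit ? $Limit : "not found"
			status_label = status ? $status : "not found"
			act_date_label = act_date ? $act_date : "not found"

			print "  c," FILENAME "," FNR "," NR ", c ," Limit_label "," Limit ", c ," status_label "," status ",  c  ," act_date_label "," act_date >> "output/set_max_height_all_hopefully.csv"
		}
	}
' *.csv
I don't have the files it's parsing, so this is untested; I apologise if I left a typo.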
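
To show why the original four lines misbehave: when a column number is 0, $(status) is $0, the whole record, so assigning to it replaces the record and awk re-splits the fields. Below is a minimal sketch of that effect and of one way to guard against it. It assumes plain awk (gawk should behave the same way here), a made-up header line, and a placeholder "not found" message; none of these come from the real files.

```shell
# Hypothetical header line: only the third title is one the script looks for.
header='CUSTOM.date,CUSTOM.time,HOME.heightLimit'

# Original approach: status stays 0, so $(status) = 2 overwrites the record
# and the title that Limit points at is gone by the time it is printed.
broken=$(printf '%s\n' "$header" | awk -F, '{
	Limit = 0; status = 0
	for (i = 1; i <= NF; i++)
		if ($i == "HOME.heightLimit") Limit = i
	if (status == 0) $(status) = 2
	print "[" $(Limit) "]," Limit ",[" $(status) "]," status
}')

# Guarded approach: 0 is never used as a field index, the label is chosen
# separately from the column number.
guarded=$(printf '%s\n' "$header" | awk -F, '{
	Limit = 0; status = 0
	for (i = 1; i <= NF; i++)
		if ($i == "HOME.heightLimit") Limit = i
	Limit_label = Limit ? $Limit : "not found"
	status_label = status ? $status : "not found"
	print "[" Limit_label "]," Limit ",[" status_label "]," status
}')

printf '%s\n%s\n' "$broken" "$guarded"
```

The first line printed has an empty label for the column that was actually found, which is the kind of damage described in the question; the second keeps the label and reports the missing column separately.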

I've also addressed:
  • Unnecessary parentheses around each condition in if statements.
  • Invalid variable syntax mentioned by previous poster.
  • Unneeded parentheses around the file path in the print redirections.
In an if statement, you need only the outer parentheses, which mark where the condition starts and ends. Extra parentheses only come in when you group several conditions so that one group can be tested with or against another.

For the redirection, you don't need parentheses around the file path when it is a single string. They are harmless there, which is why it worked: the parentheses only group the expression that gives the file name. They do become useful when the name is built by concatenation, because they make it clear that the whole expression is the target. Note also that print is a statement rather than a function. You can read about the syntax of redirection and other useful things under the subheading I/O Statements of the gawk(1) man page.
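
As an illustration of the redirection syntax, here is a minimal sketch, assuming plain awk and a throwaway directory; the file names and the dir variable are made up for the example. It shows parentheses around a target that is built by concatenation, which is the case where they are worth adding.

```shell
# Work in a temporary directory so nothing real is overwritten.
tmpdir=$(mktemp -d)

printf 'a,b\nc,d\n' | awk -F, -v dir="$tmpdir" '
{
	# The target is a concatenation, so it is parenthesised to make the
	# whole expression the file name. > truncates once, when the file is
	# first opened, and later prints in the same run are added after it.
	print $1 > (dir "/first_" "column.txt")
	print $2 > (dir "/second_" "column.txt")
}'

first=$(cat "$tmpdir/first_column.txt")
second=$(cat "$tmpdir/second_column.txt")
printf '%s\n%s\n' "$first" "$second"

rm -r "$tmpdir"
```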

Remember, ~ is the regex match operator. Comparing something with ~ /this/ will match if the string 'this' is, or is contained anywhere in, the thing you're comparing it to. An unescaped . in a regex also matches any single character, not just a literal dot. Oftentimes, we just need to compare something to a string, which can be done with Foo == "Bar" instead of Foo ~ /Bar/.
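
The difference can be seen with a small sketch, assuming plain awk and three made-up titles (the third one is deliberately wrong, to show what the unescaped dot lets through):

```shell
# Hypothetical column titles, one per line.
titles='HOME.heightLimit
HOME.heightLimitStatus
HOMExheightLimit'

# ~ matches anywhere in the value, and an unescaped . matches any character.
regex_hits=$(printf '%s\n' "$titles" | awk '$0 ~ /HOME.heightLimit/ { n++ } END { print n + 0 }')

# Anchoring the regex and escaping the dot narrows it to the exact title.
anchored_hits=$(printf '%s\n' "$titles" | awk '$0 ~ /^HOME\.heightLimit$/ { n++ } END { print n + 0 }')

# == compares the whole string literally.
exact_hits=$(printf '%s\n' "$titles" | awk '$0 == "HOME.heightLimit" { n++ } END { print n + 0 }')

printf '%s %s %s\n' "$regex_hits" "$anchored_hits" "$exact_hits"
```

The unanchored regex matches all three titles, while the anchored regex and the string comparison each match only the first.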

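If you just need to compare against exact strings rather than a regex (which assumes the title in the file is exactly that string, with no surrounding quotes or spaces), you could simplify quite a bit: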

Code:

				if ($i ~ /HOME.heightLimit/ && $i !~ /HOME.heightLimitStatus/) {
					Limit = i
				}
Becomes:

Code:

				if ($i == "HOME.heightLimit") Limit = i
When an if statement has only one statement to execute, you can omit the braces; the same applies to for and while loops.
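
A short sketch of the brace-free form, assuming plain awk and a made-up input line. Only the single statement that follows belongs to each if or for, so anything more than that still needs braces.

```shell
# Count the fields longer than three characters. The for body is a single
# if statement and the if body is a single n++, so neither needs braces.
long_fields=$(printf 'ab,abcd,abcde,a\n' | awk -F, '{
	n = 0
	for (i = 1; i <= NF; i++)
		if (length($i) > 3)
			n++
	print n
}')

printf '%s\n' "$long_fields"
```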

I imagine addressing the above post will solve the problem, but addressing this one should also help, especially in the long run.
I'm also Terminalforlife on GitHub.