[Solved] Removing files based on mask of filename

JasonStonier
Level 2
Posts: 56
Joined: Sun Oct 02, 2016 10:51 am

[Solved] Removing files based on mask of filename

Post by JasonStonier » Tue Sep 10, 2019 3:47 pm

I have a system which takes automated pictures on an iPhone (automated scanning of old magazines to an archive) and uploads them to my Linux box, but occasionally the shutter trigger results in multiple pictures of the same page, with filenames that differ only by a few milliseconds:

Code: Select all

jason@JasonMain ~/Pictures $ ls -al *.*.jpeg
-rw-r--r-- 1 jason jason 2264825 Sep 10 18:53 2019-09-10 18-53-31.157 (2).jpeg
-rw-r--r-- 1 jason jason 2264825 Sep 10 18:53 2019-09-10 18-53-31.157.jpeg
-rw-r--r-- 1 jason jason 2260654 Sep 10 18:53 2019-09-10 18-53-31.207.jpeg
-rw-r--r-- 1 jason jason 2916184 Sep 10 18:56 2019-09-10 18-56-17.848.jpeg
-rw-r--r-- 1 jason jason 2907621 Sep 10 18:56 2019-09-10 18-56-18.008 (2).jpeg
-rw-r--r-- 1 jason jason 2907621 Sep 10 18:56 2019-09-10 18-56-18.008.jpeg
-rw-r--r-- 1 jason jason 2766682 Sep 10 18:57 2019-09-10 18-57-24.156 (2).jpeg
-rw-r--r-- 1 jason jason 2766682 Sep 10 18:57 2019-09-10 18-57-24.156.jpeg
-rw-r--r-- 1 jason jason 2765306 Sep 10 18:57 2019-09-10 18-57-24.206.jpeg
-rw-r--r-- 1 jason jason 2856521 Sep 10 18:57 2019-09-10 18-57-58.145 (2).jpeg
-rw-r--r-- 1 jason jason 2856521 Sep 10 18:57 2019-09-10 18-57-58.145.jpeg
-rw-r--r-- 1 jason jason 2847966 Sep 10 18:57 2019-09-10 18-57-58.245.jpeg
Is there any way in the terminal to delete the 'repeats' - for example I would want to keep:

Code: Select all

-rw-r--r-- 1 jason jason 2264825 Sep 10 18:53 2019-09-10 18-53-31.157 (2).jpeg
and remove

Code: Select all

-rw-r--r-- 1 jason jason 2264825 Sep 10 18:53 2019-09-10 18-53-31.157.jpeg
-rw-r--r-- 1 jason jason 2260654 Sep 10 18:53 2019-09-10 18-53-31.207.jpeg
I can't rely on the repeats having (2) in the filename, and they don't usually have the same filesize, but they would always be in the same whole second e.g. '2019-09-10 18-53-31'

I've googled around a lot and there are many ways to remove true duplicates, but that's not what I have here.

Thanks for any help!
Last edited by JasonStonier on Thu Sep 12, 2019 4:29 am, edited 1 time in total.

ColdBootII
Level 5
Posts: 521
Joined: Wed Aug 02, 2017 8:19 am

Re: Removing files based on mask of filename

Post by ColdBootII » Tue Sep 10, 2019 8:23 pm

Hi,

Instead of using a script for this, have you simply tried finding and removing the duplicates with Pix or some other photo organizer?

It sure worked well for me. But if it's very important to you, as I suspect, you can check first by creating a duplicate folder, remove duplicates from it using Pix and then output that ls command in original and duplicate folders, to a text file. Then using diff you'll see for yourself what the tool deleted and whether you can rely on it or not.
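The check described above could be sketched in bash roughly like this. The function name removed_files and both folder paths are placeholders of mine, not anything Pix provides, and the sketch assumes the tool has already been run on the copy:

```shell
# Sketch: list what a dedup tool removed from a working copy of a folder.
# Usage (hypothetical paths): removed_files ~/Pictures ~/Pictures-copy
removed_files() {
  local original=$1 copy=$2
  # In diff output, lines starting with "<" appear only in the first
  # listing, i.e. they are files that no longer exist in the copy.
  diff <(ls -1 "$original") <(ls -1 "$copy") | sed -n 's/^< //p'
}
```

You would make the copy first with something like cp -a ~/Pictures ~/Pictures-copy, run the tool on the copy, and then call the function on the two folders.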

Cheers.

Flemur
Level 17
Posts: 7333
Joined: Mon Aug 20, 2012 9:41 pm
Location: Potemkin Village

Re: Removing files based on mask of filename

Post by Flemur » Tue Sep 10, 2019 9:23 pm

18-53-31 = HH-MM-SS ?

This is all bogus syntax, but the point is to loop thru all the hh/mm/ss and mv each hh/mm/ss.xxx file to the same file (=a), overwriting it each time, then renaming it to the correct hh-mm-ss.000

Code: Select all

for hh = 0 to 23
for mm = 0 to 59
for ss = 0 to 59
  rm a
  for file in "2019-09-10 hh-mm-ss.*.jpeg"  {  # need to FORMAT those numbers into the filename
    mv $file to a
  }
  if (a exists) 
     mv a "2019-09-10 hh-mm-ss.000.jpeg"
}
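For what it's worth, the same idea in real bash syntax might look something like the sketch below. The function name collapse_by_second, the temporary file name and the two arguments are made up for illustration. Note that, like the pseudocode, it keeps whichever file sorts last within a second and renames it to .000, so test it on a copy of the folder first.

```shell
# Sketch of the nested-loop idea: for every second of one day, funnel all
# matching files into one temporary file, then give it a ".000" name.
# Usage (hypothetical): collapse_by_second ~/Pictures 2019-09-10
collapse_by_second() {
  local dir=$1 day=$2 hh mm ss stamp f
  local tmp="$dir/.collapse_tmp"   # plays the role of "a" in the pseudocode
  shopt -s nullglob                # an unmatched glob expands to nothing
  rm -f -- "$tmp"
  for hh in {00..23}; do           # brace expansion keeps the zero padding
    for mm in {00..59}; do
      for ss in {00..59}; do
        stamp="$day $hh-$mm-$ss"
        for f in "$dir/$stamp".*.jpeg; do
          mv -f -- "$f" "$tmp"     # each move overwrites the previous one
        done
        if [ -e "$tmp" ]; then
          mv -- "$tmp" "$dir/$stamp.000.jpeg"
        fi
      done
    done
  done
}
```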
Please edit your original post title to include [SOLVED] if/when it is solved!
Your data and OS are backed up....right?

ColdBootII
Level 5
Posts: 521
Joined: Wed Aug 02, 2017 8:19 am

Re: Removing files based on mask of filename

Post by ColdBootII » Wed Sep 11, 2019 3:50 am

ColdBootII wrote:
Tue Sep 10, 2019 8:23 pm
But if it's very important to you, as I suspect, you can check first by creating a duplicate folder, remove duplicates from it using Pix and then output that ls command in original and duplicate folders, to a text file. Then using diff you'll see for yourself what the tool deleted and whether you can rely on it or not.
Heh, I forgot that Pix asks you to check which file(s) to keep or delete, sorry. That could be impractical if there are too many of them.

JasonStonier
Level 2
Posts: 56
Joined: Sun Oct 02, 2016 10:51 am

Re: Removing files based on mask of filename

Post by JasonStonier » Wed Sep 11, 2019 4:40 am

Flemur wrote:
Tue Sep 10, 2019 9:23 pm
18-53-31 = HH-MM-SS ?

This is all bogus syntax, but the point is to loop thru all the hh/mm/ss and mv each hh/mm/ss.xxx file to the same file (=a), overwriting it each time, then renaming it to the correct hh-mm-ss.000
That's a genius idea. Thanks so much for the pointer, really appreciate the help.

AndyMH
Level 9
Posts: 2641
Joined: Fri Mar 04, 2016 5:23 pm
Location: Wiltshire

Re: Removing files based on mask of filename

Post by AndyMH » Wed Sep 11, 2019 7:45 am

To get rid of lots of duplicate files I use FSlint, which has a GUI. I can't remember whether it is in the Software Manager.
Homebrew i5-8400+GTX1080 Cinnamon 19.0, 3 x Thinkpad T430 Cinnamon 19.0, i7-3632 , i5-3320, i5-3210, Thinkpad T60 19.0 Mate

Flemur
Level 17
Posts: 7333
Joined: Mon Aug 20, 2012 9:41 pm
Location: Potemkin Village

Re: Removing files based on mask of filename

Post by Flemur » Wed Sep 11, 2019 8:46 am

JasonStonier wrote:
Wed Sep 11, 2019 4:40 am
Flemur wrote:
Tue Sep 10, 2019 9:23 pm
18-53-31 = HH-MM-SS ?
This is all bogus syntax, but the point is to loop thru all the hh/mm/ss and mv each hh/mm/ss.xxx file to the same file (=a), overwriting it each time, then renaming it to the correct hh-mm-ss.000
That's a genius idea. Thanks so much for the pointer, really appreciate the help.
I wish I was more up on the bash syntax for those operations (loop, file exists, make a filename, etc); back in the old days I'd write a C program which made 'system' calls to something like that. Good luck!

Lanser
Level 4
Posts: 298
Joined: Mon Mar 08, 2010 5:12 am
Location: Salzburg Austria

Re: Removing files based on mask of filename

Post by Lanser » Wed Sep 11, 2019 9:55 am

Hello Jason. You may find pyRenamer worth installing. It's very very useful for bulk file renaming / filtering / using substitution or deletion.

Lanser
Thinkpads:- LM19.x Mate, LMDE3, Debian 10

Flemur
Level 17
Posts: 7333
Joined: Mon Aug 20, 2012 9:41 pm
Location: Potemkin Village

Re: Removing files based on mask of filename

Post by Flemur » Wed Sep 11, 2019 11:56 am

What I suggested is kinda brute force, and the overwriting (by renaming) bugs me, but it should be simple to code up and it should work for any number of duplicates. If you only have a small number of duplicates for a given time, say 3 at most, like you showed, it might be better to change the inside of that loop to handle these cases:
1 file: just keep it = go to the next time in the ss loop.
2 files: delete one of them
3 files: delete two of them.
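A cautious way to try the case-by-case idea is to print the "extra" files first rather than delete them. The sketch below is only an illustration: the function name list_extras is made up, and it assumes everything before the first dot in a filename is the date and time down to the whole second, which fits the names posted in this thread.

```shell
# Preview only: print every file that shares a whole second with an
# earlier file in the same folder. Nothing is deleted.
# Usage (hypothetical): list_extras ~/Pictures
list_extras() {
  local dir=$1 f name key
  local -A seen=()                 # whole-second stamp -> a file was kept
  shopt -s nullglob
  for f in "$dir"/*.jp*g; do       # matches both .jpg and .jpeg
    name=${f##*/}                  # strip the directory part
    key=${name%%.*}                # text before the first dot
    if [[ -n ${seen["$key"]} ]]; then
      printf '%s\n' "$name"        # 2nd, 3rd, ... file of this second
    else
      seen["$key"]=1               # 1st file of this second: keep it
    fi
  done
}
```

With one file in a second nothing is printed for it, with two files one name is printed, with three files two names, and so on. Once the list looks right, the printed names can be removed by hand or fed to rm.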

coffee412
Level 5
Posts: 998
Joined: Mon Nov 12, 2012 7:38 pm
Location: Indiana, USA

Re: Removing files based on mask of filename

Post by coffee412 » Wed Sep 11, 2019 4:00 pm

How about just do this and be done with it ;)
Just check the variable (~) for home to make sure it works. I commented the deletion of "keepers" out just in case.

Code: Select all

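#!/bin/bash
# Keep only the files with "(" in the name and remove the other .jpeg files.

cd ~/Pictures || exit 1
mkdir -p keepers
cp -- *"("*.jpeg keepers/   # set the keepers aside first
rm -f -- *.jpeg             # remove every .jpeg in ~/Pictures
cp -- keepers/*.jpeg .      # put the keepers back

# rm -fr ~/Pictures/keepers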
Ryzen x1800 Asus Prime x370-Pro 32 gigs Ram RX480 graphics
IceWarp 12.0.3 * Mint 18.3 * RAID 1/5 * OpenVPN * Linux since kernel 2.0.36
************* Get Your Linux on! ***************

JasonStonier
Level 2
Posts: 56
Joined: Sun Oct 02, 2016 10:51 am

Re: Removing files based on mask of filename

Post by JasonStonier » Thu Sep 12, 2019 4:27 am

coffee412 wrote:
Wed Sep 11, 2019 4:00 pm
How about just do this and be done with it ;)

cp *"("*.jpeg keepers
I don't think this does what I need - I can't rely on the file having "(" in the file name. The only consistency in the files I need to consolidate is that they have the same time in the filename, e.g. these three were taken a few milliseconds apart at 16:02 and 22 seconds; I need to keep the first and discard the last two.
12-09-19 16-02-22.121.jpg
12-09-19 16-02-22.136.jpg
12-09-19 16-02-22.854.jpg

Flemur's brute force solution is achievable and I'm working my way through how to do it...but in parallel I solved it mechanically by clicking the iPhone shutter faster so it doesn't take 'burst' photos.

Thanks for all the helpful comments!

ColdBootII
Level 5
Posts: 521
Joined: Wed Aug 02, 2017 8:19 am

Re: [Solved] Removing files based on mask of filename

Post by ColdBootII » Thu Sep 12, 2019 4:47 am

Hi Jason,

Yes, you'll have to walk sequentially through the list of files sorted by name; read is best for that. Then extract the datetime and millisecond parts of each file name into separate variables, which can be done with the expansion name="${string:begin:length}". Then read the next file name, check whether its datetime part is the same and its millisecond part different, and if so, delete the file, and so on.

Edit: where string is the actual file name... Sorry Jason, I have too much on my mind to write the full script, glad you got it sorted. :D
HTH,
Cheers.
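
For anyone landing here later, a minimal sketch of the read-and-compare approach might look like this. It is not a tested full script: the function name prune_same_second is made up, and it assumes names like "2019-09-10 18-53-31.157.jpeg", where the first 19 characters are the date and time down to the second. It only compares that part, since two different file names in the same second necessarily differ somewhere after it.

```shell
# Sketch: keep the first file of each whole second, delete the later ones.
# Usage (hypothetical): prune_same_second ~/Pictures
prune_same_second() {
  local dir=$1 name stamp prev=""
  # List the names one per line, sorted, and read them back one at a time.
  while IFS= read -r name; do
    [ -f "$dir/$name" ] || continue   # skip an unmatched glob or empty line
    stamp=${name:0:19}                # "${string:begin:length}"
    if [ "$stamp" = "$prev" ]; then
      rm -- "$dir/$name"              # same second as the file before it
    else
      prev=$stamp                     # first file of a new second: keep it
    fi
  done < <(cd "$dir" && printf '%s\n' *.jpeg | sort)
}
```

Replacing rm with echo for a first run shows what would be deleted without touching anything.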

coffee412
Level 5
Posts: 998
Joined: Mon Nov 12, 2012 7:38 pm
Location: Indiana, USA

Re: Removing files based on mask of filename

Post by coffee412 » Thu Sep 12, 2019 5:40 pm

JasonStonier wrote:
Thu Sep 12, 2019 4:27 am
coffee412 wrote:
Wed Sep 11, 2019 4:00 pm
How about just do this and be done with it ;)

cp *"("*.jpeg keepers
I don't think this does what I need - I can't rely on the file having "(" in the file name. The only consistency in the files I need to consolidate is that they have the same time in the filename, e.g. these three were taken a few milliseconds apart at 16:02 and 22 seconds; I need to keep the first and discard the last two.
12-09-19 16-02-22.121.jpg
12-09-19 16-02-22.136.jpg
12-09-19 16-02-22.854.jpg

Flemur's brute force solution is achievable and I'm working my way through how to do it...but in parallel I solved it mechanically by clicking the iPhone shutter faster so it doesn't take 'burst' photos.

Thanks for all the helpful comments!
Your OP said that you wanted to keep (*) files and get rid of the *.jpg files. If you wanted something different then I guess it was not clear.

JasonStonier
Level 2
Posts: 56
Joined: Sun Oct 02, 2016 10:51 am

Re: [Solved] Removing files based on mask of filename

Post by JasonStonier » Fri Sep 13, 2019 4:18 am

I thought this made it clear, but clearly not ;-)
I can't rely on the repeats having (2) in the filename, and they don't usually have the same filesize, but they would always be in the same whole second e.g. '2019-09-10 18-53-31'
I appreciate all the comments as I learn bash scripting.

Anyway, I found a really cool way to solve the specific problem: using ImageMagick's 'compare' command to compare pairs of images and move aside the ones whose difference is below a threshold. It has the benefit that it also catches images of the same subject taken at different times (e.g. if my device captures the same page twice).

Code: Select all

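#!/bin/bash
# Move near-identical images into Duplicates/ using ImageMagick's 'compare'.
cd Test || exit 1
mkdir -p Duplicates
shopt -s nullglob
files=(*.jp*)                          # an array copes with spaces in names
for ((a = 0; a < ${#files[@]}; a++)); do
  curr_file="${files[a]}"
  [ -e "$curr_file" ] || continue      # already moved to Duplicates
  for ((b = a + 1; b < ${#files[@]}; b++)); do   # compare each pair once
    next_file="${files[b]}"
    [ -e "$next_file" ] || continue
    # compare prints e.g. "1234.5 (0.0188)" on stderr; keep the part before "("
    compare_res=$(compare -metric RMSE "$curr_file" "$next_file" null: 2>&1 | cut -d'(' -f 1)
    int_result=$(echo $compare_res | cut -d '.' -f 1)
    echo "Int $int_result"
    # Ignore non-numeric output, e.g. an error for images of different sizes
    if [[ $int_result =~ ^[0-9]+$ ]] && [ "$int_result" -lt 10000 ]; then
      mv -- "$next_file" Duplicates/
    fi
  done
done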
