[Solved] Removing files based on mask of filename

JasonStonier

[Solved] Removing files based on mask of filename

Post by JasonStonier »

I have a system which takes automated pictures on an iPhone (automated scanning of old magazines to an archive) and uploads them to my linux box, but occasionally the shutter trigger results in multiple pictures of the same page separated by milliseconds in the filename:

Code: Select all

jason@JasonMain ~/Pictures $ ls -al *.*.jpeg
-rw-r--r-- 1 jason jason 2264825 Sep 10 18:53 2019-09-10 18-53-31.157 (2).jpeg
-rw-r--r-- 1 jason jason 2264825 Sep 10 18:53 2019-09-10 18-53-31.157.jpeg
-rw-r--r-- 1 jason jason 2260654 Sep 10 18:53 2019-09-10 18-53-31.207.jpeg
-rw-r--r-- 1 jason jason 2916184 Sep 10 18:56 2019-09-10 18-56-17.848.jpeg
-rw-r--r-- 1 jason jason 2907621 Sep 10 18:56 2019-09-10 18-56-18.008 (2).jpeg
-rw-r--r-- 1 jason jason 2907621 Sep 10 18:56 2019-09-10 18-56-18.008.jpeg
-rw-r--r-- 1 jason jason 2766682 Sep 10 18:57 2019-09-10 18-57-24.156 (2).jpeg
-rw-r--r-- 1 jason jason 2766682 Sep 10 18:57 2019-09-10 18-57-24.156.jpeg
-rw-r--r-- 1 jason jason 2765306 Sep 10 18:57 2019-09-10 18-57-24.206.jpeg
-rw-r--r-- 1 jason jason 2856521 Sep 10 18:57 2019-09-10 18-57-58.145 (2).jpeg
-rw-r--r-- 1 jason jason 2856521 Sep 10 18:57 2019-09-10 18-57-58.145.jpeg
-rw-r--r-- 1 jason jason 2847966 Sep 10 18:57 2019-09-10 18-57-58.245.jpeg
Is there any way in the terminal to delete the 'repeats' - for example I would want to keep:

Code: Select all

-rw-r--r-- 1 jason jason 2264825 Sep 10 18:53 2019-09-10 18-53-31.157 (2).jpeg
and remove

Code: Select all

-rw-r--r-- 1 jason jason 2264825 Sep 10 18:53 2019-09-10 18-53-31.157.jpeg
-rw-r--r-- 1 jason jason 2260654 Sep 10 18:53 2019-09-10 18-53-31.207.jpeg
I can't rely on the repeats having (2) in the filename, and they don't usually have the same filesize, but they would always be in the same whole second e.g. '2019-09-10 18-53-31'

I've googled around a lot and there are many ways to remove true duplicates, but that's not what I have here.

Thanks for any help!
ColdBootII

Re: Removing files based on mask of filename

Post by ColdBootII »

Hi,

Instead of using a script for this, did you simply try finding and removing the duplicates using Pix or some other photo organizer?

It sure worked well for me. But if it's very important to you, as I suspect, you can check first: create a duplicate folder, remove the duplicates from it using Pix, then write the output of that ls command for the original and the duplicate folder to text files. Then, using diff, you'll see for yourself what the tool deleted and whether you can rely on it or not.

Cheers.
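
For reference, the check described above could be scripted roughly like this. It is only a sketch: removed_files is a made-up helper name, and it assumes the dedup tool was run on a plain copy of the folder.

```shell
#!/bin/bash
# removed_files ORIGINAL COPY   (hypothetical helper name)
# Prints the names that exist in ORIGINAL but are gone from COPY,
# i.e. what the dedup tool deleted from the copy.
removed_files() {
  orig_list=$(mktemp) && copy_list=$(mktemp) || return 1
  ls -1 "$1" > "$orig_list"
  ls -1 "$2" > "$copy_list"
  # diff prefixes lines found only in the first listing with "< "
  diff "$orig_list" "$copy_list" | sed -n 's/^< //p'
  rm -f "$orig_list" "$copy_list"
}
```

Typical use would be to copy the folder (cp -r ~/Pictures ~/Pictures-copy), run the tool on the copy, then call removed_files ~/Pictures ~/Pictures-copy. The copy's path here is just a placeholder.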
Flemur

Re: Removing files based on mask of filename

Post by Flemur »

18-53-31 = HH-MM-SS ?

This is all bogus syntax, but the point is to loop thru all the hh/mm/ss and mv each hh/mm/ss.xxx file to the same file (=a), overwriting it each time, then renaming it to the correct hh-mm-ss.000

Code: Select all

for hh = 0 to 23
  for mm = 0 to 59
    for ss = 0 to 59 {
      rm a
      for file in "2019-09-10 hh-mm-ss.*.jpeg" {  # need to FORMAT those numbers into the filename
        mv $file to a
      }
      if (a exists)
        mv a "2019-09-10 hh-mm-ss.000.jpeg"
    }
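
In actual bash, the loop above might look something like the sketch below. The function name collapse_seconds and its two arguments are assumptions made for illustration. Note that because every mv overwrites the scratch file, the file that sorts last within each second is the one that survives.

```shell
#!/bin/bash
# collapse_seconds DIR DATE   (hypothetical helper, e.g. DATE=2019-09-10)
# For every second of DATE, move each matching file onto the scratch file "a",
# overwriting it each time, then rename "a" to "DATE hh-mm-ss.000.jpeg".
# This destroys files: try it on a copy of the folder first.
collapse_seconds() (
  cd "$1" || exit 1
  hours=$(seq -w 0 23)      # 00 01 ... 23
  sixty=$(seq -w 0 59)      # 00 01 ... 59
  rm -f a                   # "a" is always renamed away below, so once is enough
  for hh in $hours; do
    for mm in $sixty; do
      for ss in $sixty; do
        stamp="$2 $hh-$mm-$ss"
        for file in "$stamp".*.jpeg; do
          [ -e "$file" ] || continue      # the glob matched nothing
          mv -f -- "$file" a
        done
        if [ -e a ]; then
          mv -- a "$stamp.000.jpeg"
        fi
      done
    done
  done
)
```

Called as collapse_seconds ~/Pictures-copy 2019-09-10 (the folder name is a placeholder), it leaves one .000.jpeg file per second that had any pictures.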
ColdBootII

Re: Removing files based on mask of filename

Post by ColdBootII »

ColdBootII wrote: Tue Sep 10, 2019 8:23 pm But if it's very important to you, as I suspect, you can check first by creating a duplicate folder, remove duplicates from it using Pix and then output that ls command in original and duplicate folders, to a text file. Then using diff you'll see for yourself what the tool deleted and whether you can rely on it or not.
Heh, I forgot that with Pix you are asked to check which file(s) to keep or delete, sorry. That could be impractical if there are too many of them.
JasonStonier

Re: Removing files based on mask of filename

Post by JasonStonier »

Flemur wrote: Tue Sep 10, 2019 9:23 pm 18-53-31 = HH-MM-SS ?

This is all bogus syntax, but the point is to loop thru all the hh/mm/ss and mv each hh/mm/ss.xxx file to the same file (=a), overwriting it each time, then renaming it to the correct hh-mm-ss.000
That's a genius idea. Thanks so much for the pointer, really appreciate the help.
AndyMH

Re: Removing files based on mask of filename

Post by AndyMH »

To get rid of lots of duplicate files I use FSlint. Can't remember if it is in software manager. Has a GUI interface.
Flemur

Re: Removing files based on mask of filename

Post by Flemur »

JasonStonier wrote: Wed Sep 11, 2019 4:40 am
Flemur wrote: Tue Sep 10, 2019 9:23 pm 18-53-31 = HH-MM-SS ?
This is all bogus syntax, but the point is to loop thru all the hh/mm/ss and mv each hh/mm/ss.xxx file to the same file (=a), overwriting it each time, then renaming it to the correct hh-mm-ss.000
That's a genius idea. Thanks so much for the pointer, really appreciate the help.
I wish I was more up on the bash syntax for those operations (loop, file exists, make a filename, etc); back in the old days I'd write a C program which made 'system' calls to something like that. Good luck!
Lanser

Re: Removing files based on mask of filename

Post by Lanser »

Hello Jason. You may find pyRenamer worth installing. It's very very useful for bulk file renaming / filtering / using substitution or deletion.

Lanser
Flemur

Re: Removing files based on mask of filename

Post by Flemur »

What I suggested is kinda brute force, and the overwriting (by renaming) bugs me, but it should be simple to code up and it should work for any number of duplicates. If you only ever have a small number of duplicates for a given time, say 3 at most, like you showed, it might be better to change the inside of that loop to handle these cases:
1 file: just keep it = go to the next time in the ss loop.
2 files: delete one of them
3 files: delete two of them.
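
All three cases come down to "keep one file per second and drop the rest", which can be done in a single pass over the sorted names. A rough sketch, assuming the whole-second part of the name is everything before the first dot; list_repeats is a made-up name, and it only prints the candidates rather than deleting anything:

```shell
#!/bin/bash
# list_repeats DIR   (hypothetical helper; a dry run, nothing is deleted)
# Prints every picture in DIR whose name shares its whole-second prefix
# (the text before the first dot) with the file sorted just before it.
list_repeats() (
  cd "$1" || exit 1
  prev=""
  for file in *.jp*; do            # the shell expands globs in sorted order
    [ -e "$file" ] || continue     # the glob matched nothing
    second=${file%%.*}             # e.g. "2019-09-10 18-53-31"
    if [ "$second" = "$prev" ]; then
      printf '%s\n' "$file"        # same second as the previous file: a repeat
    fi
    prev=$second
  done
)
```

The first file of each second is never printed, so it is the one that would be kept. Once the list looks right, the printed names could be passed to rm or mv.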
coffee412

Re: Removing files based on mask of filename

Post by coffee412 »

How about just do this and be done with it ;)
Just check the variable (~) for home to make sure it works. I commented the deletion of "keepers" out just in case.

Code: Select all

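#!/bin/bash
# Keep only the files with "(" in the name and delete every other .jpeg.

cd ~/Pictures || exit 1
mkdir -p keepers
cp -- *"("*.jpeg keepers/ || exit 1   # stop before deleting if nothing was copied
rm -f -- *.jpeg
cp -- keepers/*.jpeg .

# rm -fr ~/Pictures/keepers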
JasonStonier

Re: Removing files based on mask of filename

Post by JasonStonier »

coffee412 wrote: Wed Sep 11, 2019 4:00 pm How about just do this and be done with it ;)

cp *"("*.jpeg keepers
I don't think this does what I need - I can't rely on the file having "(" in the file name. The only consistency in the files I need to consolidate is that they have the same time in the filename, e.g. with these three, taken a few milliseconds apart at 16:02 and 22 seconds, I need to keep the first and discard the last two.
12-09-19 16-02-22.121.jpg
12-09-19 16-02-22.136.jpg
12-09-19 16-02-22.854.jpg

Flemur's brute force solution is achievable and I'm working my way through how to do it...but in parallel I solved it mechanically by clicking the iphone shutter faster so it doesn't take 'burst' photos.

Thanks for all the helpful comments!
ColdBootII

Re: [Solved] Removing files based on mask of filename

Post by ColdBootII »

Hi Jason,

Yes, you'll have to walk sequentially through the list of files sorted by name; read is best for that. Then extract the datetime and millisecond parts of each file name into their own variables. This can be done with the substring expansion name="${string:begin:length}". Then read the next file name, check whether the datetime is the same and the milliseconds are different and, if so, delete the file, etc...

Edit: where string is the actual file name... Sorry Jason, I have too much on my mind to write the full script, glad you got it sorted. :D
HTH,
Cheers.
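
For anyone landing here later, the steps above could be sketched roughly as follows. This is not a full, tested script: move_repeats is a made-up name, the offsets assume names like "2019-09-10 18-53-31.157.jpeg" (a 19-character datetime, a dot, then 3 digits of milliseconds), and repeats are moved into a Duplicates folder instead of being deleted.

```shell
#!/bin/bash
# move_repeats DIR   (hypothetical helper)
# Walks the sorted file names in DIR and moves every file whose datetime
# equals that of the last kept file, but whose milliseconds differ,
# into DIR/Duplicates.
move_repeats() (
  cd "$1" || exit 1
  mkdir -p Duplicates
  prev_datetime=""
  prev_ms=""
  # the glob is expanded in sorted order; read takes one name per line
  printf '%s\n' *.jpeg | while IFS= read -r name; do
    [ -e "$name" ] || continue     # the glob matched nothing
    datetime=${name:0:19}          # "2019-09-10 18-53-31"
    ms=${name:20:3}                # "157"
    if [ "$datetime" = "$prev_datetime" ] && [ "$ms" != "$prev_ms" ]; then
      mv -- "$name" Duplicates/
    else
      prev_datetime=$datetime
      prev_ms=$ms
    fi
  done
)
```

Files with the same datetime and the same milliseconds, such as the "(2)" copies, are left alone by this rule; those are the true duplicates that the usual tools already handle.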
coffee412

Re: Removing files based on mask of filename

Post by coffee412 »

JasonStonier wrote: Thu Sep 12, 2019 4:27 am
coffee412 wrote: Wed Sep 11, 2019 4:00 pm How about just do this and be done with it ;)

cp *"("*.jpeg keepers
I don't think this does what I need - I can't rely on the file having "(" in the file name. The only consistency in the files I need to consolidate is that they have the same time in the filename, e.g. with these three, taken a few milliseconds apart at 16:02 and 22 seconds, I need to keep the first and discard the last two.
12-09-19 16-02-22.121.jpg
12-09-19 16-02-22.136.jpg
12-09-19 16-02-22.854.jpg

Flemur's brute force solution is achievable and I'm working my way through how to do it...but in parallel I solved it mechanically by clicking the iphone shutter faster so it doesn't take 'burst' photos.

Thanks for all the helpful comments!
Your OP said that you wanted to keep (*) files and get rid of the *.jpg files. If you wanted something different then I guess it was not clear.
JasonStonier

Re: [Solved] Removing files based on mask of filename

Post by JasonStonier »

I thought this made it clear, but clearly not ;-)
I can't rely on the repeats having (2) in the filename, and they don't usually have the same filesize, but they would always be in the same whole second e.g. '2019-09-10 18-53-31'
I appreciate all the comments as I learn bash scripting.

Anyway, I found a really cool way to solve the specific problem - using imagemagick's 'compare' function to compare pairs of images and delete ones which were below a threshold difference. Has the benefit that it will remove images of the same subject but taken at different times (e.g. if my device captures the same page twice).

Code: Select all

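#!/bin/bash
# Move near-identical images into Duplicates/ using ImageMagick's compare.
cd Test || exit 1
mkdir -p Duplicates
shopt -s nullglob
files=( *.jp* )   # a glob instead of $(ls), so names with spaces survive
for (( a = 0; a < ${#files[@]}; a++ )); do
  curr_file=${files[a]}
  [ -e "$curr_file" ] || continue   # already moved as a duplicate
  # Compare each pair only once: start at the file after curr_file
  for (( b = a + 1; b < ${#files[@]}; b++ )); do
    next_file=${files[b]}
    [ -e "$next_file" ] || continue
    # compare writes the metric to stderr, e.g. "1234.5 (0.0188)"
    compare_res=$(compare -metric RMSE "$curr_file" "$next_file" null: 2>&1 | cut -d'(' -f 1)
    int_result=${compare_res%%[. ]*}   # integer part only
    echo "Int $int_result"
    # Skip non-numeric output (e.g. an error message from compare)
    if [[ $int_result =~ ^[0-9]+$ ]] && (( 10#$int_result < 10000 )); then
      mv -- "$next_file" Duplicates/
    fi
  done
done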