Duplicate Files Search

chiefjim
Level 6
Posts: 1157
Joined: Sun Jun 07, 2009 7:26 am
Location: South Texas, USA

Duplicate Files Search

Post by chiefjim »

Here is a bit of a twist. I have several DVDs with various files and would like to search the hard drives to see whether any files match those on the DVDs.

I want to restrict the search to only those files already named on the DVD. FSlint comes up with duplicates of any name, anywhere.

Can this be done? Command line is OK, but I would prefer a GUI if available.
Last edited by LockBot on Wed Dec 28, 2022 7:16 am, edited 1 time in total.
Reason: Topic automatically closed 6 months after creation. New replies are no longer allowed.
xenopeek
Level 25
Posts: 29609
Joined: Wed Jul 06, 2011 3:58 am

Re: Duplicate Files Search

Post by xenopeek »

Can you narrow it down a bit more? Should the hard disk be searched for files with identical contents to files on the DVD, regardless of their name? Or does the name matter, so that files with the exact same name are searched for first and then checked for identical contents?

And is there any way to restrict which parts of the hard disk should be searched? I'd imagine only files in your own home directory are relevant, not the rest of the hard disk that holds the operating system's files.
Mute Ant

Re: Duplicate Files Search

Post by Mute Ant »

You could do that the other way around, search the DVD for files that are on the hard drive...
chiefjim

Re: Duplicate Files Search

Post by chiefjim »

The goal is to see whether the files found on the DVD still remain on the hard drive. That means anywhere on the hard drive, not just home, because when the DVD was produced the files were collected from multiple directories. I also have a special directory where archival files are kept.

The intent is to purge old stuff that has less value and now exists on a DVD.
xenopeek

Re: Duplicate Files Search

Post by xenopeek »

But should the search be limited to files with the same name or should the name be disregarded and *only* the contents be compared?
chiefjim

Re: Duplicate Files Search

Post by chiefjim »

xenopeek wrote: But should the search be limited to files with the same name or should the name be disregarded and *only* the contents be compared?
File name.
xenopeek

Re: Duplicate Files Search

Post by xenopeek »

I don't think what you want exists. fslint, fdupes, dupeguru and the like all search from a particular starting directory for duplicates in that directory (and, if told to, its subdirectories). That is not the same as what you want: "look in this directory and find duplicates anywhere else on my system".

There are a couple of ways I can see to find your duplicates.

First, if you have enough disk space, just make a temporary directory on your system and copy the contents of the DVDs there (each in its own directory so you know what file came from what DVD). Then use any of the above tools or other duplicate file finder to search your entire system.

Second, but more complex, would be to write a (short) script that extracts the file names from each DVD, computes an MD5 sum of each file, and stores both together in a file. A second (shortish) script would then loop over that list and use the 'locate' command to first find files with the same name and, if any were found, compare their MD5 sums to see whether they are identical to the one on the DVD. Print which files differ from the one on the DVD and, optionally, which files are on the DVD but not found on your system. That will take a bit of time to write and make foolproof (i.e., it should be able to handle filenames with spaces in them).
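
For illustration, the second approach might be sketched in Python roughly as follows. This is only a sketch under stated assumptions, not a tested tool: it walks the hard drive with `os.walk` instead of calling 'locate', it does both steps in one pass instead of storing the sums in a file, and the function names (`md5sum`, `index_by_name`, `compare_dvd_to_disk`) are made up for this example. Paths are handled as plain strings, so filenames with spaces need no special treatment.

```python
import hashlib
import os
from collections import defaultdict


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_by_name(root):
    """Map each file name under root to the full paths carrying that name."""
    index = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            index[name].append(os.path.join(dirpath, name))
    return index


def compare_dvd_to_disk(dvd_root, disk_root):
    """Return (identical, differing, missing) for files named on the DVD.

    identical and differing are lists of (dvd_path, disk_path) pairs;
    missing lists DVD files whose name was not found under disk_root.
    """
    dvd_prefix = os.path.abspath(dvd_root) + os.sep
    disk_index = index_by_name(disk_root)
    identical, differing, missing = [], [], []
    for dirpath, _dirnames, filenames in os.walk(dvd_root):
        for name in filenames:
            dvd_path = os.path.join(dirpath, name)
            # Ignore hits inside the DVD tree itself, in case the
            # mount point lies under disk_root (e.g. disk_root="/").
            candidates = [
                path for path in disk_index.get(name, [])
                if not os.path.abspath(path).startswith(dvd_prefix)
            ]
            if not candidates:
                missing.append(dvd_path)
                continue
            dvd_hash = md5sum(dvd_path)
            for disk_path in candidates:
                try:
                    same = md5sum(disk_path) == dvd_hash
                except OSError:
                    continue  # unreadable file: skip rather than abort
                (identical if same else differing).append((dvd_path, disk_path))
    return identical, differing, missing
```

Calling `compare_dvd_to_disk("/media/dvd", "/home")` (both paths are placeholders) would return the three lists, which can then be printed or reviewed before anything is deleted.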
all41
Level 19
Posts: 9520
Joined: Tue Dec 31, 2013 9:12 am
Location: Computer, Car, Cage

Re: Duplicate Files Search

Post by all41 »

Another possibility would be to use an indexing search tool such as 'recoll' in the repositories.
You can specify which files will be indexed (such as .mp4, .iso, .mkv, etc) or you can index every file on the system,
even words within texts and .pdf files.

There is a non-repository tool called 'Angry Search' which also indexes your files and is purported
to be the Linux version of 'Everything'.

It will take a chunk of time to index your DVDs and hard drives, but once the database is built, finding matching names will be instant.
Just start typing a title and all matching locations will be displayed in the results.
Everything in life was difficult before it became easy.
xenopeek

Re: Duplicate Files Search

Post by xenopeek »

That will only find duplicate names though, right? Recoll and other GUI search tools don't index whether files have the same contents. Or perhaps that is enough info for the OP.
all41

Re: Duplicate Files Search

Post by all41 »

That's a valid point. But my guess is that files found with the same name and file size will have the same content, especially
for large files.
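
The name-plus-size heuristic could be sketched in Python along these lines. It is a hedged illustration only: the function name `name_and_size_matches` is hypothetical, the two directory trees are assumed not to overlap, and a match on name and size remains a guess about the content, not a proof.

```python
import os


def name_and_size_matches(dvd_root, disk_root):
    """Yield (dvd_path, disk_path) pairs sharing both file name and byte size."""
    # Index the DVD files by (name, size).
    wanted = {}
    for dirpath, _dirnames, filenames in os.walk(dvd_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            wanted.setdefault((name, os.path.getsize(path)), []).append(path)
    # Walk the hard drive and report every file with a matching key.
    for dirpath, _dirnames, filenames in os.walk(disk_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                key = (name, os.path.getsize(path))
            except OSError:
                continue  # broken symlink or unreadable entry
            for dvd_path in wanted.get(key, []):
                yield dvd_path, path
```

This avoids reading file contents at all, so it is much faster than hashing, at the cost of occasionally pairing files that only happen to share a name and a length.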
chiefjim

Re: Duplicate Files Search

Post by chiefjim »

all41 wrote:
There is a non-repository tool called 'Angry Search' which also indexes your files and is purported
to be the Linux version of 'Everything'.
This comes the closest. If it also had a delete feature it would have been ideal.
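
For the missing delete step, a small Python helper along these lines could work on a list of paths that has already been reviewed. It is a cautious sketch, not a feature of any tool mentioned here: the name `purge` and its `dry_run` parameter are hypothetical, and by default it deletes nothing and only reports what it would remove.

```python
import os


def purge(paths, dry_run=True):
    """Delete the given files; with dry_run=True only report what would go."""
    removed = []
    for path in paths:
        if not os.path.isfile(path):
            continue  # skip anything that vanished or is not a regular file
        if not dry_run:
            os.remove(path)
        removed.append(path)
    return removed
```

Running it once with the default `dry_run=True` and checking the returned list before calling it again with `dry_run=False` keeps an accidental purge unlikely.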