Comparing text files from two different directories in Python

Posted: Fri Dec 25, 2020 3:38 am
by johnjosef
Suppose there are two directories, Dir1 and Dir2, each containing n text files: t1.txt, t2.txt, t3.txt, t4.txt, t5.txt, ... in Dir1 and tt1.txt, tt2.txt, tt3.txt, tt4.txt, tt5.txt, ... in Dir2.

I want to run a for loop such that each text file of Dir1 is compared with Dir2 and returns the matching text file.

Using Python

Re: Comparing text files from two different directories in Python

Posted: Fri Dec 25, 2020 9:17 am
by vimes666
I am not familiar with Python. However, if it is possible to run Linux commands from Python, maybe this one will help:

Code:

grep -Fxf <(ls dir-a) <(ls dir-b)
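For reference, a rough Python counterpart of that command, assuming "matching" means the same file name appears in both directories. The function name `common_names` is made up for this sketch, and unlike `ls` it skips subdirectories.

```python
import os

def common_names(dir_a, dir_b):
    """Return the sorted file names that exist in both directories."""
    # Collect regular-file names only; subdirectories are ignored.
    names_a = {entry.name for entry in os.scandir(dir_a) if entry.is_file()}
    names_b = {entry.name for entry in os.scandir(dir_b) if entry.is_file()}
    # Set intersection keeps the names present on both sides.
    return sorted(names_a & names_b)
```

Calling `common_names("Dir1", "Dir2")` would return a list of shared names. With the naming in the first post (t1.txt versus tt1.txt) that list would be empty, so this only helps if the names really are identical.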

Re: Comparing text files from two different directories in Python

Posted: Fri Dec 25, 2020 9:23 am
by Flemur
johnjosef wrote: Fri Dec 25, 2020 3:38 am I want to run a for loop such that each text file of Dir1 is compared with Dir2 and returns the matching text file.
Does "matching" mean e.g. t1.txt matches tt1.txt, or the content of t1.txt is the same as the content of tt4.txt?

Re: Comparing text files from two different directories in Python

Posted: Tue Dec 29, 2020 3:44 am
by Petermint
File size? Quantities?

For small files, you could read every file into memory as strings then compare strings.

If you have big directories or big files, there are numerous tricks to make the comparison faster, such as matching lengths first. Loop through directory A and, for each file in A, loop through B looking for files with the same length. Perform a more detailed compare only when the lengths match.
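A minimal sketch of the length-first idea, assuming "matching" means identical content. The name `find_matches` is hypothetical. Instead of a nested loop it indexes directory B by size once, then uses the standard library's `filecmp.cmp` for the detailed compare.

```python
import filecmp
import os
from collections import defaultdict

def find_matches(dir_a, dir_b):
    """Map each file name in dir_a to the names in dir_b with identical content."""
    # Index dir_b by file size so only same-size files get a detailed compare.
    by_size = defaultdict(list)
    for entry in os.scandir(dir_b):
        if entry.is_file():
            by_size[entry.stat().st_size].append(entry.path)

    matches = {}
    for entry in os.scandir(dir_a):
        if not entry.is_file():
            continue
        candidates = by_size.get(entry.stat().st_size, [])
        # shallow=False makes filecmp compare the actual bytes,
        # not just the os.stat() signature.
        matches[entry.name] = sorted(
            os.path.basename(path)
            for path in candidates
            if filecmp.cmp(entry.path, path, shallow=False)
        )
    return matches
```

A file in A with no match maps to an empty list, so the caller can tell "no match" apart from "not checked".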

The more detailed compare could be computing an MD5 hash or similar for each file and comparing the hashes. You can also run a system utility as a separate task to do this.
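The hashing step can be done without an external utility using Python's `hashlib`. This is only a sketch: `file_digest` and `match_by_digest` are names invented here, and the 64 KiB chunk size is an arbitrary choice.

```python
import hashlib
import os

def file_digest(path, algorithm="md5", chunk_size=65536):
    """Hash a file in chunks so large files never sit fully in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

def match_by_digest(dir_a, dir_b, algorithm="md5"):
    """Map each file name in dir_a to the dir_b names with the same digest."""
    # Hash every file in dir_b once and index the names by digest.
    digests_b = {}
    for entry in os.scandir(dir_b):
        if entry.is_file():
            digest = file_digest(entry.path, algorithm)
            digests_b.setdefault(digest, []).append(entry.name)
    # Each file in dir_a then needs one hash and one dictionary lookup.
    return {
        entry.name: sorted(digests_b.get(file_digest(entry.path, algorithm), []))
        for entry in os.scandir(dir_a)
        if entry.is_file()
    }
```

MD5 is fine for spotting accidental duplicates. Pass `algorithm="sha256"` if deliberate collisions are a concern.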

Re: Comparing text files from two different directories in Python

Posted: Mon Jan 04, 2021 8:02 am
by HBaguette
If the files aren't very big, you could just open both of them and iterate through them, comparing each individual character. This would likely be much slower than other ways, though, and more resource-heavy, but I'm pretty exhausted right now and can't think of much better at the moment.

First, like other people have suggested, I'd compare the file sizes, to see if it's even worth iterating in the first place. If the sizes match, then iterate through each individual character, and the second it finds any difference, stop comparing them (as they obviously don't match).
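Something like the following would do that size check and early exit. It is a sketch with a made-up function name, `same_content`, and it reads blocks of bytes rather than single characters, which tends to be much faster in Python while giving the same answer.

```python
import os

def same_content(path_a, path_b, chunk_size=65536):
    """Return True if both files hold exactly the same bytes."""
    # Cheap check first: different sizes can never match.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            chunk_a = file_a.read(chunk_size)
            chunk_b = file_b.read(chunk_size)
            if chunk_a != chunk_b:
                return False  # stop at the first difference
            if not chunk_a:
                return True  # both files ended without a difference
```

Running it for every pair, e.g. `same_content("Dir1/t1.txt", "Dir2/tt1.txt")`, inside two nested loops gives the comparison asked for in the first post.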

Re: Comparing text files from two different directories in Python

Posted: Fri Feb 19, 2021 10:31 pm
by Termy
In Perl, I'd first check file sizes, in bytes, then if they're the same, I'd use Digest::MD5, Digest::SHA, or some equivalent. I'm sure there's a similar process in Python.

Re: Comparing text files from two different directories in Python

Posted: Sat Feb 20, 2021 1:00 am
by JosephM
Well the OP never responded but this looks a lot like someone's homework problem ;)

Re: Comparing text files from two different directories in Python

Posted: Sat Feb 20, 2021 4:37 am
by TheyLive
johnjosef wrote: Fri Dec 25, 2020 3:38 am I want to run a for loop such that each text file of Dir1 is compared with Dir2 and returns the matching text file.

Using Python
HashDeep
It is a forensic tool used in police investigations. It will find matches not only between text files, but between files of any type.
You can write a wrapper for it in Python.