At work we have an expensive medical scanner, which dumps its images into a hierarchy of directories. I'd like to copy the files into a single directory, preserving the metadata as much as possible* and if necessary renaming duplicate files.
* Yes, it's a Windows-only scanner and the software is terrible, quelle surprise. I've managed to access the images directory/folder over the Samba network.
Copy files from complex directory structure to one directory...
Forum rules
Topics in this forum are automatically closed 6 months after creation.
Re: Copy files from complex directory structure to one directory...
You can just use the file manager for this.
In the root directory where you have the files, search by file extension, for example
*.jpg
You will get the list of files inside each folder and subfolder. Select them all in the search result window, copy, and paste them to a new directory. If there are duplicated names, tick the checkbox to do the same for all the conflicts and then click the "Duplicate" button. The duplicated files will get (copy), (second copy), (3rd copy) and so on added to the filenames. You can then use something else, like a file rename program or a script, to change the names of the files if needed.
Linux Mint Una Cinnamon 20.3 Kernel: 5.15.x | Quad Core I7 4.2Ghz | 24GB Ram | 1TB NVMe | Intel Graphics
Re: Copy files from complex directory structure to one directory...
Thanks for that. I was really looking for a non-interactive method, a command line or shell script.
Re: Copy files from complex directory structure to one directory...
I think you'll have to script it if you want to do it on the command line.
If files in the source were guaranteed to have a unique name, you could copy all files from 'source' directory and its subdirectories into 'target' directory with:
Code: Select all
find source -type f -exec cp --archive --no-clobber '{}' target \;
Or if you will be repeatedly doing this copy action and all files stay on source, so you only want to copy files not yet in target or that have been updated since the last copy, you could do it like this to avoid needlessly copying the same files again (but this does clobber files with the same name):
Code: Select all
find source -type f -exec rsync --no-R --no-implied-dirs '{}' target \;
The rsync needs more options to preserve attributes; there are a bunch of options for things you can preserve.
More complex, but you could write a small script to loop over the find results, check whether the filename already exists at target, and if so change the target name and then copy the file. I don't know of a tool that does this for you, but I wouldn't be surprised if it exists. Maybe somebody else knows of a command line tool to do what you want.
Or you could use another command beforehand to ensure all filenames are unique, and if there are any non-unique names, fix that manually before running the cp or rsync command. Maybe like so; this will print filenames (without directory) that aren't unique -- if all filenames are unique it will print nothing:
Code: Select all
find source -type f -printf '%f\n' | sort | uniq -c | grep -v '^[[:blank:]]*1[[:blank:]]'
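The "loop over the find results and rename on a clash" idea above can be sketched as a small shell script. This is only an illustration: the demo_source/demo_target directories and the sample files are invented here so the sketch is self-contained.

```shell
#!/bin/sh
# Copy every file found under demo_source into demo_target,
# appending a counter (.1, .2, ...) when the filename clashes.
set -e
mkdir -p demo_source/a demo_source/b demo_target
echo one > demo_source/a/scan.txt
echo two > demo_source/b/scan.txt   # same filename, different data

find demo_source -type f | while IFS= read -r f; do
    base=$(basename "$f")
    dest="demo_target/$base"
    n=1
    # If the name is already taken at target, try name.1, name.2, ...
    while [ -e "$dest" ]; do
        dest="demo_target/$base.$n"
        n=$((n + 1))
    done
    cp --archive "$f" "$dest"
done
```

Both files survive the copy: one keeps the name scan.txt, the other becomes scan.txt.1.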
Re: Copy files from complex directory structure to one directory...
Do you intend to run the "copy" piece of code regularly or just once? Because if you intend to do it regularly, you may quickly get tons of duplicate files, the reason being that all files will be copied to your destination directory each time and be renamed (that is, of course, if I'm correctly understanding your problem).
Pastcal wrote: ⤴ Sat Mar 09, 2024 2:56 pm
At work we have an expensive medical scanner, which dumps its images into a hierarchy of directories. I'd like to copy the files into a single directory, preserving the metadata as much as possible* and if necessary renaming duplicate files.
* Yes, it's a Windows-only scanner and the software is terrible, quelle surprise. I've managed to access the images directory/folder over the Samba network.
Something beginning with:
Code: Select all
find SourceDir -type f -newerct <some timestamp>
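One common way to make such an incremental copy repeatable is a stamp file with find's -newer test, rather than a literal timestamp for -newerct. A sketch, with invented demo directory names and a hypothetical .last_copy stamp file:

```shell
#!/bin/sh
# Incremental copy driven by a stamp file: only files modified since
# the previous run get copied, then the stamp is refreshed.
set -e
mkdir -p demo_src demo_dst
echo scan1 > demo_src/img1.txt

copy_new() {
    if [ -e .last_copy ]; then
        # Subsequent runs: only files newer than the stamp
        find demo_src -type f -newer .last_copy -exec cp --archive --no-clobber '{}' demo_dst \;
    else
        # First run: everything
        find demo_src -type f -exec cp --archive --no-clobber '{}' demo_dst \;
    fi
    touch .last_copy
}

copy_new                      # first run copies img1.txt
sleep 1                       # ensure the next file is newer than the stamp
echo scan2 > demo_src/img2.txt
copy_new                      # second run copies only img2.txt
```

Note -newer compares modification times; -newerct (as above) compares the inode change time against a timestamp string, which avoids the stamp file at the cost of hard-coding a date.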
Re: Copy files from complex directory structure to one directory...
This might be handy to tell if any duplicate filenames exist. It works for me:
Taken from this website:
https://stackoverflow.com/questions/163 ... -filenames
Code: Select all
find . -type f | awk -F"/" '{a[$NF]++}END{for(i in a)if(a[i]>1)print i,a[i]}'
Re: Copy files from complex directory structure to one directory...
>Do you intend to run regularly the "copy" piece of code or just once ?
I'm hoping it's just a one off. Time will tell...
Re: Copy files from complex directory structure to one directory...
Noted. Another question to be sure: in your OP (original post), you speak of duplicate files. Some helpers speak of duplicate filenames, which may not be the same thing: duplicate files are files containing the same data even if they don't have the same name. Duplicate filenames are just the opposite: they may contain different data but bear the same name. What are your duplicates: files or filenames?
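If the duplicates turn out to be identical file contents rather than names, grouping files by checksum can reveal them. A sketch using GNU md5sum and uniq; the demo_files directory and its sample files are invented here to keep the example self-contained:

```shell
#!/bin/sh
# List files with identical *contents* (true duplicate files) by
# sorting md5sum output and grouping lines that share the same
# 32-character checksum prefix.
set -e
mkdir -p demo_files/a demo_files/b
echo "same bytes" > demo_files/a/scan1.img
echo "same bytes" > demo_files/b/scan2.img   # duplicate content
echo "different"  > demo_files/b/scan3.img   # unique content

find demo_files -type f -exec md5sum '{}' + \
    | sort \
    | uniq -w32 --all-repeated=separate
```

Only scan1.img and scan2.img appear in the output; files with unique contents are filtered out.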
If it's just filenames and if any kind of duplicate renaming suits you, you may use Xenopeek's suggestion:
Code: Select all
find source -type f -exec cp --archive --no-clobber '{}' target \;
replacing the --no-clobber option with the --backup=t option for cp:
Code: Select all
find source -type f -exec cp --archive --backup=t '{}' target \;
Here is what the --backup=t or --backup=numbered option does (quoting the cp man page):
Code: Select all
numbered, t
       make numbered backups
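To see the effect, here is a throwaway demonstration (the file and directory names are invented): when the destination name already exists, cp --backup=t renames the old file with a ~N~ suffix before writing the new one.

```shell
#!/bin/sh
# Demonstrate cp --backup=t: the existing target is kept as a
# numbered backup (a.txt.~1~) instead of being clobbered.
set -e
mkdir -p demo_tgt
echo first  > a.txt
cp --backup=t a.txt demo_tgt/
echo second > a.txt
cp --backup=t a.txt demo_tgt/
ls demo_tgt          # a.txt  a.txt.~1~
```

demo_tgt/a.txt now holds the newer contents and demo_tgt/a.txt.~1~ the older ones, so nothing is lost on a name clash.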