Checking files for corruption (my situation & plan)

Questions about applications and software
Netherprovinc3
Level 4
Posts: 456
Joined: Mon Feb 04, 2019 9:29 pm

Checking files for corruption (my situation & plan)

Post by Netherprovinc3 »

Linux Mint 19.1 Cinnamon, 64 bit

I moved some files from a Mac computer onto a USB drive (actually a SATA drive in an enclosure). The files are on an HFS+ partition; HFS+ is the file system that shipped with most Apple computers from 1998 to 2017.

The final destination for these files is the home directory of a Linux user account on the destination computer, where they will be merged with files already there. In other words, there is a folder1 containing item1, item2, and item3. Of the many "new" files, two will become item4 and item5 (the file names will remain unchanged).

Here is my plan. I want to know if it makes sense.
1) Use grsync (a fancy copy/sync tool) to copy all of the files to a single folder on my Linux computer. Putting them all in one directory makes step 2 easier.
2) To gain some assurance that they are not corrupted, open each file and see whether opening it produces an error. I would need to find a script for doing this.
3) Integrate these "new" files by moving them to the appropriate folders on my computer.

Does the plan seem reasonable?
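Something like this sketch is what I have in mind for step 2. It is untested, and the folder layout (a staging folder named incoming) is just a placeholder; the idea is to read every file end-to-end and report any read error:

```shell
# Stand-in setup: a staging folder of freshly copied files.
# In reality, "incoming" would be the folder grsync copied into.
mkdir -p incoming/folder1
printf 'hello\n' > incoming/item4
printf 'world\n' > incoming/folder1/item5

# Step 2 sketch: read every regular file end-to-end. A read failure
# (I/O error, truncated copy) is reported; a silent pass means each
# file is at least fully readable, though not necessarily well-formed.
find incoming -type f -exec sh -c '
    cat -- "$1" > /dev/null || printf "READ ERROR: %s\n" "$1"
' _ {} \;
```

Of course, reading a file this way only catches I/O-level errors; a file whose bytes were silently scrambled would still read cleanly.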
Last edited by LockBot on Wed Dec 28, 2022 7:16 am, edited 1 time in total.
Reason: Topic automatically closed 6 months after creation. New replies are no longer allowed.
mikeflan
Level 17
Posts: 7141
Joined: Sun Apr 26, 2020 9:28 am
Location: Houston, TX

Re: Checking files for corruption (my situation & plan)

Post by mikeflan »

Are we talking about just a single folder on the HFS+ partition? If not, then duplicate filenames might be an issue when you grsync everything into a single folder.

Probably no need for step "1) grsync". Instead, just use find and pipe its output to something that opens the files.

Why check whether they are corrupted? If they are corrupted on the HFS+ partition, then they are corrupted; what would you do then? Do you think the copy from the HFS+ partition to the ext4 partition can cause corruption? I think that is unlikely.
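For example, something like this untested sketch. The demo folder usbdemo stands in for wherever the HFS+ partition is actually mounted (somewhere under /media, most likely), since that path will vary:

```shell
# Stand-in for the mounted HFS+ partition; in practice, point find
# at the real mount point instead of "usbdemo".
mkdir -p usbdemo/photos
printf 'data\n' > usbdemo/photos/pic1
printf 'junk\n' > usbdemo/.DS_Store    # hidden Mac metadata, skipped below

# Walk the partition in place, skipping hidden files and folders,
# and read each remaining file where it sits -- no copy step first.
find usbdemo -name '.*' -prune -o -type f -exec sh -c '
    cat -- "$1" > /dev/null || printf "READ ERROR: %s\n" "$1"
' _ {} \;
```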
Netherprovinc3
Level 4
Posts: 456
Joined: Mon Feb 04, 2019 9:29 pm

Re: Checking files for corruption (my situation & plan)

Post by Netherprovinc3 »

mikeflan wrote: Fri Oct 15, 2021 9:53 am Are we talking about just a single folder on the HFS+ partition? If not, then duplicate filenames might be an issue when you grsync everything into a single folder.
On the HFS+ partition, there are folders, hidden folders, files, and hidden files.

The hidden files and folders were probably just created automatically when the files were written to the drive; they are likely bits of metadata that are helpful to the Mac operating system.

I will not "bring over" the hidden files and hidden folders.

I will bring over the (non-hidden) folders and the (non-hidden) files. I will also bring over the contents of the (non-hidden) folders.

Still, you make a good point. When I move the files to their ultimate destinations, I need to make sure not to overwrite any file that happens to have the same name.

Further, when moving between some file systems, you can encounter problems with file names. A Linux file system will not let a file and a folder with the same name coexist in the same directory. A Windows file system considers a file name with uppercase letters "indistinguishable" from the same name in lowercase, so two such files collide.
mikeflan wrote: Fri Oct 15, 2021 9:53 am Probably no need to do "1) grsync". Instead just use find and pipe the output to open the files.
Are you proposing some other command to move the files from one partition to another? Maybe grsync is not so good when working with certain file systems (like HFS+).
I'll have to read up more on the commands. I am not clear on how the "find" command fits in; I guess there is some way to use "find" to cycle through a list of files and open each one.
mikeflan wrote: Fri Oct 15, 2021 9:53 am Why check to see if they are corrupted? If they are corrupted on the hfs+ partition then they are corrupted. What to do then? Do you think the copy from hfs+ partition to ext4 partition can cause corruption? I think not likely.
They could have become corrupted at the time they were written to the ext4 drive. Perhaps that is very unlikely. Once the files are on my ext4 drive, I would think they do not get written again when moved into folders. Instead, a move within the same file system just updates the directory entry that points to the data's location. I suppose that new pointer could get corrupted as it is written, but that should be extremely unlikely.
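That intuition can actually be checked with a small sketch (the paths here are made up): on Linux, moving a file within one filesystem is a rename, and the file keeps its inode number, which means the data blocks themselves were not rewritten.

```shell
# Move a file within one filesystem and compare inode numbers before
# and after; an unchanged inode means only the directory entry moved.
tmp=$(mktemp -d)
printf 'some data\n' > "$tmp/item4"
before=$(stat -c %i "$tmp/item4")    # inode number before the move
mkdir "$tmp/folder1"
mv "$tmp/item4" "$tmp/folder1/item4"
after=$(stat -c %i "$tmp/folder1/item4")
[ "$before" = "$after" ] && echo "same inode: data was not rewritten"
rm -rf "$tmp"
```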
mikeflan
Level 17
Posts: 7141
Joined: Sun Apr 26, 2020 9:28 am
Location: Houston, TX

Re: Checking files for corruption (my situation & plan)

Post by mikeflan »

It's fairly obvious to me that you are not going to get this done to your satisfaction with just a few terminal commands. If we are talking about only 30 folders, I would suggest you do it mostly manually. The hardest part in my view is identifying file4 and file5 from file1, 2, and 3, and cross referencing folders whose names may not be identical (as seen by the Linux filesystem). Once that is done a simple copy may get the job done.

If you really think the hfs+ partition files are suspect, then maybe you should rename them (place "hfsplus-" at the beginning of the filename) before copying them over. That "hfsplus-" can easily be removed later if needed, although name collisions could still occur.
Lady Fitzgerald
Level 15
Posts: 5813
Joined: Tue Jan 07, 2020 3:12 pm
Location: AZ, SSA (Squabbling States of America)

Re: Checking files for corruption (my situation & plan)

Post by Lady Fitzgerald »

I use a folder/file syncing program called FreeFileSync to copy large numbers of files. There is a "hack" that can be done to FreeFileSync to enable it to verify that file copies were made without corruption. Setting up a profile for a one-time copy is a bit time consuming, but since I use FreeFileSync daily for updating backups, I already have multiple profiles saved. For a one-time copy operation, I just change the file paths in an existing profile, run it, and don't save the changed profile afterwards.
Jeannie

To ensure the safety of your data, you have to be proactive, not reactive, so, back it up!
t42
Level 11
Posts: 3742
Joined: Mon Jan 20, 2014 6:48 pm

Re: Checking files for corruption (my situation & plan)

Post by t42 »

Creating a checksum list at the source and copying it to the destination allows you to monitor for file corruption at every stage.

Code: Select all

# at the source: checksum every file into a list
find . -type f -exec md5sum {} \; > list_01.md5
# at the destination (with list_01.md5 copied alongside the files): verify
md5sum -c list_01.md5
# show only the mismatches
md5sum -c list_01.md5 | grep -i failed
-=t42=-
Netherprovinc3
Level 4
Posts: 456
Joined: Mon Feb 04, 2019 9:29 pm

Re: Checking files for corruption (my situation & plan)

Post by Netherprovinc3 »

mikeflan wrote: Wed Oct 20, 2021 8:21 am It's fairly obvious to me that you are not going to get this done to your satisfaction with just a few terminal commands. If we are talking about only 30 folders, I would suggest you do it mostly manually. The hardest part in my view is identifying file4 and file5 from file1, 2, and 3, and cross referencing folders whose names may not be identical (as seen by the Linux filesystem). Once that is done a simple copy may get the job done.
I am interested in an automated process for just doing a basic check for file corruption when the files are first copied over (before the files are integrated with my other files).

I am ok with "integrating" the files from the hfs+ partition in with my files through a manual process (using the Linux GUI).
mikeflan wrote: Wed Oct 20, 2021 8:21 am If you really think the hfs+ partition files are suspect, then maybe you should rename them (place "hfsplus-" at the beginning of the filename) before copying them over. That "hfsplus-" can easily be removed later if needed, although name collisions could still occur.
The goal is just to check them all; I am not worried about the check being completely foolproof. Knowing for certain whether a file is corrupted is almost impossible, unless you have actually been monitoring it since it was first created.
Netherprovinc3
Level 4
Posts: 456
Joined: Mon Feb 04, 2019 9:29 pm

Re: Checking files for corruption (my situation & plan)

Post by Netherprovinc3 »

Lady Fitzgerald wrote: Wed Oct 20, 2021 10:55 am I use a folder/file syncing program called FreeFileSync to copy large numbers of files. There is a "hack" that can be done to FreeFileSync to enable it to verify file copies were made without corruption. Set up of a profile for a onetime copy is a bit time consuming but, since I use FreefileSync daily for updating backups, I already have multiple profiles saved so, for a onetime copy operation, I just change the file-paths on an existing profile, run it, then don't save the changed profile afterwards.
What do you see as the advantage of FreeFileSync vs Back In Time?
Maybe it's the "hack" that you described?
Netherprovinc3
Level 4
Posts: 456
Joined: Mon Feb 04, 2019 9:29 pm

Re: Checking files for corruption (my situation & plan)

Post by Netherprovinc3 »

t42 wrote: Wed Oct 20, 2021 12:27 pm Creating checksum list at the source and copying it to the destination allows to monitor file corruption at all stages.

Code: Select all

find . -type f -exec md5sum {} \; > list_01.md5
md5sum -c list_01.md5
md5sum -c list_01.md5 | grep -i failed
In this particular situation, I am not worried about file corruption being introduced when copying from the HFS+ partition to the ext4 partition. Unless, of course, there is a known problem with moving files from HFS+ to ext4 using the GUI of the Cinnamon desktop. If there is such a problem, then I guess I'll have to figure out a way to deal with that.

Instead, I am just concerned that the files on the HFS+ partition may already be corrupted as they sit right now. There is no "source" copy with a much lower chance of corruption. But there are some other copies that I could "dig through" if needed to replace a corrupted file.

The steps are:
1) Move the files to ext4 (all into one big folder).
2) Open each one to see if there is an error, unless there is a better way to test this. By better, I mean very little additional work for a much more accurate result.
3) Manually put each of these files and folders into various spots on my ext4 drive.

It's really just step 2 that I am soliciting help on.

Your code could be useful for some other work that I have to do, but right now I am probably more intrigued by Lady Fitzgerald's use of FreeFileSync. I am just being honest, because you have been more than kind and I don't want you to spend time on this if there are other things you would prefer to spend your time on.

I admit that your method might be better, but it might also be less user friendly. I really need to get more comfortable with bash commands. I still hide under the bed when I see a man page :|
Lady Fitzgerald
Level 15
Posts: 5813
Joined: Tue Jan 07, 2020 3:12 pm
Location: AZ, SSA (Squabbling States of America)

Re: Checking files for corruption (my situation & plan)

Post by Lady Fitzgerald »

Netherprovinc3 wrote: Fri Oct 22, 2021 1:20 am
Lady Fitzgerald wrote: Wed Oct 20, 2021 10:55 am I use a folder/file syncing program called FreeFileSync to copy large numbers of files. There is a "hack" that can be done to FreeFileSync to enable it to verify file copies were made without corruption. Set up of a profile for a onetime copy is a bit time consuming but, since I use FreefileSync daily for updating backups, I already have multiple profiles saved so, for a onetime copy operation, I just change the file-paths on an existing profile, run it, then don't save the changed profile afterwards.
What do you see as the advantage of FreeFileSync vs Back In Time?
Maybe it's the "hack" that you described?
Comparing Back in Time to FreeFileSync is comparing apples to kumquats. Back in Time is an imaging program that makes snapshots, similar to Timeshift. It's intended for backing up your operating system. It's not intended to back up data, especially when one has a large amount of data.

FreeFileSync, on the other hand, is designed for backing up data, not the operating system. When set to Mirror mode (not the same as RAID 1), after comparing a source folder, partition, or drive to a destination folder, partition, or drive, it copies new and changed data from the source to the destination and deletes any files on the destination that are no longer on the source. The end result is essentially a clone of the source on the destination.
Netherprovinc3
Level 4
Posts: 456
Joined: Mon Feb 04, 2019 9:29 pm

Re: Checking files for corruption (my situation & plan)

Post by Netherprovinc3 »

Lady Fitzgerald wrote: Fri Oct 22, 2021 3:28 am Comparing Back in Time to FreeFileSync is comparing apples to kumquats. Back in Time is an imaging program that makes snapshots, similar to Timeshift. It's intended for backing up your operating system. It's not intended to back up data, especially when one has a large amount of data.
You are right that Back in Time seems most often used to back up the data in a user's home directory. I back up everything in the home directory except for (1) some files that the operating system creates or edits, and (2) virtual machines. However, you can use Back in Time to back up any files: you just choose the source files and the destination. When you take a snapshot (the term they use for backing up), the program writes any new files to the destination. Behind the scenes, I believe the program uses the rsync command. One advantage of rsync is that it only writes changes to the destination, which might save some disk wear and tear since you aren't rewriting all of the data every time you run a backup.

It sounds like the two programs (Back in Time and FreeFileSync) are similar. One nice thing about Back in Time is that I think it comes "preloaded" with what you would naturally want to exclude when backing up the home directory. But that might not be of as much use to you.

With Back in Time, I wouldn't be concerned about the backup copy being corrupted (while the source file is not), because the program will pick up the difference the next time you run the backup and write a good copy of the file then (or in one of the subsequent backups that you run).
Lady Fitzgerald
Level 15
Posts: 5813
Joined: Tue Jan 07, 2020 3:12 pm
Location: AZ, SSA (Squabbling States of America)

Re: Checking files for corruption (my situation & plan)

Post by Lady Fitzgerald »

Netherprovinc3 wrote: Fri Oct 22, 2021 6:44 am
Lady Fitzgerald wrote: Fri Oct 22, 2021 3:28 am Comparing Back in Time to FreeFileSync is comparing apples to kumquats. Back in Time is an imaging program that makes snapshots, similar to Timeshift. It's intended for backing up your operating system. It's not intended to back up data, especially when one has a large amount of data.
You are right that Back in Time seems most often used to back up the data in a user's home directory. I back up everything in the home directory except for (1) some files that the operating system creates/ edits; and (2) virtual machines. However, you can use Back in Time to back up any files. You just choose the source files and the destination. When you take a snapshot (the term that they use for backing up), the program will write any new files to the destination. Behind the scenes, I believe the program is using the rsync command. One advantage of rsync is that it just "writes" changes to the destination. This might save some disk wear and tear since you aren't writing all of the data every time you run a backup.

It sounds like the 2 programs (Back in Time and FreeFileSync) are similar. One nice thing about Back in Time is that I think it comes "preloaded" what you would naturally want to exclude when backing up the home directory. But, that might not be as much use to you.

With Back in Time, I wouldn't be concerned about the backup being corrupted (and the source file not being corrupted) because the program will pick up the difference the next time you run the backup and write a good copy of the file then (or in one of the subsequent backups that you run).
Unlike FreeFileSync, which is well documented, having its own website with a manual, tutorials, and a user forum, Back in Time is piddle-poorly documented. I had one "heckuva" time finding any information on how it actually works "beneath the hood", and I'm still pretty hazy on exactly how it works. It appears to be an attempt at an all-in-one program that is sorta feature rich but doesn't do everything well (remember Nero?). It works similarly to Timeshift.

FreeFileSync is essentially a smart copy program. Unlike Back in Time, the end result is essentially a clone of the original drive, partition, or folder (differences include not replicating the UUID of a drive when backing up a drive or partition). In the case of a backup of an entire drive, you could use the backup as a direct replacement of a dead data drive in a computer by yanking out the dead drive and popping in the backup drive (while possible, it's not recommended since you would have to horse around with permissions, fstab entries, etc. afterwards; it would be better to pop in a replacement drive, then restore the data to it from the backup drive).

Two things that FreeFileSync has that Back in Time appears to lack are Versioning (files deleted from a backup are sent to a Versioning folder instead of disappearing) and (unofficially but still effectively) Verification of file copies. Verification immediately picks up on a file that became corrupted when copied to the backup destination. The danger of letting either backup program fix the corruption the next time you update the backup is that, if the original file fails before the next update, you will lose it. Verification allows you to find the problem quickly so you can fix it quickly (just run the backup again; since only the one file is involved, it will literally take less than a minute).

If the original file becomes corrupted (stuff happens), either program will replace the good file in the backup with the corrupted one. Versioning will send the deleted good file to a Versioning folder, where you can catch it and restore over the corrupted file with a simple file-manager copy and paste.
mikeflan
Level 17
Posts: 7141
Joined: Sun Apr 26, 2020 9:28 am
Location: Houston, TX

Re: Checking files for corruption (my situation & plan)

Post by mikeflan »

The goal is just to check them all
I would use the file command to check them, but I'm not sure it is fool-proof. I don't have many corrupt files on my system, but I did find a few corrupt jpg files. Here is one:
http://www.mflan.com/temp/1212.jpg

Here are the results of the file command on it:

Code: Select all

$ file 1212.jpg
1212.jpg: ASCII text, with very long lines, with no line terminators
A valid JPEG would be reported as "JPEG image data", so the ASCII result shows the file is not a well-formed image; I say it worked on that file. If you have corrupt files you can test it on, please do.
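To run that check over a whole batch, something like this sketch might work (untested; the folder name checkdemo and the .jpg-only filter are made-up examples). It asks file for the MIME type and flags any .jpg whose content does not look like a JPEG:

```shell
# Stand-in data: a text file mislabeled as .jpg, like a corrupt image.
mkdir -p checkdemo
printf 'not really an image\n' > checkdemo/1212.jpg

# Flag every .jpg whose bytes do not actually look like a JPEG;
# `file -b --mime-type` reports the type detected from the content.
find checkdemo -type f -iname '*.jpg' -exec sh -c '
    mime=$(file -b --mime-type "$1")
    [ "$mime" = "image/jpeg" ] || printf "SUSPECT: %s (%s)\n" "$1" "$mime"
' _ {} \;
```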