File corruption when copying to FAT32 flash drive

LessStuffMoreLife

File corruption when copying to FAT32 flash drive

Post by LessStuffMoreLife »

I've recently installed Mint 17.2 with Cinnamon on a newer Dell laptop and everything seems to be working smoothly except for one thing: I'm running into issues copying large (1 to 1.5 GB) MP4 files onto my flash drive. I have run into this problem intermittently in the past with other Ubuntu-based distros (mostly Netrunner, maybe Xubuntu), on other laptops and with other flash drives.

The problem is that when I copy a file (ext4 to FAT32), it is sometimes unreadable, or the image is simply distorted, when I attempt to play it back on my TV, which can read various multimedia files directly from USB. However, if I transfer the file in Windows, the videos play back without issue.

This has me worried because I keep all of my old photos and important documents backed up on external drives for safekeeping, and I need to be able to trust my OS to keep the files intact during transfer. Does anyone know what could be causing this data corruption? The only commonality over time seems to be an Ubuntu-based operating system, as the hardware has changed several times. Thank you so much for any help!
ofb

Re: File corruption when copying to FAT32 flash drive

Post by ofb »

I don't know what the cause of this corruption is. I will mention that if you need to be sure of any device-to-device transfer, then you should use a method with error checking, such as checksums. Rsync does this by default, and Grsync is a very nice GUI for this tool.

One of its many nice features is sessions, so for example I have a few sessions pre-configured for my different USB sticks, USB HDD, etc.
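For example, a plain rsync run to the stick might look like this (the file name and mount point are just placeholders; -rtv rather than -a because FAT32 can't store Unix permissions or ownership):
rsync -rtv --progress ~/Videos/movie.mp4 /media/$USER/USBSTICK/
Grsync just builds this sort of command line for you from its GUI options.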
niowluka

Re: File corruption when copying to FAT32 flash drive

Post by niowluka »

The only thing I can think of: do you unmount the USB drive before removing it from the laptop?
ganamant

Re: File corruption when copying to FAT32 flash drive

Post by ganamant »

Maybe the USB drive has bad sectors that Windows marked in a way that Linux doesn't understand. Linux then tries to write to those bad sectors anyway and you get corrupted files. Have you tried with a new drive?

P.S. Flash memory can only take so many write cycles before it becomes unreliable.
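If you want to test that theory, a read-only badblocks scan is one way to see whether the stick itself is failing. /dev/sdX below is a placeholder; check the actual device node with lsblk first, and make sure the stick is unmounted:
lsblk
sudo badblocks -sv /dev/sdX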
ofb

Re: File corruption when copying to FAT32 flash drive

Post by ofb »

Oh, I must say it's not unheard of to have bad transfers by USB. In my own experience this has happened with known-good USB sticks that pass badblocks. That's what caused me to extend my Grsync use to cover anything but completely casual USB transfers. It just wasn't worth the once-in-a-blue-moon aggravation of finding a corrupt file, and then distrusting the rest.
Mute Ant

Re: File corruption when copying to FAT32 flash drive

Post by Mute Ant »

An OS won't even see bad blocks until the store runs out of spares. An SSD or USB-Flash might switch to read-only at that point. Magnetic stores just get horribly slow and rack up SMART errors.

USB 1, USB 2 and SD cards all use a CRC16 to detect transmission errors in each data block (8192 data bits + 16 'parity' bits). That leaves roughly 1 in 65536 corrupted blocks effectively "untested": the data is rubbish at the receiver but the CRC still matches. Hopefully you would see other tested-and-rejected blocks and realise the link was bad. The odds of CRC16 allowing rubbish to be stored are roughly the same as throwing 6 dice and getting all sixes. So it's imaginable, but not number one on the list of causes.

The normal check on Linux systems is to make an md5sum file at the source end, send it along with the data, and check that it matches at the receiver. Every Live-DVD contains an example. That's a 128-bit check for each file. Try rolling 48 dice so they all come up sixes :lol:
ofb

Re: File corruption when copying to FAT32 flash drive

Post by ofb »

Since we've gone there, might as well include the code. :)

Create a list of md5sums for an archive:
find . -type f -not -name md5sum.txt -print0 | xargs -0 md5sum >> md5sum.txt

Then, in the transferred copy of the archive, run this check, which will only output the FAILED entries:
md5sum -c md5sum.txt | grep -v "OK$"

The same check can be run later to check an archive for bit rot. One caveat: when you make a new list of md5sums, that command will append to any old md5sum.txt that may be sitting there, not overwrite it. So either remove the old one when finished, or give the new list a distinct name, like putting the date in it.
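Something like this should do it (just a sketch; the filename pattern is only a suggestion, and the -not -name glob keeps earlier lists out of the new one):
find . -type f -not -name 'md5sum-*.txt' -print0 | xargs -0 md5sum > "md5sum-$(date +%F).txt"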
Buzzsaw

Re: File corruption when copying to FAT32 flash drive

Post by Buzzsaw »

ofb wrote:I don't know what the cause of this corruption is. I will mention that if you need to be sure of any device to device transfer then you should use a method with error checking, such as checksums. Rsync does this by default and Grsync is a very nice GUI for this tool.
No it doesn't. See this page.
Buzzsaw

Re: File corruption when copying to FAT32 flash drive

Post by Buzzsaw »

You said that "The only commonality over time seems to be an Ubuntu based operating system as the hardware has changed several times." But by 'hardware' did you include the computer? If not, then the problem might be caused by bad RAM. Linux might be using a bad part of the RAM for its copying operations, while Windows is using a different part.
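If you want to rule that out, memtest86+ from the boot menu is the thorough way; a quicker (less thorough) check from a running session is the memtester package. The size and loop count below are just examples:
sudo apt-get install memtester
sudo memtester 1024M 1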
LessStuffMoreLife

Re: File corruption when copying to FAT32 flash drive

Post by LessStuffMoreLife »

Thank you everyone for your great responses! To answer niowluka, I always unmount/eject my drives before removing them, and to answer Buzzsaw, I've run into this problem with a couple of different ASUS netbooks as well as my new Dell laptop. Grsync sounds very promising, so I think I'll give that a try first. I hope it doesn't come down to running md5sums every time I want to transfer a file. That could get tedious, and I believe it would just tell me whether my data was corrupted without actually correcting the errors. I would like to have my transfers checked and corrected on the fly; I had assumed that this was just a basic core function of any modern OS.
ofb

Re: File corruption when copying to FAT32 flash drive

Post by ofb »

Buzzsaw wrote:No it doesn't. See this page.
'Yes it does.' :)

Hear me out. I did read and re-read that, plus quite a bit else. I've been convinced to agree a couple of times this afternoon, but please follow me through this:

First we have the problem of the "misleading" man page:
Note that rsync always verifies that each _transferred_ file was correctly reconstructed on the receiving side by checking a whole-file checksum that is generated as the file is transferred, but that automatic after-the-transfer verification has nothing to do with this option’s before-the-transfer "Does this file need to be updated?" check.
If rsync doesn't checksum-verify each transfer by default, then why was this even included in the explanation of the -c 'always compare checksums' option? And why hasn't it been removed after all these years?

Here (finally! I did a /lot/ of reading before getting here :) ):
https://rsync.samba.org/how-rsync-works.html

If you don't already know what the unrelated 'block checksum' refers to, then you'll want to read the whole page. I just mean to highlight the key sentences that explain that rsync does in fact do a default checksum-verify of each transfer, like the man page says, and in a way that explains why the guys investigating on the other page didn't notice enough of a 'pause' or a read() call.
The Sender
The sender process reads the file index numbers and associated block checksum sets one at a time from the generator.

For each file id the generator sends it will store the block checksums and build a hash index of them for rapid lookup.

Then the local file is read and a checksum is generated for the block beginning with the first byte of the local file. This block checksum is looked for in the set that was sent by the generator, and if no match is found, the non-matching byte will be appended to the non-matching data and the block starting at the next byte will be compared. This is what is referred to as the “rolling checksum”

If a block checksum match is found it is considered a matching block and any accumulated non-matching data will be sent to the receiver followed by the offset and length in the receiver's file of the matching block and the block checksum generator will be advanced to the next byte after the matching block.

Matching blocks can be identified in this way even if the blocks are reordered or at different offsets. This process is the very heart of the rsync algorithm.

In this way, the sender will give the receiver instructions for how to reconstruct the source file into a new destination file. These instructions detail all the matching data that can be copied from the basis file (if one exists for the transfer), and include any raw data that was not available locally. At the end of each file's processing a whole-file checksum is sent and the sender proceeds with the next file.

Generating the rolling checksums and searching for matches in the checksum set sent by the generator require a good deal of CPU power. Of all the rsync processes it is the sender that is the most CPU intensive.

The Receiver
The receiver will read from the sender data for each file identified by the file index number. It will open the local file (called the basis) and will create a temporary file.

The receiver will expect to read non-matched data and/or to match records all in sequence for the final file contents. When non-matched data is read it will be written to the temp-file. When a block match record is received the receiver will seek to the block offset in the basis file and copy the block to the temp-file. In this way the temp-file is built from beginning to end.

The file's checksum is generated as the temp-file is built. At the end of the file, this checksum is compared with the file checksum from the sender. If the file checksums do not match the temp-file is deleted. If the file fails once it will be reprocessed in a second phase, and if it fails twice an error is reported.

After the temp-file has been completed, its ownership and permissions and modification time are set. It is then renamed to replace the basis file.

Copying data from the basis file to the temp-file makes the receiver the most disk intensive of all the rsync processes. Small files may still be in disk cache, mitigating this, but for large files the cache may thrash as the generator has moved on to other files and there is further latency caused by the sender. As data is read, possibly at random, from one file and written to another, if the working set is larger than the disk cache, then what is called a seek storm can occur, further hurting performance.
That's where I am with it anyway. And I've got to put it down for a while. Let's see if it doesn't wake me tonight with a different worry or insight.
Buzzsaw

Re: File corruption when copying to FAT32 flash drive

Post by Buzzsaw »

Does rsync create temp files when copying locally? I'm not sure it does. If it doesn't, then I can't see how it can compare checksums for locally copied files.
Mute Ant

Re: File corruption when copying to FAT32 flash drive

Post by Mute Ant »

[+] "Does rsync create temp files when copying locally?" Yes it does. Set Show Hidden Files and watch the destination. Or interrupt an rsync and you get half-finished droppings.

[-] When the source and destination share the same kernel, they share the same disk cache RAM. rsync is checking a RAM copy of what was sent, not what actually got stored.

[-] Even if you do a sync and clear the kernel's RAM records using /proc/sys/vm/drop_caches, what you read back from the device can still be what was sent, held in the USB stick's own RAM, not what was stored.

[+] There's no substitute for removing the device from the system, powering it off, powering it back on, remounting it, and reading the contents back against an md5sum list (see the sketch after this list). It's why a DVD burn ends by ejecting and reloading the disc, to be sure the verification is against what's on the medium, not what's in RAM.

[+] Some file formats like FLAC or ZIP contain their own CRC. They are easy to check for corruption without a separate md5sum.

[+] BTRFS detects bitrot and refuses to serve damaged data masquerading as the original.

[+] BTRFS can be set up to save two copies of everything. Obviously that halves the capacity of your store...but now errors can be corrected too.
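For the drop-caches and verify-after-remount points above, a rough sketch might look like this (the mount point is a placeholder; adjust to wherever your stick lands):
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
Then unplug the stick, plug it back in, and check it against the list made at the source:
cd /media/$USER/USBSTICK && md5sum -c md5sum.txt | grep -v "OK$"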
ofb

Re: File corruption when copying to FAT32 flash drive

Post by ofb »

Oho. Check this six-post thread on the rsync user list.

"rsync doesn't checksum for local transfers?"
https://lists.samba.org/archive/rsync// ... 29814.html
niowluka

Re: File corruption when copying to FAT32 flash drive

Post by niowluka »

I think the thread has drifted somewhat off topic. I've been using Linux for over 10 years and never had to use rsync and/or checksums to copy data to USB. If it's some critical data, fair enough, but copying a movie should not require this sort of complexity.
Buzzsaw

Re: File corruption when copying to FAT32 flash drive

Post by Buzzsaw »

ofb wrote:Oho. Check this six-post thread on the rsync user list.

"rsync doesn't checksum for local transfers?"
https://lists.samba.org/archive/rsync// ... 29814.html
Now that I've seen that, it's obvious that rsync doesn't verify local writes, since it would have to flush the cache constantly; otherwise only cached data would be read back for each checksum verification.
ofb

Re: File corruption when copying to FAT32 flash drive

Post by ofb »

More specifically: it doesn't verify writes. It verifies that the transfer across the network was good, then the write is done, and it does not verify the write at all. That was made clear either in that thread or in the ones I found after it. All packed up at the moment; it's nighttime here and I'm in near-sleep mode.

It's still a sweet tool for making backups for a number of reasons, but it is not useful for the purpose I suggested to LessStuffMoreLife.