Can I easily make rsync checksum only files that are the same size?

Topics in this forum are automatically closed 6 months after creation.
linx255
Level 5
Posts: 668
Joined: Mon Mar 17, 2014 12:43 am

Can I easily make rsync checksum only files that are the same size?

Post by linx255

Hi,

I'm hoping not to have to write a script for this, but I'm wondering: is there an easy way to run rsync in checksum mode that first compares all file sizes and then checksums only the files that are the same size? It would then sync any files that differ without having to checksum every single file in the specified directories. I mean, it doesn't make sense to checksum a file if you already know its size doesn't match. Or does checksum mode already take that into account? I need checksum mode because sometimes my files change without changing size, and syncing takes far too long.

Thanks
Last edited by LockBot on Wed Dec 28, 2022 7:16 am, edited 1 time in total.
Reason: Topic automatically closed 6 months after creation. New replies are no longer allowed.
- I'm running Mint 18 Mate 64-bit
- 4.15.0-34-generic x86_64
- All my bash scripts begin with #!/bin/bash
xenopeek
Level 25
Posts: 29588
Joined: Wed Jul 06, 2011 3:58 am

Re: Can I easily make rsync checksum only files that are the same size?

Post by xenopeek

By default, rsync updates a file if its last-modified time and/or its size differs between source and destination. It doesn't use checksums for that unless you tell it to. If you modify a file and its size stays the same, its last-modified time is still updated, so rsync will update it.
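A minimal Python sketch of the default quick check described above, assuming both copies are reachable as local paths. It compares only size and last-modified time and never reads file contents. The function name `quick_check_differs` and the whole-second mtime comparison are simplifications for illustration, not rsync's actual code.

```python
import os
import tempfile


def quick_check_differs(src: str, dst: str) -> bool:
    """True if a size + mtime 'quick check' would pick src for transfer.

    No file data is read; only metadata from os.stat() is compared.
    """
    if not os.path.exists(dst):
        return True  # nothing on the receiving side yet
    s, d = os.stat(src), os.stat(dst)
    # Whole-second mtime comparison is a simplifying assumption here.
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = os.path.join(tmp, "src.txt"), os.path.join(tmp, "dst.txt")
        for path in (src, dst):
            with open(path, "w") as f:
                f.write("same data")
            os.utime(path, (1_000_000, 1_000_000))  # identical mtimes
        print(quick_check_differs(src, dst))  # False: size and mtime match
        with open(src, "a") as f:
            f.write("!")  # appending changes the size (and bumps the mtime)
        print(quick_check_differs(src, dst))  # True
```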

rsync always uses checksumming to verify that a file was transferred without errors.
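To illustrate the idea of a checksum generated while the data is being transferred and then used to confirm the result, here is a hypothetical `copy_and_verify` helper in Python. It is only an analogy: rsync's own transfer code is more involved, and MD5 is an assumption chosen for this sketch.

```python
import hashlib

CHUNK = 64 * 1024  # read size, an arbitrary choice for this sketch


def copy_and_verify(src: str, dst: str) -> str:
    """Copy src to dst, hash the data as it streams, then re-read dst to check it.

    Raises OSError if the written file does not hash to the same value.
    """
    sent = hashlib.md5()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while chunk := fin.read(CHUNK):
            sent.update(chunk)  # checksum generated during the transfer
            fout.write(chunk)
    received = hashlib.md5()
    with open(dst, "rb") as f:
        while chunk := f.read(CHUNK):
            received.update(chunk)
    if sent.digest() != received.digest():
        raise OSError(f"checksum mismatch after copying {src!r}")
    return sent.hexdigest()
```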

From the rsync manpage:
-c, --checksum

This changes the way rsync checks if the files have been changed and are in need of a transfer. Without this option, rsync uses a "quick check" that (by default) checks if each file’s size and time of last modification match between the sender and receiver. This option changes this to compare a 128-bit checksum for each file that has a matching size. Generating the checksums means that both sides will expend a lot of disk I/O reading all the data in the files in the transfer (and this is prior to any reading that will be done to transfer changed files), so this can slow things down significantly.

The sending side generates its checksums while it is doing the file-system scan that builds the list of the available files. The receiver generates its checksums when it is scanning for changed files, and will checksum any file that has the same size as the corresponding sender’s file: files with either a changed size or a changed checksum are selected for transfer.

Note that rsync always verifies that each transferred file was correctly reconstructed on the receiving side by checking a whole-file checksum that is generated as the file is transferred, but that automatic after-the-transfer verification has nothing to do with this option’s before-the-transfer "Does this file need to be updated?" check.

For protocol 30 and beyond (first supported in 3.0.0), the checksum used is MD5. For older protocols, the checksum used is MD4.
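The selection rule in the manpage excerpt can be sketched in Python as follows, assuming local paths and top-level files only (`checksum_mode_plan` is a hypothetical name). It follows the excerpt's split of work: the sender hashes every regular file while building its list, and the receiver hashes only files whose size matches. So with -c the receiver already skips size mismatches, but the sender still reads every file, which is one reason -c stays expensive even when many sizes differ.

```python
import hashlib
import os


def md5_file(path: str) -> bytes:
    """MD5 digest of a file, read in 64 KiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.digest()


def checksum_mode_plan(src_dir: str, dst_dir: str) -> list:
    """Top-level file names a --checksum style comparison would select."""
    # Sender side: every regular file is hashed while the file list is built,
    # before anyone knows whether the sizes match.
    sender = {}
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if os.path.isfile(path):
            sender[name] = (os.path.getsize(path), md5_file(path))
    # Receiver side: only files whose size matches are read and hashed.
    selected = []
    for name, (size, digest) in sender.items():
        dst = os.path.join(dst_dir, name)
        if not os.path.isfile(dst) or os.path.getsize(dst) != size:
            selected.append(name)  # missing or size differs: no hashing here
        elif md5_file(dst) != digest:
            selected.append(name)  # same size, different contents
    return selected
```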
linx255
Level 5
Posts: 668
Joined: Mon Mar 17, 2014 12:43 am

Re: Can I easily make rsync checksum only files that are the same size?

Post by linx255

Quote from the rsync manpage:
This changes the way rsync checks if the files have been changed and are in need of a transfer. Without this option, rsync uses a "quick check" that (by default) checks if each file’s size and time of last modification match between the sender and receiver.
Right, and I have to use -c because I have several very large archives that change size sometimes but not always, and checksum is necessary for the occasional time when the contents but not the size change.

So it would be nice if I could sort of 'hybridize' the quick check and checksum mode. If the file sizes don't match (which is quick to find out), then we already know the checksums won't match, so why spend time checksumming files whose sizes differ?

With rsync checksumming every file, my sync takes several hours, compared to the several minutes it takes now that I've switched to cp instead of rsync. I suppose that isn't really a problem, except that I had to update a bunch of code and I lose the rsync options that exclude certain files.

If I want to stick with rsync, it looks like I'd practically have to write a script that makes a list of files that don't match in size, syncs them, and checksums all the others, syncing only the mismatches. Too bad there isn't an option for this.
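A minimal Python sketch of the script described above, assuming both trees are mounted locally (a sync over the network would have to read the remote side on the remote machine). The hypothetical `build_transfer_list` compares sizes first, hashes a file on both sides only when its sizes match, and prints relative paths.

```python
import hashlib
import os
import sys


def md5_file(path: str) -> str:
    """Hex MD5 digest of a file, read in 64 KiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def build_transfer_list(src_root: str, dst_root: str) -> list:
    """Relative paths under src_root that need syncing to dst_root.

    Sizes are compared first; a file is hashed (on both sides) only when its
    size matches the destination copy, so size mismatches cost no data reads.
    """
    changed = []
    for dirpath, _subdirs, files in os.walk(src_root):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.isfile(dst) or os.path.getsize(src) != os.path.getsize(dst):
                changed.append(rel)  # missing or different size: no hashing
            elif md5_file(src) != md5_file(dst):
                changed.append(rel)  # same size but different contents
    return sorted(changed)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python build_list.py SRC_DIR DST_DIR > changed.txt
    for rel in build_transfer_list(sys.argv[1], sys.argv[2]):
        print(rel)
```

The resulting list could then be handed to rsync with its `--files-from` option, for example `rsync -a -I --files-from=changed.txt SRC_DIR/ DST_DIR/` (script name and paths are placeholders). `-I` (`--ignore-times`) turns off the quick check so that listed files whose size and mtime happen to match are still updated; check the manpage for how `--files-from` interacts with `-a` and `-r` before relying on this.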
MintBean

Re: Can I easily make rsync checksum only files that are the same size?

Post by MintBean

If the contents change, the file's 'last modified' time changes, so rsync will catch it.
linx255
Level 5
Level 5
Posts: 668
Joined: Mon Mar 17, 2014 12:43 am

Re: Can I easily make rsync checksum only files that are the same size?

Post by linx255

Oh wait, so checksum mode is only useful for obscure situations where the 'last modified' time and size are the same but the contents aren't? If that's the case, maybe I don't need to use checksum mode after all. Hmm.
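The obscure case can be reproduced with a short Python sketch. An ordinary write updates the mtime, which is why the quick check usually catches edits; here the old timestamp is deliberately put back with `os.utime` to mimic, for example, a tool that preserves timestamps. Under that assumption the size and mtime still match, and only a content checksum notices the change. The function names are hypothetical.

```python
import hashlib
import os
import tempfile


def quick_check_same(a: str, b: str) -> bool:
    """Size + whole-second mtime match (a simplified default quick check)."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def contents_same(a: str, b: str) -> bool:
    """Full-content comparison via MD5, as a checksum-mode check would do."""
    def digest(path: str) -> bytes:
        with open(path, "rb") as f:
            return hashlib.md5(f.read()).digest()
    return digest(a) == digest(b)


def demo(tmp: str) -> tuple:
    src, dst = os.path.join(tmp, "src"), os.path.join(tmp, "dst")
    for path in (src, dst):
        with open(path, "wb") as f:
            f.write(b"version 1")
        os.utime(path, (1_000_000, 1_000_000))
    # Same-size edit, then the old mtime is restored: the quick check is blind.
    with open(src, "r+b") as f:
        f.write(b"VERSION 1")
    os.utime(src, (1_000_000, 1_000_000))
    return quick_check_same(src, dst), contents_same(src, dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(tmp))  # (True, False): quick check says "same", checksum disagrees
```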
