Can I easily make rsync checksum only files that are the same size?

linx255
Level 5
Posts: 570
Joined: Mon Mar 17, 2014 12:43 am

Postby linx255 » Thu Jul 27, 2017 1:37 am

Hi,

I'm hoping not to have to write a script for this, but I'm wondering: is there an easy way to run rsync in checksum mode so that it first compares all file sizes and then checksums only the files that are the same size? That way it would sync any files that differ without having to checksum every single file in the specified directories. I mean, it doesn't make sense to checksum a file if you already know the size doesn't match. Or does checksum mode already take that into account? I need checksum mode because sometimes my files change without changing size, but with checksumming the sync takes far too long.

Thanks
- I'm running Mint 18 Mate 64-bit
- 4.4.0-21-generic x86_64
- All my bash scripts begin with #!/bin/bash

xenopeek
Level 24
Posts: 21460
Joined: Wed Jul 06, 2011 3:58 am
Location: The Netherlands

Re: Can I easily make rsync checksum only files that are the same size?

Postby xenopeek » Thu Jul 27, 2017 5:06 am

rsync updates files if the last modified time and/or the size differs between source and destination. It doesn't use checksums for that unless you tell it to. If you modify a file and its size remains the same, its last modified time is updated anyway so that would cause rsync to update it.

rsync always uses checksumming to verify that a file was transferred without errors.

From the rsync manpage:
-c, --checksum

This changes the way rsync checks if the files have been changed and are in need of a transfer. Without this option, rsync uses a "quick check" that (by default) checks if each file’s size and time of last modification match between the sender and receiver. This option changes this to compare a 128-bit checksum for each file that has a matching size. Generating the checksums means that both sides will expend a lot of disk I/O reading all the data in the files in the transfer (and this is prior to any reading that will be done to transfer changed files), so this can slow things down significantly.

The sending side generates its checksums while it is doing the file-system scan that builds the list of the available files. The receiver generates its checksums when it is scanning for changed files, and will checksum any file that has the same size as the corresponding sender’s file: files with either a changed size or a changed checksum are selected for transfer.

Note that rsync always verifies that each transferred file was correctly reconstructed on the receiving side by checking a whole-file checksum that is generated as the file is transferred, but that automatic after-the-transfer verification has nothing to do with this option’s before-the-transfer "Does this file need to be updated?" check.

For protocol 30 and beyond (first supported in 3.0.0), the checksum used is MD5. For older protocols, the checksum used is MD4.
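
To make the quoted behaviour concrete, here is a minimal Python sketch of the "does this file need to be updated?" decision as the man page describes it. It is only a model, not rsync's actual code: the names md5sum and needs_transfer are made up for illustration, and comparing modification times to the nanosecond is a simplifying assumption.

```python
import hashlib
import os


def md5sum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_transfer(src, dst, checksum=False):
    """Hypothetical model of the pre-transfer check described in the man page."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    if s.st_size != d.st_size:
        # Sizes differ: the file is selected in either mode and nothing is hashed.
        return True
    if checksum:
        # Checksum mode: files of matching size are compared by content hash.
        return md5sum(src) != md5sum(dst)
    # Quick check: sizes match, so only the modification time decides.
    # (Exact nanosecond comparison is an assumption of this sketch.)
    return s.st_mtime_ns != d.st_mtime_ns
```

Calling needs_transfer("a.tar", "backup/a.tar", checksum=True) with placeholder paths would only read the file contents when both sizes match, which is the behaviour the original question asks for.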

linx255
Level 5
Posts: 570
Joined: Mon Mar 17, 2014 12:43 am

Re: Can I easily make rsync checksum only files that are the same size?

Postby linx255 » Thu Jul 27, 2017 11:29 am

This changes the way rsync checks if the files have been changed and are in need of a transfer. Without this option, rsync uses a "quick check" that (by default) checks if each file’s size and time of last modification match between the sender and receiver.


Right, and I have to use -c because I have several very large archives that change size sometimes but not always, and checksum is necessary for the occasional time when the contents but not the size change.

So it would be nice if I could, sort of, 'hybridize' the quick check and checksum. If the file sizes don't match (which doesn't take long to find out) then we already know the checksums won't match, so why spend time checksumming files that don't match in size?

With rsync checksumming every file, my sync takes several hours, compared to the several minutes it takes now that I'm using cp instead of rsync. I suppose this isn't really a problem, except that I had to update a bunch of code and I lose rsync's options for excluding certain files.

It seems that if I want to stick with rsync, in practice I'd have to write a script that makes a list of files that don't match in size, syncs them, and then checksums all the others, syncing only the mismatches. Too bad there isn't an option for this.
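
For illustration only, the script described here could look roughly like the Python sketch below. The man page excerpt quoted earlier in the thread says -c already checksums only files of matching size, so this should not be needed in practice. The names file_digest and plan_sync are hypothetical, and the sketch only looks at regular files found under the source tree.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_sync(src_root, dst_root):
    """Return sorted relative paths under src_root that differ from dst_root."""
    to_copy = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.isfile(dst):
                to_copy.append(rel)  # missing on the destination
            elif os.path.getsize(src) != os.path.getsize(dst):
                to_copy.append(rel)  # size mismatch: no hashing needed
            elif file_digest(src) != file_digest(dst):
                to_copy.append(rel)  # same size, different contents
    return sorted(to_copy)
```

The returned list could be written to a file, one path per line, and handed to rsync's --files-from option, though that step is left out here.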

MintBean
Level 9
Posts: 2543
Joined: Fri Aug 07, 2015 6:54 am
Location: Blighty

Re: Can I easily make rsync checksum only files that are the same size?

Postby MintBean » Thu Jul 27, 2017 11:53 am

If the contents change, the file's 'last modified' time changes too, hence rsync will catch it.

linx255
Level 5
Posts: 570
Joined: Mon Mar 17, 2014 12:43 am

Re: Can I easily make rsync checksum only files that are the same size?

Postby linx255 » Thu Jul 27, 2017 1:42 pm

Oh wait, so then checksum is only useful for obscure situations where the 'last modified' time and the size are the same but the contents aren't. If that's the case, maybe I don't need to use checksum after all. Hmm.
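
As a small demonstration of that obscure situation, the Python sketch below builds two files with the same size and the same modification time but different contents. The helper names and the fixed timestamp are made up for the demo, and quick_check_equal is only a simplified stand-in for rsync's quick check, not its real implementation.

```python
import hashlib
import os
import tempfile


def quick_check_equal(a, b):
    """True when size and modification time match (simplified quick check)."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and sa.st_mtime_ns == sb.st_mtime_ns


def content_equal(a, b):
    """Compare contents via MD5; reads whole files, which is fine for a tiny demo."""
    def digest(path):
        with open(path, "rb") as fh:
            return hashlib.md5(fh.read()).hexdigest()
    return digest(a) == digest(b)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")
        with open(src, "wb") as fh:
            fh.write(b"AAAA")
        with open(dst, "wb") as fh:
            fh.write(b"AAAB")  # same length, different bytes
        # Force identical timestamps, as a tool that restores mtimes might do.
        stamp = (1_500_000_000, 1_500_000_000)  # arbitrary (atime, mtime) in seconds
        os.utime(src, stamp)
        os.utime(dst, stamp)
        return quick_check_equal(src, dst), content_equal(src, dst)


if __name__ == "__main__":
    print(demo())  # (True, False): the quick check sees no change, the hash does
```

Unless something resets the modification time like this, an edit normally bumps it, so the default quick check is usually enough.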

