Using cp to backup files with preserved original timestamps?

Posted: Thu Feb 14, 2013 10:00 pm
by mykec
Hi. I keep a lot of FLAC files on a 2TB hard drive, and periodically I make a fresh backup of the entire drive to another 2TB hard drive reserved specifically as a backup drive. Because it takes 24 hours to copy a full 2TB hard drive, I don't do this as often as I should. It would be nice if I could easily back up only the FLACs on my source drive that are different and leave the rest alone if they are the same. Both cp and mv have "update" switches, but they only copy files that are either newer on the source drive or non-existent on the destination drive.

This wouldn't be a problem if it wasn't for the fact that I frequently update my FLACs' metadata using EasyTag which preserves the original timestamps after updating the files. I *like* how EasyTag leaves my original timestamps alone but I suspect this also makes it impossible for cp and mv to tell the difference between an older FLAC with no metadata and a newer one updated using EasyTag.

I searched the man pages for cp and mv and found no options for comparing file sizes when performing an update. In other words, it would be nice if their update switches could also be made to back up files which are *different* on the source drive and not just *newer*.

I am currently exploring rsync as an alternative because it supports checksumming the files to verify accuracy of the backups made - but good grief, that seems like so much extra work and time wasted when a simple filesize comparison would suffice.
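The rule being asked for here (copy when the destination is missing, older, or a different size) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a replacement for cp or rsync: the names `needs_copy` and `backup_tree` are made up for this example, and the timestamp comparison assumes both drives store modification times with the same precision.

```python
import os
import shutil


def needs_copy(src, dst):
    """Return True if dst is missing, a different size, or older than src."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    # The size check catches edits that kept the original timestamp.
    if s.st_size != d.st_size:
        return True
    # Same idea as an "update" copy: the source is newer than the destination.
    return s.st_mtime > d.st_mtime


def backup_tree(src_root, dst_root, dry_run=False):
    """Copy files that need it; return their paths relative to src_root."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.normpath(os.path.join(dst_root, rel_dir, name))
            if needs_copy(src, dst):
                if not dry_run:
                    os.makedirs(os.path.dirname(dst), exist_ok=True)
                    shutil.copy2(src, dst)  # copy2 also copies the timestamps
                copied.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(copied)
```

Calling `backup_tree(src, dst, dry_run=True)` first only lists what would be copied, which is a cheap way to check the rule against a small folder before trusting it with a whole drive.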


Re: Using cp to backup files with preserved original timesta

Posted: Fri Feb 15, 2013 2:52 am
by bjornmu
Never had a need for this myself, but according to the man page, rsync should by default do just what you need: compare the file size and copy files that have changed in size. It can't hurt to try; remember to add the -v option for verbose so it writes out which files it copies.

Re: Using cp to backup files with preserved original timesta

Posted: Fri Feb 15, 2013 4:42 am
by mykec
As I type I am 8 1/2 hours into a full 2TB backup operation using "rsync -a [src] [dest]". I should have used -v as well but, truth be told, I'm not particularly concerned about knowing exactly which files are currently being copied. I can instead right-click on the drive icon and view its "properties" to see the blue and yellow pie chart and number of GBs copied so far - which for me is quite sufficient. I suppose the absence of verbosity should decrease the time required to copy the entire drive because there's just that much less work to be done as it goes along.

In the past, using cp, I've been able to copy 2TB in 24 hours. So far with rsync, I'm at 333.1 GB in 8 1/2 hours - so this is going about half-speed compared to cp in the past and, therefore, turning a 1 day job into a 2 day job. Sheesh.
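The "about half-speed" estimate can be checked with a little arithmetic. The sketch below uses only the figures quoted in the post and assumes that 2 TB means 2000 GB (decimal units), which is a guess about how the drive reports its size.

```python
# Figures from the post; treating 2 TB as 2000 GB is an assumption.
TOTAL_GB = 2000.0

cp_rate = TOTAL_GB / 24.0   # GB per hour for the earlier cp run
rsync_rate = 333.1 / 8.5    # GB per hour observed so far with rsync

ratio = rsync_rate / cp_rate
projected_hours = TOTAL_GB / rsync_rate

print(f"cp:    {cp_rate:.1f} GB/h")
print(f"rsync: {rsync_rate:.1f} GB/h ({ratio:.0%} of cp)")
print(f"projected rsync total: {projected_hours:.0f} hours")
```

Under that assumption rsync is running at roughly 47% of the cp rate, which projects to about 51 hours for the full drive if the rate holds.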

rsync, as I understand it, doesn't just compare filesizes but actually *checksums* the source and destination files as it writes them to verify accuracy of its operation. That may be crucial when copying files over a network but I'm just copying from one external USB drive to another on a single machine. It seems reasonable to believe it's the *checksum* operation that's doubling the copy time. A simple bytesize comparison would be sufficient and, most likely, much less time-consuming than full-blown checksumming of every file.

Regardless, my original point is wondering how to make occasional backups based on bytesize difference in addition to whether the source file is newer or non-existent on the destination drive. Because when I update the metadata of a FLAC file, the file *has* been changed but the timestamp appears to be the same as it was before. Would that not fool "cp" into ignoring a modified FLAC since 'cp --update' uses timestamp to decide whether or not to copy?
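A small experiment shows why a timestamp-only rule misses this kind of edit. The sketch below is a stand-in, not EasyTag's actual behaviour: `retag_keeping_mtime` is a hypothetical helper that appends bytes to a file and then restores the old timestamps, and the two check functions are simplified versions of "copy if newer" and "copy if size differs".

```python
import os
import tempfile


def newer_only(src, dst):
    """Timestamp-only rule: copy when dst is missing or src is newer."""
    return not os.path.exists(dst) or os.stat(src).st_mtime > os.stat(dst).st_mtime


def size_differs(src, dst):
    """Size rule: copy when dst is missing or the byte counts differ."""
    return not os.path.exists(dst) or os.stat(src).st_size != os.stat(dst).st_size


def retag_keeping_mtime(path, extra):
    """Append bytes, then put the old timestamps back (hypothetical tag edit)."""
    st = os.stat(path)
    with open(path, "ab") as f:
        f.write(extra)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))


def demo():
    """Return (newer_only result, size_differs result) after a 'silent' edit."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "song.flac")
        dst = os.path.join(tmp, "backup.flac")
        for p in (src, dst):
            with open(p, "wb") as f:
                f.write(b"audio")
        st = os.stat(src)
        os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))  # identical timestamps
        retag_keeping_mtime(src, b"ARTIST=Example")
        return newer_only(src, dst), size_differs(src, dst)
```

Here `demo()` reports that the timestamp-only rule would skip the file while the size rule would copy it. One caveat: FLAC files can carry padding, so a tag edit that fits inside the padding may leave the size unchanged as well, and then neither rule would notice.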

Re: Using cp to backup files with preserved original timesta

Posted: Fri Feb 15, 2013 11:31 pm
by bt101
My understanding of rsync is that its default behavior is to copy the file if either one of two things has changed:
  • file date
  • file size
If both of the above are the same, you can use the -c (checksum) option so that a checksum is done on both files. However, there is a penalty: the files on each side need to be read in full and the CPU needs to compute the checksums. Certainly beneficial if you have a slow connection (YMMV otherwise).
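The difference between the two kinds of comparison can be sketched as follows. This is an illustration of the idea only, not rsync's implementation: SHA-256 is swapped in for whatever checksum rsync uses, modification times are compared to whole seconds as a simplification, and the function names are invented for this example.

```python
import hashlib
import os


def quick_check_differs(src, dst):
    """Size-and-date comparison; only file metadata is read."""
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def file_digest(path, chunk_size=1 << 20):
    """Hash a whole file in chunks; every byte has to be read from disk."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_differs(src, dst):
    """Content comparison; catches changes that keep both size and date."""
    return file_digest(src) != file_digest(dst)
```

The quick check costs two `stat` calls per file, while the checksum comparison reads both copies end to end, which is where the time penalty on a 2TB drive comes from.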

I would recommend that you do allow your program to change the modify date when it modifies the file (if that is possible). After all the file is indeed being modified. Also, it just makes the rsync much simpler.

I would highly recommend using rdiff-backup though. It is a backup program that is a conglomeration of various building blocks (one of them being rsync). Its biggest benefit is that it only backs up changed blocks. It's perfect for cases like yours where you have tons of data and only a few changes to send. I back up entire machines with it and it goes very fast.
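The "only changed blocks" idea can be illustrated with fixed-size blocks and a hash per block. This is a deliberately simplified sketch with invented function names: real delta tools built on the rsync algorithm use rolling checksums so they can also match data that has shifted position, which this version cannot do.

```python
import hashlib


def block_hashes(data, block_size=4096):
    """Hash each fixed-size block of a byte string."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old, new, block_size=4096):
    """Return indexes of blocks in new that differ from old or lie past its end."""
    old_h = block_hashes(old, block_size)
    new_h = block_hashes(new, block_size)
    return [i for i, h in enumerate(new_h) if i >= len(old_h) or old_h[i] != h]
```

For a large file where only a few bytes near the start were edited in place, only the first block or two would be reported, so only those would need to be stored or sent.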

The backup "area" consists of a mirror copy of your last backup and a (let's call it) proprietary area that holds changed blocks for previous backups. So getting a file from the latest backup is a no-brainer. You just use a file manager. Getting older versions of files does require some commands though.

One big caveat though... I don't think there is an option in rdiff-backup for checksum, so if your files remain the same size, you will have to allow the dates to change for rdiff-backup to pick up on the change.

Re: Using cp to backup files with preserved original timesta

Posted: Sat Feb 16, 2013 5:00 am
by mykec
Personally, I prefer the timestamps on my FLACs to reflect the date and time the file itself was created. Many of my FLACs were ripped from my CDs in 2005, 2006, 2007, etc. I did not have a program for modifying the artists and titles, etc. in FLACs until the past year or so when I began using EasyTag. I don't want a 2013 timestamp on a 2005 FLAC just because I finally gave it some meta-data that could change again at any time. If I modify a FLAC using some editor like SoX or Audacity, then yes I absolutely want a new timestamp on the file - because changes like that are substantive, whereas metadata is cosmetic at best. And I would *hate* it, for example, if a related group of FLACs all had matching 2009 timestamps except one because in 2013 I discovered I'd accidentally misspelled its title. I might see that 2 years from now and think I'd modified the music in the file using SoX or Audacity instead of merely correcting a typo. Yeah, in a textbook sense, a changed file is a changed file, but in a practical sense the original rip/FLAC date is more useful to me because I can look at that and know exactly how long ago I created it.

As for rdiff-backup... Thanks, I've never heard of it. So let me get this straight... Since it only backs up changed *blocks*, I should be able to make some fairly radical revisions to the way my files are organized within the directory tree and not cause them to be re-copied during an rdiff-backup run, because the actual data in the blocks to which they were written was never changed, right? Only the blocks containing my changes to the directory structure would be backed up, right? If so, wow! Because I do tend to make big changes to the organization of my files over time without actually changing the files themselves.

Re: Using cp to backup files with preserved original timesta

Posted: Sat Feb 16, 2013 3:06 pm
by bjornmu
I see the point; I have a similar issue with pictures from my digital camera. I've always used gphoto2 to download off the camera, and that sets the time of the file to when the picture was taken (read from the EXIF data, I presume), which I find very useful. Then when I finally go through old pictures and fix them up in Gimp (unsharp mask), I want to keep that timestamp. But in my case I don't overwrite the original but create new copies, so the problem isn't the same.

I still wonder why rsync apparently wants to copy all files. Are you sure the timestamp of the existing file in the destination is identical to the source? And taking twice as long as cp also sounds strange. It might be useful to try with a small set of files, adding -v to make sure you see which files are processed.

Re: Using cp to backup files with preserved original timesta

Posted: Sat Feb 16, 2013 6:37 pm
by bt101
Yes, rdiff-backup should only send the changed blocks in a file. I note that you mentioned "I should be able to make some fairly radical revisions to the way my files are organized". If you mean that you would (for example) move a whole folder to a new location, then (I'm guessing) all of those files would be considered new as their path did not exist before. Anyway, whatever changes you make to your files and filesystem, it should still result in less traffic/time than most other backup methods.
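The guess about moved folders follows from how a path-based comparison sees the two trees. The sketch below is only an illustration of that reasoning, with invented function names, and does not claim to reflect how any particular backup tool tracks files.

```python
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def path_diff(source_root, mirror_root):
    """Compare two trees by path only: (new in source, gone from source)."""
    src = relative_files(source_root)
    mir = relative_files(mirror_root)
    return sorted(src - mir), sorted(mir - src)
```

If a folder is renamed on the source drive, every file in it shows up in the first list under its new path and in the second list under its old one, even though no file contents changed.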

I originally encountered the same problem as you when I noticed that I had a TrueCrypt volume that was being missed by rdiff-backup. TrueCrypt has the "feature" where it also does not modify the change date of the archive file (it tries to be stealthy). In the end, I had to turn off that feature for the backup to work.

I shouldn't belabour the point about your file dates as you have likely considered the following already. If your file dates are that important, perhaps it is better to organize the files in folders based on date. If you rely on the filesystem change date, that info can get lost if you decide to copy parts or all of your collection to other locations. Or perhaps there is some way to store that date in the meta-data you mentioned.
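One way to keep the dates from getting lost, short of writing them into the FLAC tags, is to record them in a separate file. The sketch below is one possible approach under stated assumptions: the JSON manifest format and the names `save_mtimes` and `restore_mtimes` are invented for this example, and the manifest is assumed to live outside the music folder.

```python
import json
import os


def save_mtimes(root, manifest_path):
    """Record each file's modification time (nanoseconds) by relative path."""
    times = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            times[os.path.relpath(full, root)] = os.stat(full).st_mtime_ns
    with open(manifest_path, "w", encoding="utf-8") as f:
        json.dump(times, f, indent=2, sort_keys=True)
    return times


def restore_mtimes(root, manifest_path):
    """Reapply recorded modification times to files that still exist under root."""
    with open(manifest_path, encoding="utf-8") as f:
        times = json.load(f)
    restored = 0
    for rel, mtime_ns in times.items():
        full = os.path.join(root, rel)
        if os.path.isfile(full):
            atime_ns = os.stat(full).st_atime_ns
            os.utime(full, ns=(atime_ns, mtime_ns))  # keep atime, reset mtime
            restored += 1
    return restored
```

With the original dates saved this way, the files themselves could be allowed to take a new modification date when retagged, and the rip dates could be looked up or reapplied later.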