I’m using an external hard drive with an ext4 file system for backups. The backup software I use (faubackup) copies the file hierarchy 1:1 into a timestamp-named folder on the drive. For incremental backups, it hard-links every file whose contents have not changed to the same file in the corresponding subfolder of the previous backup. Since a backup drive recently died on me, I now want to make sure that every file written can actually be read without an I/O error, so I know I can rely on my backup.
One way to do so would be to read the whole partition, e.g. by dd’ing it to /dev/null. However, the disk is 3 TB large, and doing so would take about 7 hours (via USB 3.0).
Another way would be to use e2fsck with the -c option, but this also takes ages.
I’m thinking it should be possible to speed the process up by checking not the whole disk but only the files, which occupy just a fraction of the disk. This could be done e.g. by writing all files to a tar archive that is not written to disk but sent to /dev/null. The problem here is the hard linking: if I have, say, 10 incremental backups, the storage they use is again just a fraction of the disk, but to a tool that follows every path it appears to be about 10 times larger than that.
My question: Is there a way to read only the files on the disk, and only one in each set of files hard-linked to the same storage space? Or is there a way to make e2fsck -c or something similar only check the used parts of the file system (allocated blocks)?
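To make the first idea concrete, here is a rough sketch of reading one path per inode, so each set of hard-linked files is read only once. It assumes GNU find, sort, cut and xargs (for -printf and the NUL-separated modes). The function name read_once and the mount point in the usage comment are made up for illustration. The data is piped through wc -c rather than redirected straight to /dev/null, so the total can be checked and so every byte really is read. Read errors reported by cat should show up on stderr.

```shell
#!/bin/sh
# Hypothetical helper: read every regular file below a directory exactly once
# per inode and print the total number of bytes read.
# Assumes GNU find, sort, cut and xargs (NUL-separated records throughout).
read_once() {
    root=$1
    tab=$(printf '\t')
    # Emit "inode<TAB>path" records, NUL-terminated so odd file names survive.
    find "$root" -xdev -type f -printf '%i\t%p\0' |
        # Keep a single record per inode number (field 1).
        LC_ALL=C sort -z -u -n -t "$tab" -k1,1 |
        # Drop the inode column, leaving only the path.
        cut -z -f2- |
        # Read the remaining files; wc -c counts the bytes that came through.
        xargs -0 -r cat -- |
        wc -c
}

# Example call (the mount point is an assumption):
#   read_once /mnt/backup
```

Note that the exit status of the pipeline is that of wc, so a failed read has to be spotted in the error output rather than in the return code.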
GNU tar does not copy the content of hard linked files multiple times. Read the first answer to this question or the official documentation on this topic. You can test this by piping the output (the archive) of tar through wc:
tar cf - -C <mountpoint of your disk> . | wc -c and verify the archive size in bytes (you can compare this to the result with the tar option