QUESTION :
I’m writing a shell script that uses shasum to check whether the contents of a directory have changed.
On Linux and FreeBSD, shasum behaves the same way when I run shasum <directory>; however, on MacOS shasum gives me hashes for files only.
FreeBSD
$ shasum CONTENTS/
7f986e5e5289c59db1bba48df92ffe4707830aaa CONTENTS/
Linux
$ shasum CONTENTS/
7f986e5e5289c59db1bba48df92ffe4707830aaa CONTENTS/
MacOS
$ shasum CONTENTS/
shasum: CONTENTS/:
How can I calculate the hash of a directory on MacOS?
TRY 1: Using TAR with pipes
I tried piping tar into shasum, but it seems that this tar option doesn’t work on MacOS.
tar cO CONTENTS/ | shasum
tar: Option -O is not permitted in mode -c
da39a3ee5e6b4b0d3255bfef95601890afd80709 -
TRY 2: Using FIND/EXEC
It was consistent between MacOS and FreeBSD, but Linux returned a weird hash
find CONTENTS -type f -exec shasum {} \; | sort -k 2 | shasum
Linux
c2ddb9bc5f543e956f5cdcc76750cb78cc5f26f3
FreeBSD
3ac2a9d4e2fc5d2d2ec3c7f612e680990cc35824
MacOS
3ac2a9d4e2fc5d2d2ec3c7f612e680990cc35824
OTHER FINDINGS ON TAR
tar would be excellent, as it “archives” a folder that I could then pass to shasum; however, the order in which tar “walks” the folder structure is not consistent across operating systems. Some helpers mentioned in the comments that I should use the same version of tar on all systems.
As an example, on system 1 I get this order:
drwxr-xr-x 0 root wheel 0 27 Jul 07:23 usr/
-rw-r--r-- 0 root wheel 0 27 Jul 07:25 usr/aaa
drwxr-xr-x 0 root wheel 0 27 Jul 07:25 usr/f1/
-rw-r--r-- 0 root wheel 0 27 Jul 07:25 usr/f1/aaa
drwxr-xr-x 0 root wheel 0 27 Jul 07:25 usr/f1/f0/
-rw-r--r-- 0 root wheel 0 27 Jul 07:25 usr/f1/f0/aaa
drwxr-xr-x 0 root wheel 0 27 Jul 07:25 usr/f2/
-rw-r--r-- 0 root wheel 0 27 Jul 07:25 usr/f2/aaa
drwxr-xr-x 0 root wheel 0 27 Jul 07:25 usr/f2/f1/
-rw-r--r-- 0 root wheel 0 27 Jul 07:25 usr/f2/f1/aaa
drwxr-xr-x 0 root wheel 0 27 Jul 07:25 usr/f2/f1/f0/
-rw-r--r-- 0 root wheel 0 27 Jul 07:25 usr/f2/f1/f0/aaa
drwxr-xr-x 0 root wheel 0 27 Jul 07:25 usr/f3/
-rw-r--r-- 0 root wheel 0 27 Jul 07:25 usr/f3/aaa
drwxr-xr-x 0 root wheel 0 27 Jul 07:25 usr/f3/f2/
-rw-r--r-- 0 root wheel 0 27 Jul 07:25 usr/f3/f2/aaa
drwxr-xr-x 0 root wheel 0 27 Jul 07:25 usr/f3/f2/f1/
-rw-r--r-- 0 root wheel 0 27 Jul 07:25 usr/f3/f2/f1/aaa
and on system 2 I have the following order:
drwxr-xr-x 0 root wheel 0 27 Jul 07:23 usr/
-rw-r--r-- 0 root wheel 0 27 Jul 07:25 usr/aaa
drwxr-xr-x 0 root wheel 0 27 Jul 07:25 usr/f1/
drwxr-xr-x 0 root wheel 0 27 Jul 07:25 usr/f2/
drwxr-xr-x 0 root wheel 0 27 Jul 07:25 usr/f3/
-rw-r--r-- 0 root wheel 0 27 Jul 07:25 usr/f3/aaa
drwxr-xr-x 0 root wheel 0 27 Jul 07:25 usr/f3/f2/
-rw-r--r-- 0 root wheel 0 27 Jul 07:25 usr/f3/f2/aaa
drwxr-xr-x 0 root wheel 0 27 Jul 07:25 usr/f3/f2/f1/
-rw-r--r-- 0 root wheel 0 27 Jul 07:25 usr/f3/f2/f1/aaa
-rw-r--r-- 0 root wheel 0 27 Jul 07:25 usr/f2/aaa
drwxr-xr-x 0 root wheel 0 27 Jul 07:25 usr/f2/f1/
-rw-r--r-- 0 root wheel 0 27 Jul 07:25 usr/f2/f1/aaa
drwxr-xr-x 0 root wheel 0 27 Jul 07:25 usr/f2/f1/f0/
-rw-r--r-- 0 root wheel 0 27 Jul 07:25 usr/f2/f1/f0/aaa
-rw-r--r-- 0 root wheel 0 27 Jul 07:25 usr/f1/aaa
drwxr-xr-x 0 root wheel 0 27 Jul 07:25 usr/f1/f0/
-rw-r--r-- 0 root wheel 0 27 Jul 07:25 usr/f1/f0/aaa
From a tar standpoint it is all good, but because of the different order, shasum produces a different hash.
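To check that two archives differ only in member order, their listings can be compared after sorting. A rough sketch follows; tar_members_sorted is a hypothetical helper name, and the archive file names in the usage lines are placeholders.

```shell
# Hypothetical helper: print an archive's member names in bytewise order,
# so that listings from different systems can be compared directly.
tar_members_sorted() {
    tar tf "$1" | LC_ALL=C sort
}

# Usage (placeholder file names):
#   tar_members_sorted system1.tar > list1
#   tar_members_sorted system2.tar > list2
#   cmp list1 list2 && echo "same members, possibly in a different order"
```

This only compares member names. The archive bytes can still differ in header format and metadata even when the names match.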
CONCLUSION
shasum is consistent across Linux and the BSDs when hashing an individual file, but for directories the results are consistent only between MacOS and FreeBSD, perhaps because of how files are sorted.
If sorting is enforced using the find command, consistency is obtained only between FreeBSD and MacOS; moreover, this method is prohibitively slow, as it has to calculate a hash for every single file and then a hash for the whole structure.
Using tar to create a temporary file and then running shasum on it was also found to be inconsistent between Linux and the BSDs, perhaps because of differences in the archiving method.
I think the only way forward is to redesign my solution.
ANSWER :
mtree is the tool you want.
Suppose:
$ mkdir foo
$ date > foo/date1; sleep 3
$ date > foo/date2; sleep 3
$ date > foo/date3
$ grep . foo/*
foo/date1:Wed Jul 24 16:11:32 PDT 2019
foo/date2:Wed Jul 24 16:11:35 PDT 2019
foo/date3:Wed Jul 24 16:11:38 PDT 2019
$ find . -ls
7318841 0 drwxr-xr-x 3 admin staff 102 Jul 24 16:11 .
7318847 0 drwxr-xr-x 5 admin staff 170 Jul 24 16:11 ./foo
7318849 8 -rw-r--r-- 1 admin staff 29 Jul 24 16:11 ./foo/date1
7318851 8 -rw-r--r-- 1 admin staff 29 Jul 24 16:11 ./foo/date2
7318853 8 -rw-r--r-- 1 admin staff 29 Jul 24 16:11 ./foo/date3
Create a reference manifest of directory foo and store it in foo.mtree:
$ mtree -c -K sha256digest -p foo > foo.mtree
Now go and mess with any file in that directory.
$ touch foo/date3
Run mtree again and pass it the manifest you created earlier, and mtree will tell you what changed:
$ mtree -p foo < foo.mtree || echo fail
date3 changed
modification time expected Wed Jul 24 16:11:38 2019 found Wed Jul 24 16:14:00 2019
fail
$ date > foo/date2
$ mtree -p foo < foo.mtree || echo fail
date2 changed
modification time expected Wed Jul 24 16:11:35 2019 found Wed Jul 24 16:19:40 2019
SHA-256 expected c76a568f08d98c2830f2fdfb42415c3ec15341b8741450d4bbd863f1d5c4c691 found ddcf8d07785bfe4d031a989339835dc3b8b44653019568dcee612c44fc8e2f70
date3 changed
modification time expected Wed Jul 24 16:11:38 2019 found Wed Jul 24 16:14:00 2019
fail
Any files missing from foo or added since the manifest was created will also be reported:
$ mv foo/date1 foo/date4
$ mtree -p foo < foo.mtree || echo fail
. changed
modification time expected Wed Jul 24 16:11:38 2019 found Wed Jul 24 16:21:38 2019
date2 changed
modification time expected Wed Jul 24 16:11:35 2019 found Wed Jul 24 16:19:40 2019
SHA-256 expected c76a568f08d98c2830f2fdfb42415c3ec15341b8741450d4bbd863f1d5c4c691 found ddcf8d07785bfe4d031a989339835dc3b8b44653019568dcee612c44fc8e2f70
date3 changed
modification time expected Wed Jul 24 16:11:38 2019 found Wed Jul 24 16:14:00 2019
date4 extra
./date1 missing
fail
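Since the question is about changed contents and not timestamps, the keyword set can be narrowed so that a plain touch is not reported. A sketch under that assumption follows. It relies on -k replacing the default keyword set and on -f naming the manifest file, and the function names snapshot_dir and dir_changed are hypothetical. Check the mtree manual page on each platform, since keyword support varies between mtree versions.

```shell
# Hypothetical helpers wrapping mtree for a content-only comparison.

# Record the file type and SHA-256 of everything under directory $1
# in manifest file $2 (keep the manifest outside the directory).
snapshot_dir() {
    mtree -c -k sha256digest -p "$1" > "$2"
}

# Return 0 when directory $1 no longer matches manifest $2 (or mtree
# itself fails), and 1 when the directory is unchanged.
dir_changed() {
    if mtree -p "$1" -f "$2" >/dev/null; then
        return 1
    else
        return 0
    fi
}

# Usage:
#   snapshot_dir foo foo.mtree
#   if dir_changed foo foo.mtree; then echo "foo changed"; fi
```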
Rmlint will do what I think you want.
Relevant points:
- It doesn’t use SHA by default, but can be told to.
- It can be installed on MacOS via homebrew.
- By default it doesn’t calculate a checksum for a single specified directory. It can, however, be told to calculate checksums for all directories below a given starting point as a way of finding “duplicate” directories, and as a side effect this does exactly what you seem to be asking for.
- It may be overkill for what you’re looking for, but it is quite robust.
- Figuring out which flags to use can be tricky. Getting directory checksums is easy enough; getting it to not do other things is harder. (To be clear, it doesn’t actually modify anything. At most, it generates a shell script that you can run manually later to modify things if desired. What you seem to need is the JSON and/or CSV output file, which will give you the directory checksum you’re looking for.)
I use rmlint in a bash script to find duplicate directories. Here is a command that will minimally do what you want, and as little else as possible:
rmlint "base/dir/to/start/from" --see-symlinks --hidden --algorithm=sha256 --types=none,duplicatedirs --no-backup -o csv:log.csv