du -sb giving different results for 2 directories with same contents?

Problem :

I have a directory A with 1000 sub-directories (000-999) each containing 3500 .jpg files. I wrote a PHP script that copied each of those files to another directory B in the exact same structure, namely 1000 sub-directories each containing 3500 .jpg files, except:

  1. .jpg files are renamed to new names
  2. sub-directories keep their names, but each now contains different files than its counterpart in directory A

This script ran for about 20 hours. When it finally finished, I ran the following in the parent directory of A and B to get their apparent sizes:

du -sb *

Interestingly, here's what came up:

74778240380   A
74809644412   B

I then ran another PHP script over B, and it turned out to contain exactly the same number of files as A. Now I'm at a loss.

Why are the du -sb results different? Is there another way to verify that the copy succeeded and that B is a perfect duplicate of A?

Solution :

You say that you copied the JPEG files with new names. 
If the new names are substantially longer than the old ones,
then the new directories (the subdirectories of “B”)
will be bigger than the old directories (the subdirectories of “A”). 
(Yes, directories take up space, and du counts that space.) 
Your size delta (74809644412 – 74778240380) is 31404032, which is approximately 1000 × 31404. 
This is consistent with each of the 1000 subdirectories getting 31404 bytes bigger (on average). 
If each of the 3500 JPEG files’ names got nine characters longer (on average), that would do it.
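The arithmetic above can be reproduced in the shell, and the explanation can be tested by summing regular files and directories separately. This is a minimal sketch, not a definitive procedure: `sum_bytes` is a hypothetical helper name, it assumes GNU find (for `-printf`), and it assumes that the du in use counts the directories' own sizes as the answer describes.

```shell
#!/bin/sh
# Sizes reported by `du -sb` in the question.
size_a=74778240380
size_b=74809644412

# Total difference, and the average growth per sub-directory (1000 of them).
delta=$((size_b - size_a))
per_dir=$((delta / 1000))

# Average growth per file name (3500 files per sub-directory), two decimals.
per_file=$(awk -v d="$delta" 'BEGIN { printf "%.2f\n", d / 1000 / 3500 }')

echo "delta=$delta per_dir=$per_dir per_file=$per_file"

# sum_bytes ROOT TYPE
# Hypothetical helper: adds up the apparent size in bytes of every entry of
# one type under ROOT. TYPE is a find type letter: f (regular files) or
# d (directories). Requires GNU find for -printf.
sum_bytes() {
    find "$1" -type "$2" -printf '%s\n' |
        awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Intended use (run in the parent directory of A and B):
#   sum_bytes A f; sum_bytes B f    # file bytes only
#   sum_bytes A d; sum_bytes B d    # directory bytes only
```

If the copy is intact and only the names changed, the two file totals should be equal, and the difference between the two directory totals should be close to the du delta.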

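To compare the two trees, use the diff command. Note that diff matches files by path, so files that were renamed during the copy will be reported as present in only one tree: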

diff -qr A B

From the man page, -q and -r mean:

   -q     Report only whether the files differ, not the details of the
          differences.

   -r     When comparing directories, recursively compare any
          subdirectories found.
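For a check that ignores file names, one option is to compare content hashes instead of paths. The sketch below is an illustration under assumptions, not part of the original answer: `hash_list` and `same_contents` are hypothetical helper names, and it relies on `sha256sum` and `mktemp` being available (as they are with GNU coreutils). It treats the two trees as equal when they hold the same set of file contents, each the same number of times, wherever those files sit and whatever they are called.

```shell
#!/bin/sh
# hash_list ROOT
# Print one SHA-256 hash per regular file under ROOT, sorted, names dropped.
hash_list() {
    find "$1" -type f -exec sha256sum {} + | awk '{ print $1 }' | sort
}

# same_contents ROOT_A ROOT_B
# Exit status 0 when both trees contain the same file contents the same
# number of times; 1 when they differ; 2 if a temp file cannot be created.
same_contents() {
    list_a=$(mktemp) || return 2
    list_b=$(mktemp) || { rm -f "$list_a"; return 2; }
    hash_list "$1" > "$list_a"
    hash_list "$2" > "$list_b"
    cmp -s "$list_a" "$list_b"
    status=$?
    rm -f "$list_a" "$list_b"
    return "$status"
}

# Intended use (run in the parent directory of A and B):
#   same_contents A B && echo "contents match" || echo "contents differ"
```

Hashing roughly 75 GB twice takes a while, but it reads each file only once per tree and does not depend on how the files were renamed.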
