Problem:
On a server, I have a directory /opt/kafka/data/topics.
$ du -hs /opt/kafka/data/topics
52M /opt/kafka/data/topics
When I tar this directory like
$ tar czfv /tmp/topics.tar.gz /opt/kafka/data/topics
I get a file size that makes sense
$ ls -alh /tmp/topics.tar.gz
-rw-r--r-- 1 user user 11M Jan 12 15:15 kafka
However, when I download topics.tar.gz to my local OS X computer and extract it, it occupies 10GB!
Upon examining the contents of /opt/kafka/data/topics on the server more closely, I noticed that according to ls it contains many 10MB files:
$ find /opt/kafka/data -type f -exec ls -alh {} \;
... [output]
-rw-r--r-- 1 user user 10M Jan 12 02:45 /opt/kafka/data/topics/user-entities-KTABLE-REDUCE-STATE-STORE-0000000178-changelog-1/00000000000000000000.index
-rw-r--r-- 1 user user 10M Jan 12 02:45 /opt/kafka/data/topics/user-entities-KSTREAM-KEY-SELECT-0000000123-repartition-2/00000000000000000012.index
... [and many more]
du reports that each of these 10MB files is 0 bytes:
$ du -h /opt/kafka/data/topics/user-entities-KTABLE-REDUCE-STATE-STORE-0000000178-changelog-1/00000000000000000000.index
0 /opt/kafka/data/topics/user-entities-KTABLE-REDUCE-STATE-STORE-0000000178-changelog-1/00000000000000000000.index
So, what is going on? Obviously I am missing something here:
- du reports 52M total. This makes sense because the device that /opt/kafka/data is mounted on is only 5GB, df reports that it’s only 2% full, and everything is still working.
- tar gzips the contents to 10M. This makes sense too.
- ls reports that many of the files are 10M on disk, and when I extract the archive I get 10GB.
- du reports that each of these same files is 0 bytes.
- mount reports: /dev/sdc on /opt/kafka/data type ext4 (rw,relatime,data=ordered)
Nothing adds up. Is there some kind of transparent on-disk compression I am not aware of?
Solution:
Based on the discussion in the comments, all the files are sparse. This type of thing actually confuses a lot of people the first time they deal with it, so don’t feel bad.
What’s actually going on here with the values reported by ls and du?
This is most easily explained with an example.
Say you create an empty file, and then write 1MB of data to it starting right at the beginning. The resultant file will be 1MB in size, and take up 1MB on disk. Both ls and du will report the same 1MB size for the file.
Now say instead you create an empty file, and then call seek() to move 1MB into the file, and then write one byte. The resultant file will appear to be 1MB + 1 byte long, but there’s only actually 1 byte of data in it.
On older filesystems, writing the second file would have taken much longer than writing a single byte should, because the OS would be busy writing out 1MB of null bytes before it wrote out that final 1 byte of actual data.
This inefficiency (both in terms of time to create the file, and space used on disk) is where sparse files come in. Instead of writing out that 1MB of null bytes, an OS that supports sparse files (like all modern UNIX systems) will annotate in that filesystem’s metadata that the region from 0-1MB is empty, and then only store that single byte you wrote. As a result, the file will appear to be 1MB + 1 byte long, but on disk it will only take up the space needed for that 1 byte (in practice one filesystem block, typically 4KB on ext4). Additionally, when something goes to read that file, any regions the OS has annotated as empty will just read back as null bytes (so it looks no different to user programs from the first file).
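The two files from this example can be recreated with a short sketch like the following. It assumes GNU coreutils (dd, stat, cmp) on a Linux filesystem that supports sparse files, such as ext4. The names demo_dir, dense, and sparse are hypothetical and exist only for this demonstration.

```shell
# Hypothetical scratch directory and file names, used only for this demo.
demo_dir=$(mktemp -d)
dense="$demo_dir/dense.bin"
sparse="$demo_dir/sparse.bin"

# First file: write 1 MiB of real (zero) bytes starting at offset 0.
dd if=/dev/zero of="$dense" bs=1M count=1 2>/dev/null

# Second file: seek 1 MiB into a new file, then write a single byte.
printf 'x' | dd of="$sparse" bs=1 seek=1048576 2>/dev/null

# %s is the apparent size in bytes (1048576 and 1048577 here);
# %b is the number of blocks actually allocated on disk.
stat -c '%s bytes apparent, %b blocks allocated: %n' "$dense" "$sparse"

# The hole reads back as null bytes, so the first 1 MiB of both files
# compares equal even though only one of them stores that data.
cmp -n 1048576 "$dense" "$sparse" && echo "first 1 MiB is identical"

# Clean up afterwards with: rm -rf "$demo_dir"
```

On a filesystem with sparse file support, the second file should show far fewer allocated blocks than the first, even though its apparent size is one byte larger.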
This is where the discrepancy between the values reported by ls and du comes from. By default, ls reports the apparent size of files (that is, how much data you would read if you started reading the file at the first byte and read all the way to the end), while du reports the actual space used on disk by the file (usually not including other space saving tricks done by the OS like transparent compression). du agrees with df in this case because df only reports the amount of space that is actually physically being used on disk.
By changing that ls -l command to ls -ls, you will get an extra column showing the actual on-disk size for files, which should agree with du.
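To see the two sizes side by side, here is a small sketch, again assuming GNU coreutils (truncate, ls, du) and a filesystem with sparse file support. demo_file is a hypothetical scratch file. It is sized to 10M only to resemble the index files in the question and has nothing to do with how Kafka creates them.

```shell
# Hypothetical scratch file: extend an empty file to 10 MiB without writing data.
demo_file=$(mktemp)
truncate -s 10M "$demo_file"

# The extra first column from -s is the on-disk allocation;
# the usual size column still shows the 10M apparent size.
ls -lsh "$demo_file"

# du reports on-disk usage by default...
du -h "$demo_file"

# ...and the apparent size (what ls -l shows) when asked for it.
du -h --apparent-size "$demo_file"

# Clean up afterwards with: rm -f "$demo_file"
```

Running du with and without --apparent-size over a whole directory is a quick way to confirm whether sparse files explain a gap like the one between 52M and 10GB.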