Problem :
I want to create a ZFS raidz2 with 4 disks. I know this will "waste" at least 50% of the space, but I am aiming for high tolerance against disk failures. Using 5 disks for raidz2 is not recommended, and 6 disks is too expensive and unnecessary for me. I will have 3 TB (or maybe 4 TB) disks, all of which have 4k sectors with 512 byte emulation.
I have read many things about zfs and raidz2 and I am starting to get confused.
Should I partition the disks?
I read that it is a good idea to create partitions in order to make the OS aware that the disk is not empty. The partitioning can also help to align to 4k sectors.
GPT or MBR? As far as I understood GPT is a must for 3 TB disks (and larger).
How should I create the partition? Will this give me a correctly aligned partition?
sudo parted --align optimal /dev/sdX
mklabel gpt
mkpart primary 1 0% 3TB
Bonus question: how big will the partition created by mkpart primary 1 0% 3TB be? Will it be 3*10^12 - 4096 bytes?
My intention is to not use the full size of the disk but to limit myself to 3 TB in order to be able to replace a failed drive with a different model.
Do I need ashift=12 if I have aligned partitions? Is it useless? Does it harm me somehow?
What about the stripe size? Do I need to modify it? What if I use 5 disks instead of 4 disks?
Is there anything else that I should consider?
Solution :
You need to set ashift manually if your disk reports the wrong values, which is the case for disks from the transition period (4K sectors internally, 512 byte sectors logically). Earlier disks correctly report 512 and later disks correctly report 4K. Of course you can always specify it; that simply overrides what the disk itself would report or suggest to the system.
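To make the numbers concrete: ashift is the base-2 logarithm of the sector size ZFS assumes, so 512 bytes corresponds to 9 and 4096 bytes to 12. Here is a minimal sketch of that arithmetic; ashift_for is a made-up helper name for illustration, not part of any ZFS tool.

```shell
# Hypothetical helper (not a ZFS command): print the ashift value,
# i.e. log2, for a sector size given in bytes.
ashift_for() {
    size=$1
    shift_val=0
    # Reject anything that is not a positive power of two.
    if [ "$size" -le 0 ] || [ $(( size & (size - 1) )) -ne 0 ]; then
        echo "not a power of two: $size" >&2
        return 1
    fi
    # Halve the size until it reaches 1, counting the steps.
    while [ "$size" -gt 1 ]; do
        size=$(( size / 2 ))
        shift_val=$(( shift_val + 1 ))
    done
    echo "$shift_val"
}

ashift_for 512     # prints 9
ashift_for 4096    # prints 12
```

On Linux you could feed it what the kernel reports, for example ashift_for "$(cat /sys/block/sdX/queue/physical_block_size)", keeping in mind that a transition-period disk may report 512 there even though it uses 4K sectors internally.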
If you are unsure, you could also create a pool, write down the usable size, then destroy it, create it again with ashift forced to 12 (zpool create -o ashift=12 ...) and compare the sizes. If they are equal, your disks are honest about their sector size.
Partitioning disks should only be necessary if your disks are not recognized correctly or if you want to use less than the full capacity of a disk.
ZFS on Linux now partitions whole disks automatically; earlier versions did not. The result looks like this:
Disk /dev/sdb: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 2E55AB8A-8B22-494E-A971-B6D639BA14B1
Device Start End Sectors Size Type
/dev/sdb1 2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
/dev/sdb9 1953507328 1953523711 16384 8M Solaris reserved 1
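In the listing above, the first partition starts at sector 2048 with 512 byte logical sectors, which is exactly 1 MiB into the disk and therefore a multiple of 4096. If you want to check a start sector yourself, the arithmetic could be sketched like this; is_4k_aligned is a hypothetical helper name chosen for this example.

```shell
# Hypothetical helper: succeeds if a partition start is 4 KiB aligned.
# $1 = start sector, $2 = logical sector size in bytes (defaults to 512)
is_4k_aligned() {
    start_sector=$1
    sector_bytes=${2:-512}
    # Byte offset of the partition start must be a multiple of 4096.
    [ $(( start_sector * sector_bytes % 4096 )) -eq 0 ]
}

# Start sectors taken from the fdisk listing above.
for start in 2048 1953507328; do
    if is_4k_aligned "$start" 512; then
        echo "sector $start: aligned"
    else
        echo "sector $start: NOT aligned"
    fi
done
```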
Also, it will probably use the correct ashift value automatically. My SSD has 13, the above HDD has 12, as expected.
Just do the following:
zpool create tank raidz2 sda sdb sdc sdd
Then check with zdb that ashift appears correct.
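One way to do that check is to filter the pool configuration that zdb -C prints for its ashift lines. This is only a sketch: list_ashift is a made-up helper, it assumes the pool is called tank as in the command above, and zdb usually needs root and an imported pool.

```shell
# Hypothetical helper: read `zdb -C <pool>` output on stdin and
# print each distinct ashift value found, one per line.
list_ashift() {
    sed -n 's/^[[:space:]]*ashift:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' | sort -u
}

# On a real system (assumes a pool named "tank"):
#   zdb -C tank | list_ashift
```

A single line of output reading 12 would mean every vdev in the pool uses 4K sectors.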