
Prepare new hard disks for ZFS/NAS

You can read more on my homelab and datahoarding problem here and here.

Today I scored two recertified 10TB HGST drives for very little. Normally I’d go for the brand-new stuff, but this deal was too good to pass up.

My main goal is to check whether these recertified disks are worth the money/effort. Next up I want to experiment with some new ZFS pool setups. (You still cannot remove a raidz vdev from your pool in 2024¹.)

Current state of affairs

Right now my main ZFS storage pool looks like this:

  • tank
    • raidz1 4x 3TB WD Red
    • raidz1 4x 8TB WD White
    • raidz1 4x 14TB WD White

All in all that’s good for 100TB of raw storage and roughly 75TB usable. And yet, it’s getting full. Linux ISOs take up a lot of space. I also have a 2TB pool (2x 1TB SSD mirrors) for VM storage and a single 3TB drive as a backup staging disk (i.e. backups are copied there, then uploaded elsewhere).

As stated, I cannot remove any of those raidz1 vdevs. My only options are to build a new pool with new disks or replace disks in this pool.
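
For the record, those two options boil down to something like the commands below. This is just a sketch: the pool name newtank and the /dev/disk/by-id paths are placeholders, not my actual devices.

# Option 1: build a brand new pool from new disks (hypothetical name and devices)
$ sudo zpool create newtank raidz1 /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2 /dev/disk/by-id/ata-DISK3 /dev/disk/by-id/ata-DISK4

# Option 2: swap disks in an existing vdev for bigger ones, one at a time
$ sudo zpool replace tank /dev/disk/by-id/ata-OLD-DISK /dev/disk/by-id/ata-NEW-DISK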

It’s all about trust

So, new drives I normally just trust. If they don’t work, they don’t work. But if they spin up, I’ve always assumed they’d be okay. Yeah, I know.

But since I now have a pair of recertified 10TB drives, I’d like to be sure they’re good to go. Recertified in this context probably means they were retired from a data center somewhere. Their power-on time is a little over 5 years, with a production date of December 2017.
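
Those numbers come straight out of the drive’s SMART attribute table. Something like this pulls them up (the exact attribute names vary a bit per vendor):

$ sudo smartctl -A /dev/sda | grep -i -E "power_on_hours|power_cycle"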

To make sure, I’m going to run a bunch of tests against them and see how they hold up:

  1. SMART conveyance test
  2. SMART extended test
  3. badblocks

S.M.A.R.T.

If you don’t know about the different SMART tests, here’s a refresher. I’m skipping the short test, since I’m running the extended test anyway.

Short Checks the electrical and mechanical performance as well as the read performance of the disk. Electrical tests might include a test of buffer RAM, a read/write circuitry test, or a test of the read/write head elements. Mechanical tests include seeking and servo on data tracks. Scans small parts of the drive’s surface (the area is vendor-specific and there is a time limit on the test). Checks the list of pending sectors that may have read errors, and it usually takes under two minutes.

Long/extended A longer and more thorough version of the short self-test, scanning the entire disk surface with no time limit. This test usually takes several hours, depending on the read/write speed of the drive and its size.

Conveyance Intended as a quick test to identify damage incurred during transporting of the device from the drive manufacturer to the computer manufacturer. Only available on ATA drives, and it usually takes several minutes.

Running these is as easy as:

$ sudo smartctl -t <short|long|conveyance> /dev/sda

If you want to know how long these tests are going to take:

$ sudo smartctl -c /dev/sda
...
Short self-test routine
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (1144) minutes.
...

So it’ll take a little over 19 hours to do a full extended SMART test.
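
While the test runs you can poll the drive for progress, and afterwards check the self-test log for the verdict. Roughly along these lines:

$ sudo smartctl -a /dev/sda | grep -A 1 "Self-test execution status"
$ sudo smartctl -l selftest /dev/sda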

Badblocks

Badblocks is a utility that, well, searches a disk for bad blocks. In write mode it writes a pattern to the entire disk and then reads every block back to verify it.

Naively, I ran the following to do a full write test with a progress indicator and verbose output:

$ sudo badblocks -wsv /dev/sda
badblocks: Value too large for defined data type invalid end block (9766436864): must be 32-bit value

As it turns out, badblocks uses a default block size of 1024 bytes and stores block numbers as 32-bit values, meaning it cannot, out of the box, scan disks larger than about 4TiB (2³² blocks × 1024 bytes). Let’s figure out what block size our disk uses and plug that into badblocks.

$ sudo blockdev --getbsz /dev/sda
4096

$ sudo badblocks -t random -w -v -s -b 4096 /dev/sda
Checking for bad blocks in read-write mode
From block 0 to 2441609215
Testing with random pattern:   5.97% done, 39:45 elapsed. (0/0/0 errors)

And we’re in business. Now we wait a few hours (or days) for badblocks to complete. I might even do a second pass just for the fun of it.
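
Once badblocks is done, its (0/0/0 errors) counter is only half the story; I also want the drive’s own SMART error counters to still read zero. Something along these lines should show them (again, attribute names vary per vendor):

$ sudo smartctl -A /dev/sda | grep -E "Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable"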

What’s next?

After I’ve run at least two badblocks passes and both the conveyance and extended SMART tests on this disk, I’m going to do the same on the other one. If that all goes well, I’ll probably put them in a new pool as a ZFS mirror pair and do some testing there.
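
That new pool would be a simple two-way mirror, roughly along these lines. The pool name and device paths are placeholders; ashift=12 assumes 4K physical sectors, which is typical for drives this size.

$ sudo zpool create -o ashift=12 testpool mirror /dev/disk/by-id/ata-HGST_10TB_DISK1 /dev/disk/by-id/ata-HGST_10TB_DISK2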


  1. There are good technical reasons for that, I know. Wish I’d known about it before I built my pool, though. ↩︎