ZFS is a beautiful filesystem, even when hardware fails, because it was built to deal with such failures. Allow me to demonstrate with a defective disk in a RAID-Z1 pool. As long as only one disk breaks down, the pool stays functional, even while rebuilding.
To recover, we first need to locate the bad disk, which is done by typing ‘zpool status -x’:
  pool: MyPool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: https://web.archive.org/web/20090511160031/http://www.sun.com:80/msg/ZFS-8000-2Q
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        Ahsay       DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            c1t0d5  ONLINE       0     0     0
            c1t0d6  ONLINE       0     0     0
            c1t0d7  UNAVAIL      0     0     0  cannot open
            c1t0d8  ONLINE       0     0     0

errors: No known data errors
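On a pool with many disks it can be handy to let the shell pick out the broken device. The following is only a rough sketch: `list_bad_devices` is a name I made up, and it assumes the status layout shown above, with a NAME/STATE header row followed by one row per device.

```shell
# Hypothetical helper: read `zpool status` text on stdin and print the name
# and state of every device that is UNAVAIL, FAULTED, REMOVED or OFFLINE.
# DEGRADED is skipped on purpose, because the pool and raidz1 rows carry it too.
list_bad_devices() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }  # header row of the config table
        /^errors:/                    { in_config = 0 }        # the table is over
        in_config && $2 ~ /^(UNAVAIL|FAULTED|REMOVED|OFFLINE)$/ { print $1, $2 }
    '
}
```

Used as `zpool status -x | list_bad_devices`, it should print `c1t0d7 UNAVAIL` for the output above.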
The defective disk is c1t0d7. If the disk isn’t actually damaged, you can try to bring it back online with the ‘zpool online’ command.
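As a minimal sketch, `zpool online` takes the pool name followed by the device name. The wrapper function `bring_online` and the `ZPOOL` override are my own additions, so the call can be tried as a dry run with `ZPOOL=echo` before touching a real pool.

```shell
# Sketch only: bring_online and ZPOOL are hypothetical names.
# ZPOOL defaults to the real zpool binary; set ZPOOL=echo for a dry run.
bring_online() {
    pool=$1   # pool name as reported by zpool status
    disk=$2   # device to bring back, for example c1t0d7
    "${ZPOOL:-zpool}" online "$pool" "$disk"
}
```

For the pool above, the call would be `bring_online MyPool c1t0d7`, with the pool name replaced by your own.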
If this works, everything is fine again. But if you have to attach the disk at a different LUN, it gets a new device ID. In that case you need to use ‘zpool replace’ instead, which starts a rebuild onto the replacement disk.
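Here is a hedged sketch of the replace step. `zpool replace` takes the pool, the old device and, optionally, the new device; if the new device is left out, the disk is replaced in the same location. The function name `replace_disk`, the `ZPOOL` dry-run override and the device name `c1t0d9` in the usage note are assumptions of mine, not taken from the real system.

```shell
# Sketch only: replace_disk and ZPOOL are hypothetical names.
# ZPOOL defaults to the real zpool binary; set ZPOOL=echo for a dry run.
replace_disk() {
    pool=$1        # pool name
    old=$2         # failed device, for example c1t0d7
    new=${3:-}     # optional: new device name if the disk sits at another LUN
    if [ -n "$new" ]; then
        "${ZPOOL:-zpool}" replace "$pool" "$old" "$new"
    else
        "${ZPOOL:-zpool}" replace "$pool" "$old"
    fi
}
```

A swap in the same slot would be `replace_disk MyPool c1t0d7`, while a disk that reappears under a new ID would be something like `replace_disk MyPool c1t0d7 c1t0d9`.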
  pool: MyPool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 0.00% done, 517h14m to go
config:

        NAME              STATE     READ WRITE CKSUM
        Ahsay             DEGRADED     0     0     0
          raidz1          DEGRADED     0     0     0
            c1t0d5        ONLINE       0     0     0
            c1t0d6        ONLINE       0     0     0
            replacing     DEGRADED     0     0     7
              c1t0d7s0/o  FAULTED      0     0     0  corrupted data
              c1t0d7      ONLINE       0     0     0  3.54M resilvered
            c1t0d8        ONLINE       0     0     0

errors: No known data errors
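To keep an eye on the rebuild without reading the whole status each time, the percentage can be cut out of the progress line. This is just a sketch: `resilver_progress` is a made-up name, and it assumes the wording of the line shown above ("resilver in progress ... % done"), which may differ on other ZFS releases.

```shell
# Hypothetical helper: read `zpool status` text on stdin and print the
# resilver percentage, or nothing at all if no resilver is in progress.
resilver_progress() {
    sed -n 's/.*resilver in progress.* \([0-9.]*%\) done.*/\1/p'
}
```

Called as `zpool status MyPool | resilver_progress`, it should print `0.00%` for the status above.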
Once the rebuild, the so-called ‘resilvering’, has completed, the issue is history and we are done with it: short and sweet. In my opinion it’s almost too easy to be true, as the whole system can stay online and working the entire time.