I'm just getting around to checking the logs on my backup server, and it says that I have a permanently damaged file that's unrepairable.
How is this even possible on a raidz2 volume where each member shows zero problems and there are no dead drives? Isn't that the whole point of raidz2, so that if one (er, two) drives have a problem the data is recoverable? How can I figure out why this happened and why it was unrecoverable, and most importantly, how do I prevent it in the future?
It’s only my backup server and the original file is still A-OK, but I’m really concerned here!
zpool status -v:
3-2-1-backup@BackupServer:~$ sudo zpool status -v
  pool: data_pool3
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 06:59:59 with 1 errors on Sun Nov 12 07:24:00 2023
config:

        NAME                        STATE     READ WRITE CKSUM
        data_pool3                  ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            wwn-0x5000ccaxxxxxxxx1  ONLINE       0     0     0
            wwn-0x5000ccaxxxxxxxx2  ONLINE       0     0     0
            wwn-0x5000ccaxxxxxxxx3  ONLINE       0     0     0
            wwn-0x5000ccaxxxxxxxx4  ONLINE       0     0     0
            wwn-0x5000ccaxxxxxxxx5  ONLINE       0     0     0
            wwn-0x5000ccaxxxxxxxx6  ONLINE       0     0     0
            wwn-0x5000ccaxxxxxxxx7  ONLINE       0     0     0
            wwn-0x5000ccaxxxxxxxx8  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        data_pool3/(redacted)/(redacted)@backup_script:/Documentaries/(redacted)
Does it have ECC memory?
This is my backup server, so no. Primary does.
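That might be the culprit. If a bit flips in RAM after the block's checksum is computed but before the block and its parity are written, every reconstruction from disk will fail the checksum, so raidz2 has nothing good to rebuild from even though no drive reports an error.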
What version of ZFS are you running? Are you using native ZFS encryption?
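For anyone unsure how to answer those two questions, here is a minimal sketch. It assumes OpenZFS 0.8 or later (where `zfs version` is available) and uses the pool name from the status output above. `ZFS_CMD` and `zfs_report` are hypothetical names introduced only for this example, so the function can be dry-run on a machine without ZFS.

```shell
#!/bin/sh
# Sketch: report the ZFS version and whether native encryption is in use.
# ZFS_CMD is a hypothetical override for dry runs; by default the real zfs is used.
ZFS_CMD="${ZFS_CMD:-zfs}"

zfs_report() {
    pool="$1"
    # Prints the userland and kernel module versions.
    "$ZFS_CMD" version
    # The "encryption" property is "off" for unencrypted datasets;
    # -r walks every filesystem and volume in the pool.
    "$ZFS_CMD" get -r -t filesystem,volume -o name,value encryption "$pool"
}

# Usage:
#   zfs_report data_pool3
```

Any dataset whose value is something other than `off` is using native encryption.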
Run two scrubs and see if the problem goes away. Has to be at least two.
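A rough sketch of the two-scrub check, assuming OpenZFS 2.0 or later (for `zpool scrub -w`, which waits for the scrub to finish). As far as I understand it, the error list carries over entries from the previous scrub, which would be why a single clean pass is not enough to clear a stale entry. `ZPOOL_CMD` and `scrub_twice` are hypothetical names made up for this example.

```shell
#!/bin/sh
# Sketch: run two back-to-back scrubs, then show the error list.
# ZPOOL_CMD is a hypothetical override for dry runs; by default the real zpool is used.
ZPOOL_CMD="${ZPOOL_CMD:-zpool}"

scrub_twice() {
    pool="$1"
    for pass in 1 2; do
        echo "Starting scrub pass $pass on $pool"
        # -w blocks until the scrub completes (OpenZFS 2.0+).
        "$ZPOOL_CMD" scrub -w "$pool" || return 1
    done
    # -v lists any files that still have permanent errors.
    "$ZPOOL_CMD" status -v "$pool"
}

# Usage (run as root; the last scrub on this pool took about 7 hours):
#   scrub_twice data_pool3
```

If the file is still listed after both passes, the error is real rather than a leftover log entry.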
There's a bad ZFS corruption bug that affects certain pools created with ZFS 2.1.x+.
Maybe you're hitting it?
NOTE: It only occurs under extremely specific conditions, so there's no need to panic…
It's definitely not the recent ZFS bug that others mentioned here, simply because corruption caused by that bug cannot be detected: the filesystem remains consistent, so it would never show up as a permanent error.
Is this relevant, perhaps? https://discourse.practicalzfs.com/t/recurring-permanent-errors-in-healthy-zpool/919/5