Tuesday, May 20, 2014

NetApp disk errors: pre-fail a questionable disk

I recently started receiving NHT health warnings about a disk on one of my filers.

Disk 5a.14.20 received NHT health trigger (0x1 0xb 0x5d 0x10)

On further inspection, the warning was accompanied by several other errors logged against the same disk.

scsi.cmd.notReadyCondition:notice]: Disk device 5a.14.20: Device returns not yet ready: CDB 0x00:00000000:0200: Sense Data SCSI:not ready - Drive spinning up (0x2 - 0x4 0x1 0x0)(7281)

raid.read.media.err:debug]: Read error on Disk /aggrXX/plex0/rg4/5a.14.20 Shelf 14 Bay 20 [NETAPP   X412_S15K7560A15 NA04]

raid.tetris.media.err:debug]: Read error on Disk /aggrXX/plex0/rg4/5a.14.20 Shelf 14 Bay 20 [NETAPP   X412_S15K7560A15 NA04] during stripe write

raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggrXX/plex0/rg4/5a.14.20 Shelf 14 Bay 20 [NETAPP   X412_S15K7560A15 NA04]

raid.tetris.media.recommend.reassign.err:info]: Block recommended for reassignment on Disk /aggrXX/plex0/rg4/5a.14.20 Shelf 14 Bay 20 [NETAPP   X412_S15K7560A15 NA04] during stripe write 
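
If you want a quick count of how often each event shows up for one disk, a small script can tally them. This is only a rough sketch in Python: the function name tally_events and both regular expressions are my own, and they assume the messages look like the ones above (shortened here to keep the example readable). Other Data ONTAP releases may format these lines differently.

```python
import re
from collections import Counter

# Event name and severity, e.g. "raid.read.media.err:debug]:"
EVENT_RE = re.compile(r"([A-Za-z][\w.]*):(\w+)\]:")
# Disk name in adapter.shelf.bay form, e.g. "5a.14.20" (assumed naming scheme)
DISK_RE = re.compile(r"\b(\d+[a-z]\.\d+\.\d+)\b")

# Shortened copies of the messages shown above
SAMPLE_LOG = [
    "scsi.cmd.notReadyCondition:notice]: Disk device 5a.14.20: Device returns not yet ready",
    "raid.read.media.err:debug]: Read error on Disk /aggrXX/plex0/rg4/5a.14.20 Shelf 14 Bay 20",
    "raid.tetris.media.err:debug]: Read error on Disk /aggrXX/plex0/rg4/5a.14.20 Shelf 14 Bay 20",
    "raid.rg.readerr.repair.data:debug]: Fixing bad data on Disk /aggrXX/plex0/rg4/5a.14.20",
]

def tally_events(lines, disk):
    """Count event names on lines whose first disk name equals `disk`."""
    counts = Counter()
    for line in lines:
        event = EVENT_RE.search(line)
        found = DISK_RE.search(line)
        if event and found and found.group(1) == disk:
            counts[event.group(1)] += 1
    return counts

for name, count in tally_events(SAMPLE_LOG, "5a.14.20").items():
    print(name, count)
```

A disk that keeps turning up under several different event names, as this one did, is a reasonable candidate for pre-failing.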

I decided to fail the disk proactively: pre-copy its contents to a spare, then put it in the maintenance center for testing.

disk maint start -d 5a.14.20

You can check on the status by running the aggr status command. Look for the disk marked "prefail, copy in progress" and, on the line below it, the spare it is being copied to.

aggr status -r aggrXX

RAID group /aggrXX/plex0/rg4 (normal, block checksums)

      RAID Disk Device          HA  SHELF BAY CHAN Pool Type
      --------- ------          ------------- ---- ---- ----
      data      5a.14.20        5a    14  20  SA:B   0   SAS (prefail, copy in progress)
      -> copy   5a.14.23        5a    14  23  SA:B   0   SAS (copy 7% completed)
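
If you would rather not eyeball the output, the copy percentage can be pulled out with a short script. The Python below is just a sketch: parse_copy_progress is a name I made up, and the regular expression assumes the "-> copy" line looks exactly like the sample above, which may not hold on other Data ONTAP versions. How you capture the command output (SSH, a saved file, and so on) is left out.

```python
import re

# Sample lines copied from the aggr status -r output above
SAMPLE = """\
      data      5a.14.20        5a    14  20  SA:B   0   SAS (prefail, copy in progress)
      -> copy   5a.14.23        5a    14  23  SA:B   0   SAS (copy 7% completed)
"""

# Matches the "-> copy" line: target disk name, then the percentage
COPY_RE = re.compile(r"->\s*copy\s+(\S+).*\(copy\s+(\d+)%\s+completed\)")

def parse_copy_progress(text):
    """Return a (target_disk, percent) tuple for each '-> copy' line."""
    return [(m.group(1), int(m.group(2))) for m in COPY_RE.finditer(text)]

print(parse_copy_progress(SAMPLE))  # [('5a.14.23', 7)]
```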

Once the disk copy is complete, run:

disk maint status

You will see a status similar to this:

Testing 5a.14.20 test 2/5 cycle 1/1

To get more detail, add -v to the command:

disk maint status -v
Testing 5a.14.20 test 2/5 cycle 1/1
Test name: Power Cycle Test       Iteration: 1/1    Cycle: 1   Status: Not supported   
Test name: Write Same Test        Iteration: 1/1    Cycle: 1   Status: Running            Time:   00:15:42
Test name: Verify Test            Iteration: 0/1    Cycle: 1   Status: Not started     
Test name: Data Integrity Test    Iteration: 0/1    Cycle: 1   Status: Not started     
Test name: Write Same Test        Iteration: 0/1    Cycle: 1   Status: Not started     
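
The verbose output is regular enough to parse if you want to watch the tests from a script. Here is a minimal Python sketch, assuming the lines keep the layout shown above; parse_maint_tests and the field names are my own inventions, not anything NetApp provides.

```python
import re

# Sample copied from the disk maint status -v output above
SAMPLE = """\
Testing 5a.14.20 test 2/5 cycle 1/1
Test name: Power Cycle Test       Iteration: 1/1    Cycle: 1   Status: Not supported
Test name: Write Same Test        Iteration: 1/1    Cycle: 1   Status: Running            Time:   00:15:42
Test name: Verify Test            Iteration: 0/1    Cycle: 1   Status: Not started
Test name: Data Integrity Test    Iteration: 0/1    Cycle: 1   Status: Not started
Test name: Write Same Test        Iteration: 0/1    Cycle: 1   Status: Not started
"""

# One "Test name:" line; the Time field only appears on some lines
TEST_RE = re.compile(
    r"Test name:\s+(?P<name>.+?)\s+"
    r"Iteration:\s+(?P<done>\d+)/(?P<total>\d+)\s+"
    r"Cycle:\s+(?P<cycle>\d+)\s+"
    r"Status:\s+(?P<status>.+?)"
    r"(?:\s+Time:\s+(?P<elapsed>\S+))?\s*$"
)

def parse_maint_tests(text):
    """Return one dict per 'Test name:' line; elapsed is None when absent."""
    tests = []
    for line in text.splitlines():
        match = TEST_RE.match(line.strip())
        if match:
            tests.append(match.groupdict())
    return tests

for test in parse_maint_tests(SAMPLE):
    print(test["name"], "-", test["status"], test["elapsed"] or "")
```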
