Change n' Hues

vSAN Disk group is in "Unhealthy State"


If you are running VMware vSAN 6.0, 6.1 or 6.2, there is a high chance that you will see this issue with the following RAID controllers:

Cisco 12G SAS Modular Raid Controller
DELL FD332-PERC (Dual ROC)
DELL FD332-PERC (Single ROC)
DELL PERC H730 Adapter
DELL PERC H730 Mini ==> We use this RAID controller with our Dell R620/630 servers
DELL PERC H730P Adapter 
DELL PERC H730P Mini
Huawei Technologies Co. Ltd. SR 430C
Lenovo ThinkServer RAID 720i AnyRAID Adapter
Lenovo ThinkServer RAID 720ix AnyRAID Adapter
Lenovo ServeRAID 5210e SAS/SATA Controller
Lenovo ServeRAID M5210 SAS/SATA Controller
LSI MegaRAID SAS 9361-8i
LSI MegaRAID SAS 9362-8i
Supermicro SMC3108
This can happen because of a physical disk drive failure, or because a RAID controller from the list above resets the disk drives.

In some scenarios only one disk group goes into the unhealthy state; in others, all the disk groups on the ESXi host in the vSAN cluster do.
A disk group becomes unhealthy only when its cache disk fails, not when a capacity disk fails. When the flash cache device in a disk group is impacted by a failure, the whole disk group is impacted.
The vSphere Web Client then shows the overall disk group status as “Unhealthy”, and the magnetic disks in the same disk group show the status “Flash disk down”.
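
The rule described above can be written down as a tiny Python sketch. This is only an illustration of the behaviour, not anything vSAN itself exposes: the function name `disk_group_view` and the "Healthy" label for the normal case are assumptions, while "Unhealthy" and "Flash disk down" are the strings quoted above.

```python
from typing import Dict, List, Union


def disk_group_view(cache_ok: bool, capacity_count: int) -> Dict[str, Union[str, List[str]]]:
    """Model what the client shows for one disk group (hypothetical helper).

    cache_ok       -- True while the flash cache device is working
    capacity_count -- number of magnetic (capacity) disks in the group
    """
    if cache_ok:
        # "Healthy" is an assumed label for the normal state.
        return {"group": "Healthy", "capacity": ["Healthy"] * capacity_count}
    # A failed cache device takes the whole disk group with it, so every
    # capacity disk in the group is reported as "Flash disk down".
    return {"group": "Unhealthy", "capacity": ["Flash disk down"] * capacity_count}
```

For example, `disk_group_view(False, 4)` describes a group whose cache device has failed: the group is "Unhealthy" and all four capacity disks are shown as "Flash disk down".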

vsan.disks_stats

This RVC (Ruby vSphere Console) command is very useful for determining the following information about disks:
Number of components on a disk (for SSDs, this is always 0)
Total disk capacity
The percentage of disk that is being consumed
Health Status of the disk
The version of the on-disk format
+++++++++++++++++++++++++++++++++++++++++++++
vsan.disks_stats /test-vc-2.local.com/vRack-Datacenter/computers/vsancluster/hosts/192.168.1.10

+----------------------+---------------+-------+------+------------+---------+----------+-------------+
| naa.50000396cc89a8c1 | 192.168.1.10 | SSD | 0 | 1490.41 GB | 1.69 % | 0.00 % | FAILED (v2) |
| naa.5000c5008fafefeb | 192.168.1.10 | MD | 22 | 1106.62 GB | 37.56 % | 37.24 % | FAILED (v2) |
| naa.5000c5008fb00f23 | 192.168.1.10 | MD | 23 | 1106.62 GB | 40.90 % | 40.67 % | OK (v2) |
| naa.5000c5008fb17f5f | 192.168.1.10 | MD | 21 | 1106.62 GB | 45.78 % | 45.55 % | OK (v2) |
| naa.5000c5008fb0d70f | 192.168.1.10 | MD | 21 | 1106.62 GB | 45.69 % | 45.55 % | FAILED (v2) |
+----------------------+---------------+-------+------+------------+---------+----------+-------------+
| naa.50000396cc89a8c5 | 192.168.1.10 | SSD | 0 | 1490.41 GB | 2.29 % | 0.00 % | FAILED (v2) |
| naa.5000c5008fb140f3 | 192.168.1.10 | MD | 23 | 1106.62 GB | 41.13 % | 40.49 % | OK (v2) |
| naa.5000c5008fafd21f | 192.168.1.10 | MD | 28 | 1106.62 GB | 35.67 % | 35.43 % | FAILED (v2) |
| naa.5000c5008fb0c10b | 192.168.1.10 | MD | 27 | 1106.62 GB | 35.77 % | 30.37 % | FAILED (v2) |
| naa.5000c5008fb168cb | 192.168.1.10 | MD | 21 | 1106.62 GB | 30.73 % | 30.37 % | FAILED (v2) |
+----------------------+---------------+-------+------+------------+---------+----------+-------------+
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
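
If you save this output to a text file, a short script can pull out the failed disks for you. The following is a minimal Python sketch, not an official tool. It assumes the eight-column layout shown above (disk name, host, SSD/MD, number of components, total capacity, used %, reserved %, health with on-disk format version) and that each `+---+` separator line starts a new disk group. The function names and dictionary keys are made up for illustration, and `SAMPLE` holds a few rows copied from the output above.

```python
from typing import Dict, List

# A few rows copied from the output above.
SAMPLE = """\
+----------------------+---------------+-------+------+------------+---------+----------+-------------+
| naa.50000396cc89a8c1 | 192.168.1.10 | SSD | 0 | 1490.41 GB | 1.69 % | 0.00 % | FAILED (v2) |
| naa.5000c5008fafefeb | 192.168.1.10 | MD | 22 | 1106.62 GB | 37.56 % | 37.24 % | FAILED (v2) |
| naa.5000c5008fb00f23 | 192.168.1.10 | MD | 23 | 1106.62 GB | 40.90 % | 40.67 % | OK (v2) |
+----------------------+---------------+-------+------+------------+---------+----------+-------------+
| naa.50000396cc89a8c5 | 192.168.1.10 | SSD | 0 | 1490.41 GB | 2.29 % | 0.00 % | FAILED (v2) |
| naa.5000c5008fb140f3 | 192.168.1.10 | MD | 23 | 1106.62 GB | 41.13 % | 40.49 % | OK (v2) |
+----------------------+---------------+-------+------+------------+---------+----------+-------------+
"""


def parse_disks_stats(text: str) -> List[List[Dict[str, object]]]:
    """Split the table into disk groups; each group is a list of disk rows."""
    groups: List[List[Dict[str, object]]] = []
    current: List[Dict[str, object]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("+"):
            # Separator line: close the disk group collected so far.
            if current:
                groups.append(current)
                current = []
            continue
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip header rows or anything that is not a disk row.
        if len(cells) != 8 or cells[2] not in ("SSD", "MD"):
            continue
        health, _, version = cells[7].partition(" ")
        current.append({
            "name": cells[0],
            "host": cells[1],
            "kind": cells[2],                         # SSD = cache, MD = capacity
            "components": int(cells[3]),
            "capacity": cells[4],
            "used_pct": float(cells[5].rstrip(" %")),
            "health": health,                         # e.g. OK or FAILED
            "format": version.strip("()"),            # e.g. v2
        })
    if current:
        groups.append(current)
    return groups


def failed_disks(groups: List[List[Dict[str, object]]]) -> List[str]:
    """Names of all disks whose health is not OK."""
    return [d["name"] for g in groups for d in g if d["health"] != "OK"]


def groups_with_failed_cache(groups: List[List[Dict[str, object]]]) -> List[str]:
    """Names of cache SSDs that are not OK (one per affected disk group)."""
    return [d["name"] for g in groups for d in g
            if d["kind"] == "SSD" and d["health"] != "OK"]
```

Calling `groups_with_failed_cache(parse_disks_stats(SAMPLE))` lists the cache devices to look at first, since a failed cache device is what takes the whole disk group down.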
Possible Solutions:
1) If no disk drive used for cache has failed, put the host into Maintenance Mode with Full Data Migration and reboot it. Then check whether the unhealthy disk group has come back healthy under
Cluster ==> Manage ==> Settings ==> Virtual SAN ==> Disk Management
If all the disk groups are in a healthy state, take the host out of Maintenance Mode; the issue is resolved.
2) If a physical disk drive used for cache has failed, check with your hardware vendor about a disk replacement.
3) If neither of the above fixes the issue, log a support case with GSS.
When fixing any vSAN issue, always run the vSAN health check under
vSANCluster ==> Monitor ==> Virtual SAN ==> Health.

Read:
https://kb.vmware.com/s/article/2144936
https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/vsan/vsan-troubleshooting-reference-manual.pdf
https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/products/vsan/vmw-gdl-vsan-health-check.pdf
