Hi,
I am using a dual-node, ALUA-compatible iSCSI storage array in Active/Passive mode (the passive node reports ALUA access state 0x2, standby). ESXi always treats the secondary, standby path as an Active/Optimized path, whereas I would have expected it to show as inactive, dead, or non-optimized. ESXi claims the iSCSI devices with VMW_SATP_ALUA, which is correct, and defaults to VMW_PSP_MRU for its path selection policy because the array does not match any custom rule from "esxcli storage nmp satp rule list", which is also correct. However, ESXi connects to whichever path comes up first and marks it as "current/preferred", even if this is the standby path. All IOs to that device then fail, and a storage adapter rescan takes several minutes to complete.
When I check from a RHEL 7.5 machine I can see that the paths are set correctly:
Active path:
sg_rtpg /dev/sdb
Report target port groups:
target port group id : 0x0 , Pref=1
target port group asymmetric access state : 0x00
T_SUP : 1, O_SUP : 1, LBD_SUP : 0, U_SUP : 1, S_SUP : 1, AN_SUP : 1, AO_SUP : 1
status code : 0x00
vendor unique status : 0x00
target port count : 02
Relative target port ids:
0x01
0x02
Passive path:
sg_rtpg /dev/sdc
Report target port groups:
target port group id : 0x0 , Pref=0
target port group asymmetric access state : 0x02
T_SUP : 1, O_SUP : 1, LBD_SUP : 0, U_SUP : 1, S_SUP : 1, AN_SUP : 1, AO_SUP : 1
status code : 0x02
vendor unique status : 0x00
target port count : 01
Relative target port ids:
0x01
Access state 0x00 -> Active, optimized
Access state 0x02 -> standby
Pref=1 -> Preferred bit True
Pref=0 -> Preferred bit False
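For reference, this is roughly how I am reading those two fields, written as a minimal Python sketch. The state names follow the asymmetric access state table in SPC-4; describe_tpg is only an illustrative helper name and is not part of sg3_utils.

```python
# ALUA asymmetric access states as listed in SPC-4 for
# REPORT TARGET PORT GROUPS. Values not listed here are reserved.
ALUA_STATES = {
    0x0: "active/optimized",
    0x1: "active/non-optimized",
    0x2: "standby",
    0x3: "unavailable",
    0x4: "logical block dependent",
    0xE: "offline",
    0xF: "transitioning",
}


def describe_tpg(state: int, pref: int) -> str:
    """Turn the access state and Pref bit printed by sg_rtpg into text."""
    # Only the low four bits carry the access state.
    name = ALUA_STATES.get(state & 0x0F, "reserved")
    return f"{name}, preferred" if pref else f"{name}, not preferred"


print(describe_tpg(0x00, 1))  # values reported on /dev/sdb
print(describe_tpg(0x02, 0))  # values reported on /dev/sdc
```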
However, when ESXi discovers my LUNs, I end up with a mixed bag of multipath devices, some configured correctly and some not. The odds are about 50/50, depending on whether the primary or the secondary path comes up first:
Invalid config (all the "passive" LUNs are exported as LUN2, active LUNs are exported as LUN1):
naa.600140501d2d1e4b812019e700000000
Device Display Name: MPSTOR iSCSI Disk (naa.600140501d2d1e4b812019e700000000)
Storage Array Type: VMW_SATP_ALUA
Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=on; {TPG_id=0,TPG_state=AO}}
Path Selection Policy: VMW_PSP_MRU
Path Selection Policy Device Config: Current Path=vmhba64:C0:T1:L2
Path Selection Policy Device Custom Config:
Working Paths: vmhba64:C0:T1:L2
Is USB: false
=> LUN2 (passive) is detected as Active/Optimized and used as a working path, even though all IOs to it fail.
iqn.1998-01.com.vmware:host63-32892c1c-00023d000003,iqn.2004-04.com.mpstor:ctrla:esxia1,t,1-naa.600140501d2d1e4b812019e700000000
Runtime Name: vmhba64:C0:T0:L1
Device: naa.600140501d2d1e4b812019e700000000
Device Display Name: MPSTOR iSCSI Disk (naa.600140501d2d1e4b812019e700000000)
Group State: active
Array Priority: 1
Storage Array Type Path Config: {TPG_id=0,TPG_state=AO,RTP_id=1,RTP_health=UP}
Path Selection Policy Path Config: {non-current path; rank: 0}
iqn.1998-01.com.vmware:host63-32892c1c-00023d000004,iqn.2004-04.com.mpstor:ctrlb:esxia1,t,1-naa.600140501d2d1e4b812019e700000000
Runtime Name: vmhba64:C0:T1:L2
Device: naa.600140501d2d1e4b812019e700000000
Device Display Name: MPSTOR iSCSI Disk (naa.600140501d2d1e4b812019e700000000)
Group State: active
Array Priority: 1
Storage Array Type Path Config: {TPG_id=0,TPG_state=AO,RTP_id=1,RTP_health=UP}
Path Selection Policy Path Config: {current path; rank: 0}
=> Both are detected as Active, Optimized with the same rank, even though one is actually standby and the other has the "preferred" bit set.
And another device that came up correctly (just luckier):
naa.600140501d2d1e4b812019ca00000000
Device Display Name: MPSTOR iSCSI Disk (naa.600140501d2d1e4b812019ca00000000)
Storage Array Type: VMW_SATP_ALUA
Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=on; {TPG_id=0,TPG_state=AO}}
Path Selection Policy: VMW_PSP_MRU
Path Selection Policy Device Config: Current Path=vmhba64:C0:T14:L1
Path Selection Policy Device Custom Config:
Working Paths: vmhba64:C0:T14:L1
Is USB: false
iqn.1998-01.com.vmware:host63-32892c1c-00023d000003,iqn.2004-04.com.mpstor:ctrla:esxia0,t,1-naa.600140501d2d1e4b812019ca00000000
Runtime Name: vmhba64:C0:T14:L1
Device: naa.600140501d2d1e4b812019ca00000000
Device Display Name: MPSTOR iSCSI Disk (naa.600140501d2d1e4b812019ca00000000)
Group State: active
Array Priority: 1
Storage Array Type Path Config: {TPG_id=0,TPG_state=AO,RTP_id=1,RTP_health=UP}
Path Selection Policy Path Config: {current path; rank: 0}
iqn.1998-01.com.vmware:host63-32892c1c-00023d000004,iqn.2004-04.com.mpstor:ctrlb:esxia0,t,1-naa.600140501d2d1e4b812019ca00000000
Runtime Name: vmhba64:C0:T15:L2
Device: naa.600140501d2d1e4b812019ca00000000
Device Display Name: MPSTOR iSCSI Disk (naa.600140501d2d1e4b812019ca00000000)
Group State: active
Array Priority: 1
Storage Array Type Path Config: {TPG_id=0,TPG_state=AO,RTP_id=1,RTP_health=UP}
Path Selection Policy Path Config: {non-current path; rank: 0}
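To compare the listings without reading them line by line, a small Python sketch like the one below can pull the TPG fields out of the "Storage Array Type Path Config" lines. The function names and the SAMPLE text are illustrative only; the sample is trimmed from the first listing. In both listings every path reports TPG_id=0 and TPG_state=AO, so ESXi seems to see a single target port group for both controllers. Whether that is related to the problem is only a guess on my part.

```python
import re

# Fields inside "Storage Array Type Path Config: {...}" as shown in the
# listings above.
PATH_CONFIG = re.compile(
    r"TPG_id=(?P<tpg_id>\d+),TPG_state=(?P<tpg_state>\w+),"
    r"RTP_id=(?P<rtp_id>\d+),RTP_health=(?P<rtp_health>\w+)"
)


def parse_path_configs(text):
    """Return one dict per 'Storage Array Type Path Config' line in text."""
    paths = []
    runtime_name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Runtime Name:"):
            # Split only on the first colon: the runtime name contains colons.
            runtime_name = line.split(":", 1)[1].strip()
        elif line.startswith("Storage Array Type Path Config:"):
            match = PATH_CONFIG.search(line)
            if match:
                paths.append({"runtime_name": runtime_name, **match.groupdict()})
    return paths


def shares_single_tpg(paths):
    """True when there are several paths and all report the same TPG_id."""
    return len(paths) > 1 and len({p["tpg_id"] for p in paths}) == 1


SAMPLE = """\
Runtime Name: vmhba64:C0:T0:L1
Storage Array Type Path Config: {TPG_id=0,TPG_state=AO,RTP_id=1,RTP_health=UP}
Runtime Name: vmhba64:C0:T1:L2
Storage Array Type Path Config: {TPG_id=0,TPG_state=AO,RTP_id=1,RTP_health=UP}
"""

for path in parse_path_configs(SAMPLE):
    print(path["runtime_name"], path["tpg_id"], path["tpg_state"])
print("single TPG for all paths:", shares_single_tpg(parse_path_configs(SAMPLE)))
```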
As a workaround, I have to switch those LUNs to the VMW_PSP_FIXED policy and manually set which path is primary and which is secondary. This is time-consuming but doable. However, it leads to other problems: ESXi will "invert" the paths or attempt to connect to the wrong path on reboot, which makes rebooting the ESXi node a challenge.
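For anyone wanting to reproduce the workaround, it comes down to two esxcli calls per device. The Python sketch below only builds the command strings and does not run anything; fixed_psp_commands is a hypothetical helper name, and the device and path are taken from the first listing, where the LUN1 path on ctrla is the one I treat as primary.

```python
def fixed_psp_commands(device: str, preferred_path: str) -> list[str]:
    """Build the esxcli commands that pin one device to a preferred path."""
    return [
        # Switch the device from VMW_PSP_MRU to VMW_PSP_FIXED.
        f"esxcli storage nmp device set --device {device} --psp VMW_PSP_FIXED",
        # Tell the FIXED policy which path it should prefer.
        "esxcli storage nmp psp fixed deviceconfig set "
        f"--device {device} --path {preferred_path}",
    ]


commands = fixed_psp_commands(
    "naa.600140501d2d1e4b812019e700000000",  # device from the first listing
    "vmhba64:C0:T0:L1",                      # its LUN1 path on ctrla
)
for command in commands:
    print(command)
```

The printed lines still have to be run on the ESXi host for every affected device, which is why this gets tedious with many LUNs.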
What can I do to make ESXi follow my Active/Standby ALUA LUNs correctly, that is, always select the path that advertises itself as "Active/Optimized" and never attempt READ or WRITE IOs on the path that advertises itself as "Standby"?
Thanks in advance for your help!