A customer from Taiwan reported a network issue as below:
- ESXi Host A & B
- High Availability (HA) configured
- Last Friday, Host A suddenly went down, and all VMs failed over to Host B when HA was triggered.
- Before this incident, the network went down intermittently, and he needed to restart the ESXi host to resolve the issue
- Server configured with 2x dual-port Intel PCI-e NICs, for a total of 8 NICs in the server
- Network Teaming configured for some of the NICs
The customer threw me a challenging request as below:
"...I believe there isn't any hardware issue from the PowerEdge R710, please help me escalate to VMWare for solution..."
As usual, it is my responsibility to let the customer know that Dell is currently able to support and resolve break-fix issues only. For issues related to 3rd-party software, we can offer best-effort support only. Luckily, I was able to convince the customer to wait for the outcome of my short research.
I spent 2 to almost 3 hours looking for a solution or workaround on VMware.com, and finally found a useful piece of information, quoted below:
Network connectivity failure and system crash while performing control operations on physical NICs
In some cases, when multiple X-Frame II s2io NICs are sharing the same pci-x bus, performing control operations, such as changing the MTU, on the physical NIC causes a loss of network connectivity and a system crash.
Workaround: Avoid having multiple X-Frame II s2io NICs in slots that share the same pci-x bus. In situations where such a configuration is necessary, avoid performing control operations on the physical NICs while virtual machines are doing network I/O.
From the statements above, it seems clear that the issue occurred because Network Teaming was configured (which involves control operations on the physical NICs), causing ESXi Host A to "crash" and then triggering HA.
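To check whether a host matches the configuration described in the release note, one option is to group the physical NICs by driver and PCI bus. The Python sketch below is only an illustration under several assumptions: the listing is made-up sample text (not the customer's real output), its columns are assumed to follow the `esxcfg-nics -l` layout (Name, PCI, Driver, ...), the helper names `pci_bus` and `nics_sharing_bus` are my own, and matching on the PCI bus number is only an approximation of "slots that share the same bus".

```python
from collections import defaultdict

# Illustrative sample only (NOT real customer output). The column layout is
# assumed to follow `esxcfg-nics -l`: Name, PCI, Driver, Link, Speed, ...
SAMPLE_OUTPUT = """\
Name    PCI           Driver  Link  Speed      Duplex  MAC Address        MTU   Description
vmnic0  0000:01:00.0  bnx2    Up    1000Mbps   Full    00:00:00:00:00:01  1500  Example onboard NIC
vmnic1  0000:01:00.1  bnx2    Up    1000Mbps   Full    00:00:00:00:00:02  1500  Example onboard NIC
vmnic4  0000:06:00.0  s2io    Up    10000Mbps  Full    00:00:00:00:00:05  1500  Example X-Frame II NIC
vmnic5  0000:06:01.0  s2io    Up    10000Mbps  Full    00:00:00:00:00:06  1500  Example X-Frame II NIC
"""


def pci_bus(pci_address):
    """Return the bus part of a PCI address such as '0000:06:00.0' or '06:00.0'."""
    parts = pci_address.split(":")
    # With a domain prefix there are three parts (domain, bus, device.function);
    # without it there are two. The bus is always second from the end.
    return parts[-2]


def nics_sharing_bus(listing, driver):
    """Map PCI bus -> NIC names, for buses holding more than one NIC of `driver`."""
    by_bus = defaultdict(list)
    for line in listing.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        name, pci, nic_driver = fields[0], fields[1], fields[2]
        if nic_driver == driver:
            by_bus[pci_bus(pci)].append(name)
    # Keep only buses that carry two or more NICs using this driver.
    return {bus: names for bus, names in by_bus.items() if len(names) > 1}


if __name__ == "__main__":
    for bus, names in nics_sharing_bus(SAMPLE_OUTPUT, "s2io").items():
        print("Bus", bus, "is shared by:", ", ".join(names))
```

If the function returns an empty result for the driver in question, the host probably does not match the scenario in the release note, and the root cause may lie elsewhere.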
Reference:
* TIPS: Always check the known issues in the VMware release notes, which can help you get to the answer, solution, or workaround faster.