All,
I am writing this topic after an experience I had onsite with a customer recently.
The Setup:
> Pure Chassis 1 x240
> 2 - SI4093 Switches - no upgrade
> 1 - FC5022 SAN switch
The Issue:
Upon installation of the OS on the x240, we noticed the NIC was "disconnected". When we checked the interfaces on the switch, we saw the output below.
IBMBC2-E1#sh interface status
-----------------------------------------------------------------------
Alias   Port  Speed   Duplex    Flow Ctrl    Link      Description
------- ----  ------  ------  --TX-----RX--  --------  -------------
INTA1   1     1G/10G  full      no     no    disabled  INTA1
INTA2   2     1G/10G  full      no     no    disabled  INTA2
INTA3   3     1G/10G  full      no     no    disabled  INTA3
INTA4   4     1G/10G  full      no     no    disabled  INTA4
INTA5   5     1G/10G  full      no     no    disabled  INTA5
INTA6   6     1G/10G  full      no     no    disabled  INTA6
INTA7   7     1G/10G  full      no     no    disabled  INTA7
INTA8   8     1G/10G  full      no     no    disabled  INTA8
INTA9   9     1G/10G  full      no     no    disabled  INTA9
INTA10  10    1G/10G  full      no     no    disabled  INTA10
INTA11  11    1G/10G  full      no     no    disabled  INTA11
INTA12  12    1G/10G  full      no     no    disabled  INTA12
INTA13  13    1G/10G  full      no     no    disabled  INTA13
INTA14  14    1G/10G  full      no     no    disabled  INTA14
EXT1    43    10000   full      no     no    up        2-CISCO-Te9-7
EXT2    44    1G/10G  full      no     no    down      EXT2
EXT3    45    1G/10G  full      no     no    down      EXT3
EXT4    46    1G/10G  full      no     no    down      EXT4
EXT5    47    1G/10G  full      no     no    down      EXT5
EXT6    48    1G/10G  full      no     no    down      EXT6
EXT7    49    1G/10G  full      no     no    down      EXT7
EXT8    50    1G/10G  full      no     no    down      EXT8
EXT9    51    1G/10G  full      no     no    down      EXT9
EXT10   52    1G/10G  full      no     no    down      EXT10
EXTM    65    any     auto     yes    yes    down      EXTM
MGT1    66    1000    full      no     no    up        MGT1
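On a busier chassis, a short script can pick the disabled ports out of output like the above. This is only a minimal Python sketch, not an IBM tool: the function name `disabled_ports` is hypothetical, and it assumes each data row has the eight whitespace-separated fields seen here (alias, port, speed, duplex, flow control TX, flow control RX, link, description).

```python
def disabled_ports(status_output):
    """Return aliases of ports whose Link column reads 'disabled'."""
    aliases = []
    for line in status_output.splitlines():
        fields = line.split(None, 7)
        # Skip the prompt, header, and separator rows: data rows have
        # eight fields and a numeric port number in the second column.
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        alias, link = fields[0], fields[6]
        if link == "disabled":
            aliases.append(alias)
    return aliases


# A few rows copied from the output above.
sample = """\
INTA1 1 1G/10G full no no disabled INTA1
INTA2 2 1G/10G full no no disabled INTA2
EXT1 43 10000 full no no up 2-CISCO-Te9-7
EXT2 44 1G/10G full no no down EXT2
MGT1 66 1000 full no no up MGT1
"""

print(disabled_ports(sample))
```

Note that "disabled" (shut by the switch itself) is distinct from "down" (no link partner), which is why only the INTA ports are reported.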
The Solution:
We determined the issue was with the SI4093's Failover Monitoring feature (a link to the SI4093 7.8 Application Guide is below). The following is the recommendation from the IBM development team to alleviate this problem.
The definition for SPAR 1 (the default) is configured to monitor an LACP aggregation group, and the switch log shows that the upstream switch for EXT1 is not configured for LACP.
SPAR 1 definition:
spar 1
uplink adminkey 1000
domain default member INTA1-INTA14
enable
exit
Switch log indicating LACP status:
Nov 12 12:19:52 IBMBC2-E1 NOTICE link: link up on port EXT1
Nov 12 12:19:52 IBMBC2-E1 WARNING failover: Trigger 1 is down, control ports are auto disabled.
Nov 12 12:19:52 IBMBC2-E1 NOTICE lacp: LACP is down on port EXT1
Nov 12 12:19:54 IBMBC2-E1 NOTICE lacp: LACP is suspended on port for not receiving any LACPDUs
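The sequence in the log can also be checked mechanically. The following Python sketch pairs each "Trigger N is down" message with the ports reporting LACP down at the same timestamp. The helper name `failover_events` is hypothetical, and the patterns match only the message wording shown in the excerpt above; other firmware levels may word these messages differently.

```python
import re

# Lines copied from the switch log above.
LOG = """\
Nov 12 12:19:52 IBMBC2-E1 NOTICE link: link up on port EXT1
Nov 12 12:19:52 IBMBC2-E1 WARNING failover: Trigger 1 is down, control ports are auto disabled.
Nov 12 12:19:52 IBMBC2-E1 NOTICE lacp: LACP is down on port EXT1
"""

TRIGGER_RE = re.compile(
    r"^(\w+ +\d+ [\d:]+) \S+ WARNING failover: Trigger (\d+) is down")
LACP_RE = re.compile(
    r"^(\w+ +\d+ [\d:]+) \S+ NOTICE lacp: LACP is down on port (\S+)")


def failover_events(log_text):
    """Return (timestamp, trigger number, LACP-down ports) per trigger event."""
    triggers = []   # (timestamp, trigger number) in log order
    lacp_down = {}  # timestamp -> ports reporting LACP down
    for line in log_text.splitlines():
        m = TRIGGER_RE.match(line)
        if m:
            triggers.append((m.group(1), int(m.group(2))))
            continue
        m = LACP_RE.match(line)
        if m:
            lacp_down.setdefault(m.group(1), []).append(m.group(2))
    return [(ts, num, lacp_down.get(ts, [])) for ts, num in triggers]


for ts, num, ports in failover_events(LOG):
    print(f"{ts}: trigger {num} down, LACP down on {ports}")
```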
The failover trigger is configured as follows:
Failover Info: Trigger
Current global Failover setting: OFF
Current global VLAN Monitor settings: OFF
Current Trigger 1 setting: enabled
limit 5
Auto Monitor settings:
Manual Monitor settings:
Manual Monitor settings:
LACP port adminkey 1000
Manual Control settings:
ports INTA1-INTA14
The monitor is therefore driven by the state of the LACP group with adminkey 1000, and because LACP is in a down/suspended state, the control ports (INTA1-INTA14) are disabled.
Note: even if LACP were active, the limit of 5 would also cause the trigger to fire, because only one uplink (EXT1) is operational, which is at or below the limit.
From the SI4093 Application Guide:
The failover limit lets you specify the minimum number of operational links required within each trigger before the trigger initiates a failover event.
For example, if the limit is two, a failover event occurs when the number of operational links in the trigger is two or fewer.
When you set the limit to zero (the default for each trigger),
the SI4093 initiates a failover event only when no links in the trigger are operational.
** The above would apply if LACP was active on the customer uplinks **
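To make the limit rule concrete, here is a minimal Python sketch of the comparison as the guide excerpt above describes it. It is a model of the documented behaviour, not switch code, and `failover_triggered` is a hypothetical name.

```python
def failover_triggered(operational_links, limit=0):
    """Model of the documented rule: a failover event occurs when the number
    of operational monitor links is at or below the limit. With the default
    limit of 0, that means only when no links are operational."""
    return operational_links <= limit


# This customer's case: limit 5 with a single uplink (EXT1) in service.
print(failover_triggered(1, limit=5))   # True
# The default limit of 0 tolerates a single working uplink...
print(failover_triggered(1, limit=0))   # False
# ...and fires only when every monitored link is gone.
print(failover_triggered(0, limit=0))   # True
```

In other words, with a single uplink the limit would need to be lowered (0 being the documented default) for the internal ports to stay up.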
Please verify whether the uplink switch is LACP capable and whether configuring LACP on it is an option.
It is also possible to reconfigure the SI4093 so that the uplink does not use LACP for aggregation.
** Very important note from the Application Guide, to prevent network loops **
Each SPAR must include one or more internal server ports and one or more external uplink ports.
However, if multiple external ports are to be included in a particular SPAR, they must first be configured as a Link Aggregation Group (LAG), thus operating together as a single logical port connected to the same upstream network entity. Any given SPAR cannot include multiple, independent (non-LAG) uplink ports.
Each internal or external port can be a member of only one SPAR at any given time.
Because the SI4093 does not permit any SPAR to include multiple non-LAG uplink ports, the possibility of creating a broadcast loop is eliminated.
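The rules above can be expressed as a small check to run against a planned layout before touching the switch. This Python sketch is illustrative only: the data layout (a dict per SPAR with `internal`, `external`, and a `lag` flag) and the function name `validate_spars` are assumptions made for the example, not anything the SI4093 exposes.

```python
def validate_spars(spars):
    """Return a list of rule violations for a {name: config} mapping.

    Each config is a dict with:
      internal: list of internal port names
      external: list of external (uplink) port names
      lag:      True if the external ports are configured as one LAG
    """
    problems = []
    owner = {}  # port -> first SPAR seen using it
    for name, cfg in spars.items():
        if not cfg["internal"]:
            problems.append(f"{name}: needs at least one internal port")
        if not cfg["external"]:
            problems.append(f"{name}: needs at least one external uplink port")
        if len(cfg["external"]) > 1 and not cfg["lag"]:
            problems.append(f"{name}: multiple uplinks must form a LAG")
        for port in cfg["internal"] + cfg["external"]:
            if port in owner:
                problems.append(f"{port}: already a member of {owner[port]}")
            else:
                owner[port] = name
    return problems


# Hypothetical plan with two mistakes: spar2 reuses INTA2 and has two
# independent (non-LAG) uplinks.
spars = {
    "spar1": {"internal": ["INTA1", "INTA2"], "external": ["EXT1"], "lag": False},
    "spar2": {"internal": ["INTA2"], "external": ["EXT2", "EXT3"], "lag": False},
}
for problem in validate_spars(spars):
    print(problem)
```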
Please see the following document (SPAR overview, p. 85):
IBM Flex System Fabric SI4093 System Interconnect Module Application Guide for Networking OS 7.8
http://pic.dhe.ibm.com/infocenter/flexsys/information/topic/com.ibm.acc.networkdevices.doc/00cg964.pdf
Summary: The default failover trigger fires because the SPAR is defined for an LACP aggregation and the EXT1 uplink has not established LACP with the uplink switch.
Please verify whether LACP is defined on the uplink switch port that EXT1 is connected to.
It is possible to reconfigure the SI4093 for a different uplink configuration and to update the failover trigger to match (the trigger is what disables the internal links so that server NIC teaming can fail over).
Please review the SI4093 documentation before making changes, as the uplink configuration of a SPAR must follow the recommendations to prevent network loops.