Messages - PureSystemsTech

1) Nutanix / NOS Upgrade: did not grant shutdown token
« on: February 16, 2018, 09:55:01 AM »
All,

This post is in reference to an error you may see in the log file "Genesis.out" under /home/nutanix/data/logs when an upgrade is hung and not continuing.

This happened to me on my CE cluster, which is a single-node cluster, where a shutdown token was not being granted to allow the upgrade to finish.

When you run upgrade_status on the CVM CLI you will see the following for a while until you realize "hey, this thing is hung":

Quote
nutanix@NTNX-d02003b1-A-CVM:192.168.1.41:~$ upgrade_status
2018-02-16 06:42:47 INFO zookeeper_session.py:110 upgrade_status is attempting to connect to Zookeeper
2018-02-16 06:42:47 INFO upgrade_status:38 Target release version: el7.3-release-ce-2018.01.31-stable-c3b9964290bf2f28799481fed5cf32f92ab3dadc
2018-02-16 06:42:47 INFO upgrade_status:43 Cluster upgrade method is set to: automatic rolling upgrade
2018-02-16 06:42:47 INFO upgrade_status:96 SVM 192.168.1.41 still needs to be upgraded. Installed release version: el7.3-release-ce-2017.11.30-stable-ab2ac46f51d4745d43126c9ad1871b7314400bab, node is currently upgrading

If you see this message in Genesis.out

Quote
Master 192.168.1.41 did not grant shutdown token to my ip 192.168.1.41, trying again in 30 seconds

Try to run the following command to grant a token.

Quote
echo -n '{"request_reason": "nos_upgrade", "request_time": 1496860678.099324, "requester_ip": "192.168.1.50"}' | zkwrite /appliance/logical/genesis/node_shutdown_token
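The payload in that command is a small JSON object with three fields. As a rough sketch only, the Python below builds a payload of the same shape with the current time and a requester IP you pass in, then prints the resulting command line; it does not talk to Zookeeper. The helper name `build_token_command` is hypothetical, and using the IP of the CVM that is waiting on the token is an assumption, so check it against your own Genesis.out before running anything.

```python
import json
import shlex
import time

# Zookeeper node path as quoted in the post
ZK_PATH = "/appliance/logical/genesis/node_shutdown_token"


def build_token_command(requester_ip, reason="nos_upgrade", now=None):
    """Return the echo | zkwrite command line as a string (hypothetical helper)."""
    payload = {
        "request_reason": reason,
        "request_time": time.time() if now is None else now,
        "requester_ip": requester_ip,
    }
    # shlex.quote wraps the JSON in single quotes so the shell passes it through unchanged
    return "echo -n {} | zkwrite {}".format(shlex.quote(json.dumps(payload)), ZK_PATH)


if __name__ == "__main__":
    print(build_token_command("192.168.1.41"))
```

The printed line can be reviewed and pasted into the CVM shell by hand, which keeps the actual write a deliberate step.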

Then tail the Genesis.out logs for the following messages.

Quote
Failed to read zknode /appliance/logical/genesis/node_shutdown_priority_list with error 'no node'
2018-02-16 06:45:15 INFO cluster_manager.py:4224 Successfully granted token to 192.168.1.41 reason nos_upgrade
2018-02-16 06:45:15 INFO node_manager.py:2266 Finishing upgrade to version el7.3-release-ce-2018.01.31-stable-c3b9964290bf2f28799481fed5cf32f92ab3dadc, you can view progress at /home/nutanix/data/logs/finish.out
2018-02-16 06:45:17 INFO ha_service.py:959 Checking if any nodes are shutting down due to upgrade
2018-02-16 06:45:17 INFO ha_service.py:977 Node 192.168.1.41 is going down
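If you would rather not watch the log by eye, a minimal sketch like the one below can pick out the two token messages quoted in this post. The regular expressions are based only on those quoted lines, so other NOS versions may word them differently, and the function names are made up for illustration.

```python
import re

# Patterns based on the messages quoted in this post
DENIED = re.compile(r"did not grant shutdown token to my ip (\S+?),")
GRANTED = re.compile(r"Successfully granted token to (\S+) reason (\S+)")


def token_events(lines):
    """Yield ("denied" | "granted", ip) for each matching log line."""
    for line in lines:
        m = DENIED.search(line)
        if m:
            yield ("denied", m.group(1))
            continue
        m = GRANTED.search(line)
        if m:
            yield ("granted", m.group(1))


def token_state(lines):
    """Return the most recent event type seen, or None if no line matched."""
    state = None
    for kind, _ip in token_events(lines):
        state = kind
    return state
```

Passing an open file handle for the log to `token_state` returns "denied" while the node is still retrying and "granted" once the grant message has been logged.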

Now when you run upgrade_status on the CVM CLI you should see that your SVM (CVM) is up to date.

Quote
nutanix@NTNX-d02003b1-A-CVM:192.168.1.41:~$ upgrade_status
2018-02-16 14:52:44 INFO zookeeper_session.py:110 upgrade_status is attempting to connect to Zookeeper
2018-02-16 14:52:44 INFO upgrade_status:38 Target release version: el7.3-release-ce-2018.01.31-stable-c3b9964290bf2f28799481fed5cf32f92ab3dadc
2018-02-16 14:52:44 INFO upgrade_status:43 Cluster upgrade method is set to: automatic rolling upgrade
2018-02-16 14:52:44 INFO upgrade_status:96 SVM 192.168.1.41 is up to date
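The before and after outputs differ only in the per-SVM line, so a check can be scripted. The sketch below assumes the two line shapes quoted in this post ("still needs to be upgraded" and "is up to date") and nothing else; the function names are hypothetical.

```python
import re

# Line shapes taken from the upgrade_status output quoted in this post
PENDING = re.compile(r"SVM (\S+) still needs to be upgraded")
DONE = re.compile(r"SVM (\S+) is up to date")


def svm_states(output):
    """Map each SVM IP to "pending" or "up to date" based on upgrade_status text."""
    states = {}
    for line in output.splitlines():
        m = PENDING.search(line)
        if m:
            states[m.group(1)] = "pending"
            continue
        m = DONE.search(line)
        if m:
            states[m.group(1)] = "up to date"
    return states


def upgrade_finished(output):
    """True only if at least one SVM was reported and none are pending."""
    states = svm_states(output)
    return bool(states) and all(s == "up to date" for s in states.values())
```

Returning False when no SVM line is found at all is a deliberate choice here, so that empty or unexpected output is not mistaken for a finished upgrade.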

I hope this helps! :)

Hi Aymen,

No, there would be no production impact. You would still be able to access your nodes' IMMs through the CMM as long as the CMM can still see them. All internal management communication goes through the CMM, and that includes the switches as well as the IMMs. Unmanaging the chassis from the FSM will only disable management activities from the FSM itself while the chassis is unmanaged.

Thanks!

Hi Aymen,

My best suggestion is to upgrade all of your compute node firmware using Bootable Media Creator. Once upgraded, verify your CMM firmware is at the latest level. Then 'unmanage' the chassis from the FSM and follow the command line method for upgrading your firmware to the latest version. Once the upgrade is successful, try to remanage the chassis.

Please let us know your progress.

Thanks!

Hi Aymen,

Are you able to see the compute nodes from the CMM? If not, have you upgraded the compute node firmware?

Thanks,

Hi Aymen,

Good afternoon, thank you for your post. I would advise first making sure you can upgrade the FSM. Try removing all updates from the FSM database by running 'smcli cleanupd -mva' and then reboot. Also make sure the FSM version you are trying to upgrade to is on a compatible upgrade path from your current version. If the problem persists, please reply with a screenshot of the error.


Thanks!

6) VMware Support / Re: Packet Drops on DOWN VMNIC
« on: October 12, 2016, 11:27:42 AM »
Hi Stephen,

Good morning, thanks for the question. No, unfortunately we never got a solid answer on why this was happening. We believe it was a hardware/firmware issue with the Emulex adapter, but neither IBM nor VMware took credit for being the root cause of the problem. We began to ignore the issue since we know those VMnics are down.

7) Chassis Networking / Re: vPC creation between Cisco N7K and SI4093
« on: September 08, 2016, 04:40:14 PM »
Hi Farhan,

Thank you for your post. Here is the user's guide for the SI4093; it includes a discussion about setting up a trunk.

http://publib.boulder.ibm.com/infocenter/flexsys/information/topic/com.lenovo.acc.si4093pt.doc/00cg189.pdf

From the document:
"Trunk groups are also useful for connecting a SI4093 to IBM switches and
third-party devices that support link aggregation, such as Cisco routers and
switches with EtherChannel technology (not ISL trunking technology) and Sun's
Quad Fast Ethernet Adapter."


Since these switches are essentially passthrough switches where you create a SPAR (which is essentially a logical configuration combining INT, or internal, ports with EXT, or external, ports), there does not appear to be a way to "stack" them, since they have little to no intelligence built in. If you would like to stack the switches, that would require the EN4093, which has a command set and feature set almost identical to Cisco's when you configure it to use the ISCLI (Industry Standard Command Line Interface).

Keep in mind also that the SI4093 can be tricky to configure, although the marketing material would tell you otherwise. I have seen this switch deployed at only one customer; the rest have had the EN4093. I do not want to deter you from your solution but simply warn you that it may not be as easy as it seems. There are a few posts here in the MyPureSupport community on issues that have been seen, so if you do hit a snag, do a search and see if you find some answers; otherwise post again. :)

Thanks!

Hi Mathew,

Yes, this has happened to us as well. After running for about a year and a half, we suddenly lost the whole server.

Thanks,

Hi Eric,

Do you know if any maintenance activities (upgrades, adding LDAP, etc.) were performed on the FSM during the time range you reference?

Thanks,

10) Flex System Manager (FSM) / Re: [SOLVED] Wizard wont start after recovery
« on: January 25, 2016, 03:52:13 PM »
AlessandroFazenda,

Thank you very much for following up and being a valued contributor for MPS. Glad to hear everything is working as expected.

Thanks!


11) Flex System Manager (FSM) / Re: Wizard wont start after recovery
« on: January 04, 2016, 02:31:01 PM »
Interesting. So it's hanging on the RHEL login instead of automatically booting into the FSM software. Since you already rebuilt it 2 or 3 times I don't want to suggest doing this again, but it does sound like there may have been an issue when the recovery rebuilt the Recovery Partition. Did you try to download the newest recovery media? It should be 1.3.4.

I've had issues in the past rebuilding from scratch, but they were never related to the F12 Recovery Partition; it was always due to the USB CD-ROM device I was using. If you got past those steps successfully and are sure your RAID arrays are set to 'Boot' and the HDD is set to 'ALT', then I'm not sure what it could be short of corrupted installation media.

Is there an MD5 checksum file you received with the recovery media that you can check to make sure that the downloaded media has the same MD5 sum as the file from IBM?
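If you do have the published MD5 sum, the comparison is easy to script. The following is a minimal sketch using Python's standard hashlib; the function names are placeholders, and you would supply your own path to the downloaded media and the sum you received.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so a large ISO is never loaded into memory at once
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_sum(path, published_sum):
    """Compare case-insensitively, ignoring surrounding whitespace."""
    return md5_of_file(path) == published_sum.strip().lower()
```

A mismatch points to a corrupted or incomplete download, in which case downloading the media again is the first thing to try.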

12) Flex System Manager (FSM) / Re: Wizard wont start after recovery
« on: January 04, 2016, 12:50:35 PM »
Hi Alessandro,

Can you attach a screenshot of what the FSM shows after it is completely booted up?

Thanks!

13) Chassis Networking / Re: x240- EN2024 - EN2092 and Cisco 3560
« on: December 03, 2015, 09:09:38 AM »
Hi Afrugone,

It looks like you may need to add VLAN 116 to the Cisco GigabitEthernet0/7. Also, can you attach your INTA1 switchport config?

For another reference take a look at the PDF at the link below titled "Deploying IBM Pureflex into a Cisco Network".

https://lenovopress.com/redp4901.pdf

Thanks!

14) Chassis Networking / Re: x240- EN2024 - EN2092 and Cisco 3560
« on: December 01, 2015, 07:29:24 PM »
Afrugone,

If you want multiple VLANs to pass to the chassis then, yes, you would need to configure EXT1 as a trunk port. You could leave INTA1 at VLAN 1, but if you want that server to only have access to the production VLAN of 116 then INTA1 would need to have 'switchport access vlan 116' defined.

Your running-config should include the following.
!
interface port INTA1
        switchport access vlan 116
        exit
!
interface port EXT1
        switchport mode trunk
        switchport trunk allowed vlan 116
        switchport trunk native vlan 116
        exit
!   
vlan 116
        name "Production VLAN"
!     
spanning-tree stp <STP group #> vlan 116
!
 
If you are connecting both chassis switches to the same upstream switch, be sure to configure Spanning Tree to avoid loops in the network.
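If you need the same stanza for several ports or VLANs, it can be generated rather than typed. The sketch below is only a text template in Python that reproduces the lines shown in the running-config above; the function name and parameters are hypothetical, and it emits nothing beyond the commands already listed in this post. The spanning-tree line is written only when you supply the STP group number.

```python
def render_vlan_config(vlan_id, vlan_name, internal_port, external_port, stp_group=None):
    """Build the config lines shown above for one access port and one trunk port."""
    indent = " " * 8
    lines = [
        "!",
        f"interface port {internal_port}",
        f"{indent}switchport access vlan {vlan_id}",
        f"{indent}exit",
        "!",
        f"interface port {external_port}",
        f"{indent}switchport mode trunk",
        f"{indent}switchport trunk allowed vlan {vlan_id}",
        f"{indent}switchport trunk native vlan {vlan_id}",
        f"{indent}exit",
        "!",
        f"vlan {vlan_id}",
        f'{indent}name "{vlan_name}"',
        "!",
    ]
    if stp_group is not None:
        # The STP group number is left open in the post, so it is emitted only when supplied
        lines += [f"spanning-tree stp {stp_group} vlan {vlan_id}", "!"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_vlan_config(116, "Production VLAN", "INTA1", "EXT1"))
```

Review the generated text against your switch's current running-config before pasting it in.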

Thanks!

15) Chassis Networking / Re: New Network Switch installation
« on: December 01, 2015, 07:15:52 PM »
Aliimran,

In order to assist please provide the output of the command 'show logging' (in ISCLI mode).

Also, have you seen this article on MPS? It looks like you may have a similar issue, where an LACP trunk group has failover enabled and, if all links in the trunk are not active, the switch sets the internal ports to disabled.

http://mypuresupport.org/PureSystemsForum/index.php?topic=110.msg516#msg516

Let me know if this gets you somewhere.

Thanks!
