All,
This post is about an error you may see in the log file genesis.out under /home/nutanix/data/logs when an upgrade hangs and does not continue.
This happened to me on my CE cluster, which is a single-node cluster, where a shutdown token was not being granted to allow the upgrade to finish.
When you run upgrade_status on the CVM CLI, you will see the following for a while, until you realize "hey, this thing is hung":
nutanix@NTNX-d02003b1-A-CVM:192.168.1.41:~$ upgrade_status
2018-02-16 06:42:47 INFO zookeeper_session.py:110 upgrade_status is attempting to connect to Zookeeper
2018-02-16 06:42:47 INFO upgrade_status:38 Target release version: el7.3-release-ce-2018.01.31-stable-c3b9964290bf2f28799481fed5cf32f92ab3dadc
2018-02-16 06:42:47 INFO upgrade_status:43 Cluster upgrade method is set to: automatic rolling upgrade
2018-02-16 06:42:47 INFO upgrade_status:96 SVM 192.168.1.41 still needs to be upgraded. Installed release version: el7.3-release-ce-2017.11.30-stable-ab2ac46f51d4745d43126c9ad1871b7314400bab, node is currently upgrading
If you see this message in genesis.out:
Master 192.168.1.41 did not grant shutdown token to my ip 192.168.1.41, trying again in 30 seconds
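A quick way to check whether you are hitting this is to search the log for that message. The following is only a minimal sketch: the function name recent_token_denials is made up for illustration, and the default path is simply the log location mentioned at the top of this post.

```shell
# Print the most recent shutdown-token denials recorded in a Genesis log.
# recent_token_denials is a hypothetical helper name, not a Nutanix tool.
# The default path is the one given in this post; pass another path to override it.
recent_token_denials() {
    local log_file="${1:-/home/nutanix/data/logs/genesis.out}"
    # -F matches the text literally; tail keeps only the last 5 matching lines.
    grep -F "did not grant shutdown token" "$log_file" | tail -n 5
}

# Usage on the CVM:
#   recent_token_denials
```

If this keeps printing new lines every 30 seconds or so, the upgrade is most likely stuck waiting on the token.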
Try running the following command on the CVM to grant a token:
echo -n '{"request_reason": "nos_upgrade", "request_time": 1496860678.099324, "requester_ip": "192.168.1.50"}' | zkwrite /appliance/logical/genesis/node_shutdown_token
Then tail genesis.out and watch for the following messages:
Failed to read zknode /appliance/logical/genesis/node_shutdown_priority_list with error 'no node'
2018-02-16 06:45:15 INFO cluster_manager.py:4224 Successfully granted token to 192.168.1.41 reason nos_upgrade
2018-02-16 06:45:15 INFO node_manager.py:2266 Finishing upgrade to version el7.3-release-ce-2018.01.31-stable-c3b9964290bf2f28799481fed5cf32f92ab3dadc, you can view progress at /home/nutanix/data/logs/finish.out
2018-02-16 06:45:17 INFO ha_service.py:959 Checking if any nodes are shutting down due to upgrade
2018-02-16 06:45:17 INFO ha_service.py:977 Node 192.168.1.41 is going down
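To avoid scrolling through everything Genesis logs, you can filter the tail down to the lines that matter here. This is a rough sketch rather than anything official: upgrade_progress_lines is a name I made up, and the patterns are taken from the log messages quoted in this post, so they may differ on other releases.

```shell
# Keep only the Genesis log lines showing the token retry, the token grant,
# or the upgrade resuming. Reads log lines on stdin, so it can sit behind
# `tail -F` or be run over a saved copy of the log.
# upgrade_progress_lines is a hypothetical helper name.
upgrade_progress_lines() {
    # --line-buffered makes grep print each match immediately when fed by tail -F.
    grep --line-buffered -E "did not grant shutdown token|Successfully granted token|Finishing upgrade to version"
}

# Usage on the CVM (path taken from this post):
#   tail -F /home/nutanix/data/logs/genesis.out | upgrade_progress_lines
```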
Now when you run upgrade_status on the CVM CLI, you should see that your SVM (CVM) is up to date:
nutanix@NTNX-d02003b1-A-CVM:192.168.1.41:~$ upgrade_status
2018-02-16 14:52:44 INFO zookeeper_session.py:110 upgrade_status is attempting to connect to Zookeeper
2018-02-16 14:52:44 INFO upgrade_status:38 Target release version: el7.3-release-ce-2018.01.31-stable-c3b9964290bf2f28799481fed5cf32f92ab3dadc
2018-02-16 14:52:44 INFO upgrade_status:43 Cluster upgrade method is set to: automatic rolling upgrade
2018-02-16 14:52:44 INFO upgrade_status:96 SVM 192.168.1.41 is up to date
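If you would rather not keep re-running upgrade_status by hand, a small polling loop can do the waiting. Treat this as a sketch under a few assumptions: wait_for_upgrade is a hypothetical name, the 60-second default interval is arbitrary, and the strings it matches ("is up to date" and "still needs to be upgraded") are the ones from the output shown in this post.

```shell
# Poll upgrade_status until at least one CVM reports "is up to date" and none
# reports "still needs to be upgraded". wait_for_upgrade is a hypothetical helper.
wait_for_upgrade() {
    local interval="${1:-60}"   # seconds between checks (arbitrary default)
    local status
    while true; do
        # 2>&1 captures the INFO lines whether they go to stdout or stderr.
        status="$(upgrade_status 2>&1)"
        if printf '%s\n' "$status" | grep -q "is up to date" &&
           ! printf '%s\n' "$status" | grep -q "still needs to be upgraded"; then
            echo "All CVMs report up to date"
            return 0
        fi
        sleep "$interval"
    done
}

# Usage on the CVM:
#   wait_for_upgrade 30
```

Note that the loop has no timeout, so if the upgrade is still hung it will simply keep waiting; check genesis.out in that case.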
I hope this helps!
