Question

Manually remove node from Cluster

  • 20 November 2018
  • 6 replies
  • 12 views

Hello, we have had a hardware failure on one node in a cluster. I tried to remove the node with this command: ncli host rm-start id=xxxxxxx skip-space-check=true. I can see that it started, but it has been stuck for about a day. I believe the problem is that the CVM is not running and cannot be turned on, because the node has completely failed and cannot even boot. Is there a way to force the removal manually from the command line?


Userlevel 7
Badge +25
How many nodes were in your cluster before the failure? If three, then the cluster will resist removing the node, since that would take it below the minimum.

What was the failure? Can you do a repair or was it the SSD?
Userlevel 1
Badge +3
If your cluster had 3 nodes, you'd better first add a fourth node (while the dead node is still a cluster member) and only then remove the third node. If it had 4, then the removal should be doable. Please check ncli task list to see whether there are any stuck tasks; kicking them out if they have completed should finish the node removal process.
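The node-count reasoning in the replies above can be written down as a small check. This is only a sketch: the minimum of 3 nodes is inferred from the replies in this thread rather than read from the cluster, and the function name is hypothetical, not part of any Nutanix tool.

```python
# Minimum cluster size implied by the replies above (an assumption,
# not a value read from the cluster itself).
MIN_NODES = 3


def removal_keeps_minimum(total_nodes: int, nodes_to_remove: int = 1,
                          min_nodes: int = MIN_NODES) -> bool:
    """Return True if the cluster still has at least min_nodes after removal."""
    if nodes_to_remove < 0 or nodes_to_remove > total_nodes:
        raise ValueError("nodes_to_remove must be between 0 and total_nodes")
    return total_nodes - nodes_to_remove >= min_nodes


print(removal_keeps_minimum(3))   # 3-node cluster: removal drops below the minimum
print(removal_keeps_minimum(4))   # 4-node cluster: one node can go
print(removal_keeps_minimum(12))  # 12-node cluster: one node can go
```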
Thanks for the reply.

My cluster has 12 nodes.

There is no way I will be able to repair the hardware issue; it is the SSDs. The servers will not even boot up. I have tried everything. I would like to just remove them for now, which would give me some time to fix the hardware issue.

The only task that is running is the task for removing the node. It has been stuck for a few days. The status is MARKED_FOR_REMOVAL_BUT_NOT_DETACHABLE.

Any insight on how I could force this process would be appreciated.

Thanks again..
Userlevel 7
Badge +25
Ahh so this is commercial and not community edition. ;)

I assume you have a support agreement on those nodes? CE is a bit different, though the node eviction process is similar. I'm not sure off the top of my head how to tell the data re-replication status. I would call support myself.
We don't have support on the nodes. We are probably buying new blocks at the beginning of the new year. We didn't think there would be any issues before we bought the new nodes, but of course that was not the case.

Thanks for the reply, I was hoping someone on the forums had run into a similar issue..
Userlevel 7
Badge +25
There are some good thoughts in this thread on how to detect the status of the removal:

https://next.nutanix.com/discussion-forum-14/remove-server-from-cluster-does-not-work-15287

ncli host get-remove-status

My gut says that since the node is unavailable, the cluster can't cleanly evict it. Since the node is toast, you can't foundation it to get it back into the pool. As for zkediting the node out, I would be concerned about impacting the rest of the cluster.
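The status check mentioned above could be scripted along these lines. This is only a sketch: it runs the ncli host get-remove-status command quoted in this thread and searches the raw text for the MARKED_FOR_REMOVAL_BUT_NOT_DETACHABLE status reported earlier. The function names are made up for illustration, nothing is assumed about the layout of the ncli output, and ncli is assumed to be on the PATH of the machine where the script runs.

```python
import subprocess

# Status string reported earlier in this thread for the stuck removal.
STUCK_STATUS = "MARKED_FOR_REMOVAL_BUT_NOT_DETACHABLE"


def removal_is_stuck(status_text: str) -> bool:
    """Return True if the raw status text contains the stuck-removal status."""
    return STUCK_STATUS in status_text


def get_remove_status() -> str:
    """Run the command quoted in this thread and return its raw output.

    Assumes ncli is available on PATH; no output format is assumed.
    """
    result = subprocess.run(
        ["ncli", "host", "get-remove-status"],
        capture_output=True,
        text=True,
        check=False,  # do not raise on a non-zero exit; just return the text
    )
    return result.stdout
```

Calling removal_is_stuck(get_remove_status()) from a working CVM would then show whether the host is still reported as not detachable. It only reads the status and does not change anything in the cluster.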