Manually remove node from Cluster
Hello, we have had a hardware failure on one node in a cluster. I tried to remove the node with this command: ncli host rm-start id=xxxxxxx skip-space-check=true. The removal started, but it has been stuck for about a day. I believe the problem is that the CVM is not running and cannot be powered on, because the node has failed completely and will not even boot. Is there a way to force the removal manually from the command line?
How many nodes were in your cluster before the failure? If three, then the cluster will fight removing the node, since that would take it below the minimum.
What was the failure? Can you do a repair or was it the SSD?
If your cluster had 3 nodes, you'd better add a fourth node first (while the dead node is still a cluster member) and only then remove the dead node. If it had 4, it should be doable: please check ncli task list for stuck tasks; kicking them out if they have completed should finish the node removal process.
Thanks for the reply.
My cluster has 12 nodes.
There is no way I will be able to repair the hardware issue; they are SSDs. The servers will not even boot up. I have tried everything. I would like to just remove them for now, which would give me some time to fix the hardware issue.
The only task that is running is the task for removing the node. It has been stuck for a few days. The status is MARKED_FOR_REMOVAL_BUT_NOT_DETACHABLE.
Any insight on how I could force this process would be appreciated.
Thanks again..
Ahh so this is commercial and not community edition. ;)
I assume you have a support agreement on those nodes? CE is a bit different, though the node eviction process is similar. I'm not sure off the top of my head how to check the data re-replication status. I would call in myself.
We don't have support on the nodes. We are probably buying new blocks during the beginning of the new year. We didn't think there would be any issues until we bought new nodes, but of course that is not the case.
Thanks for the reply, I was hoping someone on the forums had run into a similar issue..
There are some good thoughts in this thread on how to check the status of the removal:
https://next.nutanix.com/discussion-forum-14/remove-server-from-cluster-does-not-work-15287
ncli host get-remove-status
My gut says that since the node is unavailable, the cluster can't cleanly evict it. Since the node is toast, you can't Foundation it to get it back into the pool. As for zkediting the node out, I would be concerned about impacting the rest of the cluster.
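If you want to keep an eye on the removal without re-running the command by hand, something like the following might help. It is only a rough sketch: it assumes it runs on a CVM where ncli is on the PATH, and it assumes the MARKED_FOR_REMOVAL_BUT_NOT_DETACHABLE token appears verbatim in the output of ncli host get-remove-status (I have not confirmed the exact output format). The function names, polling interval, and check count are made up for illustration.

```python
import subprocess
import time

# Status string quoted earlier in this thread; assumed to appear verbatim
# in the command output.
STUCK_TOKEN = "MARKED_FOR_REMOVAL_BUT_NOT_DETACHABLE"


def removal_is_stuck(output: str) -> bool:
    """Return True if the status text still contains the stuck token."""
    return STUCK_TOKEN in output


def get_remove_status() -> str:
    """Run the status command mentioned in this thread and return raw stdout."""
    result = subprocess.run(
        ["ncli", "host", "get-remove-status"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def watch(fetch=get_remove_status, interval_s=300, max_checks=12, sleep=time.sleep):
    """Poll until the stuck token disappears or max_checks is reached.

    Returns the number of the check on which the token was no longer
    present, or None if it was still present on the last check.
    """
    for attempt in range(1, max_checks + 1):
        if not removal_is_stuck(fetch()):
            return attempt
        if attempt < max_checks:
            sleep(interval_s)
    return None
```

Calling watch() with the defaults checks every five minutes for about an hour. Note that the token disappearing only means the output changed; empty output from a failed command would look the same, so confirm the real state with ncli host get-remove-status before assuming the node is gone.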