Checklist to verify cluster health status before restarting a CVM

January 31, 2021
0 replies
969 views

ShvetaD
Nutanix Employee

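There are instances when you have to perform a rolling restart of the CVMs (Controller VMs), a rolling restart of the hypervisor hosts, or a restart of just one of the CVMs.

This is a list of health checks to run before the restart to verify cluster health.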

Verify if any nodes or services are in a 'down' state. For smaller clusters, run the following command:

nutanix@cvm$ cluster status

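If the cluster contains many nodes, the following command, which excludes services that are UP from the output, may be more convenient:

nutanix@cvm$ cluster status | grep -v UP

Nodes or services that are unexpectedly in a 'down' state need to be fixed before proceeding with the restart.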
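
As a complement to the grep -v UP filter above, here is a minimal sketch that looks for down entries directly and returns a failing exit status when it finds any. The function name down_entries is hypothetical, and the sketch assumes that a down CVM or service is marked with the word "down" (in any letter case) in the cluster status output.

```shell
#!/bin/sh
# Sketch: print the lines of `cluster status` output that mention a down
# CVM or service, and return 1 if there are any.
# Assumption: down entries contain the word "down" in some letter case.
down_entries() {
    # Reads `cluster status` output on stdin.
    if grep -i -w 'down'; then
        return 1    # at least one down entry was printed
    fi
    return 0        # nothing reported as down
}
```

On a CVM this could be used as: cluster status | down_entries || echo "fix the entries above before restarting"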

Verify if any nodes are missing from the Cassandra ring or are in a 'down' state. The ring should list the same number of nodes as there are IPs in the svmips output (four nodes in the example below). A missing node means it was removed from the Cassandra ring:

nutanix@cvm$ nodetool -h 0 ring

Address   Status State   Load      Owns    Token
                                           kV000000Msfgt0tSk22HNmeoLEMT9hDKoNj90Tfc1JpRHn0pRzgU6vJkCwYW
X.X.X.44  Up     Normal  19.54 GB  25.00%  00000000NUjWKYp94sEGXJfIESzM6uY1nEVSEnkeZd0Dk4FMDYI1JFmYskpL
X.X.X.41  Up     Normal  15.11 GB  25.00%  FV000000jZyBpvdRUdTMjOVYIhBRLlq1hNDrXIGAqzO8bYBeceSieWOQ6NdK
X.X.X.42  Up     Normal  23.17 GB  25.00%  V00000001XCXAHdrXjVlkQHxCX2XJ8oAtUX21dPZfC46JQeltUpSL9WgZKmX
X.X.X.43  Up     Normal  21.34 GB  25.00%  kV000000Msfgt0tSk22HNmeoLEMT9hDKoNj90Tfc1JpRHn0pRzgU6vJkCwYW

nutanix@cvm$ svmips

X.X.X.41 X.X.X.42 X.X.X.43 X.X.X.44
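
The comparison between the ring and the svmips list can be scripted. The following is only a sketch: the function name ring_matches_svmips is hypothetical, and it assumes that each node row starts with a numeric IPv4 address followed by the Status and State columns, as in the sample output (where the addresses are masked).

```shell
#!/bin/sh
# Sketch: compare `nodetool -h 0 ring` output with `svmips` output.
# Assumption: node rows start with an IPv4 address followed by the Status
# and State columns. The body runs in a subshell so its variables do not
# leak into the caller.
ring_matches_svmips() (
    ring_output=$1      # text printed by: nodetool -h 0 ring
    svmips_output=$2    # text printed by: svmips

    # Number of CVM IPs reported by svmips (whitespace-separated words).
    expected=$(printf '%s\n' "$svmips_output" |
        awk '{ n += NF } END { print n + 0 }')

    # Number of ring rows that are both Up and Normal.
    healthy=$(printf '%s\n' "$ring_output" |
        awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && $2 == "Up" && $3 == "Normal" { n++ }
             END { print n + 0 }')

    echo "ring nodes Up/Normal: $healthy, CVM IPs: $expected"
    [ "$healthy" -eq "$expected" ]
)
```

On a CVM this could be called as: ring_matches_svmips "$(nodetool -h 0 ring)" "$(svmips)". A non-zero exit status means a node is missing from the ring or is not Up/Normal.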

Run the following command to check Cassandra status:

nutanix@cvm$ ncc health_checks cassandra_checks cassandra_status_check

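Verify if there are any recent FATAL files in the ~nutanix/data/logs directory:

nutanix@cvm$ ls -ltr ~/data/logs/*FATAL*

Review any service FATAL from the past hour, and confirm that the affected service is back in the 'up' state and stable before you proceed with the restart.
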
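
The ls -ltr listing shows every FATAL file sorted by time. To narrow it to the past hour, one option is find with -mmin. This is a sketch rather than part of the original checklist: the function name recent_fatals is hypothetical, and it assumes a find that supports -maxdepth and -mmin (GNU and BSD find do; neither option is in POSIX).

```shell
#!/bin/sh
# Sketch: list FATAL log files modified within the last N minutes.
# Assumption: find supports -maxdepth and -mmin (GNU/BSD extensions).
recent_fatals() {
    log_dir=$1          # directory to scan, e.g. ~/data/logs
    minutes=${2:-60}    # look-back window in minutes, default one hour
    find "$log_dir" -maxdepth 1 -type f -name '*FATAL*' -mmin "-$minutes"
}
```

On a CVM this could be called as: recent_fatals ~/data/logs 60. Empty output means no FATAL file was modified in that window.
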
Verify if any Stargate node is down or if ha.py is enabled:

nutanix@cvm$ ncc health_checks network_checks ha_py_rerouting_check

Verify if the cluster can tolerate a single node failure:

nutanix@cvm$ ncli cluster get-domain-fault-tolerance-status type=node
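
If you want to check the result in a script rather than by eye, something like the sketch below could work. It rests on an assumption about the output format that is not shown in this post: that each component is reported with a line of the form "Current Fault Tolerance : <number>". The function name node_failure_tolerated is hypothetical. Adjust the pattern if your AOS version prints something different.

```shell
#!/bin/sh
# Sketch: succeed only if every component reports a fault tolerance of at
# least 1.
# Assumption: the ncli output contains lines of the form
#   "Current Fault Tolerance   : <number>"
node_failure_tolerated() {
    # Reads `ncli cluster get-domain-fault-tolerance-status type=node` on stdin.
    awk -F: '
        /Current Fault Tolerance/ {
            seen = 1                    # at least one component was reported
            if ($2 + 0 < 1) bad = 1     # this component cannot lose a node
        }
        END { exit ((seen && !bad) ? 0 : 1) }
    '
}
```

On a CVM this could be used as: ncli cluster get-domain-fault-tolerance-status type=node | node_failure_tolerated || echo "cluster cannot tolerate a node failure right now". The function also fails when no matching line is found, so an unexpected output format is not mistaken for a healthy cluster.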

Review any unacknowledged alerts and their creation time.

For more details and commands, please review the KB: https://portal.nutanix.com/page/documents/kbs/details?targetId=kA032000000982pCAA
