Solved

Disk fault tolerance

  • March 25, 2016
  • 3 replies
  • 4,402 views

Badge +3
Hello, I'm impressed with Nutanix technology!

But I have a question. Say I have one block with three nodes (a basic starter setup), configured with RF2. I understand that I can lose one node and my whole system keeps running.

Losing a disk within one node also lets me keep running normally, but what happens if I lose two disks at the same time on different nodes?

As I understand it, in that case I would lose any data whose two copies were on those two failed disks, so would my whole infrastructure go down in that scenario?

Maybe I'm missing something…

Best answer by Jon, March 31, 2016, 10:47


This topic has been closed for comments

3 replies

Userlevel 6
Badge +29
Correct, but unlike a traditional storage system, this is rarely a problem. Here's the high-level "why":

Nutanix does not use RAID to protect data. We store data with a "Replication Factor", which stores individual blocks of data in a redundant fashion across two or more nodes in a cluster (i.e. RF2, two copies, or RF3, three copies).
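The placement idea can be sketched in a few lines of Python. This is a toy model under my own assumptions (random placement onto distinct nodes), not Nutanix's actual placement logic, and all names are made up:

```python
import random

def place_copies(num_blocks, nodes, rf):
    """Toy sketch of replication-factor placement: each block's rf copies
    land on rf distinct nodes (not Nutanix's actual placement logic)."""
    return {block: random.sample(nodes, rf) for block in range(num_blocks)}

nodes = ["node-a", "node-b", "node-c"]
layout = place_copies(1000, nodes, rf=2)

# Losing any single node leaves at least one copy of every block alive,
# because the two copies always sit on two distinct nodes.
assert all(len(set(copies)) == 2 for copies in layout.values())
```

The key property is only that no block's copies share a node; that is what lets an RF2 cluster ride out one node failure.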


Say you have a drive fail; let's say it was a 1TB drive that was only 200GB full.

For the sake of easy math, let's stick with a three-node cluster.

That means roughly 200GB of information was on that disk, and the replica copies of that data, approximately 200GB in total, are spread across all of the disks in the other two nodes, roughly 100GB per node.
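The arithmetic above, spelled out (these are the hypothetical figures from the example, not real capacities):

```python
# Hypothetical figures from the example: RF2, three nodes, a failed 1TB
# drive that held 200GB of data.
failed_disk_used_gb = 200
surviving_nodes = 3 - 1  # the two nodes that hold the second copies

# With RF2 the other copy of that 200GB already lives on the surviving
# nodes, spread roughly evenly across them.
replica_share_per_node_gb = failed_disk_used_gb / surviving_nodes
print(replica_share_per_node_gb)  # 100.0 -> roughly 100GB per node
```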

In a traditional storage system, you'd have to:
Rebuild an entire 4TB drive map onto a hot spare (an idle drive) within the system, regardless of how much data it held
Rebuild that data parity from the "RAID pack" the drive failed out of, which trashes the performance of that RAID pack and the other workloads on it, and takes a very long time to complete.

In Nutanix there is no RAID, so you only have to rebuild/re-protect the 200GB worth of information instead of 1TB. Also, that 200GB is spread out across the entire cluster, so all disks and nodes participate in the rebuild, spreading out the rebuild task and making its impact on the cluster and performance very low (if any).
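To see why distributing the rebuild matters, here's a naive timing model. The per-disk throughput and disk count are illustrative assumptions of mine, not measured Nutanix behaviour:

```python
def rebuild_hours(data_gb, writers, mb_per_s_per_writer=100):
    """Naive model: rebuild time grows with the data to copy and shrinks
    with the number of disks sharing the writes.
    (100 MB/s per disk is an assumed, illustrative figure.)"""
    total_mb = data_gb * 1024
    return total_mb / (writers * mb_per_s_per_writer) / 3600

# Traditional RAID: copy a whole 4TB drive's map onto one hot spare.
raid = rebuild_hours(4096, writers=1)

# Distributed re-protection: only the 200GB of real data, with (say) a
# dozen remaining disks each writing a share.
distributed = rebuild_hours(200, writers=12)

print(f"{raid:.1f}h vs {distributed:.2f}h")
```

Even with generous assumptions for the RAID case, copying the whole raw drive through one spare is orders of magnitude slower than re-protecting only the live data across many disks.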


The end result?
Drives fail, and rebuilds happen very quickly, as the rebuild eats into the free capacity of the cluster. This means no idle/wasted hot spares.
It also means the data is re-protected faster, so the likelihood of a second drive failure taking out data is minimized (not zero, but minimized).


If you are concerned about dual disk failure, as some customers are for business-critical operations, you'd want to go with an RF3 setup, which is basically N+2, so any two components can fail without worry.
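The RF2-vs-RF3 difference can be checked with a toy model (a hypothetical disk layout of my own, not Nutanix's real metadata):

```python
from itertools import combinations

def survives(copy_disks, failed_disks):
    """A block survives as long as one copy sits on a non-failed disk."""
    return any(d not in failed_disks for d in copy_disks)

disks = range(6)
rf3_block = [0, 2, 4]  # three copies on three distinct disks (RF3)

# RF3 / N+2: no pair of simultaneous disk failures can take out all copies.
rf3_ok = all(survives(rf3_block, set(pair)) for pair in combinations(disks, 2))
print(rf3_ok)  # True

# RF2: the one unlucky pair that holds both copies does lose the block.
print(survives([0, 2], {0, 2}))  # False
```

This is exactly the scenario in the original question: under RF2, two simultaneous disk failures only cause loss for blocks whose two copies happened to live on those two disks; under RF3 a third copy always remains.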


Anyhow, you can read more about cluster resiliency in the Nutanix Bible:
http://nutanixbible.com/
Badge +3
Thank you very much for your thorough answer.

That's all I wanted to know about that.

Thank you!

What happens when a drive fails on each of two different hosts in an RF2 cluster running VMs? Is there data loss? Do the VMs restart?
