Solved

Remote Site Replication Tuning

3 years ago
May 29, 2018
6 replies
1966 views

UserLevel 2

penguindows
Trailblazer
10 replies

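What options are there in Nutanix to speed up replication to a remote site? The best I can get on a single stream is about 20 MB/s (160 Mbps), with an aggregate of about 60 MB/s (480 Mbps) spread across several streams. However, we have a 10 Gbps link between sites, so I would expect much better throughput.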


Best answer by penguindows, June 8, 2018, 16:35



This topic has been closed for comments.

6 replies


UserLevel 4

+19

dlink7

Moderator

87 replies

3 years ago
June 5, 2018

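Hi, what AOS version are you running?

In Acropolis, every node can replicate four files at a time, up to an aggregate of 100 MB/s. Thus, in a four-node configuration, the cluster can replicate 400 MB/s, or 3.2 Gb/s.

What can cause this number to go down? Other tasks on the cluster, such as Curator running if the snapshot chain is too long, and bad latency.

If you're still meeting your RPO, I wouldn't change anything. Going off stock will come back to haunt you later.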
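
The arithmetic in this reply can be checked with a few lines of Python. This is only a sketch of the unit conversion, assuming decimal prefixes (1 Gb = 1000 Mb) and taking the 100 MB/s per-node ceiling at face value; the function names are made up for illustration.

```python
# Worked arithmetic for the figures quoted in the reply (decimal units assumed).
PER_NODE_MBYTES_PER_S = 100  # aggregate replication ceiling per node, from the reply
BITS_PER_BYTE = 8


def cluster_ceiling_mbytes_per_s(nodes: int) -> int:
    """Ideal aggregate replication rate for a cluster, in MB/s."""
    return nodes * PER_NODE_MBYTES_PER_S


def mbytes_to_gbits(mbytes_per_s: float) -> float:
    """Convert MB/s to Gb/s using decimal prefixes (1 Gb = 1000 Mb)."""
    return mbytes_per_s * BITS_PER_BYTE / 1000


if __name__ == "__main__":
    ceiling = cluster_ceiling_mbytes_per_s(4)
    print(f"4-node ceiling: {ceiling} MB/s = {mbytes_to_gbits(ceiling):.1f} Gb/s")
    # The original question reported about 60 MB/s in aggregate.
    print(f"observed: 60 MB/s = {mbytes_to_gbits(60):.2f} Gb/s")
```

By the same conversion, the roughly 60 MB/s reported in the question is well below the ideal four-node ceiling, which fits the suggestion that something other than raw link capacity is the limit.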


UserLevel 7

+34

aluciani

334 replies

3 years ago
June 5, 2018

Hi @penguindows,

Did you see the reply from @dlink7?

Follow me on Twitter: https://twitter.com/angeloluciani


UserLevel 2

+5

penguindows

Author

Trailblazer

10 replies

3 years ago
June 8, 2018

Answer

Yes, I saw dlink7's reply. Thanks @dlink7 for the good info on expected performance under ideal conditions. That gives me a good baseline.

Our issue is definitely latency. Our remote site is across the continental US, and each VM is 30 TB (10 TB used) on a single disk. As you can imagine, this creates a bottleneck for us. Replications with default settings are not meeting RPO.

Our AOS version is 5.5.2.

I have found (with help from support) these tunable settings within the cluster:

Code:
nutanix@NTNX:~$ python /home/nutanix/serviceability/bin/edit-aos-gflags | grep stargate_
2018-06-08 10:00:09 INFO zookeeper_session.py:110 edit-aos-gflags is attempting to connect to Zookeeper
stargate_cerebro_replication_max_rpc_vblocks = 16 #default 4
stargate_cerebro_replication_max_rpc_data = 4194304 #default 1048576
stargate_cerebro_max_outstanding_vdisk_replication_rpcs = 16 #default 4
stargate_cerebro_replication_param_multiplier = 32 #default 16
stargate_vdisk_read_extents_max_outstanding_egroup_reads = 6 #default 3
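
To make the size of each change in the listing easier to see, here is a small Python sketch that computes the ratio of each tuned value to its default. The values are copied from the listing; reading `stargate_cerebro_replication_max_rpc_data` as a byte count is an assumption, since the thread does not state its unit.

```python
# Tuned and default values copied from the listing above: name -> (tuned, default).
GFLAGS = {
    "stargate_cerebro_replication_max_rpc_vblocks": (16, 4),
    "stargate_cerebro_replication_max_rpc_data": (4194304, 1048576),
    "stargate_cerebro_max_outstanding_vdisk_replication_rpcs": (16, 4),
    "stargate_cerebro_replication_param_multiplier": (32, 16),
    "stargate_vdisk_read_extents_max_outstanding_egroup_reads": (6, 3),
}


def scale_factors(flags):
    """Return how many times larger each tuned value is than its default."""
    return {name: tuned / default for name, (tuned, default) in flags.items()}


if __name__ == "__main__":
    for name, factor in scale_factors(GFLAGS).items():
        tuned, default = GFLAGS[name]
        print(f"{name}: {default} -> {tuned} ({factor:g}x)")
    # Assumption: max_rpc_data is a byte count, so the change is 1 MiB -> 4 MiB.
    tuned_b, default_b = GFLAGS["stargate_cerebro_replication_max_rpc_data"]
    print(f"max_rpc_data: {default_b / 2**20:g} MiB -> {tuned_b / 2**20:g} MiB")
```

Three of the five settings were raised fourfold and two were doubled.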

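I am putting together a how-to on tuning these parameters, but the gist of it is editing with...

Code:
nutanix@NTNX:~$ python /home/nutanix/serviceability/bin/edit-aos-gflags --service=stargate

...then restarting Stargate on each CVM.

The restarts went perfectly with a sleep 60 in between: no running replications failed, and our throughput went up about 4x, as expected. Replications that had been running for 12 days began to finish.

My ultimate solution is twofold. First, the above settings to accelerate each stream. Second, we are working to break our 30 TB disk out into ten 3 TB disks.

RE: going off stock: I think that as Nutanix grows into more environments (and it will; the technology is amazing and I foresee continued adoption), there will be many more flavors of environment that sticking to stock options just won't solve. I expect that in the next few years Nutanix will pivot to a more open position. I'd love to see some of these settings within reach of the Prism GUI, along with better man and info pages on what each individual setting and switch does.

That being said, your warning about going off stock is noted and appreciated. I understand that Nutanix tries to tune AOS as well as possible out of the box, and that changing these settings can have an impact (sometimes bad, sometimes good). Therefore, I will only seek a behavior change in the technology when it is needed to achieve a specific result.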
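
The "restart on each CVM with a sleep 60" pattern is a rolling restart: one node at a time, with a pause so each service can settle before the next one goes down. The Python sketch below shows only that control flow. The `restart` callable and the host names are hypothetical placeholders; the thread does not give the actual restart command, so none is assumed here.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    hosts: Iterable[str],
    restart: Callable[[str], None],
    pause_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Restart a service on one host at a time, pausing between hosts.

    `restart` is a caller-supplied placeholder for whatever actually
    restarts the service on a host; it is not a real Nutanix API.
    """
    host_list = list(hosts)
    done: List[str] = []
    for index, host in enumerate(host_list):
        restart(host)
        done.append(host)
        # Pause after every host except the last one.
        if index < len(host_list) - 1:
            sleep(pause_s)
    return done


if __name__ == "__main__":
    # Dry run with hypothetical host names: print instead of restarting or sleeping.
    rolling_restart(
        ["cvm-1", "cvm-2", "cvm-3"],
        restart=lambda host: print(f"would restart the service on {host}"),
        sleep=lambda seconds: print(f"would wait {seconds:g} s"),
    )
```

Injecting `restart` and `sleep` keeps the sketch testable without touching a cluster; a real run would replace both with the procedure recommended by support.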


UserLevel 4

+19

dlink7

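Moderator

87 replies

3 years ago
June 8, 2018

You'll be happy to know that Engineering already has a project in place to make these settings more dynamic. Glad you got it sorted.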


Bradley4681

Voyager

1 reply

3 years ago
October 16, 2018

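penguindows wrote:

Yes, I saw dlink7's reply. Thanks @dlink7 for the good info on expected performance under ideal conditions. That gives me a good baseline.

Our issue is definitely latency. Our remote site is across the continental US, and each VM is 30 TB (10 TB used) on a single disk. As you can imagine, this creates a bottleneck for us. Replications with default settings are not meeting RPO.

Our AOS version is 5.5.2.

I have found (with help from support) these tunable settings within the cluster:

Code:
nutanix@NTNX:~$ python /home/nutanix/serviceability/bin/edit-aos-gflags | grep stargate_
2018-06-08 10:00:09 INFO zookeeper_session.py:110 edit-aos-gflags is attempting to connect to Zookeeper
stargate_cerebro_replication_max_rpc_vblocks = 16 #default 4
stargate_cerebro_replication_max_rpc_data = 4194304 #default 1048576
stargate_cerebro_max_outstanding_vdisk_replication_rpcs = 16 #default 4
stargate_cerebro_replication_param_multiplier = 32 #default 16
stargate_vdisk_read_extents_max_outstanding_egroup_reads = 6 #default 3

I am putting together a how-to on tuning these parameters, but the gist of it is editing with...

Code:
nutanix@NTNX:~$ python /home/nutanix/serviceability/bin/edit-aos-gflags --service=stargate

...then restarting Stargate on each CVM.

How did you make these changes, or where is the how-to you were creating? Thanks!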


UserLevel 2

+5

penguindows

Author

Trailblazer

10 replies

3 years ago
October 16, 2018

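@Bradley4681 When you run:

Code:
nutanix@NTNX:~$ python /home/nutanix/serviceability/bin/edit-aos-gflags --service=stargate

you enter a text editor with all the Stargate service gflags. Here, you can edit the settings to achieve different results.

The settings I adjusted that wound up working for me basically amounted to an increase in the number of streams per vdisk, the amount of data saturation allowed on the line, and the priority that Stargate gives to replication processing. This allowed me to overcome the high latency on my cross-US pipe.

You can find my specific setting adjustments in the block that you quoted. Also, you'll probably need a genesis restart at a minimum.

It's worth noting a few things about my environment that made this combination of settings work without any negative impact:

1. Homogeneous images: every guest in the entire cluster does the same type of work, is the same size, and has the same schedule. This means I can apply cluster-wide changes without adversely affecting oddball systems.
2. Low compute demand: I have 6 guests in an 8-node cluster. Basically, I can afford to give loads of power to Nutanix services rather than to guest compute, because my guests do not have a high compute demand.
3. Scheduled workloads: the guests here are backup media servers. They have a "busy" timeframe when IO for the guests is important, and a "slow" timeframe when IO can be reserved for Nutanix services. Basically, we back up more at night and less during the day.
4. High latency, but high bandwidth: the pipe between my cross-country datacenters is 2 x 10 Gbps. We have a QoS policy that limits this cluster's workload to 5 Gbps, but that is still a rather wide pipe. The latency is 74 ms.

RE: the guide: To my embarrassment, my guide is limited to some internal notes and a few wiki pages for my team. It isn't organized or cleaned up in a way that I'd be comfortable sharing with the community. I have high aspirations to get these notes together in a form that would be useful to the community. I do need to beg for your patience, however, as new demand seems to be a constant stream right now.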
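
The combination of a 74 ms latency and a 5 Gbps cap explains why more data in flight helps. A rough way to see it is the bandwidth-delay product: the amount of data that must be outstanding to keep the pipe full. The Python sketch below is a simplified model, not a description of how Stargate schedules replication. It assumes the 74 ms figure is a round-trip time, and the in-flight sizes in the example are illustrative values, not measured ones.

```python
def bandwidth_delay_product_bytes(bits_per_s: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a link of this speed and RTT full."""
    return bits_per_s * rtt_s / 8


def max_throughput_bytes_per_s(in_flight_bytes: float, rtt_s: float) -> float:
    """Upper bound on throughput when at most `in_flight_bytes` are unacknowledged."""
    return in_flight_bytes / rtt_s


if __name__ == "__main__":
    RTT_S = 0.074        # assumption: the quoted 74 ms is a round-trip time
    LINK_BPS = 5e9       # the 5 Gbps QoS cap mentioned in the post

    bdp = bandwidth_delay_product_bytes(LINK_BPS, RTT_S)
    print(f"data needed in flight to fill the link: {bdp / 1e6:.2f} MB")

    # Illustrative in-flight sizes only; not taken from Stargate internals.
    for mib in (4, 16, 64):
        cap = max_throughput_bytes_per_s(mib * 2**20, RTT_S)
        print(f"{mib} MiB in flight -> at most {cap / 1e6:.1f} MB/s")
```

Under this model, throughput on a long, fat pipe scales with the amount of outstanding data until the link cap is reached, which is consistent with the gains reported after raising the per-stream limits.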
