Just a few questions about sequential I/O and the OPLOG vs. the extent store. If I/O is determined to be sequential in nature, does it always bypass the OPLOG, or only when the write is larger than 1MB? Does bypassing the OPLOG mean those writes are much slower by comparison? They still land on SSD, so my assumption is the performance would be the same. Why does coalescing writes before draining them sequentially help performance? I'm interested in why the step of writing to SSD first and then copying the data out is needed.
Best answer by 阿罗纳
You read the data after you write it. Just like me writing this right now. Imagine that you're working with the alphabet. You can write a b c d e … z, or you can write a y h i … c b l … k.
In both instances, the task is the same – to read the letters in alphabetical order. Which scenario is going to take you longer? (Please disregard the fact that you can reproduce the order from memory without reading it.)
It gets a little more complicated in real life, where there are multiple layers of data organisation, but in a nutshell this is why sequential I/O is better than random I/O.
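The alphabet analogy can be sketched as a toy cost model: reading adjacent blocks is cheap, while every jump to a non-adjacent block pays a seek penalty. The cost numbers below are purely illustrative, not measurements from any real device.

```python
import random

# Toy model of why sequential I/O beats random I/O.
# Assumed costs are illustrative, not from any real device.
SEEK_COST = 5   # cost to jump to a non-adjacent block
READ_COST = 1   # cost to read one block

def total_cost(blocks):
    """Sum read costs, charging a seek whenever the next block is not adjacent."""
    cost = 0
    prev = None
    for b in blocks:
        if prev is not None and b != prev + 1:
            cost += SEEK_COST
        cost += READ_COST
        prev = b
    return cost

sequential = list(range(26))   # a, b, c, ... z in order
scattered = sequential[:]
random.Random(0).shuffle(scattered)  # a y h i ... c b l ... k

print(total_cost(sequential))  # 26: pure reads, no seeks
print(total_cost(scattered))   # much higher: seeks dominate
```

Reading the same 26 letters costs the same in read time either way; the difference is entirely the seeks, which is the point of the analogy.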
The write is received and evaluated. If it's sequential, then it is written to the extent store. If it is random, then it hangs out in the oplog until either it becomes part of a sequence (and is drained) or it is overwritten.
Draining the oplog sequentially means writing pieces of data to the extent store not in the order they arrived in the oplog, but in sorted order. Instead of writing a y h i … c b l … k, the extent store receives a b c d e … z. That way, when a read request comes in for a letter, a handful of them, or a whole sequence, it is easy to locate them on the extent store. Think of it as looking for a file or a folder on your computer. You either sort it by date or in alphabetical order, but you sort it to find what you're looking for.
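The routing and draining described above can be sketched in a few lines. This is my own simplification for illustration; the class and attribute names (`Oplog`, `extent_store`) are made up and do not reflect the actual implementation.

```python
# Minimal sketch of the write path described above: sequential writes
# bypass the oplog, random writes are buffered and drained in sorted order.
# Names are illustrative, not from the real implementation.
class Oplog:
    def __init__(self):
        self.buffer = {}        # offset -> data; random writes land here
        self.extent_store = {}  # the long-term, sequentially laid-out store

    def write(self, offset, data, sequential=False):
        if sequential:
            # Sequential writes bypass the oplog entirely.
            self.extent_store[offset] = data
        else:
            # Random writes wait in the oplog; a rewrite to the same
            # offset simply overwrites the buffered copy in place.
            self.buffer[offset] = data

    def drain(self):
        # Drain in offset order, not arrival order, so the extent store
        # receives "a b c ... z" rather than "a y h i ...".
        for offset in sorted(self.buffer):
            self.extent_store[offset] = self.buffer[offset]
        self.buffer.clear()

log = Oplog()
for off in [24, 7, 2, 7]:          # random writes, including one overwrite of 7
    log.write(off, f"block-{off}")
log.drain()
print(sorted(log.extent_store))    # [2, 7, 24] - laid out in order
```

Note that the overwrite of offset 7 never reaches the extent store twice: the oplog absorbed it, which is one of the reasons buffering random writes pays off.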
The data that has been touched recently is likely to be touched again soon. That's why buffers are everywhere: RAM, the recent-files list in any text editor, the recent files in any file browser you use; even NICs have a sort of cache to handle bursts of I/O.
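That "recently touched, likely touched again" principle is exactly what an LRU (least-recently-used) cache encodes. Here is a small sketch using Python's standard library; the `LRUCache` class is my own illustration, not any product's cache.

```python
# Temporal locality in code: keep recently used entries, evict the
# least recently used one first. Illustrative sketch only.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # insertion order == recency order

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)   # touched -> most recent
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recent

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # "a" is now the most recently touched
cache.put("c", 3)    # capacity exceeded: evicts "b", the least recent
print(list(cache.entries))  # ['a', 'c']
```

The same recency logic, at different scales, is what RAM caches, editor recent-file lists, and NIC burst buffers all rely on.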