Regarding the case of 512 placement groups with 3 replicas,
the documentation says the following:
In a cluster of this size, the number of placement groups has almost no influence on data durability. It could be 128 or 8192 and the recovery would not be slower or faster.
However, growing the same Ceph cluster to 20 OSDs instead of 10 OSDs is likely to speed up recovery and therefore improve data durability significantly. Each OSD now participates in only ~75 placement groups instead of ~150 when there were only 10 OSDs and it will still require all 19 remaining OSDs to perform the same amount of object copies in order to recover. But where 10 OSDs had to copy approximately 100GB each, they now have to copy 50GB each instead. If the network was the bottleneck, recovery will happen twice as fast. In other words, recovery goes faster when the number of OSDs increases.
If this cluster grows to 40 OSDs, each of them will only host ~35 placement groups. If an OSD dies, recovery will keep going faster unless it is blocked by another bottleneck. However, if this cluster grows to 200 OSDs, each of them will only host ~7 placement groups. If an OSD dies, recovery will happen between at most of ~21 (7 * 3) OSDs in these placement groups: recovery will take longer than when there were 40 OSDs, meaning the number of placement groups should be increased.
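The figures in the quoted passage can be reproduced with a little arithmetic. The following is only a rough sketch, assuming placement group replicas are spread evenly over the OSDs; the function names `pgs_per_osd` and `max_recovery_osds` are made up for illustration and are not part of Ceph. It follows the passage's own estimate (placement groups hosted per OSD, times the replica count) and caps it at the number of surviving OSDs.

```python
def pgs_per_osd(pg_num: int, replicas: int, num_osds: int) -> float:
    """Average number of placement group replicas hosted by one OSD,
    assuming a perfectly even distribution (an idealization)."""
    return pg_num * replicas / num_osds


def max_recovery_osds(pg_num: int, replicas: int, num_osds: int) -> int:
    """Upper bound on the OSDs taking part in recovery after one OSD dies.

    Mirrors the quoted estimate: each placement group on the failed OSD
    involves `replicas` OSDs, and the result cannot exceed the number of
    OSDs that are still alive.
    """
    hosted = int(pgs_per_osd(pg_num, replicas, num_osds))  # round down, as in "~7"
    return min(hosted * replicas, num_osds - 1)


if __name__ == "__main__":
    for num_osds in (10, 20, 40, 200):
        print(
            num_osds,
            round(pgs_per_osd(512, 3, num_osds), 1),
            max_recovery_osds(512, 3, num_osds),
        )
```

With 512 placement groups and 3 replicas, the per-OSD average is 512 * 3 / N, so at 40 OSDs each OSD still hosts dozens of placement groups and every surviving OSD can help with recovery, whereas at 200 OSDs the bound drops to about 21.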
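I understand the reason given in the second paragraph for why recovery becomes faster when the cluster grows from 10 OSDs to 20 OSDs.
However, in the 200-OSD case in the third paragraph, why are ~21 (7 * 3) OSDs involved? Also, in the 40-OSD case I would expect the PGs to be distributed one each across 35 OSDs, so it seems to me that the 200-OSD cluster should perform better.
Thank you in advance.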