ES Force Merge Experiment

youncyb · Published 2023-02-01



1. The Problem

Due to disk-space constraints on the cluster, we deleted more than 1 billion documents, but afterwards the usable disk space did not bounce back. The reason is that deleting a document in ES is not a physical delete: the document is only marked as "deleted", and the space is physically reclaimed only when a segment merge occurs.

An index with too many deleted documents also suffers at query time. According to this article [1], with 50% of documents marked as deleted, search throughput drops by roughly 30%–46% depending on the query type.

Because deleted documents remain in the index, they must still be decoded from the postings lists and then skipped during searching, so there is added search cost. To test how much, I ran a search performance test for varying queries using the 100 M document index with no deletions as the baseline, and the same index with 50% deleted documents (i.e., 150 M documents with 50M deleted). Both indices were single-segment. Here are the results:

Query            QPS     StdDev    QPS with deletes    StdDev with deletes    % change
Int Range query  1.2     (5.1%)    0.6                 (1.8%)                 46%
Prefix query     5.7     (5.0%)    3.4                 (2.3%)                 41%
Wildcard         5.3     (4.4%)    3.2                 (2.2%)                 39%
And High+Low     91.1    (2.0%)    59.5                (2.1%)                 34%
Med Phrase       36.2    (2.8%)    24.4                (1.3%)                 32%
And High+Med     16.6    (1.5%)    11.2                (1.0%)                 32%

......

The bad news is there is clearly a non-trivial performance cost to deleted documents, and this is something we can work to reduce over time (patches welcome!). The good news is the cost is typically quite a bit lower than the percentage deletes (50% in this test) because these documents are filtered out at a low level before any of the costly query matchers and scorers see them. The more costly queries (Phrase, Span) tend to see the lowest impact, which is also good because it is the slow queries that determine node capacity for most applications.

However, a force merge can drive cluster load very high, so we first tested it on a small index.

2. Force-Merging a Small Index

The small index's disk usage was as follows (figure omitted):

1. First, flush pending writes

Flush the index repeatedly until no failed shards are reported.
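The exact commands did not survive in this capture; a minimal sketch of this step, assuming the small index is called small_index (name hypothetical):

```
# Flush pending writes to disk; repeat until "_shards.failed" is 0.
POST /small_index/_flush

# Example of the response we are waiting for:
# { "_shards": { "total": 12, "successful": 12, "failed": 0 } }
```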

2. Block writes to the index
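A sketch using the same hypothetical index name; index.blocks.write makes the index read-only for documents while still allowing metadata changes:

```
PUT /small_index/_settings
{
  "index.blocks.write": true
}
```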

3. Run the force merge

In ES 8.x you can add wait_for_completion=false so the request does not block until the merge finishes. only_expunge_deletes=true means that only segments containing a certain proportion of deleted documents are merged. That proportion is controlled by index.merge.policy.expunge_deletes_allowed, which defaults to 10.0, meaning a segment is rewritten only if its deleted/total ratio exceeds 10%.
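A sketch of the call (the original request is not shown in this capture; index name hypothetical):

```
POST /small_index/_forcemerge?only_expunge_deletes=true&wait_for_completion=false

# With wait_for_completion=false the API returns immediately with a task id:
# { "task": "oTUltX4IQMOUUVeiohTt8A:12345" }
```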

4. Check the task's progress
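A sketch, polling the task id returned by the force-merge call above (the id is hypothetical):

```
# Poll the specific task:
GET /_tasks/oTUltX4IQMOUUVeiohTt8A:12345

# Or list all running force-merge tasks across the cluster:
GET /_tasks?actions=*forcemerge*&detailed=true
```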

3. Adjusting the Merge Plan

With only_expunge_deletes=true and otherwise default settings, the merge produced the following result:

docs.deleted dropped by about 300 million which, by our calculation, freed roughly 70 GB of space. So even with default settings, a force merge substantially lowers disk usage, and the higher the share of deleted documents, the larger the reduction.

However, the production index was too large and needed to be split. Before splitting it, we wanted to purge as many deleted documents as possible, so we tuned the merge-policy threshold to raise the share of deleted documents that get expunged, as sketched below:

 
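The exact settings change is missing from this capture. A plausible sketch, assuming the threshold is lowered to 0 so that any segment with deleted documents qualifies (both the index name and the value 0 are assumptions, not from the original):

```
# Make every segment with deletes eligible for expunge merging:
PUT /production_index/_settings
{
  "index.merge.policy.expunge_deletes_allowed": 0
}

# Then kick off the expunge-only merge again:
POST /production_index/_forcemerge?only_expunge_deletes=true&wait_for_completion=false
```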

By calculation: 4207846666 / (4207846666 + 12503736313) ≈ 25%, and 25% of the index's 40 TB is about 10 TB, so at least 10 TB should be reclaimable. Since keeping documents marked as deleted also costs extra disk space on top of that, we estimated the final figure at around 12 TB.

During the merge, disk usage repeatedly climbs and then falls. In this test, the index ballooned from 40 TB to 44 TB before it started shrinking. So when force-merging, reserve headroom in proportion to the index size, or the merge will simply fail.
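One way to watch this rise-and-fall cycle (not from the original post, just a common approach) is to poll disk usage and merge activity while the merge runs:

```
# Per-node disk usage, updated as merges run:
GET /_cat/allocation?v

# Store size and in-flight merges for the index being merged (name hypothetical):
GET /_cat/indices/production_index?v&h=index,store.size,merges.current
```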

After about 1.1 days the merge completed, and disk usage fell from 40 TB to 22 TB, releasing 18 TB. However, each shard ended up with some rather large segments, the biggest reaching 40 GB.

4. Inspecting Segment Information

1. See which indices are currently merging
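The output below looks like _cat/indices with a custom column list; a plausible reconstruction of the request (column list inferred from the headers):

```
GET /_cat/indices/domain,ip,dns?v&h=index,segments.count,segments.memory,memory.total,merges.current,merges.current.docs,store.size,pri,rep
```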

index   segments.count  segments.memory  memory.total  merges.current  merges.current.docs  store.size  pri  rep
domain  22766           7.5gb            12.2gb        48              21343329             87.5tb      16   0
ip      14270           13.3gb           50.7gb        47              18168708             40.4tb      16   0
dns     2220            2.8gb            15.8gb        0               0                    3.7tb       6    0

2. Inspect each segment of a given index
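The per-segment view below comes from the cat segments API; a sketch of the request (index name taken from the output):

```
GET /_cat/segments/domain_index?v
```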

index         shard  prirep  ip           segment  generation  docs.count  docs.deleted  size    size.memory  committed  searchable  version  compound
domain_index  36     p       172.16.80.1  _stmt    1344773     16117169    242           64.3gb  1181094      true       true        8.2.0    true

3. Check an index's overall deleted-document count
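Likewise, the totals below come from cat indices restricted to the relevant columns; a sketch:

```
GET /_cat/indices/domain?v&h=index,docs.count,docs.deleted,store.size
```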

index   docs.count   docs.deleted  store.size
domain  26833114179  43551705      87.5tb

5. Problems Caused by Force Merging

A force merge produces segments of 5 GB or more, and it was unclear what would happen once a segment reaches, say, 100 GB. From the material we could find, the currently known consequences are:

  1. If you keep issuing update writes against such a segment, background merging will not touch segments of 5 GB or more (the default index.merge.policy.max_merged_segment), so their deleted documents accumulate and you have to run a force merge manually, as sketched below.
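A sketch of that manual cleanup, reusing the expunge-only merge from section 2 (index name hypothetical):

```
# List segments largest first to spot oversized ones accumulating deletes:
GET /_cat/segments/domain_index?v&s=size:desc

# Background merging skips these segments, so purge their deleted docs manually:
POST /domain_index/_forcemerge?only_expunge_deletes=true&wait_for_completion=false
```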

References

[1] Michael McCandless, "Lucene's handling of deleted documents", Elastic Blog. https://www.elastic.co/blog/lucenes-handling-of-deleted-documents