Fixing shard recovery failures in Elasticsearch

1. Problem description

When a node goes offline and later rejoins the cluster, shard recovery can fail with errors like the following:


obtaining shard lock timed out after 5000ms, previous lock details: [shard creation] trying to lock for [shard creation]]
failed to obtain in-memory shard lock

Sometimes the shards still cannot be recovered, even after calling the reroute API with retry_failed:


POST _cluster/reroute?retry_failed=true&pretty
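
Before trying the fixes below, it can help to check which shards are actually unassigned. The following is a minimal sketch in Python, not part of the original procedure. It assumes the cluster is reachable at `http://localhost:9200` without authentication, and the helper names `fetch_shards` and `unassigned_shards` are hypothetical. It reads `_cat/shards` as JSON and keeps the rows whose state is `UNASSIGNED`.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local, unauthenticated cluster


def fetch_shards(base_url=ES_URL):
    """Return the rows of GET _cat/shards as a list of dicts."""
    url = (base_url + "/_cat/shards?format=json"
           "&h=index,shard,prirep,state,node,unassigned.reason")
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def unassigned_shards(rows):
    """Keep only the shard rows that are not allocated to any node."""
    return [row for row in rows if row.get("state") == "UNASSIGNED"]
```

Calling `unassigned_shards(fetch_shards())` against a live cluster should list the affected shards. For a single shard, `GET _cluster/allocation/explain` usually gives a more detailed reason for the failed allocation.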

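2. Solutions

Method 1: Restart the node

Restarting the node quickly releases the locked in-memory state. The drawback is that if shards fail to recover on several nodes, each of those nodes has to be restarted.

Method 2: Clear the in-memory state manually

A shard is locked because an ES thread is running a long operation on it, such as a bulk or scroll request, so no other thread can acquire that shard. In that case, take the following steps:

Disable writes to the index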


PUT <index_name>/_settings
{
  "index": {
    "blocks.write": true
  }
}

Close the index


POST sumap_domain_20201009/_close

Wait 10 minutes, then call:


POST _cluster/reroute?retry_failed=true&pretty

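If the shards still have not recovered, delete the scroll contexts directly. (This deletes the scroll contexts of every index, not just the affected one.)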
```
DELETE /_search/scroll/_all
```

Restore the index settings


PUT <index_name>/_settings
{
  "index": {
    "blocks.write": false
  }
}
POST sumap_domain_20201009/_open
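
The steps of Method 2 can be collected into a small script. What follows is only a sketch under stated assumptions: the cluster URL `http://localhost:9200`, the absence of authentication, the index name `my_index`, and the helper names `recovery_plan` and `send` are all hypothetical. `recovery_plan` returns the requests in the order described above, and `send` issues a single request using only the Python standard library.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local, unauthenticated cluster


def recovery_plan(index):
    """Return the Method 2 requests, in order, as (method, path, body) tuples."""
    return [
        ("PUT", f"/{index}/_settings", {"index": {"blocks.write": True}}),
        ("POST", f"/{index}/_close", None),
        # The procedure waits about 10 minutes here before retrying.
        ("POST", "/_cluster/reroute?retry_failed=true", None),
        # Only if shards are still unassigned: clears ALL scroll contexts.
        ("DELETE", "/_search/scroll/_all", None),
        ("PUT", f"/{index}/_settings", {"index": {"blocks.write": False}}),
        ("POST", f"/{index}/_open", None),
    ]


def send(method, path, body=None, base_url=ES_URL):
    """Send one request to Elasticsearch and return the decoded JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        base_url + path, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Dry run: print the requests instead of sending them.
    for method, path, body in recovery_plan("my_index"):
        print(method, path, json.dumps(body) if body is not None else "")
```

The dry run only prints the plan. Sending the requests one at a time with `send` and checking the cluster state between steps is probably safer than running the whole list blindly, since the scroll deletion affects every index.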
