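Why Force Merge Segments
The segment monitoring for our production Elasticsearch cluster is shown in the figure below:
If segments are unfamiliar, see my other article, "Understanding the Elasticsearch Data Persistence Model".
Clearly, our cluster has a large number of segments that take up a lot of space, so manual intervention is needed. Merging segments purges documents that are marked as deleted, frees memory, reduces disk usage, and can also speed up searches.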
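To put a number on "a large number of segments", the `_cat/segments` API can be queried before and after a merge. The following is only a sketch: it assumes a node is reachable at 127.0.0.1:9200 without authentication, and the helper names `fetch_segments` and `count_segments` are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: ES_URL points at a node that needs no authentication.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Fetch one line per segment (per shard copy) as "<index> <segment>".
fetch_segments() {
  curl -s "${ES_URL}/_cat/segments?h=index,segment"
}

# Read "<index> <segment>" lines on stdin and print "<index> <count>",
# sorted by segment count in descending order.
count_segments() {
  awk 'NF >= 2 { n[$1]++ } END { for (i in n) print i, n[i] }' \
    | sort -k2,2nr -k1,1
}
```

Running `fetch_segments | count_segments` lists the indices with the most segments first. The count includes segments on replica shards, since `_cat/segments` reports every shard copy.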
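Choosing a Tool
We chose curator because it is an official Elasticsearch tool and is feature-rich.
Link: here
Installing curator
Download page: https://www.elastic.co/guide/en/elasticsearch/client/curator/current/yum-repository.html
The Direct Package Download Link is recommended here.
Pick the version you need; here we pick the one for CentOS 7: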
wget https://packages.elastic.co/curator/5/centos/7/Packages/elasticsearch-curator-5.5.4-1.x86_64.rpm
Once the download finishes, install it:
yum install elasticsearch-curator-5.5.4-1.x86_64.rpm
Configuring curator
The configuration files live under /etc/curator. If the directory does not exist, create it:
mkdir /etc/curator
mkdir /etc/curator/actions
touch /etc/curator/curator.yml
touch /etc/curator/actions/force-merge.yml
Two files are created in this directory: a Configuration File and an Action File. The Configuration File is named curator.yml. The Action File is more business-specific and is named ./actions/force-merge.yml here.
The contents of curator.yml are as follows:
---
# Remember, leave a key empty if there is no value. None will be a string,
# not a Python "NoneType"
client:
  hosts:
    - 127.0.0.1
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  aws_key:
  aws_secret_key:
  aws_region:
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False

logging:
  loglevel: INFO
  logfile:
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']
The contents of force-merge.yml are as follows:
---
# Remember, leave a key empty if there is no value. None will be a string,
# not a Python "NoneType"
#
# Also remember that all examples have 'disable_action' set to True. If you
# want to use this action as a template, be sure to set this to False after
# copying it.
actions:
  1:
    action: forcemerge
    description: >-
      forceMerge logstash- prefixed indices older than 2 days (based on index
      creation_date) to 2 segments per shard. Delay 120 seconds between each
      forceMerge operation to allow the cluster to quiesce.
      This action will ignore indices already forceMerged to the same or fewer
      number of segments per shard, so the 'forcemerged' filter is unneeded.
    options:
      max_num_segments: 2
      delay: 120
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
      exclude:
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 2
      exclude:
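This selects indices with the logstash- prefix that were created more than 2 days ago, merges each shard down to 2 segments, and waits 120 seconds between forceMerge operations. Note that Curator skips any action whose disable_action is True, so it must be False for the merge to run. Adjust the configuration to your own needs.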
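Before letting the action merge anything, it can be previewed with Curator's `--dry-run` flag, which logs what would be done without changing the cluster. A minimal sketch using the paths from this article; the `CURATOR_BIN` override and the function name `curator_dry_run` are hypothetical conveniences, not part of Curator.

```shell
#!/usr/bin/env bash
# Paths used in this article. CURATOR_BIN is a hypothetical override,
# handy if curator is installed somewhere other than /usr/bin.
CURATOR_BIN="${CURATOR_BIN:-/usr/bin/curator}"
CURATOR_CONFIG=/etc/curator/curator.yml
ACTION_FILE=/etc/curator/actions/force-merge.yml

# With --dry-run, Curator only logs the actions it would take.
curator_dry_run() {
  "$CURATOR_BIN" --config "$CURATOR_CONFIG" --dry-run "$ACTION_FILE"
}
```

Calling `curator_dry_run` should log the indices that match the filters. If none are listed, check the prefix and age filters before scheduling the job.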
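Scheduling curator
To run curator at 2 a.m. every night, execute the command below. It appends the job to the root user's crontab (>> is used so that existing entries are not overwritten):
echo '0 2 * * * /usr/bin/curator --config /etc/curator/curator.yml /etc/curator/actions/force-merge.yml' >> /var/spool/cron/root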
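If the setup may be run more than once (for example from a provisioning script), it helps to add the cron entry only when it is missing. The following is a sketch under that assumption; `CRON_FILE` defaults to the root crontab path used in this article, and the function name `add_cron_entry` is hypothetical.

```shell
#!/usr/bin/env bash
# Root's crontab file on CentOS 7, as used in this article.
CRON_FILE="${CRON_FILE:-/var/spool/cron/root}"
CRON_LINE='0 2 * * * /usr/bin/curator --config /etc/curator/curator.yml /etc/curator/actions/force-merge.yml'

# Append CRON_LINE to CRON_FILE unless an identical line already exists.
add_cron_entry() {
  touch "$CRON_FILE"
  # -F: fixed string, -x: match the whole line, -q: no output
  grep -Fxq -- "$CRON_LINE" "$CRON_FILE" || echo "$CRON_LINE" >> "$CRON_FILE"
}
```

After calling `add_cron_entry` as root, `crontab -l -u root` shows whether the entry is in place.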
End of tutorial! 👊
This article was written by Chakhsu Lau and is licensed under the Creative Commons Attribution 4.0 International License.