Setting Up and Configuring a Scheduled Force Merge Job for Elasticsearch

Why Force Merge Segments

The Segments monitoring for our production es cluster looks like this:

[Figure: Segments monitoring for the production cluster, 1537848547106-3205d6a5-c1e0-42d3-95f7-226e76244fd6-image-resized.png]

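If segments are unfamiliar, see my other article: "Understanding the Elasticsearch Data Persistence Model".

Clearly, our cluster holds a large number of segments that take up a lot of space, so manual intervention is needed. Merging segments purges documents that are marked as deleted, frees memory, reduces disk usage, and also speeds up searches.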
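
Without a monitoring dashboard, a rough per-index segment count can be derived from the `_cat/segments` API. The sketch below is only an illustration: the helper name `count_segments` is made up for this example, the sample rows are invented, and the address 127.0.0.1:9200 in the usage comment is an assumption about where the cluster listens.

```shell
# Hypothetical helper: count segments per index.
# Expects `_cat/segments` rows on stdin with the index name in column 1.
count_segments() {
  awk '{ n[$1]++ } END { for (i in n) print i, n[i] }' | sort
}

# Demo with invented sample rows (index, shard, segment).
printf '%s\n' \
  'logstash-2018.09.24 0 _0' \
  'logstash-2018.09.24 0 _1' \
  'logstash-2018.09.25 0 _0' | count_segments

# Against a live cluster (assumed address):
# curl -s 'http://127.0.0.1:9200/_cat/segments?h=index,shard,segment' | count_segments
```

Indices that show many segments here are the natural candidates for a force merge.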

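Choosing a Tool

We chose curator because it is an official Elasticsearch product and is feature-rich.

Link: here

Installing curator

Download page: https://www.elastic.co/guide/en/elasticsearch/client/curator/current/yum-repository.html

The Direct Package Download Link is the recommended option here.

Pick the version you need; here we use the one for CentOS 7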

wget https://packages.elastic.co/curator/5/centos/7/Packages/elasticsearch-curator-5.5.4-1.x86_64.rpm

Once the download finishes, install it

yum install elasticsearch-curator-5.5.4-1.x86_64.rpm

Configuring curator

The configuration files live under /etc/curator; create the directory if it does not exist

mkdir /etc/curator
mkdir /etc/curator/actions
touch /etc/curator/curator.yml
touch /etc/curator/actions/force-merge.yml

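Two files are created under this directory: a Configuration File and an Action File.

Here the Configuration File is named curator.yml. The Action File is more business-specific, and here it is named ./actions/force-merge.yml

The contents of curator.yml are as follows: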

---
# Remember, leave a key empty if there is no value.  None will be a string,
# not a Python "NoneType"
client:
  hosts:
    - 127.0.0.1
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  aws_key:
  aws_secret_key:
  aws_region:
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False

logging:
  loglevel: INFO
  logfile:
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']
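
Before handing this file to curator, it can help to confirm that the host and port from the `client` section actually answer. The following is a minimal sketch, assuming plain HTTP without authentication (matching `use_ssl: False` and the empty `http_auth` above); the function names `es_url` and `check_es` are made up for this example.

```shell
# Values mirror the client section of curator.yml.
ES_HOST=127.0.0.1
ES_PORT=9200

# Hypothetical helper: build the base URL for a host and port.
es_url() {
  printf 'http://%s:%s' "$1" "$2"
}

# Hypothetical helper: succeed only if the cluster health endpoint
# returns a successful HTTP response within 5 seconds.
check_es() {
  curl -sf -o /dev/null --max-time 5 "$(es_url "$1" "$2")/_cluster/health"
}

if check_es "$ES_HOST" "$ES_PORT"; then
  echo "Elasticsearch is reachable at $(es_url "$ES_HOST" "$ES_PORT")"
else
  echo "Elasticsearch is NOT reachable at $(es_url "$ES_HOST" "$ES_PORT")"
fi
```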

The contents of force-merge.yml are as follows:

---
# Remember, leave a key empty if there is no value.  None will be a string,
# not a Python "NoneType"
#
# Also remember that all examples have 'disable_action' set to True.  If you
# want to use this action as a template, be sure to set this to False after
# copying it.
actions:
  1:
    action: forcemerge
    description: >-
      forceMerge logstash- prefixed indices older than 2 days (based on index
      creation_date) to 2 segments per shard.  Delay 120 seconds between each
      forceMerge operation to allow the cluster to quiesce.
      This action will ignore indices already forceMerged to the same or fewer
      number of segments per shard, so the 'forcemerged' filter is unneeded.
    options:
      max_num_segments: 2
      delay: 120
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
      exclude:
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 2
      exclude:

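In other words: select the logstash- prefixed indices that were created more than 2 days ago, merge each shard down to 2 segments, and wait 120 seconds between forceMerge operations. The action only runs when disable_action is False. Adjust the configuration to your needs.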
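
After changing the filters, it is worth previewing which indices would be touched before merging anything. A minimal sketch using curator's `--dry-run` flag, with the file paths from this post; the `command -v` guard is only there so the snippet fails gently on a machine without curator.

```shell
# Paths as used in this post.
CONFIG=/etc/curator/curator.yml
ACTION=/etc/curator/actions/force-merge.yml

# --dry-run makes curator log what it would do without changing any index.
if command -v curator >/dev/null 2>&1; then
  curator --dry-run --config "$CONFIG" "$ACTION"
else
  echo "curator not found in PATH; install it first" >&2
fi
```

Once the logged index list looks right, the same command without `--dry-run` performs the merge.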

Scheduling curator

Run curator every day at 2 a.m. The following command appends the job to the root user's cron table:

echo '0 2 * * * /usr/bin/curator --config /etc/curator/curator.yml /etc/curator/actions/force-merge.yml' >> /var/spool/cron/root
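
Writing the entry with a plain redirect is not repeatable: an overwriting redirect discards any other jobs in the file, and an appending one adds a duplicate every time it is run. One possible approach is to add the line only when it is missing. This is a sketch, with `add_cron_line` as a made-up helper name; the demo writes to a temporary file instead of /var/spool/cron/root so it can be tried safely.

```shell
CRON_LINE='0 2 * * * /usr/bin/curator --config /etc/curator/curator.yml /etc/curator/actions/force-merge.yml'

# Hypothetical helper: append a line to a file unless an identical line exists.
# -x matches whole lines, -F treats the pattern as a literal string (so * is not special).
add_cron_line() {
  file=$1
  line=$2
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demo on a temporary file: calling the helper twice still leaves one entry.
demo_file=$(mktemp)
add_cron_line "$demo_file" "$CRON_LINE"
add_cron_line "$demo_file" "$CRON_LINE"
grep -cxF -- "$CRON_LINE" "$demo_file"   # prints 1
```

For real use, the call would be `add_cron_line /var/spool/cron/root "$CRON_LINE"`, run as root.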

End of the tutorial! 👊
