
Deploying Prometheus on CentOS

Prometheus

Install tool dependencies


yum -y install net-tools vim wget zip gzip unzip gcc gcc-c++ cmake make automake autoconf libtool readline readline-devel openssl openssl-devel zlib zlib-devel bison git expat-devel libaio net-snmp net-snmp-utils net-snmp-libs net-snmp-devel

Install Go and configure environment variables


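https://go.dev/dl/     # choose the matching version on the official site

tar -zxvf go1.15.8.linux-amd64.tar.gz -C /usr/local    # extract the archive into /usr/local so it matches GOROOT below

vim /etc/profile       # global, permanent configuration

Append at the end of the file:
           export GOROOT=/usr/local/go
           export GOPATH=~/golib:~/goproject
           export GOBIN=~/gobin
           export PATH=$PATH:$GOROOT/bin:$GOBIN
 source /etc/profile   # reload the environment variables (takes effect immediately; otherwise a new login or reboot is needed)
 
 go version   # print the version; if it appears, the environment is configured correctly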
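Before calling the Go binary, it can help to confirm that the Go bin directory really ended up on PATH. The helper below is a minimal sketch; the function name path_contains is made up for illustration and is not part of Go or CentOS.

```shell
# Return 0 if directory $1 is an entry of the colon-separated list $2.
# When $2 is omitted, the current PATH is checked.
path_contains() {
    case ":${2-$PATH}:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `path_contains "$GOROOT/bin" && go version` only runs the version check when the directory is present on PATH.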

Prometheus (data source)


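https://prometheus.io/download/#prometheus    # choose the matching version on the official site

tar -zxvf prometheus-2.29.2.linux-amd64.tar.gz     # extract; the binary is ready to run as-is

cd <directory where prometheus was extracted>

./prometheus --config.file=<directory>/prometheus.yml       # test start

Browse to localhost:9090         # the default port is 9090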
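The same check can be done from the command line instead of a browser. The sketch below polls a URL with curl until it answers; the function name wait_for_http and the retry count are assumptions chosen for illustration. Prometheus exposes a readiness endpoint at /-/ready that suits this kind of probe.

```shell
# Poll URL $1 up to $2 times (default 10), one second apart.
# Succeeds as soon as curl gets a response that is not an HTTP error (4xx/5xx).
wait_for_http() {
    url=$1
    tries=${2:-10}
    i=1
    while [ "$i" -le "$tries" ]; do
        if curl -fs -o /dev/null --max-time 2 "$url"; then
            return 0
        fi
        i=$((i + 1))
        if [ "$i" -le "$tries" ]; then
            sleep 1
        fi
    done
    return 1
}
```

A possible call is `wait_for_http http://localhost:9090/-/ready && echo "Prometheus is up"`.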

Register as a system service:
              vim /etc/systemd/system/prometheus.service     # create the Prometheus unit file
              Contents:
                     [Unit]
                     Description=Prometheus Monitoring System
                     Documentation=https://prometheus.io/docs/introduction/overview/

                     [Service]
                     # use your own installation directory in the paths below
                     ExecStart=/usr/local/prometheus/prometheus \
                       --config.file=/usr/local/prometheus/prometheus.yml \
                       --web.listen-address=:9090
                     Restart=on-failure

                     [Install]
                     WantedBy=multi-user.target

systemctl daemon-reload    # reload the systemd unit files (takes effect immediately)
systemctl enable prometheus
systemctl start prometheus
systemctl status prometheus

Prometheus's built-in monitoring pages are not very intuitive, so the next step is to install a visualization front end (Grafana).

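Note: an RPM package installs to the location fixed by its packager, whereas a source package installs to a location the installer must choose.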

Grafana (a more intuitive visualization platform)


https://grafana.com/grafana/download            # choose an installation method on the official site


systemctl daemon-reload                        # the unit file ships with the package, so there is no need to create one yourself
systemctl enable grafana-server.service
systemctl start grafana-server.service

Browse to localhost:3000         # the default port is 3000; the default username and password are both admin

Monitoring Linux + Windows


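https://github.com/prometheus-community/windows_exporter/tags     # Windows download page

https://github.com/prometheus/node_exporter/tags                  # Linux download page
Register node_exporter as a system service (on Windows, just download and run it; no configuration is needed). All of this is done on the monitored machine.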

vim   /etc/systemd/system/node_exporter.service


[Unit]
Description=node
After=network.target

[Service]
ExecStart=/usr/local/node/node_exporter-1.2.0.linux-amd64/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target



systemctl daemon-reload
systemctl enable node_exporter
systemctl start node_exporter
systemctl status node_exporter
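Because the exporter unit is only a few fixed lines around one ExecStart path, it can also be generated rather than typed into vim. This is a minimal sketch; render_unit is a hypothetical helper name, and the layout simply mirrors the node_exporter unit shown above.

```shell
# Print a simple systemd service unit to stdout.
# $1 = description, $2 = full command line for ExecStart.
render_unit() {
    cat <<EOF
[Unit]
Description=$1
After=network.target

[Service]
ExecStart=$2
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

One way to use it is `render_unit node /usr/local/node/node_exporter-1.2.0.linux-amd64/node_exporter > /etc/systemd/system/node_exporter.service`, followed by the same systemctl commands.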


Browse to localhost:9100                # if metrics appear, the service started successfully
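On a headless machine the same check can be made with curl against the exporter's /metrics page. The sketch below counts the sample lines in the text exposition format read from stdin; count_samples is an illustrative name, not an exporter tool.

```shell
# Count metric sample lines on stdin, skipping "# HELP" / "# TYPE" comments and blank lines.
count_samples() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }'
}
```

For instance, `curl -s localhost:9100/metrics | count_samples` should print a number well above zero when node_exporter is serving data.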
prometheus.yml: configuring the monitored machines

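    # tail of the existing self-monitoring job under scrape_configs
    static_configs:
    - targets: ["localhost:9090"]

  - job_name: "Linux"
    static_configs:
    - targets: ["<monitored host IP>:9100"]      # node_exporter listens on port 9100
      labels:
        instance: Linux

  - job_name: "Windows"
    static_configs:
    - targets: ["<monitored host IP>:9182"]       # windows_exporter listens on port 9182
      labels:
        instance: Windows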
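When several hosts need to be added, the job entries can be printed by a small helper instead of edited by hand. This is only a sketch: render_scrape_job is a hypothetical function, the IP addresses in the usage line are placeholders, and the output is meant to be pasted under scrape_configs in prometheus.yml.

```shell
# Print one scrape job entry. $1 = job name, remaining arguments = host:port targets.
render_scrape_job() {
    job=$1
    shift
    printf '  - job_name: "%s"\n' "$job"
    printf '    static_configs:\n'
    printf '      - targets:\n'
    for target in "$@"; do
        printf '          - "%s"\n' "$target"
    done
}
```

A call such as `render_scrape_job Linux 10.0.0.5:9100 10.0.0.6:9100` prints a job with two targets. After editing prometheus.yml, `./promtool check config prometheus.yml` (promtool ships in the Prometheus tarball) can validate the file before restarting the service.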
Add a data source + dashboards

https://www.cnblogs.com/guoxiangyue/p/11772717.html   
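As an alternative to adding the data source by hand in the web UI, Grafana can load it from a provisioning file at startup. The sketch below prints such a file; render_grafana_datasource is an illustrative name, and the directory in the usage line assumes a package (RPM) install of Grafana.

```shell
# Print a Grafana data source provisioning file for Prometheus.
# $1 = Prometheus URL (defaults to the local instance on port 9090).
render_grafana_datasource() {
    cat <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${1:-http://localhost:9090}
    isDefault: true
EOF
}
```

For example, `render_grafana_datasource > /etc/grafana/provisioning/datasources/prometheus.yaml` followed by `systemctl restart grafana-server` should make the data source appear without any clicks.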

alertmanager (email alerting service)


https://github.com/prometheus/alertmanager/releases/tag/v0.24.0     # Alertmanager download page


vim /etc/systemd/system/alertmanager.service

[Unit]
Description=alertmanager
After=network.target

[Service]
WorkingDirectory=/email/alertmanager
ExecStart=/email/alertmanager/alertmanager --config.file=alertmanager.yml --log.level=debug --log.format=json
Restart=on-failure

alertmanager.yml configuration


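global:                      # sender mailbox, accessed through a third-party client login
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.qq.com:465'   # SMTP server address
  smtp_from: '1419846302@qq.com'     # your email address; must match smtp_auth_username below
  smtp_auth_username: '1419846302@qq.com'    # your email address
  smtp_auth_password: 'sibqupqnqeilggig'     # authorization code for third-party client login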
  smtp_require_tls: false
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 20s
  repeat_interval: 2m
  receiver: 'mail'
  routes:
  - receiver: 'mail'
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://127.0.0.1:9093/'
  - name: 'mail'
    email_configs:
    - to: 'w2030w1@163.com'          # recipient email address
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
    
Shift+:   wq      # in vim: save and quit
./amtool check-config alertmanager.yml   # check that alertmanager.yml is valid
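Once the configuration passes the check and Alertmanager is running, a hand-made alert is a quick way to see whether mail actually goes out. The sketch below builds the JSON body for Alertmanager's v2 alerts endpoint; test_alert_json and the alert name in the usage line are made up for illustration.

```shell
# Print a one-alert JSON payload for POST /api/v2/alerts.
# $1 = alertname, $2 = severity (default: warning).
test_alert_json() {
    printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"summary":"manual test alert"}}]\n' \
        "$1" "${2:-warning}"
}
```

It could be sent with `test_alert_json Demo | curl -s -X POST -H 'Content-Type: application/json' -d @- http://localhost:9093/api/v2/alerts`, assuming Alertmanager listens on its default port 9093.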

Configure alerting and the rule file path in prometheus.yml


alerting:
  alertmanagers:
  - static_configs:
    - targets: ['localhost:9093']
rule_files:                      # top-level key, not nested under alerting
  - "/node_rule/*.yml"           # alert rule files (you must create these yourself)
  # - "first_rules.yml"
  # - "second_rules.yml"

Alert rule file /node_rule/rule.yml

groups:
- name: hostStatsAlert        # check whether the host is alive; alert once it goes offline
  rules:
  - alert: "Bro, your server is down"
    expr: up == 0
    for: 30s
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 30 seconds."

  - alert: "CPU too high"     # watch CPU usage; alert once it exceeds the threshold
    expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100) > 50
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} CPU usage high"
      description: "{{ $labels.instance }} CPU usage above 50% (current value: {{ $value }})"

  - alert: "Host memory usage alert"
    expr: 100 - (node_memory_MemFree_bytes+node_memory_Cached_bytes+node_memory_Buffers_bytes) / node_memory_MemTotal_bytes * 100 > 85
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} MEM usage high"
      description: "{{ $labels.instance }} memory usage too high > 85% (current value: {{ $value }})"
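To make the two threshold expressions concrete, the arithmetic can be reproduced outside Prometheus. The sketch below recomputes both percentages with awk; the function names and the sample numbers in the note that follows are illustrative only.

```shell
# CPU usage in percent from the idle fraction (0..1), mirroring: 100 - idle * 100
cpu_usage_percent() {
    awk -v idle="$1" 'BEGIN { printf "%.1f\n", 100 - idle * 100 }'
}

# Memory usage in percent, mirroring: 100 - (free + cached + buffers) / total * 100
# Arguments: free cached buffers total (all in the same unit).
mem_usage_percent() {
    awk -v f="$1" -v c="$2" -v b="$3" -v t="$4" \
        'BEGIN { printf "%.1f\n", 100 - (f + c + b) / t * 100 }'
}
```

With an idle fraction of 0.25, `cpu_usage_percent 0.25` prints 75.0, which would trip the 50 threshold; `mem_usage_percent 1 2 1 8` prints 50.0, which stays below 85. The rule file itself can be checked with `./promtool check rules /node_rule/rule.yml` before Prometheus is restarted.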
This article is licensed by the author under CC BY 4.0.