Prometheus
Install tool dependencies
| yum -y install net-tools vim wget zip gzip unzip gcc gcc-c++ cmake make automake autoconf libtool readline readline-devel openssl openssl-devel zlib zlib-devel bison git expat-devel libaio net-snmp net-snmp-utils net-snmp-libs net-snmp-devel
|
Install Go and configure environment variables
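| # Pick the matching release from https://go.dev/dl/
tar -C /usr/local -zxvf go1.15.8.linux-amd64.tar.gz    # unpack to /usr/local/go so it matches GOROOT below
vim /etc/profile    # global, permanent configuration
# Append at the end of the file:
export GOROOT=/usr/local/go
export GOPATH=~/golib:~/goproject
export GOBIN=~/gobin
export PATH=$PATH:$GOROOT/bin:$GOBIN
source /etc/profile    # reload the environment so the change takes effect now (otherwise a new login or reboot is needed)
go version    # prints the version if the environment is set up correctly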
|
Prometheus (data source)
| # Pick the matching release from https://prometheus.io/download/#prometheus
tar -zxvf prometheus-2.29.2.linux-amd64.tar.gz    # unpack; the binary is ready to run as-is
cd <prometheus directory>    # switch to the directory where Prometheus was unpacked
./prometheus --config.file=<path>/prometheus.yml    # test start
# Open localhost:9090 in a browser (9090 is the default port)
# Register as a system service:
vim /etc/systemd/system/prometheus.service    # create the Prometheus unit file
# Contents:
[Unit]
Description=Prometheus Monitoring System
Documentation=https://prometheus.io/docs/
[Service]
# Adjust the paths below to your own install directory
ExecStart=/usr/local/prometheus/prometheus \
  --config.file=/usr/local/prometheus/prometheus.yml \
  --web.listen-address=:9090
Restart=on-failure
[Install]
WantedBy=multi-user.target
systemctl daemon-reload    # reload unit files so the new unit is picked up
systemctl enable prometheus
systemctl start prometheus
systemctl status prometheus
# The built-in Prometheus page is not very intuitive, so the next step installs a visualization front end (Grafana)
|
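Note: an rpm package installs to a location fixed by the packager, whereas a source (or tarball) install goes wherever the installer chooses.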
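
Once the service is running, a quick way to confirm it from the command line is to hit the health endpoint instead of opening the browser. The following is a minimal sketch, assuming `curl` is installed and Prometheus listens on the default port; the function names `prom_health_url` and `prom_is_healthy` are made up for illustration.

```shell
# Build the health-check URL for a Prometheus instance.
# Host and port are optional and default to localhost:9090.
prom_health_url() {
    host="${1:-localhost}"
    port="${2:-9090}"
    printf 'http://%s:%s/-/healthy\n' "$host" "$port"
}

# Succeeds (exit status 0) if the server answers the health endpoint
# with a 2xx response within 5 seconds.
prom_is_healthy() {
    curl -fsS --max-time 5 "$(prom_health_url "$@")" >/dev/null
}

# Usage (hypothetical):
#   prom_is_healthy && echo "Prometheus is up"
```

If the check fails, `systemctl status prometheus` usually shows why the unit did not start.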
Grafana (a more intuitive visualization platform)
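| # Choose an install method from https://grafana.com/grafana/download
systemctl daemon-reload    # the package ships its own unit file; nothing to register by hand
systemctl enable grafana-server.service
systemctl start grafana-server.service
# Open localhost:3000 in a browser (3000 is the default port; the default username and password are both admin)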
|
Monitoring Linux and Windows
| Windows download: https://github.com/prometheus-community/windows_exporter/tags
Linux download: https://github.com/prometheus/node_exporter/tags
|
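Register node_exporter as a system service (on Windows, just download and run; no configuration needed). All steps are performed on the monitored machine.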
| vim /etc/systemd/system/node_exporter.service
[Unit]
Description=node
After=network.target
[Service]
ExecStart=/usr/local/node/node_exporter-1.2.0.linux-amd64/node_exporter
Restart=on-failure
[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl enable node_exporter
systemctl start node_exporter
systemctl status node_exporter
# Open localhost:9100 in a browser; if metrics are shown, the service is running
|
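
The browser check above can also be done from a terminal by counting how many samples the exporter returns for a metric. This is a minimal sketch that reads the text exposition format from standard input; `count_metric` is a hypothetical helper name, and the `curl` call in the usage comment assumes the exporter is on its default port.

```shell
# Count the sample lines for one metric name in Prometheus text
# exposition format read from standard input.
count_metric() {
    name="$1"
    # Drop "# HELP" / "# TYPE" comment lines, then count lines that start
    # with the metric name followed by "{" (labels) or a space (no labels).
    grep -v '^#' | grep -c -E "^${name}(\{| )"
}

# Usage (hypothetical):
#   curl -s localhost:9100/metrics | count_metric node_cpu_seconds_total
```

A count above zero for `node_cpu_seconds_total` suggests the CPU collector is exporting data.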
prometheus.yml configuration: machines to monitor
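| scrape_configs:
    - job_name: "prometheus"    # default job from the stock prometheus.yml
      static_configs:
        - targets: ["localhost:9090"]
    - job_name: "Linux"
      static_configs:
        - targets: ["<monitored host IP>:9100"]    # node_exporter listens on port 9100
          labels:
            instance: Linux
    - job_name: "Windows"
      static_configs:
        - targets: ["<monitored host IP>:9182"]    # windows_exporter listens on port 9182
          labels:
            instance: Windows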
|
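
When there are several hosts, typing the job entries by hand gets tedious. The sketch below prints one scrape job with a target per host, which can then be pasted under `scrape_configs:`. The function name `scrape_job` and the 192.0.2.x addresses are placeholders for illustration, not part of the original setup.

```shell
# Print one scrape job for prometheus.yml.
# Arguments: job name, exporter port, then one or more host addresses.
scrape_job() {
    job="$1"
    port="$2"
    shift 2
    printf '  - job_name: "%s"\n' "$job"
    printf '    static_configs:\n'
    printf '      - targets:\n'
    # Each remaining argument becomes a "host:port" target entry.
    for host in "$@"; do
        printf '          - "%s:%s"\n' "$host" "$port"
    done
}

# Usage (hypothetical addresses):
#   scrape_job Linux 9100 192.0.2.10 192.0.2.11
```

After editing prometheus.yml, running `./promtool check config prometheus.yml` (promtool ships in the same tarball as prometheus) should report any syntax problems before the service is restarted.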
Add the data source and dashboards
| https://www.cnblogs.com/guoxiangyue/p/11772717.html
|
alertmanager (email alerting service)
| # alertmanager download: https://github.com/prometheus/alertmanager/releases/tag/v0.24.0
vim /etc/systemd/system/alertmanager.service
[Unit]
Description=alertmanager
After=network.target
[Service]
WorkingDirectory=/email/alertmanager
ExecStart=/email/alertmanager/alertmanager --config.file=alertmanager.yml --log.level=debug --log.format=json
Restart=on-f
|
alertmanager.yml configuration
| global:    # sender mailbox (third-party SMTP login)
    resolve_timeout: 5m
    smtp_smarthost: 'smtp.qq.com:465'    # SMTP server address
    smtp_from: '1419846302@qq.com'    # your mailbox; must match smtp_auth_username below
    smtp_auth_username: '1419846302@qq.com'    # your mailbox
    smtp_auth_password: '<your authorization code>'    # third-party login authorization code
    smtp_require_tls: false
  route:
    group_by: ['alertname']
    group_wait: 10s
    group_interval: 20s
    repeat_interval: 2m
    receiver: 'mail'
    routes:
      - receiver: 'mail'
  receivers:
    - name: 'web.hook'
      webhook_configs:
        - url: 'http://127.0.0.1:9093/'
    - name: 'mail'
      email_configs:
        - to: 'w2030w1@163.com'    # recipient mailbox
  inhibit_rules:
    - source_match:
        severity: 'critical'
      target_match:
        severity: 'warning'
      equal: ['alertname', 'dev', 'instance']
# In vim, type :wq to save and quit
./amtool check-config alertmanager.yml    # check that alertmanager.yml is valid
|
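
To check that mail actually goes out without waiting for a real alert, a hand-made alert can be posted to Alertmanager's HTTP API. This is a rough sketch, assuming Alertmanager runs on localhost:9093 and `curl` is available; `test_alert_payload` and `send_test_alert` are hypothetical helper names, and the label values must not contain double quotes because no JSON escaping is done.

```shell
# Build a minimal JSON payload for Alertmanager's POST /api/v2/alerts endpoint.
# Arguments: alert name, severity, instance label.
test_alert_payload() {
    alertname="$1"
    severity="$2"
    instance="$3"
    printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"%s"},"annotations":{"summary":"manual test alert"}}]\n' \
        "$alertname" "$severity" "$instance"
}

# Post the payload to a local Alertmanager; curl reads the body from stdin.
send_test_alert() {
    test_alert_payload "$@" |
        curl -fsS -H 'Content-Type: application/json' -d @- \
            http://localhost:9093/api/v2/alerts
}

# Usage (hypothetical):
#   send_test_alert TestAlert warning Linux
```

With the route above, the test alert should reach the `mail` receiver after `group_wait` (10s) has elapsed.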
prometheus.yml: alerting and rule file paths
| alerting:
    alertmanagers:
      - static_configs:
          - targets: ['localhost:9093']
  rule_files:
    - "/node_rule/*.yml"    # alert rule files (create these yourself)
    # - "first_rules.yml"
    # - "second_rules.yml"
|
Alert rule file /node_rule/rule.yml
| groups:
    - name: hostStatsAlert    # checks whether hosts are up; alerts when one goes offline
      rules:
        - alert: "Your server is down"
          expr: up == 0
          for: 30s
          labels:
            severity: critical
          annotations:
            summary: "Instance down"
            description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 30 seconds."
        - alert: "CPU usage high"    # monitors CPU usage; alerts when it exceeds the threshold
          expr: 100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance) * 100) > 50
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "Instance CPU usage high"
            description: "{{ $labels.instance }} CPU usage above 50% (current value: {{ $value }})"
        - alert: "Host memory usage alert"
          expr: 100 - (node_memory_MemFree_bytes+node_memory_Cached_bytes+node_memory_Buffers_bytes) / node_memory_MemTotal_bytes * 100 > 85
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "Instance MEM usage high"
            description: "{{ $labels.instance }} memory usage too high, > 85% (current value: {{ $value }})"
|
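
The memory expression is easier to trust after working it through with concrete numbers. The sketch below repeats the same arithmetic in `awk`: used percent = 100 - (free + cached + buffers) / total * 100. The function name `mem_used_pct` and the sample values are illustrative only.

```shell
# Compute memory usage in percent the same way the alert expression does.
# Arguments: MemFree, Cached, Buffers, MemTotal (all in the same unit).
mem_used_pct() {
    awk -v free="$1" -v cached="$2" -v buffers="$3" -v total="$4" \
        'BEGIN { printf "%.1f\n", 100 - (free + cached + buffers) / total * 100 }'
}

# Usage (values in GiB, hypothetical):
#   mem_used_pct 1 2 1 16    # free + cached + buffers = 4 of 16, so 75.0
```

With 4 of 16 GiB reclaimable the result is 75.0, which is below the 85 threshold, so the rule would not fire. Running `./promtool check rules /node_rule/rule.yml` is one way to validate the rule file syntax before reloading Prometheus.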