Monitoring Services with Grafana Cloud

Why Grafana Cloud

  • The free tier is enough for my needs:
    • 10k Metrics
    • 50 GB Logs
    • 50 GB Traces
    • 500 k6 Virtual User Hours
    • 3 Monthly Active Users
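  • A self-hosted stack cannot guarantee stability
  • Honestly, the real reason is to save memory on a lightweight server with 2 GB of RAM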

Collecting metrics from each service

Get the configuration working first and confirm that data is arriving, then trim the metrics.

Linux Node

Windows Node

  • As before, follow the official instructions to install and configure it
  • It can also be installed with scoop
scoop bucket add lemon https://github.com/hoilc/scoop-lemon
scoop install lemon/windows_exporter
  • Use nssm to register it as a Windows service
nssm install "Windows Exporter" "C:\Users\Administrator\scoop\apps\windows_exporter\current\windows_exporter.exe"

Proxmox VE

  • First create a PVE account for the exporter
pveum useradd prometheus@pve --password <password> --comment <comment>
pveum acl modify / --users prometheus@pve --roles PVEAuditor
# Docker environment variables
PVE_USER=prometheus@pve
PVE_PASSWORD=<password>
  • Then just start the exporter; it listens on port 9221 by default
  • Agent configuration
- name: pve
  scrape_configs:
    - job_name: pve
      static_configs:
        - targets:
            - 1.2.3.4
      metrics_path: /pve
      params:
        module: [default]
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: 1.2.3.4:9221
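
For the exporter side, the Docker variables listed earlier could be supplied through a Compose file. This is only a sketch: the post does not name the image, so prompve/prometheus-pve-exporter is an assumption here, as are the service name and the PVE_VERIFY_SSL setting (useful when PVE still runs with its self-signed certificate).

```yaml
# docker-compose.yml (illustrative; the service name is arbitrary)
services:
  pve-exporter:
    image: prompve/prometheus-pve-exporter   # assumed image, not named in the post
    restart: unless-stopped
    ports:
      - "9221:9221"                          # default exporter port
    environment:
      PVE_USER: prometheus@pve
      PVE_PASSWORD: "<password>"
      PVE_VERIFY_SSL: "false"                # assumption: self-signed PVE certificate
```

The agent then scrapes port 9221 on the Docker host, which is the address written into the final relabel rule.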

Mikrotik

  • Metrics are collected over SNMP
  • Repository: https://github.com/IgorKha/Grafana-Mikrotik. It provides the image mashinkopochinko/snmp_exporter_mikrotik:latest, but I prefer running the latest upstream release, so I downloaded https://github.com/IgorKha/Grafana-Mikrotik/blob/master/snmp/snmp.yml and mounted it into prom/snmp-exporter:latest
  • Once started, the exporter listens on port 9116 by default
  • Enable SNMP on RouterOS
snmp set enabled=yes
  • Agent configuration
- name: mikrotik
  scrape_configs:
    - job_name: Mikrotik
      static_configs:
        - targets:
            - 1.2.3.4 # mikrotik_ip
      metrics_path: /snmp
      params:
        module: [mikrotik]
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_target
        - source_labels: [__param_target]
          target_label: instance
        - target_label: __address__
          replacement: 1.2.3.4:9116
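
Mounting the downloaded snmp.yml into the upstream image might look like the Compose sketch below. The service name and the host-side path ./snmp.yml are assumptions; /etc/snmp_exporter/snmp.yml is the config path the upstream image reads by default. It is also worth checking that the downloaded snmp.yml matches the config format expected by whatever version :latest resolves to.

```yaml
# docker-compose.yml (illustrative; the service name is arbitrary)
services:
  snmp-exporter:
    image: prom/snmp-exporter:latest
    restart: unless-stopped
    ports:
      - "9116:9116"                                   # default exporter port
    volumes:
      # snmp.yml downloaded from the Grafana-Mikrotik repository
      - ./snmp.yml:/etc/snmp_exporter/snmp.yml:ro
```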

Trimming metrics

  • The free tier allows only 10k series, and node_exporter alone can take up more than 1k, so collect only the metrics you actually need

Linux Node

  • Enabling only the collectors below covers common needs
node_exporter:
  enabled: true
  relabel_configs:
    - replacement: hostname
      target_label: instance
  set_collectors:
    - boottime
    - cpu
    - diskstats
    - filesystem
    - loadavg
    - meminfo
    - netstat
    - netdev
    - sockstat
    - time
    - uname
    - filefd
    - stat
    - netclass

Windows Node

# Add this argument via nssm edit
--collectors.enabled "cpu,cs,logical_disk,os,system,net"
  • windows_exporter also exposes a few Go runtime metrics (go_*), which can be dropped
- name: win
  scrape_configs:
    - job_name: win
      static_configs:
        - targets:
            - 1.2.3.4:9182
      metric_relabel_configs:
        - source_labels: [__name__]
          regex: "go_.*"
          action: drop
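
If dropping go_* is not enough, one option is to flip the rule into an allowlist with action: keep. The sketch below assumes windows_exporter metric names follow a windows_<collector>_ prefix pattern, so the regex simply mirrors the collectors enabled with --collectors.enabled; check the exporter's /metrics output before relying on it.

```yaml
- name: win
  scrape_configs:
    - job_name: win
      static_configs:
        - targets:
            - 1.2.3.4:9182
      metric_relabel_configs:
        # Keep only series whose name starts with an enabled collector's prefix;
        # anything else the exporter exposes (go_*, process_*, ...) is discarded.
        - source_labels: [__name__]
          regex: "windows_(cpu|cs|logical_disk|os|system|net)_.*"
          action: keep
```

An allowlist is stricter than a drop rule: a metric added by a future exporter version stays out until the regex is widened.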

Proxmox VE

No trimming needed here; it does not expose many metrics to begin with.

Mikrotik

Almost all of these metrics are needed. If some are not, trim snmp.yml down to a custom subset.

DPM

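DPM means data points per minute for each series. It must not exceed 1, so the reporting (scrape) interval of every agent has to be 60s or longer.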
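
A 60s interval yields exactly 1 DPM per series, and a longer interval yields less (120s gives 0.5). Besides the global setting, Prometheus-style scrape configs accept a per-job scrape_interval, so a slow-changing target could be scraped less often. The sketch below is a fragment only, and the 120s value is an arbitrary example rather than something the post uses.

```yaml
- name: pve
  scrape_configs:
    - job_name: pve
      scrape_interval: 120s   # per-job override of the global interval: 0.5 DPM per series
      static_configs:
        - targets:
            - 1.2.3.4
      # metrics_path, params and relabel_configs stay the same as in the pve job
```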

Complete example configuration

integrations:
  node_exporter:
    enabled: true
    relabel_configs:
      - replacement: hostname
        target_label: instance
    set_collectors:
      - boottime
      - cpu
      - diskstats
      - filesystem
      - loadavg
      - meminfo
      - netstat
      - netdev
      - sockstat
      - time
      - uname
      - filefd
      - stat
      - netclass

metrics:
  configs:
    - name: mikrotik
      scrape_configs:
        - job_name: Mikrotik
          static_configs:
            - targets:
                - 1.2.3.4 # mikrotik_ip
          metrics_path: /snmp
          params:
            module: [mikrotik]
          relabel_configs:
            - source_labels: [__address__]
              target_label: __param_target
            - source_labels: [__param_target]
              target_label: instance
            - target_label: __address__
              replacement: 1.2.3.4:9116
    - name: pve
      scrape_configs:
        - job_name: pve
          static_configs:
            - targets:
                - 1.2.3.4 # pve_ip
          metrics_path: /pve
          params:
            module: [default]
          relabel_configs:
            - source_labels: [__address__]
              target_label: __param_target
            - source_labels: [__param_target]
              target_label: instance
            - target_label: __address__
              replacement: 1.2.3.4:9221
    - name: win
      scrape_configs:
        - job_name: win
          static_configs:
            - targets:
                - 1.2.3.4:9182
          metric_relabel_configs:
            - source_labels: [__name__]
              regex: "go_.*"
              action: drop
  global:
    scrape_interval: 60s
    remote_write:
      - url: https://prometheus-prod-18-prod-ap-southeast-0.grafana.net/api/prom/push
        basic_auth:
          username: <username>
          password: <password>
  wal_directory: /tmp/grafana-agent-wal