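I. Installation and configuration on physical nodes (basic setup; alerting and Grafana dashboards are not covered)
1. Download and install Prometheus from the official release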
Download and install

# pwd
/usr/local/src
# wget https://github.com/prometheus/prometheus/releases/download/v2.12.0/prometheus-2.12.0.linux-amd64.tar.gz
# tar xvf prometheus-2.12.0.linux-amd64.tar.gz
# ln -sv /usr/local/src/prometheus-2.12.0.linux-amd64 /usr/local/prometheus
# cd /usr/local/prometheus

Service unit file

# vim /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target

[Service]
Restart=on-failure
WorkingDirectory=/usr/local/prometheus/
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml

[Install]
WantedBy=multi-user.target

Configure the nodes to monitor

# cd /usr/local/prometheus
# grep -v "#" prometheus.yml | grep -v "^$"
global:
alerting:
  alertmanagers:
  - static_configs:
    - targets:
rule_files:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'prometheus-node'
    static_configs:
    - targets: ['192.168.7.110:9100','192.168.7.111:9100']

The service must be restarted after the configuration file is changed.

Start the service

# systemctl daemon-reload
# systemctl restart prometheus
# systemctl enable prometheus

Then check that the port is listening (9090, as in the 'localhost:9090' target above).
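The port check mentioned at the end of this step could be scripted roughly as follows. This is only a sketch: it assumes `ss` (from iproute2) is installed on the host, and `is_listening` is a hypothetical helper name that is not part of Prometheus. The demonstration runs on canned `ss -lnt` output so that it works anywhere; on the real host the live command shown in the comment would be used instead.

```shell
#!/bin/sh
# is_listening PORT: hypothetical helper. Reads `ss -lnt` output on stdin and
# succeeds if some socket in LISTEN state is bound to PORT.
is_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# On the Prometheus host (assumption: ss is available):
#   ss -lnt | is_listening 9090 && echo "9090: listening"

# Demonstration on canned ss output, so the sketch runs without a server.
sample='State  Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0      128    0.0.0.0:22         0.0.0.0:*
LISTEN 0      128    *:9090             *:*'

if printf '%s\n' "$sample" | is_listening 9090; then
    echo "9090: listening"
else
    echo "9090: not listening"
fi
```

Matching on the fourth column with a leading colon keeps a port such as 90 from being confused with 9090, and it also accepts IPv6 listeners such as `[::]:9090`.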
2. Install node_exporter on the monitored nodes
# pwd
/usr/local/src
# wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz
# tar xvf node_exporter-0.18.1.linux-amd64.tar.gz
# ln -sv /usr/local/src/node_exporter-0.18.1.linux-amd64 /usr/local/node_exporter
# cd /usr/local/node_exporter

Service unit file

# vim /etc/systemd/system/node-exporter.service
[Unit]
Description=Prometheus Node Exporter
After=network.target

[Service]
ExecStart=/usr/local/node_exporter/node_exporter

[Install]
WantedBy=multi-user.target

Start the service

# systemctl daemon-reload
# systemctl restart node-exporter
# systemctl enable node-exporter

Check that the port is listening (9100, as in the targets configured on the Prometheus server), and turn off the firewall and SELinux.
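Beyond the port check, one way to confirm that the exporter is really serving data is to look for `node_`-prefixed series on its `/metrics` endpoint. The sketch below assumes `curl` is available for the live check; `count_node_metrics` is a hypothetical helper name, and the sample metrics are illustrative values rather than output captured from a real node.

```shell
#!/bin/sh
# count_node_metrics: hypothetical helper. Reads Prometheus text-format metrics
# on stdin and prints how many sample lines start with "node_".
# Comment lines (# HELP / # TYPE) start with "#", so they are not counted.
count_node_metrics() {
    grep -c '^node_'
}

# On the Prometheus server (assumption: curl is installed and the node is reachable):
#   curl -s http://192.168.7.110:9100/metrics | count_node_metrics

# Demonstration on a canned sample; prints 2.
count_node_metrics <<'EOF'
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.21
node_memory_MemTotal_bytes 8201000000
go_goroutines 8
EOF
```

If the live command prints 0 or fails to connect from the Prometheus server while working locally on the node, the firewall or SELinux settings on the node are the first thing to check.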
3. Monitoring Kubernetes (k8s)
Reference: https://github.com/NVIDIA/gpu-monitoring-tools/tree/master/exporters/prometheus-dcgm
Run a dedicated GPU container on the cluster to do the monitoring.