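Overview
Our Kubernetes cluster currently uses Prometheus for monitoring. Some of our developers' services expose websocket APIs. To monitor and alert on these applications effectively, we need to probe and track the liveness of those websocket endpoints. The approach, rollout steps and test are described below.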
Deploy a simple websocket service
First we define a simple websocket service to use for the monitoring and alerting test:
# Create a virtual environment (optional; you can also deploy directly on the host)
mkvirtualenv -p /usr/bin/python3 websocket-server
# Install the required package
pip3 install websockets

# cat websocket-server.py
import asyncio
import websockets

async def echo(websocket):
    # Single-argument handler (websockets >= 10.1; the old `path` argument was removed in newer releases)
    async for message in websocket:
        await websocket.send("I got your message: {}".format(message))

async def main():
    # The listen address must be reachable from the k8s cluster
    async with websockets.serve(echo, '192.168.128.6', 8765):
        await asyncio.Future()  # keep serving forever

asyncio.run(main())

# Start the websocket service
python websocket-server.py &
# Check that it is listening
netstat -lnp | grep 8765

websocket-exporter
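The exporter deployed here reports each websocket endpoint as up (1) or down (0). This article does not show how websocket_exporter probes endpoints internally. As an assumption, a minimal liveness check can perform the RFC 6455 opening handshake and treat an `HTTP 101` response with the expected `Sec-WebSocket-Accept` value as "up". The sketch below uses only the standard library and handles only plain `ws://` URLs (no TLS, no proxy). `probe` and `expected_accept` are hypothetical names.

```python
import base64
import hashlib
import os
import socket
from urllib.parse import urlsplit

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(key: str) -> str:
    """Return the Sec-WebSocket-Accept value a server must send for `key`."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def probe(url: str, timeout: float = 5.0) -> int:
    """Hypothetical liveness check: 1 if the ws:// handshake succeeds, else 0."""
    parts = urlsplit(url)
    host = parts.hostname
    if parts.scheme != "ws" or not host:
        return 0  # only plain ws:// is handled in this sketch
    port = parts.port or 80
    path = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    key = base64.b64encode(os.urandom(16)).decode("ascii")  # 16-byte random nonce
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "\r\n"
    )
    response = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request.encode("ascii"))
            # Read until the end of the HTTP response headers
            while b"\r\n\r\n" not in response:
                chunk = sock.recv(4096)
                if not chunk or len(response) > 65536:
                    return 0
                response += chunk
    except OSError:  # refused, unreachable, timed out, ...
        return 0
    lines = response.split(b"\r\n\r\n", 1)[0].decode("latin-1").split("\r\n")
    status_ok = lines[0].split(" ")[1:2] == ["101"]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    accept_ok = headers.get("sec-websocket-accept") == expected_accept(key)
    return 1 if status_ok and accept_ok else 0
```

Called as `probe("ws://192.168.128.6:8765")`, it should return 1 while the echo server above is running and 0 once it is stopped. Supporting `wss://` would additionally require wrapping the socket with `ssl.create_default_context().wrap_socket(sock, server_hostname=host)` and defaulting the port to 443.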
Here we define a Deployment that probes several websocket APIs and exposes the results as metrics for Prometheus:
k8s websocket deployment
# cat websocket-kube-mon-prometheus.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: wss
    app.kubernetes.io/version: v1.8.0
  name: websocket-exporter
  namespace: kube-mon
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: wss
  template:
    metadata:
      labels:
        app.kubernetes.io/name: wss
        app.kubernetes.io/version: v1.8.0
    spec:
      containers:
        - image: registry.cn-shanghai.aliyuncs.com/ai-voice-test/wss-expoter:v0.0.1
          env:
            - name: ENDPOINT # separate multiple ws endpoints with commas
              value: ws://www.abc.com,ws://192.168.128.6:8765
          name: websocket-exporter
          ports:
            - containerPort: 9189
              name: wss-metrics

k8s websocket service
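The Service defined next forwards port 9189 to the exporter's metrics endpoint. The sketch below makes the metric format concrete. It shows one way an exporter *could* split the comma-separated `ENDPOINT` variable and render the `websocket{url=...}` gauge that appears in the `/metrics` output later in this article. This is an illustrative assumption, not websocket_exporter's actual code. `parse_endpoints` and `render_metrics` are hypothetical names, and the probe results are passed in rather than measured.

```python
from typing import Dict, List


def parse_endpoints(raw: str) -> List[str]:
    """Split a comma-separated ENDPOINT value into URLs, dropping blanks."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def render_metrics(status: Dict[str, int]) -> str:
    """Render one gauge sample per URL in the Prometheus text format."""
    lines = ["# HELP websocket websocket_help", "# TYPE websocket gauge"]
    for url, up in status.items():
        # Escape backslash, double quote and newline in the label value
        label = url.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        lines.append(f'websocket{{url="{label}"}} {up}')
    return "\n".join(lines) + "\n"
```

For example, `parse_endpoints("ws://www.abc.com,ws://192.168.128.6:8765")` yields the two URLs from the Deployment above. Passing `{"ws://192.168.128.6:8765": 0}` to `render_metrics` produces the same three lines that `curl .../metrics` prints in the alert test.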
Next, define a Service for the exporter so that Prometheus can scrape it:
# cat service-websocket.yaml
apiVersion: v1
kind: Service
metadata:
  name: websocket
  namespace: kube-mon
spec:
  # use the NodePort type for now
  type: NodePort
  ports:
    - port: 9189
      targetPort: 9189
      protocol: TCP
      nodePort: 32071
  selector:
    app.kubernetes.io/name: wss

Get the IP and port
# Apply the Deployment and Service above
kubectl apply -f websocket-kube-mon-prometheus.yaml
kubectl apply -f service-websocket.yaml
# Check the pod and the service
kubectl get pod -n kube-mon
kubectl get svc -n kube-mon
NAME        TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
websocket   NodePort   192.168.237.56   <none>        9189:32071/TCP   1h

Configure Prometheus
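The scrape job added in this section takes effect after Prometheus reloads its configuration. The `curl -X POST .../-/reload` call used below can also be issued from Python with the standard library. This sketch assumes Prometheus was started with `--web.enable-lifecycle`, because the endpoint is disabled otherwise. `prometheus-pod-ip` stays a placeholder, and `build_reload_request` and `reload_prometheus` are hypothetical names.

```python
import urllib.request


def build_reload_request(base_url: str) -> urllib.request.Request:
    """Build the POST request for Prometheus's /-/reload lifecycle endpoint."""
    return urllib.request.Request(base_url.rstrip("/") + "/-/reload", data=b"", method="POST")


def reload_prometheus(base_url: str, timeout: float = 10.0) -> int:
    """Send the reload request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for non-2xx responses, e.g. when
    the lifecycle API is not enabled.
    """
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status
```

Usage: `reload_prometheus("http://prometheus-pod-ip:9090")` after replacing the placeholder with the real pod IP.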
Configure Prometheus scraping
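# vim sidecar/cm-kube-mon-sidecar.yaml
# add the following scrape job under scrape_configs
- job_name: 'websocket'
  static_configs:
    - targets: ['192.168.237.56:9189']

# Apply the updated ConfigMap
kubectl apply -f sidecar/cm-kube-mon-sidecar.yaml
# Reload Prometheus (the /-/reload endpoint requires the --web.enable-lifecycle flag)
curl -X POST http://prometheus-pod-ip:9090/-/reload

Configure alerting rules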
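In the rule that follows, `for: 30s` means the expression `websocket{job="websocket"} < 1` must stay true for 30 seconds before the alert moves from pending to firing. A single recovered sample resets it. The sketch below is a simplified model of that behavior, not Prometheus's code. It assumes evaluations at given timestamps, whereas in reality they happen at the rule group's evaluation interval. `alert_state` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def alert_state(samples: List[Tuple[float, float]], for_seconds: float = 30.0) -> str:
    """Simplified model of `expr: websocket < 1` with `for: 30s`.

    `samples` are (timestamp, value) pairs in evaluation order. Returns the
    state after the last evaluation: "inactive", "pending" or "firing".
    """
    active_since: Optional[float] = None
    state = "inactive"
    for ts, value in samples:
        if value < 1:
            if active_since is None:
                active_since = ts  # condition just became true: pending
            state = "firing" if ts - active_since >= for_seconds else "pending"
        else:
            active_since = None  # condition cleared: the alert resets
            state = "inactive"
    return state
```

With 15-second evaluations, an endpoint that goes down at t=15 is still pending at t=30 and fires at t=45. This is why the alert in the test below only arrives after a short wait.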
# vim sidecar/rules-cm-kube-mon-sidecar.yaml
# add the following rule
- alert: websocket endpoint probe failed
  expr: websocket{job="websocket"} < 1
  for: 30s
  labels:
    severity: critical
  annotations:
    #summary: "Endpoint {{ $labels.url }} probe failed"
    description: "websocket address: {{ $labels.url }} probe failed, status: down."

# Apply; the rules are hot-reloaded, so just wait about a minute
kubectl apply -f sidecar/rules-cm-kube-mon-sidecar.yaml

Alert test
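In the test below, the probe status is read from the exporter's `/metrics` output by eye. The same check can be scripted. This sketch parses the text format shown in the curl output and lists the URLs whose gauge is below 1, mirroring the alert expression. It assumes the simple `websocket{url="..."} value` lines seen here and no other labels. `parse_websocket_gauge` and `down_endpoints` are hypothetical names.

```python
import re
from typing import Dict, List

# Matches lines such as: websocket{url="ws://192.168.128.6:8765"} 0
# (an optional trailing timestamp is allowed by the text format)
SAMPLE_RE = re.compile(r'^websocket\{url="((?:[^"\\]|\\.)*)"\}\s+(\S+)(?:\s+-?\d+)?$')


def parse_websocket_gauge(text: str) -> Dict[str, float]:
    """Return {url: value} for every `websocket{url=...}` sample in `text`."""
    result: Dict[str, float] = {}
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match:
            result[match.group(1)] = float(match.group(2))
    return result


def down_endpoints(text: str) -> List[str]:
    """URLs whose probe gauge is below 1, like `websocket < 1` in the rule."""
    return [url for url, value in parse_websocket_gauge(text).items() if value < 1]
```

The input can be fetched with `urllib.request.urlopen("http://192.168.237.56:9189/metrics").read().decode()`.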
Stop the test websocket service
# Find the process ID
netstat -lnp | grep 8765
# Kill the process
kill <pid>

We can query the exporter from a terminal to see the probe status directly:
curl 192.168.237.56:9189/metrics
# HELP websocket websocket_help
# TYPE websocket gauge
websocket{url="ws://192.168.128.6:8765"} 0

After a short wait, the alert notification is sent, as follows:
Restart the websocket service
python websocket-server.py &
curl 192.168.237.56:9189/metrics
# HELP websocket websocket_help
# TYPE websocket gauge
websocket{url="ws://192.168.128.6:8765"} 1

After a short wait, the recovery notification is sent, as follows:
References
- websocket_exporter