2022-11-16
Installing Harbor
Install Docker

# Install the apt dependencies
apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common

# Add Docker's official GPG key
curl -fsSL https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu/gpg | sudo apt-key add -

# Set up the stable repository
add-apt-repository \
   "deb [arch=amd64] https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu/ \
   $(lsb_release -cs) \
   stable"

# Install the latest Docker Engine-Community and containerd
apt-get update
apt-get install docker-ce docker-ce-cli containerd.io

Install Docker Compose

# Install Docker Compose
root@cby:~# wget https://ghproxy.com/https://github.com/docker/compose/releases/download/v2.12.2/docker-compose-linux-x86_64
root@cby:~# mv docker-compose-linux-x86_64 /usr/local/bin/docker-compose
root@cby:~# chmod +x /usr/local/bin/docker-compose
root@cby:~# docker-compose --version
Docker Compose version v2.12.2
root@cby:~#

Download the Harbor installer

# Download the Harbor offline installer
wget https://ghproxy.com/https://github.com/goharbor/harbor/releases/download/v2.6.2/harbor-offline-installer-v2.6.2.tgz

# Unpack it
root@cby:~# tar xvf harbor-offline-installer-v2.6.2.tgz -C /usr/local/
harbor/harbor.v2.6.2.tar.gz
harbor/prepare
harbor/LICENSE
harbor/install.sh
harbor/common.sh
harbor/harbor.yml.tmpl
root@cby:~# cd /usr/local/harbor/

Create the certificates

# Create a directory for the CA
root@cby:/usr/local/harbor# mkdir ca
root@cby:/usr/local/harbor# cd ca/
root@cby:/usr/local/harbor/ca#

# Generate the CA private key
root@cby:/usr/local/harbor/ca# openssl genrsa -out ca.key 4096

# Generate the CA certificate
root@cby:/usr/local/harbor/ca# openssl req -x509 -new -nodes -sha512 -days 3650 \
 -subj "/C=CN/ST=Beijing/L=Beijing/O=example/OU=Personal/CN=hb.oiox.cn" \
 -key ca.key \
 -out ca.crt

# Generate the server certificate: first the private key
root@cby:/usr/local/harbor/ca# openssl genrsa -out hb.oiox.cn.key 4096

# Generate the certificate signing request (CSR)
root@cby:/usr/local/harbor/ca# openssl req -sha512 -new \
 -subj "/C=CN/ST=Beijing/L=Beijing/O=example/OU=Personal/CN=hb.oiox.cn" \
 -key hb.oiox.cn.key \
 -out hb.oiox.cn.csr

# Generate an x509 v3 extension file
root@cby:/usr/local/harbor/ca# cat > v3.ext <<-EOF
authorityKeyIdentifier=keyid,issuer
basicConstraints=CA:FALSE
keyUsage = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names

[alt_names]
DNS.1=oiox.cn
DNS.2=hb.oiox.cn
DNS.3=www.oiox.cn
EOF

# Use the v3.ext file to issue the certificate for the Harbor host
root@cby:/usr/local/harbor/ca# openssl x509 -req -sha512 -days 3650 \
 -extfile v3.ext \
 -CA ca.crt -CAkey ca.key -CAcreateserial \
 -in hb.oiox.cn.csr \
 -out hb.oiox.cn.crt

Configure the Docker certificates

# Convert crt to cert for Docker: the Docker daemon interprets .crt files as CA certificates and .cert files as client certificates
root@cby:/usr/local/harbor/ca# openssl x509 -inform PEM -in hb.oiox.cn.crt -out hb.oiox.cn.cert

# Copy the server certificate, key and CA file into the Docker certificates folder on the Harbor host; the folder must be created first
root@cby:/usr/local/harbor/ca# mkdir -p /etc/docker/certs.d/hb.oiox.cn/
root@cby:/usr/local/harbor/ca# cp hb.oiox.cn.cert /etc/docker/certs.d/hb.oiox.cn/
root@cby:/usr/local/harbor/ca# cp hb.oiox.cn.key /etc/docker/certs.d/hb.oiox.cn/
root@cby:/usr/local/harbor/ca# cp ca.crt /etc/docker/certs.d/hb.oiox.cn/

# If the default nginx port 443 is mapped to another port, create the folder
# /etc/docker/certs.d/yourdomain.com:port

# Restart Docker Engine
root@cby:/usr/local/harbor/ca# systemctl restart docker

Check the files

# List the certificate files in the directory
root@cby:/usr/local/harbor/ca# ll
total 36
drwxr-xr-x 2 root root 4096 Nov 16 06:23 ./
drwxr-xr-x 5 root root 4096 Nov 16 06:16 ../
-rw-r--r-- 1 root root 2041 Nov 16 06:20 ca.crt
-rw------- 1 root root 3272 Nov 16 06:16 ca.key
-rw-r--r-- 1 root root 2143 Nov 16 06:23 hb.oiox.cn.cert
-rw-r--r-- 1 root root 2143 Nov 16 06:22 hb.oiox.cn.crt
-rw-r--r-- 1 root root 1704 Nov 16 06:22 hb.oiox.cn.csr
-rw------- 1 root root 3268 Nov 16 06:22 hb.oiox.cn.key
-rw-r--r-- 1 root root  261 Nov 16 06:22 v3.ext
root@cby:/usr/local/harbor/ca#

Configure the Harbor service

# Edit the Harbor configuration file
root@cby:/usr/local/harbor# cp harbor.yml.tmpl harbor.yml
root@cby:/usr/local/harbor# vim harbor.yml
root@cby:/usr/local/harbor# cat harbor.yml | grep -v '^#' | grep -v '^$' | grep -v ' #'
hostname: hb.oiox.cn
http:
  port: 80
https:
  port: 443
  certificate: /usr/local/harbor/ca/hb.oiox.cn.crt
  private_key: /usr/local/harbor/ca/hb.oiox.cn.key
harbor_admin_password: Harbor12345
database:
  password: root123
  max_idle_conns: 100
  max_open_conns: 900
data_volume: /data
trivy:
  ignore_unfixed: false
  skip_update: false
  offline_scan: false
  security_check: vuln
  insecure: false
jobservice:
  max_job_workers: 10
notification:
  webhook_job_max_retry: 10
chart:
  absolute_url: disabled
log:
  level: info
  local:
    rotate_count: 50
    rotate_size: 200M
    location: /var/log/harbor
_version: 2.6.0
proxy:
  http_proxy:
  https_proxy:
  no_proxy:
  components:
    - core
    - jobservice
    - trivy
upload_purging:
  enabled: true
  age: 168h
  interval: 24h
  dryrun: false
cache:
  enabled: false
  expire_hours: 24
root@cby:/usr/local/harbor#

Install Harbor

# Run the installer
root@cby:/usr/local/harbor# ./install.sh
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified
tput: No value for $TERM and no -T specified

[Step 0]: checking if docker is installed ...
Note: docker version: 20.10.21

[Step 1]: checking docker-compose is installed ...
Note: docker-compose version: 2.12.2

[Step 2]: loading Harbor images ...
Loaded image: goharbor/harbor-jobservice:v2.6.2
Loaded image: goharbor/trivy-adapter-photon:v2.6.2
Loaded image: goharbor/chartmuseum-photon:v2.6.2
Loaded image: goharbor/redis-photon:v2.6.2
Loaded image: goharbor/nginx-photon:v2.6.2
Loaded image: goharbor/notary-signer-photon:v2.6.2
Loaded image: goharbor/harbor-core:v2.6.2
Loaded image: goharbor/harbor-db:v2.6.2
Loaded image: goharbor/harbor-registryctl:v2.6.2
Loaded image: goharbor/harbor-exporter:v2.6.2
Loaded image: goharbor/prepare:v2.6.2
Loaded image: goharbor/registry-photon:v2.6.2
Loaded image: goharbor/notary-server-photon:v2.6.2
Loaded image: goharbor/harbor-portal:v2.6.2
Loaded image: goharbor/harbor-log:v2.6.2

[Step 3]: preparing environment ...

[Step 4]: preparing harbor configs ...
prepare base dir is set to /usr/local/harbor
Clearing the configuration file: /config/core/app.conf
Clearing the configuration file: /config/core/env
Clearing the configuration file: /config/jobservice/env
Clearing the configuration file: /config/jobservice/config.yml
Clearing the configuration file: /config/nginx/nginx.conf
Clearing the configuration file: /config/registryctl/env
Clearing the configuration file: /config/registryctl/config.yml
Clearing the configuration file: /config/portal/nginx.conf
Clearing the configuration file: /config/db/env
Clearing the configuration file: /config/registry/passwd
Clearing the configuration file: /config/registry/config.yml
Clearing the configuration file: /config/log/logrotate.conf
Clearing the configuration file: /config/log/rsyslog_docker.conf
Generated configuration file: /config/portal/nginx.conf
Generated configuration file: /config/log/logrotate.conf
Generated configuration file: /config/log/rsyslog_docker.conf
Generated configuration file: /config/nginx/nginx.conf
Generated configuration file: /config/core/env
Generated configuration file: /config/core/app.conf
Generated configuration file: /config/registry/config.yml
Generated configuration file: /config/registryctl/env
Generated configuration file: /config/registryctl/config.yml
Generated configuration file: /config/db/env
Generated configuration file: /config/jobservice/env
Generated configuration file: /config/jobservice/config.yml
loaded secret from file: /data/secret/keys/secretkey
Generated configuration file: /compose_location/docker-compose.yml
Clean up the input dir
Note: stopping existing Harbor instance ...

[Step 5]: starting Harbor ...
[+] Running 10/10
 ⠿ Network harbor_harbor        Created  0.0s
 ⠿ Container harbor-log         Started  0.6s
 ⠿ Container harbor-portal      Started  0.8s
 ⠿ Container registryctl        Started  1.1s
 ⠿ Container redis              Started  0.9s
 ⠿ Container registry           Started  1.1s
 ⠿ Container harbor-db          Started  1.2s
 ⠿ Container harbor-core        Started  1.3s
 ⠿ Container nginx              Started  1.9s
 ⠿ Container harbor-jobservice  Started  2.0s
✔ ----Harbor has been installed and started successfully.----
root@cby:/usr/local/harbor#

Configure name resolution and Docker

# FQDN resolution
cat > /etc/hosts <<EOF
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.8.61 k8s-master01
192.168.8.62 k8s-master02
192.168.8.63 k8s-master03
192.168.8.64 k8s-node01
192.168.8.65 k8s-node02
192.168.8.66 lb-vip
192.168.8.3 hb.oiox.cn
EOF

# Example Docker configuration
[root@k8s-master-1 ~]# cat > /etc/docker/daemon.json <<EOF
{
  "registry-mirrors": [
    "https://hub-mirror.c.163.com",
    "https://mirror.baidubce.com"
  ],
  "exec-opts": ["native.cgroupdriver=systemd"],
  "insecure-registries": ["hb.oiox.cn"]
}
EOF

# Restart Docker
[root@k8s-master-1 ~]# systemctl restart docker && systemctl status docker -l

Test it

# Log in
[root@k8s-master-1 ~]# docker login hb.oiox.cn
Username: admin
Password:
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning.
See https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
[root@k8s-master-1 ~]#

# Test a push and a pull
[root@k8s-master-1 ~]# docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/dashboard:v2.7.0
[root@k8s-master-1 ~]# docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/dashboard:v2.7.0 hb.oiox.cn/library/dashboard:v2.7.0
[root@k8s-master-1 ~]# docker push hb.oiox.cn/library/dashboard:v2.7.0
[root@k8s-master-1 ~]# docker pull hb.oiox.cn/library/dashboard:v2.7.0
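A quick way to double-check the TLS setup from a client is sketched below. It only reuses names from this post (hb.oiox.cn, the CA under /etc/docker/certs.d/, the default admin password); the /api/v2.0/health URL is an assumption based on the Harbor v2 REST API, so adjust it if your version differs.

# Show the SANs on the certificate the registry actually serves
openssl s_client -connect hb.oiox.cn:443 -servername hb.oiox.cn </dev/null 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'

# Non-interactive login, then ask the API whether all components are healthy
echo 'Harbor12345' | docker login hb.oiox.cn -u admin --password-stdin
curl -fsSL --cacert /etc/docker/certs.d/hb.oiox.cn/ca.crt \
  -u admin:Harbor12345 https://hb.oiox.cn/api/v2.0/health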
2022-11-12
Grafana, Prometheus and Alertmanager
Basic concepts of the monitoring stack

Prometheus is an open-source combination of system monitoring, alerting and a time-series database. It was originally developed at SoundCloud and, as more and more companies started using it, became an independent open-source project. Alertmanager receives the alerts that Prometheus sends; it supports a rich set of notification channels such as e-mail, WeChat, DingTalk and Slack, and makes it easy to deduplicate, silence and group alerts, which makes it a very convenient alert-notification system.

The Prometheus architecture:

Install the Grafana service

root@cby:~# sudo apt-get install -y adduser libfontconfig1
root@cby:~# wget https://dl.grafana.com/enterprise/release/grafana-enterprise_9.2.4_amd64.deb
root@cby:~# sudo dpkg -i grafana-enterprise_9.2.4_amd64.deb
root@cby:~# systemctl enable --now grafana-server.service
Synchronizing state of grafana-server.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable grafana-server
Created symlink /etc/systemd/system/multi-user.target.wants/grafana-server.service → /lib/systemd/system/grafana-server.service.
root@cby:~#

Install the Prometheus service

root@cby:~# wget https://github.com/prometheus/prometheus/releases/download/v2.40.1/prometheus-2.40.1.linux-amd64.tar.gz
root@cby:~# tar xvf prometheus-2.40.1.linux-amd64.tar.gz
prometheus-2.40.1.linux-amd64/
prometheus-2.40.1.linux-amd64/NOTICE
prometheus-2.40.1.linux-amd64/prometheus
prometheus-2.40.1.linux-amd64/LICENSE
prometheus-2.40.1.linux-amd64/console_libraries/
prometheus-2.40.1.linux-amd64/console_libraries/menu.lib
prometheus-2.40.1.linux-amd64/console_libraries/prom.lib
prometheus-2.40.1.linux-amd64/promtool
prometheus-2.40.1.linux-amd64/prometheus.yml
prometheus-2.40.1.linux-amd64/consoles/
prometheus-2.40.1.linux-amd64/consoles/prometheus-overview.html
prometheus-2.40.1.linux-amd64/consoles/prometheus.html
prometheus-2.40.1.linux-amd64/consoles/node-cpu.html
prometheus-2.40.1.linux-amd64/consoles/node-overview.html
prometheus-2.40.1.linux-amd64/consoles/node-disk.html
prometheus-2.40.1.linux-amd64/consoles/index.html.example
prometheus-2.40.1.linux-amd64/consoles/node.html
root@cby:~# mv prometheus-2.40.1.linux-amd64 prometheus
root@cby:~#

Global configuration

root@cby:~# vim prometheus/prometheus.yml
root@cby:~# cat prometheus/prometheus.yml
# Prometheus global configuration
global:
  scrape_interval: 15s      # how often targets are scraped; the default is 1min
  evaluation_interval: 15s  # how often rule files are re-evaluated; the default is 1min
  scrape_timeout: 15s       # scrape timeout; the default is 10s
  external_labels:          # extra labels added to scraped samples before they are stored
    monitor: 'codelab_monitor'

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["127.0.0.1:9093"]  # the address and port Alertmanager listens on for Prometheus

# rule files; loaded at startup and then reloaded every evaluation_interval
rule_files:
  - "dist/*.yml"

# scrape configuration
scrape_configs:
  - job_name: 'prometheus'  # job_name is written into the time-series labels and can be used in queries
    scrape_interval: 15s    # scrape interval; defaults to the global setting
    static_configs:         # static targets
      - targets: ['127.0.0.1:9090']  # the address Prometheus scrapes, i.e. the instance
  - job_name: 'web'
    scrape_interval: 15s
    static_configs:
      - targets: ['10.0.0.10:9200']
  - job_name: 'node-exporter'
    scrape_interval: 15s
    file_sd_configs:
      - files:
          - "static_conf/*.yaml"
        refresh_interval: 1s
root@cby:~#

Write the file-based service-discovery file; it only needs to list the hosts to monitor (a small generator sketch for this file appears after the node_exporter section below):

root@cby:~# mkdir prometheus/static_conf/
root@cby:~# vim /prometheus/static_conf/file.yaml
root@cby:~# cat /prometheus/static_conf/file.yaml
- targets: ['10.0.0.1:9200']
- targets: ['10.0.0.2:9200']
- targets: ['10.0.0.3:9200']
- targets: ['10.0.0.4:9200']
- targets: ['10.0.0.5:9200']
- targets: ['10.0.0.6:9200']
- targets: ['10.0.0.7:9200']
- targets: ['10.0.0.8:9200']
- targets: ['10.0.0.9:9200']
- targets: ['10.0.0.10:9200']
- targets: ['10.0.0.11:9200']
- targets: ['10.0.0.12:9200']
- targets: ['10.0.0.13:9200']
- targets: ['10.0.0.14:9200']
- targets: ['10.0.0.15:9200']
- targets: ['10.0.0.16:9200']
-
targets: ['10.0.0.17:9200'] - targets: ['10.0.0.18:9200'] - targets: ['10.0.0.19:9200'] - targets: ['10.0.0.20:9200'] - targets: ['10.0.0.21:9200'] - targets: ['10.0.0.22:9200'] - targets: ['10.0.0.23:9200'] - targets: ['10.0.0.24:9200'] - targets: ['10.0.0.25:9200'] - targets: ['10.0.0.26:9200'] - targets: ['10.0.0.27:9200'] - targets: ['10.0.0.28:9200'] - targets: ['10.0.0.29:9200'] - targets: ['10.0.0.30:9200'] - targets: ['10.0.0.31:9200'] - targets: ['10.0.0.32:9200'] - targets: ['10.0.0.33:9200'] - targets: ['10.0.0.34:9200'] - targets: ['10.0.0.35:9200'] - targets: ['10.0.0.36:9200'] - targets: ['10.0.0.37:9200'] - targets: ['10.0.0.38:9200'] - targets: ['10.0.0.39:9200'] - targets: ['10.0.0.40:9200'] - targets: ['10.0.0.41:9200'] - targets: ['10.0.0.42:9200'] - targets: ['10.0.0.43:9200'] - targets: ['10.0.0.44:9200'] - targets: ['10.0.0.45:9200'] - targets: ['10.0.0.46:9200'] - targets: ['10.0.0.47:9200'] - targets: ['10.0.0.48:9200'] - targets: ['10.0.0.49:9200'] - targets: ['10.0.0.50:9200'] - targets: ['10.0.0.51:9200'] - targets: ['10.0.0.52:9200'] - targets: ['10.0.0.53:9200'] - targets: ['10.0.0.54:9200'] - targets: ['10.0.0.55:9200'] - targets: ['10.0.0.56:9200'] - targets: ['10.0.0.57:9200'] - targets: ['10.0.0.58:9200'] - targets: ['10.0.0.59:9200'] - targets: ['10.0.0.60:9200'] - targets: ['10.0.0.61:9200'] - targets: ['10.0.0.62:9200'] - targets: ['10.0.0.63:9200'] - targets: ['10.0.0.64:9200'] - targets: ['10.0.0.65:9200'] - targets: ['10.0.0.66:9200'] - targets: ['10.0.0.67:9200'] - targets: ['10.0.0.68:9200'] - targets: ['10.0.0.69:9200'] - targets: ['10.0.0.70:9200'] - targets: ['10.0.0.71:9200'] - targets: ['10.0.0.72:9200'] - targets: ['10.0.0.73:9200'] - targets: ['10.0.0.74:9200'] - targets: ['10.0.0.75:9200'] - targets: ['10.0.0.76:9200'] - targets: ['10.0.0.77:9200'] - targets: ['10.0.0.78:9200'] - targets: ['10.0.0.79:9200'] - targets: ['10.0.0.80:9200'] - targets: ['10.0.0.81:9200'] - targets: ['10.0.0.82:9200'] - targets: ['10.0.0.83:9200'] - targets: ['10.0.0.84:9200'] - targets: ['10.0.0.85:9200'] - targets: ['10.0.0.86:9200'] - targets: ['10.0.0.87:9200'] - targets: ['10.0.0.88:9200'] - targets: ['10.0.0.89:9200'] - targets: ['10.0.0.90:9200'] - targets: ['10.0.0.91:9200'] - targets: ['10.0.0.92:9200'] - targets: ['10.0.0.93:9200'] - targets: ['10.0.0.94:9200'] - targets: ['10.0.0.95:9200'] - targets: ['10.0.0.96:9200'] - targets: ['10.0.0.97:9200'] - targets: ['10.0.0.98:9200'] - targets: ['10.0.0.99:9200'] - targets: ['10.0.0.100:9200'] - targets: ['10.0.0.101:9200'] - targets: ['10.0.0.102:9200'] - targets: ['10.0.0.103:9200'] - targets: ['10.0.0.104:9200'] - targets: ['10.0.0.105:9200'] - targets: ['10.0.0.106:9200'] - targets: ['10.0.0.107:9200'] - targets: ['10.0.0.108:9200'] - targets: ['10.0.0.109:9200'] - targets: ['10.0.0.110:9200'] - targets: ['10.0.0.111:9200'] - targets: ['10.0.0.112:9200'] - targets: ['10.0.0.113:9200'] - targets: ['10.0.0.114:9200'] - targets: ['10.0.0.115:9200'] - targets: ['10.0.0.116:9200'] - targets: ['10.0.0.117:9200'] - targets: ['10.0.0.118:9200'] - targets: ['10.0.0.119:9200'] - targets: ['10.0.0.120:9200'] - targets: ['10.0.0.121:9200'] - targets: ['10.0.0.122:9200'] - targets: ['10.0.0.123:9200'] - targets: ['10.0.0.124:9200'] - targets: ['10.0.0.125:9200'] - targets: ['10.0.0.126:9200'] - targets: ['10.0.0.127:9200'] - targets: ['10.0.0.128:9200'] - targets: ['10.0.0.129:9200'] - targets: ['10.0.0.130:9200'] - targets: ['10.0.0.131:9200'] - targets: ['10.0.0.132:9200'] - targets: ['10.0.0.133:9200'] - targets: 
['10.0.0.134:9200'] - targets: ['10.0.0.135:9200'] - targets: ['10.0.0.136:9200'] - targets: ['10.0.0.137:9200'] - targets: ['10.0.0.138:9200'] - targets: ['10.0.0.139:9200'] - targets: ['10.0.0.140:9200'] - targets: ['10.0.0.141:9200'] - targets: ['10.0.0.142:9200'] - targets: ['10.0.0.143:9200'] - targets: ['10.0.0.144:9200'] - targets: ['10.0.0.145:9200'] - targets: ['10.0.0.146:9200'] - targets: ['10.0.0.147:9200'] - targets: ['10.0.0.148:9200'] - targets: ['10.0.0.149:9200'] - targets: ['10.0.0.150:9200'] - targets: ['10.0.0.151:9200'] - targets: ['10.0.0.152:9200'] - targets: ['10.0.0.153:9200'] - targets: ['10.0.0.154:9200'] - targets: ['10.0.0.155:9200'] - targets: ['10.0.0.156:9200'] - targets: ['10.0.0.157:9200'] - targets: ['10.0.0.158:9200'] - targets: ['10.0.0.159:9200'] - targets: ['10.0.0.160:9200'] - targets: ['10.0.0.161:9200'] - targets: ['10.0.0.162:9200'] - targets: ['10.0.0.163:9200'] - targets: ['10.0.0.164:9200'] - targets: ['10.0.0.165:9200'] - targets: ['10.0.0.166:9200'] - targets: ['10.0.0.167:9200'] - targets: ['10.0.0.168:9200'] - targets: ['10.0.0.169:9200'] - targets: ['10.0.0.170:9200'] - targets: ['10.0.0.171:9200'] - targets: ['10.0.0.172:9200'] - targets: ['10.0.0.173:9200'] - targets: ['10.0.0.174:9200'] - targets: ['10.0.0.175:9200'] - targets: ['10.0.0.176:9200'] - targets: ['10.0.0.177:9200'] - targets: ['10.0.0.178:9200'] - targets: ['10.0.0.179:9200'] - targets: ['10.0.0.180:9200'] - targets: ['10.0.0.181:9200'] - targets: ['10.0.0.182:9200'] - targets: ['10.0.0.183:9200'] - targets: ['10.0.0.184:9200'] - targets: ['10.0.0.185:9200'] - targets: ['10.0.0.186:9200'] - targets: ['10.0.0.187:9200'] - targets: ['10.0.0.188:9200'] - targets: ['10.0.0.189:9200'] - targets: ['10.0.0.190:9200'] - targets: ['10.0.0.191:9200'] - targets: ['10.0.0.192:9200'] - targets: ['10.0.0.193:9200'] - targets: ['10.0.0.194:9200'] - targets: ['10.0.0.195:9200'] - targets: ['10.0.0.196:9200'] - targets: ['10.0.0.197:9200'] - targets: ['10.0.0.198:9200'] - targets: ['10.0.0.199:9200'] - targets: ['10.0.0.200:9200'] - targets: ['10.0.0.201:9200'] - targets: ['10.0.0.202:9200'] - targets: ['10.0.0.203:9200'] - targets: ['10.0.0.204:9200'] - targets: ['10.0.0.205:9200'] - targets: ['10.0.0.206:9200'] - targets: ['10.0.0.207:9200'] - targets: ['10.0.0.208:9200'] - targets: ['10.0.0.209:9200'] - targets: ['10.0.0.210:9200'] - targets: ['10.0.0.211:9200'] - targets: ['10.0.0.212:9200'] - targets: ['10.0.0.213:9200'] - targets: ['10.0.0.214:9200'] - targets: ['10.0.0.215:9200'] - targets: ['10.0.0.216:9200'] - targets: ['10.0.0.217:9200'] - targets: ['10.0.0.218:9200'] - targets: ['10.0.0.219:9200'] - targets: ['10.0.0.220:9200'] - targets: ['10.0.0.221:9200'] - targets: ['10.0.0.222:9200'] - targets: ['10.0.0.223:9200'] - targets: ['10.0.0.224:9200'] - targets: ['10.0.0.225:9200'] - targets: ['10.0.0.226:9200'] - targets: ['10.0.0.227:9200'] - targets: ['10.0.0.228:9200'] - targets: ['10.0.0.229:9200'] - targets: ['10.0.0.230:9200'] - targets: ['10.0.0.231:9200'] - targets: ['10.0.0.232:9200'] - targets: ['10.0.0.233:9200'] - targets: ['10.0.0.234:9200'] - targets: ['10.0.0.235:9200'] - targets: ['10.0.0.236:9200'] - targets: ['10.0.0.237:9200'] - targets: ['10.0.0.238:9200'] - targets: ['10.0.0.239:9200'] - targets: ['10.0.0.240:9200'] - targets: ['10.0.0.241:9200'] - targets: ['10.0.0.242:9200'] - targets: ['10.0.0.243:9200'] - targets: ['10.0.0.244:9200'] - targets: ['10.0.0.245:9200'] - targets: ['10.0.0.246:9200'] - targets: ['10.0.0.247:9200'] - targets: ['10.0.0.248:9200'] 
- targets: ['10.0.0.249:9200']
- targets: ['10.0.0.250:9200']
- targets: ['10.0.0.251:9200']
- targets: ['10.0.0.252:9200']
- targets: ['10.0.0.253:9200']
- targets: ['10.0.0.254:9200']
- targets: ['10.0.0.255:9200']
root@cby:~#

Configure the service to start at boot

root@cby:~# vim /lib/systemd/system/prometheus.service
root@cby:~# cat /lib/systemd/system/prometheus.service
[Unit]
Description=Prometheus
After=network-online.target

[Service]
Type=simple
ExecStart=/prometheus/prometheus --config.file=/prometheus/prometheus.yml
Restart=on-failure
ExecStop=/bin/kill -9 $MAINPID

[Install]
WantedBy=multi-user.target

root@cby:~# systemctl daemon-reload
root@cby:~# systemctl enable --now prometheus.service
Created symlink /etc/systemd/system/multi-user.target.wants/prometheus.service → /lib/systemd/system/prometheus.service.
root@cby:~# systemctl status prometheus.service

Install the node_exporter monitoring component

root@cby:~# wget https://github.com/prometheus/node_exporter/releases/download/v1.4.0/node_exporter-1.4.0.linux-amd64.tar.gz
root@cby:~# tar xvf node_exporter-1.4.0.linux-amd64.tar.gz
node_exporter-1.4.0.linux-amd64/
node_exporter-1.4.0.linux-amd64/LICENSE
node_exporter-1.4.0.linux-amd64/NOTICE
node_exporter-1.4.0.linux-amd64/node_exporter
root@cby:~# mv node_exporter-1.4.0.linux-amd64 node_exporter
root@cby:~# mv prometheus /
root@cby:~# mv node_exporter /

Set it to start at boot

root@cby:~# vim /lib/systemd/system/node_exporter.service
root@cby:~# cat /lib/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
After=network-online.target

[Service]
Type=simple
ExecStart=/node_exporter/node_exporter --web.listen-address=":9200"
Restart=on-failure
ExecStop=/bin/kill -9 $MAINPID

[Install]
WantedBy=multi-user.target

root@cby:~# systemctl daemon-reload
root@cby:~# systemctl enable --now node_exporter.service
Created symlink /etc/systemd/system/multi-user.target.wants/prometheus.service → /lib/systemd/system/prometheus.service.
root@cby:~# systemctl status node_exporter.service
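Maintaining one line per host by hand is tedious; the sketch below generates the same static_conf file for the whole 10.0.0.0/24 range used above and then confirms that the local node_exporter answers on the port from its unit file. The paths and the port come from the steps above; the loop itself is only an illustrative assumption, so adjust the range as needed.

# Generate one file_sd entry per host, same format as the hand-written file above
for i in $(seq 1 255); do
  echo "- targets: ['10.0.0.${i}:9200']"
done > /prometheus/static_conf/file.yaml

# Check that the local node_exporter is serving metrics
curl -s http://localhost:9200/metrics | head -n 5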
Download and install the Alertmanager service

root@cby:~# wget https://github.com/prometheus/alertmanager/releases/download/v0.24.0/alertmanager-0.24.0.linux-amd64.tar.gz
root@cby:~# tar xvf alertmanager-0.24.0.linux-amd64.tar.gz
alertmanager-0.24.0.linux-amd64/
alertmanager-0.24.0.linux-amd64/alertmanager.yml
alertmanager-0.24.0.linux-amd64/LICENSE
alertmanager-0.24.0.linux-amd64/NOTICE
alertmanager-0.24.0.linux-amd64/alertmanager
alertmanager-0.24.0.linux-amd64/amtool
root@cby:~# mv alertmanager-0.24.0.linux-amd64 alertmanager
root@cby:~# mv alertmanager /
root@cby:~#

Global configuration

root@cby:~# vim /alertmanager/alertmanager.yml
root@cby:~# cat /alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_from: 'cby@chenby.cn'
  smtp_smarthost: 'smtp.qiye.aliyun.com:465'
  smtp_auth_username: 'cby@chenby.cn'
  smtp_auth_password: 'xxxxxxxx'
  smtp_require_tls: false
  smtp_hello: 'chenby.cn'
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
receivers:
  - name: 'email'
    email_configs:
      - to: 'cby@chenby.cn'
        send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
root@cby:~#

Configure alert rules

For rule templates, it is best to pick the ones that fit your environment from https://awesome-prometheus-alerts.grep.to/. An example:

groups:
  - name: test-rules
    rules:
      - alert: InstanceDown          # alert name
        expr: up == 0                # firing condition; build it with the Prometheus query language
        for: 2m                      # how long the condition must hold before the alert is sent
        labels:                      # labels
          team: node
        annotations:                 # annotations that describe the alert in detail
          summary: "{{$labels.instance}}: has been down"
          description: "{{$labels.instance}}: job {{$labels.job}} has been down "
          value: "{{$value}}"

My alert configuration

root@cby:~# mkdir /prometheus/dist/
root@cby:~# vim /prometheus/dist/123.yml
root@cby:~# cat /prometheus/dist/123.yml
groups: - name: generals.rules rules: - alert: PrometheusJobMissing expr: absent(up{job="prometheus"}) for: 0m labels: severity: warning annotations: summary: Prometheus job missing (instance {{ $labels.instance }}) description: "A Prometheus job has disappeared\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusTargetMissing expr: up == 0 for: 0m labels: severity: critical annotations: summary: Prometheus target missing (instance {{ $labels.instance }}) description: "A Prometheus target has disappeared.
An exporter might be crashed.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusAllTargetsMissing expr: sum by (job) (up) == 0 for: 0m labels: severity: critical annotations: summary: Prometheus all targets missing (instance {{ $labels.instance }}) description: "A Prometheus job does not have living target anymore.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusTargetMissingWithWarmupTime expr: sum by (instance, job) ((up == 0) * on (instance) group_right(job) (node_time_seconds - node_boot_time_seconds > 600)) for: 0m labels: severity: critical annotations: summary: Prometheus target missing with warmup time (instance {{ $labels.instance }}) description: "Allow a job time to start up (10 minutes) before alerting that it's down.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusConfigurationReloadFailure expr: prometheus_config_last_reload_successful != 1 for: 0m labels: severity: warning annotations: summary: Prometheus configuration reload failure (instance {{ $labels.instance }}) description: "Prometheus configuration reload error\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusTooManyRestarts expr: changes(process_start_time_seconds{job=~"prometheus|pushgateway|alertmanager"}[15m]) > 2 for: 0m labels: severity: warning annotations: summary: Prometheus too many restarts (instance {{ $labels.instance }}) description: "Prometheus has restarted more than twice in the last 15 minutes. It might be crashlooping.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusAlertmanagerJobMissing expr: absent(up{job="alertmanager"}) for: 0m labels: severity: warning annotations: summary: Prometheus AlertManager job missing (instance {{ $labels.instance }}) description: "A Prometheus AlertManager job has disappeared\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusAlertmanagerConfigurationReloadFailure expr: alertmanager_config_last_reload_successful != 1 for: 0m labels: severity: warning annotations: summary: Prometheus AlertManager configuration reload failure (instance {{ $labels.instance }}) description: "AlertManager configuration reload error\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusAlertmanagerConfigNotSynced expr: count(count_values("config_hash", alertmanager_config_hash)) > 1 for: 0m labels: severity: warning annotations: summary: Prometheus AlertManager config not synced (instance {{ $labels.instance }}) description: "Configurations of AlertManager cluster instances are out of sync\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusAlertmanagerE2eDeadManSwitch expr: vector(1) for: 0m labels: severity: critical annotations: summary: Prometheus AlertManager E2E dead man switch (instance {{ $labels.instance }}) description: "Prometheus DeadManSwitch is an always-firing alert. 
It's used as an end-to-end test of Prometheus through the Alertmanager.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusNotConnectedToAlertmanager expr: prometheus_notifications_alertmanagers_discovered < 1 for: 0m labels: severity: critical annotations: summary: Prometheus not connected to alertmanager (instance {{ $labels.instance }}) description: "Prometheus cannot connect the alertmanager\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusRuleEvaluationFailures expr: increase(prometheus_rule_evaluation_failures_total[3m]) > 0 for: 0m labels: severity: critical annotations: summary: Prometheus rule evaluation failures (instance {{ $labels.instance }}) description: "Prometheus encountered {{ $value }} rule evaluation failures, leading to potentially ignored alerts.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusTemplateTextExpansionFailures expr: increase(prometheus_template_text_expansion_failures_total[3m]) > 0 for: 0m labels: severity: critical annotations: summary: Prometheus template text expansion failures (instance {{ $labels.instance }}) description: "Prometheus encountered {{ $value }} template text expansion failures\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusRuleEvaluationSlow expr: prometheus_rule_group_last_duration_seconds > prometheus_rule_group_interval_seconds for: 5m labels: severity: warning annotations: summary: Prometheus rule evaluation slow (instance {{ $labels.instance }}) description: "Prometheus rule evaluation took more time than the scheduled interval. It indicates a slower storage backend access or too complex query.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusNotificationsBacklog expr: min_over_time(prometheus_notifications_queue_length[10m]) > 0 for: 0m labels: severity: warning annotations: summary: Prometheus notifications backlog (instance {{ $labels.instance }}) description: "The Prometheus notification queue has not been empty for 10 minutes\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusAlertmanagerNotificationFailing expr: rate(alertmanager_notifications_failed_total[1m]) > 0 for: 0m labels: severity: critical annotations: summary: Prometheus AlertManager notification failing (instance {{ $labels.instance }}) description: "Alertmanager is failing sending notifications\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusTargetEmpty expr: prometheus_sd_discovered_targets == 0 for: 0m labels: severity: critical annotations: summary: Prometheus target empty (instance {{ $labels.instance }}) description: "Prometheus has no target in service discovery\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusTargetScrapingSlow expr: prometheus_target_interval_length_seconds{quantile="0.9"} / on (interval, instance, job) prometheus_target_interval_length_seconds{quantile="0.5"} > 1.05 for: 5m labels: severity: warning annotations: summary: Prometheus target scraping slow (instance {{ $labels.instance }}) description: "Prometheus is scraping exporters slowly since it exceeded the requested interval time. 
Your Prometheus server is under-provisioned.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusLargeScrape expr: increase(prometheus_target_scrapes_exceeded_sample_limit_total[10m]) > 10 for: 5m labels: severity: warning annotations: summary: Prometheus large scrape (instance {{ $labels.instance }}) description: "Prometheus has many scrapes that exceed the sample limit\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusTargetScrapeDuplicate expr: increase(prometheus_target_scrapes_sample_duplicate_timestamp_total[5m]) > 0 for: 0m labels: severity: warning annotations: summary: Prometheus target scrape duplicate (instance {{ $labels.instance }}) description: "Prometheus has many samples rejected due to duplicate timestamps but different values\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusTsdbCheckpointCreationFailures expr: increase(prometheus_tsdb_checkpoint_creations_failed_total[1m]) > 0 for: 0m labels: severity: critical annotations: summary: Prometheus TSDB checkpoint creation failures (instance {{ $labels.instance }}) description: "Prometheus encountered {{ $value }} checkpoint creation failures\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusTsdbCheckpointDeletionFailures expr: increase(prometheus_tsdb_checkpoint_deletions_failed_total[1m]) > 0 for: 0m labels: severity: critical annotations: summary: Prometheus TSDB checkpoint deletion failures (instance {{ $labels.instance }}) description: "Prometheus encountered {{ $value }} checkpoint deletion failures\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusTsdbCompactionsFailed expr: increase(prometheus_tsdb_compactions_failed_total[1m]) > 0 for: 0m labels: severity: critical annotations: summary: Prometheus TSDB compactions failed (instance {{ $labels.instance }}) description: "Prometheus encountered {{ $value }} TSDB compactions failures\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusTsdbHeadTruncationsFailed expr: increase(prometheus_tsdb_head_truncations_failed_total[1m]) > 0 for: 0m labels: severity: critical annotations: summary: Prometheus TSDB head truncations failed (instance {{ $labels.instance }}) description: "Prometheus encountered {{ $value }} TSDB head truncation failures\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusTsdbReloadFailures expr: increase(prometheus_tsdb_reloads_failures_total[1m]) > 0 for: 0m labels: severity: critical annotations: summary: Prometheus TSDB reload failures (instance {{ $labels.instance }}) description: "Prometheus encountered {{ $value }} TSDB reload failures\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusTsdbWalCorruptions expr: increase(prometheus_tsdb_wal_corruptions_total[1m]) > 0 for: 0m labels: severity: critical annotations: summary: Prometheus TSDB WAL corruptions (instance {{ $labels.instance }}) description: "Prometheus encountered {{ $value }} TSDB WAL corruptions\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusTsdbWalTruncationsFailed expr: increase(prometheus_tsdb_wal_truncations_failed_total[1m]) > 0 for: 0m labels: severity: critical annotations: summary: Prometheus TSDB WAL truncations failed (instance {{ $labels.instance }}) description: "Prometheus encountered {{ $value }} TSDB WAL truncation failures\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: PrometheusTimeserieCardinality expr: label_replace(count by(__name__) ({__name__=~".+"}), "name", "$1", "__name__", "(.+)") > 10000 for: 0m 
labels: severity: warning annotations: summary: Prometheus timeserie cardinality (instance {{ $labels.instance }}) description: "The \"{{ $labels.name }}\" timeserie cardinality is getting very high: {{ $value }}\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostOutOfMemory expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10 for: 2m labels: severity: warning annotations: summary: Host out of memory (instance {{ $labels.instance }}) description: "Node memory is filling up (< 10% left)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostMemoryUnderMemoryPressure expr: rate(node_vmstat_pgmajfault[1m]) > 1000 for: 2m labels: severity: warning annotations: summary: Host memory under memory pressure (instance {{ $labels.instance }}) description: "The node is under heavy memory pressure. High rate of major page faults\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostUnusualNetworkThroughputIn expr: sum by (instance) (rate(node_network_receive_bytes_total[2m])) / 1024 / 1024 > 100 for: 5m labels: severity: warning annotations: summary: Host unusual network throughput in (instance {{ $labels.instance }}) description: "Host network interfaces are probably receiving too much data (> 100 MB/s)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostUnusualNetworkThroughputOut expr: sum by (instance) (rate(node_network_transmit_bytes_total[2m])) / 1024 / 1024 > 100 for: 5m labels: severity: warning annotations: summary: Host unusual network throughput out (instance {{ $labels.instance }}) description: "Host network interfaces are probably sending too much data (> 100 MB/s)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostUnusualDiskReadRate expr: sum by (instance) (rate(node_disk_read_bytes_total[2m])) / 1024 / 1024 > 50 for: 5m labels: severity: warning annotations: summary: Host unusual disk read rate (instance {{ $labels.instance }}) description: "Disk is probably reading too much data (> 50 MB/s)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostUnusualDiskWriteRate expr: sum by (instance) (rate(node_disk_written_bytes_total[2m])) / 1024 / 1024 > 50 for: 2m labels: severity: warning annotations: summary: Host unusual disk write rate (instance {{ $labels.instance }}) description: "Disk is probably writing too much data (> 50 MB/s)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" # Please add ignored mountpoints in node_exporter parameters like # "--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|run)($|/)". # Same rule using "node_filesystem_free_bytes" will fire when disk fills for non-root users. - alert: HostOutOfDiskSpace expr: (node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes < 10 and ON (instance, device, mountpoint) node_filesystem_readonly == 0 for: 2m labels: severity: warning annotations: summary: Host out of disk space (instance {{ $labels.instance }}) description: "Disk is almost full (< 10% left)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" # Please add ignored mountpoints in node_exporter parameters like # "--collector.filesystem.ignored-mount-points=^/(sys|proc|dev|run)($|/)". # Same rule using "node_filesystem_free_bytes" will fire when disk fills for non-root users. 
- alert: HostDiskWillFillIn24Hours expr: (node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes < 10 and ON (instance, device, mountpoint) predict_linear(node_filesystem_avail_bytes{fstype!~"tmpfs"}[1h], 24 * 3600) < 0 and ON (instance, device, mountpoint) node_filesystem_readonly == 0 for: 2m labels: severity: warning annotations: summary: Host disk will fill in 24 hours (instance {{ $labels.instance }}) description: "Filesystem is predicted to run out of space within the next 24 hours at current write rate\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostOutOfInodes expr: node_filesystem_files_free / node_filesystem_files * 100 < 10 and ON (instance, device, mountpoint) node_filesystem_readonly == 0 for: 2m labels: severity: warning annotations: summary: Host out of inodes (instance {{ $labels.instance }}) description: "Disk is almost running out of available inodes (< 10% left)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostInodesWillFillIn24Hours expr: node_filesystem_files_free / node_filesystem_files * 100 < 10 and predict_linear(node_filesystem_files_free[1h], 24 * 3600) < 0 and ON (instance, device, mountpoint) node_filesystem_readonly == 0 for: 2m labels: severity: warning annotations: summary: Host inodes will fill in 24 hours (instance {{ $labels.instance }}) description: "Filesystem is predicted to run out of inodes within the next 24 hours at current write rate\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostUnusualDiskReadLatency expr: rate(node_disk_read_time_seconds_total[1m]) / rate(node_disk_reads_completed_total[1m]) > 0.1 and rate(node_disk_reads_completed_total[1m]) > 0 for: 2m labels: severity: warning annotations: summary: Host unusual disk read latency (instance {{ $labels.instance }}) description: "Disk latency is growing (read operations > 100ms)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostUnusualDiskWriteLatency expr: rate(node_disk_write_time_seconds_total[1m]) / rate(node_disk_writes_completed_total[1m]) > 0.1 and rate(node_disk_writes_completed_total[1m]) > 0 for: 2m labels: severity: warning annotations: summary: Host unusual disk write latency (instance {{ $labels.instance }}) description: "Disk latency is growing (write operations > 100ms)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostHighCpuLoad expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[2m])) * 100) > 80 for: 0m labels: severity: warning annotations: summary: Host high CPU load (instance {{ $labels.instance }}) description: "CPU load is > 80%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostCpuStealNoisyNeighbor expr: avg by(instance) (rate(node_cpu_seconds_total{mode="steal"}[5m])) * 100 > 10 for: 0m labels: severity: warning annotations: summary: Host CPU steal noisy neighbor (instance {{ $labels.instance }}) description: "CPU steal is > 10%. A noisy neighbor is killing VM performances or a spot instance may be out of credit.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostCpuHighIowait expr: avg by (instance) (rate(node_cpu_seconds_total{mode="iowait"}[5m])) * 100 > 5 for: 0m labels: severity: warning annotations: summary: Host CPU high iowait (instance {{ $labels.instance }}) description: "CPU iowait > 5%. A high iowait means that you are disk or network bound.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" # 1000 context switches is an arbitrary number. # Alert threshold depends on nature of application. 
# Please read: https://github.com/samber/awesome-prometheus-alerts/issues/58 - alert: HostContextSwitching expr: (rate(node_context_switches_total[5m])) / (count without(cpu, mode) (node_cpu_seconds_total{mode="idle"})) > 1000 for: 0m labels: severity: warning annotations: summary: Host context switching (instance {{ $labels.instance }}) description: "Context switching is growing on node (> 1000 / s)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostSwapIsFillingUp expr: (1 - (node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes)) * 100 > 80 for: 2m labels: severity: warning annotations: summary: Host swap is filling up (instance {{ $labels.instance }}) description: "Swap is filling up (>80%)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostSystemdServiceCrashed expr: node_systemd_unit_state{state="failed"} == 1 for: 0m labels: severity: warning annotations: summary: Host systemd service crashed (instance {{ $labels.instance }}) description: "systemd service crashed\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostPhysicalComponentTooHot expr: node_hwmon_temp_celsius > 75 for: 5m labels: severity: warning annotations: summary: Host physical component too hot (instance {{ $labels.instance }}) description: "Physical hardware component too hot\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostNodeOvertemperatureAlarm expr: node_hwmon_temp_crit_alarm_celsius == 1 for: 0m labels: severity: critical annotations: summary: Host node overtemperature alarm (instance {{ $labels.instance }}) description: "Physical node temperature alarm triggered\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostRaidArrayGotInactive expr: node_md_state{state="inactive"} > 0 for: 0m labels: severity: critical annotations: summary: Host RAID array got inactive (instance {{ $labels.instance }}) description: "RAID array {{ $labels.device }} is in degraded state due to one or more disks failures. Number of spare drives is insufficient to fix issue automatically.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostRaidDiskFailure expr: node_md_disks{state="failed"} > 0 for: 2m labels: severity: warning annotations: summary: Host RAID disk failure (instance {{ $labels.instance }}) description: "At least one device in RAID array on {{ $labels.instance }} failed. 
Array {{ $labels.md_device }} needs attention and possibly a disk swap\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostKernelVersionDeviations expr: count(sum(label_replace(node_uname_info, "kernel", "$1", "release", "([0-9]+.[0-9]+.[0-9]+).*")) by (kernel)) > 1 for: 6h labels: severity: warning annotations: summary: Host kernel version deviations (instance {{ $labels.instance }}) description: "Different kernel versions are running\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostOomKillDetected expr: increase(node_vmstat_oom_kill[1m]) > 0 for: 0m labels: severity: warning annotations: summary: Host OOM kill detected (instance {{ $labels.instance }}) description: "OOM kill detected\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostEdacCorrectableErrorsDetected expr: increase(node_edac_correctable_errors_total[1m]) > 0 for: 0m labels: severity: info annotations: summary: Host EDAC Correctable Errors detected (instance {{ $labels.instance }}) description: "Host {{ $labels.instance }} has had {{ printf \"%.0f\" $value }} correctable memory errors reported by EDAC in the last 5 minutes.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostEdacUncorrectableErrorsDetected expr: node_edac_uncorrectable_errors_total > 0 for: 0m labels: severity: warning annotations: summary: Host EDAC Uncorrectable Errors detected (instance {{ $labels.instance }}) description: "Host {{ $labels.instance }} has had {{ printf \"%.0f\" $value }} uncorrectable memory errors reported by EDAC in the last 5 minutes.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostNetworkReceiveErrors expr: rate(node_network_receive_errs_total[2m]) / rate(node_network_receive_packets_total[2m]) > 0.01 for: 2m labels: severity: warning annotations: summary: Host Network Receive Errors (instance {{ $labels.instance }}) description: "Host {{ $labels.instance }} interface {{ $labels.device }} has encountered {{ printf \"%.0f\" $value }} receive errors in the last two minutes.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostNetworkTransmitErrors expr: rate(node_network_transmit_errs_total[2m]) / rate(node_network_transmit_packets_total[2m]) > 0.01 for: 2m labels: severity: warning annotations: summary: Host Network Transmit Errors (instance {{ $labels.instance }}) description: "Host {{ $labels.instance }} interface {{ $labels.device }} has encountered {{ printf \"%.0f\" $value }} transmit errors in the last two minutes.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostNetworkInterfaceSaturated expr: (rate(node_network_receive_bytes_total{device!~"^tap.*|^vnet.*|^veth.*|^tun.*"}[1m]) + rate(node_network_transmit_bytes_total{device!~"^tap.*|^vnet.*|^veth.*|^tun.*"}[1m])) / node_network_speed_bytes{device!~"^tap.*|^vnet.*|^veth.*|^tun.*"} > 0.8 < 10000 for: 1m labels: severity: warning annotations: summary: Host Network Interface Saturated (instance {{ $labels.instance }}) description: "The network interface \"{{ $labels.device }}\" on \"{{ $labels.instance }}\" is getting overloaded.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostNetworkBondDegraded expr: (node_bonding_active - node_bonding_slaves) != 0 for: 2m labels: severity: warning annotations: summary: Host Network Bond Degraded (instance {{ $labels.instance }}) description: "Bond \"{{ $labels.device }}\" degraded on \"{{ $labels.instance }}\".\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostConntrackLimit expr: node_nf_conntrack_entries / node_nf_conntrack_entries_limit > 
0.8 for: 5m labels: severity: warning annotations: summary: Host conntrack limit (instance {{ $labels.instance }}) description: "The number of conntrack is approaching limit\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostClockSkew expr: (node_timex_offset_seconds > 0.05 and deriv(node_timex_offset_seconds[5m]) >= 0) or (node_timex_offset_seconds < -0.05 and deriv(node_timex_offset_seconds[5m]) <= 0) for: 2m labels: severity: warning annotations: summary: Host clock skew (instance {{ $labels.instance }}) description: "Clock skew detected. Clock is out of sync. Ensure NTP is configured correctly on this host.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostClockNotSynchronising expr: min_over_time(node_timex_sync_status[1m]) == 0 and node_timex_maxerror_seconds >= 16 for: 2m labels: severity: warning annotations: summary: Host clock not synchronising (instance {{ $labels.instance }}) description: "Clock not synchronising. Ensure NTP is configured on this host.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: HostRequiresReboot expr: node_reboot_required > 0 for: 4h labels: severity: info annotations: summary: Host requires reboot (instance {{ $labels.instance }}) description: "{{ $labels.instance }} requires a reboot.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesNodeReady expr: kube_node_status_condition{condition="Ready",status="true"} == 0 for: 10m labels: severity: critical annotations: summary: Kubernetes Node ready (instance {{ $labels.instance }}) description: "Node {{ $labels.node }} has been unready for a long time\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesMemoryPressure expr: kube_node_status_condition{condition="MemoryPressure",status="true"} == 1 for: 2m labels: severity: critical annotations: summary: Kubernetes memory pressure (instance {{ $labels.instance }}) description: "{{ $labels.node }} has MemoryPressure condition\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesDiskPressure expr: kube_node_status_condition{condition="DiskPressure",status="true"} == 1 for: 2m labels: severity: critical annotations: summary: Kubernetes disk pressure (instance {{ $labels.instance }}) description: "{{ $labels.node }} has DiskPressure condition\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesNetworkUnavailable expr: kube_node_status_condition{condition="NetworkUnavailable",status="true"} == 1 for: 2m labels: severity: critical annotations: summary: Kubernetes network unavailable (instance {{ $labels.instance }}) description: "{{ $labels.node }} has NetworkUnavailable condition\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesOutOfCapacity expr: sum by (node) ((kube_pod_status_phase{phase="Running"} == 1) + on(uid) group_left(node) (0 * kube_pod_info{pod_template_hash=""})) / sum by (node) (kube_node_status_allocatable{resource="pods"}) * 100 > 90 for: 2m labels: severity: warning annotations: summary: Kubernetes out of capacity (instance {{ $labels.instance }}) description: "{{ $labels.node }} is out of capacity\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesContainerOomKiller expr: (kube_pod_container_status_restarts_total - kube_pod_container_status_restarts_total offset 10m >= 1) and ignoring (reason) min_over_time(kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}[10m]) == 1 for: 0m labels: severity: warning annotations: summary: Kubernetes container oom killer (instance {{ $labels.instance }}) description: 
"Container {{ $labels.container }} in pod {{ $labels.namespace }}/{{ $labels.pod }} has been OOMKilled {{ $value }} times in the last 10 minutes.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesJobFailed expr: kube_job_status_failed > 0 for: 0m labels: severity: warning annotations: summary: Kubernetes Job failed (instance {{ $labels.instance }}) description: "Job {{$labels.namespace}}/{{$labels.exported_job}} failed to complete\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesCronjobSuspended expr: kube_cronjob_spec_suspend != 0 for: 0m labels: severity: warning annotations: summary: Kubernetes CronJob suspended (instance {{ $labels.instance }}) description: "CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} is suspended\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesPersistentvolumeclaimPending expr: kube_persistentvolumeclaim_status_phase{phase="Pending"} == 1 for: 2m labels: severity: warning annotations: summary: Kubernetes PersistentVolumeClaim pending (instance {{ $labels.instance }}) description: "PersistentVolumeClaim {{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is pending\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesVolumeOutOfDiskSpace expr: kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes * 100 < 10 for: 2m labels: severity: warning annotations: summary: Kubernetes Volume out of disk space (instance {{ $labels.instance }}) description: "Volume is almost full (< 10% left)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesVolumeFullInFourDays expr: predict_linear(kubelet_volume_stats_available_bytes[6h], 4 * 24 * 3600) < 0 for: 0m labels: severity: critical annotations: summary: Kubernetes Volume full in four days (instance {{ $labels.instance }}) description: "{{ $labels.namespace }}/{{ $labels.persistentvolumeclaim }} is expected to fill up within four days. 
Currently {{ $value | humanize }}% is available.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesPersistentvolumeError expr: kube_persistentvolume_status_phase{phase=~"Failed|Pending", job="kube-state-metrics"} > 0 for: 0m labels: severity: critical annotations: summary: Kubernetes PersistentVolume error (instance {{ $labels.instance }}) description: "Persistent volume is in bad state\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesStatefulsetDown expr: (kube_statefulset_status_replicas_ready / kube_statefulset_status_replicas_current) != 1 for: 1m labels: severity: critical annotations: summary: Kubernetes StatefulSet down (instance {{ $labels.instance }}) description: "A StatefulSet went down\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesHpaScalingAbility expr: kube_horizontalpodautoscaler_status_condition{status="false", condition="AbleToScale"} == 1 for: 2m labels: severity: warning annotations: summary: Kubernetes HPA scaling ability (instance {{ $labels.instance }}) description: "Pod is unable to scale\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesHpaMetricAvailability expr: kube_horizontalpodautoscaler_status_condition{status="false", condition="ScalingActive"} == 1 for: 0m labels: severity: warning annotations: summary: Kubernetes HPA metric availability (instance {{ $labels.instance }}) description: "HPA is not able to collect metrics\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesHpaScaleCapability expr: kube_horizontalpodautoscaler_status_desired_replicas >= kube_horizontalpodautoscaler_spec_max_replicas for: 2m labels: severity: info annotations: summary: Kubernetes HPA scale capability (instance {{ $labels.instance }}) description: "The maximum number of desired Pods has been hit\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesPodNotHealthy expr: min_over_time(sum by (namespace, pod) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"})[15m:1m]) > 0 for: 0m labels: severity: critical annotations: summary: Kubernetes Pod not healthy (instance {{ $labels.instance }}) description: "Pod has been in a non-ready state for longer than 15 minutes.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesPodCrashLooping expr: increase(kube_pod_container_status_restarts_total[1m]) > 3 for: 2m labels: severity: warning annotations: summary: Kubernetes pod crash looping (instance {{ $labels.instance }}) description: "Pod {{ $labels.pod }} is crash looping\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesReplicassetMismatch expr: kube_replicaset_spec_replicas != kube_replicaset_status_ready_replicas for: 10m labels: severity: warning annotations: summary: Kubernetes ReplicasSet mismatch (instance {{ $labels.instance }}) description: "Deployment Replicas mismatch\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesDeploymentReplicasMismatch expr: kube_deployment_spec_replicas != kube_deployment_status_replicas_available for: 10m labels: severity: warning annotations: summary: Kubernetes Deployment replicas mismatch (instance {{ $labels.instance }}) description: "Deployment Replicas mismatch\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesStatefulsetReplicasMismatch expr: kube_statefulset_status_replicas_ready != kube_statefulset_status_replicas for: 10m labels: severity: warning annotations: summary: Kubernetes StatefulSet replicas mismatch (instance {{ $labels.instance }}) description: "A 
StatefulSet does not match the expected number of replicas.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesDeploymentGenerationMismatch expr: kube_deployment_status_observed_generation != kube_deployment_metadata_generation for: 10m labels: severity: critical annotations: summary: Kubernetes Deployment generation mismatch (instance {{ $labels.instance }}) description: "A Deployment has failed but has not been rolled back.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesStatefulsetGenerationMismatch expr: kube_statefulset_status_observed_generation != kube_statefulset_metadata_generation for: 10m labels: severity: critical annotations: summary: Kubernetes StatefulSet generation mismatch (instance {{ $labels.instance }}) description: "A StatefulSet has failed but has not been rolled back.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesStatefulsetUpdateNotRolledOut expr: max without (revision) (kube_statefulset_status_current_revision unless kube_statefulset_status_update_revision) * (kube_statefulset_replicas != kube_statefulset_status_replicas_updated) for: 10m labels: severity: warning annotations: summary: Kubernetes StatefulSet update not rolled out (instance {{ $labels.instance }}) description: "StatefulSet update has not been rolled out.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesDaemonsetRolloutStuck expr: kube_daemonset_status_number_ready / kube_daemonset_status_desired_number_scheduled * 100 < 100 or kube_daemonset_status_desired_number_scheduled - kube_daemonset_status_current_number_scheduled > 0 for: 10m labels: severity: warning annotations: summary: Kubernetes DaemonSet rollout stuck (instance {{ $labels.instance }}) description: "Some Pods of DaemonSet are not scheduled or not ready\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesDaemonsetMisscheduled expr: kube_daemonset_status_number_misscheduled > 0 for: 1m labels: severity: critical annotations: summary: Kubernetes DaemonSet misscheduled (instance {{ $labels.instance }}) description: "Some DaemonSet Pods are running where they are not supposed to run\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesCronjobTooLong expr: time() - kube_cronjob_next_schedule_time > 3600 for: 0m labels: severity: warning annotations: summary: Kubernetes CronJob too long (instance {{ $labels.instance }}) description: "CronJob {{ $labels.namespace }}/{{ $labels.cronjob }} is taking more than 1h to complete.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesJobSlowCompletion expr: kube_job_spec_completions - kube_job_status_succeeded > 0 for: 12h labels: severity: critical annotations: summary: Kubernetes job slow completion (instance {{ $labels.instance }}) description: "Kubernetes Job {{ $labels.namespace }}/{{ $labels.job_name }} did not complete in time.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesApiServerErrors expr: sum(rate(apiserver_request_total{job="apiserver",code=~"^(?:5..)$"}[1m])) / sum(rate(apiserver_request_total{job="apiserver"}[1m])) * 100 > 3 for: 2m labels: severity: critical annotations: summary: Kubernetes API server errors (instance {{ $labels.instance }}) description: "Kubernetes API server is experiencing high error rate\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesApiClientErrors expr: (sum(rate(rest_client_requests_total{code=~"(4|5).."}[1m])) by (instance, job) / sum(rate(rest_client_requests_total[1m])) by (instance, job)) * 100 
> 1 for: 2m labels: severity: critical annotations: summary: Kubernetes API client errors (instance {{ $labels.instance }}) description: "Kubernetes API client is experiencing high error rate\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesClientCertificateExpiresNextWeek expr: apiserver_client_certificate_expiration_seconds_count{job="apiserver"} > 0 and histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m]))) < 7*24*60*60 for: 0m labels: severity: warning annotations: summary: Kubernetes client certificate expires next week (instance {{ $labels.instance }}) description: "A client certificate used to authenticate to the apiserver is expiring next week.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesClientCertificateExpiresSoon expr: apiserver_client_certificate_expiration_seconds_count{job="apiserver"} > 0 and histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m]))) < 24*60*60 for: 0m labels: severity: critical annotations: summary: Kubernetes client certificate expires soon (instance {{ $labels.instance }}) description: "A client certificate used to authenticate to the apiserver is expiring in less than 24.0 hours.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" - alert: KubernetesApiServerLatency expr: histogram_quantile(0.99, sum(rate(apiserver_request_latencies_bucket{subresource!="log",verb!~"^(?:CONNECT|WATCHLIST|WATCH|PROXY)$"} [10m])) WITHOUT (instance, resource)) / 1e+06 > 1 for: 2m labels: severity: warning annotations: summary: Kubernetes API server latency (instance {{ $labels.instance }}) description: "Kubernetes API server has a 99th percentile latency of {{ $value }} seconds for {{ $labels.verb }} {{ $labels.resource }}.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" root@cby:~# 设置开机自启root@cby:~# vim /lib/systemd/system/alertmanager.service root@cby:~# cat /lib/systemd/system/alertmanager.service [Unit] Description=Alertmanager for Prometheus After=network-online.target [Service] Type=simple ExecStart=/alertmanager/alertmanager --config.file=/alertmanager/alertmanager.yml Restart=on-failur ExecStop=/bin/kill -9 $MAINPID [Install] WantedBy=multi-user.target root@cby:~# systemctl daemon-reload root@cby:~# root@cby:~# systemctl enable --now alertmanager.service Created symlink /etc/systemd/system/multi-user.target.wants/prometheus.service → /lib/systemd/system/prometheus.service. root@cby:~# root@cby:~# systemctl status alertmanager.service 访问地址# Node http://10.0.0.2:9090/ # Grafana http://10.0.0.2:3000/ # Prometheus http://10.0.0.2:9200/ # Altermanager http://10.0.0.2:9093/Grafana 链接 Prometheus异常记录邮件内容关于https://www.oiox.cn/https://www.oiox.cn/index.php/start-page.htmlCSDN、GitHub、51CTO、知乎、开源中国、思否、掘金、简书、华为云、阿里云、腾讯云、哔哩哔哩、今日头条、新浪微博、个人博客全网可搜《小陈运维》文章主要发布于微信公众号
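在启动 Prometheus 与 Alertmanager 之前,可以先做一次配置语法校验,避免服务拉起后才发现规则写错。下面是一个最小示例(promtool、amtool 分别随 Prometheus、Alertmanager 的发行包自带;二进制与规则文件的路径均为假设值,请按自己的解压目录调整):
# 校验告警规则文件(/prometheus/rules.yml 为示例路径,非本文固定路径)
/prometheus/promtool check rules /prometheus/rules.yml
# 校验 Alertmanager 配置,路径取自上文 systemd 单元中的 --config.file 参数
/alertmanager/amtool check-config /alertmanager/alertmanager.yml
# 服务启动后,可通过就绪接口快速确认 Alertmanager 是否正常
curl -s http://127.0.0.1:9093/-/ready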
2022年11月12日
830 阅读
0 评论
1 点赞
2022-11-09
在Ubuntu中安装Samba文件服务
在Ubuntu中安装Samba文件服务安装 samba 服务root@v:~# apt install samba samba-common root@v:~# 创建共享目录root@v:~# mkdir /cby/smb/ -pv root@v:~# chmod 777 -R /cby/smb/ root@v:~# 修改配置文件 # 编写配置文件 实现匿名访问 [share] path = /cby/smb public = yes read only = no guest ok = Yes create mask = 0644 force create mode = 0644 directory mask = 0755 force directory mode = 0755 available = yes # 完整配置如下 root@v:~# vim /etc/samba/smb.conf root@v:~# cat /etc/samba/smb.conf # # Sample configuration file for the Samba suite for Debian GNU/Linux. # # # This is the main Samba configuration file. You should read the # smb.conf(5) manual page in order to understand the options listed # here. Samba has a huge number of configurable options most of which # are not shown in this example # # Some options that are often worth tuning have been included as # commented-out examples in this file. # - When such options are commented with ";", the proposed setting # differs from the default Samba behaviour # - When commented with "#", the proposed setting is the default # behaviour of Samba but the option is considered important # enough to be mentioned here # # NOTE: Whenever you modify this file you should run the command # "testparm" to check that you have not made any basic syntactic # errors. #======================= Global Settings ======================= [global] ## Browsing/Identification ### # Change this to the workgroup/NT-domain name your Samba server will part of workgroup = WORKGROUP #### Networking #### # The specific set of interfaces / networks to bind to # This can be either the interface name or an IP address/netmask; # interface names are normally preferred ; interfaces = 127.0.0.0/8 eth0 # Only bind to the named interfaces and/or networks; you must use the # 'interfaces' option above to use this. # It is recommended that you enable this feature if your Samba machine is # not protected by a firewall or is a firewall itself. However, this # option cannot handle dynamic or non-broadcast interfaces correctly. ; bind interfaces only = yes #### Debugging/Accounting #### # This tells Samba to use a separate log file for each machine # that connects log file = /var/log/samba/log.%m # Cap the size of the individual log files (in KiB). max log size = 1000 # We want Samba to only log to /var/log/samba/log.{smbd,nmbd}. # Append syslog@1 if you want important messages to be sent to syslog too. logging = file # Do something sensible when Samba crashes: mail the admin a backtrace panic action = /usr/share/samba/panic-action %d ####### Authentication ####### # Server role. Defines in which mode Samba will operate. Possible # values are "standalone server", "member server", "classic primary # domain controller", "classic backup domain controller", "active # directory domain controller". # # Most people will want "standalone server" or "member server". # Running as "active directory domain controller" will require first # running "samba-tool domain provision" to wipe databases and create a # new domain. server role = standalone server obey pam restrictions = yes # This boolean parameter controls whether Samba attempts to sync the Unix # password with the SMB password when the encrypted SMB password in the # passdb is changed. unix password sync = yes # For Unix password sync to work on a Debian GNU/Linux system, the following # parameters must be set (thanks to Ian Kahan <<kahan@informatik.tu-muenchen.de> for # sending the correct chat script for the passwd program in Debian Sarge). 
passwd program = /usr/bin/passwd %u passwd chat = *Enter\snew\s*\spassword:* %n\n *Retype\snew\s*\spassword:* %n\n *password\supdated\ssuccessfully* . # This boolean controls whether PAM will be used for password changes # when requested by an SMB client instead of the program listed in # 'passwd program'. The default is 'no'. pam password change = yes # This option controls how unsuccessful authentication attempts are mapped # to anonymous connections map to guest = bad user ########## Domains ########### # # The following settings only takes effect if 'server role = classic # primary domain controller', 'server role = classic backup domain controller' # or 'domain logons' is set # # It specifies the location of the user's # profile directory from the client point of view) The following # required a [profiles] share to be setup on the samba server (see # below) ; logon path = \\%N\profiles\%U # Another common choice is storing the profile in the user's home directory # (this is Samba's default) # logon path = \\%N\%U\profile # The following setting only takes effect if 'domain logons' is set # It specifies the location of a user's home directory (from the client # point of view) ; logon drive = H: # logon home = \\%N\%U # The following setting only takes effect if 'domain logons' is set # It specifies the script to run during logon. The script must be stored # in the [netlogon] share # NOTE: Must be store in 'DOS' file format convention ; logon script = logon.cmd # This allows Unix users to be created on the domain controller via the SAMR # RPC pipe. The example command creates a user account with a disabled Unix # password; please adapt to your needs ; add user script = /usr/sbin/adduser --quiet --disabled-password --gecos "" %u # This allows machine accounts to be created on the domain controller via the # SAMR RPC pipe. # The following assumes a "machines" group exists on the system ; add machine script = /usr/sbin/useradd -g machines -c "%u machine account" -d /var/lib/samba -s /bin/false %u # This allows Unix groups to be created on the domain controller via the SAMR # RPC pipe. ; add group script = /usr/sbin/addgroup --force-badname %g ############ Misc ############ # Using the following line enables you to customise your configuration # on a per machine basis. The %m gets replaced with the netbios name # of the machine that is connecting ; include = /home/samba/etc/smb.conf.%m # Some defaults for winbind (make sure you're not using the ranges # for something else.) ; idmap config * : backend = tdb ; idmap config * : range = 3000-7999 ; idmap config YOURDOMAINHERE : backend = tdb ; idmap config YOURDOMAINHERE : range = 100000-999999 ; template shell = /bin/bash # Setup usershare options to enable non-root users to share folders # with the net usershare command. # Maximum number of usershare. 0 means that usershare is disabled. # usershare max shares = 100 # Allow users who've been granted usershare privileges to create # public shares, not just authenticated ones usershare allow guests = yes #======================= Share Definitions ======================= [homes] comment = Home Directories browseable = no # By default, the home directories are exported read-only. Change the # next parameter to 'no' if you want to be able to write to them. read only = yes # File creation mask is set to 0700 for security reasons. If you want to # create files with group=rw permissions, set next parameter to 0775. create mask = 0700 # Directory creation mask is set to 0700 for security reasons. 
If you want to # create dirs. with group=rw permissions, set next parameter to 0775. directory mask = 0700 # By default, \\server\username shares can be connected to by anyone # with access to the samba server. # The following parameter makes sure that only "username" can connect # to \\server\username # This might need tweaking when using external authentication schemes valid users = %S # Un-comment the following and create the netlogon directory for Domain Logons # (you need to configure Samba to act as a domain controller too.) ;[netlogon] ; comment = Network Logon Service ; path = /home/samba/netlogon ; guest ok = yes ; read only = yes # Un-comment the following and create the profiles directory to store # users profiles (see the "logon path" option above) # (you need to configure Samba to act as a domain controller too.) # The path below should be writable by all users so that their # profile directory may be created the first time they log on ;[profiles] ; comment = Users profiles ; path = /home/samba/profiles ; guest ok = no ; browseable = no ; create mask = 0600 ; directory mask = 0700 [printers] comment = All Printers browseable = no path = /var/spool/samba printable = yes guest ok = no read only = yes create mask = 0700 # Windows clients look for this share name as a source of downloadable # printer drivers [print$] comment = Printer Drivers path = /var/lib/samba/printers browseable = yes read only = yes guest ok = no # Uncomment to allow remote administration of Windows print drivers. # You may need to replace 'lpadmin' with the name of the group your # admin users are members of. # Please note that you also need to set appropriate Unix permissions # to the drivers directory for these users to have write rights in it ; write list = root, @lpadmin [share] path = /cby/smb public = yes read only = no guest ok = Yes create mask = 0644 force create mode = 0644 directory mask = 0755 force directory mode = 0755 available = yes root@v:~# 重启服务root@v:~# systemctl restart smbd root@v:~# 关于https://www.oiox.cn/https://www.oiox.cn/index.php/start-page.htmlCSDN、GitHub、51CTO、知乎、开源中国、思否、掘金、简书、华为云、阿里云、腾讯云、哔哩哔哩、今日头条、新浪微博、个人博客全网可搜《小陈运维》文章主要发布于微信公众号
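重启 smbd 之后,可以顺手验证配置与匿名访问是否生效,下面是一个简单示例(testparm 随 Samba 自带;smbclient 可能需要额外安装,命令仅供参考):
# 检查 smb.conf 语法,-s 表示不交互、直接输出生效配置
testparm -s
# 安装客户端工具(如尚未安装)
apt install smbclient
# 匿名列出共享,应能看到上文定义的 share
smbclient -L //127.0.0.1 -N
# 匿名进入 share 共享并列出文件
smbclient //127.0.0.1/share -N -c 'ls'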
2022年11月09日
860 阅读
0 评论
0 点赞
2022-10-28
OpenWRT实现NAT64/DNS64
OpenWRT实现NAT64/DNS64连接到核心路由器# 连接到核心路由器 [C:\~]$ ssh root@10.0.0.1 Connecting to 10.0.0.1:22... Connection established. To escape to local shell, press 'Ctrl+Alt+]'. WARNING! The remote SSH server rejected X11 forwarding request. BusyBox v1.35.0 (2022-10-23 20:45:02 UTC) built-in shell (ash) _______ ________ __ | |.-----.-----.-----.| | | |.----.| |_ | - || _ | -__| || | | || _|| _| |_______|| __|_____|__|__||________||__| |____| |__| W I R E L E S S F R E E D O M ----------------------------------------------------- OpenWrt 22.03.0, r19685-512e76967f ----------------------------------------------------- root@OpenWrt:~# root@OpenWrt:~# 测试访问IPv6是否正常# 测试访问IPv6是否正常 root@OpenWrt:~# ping www.oiox.cn -6 PING www.oiox.cn (2409:8c44:2:160:50::): 56 data bytes 64 bytes from 2409:8c44:2:160:50::: seq=0 ttl=56 time=23.455 ms 64 bytes from 2409:8c44:2:160:50::: seq=1 ttl=56 time=22.949 ms 64 bytes from 2409:8c44:2:160:50::: seq=2 ttl=56 time=23.338 ms 64 bytes from 2409:8c44:2:160:50::: seq=3 ttl=56 time=23.695 ms ^C --- www.oiox.cn ping statistics --- 4 packets transmitted, 4 packets received, 0% packet loss round-trip min/avg/max = 22.949/23.359/23.695 ms安装tayga实现NAT64# 安装tayga实现NAT64 root@OpenWrt:~# opkg update root@OpenWrt:~# opkg install tayga配置/etc/config/network文件# 配置/etc/config/network文件 # 重点配置 globals 和 interface 'nat64' config globals 'globals' option ula_prefix 'ddbe:48ec:56c6::/48' config interface 'nat64' option proto 'tayga' option ifname 'tayga-nat64' option ipv4_addr '192.168.1.1' option prefix 'ddbe:48ec:56c6:1111::/96' option dynamic_pool '192.168.1.0/24' option accept_ra '0' option send_rs '0' # 完整配置如下 root@OpenWrt:~# vim /etc/config/network root@OpenWrt:~# cat /etc/config/network config interface 'loopback' option device 'lo' option proto 'static' option ipaddr '127.0.0.1' option netmask '255.0.0.0' config globals 'globals' option ula_prefix 'ddbe:48ec:56c6::/48' config device option name 'br-lan' option type 'bridge' list ports 'eth0' list ports 'eth1' list ports 'eth2' config interface 'lan' option device 'br-lan' option proto 'static' option ipaddr '10.0.0.1' option netmask '255.0.0.0' option ip6assign '64' config interface 'wan' option proto 'dhcp' option device 'eth3' config interface 'wan6' option proto 'dhcpv6' option device 'eth3' option reqaddress 'try' option reqprefix 'auto' config interface 'nat64' option proto 'tayga' option ifname 'tayga-nat64' option ipv4_addr '192.168.1.1' option prefix 'ddbe:48ec:56c6:1111::/96' option dynamic_pool '192.168.1.0/24' option accept_ra '0' option send_rs '0' root@OpenWrt:~# 配置/etc/config/firewall# 配置/etc/config/firewall config zone option name 'lan' list network 'lan' option input 'ACCEPT' option output 'ACCEPT' option forward 'ACCEPT' # 完整配置如下 root@OpenWrt:~# vim /etc/config/firewall root@OpenWrt:~# cat /etc/config/firewall config defaults option input 'ACCEPT' option output 'ACCEPT' option synflood_protect '1' option forward 'ACCEPT' config zone option name 'lan' list network 'lan' option input 'ACCEPT' option output 'ACCEPT' option forward 'ACCEPT' config zone option name 'wan' list network 'wan' list network 'wan6' list network 'nat64' option input 'ACCEPT' option output 'ACCEPT' option forward 'ACCEPT' option masq '1' option mtu_fix '1' config forwarding option src 'lan' option dest 'wan' config rule option target 'ACCEPT' option name 'IPv' option src '*' option dest '*' config rule option name 'Allow-DHCP-Renew' option src 'wan' option proto 'udp' option dest_port '68' option target 'ACCEPT' option family 'ipv4' config rule option 
name 'Allow-Ping' option src 'wan' option proto 'icmp' option icmp_type 'echo-request' option family 'ipv4' option target 'ACCEPT' config rule option name 'Allow-IGMP' option src 'wan' option proto 'igmp' option family 'ipv4' option target 'ACCEPT' config rule option name 'Allow-DHCPv6' option src 'wan' option proto 'udp' option dest_port '546' option family 'ipv6' option target 'ACCEPT' config rule option name 'Allow-MLD' option src 'wan' option proto 'icmp' option src_ip 'fe80::/10' list icmp_type '130/0' list icmp_type '131/0' list icmp_type '132/0' list icmp_type '143/0' option family 'ipv6' option target 'ACCEPT' config rule option name 'Allow-ICMPv6-Input' option src 'wan' option proto 'icmp' list icmp_type 'echo-request' list icmp_type 'echo-reply' list icmp_type 'destination-unreachable' list icmp_type 'packet-too-big' list icmp_type 'time-exceeded' list icmp_type 'bad-header' list icmp_type 'unknown-header-type' list icmp_type 'router-solicitation' list icmp_type 'neighbour-solicitation' list icmp_type 'router-advertisement' list icmp_type 'neighbour-advertisement' option limit '1000/sec' option family 'ipv6' option target 'ACCEPT' config rule option name 'Allow-ICMPv6-Forward' option src 'wan' option dest '*' option proto 'icmp' list icmp_type 'echo-request' list icmp_type 'echo-reply' list icmp_type 'destination-unreachable' list icmp_type 'packet-too-big' list icmp_type 'time-exceeded' list icmp_type 'bad-header' list icmp_type 'unknown-header-type' option limit '1000/sec' option family 'ipv6' option target 'ACCEPT' config rule option name 'Allow-IPSec-ESP' option src 'wan' option dest 'lan' option proto 'esp' option target 'ACCEPT' config rule option name 'Allow-ISAKMP' option src 'wan' option dest 'lan' option dest_port '500' option proto 'udp' option target 'ACCEPT' root@OpenWrt:~# 重启network与firewall# 重启network与firewall root@OpenWrt:~# /etc/init.d/network restart root@OpenWrt:~# /etc/init.d/firewall restart测试tayga功能# 测试tayga功能 root@OpenWrt:~# ping -6 ddbe:48ec:56c6:1111::8.8.8.8 PING ddbe:48ec:56c6:1111::8.8.8.8 (ddbe:48ec:56c6:1111::808:808): 56 data bytes 64 bytes from ddbe:48ec:56c6:1111::808:808: seq=0 ttl=51 time=57.846 ms 64 bytes from ddbe:48ec:56c6:1111::808:808: seq=1 ttl=51 time=58.418 ms 64 bytes from ddbe:48ec:56c6:1111::808:808: seq=2 ttl=51 time=57.077 ms 64 bytes from ddbe:48ec:56c6:1111::808:808: seq=3 ttl=51 time=57.571 ms ^C --- ddbe:48ec:56c6:1111::8.8.8.8 ping statistics --- 4 packets transmitted, 4 packets received, 0% packet loss round-trip min/avg/max = 57.077/57.728/58.418 ms root@OpenWrt:~# root@OpenWrt:~# root@OpenWrt:~# ping -6 ddbe:48ec:56c6:1111::1.1.1.1 PING ddbe:48ec:56c6:1111::1.1.1.1 (ddbe:48ec:56c6:1111::101:101): 56 data bytes 64 bytes from ddbe:48ec:56c6:1111::101:101: seq=0 ttl=50 time=212.821 ms 64 bytes from ddbe:48ec:56c6:1111::101:101: seq=1 ttl=50 time=212.753 ms 64 bytes from ddbe:48ec:56c6:1111::101:101: seq=2 ttl=50 time=212.087 ms 64 bytes from ddbe:48ec:56c6:1111::101:101: seq=3 ttl=50 time=212.161 ms ^C --- ddbe:48ec:56c6:1111::1.1.1.1 ping statistics --- 4 packets transmitted, 4 packets received, 0% packet loss round-trip min/avg/max = 212.087/212.455/212.821 ms root@OpenWrt:~# 配置 bind-server 实现DNS64# 配置 bind-server 实现DNS64 root@OpenWrt:~# opkg install bind-server root@OpenWrt:~# root@OpenWrt:~# opkg install bind-rndc root@OpenWrt:~# Bind是Tayga官方最推荐的DNS软件,因此接下就使用Bind来配置DNS64功能。Bind的配置项有很多,好在官方给出了详细的https://downloads.isc.org/isc/bind9/9.16.7/doc/arm/html/reference.html#options-statement-grammarBind的配置需要修改 /etc/bind/named.conf 
文件。对于DNS64来说,主要关注 forwarders 、dns64 、 dnssec-validation 这几个字段。forwarders 用来表明要把Bind作为转发器来用,在 forwarders 里面指定要将收到的DNS请求转发给那些外部的DNS服务器。dns64 这个字段需要指定在tayga中配置的NAT64前缀(这里的前缀可以有多个),并且其下面还有许多配置项。 clients 用来指定客户端ACL,来决定哪些客户端会受到DNS64的影响,默认为 any ;mapped 用来指定哪些IPv4地址要进行DNS64转换,默认为 any ; exclude 用来指定哪些出现在AAAA记录中的IPv6地址要被忽略,默认是 ::ffff:0.0.0.0/96 。dnssec-validation 用来指定是否启用DNSSEC验证。 dnssec-enable 已被废除,在这里不起作用。完整配置如下# 完整配置如下 root@OpenWrt:~# vim /etc/bind/named.conf root@OpenWrt:~# cat /etc/bind/named.conf // This is the primary configuration file for the BIND DNS server named. options { directory "/tmp"; // If your ISP provided one or more IP addresses for stable // nameservers, you probably want to use them as forwarders. // Uncomment the following block, and insert the addresses replacing // the all-0's placeholder. listen-on port 53 { any; }; listen-on-v6 port 53 { any; }; allow-query { any; }; allow-query-cache { any; }; recursion yes; allow-recursion { any; }; forwarders { // 0.0.0.0; 202.106.46.151; 202.106.0.20; //114.114.114.114; //8.8.8.8; }; dns64 ddbe:48ec:56c6:1111::/96 { clients { any; }; mapped { any; }; exclude { ddbe:48ec:56c6:1111::/96; ::ffff:0000:0000/96; }; suffix ::; }; dnssec-validation no; auth-nxdomain no; # conform to RFC1035 }; include "/etc/bind/named-rndc.conf"; include "/tmp/bind/named.conf.local"; // prime the server with knowledge of the root servers zone "." { type hint; file "/etc/bind/db.root"; }; // be authoritative for the localhost forward and reverse zones, and for // broadcast zones as per RFC 1912 zone "localhost" { type master; file "/etc/bind/db.local"; }; zone "127.in-addr.arpa" { type master; file "/etc/bind/db.127"; }; zone "0.in-addr.arpa" { type master; file "/etc/bind/db.0"; }; zone "255.in-addr.arpa" { type master; file "/etc/bind/db.255"; }; root@OpenWrt:~# # 重新DNS服务 # 关闭默认dnsmasq # 启用新安装named root@OpenWrt:~# service dnsmasq stop root@OpenWrt:~# service named start root@OpenWrt:~# 测试NAT64使用测试DNS64使用关于https://www.oiox.cn/https://www.oiox.cn/index.php/start-page.htmlCSDN、GitHub、51CTO、知乎、开源中国、思否、掘金、简书、华为云、阿里云、腾讯云、哔哩哔哩、今日头条、新浪微博、个人博客全网可搜《小陈运维》文章主要发布于微信公众号
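原文的"测试NAT64使用""测试DNS64使用"以截图展示,这里补充一个命令行验证思路(仅为示例:假设局域网客户端已把 DNS 指向 10.0.0.1;dig 在客户端上可能需要安装 dnsutils 或 bind-utils):
# ipv4only.arpa 是只有 A 记录的保留域名,常用于检测 NAT64/DNS64
# 若 DNS64 生效,应返回带 ddbe:48ec:56c6:1111::/96 前缀的合成 AAAA 地址
dig AAAA ipv4only.arpa @10.0.0.1 +short
# 验证 NAT64 转换:直接 ping 合成地址,相当于经 tayga 访问 IPv4 的 8.8.8.8
ping -6 -c 4 ddbe:48ec:56c6:1111::8.8.8.8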
2022年10月28日
668 阅读
0 评论
0 点赞
2022-10-27
Kubernetes 的 TCP 数据包可视化
kubernetes 的TCP 数据包可视化介绍k8spacket是用 Golang 编写的工具,它使用gopacket第三方库来嗅探工作负载(传入和传出)上的 TCP 数据包。它在运行的容器网络接口上创建 TCP 侦听器。当 Kubernetes 创建一个新容器时,CNI 插件负责提供与其他容器进行通信的可能性。最常见的方法是用linux namespace隔离网络并用veth pair连接隔离的 namespace 与网桥。除了bridge 类型,CNI 插件还可以使用其他类型(vlan, ipvlan,macvlan),但都为容器创建了一个网络接口,它是k8spacket嗅探器的主要句柄。k8spacket有助于了解 Kubernetes 集群中的 TCP 数据包流量:显示集群中工作负载之间的流量通知流量在集群外路由到哪里显示有关连接关闭套接字的信息显示工作负载发送/接收的字节数计算建立连接的时间显示整个集群中工作负载之间的网络连接拓扑k8spacket是一个 Kubernetes API 客户端,可以将嗅探到的工作负载解析为可视化上可见的集群资源名称(Pods和Services)。它作为DaemonSet Pod启动,使用 hostNetwork,并监听节点上的网络接口。k8spacket 收集 TCP 流、处理数据,使用 Node Graph API Grafana 数据源插件(详情请查看 Node Graph API 插件),通过 API 展示在Grafana面板。要安装k8spacket,需要同时安装 Grafana。下面将在Kind安装的 k8s 集群上做演示。添加 k8spacket 的helm源[root@k8s-master-1 ~]# helm repo add k8spacket https://k8spacket.github.io/k8spacket-helm-chart "k8spacket" has been added to your repositories [root@k8s-master-1 ~]# [root@k8s-master-1 ~]# [root@k8s-master-1 ~]# [root@k8s-master-1 ~]# helm install k8spacket --namespace k8spacket k8spacket/k8spacket --create-namespace NAME: k8spacket LAST DEPLOYED: Thu Oct 27 18:48:30 2022 NAMESPACE: k8spacket STATUS: deployed REVISION: 1 TEST SUITE: None NOTES: 1. Get the application URL by running these commands: export NODE_PORT=$(kubectl get --namespace k8spacket -o jsonpath="{.spec.ports[0].nodePort}" services k8spacket) export NODE_IP=$(kubectl get nodes --namespace k8spacket -o jsonpath="{.items[0].status.addresses[0].address}") echo http://$NODE_IP:$NODE_PORT [root@k8s-master-1 ~]#查看 pod 信息root@hello:~# kubectl get pod -n k8spacket NAME READY STATUS RESTARTS AGE k8spacket-46587 0/1 CrashLoopBackOff 2 (23s ago) 2m24s k8spacket-9wb5q 0/1 CrashLoopBackOff 1 (6s ago) 2m24s k8spacket-grh7k 0/1 ImagePullBackOff 0 2m24s k8spacket-hcgg4 0/1 CrashLoopBackOff 1 (4s ago) 2m24s k8spacket-ng99p 0/1 CrashLoopBackOff 1 (3s ago) 2m24s k8spacket-p7hgb 0/1 CrashLoopBackOff 1 (4s ago) 2m24s k8spacket-pk4zt 0/1 CrashLoopBackOff 1 (4s ago) 2m24s k8spacket-tcksl 0/1 CrashLoopBackOff 1 (6s ago) 2m24s k8spacket-tkzcc 0/1 CrashLoopBackOff 1 (8s ago) 2m24s k8spacket-w8r5r 0/1 CrashLoopBackOff 3 (11s ago) 2m24s root@hello:~# 查看报错为 tunl0 问题[root@k8s-master-1 ~]# kubectl logs -n k8spacket k8spacket-46587 2022/10/27 13:35:36 Serving requests on port 6676 2022/10/27 13:35:36 Refreshing interfaces for capturing... 
2022/10/27 13:35:36 Starting capture on interface "cilium_host" 2022/10/27 13:35:36 Starting capture on interface "tunl0" 2022/10/27 13:35:36 Starting capture on interface "lxc_health" 2022/10/27 13:35:36 Starting capture on interface "cilium_net" 2022/10/27 13:35:36 Starting capture on interface "lxcaaf84592af2d" 2022/10/27 13:35:36 Starting capture on interface "lxcc06519232b44" 2022/10/27 13:35:36 reading in packets 2022/10/27 13:35:36 reading in packets 2022/10/27 13:35:36 error opening pcap handle: tunl0: That device is not up [root@k8s-master-1 ~]#修改配置# 将 charts 包拉取到本地 在进行修改信息 [root@k8s-master-1 ~]# cd /tmp/ [root@k8s-master-1 tmp]# helm fetch k8spacket/k8spacket [root@k8s-master-1 tmp]# tar -zxf k8spacket-0.1.3.tgz [root@k8s-master-1 tmp]# cd k8spacket # 设置配置为 command: "ip address | grep @ | grep -v tunl0 | sed -E 's/.* (\\w+)@.*/\\1/' | tr '\\n' ',' | sed 's/.$//'" # 完整配置如下 [root@k8s-master-1 k8spacket]# vim values.yaml [root@k8s-master-1 k8spacket]# cat values.yaml replicaCount: 1 affinity: {} image: repository: docker.io/k8spacket/k8spacket pullPolicy: IfNotPresent serviceAccount: create: true # Annotations to add to the service account annotations: {} clusterRole: create: true nodeSelector: {} podAnnotations: {} priorityClassName: "" podSecurityContext: runAsUser: 1000 securityContext: allowPrivilegeEscalation: true capabilities: add: [ "NET_ADMIN", "NET_RAW" ] service: type: ClusterIP port: 8080 nodePort: resources: requests: memory: "1000Mi" cpu: "250m" limits: memory: "1500Mi" cpu: "500m" tolerations: [] k8sPacket: metrics: ## Hide source port when 'true' (set to string value 'dynamic' instead of decimal real source port) for Prometheus metrics cardinality reasons hideSourcePort: true reverseLookup: ## Reverse lookup db file based on GeoLite2 Free Geolocation Data ## See: https://dev.maxmind.com/geoip/geolite2-free-geolocation-data?lang=en geoipDBPath: "/home/k8spacket/GeoLite2-City.mmdb" ## Whois result match regexp whoisRegexp: "(?:OrgName:|org-name:)\\s*(.*)" tcp: listener: port: 6676 interfaces: ## Command to achieve containers network interfaces command: "ip address | grep @ | grep -v tunl0 | sed -E 's/.* (\\w+)@.*/\\1/' | tr '\\n' ',' | sed 's/.$//'" ## How often refresh the list of network interfaces to listen refreshPeriod: "10s" assembler: ## See: https://pkg.go.dev/github.com/google/gopacket/tcpassembly#AssemblerOptions maxPagesPerConnection: 50 maxPagesTotal: 50 ## Every (periodDuration) seconds, flush connections that haven't seen activity in the past (closeOlderThanDuration) seconds. flushing: periodDuration: "10s" closeOlderThanDuration: "20s" [root@k8s-master-1 k8spacket]# 重新安装 k8spacket[root@k8s-master-1 k8spacket]# helm uninstall k8spacket -n k8spacket [root@k8s-master-1 k8spacket]# helm install k8spacket --namespace k8spacket . --create-namespace NAME: k8spacket LAST DEPLOYED: Thu Oct 27 21:47:38 2022 NAMESPACE: k8spacket STATUS: deployed REVISION: 1 TEST SUITE: None NOTES: 1. 
Get the application URL by running these commands: export NODE_PORT=$(kubectl get --namespace k8spacket -o jsonpath="{.spec.ports[0].nodePort}" services k8spacket) export NODE_IP=$(kubectl get nodes --namespace k8spacket -o jsonpath="{.items[0].status.addresses[0].address}") echo http://$NODE_IP:$NODE_PORT [root@k8s-master-1 k8spacket]#查看验证[root@k8s-master-1 ~]# kubectl get pod -n k8spacket -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES k8spacket-8kxnx 1/1 Running 0 4m27s 192.168.1.66 k8s-node-3 <none> <none> k8spacket-cqpks 1/1 Running 0 4m27s 192.168.1.70 k8s-node-6 <none> <none> k8spacket-h72fc 1/1 Running 0 4m27s 192.168.1.67 k8s-node-4 <none> <none> k8spacket-jkxg9 1/1 Running 0 4m27s 192.168.1.75 k8s-node-7 <none> <none> k8spacket-kgpql 1/1 Running 0 4m27s 192.168.1.62 k8s-master-2 <none> <none> k8spacket-lf9br 1/1 Running 0 4m27s 192.168.1.61 k8s-master-1 <none> <none> k8spacket-mcbv5 1/1 Running 0 4m27s 192.168.1.68 k8s-node-5 <none> <none> k8spacket-ndlzt 1/1 Running 0 4m27s 192.168.1.64 k8s-node-1 <none> <none> k8spacket-vfg2x 1/1 Running 0 4m27s 192.168.1.63 k8s-master-3 <none> <none> k8spacket-vvwtr 1/1 Running 0 4m27s 192.168.1.65 k8s-node-2 <none> <none> [root@k8s-master-1 ~]# [root@k8s-master-1 ~]# kubectl get svc -n k8spacket -o wide NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR k8spacket ClusterIP 10.110.30.53 <none> 8080/TCP 4m31s app.kubernetes.io/instance=k8spacket,app.kubernetes.io/name=k8spacket [root@k8s-master-1 ~]# 访问验证 [root@k8s-master-1 ~]# curl 10.110.30.53:8080/metrics安装 dashboards 配置[root@k8s-master-1 ~]# cd /tmp/ [root@k8s-master-1 tmp]# [root@k8s-master-1 tmp]# wget https://github.com/k8spacket/k8spacket/archive/refs/heads/master.zip [root@k8s-master-1 tmp]# unzip master.zip [root@k8s-master-1 tmp]# [root@k8s-master-1 tmp]# cd k8spacket-master [root@k8s-master-1 k8spacket-master]# [root@k8s-master-1 k8spacket-master]# kubectl apply -f ./dashboards/ configmap/k8spacket-logs-dashboard created configmap/k8spacket-metrics-dashboard created configmap/k8spacket-node-graph-dashboard created [root@k8s-master-1 k8spacket-master]#安装 Grafana[root@k8s-master-1 tmp]# helm repo add grafana https://grafana.github.io/helm-charts "grafana" has been added to your repositories [root@k8s-master-1 tmp]# helm fetch grafana/grafana [root@k8s-master-1 tmp]# [root@k8s-master-1 tmp]# tar -zxf grafana-6.43.1.tgz 修改Grafana配置内容[root@k8s-master-1 tmp]# cd grafana/ [root@k8s-master-1 grafana]# [root@k8s-master-1 grafana]# vim values.yaml 修改以下配置内容 persistence: type: pvc enabled: true env: GF_INSTALL_PLUGINS: hamedkarbasi93-nodegraphapi-datasource dashboardProviders: dashboardproviders.yaml: apiVersion: 1 providers: - name: 'default' orgId: 1 folder: '' type: file disableDeletion: false editable: true options: path: /var/lib/grafana/dashboards/default dashboardsConfigMaps: default: k8spacket-node-graph-dashboard datasources: nodegraphapi-plugin-datasource.yaml: apiVersion: 1 datasources: - name: "Node Graph API" jsonData: url: "http://k8spacket.k8spacket.svc.cluster.local:8080" access: "proxy" basicAuth: false isDefault: false readOnly: false type: "hamedkarbasi93-nodegraphapi-datasource" typeLogoUrl: "public/plugins/hamedkarbasi93-nodegraphapi-datasource/img/logo.svg" typeName: "node-graph-plugin" orgId: 1 version: 1 安装Grafana[root@k8s-master-1 grafana]# helm install grafana -f values.yaml ./ NAME: grafana LAST DEPLOYED: Thu Oct 27 22:11:27 2022 NAMESPACE: default STATUS: deployed REVISION: 1 NOTES: 1. 
Get your 'admin' user password by running: kubectl get secret --namespace default grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo 2. The Grafana server can be accessed via port 80 on the following DNS name from within your cluster: grafana.default.svc.cluster.local Get the Grafana URL to visit by running these commands in the same shell: export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=grafana" -o jsonpath="{.items[0].metadata.name}") kubectl --namespace default port-forward $POD_NAME 3000 3. Login with the password from step 1 and the username: admin [root@k8s-master-1 grafana]# 修改为NodePort[root@k8s-master-1 grafana]# kubectl get svc NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE echo-a ClusterIP 10.108.160.226 <none> 8080/TCP 6d9h echo-b NodePort 10.108.200.169 <none> 8080:31414/TCP 6d9h echo-b-headless ClusterIP None <none> 8080/TCP 6d9h echo-b-host-headless ClusterIP None <none> <none> 6d9h grafana ClusterIP 10.101.109.183 <none> 80/TCP 4m kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 6d9h [root@k8s-master-1 grafana]# [root@k8s-master-1 grafana]# kubectl edit svc grafana service/grafana edited [root@k8s-master-1 grafana]# [root@k8s-master-1 grafana]# kubectl get svc NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE echo-a ClusterIP 10.108.160.226 <none> 8080/TCP 6d9h echo-b NodePort 10.108.200.169 <none> 8080:31414/TCP 6d9h echo-b-headless ClusterIP None <none> 8080/TCP 6d9h echo-b-host-headless ClusterIP None <none> <none> 6d9h grafana NodePort 10.101.109.183 <none> 80:30973/TCP 4m37s kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 6d9h [root@k8s-master-1 grafana]# 查看Grafana密码[root@k8s-master-1 grafana]# kubectl get secret --namespace default grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo 9O1Hd9LOqJ6LKUjZTlEWAGeXRitr0CZd4p6fr00J [root@k8s-master-1 grafana]# 访问地址访问 http://192.168.1.61:30973/ 添加 Node Graph API 插件 http://192.168.1.61:30973/plugins 查看 Node Graph API 数据收集源 http://192.168.1.61:30973/datasources关于https://www.oiox.cn/https://www.oiox.cn/index.php/start-page.htmlCSDN、GitHub、51CTO、知乎、开源中国、思否、掘金、简书、华为云、阿里云、腾讯云、哔哩哔哩、今日头条、新浪微博、个人博客全网可搜《小陈运维》文章主要发布于微信公众号
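上面通过 kubectl edit 交互式地把 grafana 的 Service 改成了 NodePort,如果希望写进脚本,也可以用非交互方式完成(示例命令,namespace 按上文默认的 default):
# 直接把 Service 类型改为 NodePort,效果与 kubectl edit 相同
kubectl patch svc grafana -n default -p '{"spec":{"type":"NodePort"}}'
# 或者不改类型,临时用端口转发访问 Grafana(Service 端口为 80)
kubectl --namespace default port-forward svc/grafana 3000:80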
2022年10月27日
454 阅读
0 评论
0 点赞