first commit
iProbe · 2022-10-18 16:59:37 +08:00 · commit ba848e218d
1001 changed files with 152333 additions and 0 deletions

---
The calico-node pods are not ready. List the pods in kube-system:
```
kubectl get pod -n kube-system
```
Output:
```
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-744cfdf676-4qqph 1/1 Running 0 31m
calico-node-8jr59 0/1 Running 0 20s
calico-node-cs79v 0/1 Running 0 20s
calico-node-fkstd 0/1 Running 0 20s
coredns-7f89b7bc75-6md7d 1/1 Running 0 53m
coredns-7f89b7bc75-p88r5 1/1 Running 0 53m
etcd-kubernetes-master 1/1 Running 0 53m
kube-apiserver-kubernetes-master 1/1 Running 0 53m
kube-controller-manager-kubernetes-master 1/1 Running 0 53m
kube-proxy-6tfvm 1/1 Running 0 26m
kube-proxy-mgqv2 1/1 Running 0 26m
kube-proxy-v25vl 1/1 Running 0 53m
kube-scheduler-kubernetes-master 1/1 Running 0 53m
```
Inspect the events of one of the not-ready pods (pod name taken from the listing above):
```
kubectl describe pod calico-node-8jr59 -n kube-system
```
Output:
```
Warning Unhealthy 72s kubelet Readiness probe failed: 2020-12-18 13:55:29.276 [INFO][120] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
Warning Unhealthy 62s kubelet Readiness probe failed: 2020-12-18 13:55:39.278 [INFO][156] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
Warning Unhealthy 52s kubelet Readiness probe failed: 2020-12-18 13:55:49.283 [INFO][189] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
Warning Unhealthy 42s kubelet Readiness probe failed: 2020-12-18 13:55:59.279 [INFO][215] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
Warning Unhealthy 32s kubelet Readiness probe failed: 2020-12-18 13:56:09.280 [INFO][249] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
Warning Unhealthy 22s kubelet Readiness probe failed: 2020-12-18 13:56:19.276 [INFO][276] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
Warning Unhealthy 12s kubelet Readiness probe failed: 2020-12-18 13:56:29.276 [INFO][302] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
Warning Unhealthy 2s kubelet Readiness probe failed: 2020-12-18 13:56:39.272 [INFO][335] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
```
Fix: edit the calico.yaml manifest
```
# Adjust the calico network plugin's NIC discovery by changing the value
# of IP_AUTODETECTION_METHOD. The official yaml leaves the IP detection
# strategy unset, so it defaults to first-found, which can register the
# IP of a faulty interface as the node IP and break the node-to-node
# mesh. Switch to the can-reach or interface strategy, which tries to
# reach a Ready node's IP (or match a NIC name) to pick the correct IP.
#
# Add the following two lines to calico.yaml:
- name: IP_AUTODETECTION_METHOD
  value: "interface=ens.*"    # adjust "ens" to the actual NIC name prefix
# The surrounding configuration then looks like:
- name: CLUSTER_TYPE
  value: "k8s,bgp"
- name: IP_AUTODETECTION_METHOD
  value: "interface=ens.*"
  # or: value: "interface=ens160"
# Auto-detect the BGP IP address.
- name: IP
  value: "autodetect"
# Enable IPIP
- name: CALICO_IPV4POOL_IPIP
  value: "Always"
```

---
## 1. Force delete a pod
```
kubectl delete pod -n <namespace> <podname> --force --grace-period=0
```
## 2. Force delete a PV
```
kubectl patch pv <pvname> -p '{"metadata":{"finalizers":null}}'
```
## 3. Force delete a PVC
```
kubectl patch pvc <pvcname> -n <namespace> -p '{"metadata":{"finalizers":null}}'
```
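The finalizers patches above work because in both strategic and JSON merge patches, setting a field to `null` deletes it, which clears the finalizers blocking deletion. A local illustration of the merge-patch rule (no cluster needed; the PV name `my-pv` is made up, and `python3` on the PATH is assumed):

```shell
# Simulate the effect of '{"metadata":{"finalizers":null}}' on a PV
# object, using RFC 7386 merge-patch rules (null deletes the key):
python3 - <<'EOF'
import json

obj = {"metadata": {"name": "my-pv", "finalizers": ["kubernetes.io/pv-protection"]}}
patch = {"metadata": {"finalizers": None}}

def merge_patch(target, patch):
    for key, value in patch.items():
        if value is None:
            target.pop(key, None)          # null removes the key
        elif isinstance(value, dict) and isinstance(target.get(key), dict):
            merge_patch(target[key], value)
        else:
            target[key] = value
    return target

print(json.dumps(merge_patch(obj, patch)))
EOF
```

After the patch the `finalizers` key is gone, so the stuck deletion can proceed.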

---
Initializing the cluster with kubeadm fails in preflight with the following error:
```
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: E0526 20:04:52.510582 13459 remote_runtime.go:925] "Status from runtime service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
```
Fix:
```
rm /etc/containerd/config.toml
systemctl restart containerd
```
Or:
```
cat > /etc/containerd/config.toml <<EOF
[plugins."io.containerd.grpc.v1.cri"]
systemd_cgroup = true
EOF
systemctl restart containerd
```
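The error appears because the `config.toml` shipped with some containerd packages explicitly disables the CRI plugin (`disabled_plugins = ["cri"]`); removing or rewriting the file re-enables it. On containerd 1.6+, the equivalent of the second fix sets the systemd cgroup driver under the runc runtime options rather than the deprecated top-level `systemd_cgroup` key. A sketch of the relevant fragment, with section paths as in the upstream default config:

```
# /etc/containerd/config.toml (containerd 1.6+): CRI plugin enabled,
# runc using the systemd cgroup driver.
version = 2

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  runtime_type = "io.containerd.runc.v2"

  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true
```

After editing, restart containerd as in the commands above.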

---
#### 1. Problem description
1. A Pod on the Kubernetes cluster was recently found stuck in the Terminating state:
```
[ec2-user@k8s-master01 ~]$ kubectl get pod -n infra
NAME READY STATUS RESTARTS AGE
jenkins-5c54cf5557-nz4l2 1/1 Terminating 2 (8d ago) 14d
```
2. A normal delete command does not remove it:
```
[ec2-user@k8s-master01 ~]$ kubectl delete pod -n infra jenkins-5c54cf5557-nz4l2
pod "jenkins-5c54cf5557-nz4l2" deleted
```
#### 2. Solution
1. No matter how the pod was created, it can be force-deleted with the command below. Note that `--grace-period=0 --force` only removes the object from the API server immediately; the kubelet may still be cleaning up the container on the node.
```
kubectl delete pods <pod> --grace-period=0 --force
```
2. For the pod above, the command is:
```
kubectl delete pod -n infra jenkins-5c54cf5557-nz4l2 --grace-period=0 --force
```

---
Strip trailing whitespace before embedded `\n` escapes in a ConfigMap and re-apply the cleaned object in one line:
```
kubectl get cm [YOUR CONFIGMAP NAME] -o yaml | sed -E 's/[[:space:]]+\\n/\\n/g' | kubectl apply -f -
```
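What the sed expression does: in the exported YAML, newlines inside data values appear as the two-character escape `\n`, and the pattern removes any run of whitespace immediately before such an escape. A local illustration on a made-up value:

```shell
# Trailing spaces before a literal "\n" escape are removed:
printf 'data: "line one   \\nline two"\n' | sed -E 's/[[:space:]]+\\n/\\n/g'
# → data: "line one\nline two"
```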

---
#### Symptom
> **Server CPU and memory usage spiked, and interactive work on the server became sluggish**
##### Check server load and find the abnormal process PID
```
top
```
Output (abridged; irrelevant lines removed):
```
top - 15:52:08 up 13 days, 6:21, 3 users, load average: 3.52, 3.23, 3.04
Tasks: 226 total, 1 running, 225 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.5 us, 0.5 sy, 98.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 8173400 total, 50392 free, 7783940 used, 339068 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 146592 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
25586 root 20 0 2439064 2.289g 4 S 190.1 29.4 120:20.64 server
```
Press "c" in top to display the full COMMAND column:
```
top - 15:52:23 up 13 days, 6:22, 3 users, load average: 3.72, 3.28, 3.06
Tasks: 227 total, 1 running, 226 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.0 us, 1.3 sy, 95.7 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 8173400 total, 47780 free, 7787388 used, 338232 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 142808 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
25586 root 20 0 2439064 2.289g 4 S 186.1 29.4 120:47.58 /opt/server
```
From the output above, the executable is /opt/server and the PID is 25586.
#### Check the process executable
/opt/server does not exist on disk, and the exe link under /proc/25586/ also shows the file as missing.
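A useful detail here: `/proc/<pid>/exe` is a symlink to the binary actually being executed, and it stays resolvable (marked `(deleted)`) even after the on-disk file is removed, which is why the missing `/opt/server` is itself suspicious. A quick sanity check against the current shell:

```shell
# /proc/<pid>/exe points at the running binary; for the current shell:
ls -l /proc/$$/exe
# For the suspicious process this would be: ls -l /proc/25586/exe
```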
#### Check for abnormal cron jobs
```
cat /etc/passwd | awk -F: '{print $1}' | xargs -I {} crontab -l -u {}
```
No cron jobs are configured on the system.
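The enumeration above works because field 1 of each `/etc/passwd` line is the user name; `xargs -I {}` then runs `crontab -l -u` once per user. The extraction step, shown on sample input:

```shell
# awk -F: splits on ":" and prints the user-name field:
printf 'root:x:0:0:root:/root:/bin/bash\nnobody:x:99:99::/:/sbin/nologin\n' \
  | awk -F: '{print $1}'
# → root
# → nobody
```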
#### Check /tmp for suspicious directories or files
/tmp looks normal.
#### Find the parent process
```
ps -ef | grep server
```
Output:
```
root 25586 1793 99 14:45 ? 02:08:41 /opt/server
```
The parent PID is 1793.
#### Inspect the parent process
```
ll /proc/1793/
```
Its exe link is:
```
lrwxrwxrwx 1 root root 0 Jan 11 15:56 exe -> /bin/busybox*
```
The server itself does not use busybox, but Docker images commonly do.
#### Check Docker containers
```
docker ps -a
```
Output:
```
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
169486212d4b zqbxacdsx "#(nop)" 3 days ago Up 3 days (healthy) harbor-jobservice
```
Sure enough, there is a suspicious container whose image name is zqbxacdsx.
#### Find the Docker image ID
```
docker images
```
The image ID is aa05538acecf.
#### Inspect the image build history
```
## aa05538acecf is the image ID
docker history aa05538acecf --no-trunc
```
The image only adds a single script, main.sh.
#### Examine the contents of main.sh
Enter the container:
```
## 169486212d4b is the container ID
docker exec -it 169486212d4b /bin/sh
```
View main.sh:
```
cat main.sh
```
As suspected, main.sh is a script that automatically downloads a crypto-mining program.
#### Stop the mining container
```
docker stop 169486212d4b
```
#### Remove the mining container
```
docker rm 169486212d4b
```
#### Remove the mining image
```
docker rmi aa05538acecf
```
After monitoring for a while, the abnormal process did not reappear and the server ran smoothly.
This completes the cleanup of the mining malware. Next steps: close unnecessary ports on the firewall and harden the Docker configuration.
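Mining containers like this one commonly arrive through a Docker daemon API exposed on tcp://0.0.0.0:2375 without TLS. As part of the hardening mentioned above, make sure the daemon listens only on the local socket (no `-H tcp://...` flag in the dockerd systemd unit) and tighten a few defaults. A minimal /etc/docker/daemon.json sketch (option names per the dockerd reference; the values are suggestions, not the author's config):

```
{
  "icc": false,
  "no-new-privileges": true,
  "live-restore": true
}
```

Restart the docker service after changing daemon.json for the settings to take effect.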