first commit
commit ba848e218d · 1001 changed files with 152333 additions and 0 deletions

CloudNative/ErrorProcess/Calico异常.md (new file, 67 lines)
@@ -0,0 +1,67 @@

Calico is abnormal. List the pods in kube-system:

```
kubectl get pod -n kube-system
```

The output shows:

```
NAME                                        READY   STATUS    RESTARTS   AGE
calico-kube-controllers-744cfdf676-4qqph    1/1     Running   0          31m
calico-node-8jr59                           0/1     Running   0          20s
calico-node-cs79v                           0/1     Running   0          20s
calico-node-fkstd                           0/1     Running   0          20s
coredns-7f89b7bc75-6md7d                    1/1     Running   0          53m
coredns-7f89b7bc75-p88r5                    1/1     Running   0          53m
etcd-kubernetes-master                      1/1     Running   0          53m
kube-apiserver-kubernetes-master            1/1     Running   0          53m
kube-controller-manager-kubernetes-master   1/1     Running   0          53m
kube-proxy-6tfvm                            1/1     Running   0          26m
kube-proxy-mgqv2                            1/1     Running   0          26m
kube-proxy-v25vl                            1/1     Running   0          53m
kube-scheduler-kubernetes-master            1/1     Running   0          53m
```

Describe one of the failing calico-node pods:

```
kubectl describe pod calico-node-npjjr -n kube-system
```

The events show:

```
Warning Unhealthy 72s kubelet Readiness probe failed: 2020-12-18 13:55:29.276 [INFO][120] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
Warning Unhealthy 62s kubelet Readiness probe failed: 2020-12-18 13:55:39.278 [INFO][156] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
Warning Unhealthy 52s kubelet Readiness probe failed: 2020-12-18 13:55:49.283 [INFO][189] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
Warning Unhealthy 42s kubelet Readiness probe failed: 2020-12-18 13:55:59.279 [INFO][215] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
Warning Unhealthy 32s kubelet Readiness probe failed: 2020-12-18 13:56:09.280 [INFO][249] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
Warning Unhealthy 22s kubelet Readiness probe failed: 2020-12-18 13:56:19.276 [INFO][276] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
Warning Unhealthy 12s kubelet Readiness probe failed: 2020-12-18 13:56:29.276 [INFO][302] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
Warning Unhealthy 2s kubelet Readiness probe failed: 2020-12-18 13:56:39.272 [INFO][335] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
```
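
Optionally, the peering failure can be confirmed from a node itself. A minimal check, assuming the calicoctl binary is installed on the host:

```
# Lists this node's BGP peers; a broken mesh shows peers with no Established session
sudo calicoctl node status
```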

Modify the calico.yaml file:

```
# Adjust the NIC discovery mechanism of the Calico network plugin by changing
# the value of IP_AUTODETECTION_METHOD. In the official yaml this setting is
# absent, so it defaults to first-found, which can register the IP of a broken
# interface as the node IP and disrupt the node-to-node mesh. Switch to the
# can-reach or interface strategy (e.g. try to reach a Ready node's IP) so the
# correct IP is selected.

# Add the following two lines to calico.yaml:
- name: IP_AUTODETECTION_METHOD
  value: "interface=ens.*"    # adjust "ens" to the actual NIC name prefix

# Resulting configuration:
- name: CLUSTER_TYPE
  value: "k8s,bgp"
- name: IP_AUTODETECTION_METHOD
  value: "interface=ens.*"
  # or: value: "interface=ens160"
# Auto-detect the BGP IP address.
- name: IP
  value: "autodetect"
# Enable IPIP
- name: CALICO_IPV4POOL_IPIP
  value: "Always"
```
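
After editing, re-apply the manifest and watch the calico-node pods come up. A minimal sketch; the k8s-app=calico-node label matches the official manifest and may need adjusting for customized deployments:

```
kubectl apply -f calico.yaml
# Watch until every calico-node pod reports READY 1/1
kubectl get pod -n kube-system -l k8s-app=calico-node -w
```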

CloudNative/ErrorProcess/强制删除异常资源.md (new file, 12 lines)
@@ -0,0 +1,12 @@

## 1. Force-delete a Pod

```
kubectl delete pod -n <namespace> <podname> --force --grace-period=0
```

## 2. Force-delete a PV

```
kubectl patch pv <pvname> -p '{"metadata":{"finalizers":null}}'
```

## 3. Force-delete a PVC

```
kubectl patch pvc <pvcname> -n <namespace> -p '{"metadata":{"finalizers":null}}'
```
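
To verify a patch took effect, the finalizer list can be read back; a sketch with a hypothetical PVC named my-pvc:

```
# Prints nothing once the finalizers have been cleared
kubectl get pvc my-pvc -n <namespace> -o jsonpath='{.metadata.finalizers}'
```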

CloudNative/ErrorProcess/无法初始化k8s集群.md (new file, 19 lines)
@@ -0,0 +1,19 @@

Initializing the cluster fails with the following error:

```
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: E0526 20:04:52.510582 13459 remote_runtime.go:925] "Status from runtime service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
```

Fix:

```
rm /etc/containerd/config.toml
systemctl restart containerd
```

Or:

```
cat > /etc/containerd/config.toml <<EOF
[plugins."io.containerd.grpc.v1.cri"]
  systemd_cgroup = true
EOF
systemctl restart containerd
```
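
A common fuller variant (an assumption, not part of the original note) regenerates the complete default config and enables the systemd cgroup driver for the runc runtime, matching kubelet's default on systemd hosts:

```
containerd config default > /etc/containerd/config.toml
# Flip SystemdCgroup under the runc runtime options from false to true
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
systemctl restart containerd
```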

CloudNative/ErrorProcess/解决Terminating状态的Pod删不掉的问题.md (new file, 22 lines)
@@ -0,0 +1,22 @@

#### 1. Problem

1. A Pod on the Kubernetes cluster was recently found stuck in the Terminating state:

```
[ec2-user@k8s-master01 ~]$ kubectl get pod -n infra
NAME                       READY   STATUS        RESTARTS     AGE
jenkins-5c54cf5557-nz4l2   1/1     Terminating   2 (8d ago)   14d
```

2. A plain delete does not get rid of it:

```
[ec2-user@k8s-master01 ~]$ kubectl delete pod -n infra jenkins-5c54cf5557-nz4l2
pod "jenkins-5c54cf5557-nz4l2" deleted
```

#### 2. Solution

1. Regardless of how the pod was created, it can be force-deleted with:

```
kubectl delete pods <pod> --grace-period=0 --force
```

2. For the pod above:

```
kubectl delete pod -n infra jenkins-5c54cf5557-nz4l2 --grace-period=0 --force
```
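
If the pod survives even a force delete, it usually still carries a finalizer; clearing it with the same patch pattern used for PVs and PVCs elsewhere in these notes is a common last resort:

```
kubectl patch pod jenkins-5c54cf5557-nz4l2 -n infra -p '{"metadata":{"finalizers":null}}'
```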

CloudNative/ErrorProcess/解决用文件创建cm格式混乱问题.md (new file, 3 lines)
@@ -0,0 +1,3 @@

Strip the stray whitespace that precedes each escaped newline, then re-apply the ConfigMap:

```
kubectl get cm [YOUR CONFIGMAP NAME] -o yaml | sed -E 's/[[:space:]]+\\n/\\n/g' | kubectl apply -f -
```
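
To inspect the cleaned YAML before applying it, drop the trailing kubectl apply; my-config below is a hypothetical ConfigMap name:

```
kubectl get cm my-config -o yaml | sed -E 's/[[:space:]]+\\n/\\n/g'
```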

CloudNative/ErrorProcess/记一次挖矿程序删除处理.md (new file, 103 lines)
@@ -0,0 +1,103 @@

#### Symptom

> **Server CPU and memory usage spiked, and interactive use of the server became sluggish.**

##### Check server load and find the abnormal process PID

```
top
```

Result (long output, unrelated rows removed):

```
top - 15:52:08 up 13 days,  6:21,  3 users,  load average: 3.52, 3.23, 3.04
Tasks: 226 total,   1 running, 225 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.5 us,  0.5 sy, 98.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  8173400 total,    50392 free,  7783940 used,   339068 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   146592 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
25586 root      20   0 2439064 2.289g      4 S 190.1 29.4 120:20.64 server
```

Press "c" to show the full COMMAND column:

```
top - 15:52:23 up 13 days,  6:22,  3 users,  load average: 3.72, 3.28, 3.06
Tasks: 227 total,   1 running, 226 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.0 us,  1.3 sy, 95.7 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  8173400 total,    47780 free,  7787388 used,   338232 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   142808 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
25586 root      20   0 2439064 2.289g      4 S 186.1 29.4 120:47.58 /opt/server
```

So the process executable is /opt/server and its PID is 25586.

#### Inspect the process executable

/opt/server does not exist on disk, and /proc/25586/ likewise shows its exe target as missing.
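
As a generic follow-up (not part of the original investigation), the exe symlink of a running-but-deleted binary can sometimes still be read and copied out for analysis:

```
ls -l /proc/25586/exe                # target may be shown as "(deleted)"
cp /proc/25586/exe /tmp/sample.bin   # grab a copy for offline analysis, if the link resolves
```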

#### Check for abnormal cron jobs

```
cat /etc/passwd | awk -F: '{print $1}' | xargs -I {} crontab -l -u {}
```

No cron jobs exist on the system.

#### Check /tmp for suspicious files or directories

/tmp looks normal.

#### Find the parent process

```
ps -ef | grep server
```

Result:

```
root     25586  1793 99 14:45 ?        02:08:41 /opt/server
```

The parent PID is 1793.
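
The full ancestry can also be seen in one shot, assuming pstree is installed:

```
# -p prints PIDs, -s shows the parents of the selected process
pstree -ps 25586
```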

#### Inspect the parent process

```
ll /proc/1793/
```

Its exe link points to:

```
lrwxrwxrwx 1 root root 0 Jan 11 15:56 exe -> /bin/busybox*
```

The server itself does not use busybox, but Docker images commonly do.

#### Check the Docker containers

```
docker ps -a
```

Result:

```
CONTAINER ID   IMAGE       COMMAND    CREATED      STATUS                PORTS   NAMES
169486212d4b   zqbxacdsx   "#(nop)"   3 days ago   Up 3 days (healthy)           harbor-jobservice
```

Sure enough, there is a suspicious container, running an image named zqbxacdsx.

#### Find the Docker image ID

```
docker images
```

The image ID is aa05538acecf.
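
The ID can also be pulled directly by repository name, as a sketch:

```
docker images zqbxacdsx --format '{{.ID}}'
```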

#### Inspect how the image was built

```
## aa05538acecf is the image ID
docker history aa05538acecf --no-trunc
```

The image turns out to add just one script, main.sh.

#### Inspect main.sh

Enter the container:

```
## 169486212d4b is the container ID
docker exec -it 169486212d4b /bin/sh
```

View main.sh:

```
cat main.sh
```

As suspected, main.sh is a script that downloads a mining program.

#### Stop the mining container

```
docker stop 169486212d4b
```

#### Remove the mining container

```
docker rm 169486212d4b
```

#### Remove the mining image

```
docker rmi aa05538acecf
```

After observing for a while, the abnormal process did not come back and the server ran smoothly.

With that, the mining malware is cleaned up. Next steps: close the unneeded ports with the firewall and harden the Docker configuration, as sketched below.
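
A minimal hardening sketch. It assumes a firewalld host where the Docker remote API had been exposed on 2375/tcp (a common entry point for cryptominer containers); adjust to the actual exposure:

```
# Close the exposed Docker API port (assumption: it was open on 2375/tcp)
firewall-cmd --permanent --remove-port=2375/tcp
firewall-cmd --reload

# Ensure dockerd only listens on the local unix socket: remove any
# "-H tcp://0.0.0.0:2375" from the service unit or daemon.json, then restart
systemctl daemon-reload
systemctl restart docker
```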