first commit
commit ba848e218d · 1001 changed files with 152333 additions and 0 deletions

CloudNative/ErrorProcess/Calico异常.md (new file, 67 lines)
@@ -0,0 +1,67 @@

Calico is abnormal. List the pods in kube-system:

```
kubectl get pod -n kube-system
```

The output shows:

```
NAME                                        READY   STATUS    RESTARTS   AGE
calico-kube-controllers-744cfdf676-4qqph    1/1     Running   0          31m
calico-node-8jr59                           0/1     Running   0          20s
calico-node-cs79v                           0/1     Running   0          20s
calico-node-fkstd                           0/1     Running   0          20s
coredns-7f89b7bc75-6md7d                    1/1     Running   0          53m
coredns-7f89b7bc75-p88r5                    1/1     Running   0          53m
etcd-kubernetes-master                      1/1     Running   0          53m
kube-apiserver-kubernetes-master            1/1     Running   0          53m
kube-controller-manager-kubernetes-master   1/1     Running   0          53m
kube-proxy-6tfvm                            1/1     Running   0          26m
kube-proxy-mgqv2                            1/1     Running   0          26m
kube-proxy-v25vl                            1/1     Running   0          53m
kube-scheduler-kubernetes-master            1/1     Running   0          53m
```

Describe one of the failing calico-node pods:

```
kubectl describe pod calico-node-npjjr -n kube-system
```

The events show:

```
Warning Unhealthy 72s kubelet Readiness probe failed: 2020-12-18 13:55:29.276 [INFO][120] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
Warning Unhealthy 62s kubelet Readiness probe failed: 2020-12-18 13:55:39.278 [INFO][156] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
Warning Unhealthy 52s kubelet Readiness probe failed: 2020-12-18 13:55:49.283 [INFO][189] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
Warning Unhealthy 42s kubelet Readiness probe failed: 2020-12-18 13:55:59.279 [INFO][215] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
Warning Unhealthy 32s kubelet Readiness probe failed: 2020-12-18 13:56:09.280 [INFO][249] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
Warning Unhealthy 22s kubelet Readiness probe failed: 2020-12-18 13:56:19.276 [INFO][276] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
Warning Unhealthy 12s kubelet Readiness probe failed: 2020-12-18 13:56:29.276 [INFO][302] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
Warning Unhealthy 2s kubelet Readiness probe failed: 2020-12-18 13:56:39.272 [INFO][335] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 172.17.0.9,172.17.0.3
```
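
Optionally, the peering failure can be confirmed from a node itself. A minimal check, assuming the calicoctl binary is installed on the host:

```
# Lists this node's BGP peers; a broken mesh shows peers with no Established session
sudo calicoctl node status
```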

Modify the calico.yaml file:

```
# Adjust the NIC discovery mechanism of the Calico network plugin by changing
# the value of IP_AUTODETECTION_METHOD. In the official yaml this setting is
# absent, so it defaults to first-found, which can register the IP of a broken
# interface as the node IP and disrupt the node-to-node mesh. Switch to the
# can-reach or interface strategy (e.g. try to reach a Ready node's IP) so the
# correct IP is selected.

# Add the following two lines to calico.yaml:
- name: IP_AUTODETECTION_METHOD
  value: "interface=ens.*"    # adjust "ens" to the actual NIC name prefix

# Resulting configuration:
- name: CLUSTER_TYPE
  value: "k8s,bgp"
- name: IP_AUTODETECTION_METHOD
  value: "interface=ens.*"
  # or: value: "interface=ens160"
# Auto-detect the BGP IP address.
- name: IP
  value: "autodetect"
# Enable IPIP
- name: CALICO_IPV4POOL_IPIP
  value: "Always"
```
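
After editing, re-apply the manifest and watch the calico-node pods come up. A minimal sketch; the k8s-app=calico-node label matches the official manifest and may need adjusting for customized deployments:

```
kubectl apply -f calico.yaml
# Watch until every calico-node pod reports READY 1/1
kubectl get pod -n kube-system -l k8s-app=calico-node -w
```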

CloudNative/ErrorProcess/强制删除异常资源.md (new file, 12 lines)
@@ -0,0 +1,12 @@

## 1. Force-delete a Pod

```
kubectl delete pod -n <namespace> <podname> --force --grace-period=0
```

## 2. Force-delete a PV

```
kubectl patch pv <pvname> -p '{"metadata":{"finalizers":null}}'
```

## 3. Force-delete a PVC

```
kubectl patch pvc <pvcname> -n <namespace> -p '{"metadata":{"finalizers":null}}'
```
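
To verify a patch took effect, the finalizer list can be read back; a sketch with a hypothetical PVC named my-pvc:

```
# Prints nothing once the finalizers have been cleared
kubectl get pvc my-pvc -n <namespace> -o jsonpath='{.metadata.finalizers}'
```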

CloudNative/ErrorProcess/无法初始化k8s集群.md (new file, 19 lines)
@@ -0,0 +1,19 @@

Initializing the cluster fails with the following error:

```
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: E0526 20:04:52.510582 13459 remote_runtime.go:925] "Status from runtime service failed" err="rpc error: code = Unimplemented desc = unknown service runtime.v1alpha2.RuntimeService"
```

Fix:

```
rm /etc/containerd/config.toml
systemctl restart containerd
```

Or:

```
cat > /etc/containerd/config.toml <<EOF
[plugins."io.containerd.grpc.v1.cri"]
  systemd_cgroup = true
EOF
systemctl restart containerd
```
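
A common fuller variant (an assumption, not part of the original note) regenerates the complete default config and enables the systemd cgroup driver for the runc runtime, matching kubelet's default on systemd hosts:

```
containerd config default > /etc/containerd/config.toml
# Flip SystemdCgroup under the runc runtime options from false to true
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
systemctl restart containerd
```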

CloudNative/ErrorProcess/解决Terminating状态的Pod删不掉的问题.md (new file, 22 lines)
@@ -0,0 +1,22 @@

#### 1. Problem

1. A Pod on the Kubernetes cluster was recently found stuck in the Terminating state:

```
[ec2-user@k8s-master01 ~]$ kubectl get pod -n infra
NAME                       READY   STATUS        RESTARTS     AGE
jenkins-5c54cf5557-nz4l2   1/1     Terminating   2 (8d ago)   14d
```

2. A plain delete does not get rid of it:

```
[ec2-user@k8s-master01 ~]$ kubectl delete pod -n infra jenkins-5c54cf5557-nz4l2
pod "jenkins-5c54cf5557-nz4l2" deleted
```

#### 2. Solution

1. Regardless of how the pod was created, it can be force-deleted with:

```
kubectl delete pods <pod> --grace-period=0 --force
```

2. For the pod above:

```
kubectl delete pod -n infra jenkins-5c54cf5557-nz4l2 --grace-period=0 --force
```
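
If the pod survives even a force delete, it usually still carries a finalizer; clearing it with the same patch pattern used for PVs and PVCs elsewhere in these notes is a common last resort:

```
kubectl patch pod jenkins-5c54cf5557-nz4l2 -n infra -p '{"metadata":{"finalizers":null}}'
```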

CloudNative/ErrorProcess/解决用文件创建cm格式混乱问题.md (new file, 3 lines)
@@ -0,0 +1,3 @@

Strip the stray whitespace that precedes each escaped newline, then re-apply the ConfigMap:

```
kubectl get cm [YOUR CONFIGMAP NAME] -o yaml | sed -E 's/[[:space:]]+\\n/\\n/g' | kubectl apply -f -
```
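
To inspect the cleaned YAML before applying it, drop the trailing kubectl apply; my-config below is a hypothetical ConfigMap name:

```
kubectl get cm my-config -o yaml | sed -E 's/[[:space:]]+\\n/\\n/g'
```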

CloudNative/ErrorProcess/记一次挖矿程序删除处理.md (new file, 103 lines)
@@ -0,0 +1,103 @@

#### Symptom

> **Server CPU and memory usage spiked, and interactive use of the server became sluggish.**

##### Check server load and find the abnormal process PID

```
top
```

Result (long output, unrelated rows removed):

```
top - 15:52:08 up 13 days,  6:21,  3 users,  load average: 3.52, 3.23, 3.04
Tasks: 226 total,   1 running, 225 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.5 us,  0.5 sy, 98.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  8173400 total,    50392 free,  7783940 used,   339068 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   146592 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
25586 root      20   0 2439064 2.289g      4 S 190.1 29.4 120:20.64 server
```

Press "c" to show the full COMMAND column:

```
top - 15:52:23 up 13 days,  6:22,  3 users,  load average: 3.72, 3.28, 3.06
Tasks: 227 total,   1 running, 226 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3.0 us,  1.3 sy, 95.7 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  8173400 total,    47780 free,  7787388 used,   338232 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   142808 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
25586 root      20   0 2439064 2.289g      4 S 186.1 29.4 120:47.58 /opt/server
```

So the process executable is /opt/server and its PID is 25586.

#### Inspect the process executable

/opt/server does not exist on disk, and /proc/25586/ likewise shows its exe target as missing.
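
As a generic follow-up (not part of the original investigation), the exe symlink of a running-but-deleted binary can sometimes still be read and copied out for analysis:

```
ls -l /proc/25586/exe                # target may be shown as "(deleted)"
cp /proc/25586/exe /tmp/sample.bin   # grab a copy for offline analysis, if the link resolves
```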

#### Check for abnormal cron jobs

```
cat /etc/passwd | awk -F: '{print $1}' | xargs -I {} crontab -l -u {}
```

No cron jobs exist on the system.

#### Check /tmp for suspicious files or directories

/tmp looks normal.

#### Find the parent process

```
ps -ef | grep server
```

Result:

```
root     25586  1793 99 14:45 ?        02:08:41 /opt/server
```

The parent PID is 1793.
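
The full ancestry can also be seen in one shot, assuming pstree is installed:

```
# -p prints PIDs, -s shows the parents of the selected process
pstree -ps 25586
```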

#### Inspect the parent process

```
ll /proc/1793/
```

Its exe link points to:

```
lrwxrwxrwx 1 root root 0 Jan 11 15:56 exe -> /bin/busybox*
```

The server itself does not use busybox, but Docker images commonly do.

#### Check the Docker containers

```
docker ps -a
```

Result:

```
CONTAINER ID   IMAGE       COMMAND    CREATED      STATUS                PORTS   NAMES
169486212d4b   zqbxacdsx   "#(nop)"   3 days ago   Up 3 days (healthy)           harbor-jobservice
```

Sure enough, there is a suspicious container, running an image named zqbxacdsx.

#### Find the Docker image ID

```
docker images
```

The image ID is aa05538acecf.
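
The ID can also be pulled directly by repository name, as a sketch:

```
docker images zqbxacdsx --format '{{.ID}}'
```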

#### Inspect how the image was built

```
## aa05538acecf is the image ID
docker history aa05538acecf --no-trunc
```

The image turns out to add just one script, main.sh.

#### Inspect main.sh

Enter the container:

```
## 169486212d4b is the container ID
docker exec -it 169486212d4b /bin/sh
```

View main.sh:

```
cat main.sh
```

As suspected, main.sh is a script that downloads a mining program.

#### Stop the mining container

```
docker stop 169486212d4b
```

#### Remove the mining container

```
docker rm 169486212d4b
```

#### Remove the mining image

```
docker rmi aa05538acecf
```

After observing for a while, the abnormal process did not come back and the server ran smoothly.

With that, the mining malware is cleaned up. Next steps: close the unneeded ports with the firewall and harden the Docker configuration, as sketched below.
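
A minimal hardening sketch. It assumes a firewalld host where the Docker remote API had been exposed on 2375/tcp (a common entry point for cryptominer containers); adjust to the actual exposure:

```
# Close the exposed Docker API port (assumption: it was open on 2375/tcp)
firewall-cmd --permanent --remove-port=2375/tcp
firewall-cmd --reload

# Ensure dockerd only listens on the local unix socket: remove any
# "-H tcp://0.0.0.0:2375" from the service unit or daemon.json, then restart
systemctl daemon-reload
systemctl restart docker
```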