Deploying a Kubernetes Cluster from Binaries, from 0 to 1
@TOC
Introduction: the components of a Kubernetes cluster use TLS certificates to encrypt their communication. This document uses CloudFlare's PKI toolkit, cfssl, to generate the Certificate Authority (CA) and all other certificates.
Managing TLS in the Cluster
Preface
Every Kubernetes cluster has a cluster root certificate authority (CA). Components in the cluster typically use the CA to verify the API server's certificate, the API server uses it to verify kubelet client certificates, and so on. To support this, the CA certificate bundle is distributed to every node in the cluster and is attached to the default service account as a secret. Optionally, your workloads can use this CA to establish trust. Your applications can request certificates through the certificates.k8s.io API, using a protocol similar to the ACME draft.
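As a minimal sketch of that flow (the names my-app, my-app-csr, and the key/CSR file names are only illustrative, not part of this guide's setup), a workload certificate could be requested and approved like this:

```bash
# Generate a key and CSR for the workload (names are illustrative).
openssl req -new -newkey rsa:2048 -nodes -keyout my-app-key.pem \
  -subj "/CN=my-app.default.svc" -out my-app.csr

# Submit it through the certificates.k8s.io API and approve it.
cat <<EOF | kubectl apply -f -
apiVersion: certificates.k8s.io/v1beta1
kind: CertificateSigningRequest
metadata:
  name: my-app-csr
spec:
  request: $(base64 -w0 < my-app.csr)
  usages:
  - digital signature
  - key encipherment
  - server auth
EOF

kubectl certificate approve my-app-csr
# The signed certificate is returned in .status.certificate (base64 encoded).
kubectl get csr my-app-csr -o jsonpath='{.status.certificate}' | base64 -d > my-app.pem
```

This assumes the controller manager is configured with the cluster-signing certificate and key, which is done later in this guide.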
TLS Trust Within the Cluster
Making an application running in a Pod trust the cluster root CA usually requires some extra application configuration: you need to add the CA certificate bundle to the list of CA certificates that your TLS client or server trusts. For example, with a golang TLS configuration you would parse the certificate chain and add the parsed certificates to the RootCAs pool of the tls.Config struct. The CA certificate bundle is automatically mounted into Pods that use the default service account, at /var/run/secrets/kubernetes.io/serviceaccount/ca.crt. If you are not using the default service account, ask the cluster administrator to build a ConfigMap containing the certificate bundle you are entitled to use.
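A quick sketch of using that mounted bundle from inside any Pod that uses the default service account:

```bash
# Run inside the Pod.
SA_DIR=/var/run/secrets/kubernetes.io/serviceaccount
TOKEN=$(cat ${SA_DIR}/token)

# The mounted ca.crt lets the client verify the apiserver certificate
# instead of falling back to curl -k / --insecure.
curl --cacert ${SA_DIR}/ca.crt \
     -H "Authorization: Bearer ${TOKEN}" \
     https://kubernetes.default.svc/version
```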
Cluster Deployment
Environment Planning
Software | Version |
---|---|
Linux OS | CentOS Linux release 7.6.1810 (Core) |
Kubernetes | 1.14.3 |
Docker | 18.06.3-ce |
Etcd | 3.3.13 |
Role | IP | Components | Recommended spec |
---|---|---|---|
k8s-master | 172.16.4.12 | kube-apiserver kube-controller-manager kube-scheduler etcd | 8 cores, 16 GB RAM |
k8s-node1 | 172.16.4.13 | kubelet kube-proxy docker flannel etcd | size according to the number of containers to run |
k8s-node2 | 172.16.4.14 | kubelet kube-proxy docker flannel etcd | size according to the number of containers to run |
Component | Certificates used |
---|---|
etcd | ca.pem, server.pem, server-key.pem |
kube-apiserver | ca.pem, server.pem, server-key.pem |
kubelet | ca.pem, ca-key.pem |
kube-proxy | ca.pem, kube-proxy.pem, kube-proxy-key.pem |
kubectl | ca.pem, admin.pem, admin-key.pem |
kube-controller-manager | ca.pem, ca-key.pem |
flannel | ca.pem, server.pem, server-key.pem |
Environment Preparation
The following steps must be performed on the master node and on every Node:
- Install commonly used utility packages (optional)
```bash
# net-tools provides ping, ifconfig, etc.
yum install -y net-tools
# curl and telnet
yum install -y curl telnet
# vim editor
yum install -y vim
# wget download tool
yum install -y wget
# lrzsz lets you drag-and-drop files in Xshell to upload/download
yum -y install lrzsz
```
- Disable the firewall
```bash
systemctl stop firewalld
systemctl disable firewalld
```
- Disable SELinux
```bash
sed -i 's/enforcing/disabled/' /etc/selinux/config
setenforce 0
# Or edit /etc/selinux/config, set the following and reboot for it to take effect:
# SELINUX=disabled
```
- Disable swap
```bash
swapoff -a        # temporary
vim /etc/fstab    # permanent: comment out the swap entry
```
- Make sure net.bridge.bridge-nf-call-iptables is set to 1 via sysctl:
```bash
cat <<EOF > /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sysctl --system
```
- Add hostname-to-IP mappings (required on both master and node machines)
```bash
vim /etc/hosts
172.16.4.12 k8s-master
172.16.4.13 k8s-node1
172.16.4.14 k8s-node2
```
- Synchronize time
```bash
yum install ntpdate -y
ntpdate ntp.api.bz
```
Kubernetes needs a container runtime (integrated through the Container Runtime Interface, CRI). The officially supported runtimes currently include Docker, containerd, CRI-O, and frakti. This guide uses Docker as the container runtime; the recommended versions are Docker CE 18.06 or 18.09.
- Install Docker
# 为Docker配置阿里云源,注意是在/etc/yum.repos.d目录执行下述命令。 [root@k8s-master yum.repos.d]# wget https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo [root@k8s-master yum.repos.d]# yum install -y yum-utils device-mapper-persistent-data lvm2 [root@k8s-master yum.repos.d]# yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo # update形成缓存,并且列出可用源,发现出现docker-ce源。 [root@k8s-master yum.repos.d]# yum update && yum repolist Loaded plugins: fastestmirror Loading mirror speeds from cached hostfile * base: mirror.lzu.edu.cn * extras: mirrors.nwsuaf.edu.cn * updates: mirror.lzu.edu.cn docker-ce-stable | 3.5 kB 00:00:00 (1/2): docker-ce-stable/x86_64/updateinfo | 55 B 00:00:00 (2/2): docker-ce-stable/x86_64/primary_db | 28 kB 00:00:00 No packages marked for update Loaded plugins: fastestmirror Loading mirror speeds from cached hostfile * base: mirror.lzu.edu.cn * extras: mirrors.nwsuaf.edu.cn * updates: mirror.lzu.edu.cn repo id repo name status base/7/x86_64 CentOS-7 - Base 10,019 docker-ce-stable/x86_64 Docker CE Stable - x86_64 43 extras/7/x86_64 CentOS-7 - Extras 409 updates/7/x86_64 CentOS-7 - Updates 2,076 repolist: 12,547 # 列出可用的docker-ce版本,推荐使用18.06或18.09的稳定版。 yum list docker-ce.x86_64 --showduplicates | sort -r # 正式安装docker,此处以docker-ce-18.06.3.ce-3.el7为例。推荐第2种方式。 yum -y install docker-ce-18.06.3.ce-3.el7 # 在此处可能会报错:Delta RPMs disabled because /usr/bin/applydeltarpm not installed.采用如下命令解决。 yum provides '*/applydeltarpm' yum install deltarpm -y # 然后重新执行安装命令 yum -y install docker-ce-18.06.3.ce-3.el7 # 安装完成设置docker开机自启动。 systemctl enable docker
Note: all of the following steps are performed on the master node (172.16.4.12). The certificates only need to be created once; when you later add new nodes to the cluster, simply copy the certificates under /etc/kubernetes/ to the new node.
Creating TLS Certificates and Keys
- Install cfssl from the binary release
```bash
# Create a directory to hold the certificates
mkdir ssl && cd ssl
# cfssl: generates the certificates
wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
# cfssljson: turns cfssl's JSON output into certificate files
wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
# cfssl-certinfo: displays certificate information
wget https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64
# Make the binaries executable
chmod +x cfssl_linux-amd64 cfssljson_linux-amd64 cfssl-certinfo_linux-amd64
# Move them into /usr/local/bin
mv cfssl_linux-amd64 /usr/local/bin/cfssl
mv cfssljson_linux-amd64 /usr/local/bin/cfssljson
mv cfssl-certinfo_linux-amd64 /usr/local/bin/cfssl-certinfo
# For non-root users you may need to adjust PATH
export PATH=/usr/local/bin:$PATH
```
Creating the CA (Certificate Authority)
Note: the following commands are still executed in the /root/ssl directory.
- Create the CA configuration file
```bash
# Generate a default config template
cfssl print-defaults config > config.json
# Generate a default CSR template
cfssl print-defaults csr > csr.json
# Based on the config.json format, create ca-config.json as follows;
# the expiry is set to 87600h (10 years)
cat > ca-config.json <<EOF
{
  "signing": {
    "default": {
      "expiry": "87600h"
    },
    "profiles": {
      "kubernetes": {
        "expiry": "87600h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ]
      }
    }
  }
}
EOF
```
Field descriptions
- `ca-config.json`: can define multiple profiles, each with its own expiry, usage scenarios, and other parameters; a specific profile is selected later when signing certificates;
- `signing`: the certificate can be used to sign other certificates; the generated ca.pem carries `CA=TRUE`;
- `server auth`: a client may use this CA to verify certificates presented by servers;
- `client auth`: a server may use this CA to verify certificates presented by clients;
- Create the CA certificate signing request
```bash
# Create ca-csr.json with the following content
cat > ca-csr.json <<EOF
{
  "CN": "kubernetes",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "Beijing",
      "ST": "Beijing",
      "O": "k8s",
      "OU": "System"
    }
  ],
  "ca": {
    "expiry": "87600h"
  }
}
EOF
```
- "CN":
Common Name
,kube-apiserver 从证书中提取该字段作为请求的用户名 (User Name);浏览器使用该字段验证网站是否合法; - "O":
Organization
,kube-apiserver 从证书中提取该字段作为请求用户所属的组 (Group);
- Generate the CA certificate and private key
```bash
[root@k8s-master ~]# cfssl gencert -initca ca-csr.json | cfssljson -bare ca
2019/06/12 11:08:53 [INFO] generating a new CA key and certificate from CSR
2019/06/12 11:08:53 [INFO] generate received request
2019/06/12 11:08:53 [INFO] received CSR
2019/06/12 11:08:53 [INFO] generating key: rsa-2048
2019/06/12 11:08:53 [INFO] encoded CSR
2019/06/12 11:08:53 [INFO] signed certificate with serial number 708489059891717538616716772053407287945320812263
# The working directory should now contain the following files.
[root@k8s-master ssl]# ls
ca-config.json  ca.csr  ca-csr.json  ca-key.pem  ca.pem
```
- Create the Kubernetes server certificate
Create the Kubernetes certificate signing request file server-csr.json (sometimes named kubernetes-csr.json) and add the trusted IPs to the hosts field. In this example the three node IPs are 172.16.4.12, 172.16.4.13, and 172.16.4.14.
```bash
$ cat > server-csr.json <<EOF
{
  "CN": "kubernetes",
  "hosts": [
    "127.0.0.1",
    "172.16.4.12",
    "172.16.4.13",
    "172.16.4.14",
    "10.10.10.1",
    "kubernetes",
    "kubernetes.default",
    "kubernetes.default.svc",
    "kubernetes.default.svc.cluster",
    "kubernetes.default.svc.cluster.local"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "BeiJing",
      "ST": "BeiJing",
      "O": "k8s",
      "OU": "System"
    }
  ]
}
EOF
# Generate the Kubernetes certificate and private key
[root@k8s-master ssl]# cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes server-csr.json | cfssljson -bare server
2019/06/12 12:00:45 [INFO] generate received request
2019/06/12 12:00:45 [INFO] received CSR
2019/06/12 12:00:45 [INFO] generating key: rsa-2048
2019/06/12 12:00:45 [INFO] encoded CSR
2019/06/12 12:00:45 [INFO] signed certificate with serial number 276381852717263457656057670704331293435930586226
2019/06/12 12:00:45 [WARNING] This certificate lacks a "hosts" field. This makes it unsuitable for websites. For more information see the Baseline Requirements for the Issuance and Management of Publicly-Trusted Certificates, v.1.1.6, from the CA/Browser Forum (https://cabforum.org); specifically, section 10.2.3 ("Information Requirements").
# Check the generated server.pem and server-key.pem
[root@k8s-master ssl]# ls server*
server.csr  server-csr.json  server-key.pem  server.pem
```
- If the hosts field is not empty, it must list every IP or domain name authorized to use the certificate. Because this certificate is later used by both the etcd cluster and the kubernetes master, the hosts above include the etcd cluster hosts, the kubernetes master hosts, and the kubernetes service IP (normally the first IP of the `service-cluster-ip-range` configured on kube-apiserver, e.g. 10.10.10.1).
- This is a minimal kubernetes installation: a private image registry plus a three-node kubernetes cluster. The physical node IPs above may also be replaced with hostnames.
- Create the admin certificate
Create the admin certificate signing request file admin-csr.json:
cat > admin-csr.json <<EOF { "CN": "admin", "hosts": [], "key": { "algo": "rsa", "size": 2048 }, "names": [ { "C": "CN", "L": "BeiJing", "ST": "BeiJing", "O": "system:masters", "OU": "System" } ] } EOF
- kube-apiserver later uses `RBAC` to authorize requests from clients such as `kubelet`, `kube-proxy`, and Pods.
- kube-apiserver predefines a number of RoleBindings used by RBAC; for example `cluster-admin` binds the Group `system:masters` to the Role `cluster-admin`, which grants permission to call every kube-apiserver API.
- O sets this certificate's Group to `system:masters`. When kubectl uses the certificate to access kube-apiserver, authentication succeeds because the certificate is signed by the CA, and because the certificate's group is the pre-authorized `system:masters`, it is granted access to all APIs.
Note: this admin certificate is used later to generate the administrator's kubeconfig file. RBAC is now the recommended way to control roles and permissions in kubernetes; kubernetes takes the certificate's CN field as the User and the O field as the Group (see the "X509 Client Certs" section of the Kubernetes users and authentication documentation).
Generate the admin certificate and private key
[root@k8s-master ssl]# cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes admin-csr.json | cfssljson -bare admin 2019/06/12 14:52:32 [INFO] generate received request 2019/06/12 14:52:32 [INFO] received CSR 2019/06/12 14:52:32 [INFO] generating key: rsa-2048 2019/06/12 14:52:33 [INFO] encoded CSR 2019/06/12 14:52:33 [INFO] signed certificate with serial number 491769057064087302830652582150890184354925110925 2019/06/12 14:52:33 [WARNING] This certificate lacks a "hosts" field. This makes it unsuitable for websites. For more information see the Baseline Requirements for the Issuance and Management of Publicly-Trusted Certificates, v.1.1.6, from the CA/Browser Forum (https://cabforum.org); specifically, section 10.2.3 ("Information Requirements"). #查看生成的证书和私钥 [root@k8s-master ssl]# ls admin* admin.csr admin-csr.json admin-key.pem admin.pem
- Create the kube-proxy certificate
Create the kube-proxy certificate signing request file kube-proxy-csr.json so that kube-proxy can present a certificate when accessing the cluster:
cat > kube-proxy-csr.json <<EOF { "CN": "system:kube-proxy", "hosts": [], "key": { "algo": "rsa", "size": 2048 }, "names": [ { "C": "CN", "L": "BeiJing", "ST": "BeiJing", "O": "k8s", "OU": "System" } ] } EOF
- CN sets this certificate's User to `system:kube-proxy`;
- the kube-apiserver predefined RoleBinding `system:node-proxier` binds User `system:kube-proxy` to Role `system:node-proxier`, which grants permission to call the kube-apiserver Proxy-related APIs (see the check after this list);
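Once the control plane set up later in this guide is running, you can confirm that predefined binding exists:

```bash
# Show the default ClusterRoleBinding that authorizes kube-proxy.
kubectl describe clusterrolebinding system:node-proxier
# The Role should be ClusterRole/system:node-proxier and the subject
# should include User system:kube-proxy.
```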
Generate the kube-proxy client certificate and private key
[root@k8s-master ssl]# cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes kube-proxy-csr.json | cfssljson -bare kube-proxy && ls kube-proxy* 2019/06/12 14:58:09 [INFO] generate received request 2019/06/12 14:58:09 [INFO] received CSR 2019/06/12 14:58:09 [INFO] generating key: rsa-2048 2019/06/12 14:58:09 [INFO] encoded CSR 2019/06/12 14:58:09 [INFO] signed certificate with serial number 175491367066700423717230199623384101585104107636 2019/06/12 14:58:09 [WARNING] This certificate lacks a "hosts" field. This makes it unsuitable for websites. For more information see the Baseline Requirements for the Issuance and Management of Publicly-Trusted Certificates, v.1.1.6, from the CA/Browser Forum (https://cabforum.org); specifically, section 10.2.3 ("Information Requirements"). kube-proxy.csr kube-proxy-csr.json kube-proxy-key.pem kube-proxy.pem
- Verify the certificates
Taking the server certificate as an example.
Using openssl
[root@k8s-master ssl]# openssl x509 -noout -text -in server.pem ...... Signature Algorithm: sha256WithRSAEncryption Issuer: C=CN, ST=Beijing, L=Beijing, O=k8s, OU=System, CN=kubernetes Validity Not Before: Jun 12 03:56:00 2019 GMT Not After : Jun 9 03:56:00 2029 GMT Subject: C=CN, ST=BeiJing, L=BeiJing, O=k8s, OU=System, CN=kubernetes ...... X509v3 extensions: X509v3 Key Usage: critical Digital Signature, Key Encipherment X509v3 Extended Key Usage: TLS Web Server Authentication, TLS Web Client Authentication X509v3 Basic Constraints: critical CA:FALSE X509v3 Subject Key Identifier: E9:99:37:41:CC:E9:BA:9A:9F:E6:DE:4A:3E:9F:8B:26:F7:4E:8F:4F X509v3 Authority Key Identifier: keyid:CB:97:D5:C3:5F:8A:EB:B5:A8:9D:39:DE:5F:4F:E0:10:8E:4C:DE:A2 X509v3 Subject Alternative Name: DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster, DNS:kubernetes.default.svc.cluster.local, IP Address:127.0.0.1, IP Address:172.16.4.12, IP Address:172.16.4.13, IP Address:172.16.4.14, IP Address:10.10.10.1 ......
- Confirm that the `Issuer` fields match ca-csr.json;
- Confirm that the `Subject` fields match server-csr.json;
- Confirm that `X509v3 Subject Alternative Name` matches server-csr.json;
- Confirm that `X509v3 Key Usage` and `Extended Key Usage` match the `kubernetes` profile in ca-config.json; you can also run the chain check below.
Using cfssl-certinfo
[root@k8s-master ssl]# cfssl-certinfo -cert server.pem { "subject": { "common_name": "kubernetes", "country": "CN", "organization": "k8s", "organizational_unit": "System", "locality": "BeiJing", "province": "BeiJing", "names": [ "CN", "BeiJing", "BeiJing", "k8s", "System", "kubernetes" ] }, "issuer": { "common_name": "kubernetes", "country": "CN", "organization": "k8s", "organizational_unit": "System", "locality": "Beijing", "province": "Beijing", "names": [ "CN", "Beijing", "Beijing", "k8s", "System", "kubernetes" ] }, "serial_number": "276381852717263457656057670704331293435930586226", "sans": [ "kubernetes", "kubernetes.default", "kubernetes.default.svc", "kubernetes.default.svc.cluster", "kubernetes.default.svc.cluster.local", "127.0.0.1", "172.16.4.12", "172.16.4.13", "172.16.4.14", "10.10.10.1" ], "not_before": "2019-06-12T03:56:00Z", "not_after": "2029-06-09T03:56:00Z", "sigalg": "SHA256WithRSA", ...... }
- Distribute the certificates
Copy the generated certificates and keys (the .pem files) to /etc/kubernetes/ssl on every machine for later use:
```bash
[root@k8s-master ssl]# mkdir -p /etc/kubernetes/ssl
[root@k8s-master ssl]# cp *.pem /etc/kubernetes/ssl
[root@k8s-master ssl]# ls /etc/kubernetes/ssl/
admin-key.pem  admin.pem  ca-key.pem  ca.pem  kube-proxy-key.pem  kube-proxy.pem  server-key.pem  server.pem
# Keep the pem files and delete the rest (optional)
ls | grep -v pem | xargs -i rm {}
```
Creating kubeconfig Files
Run the following commands on the master node. Unless a directory is specified, they run in the user's home directory (/root for the root user).
Download kubectl
Make sure to download the package matching your Kubernetes version.
```bash
# If the site below is unreachable, download from the Baidu Cloud mirror instead.
wget https://dl.k8s.io/v1.14.3/kubernetes-client-linux-amd64.tar.gz
tar -xzvf kubernetes-client-linux-amd64.tar.gz
cp kubernetes/client/bin/kube* /usr/bin/
chmod a+x /usr/bin/kube*
```
Create the kubectl kubeconfig file
```bash
# 172.16.4.12 is the master's IP; change it to match your environment.
# The kubeconfig needs the https endpoint of the kubernetes API.
export KUBE_APISERVER="https://172.16.4.12:6443"
# Set cluster parameters
kubectl config set-cluster kubernetes \
  --certificate-authority=/etc/kubernetes/ssl/ca.pem \
  --embed-certs=true \
  --server=${KUBE_APISERVER}
# Set client credentials
kubectl config set-credentials admin \
  --client-certificate=/etc/kubernetes/ssl/admin.pem \
  --embed-certs=true \
  --client-key=/etc/kubernetes/ssl/admin-key.pem
# Set the context
kubectl config set-context kubernetes \
  --cluster=kubernetes \
  --user=admin
# Use it as the default context
kubectl config use-context kubernetes
```
- The admin.pem certificate's O field is `system:masters`; the kube-apiserver predefined RoleBinding `cluster-admin` binds Group `system:masters` to Role `cluster-admin`, which grants permission to call the kube-apiserver APIs;
- The generated kubeconfig is saved to `~/.kube/config`;

Note: the `~/.kube/config` file grants the highest level of access to the cluster; keep it safe.
Processes running on the Node machines, such as `kubelet` and `kube-proxy`, must authenticate and be authorized when they talk to the `kube-apiserver` process on the master.
The following steps only need to be performed on the master node; the generated `*.kubeconfig` files can simply be copied to /etc/kubernetes on the node machines.
Create the TLS bootstrapping token
Token auth file
The token can be any 128-bit string; generate it with a secure random number generator.
```bash
export BOOTSTRAP_TOKEN=$(head -c 16 /dev/urandom | od -An -t x | tr -d ' ')
cat > token.csv <<EOF
${BOOTSTRAP_TOKEN},kubelet-bootstrap,10001,"system:kubelet-bootstrap"
EOF
```
The last three lines above form a single command; just copy and run the snippet as-is.
Note: before continuing, check token.csv and make sure the ${BOOTSTRAP_TOKEN} variable has been replaced with its real value.
BOOTSTRAP_TOKEN is written both into the token.csv file used by kube-apiserver and into the bootstrap.kubeconfig file used by kubelet. If you later regenerate BOOTSTRAP_TOKEN, you must:
- update token.csv and distribute it to /etc/kubernetes/ on all machines (master and nodes; distributing it to the nodes is not strictly required);
- regenerate bootstrap.kubeconfig and distribute it to /etc/kubernetes/ on all node machines;
- restart the kube-apiserver and kubelet processes;
- re-approve the kubelets' CSR requests;
cp token.csv /etc/kubernetes/
Create the kubelet bootstrapping kubeconfig file
kubectl must be installed before running the commands below.
```bash
# Optionally install kubectl bash completion first.
yum install -y bash-completion
source /usr/share/bash-completion/bash_completion
source <(kubectl completion bash)
# Work in /etc/kubernetes.
cd /etc/kubernetes
export KUBE_APISERVER="https://172.16.4.12:6443"
# Set cluster parameters
kubectl config set-cluster kubernetes \
  --certificate-authority=/etc/kubernetes/ssl/ca.pem \
  --embed-certs=true \
  --server=${KUBE_APISERVER} \
  --kubeconfig=bootstrap.kubeconfig
# Set client credentials
kubectl config set-credentials kubelet-bootstrap \
  --token=${BOOTSTRAP_TOKEN} \
  --kubeconfig=bootstrap.kubeconfig
# Set the context
kubectl config set-context default \
  --cluster=kubernetes \
  --user=kubelet-bootstrap \
  --kubeconfig=bootstrap.kubeconfig
# Use it as the default context
kubectl config use-context default --kubeconfig=bootstrap.kubeconfig
```
- With `--embed-certs=true`, the `certificate-authority` certificate is embedded into the generated bootstrap.kubeconfig file (you can confirm this with the command below);
- No key or certificate is specified in the client credentials here; they are generated automatically later through kube-apiserver's TLS bootstrapping flow;
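As an optional sanity check, you can view the generated file and confirm that the embedded CA data and the bootstrap token are present (kubectl redacts the certificate data in this view):

```bash
kubectl config view --kubeconfig=bootstrap.kubeconfig
```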
Create the kube-proxy kubeconfig file
```bash
export KUBE_APISERVER="https://172.16.4.12:6443"
# Set cluster parameters
kubectl config set-cluster kubernetes \
  --certificate-authority=/etc/kubernetes/ssl/ca.pem \
  --embed-certs=true \
  --server=${KUBE_APISERVER} \
  --kubeconfig=kube-proxy.kubeconfig
# Set client credentials
kubectl config set-credentials kube-proxy \
  --client-certificate=/etc/kubernetes/ssl/kube-proxy.pem \
  --client-key=/etc/kubernetes/ssl/kube-proxy-key.pem \
  --embed-certs=true \
  --kubeconfig=kube-proxy.kubeconfig
# Set the context
kubectl config set-context default \
  --cluster=kubernetes \
  --user=kube-proxy \
  --kubeconfig=kube-proxy.kubeconfig
# Use it as the default context
kubectl config use-context default --kubeconfig=kube-proxy.kubeconfig
```
- `--embed-certs=true` is used for both the cluster and the client credentials, which writes the contents of the `certificate-authority`, `client-certificate`, and `client-key` files into the generated kube-proxy.kubeconfig;
- the CN of kube-proxy.pem is `system:kube-proxy`; the kube-apiserver predefined RoleBinding `system:node-proxier` binds User `system:kube-proxy` to Role `system:node-proxier`, which grants permission to call the kube-apiserver Proxy-related APIs;
Distribute the kubeconfig files
Distribute the two kubeconfig files to /etc/kubernetes/ on every Node machine:
```bash
# Set up passwordless SSH to the other nodes. Generate a key pair first
# (press Enter three times).
ssh-keygen
# Check the generated key pair
ls /root/.ssh/
id_rsa  id_rsa.pub
# Copy the public key to node1 (enter that node's password when prompted);
# do the same for node2.
ssh-copy-id root@172.16.4.13
# Copy the kubeconfig files to /etc/kubernetes on the nodes
# (create that directory on the nodes beforehand).
scp bootstrap.kubeconfig kube-proxy.kubeconfig root@172.16.4.13:/etc/kubernetes
scp bootstrap.kubeconfig kube-proxy.kubeconfig root@172.16.4.14:/etc/kubernetes
```
Creating the Etcd HA Cluster
etcd is the primary datastore of the kubernetes cluster and must be installed and started before the other kubernetes services; kubernetes stores all of its data in etcd. This section describes deploying a three-node highly available etcd cluster. The three nodes reuse the kubernetes machines and are named `k8s-master`, `k8s-node1`, and `k8s-node2`:
Role | IP |
---|---|
k8s-master | 172.16.4.12 |
k8s-node1 | 172.16.4.13 |
k8s-node2 | 172.16.4.14 |
TLS certificates
The etcd cluster needs TLS certificates for encrypted communication; here we reuse the kubernetes certificates created earlier:
```bash
# Copy ca.pem, server-key.pem and server.pem from /root/ssl to /etc/kubernetes/ssl
cp ca.pem server-key.pem server.pem /etc/kubernetes/ssl
```
- the `hosts` field of the kubernetes certificate must contain the IPs of the three machines above, otherwise certificate validation will fail later (a quick check is shown below);
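A quick way to confirm that all three node IPs are present in the certificate's Subject Alternative Names:

```bash
# Print the SAN extension of the reused kubernetes certificate.
openssl x509 -noout -text -in /etc/kubernetes/ssl/server.pem | grep -A1 "Subject Alternative Name"
```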
Download the binaries
Binary download: at the time of writing the latest release is etcd-v3.3.13; you can download the latest binaries from https://github.com/coreos/etcd/releases.
```bash
wget https://github.com/etcd-io/etcd/releases/download/v3.3.13/etcd-v3.3.13-linux-amd64.tar.gz
tar zxvf etcd-v3.3.13-linux-amd64.tar.gz
mv etcd-v3.3.13-linux-amd64/etcd* /usr/local/bin
```
Or install directly with yum:
yum install etcd
Note: with a yum install the etcd binary ends up in /usr/bin, so change the ExecStart path in the etcd.service file below to /usr/bin/etcd.
Create the etcd data directory
mkdir -p /var/lib/etcd/default.etcd
Create the etcd systemd unit file
Create etcd.service under /usr/lib/systemd/system/ with the following content. Remember to replace the IP addresses with your own etcd hosts.
```
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos

[Service]
Type=notify
WorkingDirectory=/var/lib/etcd
EnvironmentFile=-/etc/etcd/etcd.conf
ExecStart=/usr/local/bin/etcd \
  --name ${ETCD_NAME} \
  --cert-file=/etc/kubernetes/ssl/server.pem \
  --key-file=/etc/kubernetes/ssl/server-key.pem \
  --peer-cert-file=/etc/kubernetes/ssl/server.pem \
  --peer-key-file=/etc/kubernetes/ssl/server-key.pem \
  --trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --peer-trusted-ca-file=/etc/kubernetes/ssl/ca.pem \
  --initial-advertise-peer-urls ${ETCD_INITIAL_ADVERTISE_PEER_URLS} \
  --listen-peer-urls=${ETCD_LISTEN_PEER_URLS} \
  --listen-client-urls=${ETCD_LISTEN_CLIENT_URLS},http://127.0.0.1:2379 \
  --advertise-client-urls=${ETCD_ADVERTISE_CLIENT_URLS} \
  --initial-cluster-token=${ETCD_INITIAL_CLUSTER_TOKEN} \
  --initial-cluster etcd-master=https://172.16.4.12:2380,etcd-node1=https://172.16.4.13:2380,etcd-node2=https://172.16.4.14:2380 \
  --initial-cluster-state=new \
  --data-dir=${ETCD_DATA_DIR}
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
```
- etcd's working directory and data directory are both `/var/lib/etcd`; create this directory before starting the service, otherwise startup fails with "Failed at step CHDIR spawning /usr/bin/etcd: No such file or directory";
- to secure communication, specify etcd's own key pair (cert-file and key-file), the peer key pair and CA certificate (peer-cert-file, peer-key-file, peer-trusted-ca-file), and the client CA certificate (trusted-ca-file);
- the `hosts` field of the server-csr.json used to create `server.pem` must contain the IPs of all etcd nodes, otherwise certificate validation fails;
- when `--initial-cluster-state` is `new`, the value of `--name` must appear in the `--initial-cluster` list;
Create the etcd environment file /etc/etcd/etcd.conf
```bash
mkdir -p /etc/etcd
touch /etc/etcd/etcd.conf
```
with the following content:
```
# [member]
ETCD_NAME=etcd-master
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
ETCD_LISTEN_PEER_URLS="https://172.16.4.12:2380"
ETCD_LISTEN_CLIENT_URLS="https://172.16.4.12:2379"
#[cluster]
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://172.16.4.12:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_ADVERTISE_CLIENT_URLS="https://172.16.4.12:2379"
```
This is the configuration for node 172.16.4.12. For the other two etcd nodes, change the IP addresses to each node's own IP and set ETCD_NAME to etcd-node1 and etcd-node2 respectively.
Deploy etcd on the node machines
```bash
# 1. Copy the TLS files from the master to each node. Create /etc/kubernetes/ssl
#    on the nodes beforehand.
scp /etc/kubernetes/ssl/*.pem root@172.16.4.13:/etc/kubernetes/ssl/
scp /etc/kubernetes/ssl/*.pem root@172.16.4.14:/etc/kubernetes/ssl/
# 2. Copy the etcd and etcdctl binaries from the master to each node.
scp /usr/local/bin/etcd* root@172.16.4.13:/usr/local/bin/
scp /usr/local/bin/etcd* root@172.16.4.14:/usr/local/bin/
# 3. Copy the etcd configuration file to each node. Create /etc/etcd on the
#    nodes beforehand.
scp /etc/etcd/etcd.conf root@172.16.4.13:/etc/etcd/
scp /etc/etcd/etcd.conf root@172.16.4.14:/etc/etcd/
# 4. Adjust /etc/etcd/etcd.conf on each node. Example for k8s-node1 (172.16.4.13):
# [member]
ETCD_NAME=etcd-node1
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="https://172.16.4.13:2380"
ETCD_LISTEN_CLIENT_URLS="https://172.16.4.13:2379"
#[cluster]
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://172.16.4.13:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_ADVERTISE_CLIENT_URLS="https://172.16.4.13:2379"
# Only ETCD_NAME and the IPs need to change; adjust node2's file the same way.
# 5. Copy the etcd.service unit file to each node.
scp /usr/lib/systemd/system/etcd.service root@172.16.4.13:/usr/lib/systemd/system/
scp /usr/lib/systemd/system/etcd.service root@172.16.4.14:/usr/lib/systemd/system/
```
Start the service
```bash
systemctl daemon-reload
systemctl enable etcd
systemctl start etcd
systemctl status etcd
```
Repeat the steps above on every etcd node until the etcd service is running on all machines.
Note: if the logs show connection errors, make sure ports 2379 and 2380 are open in the firewall on every node. On CentOS 7, for example:
```bash
firewall-cmd --zone=public --add-port=2380/tcp --permanent
firewall-cmd --zone=public --add-port=2379/tcp --permanent
firewall-cmd --reload
```
Verify the service
Run the following command on any of the etcd machines:
[root@k8s-master ~]# etcdctl \ > --ca-file=/etc/kubernetes/ssl/ca.pem \ > --cert-file=/etc/kubernetes/ssl/server.pem \ > --key-file=/etc/kubernetes/ssl/server-key.pem \ > cluster-health member 287080ba42f94faf is healthy: got healthy result from https://172.16.4.13:2379 member 47e558f4adb3f7b4 is healthy: got healthy result from https://172.16.4.12:2379 member e531bd3c75e44025 is healthy: got healthy result from https://172.16.4.14:2379 cluster is healthy
If the last line reads `cluster is healthy`, the cluster is working correctly.
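You can also list the members with the same TLS flags, which shows each member's peer/client URLs and which one is currently the leader:

```bash
etcdctl --ca-file=/etc/kubernetes/ssl/ca.pem \
        --cert-file=/etc/kubernetes/ssl/server.pem \
        --key-file=/etc/kubernetes/ssl/server-key.pem \
        member list
```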
Deploying the Master Node
The kubernetes master node runs the following components:
- kube-apiserver
- kube-scheduler
- kube-controller-manager
For now these three components are deployed on the same machine.
- `kube-scheduler`, `kube-controller-manager`, and `kube-apiserver` are tightly coupled in functionality;
- only one `kube-scheduler` and one `kube-controller-manager` process can be active at a time; if multiple instances run, a leader must be chosen through leader election (see the check after this list);
TLS certificate files
The .pem certificates below were created in the "Creating TLS Certificates and Keys" step, and token.csv was created in the "Creating kubeconfig Files" step. Double-check that they are in place:
[root@k8s-master ~]# ls /etc/kubernetes/ssl/ admin-key.pem admin.pem ca-key.pem ca.pem kube-proxy-key.pem kube-proxy.pem server-key.pem server.pem
Download the latest binaries
Download the `client` or `server` tarball linked from https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG.md. The `server` tarball kubernetes-server-linux-amd64.tar.gz already contains the client (`kubectl`) binary, so there is no need to download kubernetes-client-linux-amd64.tar.gz separately.
```bash
wget https://dl.k8s.io/v1.14.3/kubernetes-server-linux-amd64.tar.gz
# If the official site is unreachable, use the Baidu Cloud mirror instead:
# https://pan.baidu.com/s/1G6e981Q48mMVWD9Ho_j-7Q  (extraction code: uvc1)
tar -xzvf kubernetes-server-linux-amd64.tar.gz
cd kubernetes
tar -xzvf kubernetes-src.tar.gz
```
Copy the binaries to the target path
[root@k8s-master kubernetes]# cp -r server/bin/{kube-apiserver,kube-controller-manager,kube-scheduler,kubectl,kube-proxy,kubelet} /usr/local/bin/
Configure and start kube-apiserver
(1) Create the kube-apiserver service unit file
The service unit file /usr/lib/systemd/system/kube-apiserver.service:
[Unit] Description=Kubernetes API Service Documentation=https://github.com/GoogleCloudPlatform/kubernetes After=network.target After=etcd.service [Service] EnvironmentFile=-/etc/kubernetes/config EnvironmentFile=-/etc/kubernetes/apiserver ExecStart=/usr/local/bin/kube-apiserver \ $KUBE_LOGTOSTDERR \ $KUBE_LOG_LEVEL \ $KUBE_ETCD_SERVERS \ $KUBE_API_ADDRESS \ $KUBE_API_PORT \ $KUBELET_PORT \ $KUBE_ALLOW_PRIV \ $KUBE_SERVICE_ADDRESSES \ $KUBE_ADMISSION_CONTROL \ $KUBE_API_ARGS Restart=on-failure Type=notify LimitNOFILE=65536 [Install] WantedBy=multi-user.target
(2) Create /etc/kubernetes/config with the following content:
### # kubernetes system config # # The following values are used to configure various aspects of all # kubernetes services, including # # kube-apiserver.service # kube-controller-manager.service # kube-scheduler.service # kubelet.service # kube-proxy.service # logging to stderr means we get it in the systemd journal KUBE_LOGTOSTDERR="--logtostderr=true" # journal message level, 0 is debug KUBE_LOG_LEVEL="--v=0" # Should this cluster be allowed to run privileged docker containers KUBE_ALLOW_PRIV="--allow-privileged=true" # How the controller-manager, scheduler, and proxy find the apiserver KUBE_MASTER="--master=http://172.16.4.12:8080"
This file is shared by kube-apiserver, kube-controller-manager, kube-scheduler, kubelet, and kube-proxy.
The apiserver configuration file /etc/kubernetes/apiserver:
```
###
### kubernetes system config
###
### The following values are used to configure the kube-apiserver
###

### The address on the local server to listen to.
KUBE_API_ADDRESS="--advertise-address=172.16.4.12 --bind-address=172.16.4.12 --insecure-bind-address=172.16.4.12"

### The port on the local server to listen on.
##KUBE_API_PORT="--port=8080"

### Port minions listen on
##KUBELET_PORT="--kubelet-port=10250"

### Comma separated list of nodes in the etcd cluster
KUBE_ETCD_SERVERS="--etcd-servers=https://172.16.4.12:2379,https://172.16.4.13:2379,https://172.16.4.14:2379"

### Address range to use for services
KUBE_SERVICE_ADDRESSES="--service-cluster-ip-range=10.10.10.0/24"

### default admission control policies
KUBE_ADMISSION_CONTROL="--admission-control=ServiceAccount,NamespaceLifecycle,NamespaceExists,LimitRanger,ResourceQuota"

### Add your own!
KUBE_API_ARGS="--authorization-mode=RBAC \
  --runtime-config=rbac.authorization.k8s.io/v1beta1 \
  --kubelet-https=true \
  --enable-bootstrap-token-auth \
  --token-auth-file=/etc/kubernetes/token.csv \
  --service-node-port-range=30000-50000 \
  --tls-cert-file=/etc/kubernetes/ssl/server.pem \
  --tls-private-key-file=/etc/kubernetes/ssl/server-key.pem \
  --client-ca-file=/etc/kubernetes/ssl/ca.pem \
  --service-account-key-file=/etc/kubernetes/ssl/ca-key.pem \
  --etcd-cafile=/etc/kubernetes/ssl/ca.pem \
  --etcd-certfile=/etc/kubernetes/ssl/server.pem \
  --etcd-keyfile=/etc/kubernetes/ssl/server-key.pem \
  --enable-swagger-ui=true \
  --apiserver-count=3 \
  --audit-log-maxage=30 \
  --audit-log-maxbackup=3 \
  --audit-log-maxsize=100 \
  --audit-log-path=/var/lib/audit.log \
  --event-ttl=1h"
```
- If you change `--service-cluster-ip-range` later, you must delete the `kubernetes` service in the default namespace (`kubectl delete service kubernetes`); the system then recreates it with an IP from the new range. Otherwise the apiserver logs the error "the cluster IP x.x.x.x for service kubernetes/default is not within the service CIDR x.x.x.x/24; please recreate";
- `--authorization-mode=RBAC` enables RBAC authorization on the secure port and rejects unauthorized requests;
- kube-scheduler and kube-controller-manager are normally deployed on the same machine as kube-apiserver and talk to it over the insecure port;
- kubelet, kube-proxy, and kubectl are deployed on other nodes; when they access kube-apiserver over the secure port they must first authenticate with TLS certificates and then be authorized by RBAC;
- kube-proxy and kubectl obtain RBAC authorization through the User and Group embedded in the certificates they present;
- if you use the kubelet TLS bootstrap mechanism, do not set `--kubelet-certificate-authority`, `--kubelet-client-certificate`, or `--kubelet-client-key`, otherwise kube-apiserver later fails to validate kubelet certificates with "x509: certificate signed by unknown authority";
- `--admission-control` must include `ServiceAccount`;
- `--bind-address` must not be `127.0.0.1`;
- `runtime-config` is set to `rbac.authorization.k8s.io/v1beta1`, the apiVersion enabled at runtime;
- `--service-cluster-ip-range` specifies the Service cluster IP range; this range must not be routable;
- by default kubernetes objects are stored under the etcd /registry prefix, which can be changed with `--etcd-prefix`;
- to expose an unauthenticated HTTP endpoint, add `--insecure-port=8080 --insecure-bind-address=127.0.0.1`; in production never bind it to anything other than 127.0.0.1.
Note: see the full kube-apiserver.service unit; adjust the parameters to suit your own cluster.
(3) Start kube-apiserver
```bash
systemctl daemon-reload
systemctl enable kube-apiserver
systemctl start kube-apiserver
systemctl status kube-apiserver
```
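Once kube-apiserver is up, a quick health check over the secure port is possible using the admin client certificate generated earlier (an optional sketch, not a required step):

```bash
# /healthz should return "ok" when the apiserver is healthy.
curl --cacert /etc/kubernetes/ssl/ca.pem \
     --cert /etc/kubernetes/ssl/admin.pem \
     --key /etc/kubernetes/ssl/admin-key.pem \
     https://172.16.4.12:6443/healthz
```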
Configure and start kube-controller-manager
(1) Create the kube-controller-manager service unit file
File path: /usr/lib/systemd/system/kube-controller-manager.service
[Unit] Description=Kubernetes Controller Manager Documentation=https://github.com/GoogleCloudPlatform/kubernetes [Service] EnvironmentFile=-/etc/kubernetes/config EnvironmentFile=-/etc/kubernetes/controller-manager ExecStart=/usr/local/bin/kube-controller-manager \ $KUBE_LOGTOSTDERR \ $KUBE_LOG_LEVEL \ $KUBE_MASTER \ $KUBE_CONTROLLER_MANAGER_ARGS Restart=on-failure LimitNOFILE=65536 [Install] WantedBy=multi-user.target
(2) The configuration file /etc/kubernetes/controller-manager
```
###
# The following values are used to configure the kubernetes controller-manager
# defaults from config and apiserver should be adequate
# Add your own!
KUBE_CONTROLLER_MANAGER_ARGS="--address=127.0.0.1 \
  --service-cluster-ip-range=10.10.10.0/24 \
  --cluster-name=kubernetes \
  --cluster-signing-cert-file=/etc/kubernetes/ssl/ca.pem \
  --cluster-signing-key-file=/etc/kubernetes/ssl/ca-key.pem \
  --service-account-private-key-file=/etc/kubernetes/ssl/ca-key.pem \
  --root-ca-file=/etc/kubernetes/ssl/ca.pem \
  --leader-elect=true"
```
- `--service-cluster-ip-range` specifies the Service CIDR of the cluster; this network must not be routable between Nodes and must match the value passed to kube-apiserver;
- the certificate and key given by `--cluster-signing-*` are used to sign the certificates and keys created through TLS bootstrap;
- `--root-ca-file` is used to validate the kube-apiserver certificate; when set, this CA certificate is placed into Pods' ServiceAccount;
- `--address` must be `127.0.0.1`, since kube-apiserver expects scheduler and controller-manager to run on the same machine;
(3) Start kube-controller-manager
```bash
systemctl daemon-reload
systemctl enable kube-controller-manager
systemctl start kube-controller-manager
systemctl status kube-controller-manager
```
After starting each component you can check its status with `kubectl get cs`:
[root@k8s-master ~]# kubectl get cs NAME STATUS MESSAGE ERROR scheduler Unhealthy Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused controller-manager Healthy ok etcd-0 Healthy {"health":"true"} etcd-2 Healthy {"health":"true"} etcd-1 Healthy {"health":"true"}
- If a component reports unhealthy, see: https://github.com/kubernetes-incubator/bootkube/issues/64
Configure and start kube-scheduler
(1) Create the kube-scheduler service unit file
File path: /usr/lib/systemd/system/kube-scheduler.service
[Unit] Description=Kubernetes Scheduler Plugin Documentation=https://github.com/GoogleCloudPlatform/kubernetes [Service] EnvironmentFile=-/etc/kubernetes/config EnvironmentFile=-/etc/kubernetes/scheduler ExecStart=/usr/local/bin/kube-scheduler \ $KUBE_LOGTOSTDERR \ $KUBE_LOG_LEVEL \ $KUBE_MASTER \ $KUBE_SCHEDULER_ARGS Restart=on-failure LimitNOFILE=65536 [Install] WantedBy=multi-user.target
(2) The configuration file /etc/kubernetes/scheduler:
```
###
# kubernetes scheduler config
# default config should be adequate
# Add your own!
KUBE_SCHEDULER_ARGS="--leader-elect=true --address=127.0.0.1"
```
- `--address` must be `127.0.0.1`, because kube-apiserver currently expects scheduler and controller-manager to run on the same machine;
Note: see the full kube-scheduler.service unit; add parameters to suit your cluster.
(3) Start kube-scheduler
```bash
systemctl daemon-reload
systemctl enable kube-scheduler
systemctl start kube-scheduler
systemctl status kube-scheduler
```
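A quick local check once the scheduler is running (in 1.14 the scheduler still serves an insecure healthz endpoint on port 10251 by default; this is only an optional sketch):

```bash
# Expected output: ok
curl http://127.0.0.1:10251/healthz
```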
Verify the master node
```bash
[root@k8s-master ~]# kubectl get cs
NAME                 STATUS    MESSAGE             ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-0               Healthy   {"health":"true"}
etcd-2               Healthy   {"health":"true"}
etcd-1               Healthy   {"health":"true"}
# The ERROR column is now clear.
```
Installing the Flannel Network Plugin
Every node needs a network plugin so that all Pods can join the same flat network; this section covers installing flannel.
You can install flanneld directly with yum unless you need a specific version; yum installs flannel 0.7.1 by default.
(1) Install flannel
```bash
# Check which flannel version yum would install (0.7.1 below); a newer release is recommended.
[root@k8s-master ~]# yum list flannel --showduplicates | sort -r
 * updates: mirror.lzu.edu.cn
Loading mirror speeds from cached hostfile
Loaded plugins: fastestmirror
flannel.x86_64                          0.7.1-4.el7                          extras
 * extras: mirror.lzu.edu.cn
 * base: mirror.lzu.edu.cn
Available Packages
[root@k8s-master ~]# wget https://github.com/coreos/flannel/releases/download/v0.11.0/flannel-v0.11.0-linux-amd64.tar.gz
# Unpack it; it contains the flanneld and mk-docker-opts.sh executables.
[root@k8s-master ~]# tar zxvf flannel-v0.11.0-linux-amd64.tar.gz
flanneld
mk-docker-opts.sh
README.md
# Copy the two executables to node1 and node2
[root@k8s-master ~]# scp flanneld root@172.16.4.13:/usr/bin/
flanneld                                  100%   34MB  62.9MB/s   00:00
[root@k8s-master ~]# scp flanneld root@172.16.4.14:/usr/bin/
flanneld                                  100%   34MB 121.0MB/s   00:00
[root@k8s-master ~]# scp mk-docker-opts.sh root@172.16.4.13:/usr/libexec/flannel
mk-docker-opts.sh                         100% 2139     1.2MB/s   00:00
[root@k8s-master ~]# scp mk-docker-opts.sh root@172.16.4.14:/usr/libexec/flannel
mk-docker-opts.sh                         100% 2139     1.1MB/s   00:00
```
- Note: create the destination directories for flanneld and mk-docker-opts.sh on the nodes beforehand.
(2) The /etc/sysconfig/flanneld configuration file:
```
# Flanneld configuration options

# etcd url location. Point this to the server where etcd runs
FLANNEL_ETCD_ENDPOINTS="https://172.16.4.12:2379,https://172.16.4.13:2379,https://172.16.4.14:2379"

# etcd config key. This is the configuration key that flannel queries
# For address range assignment
FLANNEL_ETCD_PREFIX="/kube-centos/network"

# Any additional options that you want to pass
FLANNEL_OPTIONS="-etcd-cafile=/etc/kubernetes/ssl/ca.pem -etcd-certfile=/etc/kubernetes/ssl/server.pem -etcd-keyfile=/etc/kubernetes/ssl/server-key.pem"
```
(3) Create the service unit file /usr/lib/systemd/system/flanneld.service:
[Unit] Description=Flanneld overlay address etcd agent After=network.target After=network-online.target Wants=network-online.target After=etcd.service Before=docker.service [Service] Type=notify EnvironmentFile=/etc/sysconfig/flanneld EnvironmentFile=-/etc/sysconfig/docker-network ExecStart=/usr/bin/flanneld --ip-masq \ -etcd-endpoints=${FLANNEL_ETCD_ENDPOINTS} \ -etcd-prefix=${FLANNEL_ETCD_PREFIX} \ $FLANNEL_OPTIONS ExecStartPost=/usr/libexec/flannel/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/docker Restart=on-failure [Install] WantedBy=multi-user.target RequiredBy=docker.service
- Note: on multi-NIC machines (e.g. a vagrant environment), add the interface used for outbound traffic to FLANNEL_OPTIONS, e.g. -iface=eth1
(4) Create the network configuration in etcd
Run the following commands to allocate the IP range that docker will use.
```bash
etcdctl --endpoints=https://172.16.4.12:2379,https://172.16.4.13:2379,https://172.16.4.14:2379 \
  --ca-file=/etc/kubernetes/ssl/ca.pem \
  --cert-file=/etc/kubernetes/ssl/server.pem \
  --key-file=/etc/kubernetes/ssl/server-key.pem \
  mkdir /kube-centos/network
[root@k8s-master network]# etcdctl --endpoints=https://172.16.4.12:2379,https://172.16.4.13:2379,https://172.16.4.14:2379 --ca-file=/etc/kubernetes/ssl/ca.pem --cert-file=/etc/kubernetes/ssl/server.pem --key-file=/etc/kubernetes/ssl/server-key.pem mk /kube-centos/network/config '{"Network":"172.30.0.0/16","SubnetLen":24,"Backend":{"Type":"vxlan"}}'
[root@k8s-master network]# etcdctl --endpoints=https://172.16.4.12:2379,https://172.16.4.13:2379,https://172.16.4.14:2379 --ca-file=/etc/kubernetes/ssl/ca.pem --cert-file=/etc/kubernetes/ssl/server.pem --key-file=/etc/kubernetes/ssl/server-key.pem set /kube-centos/network/config '{"Network":"172.30.0.0/16","SubnetLen":24,"Backend":{"Type":"vxlan"}}'
{"Network":"172.30.0.0/16","SubnetLen":24,"Backend":{"Type":"vxlan"}}
```
(5) Start flannel
```bash
systemctl daemon-reload
systemctl enable flanneld
systemctl start flanneld
systemctl status flanneld
```
Querying etcd now shows:
[root@k8s-master ~]# etcdctl --endpoints=${ETCD_ENDPOINTS} \ --ca-file=/etc/kubernetes/ssl/ca.pem \ --cert-file=/etc/kubernetes/ssl/server.pem \ --key-file=/etc/kubernetes/ssl/server-key.pem \ ls /kube-centos/network/subnets /kube-centos/network/subnets/172.30.20.0-24 /kube-centos/network/subnets/172.30.69.0-24 /kube-centos/network/subnets/172.30.53.0-24 [root@k8s-master ~]# etcdctl --endpoints=${ETCD_ENDPOINTS} \ --ca-file=/etc/kubernetes/ssl/ca.pem \ --cert-file=/etc/kubernetes/ssl/server.pem \ --key-file=/etc/kubernetes/ssl/server-key.pem \ get /kube-centos/network/config {"Network":"172.30.0.0/16","SubnetLen":24,"Backend":{"Type":"vxlan"}} [root@k8s-master ~]# etcdctl --endpoints=${ETCD_ENDPOINTS} \ --ca-file=/etc/kubernetes/ssl/ca.pem \ --cert-file=/etc/kubernetes/ssl/server.pem \ --key-file=/etc/kubernetes/ssl/server-key.pem \ get /kube-centos/network/subnets/172.30.20.0-24 {"PublicIP":"172.16.4.13","BackendType":"vxlan","BackendData":{"VtepMAC":"5e:ef:ff:37:0a:d2"}} [root@k8s-master ~]# etcdctl --endpoints=${ETCD_ENDPOINTS} \ --ca-file=/etc/kubernetes/ssl/ca.pem \ --cert-file=/etc/kubernetes/ssl/server.pem \ --key-file=/etc/kubernetes/ssl/server-key.pem \ get /kube-centos/network/subnets/172.30.53.0-24 {"PublicIP":"172.16.4.12","BackendType":"vxlan","BackendData":{"VtepMAC":"e2:e6:b9:23:79:a2"}} [root@k8s-master ~]# etcdctl --endpoints=${ETCD_ENDPOINTS} \ > --ca-file=/etc/kubernetes/ssl/ca.pem \ > --cert-file=/etc/kubernetes/ssl/server.pem \ > --key-file=/etc/kubernetes/ssl/server-key.pem \ > get /kube-centos/network/subnets/172.30.69.0-24 {"PublicIP":"172.16.4.14","BackendType":"vxlan","BackendData":{"VtepMAC":"06:0e:58:69:a0:41"}}
You can also inspect the following:
# 1. 比如可以查看到flannel网络的信息 [root@k8s-master ~]# ifconfig ....... flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450 inet 172.30.53.0 netmask 255.255.255.255 broadcast 0.0.0.0 inet6 fe80::e0e6:b9ff:fe23:79a2 prefixlen 64 scopeid 0x20<link> ether e2:e6:b9:23:79:a2 txqueuelen 0 (Ethernet) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 0 bytes 0 (0.0 B) TX errors 0 dropped 8 overruns 0 carrier 0 collisions 0 ....... # 2. 可以查看到分配的子网的文件。 [root@k8s-master ~]# cat /run/flannel/docker DOCKER_OPT_BIP="--bip=172.30.53.1/24" DOCKER_OPT_IPMASQ="--ip-masq=false" DOCKER_OPT_MTU="--mtu=1450" DOCKER_NETWORK_OPTIONS=" --bip=172.30.53.1/24 --ip-masq=false --mtu=1450"
(6) Connect docker to flannel
# 需要修改/usr/lib/systemd/system/docker.servce的ExecStart字段,引入上述\$DOCKER_NETWORK_OPTIONS字段,docker.service详细配置见下。 [Unit] Description=Docker Application Container Engine Documentation=https://docs.docker.com After=network-online.target firewalld.service Wants=network-online.target [Service] Type=notify # add by gzr EnvironmentFile=-/run/flannel/docker EnvironmentFile=-/run/docker_opts.env EnvironmentFile=-/run/flannel/subnet.env EnvironmentFile=-/etc/sysconfig/docker EnvironmentFile=-/etc/sysconfig/docker-storage EnvironmentFile=-/etc/sysconfig/docker-network # the default is not to use systemd for cgroups because the delegate issues still # exists and systemd currently does not support the cgroup feature set required # for containers run by docker ExecStart=/usr/bin/dockerd $DOCKER_NETWORK_OPTIONS ExecReload=/bin/kill -s HUP $MAINPID # Having non-zero Limit*s causes performance problems due to accounting overhead # in the kernel. We recommend using cgroups to do container-local accounting. LimitNOFILE=infinity LimitNPROC=infinity LimitCORE=infinity # Uncomment TasksMax if your systemd version supports it. # Only systemd 226 and above support this version. #TasksMax=infinity TimeoutStartSec=0 # set delegate yes so that systemd does not reset the cgroups of docker containers Delegate=yes # kill only the docker process, not all processes in the cgroup KillMode=process # restart the docker process if it exits prematurely Restart=on-failure StartLimitBurst=3 StartLimitInterval=60s [Install] WantedBy=multi-user.target
# 重启docker使得配置生效。 [root@k8s-master ~]# systemctl daemon-reload && systemctl restart docker.service # 再次查看docker和flannel的网络,会发现两者在同一网段 [root@k8s-master ~]# ifconfig docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500 inet 172.30.53.1 netmask 255.255.255.0 broadcast 172.30.53.255 ether 02:42:1e:aa:8b:0f txqueuelen 0 (Ethernet) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 0 bytes 0 (0.0 B) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 ...... flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450 inet 172.30.53.0 netmask 255.255.255.255 broadcast 0.0.0.0 inet6 fe80::e0e6:b9ff:fe23:79a2 prefixlen 64 scopeid 0x20<link> ether e2:e6:b9:23:79:a2 txqueuelen 0 (Ethernet) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 0 bytes 0 (0.0 B) TX errors 0 dropped 8 overruns 0 carrier 0 collisions 0 ..... # 同理,可以应用到其他各节点上,以node1为例。 [root@k8s-node1 ~]# vim /usr/lib/systemd/system/docker.service [root@k8s-node1 ~]# systemctl daemon-reload && systemctl restart docker [root@k8s-node1 ~]# ifconfig docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450 inet 172.30.20.1 netmask 255.255.255.0 broadcast 172.30.20.255 inet6 fe80::42:23ff:fe7f:6a70 prefixlen 64 scopeid 0x20<link> ether 02:42:23:7f:6a:70 txqueuelen 0 (Ethernet) RX packets 18 bytes 2244 (2.1 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 48 bytes 3469 (3.3 KiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 ...... flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450 inet 172.30.20.0 netmask 255.255.255.255 broadcast 0.0.0.0 inet6 fe80::5cef:ffff:fe37:ad2 prefixlen 64 scopeid 0x20<link> ether 5e:ef:ff:37:0a:d2 txqueuelen 0 (Ethernet) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 0 bytes 0 (0.0 B) TX errors 0 dropped 8 overruns 0 carrier 0 collisions 0 ...... veth82301fa: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450 inet6 fe80::6855:cfff:fe99:5143 prefixlen 64 scopeid 0x20<link> ether 6a:55:cf:99:51:43 txqueuelen 0 (Ethernet) RX packets 0 bytes 0 (0.0 B) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 7 bytes 586 (586.0 B) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Deploying the Node Machines
```bash
# Distribute flanneld.service from the master to each node.
scp /usr/lib/systemd/system/flanneld.service root@172.16.4.13:/usr/lib/systemd/system
scp /usr/lib/systemd/system/flanneld.service root@172.16.4.14:/usr/lib/systemd/system
# Restart flanneld on each node
systemctl daemon-reload
systemctl enable flanneld
systemctl start flanneld
systemctl status flanneld
```
Configure Docker
However you installed flannel, adding the following configuration to /usr/lib/systemd/system/docker.service keeps the setup safe.
# 待加入内容 EnvironmentFile=-/run/flannel/docker EnvironmentFile=-/run/docker_opts.env EnvironmentFile=-/run/flannel/subnet.env EnvironmentFile=-/etc/sysconfig/docker EnvironmentFile=-/etc/sysconfig/docker-storage EnvironmentFile=-/etc/sysconfig/docker-network # 最终完整的docker.service文件内容如下 [root@k8s-master ~]# cat /usr/lib/systemd/system/docker.service [Unit] Description=Docker Application Container Engine Documentation=https://docs.docker.com After=network-online.target firewalld.service Wants=network-online.target [Service] Type=notify # add by gzr EnvironmentFile=-/run/flannel/docker EnvironmentFile=-/run/docker_opts.env EnvironmentFile=-/run/flannel/subnet.env EnvironmentFile=-/etc/sysconfig/docker EnvironmentFile=-/etc/sysconfig/docker-storage EnvironmentFile=-/etc/sysconfig/docker-network # the default is not to use systemd for cgroups because the delegate issues still # exists and systemd currently does not support the cgroup feature set required # for containers run by docker ExecStart=/usr/bin/dockerd ExecReload=/bin/kill -s HUP $MAINPID # Having non-zero Limit*s causes performance problems due to accounting overhead # in the kernel. We recommend using cgroups to do container-local accounting. LimitNOFILE=infinity LimitNPROC=infinity LimitCORE=infinity # Uncomment TasksMax if your systemd version supports it. # Only systemd 226 and above support this version. #TasksMax=infinity TimeoutStartSec=0 # set delegate yes so that systemd does not reset the cgroups of docker containers Delegate=yes # kill only the docker process, not all processes in the cgroup KillMode=process # restart the docker process if it exits prematurely Restart=on-failure StartLimitBurst=3 StartLimitInterval=60s [Install] WantedBy=multi-user.target
(2) Start docker
Install and configure kubelet
(1) Check that swap is disabled
[root@k8s-master ~]# free total used free shared buff/cache available Mem: 32753848 730892 27176072 377880 4846884 31116660 Swap: 0 0 0
- Alternatively, edit `/etc/fstab` and comment out the swap entry.
When kubelet starts, it sends a TLS bootstrapping request to kube-apiserver. The kubelet-bootstrap user from the bootstrap token file must first be granted the system:node-bootstrapper cluster role so that kubelet has permission to create certificate signing requests:
(2) Copy kubelet and kube-proxy from /usr/local/bin on the master to each node
```bash
[root@k8s-master ~]# scp /usr/local/bin/kubelet root@172.16.4.13:/usr/local/bin/
[root@k8s-master ~]# scp /usr/local/bin/kubelet root@172.16.4.14:/usr/local/bin/
```
(3) Create the role binding on the master node.
```bash
# Create the cluster role binding on the master, then (re)start kubelet on the nodes.
[root@k8s-master kubernetes]# kubectl create clusterrolebinding kubelet-bootstrap --clusterrole=system:node-bootstrapper --user=kubelet-bootstrap
clusterrolebinding.rbac.authorization.k8s.io/kubelet-bootstrap created
```
(4) Create the kubelet service
Option 1: create and run a script on each node
```bash
#!/bin/bash
# Create the kubelet configuration file and kubelet.service in one shot
# via this kubelet.sh script.

NODE_ADDRESS=${1:-"172.16.4.13"}
DNS_SERVER_IP=${2:-"10.10.10.2"}

cat <<EOF >/etc/kubernetes/kubelet
KUBELET_ARGS="--logtostderr=true \\
--v=4 \\
--address=${NODE_ADDRESS} \\
--hostname-override=${NODE_ADDRESS} \\
--kubeconfig=/etc/kubernetes/kubelet.kubeconfig \\
--bootstrap-kubeconfig=/etc/kubernetes/bootstrap.kubeconfig \\
--cert-dir=/etc/kubernetes/ssl \\
--allow-privileged=true \\
--cluster-dns=${DNS_SERVER_IP} \\
--cluster-domain=cluster.local \\
--fail-swap-on=false \\
--pod-infra-container-image=registry.cn-hangzhou.aliyuncs.com/google-containers/pause-amd64:3.0"
EOF

cat <<EOF >/usr/lib/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
After=docker.service
Requires=docker.service

[Service]
EnvironmentFile=-/etc/kubernetes/kubelet
ExecStart=/usr/local/bin/kubelet \$KUBELET_ARGS
Restart=on-failure
KillMode=process

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable kubelet
systemctl restart kubelet && systemctl status kubelet
```
2) Run the script
```bash
chmod +x kubelet.sh
./kubelet.sh 172.16.4.14 10.10.10.2
# You can also inspect the generated kubelet.service file
[root@k8s-node2 ~]# cat /usr/lib/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
After=docker.service
Requires=docker.service

[Service]
EnvironmentFile=-/etc/kubernetes/kubelet
ExecStart=/usr/local/bin/kubelet $KUBELET_ARGS
Restart=on-failure
KillMode=process

[Install]
WantedBy=multi-user.target
```
- Note: on node1 run kubelet.sh with 172.16.4.13 (node1's IP) and 10.10.10.2 (the DNS service IP); when running the script on other nodes, remember to substitute the appropriate values.
Option 2:
1) Create the kubelet configuration file /etc/kubernetes/kubelet with the following content:
### ## kubernetes kubelet (minion) config # ## The address for the info server to serve on (set to 0.0.0.0 or "" for all interfaces) KUBELET_ADDRESS="--address=172.16.4.12" # ## The port for the info server to serve on #KUBELET_PORT="--port=10250" # ## You may leave this blank to use the actual hostname KUBELET_HOSTNAME="--hostname-override=172.16.4.12" # ## location of the api-server ## COMMENT THIS ON KUBERNETES 1.8+ KUBELET_API_SERVER="--api-servers=http://172.16.4.12:8080" # ## pod infrastructure container KUBELET_POD_INFRA_CONTAINER="--pod-infra-container-image=jimmysong/pause-amd64:3.0" # ## Add your own! KUBELET_ARGS="--cgroup-driver=systemd \ --cluster-dns=10.10.10.2 \ --bootstrap-kubeconfig=/etc/kubernetes/bootstrap.kubeconfig \ --kubeconfig=/etc/kubernetes/kubelet.kubeconfig \ --require-kubeconfig \ --cert-dir=/etc/kubernetes/ssl \ --cluster-domain=cluster.local \ --hairpin-mode promiscuous-bridge \ --serialize-image-pulls=false"
- When starting kubelet via systemd you also need the two extra parameters `--runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice`;
- `--address` must not be `127.0.0.1`, otherwise Pods fail when calling the kubelet API, because from inside a Pod `127.0.0.1` points to the Pod itself rather than to kubelet;
- set `--cgroup-driver` to `systemd` rather than `cgroupfs`, otherwise kubelet fails to start on CentOS (what actually matters is that docker and kubelet use the same cgroup driver; it does not have to be `systemd`);
- `--bootstrap-kubeconfig` points to the bootstrap kubeconfig file; kubelet uses the user name and token in that file to send TLS bootstrapping requests to kube-apiserver;
- after the administrator approves the CSR request, kubelet automatically creates the certificate and key (kubelet-client.crt and kubelet-client.key) in the `--cert-dir` directory and then writes the `--kubeconfig` file;
- it is recommended to specify the kube-apiserver address in the `--kubeconfig` file. If `--api-servers` is not set, you must pass `--require-kubeconfig` so that the apiserver address is read from the kubeconfig; otherwise kubelet starts but cannot find the API server (the logs report that no API Server was found) and `kubectl get nodes` will not show the node. Note that `--require-kubeconfig` was removed in 1.10 (see the PR);
- `--cluster-dns` specifies the kubedns Service IP (you can reserve it now and use it when creating the kubedns service later), and `--cluster-domain` specifies the domain suffix; both parameters must be set for them to take effect;
- `--cluster-domain` determines the `search` domain in a Pod's /etc/resolv.conf. We originally set it to `cluster.local.`, which resolved normal service DNS names fine but failed to resolve the FQDN pod names of headless services; changing it to `cluster.local` (dropping the trailing dot) fixed the problem. See my other article on name resolution in kubernetes;
- the kubelet.kubeconfig file specified by `--kubeconfig=/etc/kubernetes/kubelet.kubeconfig` does not exist before kubelet's first start; as described below, it is generated automatically once the CSR request is approved. If a `~/.kube/config` file already exists, you can copy it to this path and rename it kubelet.kubeconfig; all nodes can share the same kubelet.kubeconfig, so newly added nodes join the cluster without creating new CSR requests. Likewise, on any host that can reach the cluster, using `kubectl --kubeconfig` with the `~/.kube/config` file passes authentication, because it already contains admin credentials with full cluster permissions;
- `KUBELET_POD_INFRA_CONTAINER` is the pause (pod infrastructure) image. A private registry address is used here; change it to your own image when deploying. You can also use Google's pause image `gcr.io/google_containers/pause-amd64:3.0`, which is only a little over 300 KB.
2) Create the kubelet service unit file
File location: /usr/lib/systemd/system/kubelet.service.
[Unit] Description=Kubernetes Kubelet Server Documentation=https://github.com/GoogleCloudPlatform/kubernetes After=docker.service Requires=docker.service [Service] WorkingDirectory=/var/lib/kubelet EnvironmentFile=-/etc/kubernetes/config EnvironmentFile=-/etc/kubernetes/kubelet ExecStart=/usr/local/bin/kubelet \ $KUBE_LOGTOSTDERR \ $KUBE_LOG_LEVEL \ $KUBELET_API_SERVER \ $KUBELET_ADDRESS \ $KUBELET_PORT \ $KUBELET_HOSTNAME \ $KUBE_ALLOW_PRIV \ $KUBELET_POD_INFRA_CONTAINER \ $KUBELET_ARGS Restart=on-failure [Install] WantedBy=multi-user.target
Note: either option creates the kubelet service; I recommend the one-shot script. With option 2 you must also create the working directory /var/lib/kubelet manually. That is not demonstrated again here.
(5) Approve the kubelet TLS certificate requests
When kubelet starts for the first time it sends a certificate signing request to kube-apiserver; only after the request is approved does kubernetes add the Node to the cluster.
1) List the pending CSR requests on the master
[root@k8s-master ~]# kubectl get csr NAME AGE REQUESTOR CONDITION node-csr-4799pnHJjREEcWDGgSFvNaoyfcn4HiOML9cpEI1IbMs 3h6m kubelet-bootstrap Pending node-csr-e3mql7Dm878tLhPUxu2pzg8e8eM17Togc6lHQX-mXZs 3h kubelet-bootstrap Pending
2) Approve the CSR requests
```bash
[root@k8s-master ~]# kubectl certificate approve node-csr-4799pnHJjREEcWDGgSFvNaoyfcn4HiOML9cpEI1IbMs
certificatesigningrequest.certificates.k8s.io/node-csr-4799pnHJjREEcWDGgSFvNaoyfcn4HiOML9cpEI1IbMs approved
[root@k8s-master ~]# kubectl certificate approve node-csr-e3mql7Dm878tLhPUxu2pzg8e8eM17Togc6lHQX-mXZs
certificatesigningrequest.certificates.k8s.io/node-csr-e3mql7Dm878tLhPUxu2pzg8e8eM17Togc6lHQX-mXZs approved
# After approval, both nodes' CSRs show as approved.
```
3) The kubelet kubeconfig file and key pair are generated automatically
[root@k8s-node1 ~]# ls -l /etc/kubernetes/kubelet.kubeconfig -rw------- 1 root root 2294 Jun 14 15:19 /etc/kubernetes/kubelet.kubeconfig [root@k8s-node1 ~]# ls -l /etc/kubernetes/ssl/kubelet* -rw------- 1 root root 1273 Jun 14 15:19 /etc/kubernetes/ssl/kubelet-client-2019-06-14-15-19-10.pem lrwxrwxrwx 1 root root 58 Jun 14 15:19 /etc/kubernetes/ssl/kubelet-client-current.pem -> /etc/kubernetes/ssl/kubelet-client-2019-06-14-15-19-10.pem -rw-r--r-- 1 root root 2177 Jun 14 11:50 /etc/kubernetes/ssl/kubelet.crt -rw------- 1 root root 1679 Jun 14 11:50 /etc/kubernetes/ssl/kubelet.key
If you rotate the kubernetes certificates but do not change token.csv, the node rejoins the cluster automatically after kubelet restarts, without sending a new certificate request and without running `kubectl certificate approve` on the master again. The precondition is that you do not delete /etc/kubernetes/ssl/kubelet* or /etc/kubernetes/kubelet.kubeconfig on the node; otherwise kubelet fails to start because it cannot find its certificates.
```bash
[root@k8s-master ~]# scp /etc/kubernetes/token.csv root@172.16.4.13:/etc/kubernetes/
[root@k8s-master ~]# scp /etc/kubernetes/token.csv root@172.16.4.14:/etc/kubernetes/
```
Note: if kubelet reports certificate-related errors at startup, one trick is to copy the master's ~/.kube/config file (generated automatically in the "Create the kubectl kubeconfig file" step) to /etc/kubernetes/kubelet.kubeconfig on the node. The node then joins the cluster as soon as kubelet starts, without going through CSR approval. Make sure the copied content fully replaces whatever was previously in /etc/kubernetes/kubelet.kubeconfig.
[root@k8s-master ~]# cat .kube/config apiVersion: v1 clusters: - cluster: certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUR2akNDQXFhZ0F3SUJBZ0lVZkJtL2lzNG1EcHdqa0M0aVFFTWF5SVJaVHVjd0RRWUpLb1pJaHZjTkFRRUwKQlFBd1pURUxNQWtHQTFVRUJoTUNRMDR4RURBT0JnTlZCQWdUQjBKbGFXcHBibWN4RURBT0JnTlZCQWNUQjBKbAphV3BwYm1jeEREQUtCZ05WQkFvVEEyczRjekVQTUEwR0ExVUVDeE1HVTNsemRHVnRNUk13RVFZRFZRUURFd3ByCmRXSmxjbTVsZEdWek1CNFhEVEU1TURZeE1qQXpNRFF3TUZvWERUSTVNRFl3T1RBek1EUXdNRm93WlRFTE1Ba0cKQTFVRUJoTUNRMDR4RURBT0JnTlZCQWdUQjBKbGFXcHBibWN4RURBT0JnTlZCQWNUQjBKbGFXcHBibWN4RERBSwpCZ05WQkFvVEEyczRjekVQTUEwR0ExVUVDeE1HVTNsemRHVnRNUk13RVFZRFZRUURFd3ByZFdKbGNtNWxkR1Z6Ck1JSUJJakFOQmdrcWhraUc5dzBCQVFFRkFBT0NBUThBTUlJQkNnS0NBUUVBOEZQK2p0ZUZseUNPVDc0ZzRmd1UKeDl0bDY3dGVabDVwTDg4ZStESzJMclBJZDRXMDRvVDdiWTdKQVlLT3dPTkM4RjA5MzNqSjVBdmxaZmppTkJCaQp2OTlhYU5tSkdxeWozMkZaaDdhTkYrb3Fab3BYdUdvdmNpcHhYTWlXbzNlVHpWVUh3d2FBeUdmTS9BQnE0WUY0ClprSVV5UkJaK29OVXduY0tNaStOR2p6WVJyc2owZEJRR0ROZUJ6OEgzbCtjd1U1WmpZdEdFUFArMmFhZ1k5bG0KbjhyOUFna2owcW9uOEdQTFlRb2RDYzliSWZqQmVNaGIzaHJGMjJqMDhzWTczNzh3MzN5VWRHdjg1YWpuUlp6UgpIYkN6UytYRGJMTTh2aGh6dVZoQmt5NXNrWXB6M0hCNGkrTnJPR1Fmdm4yWkY0ZFh4UVUyek1Dc2NMSVppdGg0Ckt3SURBUUFCbzJZd1pEQU9CZ05WSFE4QkFmOEVCQU1DQVFZd0VnWURWUjBUQVFIL0JBZ3dCZ0VCL3dJQkFqQWQKQmdOVkhRNEVGZ1FVeTVmVncxK0s2N1dvblRuZVgwL2dFSTVNM3FJd0h3WURWUjBqQkJnd0ZvQVV5NWZWdzErSwo2N1dvblRuZVgwL2dFSTVNM3FJd0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFOb3ZXa1ovd3pEWTZSNDlNNnpDCkhoZlZtVGk2dUZwS24wSmtvMVUzcHA5WTlTTDFMaXVvK3VwUjdJOCsvUXd2Wm95VkFWMTl4Y2hRQ25RSWhRMEgKVWtybXljS0crdWtsSUFUS3ZHenpzNW1aY0NQOGswNnBSSHdvWFhRd0ZhSFBpNnFZWDBtaW10YUc4REdzTk01RwpQeHdZZUZncXBLQU9Tb0psNmw5bXErQnhtWEoyZS8raXJMc3N1amlPKzJsdnpGOU5vU29Yd1RqUGZndXhRU3VFCnZlSS9pTXBGV1o0WnlCYWJKYkw5dXBldm53RTA2RXQrM2g2N3JKOU5mZ2N5MVhNSU0xeGo1QXpzRXgwVE5ETGkKWGlOQ0Zram9zWlA3U3dZdE5ncHNuZmhEandHRUJLbXV1S3BXR280ZWNac2lMQXgwOTNaeTdKM2dqVDF6dGlFUwpzQlE9Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K server: https://172.16.4.12:6443 name: kubernetes contexts: - context: cluster: kubernetes user: admin name: kubernetes current-context: kubernetes kind: Config preferences: {} users: - name: admin user: client-certificate-data: 
LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUQzVENDQXNXZ0F3SUJBZ0lVVmlPdjZ6aFlHMzIzdWRZS2RFWEcvRVJENW8wd0RRWUpLb1pJaHZjTkFRRUwKQlFBd1pURUxNQWtHQTFVRUJoTUNRMDR4RURBT0JnTlZCQWdUQjBKbGFXcHBibWN4RURBT0JnTlZCQWNUQjBKbAphV3BwYm1jeEREQUtCZ05WQkFvVEEyczRjekVQTUEwR0ExVUVDeE1HVTNsemRHVnRNUk13RVFZRFZRUURFd3ByCmRXSmxjbTVsZEdWek1CNFhEVEU1TURZeE1qQTJORGd3TUZvWERUSTVNRFl3T1RBMk5EZ3dNRm93YXpFTE1Ba0cKQTFVRUJoTUNRMDR4RURBT0JnTlZCQWdUQjBKbGFVcHBibWN4RURBT0JnTlZCQWNUQjBKbGFVcHBibWN4RnpBVgpCZ05WQkFvVERuTjVjM1JsYlRwdFlYTjBaWEp6TVE4d0RRWURWUVFMRXdaVGVYTjBaVzB4RGpBTUJnTlZCQU1UCkJXRmtiV2x1TUlJQklqQU5CZ2txaGtpRzl3MEJBUUVGQUFPQ0FROEFNSUlCQ2dLQ0FRRUFuL29MQVpCcENUdWUKci95eU15a1NYelBpWk9mVFdZQmEwNjR6c2Y1Y1Z0UEt2cnlCSjVHVlVSUlFUc2F3eWdFdnFBSXI3TUJrb21GOQpBeFVNaFNxdlFjNkFYemQzcjRMNW1CWGQxZ3FoWVNNR2lJL3hEMG5RaEF1azBFbVVONWY5ZENZRmNMMTVBVnZSCituN2wwaVcvVzlBRjRqbXRtYUtLVUdsUU9vNzQ3anNCYWRndU9SVHBMSkwxUGw3SlVLZnFBWktEbFVXZnpwZXcKOE1ETVMzN1FodmVQc24va2RwUVZ0bzlJZWcwSFhBcXlmZHNaZjZKeGdaS1FmUUNyYlJEMkd2L29OVVRlYnpWMwpWVm9ueEpUYmFrZFNuOHR0cCtLWFlzTUYvQy8wR29sL1JkS1Mrc0t4Z2hUUWdJMG5CZXJBM0x0dGp6WVpySWJBClo0RXBRNmc0ZFFJREFRQUJvMzh3ZlRBT0JnTlZIUThCQWY4RUJBTUNCYUF3SFFZRFZSMGxCQll3RkFZSUt3WUIKQlFVSEF3RUdDQ3NHQVFVRkJ3TUNNQXdHQTFVZEV3RUIvd1FDTUFBd0hRWURWUjBPQkJZRUZCQThrdnFaVDhRRApaSnIvTUk2L2ZWalpLdVFkTUI4R0ExVWRJd1FZTUJhQUZNdVgxY05maXV1MXFKMDUzbDlQNEJDT1RONmlNQTBHCkNTcUdTSWIzRFFFQkN3VUFBNElCQVFDMnZzVDUwZVFjRGo3RVUwMmZQZU9DYmJ6cFZWazEzM3NteGI1OW83YUgKRDhONFgvc3dHVlYzU0V1bVNMelJYWDJSYUsyUU04OUg5ZDlpRkV2ZzIvbjY3VThZeVlYczN0TG9Ua29NbzlUZgpaM0FNN0NyM0V5cWx6OGZsM3p4cmtINnd1UFp6VWNXV29vMUJvR1VCbEM1Mi9EbFpQMkZCbHRTcWtVL21EQ3IxCnJJWkFYYjZDbXNNZG1SQzMrYWwxamVUak9MZEcwMUd6dlBZdEdsQ0p2dHRJNzBuVkR3Nkh3QUpkRVN0UUh0cWsKakpCK3NZU2NSWDg1YTlsUXVIU21DY0kyQWxZQXFkK0t2NnNKNUVFZnpwWHNUVXdya0tKbjJ0UTN2UVNLaEgyawpabUx2N0MvcWV6YnJvc3pGeHNZWEtRelZiODVIVkxBbXo2UVhYV1I2Q0ZzMAotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg== client-key-data: 
LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFb2dJQkFBS0NBUUVBbi9vTEFaQnBDVHVlci95eU15a1NYelBpWk9mVFdZQmEwNjR6c2Y1Y1Z0UEt2cnlCCko1R1ZVUlJRVHNhd3lnRXZxQUlyN01Ca29tRjlBeFVNaFNxdlFjNkFYemQzcjRMNW1CWGQxZ3FoWVNNR2lJL3gKRDBuUWhBdWswRW1VTjVmOWRDWUZjTDE1QVZ2UituN2wwaVcvVzlBRjRqbXRtYUtLVUdsUU9vNzQ3anNCYWRndQpPUlRwTEpMMVBsN0pVS2ZxQVpLRGxVV2Z6cGV3OE1ETVMzN1FodmVQc24va2RwUVZ0bzlJZWcwSFhBcXlmZHNaCmY2SnhnWktRZlFDcmJSRDJHdi9vTlVUZWJ6VjNWVm9ueEpUYmFrZFNuOHR0cCtLWFlzTUYvQy8wR29sL1JkS1MKK3NLeGdoVFFnSTBuQmVyQTNMdHRqellackliQVo0RXBRNmc0ZFFJREFRQUJBb0lCQUE1cXFDZEI3bFZJckNwTAo2WHMyemxNS0IvTHorVlh0ZlVIcVJ2cFpZOVRuVFRRWEpNUitHQ2l3WGZSYmIzOGswRGloeVhlU2R2OHpMZUxqCk9MZWZleC9CRGt5R1lTRE4rdFE3MUR2L3hUOU51cjcveWNlSTdXT1k4UWRjT2lFd2IwVFNVRmN5bS84RldVenIKdHFaVGhJVXZuL2dkSG9uajNmY1ZKb2ZBYnFwNVBrLzVQd2hFSU5Pdm1FTFZFQWl6VnBWVmwxNzRCSGJBRHU1Sgp2Nm9xc0h3SUhwNC9ZbGo2NHhFVUZ1ZFA2Tkp0M1B5Uk14dW5RcWd3SWZ1bktuTklRQmZEVUswSklLK1luZmlJClgrM1lQam5sWFU3UnhYRHRFa3pVWTFSTTdVOHJndHhiNWRQWnhocGgyOFlFVnJBVW5RS2RSTWdCVVNad3hWRUYKeFZqWmVwa0NnWUVBeEtHdXExeElHNTZxL2RHeGxDODZTMlp3SkxGajdydTkrMkxEVlZsL2h1NzBIekJ6dFFyNwpMUGhUZnl2SkVqNTcwQTlDbk4ybndjVEQ2U1dqbkNDbW9ESk10Ti9iZlJaMThkZTU4b0JCRDZ5S0JGbmV1eWkwCk1oVWFmSzN5M091bGkxMjBKS3lQb2hvN1lyWUxNazc1UzVEeVRGMlEyV3JYY0VQaTlVRzNkNzhDZ1lFQTBFY3YKTUhDbE9XZ1hJUVNXNCtreFVEVXRiOFZPVnpwYjd3UWZCQ3RmSTlvTDBnVWdBd1M2U0lub2tET3ozdEl4aXdkQQpWZTVzMklHbVAzNS9qdm5FbThnaE1XbEZ3eHB5ZUxKK0hraTl1dFNPblJGWHYvMk9JdjBYbE01RlY5blBmZ01NCkMxQ09zZklKaVREaXJFOGQrR2cxV010dWxkVGo4Z0JKazRQRXZNc0NnWUJoNHA4aWZVa0VQdU9lZ1hJbWM3QlEKY3NsbTZzdjF2NDVmQTVaNytaYkxwRTd3Njl6ZUJuNXRyNTFaVklHL1RFMjBrTFEzaFB5TE1KbmFpYnM5OE44aQpKb2diRHNta0pyZEdVbjhsNG9VQStZS25rZG1ZVURZTUxJZElCQXcvd0N0a0NweXdHUnRUdGoxVDhZMzNXR3N3CkhCTVN3dzFsdnBOTE52Qlg2WVFjM3dLQmdHOHAvenJJZExjK0lsSWlJL01EREtuMXFBbW04cGhGOHJtUXBvbFEKS05oMjBhWkh5LzB3Y2NpenFxZ0VvSFZHRk9GU2Zua2U1NE5yTjNOZUxmRCt5SHdwQmVaY2ZMcVVqQkoxbWpESgp2RkpTanNld2NQaHMrWWNkTkkvY3hGQU9WZHU0L3Aydlltb0JlQ3Q4SncrMnJwVmQ4Vk15U1JTNWF1eElVUHpsCjhJU2ZBb0dBVituYjJ3UGtwOVJ0NFVpdmR0MEdtRjErQ052YzNzY3JYb3RaZkt0TkhoT0o2UTZtUkluc2tpRWgKVnFQRjZ6U1BnVmdrT1hmU0xVQ3Y2cGdWR2J5d0plRWo1SElQRHFuU25vNFErZFl2TXozcWN5d1hLbFEyUjZpcAo3VE0wWHNJaGFMRDFmWUNjaDhGVHNiZHNrQUNZUHpzeEdBa1l2TnRDcDI5WExCRmZWbkE9Ci0tLS0tRU5EIFJTQSBQUklWQVRFIEtFWS0tLS0tCg== # 分发.kube/config到各节点。 [root@k8s-master ~]# scp .kube/config root@172.16.4.13:/etc/kubernetes/ [root@k8s-master ~]# scp .kube/config root@172.16.4.14:/etc/kubernetes/ # 比如在node2的/etc/kubernetes/目录下则出现了config文件。 [root@k8s-node2 ~]# ls /etc/kubernetes/ bin bootstrap.kubeconfig config kubelet kubelet.kubeconfig kube-proxy.kubeconfig ssl token.csv
Configuring kube-proxy
Script-based configuration
(1) Write a kube-proxy.sh script on each node with the following content:
#!/bin/bash NODE_ADDRESS=${1:-"172.16.4.13"} cat <<EOF >/etc/kubernetes/kube-proxy KUBE_PROXY_ARGS="--logtostderr=true \ --v=4 \ --hostname-override=${NODE_ADDRESS} \ --kubeconfig=/etc/kubernetes/kube-proxy.kubeconfig" EOF cat <<EOF >/usr/lib/systemd/system/kube-proxy.service [Unit] Description=Kubernetes Proxy After=network.target [Service] EnvironmentFile=-/etc/kubernetes/kube-proxy ExecStart=/usr/local/bin/kube-proxy \$KUBE_PROXY_ARGS Restart=on-failure [Install] WantedBy=multi-user.target EOF systemctl daemon-reload && systemctl enable kube-proxy systemctl restart kube-proxy && systemctl status kube-proxy
- The --hostname-override value must match the one used by kubelet; otherwise kube-proxy will not find its Node after starting and will not create any iptables rules.
- kube-proxy uses --cluster-cidr to distinguish in-cluster traffic from external traffic; only when --cluster-cidr or --masquerade-all is specified will kube-proxy SNAT requests to Service IPs (see the sketch after this list).
- The kubeconfig file given by --kubeconfig embeds the kube-apiserver address, user name, certificate, and key used for requests and authentication.
- The predefined binding system:node-proxier binds the user system:kube-proxy to the role system:node-proxier, which grants permission to call the kube-apiserver proxy-related APIs.
- See kube-proxy.service for the complete unit file.
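If you want kube-proxy to SNAT Service traffic as described above, one option is to let the script write --cluster-cidr into the same options file it already generates. A minimal sketch, reusing the /etc/kubernetes/kube-proxy file and KUBE_PROXY_ARGS variable from the script above; the 172.30.0.0/16 value is only inferred from the Pod IPs shown later in this document, so substitute your own flannel network:

```bash
# Sketch only: regenerate /etc/kubernetes/kube-proxy with --cluster-cidr added,
# so kube-proxy SNATs traffic sent to Service IPs. 172.30.0.0/16 is inferred from
# the Pod IPs that appear later in this document -- replace it with your network.
cat <<EOF >/etc/kubernetes/kube-proxy
KUBE_PROXY_ARGS="--logtostderr=true \
  --v=4 \
  --hostname-override=172.16.4.13 \
  --cluster-cidr=172.30.0.0/16 \
  --kubeconfig=/etc/kubernetes/kube-proxy.kubeconfig"
EOF
systemctl daemon-reload && systemctl restart kube-proxy
```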
(2) Run the script
# 首先将前端master的kube-proxy命令拷贝至各个节点。 [root@k8s-master ~]# scp /usr/local/bin/kube-proxy root@172.16.4.13:/usr/local/bin/ [root@k8s-master ~]# scp /usr/local/bin/kube-proxy root@172.16.4.14:/usr/local/bin/ # 并在各个节点上更改执行权限。 chmod +x kube-proxy.sh [root@k8s-node2 ~]# ./kube-proxy.sh 172.16.4.14 Created symlink from /etc/systemd/system/multi-user.target.wants/kube-proxy.service to /usr/lib/systemd/system/kube-proxy.service. ● kube-proxy.service - Kubernetes Proxy Loaded: loaded (/usr/lib/systemd/system/kube-proxy.service; enabled; vendor preset: disabled) Active: active (running) since Fri 2019-06-14 16:01:47 CST; 39ms ago Main PID: 117068 (kube-proxy) Tasks: 10 Memory: 8.8M CGroup: /system.slice/kube-proxy.service └─117068 /usr/local/bin/kube-proxy --logtostderr=true --v=4 --hostname-override=172.16.4.14 --kubeconfig=/etc/kubernetes/kube-proxy.kubeconfig Jun 14 16:01:47 k8s-node2 systemd[1]: Started Kubernetes Proxy.
(3) Note: the kubelet.kubeconfig file referenced by --kubeconfig=/etc/kubernetes/kubelet.kubeconfig does not exist before kubelet is started for the first time; as described below, it is generated automatically once the node's CSR request has been approved. If a ~/.kube/config file has already been generated, you can copy it to that path and rename it to kubelet.kubeconfig. All node nodes can share the same kubelet.kubeconfig file, so newly added nodes can join the Kubernetes cluster without creating a CSR request. Likewise, when operating the cluster with kubectl --kubeconfig from any host that can reach it, using this config file is enough to pass authentication, because it already contains the credentials that identify you as the admin user with full access to the cluster.
[root@k8s-master ~]# scp .kube/config root@172.16.4.13:/etc/kubernetes/ [root@k8s-node1 ~]# mv config kubelet.kubeconfig [root@k8s-master ~]# scp .kube/config root@172.16.4.14:/etc/kubernetes/ [root@k8s-node2 ~]# mv config kubelet.kubeconfig
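Before relying on the copied file, it is worth confirming on each node that it really authenticates against the API server. A quick check, assuming a kubectl binary has also been copied to the node:

```bash
# On a node: confirm the copied kubeconfig carries valid admin credentials.
kubectl --kubeconfig=/etc/kubernetes/kubelet.kubeconfig get nodes
kubectl --kubeconfig=/etc/kubernetes/kubelet.kubeconfig config view --minify
```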
Verification tests
# 以下操作在master节点上运行。 [root@k8s-master ~]# kubectl get nodes NAME STATUS ROLES AGE VERSION 172.16.4.13 Ready <none> 66s v1.14.3 172.16.4.14 Ready <none> 7m14s v1.14.3 [root@k8s-master ~]# kubectl get cs NAME STATUS MESSAGE ERROR scheduler Healthy ok controller-manager Healthy ok etcd-0 Healthy {"health":"true"} etcd-1 Healthy {"health":"true"} etcd-2 Healthy {"health":"true"} # 以nginx服务测试集群可用性 [root@k8s-master ~]# kubectl run nginx --replicas=3 --labels="run=load-balancer-example" --image=nginx --port=80 kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead. deployment.apps/nginx created [root@k8s-master ~]# kubectl expose deployment nginx --type=NodePort --name=example-service service/example-service exposed [root@k8s-master ~]# kubectl describe svc example-service Name: example-service Namespace: default Labels: run=load-balancer-example Annotations: <none> Selector: run=load-balancer-example Type: NodePort IP: 10.10.10.222 Port: <unset> 80/TCP TargetPort: 80/TCP NodePort: <unset> 40905/TCP Endpoints: 172.17.0.2:80,172.17.0.2:80,172.17.0.3:80 Session Affinity: None External Traffic Policy: Cluster Events: <none> # 在node节点上访问 [root@k8s-node1 ~]# curl "10.10.10.222:80" <!DOCTYPE html> <html> <head> <title>Welcome to nginx!</title> <style> body { width: 35em; margin: 0 auto; font-family: Tahoma, Verdana, Arial, sans-serif; } </style> </head> <body> <h1>Welcome to nginx!</h1> <p>If you see this page, the nginx web server is successfully installed and working. Further configuration is required.</p> <p>For online documentation and support please refer to <a href="http://nginx.org/">nginx.org</a>.<br/> Commercial support is available at <a href="http://nginx.com/">nginx.com</a>.</p> <p><em>Thank you for using nginx.</em></p> </body> </html> # 外网测试访问 [root@k8s-master ~]# kubectl get svc NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE example-service NodePort 10.10.10.222 <none> 80:40905/TCP 6m26s kubernetes ClusterIP 10.10.10.1 <none> 443/TCP 21h # 由上可知,服务暴露外网端口为40905.输入172.16.4.12:40905即可访问。
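The nginx Deployment and example-service above are only a smoke test; once the cluster checks out they can be removed so they do not linger. A possible cleanup, using the names from the commands above:

```bash
# Remove the smoke-test resources created during verification.
kubectl delete service example-service
kubectl delete deployment nginx
# Confirm nothing is left behind (the label was set by the kubectl run command above).
kubectl get pods,svc -l run=load-balancer-example
```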
Setting up and configuring the DNS service
Starting with Kubernetes v1.11, the cluster's DNS service is provided by CoreDNS. CoreDNS is a CNCF project: a high-performance, plugin-based, easily extensible DNS server written in Go. It fixes several KubeDNS problems, such as the dnsmasq security vulnerabilities and the inability to combine externalName with stubDomains settings.
Installing the CoreDNS add-on
Official YAML files: https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns/coredns
Deploying CoreDNS requires creating at least three resource objects: a ConfigMap, a Deployment, and a Service. In a cluster with RBAC enabled, a ServiceAccount, ClusterRole, and ClusterRoleBinding can also be created to restrict the permissions of the CoreDNS container.
(1) To speed up image pulls, first switch Docker to a domestic Aliyun registry mirror:
cat << EOF > /etc/docker/daemon.json { "registry-mirrors":["https://registry.docker-cn.com","https://h23rao59.mirror.aliyuncs.com"] } EOF # 重新载入配置并重启docker [root@k8s-master ~]# systemctl daemon-reload && systemctl restart docker
(2) Here the Service, ConfigMap, ServiceAccount, and the other objects are combined into a single file, coredns.yaml, whose content is shown below.
[root@k8s-master ~]# cat coredns.yaml apiVersion: v1 kind: ServiceAccount metadata: name: coredns namespace: kube-system --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: labels: kubernetes.io/bootstrapping: rbac-defaults name: system:coredns rules: - apiGroups: - "" resources: - endpoints - services - pods - namespaces verbs: - list - watch --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: annotations: rbac.authorization.kubernetes.io/autoupdate: "true" labels: kubernetes.io/bootstrapping: rbac-defaults name: system:coredns roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: system:coredns subjects: - kind: ServiceAccount name: coredns namespace: kube-system --- apiVersion: v1 kind: ConfigMap metadata: name: coredns namespace: kube-system data: Corefile: | .:53 { errors health kubernetes cluster.local in-addr.arpa ip6.arpa { pods insecure upstream fallthrough in-addr.arpa ip6.arpa } prometheus :9153 proxy . /etc/resolv.conf cache 30 loop reload loadbalance } --- apiVersion: extensions/v1beta1 kind: Deployment metadata: labels: k8s-app: kube-dns name: coredns namespace: kube-system spec: replicas: 1 revisionHistoryLimit: 10 selector: matchLabels: k8s-app: kube-dns strategy: rollingUpdate: maxSurge: 25% maxUnavailable: 1 type: RollingUpdate template: metadata: labels: k8s-app: kube-dns spec: containers: - args: - -conf - /etc/coredns/Corefile image: docker.io/fengyunpan/coredns:1.2.6 imagePullPolicy: IfNotPresent livenessProbe: failureThreshold: 5 httpGet: path: /health port: 8080 scheme: HTTP initialDelaySeconds: 60 periodSeconds: 10 successThreshold: 1 timeoutSeconds: 5 name: coredns ports: - containerPort: 53 name: dns protocol: UDP - containerPort: 53 name: dns-tcp protocol: TCP - containerPort: 9153 name: metrics protocol: TCP resources: limits: memory: 170Mi requests: cpu: 100m memory: 70Mi securityContext: allowPrivilegeEscalation: false capabilities: add: - NET_BIND_SERVICE drop: - all procMount: Default readOnlyRootFilesystem: true terminationMessagePath: /dev/termination-log terminationMessagePolicy: File volumeMounts: - mountPath: /etc/coredns name: config-volume readOnly: true dnsPolicy: Default restartPolicy: Always schedulerName: default-scheduler securityContext: {} serviceAccount: coredns serviceAccountName: coredns terminationGracePeriodSeconds: 30 tolerations: - key: CriticalAddonsOnly operator: Exists - effect: NoSchedule key: node-role.kubernetes.io/master volumes: - configMap: defaultMode: 420 items: - key: Corefile path: Corefile name: coredns name: config-volume --- apiVersion: v1 kind: Service metadata: labels: k8s-app: kube-dns kubernetes.io/cluster-service: "true" kubernetes.io/name: KubeDNS name: kube-dns namespace: kube-system spec: selector: k8s-app: coredns clusterIP: 10.10.10.2 ports: - name: dns port: 53 protocol: UDP targetPort: 53 - name: dns-tcp port: 53 protocol: TCP targetPort: 53 selector: k8s-app: kube-dns
- clusterIP: 10.10.10.2 is the cluster DNS server IP used by my nodes; be sure to change it to match your environment. In addition, add the following two flags to the kubelet startup arguments on every node (a sketch follows this list):
- --cluster-dns=10.10.10.2: the ClusterIP address of the DNS service.
- --cluster-domain=cluster.local: the domain name configured for the DNS service.
Then restart the kubelet service.
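A minimal sketch of how those two flags might be appended on a node; /etc/kubernetes/kubelet and the KUBELET_ARGS variable name are assumptions based on the script layout used for the other components in this guide, so adjust them to your own files:

```bash
# Sketch: append the cluster DNS flags to the kubelet arguments and restart kubelet.
# /etc/kubernetes/kubelet and KUBELET_ARGS are assumed names -- adjust as needed.
sed -i 's|^KUBELET_ARGS="|KUBELET_ARGS="--cluster-dns=10.10.10.2 --cluster-domain=cluster.local |' /etc/kubernetes/kubelet
systemctl daemon-reload && systemctl restart kubelet
```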
(3) Create the CoreDNS resources with kubectl create.
[root@k8s-master ~]# kubectl create -f coredns.yaml serviceaccount/coredns created clusterrole.rbac.authorization.k8s.io/system:coredns created clusterrolebinding.rbac.authorization.k8s.io/system:coredns created configmap/coredns created deployment.extensions/coredns created service/kube-dns created [root@k8s-master ~]# kubectl get all -n kube-system NAME READY STATUS RESTARTS AGE pod/coredns-5fc7b65789-rqk6f 1/1 Running 0 20s NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE service/kube-dns ClusterIP 10.10.10.2 <none> 53/UDP,53/TCP 20s NAME READY UP-TO-DATE AVAILABLE AGE deployment.apps/coredns 1/1 1 1 20s NAME DESIRED CURRENT READY AGE replicaset.apps/coredns-5fc7b65789 1 1 1 20s
(4) Verify the DNS service
Next, use a Pod that contains the nslookup tool to check that the DNS service works:
Create busybox.yaml with the following content:
[root@k8s-master ~]# cat busybox.yaml apiVersion: v1 kind: Pod metadata: name: busybox namespace: default spec: containers: - name: busybox image: registry.cn-hangzhou.aliyuncs.com/google_containers/busybox command: - sleep - "3600" imagePullPolicy: IfNotPresent restartPolicy: Always
Create the Pod with kubectl apply:
[root@k8s-master ~]# kubectl apply -f busybox.yaml pod/busybox created # 采用kubectl describe命令发现busybox创建成功 [root@k8s-master ~]# kubectl describe po/busybox ....... Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 4s default-scheduler Successfully assigned default/busybox to 172.16.4.13 Normal Pulling 4s kubelet, 172.16.4.13 Pulling image "registry.cn-hangzhou.aliyuncs.com/google_containers/busybox" Normal Pulled 1s kubelet, 172.16.4.13 Successfully pulled image "registry.cn-hangzhou.aliyuncs.com/google_containers/busybox" Normal Created 1s kubelet, 172.16.4.13 Created container busybox Normal Started 1s kubelet, 172.16.4.13 Started container busybox
Once the container has started successfully, test name resolution with kubectl exec <container_name> -- nslookup.
[root@k8s-master ~]# kubectl exec busybox -- nslookup kubernetes Server: 10.10.10.2 Address 1: 10.10.10.2 kube-dns.kube-system.svc.cluster.local Name: kubernetes Address 1: 10.10.10.1 kubernetes.default.svc.cluster.local
Note: if a Service belongs to a different namespace, the lookup must include the namespace to form a fuller domain name. Taking the kube-dns Service as an example, append its namespace "kube-system" to the service name with a dot, i.e. "kube-dns.kube-system", and the query succeeds:
# 错误案例,没有指定namespace [root@k8s-master ~]# kubectl exec busybox -- nslookup kube-dns nslookup: can't resolve 'kube-dns' Server: 10.10.10.2 Address 1: 10.10.10.2 kube-dns.kube-system.svc.cluster.local command terminated with exit code 1 # 成功案例。 [root@k8s-master ~]# kubectl exec busybox -- nslookup kube-dns.kube-system Server: 10.10.10.2 Address 1: 10.10.10.2 kube-dns.kube-system.svc.cluster.local Name: kube-dns.kube-system Address 1: 10.10.10.2 kube-dns.kube-system.svc.cluster.local
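The same lookup can also be done with the fully qualified name, <service>.<namespace>.svc.cluster.local, which is the form most application configuration should use; for example, while the busybox Pod above is still running:

```bash
# Resolve the kube-dns Service by its fully qualified cluster domain name.
kubectl exec busybox -- nslookup kube-dns.kube-system.svc.cluster.local
```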
Installing the dashboard add-on
kubernetes-dashboard, the Kubernetes web UI management tool, provides common cluster-management features such as deploying applications, managing resource objects, querying container logs, and monitoring the system. To show resource usage on its pages it requires Metrics Server to be deployed. Reference:
Official dashboard files:
https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dashboard
Because kube-apiserver has RBAC authorization enabled and dashboard-controller.yaml in the official source tree does not define an authorized ServiceAccount, later calls to the API server would be rejected; since k8s v1.8.3, however, the official documentation provides a dashboard.rbac.yaml file.
(1) Create the deployment file kubernetes-dashboard.yaml with the following content:
# Copyright 2017 The Kubernetes Authors. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # ------------------- Dashboard Secret ------------------- # apiVersion: v1 kind: Secret metadata: labels: k8s-app: kubernetes-dashboard name: kubernetes-dashboard-certs namespace: kube-system type: Opaque --- # ------------------- Dashboard Service Account ------------------- # apiVersion: v1 kind: ServiceAccount metadata: labels: k8s-app: kubernetes-dashboard name: kubernetes-dashboard namespace: kube-system --- # ------------------- Dashboard Role & Role Binding ------------------- # kind: Role apiVersion: rbac.authorization.k8s.io/v1 metadata: name: kubernetes-dashboard-minimal namespace: kube-system rules: # Allow Dashboard to create 'kubernetes-dashboard-key-holder' secret. - apiGroups: [""] resources: ["secrets"] verbs: ["create"] # Allow Dashboard to create 'kubernetes-dashboard-settings' config map. - apiGroups: [""] resources: ["configmaps"] verbs: ["create"] # Allow Dashboard to get, update and delete Dashboard exclusive secrets. - apiGroups: [""] resources: ["secrets"] resourceNames: ["kubernetes-dashboard-key-holder", "kubernetes-dashboard-certs"] verbs: ["get", "update", "delete"] # Allow Dashboard to get and update 'kubernetes-dashboard-settings' config map. - apiGroups: [""] resources: ["configmaps"] resourceNames: ["kubernetes-dashboard-settings"] verbs: ["get", "update"] # Allow Dashboard to get metrics from heapster. - apiGroups: [""] resources: ["services"] resourceNames: ["heapster"] verbs: ["proxy"] - apiGroups: [""] resources: ["services/proxy"] resourceNames: ["heapster", "http:heapster:", "https:heapster:"] verbs: ["get"] --- apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: name: kubernetes-dashboard-minimal namespace: kube-system roleRef: apiGroup: rbac.authorization.k8s.io kind: Role name: kubernetes-dashboard-minimal subjects: - kind: ServiceAccount name: kubernetes-dashboard namespace: kube-system --- # ------------------- Dashboard Deployment ------------------- # kind: Deployment apiVersion: apps/v1 metadata: labels: k8s-app: kubernetes-dashboard name: kubernetes-dashboard namespace: kube-system spec: replicas: 1 revisionHistoryLimit: 10 selector: matchLabels: k8s-app: kubernetes-dashboard template: metadata: labels: k8s-app: kubernetes-dashboard spec: containers: - name: kubernetes-dashboard image: lizhenliang/kubernetes-dashboard-amd64:v1.10.1 ports: - containerPort: 8443 protocol: TCP args: - --auto-generate-certificates # Uncomment the following line to manually specify Kubernetes API server Host # If not specified, Dashboard will attempt to auto discover the API server and connect # to it. Uncomment only if the default does not work. 
# - --apiserver-host=http://my-address:port volumeMounts: - name: kubernetes-dashboard-certs mountPath: /certs # Create on-disk volume to store exec logs - mountPath: /tmp name: tmp-volume livenessProbe: httpGet: scheme: HTTPS path: / port: 8443 initialDelaySeconds: 30 timeoutSeconds: 30 volumes: - name: kubernetes-dashboard-certs secret: secretName: kubernetes-dashboard-certs - name: tmp-volume emptyDir: {} serviceAccountName: kubernetes-dashboard # Comment the following tolerations if Dashboard must not be deployed on master tolerations: - key: node-role.kubernetes.io/master effect: NoSchedule --- # ------------------- Dashboard Service ------------------- # kind: Service apiVersion: v1 metadata: labels: k8s-app: kubernetes-dashboard name: kubernetes-dashboard namespace: kube-system spec: type: NodePort ports: - port: 443 targetPort: 8443 selector: k8s-app: kubernetes-dashboard
(2) Check the deployment status
[root@k8s-master ~]# kubectl get all -n kube-system | grep dashboard pod/kubernetes-dashboard-7df98d85bd-jbwh2 1/1 Running 0 18m service/kubernetes-dashboard NodePort 10.10.10.91 <none> 443:41498/TCP 18m deployment.apps/kubernetes-dashboard 1/1 1 1 18m replicaset.apps/kubernetes-dashboard-7df98d85bd 1 1 1 18m
(3) The dashboard can now be reached through the NodePort shown above (41498 in this case): open https://172.16.4.13:41498 or https://172.16.4.14:41498 in a browser.
The CoreDNS service deployed earlier can also resolve the dashboard Service's IP address:
[root@k8s-master ~]# kubectl get svc -n kube-system NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE kube-dns ClusterIP 10.10.10.2 <none> 53/UDP,53/TCP 5h38m kubernetes-dashboard NodePort 10.10.10.91 <none> 443:41498/TCP 26m [root@k8s-master ~]# kubectl exec busybox -- nslookup kubernetes-dashboard.kube-system Server: 10.10.10.2 Address 1: 10.10.10.2 kube-dns.kube-system.svc.cluster.local Name: kubernetes-dashboard.kube-system Address 1: 10.10.10.91 kubernetes-dashboard.kube-system.svc.cluster.local
(4) Create a ServiceAccount and bind it to the cluster-admin cluster role
[root@k8s-master ~]# kubectl create serviceaccount dashboard-admin -n kube-system serviceaccount/dashboard-admin created [root@k8s-master ~]# kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kube-system:dashboard-admin clusterrolebinding.rbac.authorization.k8s.io/dashboard-admin created # 查看已创建的serviceaccount [root@k8s-master ~]# kubectl get secret -n kube-system | grep admin dashboard-admin-token-69zsx kubernetes.io/service-account-token 3 65s # 查看生成的token的具体信息并将token值复制到浏览器中,采用令牌登录。 [root@k8s-master ~]# kubectl describe secret dashboard-admin-token-69zsx -n kube-system Name: dashboard-admin-token-69zsx Namespace: kube-system Labels: <none> Annotations: kubernetes.io/service-account.name: dashboard-admin kubernetes.io/service-account.uid: dfe59297-8f46-11e9-b92b-e67418705759 Type: kubernetes.io/service-account-token Data ==== ca.crt: 1359 bytes namespace: 11 bytes token: eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXNoYm9hcmQtYWRtaW4tdG9rZW4tNjl6c3giLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGFzaGJvYXJkLWFkbWluIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiZGZlNTkyOTctOGY0Ni0xMWU5LWI5MmItZTY3NDE4NzA1NzU5Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmUtc3lzdGVtOmRhc2hib2FyZC1hZG1pbiJ9.Wl6WiT6MZ-37ArWhPuhudac5S1Y8v2GxiUdNcy4hIwHQ1EdtzaAlvpx1mLZsQoDYJCeM6swVtNgJwhO5ESZAYQVi9xCrXsQcEDIeBkjyzpu6U4XHmab7SuS0_KEsGXhe57XKq86ogK9bAyNvNWE497V2giJJy5eR6CHKH3GR6mIwTQDSKEf-GfDfs9SHvQxRjchsrYLJLS3B_XfZyNHFXcieMZHy7V7Ehx2jMzwh6WNk6Mqk5N-IlZQRxmTBHTe3i9efN8r7CjvRhZdKc5iF6V4eG0QWkxR95WOzgV2QCCyLh4xEJw895FlHFJ1oTR2sUIRugnzyfqZaPQxdXcrc7Q
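If the token is needed again later, it can be extracted in one step instead of reading it off the describe output by hand; a small sketch using only kubectl and awk:

```bash
# Print just the token of the dashboard-admin service account.
kubectl -n kube-system describe secret \
  "$(kubectl -n kube-system get secret | awk '/dashboard-admin-token/{print $1}')" \
  | awk '/^token:/{print $2}'
```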
(5) Choose token login in the browser and paste the token; the cluster state is then visible.
- Note: there are actually three ways to reach the dashboard, and the steps above only demonstrated the first one:
  - The kubernetes-dashboard Service exposes a NodePort, so the dashboard can be opened at https://NodeIP:NodePort.
  - Through the API server (secure port 6443 over https, or insecure port 8080 over http).
  - Through kubectl proxy.
Accessing the dashboard with kubectl proxy
(1) Start the proxy
[root@k8s-master ~]# kubectl proxy --address='172.16.4.12' --port=8086 --accept-hosts='^*$' Starting to serve on 172.16.4.12:8086
(2) Access the dashboard
Open the URL: http://172.16.4.12:8086/ui
It redirects automatically to: http://172.16.4.12:8086/api/v1/proxy/namespaces/kube-system/services/kubernetes-dashboard/#/workload?namespace=default
Installing the Heapster add-on
Prepare the images
Download the latest Heapster release from the heapster releases page.
wget https://github.com/kubernetes-retired/heapster/archive/v1.5.4.tar.gz tar zxvf v1.5.4.tar.gz [root@k8s-master ~]# cd heapster-1.5.4/deploy/kube-config/influxdb/ && ls grafana.yaml heapster.yaml influxdb.yaml
(1) After our modifications, heapster.yaml looks like this:
# ------------------- Heapster Service Account ------------------- # apiVersion: v1 kind: ServiceAccount metadata: name: heapster namespace: kube-system --- # ------------------- Heapster Role & Role Binding ------------------- # kind: ClusterRoleBinding apiVersion: rbac.authorization.k8s.io/v1beta1 metadata: name: heapster subjects: - kind: ServiceAccount name: heapster namespace: kube-system roleRef: kind: ClusterRole name: cluster-admin apiGroup: rbac.authorization.k8s.io --- # ------------------- Heapster Deployment ------------------- # apiVersion: extensions/v1beta1 kind: Deployment metadata: name: heapster namespace: kube-system spec: replicas: 1 template: metadata: labels: task: monitoring k8s-app: heapster spec: serviceAccountName: heapster containers: - name: heapster image: registry.cn-hangzhou.aliyuncs.com/google_containers/heapster-amd64:v1.5.3 imagePullPolicy: IfNotPresent command: - /heapster - --source=kubernetes:https://kubernetes.default - --sink=influxdb:http://monitoring-influxdb.kube-system.svc:8086 --- # ------------------- Heapster Service ------------------- # apiVersion: v1 kind: Service metadata: labels: task: monitoring # For use as a Cluster add-on (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons) # If you are NOT using this as an addon, you should comment out this line. kubernetes.io/cluster-service: 'true' kubernetes.io/name: Heapster name: heapster namespace: kube-system spec: ports: - port: 80 targetPort: 8082 selector: k8s-app: heapster
(2) After our modifications, influxdb.yaml looks like this:
[root@k8s-master influxdb]# cat influxdb.yaml # ------------------- Influxdb Deployment ------------------- # apiVersion: extensions/v1beta1 kind: Deployment metadata: name: monitoring-influxdb namespace: kube-system spec: replicas: 1 template: metadata: labels: task: monitoring k8s-app: influxdb spec: containers: - name: influxdb image: registry.cn-hangzhou.aliyuncs.com/google_containers/heapster-influxdb-amd64:v1.3.3 volumeMounts: - mountPath: /data name: influxdb-storage volumes: - name: influxdb-storage emptyDir: {} --- # ------------------- Influxdb Service ------------------- # apiVersion: v1 kind: Service metadata: labels: task: monitoring # For use as a Cluster add-on (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons) # If you are NOT using this as an addon, you should comment out this line. kubernetes.io/cluster-service: 'true' kubernetes.io/name: monitoring-influxdb name: monitoring-influxdb namespace: kube-system spec: type: NodePort ports: - port: 8086 targetPort: 8086 name: http - port: 8083 targetPort: 8083 name: admin selector: k8s-app: influxdb --- #-------------------Influxdb Cm-----------------# apiVersion: v1 kind: ConfigMap metadata: name: influxdb-config namespace: kube-system data: config.toml: | reporting-disabled = true bind-address = ":8088" [meta] dir = "/data/meta" retention-autocreate = true logging-enabled = true [data] dir = "/data/data" wal-dir = "/data/wal" query-log-enabled = true cache-max-memory-size = 1073741824 cache-snapshot-memory-size = 26214400 cache-snapshot-write-cold-duration = "10m0s" compact-full-write-cold-duration = "4h0m0s" max-series-per-database = 1000000 max-values-per-tag = 100000 trace-logging-enabled = false [coordinator] write-timeout = "10s" max-concurrent-queries = 0 query-timeout = "0s" log-queries-after = "0s" max-select-point = 0 max-select-series = 0 max-select-buckets = 0 [retention] enabled = true check-interval = "30m0s" [admin] enabled = true bind-address = ":8083" https-enabled = false https-certificate = "/etc/ssl/influxdb.pem" [shard-precreation] enabled = true check-interval = "10m0s" advance-period = "30m0s" [monitor] store-enabled = true store-database = "_internal" store-interval = "10s" [subscriber] enabled = true http-timeout = "30s" insecure-skip-verify = false ca-certs = "" write-concurrency = 40 write-buffer-size = 1000 [http] enabled = true bind-address = ":8086" auth-enabled = false log-enabled = true write-tracing = false pprof-enabled = false https-enabled = false https-certificate = "/etc/ssl/influxdb.pem" https-private-key = "" max-row-limit = 10000 max-connection-limit = 0 shared-secret = "" realm = "InfluxDB" unix-socket-enabled = false bind-socket = "/var/run/influxdb.sock" [[graphite]] enabled = false bind-address = ":2003" database = "graphite" retention-policy = "" protocol = "tcp" batch-size = 5000 batch-pending = 10 batch-timeout = "1s" consistency-level = "one" separator = "." 
udp-read-buffer = 0 [[collectd]] enabled = false bind-address = ":25826" database = "collectd" retention-policy = "" batch-size = 5000 batch-pending = 10 batch-timeout = "10s" read-buffer = 0 typesdb = "/usr/share/collectd/types.db" [[opentsdb]] enabled = false bind-address = ":4242" database = "opentsdb" retention-policy = "" consistency-level = "one" tls-enabled = false certificate = "/etc/ssl/influxdb.pem" batch-size = 1000 batch-pending = 5 batch-timeout = "1s" log-point-errors = true [[udp]] enabled = false bind-address = ":8089" database = "udp" retention-policy = "" batch-size = 5000 batch-pending = 10 read-buffer = 0 batch-timeout = "1s" precision = "" [continuous_queries] log-enabled = true enabled = true run-interval = "1s"
(3) After our modifications, grafana.yaml looks like this:
[root@k8s-master influxdb]# cat grafana.yaml #------------Grafana Deployment----------------# apiVersion: extensions/v1beta1 kind: Deployment metadata: name: monitoring-grafana namespace: kube-system spec: replicas: 1 template: metadata: labels: task: monitoring k8s-app: grafana spec: containers: - name: grafana image: registry.cn-hangzhou.aliyuncs.com/google_containers/heapster-grafana-amd64:v4.4.3 ports: - containerPort: 3000 protocol: TCP volumeMounts: #- mountPath: /etc/ssl/certs # name: ca-certificates # readOnly: true - mountPath: /var name: grafana-storage env: - name: INFLUXDB_HOST value: monitoring-influxdb #- name: GF_SERVER_HTTP_PORT - name: GRAFANA_PORT value: "3000" # The following env variables are required to make Grafana accessible via # the kubernetes api-server proxy. On production clusters, we recommend # removing these env variables, setup auth for grafana, and expose the grafana # service using a LoadBalancer or a public IP. - name: GF_AUTH_BASIC_ENABLED value: "false" - name: GF_AUTH_ANONYMOUS_ENABLED value: "true" - name: GF_AUTH_ANONYMOUS_ORG_ROLE value: Admin - name: GF_SERVER_ROOT_URL # If you're only using the API Server proxy, set this value instead: value: /api/v1/namespaces/kube-system/services/monitoring-grafana/proxy # value: / volumes: # - name: ca-certificates # hostPath: # path: /etc/ssl/certs - name: grafana-storage emptyDir: {} --- #------------Grafana Service----------------# apiVersion: v1 kind: Service metadata: labels: # For use as a Cluster add-on (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons) # If you are NOT using this as an addon, you should comment out this line. kubernetes.io/cluster-service: 'true' kubernetes.io/name: monitoring-grafana name: monitoring-grafana namespace: kube-system spec: # In a production setup, we recommend accessing Grafana through an external Loadbalancer # or through a public IP. # type: LoadBalancer # You could also use NodePort to expose the service at a randomly-generated port # type: NodePort ports: - port: 80 targetPort: 3000 selector: k8s-app: grafana
Apply all the definition files
[root@k8s-master influxdb]# pwd /root/heapster-1.5.4/deploy/kube-config/influxdb [root@k8s-master influxdb]# ls grafana.yaml heapster.yaml influxdb.yaml [root@k8s-master influxdb]# kubectl create -f . deployment.extensions/monitoring-grafana created service/monitoring-grafana created serviceaccount/heapster created clusterrolebinding.rbac.authorization.k8s.io/heapster created service/heapster created deployment.extensions/heapster created deployment.extensions/monitoring-influxdb created service/monitoring-influxdb created configmap/influxdb-config created Error from server (AlreadyExists): error when creating "heapster.yaml": serviceaccounts "heapster" already exists Error from server (AlreadyExists): error when creating "heapster.yaml": clusterrolebindings.rbac.authorization.k8s.io "heapster" already exists Error from server (AlreadyExists): error when creating "heapster.yaml": services "heapster" already exists
Check the results
# 检查Deployment [root@k8s-master influxdb]# kubectl get deployments -n kube-system | grep -E 'heapster|monitoring' heapster 1/1 1 1 10m monitoring-grafana 1/1 1 1 10m monitoring-influxdb 1/1 1 1 10m # 检查Pods [root@k8s-master influxdb]# kubectl get pods -n kube-system | grep -E 'heapster|monitoring' heapster-75d646bf58-9x9tz 1/1 Running 0 10m monitoring-grafana-77997bd67d-5khvp 1/1 Running 0 10m monitoring-influxdb-7d6c5fb944-jmrv6 1/1 Running 0 10m
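Before moving on to the dashboards, it can be useful to confirm that Heapster is actually scraping metrics and shipping them to InfluxDB; a quick look at its logs and endpoints is usually enough (resource names taken from the output above):

```bash
# Heapster should log the sink setup and periodic scrapes without errors.
kubectl -n kube-system logs deployment/heapster --tail=20
# The Service endpoints should point at the running heapster Pod.
kubectl -n kube-system get endpoints heapster
```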
Accessing the dashboard pages
Problem 1: the system:anonymous issue
When opening the dashboard pages, you may run into the following error:
{ "kind": "Status", "apiVersion": "v1", "metadata": { }, "status": "Failure", "message": "services \"heapster\" is forbidden: User \"system:anonymous\" cannot get resource \"services/proxy\" in API group \"\" in the namespace \"kube-system\"", "reason": "Forbidden", "details": { "name": "heapster", "kind": "services" }, "code": 403 }
Analysis: the Kubernetes API Server added an --anonymous-auth option that allows anonymous requests on the secure port. Requests that are not rejected by any other authentication method become anonymous requests, with username system:anonymous and group system:unauthenticated, and the option is enabled by default. As a result, when the dashboard UI is opened in Chrome, the username/password dialog may never appear and the subsequent authorization fails. To make sure the credentials dialog is shown, set --anonymous-auth to false.
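How the flag gets added depends on how kube-apiserver was configured earlier in this guide; a minimal sketch, assuming the apiserver arguments live in an EnvironmentFile such as /etc/kubernetes/kube-apiserver with a KUBE_APISERVER_ARGS variable (both names are assumptions, adjust to your unit file):

```bash
# Sketch: disable anonymous access on the secure port, then restart the apiserver.
# /etc/kubernetes/kube-apiserver and KUBE_APISERVER_ARGS are assumed names.
grep -q -- '--anonymous-auth' /etc/kubernetes/kube-apiserver || \
  sed -i 's|^KUBE_APISERVER_ARGS="|KUBE_APISERVER_ARGS="--anonymous-auth=false |' /etc/kubernetes/kube-apiserver
systemctl daemon-reload && systemctl restart kube-apiserver
```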
Reloading the dashboard now shows the additional CPU-usage and memory-usage charts:
(2) Access the Grafana page through kube-apiserver:
Get the monitoring-grafana service URL:
[root@k8s-master ~]# kubectl cluster-info Kubernetes master is running at https://172.16.4.12:6443 Heapster is running at https://172.16.4.12:6443/api/v1/namespaces/kube-system/services/heapster/proxy KubeDNS is running at https://172.16.4.12:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy monitoring-grafana is running at https://172.16.4.12:6443/api/v1/namespaces/kube-system/services/monitoring-grafana/proxy monitoring-influxdb is running at https://172.16.4.12:6443/api/v1/namespaces/kube-system/services/monitoring-influxdb:http/proxy To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
Open the URL in a browser: https://172.16.4.12:6443/api/v1/namespaces/kube-system/services/monitoring-grafana/proxy
- Access through kubectl proxy:
Create the proxy:
[root@k8s-master ~]# kubectl proxy --address='172.16.4.12' --port=8084 --accept-hosts='^*$' Starting to serve on 172.16.4.12:8084
Accessing the InfluxDB admin UI
Get the NodePort that InfluxDB's http port 8086 is mapped to:
[root@k8s-master influxdb]# kubectl get svc -n kube-system|grep influxdb monitoring-influxdb NodePort 10.10.10.154 <none> 8086:43444/TCP,8083:49123/TCP 53m
Access the InfluxDB admin UI through the kube-apiserver insecure port: http://172.16.4.12:8080/api/v1/proxy/namespaces/kube-system/services/monitoring-influxdb:8083/
In the page's "Connection Settings", enter a node IP as Host and the NodePort mapped to 8086 (43444 above) as Port, then click "Save" (in my cluster the address was 172.16.4.12:32299).
- Problem: the InfluxDB dashboard cannot be reached through kube-apiserver; the Service definition is returned instead, as shown below.
{ "kind": "Service", "apiVersion": "v1", "metadata": { "name": "monitoring-influxdb", "namespace": "kube-system", "selfLink": "/api/v1/namespaces/kube-system/services/monitoring-influxdb", "uid": "22c9ab6c-8f72-11e9-b92b-e67418705759", "resourceVersion": "215237", "creationTimestamp": "2019-06-15T13:33:18Z", "labels": { "kubernetes.io/cluster-service": "true", "kubernetes.io/name": "monitoring-influxdb", "task": "monitoring" } }, "spec": { "ports": [ { "name": "http", "protocol": "TCP", "port": 8086, "targetPort": 8086, "nodePort": 43444 }, { "name": "admin", "protocol": "TCP", "port": 8083, "targetPort": 8083, "nodePort": 49123 } ], "selector": { "k8s-app": "influxdb" }, "clusterIP": "10.10.10.154", "type": "NodePort", "sessionAffinity": "None", "externalTrafficPolicy": "Cluster" }, "status": { "loadBalancer": { } } }
Installing the EFK add-on
In a Kubernetes cluster, a complete application or service involves a large number of components, and the Nodes they run on and their instance counts are all variable. If the logging subsystem is not managed centrally, operating and supporting the system becomes very difficult, so logs need to be collected and searched in a unified way at the cluster level.
Logs that containers write to the console are saved under /var/lib/docker/containers/ with names matching "*-json.log", which lays the groundwork for log collection and later processing.
Kubernetes recommends Fluentd + Elasticsearch + Kibana for collecting, querying, and displaying system and container logs.
Deploying a unified log management system has two prerequisites:
- The API Server is correctly configured with a CA certificate.
- The DNS service is up and running.
Deployment architecture
We collect the logs on every node by running fluentd on each of them as a DaemonSet. Fluentd mounts the Docker log directory /var/lib/docker/containers and the /var/log directory into its Pod. The node's /var/log/pods directory then contains one sub-directory per container, so the different container log streams can be told apart; each of them holds a log file that links to the corresponding container output under /var/lib/docker/containers. Note: the logs from both directories end up in the Elasticsearch cluster, and Kibana finally handles the interaction with the user.
There is one special requirement here: Fluentd must run on every Node. To satisfy it, Fluentd can be deployed in any of the following ways:
- Run Fluentd directly on each Node host.
- Use the kubelet --config flag to load a Fluentd Pod on every node.
- Use a DaemonSet so that a Fluentd Pod runs on every Node.
Official files: https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/fluentd-elasticsearch
Writing the EFK configuration files
Create a directory to hold the files
[root@k8s-master ~]# mkdir EFK && cd EFK
Configure the EFK RBAC objects
[root@k8s-master EFK]# cat efk-rbac.yaml apiVersion: v1 kind: ServiceAccount metadata: name: efk namespace: kube-system --- kind: ClusterRoleBinding apiVersion: rbac.authorization.k8s.io/v1beta1 metadata: name: efk subjects: - kind: ServiceAccount name: efk namespace: kube-system roleRef: kind: ClusterRole name: cluster-admin apiGroup: rbac.authorization.k8s.io # 注意配置的ServiceAccount为efk。
Configure the Elasticsearch service
# 此处将官方的三个文档合并成了一个elasticsearch.yaml,内容如下: [root@k8s-master EFK]# cat elasticsearch.yaml #------------ElasticSearch RBAC---------# apiVersion: v1 kind: ServiceAccount metadata: name: elasticsearch-logging namespace: kube-system labels: k8s-app: elasticsearch-logging addonmanager.kubernetes.io/mode: Reconcile --- kind: ClusterRole apiVersion: rbac.authorization.k8s.io/v1 metadata: name: elasticsearch-logging labels: k8s-app: elasticsearch-logging addonmanager.kubernetes.io/mode: Reconcile rules: - apiGroups: - "" resources: - "services" - "namespaces" - "endpoints" verbs: - "get" --- kind: ClusterRoleBinding apiVersion: rbac.authorization.k8s.io/v1 metadata: namespace: kube-system name: elasticsearch-logging labels: k8s-app: elasticsearch-logging addonmanager.kubernetes.io/mode: Reconcile subjects: - kind: ServiceAccount name: elasticsearch-logging namespace: kube-system apiGroup: "" roleRef: kind: ClusterRole name: elasticsearch-logging apiGroup: "" --- # -----------ElasticSearch Service--------------# apiVersion: v1 kind: Service metadata: name: elasticsearch-logging namespace: kube-system labels: k8s-app: elasticsearch-logging kubernetes.io/cluster-service: "true" addonmanager.kubernetes.io/mode: Reconcile kubernetes.io/name: "Elasticsearch" spec: ports: - port: 9200 protocol: TCP targetPort: db selector: k8s-app: elasticsearch-logging --- #-------------------ElasticSearch StatefulSet-------# apiVersion: apps/v1 kind: StatefulSet metadata: name: elasticsearch-logging namespace: kube-system labels: k8s-app: elasticsearch-logging version: v6.6.1 addonmanager.kubernetes.io/mode: Reconcile spec: serviceName: elasticsearch-logging replicas: 2 selector: matchLabels: k8s-app: elasticsearch-logging version: v6.7.2 template: metadata: labels: k8s-app: elasticsearch-logging version: v6.7.2 spec: serviceAccountName: elasticsearch-logging containers: - image: docker.elastic.co/elasticsearch/elasticsearch:6.6.1 name: elasticsearch-logging resources: # need more cpu upon initialization, therefore burstable class limits: cpu: 1000m requests: cpu: 100m ports: - containerPort: 9200 name: db protocol: TCP - containerPort: 9300 name: transport protocol: TCP volumeMounts: - name: elasticsearch-logging mountPath: /data env: - name: "NAMESPACE" valueFrom: fieldRef: fieldPath: metadata.namespace - name: ES_JAVA_OPTS value: -Xms1024m -Xmx1024m volumes: - name: elasticsearch-logging emptyDir: {} # Elasticsearch requires vm.max_map_count to be at least 262144. # If your OS already sets up this number to a higher value, feel free # to remove this init container. initContainers: - image: alpine:3.6 command: ["/sbin/sysctl", "-w", "vm.max_map_count=262144"] name: elasticsearch-logging-init securityContext: privileged: true
Configure the Fluentd ConfigMap, created here via td-agent
# td-agent提供了一个官方文档:个人感觉繁琐,可以直接采用其脚本安装。 curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent3.sh | sh # 正式配置configmap,其配置文件如下,可以自己手动创建。 [root@k8s-master fluentd-es-image]# cat td-agent.conf kind: ConfigMap apiVersion: v1 metadata: name: td-agent-config namespace: kube-system labels: addonmanager.kubernetes.io/mode: Reconcile data: td-agent.conf: | <filter kubernetes.**> @type kubernetes_metadata tls-cert-file /etc/kubernetes/ssl/server.pem tls-private-key-file /etc/kubernetes/ssl/server-key.pem client-ca-file /etc/kubernetes/ssl/ca.pem service-account-key-file /etc/kubernetes/ssl/ca-key.pem </filter> <match **> @id elasticsearch @type elasticsearch @log_level info type_name _doc include_tag_key true host 172.16.4.12 port 9200 logstash_format true <buffer> @type file path /var/log/fluentd-buffers/kubernetes.system.buffer flush_mode interval retry_type exponential_backoff flush_thread_count 2 flush_interval 5s retry_forever retry_max_interval 30 chunk_limit_size 2M queue_limit_length 8 overflow_action block </buffer> </match> <source> type null type tail path /var/log/containers/*.log pos_file /var/log/es-containers.log.pos time_format %Y-%m-%dT%H:%M:%S.%NZ tag kubernetes.* format json read_from_head true </source> # 注意将configmap创建在kube-system的名称空间下。 kubectl create configmap td-agent-config --from-file=./td-agent.conf -n kube-system
# 创建fluentd的DaemonSet [root@k8s-master EFK]# cat fluentd.yaml apiVersion: extensions/v1beta1 kind: DaemonSet metadata: name: fluentd-es-v1.22 namespace: kube-system labels: k8s-app: fluentd-es kubernetes.io/cluster-service: "true" addonmanager.kubernetes.io/mode: Reconcile version: v1.22 spec: template: metadata: labels: k8s-app: fluentd-es kubernetes.io/cluster-service: "true" version: v1.22 annotations: scheduler.alpha.kubernetes.io/critical-pod: '' spec: serviceAccountName: efk containers: - name: fluentd-es image: travix/fluentd-elasticsearch:1.22 command: - '/bin/sh' - '-c' - '/usr/sbin/td-agent 2>&1 >> /var/log/fluentd.log' resources: limits: memory: 200Mi requests: cpu: 100m memory: 200Mi volumeMounts: - name: varlog mountPath: /var/log - name: varlibdockercontainers mountPath: /var/lib/docker/containers readOnly: true nodeSelector: beta.kubernetes.io/fluentd-ds-ready: "true" tolerations: - key : "node.alpha.kubernetes.io/ismaster" effect: "NoSchedule" terminationGracePeriodSeconds: 30 volumes: - name: varlog hostPath: path: /var/log - name: varlibdockercontainers hostPath: path: /var/lib/docker/containers # 此处采用了一个dockerhub上公共镜像,官方镜像需要***。
Configure the Kibana service
[root@k8s-master EFK]# cat kibana.yaml #---------------Kibana Deployment-------------------# apiVersion: apps/v1 kind: Deployment metadata: name: kibana-logging namespace: kube-system labels: k8s-app: kibana-logging kubernetes.io/cluster-service: "true" addonmanager.kubernetes.io/mode: Reconcile spec: replicas: 1 selector: matchLabels: k8s-app: kibana-logging template: metadata: labels: k8s-app: kibana-logging annotations: seccomp.security.alpha.kubernetes.io/pod: 'docker/default' spec: serviceAccountName: efk containers: - name: kibana-logging image: docker.elastic.co/kibana/kibana-oss:6.6.1 resources: # keep request = limit to keep this container in guaranteed class limits: cpu: 1000m requests: cpu: 100m env: - name: "ELASTICSEARCH_URL" value: "http://172.16.4.12:9200" # modified by gzr # value: "http://elasticsearch-logging:9200" - name: "SERVER_BASEPATH" value: "/api/v1/proxy/namespaces/kube-system/services/kibana-logging/proxy" ports: - containerPort: 5601 name: ui protocol: TCP --- #------------------Kibana Service---------------------# apiVersion: v1 kind: Service metadata: name: kibana-logging namespace: kube-system labels: k8s-app: kibana-logging kubernetes.io/cluster-service: "true" addonmanager.kubernetes.io/mode: Reconcile kubernetes.io/name: "Kibana" spec: ports: - port: 5601 protocol: TCP targetPort: ui selector: k8s-app: kibana-logging
- Note that the image used is docker.elastic.co/kibana/kibana-oss:6.6.1; if you need a newer Kibana, substitute it yourself.
- Since DNS is not configured here, it is recommended to replace http://elasticsearch-logging:9200 directly with the cluster IP of the elasticsearch-logging Service (see the sketch below for one way to look it up).
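One way to look up that cluster IP without reading it off the kubectl get svc output by eye; the jsonpath expression is standard kubectl, and the Service name comes from elasticsearch.yaml above:

```bash
# Print the ClusterIP of the elasticsearch-logging Service; paste it into
# ELASTICSEARCH_URL in kibana.yaml and the fluentd config if DNS is not used.
kubectl -n kube-system get svc elasticsearch-logging -o jsonpath='{.spec.clusterIP}{"\n"}'
```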
Label the Nodes
The DaemonSet fluentd-es-v1.22 is defined with the nodeSelector beta.kubernetes.io/fluentd-ds-ready=true, so that label must be set on every Node expected to run fluentd:
[root@k8s-master EFK]# kubectl get nodes NAME STATUS ROLES AGE VERSION 172.16.4.12 Ready <none> 18h v1.14.3 172.16.4.13 Ready <none> 2d15h v1.14.3 172.16.4.14 Ready <none> 2d15h v1.14.3 [root@k8s-master EFK]#kubectl label nodes 172.16.4.14 beta.kubernetes.io/fluentd-ds-ready=true node "172.16.4.14" labeled [root@k8s-master EFK]#kubectl label nodes 172.16.4.13 beta.kubernetes.io/fluentd-ds-ready=true node "172.16.4.13" labeled [root@k8s-master EFK]#kubectl label nodes 172.16.4.12 beta.kubernetes.io/fluentd-ds-ready=true node "172.16.4.12" labeled
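A quick way to confirm the label landed on all three nodes before applying the manifests:

```bash
# Every node that should run fluentd must appear in this list.
kubectl get nodes -l beta.kubernetes.io/fluentd-ds-ready=true
```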
Apply the definition files
[root@k8s-master EFK]# kubectl create -f . serviceaccount/efk created clusterrolebinding.rbac.authorization.k8s.io/efk created service/elasticsearch-logging created serviceaccount/elasticsearch-logging created clusterrole.rbac.authorization.k8s.io/elasticsearch-logging created clusterrolebinding.rbac.authorization.k8s.io/elasticsearch-logging created statefulset.apps/elasticsearch-logging created daemonset.extensions/fluentd-es-v1.22 created deployment.apps/kibana-logging created service/kibana-logging created
验证执行结果
[root@k8s-master EFK]# kubectl get po -n kube-system -o wide| grep -E 'elastic|fluentd|kibana' elasticsearch-logging-0 1/1 Running 0 115m 172.30.69.5 172.16.4.14 <none> <none> elasticsearch-logging-1 1/1 Running 0 115m 172.30.20.8 172.16.4.13 <none> <none> fluentd-es-v1.22-4bmtm 0/1 CrashLoopBackOff 16 58m 172.30.53.2 172.16.4.12 <none> <none> fluentd-es-v1.22-f9hml 1/1 Running 0 58m 172.30.69.6 172.16.4.14 <none> <none> fluentd-es-v1.22-x9rf4 1/1 Running 0 58m 172.30.20.9 172.16.4.13 <none> <none> kibana-logging-7db9f954ff-mkbhr 1/1 Running 0 25s 172.30.69.7 172.16.4.14 <none> <none>
The first time the kibana Pod starts, it spends a fairly long time (10-20 minutes) optimizing and caching the status pages; you can tail the Pod's logs to follow the progress.
[root@k8s-master EFK]# kubectl logs kibana-logging-7db9f954ff-mkbhr -n kube-system {"type":"log","@timestamp":"2019-06-18T09:23:33Z","tags":["plugin","warning"],"pid":1,"path":"/usr/share/kibana/src/legacy/core_plugins/ems_util","message":"Skipping non-plugin directory at /usr/share/kibana/src/legacy/core_plugins/ems_util"} {"type":"log","@timestamp":"2019-06-18T09:23:34Z","tags":["warning","elasticsearch","config","deprecation"],"pid":1,"message":"Config key \"url\" is deprecated. It has been replaced with \"hosts\""} {"type":"log","@timestamp":"2019-06-18T09:23:34Z","tags":["status","plugin:kibana@6.6.1","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"} {"type":"log","@timestamp":"2019-06-18T09:23:34Z","tags":["status","plugin:elasticsearch@6.6.1","info"],"pid":1,"state":"yellow","message":"Status changed from uninitialized to yellow - Waiting for Elasticsearch","prevState":"uninitialized","prevMsg":"uninitialized"} {"type":"log","@timestamp":"2019-06-18T09:23:34Z","tags":["status","plugin:console@6.6.1","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"} {"type":"log","@timestamp":"2019-06-18T09:23:34Z","tags":["status","plugin:interpreter@6.6.1","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"} {"type":"log","@timestamp":"2019-06-18T09:23:34Z","tags":["status","plugin:metrics@6.6.1","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"} {"type":"log","@timestamp":"2019-06-18T09:23:35Z","tags":["status","plugin:timelion@6.6.1","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"} {"type":"log","@timestamp":"2019-06-18T09:23:35Z","tags":["status","plugin:elasticsearch@6.6.1","info"],"pid":1,"state":"green","message":"Status changed from yellow to green - Ready","prevState":"yellow","prevMsg":"Waiting for Elasticsearch"} {"type":"log","@timestamp":"2019-06-18T09:23:35Z","tags":["listening","info"],"pid":1,"message":"Server running at http://0:5601"} [root@k8s-master EFK]# kubectl logs kibana-logging-7db9f954ff-mkbhr -n kube-system {"type":"log","@timestamp":"2019-06-18T09:23:33Z","tags":["plugin","warning"],"pid":1,"path":"/usr/share/kibana/src/legacy/core_plugins/ems_util","message":"Skipping non-plugin directory at /usr/share/kibana/src/legacy/core_plugins/ems_util"} {"type":"log","@timestamp":"2019-06-18T09:23:34Z","tags":["warning","elasticsearch","config","deprecation"],"pid":1,"message":"Config key \"url\" is deprecated. 
It has been replaced with \"hosts\""} {"type":"log","@timestamp":"2019-06-18T09:23:34Z","tags":["status","plugin:kibana@6.6.1","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"} {"type":"log","@timestamp":"2019-06-18T09:23:34Z","tags":["status","plugin:elasticsearch@6.6.1","info"],"pid":1,"state":"yellow","message":"Status changed from uninitialized to yellow - Waiting for Elasticsearch","prevState":"uninitialized","prevMsg":"uninitialized"} {"type":"log","@timestamp":"2019-06-18T09:23:34Z","tags":["status","plugin:console@6.6.1","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"} {"type":"log","@timestamp":"2019-06-18T09:23:34Z","tags":["status","plugin:interpreter@6.6.1","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"} {"type":"log","@timestamp":"2019-06-18T09:23:34Z","tags":["status","plugin:metrics@6.6.1","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"} {"type":"log","@timestamp":"2019-06-18T09:23:35Z","tags":["status","plugin:timelion@6.6.1","info"],"pid":1,"state":"green","message":"Status changed from uninitialized to green - Ready","prevState":"uninitialized","prevMsg":"uninitialized"} {"type":"log","@timestamp":"2019-06-18T09:23:35Z","tags":["status","plugin:elasticsearch@6.6.1","info"],"pid":1,"state":"green","message":"Status changed from yellow to green - Ready","prevState":"yellow","prevMsg":"Waiting for Elasticsearch"} {"type":"log","@timestamp":"2019-06-18T09:23:35Z","tags":["listening","info"],"pid":1,"message":"Server running at http://0:5601"} ......
Accessing Kibana
- Access through kube-apiserver:
Get the Kibana service URL:
[root@k8s-master ~]# kubectl cluster-info Kubernetes master is running at https://172.16.4.12:6443 Elasticsearch is running at https://172.16.4.12:6443/api/v1/namespaces/kube-system/services/elasticsearch-logging/proxy Heapster is running at https://172.16.4.12:6443/api/v1/namespaces/kube-system/services/heapster/proxy Kibana is running at https://172.16.4.12:6443/api/v1/namespaces/kube-system/services/kibana-logging/proxy KubeDNS is running at https://172.16.4.12:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy monitoring-grafana is running at https://172.16.4.12:6443/api/v1/namespaces/kube-system/services/monitoring-grafana/proxy monitoring-influxdb is running at https://172.16.4.12:6443/api/v1/namespaces/kube-system/services/monitoring-influxdb:http/proxy
Open the URL in a browser: https://172.16.4.12:6443/api/v1/namespaces/kube-system/services/kibana-logging/proxy
- Problem 1: "Kibana did not load properly. Check the server output for more information."
Solution:
- Problem 2: accessing Kibana returns a 503 error with the following body:
{ "kind": "Status", "apiVersion": "v1", "metadata": { }, "status": "Failure", "message": "no endpoints available for service \"kibana-logging\"", "reason": "ServiceUnavailable", "code": 503 }