特色栏目： python 批处理 net编程 Javascript Php Asp Css Html5 Android seo centos

Rook 1.5.1 部署Ceph实操经验分享

来源：互联网收集：自由互联发布时间：2022-06-20

一、Rook概述 1.1 Rook简介 Rook 是一个开源的cloud-native storage编排, 提供平台和框架；为各种存储解决方案提供平台、框架和支持，以便与云原生环境本地集成。目前主要专用于Cloud-Native环

一、Rook概述

1.1 Rook简介

Rook 是一个开源的cloud-native storage编排, 提供平台和框架；为各种存储解决方案提供平台、框架和支持，以便与云原生环境本地集成。目前主要专用于Cloud-Native环境的文件、块、对象存储服务。它实现了一个自我管理的、自我扩容的、自我修复的分布式存储服务。

Rook支持自动部署、启动、配置、分配（provisioning）、扩容/缩容、升级、迁移、灾难恢复、监控，以及资源管理。为了实现所有这些功能，Rook依赖底层的容器编排平台，例如 kubernetes、CoreOS 等。。

Rook 目前支持Ceph、NFS、Minio Object Store、Edegefs、Cassandra、CockroachDB 存储的搭建。

项目地址：https://github.com/rook/rook

网站：https://rook.io/

1.2 Rook组件

Rook的主要组件有三个，功能如下：

Rook Operator

Rook与Kubernetes交互的组件
整个Rook集群只有一个

Agent or Driver

Flex Driver

已经被淘汰的驱动方式，在安装之前，请确保k8s集群版本是否支持CSI，如果不支持，或者不想用CSI，选择flex.

Ceph CSI Driver

默认全部节点安装，你可以通过 node affinity 去指定节点

Device discovery

发现新设备是否作为存储，可以在配置文件ROOK_ENABLE_DISCOVERY_DAEMON设置 enable 开启。

1.3 Rook & Ceph框架

Rook 如何集成在kubernetes 如图： Rook 1.5.1 部署Ceph实操经验分享

使用Rook部署Ceph集群的架构图如下： Rook 1.5.1 部署Ceph实操经验分享

部署的Ceph系统可以提供下面三种Volume Claim服务：

Block Storage：目前最稳定；
FileSystem：需要部署MDS，有内核要求；
Object：部署RGW；

二、ROOK 部署

2.1 准备工作

2.1.1 版本要求

kubernetes v.11 以上

2.1.2 存储要求

rook部署的ceph 是不支持lvm direct直接作为osd存储设备的，如果想要使用lvm，可以使用pvc的形式实现。方法在后面的ceph安装会提到

为了配置 Ceph 存储集群，至少需要以下本地存储选项之一：

原始设备（无分区或格式化的文件系统）
原始分区（无格式文件系统）可以 lsblk -f查看，如果 FSTYPE不为空说明有文件系统
可通过 block 模式从存储类别获得 PV

2.1.3 系统要求

本次安装环境

kubernetes 1.18
centos7.8
kernel 5.4.65-200.el7.x86_64
calico 3.16

2.1.3.1 需要安装lvm包

sudo yum install -y lvm2

2.1.3.2 内核要求

RBD

一般发行版的内核都编译有，但你最好确定下：

foxchan@~$ lsmod|grep rbd rbd 114688 0 libceph 368640 1 rbd

可以用以下命令放到开机启动项里

cat > /etc/sysconfig/modules/rbd.modules << EOF modprobe rbd EOF

CephFS

如果你想使用cephfs,内核最低要求是4.17。

2.2 部署ROOK

Github上下载Rook最新release

git clone --single-branch --branch v1.5.1 https://github.com/rook/rook.gits

安装公共部分

cd rook/cluster/examples/kubernetes/ceph kubectl create -f crds.yaml -f common.yaml

安装operator

kubectl apply -f operator.yaml

如果放到生产环境，请提前规划好。operator的配置在ceph安装后不能修改，否则rook会删除集群并重建。

修改内容如下：

# 启用cephfs ROOK_CSI_ENABLE_CEPHFS: "true" # 开启内核驱动替换ceph-fuse CSI_FORCE_CEPHFS_KERNEL_CLIENT: "true" #修改csi镜像为私有仓，加速部署时间 ROOK_CSI_CEPH_IMAGE: "harbor.foxchan.com/google_containers/cephcsi/cephcsi:v3.1.2" ROOK_CSI_REGISTRAR_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-node-driver-registrar:v2.0.1" ROOK_CSI_RESIZER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-resizer:v1.0.0" ROOK_CSI_PROVISIONER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-provisioner:v2.0.0" ROOK_CSI_SNAPSHOTTER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-snapshotter:v3.0.0" ROOK_CSI_ATTACHER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-attacher:v3.0.0" # 可以设置NODE_AFFINITY 来指定csi 部署的节点 # 我把plugin 和 provisioner分开了，具体调度方式看你集群资源。 CSI_PROVISIONER_NODE_AFFINITY: "app.rook.role=csi-provisioner" CSI_PLUGIN_NODE_AFFINITY: "app.rook.plugin=csi" #修改metrics端口，可以不改，我因为集群网络是host，为了避免端口冲突 # Configure CSI CSI Ceph FS grpc and liveness metrics port CSI_CEPHFS_GRPC_METRICS_PORT: "9491" CSI_CEPHFS_LIVENESS_METRICS_PORT: "9481" # Configure CSI RBD grpc and liveness metrics port CSI_RBD_GRPC_METRICS_PORT: "9490" CSI_RBD_LIVENESS_METRICS_PORT: "9480" # 修改rook镜像，加速部署时间 image: harbor.foxchan.com/google_containers/rook/ceph:v1.5.1 # 指定节点做存储 - name: DISCOVER_AGENT_NODE_AFFINITY value: "app.rook=storage" # 开启设备自动发现 - name: ROOK_ENABLE_DISCOVERY_DAEMON value: "true"

2.3 部署ceph集群

cluster.yaml文件里的内容需要修改，一定要适配自己的硬件情况，请详细阅读配置文件里的注释，避免我踩过的坑。

修改内容如下：

此文件的配置，除了增删osd设备外，其他的修改都要重装ceph集群才能生效，所以请提前规划好集群。如果修改后不卸载ceph直接apply，会触发ceph集群重装，导致集群异常挂掉

apiVersion: ceph.rook.io/v1 kind: CephCluster metadata: # 命名空间的名字，同一个命名空间只支持一个集群 name: rook-ceph namespace: rook-ceph spec: # ceph版本说明 # v13 is mimic, v14 is nautilus, and v15 is octopus. cephVersion: #修改ceph镜像，加速部署时间 image: harbor.foxchan.com/google_containers/ceph/ceph:v15.2.5 # 是否允许不支持的ceph版本 allowUnsupported: false #指定rook数据在节点的保存路径 dataDirHostPath: /data/rook # 升级时如果检查失败是否继续 skipUpgradeChecks: false # 从1.5开始，mon的数量必须是奇数 mon: count: 3 # 是否允许在单个节点上部署多个mon pod allowMultiplePerNode: false mgr: modules: - name: pg_autoscaler enabled: true # 开启dashboard，禁用ssl，指定端口是7000，你可以默认https配置。我是为了ingress配置省事。 dashboard: enabled: true port: 7000 ssl: false # 开启prometheusRule monitoring: enabled: true # 部署PrometheusRule的命名空间，默认此CR所在命名空间 rulesNamespace: rook-ceph # 开启网络为host模式，解决无法使用cephfs pvc的bug network: provider: host # 开启crash collector，每个运行了Ceph守护进程的节点上创建crash collector pod crashCollector: disable: false # 设置node亲缘性，指定节点安装对应组件 placement: mon: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: ceph-mon operator: In values: - enabled osd: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: ceph-osd operator: In values: - enabled mgr: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: ceph-mgr operator: In values: - enabled # 存储的设置，默认都是true，意思是会把集群所有node的设备清空初始化。 storage: # cluster level storage configuration and selection useAllNodes: false #关闭使用所有Node useAllDevices: false #关闭使用所有设备 nodes: - name: "192.168.1.162" #指定存储节点主机 devices: - name: "nvme0n1p1" #指定磁盘为nvme0n1p1 - name: "192.168.1.163" devices: - name: "nvme0n1p1" - name: "192.168.1.164" devices: - name: "nvme0n1p1" - name: "192.168.1.213" devices: - name: "nvme0n1p1"

更多 cluster 的 CRD 配置参考：

https://github.com/rook/rook/blob/master/Documentation/ceph-cluster-crd.md

执行安装

kubectl apply -f cluster.yaml # 需要等一段时间，所有pod都已正常启动 [foxchan@k8s-master ceph]$ kubectl get pods -n rook-ceph NAME READY STATUS RESTARTS AGE csi-cephfsplugin-b5tlr 3/3 Running 0 19h csi-cephfsplugin-mjssm 3/3 Running 0 19h csi-cephfsplugin-provisioner-5cf5ffdc76-mhdgz 6/6 Running 0 19h csi-cephfsplugin-provisioner-5cf5ffdc76-rpdl8 6/6 Running 0 19h csi-cephfsplugin-qmvkc 3/3 Running 0 19h csi-cephfsplugin-tntzd 3/3 Running 0 19h csi-rbdplugin-4p75p 3/3 Running 0 19h csi-rbdplugin-89mzz 3/3 Running 0 19h csi-rbdplugin-cjcwr 3/3 Running 0 19h csi-rbdplugin-ndjcj 3/3 Running 0 19h csi-rbdplugin-provisioner-658dd9fbc5-fwkmc 6/6 Running 0 19h csi-rbdplugin-provisioner-658dd9fbc5-tlxd8 6/6 Running 0 19h prometheus-rook-prometheus-0 2/2 Running 1 3d17h rook-ceph-mds-myfs-a-5cbcdc6f9c-7mdsv 1/1 Running 0 19h rook-ceph-mds-myfs-b-5f4cc54b87-m6m6f 1/1 Running 0 19h rook-ceph-mgr-a-f98d4455b-bwhw7 1/1 Running 0 20h rook-ceph-mon-a-5d445d4b8d-lmg67 1/1 Running 1 20h rook-ceph-mon-b-769c6fd76f-jrlc8 1/1 Running 0 20h rook-ceph-mon-c-6bfd8954f5-tbsnd 1/1 Running 0 20h rook-ceph-operator-7d8cc65dc-8wtl8 1/1 Running 0 20h rook-ceph-osd-0-c558ff759-bzbgw 1/1 Running 0 20h rook-ceph-osd-1-5c97d69d78-dkxbb 1/1 Running 0 20h rook-ceph-osd-2-7dddc7fd56-p58mw 1/1 Running 0 20h rook-ceph-osd-3-65ff985c7d-9gfgj 1/1 Running 0 20h rook-ceph-osd-prepare-192.168.1.213-pw5gr 0/1 Completed 0 19h rook-ceph-osd-prepare-192.168.1.162-wtkm8 0/1 Completed 0 19h rook-ceph-osd-prepare-192.168.1.163-b86r2 0/1 Completed 0 19h rook-ceph-osd-prepare-192.168.1.164-tj79t 0/1 Completed 0 19h rook-discover-89v49 1/1 Running 0 20h rook-discover-jdzhn 1/1 Running 0 20h rook-discover-sl9bv 1/1 Running 0 20h rook-discover-wg25w 1/1 Running 0 20h

2.4 增删osd

2.4.1 添加相关label

kubectl label nodes 192.168.1.165 app.rook=storage kubectl label nodes 192.168.1.165 ceph-osd=enabled

2.4.2 修改cluster.yaml

nodes: - name: "192.168.1.162" devices: - name: "nvme0n1p1" - name: "192.168.1.163" devices: - name: "nvme0n1p1" - name: "192.168.1.164" devices: - name: "nvme0n1p1" - name: "192.168.17.213" devices: - name: "nvme0n1p1" #添加165的磁盘信息 - name: "192.168.1.165" devices: - name: "nvme0n1p1"

2.4.3 apply cluster.yaml

kubectl apply -f cluster.yaml

2.4.4 删除osd

cluster.yaml去掉相关节点，再apply

2.5 安装dashboard

这是我自己的traefik ingress，yaml目录里有很多dashboard暴露方式，自行选择

dashboard已经在前述的步骤中包含了，这里只需要把dashboard service的服务暴露出来。有多种方法，我使用的是ingress的方式来暴露：

apiVersion: traefik.containo.us/v1alpha1 kind: Ingre***oute metadata: name: traefik-ceph-dashboard annotations: kubernetes.io/ingress.class: traefik-v2.3 spec: entryPoints: - web routes: - match: Host(`ceph.foxchan.com`) kind: Rule services: - name: rook-ceph-mgr-dashboard namespace: rook-ceph port: 7000 middlewares: - name: gs-ipwhitelist

登录 dashboard 需要安全访问。Rook 在运行 Rook Ceph 集群的名称空间中创建一个默认用户，admin 并生成一个称为的秘密rook-ceph-dashboard-admin-password

要检索生成的密码，可以运行以下命令：

kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo

2.6 安装toolbox

执行下面的命令：

kubectl apply -f toolbox.yaml

成功后，可以使用下面的命令来确定toolbox的pod已经启动成功：

kubectl -n rook-ceph get pod -l "app=rook-ceph-tools"

然后可以使用下面的命令登录该pod，执行各种ceph命令：

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash

比如：

ceph status
ceph osd status
ceph df
rados df

删除toolbox

kubectl -n rook-ceph delete deploy/rook-ceph-tools

2.7 prometheus监控

监控部署很简单，利用Prometheus Operator，独立部署一套prometheus

安装prometheus operator

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.40.0/bundle.yaml

安装prometheus

git clone --single-branch --branch v1.5.1 https://github.com/rook/rook.git cd rook/cluster/examples/kubernetes/ceph/monitoring kubectl create -f service-monitor.yaml kubectl create -f prometheus.yaml kubectl create -f prometheus-service.yaml

默认是nodeport方式暴露

echo "http://$(kubectl -n rook-ceph -o jsonpath={.status.hostIP} get pod prometheus-rook-prometheus-0):30900"

开启Prometheus Alerts

此操作必须在ceph集群安装之前

安装rbac

kubectl create -f cluster/examples/kubernetes/ceph/monitoring/rbac.yaml

确保cluster.yaml 开启

apiVersion: ceph.rook.io/v1 kind: CephCluster metadata: name: rook-ceph namespace: rook-ceph [...] spec: [...] monitoring: enabled: true rulesNamespace: "rook-ceph" [...]

Grafana Dashboards

Grafana 版本大于等于 7.2.0

推荐一下dashboard

Ceph - Cluster
Ceph - OSD (Single)
Ceph - Pools

2.8 删除ceph集群

删除ceph集群前，请先清理相关pod

删除块存储和文件存储

kubectl delete -n rook-ceph cephblockpool replicapool kubectl delete storageclass rook-ceph-block kubectl delete -f csi/cephfs/filesystem.yaml kubectl delete storageclass csi-cephfs rook-ceph-block

删除operator和相关crd

kubectl delete -f operator.yaml kubectl delete -f common.yaml kubectl delete -f crds.yaml

清除主机上的数据

删除Ceph集群后，在之前部署Ceph组件节点的/data/rook/目录，会遗留下Ceph集群的配置信息。

若之后再部署新的Ceph集群，先把之前Ceph集群的这些信息删除，不然启动monitor会失败；

# cat clean-rook-dir.sh hosts=( 192.168.1.213 192.168.1.162 192.168.1.163 192.168.1.164 ) for host in ${hosts[@]} ; do ssh $host "rm -rf /data/rook/*" done

清除device

#!/usr/bin/env bash DISK="/dev/nvme0n1p1" # Zap the disk to a fresh, usable state (zap-all is important, b/c MBR has to be clean) # You will have to run this step for all disks. sgdisk --zap-all $DISK # hdd 用以下命令 dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync # ssd 用以下命令 blkdiscard $DISK # These steps only have to be run once on each node # If rook sets up osds using ceph-volume, teardown leaves some devices mapped that lock the disks. ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove % # ceph-volume setup can leave ceph-<UUID> directories in /dev (unnecessary clutter) rm -rf /dev/ceph-*

如果因为某些原因导致删除ceph集群卡主，可以先执行以下命令，再删除ceph集群就不会卡主了

kubectl -n rook-ceph patch cephclusters.ceph.rook.io rook-ceph -p '{"metadata":{"finalizers": []}}' --type=merge

2.9 rook升级

2.9.1 小版本升级

Rook v1.5.0 to Rook v1.5.1

git clone --single-branch --branch v1.5.1 https://github.com/rook/rook.gits cd $YOUR_ROOK_REPO/cluster/examples/kubernetes/ceph/ kubectl apply -f common.yaml -f crds.yaml kubectl -n rook-ceph set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.5.1

2.9.2 跨版本升级

Rook v1.4.x to Rook v1.5.x.

准备

设置环境变量

# Parameterize the environment export ROOK_SYSTEM_NAMESPACE="rook-ceph" export ROOK_NAMESPACE="rook-ceph"

升级之前需要保证集群健康

所有pod 是running

kubectl -n $ROOK_NAMESPACE get pods

通过tool 查看ceph集群状态是否正常

TOOLS_POD=$(kubectl -n $ROOK_NAMESPACE get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') kubectl -n $ROOK_NAMESPACE exec -it $TOOLS_POD -- ceph status cluster: id: 194d139f-17e7-4e9c-889d-2426a844c91b health: HEALTH_OK services: mon: 3 daemons, quorum a,b,c (age 25h) mgr: a(active, since 5h) mds: myfs:1 {0=myfs-b=up:active} 1 up:standby-replay osd: 4 osds: 4 up (since 25h), 4 in (since 25h) task status: scrub status: mds.myfs-a: idle mds.myfs-b: idle data: pools: 4 pools, 97 pgs objects: 2.08k objects, 7.6 GiB usage: 26 GiB used, 3.3 TiB / 3.3 TiB avail pgs: 97 active+clean io: client: 1.2 KiB/s rd, 2 op/s rd, 0 op/s wr

升级operator

1、升级common和crd

git clone --single-branch --branch v1.5.1 https://github.com/rook/rook.gits cd rook/cluster/examples/kubernetes/ceph kubectl apply -f common.yaml -f crds.yaml

2、升级 Ceph CSI versions

可以修改cm来自己制定镜像版本，如果是默认的配置，无需修改

kubectl -n rook-ceph get configmap rook-ceph-operator-config ROOK_CSI_CEPH_IMAGE: "harbor.foxchan.com/google_containers/cephcsi/cephcsi:v3.1.1" ROOK_CSI_REGISTRAR_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-node-driver-registrar:v2.0.1" ROOK_CSI_PROVISIONER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-provisioner:v2.0.0" ROOK_CSI_SNAPSHOTTER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-snapshotter:v3.0.0" ROOK_CSI_ATTACHER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-attacher:v3.0.0" ROOK_CSI_RESIZER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-resizer:v1.0.0"

3、升级 Rook Operator

kubectl -n $ROOK_SYSTEM_NAMESPACE set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.5.1

4、等待集群升级完毕

watch --exec kubectl -n $ROOK_NAMESPACE get deployments -l rook_cluster=$ROOK_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{" \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'

5、验证集群升级完毕

kubectl -n $ROOK_NAMESPACE get deployment -l rook_cluster=$ROOK_NAMESPACE -o jsonpath='{range .items[*]}{"rook-version="}{.metadata.labels.rook-version}{"\n"}{end}' | sort | uniq

升级ceph 版本

如果集群状态不监控，operator会拒绝升级

1、升级ceph镜像

NEW_CEPH_IMAGE='ceph/ceph:v15.2.5' CLUSTER_NAME=rook-ceph kubectl -n rook-ceph patch CephCluster rook-ceph --type=merge -p "{\"spec\": {\"cephVersion\": {\"image\": \"$NEW_CEPH_IMAGE\"}}}"

2、观察pod 升级

watch --exec kubectl -n $ROOK_NAMESPACE get deployments -l rook_cluster=$ROOK_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{" \tceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}'

3、查看ceph集群是否正常

kubectl -n $ROOK_NAMESPACE get deployment -l rook_cluster=$ROOK_NAMESPACE -o jsonpath='{range .items[*]}{"ceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}' | sort | uniq

三、部署块存储

3.1 创建pool和StorageClass

# 定义一个块存储池 apiVersion: ceph.rook.io/v1 kind: CephBlockPool metadata: name: replicapool namespace: rook-ceph spec: # 每个数据副本必须跨越不同的故障域分布，如果设置为host，则保证每个副本在不同机器上 failureDomain: host # 副本数量 replicated: size: 3 # Disallow setting pool with replica 1, this could lead to data loss without recovery. # Make sure you're *ABSOLUTELY CERTAIN* that is what you want requireSafeReplicaSize: true # gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool # for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size #targetSizeRatio: .5 --- # 定义一个StorageClass apiVersion: storage.k8s.io/v1 kind: StorageClass metadata: name: rook-ceph-block # 该SC的Provisioner标识，rook-ceph前缀即当前命名空间 provisioner: rook-ceph.rbd.csi.ceph.com parameters: # clusterID 就是集群所在的命名空间名 # If you change this namespace, also change the namespace below where the secret namespaces are defined clusterID: rook-ceph # If you want to use erasure coded pool with RBD, you need to create # two pools. one erasure coded and one replicated. # You need to specify the replicated pool here in the `pool` parameter, it is # used for the metadata of the images. # The erasure coded pool must be set as the `dataPool` parameter below. #dataPool: ec-data-pool # RBD镜像在哪个池中创建 pool: replicapool # RBD image format. Defaults to "2". imageFormat: "2" # 指定image特性，CSI RBD目前仅仅支持layering imageFeatures: layering # Ceph admin 管理凭证配置,由operator 自动生成 # in the same namespace as the cluster. csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph # 卷的文件系统类型，默认ext4，不建议xfs，因为存在潜在的死锁问题（超融合设置下卷被挂载到相同节点作为OSD时） csi.storage.k8s.io/fstype: ext4 # uncomment the following to use rbd-nbd as mounter on supported nodes # **IMPORTANT**: If you are using rbd-nbd as the mounter, during upgrade you will be hit a ceph-csi # issue that causes the mount to be disconnected. You will need to follow special upgrade steps # to restart your application pods. Therefore, this option is not recommended. #mounter: rbd-nbd allowVolumeExpansion: true reclaimPolicy: Delete

3.2 demo示例

推荐pvc 和应用写到一个yaml里面

#创建pvc apiVersion: v1 kind: PersistentVolumeClaim metadata: name: rbd-demo-pvc spec: accessModes: - ReadWriteOnce resources: requests: storage: 1Gi storageClassName: rook-ceph-block --- apiVersion: apps/v1 kind: Deployment metadata: name: csirbd-demo-pod labels: test-cephrbd: "true" spec: replicas: 1 selector: matchLabels: test-cephrbd: "true" template: metadata: labels: test-cephrbd: "true" spec: containers: - name: web-server-rbd image: harbor.foxchan.com/sys/nginx:1.19.4-alpine volumeMounts: - name: mypvc mountPath: /usr/share/nginx/html volumes: - name: mypvc persistentVolumeClaim: claimName: rbd-demo-pvc readOnly: false

四、部署文件系统

4.1 创建CephFS

CephFS的CSI驱动使用Quotas来强制应用PVC声明的大小，仅仅4.17+内核才能支持CephFS quotas。

如果内核不支持，而且你需要配额管理，配置Operator环境变量 CSI_FORCE_CEPHFS_KERNEL_CLIENT: false来启用FUSE客户端。

使用FUSE客户端时，升级Ceph集群时应用Pod会断开mount，需要重启才能再次使用PV。

apiVersion: ceph.rook.io/v1 kind: CephFilesystem metadata: name: myfs namespace: rook-ceph spec: # The metadata pool spec. Must use replication. metadataPool: replicated: size: 3 requireSafeReplicaSize: true parameters: # Inline compression mode for the data pool # Further reference: https://docs.ceph.com/docs/nautilus/rados/configuration/bluestore-config-ref/#inline-compression compression_mode: none # gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool # for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size #target_size_ratio: ".5" # The list of data pool specs. Can use replication or erasure coding. dataPools: - failureDomain: host replicated: size: 3 # Disallow setting pool with replica 1, this could lead to data loss without recovery. # Make sure you're *ABSOLUTELY CERTAIN* that is what you want requireSafeReplicaSize: true parameters: # Inline compression mode for the data pool # Further reference: https://docs.ceph.com/docs/nautilus/rados/configuration/bluestore-config-ref/#inline-compression compression_mode: none # gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool # for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size #target_size_ratio: ".5" # Whether to preserve filesystem after CephFilesystem CRD deletion preserveFilesystemOnDelete: true # The metadata service (mds) configuration metadataServer: # The number of active MDS instances activeCount: 1 # Whether each active MDS instance will have an active standby with a warm metadata cache for faster failover. # If false, standbys will be available, but will not have a warm cache. activeStandby: true # The affinity rules to apply to the mds deployment placement: nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: app.storage operator: In values: - rook-ceph # topologySpreadConstraints: # tolerations: # - key: mds-node # operator: Exists # podAffinity: podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchExpressions: - key: ceph-mds operator: In values: - enabled # topologyKey: kubernetes.io/hostname will place MDS across different hosts topologyKey: kubernetes.io/hostname preferredDuringSchedulingIgnoredDuringExecution: - weight: 100 podAffinityTerm: labelSelector: matchExpressions: - key: ceph-mds operator: In values: - enabled # topologyKey: */zone can be used to spread MDS across different AZ # Use <topologyKey: failure-domain.beta.kubernetes.io/zone> in k8s cluster if your cluster is v1.16 or lower # Use <topologyKey: topology.kubernetes.io/zone> in k8s cluster is v1.17 or upper topologyKey: topology.kubernetes.io/zone # A key/value list of annotations annotations: # key: value # A key/value list of labels labels: # key: value resources: # The requests and limits set here, allow the filesystem MDS Pod(s) to use half of one CPU core and 1 gigabyte of memory # limits: # cpu: "500m" # memory: "1024Mi" # requests: # cpu: "500m" # memory: "1024Mi" # priorityClassName: my-priority-class

4.2 创建StorageClass

apiVersion: storage.k8s.io/v1 kind: StorageClass metadata: name: rook-cephfs provisioner: rook-ceph.cephfs.csi.ceph.com parameters: # clusterID is the namespace where operator is deployed. clusterID: rook-ceph # CephFS filesystem name into which the volume shall be created fsName: myfs # Ceph pool into which the volume shall be created # Required for provisionVolume: "true" pool: myfs-data0 # Root path of an existing CephFS volume # Required for provisionVolume: "false" # rootPath: /absolute/path # The secrets contain Ceph admin credentials. These are generated automatically by the operator # in the same namespace as the cluster. csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph # (optional) The driver can use either ceph-fuse (fuse) or ceph kernel client (kernel) # If omitted, default volume mounter will be used - this is determined by probing for ceph-fuse # or by setting the default mounter explicitly via --volumemounter command-line argument. #使用kernel client mounter: kernel reclaimPolicy: Delete allowVolumeExpansion: true mountOptions: # uncomment the following line for debugging #- debug

4.3 创建pvc

在创建cephfs 的pvc 发现一直处于pending状态，社区有人认为是网络组件的差异，目前我的calico无法成功，只能改为host模式，flannel可以。

apiVersion: v1 kind: PersistentVolumeClaim metadata: name: cephfs-pvc spec: accessModes: - ReadWriteMany resources: requests: storage: 1Gi storageClassName: rook-cephfs

4.4 demo示例

apiVersion: v1 kind: PersistentVolumeClaim metadata: name: cephfs-demo-pvc spec: accessModes: - ReadWriteMany resources: requests: storage: 1Gi storageClassName: rook-cephfs --- apiVersion: apps/v1 kind: Deployment metadata: name: csicephfs-demo-pod labels: test-cephfs: "true" spec: replicas: 2 selector: matchLabels: test-cephfs: "true" template: metadata: labels: test-cephfs: "true" spec: containers: - name: web-server image: harbor.foxchan.com/sys/nginx:1.19.4-alpine imagePullPolicy: Always volumeMounts: - name: mypvc mountPath: /usr/share/nginx/html volumes: - name: mypvc persistentVolumeClaim: claimName: cephfs-demo-pvc readOnly: false

五、遇到问题

5.1 lvm direct 不能直接做osd 存储

官方issue

https://github.com/rook/rook/issues/5751

https://github.com/rook/rook/issues/2047

解决方式：

可以手动创建本地pvc，把lvm挂载上在做osd设备。如果手动闲麻烦，可以使用local-path-provisioner

5.2 Cephfs pvc pending