In spite of the trend towards highly scalable object storage in the cloud computing world, filesystem-based storage is still irreplaceable in many respects. Its hierarchical structure, human-friendly file names and, above all, mature applications built on prevalent protocols such as NFS and Samba/CIFS are firmly rooted in practical IT building blocks and daily usage. Ceph comes with a built-in POSIX filesystem, CephFS, that can be exported over the network and mounted as remote filesystem storage, functioning much like NFS. As built-in as it sounds, the underlying protocol is “ceph”.
sh-4.4# mount -t ceph 10.1.0.1:/ /mnt/ceph
sh-4.4# mount | grep ceph
10.1.0.1:/ on /mnt/ceph type ceph (rw,relatime,seclabel,acl)
sh-4.4# df -h /mnt/ceph
Filesystem Size Used Avail Use% Mounted on
10.1.0.1:/ 140G 4.7G 135G 4% /mnt/ceph
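One caveat on the native mount: if mount.ceph cannot find a keyring on its own (e.g. no /etc/ceph keyring on the client and cephx enabled), the credentials usually have to be passed explicitly. A possible variant, where the secret file path is just an example:
sh-4.4# mount -t ceph 10.1.0.1:/ /mnt/ceph -o name=admin,secretfile=/etc/ceph/admin.secret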
That’s no sweat, and many people are happy with this. Nonetheless, here we are to answer those who have to run NFS on top of Ceph for some reason, be it a legacy setup, a current IT restriction, old client software, or something more complex, as in our case, where cephfs is bound to a dedicated storage network that is not directly reachable by the VMs spawned on compute nodes in a different network. Whatever the reason, nfs-ganesha is here for anyone who needs their Ceph to work with NFS.
Because setting up a Ceph storage cluster is complex, most documents found online propose Kubernetes or Docker to ease the deployment effort; as a result, nfs-ganesha is usually recommended to be installed in containers, so that cluster HA can be handled by Kubernetes.
We do it differently here. Let’s run Ganesha as a native systemd service and manage HA with haproxy in Active/Active mode. First things first, install the needed yum repository:
sh-4.4# dnf -y install centos-release-nfs-ganesha4
Then come the two major packages that give us the daemon, the rados_ng/rados_kv recovery support and the config files:
sh-4.4# dnf -y install nfs-ganesha-ceph.x86_64 nfs-ganesha-rados-grace
The package manager resolves the dependencies, and we end up with the following installed. Samba libraries show up as well.
(1/16): nfs-ganesha-selinux-4.0-1.el8s.noarch.rpm 299 kB/s | 38 kB 00:00
(2/16): libtalloc-2.3.3-1.el8.x86_64.rpm 343 kB/s | 49 kB 00:00
(3/16): nfs-ganesha-rados-grace-4.0-1.el8s.x86_64.rpm 368 kB/s | 58 kB 00:00
(4/16): lmdb-libs-0.9.24-1.el8.x86_64.rpm 352 kB/s | 58 kB 00:00
(5/16): libtevent-0.11.0-0.el8.x86_64.rpm 1.3 MB/s | 50 kB 00:00
(6/16): libtdb-1.4.4-1.el8.x86_64.rpm 238 kB/s | 59 kB 00:00
(7/16): logrotate-3.14.0-4.el8.x86_64.rpm 340 kB/s | 86 kB 00:00
(8/16): libldb-2.4.1-1.el8.x86_64.rpm 563 kB/s | 188 kB 00:00
(9/16): samba-common-4.15.5-5.el8.noarch.rpm 658 kB/s | 224 kB 00:00
(10/16): samba-common-libs-4.15.5-5.el8.x86_64.rpm 624 kB/s | 177 kB 00:00
(11/16): libntirpc-4.0-1.el8s.x86_64.rpm 420 kB/s | 137 kB 00:00
(12/16): nfs-ganesha-4.0-1.el8s.x86_64.rpm 1.1 MB/s | 734 kB 00:00
(13/16): libcephfs2-12.2.7-9.el8.x86_64.rpm 733 kB/s | 486 kB 00:00
(14/16): nfs-ganesha-ceph-4.0-1.el8s.x86_64.rpm 90 kB/s | 59 kB 00:00
(15/16): libwbclient-4.15.5-5.el8.x86_64.rpm 137 kB/s | 124 kB 00:00
(16/16): samba-client-libs-4.15.5-5.el8.x86_64.rpm 3.8 MB/s | 5.5 MB 00:01
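To double-check what actually landed (versions will vary with the repo snapshot), a quick query:
sh-4.4# rpm -q nfs-ganesha nfs-ganesha-ceph nfs-ganesha-rados-grace libcephfs2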
In /etc/ganesha/ceph.conf we modify a couple of values:
NFS_CORE_PARAM
{
    Enable_NLM = false;
    Enable_RQUOTA = false;
    Protocols = 4;
    Bind_addr = 10.1.0.1;
}
NFSv4
{
    RecoveryBackend = rados_ng;
    Minor_Versions = 1,2;
    Lease_Lifetime = 10;
    Grace_Period = 20;
}
MDCACHE
{
    Dir_Chunk = 0;
}
EXPORT
{
    Export_ID = 100;
    Protocols = 4;
    Transports = TCP;
    Path = /;
    Pseudo = /;
    Access_Type = RW;
    Attr_Expiration_Time = 0;
    Squash = root;
    FSAL {
        Name = CEPH;
    }
}
CEPH
{
    Ceph_Conf = /etc/ceph/ceph.conf;
}
RADOS_KV
{
    pool = "cephfs_data";
}
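One thing to keep in mind: the systemd unit starts the daemon with -f /etc/ganesha/ganesha.conf (visible in the status output further down), so either put the blocks above directly into that file, or pull them in from there with ganesha's include directive, roughly:
%include "/etc/ganesha/ceph.conf"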
Bind_addr is pinned to 10.1.0.1 (on c2 and c3, use their own addresses accordingly). If it is not explicitly set, the default is 0.0.0.0, i.e. all network interfaces, and Ganesha would then clash with haproxy, which binds port 2049 on the VIP. The example here assumes the following cluster layout:
- VIP: 10.1.0.100
- Control node 1 (c1): 10.1.0.1
- Control node 2 (c2): 10.1.0.2
- Control node 3 (c3): 10.1.0.3
- Compute node x 3 (not relevant here)
- Storage node x 5 (not relevant here)
Start nfs-ganesha on the control nodes.
# systemctl start nfs-ganesha
# systemctl status nfs-ganesha
● nfs-ganesha.service - NFS-Ganesha file server
Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; disabled; vendor preset: disabled)
Active: active (running) since Fri 2022-04-15 18:40:08 CST; 7h ago
Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
Main PID: 16201 (ganesha.nfsd)
Tasks: 45 (limit: 179687)
Memory: 54.5M
CGroup: /system.slice/nfs-ganesha.service
└─16201 /usr/bin/ganesha.nfsd -L /var/log/ganesha/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
Apr 15 18:40:08 c1 systemd[1]: Starting NFS-Ganesha file server...
Apr 15 18:40:08 c1 systemd[1]: Started NFS-Ganesha file server.
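The unit is disabled by vendor preset (see the Loaded: line above), so enable it to survive reboots; and once haproxy is up as well, a quick way to confirm Ganesha is bound only to the node IP while the VIP owns 2049 is to look at the listening sockets:
# systemctl enable nfs-ganesha
# ss -tlnp | grep -E 'ganesha|haproxy'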
Now we have three A/A nfs-ganesha services, one on each control node. Next comes the HA configuration with haproxy. In /etc/haproxy/haproxy.cfg, add the section below.
listen ceph_nfs_ganesha
    bind 10.1.0.100:2049
    balance source
    option tcpka
    option tcplog
    server c1 10.1.0.1:2048 check inter 2000 rise 2 fall 5
    server c2 10.1.0.2:2048 check inter 2000 rise 2 fall 5
    server c3 10.1.0.3:2048 check inter 2000 rise 2 fall 5
Here we tell haproxy to forward incoming connections on our VIP 10.1.0.100 and port 2049 (the standard NFS port) to whichever of the NFS servers running on c1, c2 and c3 is healthy, on port 2048. The backend port 2048 assumes Ganesha itself listens there (e.g. via NFS_Port in NFS_CORE_PARAM); if you keep Ganesha on its default 2049, point the server lines at 2049 instead.
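Before reloading haproxy it is worth validating the file; note that in this cluster haproxy itself is a pacemaker-managed systemd resource (haproxy-ha in the status output below), so restart it through whatever manages it on your setup:
# haproxy -c -f /etc/haproxy/haproxy.cfg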
The last question is who is in charge of deciding which of the three control nodes the VIP (Virtual IP) runs on. The answer is pacemaker/corosync. Below is a snippet that creates the VIP resource with pacemaker; a constraint sketch follows the status output.
# pcs resource create vip ocf:heartbeat:IPaddr2 ip="10.1.0.100" op monitor interval="30s"
# pcs status
Cluster name: cube-8kWwZRbkBPcR6xk3
Cluster Summary:
* Stack: corosync
* Current DC: c1 (version 2.1.2-4.el8-ada5c3b36e2) - partition with quorum
* Last updated: Sat Apr 16 02:36:41 2022
* Last change: Fri Apr 15 20:08:14 2022 by root via crm_resource on c3
* 6 nodes configured
* 9 resource instances configured
Node List:
* Online: [ c1 c2 c3 ]
* RemoteOnline: [ p1 p2 p3 ]
Full List of Resources:
* vip (ocf::heartbeat:IPaddr2): Started c1
* haproxy (systemd:haproxy-ha): Started c1
* cinder-volume (systemd:openstack-cinder-volume): Started c2
* Clone Set: ovndb_servers-clone [ovndb_servers] (promotable):
* Masters: [ c1 ]
* Slaves: [ c2 c3 ]
* p1 (ocf::pacemaker:remote): Started c1
* p2 (ocf::pacemaker:remote): Started c1
* p3 (ocf::pacemaker:remote): Started c1
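Since haproxy is a pacemaker resource here as well, it makes sense to tie it to the VIP so they always land on, and fail over to, the same node. If such constraints are not already present, a possible sketch using the resource names from the status output:
# pcs constraint colocation add haproxy with vip INFINITY
# pcs constraint order vip then haproxy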
To verify, we can mount the export with the standard command, and you can test HA by stopping a couple of the nfs-ganesha services on the control nodes; a quick test sketch follows at the end.
sh-4.4# mount -t nfs 10.1.0.100:/ /mnt/nfs
sh-4.4# mount | grep nfs
10.1.0.100:/ on /mnt/nfs type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.1.0.254,local_lock=none,addr=10.1.0.100)
sh-4.4# df -h /mnt/nfs
Filesystem Size Used Avail Use% Mounted on
10.1.0.100:/ 139G 4.7G 134G 4% /mnt/nfs
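A simple failover test, assuming the client keeps the mount from above: stop ganesha on one control node and keep writing from the client. After haproxy's health check marks that backend down (fall 5 x inter 2000 ms) plus the short grace period configured earlier, I/O should resume against a surviving node.
On a control node, e.g. c1:
# systemctl stop nfs-ganesha
On the client:
sh-4.4# dd if=/dev/zero of=/mnt/nfs/failover-test bs=1M count=100 oflag=direct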