GlusterFS HowTo on CentOS 6.x


In this howto we will describe in detail how to install and configure GlusterFS 3.3.1 (the latest stable release) on CentOS 6.3.

GlusterFS is an open source, powerful clustered file system capable of scaling to several petabytes of storage, all available to users under a single mount point. It uses existing disk filesystems such as ext3, ext4 and xfs to store data, and clients can access the storage as if it were a local filesystem. A GlusterFS cluster aggregates storage bricks over InfiniBand RDMA and/or TCP/IP interconnects into a single global namespace.

The following terms are used throughout this howto, so make sure you understand them before proceeding.

brick
A brick is the storage filesystem (an export directory) that has been assigned to a volume, e.g. /data on a server.
client
The machine which mounts the volume (this may also be a server).
server
The machine (physical or virtual) which hosts the actual filesystem in which data will be stored.
volume
A volume is a logical collection of bricks, where each brick is an export directory on a server. A volume can be of several types, and you can create any of them from the bricks in the storage pool.
Distributed – Distributed volumes distribute files across the bricks in the volume. Use distributed volumes where the requirement is to scale storage and redundancy is either not important or is provided by other hardware/software layers.
Replicated – Replicated volumes replicate files across bricks in the volume. Use replicated volumes in environments where high availability and high reliability are critical.
Striped – Striped volumes stripe data across bricks in the volume. For best results, use striped volumes only in high-concurrency environments accessing very large files.

1. Setup

Hardware

I will use three servers and one client for my GlusterFS installation/configuration. These can be physical or virtual machines. I will be using my virtual environment for this, with the IPs/hostnames as follows.
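For example, a layout along these lines (the hostnames are the ones used in the rest of this howto; the IP addresses are only placeholders for your own environment):

    host1                  192.168.1.11    GlusterFS server
    host2                  192.168.1.12    GlusterFS server
    host3                  192.168.1.13    GlusterFS server
    client1.example.com    192.168.1.20    GlusterFS client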

Two partitions are required on each server: the first is used for the OS install and the second for our storage.

Software

For the OS I will be using CentOS 6.3, with GlusterFS 3.3.1. The EPEL repository only provides 3.2.7, so we will go with the latest version, i.e. 3.3.1, which is available through GlusterFS's own repository.

2. Installation

First we will add the GlusterFS repo to our yum repositories. To do this, execute the following command.
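For example, the repo file can be downloaded straight into /etc/yum.repos.d/ (this URL points at the latest release on download.gluster.org; adjust the path if you want to pin the 3.3 series):

    # wget -P /etc/yum.repos.d/ http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/glusterfs-epel.repo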

 2.1 Installation on Servers:

On the servers (host1, host2, host3), execute the following command to install the GlusterFS server-side packages.
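A typical invocation looks like this (package names as shipped in the GlusterFS repo):

    # yum install -y glusterfs glusterfs-server glusterfs-fuse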

Start the GlusterFS service on all servers and enable it to start automatically on boot.
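On CentOS 6 this is done with the usual service/chkconfig pair:

    # service glusterd start
    # chkconfig glusterd on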

 2.2 Installation on Client:

On the client, execute the following command to install the GlusterFS client-side packages.
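For example:

    # yum install -y glusterfs glusterfs-fuse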

We will use these later to mount GlusterFS volumes on the client.

 3. Creating Trusted Storage Pool.

The trusted storage pool is the set of servers that run as gluster servers and will provide bricks for volumes. You will need to probe all the other servers from host1 (do not probe host1 or localhost itself).

Note: flush your firewall rules using the iptables -F command so the gluster ports are reachable.

We will now join all three servers into a trusted storage pool; the probing will be done from host1.
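From host1, probe the other two servers (use whatever hostnames or IPs resolve in your environment):

    # gluster peer probe host2
    # gluster peer probe host3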

Confirm your server status.
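For example:

    # gluster peer status

host2 and host3 should both show up as connected peers.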

4. Creating GlusterFS Server Volumes

Now it's time to create a GlusterFS server volume. A volume is a logical collection of bricks, where each brick is an export directory on a server in the trusted storage pool.

GlusterFS offers several volume types. I will demonstrate the three defined above, which will give you enough knowledge to create the remaining types by yourself.

4.1 Distributed

Use distributed volumes where you need to scale storage, because in a distributed volume files are spread across the bricks in the volume.
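A distributed volume across our three servers can be created along these lines, with /data as the brick directory on each server (adjust the paths to your own bricks):

    # gluster volume create dist-volume host1:/data host2:/data host3:/data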

Start the dist-volume
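For example:

    # gluster volume start dist-volume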

Check status of volume
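For example:

    # gluster volume info dist-volume

The output should show Type: Distribute, Status: Started and the three bricks.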

 4.1.1 Accessing Distributed volume and testing.

Now on client1.example.com we will access and test the distributed volume functionality. To mount a gluster volume and access its data, we will first mount it manually and then add it to /etc/fstab so that it is mounted automatically whenever the server restarts.

Use the mount command to access the gluster volume.
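A minimal sketch, assuming /mnt/distributed as the mount point (any empty directory will do):

    # mkdir -p /mnt/distributed
    # mount.glusterfs host1:/dist-volume /mnt/distributed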

Check it using the mount command.

Now add the following line at the end of the /etc/fstab file to make it available on every reboot.
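Something along these lines, matching the mount point used above:

    host1:/dist-volume  /mnt/distributed  glusterfs  defaults,_netdev  0 0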

Save the file.

Now, to test, create the following files in the mounted directory.
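For example, a handful of empty test files:

    # touch /mnt/distributed/file{1..10}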

Check the servers for the distributed functionality.
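For example, list the brick directory on each server:

    # ls -l /data/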

All of the files created in the mounted volume have been distributed among the servers, each file stored on a single brick.

4.2 Replicated

Use replicated volumes where high availability and high reliability are critical, because replicated volumes keep identical copies of files across multiple bricks in the volume.
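A replicated volume with one copy per server can be created along these lines (the brick directory /rep-data is a placeholder; use a dedicated directory per volume):

    # gluster volume create rep-volume replica 3 host1:/rep-data host2:/rep-data host3:/rep-data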

Here, replica 3 sets the number of copies to keep across the servers; in this case we want the same copy on all three servers.

Start the rep-volume
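For example:

    # gluster volume start rep-volume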

Check status of volume
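For example:

    # gluster volume info rep-volume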

4.2.1 Accessing Replicated Volume and testing.

Now we access it the same way as the distributed volume, using the mount command. To mount the gluster replicated volume and access its data, we will first mount it manually and then add it to /etc/fstab so that it is mounted automatically whenever the server restarts.

Use the mount command to access the gluster volume.
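For example, using /mnt/replicated as the mount point:

    # mkdir -p /mnt/replicated
    # mount.glusterfs host1:/rep-volume /mnt/replicated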

Check it using the mount command.

Now add the following line at the end of the /etc/fstab file to make it available on every reboot.
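For example:

    host1:/rep-volume  /mnt/replicated  glusterfs  defaults,_netdev  0 0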

Now, to test, create the following files in the mounted directory.
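For example:

    # touch /mnt/replicated/rep-file{1..5}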

Check the servers for the replicated functionality.
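For example, list the brick directory (here /rep-data, as used when the volume was created) on each server:

    # ls -l /rep-data/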

All of the files created in the mounted volume are replicated to all the servers.

4.3 Striped

Use striped volumes only in high-concurrency environments accessing very large files, because striped volumes stripe data across the bricks in the volume.
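A striped volume across the three servers can be created along these lines (stripe 3 sets the stripe count; the volume name stripe-volume and the brick directory /stripe-data are placeholders):

    # gluster volume create stripe-volume stripe 3 host1:/stripe-data host2:/stripe-data host3:/stripe-data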

Start the stripe-volume
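For example:

    # gluster volume start stripe-volume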

Check status of volume
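For example:

    # gluster volume info stripe-volume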

4.3.1 Accessing Striped Volume and testing.

Now, the same as with the distributed and replicated volumes, access the striped volume using the mount command. To mount the gluster striped volume and access its data, we will first mount it manually and then add it to /etc/fstab so that it is mounted automatically whenever the server restarts.

Use the mount command to access the gluster volume.
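For example, using /mnt/striped as the mount point:

    # mkdir -p /mnt/striped
    # mount.glusterfs host1:/stripe-volume /mnt/striped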

Check it using the mount command.

Now add the following line at the end of the /etc/fstab file to make it available on every reboot.
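For example:

    host1:/stripe-volume  /mnt/striped  glusterfs  defaults,_netdev  0 0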

Save the file.

Now, to test, create the following large file in the mounted directory on client1.
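For example, a 1 GB file written with dd (the file name and size are arbitrary):

    # dd if=/dev/zero of=/mnt/striped/large.file bs=1M count=1000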

Check the servers for the striped functionality.
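For example, check how much data each brick actually holds:

    # du -sh /stripe-data/

Each brick should account for roughly one third of the file's data.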

The large file has been striped across the volume successfully.

5. Managing Gluster Volumes.

Now we will look at some of the common operations and maintenance tasks you might perform on gluster volumes.

5.1 Expanding Volumes.

When needed, we can add bricks to volumes that are already online. As an example, we will add a new brick to our distributed volume. To do this we need the following steps:

First probe the new server that will offer the new brick to our volume. This has to be done on host1.
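For example:

    # gluster peer probe host4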

Now add the new brick from the newly probed host4.
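Assuming the new brick lives at /data on host4, the same path as on the other servers:

    # gluster volume add-brick dist-volume host4:/data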

Check the volume information using the following command.
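For example:

    # gluster volume info dist-volume

The brick list should now include host4.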

5.2 Shrinking Volume

You can also shrink volumes while GlusterFS is online and available. If, due to a hardware failure or an unreachable network, one of your bricks is unavailable in the volume and you need to remove it, first start the removal process:
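For example, to begin removing the brick added above:

    # gluster volume remove-brick dist-volume host4:/data start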

Check the status; it should show completed.
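For example:

    # gluster volume remove-brick dist-volume host4:/data status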

Commit the remove-brick operation.
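For example:

    # gluster volume remove-brick dist-volume host4:/data commit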

Check the volume information for confirmation.

5.3 Rebalancing Volume

Rebalancing needs to be done after expanding or shrinking a volume; it redistributes the data among the servers. To do this, issue the following command:
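For example, start a rebalance of the distributed volume and then watch its progress:

    # gluster volume rebalance dist-volume start
    # gluster volume rebalance dist-volume status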

5.4 Stopping and Deleting Volumes.

To stop a volume
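For example:

    # gluster volume stop dist-volume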

To delete a volume
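For example:

    # gluster volume delete dist-volume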

Remember to unmount the mounted directory on your clients first.
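For example, on client1 (using the distributed volume's mount point from above):

    # umount /mnt/distributed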



14 Responses to GlusterFS HowTo on CentOS 6.x

  1. lou says:

    Nice article. I am testing GlusterFS for possible use in my datacenter as I need to implement shared storage for virtualized servers. My 2 options are NFS with GlusterFS or an iSCSI SAN to support my VMWare servers that currently use local storage.

    Cheers!

  2. Anton Zavrin says:

    Great Article – Thank you for taking time to post this!

  3. Adeel Ahmad says:

    Very nice tutorial Sohail Riaz, appreciate that. There is one basic question I want to ask: in replication, you mount the volume on the client as “mount.glusterfs host1.sohailriaz.com:/rep-volume /mnt/replicated/” and also create the volume on host1. What would happen if host1 fails physically? Does it mean your copy of the data will remain on host2, but the client gets disconnected and you have to mount host2 on the client again to access the copy of the data? That would mean downtime, right, but minor?

    prompt reply will be highly appreciated. Thanks

    Regards,
    Adeel Ahmad

  4. Le says:

    Nice, helpful

  5. Manonit Kumar says:

    I got a few bare metal servers for my experiment and I was trying to install the glusterfs packages on them. But while installing I ran into a dependency issue:

        Error: Package: glusterfs-3.5git-1.el6.x86_64 (exogeni)
               Requires: libcrypto.so.10(libcrypto.so.10)(64bit)
        Error: Package: glusterfs-3.5git-1.el6.x86_64 (exogeni)
               Requires: libssl.so.10(libssl.so.10)(64bit)
        Error: Package: glusterfs-libs-3.5git-1.el6.x86_64 (exogeni)
               Requires: libcrypto.so.10(libcrypto.so.10)(64bit)

    I tried finding the latest libcrypto libs but there is no libcrypto.so.10 version. Can someone help me?

  6. Alex says:

    Thanks a lot.

  7. vaibhav kanchan says:

    Hi Sohail,

    Do we need a shared disk for setting up glusterfs between 2 nodes or need to provide separate disk to each node and proceed with the above setup?

  8. Kapish Gupta says:

    Hi,
    I was trying to install glusterfs in centos minimal version on a virtual machine. After running the command:
    yum install glusterfs-server
    there is an error shown saying that “no package glusterfs-server is available”.
    What should i do now?
    thanks

  9. @Kapish: GlusterFS updated their repo, and the new location of the repo file is

    http://download.gluster.org/pub/gluster/glusterfs/LATEST/EPEL.repo/glusterfs-epel.repo

    You need to download it and place it under the /etc/yum.repos.d/ directory.

  10. Alden says:

    Hi Sohail,

    I have existing 2 server and 1 client. what I’m planning to make is a fail-over/load balancing in client and I need to provide 1 more client server to make this happen. now my question is this recommended or a best practice, is there any problem with this setup especially when 2 users with the same account do read/write ?

    Thanks

  11. @ALDEN: For failover with GlusterFS you can use replicated volumes, as they contain the same information across all bricks.
    For load balancing you can create different volumes using different bricks and read/write the same data through the different volumes. As a volume is presented to the client as a shared filesystem, you can instruct your application to load balance across the different mounted volumes.

  12. ALDEN says:

    Hello Sohail,

    In my client server I have installed vsftpd (FTP) so some users can easily upload data to my server’s rep-volume using FileZilla. But whenever I upload any file it always responds with an error (553 Could not create file.), though I can download the files. What do you think is the problem? Would it be better if I set up my server1 & 2 as server+client, with no more dedicated server for the client?

    Thanks.

  13. Hi Alden,

    “Could not create file” usually points to a permission issue. You need to check permissions from the top level down to your brick. You might also check the ACLs on it too.
    Regarding your setup, it depends on your clients’ needs. You need to serve them well, so if a combined server+client setup serves you better, I am with you.

    Regards,

    Sohail Riaz

  14. Aman Ullah says:

    Hi Sohail,
    Please add one more thing to this tutorial: how to migrate a volume and its data onto a new hard drive.
