GFS over DRBD on FC9x64

From Snix.hk

Fedora 9 x64

Install Packages

DRBD

# DRBD doesn't exist in the Fedora 9 repository
# Download and install the CentOS 5 (Enterprise Linux 5) package instead
wget http://mirror.centos.org/centos/5/extras/x86_64/RPMS/drbd-8.0.13-1.el5.centos.x86_64.rpm
rpm -iv drbd-8.0.13-1.el5.centos.x86_64.rpm

Download and build the necessary packages on FC9 x64.

GFS2

yum install cman gfs2-utils stonith

Linux HA

yum install heartbeat

Required Extra Packages

wget http://people.redhat.com/lhh/obliterate
mv obliterate /sbin
chmod +x /sbin/obliterate

Service Configurations after install

chkconfig iptables off
chkconfig iscsi off
chkconfig iscsid off
chkconfig qemu off
chkconfig avahi-daemon off
chkconfig libvirtd off
chkconfig gpm off

DRBD Startup Script Modification

# cman - Cluster Manager init script
#
# chkconfig: - 21 79

Edit the /etc/init.d/drbd script so that DRBD starts between cman (start priority 21) and gfs2 (start priority 26).
Change the start and stop priorities to 22 and 75:
# chkconfig: 345 22 75
# description: Loads and unloads the drbd module
(NOTE: Delete the "BEGIN INIT INFO" block)

# gfs2 mount/unmount helper
#
# chkconfig: - 26 74

Then reload the configuration with:

chkconfig drbd resetpriorities
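
The ordering described above can be checked with a small script. This is only a sketch: the function names start_priority and check_start_order are made up for illustration, and it assumes each init script carries a "# chkconfig: <levels> <start> <stop>" header in exactly the form shown above.

```shell
#!/bin/sh
# Print the start priority from the "# chkconfig: <levels> <start> <stop>"
# header of an init script ($1: path to the script).
start_priority() {
    awk '/^# chkconfig:/ { print $4; exit }' "$1"
}

# Succeed only if the three scripts start in the order first < middle < last.
# Returns 2 if any script has no chkconfig header.
check_start_order() {
    first=$(start_priority "$1")
    middle=$(start_priority "$2")
    last=$(start_priority "$3")
    [ -n "$first" ] && [ -n "$middle" ] && [ -n "$last" ] || return 2
    [ "$first" -lt "$middle" ] && [ "$middle" -lt "$last" ]
}

# Example, using the script paths from this article:
# check_start_order /etc/init.d/cman /etc/init.d/drbd /etc/init.d/gfs2 && echo "start order OK"
```

With the values shown above (21, 22, 26) the check succeeds. The stop priorities (79, 75, 74) mirror this, so gfs2 stops first and cman last.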

Configuration Files

DRBD

global {
    # minor-count 64;
    # dialog-refresh 5; # 5 seconds
    # disable-ip-verification;
    usage-count no;
}

common {
  syncer { rate 10M; }
}

resource gfs1 {
  protocol C;
  startup {
    wfc-timeout 20;
    degr-wfc-timeout 10;
    become-primary-on both;
  }
  disk {
    on-io-error detach;
    fencing resource-and-stonith;
  }
  net {
    # timeout           60;
    # connect-int       10;
    # ping-int          10;
    # max-buffers     2048;
    # max-epoch-size  2048;
    max-buffers 2048;
    ko-count 4;
    allow-two-primaries;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }

  on node1.mydomain.com {
    device      /dev/drbd1;
    disk        /dev/mapper/VolGroup00-DataVol00;
    address     10.2.42.31:7789;
    meta-disk   internal;
  }

  on node2.mydomain.com {
    device     /dev/drbd1;
    disk       /dev/mapper/VolGroup00-DataVol00;
    address    10.2.42.32:7789;
    meta-disk  internal;
  }
  handlers {
    outdate-peer "/sbin/obliterate";
  }
}
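
DRBD selects the "on <hostname>" section whose name matches the node's own host name as reported by uname -n, so each node's name has to appear there exactly. The sketch below checks this. The helper names drbd_hosts and host_in_drbd_conf are hypothetical, and /etc/drbd.conf is assumed as the location of the file, which this article does not state.

```shell
#!/bin/sh
# List the host names of all "on <host> {" sections in a DRBD configuration
# file ($1: path to the file). Options such as "on-io-error" are not matched.
drbd_hosts() {
    sed -n 's/^[[:space:]]*on[[:space:]]\{1,\}\([^[:space:]{]\{1,\}\)[[:space:]]*{.*$/\1/p' "$1"
}

# Succeed if the given host name (default: output of uname -n) has an
# "on" section in the configuration file.
host_in_drbd_conf() {
    conf=$1
    host=${2:-$(uname -n)}
    drbd_hosts "$conf" | grep -Fqx -- "$host"
}

# Example (the path is an assumption):
# host_in_drbd_conf /etc/drbd.conf || echo "no 'on' section for $(uname -n)"
```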

cluster.conf

<?xml version="1.0"?>
<cluster alias="mycluster" config_version="1" name="mycluster">
        <cman expected_votes="1" two_node="1"/>
        <clusternodes>
                <clusternode name="node1.mydomain.com" nodeid="1" votes="1">
                        <fence>
                                <method name="human">
                                        <device name="last_resort" ipaddr="node1"/>
                                </method>
                        </fence>
                </clusternode>
                <clusternode name="node2.mydomain.com" nodeid="2" votes="1">
                        <fence>
                                <method name="human">
                                        <device name="last_resort" ipaddr="node2"/>
                                </method>
                        </fence>
                </clusternode>
        </clusternodes>
        <fencedevices>
                <fencedevice agent="fence_manual" name="last_resort"/>
        </fencedevices>
        <rm/>
</cluster>

/etc/sysconfig/cman

CCSD_OPTS=-4

Initialization

DRBD

Initialize the DRBD disk

# Zero the target disk if necessary
dd if=/dev/zero of=/dev/mapper/VolGroup00-DataVol00 bs=512 count=1024
# Initialize the drbd "disk" on both nodes
drbdadm create-md gfs1
# Mark as consistent
drbdadm -- 6::::1 set-gi gfs1

Start DRBD

service drbd start
drbdadm state all
drbdadm primary all

Cluster

No cluster-specific configuration is required.

GFS

Make the GFS volume

mkdir /data
mkfs.gfs2 -p lock_dlm -t mycluster:gfsvol /dev/drbd1 -j 2
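
The -t argument has the form clustername:fsname, and the cluster part has to match the name attribute of the <cluster> element in cluster.conf, otherwise the mount is refused. A rough way to compare the two before running mkfs.gfs2 is sketched below. The function names are hypothetical, and /etc/cluster/cluster.conf is assumed as the file location.

```shell
#!/bin/sh
# Print the value of the name attribute on the <cluster ...> element
# ($1: path to cluster.conf). <clusternode> and <clusternodes> are not matched.
cluster_name() {
    sed -n 's/^[[:space:]]*<cluster[[:space:]]\(.*[[:space:]]\)\{0,1\}name="\([^"]*\)".*$/\2/p' "$1" | head -n 1
}

# Succeed if the cluster part of a "cluster:fsname" lock table name
# matches the cluster name in cluster.conf.
lock_table_matches() {
    conf=$1
    table=$2
    [ "${table%%:*}" = "$(cluster_name "$conf")" ]
}

# Example (the path is an assumption):
# lock_table_matches /etc/cluster/cluster.conf mycluster:gfsvol || echo "cluster name mismatch"
```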

Start the cluster

On both nodes:

service cman start

Mount the GFS volume

mount -t gfs2 /dev/drbd1 /data
cman_tool services

Error Recovery

Split-Brain

Split-Brain detected, dropping connection!

After split brain has been detected, one node will always have the resource in a StandAlone connection state. The other might either also be in the StandAlone state (if both nodes detected the split brain simultaneously), or in WFConnection (if the peer tore down the connection before the other node had a chance to detect split brain).

At this point, unless you configured DRBD to automatically recover from split brain, you must manually intervene by selecting one node whose modifications will be discarded (this node is referred to as the split brain victim). This intervention is made with the following commands:

drbdadm secondary resource 
drbdadm -- --discard-my-data connect resource

On the other node (the split brain survivor), if its connection state is also StandAlone, you would enter:

drbdadm connect resource
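
The decision described above can be written down as a small helper that only prints the commands for a node, so they can be reviewed before anything is run. This is a sketch, not a tested recovery tool: the function name split_brain_commands and its arguments are made up for illustration, and the connection state is passed in by hand (it could probably be taken from "drbdadm cstate <resource>", which is an assumption about the installed DRBD version).

```shell
#!/bin/sh
# Print the recovery commands for one node after a split brain.
# $1: resource name
# $2: "victim" (its changes are discarded) or "survivor"
# $3: connection state of this node, e.g. StandAlone or WFConnection
split_brain_commands() {
    res=$1
    role=$2
    cstate=$3
    case "$role" in
        victim)
            echo "drbdadm secondary $res"
            echo "drbdadm -- --discard-my-data connect $res"
            ;;
        survivor)
            # Only a StandAlone survivor has to reconnect; in WFConnection
            # it is already waiting for the peer.
            if [ "$cstate" = "StandAlone" ]; then
                echo "drbdadm connect $res"
            fi
            ;;
        *)
            echo "unknown role: $role" >&2
            return 1
            ;;
    esac
}

# Example: show what to run on the surviving node of resource gfs1
# split_brain_commands gfs1 survivor StandAlone
```

On the victim, the GFS2 volume most likely has to be unmounted first, since a DRBD device that is still in use cannot be demoted to secondary.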