A French version is available here: Cluster Xen sous Debian GNU/Linux avec DRBD, Corosync et Pacemaker
Introduction
Xen is one of the most advanced open source virtualization technologies. Virtualization makes server deployment easier and
improves application availability. Thanks to live migration, admins can easily empty a host server (AKA dom0) to fix a hardware issue or perform updates
without shutting down virtual machines. At first glance, though, all of this has to be done manually. Since human error is the root cause of many outages,
it would be great to automate it. Hard to do? Not really, just follow the guide...
Cluster basics
According to Wikipedia, clustering covers techniques that group a pool of independent physical servers and make them work together for:
Cluster size varies a lot: a cluster can range from 2 servers to thousands of them.
Xen cluster requirements
In our example, we'll use 2 physical servers as dom0. Virtual machines will be spread across both dom0. So we have to:
be able to balance domU between dom0
make sure that each dom0 can host all domU at the same time
make domU file system reachable on each dom0
be able to "live migrate" domU from one dom0 to the other
share domU configuration file between dom0
Xen cluster constraints
Our cluster must comply with the following requirements:
Xen configuration is centralized on a dedicated cluster FS
The cluster FS has to be automatically mounted in each dom0
DomU state saving has to be disabled, since live migration will be handled by the cluster stack
DomU must not be automatically started (cluster stack will take care of that)
A domU can be started only if its associated DRBD resource is in master state on the dom0
DomU cannot run if cluster FS is not mounted on dom0
DomU must run only on 1 dom0 at a time
Cluster architecture
Both dom0 have two network attachments. The first one is dedicated to WAN access; the other one is used for DRBD replication, cluster management and live migration.
Cluster installation on GNU/Linux Debian
For the cluster stack, we will use Corosync and Pacemaker. The first one is the cluster messaging layer, the second one is the resource manager.
Both of them are available in a specific Debian repository:
Once they are installed, it's time to configure the cluster. First, you need to generate the key used to authenticate cluster messages and nodes.
Then, you can configure Corosync itself so that it uses the right network interface.
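As an illustration, the interface part of the totem section in /etc/corosync/corosync.conf could look like the fragment below. This is only a sketch: the 192.168.1.0 network, the multicast address and the port are assumptions (the last two are common example defaults), and the staging directory is a hypothetical location used so that nothing under /etc is touched before you have reviewed the file.

```shell
#!/bin/sh
# Write the fragment to a staging directory (hypothetical path) for review.
STAGING="${STAGING:-./cluster-staging}"
mkdir -p "$STAGING"

cat > "$STAGING/corosync-totem-interface.conf" <<'EOF'
# Goes inside the totem { } section of /etc/corosync/corosync.conf.
# bindnetaddr is the NETWORK address of the private link (assumed value).
interface {
        ringnumber: 0
        bindnetaddr: 192.168.1.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
}
EOF

cat "$STAGING/corosync-totem-interface.conf"
```

The same values must be used on both nodes, and bindnetaddr should match the network dedicated to replication and cluster traffic.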
At this point, you can start the cluster on both nodes, right after enabling Corosync in the file /etc/default/corosync with the option START=yes.
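A possible way to flip that option from a script is sketched below. It assumes GNU sed (as shipped with Debian) and works on a stand-in copy by default; the DEFAULTS variable and the enable_corosync helper are names made up for this example.

```shell
#!/bin/sh
# Set START=yes in a Corosync defaults file.
# $1: path to the file (on a real node: /etc/default/corosync)
enable_corosync() {
    sed -i 's/^START=.*/START=yes/' "$1"
}

# By default, work on a local stand-in file so the snippet is safe to try.
DEFAULTS="${DEFAULTS:-./corosync.default}"
[ -f "$DEFAULTS" ] || echo 'START=no' > "$DEFAULTS"

enable_corosync "$DEFAULTS"
grep '^START=' "$DEFAULTS"
```

On a real node you would run it with DEFAULTS=/etc/default/corosync as root, then start the service with /etc/init.d/corosync start on both nodes.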
You can check the cluster status with the command crm_mon --one-shot -V.
The cluster is telling us that it did not find any STONITH resource, and that's really bad.
STONITH means "Shoot The Other Node In The Head". This is one of the most important features of a cluster: the ability to be sure that a resource
is not duplicated. Consider a specific resource like a virtual IP address: having the same IP address on both cluster nodes is the best way to make your
cluster fail. For domU, it's much the same: they cannot run on both dom0 at the same time.
Cluster configuration
Our cluster is now configured with basic options. It's time to set up some other ones:
stonith
As you now know, a production environment must have STONITH configured. That said, we won't use it here (don't be afraid, Google is your friend, even if it's somewhat evil).
quorum
It's an election mechanism that helps the cluster decide whether it can work properly or not. It's useless for a 2-node cluster.
resource default stickiness
It lets resources stay on the node they are running on, even after a fail-back. The admin therefore has enough time to make sure the
issue is fixed before putting the failed node back in the cluster pool.
To configure these options, we'll use the crm shell that comes with Pacemaker. The shell can be launched with the command crm.
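The three options above could be expressed in crm syntax roughly as follows. This is a sketch, not the original configuration: the stickiness value of 100 is an arbitrary assumption, and the file is written to a hypothetical staging directory so it can be reviewed before being loaded.

```shell
#!/bin/sh
STAGING="${STAGING:-./cluster-staging}"
mkdir -p "$STAGING"

cat > "$STAGING/cluster-options.crm" <<'EOF'
property stonith-enabled="false"
property no-quorum-policy="ignore"
rsc_defaults resource-stickiness="100"
EOF

# To apply it, on ONE cluster node as root:
#   crm configure load update "$STAGING/cluster-options.crm"
cat "$STAGING/cluster-options.crm"
```

Loading the file with "load update" merges it into the running configuration instead of replacing it. The same lines can also be typed one by one after crm configure, followed by commit.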
It's now time to install Xen.
LVM configuration
Our Xen cluster uses LVM. A summary of the main LVM commands can be found here (French only, sorry guys): LVM: Logical Volume Manager
You must create a VG named XenHosting. In this VG, you can create an LV named cluster-ocfs.
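For reference, creating that VG and LV might look like the sketch below. The physical volume /dev/sdb1 and the 1G size are assumptions, and the run helper is a hypothetical dry-run wrapper: it only prints the commands unless APPLY=1 is exported.

```shell
#!/bin/sh
# Dry-run wrapper: print the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

PV="${PV:-/dev/sdb1}"   # assumed partition reserved for LVM

run pvcreate "$PV"                              # initialise the physical volume
run vgcreate XenHosting "$PV"                   # volume group used by the cluster
run lvcreate -L 1G -n cluster-ocfs XenHosting   # LV that will back the cluster FS
```

The same commands have to be run on both dom0, since each node keeps its own copy of the data and DRBD replicates between the two LVs.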
Here, you have to set up global options before getting drbd0 working:
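A minimal DRBD 8.3 configuration for this step could look like the sketch below. The resource name cluster-ocfs, the host names node1/node2, the addresses and the port are all assumptions made for the example; only drbd0 and the XenHosting/cluster-ocfs LV come from this article. The files are written to a hypothetical staging directory.

```shell
#!/bin/sh
STAGING="${STAGING:-./cluster-staging}"
mkdir -p "$STAGING"

# Global options: dual-primary is required to mount OCFS2 on both nodes.
cat > "$STAGING/global_common.conf" <<'EOF'
global {
    usage-count no;
}
common {
    protocol C;
    net {
        allow-two-primaries;
    }
}
EOF

# Resource backing the cluster file system (host names must match uname -n).
cat > "$STAGING/cluster-ocfs.res" <<'EOF'
resource cluster-ocfs {
    on node1 {
        device    /dev/drbd0;
        disk      /dev/XenHosting/cluster-ocfs;
        address   10.0.0.1:7788;
        meta-disk internal;
    }
    on node2 {
        device    /dev/drbd0;
        disk      /dev/XenHosting/cluster-ocfs;
        address   10.0.0.2:7788;
        meta-disk internal;
    }
}
EOF

# Once installed under /etc/drbd.d/ on both nodes:
#   drbdadm create-md cluster-ocfs
#   drbdadm up cluster-ocfs
ls "$STAGING"
```

No automatic split-brain recovery policy is set here, which matches the choice made later in this article to resolve split-brain by hand.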
OCFS2 installation
Many other clustered file systems exist. For this example, we'll use OCFS2.
OCFS2 configuration
One word about OCFS2. In a perfect world, we would manage OCFS2 with Pacemaker. That won't be the case here (I had issues with the lock management, which is mandatory for Pacemaker).
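Since OCFS2 runs outside Pacemaker here, it needs its own node list in /etc/ocfs2/cluster.conf. The sketch below generates one for two nodes; the host names, IP addresses, the cluster name ocfs2 and the node_stanza helper are assumptions for the example. printf is used on purpose so that parameter lines begin with a real tab.

```shell
#!/bin/sh
STAGING="${STAGING:-./cluster-staging}"
mkdir -p "$STAGING"
CONF="$STAGING/ocfs2-cluster.conf"

# Emit one node stanza: $1 = IP address, $2 = node number, $3 = host name.
node_stanza() {
    printf 'node:\n\tip_port = 7777\n\tip_address = %s\n\tnumber = %s\n\tname = %s\n\tcluster = ocfs2\n\n' "$1" "$2" "$3"
}

{
    node_stanza 10.0.0.1 0 node1
    node_stanza 10.0.0.2 1 node2
    printf 'cluster:\n\tnode_count = 2\n\tname = ocfs2\n'
} > "$CONF"

# Once the file is installed as /etc/ocfs2/cluster.conf on both nodes and
# drbd0 is primary, the file system can be created from one node, e.g.:
#   mkfs.ocfs2 -N 2 -L cluster /dev/drbd0
cat "$CONF"
```

The name of each node has to match the output of hostname on that machine, and the file must be identical on both dom0.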
OCFS2 cluster resource configuration
We did not specify any mountpoint for the OCFS2 file system we just created. I did not forget anything: Pacemaker will take care of everything for us.
But let's stop and think for 2 minutes before going on. The only goal of OCFS2 is to let us share configuration files between nodes without having to copy them. But we have to start DRBD replication
before mounting the file system. If the DRBD resource is not enabled, or not in master state, we must not mount the file system.
That said, we need:
a DRBD resource (Master) which will control the resource status (that's why we need DRBD version 8.3.2 or later)
a Filesystem resource (Clone) which will control the FS mount. This resource will be cloned, meaning it applies to both nodes with a single definition.
a Constraint which allows the mount only after DRBD has started
a mountpoint. Here, this will be /cluster. Let's create this directory.
Here we go:
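The four items above could translate into crm syntax along these lines. This is a sketch under assumptions: every resource ID (Cluster-FS-DRBD and so on), the DRBD resource name cluster-ocfs and the monitor intervals are invented for the example. The block creates the mountpoint only in a comment and writes the definitions to a hypothetical staging directory.

```shell
#!/bin/sh
STAGING="${STAGING:-./cluster-staging}"
mkdir -p "$STAGING"

# On both dom0, the mountpoint must exist first:  mkdir /cluster

cat > "$STAGING/cluster-fs.crm" <<'EOF'
primitive Cluster-FS-DRBD ocf:linbit:drbd \
    params drbd_resource="cluster-ocfs" \
    op monitor interval="20" role="Master" timeout="240" \
    op monitor interval="30" role="Slave" timeout="240"
ms Cluster-FS-DRBD-Master Cluster-FS-DRBD \
    meta master-max="2" clone-max="2" notify="true" interleave="true"
primitive Cluster-FS-Mount ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/cluster" fstype="ocfs2" \
    op monitor interval="120"
clone Cluster-FS-Mount-Clone Cluster-FS-Mount \
    meta interleave="true"
colocation Cluster-FS-With-DRBD inf: Cluster-FS-Mount-Clone Cluster-FS-DRBD-Master:Master
order Cluster-FS-After-DRBD inf: Cluster-FS-DRBD-Master:promote Cluster-FS-Mount-Clone:start
EOF

# To apply it, on ONE cluster node as root:
#   crm configure load update "$STAGING/cluster-fs.crm"
grep -c '^primitive' "$STAGING/cluster-fs.crm"
```

master-max="2" lets DRBD be primary on both nodes at once, which the cloned OCFS2 mount needs; the colocation and order constraints keep the file system from mounting where DRBD is not master.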
If you created the /cluster directory, your clustered FS will "automagically" mount.
In any case, don't create any domU for now. Even though migrating a stand-alone virtualization system to a clustered one is quite easy,
it's simpler to start from scratch, especially if you are new to clustering.
Xen configuration
The Xen dom0 configuration will stay in /etc/xen/, but the domU configuration files will live in /cluster.
As you can see, I'll use the cluster FS to share some other things as well, like ISO images or Debian packages I need on both nodes, in addition to the domU configuration files.
To let the cluster stack deal with domU management, we have to disable domU auto start and state saving.
Let's adapt Xen configuration
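On a Debian dom0 using xend, the relevant settings might look like the sketch below. The host names in the relocation allow-list are assumptions, and the exact set of options in the original configuration is not known; the fragments are written to a hypothetical staging directory for review.

```shell
#!/bin/sh
STAGING="${STAGING:-./cluster-staging}"
mkdir -p "$STAGING"

# Fragment for /etc/default/xendomains: no state save, no restore at boot.
# Also keep /etc/xen/auto empty so that no domU is started automatically.
cat > "$STAGING/xendomains.default" <<'EOF'
XENDOMAINS_SAVE=""
XENDOMAINS_RESTORE=false
EOF

# Fragment for /etc/xen/xend-config.sxp: accept live migration from the peer.
cat > "$STAGING/xend-config.sxp.fragment" <<'EOF'
(xend-relocation-server yes)
(xend-relocation-port 8002)
(xend-relocation-address '')
(xend-relocation-hosts-allow '^localhost$ ^node1$ ^node2$')
EOF

cat "$STAGING/xendomains.default" "$STAGING/xend-config.sxp.fragment"
```

Restricting xend-relocation-hosts-allow to the two dom0 is a design choice of this sketch: relocation traffic is unauthenticated, so it is best limited to the private link.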
Before restarting xend:
You can now restart xend, then set up your first domU.
Xen domU configuration
We'll set up an HVM domU.
This is our first domU config file. It will have a CD-ROM drive as well as a VNC console. The CD-ROM will be available at boot.
You'll be able to reach the VNC console through an SSH tunnel:
The VNC console is now available on localhost, TCP port 5901.
You can install the domU now, or integrate it into the cluster first. If you install it now, you'll have to shut it down before setting up the cluster stack.
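An HVM domU file matching that description could look like the sketch below. Everything specific in it is an assumption: the domU name domu1, the memory size, the bridge, the ISO path, the /dev/drbd1 backing device and the hvmloader/qemu-dm paths, which vary between Xen packages. The file is written to a hypothetical staging directory rather than straight to /cluster.

```shell
#!/bin/sh
STAGING="${STAGING:-./cluster-staging}"
mkdir -p "$STAGING"

cat > "$STAGING/domu1.cfg" <<'EOF'
# HVM guest (paths below depend on the installed Xen package)
kernel       = "/usr/lib/xen-default/boot/hvmloader"
builder      = "hvm"
device_model = "/usr/lib/xen-default/bin/qemu-dm"
name         = "domu1"
memory       = 512
vcpus        = 1
vif          = [ "type=ioemu, bridge=eth0" ]
# System disk on a DRBD device, install media from the cluster FS
disk         = [ "phy:/dev/drbd1,hda,w",
                 "file:/cluster/iso/install.iso,hdc:cdrom,r" ]
# Boot from CD-ROM first, then from disk
boot         = "dc"
# VNC console on display 1, i.e. TCP port 5901, local connections only
vnc          = 1
vncdisplay   = 1
vncunused    = 0
vnclisten    = "127.0.0.1"
EOF

cat "$STAGING/domu1.cfg"
```

Binding VNC to 127.0.0.1 means the console is reachable only from the dom0 itself, which is why an SSH tunnel is the natural way to get to it.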
Configure the associated Master resource so that the DRBD resource can be in master state on both nodes simultaneously
Configure the Xen resource
Configure the Constraint which makes sure the Xen resource cannot run on both dom0 at the same time and ensures DRBD is in master state before the domU starts
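These three steps could be written in crm syntax roughly as follows. It is a sketch with invented names: the resource IDs, the DRBD resource domu1 and the xmfile path are assumptions, and Cluster-FS-Mount-Clone stands for whatever ID your cloned Filesystem resource has. The file goes to a hypothetical staging directory.

```shell
#!/bin/sh
STAGING="${STAGING:-./cluster-staging}"
mkdir -p "$STAGING"

cat > "$STAGING/domu1.crm" <<'EOF'
primitive DomU1-DRBD ocf:linbit:drbd \
    params drbd_resource="domu1" \
    op monitor interval="20" role="Master" timeout="240" \
    op monitor interval="30" role="Slave" timeout="240"
ms DomU1-DRBD-Master DomU1-DRBD \
    meta master-max="2" clone-max="2" notify="true" interleave="true"
primitive DomU1 ocf:heartbeat:Xen \
    params xmfile="/cluster/xen/domu1.cfg" \
    meta allow-migrate="true" \
    op monitor interval="30" timeout="30"
colocation DomU1-With-DRBD inf: DomU1 DomU1-DRBD-Master:Master
order DomU1-After-DRBD inf: DomU1-DRBD-Master:promote DomU1:start
order DomU1-After-FS inf: Cluster-FS-Mount-Clone:start DomU1:start
EOF

# To apply it, on ONE cluster node as root:
#   crm configure load update "$STAGING/domu1.crm"
grep -c '^order' "$STAGING/domu1.crm"
```

Because DomU1 is a plain primitive (not a clone), Pacemaker runs at most one instance of it in the cluster; the colocation and order constraints then tie that instance to a node where DRBD is master and the cluster FS is mounted.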
The last point is very important. Starting with version 8.3.2, DRBD is fully integrated with the Pacemaker cluster. But DRBD can still end up in a split-brain state,
and you will have to fix it manually (automatic split-brain recovery is really not an option in production clusters).
If you try to start a Xen resource on a node that does not hold the Master DRBD resource, the cluster stack will record the failed attempt and prevent further starts on this node,
creating a constraint.
To fix it, you'll have to dig into the cluster configuration and remove the constraint manually.
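One way to inspect and clear such a blocked resource with the crm shell is sketched below. It assumes the block comes from the recorded failure history of the resource; the resource ID DomU1 and node name node2 are placeholders, and the run helper is a hypothetical dry-run wrapper that only prints the commands unless APPLY=1 is exported.

```shell
#!/bin/sh
# Dry-run wrapper: print the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

RSC="${RSC:-DomU1}"     # placeholder resource ID
NODE="${NODE:-node2}"   # placeholder node name

run crm resource failcount "$RSC" show "$NODE"   # how many failures were recorded
run crm resource cleanup "$RSC"                  # forget the failure history
run crm configure show                           # look for leftover constraints
# If a stale location constraint remains, it can be removed by ID:
#   crm configure delete <constraint-id>
```

Only clean up once DRBD is healthy again, otherwise the next start attempt will fail the same way.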
Now the domU should start. If you prepared the physical DRBD resource on the other node, you should be able to live migrate your domU. If you did not, you really should ;-)
Last but not least, you don't need to set up the cluster on the second node: the configuration has already been propagated by the cluster.
When defining the Xen resource, the attribute meta allow-migrate="true" enables domU live migration.
If it's not set and you ask to migrate the resource, the domU will be stopped on the first node and then started on the second one.
Dealing with cluster resources
Here is a summary of the main cluster management commands.
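As a rough cheat sheet (not the original list, which is not reproduced here), these are crm shell commands that cover the day-to-day operations discussed in this article. The resource ID and node name are placeholders, and the run helper is a hypothetical dry-run wrapper that prints the commands unless APPLY=1 is exported.

```shell
#!/bin/sh
# Dry-run wrapper: print the command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

RSC="${RSC:-DomU1}"     # placeholder resource ID
NODE="${NODE:-node2}"   # placeholder node name

run crm_mon --one-shot                    # cluster status snapshot
run crm resource migrate "$RSC" "$NODE"   # move (live migrate) a domU
run crm resource unmigrate "$RSC"         # drop the constraint left by migrate
run crm node standby "$NODE"              # empty a dom0 before maintenance
run crm node online "$NODE"               # put it back in the pool
run crm resource stop "$RSC"              # stop a domU through the cluster
run crm resource start "$RSC"             # start it again
```

Note that migrate works by adding a location constraint, so running unmigrate afterwards hands placement decisions back to the cluster.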
I'm a system engineer specialized in Linux / Unix. I mainly work on virtualization and on web performance.
From time to time, I find time to read books while listening to classical music. But I always get back to my keyboard quickly.