Good info
It’s 3 o’clock in the morning. Do you know where your
XenServer Poolmaster is? Your client calls you, frantic, and you start a
GoToMeeting to see what’s wrong. If the Poolmaster is down, there are a few
possible causes. Maybe a network glitch caused the Citrix XenServer
Poolmaster to fence itself from the rest of the pool. The same thing can
happen after a power outage or other catastrophic failure. This is the normal
defense mechanism built into XenServer, and in the consulting world we see this
type of scenario often. You can’t simply reboot the Poolmaster to bring it
back online, and restarting the toolstack will do you no good. There is a
specific process that must be followed, so let’s discuss it.
If you’ve tried to connect to the pool from the XenCenter console and it failed,
your Poolmaster may be down. Verify this by dropping to a command prompt and
issuing a command like “xe host-list” to see if you get a coherent response.
If you get an error message like “Cannot perform operation as the host is
running in emergency mode”, then your Poolmaster is almost certainly down.
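If you want to script that check, here is a minimal sketch. It only looks at text you pipe into it, so it matches on the "emergency mode" wording quoted above and nothing else. The function name classify_host_list is made up for this example, and the exact error text on your XenServer version may differ.

```shell
# Hypothetical helper: classify the text produced by `xe host-list`,
# read from stdin.
# Prints "emergency" if the emergency-mode error appears, "empty" if there
# was no output at all, and "ok" otherwise.
classify_host_list() {
  output=$(cat)
  if printf '%s\n' "$output" | grep -qi 'emergency mode'; then
    echo "emergency"
  elif [ -z "$output" ]; then
    echo "empty"
  else
    echo "ok"
  fi
}
```

On a pool host you would feed it the real command output, for example: xe host-list 2>&1 | classify_host_list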
How do I get the Poolmaster back up?
This is easier said than done. First you’ve got to promote another server in the
pool to become the Poolmaster, so that it can take over pool operations. From
that server’s CLI, type the command “xe pool-emergency-transition-to-master”, which will transition it to be the new
Poolmaster. If the command runs successfully, you can recover the other pool
servers by issuing the command
“xe pool-recover-slaves”. If pool management is working again,
you should now be able to run the “xe host-list” command and get a valid response.
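The promotion sequence above can be sketched as a small shell function. This is only an illustration, assuming you run it on the surviving server you want to promote. The function name promote_and_recover is hypothetical, and the three xe commands are the ones named in this post. It stops at the first command that fails rather than pressing on.

```shell
# Hypothetical wrapper around the commands described above.
# Run on the surviving host that should become the new Poolmaster.
promote_and_recover() {
  # Step 1: make this host the new Poolmaster.
  xe pool-emergency-transition-to-master || return 1
  # Step 2: point the remaining pool members at the new Poolmaster.
  xe pool-recover-slaves || return 1
  # Step 3: confirm pool management answers again.
  xe host-list
}
```

Stopping on the first failure is a deliberate choice here: if the transition itself fails, running pool-recover-slaves afterwards would not tell you anything useful.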
Now that the pool is back online, how do I fix the failed Poolmaster?
1). First you have to figure out which server in the environment has failed. To do
this, run the command “xe host-list params=uuid,name-label,host-metrics-live”.
Any servers that come back with “host-metrics-live = false” have failed.
Take note of the UUID of any failed servers.
2). Second, you must determine which VMs were running on that failed server. You
can do this by running the command
“xe vm-list is-control-domain=false resident-on=UUID_of_failed_server”. Once
you’ve determined which VMs were running, you need to reset their power state
so that they can be started on another server. To do this, run the command
“xe vm-reset-powerstate resident-on=UUID_of_failed_server --force --multiple”.
You should now see the VMs in question show up as halted in the XenCenter console.
Restart each of the VMs, and they should boot up on the surviving pool
member servers.
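Here is a rough sketch of how the two steps above could be scripted. Both function names (failed_host_uuids and reset_vms_on_failed_host) are made up for this example. The awk parser assumes the usual xe record layout, one "field ( RO): value" line per parameter with the uuid line first in each record, so check it against the output on your own hosts before trusting it.

```shell
# Hypothetical helper: read the output of
#   xe host-list params=uuid,name-label,host-metrics-live
# from stdin and print the UUID of every host whose
# host-metrics-live value is "false".
# Assumes each record lists its uuid line before host-metrics-live.
failed_host_uuids() {
  awk -F': ' '
    /^[ \t]*uuid[ \t]/  { uuid = $2 }
    /host-metrics-live/ {
      live = $2
      sub(/[ \t\r]+$/, "", live)   # drop trailing whitespace
      if (live == "false") print uuid
    }
  '
}

# Hypothetical helper: list the VMs that were resident on the failed
# host, then reset their power state so they show up as halted.
# Argument: UUID of the failed server.
reset_vms_on_failed_host() {
  failed_uuid="$1"
  [ -n "$failed_uuid" ] || { echo "usage: reset_vms_on_failed_host UUID" >&2; return 2; }
  xe vm-list is-control-domain=false resident-on="$failed_uuid" || return 1
  xe vm-reset-powerstate resident-on="$failed_uuid" --force --multiple
}
```

Typical use would be to pipe the host-list output through failed_host_uuids, read the result by eye, and only then call reset_vms_on_failed_host with a UUID you have confirmed is really dead. Forcing a power state reset on a server that is still running its VMs is not something you want to do by accident.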
For more information on various issues you can run into during this process, check out the official Citrix whitepaper here:
What are some root causes as to why my XenServer Poolmaster may have been down?
The usual suspects
include your network: if the Poolmaster loses connectivity to some of
the other XenServer hosts in your environment, it could fence itself and go
offline as a built-in defense mechanism. Poolmaster fencing is a typical
symptom of network issues in your environment, so check
with the network team before you pass go.