Hi all,
I’m working on a disaster recovery strategy for our EnhanceCP cluster and could use some clarification on recovering from a master node failure (i.e., the control panel server).
Current Setup
1x Master node (Enhance panel only)
1x Backup node, used for off-site website backups (and yes, we also have a backup of that backup! 😄)
1 to N Site nodes, each designed to host 1 or more websites depending on resource needs
(At the moment, our 2 test sites are still running on the master node.)
Eventually, each site will run on its own dedicated server or in small groupings per node.
What I Understand
I know how to handle a failed site node using the “Mark as decommissioned” + “Restore websites” workflow.
What I Need to Know
However, I haven’t found clear documentation for the master node recovery process. So I’m wondering:
Do the site nodes stay reachable and operational while the master node is down?
If the master node fails, can I rebuild it on a fresh server, restore it from a backup, or both? Which files need to be restored?
After rebuilding, will the existing site nodes reconnect automatically? Or do I need to re-register them manually (via shell or otherwise)?
Are there any known caveats, limitations, or missing pieces in the recovery process?
Does Enhance provide any best practices or recommended steps for backing up and restoring the master node for fast disaster recovery?
Any insights, especially from those who’ve tested this in real scenarios, would be greatly appreciated!
PS. Thanks to ChatGPT for helping me with the text 😄
Thanks in advance,