Can't get orchd to restart after a server issue

sip

I currently have two servers. One hosts the control panel and a few websites; the other is just for websites.
I had an issue yesterday where the second server went down for an hour or so.
I got a notification that some sites were down, and if I try to log in to the control panel I just get a spinning AJAX loader and nothing ever loads.

I am using OpenLiteSpeed as the web server.

Things I have checked and confirmed:
docker ps showed an uptime of 5 weeks on the main control panel server until I restarted it.
Otherwise the Docker containers seem fine.
Neither server is out of disk space.
htop doesn't show any high usage on either of the servers.

From what I can gather, the control panel server isn't able to find the second one and as a result won't start orchd.

From docker logs orchd:

4eeb0bc5d0ed is unreachable: Error { kind: RpcUnavailable, context: None, entity: None, message: Some("failed to connect to all addresses") }
2025-06-01T20:41:23.426468Z  WARN ThreadId(27) orchd::scheduler::fetch_service_statuses: Server 7xxxxx5-73aa-xxx-xxxx-4eeb0bc5d0ed is unreachable: Error { kind: RpcUnavailable, context: None, entity: None, message: Some("failed to connect to all addresses") }
2025-06-01T20:42:23.423531Z  WARN ThreadId(27) orchd::scheduler::fetch_service_statuses: Server 7xxxxx5-73aa-xx-xxxx-4eeb0bc5d0ed is unreachable: Error { kind: RpcUnavailable, context: None, entity: None, message: Some("failed to connect to all addresses") }
2025-06-01T20:43:23.424128Z  WARN ThreadId(27) orchd::scheduler::fetch_service_statuses: Server 7xxxxx5-xxxx-xxx-xxx-4eeb0bc5d0ed is unreachable: Error { kind: RpcUnavailable, context: None, entity: None, message: Some("failed to connect to all addresses") }
2025-06-01T20:45:03.006969Z ERROR ThreadId(28) orchd::scheduler::stat_polls: Failed to collect server stats for 7xxxxx5-xx-xx-xx-4eeb0bc5d0ed :internal: RpcFailure: 4-DEADLINE_EXCEEDED Deadline Exceeded

I have restarted both servers several times.
Server 1 can ping server 2 and see it without any problem.
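
A successful ping only shows that ICMP gets through, while the RpcUnavailable "failed to connect to all addresses" warnings suggest a connection to a service port is failing. A minimal sketch for testing a TCP port from server 1 follows. The helper name tcp_reachable is hypothetical, and the host and port are passed on the command line because the port orchd uses to reach the other server is not stated in this thread.

```python
import socket
import sys


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a full TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (values are placeholders): python check_port.py <server-2-address> <port>
    host, port = sys.argv[1], int(sys.argv[2])
    state = "reachable" if tcp_reachable(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state} over TCP")
```

If the port is not reachable while ping works, a firewall rule or a service that is not listening on server 2 would be a likely place to look.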

The only thing I can think of at the moment is that when the server was down it was put into recovery mode, which changes the SSH fingerprint etc., and that at some point server 1 got mixed up or stuck in some way.

I assume I just know too little about the potential problems, and I'm hoping someone can shed some light.

Thanks in advance

PDudeP

Enhance is no longer using docker.
On which version are you?

sip

I am still on v11 at the moment. I read through the upgrade guide for v12 and had put it off for a later day, as I was away for a while.

I believe I'm on 11.0.4, but I can't get into the control panel now to check. Is there a way to check from SSH?

sip

I'm sure OVH are having some weird issues at the moment.

But anyway, I ran the v12 update. The server never came back up, and now I get a GRUB boot screen when I look at the KVM.

Kosta

sip Download your websites' data and start with a fresh cluster.

sip

Yeah, I was wondering how I do that.
I can get the MySQL folder and remount it in another DB, I assume.
But where are the site files held?

Kosta

sip How can you have websites and a hosting platform but not know where the files are located?
Download the databases as well!

MarkD

The website files will be in the /var/www/xxxxxxxxx folder, where xxxxxxxxx is the GUID of each website.
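
As a rough sketch of pulling those files off the server, the script below writes one .tar.gz per site folder found under a given root. The function name archive_sites and the destination path are hypothetical, it assumes Python 3.9+ is available on the box (for example from a rescue environment with the disk mounted), and it only covers files, not the databases.

```python
import sys
import tarfile
from pathlib import Path


def archive_sites(www_root: Path, dest: Path) -> list[Path]:
    """Create one gzip-compressed tarball per directory directly under www_root."""
    dest.mkdir(parents=True, exist_ok=True)
    archives = []
    # Each subdirectory is assumed to be one website, named by its GUID.
    for site_dir in sorted(p for p in www_root.iterdir() if p.is_dir()):
        target = dest / f"{site_dir.name}.tar.gz"
        with tarfile.open(target, "w:gz") as tar:
            # arcname keeps paths inside the archive relative to the site folder.
            tar.add(site_dir, arcname=site_dir.name)
        archives.append(target)
    return archives


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (destination is a placeholder): python archive_sites.py /var/www /root/site-backups
    for archive in archive_sites(Path(sys.argv[1]), Path(sys.argv[2])):
        print(archive)
```

Keep the destination outside the web root so the backup folder does not end up inside its own archives, then copy the tarballs to another machine before rebuilding.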

sip

MarkD Thank you.

Kosta Because I didn't need to know until now. I have been using the control panel, and all the sites on here are WordPress sites I transferred using cpmove.