MySQL/MariaDB Random Stops: Page 2 - Enhance Control Panel

MySQL/MariaDB Random Stops

gmakhs

cPFence Since the OOM is triggered by the container reaching its limit, this is a bug. If the server were out of RAM I might agree with you, but in the case we are discussing it isn't.

cPFence

gmakhs

Just monitor the slow queries, you’ll almost always find the root cause there. It’s usually the starting point for these kinds of issues.

gmakhs

cPFence Again, you are missing the forest for the trees...

There is no point in monitoring slow queries on a server with a lot of websites when the source of the issue is somewhere else.

MarkD

gmakhs Yes I have raised a ticket to get their feedback on this

cPFence

gmakhs

No problem, you're free to ignore what I said. Just keep hunting for that mysterious hidden bug in Enhance. Good luck!

MarkD

MySQL metrics is a dog and won’t help much in actually pinpointing the issue. Just monitor slow queries, if you find nothing there, then start looking elsewhere. But take it from this old guy: when you’re dealing with MySQL problems, always start by checking for slow queries. You’ll thank me for this tip later.

MarkD

cPFence Ordinarily I would agree with you, but we also monitor the MySQL metrics and there was nothing showing, as you can see from this 5-minute window: nothing, zip, nada.
[Image: MySQL Usage]

cPFence

MarkD

Add this to your [mysqld] section in my.cnf:

slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 2
log_queries_not_using_indexes = 1

Wait until the next issue happens and check the log. If nothing shows up there, only then move on to other troubleshooting steps.
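Once the log has collected some entries, something like the following can rank them. This is only a minimal sketch: it assumes the usual `# Query_time: ... Rows_examined: ...` header that MySQL and MariaDB write before each logged statement, the names `SlowQuery` and `parse_slow_log` are made up for illustration, and the sample text is invented rather than taken from a real server. It does not handle the banner lines the server writes into the log after a restart.

```python
import re
from typing import List, NamedTuple


class SlowQuery(NamedTuple):
    query_time: float   # seconds, from the "# Query_time:" header
    rows_examined: int
    sql: str


# Header line written before each logged statement.
_HEADER = re.compile(
    r"^# Query_time: (?P<qt>[\d.]+)\s+Lock_time: [\d.]+"
    r"\s+Rows_sent: \d+\s+Rows_examined: (?P<rows>\d+)"
)


def parse_slow_log(text: str) -> List[SlowQuery]:
    """Return logged statements, slowest first."""
    entries: List[SlowQuery] = []
    current = None        # (query_time, rows_examined) of the open entry
    sql: List[str] = []

    def flush() -> None:
        # Store the entry collected so far, if there is one.
        if current is not None and sql:
            entries.append(SlowQuery(current[0], current[1], " ".join(sql)))

    for line in text.splitlines():
        match = _HEADER.match(line)
        if match:
            flush()
            current = (float(match["qt"]), int(match["rows"]))
            sql = []
        elif line.startswith(("#", "SET timestamp=", "use ")):
            continue      # other header lines and bookkeeping statements
        elif current is not None and line.strip():
            sql.append(line.strip())
    flush()
    return sorted(entries, key=lambda e: e.query_time, reverse=True)


# Invented sample, only to show the expected shape of the input.
SAMPLE = """\
# Time: 250413 19:41:58
# User@Host: wp_user[wp_user] @ localhost []
# Query_time: 0.900000  Lock_time: 0.000010  Rows_sent: 1  Rows_examined: 5
SET timestamp=1744573318;
SELECT 1;
# User@Host: wp_user[wp_user] @ localhost []
# Query_time: 3.214507  Lock_time: 0.000120  Rows_sent: 10  Rows_examined: 482113
SET timestamp=1744573320;
SELECT * FROM wp_postmeta WHERE meta_value LIKE '%foo%';
"""

for entry in parse_slow_log(SAMPLE):
    print(f"{entry.query_time:8.3f}s  rows={entry.rows_examined:<8} {entry.sql}")
```

To use it on a real server, read the file named in `slow_query_log_file` and pass its contents to `parse_slow_log`.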

MarkD

cPFence
I have now added those extra lines to a few of our shared servers and will wait for the next restart which could be hours or days away. I hear what you are saying about the slow_queries and we did already monitor them but only if they took longer than 10 seconds and there weren't any showing this morning - I have now set them as per your suggestion.

I am also going to install netdata on the same servers as that will give us per second memory usage so that the next time this happens we can see exactly how much memory was free when MySQL gets killed - assuming that this is being killed by the OOM process 🤷‍♂️
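To test the OOM assumption directly, the kernel log can be searched for kill messages. The sketch below assumes the message wording used by recent kernels ("Out of memory: Killed process ..." for a host-wide kill and "Memory cgroup out of memory: Killed process ..." for a cgroup limit); the exact wording varies between kernel versions, the helper names are hypothetical, and the sample lines are invented.

```python
import re
from typing import List, NamedTuple


class OomKill(NamedTuple):
    cgroup_limit: bool   # True if a cgroup (container) limit triggered the kill
    pid: int
    comm: str


# Assumed wording; older kernels phrase these messages differently.
_OOM = re.compile(
    r"(?P<scope>Memory cgroup out of memory|Out of memory): "
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\)"
)


def find_oom_kills(log_text: str) -> List[OomKill]:
    """Extract OOM kills from kernel log text (e.g. output of `journalctl -k`)."""
    return [
        OomKill(m["scope"].startswith("Memory cgroup"), int(m["pid"]), m["comm"])
        for m in _OOM.finditer(log_text)
    ]


# Invented sample lines, for illustration only.
SAMPLE = """\
kernel: Memory cgroup out of memory: Killed process 4321 (lsphp) total-vm:512000kB
kernel: Out of memory: Killed process 9876 (mysqld) total-vm:4997680kB
kernel: some unrelated message
"""

for kill in find_oom_kills(SAMPLE):
    scope = "cgroup limit" if kill.cgroup_limit else "host-wide"
    print(f"{scope}: pid {kill.pid} ({kill.comm})")
```

If a restart leaves no such message in the kernel log at all, that points away from the OOM killer.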

cPFence

MarkD we did already monitor them but only if they took longer than 10 seconds

I'm not going to mention names, but it's pretty common for large hosting providers (those not using CloudLinux) to set their slow query monitoring as low as 0.5 seconds. They proactively contact clients running slow queries, pushing them to optimize or upgrade their plans. Honestly, I don't blame them; in shared hosting, keeping queries efficient is crucial for overall server stability—and this can easily be achieved by selecting well-coded themes and plugins.

gmakhs

MarkD looking forward to seeing what you find.

One of my servers that had the issue had both lsws and mariadb killed (at different times), and it hosts only a handful of sites with 128 GB of RAM. Then I found the OOMs on the /websites/, so I came to that conclusion.
Looking forward to being proved wrong and hopefully finding a solution.

@cPFence I don't believe it is a bug in Enhance itself. I believe it is a byproduct of MVP development, and I also believe it can be improved.

As for CloudLinux, it is super cheap at 13 USD per month, and it does a very good job (different PHP versions, extensions, limits, MySQL Governor, LVE, super fast support). I wish Enhance could work with it.
On shared hosting you can't predict which website will abuse resources, and having properly working resource limits, with soft and hard options, is important.

MarkD

So we have managed to catch what has killed MySQL on one occasion and in this instance it was Litespeed but we have no idea why it did that.

We have a server where MySQL was shut down on 13/04/25 at 19:42:27.

If you look at this Audit log it shows the litespeed process sent the SIGKILL

type=PROCTITLE msg=audit(04/13/25 19:42:27.498:355805) : proctitle=litespeed (lshttpd - main)
type=OBJ_PID msg=audit(04/13/25 19:42:27.498:355805) : opid=56235 oauid=unset ouid=mysql oses=-1 obj=/usr/sbin/mysqld ocomm=ib_io_rd-2
type=SYSCALL msg=audit(04/13/25 19:42:27.498:355805) : arch=x86_64 syscall=kill success=yes exit=0 a0=0xdbb8 a1=SIGKILL a2=0x8 a3=0x748ba6bb1fc0 items=0 ppid=1 pid=2303176 auid=unset uid=root gid=www-data euid=root suid=root fsuid=root egid=www-data sgid=www-data fsgid=www-data tty=(none) ses=unset comm=litespeed exe=/usr/local/lsws/bin/lshttpd.6.3.2 subj=unconfined key=sigkill_watch
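The fields in these records can be decoded with a few lines of code. This is a rough sketch with hypothetical helper names; it assumes interpreted `ausearch -i` style output as shown above, where `a0` (the first argument to `kill`) is printed in hex. Decoding `a0=0xdbb8` gives 56248, while `opid` in the OBJ_PID record is 56235, so the ID passed to `kill` and the process that audit recorded as the target are not the same number.

```python
import re
from typing import Dict


def audit_fields(record: str) -> Dict[str, str]:
    """Split an audit record into its key=value fields."""
    return dict(re.findall(r"(\w+)=(\S+)", record))


def describe_kill(syscall_record: str, obj_pid_record: str) -> Dict[str, object]:
    """Combine the SYSCALL and OBJ_PID records of one kill event."""
    call = audit_fields(syscall_record)
    obj = audit_fields(obj_pid_record)
    return {
        "sender_pid": int(call["pid"]),
        "signal": call["a1"],
        "signalled_id": int(call["a0"], 16),   # hex in interpreted output
        "object_pid": int(obj["opid"]),
        "object_comm": obj["ocomm"],
    }


# First part of the SYSCALL record above, and the OBJ_PID record in full.
SYSCALL = (
    "type=SYSCALL msg=audit(04/13/25 19:42:27.498:355805) : arch=x86_64 "
    "syscall=kill success=yes exit=0 a0=0xdbb8 a1=SIGKILL a2=0x8 "
    "a3=0x748ba6bb1fc0 items=0 ppid=1 pid=2303176"
)
OBJ_PID = (
    "type=OBJ_PID msg=audit(04/13/25 19:42:27.498:355805) : opid=56235 "
    "oauid=unset ouid=mysql oses=-1 obj=/usr/sbin/mysqld ocomm=ib_io_rd-2"
)

event = describe_kill(SYSCALL, OBJ_PID)
print(event["signalled_id"])   # -> 56248
print(event["object_pid"])     # -> 56235
```

The mismatch would be consistent with 56248 being the ID of a thread inside the mysqld process 56235, although the audit log alone does not prove that.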

And then if we look at the litespeed error log for that same time we see this

2025-04-13 19:42:11.703539 [NOTICE] [2303176] [T0] [2303176] Cmd from child: [extappkill:56279:-3:0]
2025-04-13 19:42:11.703566 [INFO] [2303176] [T0] Failed to get process [56279] start time, not running, skip killing.
2025-04-13 19:42:11.703339 [INFO] [2303179] [T0] [192.248.156.201:36770>2.223.152.152#xxxxx.co.uk] Abort request processing by PID:56279, kill: 1, begin time: 3, sent time: 3, req processed: 30
2025-04-13 19:42:11.703345 [NOTICE] [2303179] [T0] sendKillCmdToWatchdog: 'extappkill:56279:-3:0'.
2025-04-13 19:42:12.700474 [INFO] [2303179] [T0] [147.78.3.13:12347-H3:7C275FFA95E3C447-0>213.180.203.204#xxxxx.co.uk] Access is denied by context rewrite.
2025-04-13 19:42:27.390287 [NOTICE] [2303176] [T0] [2303176] Cmd from child: [extappkill:56248:-3:0]
2025-04-13 19:42:27.390396 [INFO] [2303176] [T0] [CLEANUP] Send signal: 15 to process: 56248 (ib_io_rd-2)
2025-04-13 19:42:27.500768 [INFO] [2303176] [T0] [CLEANUP] Process 56248 (ib_io_rd-2) wont stop after SIGTERM, send SIGKILL.
2025-04-13 19:42:27.390135 [INFO] [2303179] [T0] [192.248.156.201:57228>2.223.152.152#xxxxx.co.uk] Abort request processing by PID:56248, kill: 1, begin time: 4, sent time: 4, req processed: 142
2025-04-13 19:42:27.390140 [NOTICE] [2303179] [T0] sendKillCmdToWatchdog: 'extappkill:56248:-3:0'.

We have been capturing all of the running processes every 15 seconds and here is the mysql process just before it was killed (you can see it was running since April 6th)

mysql 56235 5.3 19.2 4997680 3138128 ? Ssl Apr06 498:59 /usr/sbin/mysqld

The 2 processes mentioned in the LiteSpeed error log (56279 and 56248) were not picked up by the process capture, so we have no idea what they were. Given their low IDs, process ID recycling must be in effect.
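One possible reason for the gap, assuming these IDs were thread IDs rather than process IDs (the LiteSpeed log labels 56248 as ib_io_rd-2, which looks like a thread name rather than a program): a plain process listing does not show threads. A minimal sketch of a capture that includes thread IDs, assuming procps `ps` is available. The function names are hypothetical, and the sample output is constructed from the IDs in the logs above rather than captured from the server.

```python
import subprocess
from typing import Dict, Tuple


def snapshot_threads() -> str:
    """List every thread on the system with its owning process ID."""
    result = subprocess.run(
        ["ps", "-eL", "-o", "pid,tid,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_threads(ps_output: str) -> Dict[int, Tuple[int, str]]:
    """Map thread ID -> (owning process ID, thread name)."""
    threads: Dict[int, Tuple[int, str]] = {}
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 2)
        # The header line is skipped because its columns are not numeric.
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            threads[int(parts[1])] = (int(parts[0]), parts[2])
    return threads


# Constructed sample (an assumption, not real capture data).
SAMPLE = """\
    PID     TID COMMAND
  56235   56235 mysqld
  56235   56248 ib_io_rd-2
"""

owner_pid, name = parse_threads(SAMPLE)[56248]
print(f"tid 56248 ({name}) belongs to pid {owner_pid}")
```

Saving `snapshot_threads()` output on the same 15 second schedule would let a later lookup by thread ID succeed.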

So LiteSpeed tried to kill 2 processes that did not exist and ended up killing MySQL? I checked our pid_max value and it's 4194304, so no issues there.

If this is happening to other users, it would explain why we are seeing MySQL being terminated for what appears to be no reason: it is not an OOM kill but a LiteSpeed kill. So, why is LiteSpeed killing MySQL?

gmakhs

MarkD I had my LiteSpeed killed today, not by OOM. I haven't experienced the MySQL issue in a while.

I can't tell why :/

twest

MarkD good detective work, getting us closer to a resolution. I'm also experiencing this, seems totally random at different times and different servers, maybe 1 server per week out of the bunch. Definitely could be Litespeed related, I'm also using it.

MarkD

We have been working with LiteSpeed technical support to identify the reason for it shutting down the MySQL service and we now have a working theory and possible fix.

It would appear that under some circumstances when a MySQL thread uses a recycled lsphp process id it can get shut down by LiteSpeed. LiteSpeed is issuing a kill -9 to the thread process id but this actually gets logged in the Audit trail as a kill -9 to the main MySQL process id so it shuts down.

This explains why we have only ever seen this problem on our busier servers. All of our servers pid_max are set to the default of 4194304 (cat /proc/sys/kernel/pid_max) and that range is easily being hit so pid recycling starts to happen.
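For anyone who wants to check an ID by hand, `/proc/<id>/status` reports both the thread ID (`Pid`) and the process it belongs to (`Tgid`). The sketch below uses hypothetical helper names and a constructed sample; it illustrates the kind of check that tells a thread of another process apart from a standalone process, and is not a description of how LiteSpeed's fix works.

```python
from pathlib import Path
from typing import Dict


def read_pid_max(proc_root: str = "/proc") -> int:
    """Same value as `cat /proc/sys/kernel/pid_max`."""
    return int(Path(proc_root, "sys/kernel/pid_max").read_text().strip())


def parse_status(status_text: str) -> Dict[str, str]:
    """Parse the 'Key:<tab>value' lines of a /proc/<id>/status file."""
    fields: Dict[str, str] = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields


def is_thread_of_other_process(task_id: int, proc_root: str = "/proc") -> bool:
    """True if task_id is a thread whose owning process has a different ID."""
    text = Path(proc_root, str(task_id), "status").read_text()
    return int(parse_status(text)["Tgid"]) != task_id


# Constructed sample (an assumption): what the status file of a MySQL I/O
# thread could look like, using the IDs from the logs in this thread.
SAMPLE_STATUS = "Name:\tib_io_rd-2\nTgid:\t56235\nPid:\t56248\n"

fields = parse_status(SAMPLE_STATUS)
print(f"{fields['Name']}: thread {fields['Pid']} of process {fields['Tgid']}")
```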

LiteSpeed support have provided a test fix for this in their debug builds but as of yet we have been unable to get the debug build to work properly under enhance so we can't comment on if this has fixed it yet.

If you want to test this yourself then feel free to install the latest debug build:

Please be aware that this is a DEBUG build you are installing so I accept no responsibility for what it does

/usr/local/lsws/admin/misc/lsup.sh -d -f -v 6.3.2

That being said, if you have any problems you can easily revert back to the latest STABLE build with:

/usr/local/lsws/admin/misc/lsup.sh -f -v 6.3.2

As I find out more I will let you know.

MarkD

The DEBUG build has not fixed the problem 🤬 - I have reported this back to LiteSpeed

gmakhs

MarkD Inform LiteSpeed that, because of the way Enhance handles PHP, LiteSpeed has no power over the lsphp processes and can't terminate or restart them. That might be part of the issue.

MarkD

gmakhs In that case in the Litespeed configuration should we not set "External application abort" to "no abort" in Server->General tab, under General Setting?

It won't fix the bug with the wrong process ID, but it will at least stop LiteSpeed trying to shut down the process in the first place?

SystemFreaks

MarkD We are also dealing with the same problem. It seems to be Enhance/LiteSpeed related.
I also opened a ticket with LiteSpeed and I am thinking the same as you about "no abort", but I wouldn't try it without consulting LiteSpeed's support first.

MarkD

We have a fix!

Finally Litespeed have acknowledged and provided a fix for this problem with their latest debug build. We have been running this version for 3 days now and not a single process has been killed 🥳

If you want to install the fix now you will need to install the debug build of 6.3.3.

Please be aware that this is a DEBUG build you are installing so I accept no responsibility for what it does

/usr/local/lsws/admin/misc/lsup.sh -d -f -v 6.3.3

If you have any problems you can easily revert back to the latest STABLE build with:

/usr/local/lsws/admin/misc/lsup.sh -f -v 6.3.2

Right, onto the next problem now where our Journal logs are constantly being cleared every few hours 🤦‍♂️

twest

Awesome, thanks for the follow-up! Hopefully that makes its way into general release soon 🙂
