Possible dead lock (OLS)

wav3front

Hi all,

I run a busy site, and every now and then it completely freezes for about a minute.
I can see this notice in the OLS logs:

No request delivery notification has been received from LSAPI application, possible dead lock.

Has anyone had a similar issue?

I just raised PHP_LSAPI_CHILDREN from 10 to 30, and also Max Connections for lsphp to 30.

It's a real shame that these changes aren't persistent on Enhance. Will they stay if I don't do anything in the panel?

Rich

wav3front Updates within the panel would more likely change the vhost configs. It's updating the Enhance panel version that causes the Docker OLS to update, thereby undoing your web console settings.

I've not seen this as an issue on a variety of setups, from VPSs to dedis. What's your setup? Maybe there's an I/O wait issue somewhere?

slimx

And what are the server specs?

wav3front

It's a 4464P server with 128 GB of RAM and NVMe drives.

It's not the hardware. The load is not high.

Rich

wav3front OK, at least you've tried to tune it for more throughput. What about DB slow queries? Is it a WP site? A 1-minute pause is pretty significant; a bad plugin, maybe?

Honestly I am not sure what else to suggest.

wav3front

I just noticed this.
By default, this is set in the lsphp environment on Enhance:
LSAPI_AVOID_FORK=200M

But according to the documentation:
https://docs.litespeedtech.com/lsws/extapp/php/configuration/options/

isn't this setting binary, 0 or 1?
Is this an error?

Rich

wav3front

"LSAPI_AVOID_FORK=1 will only keep the child processes alive if there is enough available memory. By default "enough" is set to 1GB, so if your server has less than 1 GB available, setting LSAPI_AVOID_FORK=1 will not work. Instead you can set a limit, as in LSAPI_AVOID_FORK=100M. This will allow the LSAPI_AVOID_FORK variable to work as expected."

I'd say it's fine; it's telling it to use a 200 MB threshold instead of 1 GB.

You could raise that to 500M, or even 1, if it's a server dedicated to a big website. 1 means zombie processes might exist, which, with the RAM you've got, isn't likely to be an issue.
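To make that reading of the docs concrete, here is a rough Python sketch of how the three forms of the value (0, 1, or a size such as 200M) map to a free-memory threshold. The helper name is made up, it is not LiteSpeed code, and treating K/M/G as binary multiples is an assumption the quoted docs don't spell out:

```python
import re

# Hypothetical helper (not LiteSpeed code): interprets LSAPI_AVOID_FORK values
# the way the quoted documentation describes them.
DEFAULT_THRESHOLD = 1024 ** 3  # "enough" free memory defaults to 1 GB per the docs

# Assumption: size suffixes are binary multiples (1M = 1024 * 1024 bytes).
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def avoid_fork_threshold(value: str):
    """Return None when avoid-fork is off, else the free-memory threshold in bytes."""
    v = value.strip().upper()
    if v == "0":
        return None  # feature disabled
    if v == "1":
        return DEFAULT_THRESHOLD  # enabled with the default 1 GB threshold
    m = re.fullmatch(r"(\d+)([KMG])", v)
    if not m:
        raise ValueError(f"unrecognised LSAPI_AVOID_FORK value: {value!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


print(avoid_fork_threshold("200M"))  # 209715200, i.e. 200 MB instead of 1 GB
```

So 200M is not an error under this reading; it is the "set a limit" form from the quote, just with a lower bar than the 1 GB default.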

Are you using any hard limits on a package that the website is a part of?

wav3front

Rich

Thanks for the info.

No, I'm not using any limits. It's a dedi server.

This 1-minute freeze keeps happening even with the increased OLS settings. I believe it's a PHP/database issue, so I need to dig deeper.

Rich

wav3front I'm personally thinking it's the DB, so maybe have a look here, although I wouldn't rule out PHP workers entirely.

wav3front

Hi,

I cannot find a solution to this.

There's nothing else in the OLS log; the only error is "No request delivery notification has been received from LSAPI application, possible dead lock."
Raising limits like PHP_LSAPI_CHILDREN or Max Connections makes no difference.
Nothing in the PHP logs that could at least give me a hint.
Nothing in the MariaDB logs that could give me a hint.
The MariaDB slow query log does contain some entries, but they seem to be unrelated to the issue: there are too few of them, and the timing doesn't match.

I'm totally out of ideas. I've started doing random things like changing PHP versions... I don't know what to do.
These aren't WP websites. One is XenForo, the other is an old classifieds application called Oxy Classifieds, and Revive Ad Server also runs on the same server.

Any ideas?

Rich

wav3front

Do you notice any 500 or 503 errors when seeing these deadlock messages?

Do you have any package resource restrictions at all? If so, what about setting up a separate package with no restrictions and seeing if one of these sites still has issues? Possibly an nproc or I/O issue? I used to see CPU spikes with some restrictions on.

As for OLS, have you looked at initTimeout and pcKeepAliveTimeout?

Have you tried Xdebug or deep PHP logging? Could it be a poorly written PHP script causing infinite loops, or some really long-running threads? Have you tried raising the PHP execution timeout to something ungodly to see if you can catch something in htop?

Is it even possible to put these sites, or even one site, on a separate server, so that you can try some wild settings? Try alternative web servers, just in case.

Do you have this or see anything in it? /usr/local/lsws/logs/stderr.log

Rich

There are some other things you could do to affect the sites individually using .htaccess, which might give you some more tools to play with.

https://docs.litespeedtech.com/lsws/cp/cpanel/long-run-script/#easiest-solution

wav3front

Hi @Rich , thanks for getting back to me.

I haven't personally seen a 500 when viewing the websites; all I get is a complete freeze for 1-2 minutes.
The log does contain some, though:

2024-12-18 17:00:31.027846 NOTICE [440899] [***.**.194.55:38960:HTTP2-3#*****.gr] No request delivery notification has been received from LSAPI application, possible dead lock.
2024-12-18 17:00:30.096404 NOTICE [440900] [***.***.251.176:63184:HTTP2-303#*****..gr] ExtConn timed out while connecting.
2024-12-18 17:00:30.096490 NOTICE [440900] [***.***.251.176:63184:HTTP2-303#*****..gr] oops! 503 Service Unavailable
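To line these notices up against the slow query log or monitoring graphs, a rough Python sketch could pull the symptom lines out of the OLS log and group them into freeze windows. The regex is inferred only from the three lines above, and the symptom strings, the 30-second grouping gap, and the function names are my own assumptions:

```python
import re
from datetime import datetime, timedelta

# Layout inferred from the three NOTICE lines above: timestamp, level, [pid], [context], message.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) (\w+) \[(\d+)\] \[([^\]]*)\] (.*)$"
)
# Symptom strings copied from that excerpt; extend as needed.
SYMPTOMS = ("possible dead lock", "ExtConn timed out", "503 Service Unavailable")


def parse_events(lines):
    """Return sorted (timestamp, context, message) tuples for symptom lines."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts, _level, _pid, ctx, msg = m.groups()
        if any(s in msg for s in SYMPTOMS):
            events.append((datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f"), ctx, msg))
    return sorted(events)


def incidents(events, gap=timedelta(seconds=30)):
    """Merge events at most `gap` apart into (start, end, count) windows."""
    windows = []
    for ts, _ctx, _msg in events:
        if windows and ts - windows[-1][1] <= gap:
            start, _end, count = windows[-1]
            windows[-1] = (start, ts, count + 1)
        else:
            windows.append((ts, ts, 1))
    return windows
```

Feeding it an open log file (e.g. `incidents(parse_events(open(path)))`, where the path to the OLS error log inside Enhance's container is up to you) gives one start/end pair per freeze, which is easier to compare against other timestamps than scrolling raw logs.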

Do you have any package resource restrictions at all? If so, what about setting up a separate package with no restrictions and seeing if one of these sites still has issues? Possibly an nproc or I/O issue? I used to see CPU spikes with some restrictions on.

No, there are no limits. Those websites run on a dedicated server with plenty of resources.

As for OLS, have you looked at initTimeout and pcKeepAliveTimeout?

I'm trying to find those in the OLS admin. Where are they?

Have you tried Xdebug or deep PHP logging? Could it be a poorly written PHP script causing infinite loops, or some really long-running threads? Have you tried raising the PHP execution timeout to something ungodly to see if you can catch something in htop?

Is it even possible to put these sites, or even one site, on a separate server, so that you can try some wild settings? Try alternative web servers, just in case.

Do you have this or see anything in it? /usr/local/lsws/logs/stderr.log

stderr contains nothing related, only:

2024-12-02 10:01:18.800 [STDERR] sh: 1: /usr/sbin/sendmail: not found

I suspect this is an issue with Oxy Classifieds. It's a badly written application. It has run for decades without issues, but my thinking is that it previously ran on MySQL 5 and now runs on MariaDB 11, so maybe MariaDB handles bad queries differently. But there is nothing worth noting in the slow query log or the MariaDB error log.

The only thing I can do now is start separating the applications onto different servers, so that I can at least "make sure" that this is indeed coming from Oxy, as my hunch is telling me.

Rich

wav3front If it's that old and came from MySQL 5.x, what's the table type? MyISAM? InnoDB took over as the default storage engine; surely MariaDB would have complained a bit. What collation is it using? Is there any way of checking, and spinning off a clone to play with these, if they're even applicable?

initTimeout is under: External App > SAPI > lsphp: Initial Request Timeout (secs)

In the same area there is Connection Keep-Alive Timeout.

He had his under the wsgiDefaults config setting, but that's less relevant within the web config, as that's for Rails/Python/Node; you only need PHP.
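To cross-check what the admin UI actually wrote, a rough Python sketch could read an `extprocessor lsphp { ... }` block and pull out the knobs from this thread: Max Connections (maxConns), PHP_LSAPI_CHILDREN, Initial Request Timeout (initTimeout) and Connection Keep-Alive Timeout (pcKeepAliveTimeout). It assumes the usual OLS plain-text layout (`key value` and `env NAME=value` lines); the function names are hypothetical, and where that file lives inside Enhance's OLS container is not covered here:

```python
import re


def read_extprocessor(conf_text: str, name: str = "lsphp"):
    """Return (settings, env) for one `extprocessor <name> { ... }` block."""
    m = re.search(r"extprocessor\s+" + re.escape(name) + r"\s*\{(.*?)\}", conf_text, re.S)
    if m is None:
        raise KeyError(f"extprocessor {name!r} not found")
    settings, env = {}, {}
    for raw in m.group(1).splitlines():
        parts = raw.strip().split(None, 1)
        if not parts or parts[0].startswith("#"):
            continue  # blank line or comment
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "env" and "=" in value:
            k, _, v = value.partition("=")
            env[k.strip()] = v.strip()
        else:
            settings[key] = value
    return settings, env


def lsphp_summary(settings, env):
    """The settings discussed in this thread, labelled as in the admin UI."""
    children = env.get("PHP_LSAPI_CHILDREN")
    max_conns = settings.get("maxConns")
    return {
        "Max Connections (maxConns)": max_conns,
        "PHP_LSAPI_CHILDREN": children,
        "Initial Request Timeout (initTimeout)": settings.get("initTimeout"),
        "Connection Keep-Alive Timeout (pcKeepAliveTimeout)": settings.get("pcKeepAliveTimeout"),
        # These two were raised together to 30 earlier in the thread.
        "children == maxConns": children is not None and children == max_conns,
    }
```

Reading the file rather than trusting the UI also makes it easy to see whether a panel update has silently reset the values.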

I was reading through this post: here

There was also mention of a zlib setting. I don't think it's that, but you might as well try. What version of the web server are you on? My current default is 1.7.19, and 1.8.2 is available. There's another post on here about updating the OLS web server. It's possible there have been fixes since then, as people were complaining about this as late as 1.7.12 in that post.

wav3front

The OLS version is OpenLiteSpeed 1.8.2.

I doubt this is actually an OLS issue.

Rich If it's that old and came from MySQL 5.x, what's the table type? MyISAM? InnoDB took over as the default storage engine; surely MariaDB would have complained a bit. What collation is it using? Is there any way of checking, and spinning off a clone to play with these, if they're even applicable?

Yes, some tables are actually still MyISAM.

I will switch to InnoDB and report results.

wav3front

Rich initTimeout is under: External App > SAPI > lsphp: Initial Request Timeout (secs)

In the same area there is Connection Keep-Alive Timeout.

Raised those, no change. I just had a freeze for 1 min.

Rich

It might be worth moving that site onto a server where you can switch to Apache. If the issue goes away, it's OLS; if it doesn't, it's something else, most likely PHP or the DB.

Another thing might be to temporarily use Netdata, since you can look through the metrics, correlate them with the pauses, and try to drill down into any I/O waits or CPU spikes. That said, some people are concerned that removing it doesn't clean up properly, so again it might be something to do on a different server. I had some horrible CPU spikes and didn't know what was causing them until I saw with Netdata which client's PHP process it was. You already know which client is the issue, but it might give you more detail across the board to review.

Does this only affect this site, or do the pauses affect other sites at the same time?

Also, have you given this a try? enhance php strace (it talks about looking for hangs).

Beyond this, all I can think of is something like New Relic or PHP X-Ray (I haven't used it, but it might help you get visibility into what's happening). I've set up Xdebug in the past to look into why LSWS was failing on long executions when I configured the server to allow them. It helped, but it's a pain to set up, and it was my own code I was working on.

If you've got PHP set to pretty much what it was before moving the site, then it really could be something with MariaDB.

I know you said you'd used the slow query log, but what about this:
Configure Long Query Time: define the time threshold for a query to be considered "slow". You can set the long_query_time system variable to a value in seconds (e.g. 5 seconds). For MariaDB 10.11 and later, use log_slow_query_time instead.
SET GLOBAL long_query_time = 5; -- or, on MariaDB 10.11+: SET GLOBAL log_slow_query_time = 5;

...and set it to something like 45 seconds, so you can isolate those long pauses and see whether they're caused by the DB. If not, it must be a PHP loop or poor execution. Likewise, what if you set the PHP execution time limit down to, say, 20 seconds? Can you make the site fail instead of hanging?
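With a 45-second threshold, the log should stay small, but a quick filter still helps when lining entries up against the freeze times. A rough Python sketch, assuming the common MariaDB slow-log layout (a `# Query_time: ...` header line followed by `SET timestamp=...;` and the SQL); the function name and default threshold are illustrative:

```python
import re

QUERY_TIME_RE = re.compile(r"^# Query_time:\s*([\d.]+)")
TIMESTAMP_RE = re.compile(r"^SET timestamp=(\d+);")


def slow_entries(lines, min_seconds=45.0):
    """Return (unix_timestamp, query_time, sql) for slow-log entries >= min_seconds."""
    entries, current = [], None
    for raw in lines:
        line = raw.rstrip("\n")
        m = QUERY_TIME_RE.match(line)
        if m:  # a new entry starts at its "# Query_time:" header
            current = {"ts": None, "qt": float(m.group(1)), "sql": []}
            entries.append(current)
            continue
        if current is None or line.startswith("#"):
            continue  # other header lines (Time, User@Host, Rows_affected, ...)
        t = TIMESTAMP_RE.match(line)
        if t:
            current["ts"] = int(t.group(1))
        elif line.strip():
            current["sql"].append(line.strip())
    return [(e["ts"], e["qt"], " ".join(e["sql"])) for e in entries if e["qt"] >= min_seconds]
```

Passing it the open `slow-queries.log` (relative to the MariaDB data directory, per the my.cnf in this thread) returns Unix timestamps that can be compared directly with the OLS notice times. If nothing at 45 seconds or more lines up with a freeze, that points away from the DB.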

I think you're at the stage of trying to isolate the what and where rather than the how. Fingers crossed for you!

wav3front

Rich Also, have you given this a try? enhance php strace (it talks about looking for hangs).

By the way, this doesn't work:

Followed the instructions and I'm getting:

**_com01@*-*:~$ strace -p 22
strace: attach: ptrace(PTRACE_SEIZE, 22): Operation not permitted

@Adam, has something changed since the documentation was written?
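"Operation not permitted" from PTRACE_SEIZE usually comes down to a few standard Linux restrictions: the target PID belonging to a different user, the Yama `kernel.yama.ptrace_scope` setting, or the container not granting CAP_SYS_PTRACE. A rough Python sketch to check the first two from inside the container (the function names are made up, and the third check depends on how Enhance starts its containers, which isn't covered here):

```python
import os
from pathlib import Path

# Meanings of the Yama LSM setting kernel.yama.ptrace_scope (Linux).
PTRACE_SCOPE = {
    0: "classic: a process may trace any process running as the same user",
    1: "restricted: only descendants (or processes that opted in) may be traced",
    2: "admin-only: tracing requires CAP_SYS_PTRACE",
    3: "disabled: no process may be traced",
}


def interpret_ptrace_scope(value: int) -> str:
    return PTRACE_SCOPE.get(value, f"unknown value {value}")


def ptrace_report(pid: int, scope_file="/proc/sys/kernel/yama/ptrace_scope"):
    """Collect common causes of 'ptrace(PTRACE_SEIZE, ...): Operation not permitted'."""
    report = {"is_root": os.getuid() == 0}  # root normally holds CAP_SYS_PTRACE
    p = Path(scope_file)
    report["ptrace_scope"] = (
        interpret_ptrace_scope(int(p.read_text().strip())) if p.exists() else "Yama not enabled"
    )
    try:
        report["same_user"] = os.stat(f"/proc/{pid}").st_uid == os.getuid()
    except FileNotFoundError:
        report["same_user"] = None  # no such PID visible from here
    return report
```

For the case above, `ptrace_report(22)` would show whether PID 22 runs as a different user than com01 or whether Yama is set to restricted mode; either would explain the error without anything in the docs having changed.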

wav3front

Hi, just a quick update: I switched to InnoDB for ALL tables; nothing changed.

@Rich many thanks for the information. I will go through that and report results.

I have noticed something, though:
monitoring htop, some CPU cores go up to 100%, stay there for some time, and even get into "red" territory.
MariaDB is at the top of the list.

Here's my my.cnf file:
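When MariaDB is pinning cores like that, the next step is usually to see which statements are running at that moment. A rough Python sketch that takes the tab-separated output of `mariadb -B -e 'SHOW FULL PROCESSLIST'` (captured during a spike) and lists the non-idle threads that have been running longest; the type and function names are hypothetical:

```python
from typing import NamedTuple, Optional


class Proc(NamedTuple):
    id: int
    user: str
    db: Optional[str]
    command: str
    time: int            # seconds the thread has been in its current state
    state: Optional[str]
    info: Optional[str]  # SQL text, if any


def _null(value):
    return None if value in (None, "NULL") else value


def parse_processlist(batch_output: str):
    """Parse tab-separated output of: mariadb -B -e 'SHOW FULL PROCESSLIST'."""
    lines = batch_output.strip("\n").splitlines()
    header = lines[0].split("\t")
    rows = []
    for line in lines[1:]:
        rec = dict(zip(header, line.split("\t")))
        rows.append(Proc(int(rec["Id"]), rec["User"], _null(rec.get("db")),
                         rec["Command"], int(rec["Time"]),
                         _null(rec.get("State")), _null(rec.get("Info"))))
    return rows


def long_running(rows, min_seconds=10):
    """Non-idle threads running for at least min_seconds, longest first."""
    busy = [r for r in rows if r.command != "Sleep" and r.time >= min_seconds]
    return sorted(busy, key=lambda r: r.time, reverse=True)
```

Matching columns by header name rather than position keeps it working if the column set differs slightly between versions.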

[mysqld]
skip-log-bin

ssl-ca=/etc/certs/mysql/ca.pem
ssl-cert=/etc/mysql/ssl/cert.pem
ssl-key=/etc/mysql/ssl/key.pem

skip-host-cache
skip-name-resolve

default_authentication_plugin = mysql_native_password

innodb_flush_log_at_trx_commit=2
innodb_buffer_pool_size=6G
max_allowed_packet=512M
query-cache-type=1
query_cache_size=52428800
max_connections=500
innodb_flush_neighbors=0
innodb_flush_method=O_DIRECT_NO_FSYNC
innodb_io_capacity=450
innodb_random_read_ahead=ON
table_open_cache=16013

log_output=FILE
slow_query_log
slow_query_log_file=slow-queries.log
long_query_time=5.0

I enabled the query cache because Oxy Classifieds has some very badly written queries, and it's a way to rescue them. What surprises me, though, is that before this website moved to Enhance, the database ran on a much slower Windows 2022 VM running MySQL 5.7 (chosen for its query cache) without any issues. If anything, the CPU load was much lower.

Riddle.

Rich

wav3front You might want to check these ones. I saw redundancy notices in the MySQL logs when I was doing some tweaking recently:

skip-host-cache
skip-name-resolve

default_authentication_plugin = mysql_native_password

Likewise, I've commented out my certs, as when I looked there weren't any certs there anyway.

Your buffer pool size could be much bigger given all the RAM you've got; mine is at 32 GB on a 64 GB server. Here's mine to compare with (note: it's for MySQL, not MariaDB):

[mysqld]
# Binary logging
skip-log-bin                      # Disable binary logging for performance

# SSL settings
# ssl-ca=/etc/certs/mysql/ca.pem  # Commented out for SSL CA
# ssl-cert=/etc/mysql/ssl/cert.pem # Commented out for SSL
# ssl-key=/etc/mysql/ssl/key.pem   # Commented out for SSL

# Character set settings
collation-server=utf8mb4_unicode_ci  # Use utf8mb4 for better Unicode support
character-set-server=utf8mb4          # Use utf8mb4 for better Unicode support

# Basic settings
max_connections=300                   # Increased to allow more concurrent connections
thread_cache_size=16                  # Increased to reduce thread creation overhead

# InnoDB settings
innodb_file_per_table=1               # Keep for better space management
innodb_buffer_pool_size=32G           # Set to 32GB for caching
innodb_buffer_pool_instances=32        # Increased for better concurrency
innodb_log_file_size=512M             # Increased for larger transactions
innodb_log_buffer_size=64M            # Increased for larger transactions
innodb_flush_log_at_trx_commit=2      # Set to 2 for performance with some durability
innodb_flush_method=O_DIRECT           # Keep for performance
tmp_table_size=256M                   # Increased for larger temporary tables
max_heap_table_size=256M              # Increased for larger heap tables
innodb_thread_concurrency=0            # Allow InnoDB to manage concurrency
innodb_read_io_threads=4               # Increased for better read performance
innodb_write_io_threads=4              # Increased for better write performance
innodb_io_capacity=4000                # Increased for better I/O performance
innodb_io_capacity_max=10000           # Increased for better I/O performance
innodb_checksum_algorithm=crc32
innodb_log_compressed_pages=OFF
innodb_change_buffering=all
innodb_redo_log_capacity=8G            # Set to 8G for better recovery performance

# Additional settings
activate_all_roles_on_login=ON         # Keep for role management
host_cache_size=0
performance_schema=ON                   # Enable performance schema
sql-mode="NO_ENGINE_SUBSTITUTION"

# Additional tuning based on MySQLTuner recommendations
join_buffer_size=512K                  # Increase join buffer size
table_definition_cache=6000             # Increase table definition cache

Your server is better specced than mine. It's possible your DB config is a little restrictive on how much table data it can keep in working memory. Mine isn't perfectly tuned, but it's performing a lot better now than it was before its last tuning.
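One way to put a number on the "working memory" point is to compare innodb_buffer_pool_size with the total InnoDB data plus index size. A rough Python sketch; the helper names are made up, and the sizes would come from information_schema.TABLES as noted in the docstring:

```python
# Hypothetical helpers, not MariaDB tooling.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def to_bytes(size: str) -> int:
    """Convert a my.cnf-style size such as '6G' or '512M' to bytes."""
    s = size.strip().upper()
    if s and s[-1] in _UNITS:
        return int(s[:-1]) * _UNITS[s[-1]]
    return int(s)


def buffer_pool_coverage(pool_size: str, table_sizes) -> float:
    """Share of InnoDB data + index bytes that fits in the buffer pool, capped at 1.0.

    table_sizes: (DATA_LENGTH, INDEX_LENGTH) pairs, e.g. from
      SELECT DATA_LENGTH, INDEX_LENGTH FROM information_schema.TABLES
      WHERE ENGINE = 'InnoDB';
    """
    total = sum(data + index for data, index in table_sizes)
    if total == 0:
        return 1.0
    return min(1.0, to_bytes(pool_size) / total)
```

For example, a 6G pool against 12 GB of InnoDB data plus indexes gives 0.5, so half the working set could never be cached. If coverage is already 1.0, making the pool bigger buys little, which might be part of why a tuning tool would suggest shrinking it.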

You have a few rules I'll look into myself like innodb_random_read_ahead=ON and innodb_flush_neighbors=0. Thanks for sharing.

Rich

wav3front Interesting: following the docs, it doesn't work for me either. I tried one of the user PHP PIDs as root, triggered a page reload, and saw files being accessed, etc.

wav3front

Rich I'm always baffled by database tuning, mainly because it's difficult to run real, accurate, "scientific" tests to know how each parameter changes things, for better or worse.

About innodb_buffer_pool_size, I also thought that you simply give it as much as possible, until I ran Releem on a very busy server and one of its recommendations was actually to decrease innodb_buffer_pool_size.

Again, I personally have no way to properly measure each change to my.cnf.
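A crude but repeatable way to judge a single my.cnf change is to collect response-time samples for the same set of pages under similar traffic before and after the change, then compare the median and the 95th percentile. The tail matters more than the average for freezes like these. A minimal Python sketch; where the samples come from (access logs, a load-test tool) is up to you, and the function names are hypothetical:

```python
import statistics


def summarize(samples_ms):
    """Median and 95th percentile of response times (needs at least 2 samples)."""
    cuts = statistics.quantiles(samples_ms, n=20, method="inclusive")  # 19 cut points
    return {"median": statistics.median(samples_ms), "p95": cuts[-1]}


def compare(before_ms, after_ms):
    """Per metric: (before, after, after - before). Negative deltas mean faster."""
    b, a = summarize(before_ms), summarize(after_ms)
    return {k: (b[k], a[k], a[k] - b[k]) for k in b}
```

It isn't a statistical test, so change one parameter at a time and repeat each run a few times; if the deltas swing more between repeats than between configs, the change probably doesn't matter.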