For some reason, since we upgraded to v12 our journal logs are regularly getting wiped, to the point that we never have more than a few hours' worth. And if we run the journalctl --rotate command, everything disappears.
We have triple-checked the settings and we know they are set up correctly, with persistent storage and what should be effectively unlimited storage (for testing this), but the logs still get wiped every time.
We have over 30 servers running Enhance at the moment and we are only seeing this problem on 10 of them, but those 10 happen to be the ones with more than 20 websites on them, so we are thinking maybe it's something to do with load or the number of overlays?
2 of the servers only have the Backup role installed and another only has the email role, so it's not role specific.
What we do know after countless hours of investigating is that the journal files frequently get corrupted (journalctl --verify shows this). This may be one reason why they are being cleared, as apparently --rotate clears the logs if it sees any corruption, but that doesn't happen 100% of the time.
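For anyone who wants to compare numbers, this is a rough sketch of how the corruption can be counted per server. The helper names count_failed_journals and list_failed_journals are made up for this post, and the sketch assumes that journalctl --verify reports each bad file on a line starting with "FAIL:" followed by the path.

```shell
# Hypothetical helpers: both read `journalctl --verify` output on stdin.
# Assumption: failed files are reported as "FAIL: <path> (<reason>)".

# Print how many journal files failed verification.
count_failed_journals() {
    # grep -c exits non-zero when the count is 0, so mask that.
    grep -c '^FAIL:' || true
}

# Print only the paths of the journal files that failed verification.
list_failed_journals() {
    sed -n 's/^FAIL: \([^ ]*\).*/\1/p'
}
```

Usage would be something like journalctl --verify 2>&1 | count_failed_journals, run as root. The 2>&1 is there because the PASS/FAIL lines may be written to stderr rather than stdout.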
I have tried nuking the logs with these commands:
systemctl stop systemd-journald.socket
systemctl stop systemd-journald-dev-log.socket
systemctl stop systemd-journald.service
rm -rf /var/log/journal
mkdir -p /var/log/journal
chown root:systemd-journal /var/log/journal
chmod 2755 /var/log/journal
systemctl start systemd-journald.socket
systemctl start systemd-journald-dev-log.socket
systemctl start systemd-journald.service
After nuking them the problem does go away and we can use the --rotate command without any problems, but the problem reappears the next day. Incidentally, after nuking them we only see 2 journal files get re-created, the system one and user-1000.journal; we don't see any of the other user-xxxx.journal files until the next day.
Today I did spot this in the logs on one of the servers that has the Application + Database roles (it does not have the email role):
srs_milter[4109618]: can't create event pipe: Too many open files
Do we need the srs_milter service on Application servers?
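In case it helps with the "Too many open files" error, this is a minimal sketch of how the descriptor usage of a process can be compared against its limit on Linux. The function name fd_usage is made up for this post, and it assumes /proc is mounted and that it is run as root so other users' processes are readable.

```shell
# Hypothetical helper: print "<open fds> <soft limit>" for a given PID.
# Relies on the Linux /proc filesystem.
fd_usage() {
    pid="$1"
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    # The 4th field of the "Max open files" line is the soft limit.
    limit=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "$open $limit"
}
```

Something like fd_usage "$(pgrep -o srs_milter)" should then show how close the process is to its limit. The process name here is taken from the log line above, so it is an assumption that pgrep will match it under that name.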
And as for the journals, is there just too much logging going on in v12 when there are multiple users, or is there something else that I can check? I am at a loss 🤷‍♂️