I have one virtual server with around 30 websites on it (OLS) and investigated why the load is so high (a constant 0.9, even though most of the websites use a Redis cache).
In the log files I found out that a lot of AI bots are crawling the sites constantly.
After I added a rule to fail2ban, these bots are now blocked (they come from thousands of different IPs, so it takes about an hour until most of them end up on the fail2ban ban list).
It would be nice to have an option in Enhance to block parts of user agents server-wide and also to block whole countries.
The steps to block the AI bots:
apt install fail2ban
Create a file /etc/fail2ban/filter.d/badbots.conf with this content (here you can extend the list of bots):
[Definition]
failregex = ^"<HOST>" "\d+" "\S+ \S+ \S+" "\d+" "\d+" "\d+" "\S*" ".*(GPTBot|bingbot|Amazonbot|BLEXBot|MJ12bot|ClaudeBot).*"
ignoreregex =
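Before enabling the jail, the filter can be tested against one of the existing log files with fail2ban-regex; it reports how many lines matched the failregex. The filename below is just a placeholder, pick any real file from the webserver_logs directory:

fail2ban-regex /var/local/enhance/webserver_logs/example.log /etc/fail2ban/filter.d/badbots.conf

If the matched count stays at 0, the log format probably differs from what the failregex expects and the regex needs adjusting.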
Create a file /etc/fail2ban/jail.local with this content (with bantime = 3600 the IPs are banned for 1 hour; I will set it to 1 day soon, see the note after the config):
[badbots]
enabled = true
port = http,https
filter = badbots
logpath = /var/local/enhance/webserver_logs/*.log
backend = polling
maxretry = 1
bantime = 3600
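bantime is given in seconds, so for the 1-day ban mentioned above the line would simply become:

bantime = 86400

The rest of the jail stays the same; just restart fail2ban after the change (next step).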
Then restart fail2ban:

systemctl restart fail2ban
After that we can watch the blocking orgy with:

tail -f /var/log/fail2ban.log
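To see how many IPs the jail has caught so far (and the full list of banned addresses), fail2ban-client can be used; the IP in the second command is just a placeholder, in case a legitimate crawler gets caught and needs to be unbanned:

fail2ban-client status badbots
fail2ban-client set badbots unbanip 203.0.113.10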
Just to clarify: only do this if you don't mind that the AI bots can no longer index the websites. It could have an impact when people ask ChatGPT something like "give me some websites that sell flowers in London" and the bots don't have your hosted websites in their index. I don't know in detail what happens behind the scenes.