Getting DDoSed by a bot...
For the last few weeks, since around the 26th/27th of April, a site that one of my clients manages (it belongs to their client) had been having intermittent crashes. Each crash was causing a DB table to become corrupt and require manual repairing to bring the site back up. The site would then stay up for a few days before it happened again, and that cycle repeated for the last few weeks or so.
Initial investigation
The cause of this was very unusual; I personally hadn't encountered MySQL tables regularly becoming corrupt before. It's a WordPress site with some quite complex plugins that also appear fairly unoptimised. I don't make a habit of working on these sites, but since I do contract work for this client on occasion, I thought I'd investigate further, as it had piqued my interest.
My first step was to check the log files for WordPress and MySQL. Nothing was super obvious, other than entries about being unable to write to the corrupted table, plus reports from around the same time that the server had crashed.
I'm no DB expert, so I immediately went off to do some research; there are much more knowledgeable people on the internet on this topic. Initially I didn't realise that the database table was MyISAM, which, if I understand correctly from my research, has a habit of becoming corrupt if it is in the middle of an operation when the server crashes. Interesting...
So, I blamed it on bad optimisation of WordPress and the plugins being used. Since I couldn't find anything concrete, I repaired the table, the site was restored, and I called it done.
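For anyone in the same boat, the repair itself is the easy part. From the MySQL console it was roughly along these lines (a sketch; the table name is whatever your corrupted table happens to be):

-- Confirm the table is marked as crashed
CHECK TABLE wp_options;
-- MyISAM-specific repair; this is what brought the site back up
REPAIR TABLE wp_options;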
Couple weeks later
A couple of weeks later I was sent an email from my client (which came from their client, client inception!) saying the site was offline again. Interesting. So, as soon as I could (after I had finished my 9-5), I logged on to have a look.
Same issue: the same table, wp_options, had crashed and become corrupt. Once again, I repaired it so the site was back online, and then checked the logs again to see if there were any patterns.
I couldn't really find anything, other than that some of the records being added were related to WooCommerce's transients for attribute counts. I'm no WordPress/WooCommerce expert, but from my research this is cache data WooCommerce creates for the filtering attributes, I think, so that the counts do not have to be generated on every request.
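If you want to see this for yourself, transients live in wp_options alongside everything else, so a quick query shows which of them are taking up space (a sketch; the LIKE pattern is just a broad match for transient rows, not an exact WooCommerce naming scheme):

-- Largest transient rows in wp_options
SELECT option_name, LENGTH(option_value) AS size_bytes, autoload
FROM wp_options
WHERE option_name LIKE '%transient%'
ORDER BY size_bytes DESC
LIMIT 20;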
Again, nothing out of the ordinary, other than a GitHub issue I found about these transients not being the most optimised and causing large tables (I'm unfortunately having trouble finding the issue again).
So, again, I ruled out anything else and just blamed WordPress and its plugins.
Is WordPress optimisation really this bad...
Fast forward to the day before I'm writing this article. It had now been another week or so since the last crash. This time I was checking the site every now and then, and I noticed it was down late one night. I fixed the table, then made a note to check it again the next morning.
The next morning, the site was down again! This was unusual, so once again I brought the site back up and decided to do some more research. Since MySQL was crashing, I assumed it was perhaps unoptimised, so I performed some checks and took a couple of actions (again, I'm no DB expert):
- Converted any MyISAM tables to InnoDB. The wp_options table was actually MyISAM; I hadn't realised that up to this point.
- Adjusted some MySQL config, disabling name resolution and setting the InnoDB buffer pool size. This was because, by this point, I had seen MySQL running out of memory and causing the crash.
The first change should, I believe, reduce the likelihood of corruption when the server reboots or crashes. I generally find InnoDB to be better anyway.
The second change was more of a scraping-the-barrel attempt, given I was witnessing memory running out.
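For reference, the conversion and the follow-up checks can all be done from the MySQL console with something like the following (a sketch; 'wordpress' is a placeholder for whatever the actual schema is called, and the name-resolution and buffer-pool settings themselves went into the MySQL config file rather than being run as SQL):

-- List any remaining MyISAM tables in the WordPress database
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'wordpress'
  AND ENGINE = 'MyISAM';

-- Convert a table to InnoDB
ALTER TABLE wp_options ENGINE = InnoDB;

-- Confirm the buffer pool size (in bytes) picked up the config change
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';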
After making these changes, I kept htop running on my other monitor while I was doing other things. Then... it happened again:
- The two cores were maxed out at 100%, with even more demand queued, as the server was reporting very high load
- Memory was maxed out at 4GB
- Swap was maxed out too
That's not good. I managed to get into MySQL so I could run SHOW PROCESSLIST; and the returned results all looked normal, just queries reading or updating data. The site is not very high traffic, maybe a hundred or so visits per day. Is WordPress/WooCommerce or the other plugins really this badly optimised...
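If you'd rather filter that output, the same data is exposed via information_schema, which makes it easier to sort by how long each statement has been running (a sketch, not exactly what I ran at the time):

-- Longest-running statements first; INFO holds the query text
SELECT ID, USER, DB, TIME, STATE, INFO
FROM information_schema.PROCESSLIST
WHERE COMMAND != 'Sleep'
ORDER BY TIME DESC
LIMIT 10;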
AI revolution...
The answer to whether WordPress and its plugins really are that badly optimised is... kind of. I do still feel the resource usage from a single page load on this site is pretty terrible. I'm sure an experienced WordPress developer would be able to improve it, but I just find it ridiculous that people take what is, at its core, a blogging platform and turn it into e-commerce stores, marketplaces, etc. That's not what this article is about though...
I made a last-ditch effort to have another look at the web server's access logs; since I had just witnessed the issue, I wanted to see if there was anything unusual. Then a lot of requests caught my eye, with an interesting user agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; claudebot@anthropic.com)
ClaudeBot/1.0, you're making a lot of requests, whoever you are. So I did what anyone would do in this situation: I Googled it!
I immediately came across a Reddit thread about this issue. Interestingly, they had also been seeing it since around the 26th/27th of April. Within that thread was a link to a post on the Linux Mint forum, as they had experienced a similar issue where their forum was taken down by the sheer amount of traffic.
I think it is absolutely ridiculous that a company is being so negligent with their scraper. Searching through the rest of the logs, I found that the bot was hitting every possible combination of filters on this site, multiple times per second. Each request appeared to be recomputing and updating various cached values relating to the WooCommerce attributes, thus exacerbating the existing performance issues in the site and its plugins.
ClaudeBot get lost...
So, how can we tell ClaudeBot to get lost and save our server from all this unnecessary traffic? Well, usually a "good crawler" would check whether it has been disallowed via the robots.txt file. Replies on the Reddit post also seemed to confirm this, so I added the following:
User-agent: ClaudeBot
Disallow: /
After saving that, the requests then stopp... no, they didn't. So, I took the sledgehammer approach. I initially added a rule to block the requests in the WAF installed in WordPress; this way I was able to see the requests coming in and being blocked.
Next, I wanted to block it before it even got to PHP. So, I did some further research for Apache2 (I haven't edited the config for one in some time) and made the following change to the .htaccess file:
<IfModule mod_rewrite.c>
  RewriteEngine On
  RewriteCond %{HTTP_USER_AGENT} ^.*(ClaudeBot).*$ [NC]
  RewriteRule .* - [F,L]
</IfModule>
After saving this file, the WAF log instantly stopped getting new entries from ClaudeBot. Server resource usage has dropped substantially and, so far at the time of writing, the server has not been overloaded again.
Final words
I'm sure there are still some performance issues with the site; whether that is down to WordPress or the plugins, I do not know. Honestly, I don't care that much either. I don't really touch WordPress sites unless it's an emergency, as I just find them horrible to work with, especially when they are being forced to do more than just a blog or informational website.
However, Anthropic, if for some reason you are reading this and have made it to the end: please fuck off. Your lack of respect for sites is causing damage and financial loss to smaller organisations who do not have the resources to handle what is, effectively, a DDoS attack. Respect the robots.txt file, and honestly, stop training your models on other people's work without their permission!