This is a head scratcher! Last month, I spent the better part of a week trying to work out why a client's site database was skyrocketing to over 17GB on disk, while the actual data in it was only just over 400MB.

Follow me on a journey of spam, WordPress transients, plugins, and InnoDB overhead.

What are we working with?

The site is, as I have already mentioned, a WordPress site. It uses a number of plugins, the main ones being an events calendar, WooCommerce, and Dokan. It's a marketplace site for local creators, think Etsy. I don't believe it has a huge amount of traffic, not that you'd guess that from the CPU usage spikes.

I recently migrated the site's hosting for the client to save costs. The migration meant more attention was being paid to the logging to check for any issues, and the new platform also has more monitoring and stats than the site had previously. This is how I was able to identify the main symptom that will keep repeating throughout this post: the database's disk size versus its data size.

Identifying the symptoms

The site's new hosting environment is a cPanel-based platform. I quite like cPanel; it provides a decent interface and key stats. One of those stats is "Database Size". The hosting partner I use restricts this to about half of the overall quota allocated to the account, and I have 50GB allocated to this account.

After the migration, everything looked good. Later in the evening, I logged into cPanel to check how it was doing. The first thing I saw was that the disk usage had shot up a lot, by around 9GB. I then noticed the database disk usage was over 9GB. Given the SQL dump I transferred was about 900MB uncompressed, that was a bit larger than expected.

I logged into phpMyAdmin and checked what it was reporting as the database size. At the time, this was around 900MB. That didn't match up with cPanel, but maybe cPanel was still updating its stats following the migration, as the site did have about 24GB of other files, images, etc. I did some initial checks, and one table I found taking up most of the space was a sessions table. It contained over 500MB of data, and a review of what was in there revealed it was mostly expired, very old data, so I cleaned out the old, expired rows. Which brings us to the roughly 400MB of data in the database.
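
If you want to run the same kind of initial check, a couple of queries along these lines will do it. The names below assume a default wp_ table prefix and that the sessions table in question is WooCommerce's; adjust them for your own setup, and take a backup before deleting anything.

```sql
-- Find the largest tables in the database (swap in your own schema name)
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb
FROM information_schema.tables
WHERE table_schema = 'example_wp_db'
ORDER BY (data_length + index_length) DESC
LIMIT 10;

-- WooCommerce stores sessions with a Unix-timestamp expiry, so old expired
-- rows can be cleared out directly
DELETE FROM wp_woocommerce_sessions
WHERE session_expiry < UNIX_TIMESTAMP();
```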

Fast-forward to the following day... I logged into cPanel again, and now I saw a yellow warning next to the disk usage: we were at 80% of the quota. Just under it was the database disk usage, now at over 17GB. Okay, that's not right at all. Checking the size phpMyAdmin reported, it was a little larger than before, but only by a MB or two, not 16.5GB.

At this point, I assumed it was a cPanel issue, so I opened a ticket with my provider. Turns out, not a cPanel issue...

An optimise an hour keeps the disk lower...

The hosting provider was able to look into this further, since they have direct access to the underlying DB files. They identified that the usermeta table had gained significant overhead, along with a few other tables, and triggered a table optimise on all tables. Once cPanel had refreshed its quota cache, this resolved the issue. We were now back down to around 500MB of database usage; a little extra is fine, since cPanel calculates the figure from the size of the files on disk rather than from the information schema.
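
I don't know exactly what the provider ran, but the equivalent you can run yourself from phpMyAdmin is a standard OPTIMIZE TABLE, something like this (again assuming the default wp_ prefix):

```sql
-- Rebuild the worst offenders to reclaim their free space. For InnoDB tables,
-- MySQL maps OPTIMIZE to a recreate + analyze, so it needs enough spare disk
-- to copy each table while it works.
OPTIMIZE TABLE wp_usermeta, wp_options;
```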

I left it overnight again before checking the following morning. Now we were at 14GB. Same issue; the hosting provider said that, again, there was significant overhead in the DB, and that the optimise command should just be run via phpMyAdmin as needed. Except triggering this effectively hourly to keep the database size down is not realistic. So, now the investigation begins.

Spam, transients, and bad plugins

Further investigation and research over the next couple of days made it clear we were dealing with a few different things that were causing our tables to have trouble cleaning up data.

Spam users

First up: spam users and open registrations. The site had a customer-facing registration area for purchasing or selling on the site, plus the register link on the admin login page was also enabled. So first, I introduced a captcha on the customer-facing registration form to try to cut down on spam sign-ups. Next, registrations via the admin screen were disabled.

Next, it was time to deal with the data left behind by existing spam users. I wrote a query that returns users who are not associated with any of the linked data on the site; if they are, they're far less likely to be spam. This took some time to craft, balancing getting rid of as much spam data as possible against not impacting legitimate users. It allowed us to delete ~40,000 records from the users table and over 1 million records from the usermeta table.
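
The real query was very site-specific, so treat the following as a rough sketch of the idea rather than what I actually ran. It assumes the default wp_ prefix and uses posts, comments, and (for older WooCommerce versions) the _customer_user order meta as the "linked data"; your own criteria will differ.

```sql
-- Sketch: users with no posts/products, no comments, and no orders attributed
-- to them. Review the results carefully before deleting anything.
SELECT u.ID, u.user_login, u.user_registered
FROM wp_users u
LEFT JOIN wp_posts p    ON p.post_author = u.ID
LEFT JOIN wp_comments c ON c.user_id = u.ID
LEFT JOIN wp_postmeta o ON o.meta_key = '_customer_user' AND o.meta_value = u.ID
WHERE p.ID IS NULL
  AND c.comment_ID IS NULL
  AND o.meta_id IS NULL;

-- Once happy with the list, delete the users and their usermeta in one go
DELETE u, um
FROM wp_users u
LEFT JOIN wp_usermeta um ON um.user_id = u.ID
WHERE u.ID IN (123, 456); -- IDs confirmed as spam from the SELECT above
```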

Transients and bad plugins

It was initially thought, both by myself and some helpful folks over on the WordPress subreddit, that the open registration and the volume of spam users/data was what was causing the issues. Especially since my hosting partner had identified those tables as being the main problem.

Unfortunately, the next day we were back at ~15GB of database usage...

I then decided to change tactics. I started to look at the data_free reported by InnoDB. I didn't start doing this until after optimising the tables first, as I wanted to avoid any issues with running out of disk space. While watching this, I noticed that:

1. the options table was seeing a lot of writes and deletes, with data effectively being constantly replaced

2. the data_free on the options table was growing rapidly over time
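
The data_free figure comes straight from the information schema; something like this, re-run every so often, is enough to watch the overhead grow (swap in your own schema name and table prefix):

```sql
-- Reclaimable-but-unreclaimed space per table; data_free growing while the
-- reported data size stays flat is exactly the overhead in question
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb,
       ROUND(data_free / 1024 / 1024, 1) AS data_free_mb
FROM information_schema.tables
WHERE table_schema = 'example_wp_db'
  AND table_name IN ('wp_options', 'wp_usermeta')
ORDER BY data_free DESC;
```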

What could be causing this? Surely we're not constantly updating/replacing options, right... Well, I'd forgotten that WordPress uses this table for its transients, effectively a form of cache for various things. Looking into this further, I found that WooCommerce and Dokan were both causing a very large number of writes to this table, replacing or updating transients very often - I suspect via jobs being performed during user visits to calculate counts etc.
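
You can get a feel for how much of the options table is transients with a couple of quick queries (again assuming the default wp_ prefix; note that each expiring transient is stored as two rows, the value and its timeout):

```sql
-- How many transient rows are sitting in the options table, and roughly how big
SELECT COUNT(*) AS transient_rows,
       ROUND(SUM(LENGTH(option_value)) / 1024 / 1024, 1) AS approx_mb
FROM wp_options
WHERE option_name LIKE '_transient_%'
   OR option_name LIKE '_site_transient_%';

-- A rough breakdown by name prefix, to see where they're coming from
SELECT SUBSTRING_INDEX(option_name, '_', 3) AS name_prefix,
       COUNT(*) AS row_count
FROM wp_options
WHERE option_name LIKE '_transient_%'
GROUP BY name_prefix
ORDER BY row_count DESC
LIMIT 20;
```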

My understanding of transients, and the reason for caching in the first place, is to optimise the site when it needs to perform relatively time-consuming computation. You run it once, cache it, then only re-compute when it changes. That was not the behaviour I was seeing, though; instead, the transients were being constantly replaced, which over the span of a few hours left InnoDB unable to clean up the free space, and the overhead spiralled out of control.

It could be bugs in these plugins; they are relatively old versions. However, I can't update them without authorisation from the client, which they haven't given - they do not wish to have the plugins updated. So, I instead went down the route of moving the transients out of the DB and into a persistent object cache.

The hosting platform I moved this client to, at their request, does not support the usual options of Redis and/or Memcached. However, I did find an interesting alternative: a plugin that provides a persistent object cache using OPcache. It effectively writes lots of PHP files containing the serialised data, then lets OPcache hold them in memory. An interesting approach, and it appears to work very well.

After configuring the persistent object cache, I made sure everything still worked (it did), and then started the waiting game again...

The next day, I logged in to check the usage... ~400MB. Wow! Excellent. I left it another day or two, and yep, other than the expected MB or two of growth, it seemed stable. The constant creating and deleting of transients must have been causing some awful fragmentation, making it difficult for InnoDB to clean up and reuse that leftover space.

I would have liked to investigate this further to really get to the bottom of the issue, especially by upgrading the plugins to see whether inefficient old versions were the cause. However, this migration was being completed for free, for reasons, and the client doesn't want to pay for new licences or for the development needed to get the site up to date, so this was the best I could do with what I had to work with.

I really do not enjoy debugging WordPress

Hopefully, if you are experiencing or have experienced something similar, this article has helped you identify possible causes and/or solutions. Maybe you've run into something like this before and would like to share your experience - if so, let me know.

However, the experience I had over the week-plus this took to diagnose is one of the reasons I really do not enjoy using or debugging WordPress. The odd technical choices in core, the random plugins of varying quality, and the general difficulty of debugging made this a tricky issue to pin down. It isn't something I've ever experienced with larger, more active products than this site, and it just reinforces my opinion that anything other than a simple blog or informational site is likely better off on an alternative platform.

I'm available for Laravel, Statamic, and PHP projects!

Thanks for reading this article. If you're in the market for a developer for your next project, or for upgrades and extensions to your existing project, get in touch with me via my Grizzly Pumpkin brand.

I have over 8 years' experience building complex, high-performance websites and web applications in Laravel and Statamic, or just plain ol' PHP. I'd love to discuss your project further!