Jump to content

Optimising pages_meta - a recent disk usage problem and Claude's suggestions...


Recommended Posts

Posted

While trying to find the cause of disk usage increasing by several GB per week, I discovered a file inside /var/lib/mysql that was indeed increasing by several GB per week, sometimes up to a GB a day.

This file is used for fulltext indexing: fts_000000000000dbd8_0000000000013612_index_1.ibd

By running optimisation on the database tables, I was able to 'reset' the file. I thought it was maybe a bug in the database, so tried setting up a fresh database copy, but the issue persisted.

Time to call Claude... Here's the full conversation: https://claude.ai/share/94de4c96-4a04-4fae-b3c5-de95225316bd

Long story short, I was using pages_meta to record statistics tied to pages/users, so write operations on the table were pretty frequent. It seems a bit absurd to me that this could cause a index file to balloon to such a ridiculous size (it was 100GB when first discovered) as it isn't a high traffic site and the amount of data in the table is minimal (2MB). However because of the frequent writes, the indexing file was being frequently appended to and for some reason it doesn't prune itself it just keeps expanding. I've re-written the stats so they are stored on the filesystem instead of the database, so far that seems to be working.

Claude brought a couple of things up, which would probably be worth reviewing @ryan:

Why is there a fulltext index on the data column?

Quote

Questions worth asking yourself:

  1. Do you actually use MATCH ... AGAINST queries on this table? If the FULLTEXT index isn't being actively used for search queries, you can simply drop it — that alone will stop the file growing and let you reclaim all that space
  2. What's in data? If it's storing large structured content (JSON, HTML, serialised data), a FULLTEXT index on it is particularly wasteful since most of the tokenised content won't be meaningful search terms.
  3. Is search on this table critical? If you do need full-text search, with only 4,000 rows you could likely get away with a LIKE query or move search to a dedicated tool like Elasticsearch/Meilisearch rather than bearing the overhead of InnoDB's FTS.

No primary key

Quote

The table has no PRIMARY KEY. InnoDB requires one internally, so it silently creates a hidden 6-byte row ID. This is inefficient and also means OPTIMIZE TABLE has to work harder when rebuilding. You should add one — source_id and nametogether look like a natural composite key

Ultimately, I think the issue here is how I've gone about using pages_meta - It isn't the right solution for frequently updated datasets - but I thought I'd pass this on for consideration, and also for anyone else who might find themselves in the puzzling situation of a ballooning index_1.ibd file.

Cheers,

Chris

  • Like 1
Posted
2 hours ago, nbcommunication said:

Ultimately, I think the issue here is how I've gone about using pages_meta - It isn't the right solution for frequently updated datasets - but I thought I'd pass this on for consideration, and also for anyone else who might find themselves in the puzzling situation of a ballooning index_1.ibd file.

Kudos! Thanks for taking the time to share.  I can clearly picture myself making the same assumptions about page_meta haha.

What was the type of data you were storing? JSON?

  • Like 1
Posted

Hi @elabx,

Yes, but very small amounts. The kicker here is that it doesn't actually matter what is being stored, it is the frequency of writes that causes the index file to balloon. This caused me a lot of confusion, as I have sites where I store much larger JSON values in pages_meta and they don't appear to have the problem. Ultimately they do, but because writes are much less frequent the index file hasn't grown that much in comparison.

Cheers,

Chris

  • Like 1
Posted

Ok got it!

Maybe this is actually a bug? I sort of remember @ryan mentioning that meta weren't really built with search in mind?  So maybe just drop the index creation and drop it on updates?

Posted

@elabx - that's what Claude suggests. It looks more like a poor implementation in InnoDB itself to me but I'm sure there's reasons for why it behaves the way it does.

Either way, I'd definitely be interested to hear what @ryan thinks.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...