Jump to content

Optimising pages_meta - a recent disk usage problem and Claude's suggestions...


Recommended Posts

Posted

While trying to find the cause of disk usage increasing by several GB per week, I discovered a file inside /var/lib/mysql that was indeed increasing by several GB per week, sometimes up to a GB a day.

This file is used for fulltext indexing: fts_000000000000dbd8_0000000000013612_index_1.ibd

By running optimisation on the database tables, I was able to 'reset' the file. I thought it was maybe a bug in the database, so tried setting up a fresh database copy, but the issue persisted.

Time to call Claude... Here's the full conversation: https://claude.ai/share/94de4c96-4a04-4fae-b3c5-de95225316bd

Long story short, I was using pages_meta to record statistics tied to pages/users, so write operations on the table were pretty frequent. It seems a bit absurd to me that this could cause a index file to balloon to such a ridiculous size (it was 100GB when first discovered) as it isn't a high traffic site and the amount of data in the table is minimal (2MB). However because of the frequent writes, the indexing file was being frequently appended to and for some reason it doesn't prune itself it just keeps expanding. I've re-written the stats so they are stored on the filesystem instead of the database, so far that seems to be working.

Claude brought a couple of things up, which would probably be worth reviewing @ryan:

Why is there a fulltext index on the data column?

Quote

Questions worth asking yourself:

  1. Do you actually use MATCH ... AGAINST queries on this table? If the FULLTEXT index isn't being actively used for search queries, you can simply drop it — that alone will stop the file growing and let you reclaim all that space
  2. What's in data? If it's storing large structured content (JSON, HTML, serialised data), a FULLTEXT index on it is particularly wasteful since most of the tokenised content won't be meaningful search terms.
  3. Is search on this table critical? If you do need full-text search, with only 4,000 rows you could likely get away with a LIKE query or move search to a dedicated tool like Elasticsearch/Meilisearch rather than bearing the overhead of InnoDB's FTS.

No primary key

Quote

The table has no PRIMARY KEY. InnoDB requires one internally, so it silently creates a hidden 6-byte row ID. This is inefficient and also means OPTIMIZE TABLE has to work harder when rebuilding. You should add one — source_id and nametogether look like a natural composite key

Ultimately, I think the issue here is how I've gone about using pages_meta - It isn't the right solution for frequently updated datasets - but I thought I'd pass this on for consideration, and also for anyone else who might find themselves in the puzzling situation of a ballooning index_1.ibd file.

Cheers,

Chris

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
  • Recently Browsing   1 member

×
×
  • Create New...