Jump to content

[SOLVED] InnoDB performance with CSV import way worse than MyISAM - does PW core need adapting?


Beluga
 Share

Recommended Posts

I have found that InnoDB performs badly when importing pages via CSV. I really should have tested this 1,5 years ago when @adrian suggested it (I guess I needed a rest). All the other testers back in 2017 obviously used MyISAM as their import results were so performant.

My results from today:

MyISAM:
30729 pages imported, 120 s timeout (actual time 4 min 42 s)

InnoDB:
7232 pages imported, 120 s timeout (actual time 9 min 10 s)

Used a stopwatch for the actual time.

Machine is i7-6700K, 32GB memory, SSD. Dev environment running under Docker.

PW 3.0.98. DB charset utf8mb4. PHP Version 7.2.5

Batch Child Editor used for importing.

Test case included as attachment. It has importable JSON for fields and the template and a CSV file.

CSV is tab separated and all fields are enclosed with the character ^

My MySQL cnf override is:

[mysqld]
innodb_buffer_pool_size = 16G # (adjust value here, 50%-70% of total RAM)
innodb_log_file_size = 256M
innodb_flush_log_at_trx_commit = 1 # may change to 2 or 0
innodb_flush_method = O_DIRECT
innodb_file_per_table=ON
innodb_stats_on_metadata = OFF
innodb_buffer_pool_instances = 8 # (or 1 if innodb_buffer_pool_size < 1GB)
query_cache_type = 0
query_cache_size = 0
innodb_autoinc_lock_mode = 2
innodb_io_capacity_max=6000
innodb_io_capacity=3000
key_buffer_size = 1024M
max_allowed_packet = 1024M
table_open_cache = 2000

It seems to me we can't affect the performance by simply tweaking InnoDB settings. I assume PW would need core changes to adapt to InnoDB here. Pinging @ryan

I did some digging and ran into an article, which details the bugginess of InnoDB FULLTEXT indexes.

Quote

InnoDB FULLTEXT indexes may be used, with care, when MyISAM or other engines/means to add fulltext search is not an option. [...] There are many other bugs to take into account if you ever plan to add any single FULLTEXT index to your InnoDB table.

I went through the list of bugs and only a couple were fixed (some were closed as wontfix). Kind of depressing ?

csv-testcase.zip

  • Like 1
Link to comment
Share on other sites

This is just a wild guess, but from one of the links posted here (the official mySQL docs), I guess this can be also a performance factor:

 

Use the multiple-row INSERT syntax to reduce communication overhead between the client and the server if you need to insert many rows:

INSERT INTO yourtable VALUES (1,2), (5,5), ...;

This tip is valid for inserts into any table, not just InnoDB tables.

 

I don't know if this syntax is being used by either PW core or the Batch Child Editor module. But I can imagine that it really makes a difference with lots of data being manipulated in one big query...

Link to comment
Share on other sites

31 minutes ago, netcarver said:

Hmm, I guess InnoDB may be auto-committing on every insert. Try turning autocommit off for bulk imports - but it might not be a panacea.

Eh, I added autocommit=0 to my cnf override, but it turns out PW is unable to create any pages (failed when I tried to create the parent page).

Btw. another perf data point with an InnoDB install: Trash with 7232 pages emptied in 6 min 35 s. I don't think any of this is related to the CSV import module used.

  • Like 1
Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...