Beluga Posted February 23, 2017 (edited)

I have a problem with the performance of importing CSV data with ~35k rows and 17 columns as pages. This is all on my local dev environment: PW 3.0.42 on an Ubuntu 16.10 virtual machine with an SSD, 16GB memory and 4 CPU cores allocated (6th-gen i7). PHP is 7.1 and MariaDB is 10.1.21. So the setup is beefier than the server environment this project would be living in, at least in the CPU department.

So far I have tried Ryan's CSV import module. I started the import at 14:30 and left it running as I went to sleep. When I woke up and checked, the Apache server had somehow borked; 14k pages existed. I tried to restart apache2, but it core-dumped. After a restart of the VM, Apache worked again. If we assume the import ran for 10 hours (it might have run longer), that's about 23 pages per minute. Seems it could go faster. Next I will test with @adrian's Batch Child Editor.

If someone wants to conduct their own testing, attached is a zip file with:
- real-world CSV data from a proverb research database, tab-separated and enclosed with ^ characters (because Ryan's module doesn't work with pure tab-separated data). I have added dummy "title" field data, so it can be used with any of the importers without hacks.
- the needed fields exported from PW
- my php.ini
- my my.cnf

csv-testcase.zip

Edited February 23, 2017 by cstevensjr: moved from the Modules/Plugins section
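For anyone who wants to benchmark outside the admin, here is a minimal sketch of a bootstrapped import script using the ProcessWire API and fgetcsv() with the tab separator and ^ enclosure described above. The bootstrap path, template name, parent path and field mapping are placeholders rather than the real values from the attached zip, and there is no error handling; treat it as a rough starting point only.

<?php namespace ProcessWire;

// Rough import sketch (assumptions: template 'proverb', parent '/proverbs/',
// first CSV column is the dummy title; adjust to the real fields in the zip).
include '/path/to/processwire/index.php'; // placeholder path, bootstraps PW from the CLI
$pages = wire('pages');

$fh = fopen('csv-testcase.csv', 'r');
$header = fgetcsv($fh, 0, "\t", "^"); // first row holds the column names

$count = 0;
while(($row = fgetcsv($fh, 0, "\t", "^")) !== false) {
    $p = new Page();
    $p->template = 'proverb';               // hypothetical template name
    $p->parent = $pages->get('/proverbs/'); // hypothetical parent page
    $p->title = $row[0];                    // dummy title column from the test file
    // map the remaining 16 columns to your fields here, e.g. $p->proverb_text = $row[1];
    $p->save();
    $count++;
    if($count % 500 === 0) $pages->uncacheAll(); // keep memory in check on long runs
}
fclose($fh);
echo "Imported $count pages\n";

Running something like this from the command line also sidesteps web-server timeouts entirely, which makes it easier to compare raw import speed between setups.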
adrian Posted February 23, 2017

I just did a quick test with BCE using your test file, and by the time it hit my 120-second timeout it had created 4898 pages, so roughly 2449 pages per minute isn't too bad. Curious what you find at your end though.
Zeka Posted February 23, 2017

I also made a small test with Ryan's module: 9359 pages (about 3119 per minute) were added in 180 seconds before the timeout.
Beluga Posted February 23, 2017

Fantastic work, guys. Zeka's result proves that my local setup is somehow silly. Maybe it is too primitive: I don't even use php-fpm. However, you can imagine my surprise when I returned home just now after leaving the BCE import running: it had imported all the 35k pages successfully! I should set my local environment up like I have on the server, with php-fpm and Caddy. Thank you for restoring my sanity and trust in PW.
adrian Posted February 23, 2017

Glad BCE worked for you. I am curious, though, what happened for you with Ryan's module. My local dev setup is definitely nothing special: a 2011 MacBook Pro with 8GB RAM and PHP 7.1, all set up via Homebrew. I feel like something else must be amiss. I have never used MariaDB; I know it is supposed to be faster if anything, and completely compatible with MySQL, but who knows?
Robin S Posted February 23, 2017

@Beluga, I think the ImportPagesCSV module should be able to handle 35K rows with the right PHP settings. Ryan has talked about needing to split into batches only above 450K rows (!), so 35K should be no problem.
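If splitting ever does become necessary, a rough sketch like the one below would break the CSV into smaller files while repeating the header row in each, so every part can be imported on its own within the PHP time limit. The file name and batch size are placeholders, and this assumes no field contains an embedded newline inside its ^ enclosures (line-based splitting would break such rows).

<?php

// Sketch only: split a big CSV into smaller batch files, each with the header row.
$batchSize = 5000;                    // arbitrary rows per batch, for illustration
$in = fopen('csv-testcase.csv', 'r');
$header = fgets($in);                 // keep the header row for every batch file
$row = 0;
$batch = 0;
$out = null;
while(($line = fgets($in)) !== false) {
    if($row % $batchSize === 0) {
        if($out) fclose($out);
        $batch++;
        $out = fopen("csv-testcase-part$batch.csv", 'w');
        fwrite($out, $header);        // each part gets its own header row
    }
    fwrite($out, $line);
    $row++;
}
if($out) fclose($out);
fclose($in);
echo "Wrote $batch files of up to $batchSize data rows each\n";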
Beluga Posted February 23, 2017

I have MariaDB on my server and it's fine. I think I will try this dockerized Caddy + php-fpm and skip the VM: https://github.com/abiosoft/caddy-docker
Beluga Posted February 26, 2017

Ok, I set up a Docker dev environment with Caddy, php-fpm and MariaDB, and now I could import the whole 35k rows with Ryan's module in about 2 hours. Still, that is much worse than Zeka's result. Maybe I have to tweak some php-fpm and MariaDB settings... I just don't know how exactly.
Beluga Posted February 27, 2017

I realized tuning php-fpm is useless, as it only helps when dealing with multiple users. I tuned MariaDB instead. I had created the PW install with InnoDB. Sadly, it took an equivalent amount of time to import with Ryan's module. Even emptying the trash with 35k pages took 30 minutes, and I always get "Connection to server lost" while doing either the import or the trashing. It then does not allow me into the admin until it has finished the operation. Next I will try with BCE. Here is my override for my.cnf:

[mysqld]
innodb_buffer_pool_size = 16G  # (adjust value here, 50%-70% of total RAM)
innodb_additional_mem_pool_size = 16M
innodb_log_file_size = 256M
innodb_flush_log_at_trx_commit = 1  # may change to 2 or 0
innodb_flush_method = O_DIRECT
innodb_file_per_table = ON
innodb_stats_on_metadata = OFF
innodb_buffer_pool_instances = 8  # (or 1 if innodb_buffer_pool_size < 1GB)
query_cache_type = 0
query_cache_size = 0
innodb_autoinc_lock_mode = 2
innodb_io_capacity_max = 6000
innodb_io_capacity = 3000
key_buffer_size = 1024M
max_allowed_packet = 1024M
table_open_cache = 2000
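One thing worth noting about that config: with innodb_flush_log_at_trx_commit = 1, every individual page save is its own transaction and forces a log flush to disk, which adds up over 35k rows. As an alternative to relaxing that setting, saves can be grouped into explicit transactions. The following is an untested sketch, assuming a bootstrapped script, InnoDB tables throughout, and that the $database API variable's transaction methods (beginTransaction/commit/supportsTransactions) are available in this PW version; $rowsToImport stands in for your parsed CSV rows.

<?php namespace ProcessWire;

// Sketch only: commit page saves in batches so InnoDB flushes once per batch
// instead of once per page (relevant with innodb_flush_log_at_trx_commit = 1).
$pages = wire('pages');
$database = wire('database');

$batchSize = 500; // arbitrary batch size for illustration
$count = 0;

if($database->supportsTransactions()) $database->beginTransaction();

foreach($rowsToImport as $row) {            // $rowsToImport is a placeholder
    $p = new Page();
    $p->template = 'proverb';               // hypothetical template name
    $p->parent = $pages->get('/proverbs/'); // hypothetical parent page
    $p->title = $row[0];
    $p->save();
    $count++;
    if($count % $batchSize === 0 && $database->supportsTransactions()) {
        $database->commit();                // flush this batch in one go
        $database->beginTransaction();      // start the next batch
        $pages->uncacheAll();               // keep memory usage flat
    }
}

if($database->supportsTransactions()) $database->commit();
echo "Saved $count pages\n";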
adrian Posted February 27, 2017

I'd love to see some results with MyISAM on the same server, if you have the time.
Beluga Posted February 28, 2017

Still with InnoDB, I tried with BCE. It only created 3660 pages out of 35k. I wonder why that is, and whether it's related to the "Connection to server lost" behavior.