DaveP Posted September 17, 2012

I realise it's something of an edge case, but I'm struggling to import a CSV file containing 1,692,242 rows.

Firstly, a bit of background - the CSV is generated by phpMyAdmin, and contains the full UK postcode list and grid references, like this:

"AB101AA","394251","806376"
"AB101AB","394251","806376"
"AB101AF","394181","806429"
"AB101AG","394251","806376"
"AB101AH","394371","806359"
"AB101AJ","394181","806429"

and so on. So while there are a lot of lines, they're not big ones - just 3 fields. They will be used to calculate the distance between 2 places (along the lines of a 'Store Locator' page).

The import fails when clicking the 'Continue to Step 2' button, and just reloads Step 1. This happens on files over about 250,000 rows (250,000 works, 500,000 fails).

What I have tried/noticed:

- No PW errors, on screen or in the error log.
- No PHP errors either.
- I've increased upload_max_filesize in php.ini to a figure well over the CSV file size.

I will probably do the import in 250,000-line chunks (probably would have finished by now if I hadn't written this), but if I'm doing something wrong, or it's a known limitation or a bug, maybe we are as well knowing.
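The 250,000-line chunking mentioned above is easy to script outside PW before importing. A minimal sketch, assuming the CSV has no header row (the helper name, paths and chunk size are my own, not from the thread):

```php
<?php
// Split a large CSV into files of at most $linesPerChunk lines each.
// Returns the list of chunk file paths that were written.
function splitCsv(string $source, string $destDir, int $linesPerChunk): array
{
    $in = fopen($source, 'rb');
    $chunks = [];
    $out = null;
    $line = 0;
    while (($row = fgets($in)) !== false) {
        if ($line % $linesPerChunk === 0) {
            if ($out !== null) {
                fclose($out);
            }
            $path = sprintf('%s/chunk_%03d.csv', $destDir, count($chunks));
            $out = fopen($path, 'wb');
            $chunks[] = $path;
        }
        fwrite($out, $row);
        $line++;
    }
    if ($out !== null) {
        fclose($out);
    }
    fclose($in);
    return $chunks;
}
```

Each chunk can then be fed to the importer in turn; since the file is read line by line with fgets(), memory stays flat no matter how big the source CSV is.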
slkwrm Posted September 17, 2012

Have you tried increasing max_execution_time in php.ini? Not sure it helps, but it may be worth trying.
SiNNuT Posted September 17, 2012

And I thought I was being ambitious the other day while importing 13,000 rows/pages! In Step 2 the CSV file is opened and its contents read. Maybe some memory_limit issues? Googling for something like 'php fgetcsv timeout' gives some results like this which describe similar issues. Interesting.

On a side note: wouldn't it be better to store this data in a separate/custom table?
DaveP Posted September 17, 2012 Author

@slkwrm max_execution_time is 360 (seconds), but that isn't relevant, as the CSV takes maybe 5 seconds to 'upload' (it's all happening on local WAMP) and the 'Step 1' page reloads instantly after that.

@sinnut memory_limit is set at 256M, so that shouldn't be an issue. At the moment the data is in a separate table, but I couldn't come up with a sensible query to find PW pages and do MySQL point geometry calculations. (My lack of skill in MySQL, undoubtedly.) Plus it seemed 'cleaner' to have everything in PW (new postcodes are created as new buildings are built, so there is always maintenance to do in the db).
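For what it's worth, the grid references in the sample data are OS National Grid eastings/northings in metres - planar coordinates - so a plain Pythagorean distance works without any spherical geometry. A minimal sketch (the function name is mine):

```php
<?php
// Distance in metres between two OS National Grid positions.
// Eastings/northings are already planar coordinates in metres,
// so Pythagoras is sufficient; no haversine needed.
function gridDistance(int $e1, int $n1, int $e2, int $n2): float
{
    return sqrt((($e1 - $e2) ** 2) + (($n1 - $n2) ** 2));
}
```

Using the sample rows from the first post, AB101AA (394251, 806376) and AB101AF (394181, 806429) come out roughly 88 metres apart.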
Soma Posted September 17, 2012

500k is quite a lot, so even 250k is a lot, and I'm surprised it works. It's not about the time it takes to upload or the size of the CSV. It will generate 250k pages in memory, and it looks a lot like a memory or execution time limit. Recently we had to up the memory limit to be able to upload 6MB+ images on a server, so it's not always directly related. I think importing such an amount of data in chunks is your best bet. It was discussed in the CSV thread on another occasion here in the forums.
SiNNuT Posted September 17, 2012

Quote (DaveP): @sinnut memory_limit is set at 256M, so shouldn't be an issue. At the moment the data is in a separate table, but I couldn't come up with a sensible query to find PW pages and do MySQL point geometry calculations. (My lack of skill in MySQL, undoubtedly.) Plus it seemed 'cleaner' to have everything in PW (new postcodes are created as new buildings are built, so there is always maintenance to do in the db).

And how will you do this maintenance? Re-run the CSV import and update every now and then, using a CSV from here via this? I haven't had to do this, but I would rather set up a cronjob using MySQL's LOAD DATA INFILE to import the CSV into its own table. In this case it doesn't seem so clean to me to have all of this data as PW pages. Depending on what you want, you could make a module to work with your postcode table. A bit different, but modules like this also set up separate tables.
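A sketch of what that LOAD DATA INFILE approach might look like for this three-column CSV. Table and column names here are assumptions, not from the thread, and this needs a live MySQL server (with LOCAL infile permitted), so treat it as illustrative only:

```sql
-- Assumed table layout for the "postcode","easting","northing" CSV;
-- all names are illustrative.
CREATE TABLE IF NOT EXISTS postcodes (
    postcode VARCHAR(8) NOT NULL PRIMARY KEY,
    easting  INT NOT NULL,
    northing INT NOT NULL
);

-- Bulk-load the CSV server-side; far faster than row-by-row inserts.
LOAD DATA LOCAL INFILE '/path/to/postcodes.csv'
INTO TABLE postcodes
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(postcode, easting, northing);
```

Run from cron, a script like this would refresh the whole table in seconds rather than the minutes-to-hours a page-based import takes.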
DaveP Posted September 17, 2012 Author

I don't need all the 1 million+ rows, just the ones that we have sites in, and any future sites in existing postcodes. That said, there are 2,000+ existing sites that I need to access the grid references for. Any new sites with brand new postcodes (very few indeed) can be looked up manually (probably here or here), but data input will be cleaner, and the UX more consistent, if it is part of PW. So TL;DR: it's a one-time import.
SiNNuT Posted September 17, 2012

Ah OK, sorry that I misunderstood. From the TS I got the impression you needed an up-to-date full postcode list to work with regularly. Still don't understand the use case here, but nvm.
ryan Posted September 18, 2012

When I'm importing pages and I need to import quantities above 1 million, I usually try to limit it to 450,000 new pages at a time. It is true, though, that the more memory and execution time you provide, the more you'll be able to do. You also might find advantages to doing this from the API rather than ImportPagesCSV, as you've got more control over memory usage. I can't think of anything offhand that would be a memory concern in ImportPagesCSV, but I also haven't tried it on a CSV file with 1.8 million rows. Still, major respect for at least attempting to take PW into the 1+ million page territory. Not many have done that, but you can.
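As a rough illustration of the API route ryan describes, something along these lines keeps memory flat by periodically uncaching loaded pages. The template, field and parent names are invented for the sketch, and it only runs inside a ProcessWire bootstrap, so read it as pseudocode rather than a drop-in script:

```php
<?php
// Sketch only: assumes a ProcessWire bootstrap, a 'postcode' template
// with integer 'easting'/'northing' fields, and a /postcodes/ parent.
$fp = fopen('/path/to/postcodes.csv', 'rb');
$parent = $pages->get('/postcodes/');
$count = 0;

while (($row = fgetcsv($fp)) !== false) {
    list($postcode, $easting, $northing) = $row;
    $p = new Page();
    $p->template = 'postcode';
    $p->parent = $parent;
    $p->title = $postcode;
    $p->easting = $easting;
    $p->northing = $northing;
    $p->save();

    if (++$count % 500 === 0) {
        $pages->uncacheAll(); // release cached page objects to keep memory flat
    }
}
fclose($fp);
```

The uncacheAll() call every few hundred saves is the control over memory usage the API gives you that a one-shot form upload doesn't.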
DaveP Posted March 6, 2015 Author

<update class='very_late'>
Just happened across this thread, looking for something else. To finish off the story, I ended up adding another (non-PW) table to the db with phpMyAdmin, and using a couple of hand-crafted queries to access the data. That was probably the best solution in this case, since it allowed me to do spatial geometry stuff on the grid references that PW doesn't currently support.
</update>
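For completeness, the kind of hand-crafted query involved might look like the following. Table and column names are assumptions on my part; since eastings/northings are in metres, a planar distance expression is all the "spatial geometry stuff" a store locator needs:

```sql
-- Illustrative nearest-sites query against a separate postcodes table;
-- 394251/806376 is the searched-from position (AB101AA in the sample data).
SELECT postcode,
       SQRT(POW(easting - 394251, 2) + POW(northing - 806376, 2)) AS metres
FROM postcodes
ORDER BY metres
LIMIT 10;
```

Joining the result back to PW pages by postcode (the page title, say) is then a cheap second step in PHP.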