Problem with CSV Import module


DaveP

I realise it's something of an edge case, but I'm struggling to import a CSV file containing 1,692,242 rows.

Firstly, a bit of background - the CSV is generated by phpMyAdmin, and contains the full UK postcode list and grid references, like this:

"AB101AA","394251","806376"
"AB101AB","394251","806376"
"AB101AF","394181","806429"
"AB101AG","394251","806376"
"AB101AH","394371","806359"
"AB101AJ","394181","806429"

and so on. So while there are a lot of lines, they aren't big ones - just 3 fields each. They will be used to calculate the distance between 2 places (along the lines of a 'Store Locator' page).
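For what it's worth, the distance calculation itself is the easy bit - OS grid eastings/northings are already planar coordinates in metres, so plain Pythagoras does the job (no haversine needed). A minimal sketch, function name my own:

<?php
// Straight-line distance between two OS grid references, in km.
// Eastings/northings are metres on a planar grid, so Pythagoras is accurate enough.
function gridDistanceKm(int $e1, int $n1, int $e2, int $n2): float
{
    return sqrt(($e1 - $e2) ** 2 + ($n1 - $n2) ** 2) / 1000;
}

// e.g. AB101AA (394251, 806376) to AB101AF (394181, 806429)
echo gridDistanceKm(394251, 806376, 394181, 806429); // ~0.088 km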

The import fails when I click the 'Continue to Step 2' button - it just reloads Step 1. This happens on files over about 250,000 rows (250,000 works, 500,000 fails).

What I have tried/noticed

  • No PW errors, on screen or in error log.
  • No PHP errors ditto.
  • I've increased upload_max_filesize in php.ini to a figure well over the CSV file size (see the settings sketch after this list).

I will probably do the import in 250,000-line chunks (and would probably have finished by now if I hadn't written this!), but if I'm doing something wrong, or it's a known limitation or a bug, it's as well to know.


And I thought I was being ambitious the other day, importing 13,000 rows/pages!

In Step 2, the CSV file is opened and its contents read. Maybe a memory_limit issue?
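If the module reads the whole file into memory at once, that would explain it; streaming row by row with fgetcsv() keeps memory flat no matter how many rows there are. A rough sketch, assuming the three-column layout above (file name made up):

<?php
// Stream the CSV one row at a time so memory use stays constant.
$fh = fopen('postcodes.csv', 'r'); // illustrative path
while (($row = fgetcsv($fh)) !== false) {
    list($postcode, $easting, $northing) = $row;
    // ... create a page (or a DB row) per record here ...
}
fclose($fh);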

Googling for something like 'php fgetcsv timeout' gives results like this that describe similar issues. Interesting.

On a side note: wouldn't it be better to store this data in a separate/custom table?


@slkwrm max_execution_time is 360 (seconds), but that isn't relevant, as the CSV takes maybe 5 seconds to 'upload' (it's all happening on a local WAMP stack) and the 'Step 1' page reloads instantly after that.

@sinnut memory_limit is set at 256M, so that shouldn't be an issue. At the moment the data is in a separate table, but I couldn't come up with a sensible query to find PW pages and do MySQL point geometry calculations. (My lack of skill in MySQL, undoubtedly.) Plus it seemed 'cleaner' to have everything in PW (new postcodes are created as new buildings are built, so there is always maintenance to do in the db).


500k is quite a lot, and even 250k is a lot - I'm surprised it works. It's not about the time the upload takes or the size of the CSV: the import will generate 250k pages in memory, and that looks a lot like a memory or execution time limit.

Recently we had to raise the memory limit to be able to upload 6MB+ images on a server, so it's not always directly related to the file size.

I think importing that amount of data in chunks is your best bet. It was discussed in the CSV thread and on other occasions here in the forums.
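Splitting the file is simple enough to script - a rough sketch (file names made up) that writes the ~250k-row chunks the importer is known to cope with:

<?php
// Split a huge CSV into 250,000-row chunk files.
$in    = fopen('postcodes_full.csv', 'r');
$out   = null;
$rows  = 0;
$chunk = 0;
while (($line = fgets($in)) !== false) {
    if ($rows % 250000 === 0) {           // start a new chunk file
        if ($out) fclose($out);
        $out = fopen(sprintf('postcodes_%02d.csv', ++$chunk), 'w');
    }
    fwrite($out, $line);
    $rows++;
}
if ($out) fclose($out);
fclose($in);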


Plus it seemed 'cleaner' to have everything in PW (new postcodes are created as new buildings are built, so there is always maintenance to do in the db).

And how will you do this maintenance? Re-run the CSV import and update every now and then, using a CSV from here via this?

I haven't had to do this, but I would rather set up a cron job using MySQL's LOAD DATA INFILE to import the CSV into its own table. In this case it doesn't seem so clean to me to have all of this data as PW pages.
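Something along these lines (table, column and file names are guesses on my part; the LOCAL variant also needs local_infile enabled on both client and server):

<?php
// One-statement bulk load into a custom table - far faster than row-by-row inserts.
$pdo = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass', [
    PDO::MYSQL_ATTR_LOCAL_INFILE => true,
]);
$pdo->exec("
    LOAD DATA LOCAL INFILE '/path/to/postcodes.csv'
    INTO TABLE postcodes
    FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
    LINES TERMINATED BY '\\n'
    (postcode, easting, northing)
");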

Depending on what you want you could make a module to work with your postcode table.

A bit different but modules like this also set up separate tables.


I don't need all 1 million+ rows, just the ones we have sites in, and any future sites in existing postcodes. That said, there are 2,000+ existing sites that I need grid references for. Any new sites with brand-new postcodes (very few indeed) can be looked up manually (probably here or here), but data input will be cleaner, and the UX more consistent, if it's part of PW. So TL;DR: it's a one-time import.


When I'm importing pages and I need to import quantities above 1 million, I usually try and limit it to 450,000 new pages at a time. :lol: It is true though that the more memory and execution time you provide, the more you'll be able to do.

You also might find advantages to doing this from the API rather than ImportPagesCSV, as you've got more control over memory usage. I can't think of anything offhand in ImportPagesCSV that would be a memory concern, but then I haven't tried it on a CSV file with 1.8 million rows either. Still, major respect for at least attempting to take PW into 1-million+ page territory. Not many have done that, but you can.
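For the API route, the main lever is clearing ProcessWire's page cache as you go, since every saved page otherwise stays in memory. A rough sketch (template, parent and field names are placeholders):

<?php
// API-side import: one page per CSV row, flushing the in-memory
// page cache every few thousand saves to keep memory bounded.
$fh = fopen('postcodes.csv', 'r');
$n  = 0;
while (($row = fgetcsv($fh)) !== false) {
    $p = new Page();
    $p->template = 'postcode';     // placeholder template
    $p->parent   = '/postcodes/';  // placeholder parent page
    $p->title    = $row[0];
    $p->easting  = $row[1];
    $p->northing = $row[2];
    $p->save();
    if (++$n % 5000 === 0) {
        $pages->uncacheAll();      // release pages loaded/saved so far
    }
}
fclose($fh);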


  • 2 years later...

<update class='very_late'>

Just happened across this thread while looking for something else. To finish off the story: I ended up adding another (non-PW) table to the db with phpMyAdmin, and using a couple of hand-crafted queries to access the data (sketched below). That was probably the best solution in this case, since it let me do spatial geometry stuff on the grid references that PW doesn't currently support.

</update>
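I won't reproduce the exact queries, but for anyone who lands here the same way: even without MySQL's spatial extensions, a plain-Pythagoras version gets you nearest-site lookups, since the grid references are metres. A sketch with made-up table and column names:

<?php
// Nearest sites to a given grid reference, straight from the custom table.
$pdo  = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');
$stmt = $pdo->prepare(
    'SELECT postcode,
            SQRT(POW(easting - :e, 2) + POW(northing - :n, 2)) / 1000 AS km
     FROM postcodes
     ORDER BY km
     LIMIT 5'
);
$stmt->execute(['e' => 394251, 'n' => 806376]);
foreach ($stmt as $row) {
    printf("%s is %.2f km away\n", $row['postcode'], $row['km']);
}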
