elabx Posted June 3, 2020

Hi! I am iterating over a CSV with a few million rows that I have already imported into ProcessWire, and I want to add some more information. Below is a sample of the script (the part that matters), run on the command line. It iterates through the CSV, tries to find an existing record, and if successful updates a page reference field. The problem I'm having is that the script's memory usage keeps growing until it eventually passes out. I'd love to know if anyone can spot what I could be doing wrong in terms of releasing memory.

use League\Csv\Reader;

// This function is called once per iteration of the foreach loop below.
function getCategories($record, $limit) {
    $categories = new PageArray();
    for ($i = 1; $i < $limit; $i++) {
        $code = wire('sanitizer')->selectorValue($record["Field_" . $i]);
        if ($code) {
            $selector = "template=category, title=$code";
            //echo "$selector \n";
            $found = wire('pages')->get($selector, ['cache' => false]);
            if ($found->id) {
                $categories->add($found);
            }
        }
    }
    return $categories;
}

$csv = Reader::createFromPath('./file.csv', 'r');
$csv->setHeaderOffset(0);

// Iterable object from the PHP League CSV class
$records = $csv->getRecords();

foreach ($records as $offset => $record) {
    if ($offset == 1) {
        $database->beginTransaction();
    }
    // Commit in batches of 100 rows, then start a new transaction
    if ($offset % 100 == 0) {
        if ($database->inTransaction()) {
            $database->commit();
        }
        $database->beginTransaction();
        echo "Passed $offset... \n";
        // Here's where I check the memory usage (in MB)
        echo round(memory_get_usage(true) / 1048576, 2) . PHP_EOL;
    }
    $existing = wire('pages')->get("template=supplier, unique_id={$record['Unique ID']}", ['cache' => false]);
    if ($existing->id && !$existing->categories->count) {
        $categories = getCategories($record, 15);
        $existing->of(false);
        $existing->categories = $categories;
        $existing->save('categories');
    }
    $pages->uncacheAll();
}

So far, what I've tried is removing the $pages->get() call, and then it works fine: memory stays at around 16M. I've also used a similar script when creating the pages, and in that scenario $pages->uncacheAll() seemed to work well.
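Since the same category codes presumably repeat across millions of rows, one way to cut down on repeated $pages->get() calls (and the page-cache churn that comes with them) would be to memoize the lookups in a plain PHP array. A minimal sketch, not part of the original script: the function name and the static $cache are mine, and it assumes the set of distinct category codes is small enough to hold in memory. Note the cached Page objects will survive $pages->uncacheAll(), which is fine for a bounded category set.

// A sketch of the same lookup with a simple memoization cache
// (hypothetical helper, not from the original script).
function getCategoriesCached($record, $limit) {
    static $cache = []; // sanitized code => Page object, or null if not found
    $categories = new PageArray();
    for ($i = 1; $i < $limit; $i++) {
        $code = wire('sanitizer')->selectorValue($record["Field_" . $i]);
        if (!$code) continue;
        if (!array_key_exists($code, $cache)) {
            // Only hit the database the first time we see a given code
            $found = wire('pages')->get("template=category, title=$code", ['cache' => false]);
            $cache[$code] = $found->id ? $found : null;
        }
        if ($cache[$code]) {
            $categories->add($cache[$code]);
        }
    }
    return $categories;
}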
Robin S Posted June 3, 2020

Not sure if it will make a big difference, but you can avoid loading the categories PageArray (which happens when you do $existing->categories->count) on pages that you don't need to update:

$existing = wire('pages')->get("template=supplier, unique_id={$record['Unique ID']}, categories.count=0", ['cache' => false]);
if ($existing->id) {
    // ...
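If it helps to see it in context, here is a sketch of how that selector slots into the original loop body, reusing getCategories() from the first post. With categories.count=0 in the selector, pages that already have categories simply never match, so their PageArray is never loaded:

// Sketch: the categories.count=0 clause replaces the PHP-side count check
$existing = wire('pages')->get("template=supplier, unique_id={$record['Unique ID']}, categories.count=0", ['cache' => false]);
if ($existing->id) {
    $existing->of(false);
    $existing->categories = getCategories($record, 15); // as defined in the first post
    $existing->save('categories');
}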
teppo Posted June 3, 2020

Likely irrelevant, but...

// wire('pages')
$existing = wire('pages')->get("template=supplier, unique_id={$record['Unique ID']}", ['cache' => false]);

// ...

// $pages
$pages->uncacheAll();

I don't see anything obviously "wrong" with your script, which is why I'm wondering whether that uncache call is doing what it should: you fetch via wire('pages') but uncache via $pages.
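One hypothetical way to rule that out would be to grab the API variable once and use the same instance for both the lookup and the uncache call. A minimal sketch (the local variable name is mine):

$pagesApi = wire('pages'); // use this one instance throughout the loop

$existing = $pagesApi->get("template=supplier, unique_id={$record['Unique ID']}", ['cache' => false]);
// ... per-record work ...
$pagesApi->uncacheAll();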
flydev Posted June 3, 2020

How is the CSV file created? I mean, on which computer? You can run into memory issues because of badly detected line endings, for example when a CSV file is created on macOS. See:

Quote
Warning: If your CSV document was created or is read on a Macintosh computer, add the following lines before using the library to help PHP detect line ending.

if (!ini_get("auto_detect_line_endings")) {
    ini_set("auto_detect_line_endings", '1');
}

Another thing: try importing the data with TracyDebugger disabled.

Also, instead of using getRecords(), you can process the records through a Statement:

$reader = \League\Csv\Reader::createFromPath('./file.csv', 'r');
$stmt = (new \League\Csv\Statement())->offset(2);

foreach ($stmt->process($reader) as $record) {
    print_r($record);
}

or iterate the Reader directly instead of loading the whole file:

$reader = \League\Csv\Reader::createFromPath('./file.csv', 'r');

foreach ($reader as $record) {
    print_r($record);
}
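Putting the line-ending hint and the direct iteration together, the reading side might look like this: a sketch, assuming league/csv 9.x and a header row at offset 0, as in the original script:

// Help PHP detect Macintosh line endings before the library opens the file
if (!ini_get('auto_detect_line_endings')) {
    ini_set('auto_detect_line_endings', '1');
}

$reader = \League\Csv\Reader::createFromPath('./file.csv', 'r');
$reader->setHeaderOffset(0);

// Iterating the Reader directly streams records one at a time,
// yielding header-keyed arrays just like getRecords()
foreach ($reader as $offset => $record) {
    // per-record work goes here
}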
elabx Posted June 3, 2020

Guys, I wanna shoot myself: I randomly restarted the script, with no changes, and it is running smoothly at 16M of memory usage. Now I feel the server hates me, lol. I have no idea how this could have happened.

4 hours ago, flydev said:
How is the CSV file created? I mean, on which computer? You can run into memory issues because of badly detected line endings, for example when a CSV file is created on macOS. See:

Wow, I had no idea this could happen. In this case I've already tested iterating through the whole CSV with no memory issues; even a count($records) works really well.

Thanks everyone for your help! I'll get back on what I find out.