elabx

Command line script memory issue


Hi! I am iterating over a CSV with a few million rows that I have already imported into ProcessWire, and I want to add some more information. Below is a sample of the script (the part that matters), run on the command line: it iterates through the CSV, tries to find an existing record, and if successful updates a page reference field. The problem I'm having is that the script keeps accumulating memory and eventually runs out.

I'd love to know if anyone knows what I could be doing wrong in terms of releasing memory. 

// This function is called once per iteration of the foreach loop below.
function getCategories($record, $limit) {
	$categories = new PageArray();
	// Note: the loop condition is $i < $limit, so with $limit = 15 this reads Field_1 through Field_14.
	for ($i = 1; $i < $limit; $i++) {
		$code = wire('sanitizer')->selectorValue($record["Field_" . $i]);
		if ($code) {
			$selector = "template=category, title=$code";
			//echo "$selector \n";
			$found = wire('pages')->get($selector, ['cache' => false]);
			if ($found->id) {
				$categories->add($found);
			}
		}
	}
	return $categories;
}

$csv = Reader::createFromPath('./file.csv', 'r');
$csv->setHeaderOffset(0);

// Iterable of records from the league/csv Reader
$records = $csv->getRecords();

foreach ($records as $offset => $record) {

	if ($offset == 1) {
		$database->beginTransaction();
	}

	if ($offset % 100 == 0) {
		if ($database->inTransaction()) {
			$database->commit();
		}
		$database->beginTransaction();
		echo "Passed $offset... \n";

		// Here's where I check the memory usage (bytes to MB)
		echo round(memory_get_usage(true) / 1048576) . PHP_EOL;
	}

	$existing = wire('pages')->get("template=supplier, unique_id={$record['Unique ID']}", ['cache' => false]);

	if ($existing->id && !$existing->categories->count) {
		$categories = getCategories($record, 15);
		$existing->of(false);
		$existing->categories = $categories;
		$existing->save('categories');
	}

	$pages->uncacheAll();

}

So far, what I've tried is removing the $pages->get() call, and then it works fine: memory stays at around 16M. I've also used a similar script when creating the pages, and in that scenario $pages->uncacheAll() seemed to work well.
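In case memory starts creeping again, one pattern worth trying (a sketch only, assuming the ProcessWire API is bootstrapped and `$database` is available as in the script above) is to do the cache clearing and an explicit garbage-collection pass on the same interval as the transaction commits, rather than uncaching on every row:

```php
// Hypothetical batch-cleanup sketch: commit, uncache, and force GC every 100 rows.
if ($offset % 100 == 0) {
    if ($database->inTransaction()) {
        $database->commit();
    }
    $database->beginTransaction();

    // Clear ProcessWire's in-memory page cache, then let PHP collect reference cycles.
    wire('pages')->uncacheAll();
    gc_collect_cycles();

    echo "Passed $offset, memory: " . round(memory_get_usage(true) / 1048576) . "M" . PHP_EOL;
}
```

gc_collect_cycles() only helps if the leak comes from circular references that the normal refcounting can't free, so treat it as a diagnostic rather than a guaranteed fix.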


Not sure if it will make a big difference, but you can avoid loading the categories PageArray (which happens when you do $existing->categories->count) on pages that you don't need to update:

$existing = wire('pages')->get("template=supplier, unique_id={$record['Unique ID']}, categories.count=0", ['cache' => false]);
if($existing->id) {
    // ...


Likely irrelevant, but...

     // wire('pages')
     $existing = wire('pages')->get("template=supplier, unique_id={$record['Unique ID']}", ['cache' => false]);

     ...

     // $pages
     $pages->uncacheAll();

I don't see anything obviously "wrong" with your script, which is why I'm wondering if it could be possible that this uncache call isn't doing what it should?
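To rule that out (a quick sanity check, not a known fix), you could route everything through the same API accessor, so the uncacheAll() call is guaranteed to hit the same page cache that the get() calls populated:

```php
// Use wire('pages') consistently so the uncache targets the cache
// that the get() calls actually filled.
$existing = wire('pages')->get("template=supplier, unique_id={$record['Unique ID']}", ['cache' => false]);
// ...
wire('pages')->uncacheAll();
```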


How was the CSV file created? I mean, on which computer? You can run into memory issues because of badly detected line endings, for example when a CSV file was created on macOS. See:

Quote (from the league/csv documentation):

Warning: If your CSV document was created or is read on a Macintosh computer, add the following lines before using the library to help PHP detect line ending.


if (!ini_get("auto_detect_line_endings")) {
    ini_set("auto_detect_line_endings", '1');
}


Another thing: try importing the data with TracyDebugger disabled.

Also, instead of using getRecords(), you can process the file through a Statement:

$reader = \League\Csv\Reader::createFromPath('./file.csv', 'r');
$stmt = (new \League\Csv\Statement())->offset(2);
foreach ($stmt->process($reader) as $record) {
    print_r($record);
}

or, instead of loading the whole file, iterate the Reader directly:

$reader = \League\Csv\Reader::createFromPath('./file.csv', 'r');
foreach ($reader as $record) {
    print_r($record);
}


Guys, I want to shoot myself: I randomly restarted the script, with no changes, and it is running smoothly at 16M of memory usage. Now I feel like the server hates me lol. I have no idea how this could have happened.

4 hours ago, flydev 👊🏻 said:

How was the CSV file created? I mean, on which computer? You can run into memory issues because of badly detected line endings, for example when a CSV file was created on macOS. See:

Wow, I had no idea this could happen. In this case I've already tested iterating through the whole CSV with no memory issues; even a count($records) works really well.

Thanks everyone for your help! I'll get back on what I find out.

