elabx Posted June 3, 2020

Hi! I am iterating over a CSV with a few million rows that I have already imported into ProcessWire, and I want to add some more information. Below is a sample of the script (the part that matters), run on the command line. It iterates through the CSV, tries to find an existing record, and if successful updates a page reference field. The problem I'm having is that the script's memory usage keeps growing until it eventually passes out. I'd love to know if anyone can spot what I could be doing wrong in terms of releasing memory.

use League\Csv\Reader;

// This function is called once per iteration of the foreach loop below.
function getCategories($record, $limit) {
    $categories = new PageArray();
    for ($i = 1; $i < $limit; $i++) {
        $code = wire('sanitizer')->selectorValue($record["Field_" . $i]);
        if ($code) {
            $selector = "template=category, title=$code";
            //echo "$selector \n";
            $found = wire('pages')->get($selector, ['cache' => false]);
            if ($found->id) {
                $categories->add($found);
            }
        }
    }
    return $categories;
}

$csv = Reader::createFromPath('./file.csv', 'r');
$csv->setHeaderOffset(0);

// Iterable object from the PHP League CSV class
$records = $csv->getRecords();

foreach ($records as $offset => $record) {
    if ($offset == 1) {
        $database->beginTransaction();
    }
    // Commit in batches of 100 rows, then start a new transaction
    if ($offset % 100 == 0) {
        if ($database->inTransaction()) {
            $database->commit();
        }
        $database->beginTransaction();
        echo "Passed $offset... \n";
        // Here's where I check the memory usage (in MB)
        echo round(memory_get_usage(true) / 1048576, 2) . PHP_EOL;
    }
    $existing = wire('pages')->get("template=supplier, unique_id={$record['Unique ID']}", ['cache' => false]);
    if ($existing->id && !$existing->categories->count) {
        $categories = getCategories($record, 15);
        $existing->of(false);
        $existing->categories = $categories;
        $existing->save('categories');
    }
    $pages->uncacheAll();
}

So far, what I've tried is removing the $pages->get() call, and then it works fine: memory stays at around 16M. I've also used a similar script when creating the pages, and in that scenario $pages->uncacheAll() seemed to work well.
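Since the same category codes presumably repeat across millions of rows, one way to cut down on repeated $pages->get() calls (and the page-cache churn that comes with them) would be to memoize the lookups in a plain PHP array. A minimal sketch, not part of the original script: the function name and the static $cache are mine, and it assumes the set of distinct category codes is small enough to hold in memory. Note the cached Page objects will survive $pages->uncacheAll(), which is fine for a bounded category set.

// A sketch of the same lookup with a simple memoization cache
// (hypothetical helper, not from the original script).
function getCategoriesCached($record, $limit) {
    static $cache = []; // sanitized code => Page object, or null if not found
    $categories = new PageArray();
    for ($i = 1; $i < $limit; $i++) {
        $code = wire('sanitizer')->selectorValue($record["Field_" . $i]);
        if (!$code) continue;
        if (!array_key_exists($code, $cache)) {
            // Only hit the database the first time we see a given code
            $found = wire('pages')->get("template=category, title=$code", ['cache' => false]);
            $cache[$code] = $found->id ? $found : null;
        }
        if ($cache[$code]) {
            $categories->add($cache[$code]);
        }
    }
    return $categories;
}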
Robin S Posted June 3, 2020

Not sure if it will make a big difference, but you can avoid loading the categories PageArray (which happens when you do $existing->categories->count) on pages that you don't need to update:

$existing = wire('pages')->get("template=supplier, unique_id={$record['Unique ID']}, categories.count=0", ['cache' => false]);
if ($existing->id) {
    // ...
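If it helps to see it in context, here is a sketch of how that selector slots into the original loop body, reusing getCategories() from the first post. With categories.count=0 in the selector, pages that already have categories simply never match, so their PageArray is never loaded:

// Sketch: the categories.count=0 clause replaces the PHP-side count check
$existing = wire('pages')->get("template=supplier, unique_id={$record['Unique ID']}, categories.count=0", ['cache' => false]);
if ($existing->id) {
    $existing->of(false);
    $existing->categories = getCategories($record, 15); // as defined in the first post
    $existing->save('categories');
}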
teppo Posted June 3, 2020

Likely irrelevant, but...

// wire('pages')
$existing = wire('pages')->get("template=supplier, unique_id={$record['Unique ID']}", ['cache' => false]);

// ...

// $pages
$pages->uncacheAll();

I don't see anything obviously "wrong" with your script, which is why I'm wondering whether that uncache call is doing what it should: you fetch via wire('pages') but uncache via $pages.
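One hypothetical way to rule that out would be to grab the API variable once and use the same instance for both the lookup and the uncache call. A minimal sketch (the local variable name is mine):

$pagesApi = wire('pages'); // use this one instance throughout the loop

$existing = $pagesApi->get("template=supplier, unique_id={$record['Unique ID']}", ['cache' => false]);
// ... per-record work ...
$pagesApi->uncacheAll();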
flydev Posted June 3, 2020

How is the CSV file created? I mean, on which computer? You can run into memory issues because of badly detected line endings, for example when a CSV file is created on macOS. See:

Quote
Warning: If your CSV document was created or is read on a Macintosh computer, add the following lines before using the library to help PHP detect line ending.

if (!ini_get("auto_detect_line_endings")) {
    ini_set("auto_detect_line_endings", '1');
}

Another thing: try importing the data with TracyDebugger disabled.

Also, instead of using getRecords(), you can process the records through a Statement:

$reader = \League\Csv\Reader::createFromPath('./file.csv', 'r');
$stmt = (new \League\Csv\Statement())->offset(2);

foreach ($stmt->process($reader) as $record) {
    print_r($record);
}

or iterate the Reader directly instead of loading the whole file:

$reader = \League\Csv\Reader::createFromPath('./file.csv', 'r');

foreach ($reader as $record) {
    print_r($record);
}
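Putting the line-ending hint and the direct iteration together, the reading side might look like this: a sketch, assuming league/csv 9.x and a header row at offset 0, as in the original script:

// Help PHP detect Macintosh line endings before the library opens the file
if (!ini_get('auto_detect_line_endings')) {
    ini_set('auto_detect_line_endings', '1');
}

$reader = \League\Csv\Reader::createFromPath('./file.csv', 'r');
$reader->setHeaderOffset(0);

// Iterating the Reader directly streams records one at a time,
// yielding header-keyed arrays just like getRecords()
foreach ($reader as $offset => $record) {
    // per-record work goes here
}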
elabx Posted June 3, 2020

Guys, I wanna shoot myself: I randomly restarted the script, with no changes, and it is running smoothly at 16M of memory usage. Now I feel the server hates me, lol. I have no idea how this could have happened.

4 hours ago, flydev said:
How is the CSV file created? I mean, on which computer? You can run into memory issues because of badly detected line endings, for example when a CSV file is created on macOS. See:

Wow, I had no idea this could happen. In this case I've already tested iterating through the whole CSV with no memory issues; even a count($records) works really well.

Thanks everyone for your help! I'll get back on what I find out.