Jump to content
hellomoto

Deleting file field's files via API not working or timing out

Recommended Posts

<?php

class ProcessTracksterMultiUpload extends WireData implements Module {

	/**
	 * getModuleInfo is a module required by all modules to tell ProcessWire about them
	 *
	 * @return array
	 *
	 */
	public static function getModuleInfo() {

		return array(

			// The module's title, typically a little more descriptive than the class name
			'title' => 'Process Trackster Multi-Upload', 

			// version number 
			'version' => 1, 

			// summary is brief description of what this module is
			'summary' => 'Creates a new page for each individual track',
			
			// Optional URL to more information about the module
			'href' => '',

			// singular=true: indicates that only one instance of the module is allowed.
			// This is usually what you want for modules that attach hooks. 
			'singular' => true, 

			// autoload=true: indicates the module should be started with ProcessWire.
			// This is necessary for any modules that attach runtime hooks, otherwise those
			// hooks won't get attached unless some other code calls the module on it's own.
			// Note that autoload modules are almost always also 'singular' (seen above).
			'autoload' => true, 
		
			// Optional font-awesome icon name, minus the 'fa-' part
			'icon' => '', 
			);
	}

	/**
	 * Initialize the module
	 *
	 * ProcessWire calls this when the module is loaded. For 'autoload' modules, this will be called
	 * when ProcessWire's API is ready. As a result, this is a good place to attach hooks. 
	 *
	 */
	public function init() {

		// add a hook after each page is saved
		$this->pages->addHookAfter('save', $this, 'addNewTracks');

	}

	/**
	 * Adds tracks
	 *
	 */
	protected function addNewTracks($event) {
		$page = $event->arguments[0]; 
		$pages = wire('pages');
		if($page->template != 'tracks') return;
		
		foreach($page->tracks_upload as $t) {
			$basename = str_replace(".{$t->ext}",'',$t->name);
			$p = new Page;
			$p->parent = $pages->get('/tracks/');
			$p->template = 'track';
			$p->title = $basename;
			$p->name = $p->id;
			$p->addStatus(Page::statusUnpublished)->save();
			$p->audio->add($t->filename);
			$p->save();
			$this->message("Added new track {$p->name} by {$p->createdUser}");
			//wire('pages')->get(1014)->tracks_upload->delete($t->filename);
			//$deleted = $t->name;
			//$files->delete($t);
			$this->message("Deleted temporary file $$t->filename");
			//$this->message("new $newname");
		}
		$pages->get($page->id)->tracks_upload->deleteAll();
		$pages->get($page->id)->save();
		//$page->tracks_upload->removeAll()->save();
		//$thisPage = $pages->get($page->id);
		//$thisPage->save();
		//$page->tracks_upload->deleteAll();
		//$page->save();
	}
	
}

I've tried $page->tracks_upload->deleteAll(), removeAll(), $page->delete($t), $page->delete($t->name) ($page ones in the foreach loop), all kinds of things. Nothing seems to work. Pages are created fine and all, I just need these files deleted too.

Share this post


Link to post
Share on other sites

how about try this before you remove:

$page->of(false); // to this before the removeAll();
$page->tracks_upload->removeAll();
$page->save();

Share this post


Link to post
Share on other sites

I'm using PW 2.8.28 with a CroppableImage field. I have an import routine that updates existing records when possible. The script runs fine unless I try to delete existing images before adding new ones. If I include $page->image->deleteAll() right before adding a new image (and after disabling output formatting), I get the following error (from nginx):

502 Bad Gateway

The server returned an invalid or incomplete response.

 

Nothing relevant is logged in PW, or Apache or Nginx access or error logs, AFAICT. Note that it does seem to work, despite the error, but it doesn't make a great impression on clients. Any suggestions?

 

Share this post


Link to post
Share on other sites

i had 502 errors yesterday with sparkpost mailings and active tracy debugger. are you using tracy? try switching it off.

of course that's not a solution just a quick suggestion for debugging...

Share this post


Link to post
Share on other sites

Thanks for the suggestion, not using Tracy on this site.

I've run plenty of imports similar to this one before, including one recently (on a different project) that imported over 6k pages, and never got a 502 until I added ->deleteAll(). Maybe I need to consider some sort of bulk operation to remove all the images for this template in one shot, rather than one page at a time. I doubt there's anything like that in the API, though. Plus, I'd prefer to only delete the images as needed, in case some other condition in the script causes that entire record to be skipped.

 

Share this post


Link to post
Share on other sites

sorry, just throwing in wild guesses... maybe something like that? or findMany if you are using pw3

 

Share this post


Link to post
Share on other sites

In this particular case, I'm iterating over lines in a CSV file, looking up pages one at a time, and updating (or creating new ones) as needed. So I'm never creating a PageArray (AFAIK, anyway). It's possible that ->deleteAll() is no less performant than other things I'm doing, but it just happens to be the thing that pushed this script over the edge. I mention it here because someone else seemed to have the same problem. The relevant code:
 

// look for a page with this primary item number
$product = wire('pages')->get('template=product-detail,itemno_mill_standard='.$itemno_mill_standard.',include=all,limit=1');
                    
// if none is found, create a new page
if(!$product->parent){                    
    $product = new Page();
    $product->template = 'product-detail';
    $product->parent = wire('pages')->get(1048); // magic number: ID of the parent for all product pages
}

// disable output formatting
$product->of(false);

// set individual fields (name, title, etc.)...

// set published status
($active ? $product->removeStatus(Page::statusUnpublished) : $product->addStatus(Page::statusUnpublished));

// save the page
$product->save();

// add image, save page again, and remove temporary image
if($has_image){                    
    if($product->image) $product->image->deleteAll(); // get rid of anything that's already in there
    $product->image->add($image_path);
    $product->save();
    unlink(realpath($image_path));
}

 

Share this post


Link to post
Share on other sites
53 minutes ago, bernhard said:

i had 502 errors yesterday with sparkpost mailings and active tracy debugger. are you using tracy? try switching it off.

Not wanting to hijack this post, but could you provide more info about this - sounds like something I need to fix in Tracy. PM me if you like rather than polluting this post even more.

Share this post


Link to post
Share on other sites

Staring at that snippet, I am wondering if I should be saving the page in between the delete and add operations?

Share this post


Link to post
Share on other sites

i'm curious... does a

wire('pages')->uncacheAll();

at the end of each iteration help?

Share this post


Link to post
Share on other sites
1 minute ago, Jason Huck said:

Staring at that snippet, I am wondering if I should be saving the page in between the delete and add operations?

Certainly doesn't hurt to try. I would also suggest that saves within the $has_image check could just save the image field, like this:

$product->save('image');

Share this post


Link to post
Share on other sites

Both worthwhile suggestions, thanks! Added them both, and will leave them in there, but still getting a 502.

 

Share this post


Link to post
Share on other sites

Can we see all the code - I can't see where $has_image is defined. Also, any chance the entire thing is in a loop?

Not related, but I have never seen this approach before:

if(!$product->parent){

Normally you would use:

if(!$product->id){

Also, do you ever set the title for the new product page?

One final thing - the Croppable Image fieldtype does not support PW 3.x - does everything work without the error if you switch to a normal image fieldtype?

Share this post


Link to post
Share on other sites

Sure, here's the whole thing. This is called from a custom admin page. These actions do occur in a loop (iterating over the lines of a CSV file). I changed ->parent to ->id in case of extra overhead (retrieving and creating a page object vs. retrieving an integer value). I do set the name, title, and status of each page. Switching to a normal image fieldtype vs. CroppableImage makes no difference. This is PW 2.8.28. Thanks for taking a look!

<?php
	include($config->paths->templates.'init/functions.inc');

	function importProductData($filepath){
		$result = new StdClass();
		$result->lines_total = 0;
		$result->lines_added = 0;
		$result->lines_failed = 0;
	
		$sizes = [
			5 => '¾”',
			6 => '1”',
			7 => '1¼”',
			8 => '1½”',
			9 => '2”',
			11 => '3”'
		];			
	
		// open the file for reading
		if(!ini_get('auto_detect_line_endings')) ini_set('auto_detect_line_endings', true);
		
		if(($f = fopen($filepath, 'r')) !== false){		
			$header = true;
		
			// parse as CSV
			while(($line = fgetcsv($f, 0, ',', '"')) !== false){
				// skip the header
				if($header == true){
					$header = false;
					continue;
				}
	
				$result->lines_total++;
	
				// process a line
				try{	
					$title = $line[1];
					$brand = ''; // TODO: Get brand data.
					$product_type = ''; // TODO: Determine product type.
					$prod_cat = (int) str_replace('F', '', $line[4]);
					$size = $sizes[$prod_cat];
					
					$uom_raw = $line[2];
					if($uom_raw == 'EA' || $uom_raw == 'PC'){
						$uom = $uom_raw;
						$uom_value = null;
					}else{
						$uom = 'FT';
						$uom_value = (int) $uom_raw;
					}
					
					$weight = (float) $line[10];
					
					$itemno_mill_standard = (int) $line[0];
					$itemno_mill_stainless = (int) $line[11];
					$itemno_anodized_standard = (int) $line[12];
					$itemno_anodized_stainless = (int) $line[13];
					$active = ($line[5] == 'A');
				
					$image_data_raw = $line[17];
					
					if(startsWith($image_data_raw, '0x')){
						$has_image = true;
						$image_data = hex2bin(ltrim($image_data_raw, '0x'));
						$image_path = wire('config')->paths->templates.'tmp/'.$line[18];
						file_put_contents($image_path, $image_data);
					}else{
						$has_image = false;
					}
				
					// look for a page with this primary item number
					$product = wire('pages')->get('template=product-detail,itemno_mill_standard='.$itemno_mill_standard.',include=all,limit=1');
										
					// if none is found, create a new page
					if(!$product->id){			
						$product = new Page();
						$product->template = 'product-detail';
						$product->parent = wire('pages')->get(1048);
					}
					
					// disable output formatting so that the page can be saved
					$product->of(false);
					
					// set fields
					$product->name = wire('sanitizer')->pageName($title.' '.$line[3]); // slug is never seen, so append UPC code to ensure it is unique
					$product->title = $title;
					$product->brand = ''; // TODO: Get brand data.
					$product->product_type = ''; // TODO: Determine product type.
					$product->size = $size;
					$product->uom = $uom;
					$product->uom_value = $uom_value;
					$product->weight = $weight;
					$product->itemno_mill_standard = $itemno_mill_standard;
					$product->itemno_mill_stainless = $itemno_mill_stainless;
					$product->itemno_anodized_standard = $itemno_anodized_standard;
					$product->itemno_anodized_stainless = $itemno_anodized_stainless;
					
					// set published status
					($active ? $product->removeStatus(Page::statusUnpublished) : $product->addStatus(Page::statusUnpublished));
					
					// save the page
					$product->save();
				
					// add image, save page again, and remove temporary image
					if($has_image){					
						if($product->image) $product->image->deleteAll(); // get rid of anything that's already in there
						$product->save('image'); // try saving after delete to avoid 502's
						$product->image->add($image_path);
						$product->save('image'); // save just the image field instead of the whole page
						unlink(realpath($image_path));
					}

					$result->lines_added++;
					
				}catch(Exception $e){
					$result->lines_failed++;
					wire('log')->save('import-errors', $itemno_mill_standard.': '.$e->getMessage());
				}
				
				// attempt to fix 502 errors introduced by ->deleteAll() above
				wire('pages')->uncacheAll();
				
				// debugging
				// if($result->lines_total > 10) break;
			}
		
			fclose($f);
			wire('log')->prune('import-errors', 30);
		}else{
			wire('pages')->error('Could not open '.$filepath.' for reading.');
		}
		
		return $result;
	}



	// collect output
	$out = '<h2>Import Product Data</h2>';

	// build the form
	$filepath = $config->paths->templates.'tmp/';
	
	$uploadform = $modules->get('InputfieldForm');
	$uploadform->action = './';
	$uploadform->method = 'post';
	$uploadform->attr('id+name','product-import');
	
	$field = $modules->get('InputfieldFile');
	$field->label = 'CSV File';
	$field->name = 'productdata';
	$field->attr('id','productdata');
	$field->required = 1;
	$field->extensions = 'csv txt tab';
	$field->maxFiles = 1;
	$field->overwrite = true;
	$field->descriptionRows = 0;
	$field->destinationPath = $filepath;
	$field->uploadOnlyMode = true;
	$uploadform->append($field);
	
	$submit = $modules->get('InputfieldSubmit');
	$submit->attr('value','Import');
	$submit->attr('id+name','submit');
	$uploadform->append($submit);
	
	// check for a submission
	if($input->post->submit){
		// empty the tmp directory prior to starting a new import,
		// in case there were failures on a previous attempt that
		// left files there
		array_map('unlink', glob($filepath.'*.*'));
			
		// validate the form
		$uploadform->processInput($input->post);
		
		// process the form
		$uploaded_files = $uploadform->get('productdata')->value;
		
		if(count($uploaded_files)){
			$file = $uploaded_files->first();
			$this->message('File uploaded successfully.');		
			$filepath .= $file->name;
			
			$import = importProductData($filepath);
			
			if($import->lines_failed){
				$this->error('Processed '.$import->lines_total.' lines with '.$import->lines_failed.' errors. Check the import-errors log for details.');
			}else{
				$this->message('Processed '.$import->lines_total.' lines with no errors.');
			}
			
			// delete the uploaded file
			if(unlink(realpath($filepath))){
				$this->message('Removed uploaded file.');
			}else{
				$this->error('Could not delete uploaded file.');
			}
		}else{
			$this->error('Uploaded file can\'t be found.');
		}
	}
	
	// render the form, with or without errors
	$out .= $uploadform->render(); 
	
	// display the results
	echo $out;
?>

...and for the sake of completeness, here's the contents of functions.inc:

<?php
    // Excerpts a field to $limit length
    if(!function_exists('excerpt')) {
		function excerpt($str, $limit = 400, $endstr = '…'){
			$str = strip_tags($str);
			if(strlen($str) <= $limit) return $str;
			$out = substr($str, 0, $limit);
			$pos = strrpos($out, " ");
			if($pos > 0) $out = substr($out, 0, $pos);
			return $out .= $endstr;
		}
	}

	// Check if a string begins with another string.
	function startsWith($haystack, $needle){
		 $length = strlen($needle);
		 return (substr($haystack, 0, $length) === $needle);
	}
?>

 

Share this post


Link to post
Share on other sites

I can't see anything there that would cause a problem. I would suggest trying that snippet of deleting images on its own to make sure there is nothing weird with your install. I just checked with PW 3.0.29 via the Tracy console and had no problem. If this works for you (which it should), have you tried limiting your script to just a few lines of the CSV? Also, have you tried using PHP's max_execution_time option - I use this quite often when running custom import scripts.

Screen Shot 2016-08-06 at 7.11.14 AM.png

 

Share this post


Link to post
Share on other sites

Increasing the max_execution_time value has no effect. The script still returns a 502 in the same amount of time. It does run successfully if I limit the number of rows.

I'm going to play with increasing the rows and/or skipping initial rows to try and see if it's an issue with the total amount of data, or a specific row that's causing problems.

I'll also try just running the ->deleteAll() in isolation to see if the behavior changes.

Thanks!

 

Share this post


Link to post
Share on other sites

@Jason Huck 

I don't think this will sort out your problem - but you might want to do your realpath() call earlier - before you call file_put_contents() to save the image to tmp storage. You should also check the return value of that call and not try to store the image if it failed to write it to the tmp location.

if(startsWith($image_data_raw, '0x')){
	$image_data = hex2bin(ltrim($image_data_raw, '0x'));
	$image_path = realpath(wire('config')->paths->templates.'tmp/'.$line[18]); // Realpath here
	$has_image = file_put_contents($image_path, $image_data); // Use return value to signal need to add the image
} else {
	$has_image = false;
}

...and...

// add image, save page again, and remove temporary image
if($has_image){					
	if($product->image) {
		$product->image->deleteAll(); // get rid of anything that's already in there
		$product->save('image'); // try saving after delete to avoid 502's
	}
	$product->image->add($image_path);
	$product->save('image'); // save just the image field instead of the whole page
	unlink($image_path); // Now no need to realpath() here.
}

Also, have you tried saving the entire product page rather than just the 'image' field?

// add image, save page again, and remove temporary image
if($has_image){					
	if($product->image) {
		$product->image->deleteAll();
		$product->save();
	}
	$product->image->add($image_path);
	$product->save();
	unlink($image_path);
}

 

  • Like 2

Share this post


Link to post
Share on other sites

@netcarver Actually it looks like that did sort the issue. I'll have to test some more, but I'm guessing that verifying the result of file_put_contents, which I should have been doing in the first place, is the key. Without that, the script was trying to add and remove nonexistent image files. Thanks for pointing that out!

  • Like 1

Share this post


Link to post
Share on other sites

So that was a bit of a red herring. For some reason, putting realpath in that position caused all of the file writes to fail, so it was just skipping the image field handling completely. What I'm seeing right now is that it can handle roughly 180 rows at a time. I can skip any number of rows at the beginning, and process any consecutive batch of rows successfully right up to around 180. It sometimes varies by a few rows. It's not an issue with any particular row either -- e.g., it fails at row 181, but if I skip a row it processes row 181 just fine and fails on row 182.

 

Share this post


Link to post
Share on other sites

@Jason Huck Yikes! Sorry, my mistake - I forgot that realpath() also checks for file existence and needs lots of permissions set up right. Let me try again.

$has_image = false;
if(startsWith($image_data_raw, '0x')){
	$image_data = hex2bin(ltrim($image_data_raw, '0x'));
	$image_path = wire('config')->paths->templates.'tmp/'.$line[18];
	$has_image = file_put_contents($image_path, $image_data);
}

...
// add image, save page again, and remove temporary image
if(false !== $has_image){
	if($product->image) {
		$product->image->deleteAll(); // get rid of anything that's already in there
		$product->save('image'); // try saving after delete to avoid 502's
	}
	$product->image->add($image_path);
	$product->save('image'); // save just the image field instead of the whole page
	unlink($image_path);
}

 

Share this post


Link to post
Share on other sites
55 minutes ago, netcarver said:

Is your server running php-fpm by any chance?

No. The dev server is a CentOS 7 virtual machine running stock Apache and PHP. Earlier I mistakenly said it was running behind nginx. It's actually running behind haproxy.

Share this post


Link to post
Share on other sites

@Jason Huck Hmmm. So HAProxy is in the mix.

Quote

Nothing relevant is logged in PW, or Apache or Nginx access or error logs, AFAICT. Note that it does seem to work, despite the error, but it doesn't make a great impression on clients. Any suggestions?

Ok, here's a possible scenario that might fit your observations;

If HAProxy's server connection times out before Apache finishes, you'll get a 50x error reported (not sure if 502 or 504) in the browser yet the Apache process will probably complete without error behind the dropped connection. Have you looked at the "timeout server" setting (and the other timeout settings) in your HAProxy config file (/etc/haproxy/haproxy.cfg)? I think you'll need to up that beyond where you have the timeout set for the PHP script execution.

Edited to add: Have you thought about moving this processing load into a background task? 

  • Like 1

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Recently Browsing   0 members

    No registered users viewing this page.

×
×
  • Create New...