Facebook PHP SDK vs Processwire API


jrtderonde
Hey,

I'm currently working on a script that automates the import of certain Facebook posts. The script requests all the Facebook posts (including images), loops through them and stores them in ProcessWire. I did this because I find the Facebook API very slow. A cron job will call this "hidden page" every 10-15 minutes to import the posts. The posts are then loaded directly from ProcessWire, so the API doesn't affect the loading speed of the page.

I've run into a minor issue that keeps me from importing all posts correctly. Some images (from the Facebook API) are served through some kind of "safe-image" URL, like this:

https://external.xx.fbcdn.net/safe_image.php?d=xxxxx&url=https://www.facebook.com/ads/image/?d=xxxxx

Regular images, by comparison, look like this:

https://scontent.xx.fbcdn.net/hphotos-xtp1/v/t1.0-9/s720x720/xxxxxx.jpg?oh=xxxxxx&oe=xxxxx

ProcessWire has no problem storing images from URLs like the one directly above. The "safe-images", however, cause problems: I get the following error:

Error: Exception: Unable to copy: https://external.xx.fbcdn.net/safe_image.php?d=xxxx&url=https://www.facebook.com/ads/image/?d=xxx-xxxxx-xxxxx=> C:/wamp/www/abelle/site/assets/files/1335/d_aqksuwuaixz-djme44retrvs0th_wzpccib0ojqqb9lknelwusfkjvchyncx0pywb4q8k6iac9rf25kolvgy7nczw96myatyf8u6ap4gi2_cvdj-escwqhmb8dgb.com_ads_image__d_aqksuwuaixz_djme44retrvs0th_wzpccib0ojqqb9lknelwusfkjvchyncx0pywb4q8k6iac9rf25kolvgy7nczw96myatyf8u6ap4gi2_cvdj_escwqhmb8dgbafl1ikir_cs1gmmxhwwwfo_nlxez (in C:\wamp\www\abelle\wire\core\Pagefile.php line 117)

I'm wondering if anyone has run into this problem before. I am using the following little script to store the image in the back end of my project.

foreach ($array["posts"] as $post):

		// Get the specific id
		$id = $post["id"];

		// See if a page with this ID exists yet
		$spider = $pages->find("title=$id");

		if ($spider->count() == 0):

			// Set the counter
			echo "<li>$i: New record found</li>";

			// Set all the information
			$new = new Page();
			$new->template = "news";
			$new->parent = $parent;
			$new->save(); 

			// Parse the variables, check valid if needed
			$new->title = $post["id"];
			$new->post_id = $post["id"];

			// Check for these
			if (!empty($post["message"])):
				$new->post_message = $post["message"];
			endif;

			// Check for these
			if (!empty($post["story"])):
				$new->post_title = $post["story"];
			else:
				$new->post_title = $post["id"];
			endif;

			// Declare the time (date_format() expects a DateTime object here)
			$new->post_date = date_format($post["created_time"], "d-m-Y H:i:s");

			// Check for these
			if (!empty($post["full_picture"])):

				// Grab the url
				$url = rawurldecode($post["full_picture"]);

				// Parse it to the backend
				$new->post_full_picture = $url;

				echo $url;

			endif;

			// Save the page
			$new->save(); 

		else:

			// Set the counter
			echo "<li>$i: Original record found</li>";

		endif;

		// Increment the counter
		$i++;

	endforeach;

For your information, I tried to "explode" the "safe-url" down to just the image URL it wraps. That gives me the following error:

Notice: Undefined index: extension in C:\wamp\www\abelle\wire\core\Sanitizer.php on line 314

I'm starting to get the feeling that ProcessWire can't grab this image from the given URL and store it in the back end. Does anyone know a solution?
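
For reference, a minimal sketch of pulling the wrapped URL out of a "safe-image" URL with PHP's own URL functions instead of explode(). The helper name unwrapSafeImageUrl is hypothetical, and the sketch assumes it is given the URL as the API returns it, before any rawurldecode(). Note that the unwrapped URL still has no file extension, so on its own this does not avoid the Sanitizer notice above.

```php
<?php
// Hypothetical helper: return the URL wrapped in the "url" query parameter
// of a Facebook safe_image.php link, or the input unchanged if there is none.
function unwrapSafeImageUrl(string $url): string
{
    // Everything after the first "?" of the outer URL.
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query) || $query === '') {
        return $url;
    }

    // parse_str() splits on "&" and URL-decodes each value.
    parse_str($query, $params);

    if (isset($params['url']) && is_string($params['url']) && $params['url'] !== '') {
        return $params['url'];
    }
    return $url;
}
```

Regular image URLs without a "url" parameter pass through untouched, so the helper could be applied to every $post["full_picture"] value.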


Maybe the filename is simply too long for Windows. The classic limit is 260 characters for the full path (MAX_PATH), and your image filename alone is 296 chars long...

Another issue could be http://stackoverflow.com/questions/8084172/facebook-thumbnails-issue-traced-to-safe-image-php

Also, 

d_aqksuwuaixz-djme44retrvs0th_wzpccib0ojqqb9lknelwusfkjvchyncx0pywb4q8k6iac9rf25kolvgy7nczw96myatyf8u6ap4gi2_cvdj-escwqhmb8dgb.com_ads_image__d_aqksuwuaixz_djme44retrvs0th_wzpccib0ojqqb9lknelwusfkjvchyncx0pywb4q8k6iac9rf25kolvgy7nczw96myatyf8u6ap4gi2_cvdj_escwqhmb8dgbafl1ikir_cs1gmmxhwwwfo_nlxez

doesn't really have a file extension o_O

If the images are all of the same filetype, maybe add the extension manually? Or do a cURL header check (2nd block of code here http://stackoverflow.com/a/31046363)
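
A rough sketch of the header-check idea: once the Content-Type of the response is known, it can be mapped to an extension to append to the filename. The helper name and the list of types are assumptions for illustration, not taken from the linked answer.

```php
<?php
// Hypothetical helper: map a Content-Type header value to a file extension.
// The set of types is an assumption; extend it to whatever the feed returns.
function extensionForContentType(string $contentType): ?string
{
    $map = [
        'image/jpeg' => 'jpg',
        'image/png'  => 'png',
        'image/gif'  => 'gif',
        'image/webp' => 'webp',
    ];

    // Drop parameters such as "; charset=binary" and normalise the case.
    $type = strtolower(trim(explode(';', $contentType)[0]));

    // null signals "not a known image type", so the caller can skip the file.
    return $map[$type] ?? null;
}
```

Returning null for unknown types keeps the decision with the caller, who can skip the post or log the URL rather than store a file with a guessed extension.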


You'll need to grab the image using wireUpload (or some other method if you prefer), then rename it, then you can "add" it to the image field. Take a look at this solution by Sevarf2: https://processwire.com/talk/topic/5490-hook-for-wireupload-filename-images/?p=53997

The reason is the filename not ending in an image extension - PW's sanitizer won't allow it through when directly adding.

Hope that helps.
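
A minimal sketch of the "grab, rename, then add" sequence, using plain PHP copy() as the "some other method" rather than wireUpload. Both function names are hypothetical, post_full_picture is the field from the script earlier in the thread, and the sketch assumes allow_url_fopen is enabled, the extension is already known, and the page has been saved once so it has a files directory.

```php
<?php
// Hypothetical helper: build a short, predictable filename from the post id,
// so the long redirect URL never ends up as the filename.
function localImageName(string $postId, string $extension): string
{
    $safeId = preg_replace('/[^a-z0-9_-]+/i', '_', $postId);
    return strtolower($safeId . '.' . ltrim($extension, '.'));
}

// Sketch: download to a temp file under that name, then add it to the page.
// $page is assumed to be an already saved ProcessWire Page.
function attachRemoteImage($page, string $url, string $postId, string $extension): bool
{
    $path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . localImageName($postId, $extension);

    // copy() over HTTP needs allow_url_fopen; failure returns false.
    if (!@copy($url, $path)) {
        return false;
    }

    $page->of(false);                      // turn output formatting off before editing
    $page->post_full_picture->add($path);  // add the renamed local file to the image field
    $page->save();
    unlink($path);                         // remove the temp copy

    return true;
}
```

Because the file is added from a local path that already ends in an image extension, the sanitizer only ever sees the short name.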


About the filename length and the missing file extension: I believe that this is indeed part of the solution. I will check, thanks!


Thanks for your input on grabbing and renaming the image first. Out of 25 images, 23 are stored correctly (those with a valid URL that actually has an extension at the end). The other two are links that redirect the user when the URL is visited. ProcessWire is able to store images directly from a URL, so that's a plus; I am only facing a problem with these "redirected" URLs.


I found that the image URL is actually a 302 Redirect.

>>> https://www.facebook.com/ads/image/?d=AQKSUWuaIXz-dJmE44rETRVS0tH_wzPCCIb0oJQqB9LKNeLWUSFkJvcHYNCx0pywb4q8k6iaC9RF25kOlvgY7NCzW96myAtYf8u6Ap4gi2_CVdJ-eScWQhmB8dGbAfl1iKiR_CS1gMMxHwWWfo_nLxeZ

> --------------------------------------------
> 302 Found
> --------------------------------------------

Status: 302 Found
Code: 302
Location: https://scontent.xx.fbcdn.net/hads-xfa1/t45.1600-4/12480837_6037580358670_471536793_n.png
P3P: CP="Facebook does not have a P3P policy. Learn why here: http://fb.me/p3p"
X-Frame-Options: DENY
Content-Type: image/png
X-Content-Type-Options: nosniff
X-XSS-Protection: 0
Strict-Transport-Security: max-age=15552000; preload
Public-Key-Pins-Report-Only: max-age=500; pin-sha256="WoiWRyIOVNa9ihaBciRSC7XHjliYS9VwUGOIud4PB18="; pin-sha256="r/mIkG3eEpVdm+u/ko/cwxzOMo1bk4TyHIlByibiA5E="; pin-sha256="q4PO2G2cbkZhZ82+JgmRUyGMoAeozA+BSXVXQWB8XWQ="; report-uri="http://reports.fb.com/hpkp/"
Pragma: no-cache
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Cache-Control: private, no-cache, no-store, must-revalidate
Access-Control-Allow-Origin: *
Set-Cookie: reg_ext_ref=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; domain=.facebook.com; httponly
Vary: Accept-Encoding
X-FB-Debug: n+c81EJaLrZR8h4YdM/RGKNqGl8MN/vv8rPkQiyKQnA/KPlV1vMuWMwF2TkWiNTDrj/0CdHxzjCSvAA7G1Qpow==
Date: Tue, 23 Feb 2016 10:57:37 GMT
Connection: close
Content-Length: 0

Hmm, I think it will be impossible to retrieve the image, as the URL redirects (302) to a different location. I haven't been able to grab the final image URL after trying several methods (PHP, cURL).

Has anyone experienced this specific problem before? 


hmm, can you give us a real problematic image URL?

both https://scontent-fra3-1.xx.fbcdn.net/hads-xfa1/t45.1600-4/12480837_6037580358670_471536793_n.png and https://scontent.xx.fbcdn.net/hads-xfa1/t45.1600-4/12480837_6037580358670_471536793_n.png seem OK here - status 200, not 302.

maybe experiment with different CURL options...

        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS => 1,

This should IMHO be filed as a bug. The problem here is that Pagefiles::cleanBasename doesn't account for this kind of URL, which has no trailing filename. It simply splits the string at the last dot in the URL when it should only look for dots after the last path separator (and perhaps also account for a reasonable maximum extension length, but that's another thing). This is the relevant code:

        $dot = strrpos($basename, '.');
        $ext = $dot ? substr($basename, $dot) : '';

It should (untested) be sufficient to change it to:

        $slash = strrpos($basename, '/');
        $dot = strrpos($basename, '.', $slash);
        $ext = $dot ? substr($basename, $dot) : '';
        if(strlen($ext) > 10) $ext = substr($ext, 0, 10); // Not sure what a reasonable default would be here
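
The same idea as a standalone function that can be tried outside ProcessWire. This is only an illustration of the proposal, not the core code: the function name is hypothetical, and the 10-character cap is the same guess as above.

```php
<?php
// Hypothetical standalone version of the proposed logic: only look for the
// extension dot after the last "/" and cap the extension length.
function extensionAfterLastSlash(string $basename, int $maxLength = 10): string
{
    // Keep only the part after the last path separator (or everything if none).
    $slash = strrpos($basename, '/');
    $tail = (string) substr($basename, $slash === false ? 0 : $slash + 1);

    // No dot, or a leading dot only, counts as "no extension".
    $dot = strrpos($tail, '.');
    if ($dot === false || $dot === 0) {
        return '';
    }

    // Return the extension including the dot, truncated to the cap.
    return substr(substr($tail, $dot), 0, $maxLength);
}
```

With the redirecting URL from this thread the function returns an empty string instead of a huge bogus "extension", while the scontent URL still yields ".png".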