
johnstephens

Members
  • Content Count

    50
  • Joined

  • Last visited

Community Reputation

27 Excellent

About johnstephens

  • Rank
    Full Member

Recent Profile Visitors

5,167 profile views
  1. Don't use that code. I found a better way.

     What I discovered was that using the Iterator interface from PHP's standard library did not cause the same problems as DOMNodeList. I still can't account for why calling a function inside my foreach block caused the DOMNodeList to skip alternate nodes, but using an Iterator seems to just work.

     Unfortunately, there's no Iterator that deals directly with DOM nodes, and you can't feed a DOMNodeList to any Iterator's constructor. That proved simple enough to solve by converting the DOMNodeList to an array first:

         function array_from($listable) {
             $new_array = [];
             foreach($listable as $item) {
                 $new_array[] = $item;
             }
             return $new_array;
         }

     Once I had an array, I could feed it to a new ArrayIterator, and then use foreach to reliably do stuff to each DOMNode item.

     Here's a sample of what it looks like in action. Since I'm importing content from Textpattern, the source includes images as HTML img elements as well as a variety of Textpattern tags (including txp:image tags and some smd_macros):

         // Regular HTML img elements
         $img_elements = new \ArrayIterator(array_from($dom->getElementsByTagName('img')));

         // txp:image tags
         $images = new \ArrayIterator(array_from($dom->getElementsByTagName('image')));

         // smd_macro called txp:image_hd
         $hd_images = new \ArrayIterator(array_from($dom->getElementsByTagName('image_hd')));

         // smd_macro called txp:picture
         $pictures = new \ArrayIterator(array_from($dom->getElementsByTagName('picture')));

         // Combine them all using AppendIterator
         $source_images = new \AppendIterator();
         $source_images->append($img_elements);
         $source_images->append($images);
         $source_images->append($hd_images);
         $source_images->append($pictures);

         // Do the stuff that needs to be done to each DOMNode item
         foreach ($source_images as $image) {
             // Handle a single image here…
         }

     I hope this helps someone in the future! Or maybe me, if I have to solve a similar problem again…
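For comparison, here is a minimal sketch of the same snapshot idea using PHP's built-in `iterator_to_array()`, which accepts any Traversable, including a DOMNodeList. The sample markup is hypothetical, and `removeChild()` only stands in for whatever the real per-image handler does; this is not the code from the post.

```php
<?php
// Hypothetical sample markup standing in for a Textpattern article body.
$dom = new DOMDocument();
$dom->loadXML('<body><img src="a.jpg"/><image name="b.jpg"/><image name="c.jpg"/></body>');

// iterator_to_array() copies the DOMNodeList into a plain array,
// doing the same job as the hand-written array_from() helper.
$img_elements = iterator_to_array($dom->getElementsByTagName('img'));
$images = iterator_to_array($dom->getElementsByTagName('image'));

// With plain arrays, array_merge() can stand in for AppendIterator.
$source_images = array_merge($img_elements, $images);

$seen = [];
foreach ($source_images as $image) {
    $seen[] = $image->nodeName;
    // Stand-in for the real handler: removing the node does not disturb
    // the loop, because the loop walks a fixed copy of the list.
    $image->parentNode->removeChild($image);
}
```

Because the loop walks a fixed array rather than the node list itself, every element is visited once even though each one is removed from the document along the way.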
  2. I think I've solved the problem. I have no idea why this is necessary, but running the foreach block inside a recursive function seems to rapidly pick up all the images:

         function add_to_page_recursor($images, $image_prefix, $all_images, $newpage, $dom) {
             foreach($images as $image) {
                 handle_picture($image, $image_prefix, $all_images, $newpage, $dom);
                 $count = $images ? $images->count() : 0;
                 if ($count > 0) add_to_page_recursor($images, $image_prefix, $all_images, $newpage, $dom);
             }
         }

     One obstacle to this solution is that DOMNodeList does not have the count() method before PHP 7.2, so this code requires PHP 7.2+. But for my publication server, it works.

     Now I just need to refactor the handle_picture function to handle all the variations of images I'm importing, but that should be simple.

     If anyone can shed any light on why the foreach block would be skipping images in the source that it can pick up in iterative passes, I'd love to better understand what's going on here. Thank you!
  3. Is this a known feature of PHP, that a function inside a foreach block can just blot out everything else happening inside the block for that iteration? The handle_picture() function works perfectly fine on the odd-numbered iterations (even array indices), no matter what image it is processing. And it fails on every even-numbered iteration (odd array indices). If I shuffle the source order, I get the same odd/even success/failure breakdown. So it's not choking on specific images, just whatever image happens to fall on even iterations. And then, it just ignores the whole iteration without an error or any indication.
  4. Thank you, @DV-JF! I had Tracy installed already, so this was a simple next step. Unfortunately, it confirms something I knew already without giving me new information.

         // Get txp:image tags
         $images = $dom->getElementsByTagName('image');

         $list_a = [];
         $i = 0;
         // Iterate through all the images and just add their names to the $list_a array
         foreach($images as $image) {
             $list_a[] = $i . ' => ' . $image->getAttribute('name');
             $i++;
         }

         $list_b = [];
         $j = 0;
         // Iterate through all the images AGAIN:
         // Add their names to the $list_b array, AND
         // Try to import them with the handle_picture function
         foreach($images as $image) {
             $list_b[] = $j . ' => ' . $image->getAttribute('name');
             $j++;
             handle_picture($image, $image_prefix, $all_images, $newpage, $dom);
         }

         bd($list_a);
         bd($list_b);

     What I'm seeing in the bd dumps is exactly what I said above: every alternate image item is being skipped in the second foreach block. Completely skipped: their names don't get added to the array, and the variable $j doesn't increment. It's not just that the handle_picture() function chokes on them. Or rather, when the handle_picture() function doesn't work, $list_b and $j don't get any information either.

     Here is the output of my bd dumps. First $list_a:

         array (6)
         0 => "0 => image_1.jpg" (16)
         1 => "1 => image_2.jpg" (16)
         2 => "2 => image_3.jpg" (16)
         3 => "3 => image_4.jpg" (16)
         4 => "4 => image_5.jpg" (16)
         5 => "5 => image_6.jpg" (16)

     …and $list_b:

         array (3)
         0 => "0 => image_1.jpg" (16)
         1 => "1 => image_3.jpg" (16)
         2 => "2 => image_5.jpg" (16)

     Likewise, if I bd() anything at all inside the handle_picture() function definition, Tracy only shows me the output for every other image, i.e. the items that got added to $list_b above.

     This doesn't get me any closer to seeing what's going on. What am I missing? Thanks in advance for any guidance you can offer!
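One possible explanation, offered as a sketch rather than a diagnosis of the actual script: the list returned by `getElementsByTagName()` is live, so it shrinks whenever a matching element is replaced or removed. If `handle_picture()` swaps each txp:image tag for an img element during the loop, the remaining tags shift down one position while the loop's internal position moves up one, and every second tag is passed over. The markup and the `replaceChild()` stand-in below are assumptions for illustration only.

```php
<?php
// Hypothetical document with six txp:image-style tags.
$dom = new DOMDocument();
$xml = '<body>';
for ($n = 1; $n <= 6; $n++) {
    $xml .= "<image name=\"image_{$n}.jpg\"/>";
}
$dom->loadXML($xml . '</body>');

// A live list: it always reflects the current state of the document.
$images = $dom->getElementsByTagName('image');

$visited = [];
foreach ($images as $image) {
    $visited[] = $image->getAttribute('name');

    // Stand-in for handle_picture(): swap the tag for an img element.
    $img = $dom->createElement('img');
    $img->setAttribute('src', $image->getAttribute('name'));
    $image->parentNode->replaceChild($img, $image);
}

// Some tags were never visited and are still in the document.
echo count($visited), " visited, ", $images->length, " left over\n";
```

Under this assumption, the first loop in the post sees all six tags because it changes nothing, while the second loop sees fewer because each replacement shortens the list it is walking.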
  5. Oops, the code snippet above should conclude with this: if ($i) echo "<pre>Second count: I counted <b>{$j}</b> txp:image tags in this document.\n\n</pre>"; Not that it matters a lot—it's just part of my troubleshooting.
  6. Hi, @adrian! (and anyone else who reads this)

     I'm running into a problem, and I wonder if there's some simple way to solve it. I found that my import script was failing to import images from the content. So I added this to the script so that I could see what was going on:

         $i = 0;
         foreach($dom->getElementsByTagName('image') as $image) {
             $i++;
         }
         if ($i) echo "<pre>First count: I counted <b>{$i}</b> txp:image tags in this document.\n</pre>";

         $j = 0;
         foreach($dom->getElementsByTagName('image') as $image) {
             $j++;
             // Code that creates img tag from txp:image, adds image src to ProcessWire page, and replaces txp:image tag with img
         }

     The first foreach block just counts the number of txp:image tags in the body, so I can print it out afterward. The second block counts the same elements AGAIN, while also running code to import the images into the current ProcessWire page. Then it prints out the second count, for comparison with the first.

     When an article has just 1 image, the two counts match: 1 image was found, 1 was imported. When the article has more than that, the second foreach block appears to skip every alternate image.

     My hunch is that the script gets stuck when importing an image, and that's why it only imports images 1, 3, 5, …. If that's the actual choke point, how could I find out? Is there an obvious workaround? Is there a way to make the image import function asynchronous? Some other solution?

     Thanks for any guidance or suggestions you can offer!
  7. That should solve the ProcessWire API side of the problem! Now I have to figure out how to join the comments into the record returned as `$article` in my code above… Is that a LEFT JOIN?
  8. Thank you, @elabx! I'll look this over right away.
  9.     $articles = database()->query('
             SELECT * FROM textpattern WHERE section="blog"
         ');
         foreach($articles as $article) {
             $newArticle = new Page();
             $newArticle->template = 'blog-post';
             $newArticle->parent = pages()->get('/blog');
             $newArticle->title = $article['Title'];
             $newArticle->body = $article['Body_html'];
             // Add other content from query to $newArticle->{fields} here
             $newArticle->save();
         }

     I'm using something like the above to connect to a database, grab articles, and import them into ProcessWire.

     The source database has comments for these articles in a table called `txp_discuss`, which uses the `parentid` field to link comments to their parent article's `id`. I know I can associate that info with my `$article` items by editing the MySQL query, but I don't know how to pipe it into a ProcessWire Comments field called `comments` that I added to the `blog-post` template I'm using.

     Does anyone have guidance to offer, having done this before? Thank you in advance!
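As a rough sketch of the query side only: a LEFT JOIN returns one row per article/comment pair (and a single row with NULL comment columns for an article without comments), so the rows need regrouping before each article can be saved once. The column names `ID`, `discussid`, `name`, and `message`, and the helper `group_comments()`, are assumptions for illustration; only `Title`, `txp_discuss`, and `parentid` come from the post. Writing the grouped comments into the `comments` field is deliberately left out.

```php
<?php
// Hypothetical rows, shaped like the result of a query such as:
//   SELECT a.ID, a.Title, d.discussid, d.name, d.message
//   FROM textpattern a
//   LEFT JOIN txp_discuss d ON d.parentid = a.ID
// Column names other than Title and parentid are assumptions.
$rows = [
    ['ID' => 1, 'Title' => 'First post', 'discussid' => 10, 'name' => 'Ann', 'message' => 'Nice!'],
    ['ID' => 1, 'Title' => 'First post', 'discussid' => 11, 'name' => 'Bob', 'message' => 'Agreed.'],
    ['ID' => 2, 'Title' => 'Second post', 'discussid' => null, 'name' => null, 'message' => null],
];

// Collapse the joined rows into one entry per article, each with a list of comments.
function group_comments(array $rows): array {
    $articles = [];
    foreach ($rows as $row) {
        $id = $row['ID'];
        if (!isset($articles[$id])) {
            $articles[$id] = ['Title' => $row['Title'], 'comments' => []];
        }
        // A LEFT JOIN yields NULL comment columns for articles without comments.
        if ($row['discussid'] !== null) {
            $articles[$id]['comments'][] = [
                'name' => $row['name'],
                'message' => $row['message'],
            ];
        }
    }
    return $articles;
}

$articles = group_comments($rows);
```

Each entry in `$articles` could then drive one pass of the import loop, with its `comments` list handled after the page itself has been saved.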
  10. Thank you, @adrian! I don't understand what the $np variable references. Is it the current ProcessWire page instance?
  11. Thanks! I feel very foggy on how to do that. Could you direct me to an appropriate code example? I'll look into that! I'm used to dealing with the DOM in JavaScript, but with PHP I'm not so savvy. DOMDocument looks like a great fit! Thank you!
  12. I'm working on a script for importing very old static HTML files into ProcessWire so they are searchable on the new site. What I have so far works, but I wonder if there are ways I can make the work of cleaning up the imported content easier, by doing more useful cleanup during the import.

     For this demo, suppose all the files exist in one directory, called "public", and suppose we're importing them all into the basic-page template. At this point, the basic-page template has been modified from the blank profile to include one additional textarea field called "body", which uses the CKEditor.

         <?php
         include './path/to/processwire/index.php';

         // Use FilesystemIterator to list all the files in the 'public' directory
         // https://www.php.net/manual/en/class.filesystemiterator.php
         $files = new FilesystemIterator('./public');

         // This is a callback function for the CallbackFilterIterator below
         $is_html_file = function($file) {
             // strpos() can return 0 for a match at position 0, so compare against false
             return strpos($file->getFilename(), '.htm') !== false;
         };

         // Use CallbackFilterIterator to winnow the files down to only HTML files
         // https://www.php.net/manual/en/class.callbackfilteriterator.php
         $html_files = new CallbackFilterIterator($files, $is_html_file);

         // Input a regular expression and a string -> output an array of matches
         $preg_matches = function($regex, $string) {
             preg_match($regex, $string, $array);
             return $array;
         };

         // Iterate over the directory objects stored in $html_files
         foreach($html_files as $file) {
             // Turn this file into a SplFileObject so we can read its contents
             // https://www.php.net/manual/en/class.splfileobject.php#splfileobject.constants.drop-new-line
             $_file = new SplFileObject($file->getPathname());
             $contents = $_file->fread($_file->getSize());

             // Use the first h1 as the title, or fall back to the title element
             $h1_content = $preg_matches('/\<h1\>(.*?)\<\/h1\>/i', $contents)[1] ?? false;
             $title_content = $preg_matches('/\<title\>(.*?)\<\/title\>/i', $contents)[1] ?? '';

             // Create a new ProcessWire page and save the content into it
             $article = new \ProcessWire\Page();
             $article->parent = $pages->get('/');
             $article->template = 'basic-page';
             $article->title = $h1_content !== false ? $h1_content : $title_content;
             $article->body = $contents;
             $article->save();
         }

     This successfully titles all the pages that have at least one h1 tag. (I know this is making a big assumption of proper markup, but it appears to be broadly correct in this one case.) The rest of the content is dumped into the page's body field. If this helps anyone else solve a similar problem, have the code! (WTFPL)

     But when one is dealing with archaic HTML using font tags and tables for layout (yeek!), this leaves much room for improvement. Something I'd like to do is get rid of all the layout tables and site furniture, like branding markup, navigation, and footer text. Of course, that is not marked up in a consistent way across all the documents. 😉

     I wonder if anyone has guidance for something like this? Do you know of any best practices for automating the cleanup of old HTML? Thank you!

     Edit: When searching for HTML tags, matches should be case insensitive (using the i flag after the delimiter). Also, use the content of the title element when there is no h1 tag on the page. This is all fixed in the code above.
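One direction that might help with the cleanup, sketched under assumptions: parse each file with DOMDocument instead of regular expressions, read the title from the first h1 (falling back to the title element), and unwrap presentational tags such as font while keeping their contents. The sample markup and the helper `first_text()` are hypothetical, and real layout tables would need per-site rules that this sketch does not attempt.

```php
<?php
// Hypothetical sample of archaic markup.
$html = '<html><head><title>Fallback title</title></head>'
      . '<body><h1>Old <i>page</i></h1>'
      . '<p><font face="Arial">Text</font></p></body></html>';

$dom = new DOMDocument();
// Old markup rarely validates, so collect libxml parse warnings quietly.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Return the trimmed text of the first element with this tag name, or null.
function first_text(DOMDocument $dom, string $tag): ?string {
    $node = $dom->getElementsByTagName($tag)->item(0);
    return $node ? trim($node->textContent) : null;
}

// Prefer the first h1; fall back to the title element.
$title = first_text($dom, 'h1') ?? first_text($dom, 'title');

// Unwrap font tags: move their children up, then drop the empty element.
// The list is copied to an array first so that removals cannot disturb the loop.
foreach (iterator_to_array($dom->getElementsByTagName('font')) as $font) {
    while ($font->firstChild) {
        $font->parentNode->insertBefore($font->firstChild, $font);
    }
    $font->parentNode->removeChild($font);
}

$paragraph = first_text($dom, 'p');
```

Reading `textContent` also drops inline markup nested inside the heading, which the regular-expression approach would carry into the page title.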
  13. I have a site operating on Textpattern CMS. I still love Textpattern, which has served us well for many years, but the needs of this site have steadily grown beyond the limits of what Textpattern can do comfortably. I'd like to transition the site to ProcessWire, but the project doesn't have a budget for a full redesign right now. I'm wondering if it's feasible to install ProcessWire now and create a limited number of pages managed by ProcessWire, then seamlessly redirect all requests for pages not managed by ProcessWire to Textpattern. Then, over time, I'd like to migrate the remaining templates and content to ProcessWire so that more and more of the site is being managed by ProcessWire, until the Textpattern site is no longer needed. Has anyone else done something like this? What steps did you take to minimize any possible disruption? How would you recommend setting something like this up? Thanks in advance for any guidance you can offer!
  14. That appears to have fixed the issue! I can log in after uncommenting the RewriteRule and the RewriteBase / line. Thank you! The installation was the latest stable version as of yesterday, ProcessWire 3.0.123. And the server runs PHP 7.2.17. I appreciate the help! Is there anything else I should look out for when running ProcessWire on a subdomain with this PHP version? Thank you again!