John W.

How To: Simple Sitemap.xml Generator That Optionally Excludes Children

SYNOPSIS

A short guide to generating a sitemap.xml using a script that (I believe) Ryan originally wrote, with one addition: you can optionally exclude a page's children from the sitemap.xml output.

I was looking back at a small project today in which I used a PHP script to generate the XML file; I believe Ryan wrote the original. I needed a quick fix that would let me optionally exclude the children of certain pages from the sitemap.xml output.

OVERVIEW

A good example is a site where visiting /minutes/ displays a list of board meetings, each with a title, date, description and a link to download the .pdf file.

I have a template called minutes and a template called minutes-document. The first page, minutes, when loaded via /minutes/ simply grabs all of its child pages and outputs the name, description and actual path of an uploaded .pdf file for a visitor to download.

In my back-end I have the templates MINUTES and MINUTES-DOCUMENT:

[Screenshot]

So, basically, their employee can log in, hover over Minutes, click New, then create a new (child) record named for the date of the meeting, e.g. June 3rd, 2016:

[Screenshot]
 

---------------------------

OPTIONALLY EXCLUDING CHILDREN - SETUP

This section covers outputting the sitemap.xml and optionally excluding the children of pages that use a given template.

The setup of the original script is as follows:

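1. Save the file to /site/templates/ as sitemap-xml.php (the filename given in the script's header comment).

2. Create a template called sitemap-xml that uses the sitemap-xml.php file. Under the "URLs" section, set it to NOT use trailing slashes.

3. Create a page called sitemap.xml at the root level using the sitemap-xml template.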

 

Now, with that done, you only need a couple of slight modifications to let the script exclude a page's children from the sitemap.xml output:

1. Create a new checkbox field and name it: sitemap_exclude_children

2. Add the field to each template whose pages should control whether their children are included in or excluded from the sitemap. In my example I added it to my "minutes" template.

3. Next, go to a page that uses a template with the field you added above. In my case, that is "MINUTES".

4. Enable the checkbox to exclude children; leave it unchecked to include them.

For example, in my MINUTES page I enabled the checkbox, and now when /sitemap.xml is loaded the children of the MINUTES page do not appear in the file.

[Screenshot]

 

A SIMPLE CONDITIONAL TO CHECK THE "sitemap_exclude_children" VALUE

This was a pretty easy modification to an existing script, adding only one line. I just figure there may be others out there using this script with the same needs.

I simply inserted the if condition as the first line in the function:

function renderSitemapChildren(Page $page) { 
	if($page->sitemap_exclude_children) return "";

...
...
...
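
To see what that one line does without installing anything, here is a minimal stand-alone sketch. It is not the real script: StubPage is a hypothetical stand-in for ProcessWire's Page class, collectChildUrls() is a made-up name, and it walks the tree depth-first where the original queues parents and recurses afterwards. The guard itself is the same idea.

```php
<?php
// Hypothetical stand-in for ProcessWire's Page class so the sketch runs on its own.
class StubPage {
    public $url;
    public $sitemap_exclude_children = 0; // checkbox field: 1 when checked
    public $children = [];

    public function __construct(string $url, array $children = [], int $exclude = 0) {
        $this->url = $url;
        $this->children = $children;
        $this->sitemap_exclude_children = $exclude;
    }
}

// Same idea as renderSitemapChildren(), reduced to collecting URLs (assumed helper name).
function collectChildUrls(StubPage $page): array {
    // The guard: a checked box on the parent skips its whole subtree.
    if ($page->sitemap_exclude_children) return [];

    $urls = [];
    foreach ($page->children as $child) {
        $urls[] = $child->url;
        $urls = array_merge($urls, collectChildUrls($child));
    }
    return $urls;
}

// MINUTES has the box checked; ABOUT does not.
$minutes = new StubPage('/minutes/', [new StubPage('/minutes/june-3rd-2016/')], 1);
$about   = new StubPage('/about/', [new StubPage('/about/team/')]);
$home    = new StubPage('/', [$minutes, $about]);

print_r(collectChildUrls($home));
```

Note that /minutes/ itself is still listed, because it is output by its own parent's loop. Only the pages below it are dropped.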

 

THE FULL SCRIPT WITH MODIFICATION

<?php 

/**
 * ProcessWire Template to power a sitemap.xml 
 *
 * 1. Copy this file to /site/templates/sitemap-xml.php
 * 2. Add the new template from the admin.
 *    Under the "URLs" section, set it to NOT use trailing slashes.
 * 3. Create a new page at the root level, use your sitemap-xml template
 *    and name the page "sitemap.xml".
 *
 * Note: hidden pages (and their children) are excluded from the sitemap.
 * If you have hidden pages that you want to be included, you can do so 
 * by specifying the ID or path to them in an array sent to the
 * renderSitemapXML() function at the bottom of this file. For instance:
 *
 * echo renderSitemapXML(array('/hidden/page/', '/another/hidden/page/')); 
 * 
 * Patch to prevent pages from including children in the sitemap when a field is checked / johnwarrenllc.com
 * 1. Create a checkbox field named sitemap_exclude_children.
 * 2. Add the field to the parent template(s) you plan to use.
 * 3. When a page is created with this template, checking the field will prevent its children from being included in the sitemap.xml output.
 */

function renderSitemapPage(Page $page) {

	return 	"\n<url>" . 
		"\n\t<loc>" . $page->httpUrl . "</loc>" . 
		"\n\t<lastmod>" . date("Y-m-d", $page->modified) . "</lastmod>" . 
		"\n</url>";	
}

function renderSitemapChildren(Page $page) { 

	if($page->sitemap_exclude_children) return ""; /* Added to exclude CHILDREN if field is checked */

	$out = '';
	$newParents = new PageArray(); 
	$children = $page->children; 
	
	foreach($children as $child) {
		$out .= renderSitemapPage($child);
		if($child->numChildren) $newParents->add($child); 
			else wire('pages')->uncache($child); 
	}

	foreach($newParents as $newParent) {
		$out .= renderSitemapChildren($newParent); 
		wire('pages')->uncache($newParent); 
	}

	return $out; 
}

function renderSitemapXML(array $paths = array()) {

	$out = 	'<?xml version="1.0" encoding="UTF-8"?>' . "\n" . 
		'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';

	array_unshift($paths, '/'); // prepend homepage

	foreach($paths as $path) {
		$page = wire('pages')->get($path); 
		if(!$page->id) continue; 
		$out .= renderSitemapPage($page);
		if($page->numChildren) { $out .=  renderSitemapChildren($page); }
	}

	$out .= "\n</urlset>";

	return $out; 
}

header("Content-Type: text/xml");
echo renderSitemapXML(); 
// Example: echo renderSitemapXML(array('/hidden/page/')); 
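
If you want to check the shape of the output without a ProcessWire install, something like the following may help. It is only a sketch: renderSitemapEntry() is a hypothetical helper that takes plain values instead of a Page, the URL and timestamp are made up, and the htmlspecialchars() call is my own addition (the original script outputs $page->httpUrl as-is), included because the sitemap protocol expects URLs to be entity-escaped.

```php
<?php
date_default_timezone_set('UTC'); // fixed zone so the example date is stable

// Hypothetical helper: same markup as renderSitemapPage(), but fed plain values
// so it can run without ProcessWire. The escaping is an assumption, not in the original.
function renderSitemapEntry(string $httpUrl, int $modified): string {
    return "\n<url>" .
        "\n\t<loc>" . htmlspecialchars($httpUrl, ENT_QUOTES | ENT_XML1, 'UTF-8') . "</loc>" .
        "\n\t<lastmod>" . date("Y-m-d", $modified) . "</lastmod>" .
        "\n</url>";
}

// Made-up URL; 1515974400 is 2018-01-15 00:00:00 UTC.
$xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n" .
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' .
    renderSitemapEntry('https://example.com/minutes/', 1515974400) .
    "\n</urlset>";

echo $xml;
```

Each page ends up as one url element holding a loc and a lastmod, which is all the script above emits per page.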

 

In conclusion, I have used a couple of different ProcessWire sitemap-generating modules, but for my needs the above script is fast and easy to set up and modify.

- Thanks

 
