
How To: Simple Sitemap.xml Generator That Optionally Excludes Children


John W.


SYNOPSIS

A short guide to generating a sitemap.xml with a script that (I believe) Ryan originally wrote, modified so that child pages can optionally be excluded from the output.

I was looking back at a small project today that uses this PHP script to generate the XML file. I needed a quick fix that would let me optionally exclude the children of certain pages from the sitemap.xml output.

OVERVIEW

A good example is a site where visiting /minutes/ displays a list of board meetings, each with a title, date, description and a link to download the .pdf file.

I have a template called minutes and a template called minutes-document. The first page, minutes, when loaded via /minutes/ simply grabs all of its child pages and outputs the name, description and actual path of an uploaded .pdf file for a visitor to download.

In my back end I have the templates MINUTES and MINUTES-DOCUMENT:

[Screenshot]


So, basically, an employee can log in, hover over MINUTES, click New, then create a new (child) record named after the date of the meeting, e.g. June 3rd, 2016:

[Screenshot]
 

---------------------------

OPTIONALLY EXCLUDING CHILDREN - SETUP

This section covers outputting the sitemap.xml and optionally excluding the children of pages that use a given template.

The setup of the original script is as follows:

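1. Save the file to /site/templates/ as sitemap-xml.php, so the file name matches the template name

2. Create a template called sitemap-xml that uses the sitemap-xml.php file. Under the template's "URLs" section, set it to NOT use trailing slashes.

3. Create a page at the root level called sitemap.xml using the sitemap-xml template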

 

With that done, only a couple of small changes are needed to let the script exclude the children of a page from the sitemap.xml output:

1. Create a new checkbox field and name it: sitemap_exclude_children

2. Add the field to each template whose pages should control whether their children are included in or excluded from the sitemap. In my example I added it to my "minutes" template.

3. Next, go to a page that uses a template with the field you added above. In my case, "MINUTES"

4. Enable the checkbox to exclude children; leave it unchecked to include them.

For example, on my MINUTES page I enabled the checkbox, and now when /sitemap.xml is loaded the children of MINUTES no longer appear in the file.

[Screenshot]
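If you want to double-check the result rather than eyeball the XML, something like the following might help. It is only a rough sketch: the function name sketchFindChildUrls, the example.com URLs and the sample XML are all made up for illustration, and it assumes PHP's SimpleXML extension is available. It parses a sitemap string and returns any URL whose path sits below a given parent path.

```php
<?php
// Hypothetical sample of what the generated sitemap might look like
// (example.com is a placeholder, not a real site from this post).
$sampleXml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
	<loc>https://example.com/</loc>
	<lastmod>2018-01-15</lastmod>
</url>
<url>
	<loc>https://example.com/about/</loc>
	<lastmod>2018-01-15</lastmod>
</url>
<url>
	<loc>https://example.com/minutes/</loc>
	<lastmod>2018-01-15</lastmod>
</url>
</urlset>
XML;

// Return every <loc> whose path is below $parentPath (the parent itself is allowed).
function sketchFindChildUrls(string $xml, string $parentPath): array {
	$doc = simplexml_load_string($xml);
	if ($doc === false) return [];

	$found = [];
	foreach ($doc->url as $url) {
		$loc = (string) $url->loc;
		$path = (string) parse_url($loc, PHP_URL_PATH);
		// A child path starts with the parent path but is not equal to it.
		if ($path !== $parentPath && strpos($path, $parentPath) === 0) {
			$found[] = $loc;
		}
	}
	return $found;
}

$leaks = sketchFindChildUrls($sampleXml, '/minutes/');
echo $leaks
	? "Still listed:\n" . implode("\n", $leaks) . "\n"
	: "No children of /minutes/ in the sitemap\n";
```

To run it against a real site you would replace $sampleXml with the contents of your own /sitemap.xml, for instance fetched with file_get_contents().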

 

A SIMPLE CONDITIONAL TO CHECK THE "sitemap_exclude_children" VALUE

This was a pretty easy modification to the existing script: it adds only one line. I figure others using this script may have the same need.

I simply inserted the if condition as the first line of the renderSitemapChildren() function:

function renderSitemapChildren(Page $page) { 
	if($page->sitemap_exclude_children) return "";

...
...
...
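To see what that one-line return does without a ProcessWire install, here is a minimal sketch that uses plain arrays in place of Page objects. The function name sketchRenderChildren, the array keys and the sample URLs are made up for illustration and are not part of the original script. It also walks the tree depth-first, so the order of the URLs may differ from what the real script produces.

```php
<?php
// Hypothetical stand-in for the real function: each "page" is a plain array
// with a 'url', an optional 'children' list and an optional checkbox flag.
function sketchRenderChildren(array $node): array {
	// Same idea as the patch: a checked flag skips the whole subtree.
	if (!empty($node['sitemap_exclude_children'])) return [];

	$urls = [];
	foreach ($node['children'] ?? [] as $child) {
		// The child itself is always listed...
		$urls[] = $child['url'];
		// ...and its own children are listed unless the child is flagged.
		$urls = array_merge($urls, sketchRenderChildren($child));
	}
	return $urls;
}

// Made-up page tree: MINUTES has the checkbox enabled, ABOUT does not.
$home = [
	'url' => '/',
	'children' => [
		['url' => '/about/', 'children' => [
			['url' => '/about/staff/'],
		]],
		['url' => '/minutes/', 'sitemap_exclude_children' => 1, 'children' => [
			['url' => '/minutes/june-3rd-2016/'],
		]],
	],
];

$urls = array_merge([$home['url']], sketchRenderChildren($home));
print_r($urls);
```

Note that /minutes/ itself is still listed; only the pages beneath it are dropped, because the check runs on the parent before its children are looped over.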

 

THE FULL SCRIPT WITH MODIFICATION

<?php 

/**
 * ProcessWire Template to power a sitemap.xml 
 *
 * 1. Copy this file to /site/templates/sitemap-xml.php
 * 2. Add the new template from the admin.
 *    Under the "URLs" section, set it to NOT use trailing slashes.
 * 3. Create a new page at the root level, use your sitemap-xml template
 *    and name the page "sitemap.xml".
 *
 * Note: hidden pages (and their children) are excluded from the sitemap.
 * If you have hidden pages that you want to be included, you can do so 
 * by specifying the ID or path to them in an array sent to the
 * renderSitemapXML() function at the bottom of this file. For instance:
 *
 * echo renderSitemapXML(array('/hidden/page/', '/another/hidden/page/')); 
 * 
 * Patch to prevent pages from including children in the sitemap when a field is checked / johnwarrenllc.com
 * 1. Create a checkbox field named sitemap_exclude_children
 * 2. Add the field to the parent template(s) you plan to use
 * 3. When a page is created with this template, checking the field will prevent its children from being included in the sitemap.xml output
 */

function renderSitemapPage(Page $page) {

	return 	"\n<url>" . 
		"\n\t<loc>" . $page->httpUrl . "</loc>" . 
		"\n\t<lastmod>" . date("Y-m-d", $page->modified) . "</lastmod>" . 
		"\n</url>";	
}

function renderSitemapChildren(Page $page) { 

	if($page->sitemap_exclude_children) return ""; /* Added to exclude CHILDREN if field is checked */

	$out = '';
	$newParents = new PageArray(); 
	$children = $page->children; 
	
	foreach($children as $child) {
		$out .= renderSitemapPage($child);
		if($child->numChildren) $newParents->add($child); 
			else wire('pages')->uncache($child); 
	}

	foreach($newParents as $newParent) {
		$out .= renderSitemapChildren($newParent); 
		wire('pages')->uncache($newParent); 
	}

	return $out; 
}

function renderSitemapXML(array $paths = array()) {

	$out = 	'<?xml version="1.0" encoding="UTF-8"?>' . "\n" . 
		'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';

	array_unshift($paths, '/'); // prepend homepage

	foreach($paths as $path) {
		$page = wire('pages')->get($path); 
		if(!$page->id) continue; 
		$out .= renderSitemapPage($page);
		if($page->numChildren) { $out .=  renderSitemapChildren($page); }
	}

	$out .= "\n</urlset>";

	return $out; 
}

header("Content-Type: text/xml");
echo renderSitemapXML(); 
// Example: echo renderSitemapXML(array('/hidden/page/')); 

 

In conclusion, I have used a couple of different ProcessWire sitemap-generating modules, but for my needs the above script is fast and easy to set up and modify.

- Thanks

 
