John W.

How To: Simple Sitemap.xml Generator That Optionally Excludes Children


SYNOPSIS

A little guide to generating a sitemap.xml using a script that (I believe) Ryan originally wrote, with one addition: the option to exclude child pages from the sitemap.xml output.

I was looking back on a small project today where I was using this PHP script to generate an XML file. I needed a quick fix so that the script could optionally exclude the children of certain pages from the sitemap.xml output.

OVERVIEW

A good example of this is a site where visiting /minutes/ displays a list of board meetings, each with a title, date, description and a link to download the .pdf file.

I have a template called minutes and a template called minutes-document. The minutes page, when loaded via /minutes/, simply grabs all of its child pages and outputs the name, description and path of each uploaded .pdf file for visitors to download.

In my back end I have the templates MINUTES and MINUTES-DOCUMENT, like this:

[Screenshot: MINUTES and MINUTES-DOCUMENT in the back end]


So, basically, their employee can log in, hover over MINUTES, click New, then create a new (child) page and name it after the date of the meeting, e.g. June 3rd, 2016:

[Screenshot: creating a new child page named after the meeting date]

---------------------------

OPTIONALLY EXCLUDING CHILDREN - SETUP

This section covers outputting the sitemap.xml while optionally excluding the children of selected pages.

The setup of the original script is as follows:

1. Save the file to the templates folder as sitemap-xml.php

2. Create a template called sitemap-xml that uses the sitemap-xml.php file, and under its "URLs" settings set it to NOT use trailing slashes.

3. Create a page at the root level called sitemap.xml using the sitemap-xml template.

With that done, you only need a couple of small modifications to let the script exclude the children of a page from the sitemap.xml output.

1. Create a new checkbox field and name it sitemap_exclude_children

2. Add the field to each template whose pages should control whether their children are included in or excluded from the sitemap. In my example, I added it to my "minutes" template.

3. Next, go to a page that uses a template with the field you just added. In my case, "MINUTES".

4. Check the box to exclude that page's children; leave it unchecked to include them.

For example, I checked the box on my MINUTES page, and now when /sitemap.xml is loaded the children of MINUTES do not appear in the file.

[Screenshot: the sitemap_exclude_children checkbox enabled on the MINUTES page]

A SIMPLE CONDITIONAL TO CHECK THE "sitemap_exclude_children" VALUE

This was a pretty easy modification to an existing script, adding only one line. I just figured there may be others out there using this script with the same need.

I simply inserted the if condition as the first line of the function:

function renderSitemapChildren(Page $page) { 
	if($page->sitemap_exclude_children) return "";

...
...
...
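
To see what that one-line check does to the traversal without a ProcessWire install, here is a minimal stand-alone sketch that models the page tree as nested PHP arrays. The function names collectSitemapUrls() and collectChildUrls() and the array keys are hypothetical; only the early-return idea comes from the script. It follows the same order as renderSitemapChildren(): list every direct child first, then descend into the children that have children of their own. Note that the flagged page itself stays in the sitemap; only its descendants are skipped. As far as I know, a ProcessWire checkbox returns 1 when checked and 0 when unchecked, and a page whose template lacks the field returns null for it, so pages without the field are never affected.

```php
<?php
// Stand-alone model of the patched traversal. Pages are nested arrays; the keys
// 'url', 'children' and 'sitemap_exclude_children' are hypothetical stand-ins
// for $page->httpUrl, $page->children and the checkbox field.

function collectChildUrls(array $page): array
{
    // Same idea as the one-line patch: a checked box skips the whole subtree.
    if (!empty($page['sitemap_exclude_children'])) {
        return [];
    }

    $children = $page['children'] ?? [];
    $urls = [];

    // First pass: list every direct child (a flagged child is still listed itself).
    foreach ($children as $child) {
        $urls[] = $child['url'];
    }

    // Second pass: descend into children that have children of their own.
    foreach ($children as $child) {
        if (!empty($child['children'])) {
            $urls = array_merge($urls, collectChildUrls($child));
        }
    }

    return $urls;
}

function collectSitemapUrls(array $home): array
{
    // The homepage is always listed, then its descendants.
    return array_merge([$home['url']], collectChildUrls($home));
}

$home = [
    'url' => '/',
    'children' => [
        [
            'url' => '/minutes/',
            'sitemap_exclude_children' => 1, // box checked
            'children' => [
                ['url' => '/minutes/june-3rd-2016/'],
            ],
        ],
        ['url' => '/about/'],
    ],
];

echo implode("\n", collectSitemapUrls($home)), "\n"; // prints /, /minutes/ and /about/, one per line
```

Setting 'sitemap_exclude_children' to 0 in this model brings /minutes/june-3rd-2016/ back into the list, which mirrors unchecking the box.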

 

THE FULL SCRIPT WITH MODIFICATION

<?php 

/**
 * ProcessWire Template to power a sitemap.xml 
 *
 * 1. Copy this file to /site/templates/sitemap-xml.php
 * 2. Add the new template from the admin.
 *    Under the "URLs" section, set it to NOT use trailing slashes.
 * 3. Create a new page at the root level, use your sitemap-xml template
 *    and name the page "sitemap.xml".
 *
 * Note: hidden pages (and their children) are excluded from the sitemap.
 * If you have hidden pages that you want to be included, you can do so 
 * by specifying the ID or path to them in an array sent to the
 * renderSitemapXML() function at the bottom of this file. For instance:
 *
 * echo renderSiteMapXML(array('/hidden/page/', '/another/hidden/page/')); 
 * 
 * Patch to prevent pages from including their children in the sitemap when a field is checked / johnwarrenllc.com
 * 1. Create a checkbox field named sitemap_exclude_children
 * 2. Add the field to the parent template(s) you plan to use
 * 3. When the field is checked on a page using this template, its children are excluded from the sitemap.xml output
 */

function renderSitemapPage(Page $page) {

	return 	"\n<url>" . 
		"\n\t<loc>" . $page->httpUrl . "</loc>" . 
		"\n\t<lastmod>" . date("Y-m-d", $page->modified) . "</lastmod>" . 
		"\n</url>";	
}

function renderSitemapChildren(Page $page) { 

	if($page->sitemap_exclude_children) return ""; /* Added: skip this page's children (and their descendants) when the field is checked */

	$out = '';
	$newParents = new PageArray(); 
	$children = $page->children; 
	
	foreach($children as $child) {
		$out .= renderSitemapPage($child);
		if($child->numChildren) $newParents->add($child); 
			else wire('pages')->uncache($child); 
	}

	foreach($newParents as $newParent) {
		$out .= renderSitemapChildren($newParent); 
		wire('pages')->uncache($newParent); 
	}

	return $out; 
}

function renderSitemapXML(array $paths = array()) {

	$out = 	'<?xml version="1.0" encoding="UTF-8"?>' . "\n" . 
		'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';

	array_unshift($paths, '/'); // prepend homepage

	foreach($paths as $path) {
		$page = wire('pages')->get($path); 
		if(!$page->id) continue; 
		$out .= renderSitemapPage($page);
		if($page->numChildren) { $out .=  renderSitemapChildren($page); }
	}

	$out .= "\n</urlset>";

	return $out; 
}

header("Content-Type: text/xml");
echo renderSitemapXML(); 
// Example: echo renderSitemapXML(array('/hidden/page/')); 

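One hardening step you might consider, which is my suggestion rather than part of the original script: the sitemaps.org protocol requires characters such as & in a <loc> value to be entity-escaped. ProcessWire page names are limited to URL-safe characters by default, so this may never matter for you, but the sketch below is a stand-alone version of renderSitemapPage() that escapes the URL with htmlspecialchars(). The function name renderSitemapEntry() and its plain string/timestamp parameters are hypothetical, chosen so it runs outside ProcessWire.

```php
<?php
// Hypothetical stand-alone variant of renderSitemapPage(): takes plain values
// instead of a ProcessWire Page so it can run anywhere.
function renderSitemapEntry(string $url, int $modified): string
{
    // ENT_XML1 | ENT_QUOTES turns &, <, >, " and ' into XML entities.
    $loc = htmlspecialchars($url, ENT_XML1 | ENT_QUOTES, 'UTF-8');

    return "\n<url>" .
        "\n\t<loc>" . $loc . "</loc>" .
        "\n\t<lastmod>" . date("Y-m-d", $modified) . "</lastmod>" .
        "\n</url>";
}

// The & in the query string is emitted as &amp; inside <loc>.
echo renderSitemapEntry('https://example.com/search/?q=a&page=2', time()), "\n";
```

Inside the template you would call it as renderSitemapEntry($page->httpUrl, $page->modified) in place of renderSitemapPage($page).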
 

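To confirm the change worked, you could save the output of /sitemap.xml and run it through a small check like the one below. It loads the XML with PHP's DOM extension (so it also catches markup that is not well-formed) and collects every <loc> value. The function name sitemapLocations() and the example.com URLs in $sample are hypothetical stand-ins for your own output.

```php
<?php
// Hypothetical helper: parse a sitemap.xml string and return its <loc> values.
function sitemapLocations(string $xml): array
{
    $doc = new DOMDocument();

    // Collect libxml errors instead of printing warnings, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        throw new RuntimeException('sitemap.xml is not well-formed XML');
    }

    // <loc> lives in the sitemaps.org namespace declared on <urlset>.
    $locs = [];
    foreach ($doc->getElementsByTagNameNS('http://www.sitemaps.org/schemas/sitemap/0.9', 'loc') as $node) {
        $locs[] = trim($node->textContent);
    }
    return $locs;
}

// Stand-in for real output, in the same shape renderSitemapXML() produces.
$sample = '<?xml version="1.0" encoding="UTF-8"?>' . "\n" .
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' .
    "\n<url>\n\t<loc>https://example.com/</loc>\n\t<lastmod>2018-01-15</lastmod>\n</url>" .
    "\n<url>\n\t<loc>https://example.com/minutes/</loc>\n\t<lastmod>2018-01-15</lastmod>\n</url>" .
    "\n</urlset>";

$locs = sitemapLocations($sample);
var_dump(in_array('https://example.com/minutes/june-3rd-2016/', $locs, true)); // bool(false)
```

With the box checked on MINUTES, none of its child URLs (such as the hypothetical /minutes/june-3rd-2016/ above) should appear in the returned list.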
In conclusion, I have used a couple of different ProcessWire sitemap-generating modules, but for my needs the script above is fast and easy to set up and modify.

- Thanks

 
