John W.

How To: Simple Sitemap.xml Generator That Optionally Excludes Children


SYNOPSIS

A short guide to generating a sitemap.xml using a script that (I believe) Ryan originally wrote, with the addition of an option to exclude a page's children from the sitemap.xml output.

I was looking back today at a small project where I used a PHP script to generate an XML file; I believe the original was written by Ryan. I needed a quick fix that would let me optionally exclude the children of certain pages from the sitemap.xml output.

OVERVIEW

A good example is a site where visiting /minutes/ displays a list of board meetings, each with a title, date, description and a link to download the .pdf file.

I have a template called minutes and a template called minutes-document. The minutes page, when loaded via /minutes/, simply grabs all of its child pages and outputs the name, description and path of each uploaded .pdf file for a visitor to download.
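
To make that concrete, here is a rough sketch of the kind of loop the minutes template runs. It is not the site's actual template code: the function name renderMinutesList and the properties summary and pdfUrl are hypothetical stand-ins for whatever fields the minutes-document template really has. The demo data is plain objects so the snippet runs on its own; inside a ProcessWire template you would presumably pass $page->children instead.

```php
<?php
// Sketch only: renderMinutesList(), summary and pdfUrl are hypothetical names.
function renderMinutesList(iterable $documents): string {
    $out = "<ul class=\"minutes\">\n";
    foreach ($documents as $doc) {
        // Escape every value before placing it in the markup.
        $out .= "\t<li>"
            . htmlspecialchars($doc->title) . ": "
            . htmlspecialchars($doc->summary)
            . " <a href=\"" . htmlspecialchars($doc->pdfUrl) . "\">Download PDF</a>"
            . "</li>\n";
    }
    return $out . "</ul>\n";
}

// Stand-in for the child pages of /minutes/ (made-up demo data).
$demo = [(object) [
    'title'   => 'June 3rd, 2016',
    'summary' => 'Regular board meeting',
    'pdfUrl'  => '/files/minutes-2016-06-03.pdf',
]];

$html = renderMinutesList($demo);
echo $html;
```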

In my back-end I have the templates MINUTES and MINUTES-DOCUMENT. Thus:

[Screenshot]


So, basically, their employee can log in, hover over minutes, click new, then create a new (child) record and name it after the date of the meeting, e.g. June 3rd, 2016:

[Screenshot]
 

---------------------------

OPTIONALLY EXCLUDING CHILDREN - SETUP

Outputting the sitemap.xml and optionally excluding children that belong to a template.

The setup of the original script is as follows:

1. Save the file to the templates folder as sitemap-xml.php

2. Create a template called sitemap-xml, which uses the sitemap-xml.php file.

3. Create a page called sitemap.xml using the sitemap-xml template

 

Now, with that done, you need only a couple of slight modifications to let the script exclude a page's children from the sitemap.xml output:

1. Create a new checkbox field and name it: sitemap_exclude_children

2. Add the field to a template that you want to control whether the children are included/excluded from the sitemap. In my example I added it to my "minutes" template.

3. Next, go to a page that uses a template with the field you added above. In my case, "MINUTES".

4. Check the box to exclude children; leave it unchecked to include them.

For example, on my MINUTES page I enabled the checkbox, and now when /sitemap.xml is loaded the children of MINUTES do not appear in the file.

[Screenshot]

 

A SIMPLE CONDITIONAL TO CHECK THE "sitemap_exclude_children" VALUE

This was a pretty easy modification to an existing script, adding only one line. I just figure there may be others out there using this script with the same needs.

I simply inserted the if condition as the first line in the function:

function renderSitemapChildren(Page $page) { 
	if($page->sitemap_exclude_children) return "";

...
...
...
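
To see what that one-line guard does without a ProcessWire install, here is a small stand-alone sketch. DemoPage and collectChildUrls are hypothetical stand-ins I made up for illustration; they are not part of ProcessWire or of the script. Only the first line of the function mirrors the real change.

```php
<?php
// Hypothetical stand-in for a ProcessWire Page, for illustration only.
class DemoPage {
    public $url;
    public $sitemap_exclude_children;
    public $children;

    public function __construct(string $url, bool $excludeChildren = false, array $children = []) {
        $this->url = $url;
        $this->sitemap_exclude_children = $excludeChildren;
        $this->children = $children;
    }
}

// Collects the URLs of all descendants, skipping branches whose parent is checked.
function collectChildUrls(DemoPage $page): array {
    if ($page->sitemap_exclude_children) return []; // same guard as in the script
    $urls = [];
    foreach ($page->children as $child) {
        $urls[] = $child->url;
        $urls = array_merge($urls, collectChildUrls($child));
    }
    return $urls;
}

$minutes = new DemoPage('/minutes/', true, [new DemoPage('/minutes/june-3rd-2016/')]);
$about   = new DemoPage('/about/', false, [new DemoPage('/about/team/')]);
$home    = new DemoPage('/', false, [$minutes, $about]);

print_r(collectChildUrls($home));
```

Note that /minutes/ itself is still listed, because it is rendered as a child of the homepage; only the pages beneath it are dropped.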

 

THE FULL SCRIPT WITH MODIFICATION

<?php 

/**
 * ProcessWire Template to power a sitemap.xml 
 *
 * 1. Copy this file to /site/templates/sitemap-xml.php
 * 2. Add the new template from the admin.
 *    Under the "URLs" section, set it to NOT use trailing slashes.
 * 3. Create a new page at the root level, use your sitemap-xml template
 *    and name the page "sitemap.xml".
 *
 * Note: hidden pages (and their children) are excluded from the sitemap.
 * If you have hidden pages that you want to be included, you can do so 
 * by specifying the ID or path to them in an array sent to the
 * renderSitemapXML() function at the bottom of this file. For instance:
 *
 * echo renderSitemapXML(array('/hidden/page/', '/another/hidden/page/')); 
 * 
 * Patch to prevent pages from including children in the sitemap when a field is checked / johnwarrenllc.com
 * 1. Create a checkbox field named sitemap_exclude_children.
 * 2. Add the field to the parent template(s) you plan to use.
 * 3. When a new page is created with this template, checking the field will prevent its children from being included in the sitemap.xml output.
 */

function renderSitemapPage(Page $page) {

	return 	"\n<url>" . 
		"\n\t<loc>" . $page->httpUrl . "</loc>" . 
		"\n\t<lastmod>" . date("Y-m-d", $page->modified) . "</lastmod>" . 
		"\n</url>";	
}

function renderSitemapChildren(Page $page) { 

	if($page->sitemap_exclude_children) return ""; /* Added to exclude CHILDREN if field is checked */

	$out = '';
	$newParents = new PageArray(); 
	$children = $page->children; 
	
	foreach($children as $child) {
		$out .= renderSitemapPage($child);
		if($child->numChildren) $newParents->add($child); 
			else wire('pages')->uncache($child); 
	}

	foreach($newParents as $newParent) {
		$out .= renderSitemapChildren($newParent); 
		wire('pages')->uncache($newParent); 
	}

	return $out; 
}

function renderSitemapXML(array $paths = array()) {

	$out = 	'<?xml version="1.0" encoding="UTF-8"?>' . "\n" . 
		'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';

	array_unshift($paths, '/'); // prepend homepage

	foreach($paths as $path) {
		$page = wire('pages')->get($path); 
		if(!$page->id) continue; 
		$out .= renderSitemapPage($page);
		if($page->numChildren) { $out .=  renderSitemapChildren($page); }
	}

	$out .= "\n</urlset>";

	return $out; 
}

header("Content-Type: text/xml");
echo renderSitemapXML(); 
// Example: echo renderSitemapXML(array('/hidden/page/')); 

 

In conclusion, I have used a couple of different ProcessWire sitemap-generating modules, but for my needs the above script is fast and easy to set up and modify.

- Thanks

 
