How do I create a sitemap template? (plus, how to deal with thousands of pages)


ryan

Creating a sitemap is fairly easy in ProcessWire. The strategy we use is to get the page where we want the sitemap to start (like the homepage), print out its children, and perform the same action on any children that themselves have children. We do this with a recursive function. Below are the contents of the sitemap.php template, which demonstrates this. This example is also included in the default ProcessWire installation, but we'll go into more detail here.

/site/templates/sitemap.php

<?php

function sitemapListPage($page) {

       // create a list item & link to the given page, but don't close the <li> yet
       echo "<li><a href='{$page->url}'>{$page->title}</a> ";

       // check if the page has children, if so start a nested list
       if($page->numChildren) {
               // start a nested list
               echo "<ul>";

               // loop through the children, recursively calling this function for each
               foreach($page->children as $child) sitemapListPage($child);

               // close the nested list
               echo "</ul>";
       }

       // close the list item
       echo "</li>";
}

// include site header markup
include("./head.inc");

// start the sitemap unordered list
echo "<ul class='sitemap'>";

// get the homepage and start the sitemap
sitemapListPage($pages->get("/"));

// close the unordered list
echo "</ul>";

// include site footer markup
include("./foot.inc");

The resulting markup will look something like this (for the small default ProcessWire site):

<ul class='sitemap'>
   <li><a href='/'>Home</a> 
       <ul>
           <li><a href='/about/'>About</a> 
               <ul>
                   <li><a href='/about/child1/'>Child page example 1</a> </li>
                   <li><a href='/about/child2/'>Child page example 2</a> </li>
               </ul>
           </li>
           <li><a href='/templates/'>Templates</a> </li>
           <li><a href='/site-map/'>Site Map</a> </li>
      </ul>
   </li>
</ul>

Note: to make this site map appear indented with each level, you may need to update your stylesheet with something like this:

ul.sitemap li {
   margin-left: 2em; 
}
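
If you want to experiment with the recursion outside of a ProcessWire install, here is a minimal sketch that uses nested arrays as stand-ins for pages. The array keys ('url', 'title', 'children') and the function name renderSitemapItem() are assumptions made for this illustration, not part of the ProcessWire API. It returns the markup as a string rather than echoing it, and it escapes titles and URLs, which is worth considering if page titles may contain characters like & or <.

```php
<?php

// Stand-in for a page tree: plain arrays instead of ProcessWire Page objects.
// The keys 'url', 'title' and 'children' are assumptions for this sketch.
function renderSitemapItem(array $page): string {
    $url   = htmlspecialchars($page['url'], ENT_QUOTES, 'UTF-8');
    $title = htmlspecialchars($page['title'], ENT_QUOTES, 'UTF-8');

    // open the list item and link, but don't close the <li> yet
    $out = "<li><a href='$url'>$title</a>";

    // if there are children, nest a list and recurse into each child
    if (!empty($page['children'])) {
        $out .= "<ul>";
        foreach ($page['children'] as $child) {
            $out .= renderSitemapItem($child);
        }
        $out .= "</ul>";
    }

    // close the list item
    return $out . "</li>";
}

// a tiny hypothetical tree to try it with
$home = [
    'url' => '/',
    'title' => 'Home',
    'children' => [
        ['url' => '/about/', 'title' => 'About', 'children' => [
            ['url' => '/about/child1/', 'title' => 'Child page example 1'],
        ]],
        ['url' => '/site-map/', 'title' => 'Site Map'],
    ],
];

echo "<ul class='sitemap'>" . renderSitemapItem($home) . "</ul>";
```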

The above sitemap template works well for a simple site. But what if you have some pages that have a "hidden" status? They won't appear in the sitemap, nor will any of their children. If you want them to appear, then you would want to manually add them to what is displayed. To do this, retrieve the hidden page and send it to the sitemapListPage() function just like you did with the homepage:

<?php

// get the homepage and start the sitemap
// (this line is included here just for placement context)
sitemapListPage($pages->get("/"));

// get our hidden page and include it in the site map
sitemapListPage($pages->get("/some-hidden-page/")); 

What if your sitemap has thousands of pages?

If you have a very large site, the strategy above may produce a sitemap with thousands of items and take a second or two to generate. A page with thousands of links may not be the most helpful sitemap strategy for your users, so you may want to consider alternatives. However, if you've decided you want to proceed, here is how to deal with this many pages in ProcessWire.

1. First off, you probably don't want to regenerate this sitemap for every pageview. As a result, you should enable caching for your template in: Admin > Setup > Templates > Sitemap > Advanced > Cache Time. I recommend setting it to one day (86400 seconds). Once you save this setting, the template will be rendered from a cache when the user is not logged in. Note that when you view it while still logged in, it's not going to use the cache… and that's okay.

2. Secondly, consider adding limits to the number of child pages you retrieve in the sitemapListPage function. It may be that you only need to list the first hundred child pages, in which case you could add a "limit=100" selector to your $page->children call:

<?php
// this example takes place inside the sitemapListPage function. 
// loop through the children, recursively calling this function for each: 
foreach($page->children("limit=100") as $child) sitemapListPage($child);

3. Loading thousands of pages (especially with lots of autojoined fields) may cause you to approach the memory limit that PHP allows for the request. If you are hitting a memory limit, you'll know it because ProcessWire will generate an error. If that happens, you need to manage your memory by freeing groups of pages once you no longer need them. Here's one strategy to use at the end of the sitemapListPage function that helps to ensure the memory allocated to the child pages is freed, making room for another thousand pages. :)

<?php
function sitemapListPage($page) {

       // ... everything above omitted for brevity in this example ...

       // close the list item
       echo "</li>";

       // release loaded pages by telling the $pages variable to uncache them. 
       // this will only uncache pages that are out of scope, so it's safe to use.
       wire('pages')->uncacheAll();
}
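
When you limit the children as in step 2, it can help visitors to see that a list was cut short. Here is a minimal sketch of that idea, again using nested arrays as stand-ins for pages; the array keys and the function name renderLimitedItem() are assumptions for this illustration. In a real template the limit would go in the selector, as in $page->children("limit=100"), so the extra pages are never loaded, and $page->numChildren would supply the total. The array_slice() call below only imitates that behaviour.

```php
<?php

// Hypothetical stand-in: nested arrays with 'url', 'title' and 'children' keys.
function renderLimitedItem(array $page, int $limit = 100): string {
    $url   = htmlspecialchars($page['url'], ENT_QUOTES, 'UTF-8');
    $title = htmlspecialchars($page['title'], ENT_QUOTES, 'UTF-8');
    $out   = "<li><a href='$url'>$title</a>";

    $children = $page['children'] ?? [];
    if ($children) {
        $out .= "<ul>";

        // only the first $limit children are listed
        foreach (array_slice($children, 0, $limit) as $child) {
            $out .= renderLimitedItem($child, $limit);
        }

        // tell the reader how many children were left out, if any
        $omitted = count($children) - $limit;
        if ($omitted > 0) {
            $out .= "<li>and $omitted more</li>";
        }

        $out .= "</ul>";
    }

    return $out . "</li>";
}
```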

  • 1 year later...

The above sitemap template works well for a simple site. But what if you have some pages that have a "hidden" status? They won't appear in the sitemap, nor will any of their children. If you want them to appear, then you would want to manually add them to what is displayed. To do this, retrieve the hidden page and send it to the sitemapListPage() function just like you did with the homepage:

<?php

// get the homepage and start the sitemap
// (this line is included here just for placement context)
sitemapListPage($pages->get("/"));

// get our hidden page and include it in the site map
sitemapListPage($pages->get("/some-hidden-page/"));

While this works, it doesn't really include the hidden page in the sitemap but adds it to it, i.e. both the root page and the hidden page (including their children) are list items of the sitemap. In my case, the hidden page is technically a child page of the root page, so it should be included in the root page's list item as a child. (I know I could "simulate" this in terms of looks with CSS, but I prefer to have the "proper" markup.)

Is there a way to achieve this?


Huh, why don't you also use MarkupSimpleNavigation with a little smile ;)

You can also include hidden pages if you add the selector "include=hidden". If that helps.


Ryan, if you ever need some kind of slogan for PW, I suggest: "It's just too easy." ;)

Of course, thanks to soma, this

foreach($page->children("include=hidden") as $child) sitemapListPage($child);

does the trick. Boy, I wish I had known about PW way, way earlier. :)


  • 3 months later...
  • 5 months later...

Huh, why don't you also use MarkupSimpleNavigation with a little smile ;)

You can also include hidden pages if you add the selector "include=hidden". If that helps.

this is definitely *THE WAY* to make a sitemap...! Thanks...

I spent a long time trying to make a 3-column sitemap using CSS3 column-count, but it will not work if you have home as a 'parent' <li> to the rest of the list, because the browser will keep all of the child items of that <li> in one column... With the MSN module, you can prepend the homepage, so it is in its own <li> on the same level as the rest of the site...
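
For anyone who wants the same flat top level without the module, one option is to print the homepage as its own item and then list its children as siblings rather than nesting them. This is a sketch of that idea and not how MarkupSimpleNavigation does it internally; it uses nested arrays as stand-ins for pages, and the names renderColumnItem() and renderTopLevelSitemap() are made up for this example. In a template, the stand-ins would correspond to $pages->get("/") and its $page->children.

```php
<?php

// Hypothetical stand-in: nested arrays with 'url', 'title' and 'children' keys.
function renderColumnItem(array $page): string {
    $url   = htmlspecialchars($page['url'], ENT_QUOTES, 'UTF-8');
    $title = htmlspecialchars($page['title'], ENT_QUOTES, 'UTF-8');
    $out   = "<li><a href='$url'>$title</a>";

    if (!empty($page['children'])) {
        $out .= "<ul>";
        foreach ($page['children'] as $child) {
            $out .= renderColumnItem($child);
        }
        $out .= "</ul>";
    }

    return $out . "</li>";
}

// Render the homepage as a childless <li>, followed by its children on the
// same level, so the top-level list has several items to spread over columns.
function renderTopLevelSitemap(array $home): string {
    $out = "<ul class='sitemap'>";

    // the homepage on its own, without its children
    $out .= renderColumnItem(['url' => $home['url'], 'title' => $home['title']]);

    // each child of the homepage becomes a sibling of the homepage item
    foreach ($home['children'] ?? [] as $child) {
        $out .= renderColumnItem($child);
    }

    return $out . "</ul>";
}
```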


  • 8 months later...
Hi guys, is it possible to create a new sitemap XML for every (n) pages?

Not sure I understand the question? But I'm guessing you might be talking about caching. You could certainly cache (and probably should), using the caching features at the template level. 


Hi Ryan, thanks for the response. My problem is that I have 12000 pages that I have to serve in an .xml, so I get a memory limit error. I really don't know if it's worth it for SEO to leave all the pages in or to use limits, ...

This is the approach for the XML format of a sitemap index -> https://support.google.com/webmasters/answer/71453 but I'm not sure if this will work, because I would have to make a new XML file for every 10000 pages or so. Do you have any idea for an automated way to create a new XML file for every (3000) pages?


You could always create an XML file in multiple stages and keep appending to it. If you are filling up memory even while using $pages->uncacheAll(); as described in the original post, then you are probably filling up memory as a result of the output you are generating. In this case, it does make sense to write to a file and do it in chunks rather than all at once. What I mean by write to a file is fwrite($fp, $data); rather than echo $data; Whenever you fopen() a file, you can choose "a" as an "append" write mode, which means anything you write to it will simply be appended to what is already there. Chances are it'll take some time to write out 10k+ pages, so you'll need to ini_set('max_execution_time', 60*5); // 5 minutes or however long you need.

Lastly, consider whether you really need a sitemap.xml and if it's worth the effort. My experience has been that so long as the site has good internal links, a sitemap.xml doesn't provide any perceptible benefit other than adding some fun to the webmaster tools screen. Though maybe others have a different experience.
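
To make the append approach concrete, here is a rough sketch that splits a list of URLs into numbered sitemap files, writes each file in small batches using the "a" mode, and then writes a sitemap index that points to them. The function names appendBatch() and writeSitemaps(), the file names, and the default sizes of 3000 per file and 500 per batch are all assumptions for this example. It takes a ready-made array of URLs to stay self-contained; on a real site each batch would come from your own page query, so that only one batch is in memory at a time. For reference, the sitemaps.org protocol allows up to 50,000 URLs per file, so 3000 is well within the limit.

```php
<?php

const SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9';

// Append one batch of <url> entries to $file. Mode "a" adds to the end of
// the file, so earlier batches do not need to be held in memory.
function appendBatch(string $file, array $urls): void {
    $fp = fopen($file, 'a');
    if ($fp === false) {
        throw new RuntimeException("Unable to open $file for appending");
    }
    foreach ($urls as $url) {
        $loc = htmlspecialchars($url, ENT_XML1 | ENT_QUOTES, 'UTF-8');
        fwrite($fp, "  <url><loc>$loc</loc></url>\n");
    }
    fclose($fp);
}

// Write $urls into files of at most $perFile entries each, $batch entries at
// a time, then write an index file listing them. Returns the file names.
function writeSitemaps(array $urls, string $dir, string $baseUrl, int $perFile = 3000, int $batch = 500): array {
    $xmlHeader = '<' . '?xml version="1.0" encoding="UTF-8"?' . ">\n";
    $names = [];

    foreach (array_chunk($urls, $perFile) as $i => $chunk) {
        $name = 'sitemap-' . ($i + 1) . '.xml';
        $file = "$dir/$name";

        // start the file fresh with the XML header and opening tag
        file_put_contents($file, $xmlHeader . '<urlset xmlns="' . SITEMAP_NS . "\">\n");

        // append the entries in small batches
        foreach (array_chunk($chunk, $batch) as $part) {
            appendBatch($file, $part);
        }

        // append the closing tag
        file_put_contents($file, "</urlset>\n", FILE_APPEND);
        $names[] = $name;
    }

    // the index file lists the location of every sitemap file written above
    $index = $xmlHeader . '<sitemapindex xmlns="' . SITEMAP_NS . "\">\n";
    foreach ($names as $name) {
        $loc = htmlspecialchars($baseUrl . $name, ENT_XML1 | ENT_QUOTES, 'UTF-8');
        $index .= "  <sitemap><loc>$loc</loc></sitemap>\n";
    }
    $index .= "</sitemapindex>\n";
    file_put_contents("$dir/sitemap-index.xml", $index);

    return $names;
}
```

A call might look like writeSitemaps($urls, '/path/to/site/root', 'https://example.com/'); where the directory and base URL are placeholders for your own values. For 10k+ pages you would likely still need the longer max_execution_time mentioned above.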


  ...

In this case, it does make sense to write to a file and do it in chunks rather than all at once. What I mean by write to a file is fwrite($fp, $data); rather than echo $data; Whenever you fopen() a file, you can choose "a" as an "append" write mode, which means anything you write to it will simply be appended to what is already there.

...

Hi, maybe a bit OT but worth mentioning, I think:

PHP has a nice stream wrapper for creating and opening a pointer to a temporary file, with the advantage that it first writes to memory and only flushes the content to disk once a specified amount of memory is reached:

$mb5 = 5 * 1024 * 1024;

$fp = fopen('php://temp/maxmemory:' . $mb5, 'rb+');

// then add content to the pointer ...
fputs($fp, "hello\n");

// At the end rewind and echo out the content:
rewind($fp);
echo stream_get_contents($fp);

The first 5 MB will be kept in memory, and if the content hits the limit it gets written to a temporary file on disk.

see: http://www.php.net/manual/en/wrappers.php.php
