ryan

How do I create a sitemap.xml?

To create a sitemap.xml you can use Pete's Sitemap XML module, or you can create a template file and page to do it for you. This post explains how to create a template to do it for you. The benefit here is that you may find it simpler to tweak a template file than a module, though either is a good solution.  Here is how to do it with a template file and a page:

sitemap-xml.php

<?php

/**
 * ProcessWire Template to power a sitemap.xml 
 *
 * 1. Copy this file to /site/templates/sitemap-xml.php
 * 2. Add the new template from the admin.
 *    Under the "URLs" section, set it to NOT use trailing slashes.
 * 3. Create a new page at the root level, use your sitemap-xml template
 *    and name the page "sitemap.xml".
 *
 * Note: hidden pages (and their children) are excluded from the sitemap.
 * If you have hidden pages that you want to be included, you can do so 
 * by specifying the ID or path to them in an array sent to the
 * renderSitemapXML() function at the bottom of this file. For instance:
 *
 * echo renderSitemapXML(array('/hidden/page/', '/another/hidden/page/')); 
 *
 */

function renderSitemapPage(Page $page) {
  return 
    "\n<url>" .
    "\n\t<loc>" . $page->httpUrl . "</loc>" .
    "\n\t<lastmod>" . date("Y-m-d", $page->modified) . "</lastmod>" .
    "\n</url>";
}

function renderSitemapChildren(Page $page) {
 
  $out = '';
  $newParents = new PageArray();
  $children = $page->children;
  
  foreach($children as $child) {
    $out .= renderSitemapPage($child);
    if($child->numChildren) $newParents->add($child);
    else wire('pages')->uncache($child);
  }
  
  foreach($newParents as $newParent) {
    $out .= renderSitemapChildren($newParent);
    wire('pages')->uncache($newParent);
  }
  
  return $out;
}

function renderSitemapXML(array $paths = array()) {
  
  $out =  
    '<?xml version="1.0" encoding="UTF-8"?>' . "\n" .
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';
  
  array_unshift($paths, '/'); // prepend homepage
  
  foreach($paths as $path) {
    $page = wire('pages')->get($path); 
    if(!$page->id) continue; 
    $out .= renderSitemapPage($page);
    if($page->numChildren) $out .= renderSitemapChildren($page);
  }
  
  $out .= "\n</urlset>";
  return $out; 
}

header("Content-Type: text/xml");

echo renderSitemapXML();  
// If you want to include other hidden pages: 
// echo renderSitemapXML(array('/path/to/hidden/page/')); 
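
The sitemap protocol expects URLs in <loc> to be entity-escaped, which matters as soon as a URL contains a character such as "&". The template above writes $page->httpUrl as-is, so here is a minimal sketch of an escaping variant. The name renderEscapedUrlEntry() and the example URL are made up for illustration, and the function takes plain values instead of a Page so that it runs outside ProcessWire.

```php
<?php
// Hypothetical variant of the <url> entry builder that escapes the URL.
// It takes plain values instead of a ProcessWire Page so it runs anywhere.
function renderEscapedUrlEntry(string $url, int $modified): string {
    return
        "\n<url>" .
        // ENT_XML1 makes htmlspecialchars() emit XML-safe entities.
        "\n\t<loc>" . htmlspecialchars($url, ENT_QUOTES | ENT_XML1, 'UTF-8') . "</loc>" .
        "\n\t<lastmod>" . date('Y-m-d', $modified) . "</lastmod>" .
        "\n</url>";
}

// Made-up URL containing a character that must be escaped in XML.
$entry = renderEscapedUrlEntry('http://www.example.org/search?q=a&page=2', time());
echo $entry;
```

Inside the template this would presumably be called as renderEscapedUrlEntry($page->httpUrl, $page->modified).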


It looks like this template renders an endless redirect. I am using this in a multilingual site, maybe that's the problem??


It shouldn't matter if your site is multi-lingual or not. Though just out of curiosity, which method(s) of multi-language are you using?

If you are right about "endless redirect" (as in a 301 or 302), most likely the issue is occurring somewhere before PW since there are no redirects in this template. Double check that your template is set to not have a trailing slash. Then, check your .htaccess file to make sure you don't have some directive in there that is enforcing trailing slashes. 
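
To see which hop in a redirect chain is responsible, it can help to list each status code next to its Location header. The sketch below is plain PHP and not part of ProcessWire: summarizeRedirects() is a hypothetical helper, and the header lines are made up. In practice they might come from PHP's get_headers(), which returns the header lines of the responses it received.

```php
<?php
// Hypothetical helper: summarize the redirect hops found in a flat list of
// response header lines, such as the array returned by PHP's get_headers().
function summarizeRedirects(array $headerLines): array {
    $hops = [];
    foreach ($headerLines as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            // A status line starts a new hop.
            $hops[] = ['status' => (int) $m[1], 'location' => null];
        } elseif ($hops && stripos($line, 'Location:') === 0) {
            // Attach the Location header to the hop it belongs to.
            $hops[count($hops) - 1]['location'] = trim(substr($line, 9));
        }
    }
    return $hops;
}

// Made-up header lines, so no network access is needed to try this out.
$hops = summarizeRedirects([
    'HTTP/1.1 301 Moved Permanently',
    'Location: /en/sitemap.xml/',
    'HTTP/1.1 200 OK',
    'Content-Type: text/xml',
]);
foreach ($hops as $hop) {
    echo $hop['status'] . ' ' . ($hop['location'] ?? '(no redirect)') . "\n";
}
```

If the same Location value keeps reappearing in the list, the request is going in circles.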


I use LanguageSupportPageNames for the multilingual setup (it works amazingly well!).

I double-checked that the template is set to not use a trailing slash. The .htaccess is the stock .htaccess from your ProcessWire GitHub repository.

This is the wget response for the sitemap.xml:

--2013-06-21 09:27:53--  http://mywebsite.com/en/sitemap.xml/
Connecting to mywebsite.com|194.247.xx.xx|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /sitemap.xml [following]
 


 


Okay, I think I've got a better idea of what you are seeing now. The request "http://mywebsite.com/en/sitemap.xml/" still has a trailing slash, though I'm guessing that's actually the second redirect and that the LanguageSupportPageNames module is issuing both. LanguageSupportPageNames is not meant to work with trailing slashes turned off (yet). So in your case, I would go ahead and enable trailing slashes for your sitemap-xml template and try again. Meanwhile, I'll work on making LanguageSupportPageNames compatible with trailing slashes being turned off.


Thanks Ryan for the template. :-)
Looking for a solution to use a sitemap together with LanguageSupportPageNames I created a sitemapindex which is stored as a static file named sitemap.xml  in the root-directory. It is working nicely.

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>http://www.example.org/de/sitemap.xml</loc>
   </sitemap>
   <sitemap>
      <loc>http://www.example.org/en/sitemap.xml</loc>
   </sitemap>
</sitemapindex>
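
If more languages are added later, the same index could be generated rather than maintained by hand. The following is only a sketch under that assumption: renderSitemapIndex() is a hypothetical helper, not a ProcessWire function, and the URLs are the example ones from the static file above.

```php
<?php
// Hypothetical helper: build a sitemap index from a list of sitemap URLs.
function renderSitemapIndex(array $sitemapUrls): string {
    $out =
        '<?xml version="1.0" encoding="UTF-8"?>' . "\n" .
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';
    foreach ($sitemapUrls as $url) {
        $out .=
            "\n\t<sitemap>" .
            // Escape the URL so characters like "&" stay valid XML.
            "\n\t\t<loc>" . htmlspecialchars($url, ENT_QUOTES | ENT_XML1, 'UTF-8') . "</loc>" .
            "\n\t</sitemap>";
    }
    return $out . "\n</sitemapindex>";
}

// One sitemap per language, as in the static file above.
$index = renderSitemapIndex([
    'http://www.example.org/de/sitemap.xml',
    'http://www.example.org/en/sitemap.xml',
]);
echo $index;
```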
 

Now I am thinking about creating a sitemap based on Google's new rel="alternate" hreflang="x" annotation and your template.
My question: how can I prevent the redirect of sitemap.xml to en/sitemap.xml (or to any other language-dependent URL) while still using LanguageSupportPageNames for everything else?

The first request I made was to http://mywebsite.com/sitemap.xml, and that redirects to http://mywebsite.com/en/sitemap.xml/

"/en/" is the site's default language.

Thanks, I think I've got this fixed. Try out the latest dev commit.

My question, how can I prevent the redirect of sitemap.xml to en/sitemap.xml or any other language dependent site, still using LanguageSupportPageNames for all the other stuff?

Your best bet is to use no language segment for your default language. So when you edit your homepage, on the "settings" tab, make sure the default language field is blank rather than "en". 


How can I use the sitemap.xml with the Foundation site profile?

The output of _main.php gets embedded in the sitemap.xml, and I only get a clean sitemap.xml if I delete the _main.php line from config.php.


In your sitemap-xml.php file, add this line at the bottom:

$useMain = false; 

This works very well Ryan.

But I have a question: why does the sitemap.xml contain only tags and categories? What can I do to include my posts as well?

I use your ProcessWire blog profile (it's amazing, after weeks of searching for the best blog engine I found yours!). My blog is here: http://blog.itanea.fr

Thanks for your help.


Most likely your /posts/ page has hidden status. If you uncheck the hidden box on your /posts/ settings tab, it should show up in the sitemap.xml. An alternative would be to modify your sitemap code to allow for hidden pages by changing the $page->children call to $page->children("include=hidden"). 


I get the following error:

This XML file does not appear to have any style information associated with it. The document tree is shown below.
<p class="error WireFatalError">
Error: Call to a member function getAll() on a non-object (line 66 of C:\www\www\sites\organizedfellow.dev\site\templates\sitemap-xml.php)
<br/>
<br/>
<em>
This error message was shown because you are logged in as a Superuser. Error has been logged.
</em>
</p>


I no longer get the above error.

I had previously gotten the sitemap.xml.php file from another thread that pointed here. I just replaced that code with what Ryan has above.

:)

Nicely done Mr Ryan.


Hey Ryan, looks cool, but I do wonder: what's the point?

Google seems to be continually crawling the hell out of my sites and even sites with a massive number of pages deep in the tree are well crawled. I keep reading everywhere that I "should have" an XML sitemap to make crawling more effective, but no-one ever explains exactly how it will help when the sites are already well crawled.

All I see is an out-of-date sitemap being potentially damaging.

I know this thread is old and it's probably not the place to start a discussion on sitemaps, but it would be nice to have a definitive answer other than "because you should" :)


Hello forum,

a newbie question:

In a multilingual setup, /sitemap.xml is redirected to /en/sitemap.xml if a language name is enabled for the default language (English in this example).

I don't want to use the existing module because I don't want it loaded on each request and because I need more customization.

So I decided to use the home page with urlSegments turned on, intercepting urlSegment(1) == 'sitemap.xml' only for the default language.

Are there any cons to using urlSegments on the home page, other than the need to render the 404 error page for any other urlSegment?

kind regards


I'm new to ProcessWire, so forgive me for not knowing everything.

I figured that I could do something like this:

$children = $page->children("id=1017|1018|1019|1020|1029,include=hidden");

But I'd like to include a page's children but not the parent page itself. I could set the IDs, but if I add a page, I have to add its ID by hand as well. That I don't want to do.


An easy way if you wanna build a big list would be to just make a little array with all the rules and use that to filter.  

	$excludeArray = array(
	"include=hidden",
	"parent!=2",
	"id!=1009",
	"parent!=1047",
	"id!=27",
	"id!=7",	
	"parent!=7",
	"id!=1047",	
	"id!=4945",
	"template!=items",
	"template!=antique-items",
	"template!=publication",
	"template!=publication_type",
	"template!=publication_section",	
	"template!=404",	
	"id!=2947",
	"path!=/processwire/"
	);
	
	$exclude = implode(",",$excludeArray);

	$children = $page->children($exclude); 
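
As the rule list grows, stray empty or duplicated entries can creep in. A small sketch of the same implode idea with some cleanup is below; buildSelector() is a hypothetical helper name, and the rules are just a few taken from the list above.

```php
<?php
// Hypothetical helper: join individual selector rules into one selector
// string, skipping empty entries and duplicates.
function buildSelector(array $rules): string {
    $rules = array_map('trim', $rules);      // remove stray whitespace
    $rules = array_filter($rules, 'strlen'); // drop empty rules
    return implode(',', array_unique($rules));
}

$selector = buildSelector([
    'include=hidden',
    'template!=404',
    '',
    'id!=27',
    'id!=27',
]);
echo $selector; // prints "include=hidden,template!=404,id!=27"
```

The resulting string would then be passed on as before, for instance $page->children($selector).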

Hi Hummer and welcome to the forums!

I honestly don't quite understand what you need: $page->children() only returns the children of the current page, not the page itself. Could you possibly clarify exactly what you want?


MuchDev's suggestion goes in that direction. I have to try out some rules and see if that works.

I'll try to explain it better.

I have the structure below, but I don't want everything to be listed in my sitemap, and I don't want to add new pages and IDs manually in the future.

/root-Frontpage of PR
//Page 1
//Page 2
...
//Service (this page is hidden)
///News (lists all news entries)
////News entries 1
////News entries 2
////News entries 3

For now, Service and all of its children are hidden and not shown in the sitemap.


Thank you for this very helpful piece of code, Ryan :)

I use the sitemap for a multilanguage site (two languages only: en and de). English is the default (www.domain.net), German is de (www.domain.net/de).

Referring to this article (https://support.google.com/webmasters/answer/2620865?hl=en or in German https://support.google.com/webmasters/answer/2620865?hl=de) I extended the "renderSitemapPage" function to indicate which page is a translation (and which language the current page is in).

It's not really automated or generic for more languages, and I also added "https://" manually. But it might be a help for someone. :) If anyone wants to optimize it, you're welcome!

function renderSitemapPage(Page $page) {

	// URL of the default (English) version and of the German version.
	// Note: "https://" is hard-coded for the German URL.
	$enUrl = $page->httpUrl;
	$deUrl = 'https://' . wire('config')->httpHost . $page->localUrl(wire('languages')->get("de"));
	$lastmod = date("Y-m-d", $page->modified);

	// Both entries list the same set of alternates.
	$alternates =
		"\n\t<xhtml:link rel='alternate' hreflang='de' href='" . $deUrl . "' />" .
		"\n\t<xhtml:link rel='alternate' hreflang='en' href='" . $enUrl . "' />";

	return
		"\n<url>" .
		"\n\t<loc>" . $enUrl . "</loc>" .
		"\n\t<lastmod>" . $lastmod . "</lastmod>" .
		$alternates .
		"\n</url>" .
		"\n<url>" .
		"\n\t<loc>" . $deUrl . "</loc>" .
		"\n\t<lastmod>" . $lastmod . "</lastmod>" .
		$alternates .
		"\n</url>";
}

With this function you only have to submit www.domain.net/sitemap/ to Google to include both languages.

Attention: www.domain.net/de/sitemap generates a faulty sitemap with the code above and should not be submitted to Google.

One more thing to do: to make the markup valid, you also have to add the xhtml namespace to the urlset element:

function renderSitemapXML(array $paths = array()) {

	$out = 	'<?xml version="1.0" encoding="UTF-8"?>' . "\n" . 
		'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">';
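
For more than two languages, the same idea could be made generic by looping over a list of language versions. This is only a sketch: renderAlternateEntries() is a hypothetical helper, the URLs are made up, and it works on plain values so it runs outside ProcessWire. In the template, the array would presumably be built from $page->localUrl($language) for each language, as in the function above.

```php
<?php
// Hypothetical helper: given the URLs of one page keyed by language code,
// emit one <url> entry per language, each listing every language version.
function renderAlternateEntries(array $urlsByLang, string $lastmod): string {
    $esc = function (string $s): string {
        return htmlspecialchars($s, ENT_QUOTES | ENT_XML1, 'UTF-8');
    };

    // The same block of alternates is repeated inside every entry.
    $alternates = '';
    foreach ($urlsByLang as $lang => $url) {
        $alternates .= "\n\t<xhtml:link rel=\"alternate\" hreflang=\"" . $esc($lang) .
            "\" href=\"" . $esc($url) . "\" />";
    }

    $out = '';
    foreach ($urlsByLang as $url) {
        $out .=
            "\n<url>" .
            "\n\t<loc>" . $esc($url) . "</loc>" .
            "\n\t<lastmod>" . $esc($lastmod) . "</lastmod>" .
            $alternates .
            "\n</url>";
    }
    return $out;
}

// Made-up URLs for one page in two languages.
$xml = renderAlternateEntries([
    'en' => 'https://www.domain.net/about/',
    'de' => 'https://www.domain.net/de/ueber-uns/',
], '2015-01-01');
echo $xml;
```

Each language version gets its own <url> entry, and every entry lists all versions including itself, which is the same pattern the two-language function above writes out by hand.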

I would just like to add something to Ryan's great template code. I think this may be helpful for some.

To add some styling to the output, you can:

In "renderSitemapXML" function, change the following

  $out =  '<?xml version="1.0" encoding="UTF-8"?>' . "\n" .
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';

to this

$csspath = wire('config')->urls->templates; // already ends with a slash, e.g. /site/templates/
$out =  '<?xml version="1.0" encoding="UTF-8"?>' . "\n" .
    '<?xml-stylesheet type="text/xsl" href="' . $csspath . 'css/xsl-stylesheet.xsl" ?>' . "\n" .
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';

Then include this file "xsl-stylesheet.xsl" in the "css" directory inside your templates directory.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:html="http://www.w3.org/TR/REC-html40"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
                xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:template match="/">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>XML Sitemap</title>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"></script>
        <script type="text/javascript" src="http://tablesorter.com/jquery.tablesorter.min.js"></script>
        <script type="text/javascript"><![CDATA[
          $(document).ready(function() {
                $("#sitemap").tablesorter( { widgets: ['zebra'] } );
          });
        ]]></script>
        <style type="text/css">
          body {
            font-family: Helvetica, Arial, sans-serif;
            font-size: 18px;
            color: #545353;
          }
          table {
            border: none;
            border-collapse: collapse;
          }
          #sitemap tr.odd {
            background-color: #eee;
          }
          #sitemap tbody tr:hover {
            background-color: #ccc;
          }
          #sitemap tbody tr:hover td, #sitemap tbody tr:hover td a {
            color: #000;
          }
          #content {
            margin: 0 auto;
            width: 1000px;
          }
          .expl {
            margin: 10px 3px;
            line-height: 1.3em;
          }
          .expl a {
            color: #da3114;
            font-weight: bold;
          }
          a {
            color: #000;
            text-decoration: none;
          }
          a:visited {
            color: #777;
          }
          a:hover {
            text-decoration: underline;
          }
          td {
            font-size:14px;
          }
          th {
            text-align:left;
            padding-right:30px;
            font-size:12px;
          }
          thead th {
            border-bottom: 1px solid #000;
            cursor: pointer;
          }
        </style>
      </head>
      <body>
        <div id="content">
          <h1>XML Sitemap</h1>
          <p class="expl">
            Generated by <a href="http://processwire.com/">Processwire</a> this is an XML Sitemap, meant for consumption by search engines.
          </p>
          <p class="expl">
            You can find more information about XML sitemaps on <a href="http://sitemaps.org">sitemaps.org</a>.
          </p>
          <p class="expl">
            This sitemap contains <xsl:value-of select="count(sitemap:urlset/sitemap:url)"/> URLs.
          </p>
          <table id="sitemap" cellpadding="3">
            <thead>
              <tr>
                <th width="75%">URL</th>
                <th width="12%">Last Change</th>
              </tr>
            </thead>
            <tbody>
              <xsl:for-each select="sitemap:urlset/sitemap:url">
                <tr>
                  <td>
                    <xsl:variable name="itemURL">
                      <xsl:value-of select="sitemap:loc"/>
                    </xsl:variable>
                    <a href="{$itemURL}">
                      <xsl:value-of select="sitemap:loc"/>
                    </a>
                  </td>
                  <td>
                    <span>
                      <xsl:value-of select="sitemap:lastmod"/>
                    </span>
                  </td>
                </tr>
              </xsl:for-each>
            </tbody>
          </table>
        </div>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

You can of course put the file anywhere else, just change the path in the code.

Also you can adjust the styling to suit your taste/site.

I hope this helps someone.

P.S. though I have called the file a "stylesheet" it is actually an XSL Transformation

You can read more here http://www.w3schools.com/xsl/xsl_intro.asp


An update to the XSL stylesheet above. Tablesorter is not required for zebra striping, and so jQuery is not required either.

Added CSS3 nth-child(odd) styling instead.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:html="http://www.w3.org/TR/REC-html40"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
                xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:template match="/">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>XML Sitemap</title>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <style type="text/css">
          body {
            font-family: Helvetica, Arial, sans-serif;
            font-size: 18px;
            color: #545353;
          }
          table {
            border: none;
            border-collapse: collapse;
          }
          #sitemap tr.odd {
            background-color: #eee;
          }
          #sitemap tbody tr:hover {
            background-color: #ccc;
          }
          #sitemap tbody tr:hover td, #sitemap tbody tr:hover td a {
            color: #000;
          }
          #content {
            margin: 0 auto;
            width: 1000px;
          }
          .expl {
            margin: 10px 3px;
            line-height: 1.3em;
          }
          .expl a {
            color: #da3114;
            font-weight: bold;
          }
          a {
            color: #000;
            text-decoration: none;
          }
          a:visited {
            color: #777;
          }
          a:hover {
            text-decoration: underline;
          }
          td {
            font-size:14px;
          }
          th {
            text-align:left;
            padding-right:30px;
            font-size:12px;
          }
          thead th {
            border-bottom: 1px solid #000;
            cursor: pointer;
          }
          tbody tr:nth-child(odd) {
            background-color: #E8E8E8;
          }
        </style>
      </head>
      <body>
        <div id="content">
          <h1>XML Sitemap</h1>
          <p class="expl">
            Generated by <a href="http://processwire.com/">Processwire</a> this is an XML Sitemap, meant for consumption by search engines.
          </p>
          <p class="expl">
            You can find more information about XML sitemaps on <a href="http://sitemaps.org">sitemaps.org</a>.
          </p>
          <p class="expl">
            This sitemap contains <xsl:value-of select="count(sitemap:urlset/sitemap:url)"/> URLs.
          </p>
          <table id="sitemap" cellpadding="3">
            <thead>
              <tr>
                <th width="75%">URL</th>
                <th width="12%">Last Change</th>
              </tr>
            </thead>
            <tbody>
              <xsl:for-each select="sitemap:urlset/sitemap:url">
                <tr>
                  <td>
                    <xsl:variable name="itemURL">
                      <xsl:value-of select="sitemap:loc"/>
                    </xsl:variable>
                    <a href="{$itemURL}">
                      <xsl:value-of select="sitemap:loc"/>
                    </a>
                  </td>
                  <td>
                    <span>
                      <xsl:value-of select="sitemap:lastmod"/>
                    </span>
                  </td>
                </tr>
              </xsl:for-each>
            </tbody>
          </table>
        </div>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
