How do I create a sitemap.xml?


ryan

  • 1 month later...

On my new site a couple of months ago, I set up Ryan's sitemap from this thread. It worked fine.

After that I added a whopping 18,000 pages (tags, etc.), and my sitemap is no longer working. I have no idea whether there's a connection, but could there be a limit? My code is still exactly like Ryan's, with no customisation.

URL is http://filmdagbok.no/sitemap.xml

Chrome shows nothing, Firefox shows an error.

Extra info:

No trailing slash. No cache (although I should probably set one up for this template?).

Edit:

I figured out that if I exclude the parent of my different tag types, the XML works. But I still wonder whether there's a limit on how many pages the sitemap can hold, and how one would solve the problem.

Edited by laban

Hello laban,

I use the latest Firefox, and in my case your sitemap is working, but I am not able to see how many links are in there.

I edited my post. I was able to display my sitemap when I reduced the links to around 2,400. That is the number of pages I have if I exclude my tags (actors and directors). The sitemap is now working, but it stops working if I allow all my pages again.
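On the question of a limit: the sitemaps.org protocol caps a single sitemap file at 50,000 URLs and 50 MB uncompressed, which is well above 18,000 pages. So the failure here is more likely a server-side memory or time limit while building the output, though that is an assumption and not something verified on this site. A minimal sketch of the protocol check, where `fitsInOneSitemap` is a hypothetical helper name:

```php
<?php
// Per-file limits from the sitemaps.org protocol.
const SITEMAP_MAX_URLS = 50000;
const SITEMAP_MAX_BYTES = 52428800; // 50 MB, uncompressed

/**
 * Return true if a sitemap with this many URLs and this XML size
 * stays within the protocol limits for a single file.
 */
function fitsInOneSitemap(int $urlCount, int $xmlBytes): bool
{
    return $urlCount <= SITEMAP_MAX_URLS && $xmlBytes <= SITEMAP_MAX_BYTES;
}
```

If the check passes but the page still renders empty, the PHP error log is probably the next place to look.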


Hi m-artin,

Thanks for pointing us to this post. I have implemented it on my site and it works quite well. But shouldn't the multi-language declaration be defined for every page in the sitemap instead of for the root page only?

Best regards, Jürgen


 But shouldn't the multi-language declaration be defined for every page in the sitemap instead of for the root page only?

I'm not quite sure and asked myself the same question. It looks like we have to investigate further.


You must create a separate url element for each URL. Each url element must include a loc tag indicating the page URLs, and an xhtml:link rel="alternate" hreflang="XX" subelement for every alternate version of the page, including itself.

Source: https://support.google.com/webmasters/answer/2620865?hl=en

I understand "You must create a separate url element for each URL" to mean that each page in the sitemap has to include the language tags.
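To make that reading concrete, here is a rough sketch in plain PHP of one `<url>` element per language version, each listing every alternate including itself. The function names and the input shape (an array mapping hreflang code to absolute URL) are made up for illustration and are not part of Ryan's script. The enclosing `<urlset>` would also need the `xmlns:xhtml="http://www.w3.org/1999/xhtml"` namespace declaration.

```php
<?php
/**
 * Render one <url> entry with an xhtml:link alternate for every language version.
 * $alternates maps an hreflang code to an absolute URL (hypothetical input shape).
 */
function renderUrlEntry(string $loc, array $alternates): string
{
    $out = "<url>\n\t<loc>" . htmlspecialchars($loc, ENT_XML1) . "</loc>";
    foreach ($alternates as $hreflang => $href) {
        $out .= "\n\t<xhtml:link rel=\"alternate\" hreflang=\""
            . htmlspecialchars((string) $hreflang, ENT_XML1 | ENT_QUOTES)
            . "\" href=\"" . htmlspecialchars($href, ENT_XML1 | ENT_QUOTES) . "\"/>";
    }
    return $out . "\n</url>";
}

/**
 * Render a separate <url> entry for each language version of one page,
 * so that every entry lists all alternates, including itself.
 */
function renderPageEntries(array $alternates): string
{
    $entries = [];
    foreach ($alternates as $loc) {
        $entries[] = renderUrlEntry($loc, $alternates);
    }
    return implode("\n", $entries);
}
```

For a page with an English and a German version this produces two `<url>` elements, each carrying both `xhtml:link` lines.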


  • 1 month later...

Interesting issue here. I have this setup:

An Articles page, which is hidden. It has the template 'articles', which does not have a template file.

The children of Articles have the template 'article', which does have a file.

When I change $children = $page->children; to $children = $page->children("template=article, include=all"); the child pages are not displayed. I am using 2.6.19.


  • 4 months later...

Source: https://support.google.com/webmasters/answer/2620865?hl=en

I understand "You must create a separate url element for each URL" to mean that each page in the sitemap has to include the language tags.

@Juergen,

obviously I missed your post for half a year.

New Edition

But I finally found it and redesigned the template file I provided in post 88 of another thread.

It now works properly according to the Google guidelines. It comes with a 301 redirect to the path version that lacks the language segment, which works in PW 3.0 and up but not in 2.7. In that case you need to uncomment the code line that forces the redirect.

The template detects whether the translation is checked 'active'. Furthermore, you can easily adjust the selectors for the page array the paths are taken from. Feel free to try it and use it.

multilang-sitemap-xml.php.zip


  • 4 months later...

Hi all,

I am building a website with PW for a friend/client, and I am using Ryan's template file method to generate the XML sitemap.

The website is multi-language (English is the default), so when I gave him the sitemap links for every language, he asked me why http://www.domain.com/sitemap.xml has the same content as http://www.domain.com/en/sitemap.xml.

He also asked whether it is possible to use a single sitemap file that links to all the other sitemaps, so I made some changes so that http://www.domain.com/sitemap.xml outputs:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
	<loc>http://www.domain.com/en/sitemap.xml</loc>
</sitemap>
<sitemap>
	<loc>http://www.domain.com/sublang/sitemap.xml</loc>
</sitemap>
</sitemapindex>

 

So I changed:

echo renderSitemapXML();

into:

// Array keys must be quoted: $_SERVER[HTTP_HOST] without quotes is an error in PHP 8.
$host = $_SERVER['HTTP_HOST'];

// Output the sitemap index only when the root /sitemap.xml itself is requested.
if ($_SERVER['REQUEST_URI'] == '/sitemap.xml') {

    // echo renderSitemapindexXML(); I tried to move the code below to a function, but the language loop did not output anything
    echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n" .
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';

    foreach ($languages as $language) {
        if (!$page->viewable($language)) continue; // is the page viewable in this language?
        $url = $page->localUrl($language);
        echo "\n<sitemap>" .
            "\n\t<loc>http://" . $host . $url . "</loc>" .
            "\n</sitemap>";
    }

    echo "\n</sitemapindex>";

} else {
    // Language-specific sitemap URLs keep the original behaviour.
    echo renderSitemapXML();
}
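About the attempt to move this into a function: a likely cause of the empty loop (an assumption, since the function itself is not shown) is that ProcessWire's API variables such as $languages and $page are local to the template file and are not visible inside a function body unless they are passed in or fetched through wire(). One way around it is a function that only takes plain URLs. The sketch below does that; `renderSitemapIndexXML` and its parameter are hypothetical names, not part of Ryan's script.

```php
<?php
/**
 * Build a sitemap index document from a list of absolute sitemap URLs.
 * The caller collects the URLs, so no ProcessWire variables are needed in here.
 */
function renderSitemapIndexXML(array $sitemapUrls): string
{
    $out = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';
    foreach ($sitemapUrls as $url) {
        $out .= "\n<sitemap>"
            . "\n\t<loc>" . htmlspecialchars($url, ENT_XML1) . "</loc>"
            . "\n</sitemap>";
    }
    return $out . "\n</sitemapindex>";
}
```

In the template file, the language loop would then collect each language's sitemap URL into an array and pass that array to the function.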

 

It's working fine, and I hope it is useful to someone, provided the code is OK.

I would also appreciate it if someone could review it, since I am not a programmer ^_^

Thank you


  • 7 months later...

Here is my solution for large sitemaps containing thousands upon thousands of pages, without the trouble of timeouts and such. I'm currently running this on a website with 170,000+ pages, relying only on the ProcessWire API. Just a side note: I have no actual need to add all my pages to a sitemap …

The keywords here are sitemap index, sitemap and ProcessWire's page numbers.

Two templates:

  • sitemap-index.php
  • sitemap-xml.php (with page numbers activated)

The structure of mine is this:

  • sitemap-index.php is domain.com/sitemap/
  • sitemap-xml.php is domain.com/sitemap/sitemap/ 

And the code for each of them:

sitemap-index.php

<?php
	namespace ProcessWire;

	$out =  
	    '<?xml version="1.0" encoding="UTF-8"?>' . "\n" .
	    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';
	
	$templates = "basic-page|blog-post|tag";
	$key = $pages->count("template=$templates");
	$limit = 200;

	$pageNum = ceil($key/ $limit);
	$post = $pages->get("template=sitemap-xml");

	$i = 1;

	while($pageNum >= $i){
	  $out .= "\n<sitemap>" .
	    "\n\t<loc>" . $post->httpUrl . "page$i/</loc>" .
	    "\n\t<lastmod>" . date("Y-m-d", $post->modified) . "</lastmod>" .
	    "\n</sitemap>";

	  $i = $i + 1;
	}

	$out .= "\n</sitemapindex>";

	header("Content-Type: text/xml");

	echo $out; 
?>
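The page count in sitemap-index.php is plain ceiling division, and it only lines up if the limit there matches the limit in the sitemap-xml.php selector. Here is the same arithmetic as a standalone sketch. The function name is made up for illustration, and it assumes the "page" URL prefix used in the template above (page1/, page2/, ...).

```php
<?php
/**
 * List the paginated sitemap URLs needed to cover $total pages
 * at $limit pages per sitemap.
 */
function sitemapPageUrls(int $total, int $limit, string $baseUrl): array
{
    if ($limit < 1) {
        throw new InvalidArgumentException('limit must be at least 1');
    }
    // Round up so a partially filled last sitemap is still listed.
    $count = (int) ceil($total / $limit);
    $urls = [];
    for ($i = 1; $i <= $count; $i++) {
        $urls[] = $baseUrl . "page$i/";
    }
    return $urls;
}
```

With 170,000 pages and a limit of 200 this gives 850 sitemap URLs; 401 pages would give 3.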

sitemap-xml.php

<?php
	namespace ProcessWire;

	$out =  
	    '<?xml version="1.0" encoding="UTF-8"?>' . "\n" .
	    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';

	$pageArray = $pages->find("template=basic-page|blog-post|tag, limit=200");

	foreach ($pageArray as $post) {
	  $out .= "\n<url>" .
	    "\n\t<loc>" . $post->httpUrl . "</loc>" .
	    "\n\t<lastmod>" . date("Y-m-d", $post->modified) . "</lastmod>" .
	    "\n</url>";
	}

	$out .= "\n</urlset>";

	header("Content-Type: text/xml");

	echo $out;
?>

 


  • 1 month later...

I'm using hreflang within the head, as shown in the PW3 languages templates. Should my sitemap-xml template include only the default language, since I'm already including hreflang within the head, or should I include all the additional languages too?


27 minutes ago, Zeka said:

Here is another link about hreflang: https://support.google.com/webmasters/answer/189077?hl=en, which clearly says we have to use one of these three methods:

HTML link element in header
HTTP header
Sitemap

I'm using the first method, "HTML link element in header". So that means I don't have to use hreflang in my sitemap as well, but do I have to include the links for the additional languages or not? It's a bit confusing...


  • 6 months later...
  • 5 years later...

I was looking for a solution for a multisite project and found that, with a slight change to Ryan's script, I can get the correct stylesheet for each domain.
To do that, I call the renderSitemapXML function with the correct root page:

echo renderSitemapXML(array($page->rootParent->url));

and within that function skip the line

array_unshift($paths, '/'); // prepend homepage

That is doing the job! 🤩
