Pete

Module: XML Sitemap


I'm on the verge of converting your code back to a template, this is getting silly...

Just want to clarify that the problem is not with Pete's module. There was an obscure problem with the modules installer in the core which has been fixed. I'm not certain this is the problem in your case as you are still getting a different behavior than us, but I suggest grabbing the latest copy of ProcessWire before trying anything else.


Just want to clarify that the problem is not with Pete's module.

My apologies for implying any such thing, just getting frustrated. When I saw this module it looked like a great timesaver, and that hasn't worked out.

I'm uploading a fresh copy of PW now, we'll see how it goes.

EDIT: Still no go!


Can you confirm that you still get this same error message and that your ProcessWire reports its version number as 2.2?

Exception: Unable to create path: /MarkupSitemapXML/ (in /home/theseeke/public_html/wire/core/CacheFile.php line 62) This error message was shown because you are logged in as a Superuser. Error has been logged.

thanks,

Ryan


I just downloaded a fresh/blank copy of PW 2.2 and installed this module just to make sure it didn't have anything to do with the dev site I had tested on before. But it's working as expected, creating the dir in the right place, etc. So there must be something else that I'm missing. The behavior you are describing definitely indicates a core bug. I can't think of any other possibility. I'm going to do more testing here and hope to find and push a solution shortly.


I'm pleased to say that this issue is resolved, and when it came right down to it, Problem Existed Between Keyboard and Chair.

Stupid me forgot to read the instructions, and installed the module to /wire/modules instead of /site/modules.

Now that it's in the right place it's working fine, cheers!


Thanks for reporting back, glad that it's working. I was stuck trying to determine how it was doing that, so it's a relief to hear it's resolved. This has still been valuable though as we did solve a bug as a result (top of this page).


Hi there,

I've been using this module on a couple of sites in place of the template I previously used and it certainly is a boon in keeping my template folder a lot tidier — great work!

I changed line 53 to check for access when iterating the children so that user-restricted pages do not show up in the sitemap. I think this is a saner default, but it could be a setting too.

foreach($page->children("check_access=1") as $child) $entry .= $this->sitemapListPage($child);

Thanks,

Stephen


Stephen: check_access=1 is default, so you don't have to include that. If you don't want to check access, then you need to use check_access=0.


Very true. I obviously visited the sitemap straight after installing, while still logged in as admin. My mistake!

Thanks, apeisa


Want to share a few things I've added for a magazine site I'm building:

First of all, I think "priority" and "changefreq" are fairly important if you're going to have a sitemap at all. This post has some info on Google's guidelines: http://www.eduki.com...-are-important/

What I decided to do was quickly add two global fields to PW so I can set these values manually in each page:

-sitemap_priority

-changefreq

And in the code:

public function sitemapListPage($page) {
    $entry = "";
    $default_priority = "0.5";
    $default_changefreq = "monthly";

    include $this->fuel('config')->paths->templates . "sitemap_module_defaults.inc";

    if ($page->sitemap_ignore == 0 || $page->path == '/') { // $page->path part added so that it ignores hiding the homepage, else you wouldn't have ANY pages returned
        $modified = date('Y-m-d', $page->modified);
        $entry = "\n <url>\n";
        $entry .= " <loc>{$page->httpUrl}</loc>\n";
        $entry .= " <lastmod>{$modified}</lastmod>\n";

        if(!empty($page->sitemap_priority)) {
            $entry .= " <priority>{$page->sitemap_priority}</priority>\n";
        } else {
            $entry .= " <priority>{$default_priority}</priority>\n";
        }

        if(!empty($page->changefreq)) {
            $entry .= " <changefreq>{$page->changefreq}</changefreq>\n";
        } else {
            $entry .= " <changefreq>{$default_changefreq}</changefreq>\n";
        }

        $entry .= " </url>";
        if($page->numChildren) {
            foreach($page->children as $child) $entry .= $this->sitemapListPage($child);
        }
    }
    return $entry;
}

The sitemap_module_defaults.inc file in the templates dir is so I can set some values on the fly without doing it manually:

<?php
switch($page->template->name) {
    case "blog_post":
    case "blog_topic":
    case "blog_topic_type":
        $default_priority = "0.7";
        $default_changefreq = "daily";
        break;
}
?>

That's done and works fine for me.
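
For anyone who would rather keep the per-template defaults in one place, here is a minimal standalone sketch of the same fallback idea: page value first, then template default, then a global default. The function name `resolveSitemapValues` and the array shapes are hypothetical and not part of the module. The sketch also checks that `changefreq` is one of the values the sitemap protocol allows and that `priority` lies between 0.0 and 1.0, which the code above does not do.

```php
<?php
// Hypothetical helper, not part of the module.
// $pageValues stands in for the page's own field values,
// $templateName for $page->template->name.

const VALID_CHANGEFREQ = ['always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never'];

function resolveSitemapValues(array $pageValues, string $templateName, array $templateDefaults): array
{
    // Global fallbacks, matching the defaults used in the post above.
    $values = ['priority' => '0.5', 'changefreq' => 'monthly'];

    // Per-template defaults override the global fallbacks.
    if (isset($templateDefaults[$templateName])) {
        $values = array_merge($values, $templateDefaults[$templateName]);
    }

    // Values set on the page itself win, but only when they are valid.
    if (!empty($pageValues['sitemap_priority'])) {
        $priority = (float) $pageValues['sitemap_priority'];
        if ($priority >= 0.0 && $priority <= 1.0) {
            $values['priority'] = number_format($priority, 1, '.', '');
        }
    }
    if (!empty($pageValues['changefreq'])
        && in_array($pageValues['changefreq'], VALID_CHANGEFREQ, true)) {
        $values['changefreq'] = $pageValues['changefreq'];
    }

    return $values;
}

// Example defaults, mirroring the blog templates from the post above.
$defaults = [
    'blog_post'  => ['priority' => '0.7', 'changefreq' => 'daily'],
    'blog_topic' => ['priority' => '0.7', 'changefreq' => 'daily'],
];
print_r(resolveSitemapValues(['changefreq' => 'weekly'], 'blog_post', $defaults));
```

Inside the module you would pass the page's field values and template name into something like this instead of including a file, but that wiring is left out here.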

Something I found frustrating with the sitemap module was that I couldn't add virtual pages I made. For example, if there's a page with urlSegments with some kind of page manipulation they won't show up on the sitemap, for obvious reasons. So what I did was add this method to the module:

public function sitemapListVirtualPage($httpUrl, $modified = NULL, $sitemap_priority = "0.5", $changefreq = "monthly") {
    $entry = "\n <url>\n";
    $entry .= " <loc>{$httpUrl}</loc>\n";

    // Format the date only when a timestamp was passed in. Calling date() first
    // always returns a non-empty string, so <lastmod> would never be skipped.
    if($modified) {
        $entry .= " <lastmod>" . date('Y-m-d', $modified) . "</lastmod>\n";
    }

    if($sitemap_priority) {
        $entry .= " <priority>" . (float)$sitemap_priority . "</priority>\n";
    }

    if($changefreq) {
        $entry .= " <changefreq>{$changefreq}</changefreq>\n";
    }

    $entry .= " </url>";
    return $entry;
}

And added this include to the init method before the output is saved to cache:


public function init() {
    // Intercept a request for a root URL ending in sitemap.xml and output
    if (strpos($_SERVER['REQUEST_URI'], wire('config')->urls->root . 'sitemap.xml') !== FALSE) {
        // Check for the cached sitemap, else generate and cache a fresh sitemap
        $cache = wire('modules')->get("MarkupCache");
        if(!$output = $cache->get("MarkupSitemapXML", 3600)) {
            $output = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n";
            $output .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';
            $output .= $this->sitemapListPage(wire('pages')->get("/"));

            include $this->fuel('config')->paths->templates . "sitemap_module_virtual.inc";

            $output .= "\n</urlset>";
            $cache->save($output);
        }

        header("Content-Type: text/xml");
        echo $output;
        exit;
    }
}

And in my sitemap_module_virtual.inc file I did this:

<?php
foreach(wire('pages')->find("template=blog_topic|blog_topic_type") as $real_page) {
    $output .= $this->sitemapListVirtualPage("http://" . $this->fuel('config')->httpHost . $real_page->url . "archives/", wire('pages')->get("template=blog_post,sort=-created")->modified, "0.4", "daily");
}
foreach(wire('pages')->find("template=blog_topic|blog_topic_type") as $real_page) {
    $output .= $this->sitemapListVirtualPage("http://" . $this->fuel('config')->httpHost . $real_page->url . "rss/", wire('pages')->get("template=blog_post,sort=-created")->modified, "0.4", "daily");
}
?>

In this part, I have added /archives/ and /rss/ sub-pages for each blog_topic and sub-topic (or blog_topic_type) using urlSegments. I wanted these to show up on the sitemap, even though they're probably not super important.

Usually I wouldn't bother doing this if it was just these kinds of pages. But what if you have countless articles with manipulated URLs that don't show in the sitemap? This is a perfect example of why I did this, with the future in mind as well.
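
If the list of virtual URLs grows, the same idea can be written once for any set of base URLs and segments. The following is only a rough standalone sketch: the function name `virtualSitemapEntries` and its parameters are made up for illustration and are not part of the module. It also entity-escapes the URL, which the sitemap protocol requires (for example `&` must be written as `&amp;`).

```php
<?php
// Hypothetical standalone helper; names are illustrative, not from the module.
function virtualSitemapEntries(array $baseUrls, array $segments, ?int $modified = null): string
{
    $out = '';
    foreach ($baseUrls as $baseUrl) {
        foreach ($segments as $segment) {
            // Join base URL and segment with exactly one slash, and end with a slash.
            $url = rtrim($baseUrl, '/') . '/' . trim($segment, '/') . '/';

            $out .= "\n <url>\n";
            // Entity-escape the URL so characters like & are valid inside XML.
            $out .= ' <loc>' . htmlspecialchars($url, ENT_QUOTES | ENT_XML1, 'UTF-8') . "</loc>\n";
            // Only emit <lastmod> when a timestamp was given.
            if ($modified !== null) {
                $out .= ' <lastmod>' . date('Y-m-d', $modified) . "</lastmod>\n";
            }
            $out .= ' </url>';
        }
    }
    return $out;
}

// Example: two virtual sub-pages under one (made-up) topic URL.
echo virtualSitemapEntries(['http://example.com/blog/news/'], ['archives', 'rss'], time());
```

In the .inc file above, the base URLs would come from the `find()` result and the timestamp from the newest blog_post.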

I've been trying to understand how to do these things for a while, so I hope it helps someone out.

EDIT: I have a download with all of my changes if anyone's interested in taking a look: http://clintonskakun.com/processwire-docs/posts/the-xml-sitemap-module-with-priority-and-changefreq-and-more/

Edited by ClintonSkakun

Thanks for posting this, these seem like some useful additions and some good insights on sitemap.xml too.


jukooz asked me a while back how to stick the sitemaps into a template as he's using the multisite module and currently I guess that this module will pull the sitemap for EVERY site on an install...?

As a workaround for now, this should work on a per-site basis if you put it in a separate template file for each site. Please pay attention to the comments, as there are things to change per site. Please also note that this is untested and largely just pulled from the module and tweaked for pasting into a template for use instead of the module:

EDIT: See attachment as the forum software tries to parse a URL in the code : sitemap.txt

I've also updated the module (see first post) to v1.0.3 to check if the page is viewable before including it in the sitemap - I noticed that it was incorrectly listing pages that had no template file... oops!


Hey,

Is it possible to use this module with the LanguageLocalizedURL module?

I want to make the site multilanguage like this:

www.url.de/de/testindeutsch/

www.url.de/en/testinenglish/

Would like to hear from you... Greets Jens alias DV-JF


While I've not tried it, I would guess that this module does not collaborate with or accommodate the LanguageLocalizedURL module in any special way. Though Pete could say for sure. 


Hey Ryan,

thx for answering...

Though Pete could say for sure. 

Hope so :)

Greets


It doesn't accommodate it at the moment, no, but I may need this myself soon so if you can wait a week or two I may have an update (it's not at the top of my list at the moment but I might surprise myself and do it sooner ;)).

Thinking out loud, it needs:

  • Check if LanguageLocalizedURL is installed
  • Find the root page for each language
  • Generate a sitemap.xml page under each root page (it doesn't really generate a page in the database, it just outputs the sitemap if you request that URL)

I think that's it, but I need to get to grips with the LanguageLocalizedURL first.
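
Thinking out loud a bit further, the third step (answering a sitemap.xml request under each language root) could be sketched roughly like this. This is plain PHP with no ProcessWire calls; the function name `matchSitemapRoot` and the list of root paths are assumptions for illustration only, and how the roots would actually be found depends on how LanguageLocalizedURL works.

```php
<?php
// Hypothetical helper: given a request URI and the language root paths,
// return the root whose sitemap.xml is being requested, or null if none matches.
function matchSitemapRoot(string $requestUri, array $languageRoots): ?string
{
    // Look only at the path, ignoring any query string.
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }

    foreach ($languageRoots as $root) {
        // Normalise "/de/", "de" and "/de" to "/de"; the site root becomes "/".
        $root = '/' . trim($root, '/');
        $expected = rtrim($root, '/') . '/sitemap.xml';
        if ($path === $expected) {
            return $root === '/' ? '/' : $root . '/';
        }
    }
    return null;
}

// Example with two assumed language roots.
var_dump(matchSitemapRoot('/de/sitemap.xml', ['/de/', '/en/']));
```

The returned root would then be the starting point for walking the page tree, instead of always starting at "/".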


Pete, the LLU module doesn't have separate trees. It uses a gateway page for each language in the root, like "/de/" and "/en/", with URL segments to then get the page and switch the user language. The site structure is still the same as without it; you just use text language fields.

The parsing of the URL happens automatically through these gateway pages, and it hooks into Page::path to change the URL of the pages system-wide. So if you do an echo $page->url you'll get the URL in the language you're currently in, like /en/about-us/ or /de/ueber-uns/.

Has anyone tried yet whether it already works?


Soma just submitted a pull request and I merged it.

The 1.0.4 version should allow you to use this with the LanguageLocalizedURL module - let us know how you get on and thanks to Soma ;)


Pete, thanks, but I've already sent another pull request; I forgot to up the cache time.

Also, to fully work it would have to add the "language_published" check to see if the language version of the page is really published. I will try to add that later.

I also wanted to add that I never use sitemaps for Google and never will again. I used to try a long time ago, but it doesn't really help at all if you build the site carefully. It just eats time setting it up and making sure everything still works. It's not as easy as it first seems and can even be counterproductive if not done carefully. The problem with this module as it is now is that it will not find and list pages that are added through urlSegments, and I don't see an easy way to do that. Also, it doesn't have weighting etc.


Thanks - merged it :)

The main reason I created it was because it gets a new site listed faster - plain and simple. That was my experience from sitemaps a few years ago at least so I hope it still happens, but it does seem to work from my experience.

I've never bothered with weighting or anything like that. Not bothered about URL segments either as for a start the crawler will find a link to that from a normal page on the site presumably, but my main goal was to get Google to crawl the page sooner than it normally would - and I think it will do this every time it looks at the sitemap and sees a new page (assuming it doesn't find it in the crawl already on its own by following links in the content).

Think of it this way - if you release a brand new site without telling Google, it will take until someone links to your site before Google is aware of its existence. On a small, personal site with no blog or comments that could be a loooong time, but the first thing I usually do is launch a site and submit it so I know it's done.

The argument against this is there are other ways of doing that as search engine companies often have pages where you can just type your domain in and they will search it sometime later, but I like to keep Google informed when I add new pages etc just in case it has a lapse in concentration and misses something :)


Just upped another pull request... :)

I think Google doesn't parse the website if it has a sitemap.xml. Or has this changed? I found myself fixing other people's websites by actually removing the sitemap from Google.

Google is so fast nowadays that it doesn't matter that much, as a page won't be in the index before the next index update anyway. I could be wrong as things change all the time, but that's my personal experience and what I've read. Also, adding a sitemap page to your site will help if you really worry about it. As with most SEO things, you have to take everything with a grain of salt in the end.

It's maybe OK if Google doesn't come to your site when it's new, but tests have shown that even with only 1 link to your site, Google will parse it within 1 day if you have a nice structure.


I used to post a link on my website to give my new client sites a poke. Works well. Or even in this forum.


Forgot to mention, I've pulled in the last of Soma's pull requests so this is now v1.0.5 to implement the added functionality he wrote as well as the fixes for the bugs he created :P;)

My fault for not testing :D
