Tyssen

Environment-specific robots.txt


Is there any way with PW to do environment-specific robots.txt, i.e. to block robots from staging sites without having to manually edit files in different environments?


Here's how you might dynamically create it with ProcessWire without tinkering with .htaccess files.

  1. Create a new template, call it robots, set its URLs > Should page URLs end with a slash setting to no, and set Files > Content-Type to text/plain. You should also tick the options that disable the Append file and Prepend file.
    Optionally, set Family > May this page have children to no, Family > Can this template be used for new pages to one, and Family > allowed templates for parents to home only.
  2. Create a new page under the homepage, set its template to robots, and name it robots.txt.
  3. Create a new template file at /site/templates/robots.php with the following contents:
<?php namespace ProcessWire;
// Render a different robots.txt depending on your own conditions.
if ($config->debug) {
	// Heredoc syntax creates a multiline string; the delimiter name (PHP_EOL here) is just a label
	echo <<<PHP_EOL
User-agent: *
Disallow: /
PHP_EOL;

} else {

	echo <<<PHP_EOL
User-agent: *
Disallow:
PHP_EOL;

}

That's it. You should now be able to see the output at the URL /robots.txt.
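If you would rather not tie the decision to $config->debug, the same template file could decide by hostname instead. The following is only a minimal sketch: robotsBody() is a hypothetical helper (not part of ProcessWire), and www.example.com and staging.example.com are placeholder hostnames. It blocks robots everywhere except on hosts you explicitly list as production.

```php
<?php
// Hypothetical helper, not part of ProcessWire: builds the robots.txt body for a host.
function robotsBody(string $host, array $productionHosts): string
{
    // Drop an optional port and normalise case before comparing.
    $host = strtolower(preg_replace('/:\d+$/', '', $host));

    if (in_array($host, $productionHosts, true)) {
        // An empty Disallow allows robots to crawl everything.
        return "User-agent: *\nDisallow:\n";
    }

    // Any host not listed as production (staging, local, ...) is blocked.
    return "User-agent: *\nDisallow: /\n";
}

// Standalone demonstration with a placeholder staging hostname.
echo robotsBody('staging.example.com', ['www.example.com']);
```

Inside /site/templates/robots.php you would presumably pass ProcessWire's $config->httpHost as the first argument rather than a literal string. Defaulting to "block" means a new staging host stays hidden even if you forget to add it to the list.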


Thanks guys! Sorry for the late reply, didn't get any notifications of replies. Going to give both methods a try.


Hi @Tyssen

I think @abdus's method is the more natural one for PW. You can implement a sitemap in the same manner.


I am trying to implement the above method from abdus.

Everything works fine when I have a title of robots.

But as soon as I name the page robots.txt I get the following error:

"The requested file robots.txt was not found."

So I tried robots.doc and that works perfectly well. There must be something preventing me from using the .txt extension. Any ideas?

Quote

But as soon as I name the page robots.txt I get the following error:

"The requested file robots.txt was not found."

Same for me ??? 

Thought maybe it was a $config setting but couldn't find anything.

Suggestions?


Solved!!! Answer was in the .htaccess file.

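By default, .htaccess excludes robots.txt from the requests it passes to ProcessWire, on the assumption that it is a physical file on the system. Remove robots.txt from that condition:

  #RewriteCond %{REQUEST_FILENAME} !(favicon\.ico|robots\.txt)
  RewriteCond %{REQUEST_FILENAME} !(favicon\.ico)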

 

  • Like 2

Share this post


Link to post
Share on other sites

