Tyssen

Environment-specific robots.txt

Is there any way with PW to do environment-specific robots.txt, i.e. to block robots from staging sites without having to manually edit files in different environments?

Here's how you might generate it dynamically with ProcessWire, without tinkering with .htaccess files.

  1. Create a new template, call it robots, and set its URLs > Should page URLs end with a slash? setting to No, and Files > Content-Type to text/plain. You should also tick the options that disable the automatic Prepend file and Append file.
    Optionally, set its Family > May this page have children? to No, Family > Can this template be used for new pages? to One, and Family > Allowed template(s) for parent pages to home only.
  2. Create a new page under the homepage, set its template to robots, and name it robots.txt.
  3. Create a new template file at /site/templates/robots.php with the following contents:
<?php namespace ProcessWire;
// Render a different robots.txt depending on your own conditions;
// here, $config->debug distinguishes staging (debug on) from production.
if ($config->debug) {
	// Heredoc syntax (<<<) keeps the multiline rules readable.
	// Debug mode on: block all robots.
	echo <<<ROBOTS
User-agent: *
Disallow: /
ROBOTS;
} else {
	// Production: allow all robots.
	echo <<<ROBOTS
User-agent: *
Disallow:
ROBOTS;
}

And done. You should now be able to see the output at the URL /robots.txt.
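
The $config->debug check above is just one possible condition. If debug mode doesn't line up with your environments, you could key off the hostname instead. A minimal sketch, where staging.example.com and dev.example.com are placeholders for your actual staging hosts:

<?php namespace ProcessWire;
// Alternative robots.php: decide by hostname instead of debug mode.
// The hostnames below are placeholders, not part of the method above.
$stagingHosts = ['staging.example.com', 'dev.example.com'];

if (in_array($config->httpHost, $stagingHosts)) {
	// Staging or dev: keep all robots out.
	echo "User-agent: *\nDisallow: /";
} else {
	// Production: allow everything.
	echo "User-agent: *\nDisallow:";
}

This way the same template file can be deployed unchanged to every environment.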

Thanks guys! Sorry for the late reply, didn't get any notifications of replies. Going to give both methods a try.

I am trying to implement the above method from abdus.

All works fine when I give the page a title of robots.

But as soon as I name the page robots.txt I get the following error:

"The requested file robots.txt was not found."

So I tried robots.doc and that works perfectly well. There must be something preventing me from using the .txt extension. Anyone have any ideas?

Quote:

But as soon as I name the page robots.txt I get the following error:

"The requested file robots.txt was not found."

Same for me.

Thought maybe it was a $config setting but couldn't find anything.

Suggestions?

Solved! The answer was in the .htaccess file.

Remove the reference to robots.txt being a physical file on the system, so the request gets routed to ProcessWire instead:

#RewriteCond %{REQUEST_FILENAME} !(favicon\.ico|robots\.txt)
RewriteCond %{REQUEST_FILENAME} !(favicon\.ico)
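
To verify the change end to end, you could fetch the URL through ProcessWire's own WireHttp class, e.g. from a temporary template snippet or the Tracy console. A sketch, where staging.example.com is a placeholder for your actual host:

<?php namespace ProcessWire;
// Fetch robots.txt over HTTP to confirm ProcessWire now serves it.
$http = new WireHttp();
$response = $http->get('https://staging.example.com/robots.txt');
if ($response !== false) {
	echo $response; // expect "Disallow: /" while debug mode is on
} else {
	echo 'Request failed: ' . $http->getError();
}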
