Tyssen

Environment-specific robots.txt

Is there any way with PW to do environment-specific robots.txt, i.e. to block robots from staging sites without having to manually edit files in different environments?

Here's how you might generate it dynamically with ProcessWire, without tinkering with .htaccess files.

  1. Create a new template, call it robots, and set its URLs > Should page URLs end with a slash? setting to No, and Files > Content-Type to text/plain. You should also tick the options that disable the automatic Prepend file and Append file.
    Optionally, set its Family > May this page have children? to No, Family > Can this template be used for new pages? to One, and Family > Allowed template(s) for parent pages to home only.
  2. Create a new page under the homepage, set its template to robots, and name it robots.txt.
  3. Create a new template file at /site/templates/robots.php with the following contents:
<?php namespace ProcessWire;
// Render a different robots.txt depending on your own conditions;
// here, $config->debug distinguishes staging (debug on) from production.
if ($config->debug) {
	// Heredoc syntax (<<<) keeps the multiline rules readable.
	// Debug mode on: block all robots.
	echo <<<ROBOTS
User-agent: *
Disallow: /
ROBOTS;
} else {
	// Production: allow all robots.
	echo <<<ROBOTS
User-agent: *
Disallow:
ROBOTS;
}

And done. You should now be able to see the output at the URL /robots.txt.
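
The $config->debug check above is just one possible condition. If debug mode doesn't line up with your environments, you could key off the hostname instead. A minimal sketch, where staging.example.com and dev.example.com are placeholders for your actual staging hosts:

<?php namespace ProcessWire;
// Alternative robots.php: decide by hostname instead of debug mode.
// The hostnames below are placeholders, not part of the method above.
$stagingHosts = ['staging.example.com', 'dev.example.com'];

if (in_array($config->httpHost, $stagingHosts)) {
	// Staging or dev: keep all robots out.
	echo "User-agent: *\nDisallow: /";
} else {
	// Production: allow everything.
	echo "User-agent: *\nDisallow:";
}

This way the same template file can be deployed unchanged to every environment.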

Thanks guys! Sorry for the late reply, didn't get any notifications of replies. Going to give both methods a try.

I am trying to implement the above method from abdus.

All works fine when I give the page a title of robots.

But as soon as I name the page robots.txt I get the following error:

"The requested file robots.txt was not found."

So I tried robots.doc and that works perfectly well. There must be something preventing me from using the .txt extension. Anyone have any ideas?

Quote:

But as soon as I name the page robots.txt I get the following error:

"The requested file robots.txt was not found."

Same for me.

Thought maybe it was a $config setting but couldn't find anything.

Suggestions?

Solved! The answer was in the .htaccess file.

Remove the reference to robots.txt being a physical file on the system, so the request gets routed to ProcessWire instead:

#RewriteCond %{REQUEST_FILENAME} !(favicon\.ico|robots\.txt)
RewriteCond %{REQUEST_FILENAME} !(favicon\.ico)
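
To verify the change end to end, you could fetch the URL through ProcessWire's own WireHttp class, e.g. from a temporary template snippet or the Tracy console. A sketch, where staging.example.com is a placeholder for your actual host:

<?php namespace ProcessWire;
// Fetch robots.txt over HTTP to confirm ProcessWire now serves it.
$http = new WireHttp();
$response = $http->get('https://staging.example.com/robots.txt');
if ($response !== false) {
	echo $response; // expect "Disallow: /" while debug mode is on
} else {
	echo 'Request failed: ' . $http->getError();
}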
