FrancisChung

Google not indexing Category Pages


Hi, I have an ongoing issue with Google SEO that I can't seem to fix. Wondering if anyone has come across a similar situation?

We deployed a new version of the website using a new deployment methodology, and unfortunately the wrong robots.txt file was deployed, basically telling Googlebot not to crawl the site.
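
For reference, the difference boils down to a "block everything" file versus one that permits crawling — roughly along these lines (illustrative only, from memory; not the exact contents of our files):

# What the UAT robots.txt effectively said (blocks all crawling)
User-agent: *
Disallow: /

# What the live robots.txt should say (allows crawling)
User-agent: *
Disallow: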

The end result is that if our target keywords are used for a (Google) search, our website is displayed on the search page with "No information is available for this page." 

Google provides a link on the search listing with instructions for fixing this, but so far nothing I have tried there has fixed the situation.
I was wondering if anyone has gone through this scenario and what the steps were to remedy it?
Or perhaps it has worked and I have simply misunderstood how it works?

The steps I have tried in Google Webmaster Tools:

  1. Gone through all crawl errors
  2. Restored the robots.txt file and verified it with the robots.txt Tester
  3. Fetch / Fetch and Render as Google, for both Desktop and Mobile, using the root URL and other URLs, requesting indexing for the URL alone and for the URL plus its linked pages
  4. Uploaded a new sitemap.xml

On the Sitemaps page in particular, it says 584 submitted, 94 indexed.

 

Would the search engine return "No information available" because the page is not indexed? The pages I'm searching for correspond to our 2 most popular keywords and entry points into the site; they are also our 2 most popular category pages. So I'm thinking that probably isn't the case, but ...

How can I prove / disprove the category pages are being indexed?

The site in question is Sprachspielspass.de. The keywords to search for are fingerspiele and kindergedichte.

 


Just wondering, but how long has this been happening? In somewhat similar situations I've had delays of anywhere from hours to days.

Another thing I noticed is that your (current) robots.txt has a max-age of 30 days. According to Google's docs, they may "increase the time" a robots.txt file is cached based on this header, so in theory they could still be holding on to the old version. In that case I don't really know which steps to take, but first I'd wait for a few days to make sure that it isn't just Google's regular delay :)


Hi @teppo, it's been going on for a few weeks now. Very frustrating.
It's definitely beyond the wait-and-see stage now.

Is the robots.txt max-age defined in .htaccess? I don't recall creating such a setting for robots.txt specifically, but I do recall setting cache headers for the website in general.

 


I ran the robots.txt file through this:
http://www.webconfs.com/http-header-check.php

It says :
Cache-Control => max-age=2592000
Expires => Wed, 21 Feb 2018 18:21:44 GMT

Which is a bit baffling.

The homepage and main category pages are showing Cache-Control: no-cache at the moment.
I do have ProCache installed, but it's currently switched off.

These are our current Cache-Control settings:

 

<ifModule mod_headers.c>
  # 8 days (691200 seconds) for images
  <filesMatch "\.(ico|jpe?g|jpg|png|gif|swf)$">
    Header set Cache-Control "max-age=691200, public"
  </filesMatch>
  # 8 days for fonts
  <filesMatch "\.(eot|svg|ttf|woff|woff2)$">
    Header set Cache-Control "max-age=691200, public"
  </filesMatch>
  # 8 days for css / js files
  <filesMatch "\.(css)$">
    Header set Cache-Control "max-age=691200, public"
  </filesMatch>
  <filesMatch "\.(js)$">
    Header set Cache-Control "max-age=691200, private"
  </filesMatch>
  # html / php responses are always revalidated
  <filesMatch "\.(x?html?|php)$">
    Header set Cache-Control "private, must-revalidate"
  </filesMatch>
</ifModule>

I will add a separate filesMatch block for robots.txt and see what happens ...
 


  <filesMatch "^robots.(txt|php)$">
    Header Set Cache-Control "max-age=0, public"
  </filesMatch>

 


I guess you're committing major SEO sins here... You deliver the exact same content under several URLs / domains. Google is not amused by such practices.

e.g.

http://sprachspielspass.de/kinderlieder/alle-kinderlieder/ri-ra-rutsch/

http://kinder-reime.com/kinderlieder/alle-kinderlieder/ri-ra-rutsch/

http://finger-spiele.com/kinderlieder/alle-kinderlieder/ri-ra-rutsch/

This is called "black hat SEO" and is frowned upon. I only discovered these other domains by checking your HTTP headers, where it says

access-control-allow-origin: kinder-reime.com, finger-spiele.com, sprachspielspass.de

 


@Dragan, the other two domains are our test server / test domains. In theory, Google or any other crawler shouldn't be indexing them, because the robots.txt for those sites instructs them not to. Surely this isn't a frowned-upon practice?

In fact, the reason I'm in such a pickle is that I migrated the code + content from our UAT site to our live site, including the robots.txt file from UAT.

I'm aware of what Google does to punish sites that circumvent search engine rules, and I thought it's pretty hard to get away with that these days. I assure you our intention is not to cannibalise our own SEO rankings.

Perhaps I need to fix access-control-allow-origin to list only one domain? I wasn't sure whether listing all our domains would have a negative impact.
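
Something like this in .htaccess is what I have in mind — sending only the live domain as the allowed origin (just a sketch on my part; whether it should be the www or non-www host, and whether the test domains still need access for fonts/AJAX requests, is an assumption I'd have to verify):

<ifModule mod_headers.c>
  # Sketch: expose only the live domain via CORS
  Header set Access-Control-Allow-Origin "http://www.sprachspielspass.de"
</ifModule>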


Hi,

Also, you don't have the non-www version redirected to the www version, or vice versa.

And you should normally have only one canonical URL (the rel="canonical" tag) in your source code.
Currently, it changes depending on whether the non-www or www version is requested.
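
Something like this in .htaccess would handle the redirect — a minimal sketch, assuming www is the preferred host and the site runs over plain http (adjust the domain and scheme otherwise):

<IfModule mod_rewrite.c>
  RewriteEngine On
  # 301-redirect the bare domain to the www hostname
  RewriteCond %{HTTP_HOST} ^sprachspielspass\.de$ [NC]
  RewriteRule ^(.*)$ http://www.sprachspielspass.de/$1 [R=301,L]
</IfModule>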



I've managed to get Google to display our meta description correctly in its search results again today.

Not sure which of the following corrected it, but the steps I took were:

1) Stopped caching robots.txt (.htaccess)

  <filesMatch "^robots.(txt|php)$">
    Header Set Cache-Control "max-age=0, public"
  </filesMatch>

2) Removed the other test sites from access-control-allow-origin
3) Created a sitemap-category.xml file listing the homepage plus the important category pages, and submitted it to Google
4) Submitted sitemap.xml to Google again after fixing 1) & 2)

I tried 3) earlier, but it didn't fix the problem at first (Google initially indexed 0 pages ... then 1 page today).

I tried 2), 3) and 4) today and that seems to have fixed it.
Step 1) I had tried when I first posted this thread.

It's also quite possible it "organically" fixed itself today by coincidence and none of my steps were actually effective.

As for the canonical tag issues, I'm using the MarkupSEO module, so it's possibly an issue with that.
I'll post any updates here.

Thanks again to @dragan, @Christophe & @teppo for responding.




  • Similar Content

    • By chrizz
      hey there
      I guess a lot of you have already heard of the hreflang attribute, which tells search engines which URL they should list on their result pages. For some of my projects I build this manually, but now I am wondering whether there's a need to add this as a module to the PW modules directory.
      How do you deal with the hreflang thingy? Would you be happy to use a module for this, or do you have concerns that a module might not cover your current use cases?
      Cheers,
      Chris
       
       
       
       
    • By John W.
      SYNOPSIS
      A little guide to generating a sitemap.xml using (I believe) a script Ryan originally wrote, with the addition of being able to optionally exclude child pages from the sitemap.xml output.
      I was looking back on a small project today where I was using a PHP script to generate an XML file; I believe the original was written by Ryan. Anyway, I needed a quick fix for the script to allow me to optionally exclude children of pages from being included in the sitemap.xml output.
      OVERVIEW
      A good example of this is a site where if you visit /minutes/ a page displays a list of board meetings which includes a title,  date, description and link to download the .pdf file.
      I have a template called minutes and a template called minutes-document. The first page, minutes, when loaded via /minutes/ simply grabs all of its child pages and outputs the name, description and actual path of an uploaded .pdf file for a visitor to download.
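
      As a rough illustration, the minutes template does something along these lines (a sketch only — field names like summary and minutes_pdf are placeholders rather than the real ones, and it assumes the file field is limited to a single file):

      <?php
      // Sketch of /site/templates/minutes.php -- lists the child meeting pages
      // 'summary' and 'minutes_pdf' are placeholder field names
      echo "<ul>";
      foreach($page->children as $meeting) {
          $pdf = $meeting->minutes_pdf; // single-file field holding the uploaded .pdf
          echo "<li><a href='{$pdf->url}'>{$meeting->title}</a> &ndash; {$meeting->summary}</li>";
      }
      echo "</ul>";
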
      In my back-end I have the template MINUTES and MINUTES-DOCUMENT. Thus:


      So, basically, their employee can log in, hover over Minutes, click New, then create a new (child) record and name it the date of the meeting, e.g. June 3rd, 2016:

       
      ---------------------------
      OPTIONALLY EXCLUDING CHILDREN - SETUP
      Outputting the sitemap.xml and optionally excluding children that belong to a template.
      The setup of the original script is as follows:
      1. Save the file to the templates folder as sitemap.xml.php
      2. Create a template called sitemap-xml and use the sitemap.xml.php file.
      3. Create a page called sitemap.xml using the sitemap-xml template
       
      Now, with that done you will need to make only a couple of slight modifications that will allow the script to exclude children of a template from output to the sitemap.xml
      1. Create a new checkbox field and name it:   sitemap_exclude_children
      2. Add the field to a template that you want to control whether the children are included/excluded from the sitemap. In my example I added it to my "minutes" template.
      3. Next, go to a page that uses a template with the field you added above. In my case, "MINUTES"
      4. Enable the checkbox to exclude children, leave it unchecked to include children.
      For example, in my MINUTES page I enabled the checkbox and now when /sitemap.xml is loaded the children for the MINUTES do not appear in the file.

       
      A SIMPLE CONDITIONAL TO CHECK THE "sitemap_exclude_children" VALUE
      This was a pretty easy modification to an existing script, adding only one line. I just figure there may be others out there using this script with the same needs.
      I simply inserted the if condition as the first line in the function:
      function renderSitemapChildren(Page $page) {
          if($page->sitemap_exclude_children) return "";
          ...
          ...
          ...
      THE FULL SCRIPT WITH MODIFICATION
      THE FULL SCRIPT WITH MODIFICATION

<?php

/**
 * ProcessWire Template to power a sitemap.xml
 *
 * 1. Copy this file to /site/templates/sitemap-xml.php
 * 2. Add the new template from the admin.
 *    Under the "URLs" section, set it to NOT use trailing slashes.
 * 3. Create a new page at the root level, use your sitemap-xml template
 *    and name the page "sitemap.xml".
 *
 * Note: hidden pages (and their children) are excluded from the sitemap.
 * If you have hidden pages that you want to be included, you can do so
 * by specifying the ID or path to them in an array sent to the
 * renderSiteMapXML() method at the bottom of this file. For instance:
 *
 * echo renderSiteMapXML(array('/hidden/page/', '/another/hidden/page/'));
 *
 * Patch to prevent pages from including children in the sitemap when a field is checked / johnwarrenllc.com
 * 1. Create a checkbox field named sitemap_exclude_children
 * 2. Add the field to the parent template(s) you plan to use
 * 3. When a new page is created with this template, checking the field will prevent
 *    its children from being included in the sitemap.xml output
 */

function renderSitemapPage(Page $page) {
    return  "\n<url>" .
            "\n\t<loc>" . $page->httpUrl . "</loc>" .
            "\n\t<lastmod>" . date("Y-m-d", $page->modified) . "</lastmod>" .
            "\n</url>";
}

function renderSitemapChildren(Page $page) {
    // Added to exclude CHILDREN if field is checked
    if($page->sitemap_exclude_children) return "";

    $out = '';
    $newParents = new PageArray();
    $children = $page->children;

    foreach($children as $child) {
        $out .= renderSitemapPage($child);
        if($child->numChildren) $newParents->add($child);
        else wire('pages')->uncache($child);
    }

    foreach($newParents as $newParent) {
        $out .= renderSitemapChildren($newParent);
        wire('pages')->uncache($newParent);
    }

    return $out;
}

function renderSitemapXML(array $paths = array()) {
    $out =  '<?xml version="1.0" encoding="UTF-8"?>' . "\n" .
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">';

    array_unshift($paths, '/'); // prepend homepage

    foreach($paths as $path) {
        $page = wire('pages')->get($path);
        if(!$page->id) continue;
        $out .= renderSitemapPage($page);
        if($page->numChildren) {
            $out .= renderSitemapChildren($page);
        }
    }

    $out .= "\n</urlset>";
    return $out;
}

header("Content-Type: text/xml");
echo renderSitemapXML();
// Example: echo renderSitemapXML(array('/hidden/page/'));
      In conclusion, I have used a couple of different ProcessWire sitemap-generating modules, but for my needs the above script is fast and easy to set up and modify.
      - Thanks
       
    • By Krlos
      Hi, I'm using FormBuilder to build forms on my website. I have different forms to track Google AdWords conversions, but I have around 20 different forms.
      I was wondering how you guys handle conversions in Google AdWords.
    • By Mike Rockett
        
      Docs & Download: rockettpw/markup-sitemap
      Modules Directory: MarkupSitemap
      MarkupSitemap is essentially an upgrade to MarkupSitemapXML by Pete. It adds multi-language support using the built-in LanguageSupportPageNames. Where multi-language pages are available, they are added to the sitemap by means of an alternate link in that page's <url>. Support for listing images in the sitemap on a page-by-page basis and using a sitemap stylesheet are also added.
      Example when using the built-in multi-language profile:
      <url>
        <loc>http://domain.local/about/</loc>
        <lastmod>2017-08-27T16:16:32+02:00</lastmod>
        <xhtml:link rel="alternate" hreflang="en" href="http://domain.local/en/about/"/>
        <xhtml:link rel="alternate" hreflang="de" href="http://domain.local/de/uber/"/>
        <xhtml:link rel="alternate" hreflang="fi" href="http://domain.local/fi/tietoja/"/>
      </url>

      It also uses a locally maintained fork of a sitemap package by Matthew Davies that assists in automating the process.
      The module doesn't use the same sitemap_ignore field available in MarkupSitemapXML. Rather, it renders sitemap options fields in a Page's Settings tab. One of the fields is for excluding a Page from the sitemap, and another is for excluding its children. You can assign which templates get these config fields in the module's configuration (much like you would with MarkupSEO).
      Note that the two exclusion options are mutually exclusive at this point as there may be cases where you don't want to show a parent page, but only its children. Whilst unorthodox, I'm leaving the flexibility there. (The home page cannot be excluded from the sitemap, so the applicable exclusion fields won't be available there.)
      As of December 2017, you can also exclude templates from sitemap access altogether, whilst retaining their settings if previously configured.
      Sitemap also allows you to include images for each page at the template level, and you can disable image output at the page level.
      The module allows you to set the priority on a per-page basis (it's optional and will not be included if not set).
      Lastly, a stylesheet option has also been added. You can use the default one (enabled by default), or set your own.
      Note that if the module is uninstalled, any saved data on a per-page basis is removed. The same thing happens for a specific page when it is deleted after having been trashed.