
Hide child pages but use their data


totoff

hi forum,

i'm sure that has been asked and answered before, but can't find it in the forum jungle, not even via google ... therefore in short:

i display a list of questions and answers on a faq-list-page. the data for each faq comes from child pages of the faq-list: 

/faq-list
    /faq-1
    /faq-2

and so on. as i don't need the child pages for anything more than holding the data and don't want them to be indexed by google, i would like to hide them. however, if i set them to hidden in the settings i can't retrieve their data anymore for my faq-list-page.

does anybody know a topic here in the forum that addresses this or has a workaround?

thanks, christoph


Hi Christoph

Although you might hide the parent (and therefore the children will not appear on the menu) the children and the parent are still published.

If you look through the Cheat Sheet you will see you can call a specific page and then the children.

So, for instance

$pages->get("/path/to/page/")->children will retrieve all the children of a specific page.

Now you can create a foreach loop to retrieve the fields you require from the children and display them however you wish.

http://processwire.com/api/cheatsheet/
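
As a rough sketch of that loop, a faq-list template file might look something like the following. The path /faq-list/ comes from the first post; the field names title and body and the helper name renderFaqList are assumptions for illustration, so swap in whatever fields your faq template actually has.

```php
<?php
// Hypothetical helper: turns a list of FAQ pages into HTML.
// Assumes each child exposes "title" and "body" (adjust to your fields).
function renderFaqList($faqs) {
    $out = "<dl class=\"faq\">\n";
    foreach ($faqs as $faq) {
        // Escape the title; body is assumed to be trusted rich text from the editor.
        $title = htmlspecialchars((string) $faq->title, ENT_QUOTES, 'UTF-8');
        $out .= "<dt>{$title}</dt>\n<dd>{$faq->body}</dd>\n";
    }
    return $out . "</dl>\n";
}

// In the faq-list template file ($pages is provided by ProcessWire):
if (isset($pages)) {
    echo renderFaqList($pages->get('/faq-list/')->children());
}
```

Because $pages->get() fetches the parent explicitly, this should keep working even if the parent itself is set to hidden.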


I think what I mean is: is it possible to index the site after content includes have been rendered by the individual pages? So that my include shows up as a result on its own, and as part of the page it is included in.

Of course, only if your individual page has a URL that is reachable by search engines.


I think you already have your solution but on a sidenote; if it's only about hiding stuff from search engines you could just add <meta name="robots" content="noindex, nofollow"> in the head section of the faq entry html.


Don't forget that visiting a page with chrome or the google toolbar, that is not indexed in G, will invite a crawl from googlebot.  

For SEO control I usually add a field that allows the editor to set the correct meta robots for the page in the admin. And in code I will usually cascade or inherit the value of the parent unless an override is present.
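
A minimal sketch of that cascade, assuming a text field called meta_robots has been added to the relevant templates (the field name, the helper name resolveRobots and the default value are all hypothetical, not something ProcessWire ships with):

```php
<?php
// Hypothetical helper: walk up the page tree until a robots value is found.
// Assumes a text field named "meta_robots" (not part of ProcessWire itself).
function resolveRobots($page, $default = 'index, follow') {
    // The homepage's parent is a NullPage with id 0, which ends the loop.
    for ($p = $page; $p && $p->id; $p = $p->parent) {
        if ($p->meta_robots) return $p->meta_robots;
    }
    return $default;
}

// In the <head> of a template ($page is provided by ProcessWire):
if (isset($page)) {
    $robots = htmlspecialchars(resolveRobots($page), ENT_QUOTES, 'UTF-8');
    echo "<meta name=\"robots\" content=\"{$robots}\">\n";
}
```

With something like this, setting "noindex, nofollow" once on the faq-list parent would apply to every faq child that doesn't set its own value.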


"Don't forget that visiting a page with chrome or the google toolbar, that is not indexed in G, will invite a crawl from googlebot."

If that were true, all my development sites would be in the google index, but they aren't.


@soma- I said invite. I really should have said "could".

I did some more research on this: Google says it won't, and Matt Cutts debunked this a while back. I had seen googlebot on some dev sites and was sure they had not been shared. Could have been a tweet or some other crawled resource that shared it. I stand corrected.

Note that they do collect URLs in chrome if you have Google as the default search engine (for typeahead or missing URLs), and in the Google Toolbar the URLs are phoned home when "page rank" is enabled. Just not shared. Yet ;)


  • 1 month later...

i'm coming back to this thread as i still struggle to find the optimum solution for the problem i described in my first post. to sum up, as far as i understand there are four options to use child pages as "data containers" without making them viewable or have them indexed by google:

  1. keep the children hidden/unpublished but retrieve their data with a selector such as include=hidden (or include=all for unpublished children) -> not the optimum as it confuses clients ("why is this page hidden")
  2. don't assign a template file to the page template -> throws a 404, possible, but not the most elegant solution imho
  3. set a 301 to their parent -> unsure on the seo effects and may cause trouble if the page tree needs to be changed
  4. robots.txt disallow /faq-1 etc. -> not applicable as it requires a static page tree
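
For option 1, the retrieval side could be sketched roughly like this in the faq-list template file. The field names title and body are assumptions; the selector strings are standard ProcessWire selectors.

```php
<?php
// In the faq-list template file; $page is the current page, provided by ProcessWire.
// "include=hidden" adds hidden children to the result.
$faqs = $page->children('include=hidden');

// Unpublished children are still left out by the selector above;
// "include=all" would return them too (it also bypasses access checks).
// $faqs = $page->children('include=all');

foreach ($faqs as $faq) {
    // "title" and "body" are assumed field names.
    echo "<h3>{$faq->title}</h3>{$faq->body}";
}
```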

in my opinion, option 3 seems to be the best - but it still leaves me a bit dissatisfied ... i would be happy to hear your opinions or suggestions on how you would solve this.

thanks, christoph


Option 1, or better, option 2 is the most elegant if you don't need to be able to view those pages anyway.

If you don't have links or URLs pointing to those data pages, spiders will not find them anyway, so there is nothing to worry about.


I think there's something to be said for the 'unpublished' option. If you look at this setting in the admin it says: 'Unpublished: Not visible on site'. It seems this is exactly what you want. Surely this can't be hard to explain to clients: "we keep faq items unpublished because we don't want them individually accessible on the website".

Another option is a variation on 4: instead of trying to do this in robots.txt you can put <meta name="robots" content="noindex, nofollow"> in the head section of your faq template. This way you also keep spiders out, and it automatically applies to all pages using the faq template. Of course, the pages would still be accessible by URL, but you wouldn't link to them on the site, nor would google index them, so no real problem.

Just in case someone would visit a faq url you could make clear what's happening via the template output: "notice: this page is part of .... visit this page instead (link)"


"Surely this can't be hard to explain to clients"

unfortunately it was. the quote above is an original. it was not that they didn't get it, they simply forgot about it and published the pages anyway.


Using the unpublished status for this is actually the worst and least elegant option. You then can't use the published/unpublished status for its real purpose anymore...

Option 2 is the simplest and most elegant way to go. If you have no template file you can't view them directly. It's there for this reason.


I tend to make sure that the parent is Hidden, then there is no automatic link to the kids (whatever state they are in) unless you create one. That way they don't accidentally become visible because someone did the wrong setting.

If you want a catch all, you could always give them a template file but don't render any of the fields. You can then redirect that anywhere you wanted.
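
A catch-all template file of that kind could be as short as the sketch below. Sending visitors to the parent is just one possible target, picked here to match the faq-list setup from the first post.

```php
<?php
// Template file for a single faq item: render nothing, send visitors to the list.
// $session->redirect() issues a permanent (301) redirect by default;
// pass false as the second argument for a temporary (302) one.
$session->redirect($page->parent->url);
```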

I would think just keeping the parent Hidden will be easier though and means that the children (if published) are still accessible by page fields, if you need that later.


hi there,

thanks for all your comments. interesting to see that there is such a wealth of opinions and different strategies. i tend to agree with soma, whose approach "if you don't want to make it public, just don't assign a template file to it" convinces me most. also with regard to search engines: no url, no links, 404 just in case = no problem.


  • 11 months later...
  • 1 month later...

Good morning guys :-)

Don't really understand PW on this point.

Am I understanding correctly that you guys suggest just setting the status of the page or the parent to "Hidden: Excluded from lists and searches"?

When I do this on my test site the page is still accessible via its URL, even when logged out.

Only when I set the status to unpublished or delete the template file does it throw a 404.

As I understand it right now, "hidden" only hides the page from search results on the site itself and from listings like navigation, right?

But if that were true: I just marked my contact page as hidden and pulled out some data via $pages->get('/contact') without the "include=hidden" selector, and it still worked.

EDIT: okay, $pages->get will include hidden pages as well, as ryan said in this post

And as I mentioned above, the page is accessible via URL as well (which I would understand, because it's only hidden from search results).

And totoff is confusing me with this one, #7, as well haha^^

Isn't anything throwing a 404? I mean, I could just add something random at the end of my URL (example.com/somethingrandom) and get a 404.

So how can he (how can you) prevent it from showing a 404? Or are you doing a redirect?

Hope someone can understand my confused brain and bring a little clarification, which would be really appreciated :D

EDIT: At the moment I think it's best for me to have a page without a template file and set its state to hidden so that it looks different in the page tree list

Cheers

Can


Hi Can,

"Am I understanding correctly that you guys suggest just setting the status of the page or the parent to 'Hidden: Excluded from lists and searches'?

When I do this on my test site the page is still accessible via its URL, even when logged out.

Only when I set the status to unpublished or delete the template file does it throw a 404."

Yep, that's the correct behavior. Hidden will exclude a page from $pages->find() calls, unless you specify "include=hidden" in your selector.

Think of a hidden page like a page that should not be visible in your navigation or lists, but still accessible when one knows the direct URL.

Unpublished really means that the page is not yet ready/published for the public, here a 404 is displayed. The same goes for pages with templates that do not have a physical template file associated.

Be careful when logged in as superuser, if I remember correctly you'll see unpublished pages. In order to simulate the website for a guest visitor, you could use another browser or the private/incognito mode :)

"But if that were true: I just marked my contact page as hidden and pulled out some data via $pages->get('/contact') without the 'include=hidden' selector, and it still worked."

$pages->get() is an explicit call to retrieve a page. ProcessWire assumes that you want to get it, no matter if it's hidden or not

"Isn't anything throwing a 404? I mean, I could just add something random at the end of my URL (example.com/somethingrandom) and get a 404.

So how can he (how can you) prevent it from showing a 404? Or are you doing a redirect?"

PW throws a 404 if you enter a path that does not exist, or if the page you want to visit is unpublished or does not have a template file (because PW does not know what markup to render).

You can also throw a 404 anytime yourself, though that is already more advanced stuff.
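
For reference, throwing one yourself from a template file usually looks something like this. The condition is only an example and body is an assumed field name; the exception class is ProcessWire's own.

```php
<?php
// In a template file: abort rendering and show the site's 404 page.
// The condition is only an example; "body" is an assumed field name.
if (!$page->body) {
    // With namespaced template files on ProcessWire 3.x this resolves
    // to \ProcessWire\Wire404Exception.
    throw new Wire404Exception();
}
```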

Could you maybe describe to us what you want to do? Why would you want to prevent a 404 from being shown?

Cheers :)
