Google News Articles / URL Structure


RyanJ

To submit articles to Google News, Google requires the article URLs to be "unique" and contain "at least 3 digits". I have spent most of my morning searching through the forums and have not really found a solution.

The best one I have run across, I think, is the Process Permalink module. From my understanding of it, you can choose custom URLs for pages using a specific template. I am assuming this is roughly what WordPress does with custom post types. However, I do not believe it is being maintained, and it was never submitted to the module repository. It also does not work with the current version of ProcessWire and throws several errors. I would love to adopt the module, but am not confident enough in my skills to make it work yet :).

The Date Archiver is also a possible solution, except that in this particular case archiving the articles is not necessary at this point, and if an article is not archived, its URL does not contain the digits Google requests. Also, if you only archive by year, you may run into the issue of duplicate titles.

This topic is basically requesting the same thing, but no solution was really offered other than adding dates to the tree structure, which is what the Date Archiver does for you automatically, which is awesome! Of course, for me this is not a viable option at this point.

Several other topics cover this as well, but they basically all involve archiving (which the Date Archiver does for you) and using URL segments.

Maybe the ability to append the page ID or created date to the URL would work? Maybe I have thought about this too long, but any suggestions are appreciated.

Hi RJay

Each page in ProcessWire has a unique path - this one is solved.

As for the digits, you can enter some in the page name. To add, for example, the creation date of the article to the name field automatically, you could use a hook that runs after the page is saved and then modifies the name attribute.
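
To make the naming idea concrete, here is a minimal sketch of the logic such a hook could apply, written in Python only to illustrate the string handling. It is not ProcessWire code; the function names `slugify` and `name_with_date` are made up for this example, and the `YYYYMMDD` format is just one assumed choice.

```python
import re
from datetime import date


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def name_with_date(title: str, created: date) -> str:
    """Build a page name that ends with the creation date as digits."""
    stamp = created.strftime("%Y%m%d")  # assumed format, e.g. 20131120
    base = slugify(title)
    if not base:
        # Nothing usable in the title, so fall back to the date alone.
        return stamp
    if base.endswith(stamp):
        # The name already carries the stamp (page saved again): leave it.
        return base
    return f"{base}-{stamp}"


print(name_with_date("Google News & URLs", date(2013, 11, 20)))
```

The check for an existing stamp matters because a save hook runs on every save, so without it the date would be appended again each time.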

Thanks for the link SiNNuT. Google specifically looks for the digits in the URL in order to list it in their news feed. From what I can tell, this module redirects to the clean version of the URL. I could also see that many redirects becoming undesirable as the number of articles increases. I agree that it is strange that they require digits, but I tested it and they indeed will not list a URL if it does not contain digits.

you could use a hook that hooks after saving the page, then modifies the name attribute.

Thanks for the tip Wanze. This seems like it could work. I will have to give it a try.

I haven't used it but it seems that http://processwire.com/talk/topic/4611-redirect-id-based-urls/ would be a possible solution.

From what I can tell this module redirects to the clean version url. 

I have just pushed a new version of the module that supports "Load" rather than "Redirect", so the URL that is entered, which includes the 4-digit page ID, will stay in the browser address bar.

Not sure if this will suit your needs or not, but it was an easy addition so I thought I'd add it anyways.

Hi adrian,

It is awesome that you added this, and so quickly! I think it will definitely do the trick. I am looking forward to digging into the module to see how you are adding the page ID to the URL. Thank you for sharing and for the addition.

  • Like 1
Link to comment
Share on other sites

No problem at all - happy to help. 

I just pushed another small update that adds a canonical link to the page if you are using the new "Load" option. This is to help identify the ID-based URL as a duplicate of the proper PW URL so Google etc. won't penalize you for duplicate content.
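
For anyone unfamiliar with canonical links, this is roughly what such a tag looks like when generated. The sketch below is in Python purely for illustration and is not the module's actual code; `canonical_link` is a made-up helper name and the example URL is a placeholder.

```python
from html import escape


def canonical_link(clean_url: str) -> str:
    """Render a <link rel="canonical"> tag pointing at the clean URL."""
    # Escape the URL so characters such as & and " are safe inside the attribute.
    return f'<link rel="canonical" href="{escape(clean_url, quote=True)}">'


print(canonical_link("https://example.com/news/my-article/"))
```

The tag goes in the `<head>` of the page served at the ID-based URL and points at the regular URL of the same page.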

In order to submit articles to Google news, they require the article urls to be "unique" and contain "at least 3 digits".

Are you sure about the 3 digits? We didn't have to do that on the CMSCritic site, though I think we had to set up a custom news feed.

Hi Ryan,

This is from their technical submission guidelines.

Article URLs. To make sure we only crawl new articles, please make sure your URLs are unique with at least 3 digits, and are permanent.
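
A quick way to check a URL against that guideline could look like the sketch below (Python, for illustration only). The guideline does not say how the digits are counted, so this assumes it means at least three digits anywhere in the URL path; `has_enough_digits` is a made-up name.

```python
import string
from urllib.parse import urlparse


def has_enough_digits(url: str, minimum: int = 3) -> bool:
    """Return True if the URL path contains at least `minimum` ASCII digits."""
    # Only the path is inspected, so a port number or query string does not count.
    path = urlparse(url).path
    return sum(ch in string.digits for ch in path) >= minimum


print(has_enough_digits("https://example.com/news/my-article"))
print(has_enough_digits("https://example.com/news/my-article-20131120"))
```

If Google actually expects the digits to be consecutive, the check would need to look for a run of digits instead of a total count.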

I built a client site using WordPress in the past, and the client wished to have their articles submitted to Google News. After reading the guidelines, I did not want to change the URL structure of the site to include digits, so I submitted it anyway. Even though I had submitted a news sitemap, a month or so later only 1 article had been indexed, and it just so happened to contain some digits in the URL.

I changed the structure to contain dates, and the next morning their articles were being indexed. I guess it could also depend on the site's authority and other things, but then again Google does what Google wants :).

if you want to add the page id to the url, you can also add a hook, and do something similar to this:

http://processwire.com/talk/topic/1648-ok-to-change-page-name-path-after-save/?p=15232

then you can set up how the url gets saved, so for example you could append the date, or the id to the end of the url, or anything you want...

just did something like this on a recent site and ended up extending this buildUrl module a lot and having it set the page name and title based on the input of 7 fields on the template... if you need examples i can post a gist later

Hi Macrura,

Thanks for sharing. I think the idea of using a certain field, let's say a date field, to append to the URL is a great option. I have not dived into modules/hooks yet, but I will take the example from your post and see how I can alter it to my needs. Any examples are always appreciated, though. Also, Adrian's module works great for adding the ID to the URL.

Thanks

Hi Rjay-

here is my example, this one uses dates in certain conditions and also page ids...

https://gist.github.com/outflux3/7568222

this module works on the current dev, not sure if it works on the stable though

should also mention that this is entirely based on the initial buildUrl module that pete posted on the link a few posts up

Thanks for sharing your approach Ryan. I think the XML feed is definitely the difference here. In my situation there was no feed submitted, just the URL. After changing the structure to include the article's date, the articles appeared in Google News. It must be a case of this format or that format :).

I feel sure someone at Google has made a typo and they mean "characters" not "digits". As in: "an article title less than 3 characters won't be shown".

What they've written in the bit you quoted just seems like nonsense to me, given that you and Ryan have mentioned that you both have articles showing fine in Google News now.
