
Automatic meta description


alan


Hi,

I've put together a small piece of code for use in the HTML HEAD tag to automate the creation of the content for a META DESCRIPTION tag.

I'd be interested to hear if anyone can see a neater way to do it, or can solve my regex TODO. Otherwise, I'm posting this in case it's helpful to anyone else.

<meta name="description" content='<?php

// Check if there is text in the summary field; if so, output it, as the author
// has deliberately written a meta description. Otherwise grab the first N
// (e.g. 160) characters of the body field, replace the HTML tags with a space
// and output that as a poor man's meta description.
//
// TODO this regex replaces both opening and closing HTML tags, so
// (below, _ marks a space)
// <h2>What are hedgehogs?</h2><p>Hedgehogs are
// is cleaned up like this
// _What are hedgehogs?__Hedgehogs are
// when the ideal would be
// What are hedgehogs?_Hedgehogs are

$summary = $page->get("summary");
if($summary) echo $summary; else echo preg_replace('/<[^>]*>/', ' ', substr($page->get("body"), 0, 160));

?>' />

What is wrong with strip_tags?

 $strippedText = strip_tags($unstrippedText);

Regarding length, you could use one of the word/sentence cutting functions. I don't have time to search for one right now, but I remember seeing one that takes a string and the length you want, and gives you a snippet that ends with a full sentence (if the difference between the maximum length you asked for and the actual length is under 5%) or otherwise with whole words.
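A minimal sketch of the kind of function described above, written from that description rather than from the function itself (which isn't identified here). sentenceSnippet() is a hypothetical name, the 5% tolerance mirrors the remembered behaviour, and lengths are counted in bytes with strlen(), so multibyte text could be cut mid-character.

```php
<?php
// Hypothetical helper: cut $str to at most $maxLength bytes. Prefer ending on
// a full sentence if that loses no more than $tolerance of the length;
// otherwise fall back to cutting at the last whole word and add "...".
function sentenceSnippet(string $str, int $maxLength = 160, float $tolerance = 0.05): string
{
    if (strlen($str) <= $maxLength) {
        return $str; // already short enough
    }
    $cut = substr($str, 0, $maxLength);

    // Find the last sentence terminator inside the cut.
    $lastEnd = -1;
    foreach (['.', '!', '?'] as $terminator) {
        $pos = strrpos($cut, $terminator);
        if ($pos !== false && $pos > $lastEnd) {
            $lastEnd = $pos;
        }
    }

    // Sentence end is close enough to the limit: keep the full sentence.
    if ($lastEnd >= 0 && ($maxLength - ($lastEnd + 1)) <= $maxLength * $tolerance) {
        return substr($cut, 0, $lastEnd + 1);
    }

    // Otherwise step back to the last space so no word is split.
    $space = strrpos($cut, ' ');
    return ($space !== false ? substr($cut, 0, $space) : $cut) . '...';
}
```

With a 20-byte limit, "Hedgehogs are cute. They eat insects..." ends cleanly after "cute."; with a 40-byte limit the sentence end is too far back, so it falls back to whole words.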


What adam said.

public function wordLimiter($str = '', $limit = 120, $endstr = '...'){

	if($str == '') return '';

	if(strlen($str) <= $limit) return $str;

	$out = substr($str, 0, $limit);
	$pos = strrpos($out, " ");
	if ($pos>0) {
		$out = substr($out, 0, $pos);
	}
	$out .= $endstr;
	return $out;

}

It cuts between words.
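One caveat worth sketching: strlen() and substr() count bytes, so on UTF-8 text (accented letters, curly quotes) the cut can land in the middle of a character. Below is a hypothetical multibyte-aware variant, mbWordLimiter(), assuming the mbstring extension is available; it is not part of Soma's post.

```php
<?php
// Hypothetical variant of wordLimiter() that counts characters, not bytes,
// using the mb_* functions (requires the mbstring extension).
function mbWordLimiter(string $str = '', int $limit = 120, string $endstr = '...'): string
{
    if ($str === '') {
        return '';
    }
    if (mb_strlen($str, 'UTF-8') <= $limit) {
        return $str; // short enough, return unchanged
    }
    $out = mb_substr($str, 0, $limit, 'UTF-8');
    // Step back to the last space so we do not end mid-word.
    $pos = mb_strrpos($out, ' ', 0, 'UTF-8');
    if ($pos !== false && $pos > 0) {
        $out = mb_substr($out, 0, $pos, 'UTF-8');
    }
    return $out . $endstr;
}

echo mbWordLimiter('Café crème brûlée', 8); // Café...
```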


adamkiss, OK, I found/remembered why I didn't use strip_tags. If the BODY that is sampled is:

<h2>My hedgehogs</h2>
<p>Hedgehogs are cute.</p>

then strip_tags produces

My hedgehogsHedgehogs are cute.

I.e. it runs two words together.

But thanks for the suggestions. I'm going to keep looking for that PHP function you mentioned that cuts to a full sentence; very useful.


Hi Alan,

Haven't we met before somewhere? Anyway, you are almost there with your code. Welcome to the world of regular expressions. Now, to go from...

// _What are hedgehogs?__Hedgehogs are

...to...

// What are hedgehogs?_Hedgehogs are

You need to do a couple of extra steps if you want the PHP workout and can't find anything else. If you store what you want to echo in $out then do this (I'm showing it in two steps but you can combine if you want)...

$out = strtr( $out, array( '  ' => ' ' ) ); // All double space runs to single space runs
$out = trim( $out ); // No leading or trailing spaces.
echo $out;

You should get your target result.
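Here is a worked run of those two extra steps on the sample markup from the original question, using the original tag-stripping regex first.

```php
<?php
// Sample markup from the original question.
$html = '<h2>What are hedgehogs?</h2><p>Hedgehogs are';

// Original step: every tag becomes a single space, giving
// " What are hedgehogs?  Hedgehogs are" (leading space, double space).
$out = preg_replace('/<[^>]*>/', ' ', $html);

$out = strtr($out, array('  ' => ' ')); // the double space becomes one space
$out = trim($out);                      // drop the leading space

echo $out; // What are hedgehogs? Hedgehogs are
```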


This is a method to strip html tags:

preg_replace('#<[^>]+>#', ' ', $page->body)

If you're picky about getting 2 spaces down to 1, you could do this instead:

trim(str_replace("  "," ",preg_replace('#<[^>]+>#', ' ', $page->body)))

But in HTML, runs of more than one space collapse anyway (except &nbsp;), so I don't think it matters much.
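On the &nbsp; point: stripping tags leaves entities such as &nbsp; and &amp; in the text as-is. A minimal sketch, assuming the body is UTF-8, that decodes entities after replacing tags and then collapses all whitespace, including the non-breaking space (U+00A0) that &nbsp; decodes to. plainText() is a hypothetical name; the result would still need escaping (e.g. htmlspecialchars()) before going into an attribute.

```php
<?php
// Hypothetical helper: tags -> spaces, entities -> characters, then collapse
// every whitespace run (including U+00A0 from &nbsp;) to one space.
function plainText(string $html): string
{
    $text = preg_replace('#<[^>]+>#', ' ', $html);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $text = preg_replace('/[\s\x{00A0}]+/u', ' ', $text);
    return trim($text);
}

echo plainText('<p>Fish&nbsp;&amp;&nbsp;chips</p><p>Tea</p>'); // Fish & chips Tea
```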


Hi netcarver, why yes, your name is familiar :) Nice to see some TXP people here.

Thank you for the PHP loveliness, I'll go adjust!

Soma, thanks for the function: it's reusable, and the gentler ending with the ellipsis is nicer. Thanks for the regex too!


Thanks everyone for your most helpful comments; it's now producing perfect output. I'll post back here shortly with what I ended up with, for my own reference and to help anyone else who is as PHP-disabled as I am. Cheers! -Alan


OK, so this is what I ended up with thanks to the kind helping hands of others.

<meta name="description" content='<?php
// Check if there is text in a field called 'summary' and, if there is, output
// it, as the author has deliberately written a meta description. Otherwise grab
// the first N (e.g. 160) characters from a field called 'body', replace the
// HTML tags with a space, then collapse runs of whitespace to a single space
// and strip leading/trailing spaces to produce a poor man's meta description.
$summary = $page->get("summary");
if($summary) echo $summary; else {
 $out = preg_replace('#<[^>]+>#', ' ', wordLimiter($page->get("body"), 160));
 // Collapse every run of whitespace to a single space
 $out = preg_replace('/\s+/', ' ',$out);
 // No leading or trailing spaces.
 $out = trim( $out );
 echo $out;
}
?>' />

wordLimiter is as per Soma's post, except that I removed 'public'. I am including the function inline at the top of this .inc file and, for reasons I don't understand (due to my PHP feebleness), with 'public' in I got an error. Thanks again everyone for all your comments. Cheers, -Alan
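On that error: as far as I know, public is a visibility keyword that PHP only accepts on members declared inside a class, so a top-level "public function" fails to parse. A sketch of Soma's function as a plain function follows. The function_exists() guard is an optional extra, not part of Soma's post, and only matters if the .inc file could be included more than once in a request (which would otherwise give a "Cannot redeclare" error).

```php
<?php
// Soma's wordLimiter() as a plain function: no "public" outside a class.
// Optional guard against redeclaring it if this file is included twice.
if (!function_exists('wordLimiter')) {
    function wordLimiter($str = '', $limit = 120, $endstr = '...')
    {
        if ($str == '') return '';
        if (strlen($str) <= $limit) return $str;

        $out = substr($str, 0, $limit);
        // Cut back to the last space so the last word isn't split.
        $pos = strrpos($out, ' ');
        if ($pos > 0) {
            $out = substr($out, 0, $pos);
        }
        return $out . $endstr;
    }
}

echo wordLimiter('Hedgehogs are cute animals', 15); // Hedgehogs are...
```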

Edit 2012-03-13-1048: $out = strtr( $out, array( '  ' => ' ' ) ); is gone, replaced by $out = preg_replace('/\s+/', ' ',$out); as per #.

Edit 2012-03-13-1117 OR replace all the above with this compact version :)


I think the code can be improved using:

$out = preg_replace('/\s+/', ' ',$out);

instead of:

$out = strtr( $out, array( '  ' => ' ' ) );

as the first variant collapses any run of whitespace (including runs of more than two spaces, as well as tabs and newlines) in a single pass.
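A quick comparison of the two variants on a run of three spaces followed by two newlines shows the difference: strtr() replaces each non-overlapping pair of spaces once, so three spaces only shrink to two, and it leaves newlines alone.

```php
<?php
// Three spaces, then two newlines, between the phrases.
$in = "What is ProcessWire?   a nice list\n\nProcessWire";

// strtr(): the first pair of spaces becomes one space, the third space stays,
// and the newlines are untouched.
$viaStrtr = strtr($in, array('  ' => ' '));
// $viaStrtr is "What is ProcessWire?  a nice list\n\nProcessWire"

// \s+ matches any run of whitespace, so every run becomes one space.
$viaRegex = preg_replace('/\s+/', ' ', $in);
// $viaRegex is "What is ProcessWire? a nice list ProcessWire"
```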


Here's another take on generating the excerpt:

$words = 50;
$excerpt = str_replace('<p>',' ',$page->get("body"));
$excerpt = trim(strip_tags($excerpt));
// array_slice() keeps the first $words words (array_splice() would need a variable passed by reference)
$excerpt = implode(' ', array_slice(explode(' ', $excerpt), 0, $words)) . '…';
echo '<p>'.$excerpt.'</p>';

It gives a nice tidy '$words' length excerpt without html tags, and without requiring any regex unpleasantness!
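A possible refinement, sketched as a hypothetical wordExcerpt() helper: splitting on any whitespace run means double spaces and newlines left behind by strip_tags() aren't counted as empty "words", and the ellipsis is only added when something was actually cut.

```php
<?php
// Hypothetical helper: first $words words of $text, with $more appended
// only if the text was longer than that.
function wordExcerpt(string $text, int $words = 50, string $more = '…'): string
{
    // Split on runs of whitespace and drop empty pieces.
    $parts = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    if (count($parts) <= $words) {
        return implode(' ', $parts); // nothing to cut
    }
    return implode(' ', array_slice($parts, 0, $words)) . $more;
}

echo wordExcerpt("one two  three\nfour", 2); // one two…
```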


Thank you slkwrm and Dave P.

If it's of interest, this is my source text:

<h2>What is ProcessWire?</h2>

<ol><li>a nice list</li></ol>

<p>ProcessWire gives you full control over your fields, templates and markup. It provides a powerful template system that works the way you do. Not to mention, ProcessWire's API makes working with your content easy and enjoyable.</p>

With your code, DaveP (I removed the final <p> wrap as the content is going in a meta description tag), the output is

What is ProcessWire?a nice list ProcessWire gives you full...

which I assume is because your code only looks for one explicit tag, <p>. I was working on stripping HTML regardless of the tag, but thank you for this compact approach.

slkwrm, with your code in place, a double space that I hadn't noticed I was still getting is now removed, thanks. Before, with my code (^ marks a space):

What is ProcessWire?^^a nice list^^ProcessWire gives

now with your code

What is ProcessWire? a nice list ProcessWire gives

Delighted by the quality, speed and amount of help I've received here, thanks all.


str_replace can take an array, so just add any expected opening tags to the 'tags' array rather than just the <p>, thus:

$words = 50;
$tags = array('<p>','<h2>','<ol>');
$excerpt = str_replace($tags,' ',$page->get("body"));
$excerpt = trim(strip_tags($excerpt));
$excerpt = implode(' ', array_slice(explode(' ', $excerpt), 0, $words)) . '…';
echo '<p>'.$excerpt.'</p>';

The PW page I copied my original from was just working with text I had input, so I knew to only expect <p> tags :)
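If you'd rather not maintain a list of tags, one tag-agnostic alternative (a different technique from the tag array, sketched here under the assumption that the body is well-formed HTML where a literal "<" in text is written as &lt;) is to put a space in front of every "<" before calling strip_tags(), so a gap is left wherever any opening or closing tag used to be.

```php
<?php
// Source text from earlier in the thread, on one line.
$body = '<h2>What is ProcessWire?</h2><ol><li>a nice list</li></ol><p>ProcessWire gives you full control</p>';

// A space before every tag, then strip the tags themselves.
$text = strip_tags(str_replace('<', ' <', $body));
// Collapse the leftover whitespace runs and trim the ends.
$text = trim(preg_replace('/\s+/', ' ', $text));

echo $text; // What is ProcessWire? a nice list ProcessWire gives you full control
```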


  • 2 months later...

Update: The code I ended up using was:

$out = preg_replace('#<[^>]+>#', ' ', wordLimiter($page->get("body"), 160));
// Collapse every run of whitespace to a single space
$out = preg_replace('/\s+/', ' ',$out);
// No leading or trailing spaces.
$out = trim( $out );
echo $out;

...but I just noticed that if 'body' opens with an image tag with a long ALT description taking up a total of, say, 150 characters, then you end up with only 10 characters of text. So this is the improved version, which strips the tags first and limits the length afterwards:

$out = preg_replace('#<[^>]+>#', ' ', $page->get("body"));
$out = wordLimiter($out, 160);
// Collapse every run of whitespace to a single space
$out = preg_replace('/\s+/', ' ',$out);
// No leading or trailing spaces.
$out = trim( $out );
echo $out;

Perhaps it's a shame that the ALT text isn't part of the meta description, but hey, this is automated content creation. That's why, for the SEO-sensitive, you can (and I do) switch on the contents of a dedicated field: if it's populated, its content becomes that page's meta description, and if not, the auto code generates the meta description.
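Pulling the thread together, here is a sketch of a single hypothetical helper, metaDescription(): use the hand-written summary if there is one; otherwise strip tags first, collapse whitespace, and only then cut to the limit at a word boundary, so long markup such as an <img> with a big alt attribute can't eat the character budget. It also escapes the result with htmlspecialchars() and ENT_QUOTES, which is an addition on my part: an apostrophe in the text would otherwise end the single-quoted content='...' attribute early. Lengths are in bytes, as with wordLimiter().

```php
<?php
// Hypothetical helper: summary if populated, otherwise an auto-generated
// description from the body (tags stripped BEFORE the length limit).
function metaDescription(?string $summary, string $body, int $limit = 160): string
{
    if ($summary !== null && trim($summary) !== '') {
        $text = trim($summary); // author wrote one deliberately
    } else {
        $text = preg_replace('#<[^>]+>#', ' ', $body);   // tags -> spaces
        $text = trim(preg_replace('/\s+/', ' ', $text)); // collapse whitespace
        if (strlen($text) > $limit) {
            $cut = substr($text, 0, $limit);
            $pos = strrpos($cut, ' '); // last word boundary inside the limit
            $text = ($pos !== false && $pos > 0 ? substr($cut, 0, $pos) : $cut) . '...';
        }
    }
    // Escape for use inside an HTML attribute (ENT_QUOTES covers ').
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}
```

In a template it might be called as content='<?php echo metaDescription($page->get("summary"), (string) $page->get("body")); ?>'. If the summary field already stores HTML-encoded text, the final htmlspecialchars() call would double-encode it, so that step may need adjusting.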


  • 1 month later...

Truth is that if a page is worthy of a meta description, you should write one rather than regurgitate existing content. Auto-generating it is a waste of CPU and of your time, as the search engines are more than capable of determining what to show based on the user's search terms and the page content.

Think of the meta description as a way to speak to the searcher, since in effect that is what you are doing when your meta description is shown in the SERPs instead of text chosen by the engine's algorithm. Tell the user what they will find if they visit. You will be rewarded with increased conversions, and that will increase your overall ranking in the algorithm.


That's very true, but sometimes search engines don't pick the right content. I use this all the time when the meta description is empty, but then again, I always make sure the descriptions get filled in over time. When you launch a new site it can come in handy. You should know the rules before you break them ;)


Good points, @JeffS, and I agree with @arjen that it's handy to have this in place; at least I find it so. I continue to value the content of the meta description (based on what Google says, experience, etc.), and since descriptions get written and updated at different times, this auto bootstrap is useful for me.


Sure, here (Dutch) and here (Dutch). Sorry about the Dutch only, but Karel is a highly recognized SEO expert. I have also seen some blog posts by Matt Cutts announcing this back in 2009, but can't find the links. I didn't say the meta description isn't important, but the Google algorithm seems to be changing slowly. For visitors coming from Google, a highly attractive description does result in a better click-through rate. As Matt Cutts always says, you should write for visitors, not for Google. That's why an automated description is great as a fallback, but you shouldn't rely on it, lean back and forget about creating unique, attractive descriptions.


I don't follow Google's algorithms too closely, so someone correct me if I'm wrong on any of this, but I always enjoy talking about it. :) As far as I know, the words in meta descriptions aren't used at all for ranking purposes, and never have been. But they have a ton of value for user marketing purposes, since that text actually appears to the user at Google. So a finely crafted meta description can be what makes the difference in a user clicking on your site versus another in the search result pages at Google. The same goes for inclusion of keywords in path structure, which get bolded if they match the user's query, marketing to the user in yet another way (even if negligible for actual ranking). You can build a site that performs well with search engines and completely ignore the meta description. But you'll get more people clicking from the search results to your site if you use them well.

By that token, the meta description's value can probably be seen as very important from a traditional marketing perspective. But I'm not a copy writer and often don't know how to compel one with words in any special way. So if I (as the developer) don't have the responsibility of writing content, I might prefer to have Google auto-generate one for me rather than trying to auto-generate one at the site. (i.e. omitting the meta description or leaving it blank). Though if there is already a summary or worthwhile sub-head field included with the content, they might be perfect for the meta description. But I don't think there's any reason to be afraid of Google's auto-generated description, unless you really have good human-written content for it.

The <title> tags are even more valuable for user-marketing. But unlike meta descriptions, they have always carried a lot of weight for ranking. Though if my experience is correct, Google is being even more picky about what it considers a good title tag than it used to be. If Google thinks your <title> tag is trying to speak more to keywords than to the user, its value is reduced. This has always been the case, but I think it's become more so. What always seems to work well is a <title> that forms a focused and intelligible phrase (or short sentence) that reads well to the user and uses a target keyword (or keywords, if they read together) in a natural way.

These details are the things that we are responsible for and are able to focus on. But ultimately, even if you do everything right on your site, 80% of search performance still has to do with what's happening elsewhere and things that may be out of our hands. Most importantly: what quality sites are linking to you and why they are linking to you. But also: how good and unique the site is (relative to others) and how long it's been around… things that tend to correlate with link quality.


I always enjoy talking about it.

Me too.

But I'm not a copy writer and often don't know how to compel one with words in any special way.

This may be true from a marketing POV, but to me - and I guess others here - you are great with words on this forum.

But I don't think there's any reason to be afraid of Google's auto-generated description, unless you really have good human-written content for it.

This seems very logical, but sometimes Google indexes really weird stuff like breadcrumbs. I have to say this only happened to me with some e-commerce solutions like Magento and OpenCart, but since then I have never really relied on it. Also, I think a well-written text always contains an introduction, and I usually make sure that such an introduction goes into the meta description automatically, so that it can be edited later on.
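One way to pull that introduction out automatically, sketched as a hypothetical firstParagraph() helper: take the contents of the first <p> element in the body and fall back to the whole body if there isn't one. It is regex-based, so it assumes simple, well-formed markup rather than arbitrary HTML.

```php
<?php
// Hypothetical helper: text of the first <p>...</p>, tags replaced by
// spaces and whitespace collapsed. <p\b skips tags like <pre> and <param>.
function firstParagraph(string $html): string
{
    if (preg_match('#<p\b[^>]*>(.*?)</p>#is', $html, $m)) {
        $html = $m[1]; // inner HTML of the first paragraph
    }
    $text = preg_replace('#<[^>]+>#', ' ', $html);
    return trim(preg_replace('/\s+/', ' ', $text));
}

echo firstParagraph('<h2>Intro</h2><p>ProcessWire gives you <em>full</em> control.</p><p>More.</p>');
// ProcessWire gives you full control.
```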

