Automatic meta description

seo

23 replies to this topic

#1 alanfluff

alanfluff

    Sr. Member

  • Members
  • PipPipPipPip
  • 439 posts
  • 128

  • LocationOttawa, Canada

Posted 12 March 2012 - 02:43 PM

Hi,

I've put together a small piece of code for use in the HTML HEAD tag to automate the creation of the content for a META DESCRIPTION tag.

I would be interested if anyone can see a neater way to do it or solve my regex TODO. And otherwise I'm posting just in case this is helpful for anyone else.


<meta name="description" content='<?php

// Check if there is text in the summary field, if so output it as the author
// has deliberately written a meta description. Otherwise grab the first N
// (e.g. 160) characters of the body field, strip the HTML tags, replace them
// with a space and then output it as a poor man's meta description.
//
// TODO this regex replaces opening and closing HTML tags and so
// <h2>What are hedgehogs?</h2><p>Hedgehogs are
// is cleaned up like this
// _What are hedgehogs?__Hedgehogs are
// when the ideal would be
// What are hedgehogs?_Hedgehogs are

$summary = $page->get("summary");
if($summary) echo $summary; else echo preg_replace('/<[^>]*>/', ' ', substr($page->get("body"), 0, 160));

?>' />


#2 adamkiss

adamkiss

    Master of the universe

  • Moderators
  • 1,086 posts
  • 291

Posted 12 March 2012 - 03:00 PM

What is wrong with strip_tags?

  $strippedText = strip_tags($unstrippedText);

Regarding length, you could use one of the word/sentence cutting functions. I have no time to search for one right now, but I remember seeing one that took a string and the length you wanted and gave you a snippet that ended with a full sentence (if the difference between max_length, the length you asked for, and the actual length was lower than 5%) or otherwise with whole words.
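
A minimal sketch of the kind of function described above, not the one adamkiss remembers seeing. The name sentenceOrWordCut, the $tolerance parameter and the choice of '.', '!' and '?' as sentence endings are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: cut $text to at most $maxLength characters.
// End on a full sentence if that sentence ends within $tolerance
// (a fraction of $maxLength) of the limit; otherwise end on a whole word.
function sentenceOrWordCut($text, $maxLength, $tolerance = 0.05) {
    if (strlen($text) <= $maxLength) return $text;

    $cut = substr($text, 0, $maxLength);

    // Find the last sentence-ending mark inside the window.
    $end = -1;
    foreach (array('.', '!', '?') as $mark) {
        $pos = strrpos($cut, $mark);
        if ($pos !== false && $pos > $end) $end = $pos;
    }

    // Accept the sentence cut only if it loses few enough characters.
    if ($end >= 0 && ($maxLength - ($end + 1)) <= $maxLength * $tolerance) {
        return substr($cut, 0, $end + 1);
    }

    // Fall back to the last whole word.
    $space = strrpos($cut, ' ');
    return $space > 0 ? substr($cut, 0, $space) : $cut;
}

$text = 'Hedgehogs are cute. They sleep all day.';
echo sentenceOrWordCut($text, 26), "\n";      // word cut
echo sentenceOrWordCut($text, 26, 0.3), "\n"; // sentence cut
```

This is byte-based and treats every full stop as a sentence end, so abbreviations and multibyte text would need more care.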

#3 alanfluff

Posted 12 March 2012 - 03:10 PM

adamkiss there's probably nothing wrong with strip_tags, thank you for steering a PHP 7-stone-weakling in a better direction ;)

I'll go search as suggested!

#4 Soma

Soma

    Hero Member

  • Moderators
  • 3,408 posts
  • 1941

  • LocationSH, Switzerland

Posted 12 March 2012 - 03:20 PM

What adam said.


	public function wordLimiter($str = '', $limit = 120, $endstr = '...'){

		if($str == '') return '';

		if(strlen($str) <= $limit) return $str;

		$out = substr($str, 0, $limit);
		$pos = strrpos($out, " ");
		if ($pos>0) {
			$out = substr($out, 0, $pos);
		}
		$out .= $endstr;
		return $out;

	}

Cuts between words.

@somartist | modules created | support me, flattr my work flattr.com


#5 alanfluff

Posted 12 March 2012 - 03:26 PM

adamkiss OK I found/remembered why I didn't use strip_tags. If the BODY that is sampled is:

<h2>My hedgehogs</h2>
<p>Hedgehogs are cute.</p>

then strip_tags produces


My hedgehogsHedgehogs are cute.

I.e. it runs two words together.

But thanks for the suggestions, and I'm going to keep looking for that PHP function you mentioned that would cut to a sentence, v. useful.
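
One possible way to keep strip_tags and still avoid the run-together words, sketched here under the assumption that an extra space in front of every tag is harmless. The helper name stripTagsSpaced is hypothetical and not from the thread.

```php
<?php
// Hypothetical helper: put a space in front of every tag before stripping,
// so text from adjacent elements does not run together.
function stripTagsSpaced($html) {
    $spaced = str_replace('<', ' <', $html);
    $text = strip_tags($spaced);
    // Collapse the whitespace runs this introduces and trim both ends.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$body = '<h2>My hedgehogs</h2><p>Hedgehogs are cute.</p>';
echo strip_tags($body), "\n";      // words run together
echo stripTagsSpaced($body), "\n"; // words kept apart
```

The cost is that inline tags such as <em> also gain a space, which can put a stray space before punctuation.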

#6 netcarver

netcarver

    Sr. Member

  • Members
  • PipPipPipPip
  • 428 posts
  • 342

  • LocationUK

Posted 12 March 2012 - 03:36 PM

Hi Alan,

Haven't we met before somewhere? Anyway, you are almost there with your code. Welcome to the world of regular expressions. Now, to go from...

// _What are hedgehogs?__Hedgehogs are

...to...

// What are hedgehogs?_Hedgehogs are

You need to do a couple of extra steps if you want the PHP workout and can't find anything else. If you store what you want to echo in $out then do this (I'm showing it in two steps but you can combine if you want)...

$out = strtr( $out, array( '  ' => ' ' ) ); // All double space runs to single space runs
$out = trim( $out ); // No leading or trailing spaces.
echo $out;

You should get your target result.
Steve ☧

#7 Soma

Posted 12 March 2012 - 03:36 PM

This is a method to strip html tags:

preg_replace('#<[^>]+>#', ' ', $page->body)


If you're picky about getting 2 spaces down to 1, you could do this instead:

trim(str_replace("  "," ",preg_replace('#<[^>]+>#', ' ', $page->body)))

But in HTML, runs of more than one space collapse anyway (except &nbsp;), so I think it doesn't matter much.



#8 alanfluff

Posted 12 March 2012 - 03:41 PM

Hi netcarver, why yes, your name is familiar :) Nice to see some TXP people here.

Thank you for the PHP loveliness, I'll go adjust!

Soma, thanks for the function: reusable, and the gentler ending with the ellipsis is nicer. Thanks for the regex too!

#9 alanfluff

Posted 12 March 2012 - 03:53 PM

Thanks everyone for your most helpful comments, now producing perfect output. I'll post back here shortly with what I ended up with (for my reference and to help anyone else as PHP-disabled as I am). Cheers!, -Alan

#10 alanfluff

Posted 12 March 2012 - 05:37 PM

OK, so this is what I ended up with thanks to the kind helping hands of others.

<meta name="description" content='<?php
// Check if there is text in a field called 'summary' and if there is output it
// as the author has deliberately written a meta description. Otherwise grab
// the first N (e.g. 160) characters from a field called 'body', strip the HTML
// tags, replace them with a space and then collapse double spaces to single
// and strip leading/trailing spaces to produce a poor man's meta description.
$summary = $page->get("summary");
if($summary) echo $summary; else {
  $out = preg_replace('#<[^>]+>#', ' ', wordLimiter($page->get("body"), 160));
  // Collapse every run of whitespace to a single space
  $out = preg_replace('/\s+/', ' ',$out);
  // No leading or trailing spaces.
  $out = trim( $out );
  echo $out;
}
?>' />

wordLimiter is as per Soma's post except that I removed 'public'. I am including the function inline at the top of this .inc file and, with 'public' left in, I got an error; I don't know why (due to my PHP feebleness). Thanks again everyone for all your comments. Cheers, -Alan
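
On the error: in PHP, 'public' is a visibility modifier that is only valid on members of a class, so a function declared at file level has to drop it. The sketch below shows both forms side by side. The class name TextTools is hypothetical, and the guess that Soma's snippet was copied out of a class is an assumption.

```php
<?php
// File-level function: no visibility keyword is allowed here.
function wordLimiter($str = '', $limit = 120, $endstr = '...') {
    if ($str == '') return '';
    if (strlen($str) <= $limit) return $str;

    $out = substr($str, 0, $limit);
    $pos = strrpos($out, ' ');
    if ($pos > 0) {
        $out = substr($out, 0, $pos); // back up to the last whole word
    }
    return $out . $endstr;
}

// Inside a class, 'public' is valid. TextTools is a hypothetical name.
class TextTools {
    public function wordLimiter($str = '', $limit = 120, $endstr = '...') {
        // Delegates to the file-level function defined above.
        return wordLimiter($str, $limit, $endstr);
    }
}

$tools = new TextTools();
echo $tools->wordLimiter('Hedgehogs are cute and spiky animals', 20), "\n";
```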

Edit 2012-03-13-1048: removed $out = strtr( $out, array( '  ' => ' ' ) ); and replaced it with $out = preg_replace('/\s+/', ' ',$out); as per #.

Edit 2012-03-13-1117 OR replace all the above with this compact version :)

Edited by alan, 13 March 2012 - 10:18 AM.


#11 slkwrm

slkwrm

    Sr. Member

  • Members
  • PipPipPipPip
  • 279 posts
  • 101

Posted 13 March 2012 - 12:15 AM

I think the code can be improved using:
$out = preg_replace('/\s+/', ' ',$out);

instead of:

$out = strtr( $out, array( '  ' => ' ' ) );

as the first variant collapses runs of any length (three or more spaces too), along with tabs and newlines, since \s matches any whitespace character.
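
A small illustration of the difference, using a made-up sample string. str_replace is used for the single-pass replacement here; strtr with a one-pair array behaves the same way on this input.

```php
<?php
// Sample input (an assumption): three spaces, then a newline and a tab.
$text = "What is ProcessWire?   a nice list\n\tProcessWire gives";

// One pass only: the run of three spaces becomes two, and the
// newline and tab are left alone.
$viaStrReplace = str_replace('  ', ' ', $text);

// Every run of whitespace, whatever its length or type, becomes one space.
$viaRegex = preg_replace('/\s+/', ' ', $text);

var_dump($viaStrReplace);
var_dump($viaRegex);
```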

#12 DaveP

DaveP

    Sr. Member

  • Members
  • PipPipPipPip
  • 302 posts
  • 156

  • LocationChorley, UK

Posted 13 March 2012 - 04:14 AM

Here's another take on generating the excerpt-

$words = 50;
$excerpt = str_replace('<p>',' ',$page->get("body"));
$excerpt = trim(strip_tags($excerpt));
// array_slice rather than array_splice: array_splice needs a variable passed by reference
$excerpt = implode(' ', array_slice(explode(' ', $excerpt), 0, $words)) . '&hellip;';
echo '<p>'.$excerpt.'</p>';

It gives a nice tidy '$words' length excerpt without html tags, and without requiring any regex unpleasantness!
Twitter : Facebook : GitHub : G+ : Blog : Powered by C8H10N4O2 and C10H14N2

#13 alanfluff

Posted 13 March 2012 - 09:43 AM

Thank you slkwrm and Dave P.

If it's of interest, this is my source text:

<h2>What is ProcessWire?</h2>
<ol><li>a nice list</li></ol>
<p>ProcessWire gives you full control over your fields, templates and markup. It provides a powerful template system that works the way you do. Not to mention, ProcessWire's API makes working with your content easy and enjoyable.</p>

With your code, DaveP (I removed the final <p> wrapper as the content is going into a meta description tag), the output is
What is ProcessWire?a nice list ProcessWire gives you full...
which I assume is because your code searches for one explicit tag, <p>. I was working on stripping HTML regardless of tag, but thank you for this compact approach.

slkwrm, with your code in place a double space that I was still getting (and hadn't noticed) is now removed, thanks. Before, with my code:
What is ProcessWire?^^a nice list^^ProcessWire gives
now with your code
What is ProcessWire? a nice list ProcessWire gives

Delighted by the quality, speed and amount of help I've received here, thanks all.

#14 DaveP

Posted 13 March 2012 - 09:51 AM

str_replace can take an array, so just add any expected opening tags to the 'tags' array, rather than just the <p>, thus

$words = 50;
$tags = array('<p>','<h2>','<ol>');
$excerpt = str_replace($tags,' ',$page->get("body"));
$excerpt = trim(strip_tags($excerpt));
// array_slice rather than array_splice: array_splice needs a variable passed by reference
$excerpt = implode(' ', array_slice(explode(' ', $excerpt), 0, $words)) . '&hellip;';
echo '<p>'.$excerpt.'</p>';

The PW page I copied my original from was just working with text I had input, so I knew to only expect <p> tags :)

#15 alanfluff

Posted 13 March 2012 - 10:15 AM

Oo THANKS for pointing out that it's an array DaveP, this is neat :D

#16 alanfluff

Posted 23 May 2012 - 03:55 PM

Update: The code I ended up using was:
$out = preg_replace('#<[^>]+>#', ' ', wordLimiter($page->get("body"), 160));
// Collapse every run of whitespace to a single space
$out = preg_replace('/\s+/', ' ',$out);
// No leading or trailing spaces.
$out = trim( $out );
echo $out;
...but I just noticed that if 'body' opens with an image tag with a large ALT description taking up a total of, say, 150 characters then you end up with only 10 characters. So this is the improved version:
$out = preg_replace('#<[^>]+>#', ' ', $page->get("body"));
$out = wordLimiter($out, 160);
// Collapse every run of whitespace to a single space
$out = preg_replace('/\s+/', ' ',$out);
// No leading or trailing spaces.
$out = trim( $out );
echo $out;
Perhaps it's a shame the ALT text is not part of the meta description, but hey, this is automated content creation. That is why, for the SEO-sensitive, you can (I do) switch on the contents of a dedicated field: if populated, its content becomes that page's meta description, and if not, the auto code makes the meta description.
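
The whole flow (summary if present, otherwise a cleaned and shortened body) could be wrapped up roughly as below. This is a sketch, not the code used on the site: the function name metaDescription is hypothetical, whitespace is collapsed before the cut rather than after it, and the htmlspecialchars call is an addition on the assumption that the text goes inside a quoted content attribute, where a stray quote character would otherwise end the attribute early.

```php
<?php
// Hypothetical helper: returns text that is safe to echo inside
// <meta name="description" content='...'>.
function metaDescription($summary, $body, $limit = 160) {
    if ($summary) {
        $text = $summary;
    } else {
        // Strip tags first, so long attributes (e.g. alt text) do not
        // use up the character budget.
        $text = preg_replace('#<[^>]+>#', ' ', $body);
        // Collapse whitespace runs and trim before measuring the length.
        $text = trim(preg_replace('/\s+/', ' ', $text));
        if (strlen($text) > $limit) {
            $cut = substr($text, 0, $limit);
            $pos = strrpos($cut, ' ');
            if ($pos > 0) $cut = substr($cut, 0, $pos); // end on a whole word
            $text = $cut . '...';
        }
    }
    // ENT_QUOTES escapes single and double quotes as well as <, > and &.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$body = '<img alt="A very long description" src="x.png"><p>Hedgehogs are cute.</p>';
echo metaDescription('', $body), "\n";
echo metaDescription("Alan's hedgehogs", $body), "\n";
```

In a template this would be called with the page's own fields, for example metaDescription($page->get("summary"), $page->get("body")).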

#17 JeffS

JeffS

    Distinguished Member

  • Members
  • PipPipPip
  • 76 posts
  • 62

  • LocationMichigan - USA

Posted 03 July 2012 - 07:52 AM

Truth is that if a page is worthy of a meta description you should write one, not regurgitate existing content. It is a waste of CPU and your time, as the search engines are more than capable of determining what to show based on the user's search terms and the page content.

Think of the meta description as a way for you to speak to the searcher, since in effect you are when the meta description is shown in the SERPs rather than a snippet chosen by the engine's algorithm. Tell the user what they will find if they visit. You will be rewarded with increased conversions and that will increase your overall ranking in the algorithm.

#18 arjen

arjen

    Sr. Member

  • Members
  • PipPipPipPip
  • 366 posts
  • 146

  • LocationHoogeveen, The Netherlands

Posted 03 July 2012 - 11:33 AM

That's very true, but sometimes search engines don't pick the right content. I use this all the time as a fallback when the meta description is empty. But, then again, I always make sure they are filled in over time. When you launch a new site it can come in handy. You should know the rules before you break them ;)
work will always be the curse of the drinking classes...

#19 alanfluff

Posted 03 July 2012 - 11:44 AM

Good points @JeffS, and I agree with @arjen that it's handy to have this in place. At least I find it so: I continue to value the content of the meta description (based on Google, experience, etc.), and since descriptions get updated at different times, this auto bootstrap is useful for me.

#20 arjen

Posted 03 July 2012 - 03:17 PM

According to the latest SEO standards, Google is giving less and less weight to the title and meta description. But that's another discussion ;)




