Capture user search queries


Jennifer S

Hello. I've been hanging out in Google Analytics a bit these days enjoying the real-time activity view of my site's traffic. It has prompted me to wonder if there is a way to record the terms people search for using the built-in PW search engine to a log file of some sort on the server so I can review it from time to time to get a sense of user needs.


Hi Jennifer,

I don't think anything exists right now, although I have a vague recollection of something along these lines being mentioned somewhere - maybe not?

Anyway, I think this could be a useful module. It could be a log file like typical Apache etc. log files, but then you'd need a way of analyzing them, and also archiving old ones etc. Maybe a custom database table would be a better approach. I think to get the most out of it, we'd probably want to build a dashboard in the PW admin that would allow you to visualize the search terms - maybe something as simple as using a tag cloud generation algorithm (I have used this code in the past: http://chir.ag/projects/tagline/), or maybe something more sophisticated/customizable.

I'd actually be interested in tackling this module. It shouldn't be very difficult, but I probably won't have the time for a little while at least. I can definitely see it as a useful addition to a big long-term project I have on the go right now.


Just to add a bit of detail to Soma's answer -

  • Don't forget the semi-colon  :P 
    $log->save('querylog', $q);
    
    (I did at first.)
  • There's no need to add a file extension, as '.txt' is added automatically. (I did at first.)
  • The file 'querylog.txt' is saved in '/site/assets/logs/', and each entry looks like so: '2014-04-02 09:39:24 username http://example.com/search/ searchword'.
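
For reviewing the log from time to time, a short script can tally the terms. This is only a sketch: it assumes every line has the shape shown above (date, time, user, URL, then the search term, separated by whitespace), and the function name tallySearchTerms is made up for illustration. If your log is laid out differently, the split needs adjusting.

```php
<?php
// Sketch only: assumes lines look like
// "2014-04-02 09:39:24 username http://example.com/search/ searchword".
// tallySearchTerms() is a hypothetical helper, not part of ProcessWire.
function tallySearchTerms(array $lines): array
{
    $counts = [];
    foreach ($lines as $line) {
        // Split into date, time, user, URL and "everything else" (the term).
        $parts = preg_split('/\s+/', trim($line), 5);
        if (count($parts) < 5) {
            continue; // skip blank or malformed lines
        }
        $term = strtolower($parts[4]); // consider mb_strtolower() for non-ASCII terms
        $counts[$term] = ($counts[$term] ?? 0) + 1;
    }
    arsort($counts); // most frequent terms first
    return $counts;
}

// Usage, assuming the script is allowed to read the log file
// (adjust the path to your own install):
// $lines = file('/site/assets/logs/querylog.txt', FILE_IGNORE_NEW_LINES);
// print_r(tallySearchTerms($lines));
```

Lower-casing the term means 'Daiwa' and 'daiwa' are counted together, which is usually what you want when looking for popular searches.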

If you wanted to take this further...

Set up a template called searched_words (or whatever)

Add an additional field called count (number field)

template: searched_words

fields: title, count

In _init.php (enable this in config.php), write a few lines of code to grab the search keywords if they exist.

//do something if we have q in the url ($q=searchstring)

//check to see if the searchstring is already in the database

//if it is just increment the count field

//if it's new add a new page and set the count to one
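
In case it helps, enabling _init.php is normally a single line in /site/config.php. A minimal sketch, assuming you use ProcessWire's prependTemplateFile setting and keep _init.php in /site/templates/:

```php
// /site/config.php (excerpt)
// Tells ProcessWire to include /site/templates/_init.php before each template file.
$config->prependTemplateFile = '_init.php';
```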

The advantage here is being able to pull out all of your searched words and phrases ordering by count. You'll want to know which words are more heavily searched, because it means your users aren't finding what they are looking for. I used this technique when I was a CTO for an online retailer. It helped us understand which products we needed to carry, the language our users preferred (e.g. sneakers or shoes), and it illuminated our blind spots (e.g. if a lot of people were searching for privacy policy, we knew we either needed to add a privacy policy or make it easier for our users to find if it already existed). 


Yes, this sounds very useful. I don't see the code above but I understand what your intention is. That's exactly the kind of feedback I hope to gather.

I will need a database solution because my site does not have write access to the log directory in the production environment.


Just to add a little to sshaw's comments, logging search queries in that kind of way can also help identify searchers' commonly misspelt words. I used to work for an online sports retailer and one manufacturer (Daiwa) was commonly misspelt as 'Diawa'. It makes no business sense to return no search results on a simple misspelt query such as this.

A simple preg_replace() can fix obvious user errors like this, but a more elegant solution is to search your stored search history with, for example, MySQL's SOUNDS LIKE operator, which can quite easily be used to generate suggestions like Google does ('Did you mean "Daiwa"?').
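
As a rough illustration of the 'Did you mean' idea in plain PHP rather than SQL, the sketch below uses PHP's built-in soundex() and levenshtein() to pick the closest known term. The function name suggestTerm and the sample term list are assumptions for the example; in practice the list would come from your stored search history or product names.

```php
<?php
// Sketch only: suggestTerm() is a hypothetical helper.
// Returns the known term that sounds like $query and is closest in spelling,
// or null when $query is already a known term or nothing sounds similar.
function suggestTerm(string $query, array $knownTerms): ?string
{
    $best = null;
    $bestDistance = PHP_INT_MAX;
    foreach ($knownTerms as $term) {
        if (strcasecmp($term, $query) === 0) {
            return null; // exact match, no suggestion needed
        }
        if (soundex($term) !== soundex($query)) {
            continue; // does not sound alike
        }
        $distance = levenshtein(strtolower($query), strtolower($term));
        if ($distance < $bestDistance) {
            $best = $term;
            $bestDistance = $distance;
        }
    }
    return $best;
}

$suggestion = suggestTerm('Diawa', ['Shimano', 'Daiwa']);
if ($suggestion !== null) {
    echo 'Did you mean "' . htmlspecialchars($suggestion) . '"?';
}
```

Soundex is tuned for English pronunciation, so treat it as a starting point rather than a complete spelling corrector.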


"I don't see the code above but I understand what your intention is."

Truth be told, I didn't fill in the code, because I'm not an expert with PW yet and have to look everything up in the API and cheat sheet. Here's an attempt at the code; however, it's untested and probably won't work as is. It should get you started though. It won't take long for other users to correct any of my mistakes or provide additional help.

// Assuming the template is called keyword and the number field is called counter...

// do something if we have q in the URL (q=searchstring)
if(!empty($input->get->q)) {
  $keyword = $pages->get('template=keyword, name=' . $input->get->q);
  if($keyword->id) { // $pages->get() returns a NullPage (id 0) when nothing matches
    // update the count
    $keyword->of(false); // turn off output formatting before changing values
    $keyword->counter = $keyword->counter + 1;
    $keyword->save();
  } else {
    // create a new page
    $p = new Page(); // create new page object
    $p->template = 'keyword'; // set template
    $p->parent = $pages->get('/keywords/'); // set the parent, you can also use ID
    $p->name = $input->get->q; // give it a name used in the url for the page
    $p->title = $input->get->q; // use the search term as the page title
    $p->counter = 1;
    $p->save();
  }
}

Saving pages is discussed in more detail here: https://processwire.com/talk/topic/352-creating-pages-via-api/

"logging search queries in that kind of way can also help identify searchers' commonly misspelt words. I used to work for an online sports retailer and one manufacturer (Daiwa) was commonly misspelt as 'Diawa'."

Dave has an excellent point, and doing this will increase sales. We used SOUNDEX, I think; SOUNDS LIKE wasn't available at the time.

It's also worth mentioning that sometimes your pay-per-click marketing dollar will go further when bidding on commonly misspelled keywords, because your competitors haven't thought to do this (if they had, the cost per click would be high). There's no point in picking just any misspelled word. You want to make sure a lot of people are searching with the misspelling and that the cost per click is low.

We used to even bid on our competitors' business names so that when their loyal customers would search for them our site would be right below them in the sponsored results.

There's also a difference between "business name" and "businessname.com", which people commonly type into the search engines instead of the address bar. We used to bid on our business name and domain name for this reason. We made sure we were in the sponsored results and first several results of the organic search results.


"We used to even bid on our competitors' business names so that when their loyal customers would search for them our site would be right below them in the sponsored results."

Annoyed the heck out of us when people did that!  >:D


Thanks for the attempt, sshaw... The most important thing missing is that you are saving user input without first sanitizing it. Never trust user input, so we must sanitize it before saving it :-)

More here to help you choose an appropriate sanitizer:

http://processwire.com/api/variables/sanitizer/

Thanks Kongondo!

$input doesn't run the sanitizer automatically? I don't understand the purpose of using $input->get over $_GET if the sanitizer isn't run automatically. Does anyone know?


"Annoyed the heck out of us when people did that!  >:D"

Yep, I totally get that. When our competitors would bid on our business name (or leech off of us in some fashion) we tried to hold a perspective of abundance (there's enough here for us all). It felt good that they recognized our presence in the market and it meant that we were doing things well.

We felt there was a benefit when there was enough room in the market and both businesses were growing but were not direct competitors yet. It's similar to reality show contestants who team up when there's a competitive advantage; then, as the weaker competitors are weeded out and the competition goes head to head, the alliances are broken. We would have been happy to see our competitors following suit (many of them did). We even let smaller companies blatantly steal our product images; the only time we cared was when they watermarked them with their own copyright message.

This type of competition is also better for the consumer, because businesses have to be better at what they do or their consumers will go to someone who is (it's always about providing the best value to the consumer).


"Thanks Kongondo! $input doesn't run the sanitizer automatically? I don't understand the purpose of using $input->get over $_GET if the sanitizer isn't run automatically. Does anyone know?"

It's all here, nice and clear as daylight: http://processwire.com/api/variables/input/    :-)

But I'll post it here anyway...(emphasis added)

THE $INPUT VARIABLE IS YOUR CONNECTION TO GET, POST AND COOKIE VARIABLES, URL SEGMENTS, AND PAGE NUMBERS. 
 

It provides this via $input->get, $input->post, $input->cookie, $input->urlSegment(n) and $input->pageNum. While you could also use PHP's $_GET, $_POST, and $_COOKIE (superglobal) variables, $input provides these benefits:

  • You don't have to worry about PHP's magic_quotes, as values provided by $input are never escaped.
  • $input returns NULL if you access a variable that doesn't exist (no need to use isset() like with PHP's superglobals)
  • $input->whitelist provides a place for you to store input variables you have sanitized and want to share with ProcessWire or plugin modules. 
  • Variables can be accessed in either object, array, or function fashion, according to your preference and need

Please note that the values returned by $input->get, $input->post, and $input->cookie are just as dangerous as those returned by $_GET, $_POST and $_COOKIE. They do not have any sanitization or validation done to them. You should never use the data without making sure that it's valid. If you echo any keys or values from $input, you always want to encode them with PHP's htmlentities() or htmlspecialchars() at minimum. ProcessWire provides some built-in sanitization and validation functions which you can access with $sanitizer
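
To make that last point concrete, here is a minimal plain-PHP sketch of encoding a search term before echoing it. renderNoResults is a made-up helper name, and the literal string stands in for a value that would really come from $input->get->q.

```php
<?php
// Sketch only: renderNoResults() is a hypothetical helper.
function renderNoResults(string $query): string
{
    // Encode &, <, > and both quote types so the term cannot inject markup.
    $safe = htmlspecialchars($query, ENT_QUOTES, 'UTF-8');
    return "<p>No results for <strong>$safe</strong></p>";
}

echo renderNoResults('<script>alert(1)</script>');
```

Encoding for output is a separate step from sanitizing a value for use in a selector or before saving it, so a search template typically needs both.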


  • 1 month later...

nice consensus folks haha..just had the same idea a couple of days ago and hacked something together

it could get some improvements but it's working quite nicely as it is

nobody has to enhance this one for me..only if for fun or own use ;)

just wanted to share my implementation of this idea

ah maybe nicer to mention what's going on in short

it's not only about storing search terms but giving the possibility to provide alternative spellings as well (not automated though)

so my search.php is checking if the search term matches an alternative spelling, if yes it's creating the needed selector, if no it creates the "normal" selector (now I'm thinking it could be better to swap those, so every search is a default one first and if this one doesn't match it tries to get an alternative spelling, maybe it's a little faster?!)

then it's spitting out the results

if the search term is not yet present as a page..it's creating one with a little counter, which of course gets incremented when already present

(I'm outputting those in the admin dashboard and thinking about a little button and pageselect next to each misspelled word which would save it as an alternative spelling)

the alternative spellings are just a title and a body and you need to add words manually right now

if no results or results < 3 (could merge them) or no search term at all it's appending a little sitemap as alternative

and at the end the search form (a little one is already in the head)

if($q = $sanitizer->selectorValue($input->get->q)) {
	
	$input->whitelist('q', $q);
	
	$alternative = $pages->get("parent=5756, search_keywords%=\"$q\"")->title;
	
	if($alternative) { // $alternative is a string (or empty), so count() is the wrong test here
		$countSelect = "title|body%=$alternative, id!=27|5724|5756|5760, template!=search-keywords|search-alternatives";
		$resultsSelect = "title|body%=$alternative, id!=27|5724|5756|5760, template!=search-keywords|search-alternatives, limit=50";
	}
	else {
		$countSelect = "title|body%=$q, id!=27|5724|5756|5760, template!=search-keywords|search-alternatives";
		$resultsSelect = "title|body%=$q, id!=27|5724|5756|5760, template!=search-keywords|search-alternatives, limit=50";
	}
	
	$count = $pages->count($countSelect); // count the matches without loading every page
	$results = $pages->find("$resultsSelect");
	
	if($alternative) {
		$content .= "<p>No results for <strong>\"$q\"</strong><br>
					Instead we found $count matches for <strong>\"$alternative\"</strong>.</p>";
	}
	elseif($count === 0) {
		$content .= "<p>No results for \"$q\"</p>";	
	}
	else {
		$content .= "<p>$count matches for \"$q\"</p>";	
	}
	
	if($count) {
		$content .= "<ul>";
		foreach($results as $r) $content .= "<li><a href='$r->url'><strong>$r->title - $r->headline</strong></a><p>" . wordLimiter($r->body,20,'...') . "</p></li>";
		$content .= "</ul>";
		$content .= "<div class='row'>" . $results->renderPager(array(
		    'nextItemLabel' => "Next",
		    'previousItemLabel' => "Prev",
		    'listMarkup' => "<ul class='pagination six centered text-centered'>{out}</ul>",
		    'itemMarkup' => "<li class='{class}'>{out}</li>",
		    'linkMarkup' => "<a href='{url}'>{out}</a>",
			'currentItemClass' => "active",
			'currentLinkMarkup' => "<span>{out}</span>"
		)) . "</div>";
	}
	
	// Save Keyword as Page or Increase when already exists
	$keyword = $pages->get("template=search-keywords, title=$q");
	
	if(!$keyword->id) {
		$p = new Page();
		$p->parent = $pages->get(5760);	// /suche/liste-der-suchbegriffe/
		$p->template = 'search-keywords';
		$p->title = $q;
		$p->name = $q;
		$p->search_keyword_count = 1; // first time this term was searched
		$p->save();
	}
	else {
		$keyword->of(false);
		$keyword->search_keyword_count = $keyword->search_keyword_count + 1;
		$keyword->save(); 
	}
	
}
if(empty($count) || $count < 3) { // no search term, no results, or fewer than 3 results
	$content .= "<p>Sitemap</p>";
	$content .= "<ul>";
	// $item rather than $page, so the current $page API variable is not overwritten
	foreach($pages->find("template!=admin, has_parent!=2, id!=27|5724, include=all") as $item) {
		$content .= "<li><a href='$item->url'><strong>$item->title - $item->headline</strong></a><br>". wordLimiter($item->body,20,'...') ."</li>";
	}
	$content .= "</ul>";
	$content .= "<p>Or try a search.</p>";
}	
$content .= "<div class='row'><div class='seven columns centered'>" . renderSearchForm() . "</div></div>";

hardcoded the messages in German, so it's just a quick translation to make clear what it's saying ;)

cheers

PS: what about scrollable code blocks in here?

Update:

Simplified the selector

<?php
$templates = "template=home|basic-page|course|courses|workshop|workshops|album|gallery|link-category|links|teacher|teachers";
$limit = "limit=15";

$alternative = $pages->get("parent=5756, search_keywords%=\"$q\"")->title;

if($alternative) $selector = "title|headline|body%=$alternative, id!=27|5724|5756|5760, $templates, $limit";
else $selector = "title|headline|body%=$q, id!=27|5724|5756|5760, $templates, $limit";

$results = $pages->find("$selector");
$count = $results->getTotal();

Only one $pages->find now, and the count comes from getTotal().

Tried to combine the $selector but "title|headline|body%=$alternative|$q" wasn't working, got wrong results

Could at least combine the exclusion..but I'm busy with another project right now :)

Edited by Can