Jump to content

Cross reference search results


MuchDev
 Share

Recommended Posts

I have been working on this logic off and on lately and have made some progress. The original question has been answered/fixed. Now I am on to making my search results make more sense. 

See from post #11 for new content.

----------------------------------------------------------------------

I fear I am a bit out of my depth on this one. I have been trying to get a site search up on an art gallery site that I am developing. I have created a template that has all of the fields in all of the templates on my site that I want to search against. From this list I am wanting to generate my search query. When I try and run a fulltext query based on these templates I get a big error. 

Error: Exception: SQLSTATE[HY000]: General error: 1191 Can't find FULLTEXT index matching the column list

//use field template to create list of fields
	
foreach ($pages->get("/search-engine-fields/")->fields as $field){
	$allFields .= $field.'|';
}
	
$matches = $pages->find("{$allFields}body~=$q,limit=50,template!=admin,template!=permission,template!=user"); 

 I have attempted using %= but then I am unable to locate items which have segments that contain the terms. Is there another method that I should employ?

Link to comment
Share on other sites

I can't really help you with your error, but maybe use this to exclude all the admin stuff.

has_parent!=2

Also you could use just a template to get all fields, without an extra page.

$search_here = $templates->get("name");

foreach($search_here->fields as $field){
  …
}
  • Like 1
Link to comment
Share on other sites

Hey thanks, that is much cleaner I didn't thing about being able to fetch the template fields without using a page :)

	foreach ($templates->get("fields")->fields as $field){
		$allFields .= $field.'|';
	}
	
	$matches = $pages->find("{$allFields}body%=$q,limit=50,has_parent!=2,template!=404"); 

Search is still broken though when I try and use a ~= selector though. Also this query will not find results that contain text on multiple fields.

Link to comment
Share on other sites

Well after hours of reading I think that I may just not be able to query this data properly using only the built in mysql search features. For this reason I have gone and installed ElasticSearch on my server. Now just to figure out how to get it talking to processwire. So far I have it installed and running and returning its default record.  It seems like the module is working but then it doesnt index anything.

Link to comment
Share on other sites

Not really sure if this helps or not, but it seems like you have a field that doesn't have a fulltext index defined (stating the obvious I know), but if you can separate your search into two parts - one for fields that are working using ~= or *= and then problem field using %= and then appending the results of one search to the other, you might be ok.

I think to start it would be good to figure out which is the problem field. Can you try listing the fields manually and work that out?

  • Like 1
Link to comment
Share on other sites

This is a good point, after reviewing all of the fields that I have added there are some non text fields ie:datetime, repeater, and page. I am a bit green when it comes to database work, so I misunderstood what the documentation was refering to about what a full text search is. Is it possible to run a text search on repeater and datetime fields, or will this have to be done via another method?  

Link to comment
Share on other sites

Sure, you can do something like this in your selector for date/time, which would return all results after the current time.

find("your_date_time_field>".time());

For repeaters you want something like:

find("title|my_repeater.my_field~=$q");

Hope the helps get you going.

  • Like 1
Link to comment
Share on other sites

That was very helpful. My search is no longer breaking. Now I am on the right track. 

So in order to run a query that would locate an item based on partial text in different fields then what operator would I use. For instance a template record which has this information

Title :            Fabulous artwork

year:            1999

medium:       Lithograph

description:  This description has a keyword in it as well like Paris

Let's suppose that a user entered the query into the search : Paris lithograph 1999 or artwork lithograph. Would this search be able to be executed in one shot or would I have to run a partial search on each keyword and then re-search the records for the next keyword in line?

Well it looks as if I have more reading to do. I googled a bit more and found an old post about combining fields in a search. This hopefully will allow me to get a better result list.

https://processwire.com/talk/topic/4860-selector-match-that-mixes-multiple-fields/ 

Link to comment
Share on other sites

So I've studied some other's code and something like this seems to be what I am wanting. I will break the words off and search them, at that point I will add the results to the search array. This still is not working for some reason though. 

$queryWords = explode(" ",$q);
	
foreach ($queryWords as $word){

	foreach ($templates->get("fields")->fields as $field){
		if (!$matches) $matches = new PageArray();
		$partialResults =  $pages->find("{$field}%=$word,has_parent!=2,template!=404"); 
		$matches->append($partialResults);
	}
}
Link to comment
Share on other sites

Well I was really close. Here is what I came up with. 

//use field template to create list of fields
	
//separate out words in query and iterate them

$queryWords = explode(" ",$q);

foreach ($queryWords as $word){
	foreach ($templates->get("fields")->fields as $field){
		if (!$matches) $matches = new PageArray();
		try {
			$partialResults =  $pages->find("{$field}%=$word,has_parent!=2,template!=404"); 
		}
		catch(Exception $e) {
		     break;
		}
		$matches->append($partialResults);
	}
}

Any suggestions? Everything seems to be working correctly. I am now fetching a boatload of results. Maybe even too many. 

Link to comment
Share on other sites

Ok so I have managed to get a pretty good list of results going. All my pages are now showing up :). Now I am on to sorting based on relevance. I have a couple things that I would like to implement but I could really use some input on. First off lets start with my code. 

	
//use field template to create list of fields
	
//separate out words in query and iterate them

$queryWords = explode(" ",$q);

$matches =  $pages->find("title*=$q,has_parent!=2,template!=404"); 
//$matches = $pages->find("{$allFields}body*=$q,has_parent!=2,template!=404"); 

foreach ($queryWords as $word){
	if (count($word)>3){
		foreach ($templates->get("fields")->fields as $field){
			if (!$matches) $matches = new PageArray();
			try {
				$partialResults =  $pages->find("{$field}%=$word,has_parent!=2,template!=404"); 
				}
			catch(Exception $e) {
			break;
			}
			$matches->append($partialResults);
		}
	}
}

In my code I have referenced a template that I have created with a big list of fields. 

1.	title
2.	artist_firstname
3.	artist_lastname
4.	artist_nationality 
5.	artist_biography_snippet
6.	artist_biography 
7.	artwork_dept 
8.	artwork_year
9.	artwork_ed
10.	artwork_dimensions
11.	artwork_pricePrice
12.	artwork_medium 
13.	Tags
14.	antique_artist 
15.	artwork_condition 
16.	artwork_printer 
17.	artwork_series 
18.	artwork_references 
19.	artwork_publisher 
20.	artwork_provenance 
21.	artwork_description 
22.	artwork_signature
23.	publication_title 
24.	publication_subtitle 
25.	publication_medium 
26.	publication_medium_detail 
27.	publication_date
28.	publication_dimensions
29.	publication_description 
30.	publication_price
31.	publication_type
32.	subsection_description

All of these are referenced and then searched against. This seems like way too much overhead but I can't think of any other way as this site will be very textbox heavy. What I would like to come up with is a method to prioritize results and sort them based on relevance and maybe a simple method to cross reference the results. Also I was wondering if there was a php or processwire method that anyone knew about that would be able to determine weather a word was a stopword I would then use that in my find loop

Link to comment
Share on other sites

I can't really help you with your error, but maybe use this to exclude all the admin stuff.

has_parent!=2

Also you could use just a template to get all fields, without an extra page.

$search_here = $templates->get("name");

foreach($search_here->fields as $field){
  …
}

If you don't have include=all in your selector, there's no need for has_parent!=2. If a visitor (guest) user searches it won't return those in admin anyway, only for superusers. It's generally a good approach to specify templates you actually want to search only. Also you can only have one "has_parent!=2" in the selector, so you couldn't use it for if you need it to restrict a search to a branch.

I don't think there's one simple answer to building a search like this. I'm not sure about what exactly are the details here, and what is important to be able to search. It really comes down to what field you search and how. Relevance, as far as I know only works with a single find(), using full text search on a single text field using * or ~ , this will return the results sorted by relevance already, everything else won't give you relevance. 

By the glimpse at your code there's would be a lot of overhead and using this approach won't ever give you a relevance search results.

Of topic:

if (count($word)>3){

you can't count string only arrays

if(!$matches)

This doesn't work. A PageArray if empty doesn't return false, you'd check with if(!count($matches))

So what to do?

I don't know an answer to what you should do. In general you seem too have a lot of fields, that seems unnecessary and could be simplified and reused, instead of having 3 "xname_description" fields, you could have one "description" field and use it everywhere. This will help reduce the fields to search for at least.

Building search once you enter a complex setup with lots of fields and pages and want ot have relevance you might be best up to create indexes, like the cache field to combine text fields to one (hidden) cache field, that you would then search with a single query. 

  • Like 2
Link to comment
Share on other sites

    I thank you Soma for taking time to respond to my questions and LostKobrakai for giving some good feedback on my selectors. That is a fantastic bunch of info that you have provided, and really does help me to understand the platform and the methods that I will need to employ to complete this. This is just a general search before I get to any filtering or sorting. This search just needs to return a more complete and less filtered result list that holds items that are just generally associated to the query. Mostly what I see that people will search for is something like Artist Name | Artwork Name or Artist Name | Medium or even just a search for Tag. From here they would filter their search based on department, year,price,medium.

   The reason that I have used so many fields is mostly due to the fact that I am also creating descriptions specific to the templates, so that a very basic user would be able to interact and understand what they had to enter in order to make an item display on the site, I also am working with another developer friend and wanted to try and keep the variables in a naming convention that they would hopefully be able to intuit their meaning easier. 

   The data that I would like to be searchable is pretty basic as it would mostly be just the items themselves, and all of their relevant metadata. About 80% of this site will be this template (items):

title
contemporary_artist_selection - links item to their artist. As items may be in different parents or sections the artist will be selected using a page field so that it can be located globally.
artwork_year
artwork_ed
artwork_medium
artwork_dimensions
artwork_price
artwork_sale_information
Tags
artwork_description
gridSizer - used for markup and will not need to be searchable
artwork_img - used for markup and will not need to be searchable
meta_description
meta_keywords

  So now at 

Of topic:

if (count($word)>3){

you can't count string only arrays

if(!$matches)

This doesn't work. A PageArray if empty doesn't return false, you'd check with if(!count($matches))

I'm a bit of a php noob so I didn't realize that this was not going to work, good to know :). I suppose I would have gotten here in the end, but I really appreciate you helping me on this. 

So what to do?

I don't know an answer to what you should do. In general you seem too have a lot of fields, that seems unnecessary and could be simplified and reused, instead of having 3 "xname_description" fields, you could have one "description" field and use it everywhere. This will help reduce the fields to search for at least.

Building search once you enter a complex setup with lots of fields and pages and want ot have relevance you might be best up to create indexes, like the cache field to combine text fields to one (hidden) cache field, that you would then search with a single query. 

This is when I get to the EUREKA moment. Your suggestion would be perfect for this. I could create a cache field that held all of these fields in one field and just reference it for my search! So now to look into creating this functionality and using this instead of trying to search the specific fields in the beginning. I can then run more specific searches based on user input. You Soma are a GENIUS. 

  • Like 1
Link to comment
Share on other sites

Following your advice I have implemented a cache field that holds all of the fields that I want to query. I am running into an issue though where I am not returning any results from a page field when I search using the cache field. Is this a limitation of the module, a setting issue, or something I am doing wrong.

This should work with page fields correct?

$matches = $pages->find("search_fields%=$q"); 
Link to comment
Share on other sites

So for some reason this method is really fighting me. I have been fiddling with different ways to make the cache field pick up my page field that I use to define an artist in an artwork. So far I have tried enabling autojoin and also have tried to create a new cache field that just caches this field as well. I am going to try another page field type and see if something else will work. 

**edit**

Ok so I am still not sure why this page field is not caching but I have just decided to work around it by including the field name in the query. I like the idea of only using a cache field to generate results though, and look forward to any tips on how to get this up and running :)

Link to comment
Share on other sites

Thank you I looked everywhere and couldn't find anything on the subject. I was under the assumption that I had just configured something incorrectly. Everything is working perfectly now thanks to your guidance. You are a good man Soma!

As a side note I just wanted to thank you for creating your ajax progressive search module. I have been using it to show results on my site and it looks amazing. It works so well for image results, I really couldn't ask for anything more. Cheers!

  • Like 1
Link to comment
Share on other sites

*edit* I managed to speed up the query significantly by changing my query operator to ~= instead of %=. I guess the documentation was right about speed. 

So I fiddled and I fiddled to get reliable results that combine partial text, multiple words, and multiple fields all in one search. Here is what I came up with. 

	// Send our sanitized query 'q' variable to the whitelist where it will be
	// picked up and echoed in the search box by the head.inc file.
	$input->whitelist('q', $q); 

	//separate out words in query and iterate them
	$queryWords = explode(" ",$q);
	
	//build query
	$wordcount = count($queryWords);
	
	foreach ($queryWords as $word){
	$wordcount--;	
	if ($wordcount>0){
		$sep = ',';
	}else{
		$sep = '';
	}
		
	$sr1 = '';
	$sr2 = '';
	$sr3 = '';
	
	$sr1 = "title|";
	//if word is not a number then query contemporary_artist_selection page field
	if (!is_numeric($word)){
		$sr2 = "contemporary_artist_selection|";
	}
	
	$sr3 = "search_fields";
	
        //was using %= to search but found search times too slow.
	$query.= "$sr1$sr2$sr3~=$word$sep";	
	
	}
        	
        //query built, now find and spit errors for debugging
	try{
	    $matches = $pages->find("$query,limit=20");
	}catch(Exception $e){
	    echo $e."\n\nThere has been an error \n Query:$query";
	}

	$count = $matches->getTotal(); 
Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

×
×
  • Create New...