
Need advice for a more performant DB query


sz-ligatur

Hello, I'm struggling with a query in my search results that slows down the rendering way too much. As much as I appreciate the ProcessWire API, I may have done something stupid with it.

I have a site search for documents (pages). After retrieving the matches, I query the tags used by each page to render a tag-filter menu (isotope.js). In addition, there is a counter on each tag label showing how many documents in the current result list use that tag.

That's my code…

// $matches : found pages
// $tags : collection of all tags used by $matches (a tag is a page)
// $page->tax_tag : page reference field on document pages, multiple select

foreach ($tags as $tag) {
	// count how many matched documents carry this tag
	$usage = 0;
	foreach ($matches as $doc) {
		if ($doc->tax_tag->has("id={$tag}")) $usage++;
	}
	$label = "{$tag->get('title')} {$usage}";
	// ... render the filter item with $label ...
}

I guess that's award-worthy for producing as many unnecessary database hits as possible…?

What do you think is the right way to get this data out of the database? Is anybody out there with a blazing SQL query to get all that tag info in one hit?
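
Just to illustrate what I mean by "one hit" – a rough sketch of the direction I'm thinking of. It assumes ProcessWire's default table layout, where a page reference field gets its own field_tax_tag table with pages_id and data columns and page titles live in field_title, so take it with a grain of salt:

// sketch only: count tag usage for all matches in a single query
$ids = implode(',', array_map('intval', $matches->explode('id')));
$stmt = $database->query("
	SELECT t.data AS tag_id, ti.data AS tag_title, COUNT(*) AS usage_count
	FROM field_tax_tag t
	JOIN field_title ti ON ti.pages_id = t.data
	WHERE t.pages_id IN ($ids)
	GROUP BY t.data, ti.data
");
$tagUsage = $stmt->fetchAll(\PDO::FETCH_ASSOC);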


Instead of the nested foreach you could do

foreach ($tags as $tag) {
	$usage = $matches->find("tax_tag={$tag}")->count;
	$label = "{$tag->get('title')} {$usage}";
	
}

As for optimizing speed, how many pages are there potentially in $matches and in $tags, hundreds or thousands?

To speed up queries, you can use $pages->findRaw() to get only the properties you need for output, and then work with those arrays in memory.

Example:

// find all pages with template document that have a tax_tag assigned
// returns an associative array indexed by page id with fields title, body and tax_tag,
// where tax_tag is an associative array indexed by tag id with fields id and title:
// [
//     1234 => [
//         'title' => 'Page title',
//         'body' => 'Page body content',
//         'tax_tag' => [
//             4567 => [
//                 'id' => 4567,
//                 'title' => 'Tax tag title',
//             ],
//             ...
//         ],
//     ],
//     ...
// ]
$matches = $pages->findRaw("template=document, tax_tag!=''", ['title', 'body', 'tax_tag' => ['id', 'title']]);

Now you can use the resulting associative array $matches and do operations on it in memory without further DB calls.

If you want to get a unique array of tags with usage counts from $matches, you could do something like this:
// loop through matches to construct available tags array
$tagsAvailable = array();
foreach($matches as $m) {
    // loop through tax_tags of each item
    foreach($m['tax_tag'] as $key => $tag) {
        // if key found in $tagsAvailable, tag is already there and we continue
        if(array_key_exists($key, $tagsAvailable)) continue; 
        // get count of occurrences of tax_tag id inside $matches items
        // and assign count as field 'usage' to the tag
        $tag['usage'] = count(array_filter(array_column($matches, 'tax_tag'), function($arr) use($key) {
            return in_array($key, array_keys($arr));
        }));
        // add tag to available $tags
        $tagsAvailable[$key] = $tag;
    }
}

$tagsAvailable will look something like this:

[
    1137 => [
        'id' => 1137,
        'title' => 'Tag Title 1',
        'usage' => 4,
    ],
    1140 => [
        'id' => 1140,
        'title' => 'Tag Title 2',
        'usage' => 7,
    ],
]

Now you can use $tagsAvailable to render your tags and $matches to render your result list.
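
If it helps, rendering the filter menu from that array could look roughly like this (the markup and the data-filter convention are placeholders – adjust them to whatever your isotope setup expects):

// build the tag filter menu; markup is hypothetical
$out = "<ul class='tag-filter'>";
foreach($tagsAvailable as $id => $tag) {
    $out .= "<li data-filter='.tag-{$id}'>{$tag['title']} <span class='count'>{$tag['usage']}</span></li>";
}
$out .= "</ul>";
echo $out;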

EDIT: I did this with $matches containing only 22 items and $tagsAvailable resulting in 5 items; the whole operation took 0.01ms and used 2.6kB of memory. So this should scale to hundreds or even thousands of matches. Though if you have that many matches, you might consider processing them in chunks and paginating them.
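
A minimal sketch of such chunked processing, assuming limit/start in the selector work with findRaw() the same way they do with find() (the chunk size of 200 is arbitrary):

$start = 0;
$limit = 200;
do {
    // fetch one chunk of raw data at a time to keep memory usage flat
    $chunk = $pages->findRaw("template=document, tax_tag!='', start=$start, limit=$limit", ['title', 'tax_tag' => ['id', 'title']]);
    foreach($chunk as $id => $item) {
        // ... process one item ...
    }
    $start += $limit;
} while(count($chunk) === $limit);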


Good morning @gebeer, that looks promising – so far I haven't taken the step to use findRaw(), but I will – it looks like there's huge potential to make my search run better. Thank you so much for your detailed examples. I will definitely dive into that and try to use it further up in my data-collecting process. Btw, it's about 650 documents and 430 tags.

For now I just tried your one-liner…

4 hours ago, gebeer said:

Instead of the nested foreach you could do

foreach ($tags as $tag) {
	$usage = $matches->find("tax_tag={$tag}")->count;
	$label = "{$tag->get('title')} {$usage}";
	
}

Result: a 4:1 speedup! The nested foreach takes 0.28s on average, your version 0.07s – so building the cache file for the 423 found matches now takes 30s instead of 120s (on my slow local machine).
Looks like a good start for today – thank you @gebeer!


$pages->findRaw() is the way to go if you want many results and only specific fields. I reduced the load time of a JSON file for 22000 pages, querying 14 fields per page, from around 7 seconds to 1.5–2.0 seconds. My client is more than happy because he reloads the page with this data very often throughout the day.
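
Simplified, the pattern boils down to something like this (template and field names are placeholders, not the actual ones from my project):

// fetch only the fields needed for the JSON output as plain arrays
$raw = $pages->findRaw("template=item", ['title', 'price', 'sku']);
header('Content-Type: application/json');
echo json_encode(array_values($raw));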
