Quick and dirty stress test?


heldercervantes

Hi guys.

I'm adding a search engine to my project and I'm worried about performance. The search is complex, and once it's up and running there will be thousands of pages to go through. Right now I have a little over 10.

So for a quick test, I was trying something like this:

$bulk_matches = $pages->find($selector); // that selector would pretty much get every searchable page
for ($i = 0; $i < 100; $i++) {
   foreach ($bulk_matches as $bm) {
      $bulk_matches->add($bm); // try to add each page again to inflate the array
   }
}
// and then do the crazy weird search on the supersized $bulk_matches

Of course this would be a crappy test even if it did work, but it would give me some initial insight without having to populate my DB with 1000 bogus entries, which would be a pain.

Is there a quick and dirty way to trick a PageArray into duplicating its content?

Thanks,

HC


PageArray is designed to hold unique records, so the easiest way around this would probably be adding those bogus pages via API:

Run a script with a foreach/for/while loop that adds a page with (machine-generated) bogus data on each round. Test, and finally run another script that removes all the bogus pages based on template (if these are the only pages using that template) or some other factor you've cooked into your bogus data – a specific name format, parent, etc.
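A quick sketch of what I mean – the template name, parent path, and field names here are just placeholders, so adjust them to your own setup:

// generate bogus pages for testing
for ($i = 0; $i < 1000; $i++) {
   $p = new Page();
   $p->template = 'product';               // your searchable template
   $p->parent = $pages->get('/products/'); // parent of the searchable pages
   $p->title = "bogus-product-$i";         // recognizable name format for cleanup
   $p->brand = "Bogus brand $i";           // fill whichever fields you search on
   $p->save();
}

// ...and once you're done testing, remove everything matching that name format
foreach ($pages->find("template=product, name^=bogus-product") as $p) {
   $pages->delete($p);
}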

Another thing to note is that if you run a find query on $bulk_matches, which is a PageArray you've already fetched, this will use in-memory selectors. Your actual search will most likely make use of database selectors instead, which will differ in both functionality and efficiency :)
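In other words, these two look similar but behave quite differently (field names made up for the sake of the example):

// database selector: queries MySQL directly, can use indexes, limits and pagination
$results = $pages->find("template=product, title|brand%=cola, limit=100");

// in-memory selector: filters a PageArray that's already loaded,
// so every page in $bulk_matches has to be loaded and checked in PHP
$results = $bulk_matches->find("title%=cola");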

On the other hand, what you've described here doesn't really sound like much of a problem yet. Searching through thousands of pages should be fast, at least assuming that you a) add a sensible limit to each query, and b) don't add too many fields to the query and/or combine your search fields beforehand using Fieldtype Cache or something similar.
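As a simplified example – the field names are made up, and "search_cache" stands for a hypothetical FieldtypeCache field combining your searchable fields:

$q = $sanitizer->selectorValue($input->get->q);

// a sensible limit keeps each query cheap and gives you pagination for free
$matches = $pages->find("title|brand|description%=$q, limit=50");

// or query a single combined/cached field instead of many separate ones
$matches = $pages->find("search_cache%=$q, limit=50");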


...assuming that you a) add a sensible limit to each query, and b) don't add too many fields to the query...

That's just the thing. I have 15 fields to search on and must do it progressively. 4 of those fields are considered primary, and a keyword found in one of those sets a narrowing point: from there on, the search focuses on those pages and that keyword is excluded from the search in the other fields. So if you search "Cola" and the engine finds entries with that word in the brand name, it will ignore others that only have it in the description.

Also, this allows the engine to offer alternatives if it can't find all the words. Say you search "Awesome Nike": the engine discovers pages with "nike" but none that also have "awesome" in them, so it returns something like "Sorry, no results for the complete query. Showing results for "Nike" instead".
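Roughly, I'm picturing the narrowing like this – a very simplified sketch with made-up field names (the real thing has 4 primary fields and 11 others):

$primary = 'brand|title';          // the "primary" fields
$secondary = 'description|body';   // the remaining searchable fields

$matches = $bulk_matches;          // the big starting set from the first find
$missed = array();

foreach (explode(' ', $q) as $word) { // $q = the raw search string
   $word = $sanitizer->selectorValue($word);

   // first try the primary fields within the current result set
   $narrowed = $matches->find("$primary%=$word");
   if (!count($narrowed)) {
      // no primary hit, fall back to the other fields
      $narrowed = $matches->find("$secondary%=$word");
   }

   if (count($narrowed)) {
      $matches = $narrowed;        // narrow the set for the next keyword
   } else {
      $missed[] = $word;           // keyword not found anywhere, report partial results
   }
}

if (count($missed)) {
   // "Sorry, no results for the complete query. Showing results for ..."
}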

This will be a b**ch to optimise.

I'll have a look at https://www.mockaroo.com tonight, to generate my bogus entries.


Results are in. I used Mockaroo to generate a bunch of entries, pumped the page count up to over 1100, and ran a search... 3.9 seconds!

Then I scrapped all but one of the fields I'm searching on, and got 1.9 seconds. Still way too much.

With limit=100 on the first find that gathers the block, while keeping the rest of the process intact, I got it down to 0.35 seconds. That's much more like it. Now I have to figure out how to narrow down the results on that first find. Good luck to me :)
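For reference, that first find now looks roughly like this (selector simplified, field names made up):

// grab a limited block from the database instead of everything,
// then run the progressive narrowing on that smaller PageArray
$bulk_matches = $pages->find("template=product, brand|title|description%=$q, limit=100");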

