heldercervantes Posted August 10, 2015

Hi guys. I'm adding a search engine to my project and I'm worried about performance. The search is complex and there will be thousands of pages to go through once it's up and running. Right now I have a little over 10. So for a quick test, I was trying something like this:

$bulk_matches = $pages->find($selector); // that selector would pretty much get every searchable page
for ($i = 0; $i < 100; $i++) {
    foreach ($bulk_matches as $bm) {
        $bulk_matches->add($bm);
    }
}
// and then do the crazy weird search on the supersized $bulk_matches

Of course this would be a crappy test even if it did work, but it would give me some initial insight without having to populate my DB with 1000 bogus entries, which would be a pain. Is there a quick and dirty way to trick a PageArray into duplicating its content?

Thanks,
HC
teppo Posted August 10, 2015

PageArray is designed to hold unique records, so the easiest way around this would probably be adding those bogus pages via the API: run a script with a foreach/for/while loop that adds a page with machine-generated bogus data on each round, test, and finally run a script that removes all the bogus pages based on their template (if these are the only pages using that template) or some other factor you've cooked into your bogus data – a specific name format, parent, etc. There's a rough sketch of both scripts below.

Another thing to note is that if you run a find query on $bulk_matches, which is a PageArray you've already fetched, it will use in-memory selectors. Your actual search will most likely use database selectors instead, and the two differ in both functionality and efficiency.

On the other hand, what you've described here doesn't really sound like much of a problem yet. Searching thousands of pages should be fast, at least assuming that you a) add a sensible limit to each query, and b) don't add too many fields to the query and/or combine your search fields beforehand using Fieldtype Cache or something similar.
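A minimal sketch of what those seed and cleanup scripts could look like. The 'product' template, the /products/ parent and the field names are placeholders, not anything from the actual project:

// seed.php – create bogus pages for load testing
for ($i = 0; $i < 1000; $i++) {
    $p = new Page();
    $p->template = 'product';                 // placeholder template name
    $p->parent   = $pages->get('/products/'); // placeholder parent page
    $p->name     = "bogus-$i";                // predictable name prefix makes cleanup easy
    $p->title    = "Bogus product $i";
    $p->brand    = "Brand " . ($i % 50);      // fill whatever searchable fields you have
    $p->save();
}

// cleanup.php – remove everything the seeder created
foreach ($pages->find("template=product, name^=bogus-, include=all") as $p) {
    $pages->delete($p, true); // true = delete children too, just in case
}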
heldercervantes Posted August 10, 2015

...assuming that you a) add a sensible limit to each query, and b) don't add too many fields to the query...

That's just the thing. I have 15 fields to search on and I have to do it progressively. Four of those fields are considered primary, and a keyword found in one of them sets a narrowing point. From there on, the search focuses on those pages and that keyword is excluded from searches on the other fields. So if you search for "Cola" and the engine finds entries with that word in the brand name, it will ignore others that only have it in the description.

This also lets the engine offer alternatives when it can't find all the words. Search for "Awesome Nike" and it will discover pages with "nike" but none that also contain "awesome", so it returns something like: Sorry, no results for the complete query. Showing results for "Nike".

This will be a b**ch to optimise. I'll have a look at https://www.mockaroo.com tonight to generate my bogus entries.
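For illustration, a rough sketch of that progressive narrowing using ProcessWire selectors. The field groupings (brand|title as primary, description|specs as secondary) and the two-tier structure are simplified placeholders, not the real fifteen fields:

// A sketch of the progressive narrowing described above – field names are placeholders
$primaryFields   = 'brand|title';
$secondaryFields = 'description|specs';

$q        = $sanitizer->text($input->get->q);
$keywords = array_filter(explode(' ', $q));
$matches  = new PageArray();
$missed   = array();

foreach ($keywords as $word) {
    $word = $sanitizer->selectorValue($word); // make the keyword safe to embed in a selector

    // Primary fields first: a hit here sets the narrowing point for this keyword
    $found = $pages->find("$primaryFields%=$word, limit=100");
    if (!$found->count()) {
        // Only fall back to the secondary fields if the primary ones had nothing
        $found = $pages->find("$secondaryFields%=$word, limit=100");
    }

    if (!$found->count()) {
        $missed[] = $word; // keyword with no results at all, feeds the "Showing results for..." message
        continue;
    }

    if (!$matches->count()) {
        $matches = $found; // first matched keyword: start from its results
    } else {
        // Narrow down: keep only pages that also match this keyword
        $narrowed = new PageArray();
        foreach ($matches as $m) {
            if ($found->has($m)) $narrowed->add($m);
        }
        $matches = $narrowed;
    }
}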
heldercervantes Posted August 10, 2015

Results are in. I used Mockaroo to generate a bunch of entries, pumped the page count up to over 1100 and ran a search... 3.9 seconds! Then I scrapped all the fields I'm searching on, left only one, and got 1.9 seconds. Still way too much. With limit=100 on the first bulk gathering, and still keeping the whole process, I got it down to 0.35 seconds. That's much more like it. Now I have to figure out how to narrow down the results on that first find. Good luck to me.
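If anyone wants to reproduce this kind of quick-and-dirty benchmark, plain PHP timing around the first find is enough. The selector and field names here are only an example of adding the limit mentioned above:

$start = microtime(true);

// $q = sanitized query string, as in the earlier sketch
// The capped first gathering – the limit=100 is what brought the time down here
$bulk_matches = $pages->find("template=product, title|brand|description%=$q, limit=100");

// ... run the rest of the progressive narrowing on $bulk_matches ...

printf("Search took %.2f seconds, %d pages in first pass\n", microtime(true) - $start, $bulk_matches->count());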