Jump to content

PW selector performance


Webrocker
 Share

Recommended Posts

Hi,

consider this setup:

- "Shows" having a certain "Event"; stored in a page field "event_relation"

- "Events" having multiple "Shows"; stored in a page field "show_relation"

There are about 86,000 shows and 17,000 events in the database, together with the normal cms pages there are over 100,000 rows in the pages table.

I now query first the shows matching a certain criteria, and then query the events having these shows, in order to use their data for display.

I had the server timing out several times; even with memory settings maxed up.

I created the following bit for testing, already using a raw sql query to get to the page ids of the shows, then using this in a "pw-style" selector for the events.

echo '<h1>RAW mySQL</h1>';
$start  = microtime(true);
$sql    = 'SELECT pages_id FROM field_date WHERE data>"2015-11-28" AND data<"2016-01-15"';
$query  = $database->prepare($sql);
$query->execute();
$res    = $query->fetchAll(PDO::FETCH_COLUMN);
$result = count($res);
$end    = microtime(true);
$diff   = $end - $start;
$res1 = implode('|',$res);
echo '<p>Result-Count: ' . $result . '</p>';
echo '<p>Duration: ' . $diff . ' Seconds.</p>';

echo '<h1>PW</h1>';
// process-wire:
$start  = microtime(true);
$res    = $pages->find('template=event_item,shows_relation='.$res1);
$result = count($res);
$end    = microtime(true);
$diff   = $end - $start;
echo '<p>Result-Count: ' . $result . '</p>';
echo '<p>Duration: ' . $diff . ' Seconds.</p>';
# RAW mySQL
Result-Count: 26770
Duration: 0,021031141281128 Seconds.

# PW
Result-Count: 7229
Duration: 40,812306880951 Seconds.

Look at this -- 40 seconds, on a root server in a testing environment, no "real" traffic interfering. O_o

I had a look into mySQLs log files and processlist, and I noticed that the query that PW is sending contains LOTS of OR statements like this:

...
        OR ((field_shows_relation.data='103472') )  
        OR ((field_shows_relation.data='103473') )  
        OR ((field_shows_relation.data='103477') )  
        OR ((field_shows_relation.data='103478') )  
        OR ((field_shows_relation.data='103479') )  
        OR ((field_shows_relation.data='103480') )  
        OR ((field_shows_relation.data='103481') )  
        OR ((field_shows_relation.data='103482') )  
        OR ((field_shows_relation.data='103483') )  
        OR ((field_shows_relation.data='103486') )  
        OR ((field_shows_relation.data='103490') )  
        OR ((field_shows_relation.data='103491') )  
        OR ((field_shows_relation.data='103492') )  
        OR ((field_shows_relation.data='103496') )  
        OR ((field_shows_relation.data='103497') )  
        OR ((field_shows_relation.data='103515') )  
        OR ((field_shows_relation.data='103518') )  
        OR ((field_shows_relation.data='103526') )  
        OR ((field_shows_relation.data='103527') )  
        OR ((field_shows_relation.data='103528') )  
        OR ((field_shows_relation.data='103529') )  
        OR ((field_shows_relation.data='103530') )  
        OR ((field_shows_relation.data='103531') )  
        OR ((field_shows_relation.data='103532') )  
        OR ((field_shows_relation.data='103533') )  
        OR ((field_shows_relation.data='103534') )  
        OR ((field_shows_relation.data='103535') )  
        OR ((field_shows_relation.data='103536') )  
        OR ((field_shows_relation.data='103537') )  
        OR ((field_shows_relation.data='103539') )
....

in the example above like 26,770 rows of these "OR" statements.

I don't know if this is the reason why those queries take a long time, and I am by no means skilled in query-crafting, but wouldn't it be easier to have an "IN( a,b,.....,n )" statement for that?

The problem is, we are using quite a lot of these "page" fields to reference lots of shared data like venues and locations - and some of these use other page relations and so the resulting queries fired by the PW selectors are very very long and complex, and so the percieved performance of the page using this is rather sluggish. As long as the data queried stays under like 1000 results, everything is ok, but the it starts to increasingly getting slower.

Has someone else had similar experiences? Should we stay away from the API selectors for such an amount of data or have we shot ourselfs in the foot architecture-wise by using all those page-fields?

thanks,

Tom

Link to comment
Share on other sites

Hi,

The problem is that $pages->find() returns Page objects, it takes a lot of memory and time to build > 7000 objects. The solution is to paginate your results.

Are you displaying over 7000 results on your site?

  • Like 1
Link to comment
Share on other sites

Hi Wanze,

thx for the fast reply :-)

… the problem is, the queries are timing out even before the end result (which is paginated) will be returned -- it seems that the selectors that are querying the page-relation fields try to look up several thousand page objects in between - even if the result in the end is only a handful of matching pages… :-/

but again, maybe there is a problem in the way we are trying to get to the information, not in the way pw returns it.

For the moment I was wondering wether the repeated "OR" lines in the sql statement can make a difference…

thx

Tom

Link to comment
Share on other sites

to illustrate the above; this is with

$res    = $pages->find('template=event_item,shows_relation='.$res1.',limit=50');
# RAW mySQL
Result-Count: 26770
Duration: 0,030179023742676 Seconds.

# PW
Result-Count: 50
Duration: 48,704106092453 Seconds.

Tom

Link to comment
Share on other sites

Is there a specific reason you're running a MySQL query first? Could you rewrite this part to use one PW selector? Such as...

template=event_item, shows_relation.date>2015-11-28, shows_relation.date<2016-01-15
  • Like 5
Link to comment
Share on other sites

Hi Craig,

wow, this simple modification sped up the query quite a bit O_o :-))

$res    = $pages->find('template=event_item, shows_relation.date>2015-11-28, shows_relation.date<2016-01-15');
# RAW mySQL
Result-Count: 26770
Duration: 0,023936033248901 Seconds. // not used in the following:

# PW
Result-Count: 7342
Duration: 5,9315440654755 Seconds.

THANK YOU.

Link to comment
Share on other sites

No problem Tom :)

That's definitely an improvement, but personally I think nearly 6 seconds is still a bit long for a query to execute that would be displayed on a page request. I would suggest limiting it to X number of items first to see if that improves it. There may be other optimisations that can be made as well, or caching could be introduced depending on the use case of the data, how frequently it changes or how dynamic it all needs to be.

  • Like 4
Link to comment
Share on other sites

hi craig,

yes, the "real" one uses a limit. if set to 20, this takes around two seconds.

do you happen to know of a way to "see" the mysql queries PW is sending? if I introduce a sort=shows_relation.date to the above, it again will time out…

thx

Tom

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...