Jump to content

Selector query performance in Fieldtype inheriting FieldtypeMulti


elabx
 Share

Recommended Posts

Hi! I am struggling a bit with the following issue, I am working on FieldtypeRecurringDates and just realized that the query performance is very bad when querying the field through a selector, for example, on this find();

$from = date('Y-m-d H:i:s', strtotime('today'));
$to = date('Y-m-d H:i:s', strtotime('+30 days'));
$pages->find("event_recurring_dates>='$from', event_recurring_dates<='$to'");

I am getting the following SQL query:

SELECT pages.id,
       pages.parent_id,
       pages.templates_id
FROM   `pages`
       JOIN field_event_recurring_dates AS field_event_recurring_dates
         ON field_event_recurring_dates.pages_id = pages.id
            AND ((( field_event_recurring_dates.data >= '2023-03-09 00:00:00' ))
                )
       LEFT JOIN field_event_recurring_dates AS
                 field_event_recurring_dates__blank1
              ON field_event_recurring_dates__blank1.pages_id = pages.id
WHERE  ( pages.status < 1024 )
       AND ((( field_event_recurring_dates__blank1.data IS NULL
                OR
       field_event_recurring_dates__blank1.data <= '2023-04-08 07:34:47' )))
GROUP  BY pages.id -- [3715.9ms]

As per the last comment in the last line, it runs in almost 4 seconds! In a table with liek 10k rows I think this is way too much??

I did the test of removing the LEFT  JOIN, and just set a JOIN, and the performance increases dramatically, it now takes 0.0214  ?

SELECT pages.id,
       pages.parent_id,
       pages.templates_id
FROM   `pages`
       JOIN field_event_recurring_dates AS field_event_recurring_dates
         ON field_event_recurring_dates.pages_id = pages.id
            AND ((( field_event_recurring_dates.data >= '2023-03-09 00:00:00' ))
                )
       -- Here was the LEFT JOIN         
       JOIN field_event_recurring_dates AS
                 field_event_recurring_dates__blank1
              ON field_event_recurring_dates__blank1.pages_id = pages.id
WHERE  ( pages.status < 1024 )
       AND ((( field_event_recurring_dates__blank1.data IS NULL
                OR
       field_event_recurring_dates__blank1.data <= '2023-04-08 07:34:47' )))
GROUP  BY pages.id 

 

Link to comment
Share on other sites

@elabx First thing to check is that you've got an index on your `data` column for that field. If not then execute this query: ALTER TABLE field_event_recurring_dates ADD INDEX data (data)

I'm also wondering about the other indexes on your Fieldtype.  You've got an "id" as the primary key, which is unusual for a Fieldtype. Usually the primary key is this for a FieldtypeMulti:

$schema['keys']['primary'] = 'PRIMARY KEY (pages_id, sort)'; 

So I'm guessing you may not have the index on pages_id and sort, which would definitely slow it down, potentially a lot. If you don't need the "id" then I would probably drop it, since you don't really  need it on a FieldtypeMulti. The only Fieldtype I've built that keeps an id is FieldtypeTable, and it uses the 'data' column for that id and then uses this in its getDatabaseSchema:

$schema['data'] = 'INT UNSIGNED NOT NULL AUTO_INCREMENT';
$schema['keys']['primary'] = 'PRIMARY KEY (data)'; 
$schema['keys']['pages_id'] = 'UNIQUE (pages_id, sort)';
unset($schema['keys']['data']); 

Most likely the above is the primary reason for the bottleneck. But another more general reason that your query may be slow is because the $pages->find() selector you are using doesn't filter by anything other than your event_recurring_dates field. In a real use case, usually you'd at least have template(s) or a parent in that selector, or you'd be using children(), etc. Any should improve the performance. Without any filter, you are asking it to query all pages on the site, and if it's a big site, that's going to be potentially slow. 

The reason for the LEFT JOIN is because without it you can only match rows that that exist in the database. You have an operator "<=" that can match empty or NULL values. Since you also have an operator ">=" that can't match empty values, then the left join of course isn't necessary, but each part of the selector is calculated on its own (independent calls to getMatchQuery). You are getting the fallback Fieldtype and PageFinder logic since your Fieldtype doesn't provide its own getMatchQuery(). This is a case where you may find it beneficial to have your own getMatchQuery() method. The getMatchQuery in FieldtypeDatetime might also be a good one to look at. 

In your case, since your Fieldtype is just matching dates, it may be that you don't need a left join situation at all, since you might never need to match null (non-existing) rows. So you could probably get by with a pretty simple getMatchQuery(). Maybe something like this (this is assuming 'data' is the only column you use, otherwise replace 'data' with $subfield): 

public function getMatchQuery($query, $table, $subfield, $operator, $value) {

  if($subfield === 'count') {
    return parent::getMatchQuery($query, $table, $subfield, $value); 
  }

  // limit to operators: =, !=, >, >=, <, <= (exclude things like %=, *=, etc.)
  if(!$this->wire()->database->isOperator($operator)) {
    throw new WireException('You can only use DB-native operators here'); 
  }
  
  if(empty($value)) {
    // empty value, which we'll let FieldtypeMulti handle
    if(in_array($operator, [ '=', '<', '<=' ])) {
      // match non-presence of rows
      return parent::getMatchQuery($query, $table, 'count', '=', 0); 
    } else {
      // match presence of rows
      return parent::getMatchQuery($query, $table, 'count', '>', 0); 
    }
  }
  
  // convert value to ISO-8601 and create the WHERE condition
  if(!ctype_digit("$value")) $value = strtotime($value);
  $value = date('Y-m-d H:i:s', (int) $value); 
  $query->where("$table.data{$operator}?", $value);
  
  reuturn $query; 
}	

 

  • Like 2
Link to comment
Share on other sites

9 minutes ago, ryan said:

@elabx First thing to check is that you've got an index on your `data` column for that field. If not then execute this query: ALTER TABLE field_event_recurring_dates ADD INDEX data (data)

It's a datetime type column and can confirm it has an index. (BTREE type shows on phpmyadmin?)

9 minutes ago, ryan said:

I'm also wondering about the other indexes on your Fieldtype.  You've got an "id" as the primary key, which is unusual for a Fieldtype. Usually the primary key is this for a FieldtypeMulti:

I did in a bit of a trial and error (looking at the FieldtypeTable's code!) to get the field to paginate because with recurring dates things can go wild and I didn't seem to get it work without the id field, maybe I am understanding something wrong?

I am going to create those missing indexes and try the get match query get back on this thread! I was actually already trying with the one from FieldtyeDatetime, so I guess I'm on the right track!

Thank you so much for taking time to answer, invaluable!

Link to comment
Share on other sites

Got this!!

SELECT 
  pages.id, 
  pages.parent_id, 
  pages.templates_id 
FROM 
  `pages` 
  JOIN field_event_recurring_dates AS field_event_recurring_dates ON field_event_recurring_dates.pages_id = pages.id 
  AND (
    (
      (
        field_event_recurring_dates.data >= '2023-03-09 00:00:00'
      )
    )
  ) 
  LEFT JOIN field_event_recurring_dates AS field_event_recurring_dates__blank1 ON field_event_recurring_dates__blank1.pages_id = pages.id 
WHERE 
  (pages.templates_id = 50) 
  AND (pages.status < 1024) 
  AND (
    (
      (
        field_event_recurring_dates__blank1.data IS NULL 
        OR field_event_recurring_dates__blank1.data <= '2023-04-08 10:36:35'
      )
    )
  ) 
GROUP BY 
  pages.id -- [525.6ms]

Would you think the id column is still hitting the performance some way?? 

The fieldtype's schema right now is as:

$schema = parent::getDatabaseSchema($field);
$schema['id'] = 'INT UNSIGNED NOT NULL AUTO_INCREMENT';
$schema['data'] = 'datetime NOT NULL';
$schema['keys']['primary'] = 'PRIMARY KEY (id)';
$schema['keys']['pages_id'] = 'UNIQUE (pages_id, sort)';
$schema['keys']['data'] = 'KEY data (data)';

Conclusion, need to wrap my head around MySQL properly, my favorite CMS has got me too spoiled haha.

  • Like 1
Link to comment
Share on other sites

@elabx That seems like a major performance improvement (looking at the ms time at the bottom). And maybe it's about right. The only thing I'd mention is that the left join/null queries aren't really necessary for your Fieldtype since you are matching dates, and you can use field_name.count=0 to match non-presence of rows. So you may want to put in your own getMatchQuery(), like my earlier example, as that might optimize it further. 

  • Like 1
Link to comment
Share on other sites

19 hours ago, ryan said:

The only thing I'd mention is that the left join/null queries aren't really necessary for your Fieldtype since you are matching dates, and you can use field_name.count=0 to match non-presence of rows. So you may want to put in your own getMatchQuery(), like my earlier example, as that might optimize it further. 

I did place that code, but will work on refine it, thanks!

20 hours ago, ryan said:

You've got an "id" as the primary key, which is unusual for a Fieldtype.

Do you have any recommendation on to removing this? I saw it was sort of a requirement to enable pagination on fieldtypes? Is this assumption correct? 

Thanks a lot for your help @ryan!

Link to comment
Share on other sites

So what I did to remove the null checks was overriding the isEmptyValue() method like:

public function isEmptyValue(Field $field, $value) {
        return true;
}

I am going to guess this shouldn't be just like this lol

I ended up with a query like this, which also does seem faster:

SELECT pages.id,pages.parent_id,pages.templates_id
FROM `pages`
JOIN field_event_recurring_dates AS field_event_recurring_dates ON field_event_recurring_dates.pages_id=pages.id AND (((field_event_recurring_dates.data>='2023-03-12 00:00:00' ) ))
JOIN field_event_recurring_dates AS field_event_recurring_dates1 ON field_event_recurring_dates1.pages_id=pages.id AND (((field_event_recurring_dates1.data<='2023-04-11 19:52:53' ) ))
WHERE (pages.templates_id=50)
AND (pages.status<1024)
GROUP BY pages.id -- [117.7ms]

I see the selector now throws an exception when putting a blank value, I'll  guess it has to do with the simplistich approach in the override laid done above. I will dig into how to properly override the isEmptyValue() for my case., getting some hints in FieldtypeTime.

Thanks again for the help!

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...