Performance/architecture question

joe_g · July 15, 2013

Hi,

I'm considering switching CMS from Symphony CMS for an existing project. The project is a bit of a resource hog and I wonder if a switch would help. But also, any general input about the architecture is welcome. Maybe there is a better way (a Processwire way) to solve this.

– The website is an event database, events are hiearchial up to 3 levels. In other words, events can have sub-events and sub-sub-events.

– Each event has one or more time spans attached to it, since they often happen more than once.

– Model overview attached below

there are 500 events with a total of 1000 time-spans, this should be able to grow ten-folds the coming years. Say 10.000 time-spans on 5000 events.

The front page consists of two parts 1. calendar and 2. overview

Problem 1.

Is this too slow? There is no such thing as "index" on time ?

Calendar would be (say 100 results out of the total 10.000 time-spans)

$times = $pages->find("parent=/events, template=time, time.date_end > $today"); //too slow?

foreach($times as $time) {

$event = $time->parent; //display event info for each time it happens

}

Problem 2.

How can i filter by time-spans that are children-pages?

I would want to do something like this, (but I guess I can't do that) – filter on child page values?

$pages->find("parent=/events, template=event, on_frontpage=1, children.enddate > $today");

So my question is if it's possible to do something like this with snappy response times in processwire. Hope it makes sense as a question!

rgds,

J

ryan · July 16, 2013

Is this too slow? There is no such thing as "index" on time ?

All ProcessWire fields are indexed (including datetime fields), so I don't think you'll have a problem with it being slow. But if you want to retrieve and load a 1000+ pages at a time, that could be potentially slow (and perhaps run out of memory). It's important to place limits (i.e. "limit=50") on any queries that can return huge amounts of pages, in order to ensure that everything stays running fast.

$times = $pages->find("parent=/events, template=time, time.date_end > $today"); //too slow?

I don't think that would be slow. But paginate your results by placing a limit to be sure. Also, I doubt that you need the parent=/events/ since you are already specifying a template. Though if I understand your structure right, if you did want to specify a parent, you'd probably want to use has_parent rather than parent, since the time pages live under /events/some-event/ rather than /events/. Last thing to note is that you can't have spaces before/after operators like you do in your time.date_end > $today" -- that would need to be "time.date_end>today". Btw, you can specify "today" rather than populating a $today variable, if you prefer.

How can i filter by time-spans that are children-pages?

I would want to do something like this, (but I guess I can't do that) – filter on child page values?

$pages->find("parent=/events, template=event, on_frontpage=1, children.enddate > $today");

Actually you can do this. But note the same suggestions I had above. There would also be this alternative, which might be potentially faster because ProcessWire is a little more optimized for parent-field matching queries than children-field matching queries (given that a page can have only 1 parent, but any number of children):

$pages->find("template=time, parent.on_frontpage=1, enddate>today");

When you foreach the results of that, note they are 'time' pages rather than 'event' pages, so you'd want to refer to the parent in order to get the event.

joe_g · July 16, 2013

Ha!

I had a feeling this fit better in PW. Thank you so much for the explanation!

And, "parent.on_frontpage=1" that is really really nice!

If I can convince the client to make the jump I'll post my progress here. I would start by simulating this setup to see what response times I'll be getting.

cheers,

J

*ps. didn't know about the spaces thing, good to know.

joe_g · January 8, 2014

Edit: I did something wrong before. But this line

$pages->get('/events')->children('template=event,onfrontpage=1,times.enddate>'.$today);

it returns 60 events, and it takes 1 full second.

Is that normal? I would have imagined this would be one single sql with lots of outer joins in it? "times" is a repeater.

each event containts perhaps 5 page-fields (where 2 is repeaters), and 10-15 regular fields. Everything that can be auto-joined is auto-joined, although it makes little difference seemingly. All fields a tri-lingual, perhaps that makes a difference?

Ryan, you suggested I should do

$times = $pages->get('/events')->find('template=time,parent.onfrontpage=1,enddate>'.$today);

but since times is a repeater, i can't really do that (and it would be weird logically as well)

J

~~I started trying this out - and surprisingly the speed issue isn't so much searching and finding, but displaying the data. (as in looping through each field and echo'ing them)~~

I thought that (if I was using auto-join) - once the stuff was loaded it would be super fast to display. Could it be that auto-join doesn't work for repeaters and references to other pages? It takes several seconds to loop through 200 events and display them including all ther doimain data (a couple of repeaters, and other connected page-references - say about 5 domain data fields, and 10 regular data fields).

I tried before doing a equivalent SQL query that joins together all data including all the domain data (i was using the database of Symphony, but the abstraction is similar) - I joined together everything and the query would execute in milliseconds, and all the data would be loaded and ready to be displayed.

~~Is there anything to do about this or is custom SQL the only option here?~~

~~thanks!~~

ryan · January 11, 2014

It would be executed in a single query, except that there is a repeater in there. The repeater portion of the selector ends up executing a separate query. Repeaters always add overhead. Though 1 second still seems like a long time for that particular find(). Are you sure it's the find() call that takes 1 second, or does the 1 second include output generation time? If you find that it is the find() call that's the bottleneck, ideally you might want to make a copy of that enddate field in your event template, and have it populated automatically with a hook in your /site/templates/admin.php. You could hide the field so that it's not visible in your admin interface.

/site/templates/admin.php

$pages->addHook('saveReady', function($event) {
  $page = $event->arguments(0); 
  if($page->template != 'event') return;
  $enddate = 0; 
  foreach($page->times as $time) {
     if($time->enddate > $enddate) $enddate = $time->enddate;
  }
  $page->enddate = $enddate;
});

In doing this, you are essentially creating a cache for the max 'enddate', and that cache should give you a big performance improvement. If you take this route, you'll have to create the cache once at the beginning, just to establish values:

foreach($pages->find("template=event") as $event) {
  $event->of(false); 
  $event->save();
}

You could then delete the above code after running it once.

joe_g · January 12, 2014

This is absolutely brilliant. Automatic denormalisation, sort of.

The 1 second it took was indeed only that very line, but without the auto-joined repeaters things improved a great deal. I'm repeatedly importing thousands of pages, then deleting them, lots of times. Could there be a performance degradation due to that? (I'm guessing thats not the case)

The second snippet triggers the hook saveReady for existing pages, I guess?

Many thanks for the help – I'll get back with the results later

J

joe_g · January 14, 2014

I got really excited there for a moment, and in some cases this could work. In my case I'll need to display the times anyway, so they need to be fetched one way or another. So the expense of getting those relational pages is still there.

I take it that every relationship / "join" that involves other pages adds a linear expense, and there is no way around that except custom sql?

...I think I have to rethink the structure.

Pete · January 15, 2014

Where are you importing thousands of pages from repeatedly? Not on every page request I hope

Also make sure you're not storing too many items in repeaters fir each page as this can slow things down. If you remove the repeater from the selector temporarily does it become much faster? I suspect it might but either way it could show you where the slowness is coming from.

joe_g · January 31, 2014

I have an existing site that I'm trying to convert to processwire, so I'm transferring a lot of data and deleting it again as part of the process to write the conversion of the data. I was wondering if this might mess up some indexes and slow things down gradually (although I'm guessing not).

(by the way, converting an existing site to processwire is awesome)

About performance, I don't have a lot of pages per repeater, but what I have is a lot of repeater fields. It seems like every added repeater (where the value needs to be displayed) adds a linear amount of processing power (twice the number of repeater fields, twice the time required to get the data). I was hoping that 'autojoin' would join the data (including the repeater data) into some kind of big outer join (to bundle it all in one SQL call), but it doesn't seem to be the case.

So I figure autojoin makes more sense for regular fields, but not for repeaters.

In my case, I guess I should just be conservative with the amount of things i display, and try to simplify/denormalize the structure a bit.

J

ryan · February 7, 2014

You are right that repeaters add overhead. Think of it this way: each repeater item is itself a page. So if you are loading 1 page that has 5 repeater items, then you are actually loading 6 pages. ProcessWire can deal with lots of pages no problem, but if you start dealing with lots and lots of pages with lots and lots of repeaters, then it's good to be aware of potential overhead. Autojoining your repeater won't help, but autojoining the fields in the repeater can help.

Alexander · September 12, 2014

Dear all,

I have 205 pages with repeaters. Each repeater has 21 lines with two fields: date field and checkbox. I use this function to find free chalets for selected days and it's take around 10 seconds. Any suggestion of possible speedup this? Maybe PageTable is faster in this case?



function find($selector) {

	if (wire('sanitizer')->name(wire('input')->get->dates)) {


		$dates = wire('sanitizer')->name(wire('input')->get->dates);

		$dates = strtotime($dates);

		$todate = strtotime("+1 day", $dates); 


		$selector .= "template=chalet";

		$chalets = wire('pages')->find($selector); 


		foreach ($chalets as $chalet) {


		foreach ($chalet->clndr->find('arrival=' . $dates . '|' . $todate . ', booked=0') as $r)

		    { ... }

                }

}

sforsman · September 12, 2014

I take it 'clndr' is the repeater-field? Your code is slow because you are executing a find on each chalet. You should be able to do the whole thing with a single selector (that uses sub-field selectors), e.g.

$selector = "template=chalet, clndr.arrival={$dates}|{$todate}, clndr.book=0";

This should be very fast.

If this doesn't solve your problem please elaborate on your design.

Alexander · September 12, 2014

Sfrosman,

thank you for trying.

This selector returns all chalets with selected dates in repeater and some of booked=0. Another words, it's not possible to use this selector to check booked exactly selected dates or not.

sforsman · September 12, 2014

Edit: Just to make this clear. Let's say you have the following scenario

Chalet A
- Repeater 1
  - Arrival=2014-09-12
  - Booked=1
- Repeater 2
  - Arrival=2014-09-11
  - Booked=0
Chalet B
- Repeater 1
  - Arrival=2014-09-12
  - Booked=0
- Repeater 2
  - Arrival=2014-09-11
  - Booked=0

When searched with the dates 2014-09-11 and 2014-09-12, do you want the selector to return

Chalet A and Chalet B, or
Just Chalet B?

Edited September 12, 2014 by sforsman

Alexander · September 12, 2014

Wow!

Actual it's look little bit different.

Chalet A
- Repeater 1
  - Arrival=2014-09-13
  - Booked=1
- Repeater 2
  - Arrival=2014-09-20
  - Booked=0
Chalet B
- Repeater 1
  - Arrival=2014-09-14
  - Booked=0
- Repeater 2
  - Arrival=2014-09-21
  - Booked=0

When searched with dates 2014-09-13 | 2014-09-14 I want to return only Chalet B because it's not booked.

More than, if forget about booked=0 and use this code it's take around 7 seconds.

$selector = "template=chalet, clndr.arrival={$dates}|{$todate}";

If use search without repeater for normal fields like checkbox, it works as normal.

So, I'm thinking maybe it's better to made hidden copy of all repeaters line to checkboxes like 20140921=0, 20140921=1 and use them for searching.

Edited September 12, 2014 by Alexander

sforsman · September 12, 2014

I now see your problem, Alexander. Unfortunately that is how Repeater-fields work. I would recommend switching to PageTables, that provide the functionality you need out of the box. PageTable is a feature of 2.5, which will be out any day now. You would define your selector like

$selector = "template=chalet, clndr.id=[template=clndr, arrival={$dates}|{$todate}, booked=0]";

PageTables have been available in the dev-branch for quite a while and we are already using them heavily. They are superior to Repeaters in every way. Of course you would have to write a simple script that saves your repeaters to the new PageTable-field - the script would be trivial though and I'd be happy to help with it.

Soma · September 12, 2014

You can also search repeater direct and get the page they're on. Look how the repeater template is named and use that

$selector = "template=repeater-chalet, ...";

then on the result

$chaletpage = $repeaterpage->getForPage();

adrian · September 12, 2014

Of course you would have to write a simple script that saves your repeaters to the new PageTable-field - the script would be trivial though and I'd be happy to help with it.

This should get you going on the conversion if you decide to do that:

https://processwire.com/talk/topic/6654-convert-repeater-to-pagetable/?p=65160

You can ignore the need for a title by making the title hidden and not required in the context of the pagetable template and setting a value for the automatic page name format when setting up the pagetable field.

Sign In

Performance/architecture question

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Create an account or sign in to comment

Create an account

Sign in

Recently Browsing 0 members