
Lazy loading of pages


apeisa

Would it be possible to "lazy load" pages, so that PW wouldn't load whole pages into memory, but just their IDs, for example? That would allow the PW API to be used in more scenarios (like when you want to have thousands of items in memory).

I guess the problem is that to allow those nice selectors we need to have whole Page objects in memory...?


Not sure what it does, but the Page object is pretty heavy; I cannot keep that many in memory. Probably because of all the other stuff it carries in addition to the fields.

I see this as a little problematic when trying to use PW for things other than websites (i.e. building web apps), especially because PW has a DB table per field (and not per template), meaning that building custom queries doesn't work the way most of us are used to.

Btw: if I print_r($page) on the homepage, the result is massive. Does it parse every page there, or why is that?

Edited by apeisa

PW loads stuff on demand, so if you print_r $page, then that's going to trigger it to load a lot of stuff and perhaps even an infinite loop. It's best not to print_r or var_dump $page objects for this reason.

A $page object is pretty light in its default state. It gains weight based on the number of autojoin fields you have. Following that, all fields you access are loaded on demand and then become part of the page's data. To keep pages as lightweight as possible, you'd want to limit your autojoin fields to only those that you need for every instance of a $page (like the 'title' for example).

While a $page is intended to be lightweight, it's obviously going to be heavier than just an ID. And by that logic you can store more IDs in memory than you could pages. Though it's rare that I would do so because an ID rarely provides much value on its own. But there are cases, and here's how you can do it.

You can query any field_* table and expect that it will have a pages_id field. So if I wanted to find all pages that had the title 'Templates', I could do this:

$ids = array();
$result = $db->query("SELECT pages_id FROM field_title WHERE data='Templates'"); 
while($row = $result->fetch_assoc()) {
  $ids[] = $row['pages_id'];
}

// ...if you later want to convert it to Page objects, there is a getById function:
$matches = $pages->getById($ids); // returns a PageArray

Note that the getById function performs faster if it knows which template the pages use. It is an optional param that can be specified like this: $pages->getById($ids, $template);

Here's another scenario. Let's say that you want to do something like a $pages->find(), but get the IDs rather than the Page objects. Here's how you can do that:

$finder = new PageFinder();
$query = new Selectors("title=Templates, template=basic-page, sort=-created"); 
$results = $finder->find($query); // $results is array

$ids = array();
foreach($results as $result) {
   // each result includes: id, templates_id, parent_id and score (if rank sorting)
   $ids[] = $result['id']; // collect just the page ID
   echo "<li>Found Page ID $result[id] using template ID $result[templates_id]</li>";
}

That's how you can use the PageFinder class, which isn't part of the public API, but may be useful in certain situations where you only need to interact with IDs. However, keep in mind that when you use this, you are bypassing PW's caching and such. So if you are using this to ultimately generate Page objects then you will likely lose performance by going this route. But if you have a use for thousands of page IDs and don't need to turn them into Page objects, this would be the way to go.


Thanks Ryan, exactly the information I was looking for.

My scenario is that I need to parse child pages. There might easily be many thousands of those children, all using the same template. There is one particular page reference field on that template that I am interested in. Let's call it referenced_page. What I need is all the unique "referenced_pages" that are used in the child pages.

This is one way to do it, but as the page count goes up, this becomes impossible:

$a = array();
foreach($page->children() as $p) {
 $a[$p->referenced_page->id] = $p->referenced_page->title;
}

(Currently I ended up doing it the other way around: first loading all pages that can be referenced_pages, then checking whether any of $page->children() has it as referenced_page, then moving on to the next. The reason is that there are far fewer possible referenced_pages than there are $page->children. This works for now, but it got me thinking about a better way of doing all this.)

I will definitely play with your examples to see if I can make this faster. Although I might be looking at the solution from the wrong angle... The ultimate solution would be something like this:

$children = $page->children("referenced_page.unique");
foreach($children as $p) {
 $a[] = $p->referenced_page->id;
}

Not sure I totally understand the example, but maybe something like this would do it?

$sql = <<< _SQL

   SELECT pages.id, ref_page.id, field_title.data
   FROM pages 
   JOIN field_referenced_page AS ref_field ON ref_field.pages_id=pages.id
   JOIN pages AS ref_page ON ref_field.data=ref_page.id
   JOIN field_title ON ref_page.id=field_title.pages_id 
   WHERE pages.parent_id={$page->id}
   GROUP BY ref_page.id

_SQL;

$a = array();
$result = $db->query($sql);
while($row = $result->fetch_row()) {
   list($id, $ref_id, $ref_title) = $row; 
   $a[$ref_id] = $ref_title; 
}

Written in the browser without data to test on, so it may not be in a working state, but something like this may provide the result you are looking for.


How about markup cache?

Edit: Apart from that, your initial thought about going from the referenced_pages side makes sense. Cycle through all of those pages, check whether any children are found, then add the page and break. I just thought that if you're going to use it as a list on a page, why not use the simple API, cache the markup and only rebuild it every hour? Of course you can't beat raw SQL, but I think at some point (page count) you'll want to use a cache anyway.


Soma, this is totally different: I have built a voting application (used for elections etc.), where I save each vote as a page. There are different kinds of voting mechanisms, since in some elections each voter can give something like 40 votes. So if I have 300 people voting, it can create 300 * 40 = 12,000 votes. What I need in this situation is those 300 voters, not all those votes. All this data needs to be real-time, so a cache is of no use. Also, the original problem was that I was running out of memory, so a cache doesn't help here. Sorry for being mysterious about my use case - I tried to simplify it, but ended up complicating it for you guys...

I got this working nicely. This is what I ended up using (basically stripped down from Ryan's example SQL):

SELECT ref_page.id
   FROM pages
   JOIN field_v_voter AS ref_field ON ref_field.pages_id=pages.id
   JOIN pages AS ref_page ON ref_field.data=ref_page.id
   WHERE pages.parent_id='$id'
   GROUP BY ref_page.id;

In this case I have 4500 votes and 244 voters. It doesn't even blink an eye to load, even if I end up loading each of those voters. It was pretty much the opposite result when I tried to load those 4500 pages into memory and then those 244 in addition...

Also, it was very helpful to see that it isn't that bad or mysterious if you need to go down to the SQL level here. In simple cases like this the SQL isn't that difficult after all. Although now it seems clear that in this particular application it would have been a much better idea to use a custom table instead of pages. The SQL would have been this:
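
For anyone who wants to try the query above without a PW install, here is a minimal, self-contained sketch. It assumes an in-memory SQLite database standing in for PW's MySQL tables; the table and column names follow the query above, but the rows are invented for illustration, and the parent ID is bound as a parameter instead of being interpolated into the string.

```php
<?php
// Sketch only: in-memory SQLite stands in for ProcessWire's MySQL database.
// Table/column names mirror the query above; all rows are made-up sample data.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE pages (id INTEGER PRIMARY KEY, parent_id INTEGER)');
$pdo->exec('CREATE TABLE field_v_voter (pages_id INTEGER, data INTEGER)');

// Page 1 is the election, pages 10 and 11 are voters,
// pages 100-103 are votes stored as children of the election.
$pdo->exec('INSERT INTO pages (id, parent_id) VALUES
    (1, 0), (10, 2), (11, 2), (100, 1), (101, 1), (102, 1), (103, 1)');
// Each vote page references its voter through the v_voter field.
$pdo->exec('INSERT INTO field_v_voter (pages_id, data) VALUES
    (100, 10), (101, 10), (102, 11), (103, 10)');

$stmt = $pdo->prepare(
    'SELECT ref_page.id
     FROM pages
     JOIN field_v_voter AS ref_field ON ref_field.pages_id = pages.id
     JOIN pages AS ref_page ON ref_field.data = ref_page.id
     WHERE pages.parent_id = :parent_id
     GROUP BY ref_page.id
     ORDER BY ref_page.id'
);
$stmt->execute([':parent_id' => 1]);

// One row per distinct voter, no matter how many votes each one cast.
$voterIds = array_map('intval', $stmt->fetchAll(PDO::FETCH_COLUMN));
echo implode(', ', $voterIds), PHP_EOL;
```

The four sample votes collapse into two voter IDs, and only those IDs ever reach PHP, which is why memory use stays flat as the vote count grows.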

SELECT voter_id
   FROM votes
   WHERE election_id='$id'
   GROUP BY voter_id;

Also, it might be a nice idea to add a "who has already voted" table to keep those IDs.
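
A rough sketch of how that custom-table idea could look, again assuming in-memory SQLite so it runs standalone. The "voted" table name, the candidate_id column and the castVote() helper are all hypothetical, not anything PW provides. Note that INSERT OR IGNORE is SQLite syntax; in MySQL the equivalent would be INSERT IGNORE.

```php
<?php
// Sketch only: hypothetical custom tables, in-memory SQLite for easy testing.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE votes (
    id INTEGER PRIMARY KEY,
    election_id INTEGER NOT NULL,
    voter_id INTEGER NOT NULL,
    candidate_id INTEGER NOT NULL
)');
// One row per (election, voter); the composite key prevents duplicates.
$pdo->exec('CREATE TABLE voted (
    election_id INTEGER NOT NULL,
    voter_id INTEGER NOT NULL,
    PRIMARY KEY (election_id, voter_id)
)');

// Hypothetical helper: store a vote and remember that the voter has voted.
function castVote(PDO $pdo, int $electionId, int $voterId, int $candidateId): void
{
    $pdo->beginTransaction();
    $pdo->prepare('INSERT INTO votes (election_id, voter_id, candidate_id) VALUES (?, ?, ?)')
        ->execute([$electionId, $voterId, $candidateId]);
    // Silently skipped when this voter is already recorded for the election.
    $pdo->prepare('INSERT OR IGNORE INTO voted (election_id, voter_id) VALUES (?, ?)')
        ->execute([$electionId, $voterId]);
    $pdo->commit();
}

castVote($pdo, 1, 10, 500);
castVote($pdo, 1, 10, 501); // same voter, second vote
castVote($pdo, 1, 11, 500);

// Listing voters no longer needs GROUP BY over the whole votes table.
$stmt = $pdo->prepare('SELECT voter_id FROM voted WHERE election_id = ? ORDER BY voter_id');
$stmt->execute([1]);
$votedIds = array_map('intval', $stmt->fetchAll(PDO::FETCH_COLUMN));

$voteCount = (int) $pdo->query('SELECT COUNT(*) FROM votes')->fetchColumn();
echo "$voteCount votes from voters: ", implode(', ', $votedIds), PHP_EOL;
```

The trade-off is one extra write per vote in exchange for a voter list that can be read directly from a small table.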

Thanks guys again!


  • 9 months later...

Hi everyone - still very new to PW, but I am coming up against this decision too.

First just let me say how awesome I think the pages model works for the structure of a typical website, but when it comes to a web app (the distinction between a site and an app is often vague in my mind), I am wondering if perhaps it wouldn't be better to go with a custom database table to store user submitted data and use SQL (which I am comfortable with).

I do feel at times like the PW pages model and API are making things more cumbersome when it comes to data querying and manipulation (but quite possibly this is just my inexperience with it still). The database is starting to fill up with so many tables. Maybe this really isn't something I should be concerned about, but it feels inefficient: it seems like there must be lots of joins going on to pull all this data together.

I am certainly no database guru, so maybe this is fine, but are there any guidelines/use examples as to when it might be more appropriate to go with custom tables?

Thanks


I think it just depends on what you are trying to do. But as a guideline, only consider using your own custom tables if the data you are pulling doesn't need to be represented by unique URLs. Pages are designed to be represented by URLs, though they don't always have to be. But if you've got complex data that is large in quantity or fields, and individual records don't benefit by each having their own URL, then you may find it more efficient to use alternate data storage like SQL (if you are comfortable doing so). If you want the data to be managed in ProcessWire admin and queried from PW's API, you can create custom Fieldtypes by extending Fieldtype or FieldtypeMulti. An individual Fieldtype can represent as complex of a table structure as you want it to. 
