Jump to content

Page reference scalability


AndZyk
 Share

Recommended Posts

Hello,

I am currently building a intranet with hundreds of posts. One planned feature is, that each logged-in user can see what post he has already read.

So I have added a page reference field „readPosts“ to the user template and save each post as page to this page reference field when visiting the post.

This could lead to hundreds of pages in this single page reference field. Because of this I was wondering, if anybody has experience in how good the page reference field can scale? Can it handle hundreds of page references or could there be a downside to the performance? As far as I can see, in the database only the user id, page id and sort number would be saved. So theoretically there would be no performance issue.

The alternativ would be to save the user directly in the post as page reference. Would that be the better solution? ?

Regards, Andreas

Edited by AndZyk
Fixed typo
Link to comment
Share on other sites

I use a similar setup within a product catalogue. The average is about ~60 referenced pages. Performance is still perfect.

How many users are there that could read those hundreds of pages?

Maybe you could save the users that read a page to that page instead.

 

  • Like 1
Link to comment
Share on other sites

Hello @wbmnfktr,

thank you for your answer. Good to know, that the page reference field can handle many page references without a performance issue.

The most amount I had in one page reference field was about 100, but only temporarily. So I don't if there is a problematic amount.

But you are right, that it would be better to save the users to the post directly. Because there are probably around one hundred users, but over time the posts could grow in the thousands.

So one thousand references in one user page reference field is worse than one hundred references in one post reference field I think. ?

Regards, Andreas

Link to comment
Share on other sites

Another idea or maybe even a solution for this scenario.

I never used it in any of my projects so far but I know that @bernhard's RockGrid(?) or at least one of his modules is really really fast while handling LOTS of data and could possibly handle even a few thousand referenced pages. BUT to be honest I don't know if his module only works in the backend or if it can be a problem solver in the frontend, too.

Link to comment
Share on other sites

I'd build a custom DB table that holds the id of the page and the id of the user that visited that page.

 page | user
 -----|-----
  1   | 41
  2   | 41
  3   | 41

Get all pages that have been visited by the admin?

SELECT page FROM visitedpages WHERE user = 41

Get all users that visited page 1?

SELECT user FROM visitedpages WHERE page = 1

Executing custom SQL is as easy as that:

k9AQCmx.png

Thousands of pages? No Problem ? Thousands of users? No Problem ? 

PS: If you only want an array of all page ids:

ycUYD58.png

  • Like 5
Link to comment
Share on other sites

4 hours ago, AndZyk said:

if anybody has experience in who good the page reference field can scale?

My only concern would be that a lot of page references can slow down page-loading in edit-mode in the admin. But just for updating / creating / reading such values, I don't think performance would suffer. You can always completely hide these PR fields in the admin (or choose to only load via AJAX when opened).

  • Like 1
Link to comment
Share on other sites

28 minutes ago, dragan said:

My only concern would be that a lot of page references can slow down page-loading in edit-mode in the admin. But just for updating / creating / reading such values, I don't think performance would suffer. You can always completely hide these PR fields in the admin (or choose to only load via AJAX when opened).

I'm not so sure about that. PW often treats pages as... well... pages. Not plain ids. So it could happen that you load all those pages into memory which would be a huge overhead and could drastically slow down your site. You can even build your own method that gets all users that visited a page:

if($page->wasViewedBy($user)) {
  echo "You have read this post";
}
else {
  echo "Wow! We have new content for you ;)";
}

Add a timestamp to your DB table and you can even collect statistics about page visits... Not complicated at all. Try to do that with PW page reference fields ?

If you want something future-proof: Overcome your fears and build a custom DB table ? 

Link to comment
Share on other sites

50 minutes ago, bernhard said:

I'd build a custom DB table that holds the id of the page and the id of the user that visited that page.

Thank you for this example. That would be of course the most performant solution. For now I will stick with a page reference field, because it's easier for me to handle on the API side. But if performance will ever be an issue, I will keep this solution in mind. ?

42 minutes ago, dragan said:

My only concern would be that a lot of page references can slow down page-loading in edit-mode in the admin.

This will be most likely no issue, because I think this field will be hidden in the edit-mode for data protection.

6 minutes ago, bernhard said:

If you want something future-proof: Overcome your fears and build a custom DB table ? 

Thats true, I have to try out building own data tables some time. Till now I didn't have a need for this, because its so easy to build complex data models with ProcessWire. ?

Link to comment
Share on other sites

@AndZyk, a Page Reference field doesn't scale well because all pages are loaded into memory and there is no ability to limit the loaded pages or paginate the inputfield in admin.

Back in 2016 Ryan said it was likely that pagination would be added to several core fieldtypes but it hasn't happened yet: https://processwire.com/blog/posts/fieldtype-pagination/

The Page Reference fieldtype/inputfield is the one that would benefit the most from this. I have an open request for this that you could add your voice to: https://github.com/processwire/processwire-requests/issues/13

  • Like 3
Link to comment
Share on other sites

12 hours ago, bernhard said:

What exactly is easier? I'd be happy to get an example

Currently I am using has() and a selector on the page reference field readByUsers:

if (!$post->readByUsers->has(user())) {
	// Do something
}

if (count($child->find("template=post, readByUsers!={$user}")) {
	// Do something
}

Probably that is not do difficult to write as SQL query, but I have not much experience with writing SQL. ?

I would be happy about an example.

1 hour ago, Robin S said:

@AndZyk, a Page Reference field doesn't scale well because all pages are loaded into memory and there is no ability to limit the loaded pages or paginate the inputfield in admin.

Thank you for the hint. Do the pages get loaded in the admin even if the field is hidden completely (with the field setting) in the edit-page? In that case, this could be an issue.

Link to comment
Share on other sites

19 hours ago, bernhard said:

PW often treats pages as... well... pages. Not plain ids. So it could happen that you load all those pages into memory which would be a huge overhead and could drastically slow down your site.

I'm glad you believe @Robin S if you didn't want to believe me ? 

Just create a custom $page method that does your logic, then you have your own performant API. That should get you started:

IXZaaAb.png

$wire->addHook("Page::readByUser", function($event) {
    $result = $this->database->query("SELECT id,templates_id FROM pages LIMIT 5");
    $ids = $result->fetchAll(\PDO::FETCH_OBJ);
    
    $user = $event->arguments(0);
    d("check for user $user");
    
    $event->return = $ids;
});
d($page->readByUser($user));

 

  • Like 4
Link to comment
Share on other sites

Just for the record (not saying that it would be the best solution overall), FieldtypePageIDs also has certain benefits if/when the problem is loading too many Page objects into memory. It could be worth looking into in this sort of situation ?

  • Like 6
Link to comment
Share on other sites

12 hours ago, AndZyk said:

Do the pages get loaded in the admin even if the field is hidden completely (with the field setting) in the edit-page?

If I use Tracy to do a test dump in FieldtypePage::wakeupValue (which is where the stored IDs are loaded into a PageArray) then it seems that this method executes regardless of the visibility of the field in Page Edit.

And it's not Page Edit that's the main problem anyway. If you have a field then presumably you want to work with that field, but whenever you do...

$page->my_pr_field

...then all the pages are loaded to memory. So for instance if you want to add a page via the API...

$page->my_pr_field->add($p)

...then you are automatically loading all the pages to memory.

@bernhard's suggestion and @teppo's module are good workarounds but I think it's important to get a core solution to this. PW has an "everything is a page" philosophy, so pages can be any unit of data, not necessarily a viewable page of content. And Page Reference fields are the fundamental way to make connections between pages in PW. So when you have thousands of pages that need to have connections to thousands of other pages then you have to start looking beyond the core fieldtypes which is a limitation for PW.

Sometimes you can do what @wbmnfktr suggested and reverse the connection, putting the PR field in the other template to reduce the number of pages stored per field. But this isn't always possible. In terms of concrete examples, imagine you were using PW to build some kind of Spotify-like database that tracks users' listening habits. There could be thousands of users who listen to thousands of songs, so either way you cut it you want to store thousands of pages in a PR field.

  • Like 6
Link to comment
Share on other sites

On 11/22/2019 at 3:04 PM, bernhard said:

I'm glad you believe @Robin S if you didn't want to believe me ? 

Just create a custom $page method that does your logic, then you have your own performant API. That should get you started:

Of course I believe you. Thank you for this method, I will try it out tomorrow. ?

On 11/22/2019 at 3:34 PM, teppo said:

Just for the record (not saying that it would be the best solution overall), FieldtypePageIDs also has certain benefits if/when the problem is loading too many Page objects into memory. It could be worth looking into in this sort of situation ?

Thank you for mentioning this module. I haven't heard about it until now, but this could also be a solution. ?

On 11/22/2019 at 11:03 PM, Robin S said:

@bernhard's suggestion and @teppo's module are good workarounds but I think it's important to get a core solution to this. PW has an "everything is a page" philosophy, so pages can be any unit of data, not necessarily a viewable page of content. And Page Reference fields are the fundamental way to make connections between pages in PW. So when you have thousands of pages that need to have connections to thousands of other pages then you have to start looking beyond the core fieldtypes which is a limitation for PW.

Yes, this would be a great enhancement for the page reference fields. I hope @ryan will consider adding this into the core.

 

Thank you all for your help.

Regards, Andreas

Link to comment
Share on other sites

On 11/22/2019 at 8:30 AM, Robin S said:

a Page Reference field doesn't scale well because all pages are loaded into memory and there is no ability to limit the loaded pages or paginate the inputfield in admin.

This is really the problem: the fieldtypes/inputfields having no support for pagination. I hit that on a project of mine, where users are linked to tickets. Given that the event in question does attract up to around 10k users registering I needed to hide the inputfield completely, because even just showing the value uneditable would load all the users when showing the ticket.

  • Like 2
Link to comment
Share on other sites

Since @ryan most likely won't read this thread, maybe someone should open an issue (with label "enhancement") here: https://github.com/processwire/processwire-requests/issues?q=is%3Aopen+is%3Aissue+label%3Aenhancement

Most probably, this would mean a huge refactoring of lots of core-code, but hey, we would all profit from a performance boost in situations mentioned in this thread.

Link to comment
Share on other sites

1 hour ago, dragan said:

maybe someone should open an issue

What we're discussing is a feature request rather than a bug to report, and there already is the feature request I linked to in my earlier post.

I've added a comment linking to this thread but like all such things Ryan will just get to it when he gets to it. This is feature request 13 of 269 and counting. ?

  • Like 7
Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

×
×
  • Create New...