
PW with lots of data. Rocks.


Webrocker

Hi,

I just wanted to sing a song of praise for how smartly PW is designed.

From my recent posts here in the forum you may know what I'm currently trying to do in my spare time: I'm porting a fairly popular, hand-built, 16-year-old forum over to ProcessWire. The reason is that the forum's users have grown to like the look of this originally Perl/CGI-based thing, which is unlike the forums out there: in most views it is basically nothing more than a nested list, which then got enhanced with loads of custom pages, functions, galleries and whatnot. 16 years' worth of procedural PHP code, created by at least 4 different and differently skilled developers. Bring it on! :-)

Whatever the reason, I thought this might be the perfect pet project to get to know PW in and out.

And I really have to say: I'm deeply impressed. I have over 15 years of experience with different CMSs. I consider myself fairly skilled in TYPO3 (yes, with those TypoScript riddles), I have known my way around customizing and theming WordPress sites since WP 2.1, and I have messed around with several other popular and not-so-popular CMSs as part of my daily work.

For about 1.5 years now I have been using PW as the "motor" of most sites I build. And now, with this "pet project" of mine, I'm really starting to fall in love :wub: :

  • Being able to bootstrap the API and create scripts that I can "fire" via the command line, directly on the server, connecting to other databases, helped immensely with the task of getting 186000 posts by 5000 users into a ProcessWire install.
  • Creating the relations between these posts (which one is the root of a "discussion", what is the "parent" of the current post) was easily done by using "Page" fields and, again, a script that scraped that info from the old structure and populated the page fields accordingly. I plan to write a detailed how-to once the migration is completed and working, but this could take some time ;-)
  • Now the backend would time out if I wanted to edit one of the posts. Caching the templates didn't help, so I thought about what was going on "under the hood". After a little bit of head scratching it occurred to me that maybe the 4 "Page" fields each post carried were to blame, especially the 2 fields referencing the whole collection of posts: "post_root" and "post_parent", each of which could theoretically be any one of the other 185999 posts in the database. Since the defaults for "Page" fields are a "select" input and "open" visibility, PW needs to query the 185999 posts, and that hit my PHP script execution limit and/or the allocated memory size of 152 MB. Wow. But guess what, and that's where my romance with PW started for real: if you switch the visibility to "closed", or in my case to "closed with ajax on opening", the backend runs fast again. Now the selects will only be populated (with ajax) if I need to change post_root or post_parent, which should never be necessary in the everyday use of that data.
  • Initially I had imported the 4000 users not with the system's user template, but with a custom one. While this worked without a problem, it brought down the lister, which wouldn't respond and timed out after about 5 minutes. I still don't know why, but changing those users to the system user template got rid of that problem, and now they are behaving quite well.
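
The relation step from the second bullet can be illustrated with a small, language-agnostic sketch. It is written in Python rather than against the ProcessWire API, and it is not the actual import script: the function name `resolve_roots` and the input shape (a mapping from post id to parent post id, with `None` for a thread starter) are assumptions made for illustration. The idea is simply to walk each post's parent chain once and remember the result, so that "post_root" can be filled in for every post without repeatedly re-walking long threads.

```python
from typing import Dict, Optional


def resolve_roots(parent_of: Dict[int, Optional[int]]) -> Dict[int, int]:
    """Map every post id to the id of the post that starts its discussion.

    parent_of maps a post id to its parent's id, or None for a thread starter.
    Posts whose parent is missing from the data are treated as roots themselves.
    """
    root_of: Dict[int, int] = {}
    for post_id in parent_of:
        chain = []    # posts visited on the way up whose root is still unknown
        seen = set()  # guards against accidental cycles in the old data
        current = post_id
        while current not in root_of:
            seen.add(current)
            parent = parent_of[current]
            if parent is None or parent not in parent_of or parent in seen:
                # Thread starter, orphan, or cycle: this post is its own root.
                root_of[current] = current
                break
            chain.append(current)
            current = parent
        # Everything visited on the way up shares the root we ended on.
        root = root_of[current]
        for visited in chain:
            root_of[visited] = root
    return root_of
```

In a real migration the resulting mapping would then be written into the two page reference fields per post; how that write happens depends on the import script and is not shown here.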

This is so cool.

What started as a late night idea seems now to grow into a real project. With all that importing and moving imported data to pages data, I had like half a million pages in this install, and PW handled this quite nicely. In the meantime I have cleaned up and currently I'm down to about 186000 pages plus 4000 users plus some static pages. This wouldn't have been possible without the option to use the PW API from the outside.

Thank you, Ryan, for this great piece of software.

Cheers,

Tom


Cool - let us know how you get on with the front-end as I'm always interested to hear about large data projects.

For topic lists on the front-end, pagination is obviously your friend, but also so is autojoining some oft-used fields in your post template - things like topic starter, topic start date, last reply date, last reply user, avatar (just guessing at fields here).

Basically for the most common fields that are always going to be required whichever way you're viewing a topic (full topic or in a list or search results etc) if you autojoin them it saves on queries.

This page explains it better (no idea why I've not bookmarked this one myself yet :)): https://processwire.com/talk/topic/26-what-is-the-autojoin-feature-in-the-fields-editor-and-how-do-i-use-it/
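
To make the "saves on queries" point concrete, here is a toy sketch using Python's built-in `sqlite3`. It is not ProcessWire code and does not claim to show how ProcessWire builds its SQL: the table and column names (`pages`, `field_last_reply`, `pages_id`, `data`) and both function names are assumptions chosen for illustration. It only contrasts loading a field lazily, with one extra query per listed page, against fetching it together with the page in a single joined query.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create a tiny in-memory database: 5 topics, each with one field value."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE pages (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE field_last_reply (pages_id INTEGER PRIMARY KEY, data TEXT);
        """
    )
    for i in range(1, 6):
        db.execute("INSERT INTO pages VALUES (?, ?)", (i, f"topic-{i}"))
        db.execute(
            "INSERT INTO field_last_reply VALUES (?, ?)", (i, f"2015-08-{10 + i}")
        )
    return db


def list_lazy(db: sqlite3.Connection):
    """One query for the pages, then one extra query per page for the field."""
    queries = 1
    rows = []
    pages = db.execute("SELECT id, name FROM pages ORDER BY id").fetchall()
    for page_id, name in pages:
        value = db.execute(
            "SELECT data FROM field_last_reply WHERE pages_id = ?", (page_id,)
        ).fetchone()[0]
        queries += 1
        rows.append((name, value))
    return rows, queries


def list_joined(db: sqlite3.Connection):
    """A single query that pulls the field value in alongside each page."""
    rows = db.execute(
        """
        SELECT pages.name, field_last_reply.data
        FROM pages
        LEFT JOIN field_last_reply ON field_last_reply.pages_id = pages.id
        ORDER BY pages.id
        """
    ).fetchall()
    return rows, 1
```

Both functions return the same rows; the difference is that the lazy variant's query count grows with the number of listed topics, while the joined variant stays at one query per list.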


Hi Pete,

Pete said:

Basically for the most common fields that are always going to be required whichever way you're viewing a topic (full topic or in a list or search results etc) if you autojoin them it saves on queries.

Thanks for the link and the hints regarding autojoin. I was wondering about that option while inspecting my template's fields; now it makes more sense to me. Great! :-)

cheers
Tom


Hi Sergio,

This is a fairly run-of-the-mill managed server. It is hosted by the German provider 1&1, who don't excel at their shared hosting offers, but the servers are quite OK:

  • CPU: Intel® Atom™ C2750, 8 cores x 2.4 GHz (2.6 GHz Turbo Boost)
  • RAM: 8 GB DDR3 ECC
  • Storage: 1,000 GB (2 x 1,000 GB SATA), optionally 240 GB (2 x 240 GB SSD, Intel® S3500), software RAID 1

The issues I encountered were mostly timeouts by either PHP/Apache or the MySQL server. Calling the import scripts via PHP/Apache repeatedly led to a "gateway timeout" after about 5 minutes. I didn't test the import scripts from the console, though.

I maxed out the php.ini settings, but it seems that the server has some settings that I cannot override; for example, the maximum allocatable memory for PHP is 152 MB, no matter what I set in php.ini. And the MySQL server will time out after about 5 minutes, even with settings as high as 1 hour or more in the config.

cheers
Tom


Webrocker said:

Now the backend would time out if I wanted to edit one of the posts. Caching the templates didn't help, so I thought about what was going on "under the hood". After a little bit of head scratching it occurred to me that maybe the 4 "Page" fields each post carried were to blame, especially the 2 fields referencing the whole collection of posts: "post_root" and "post_parent", each of which could theoretically be any one of the other 185999 posts in the database. Since the defaults for "Page" fields are a "select" input and "open" visibility, PW needs to query the 185999 posts, and that hit my PHP script execution limit and/or the allocated memory size of 152 MB. Wow. But guess what, and that's where my romance with PW started for real: if you switch the visibility to "closed", or in my case to "closed with ajax on opening", the backend runs fast again. Now the selects will only be populated (with ajax) if I need to change post_root or post_parent, which should never be necessary in the everyday use of that data.

Template caching won't help, since it is purely for the frontend. Are you using a select as the inputfield for post_root and post_parent? For page fields where you can have very many options (like thousands), you really should use an inputfield that doesn't load all pages, like autocomplete. Editing the value also becomes easier than with a dropdown of 185999 items :)


The autocomplete and pagelist inputfields are the ones that load their pages via ajax and/or paginated, so these are the go-to inputfields for this number of pages. What I would really dig is an inputfield that just opens a modal window with a lister, where you can filter available pages and select/deselect them. This would be the killer inputfield for large numbers of selectable/selected pages.


Hi,

Good call, I will check out the autocomplete input for the backend. For the moment I'm quite happy with the backend performance.

Currently I'm tackling the problem of how to get the _contents_ of 186000 html post files into the body field of my post pages :)


  • 1 year later...
On 2015-8-17 at 6:44 PM, LostKobrakai said:

What I would really dig is an inputfield that just opens a modal window with a lister, where you can filter available pages and select/deselect them. This would be the killer inputfield for large numbers of selectable/selected pages.

Here it is... version 3 of Visual Page Selector has this feature ;)

 

