
MySQL Database to ProcessWire Pages. Opinions?


theo

Hello

I would like to hear your opinions on the pros and cons of migrating a MySQL database to PW pages.
If there is already such a thread, please redirect me.

Think of the skyscraper demo but bigger. ;-)  http://demo.processwire.com/

A data administration tool with hierarchies of objects, persons, news, etc., where everything is somehow linked together.

This MySQL database already exists, and I'm asking myself whether I should write a new "admin" backend for it (using the existing DB schema) or import everything as PW pages.

I can imagine the following:

PRO PW Pages
- Creating the "admin" backend with PW is easier than writing an "admin" from scratch.
  With Repeaters and the Page Fieldtype it should be possible to link things together.
  Tools for image/file uploads, CKEditor, etc. are already there.
- Querying the data for the frontend should be easy.
- Everything has the same look: object and person administration as well as more webpage-centric pages (home / about..).

CONTRA PW Pages
- A more bloated database with many field types. Probably slower.
- A kind of vendor lock-in. If in a few years you would like to move away from PW, exporting the whole thing, including relations, back to MySQL is possible but a lot of work.

What do you think? What would you do? Do you think of a third way?

I'm looking forward to reading your opinions. :)

 


on the CONTRA side:

- I personally wouldn't care much about database bloat - how many records?
- speed may not be an issue; it depends on how you are using the records, and whether you have caching on the front end
- vendor lock-in: you can easily write a script to export the records into CSV files and import them into a new database (a minimal sketch follows below); also, the data might change a lot over time, and what if you want to add fields to data types, etc.?
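
For illustration, a minimal sketch of such a CSV export via the PW API (the "person" template and its fields are hypothetical; adjust to your own schema):

// hypothetical template/field names; findMany() keeps memory usage flat on large sets
$fp = fopen('export.csv', 'w');
fputcsv($fp, ['id', 'title', 'firstname', 'lastname']); // header row
foreach ($pages->findMany('template=person') as $p) {
    fputcsv($fp, [$p->id, $p->title, $p->firstname, $p->lastname]);
}
fclose($fp);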

on the 3RD way:

you can leave the data in MySQL and access it from ProcessWire, but you can't really edit it that easily unless you start writing custom modules/interfaces and doing SQL queries.
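
A sketch of that read-only access from a template (connection details and table names here are pure assumptions):

// all credentials and names below are placeholders
$pdo = new PDO('mysql:host=localhost;dbname=legacy_db;charset=utf8mb4', 'dbuser', 'dbpass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$stmt = $pdo->prepare('SELECT id, name FROM persons WHERE name LIKE ? LIMIT 20');
$stmt->execute(['%alex%']);
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    echo '<p>' . htmlspecialchars($row['name']) . '</p>';
}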


16 hours ago, theo said:

Think of the skyscraper demo but bigger. ;-)

It depends on how big is big. In one of our bigger projects, we designed the whole system around ProcessWire's page-templates-fields paradigm. It was designed in such a way that it looked almost like a normalized relational database, with PageReference providing the "relationships", but with each page easily editable in the admin. This was working well, until it wasn't. I'm guessing we "normalized" the data too much, because we were hit with performance issues when we started making big queries, especially with aggregates and traversals. Of course this wasn't ProcessWire's fault but our own, because we wanted so much to be able to code using mostly PW's API and very minimal raw SQL.

Our intermediate fix was to write the huge queries manually in SQL. But after a while the solution felt dirty, so in the end we refactored our data, pulled it out of ProcessWire's database, and moved it to a separate database. We used a half-and-half ORM solution to access the data and wrote custom modules to edit it on the backend, as @Macrura suggested.

 

 

So what I'm trying to say, TL;DR, is:

  • If you want to use PW to manage your database, it'll be great.
  • It's not how "big" your data is, but how complex the relationships will be. And maybe not even that. It depends on how much you want to rely on the API to get your data. We were doing all well and good until we wanted to get something like $page->parent->child->children()->child on some reports, or something similar, which of course resulted in hundreds of JOINs; see the sketch after this list. (Don't do that, seriously, we were stupid. Lazy and stupid :undecided:)
  • But if it's something where you're doing huge queries and reports, it will work, but it will not be the fastest unless you start writing custom SQL queries.
  • If you're lazy like me, use an ORM if you decide to use a separate database. But this of course comes with an overhead: a learning curve, plus writing the custom modules for the admin.
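
For the record, one way to sidestep such chained traversals is to flatten them into a single selector, so PW issues one query instead of walking the tree in PHP. A sketch with made-up template names:

// hypothetical templates: "transaction" parents with "line_item" children
$transaction = $pages->get(1234); // example transaction page; the ID is made up
$items = $pages->find("template=line_item, has_parent=$transaction"); // one find() instead of chained parent/children calls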

I hope this helps. :/


I think this heavily depends on the use cases and the size of the project (e.g. amount of data, people involved). I am currently working on a project where I started doing everything from scratch, but then moved to PW because it saves a lot of time on standard functionality (e.g. permissions, login, caching, security in general).

I have also worked for several companies building SaaS platforms - for these kinds of use cases, PW would not be the best choice. The use cases there were much too special, needed a lot of DB tweaking, relied on different DB types (e.g. Elasticsearch, Postgres, Lucene), and so on.

If you want to save time by having standard functionality out of the box, I'd use PW. If you have enough capacity to build this on your own because you expect to scale so fast that PW is simply the "wrong" framework, I'd think of a pure framework.

But as said: It's easy to export data from PW, so this might be a third option: start with PW, scale fast, move to something else ;)


@chrizz : Thank you.

Another aspect: What is your experience with PW regarding data loss or problems after upgrades, etc.?

Especially with Page References, Repeaters, Page Tables, etc.

No problem? Nothing that magically disappears? This would be really bad.


@theo I've experienced transferring data both to and from ProcessWire with no major issues, and this was before the new import/export features, which I haven't tried yet.

Exporting data from ProcessWire is straightforward using CSV or an SQL dump. One thing to watch out for when transferring to ProcessWire (though it might not be a problem anymore with the new import/export features) is making sure the order of templates is correct, meaning you have to first import the templates that are needed by other templates through PageReference.

But all in all, it's relatively painless.

 


8 hours ago, alxndre said:

$page->parent->child->children()->child

???

@alxndre Thanks for the insights. So what you are saying is you still used PW, but the really large data sets are in MySQL, and to gain query speed you used LessQL? Or did you even choose another DB engine to optimize performance?

Another question: Did you compare PW queries vs. native MySQL queries? Or native MySQL queries vs. LessQL?

"How big is big"? Care to give us an idea? 100'000+, millions?

btw, there are similar discussions in the forums... @bernhard's CRM case study comes to mind, and a new module in the works.

 


5 minutes ago, dragan said:

So what you are saying is you still used PW, but the really large data sets are in MySQL, and to gain query speed you used LessQL?

We still used ProcessWire in areas where it makes sense, like user management, presentation-related stuff, routing, the web API, etc. We moved all business objects to a separate database (MariaDB, InnoDB engine) and used LessQL to simplify data access. We specifically picked LessQL because working with it still somehow feels like working with the PW API.

$people = $lessQL->person()->select("id, firstname, lastname")->where("firstname LIKE ?", "%alex%")->orderBy("firstname")->limit(10);

 

About this:

$page->parent->child->children()->child

Well, this is an exaggeration, but we really did have something similar. We worked with a C# ORM before this, where it was perfectly okay to do:

// pseudocode
totalForThisOrderId = person.orders(id).items().sum(price)

And as I've said, this was not a problem with ProcessWire itself but with the way we designed our data structure in the PageTree, where we had:

Transactions
  \_ Transaction #1
      \_ Line Item #1
      \_ Line Item #2
  \_ Transaction #n
      \_ etc...

So now, to get the sum of the line items of the transaction a given line item belongs to, we'd write:

$sumTransactions = $lineItem->parent->children()->sum('price');

It's far from ideal, but it was the quickest way at the time to code based on our previous practice.
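
For comparison, the same aggregate as a single SQL query against the separate database (table and column names are invented for this sketch):

// placeholder connection and schema, for illustration only
$pdo = new PDO('mysql:host=localhost;dbname=business_db', 'dbuser', 'dbpass');
$stmt = $pdo->prepare('SELECT SUM(price) AS total FROM line_item WHERE transaction_id = ?');
$stmt->execute([1234]);
$total = (float) $stmt->fetchColumn(); // one query, no page loading at all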

 

33 minutes ago, dragan said:

Care to give us an idea? 100'000+, millions?

We're currently at around 50 million records, I think. But like I said, ProcessWire gracefully scales with this. It only starts slowing down when you try to aggregate or count something huge. We were doing reports on a hundred thousand records at a time. It still worked, but it was far from fast -- which was mostly our fault. Using LessQL with ProcessWire allowed us to transition without starting over from scratch, and still have all the benefits of ProcessWire that made us decide to use it in the first place.


Hmm, I am running some tests.
With this code:

$time_start = microtime(true);
$count = 0;

foreach ($pages->find('template=tpl_families') as $fam) {
    echo '<h4>' . $fam->title . '</h4>';
    foreach ($fam->children() as $model) {
        echo '<h5>' . $model->title . '</h5>';
        foreach ($model->children() as $size) {
            echo '<h6>' . $size->title . '</h6>';
            $count++;
        }
    }
}

$time_end = microtime(true);
$time = $time_end - $time_start;

echo 'Objects : ' . $count;
echo ' / Execution time : ' . $time . ' seconds';

Returns:

Objects : 794 / Execution time : 1.4819648265839 seconds

Not really fast for just echoing the titles.

OK, it's only running on an i3, but it should have enough memory.

How can I speed it up? Or do you think this speed is OK?

Thank you.


This is a little bit faster, but just a little.
 

$time_start = microtime(true);
$count = 0;

foreach ($pages->find('template=tpl_sizes') as $size) {
    echo '<h4>' . $size->parent->parent->title . '</h4><h5>' . $size->parent->title . '</h5><h6>' . $size->title . '</h6>';
    $count++;
}
$time_end = microtime(true);
$time = $time_end - $time_start;

echo 'Objects : ' . $count;
echo ' / Execution time : ' . $time . ' seconds';

 


@kongondo:

Thanks, just in time. ;)

I just wanted to ask why this works...

foreach ($pages->find('template=my_tmpl, limit=2500') as $endo) {
    echo '<p>' . $endo->title . '</p>';
}

...and this returns nothing (higher limit):

foreach ($pages->find('template=my_tmpl, limit=3000') as $endo) {
    echo '<p>' . $endo->title . '</p>';
}

Using findMany it works. :)


1 hour ago, theo said:

and this returns nothing (higher limit):


foreach ($pages->find('template=my_tmpl, limit=3000') as $endo) {
    echo '<p>' . $endo->title . '</p>';
}

No error thrown? I haven't checked, but I'm guessing maybe a limit on the number of joins SQL allows? (The title field is auto-joined.)


8 minutes ago, kongondo said:

No error thrown? I haven't checked, but I'm guessing maybe a limit on the number of joins SQL allows? (The title field is auto-joined.)

No, nothing. It's as if the server hung up.
The browser's load indicator stops and the old page remains.

Thanks.


16 minutes ago, kongondo said:

I would have placed my bet on memory then, but you say there are no errors? $config->debug is true?

Yes, debug mode is on.

It's not a problem since I know about "findMany", it is just a little scary.

ProcessWire 3.0.93 / PHP 7.0.7


Hmm, I think for certain queries it is indeed better to use the database directly. The speed difference is remarkable (~64x).
These two snippets produce the same output:

$result = $this->db->query("SELECT id, data FROM pages LEFT JOIN field_title ON pages_id = pages.id WHERE templates_id=54 ORDER BY data limit 5000");
while($row = $result->fetch_array()) {
    echo '<p>' . $row['data']. '</p>';
    $count++;
}
//Objects : 4067 / Execution time : 0.042616128921509 seconds

 

foreach ($pages->findMany('template=my_tpl, sort=title, limit=5000') as $item) {
    echo '<p>' . $item->title . '</p>';
    $count++;
}
//Objects : 4067 / Execution time : 2.7485032081604 seconds

 


Without trying to invalidate the end result – MySQL + joins will probably be faster – your example code is comparing apples to oranges. $pages->findMany() does quite a few more things than just a DB query for the data, even if it's just constructing the Page objects and putting stuff in different caches. Also, I'm not sure your title field is set to autojoin? If not, that means you're comparing a single DB query to 4068 DB queries (1 for the items and one for each title). In cases where you're iterating a lot of pages, you'll need to keep an eye on the query count and not fall into N+1 problems.


@LostKobrakai:

I am not comparing these code snippets technically.
I'm totally aware that they do different things "behind the scenes".
But I don't think it is bad to be aware of the fact that there might be a faster way to access PW data in certain cases,
for example for data export (CSV) or the like.


Autojoined (autoload) fields are loaded into memory every time a page is loaded, in contrast with non-autojoined fields, which are fetched from the database only when they are accessed.

// if title is autojoined (by default)

$something = $pages->get(123); // get a page from db
echo $something->title; // no extra db hit because it was already fetched when the page was loaded
echo $something->else; // pw will query the db to get the field

 


@alxndre:

Ah. Thank you.

Is there an elegant way to define autojoin fields on-the-fly before using $pages->findMany() or in the selector?

I know I could probably do it like this example:

foreach ($fields->find("name=myfield1|myfield2|myfield3") as $f) {
    $f->addFlag(Field::flagAutojoin);
    $f->save();
}

Another question: What happens without $f->save()?

Will it auto-join in the subsequent find() call or not?

Or do I have to remember-state -> save -> restore-state?


Answering my own question:

Yes, autojoin makes a measurable difference in speed, and no, you don't need $f->save() to speed up the subsequent find() operation.

So this is enough:

foreach ($fields->find("name=sku|inactive|new|discontinued") as $f) {
    $f->addFlag(Field::flagAutojoin);
}

The settings in the field dialog will not be changed this way.
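
Putting the pieces together, the pattern looks like this (the field names are from the snippet above; the "product" template is just an assumed example):

// flag the fields as autojoin for this request only (no save(), so admin settings stay untouched)
foreach ($fields->find("name=sku|inactive|new|discontinued") as $f) {
    $f->addFlag(Field::flagAutojoin);
}
// subsequent finds now load those fields together with the pages
foreach ($pages->findMany('template=product') as $p) {
    echo $p->sku; // no extra per-page query for this field
}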

