Jump to content
fruid

unusual Database structure

Recommended Posts

I'm quite new to Processwire, and I struggle with the database structure.

Each page has specific fields. To my understanding, most of the tables in the database, each corresponds to a field and each field is linked to another field on the same page by the page_id. That's all I can see when viewing the field: pages_id and data.

It would of course come in very handy if you could view each page as a table instead, a column for each field and all the content in the cells next to one another.

I doubt this issue hasn't been addressed before, it's unusual the way it is now. I now wonder if I mixed something up at the very beginning and if there's a way to change or work around this dilemma.

My client is actually quite educated on databases and would enter the content rather via SQL than via PW-admin-backend. 

Thanks for help!

Share this post


Link to post
Share on other sites
59 minutes ago, fruid said:

I doubt this issue hasn't been addressed before, it's unusual the way it is now.

Please explain, what is unusual? You have a data that is structured in an unusual manner or you think ProcessWire's database structure is unusual? If the latter, then many here will beg to differ 😄 

1 hour ago, fruid said:

It would of course come in very handy if you could view each page as a table instead, a column for each field and all the content in the cells next to one another.

Do you mean a page or module where you can view a page's data in this manner? Like in a spreadsheet? Or, do you mean a database that resembles a spreadsheet in its structure? If the latter, that's akin to a flat file database. ProcessWire uses a relational database. This model is not unusual at all :-). 

1 hour ago, fruid said:

I now wonder if I mixed something up at the very beginning and if there's a way to change or work around this dilemma.

I don't understand this bit. What is the dilemna?

  • Like 1

Share this post


Link to post
Share on other sites

please excuse my ignorance, I learned something today, will look into that.

18 minutes ago, kongondo said:

I don't understand this bit. What is the dilemna?

I was just wondering if for example there's any advantage in regards to the database complexity, either in using the same field on different templates or in creating a new field of the same type for each template. 

thanks!

Share this post


Link to post
Share on other sites
17 hours ago, fruid said:

I was just wondering if for example there's any advantage in regards to the database complexity, either in using the same field on different templates or in creating a new field of the same type for each template. 

Each field may belong to more than one template, and each template has a different set of fields. Current structure works well with that concept, makes it possible to connect (or disconnect) fields with/from templates with ease, makes it unlikely for a single table to grow to giant proportions (thus making all queries against it slower), and also allows fetching/searching/saving the exact data that ProcessWire needs to fulfil a specific request.

So yes — there are advantages to current structure. It's also a very fundamental part of ProcessWire, so changing it is not possible without major changes to the core 🙂

19 hours ago, fruid said:

I doubt this issue hasn't been addressed before, it's unusual the way it is now.

It has been. If you'd like to read a bit more on it, I'd suggest doing a google search for something like "processwire database structure". You'll find a lot of existing content on this topic 🙂

Your point of view is not unheard of for newcomers but trust me, there are valid reasons why the database architecture is what it is. Much of it is due to the fact that ProcessWire — unlike some competing platforms, I might add — was designed with custom data structures (custom fields) in mind from the ground-up. Some other systems (WordPress, for one) have a much simpler database structure, but that's because they weren't originally intended for the same kind of use as ProcessWire.

19 hours ago, fruid said:

My client is actually quite educated on databases and would enter the content rather via SQL than via PW-admin-backend.

In the context of ProcessWire this would be a bad idea:

  • First of all the Admin is a ready-to-use tool for managing content, and I highly doubt that anyone will really have easier time managing the content with raw SQL. It can be fun and/or if you've had to do that a lot in the past you may be used to it, but still: ask them to give the Admin a try and I bet that this idea will go away in no time.
  • If you manually update the rows in the database, most of what ProcessWire's fields do will be completely skipped. This includes validation, filtering, and sanitization; things that are there to help you build sites with valid and well formed data. Without these features you will run into trouble eventually, it's just a matter of time.
  • ProcessWire does internal cleanups and such using hooks, and those will not get triggered if you update the data manually in the database. This means that you'll likely be left with broken data, missing pieces here and there, and so on. Some fields (take Repeaters for an example) will also be very difficult to update manually via database.

Finally, while we're on the topic of hooks: they are a major feature in ProcessWire, used by both core and third party (module) code — and, once you get used to it, probably your own code as well — and again if you don't go through the "official channels" (API or Admin) you'll loose this benefit altogether.

  • Like 8

Share this post


Link to post
Share on other sites

Hi @fruid,

I understand your concerns. The db structure can look quite complex, especially if you are used to working with db tables and SQL select ... from ... etc; Teppo explained the reason for this structure. The great thing about this structure is that you get an abstraction layer that makes all the PW awesomeness possible. It transforms all the custom fields and data in the database into PHP page objects that can be used for easy and effortless markup generation. I'm talking about the great pw api 🙂 

echo $page->title;
echo $page->headline;
echo $page->image->size(200,200)->url;

Display all that in multiple languages? Same code 😎 I think that's really genius!

Are there downsides of this approach? Yes, as always. For example it is not easy for PW to do a "SELECT * FROM table_xy" to get a list of thousands of rows of data. That's because the magic of transforming the complex db structure into an easy to understand and use API has some costs. It needs to load all rows of data into memory and therefore this get's slow when working with lots of data. PW handles this by applying pagination wherever possible, so that it only loads chunks of data and stays fast. But still there might be situations where you simply need a good old "SELECT * FROM ..." and "foreach($pages as $page) $rows[] = [$page->title, $page->headline, ...]" is no option.

That was quite a long introduction and explanation why I built RockFinder3 😄 

So at least the PULL part of your request is already doable 🙂 What about the PUSH part (meaning updating data in the DB, doing "UPDATE ... , INSERT INTO ...")?

First, you can still use native SQL commands on PW, it's quite easy:

$result = $this->database->query("SELECT * FROM pages LIMIT 5");
var_dump($result->fetchAll(\PDO::FETCH_OBJ));

The problem is, that updating data can get quite complex because you need to update several tables, several fields, several languages...

That's why such operations should really be done via API. That's of course a totally different approach if you are used to working with SQL commands, but it is the best option in 99,5% of the cases. There's a topic about that where I showed my findings:

For the remaining 0,5%:

image.png.27b0fad8e865cda2b05e34d524e451d1.png

You see, it can be quite easy. Is it a good idea? 99,5% no, because you don't get all the security features of PW that ensure that data is sanitized before storage etc. And you don't get the power of hooks. Updating pages via API will still trigger saveReady() and saved() hooks while direct SQL updates will not.

Hope that helps 🙂 

 

  • Like 5

Share this post


Link to post
Share on other sites
13 hours ago, fruid said:

That's all I can see when viewing the field: pages_id and data.

It would of course come in very handy if you could view each page as a table instead, a column for each field and all the content in the cells next to one another.

This is not the whole picture. Simple fields might only have a single `data` column, but complex ones might have many more. Also multi-language support does work by adding `data_{lang}` columns to those field tables.

Though I can see the argument for having template (or more correctly fieldgroup) based tables, where columns of fields would be merged into an single table.

  • Like 3

Share this post


Link to post
Share on other sites

Then I'm afraid I might be among the said 0.5%.

While I have learned the PW-API and built the entire website, the data-handling and migration of an existing old joomla-database and its maintenance was planned to be done by the client via SQL, who would browse the tables and change a lot of data at once, all, hopefully, without the need to learn the PW-API.

As much as PW shows clear advantages for the developer in terms of building the website, these advantages pretty much take a backseat once the website is finished and goes live, isn't it?

Share this post


Link to post
Share on other sites
16 hours ago, fruid said:

It would of course come in very handy if you could view each page as a table instead, a column for each field and all the content in the cells next to one another.

That was the way they did it in other CMSs I used in former times - before I luckily found ProcessWire!
It was like a breeze of fresh air after a long stuffy time... 

Quote

I doubt this issue hasn't been addressed before, it's unusual the way it is now. I now wonder if I mixed something up at the very beginning and if there's a way to change or work around this dilemma.

The dilemma you are in is to incline to habits you are familiar with. Try to understand the way PW works, and you will soon be convinced. (And as a next step: Try to convince the client...)

All the best!

  • Like 4

Share this post


Link to post
Share on other sites

@fruid I don't really understand your use-case. You say your client is going to use SQL to update the database, does that mean they are going to write raw SQL queries in front of a terminal? That seems like a really inefficient way to go about content creation, especially since you have to make sure your data fits within the constraints defined in the template and field settings. Despite that, you can still do that, you just have to add a couple of JOINs and be careful with your WHERE clauses. Each field table comes with a foreign key corresponding to the page ID. If your client is really some kind of SQL wizard who prefers the power of raw SQL for data migration over the limited interfaces puny mortals have to use, what's stopping them?

Or do you mean "the client is going to log into phpMyAdmin / Adminer / MySQL Workbench / ... and insert data through that interface"? In that case, you're not really using SQL anyway, you're just using an interface that's closer to the database, that *may* be slightly more efficient to use for batch creation / updates if you really know what you're doing. If that's what your client wants – well, that's an interface you can build for them! Something like a batch update module that lists pages and allows you to edit them inline. In fact, such a module exists already: Lister Pro comes with inline editing for multiple fields at once. This gives you a convenient interface to update many pages at once and still stay within one backend and have all the input / constraint validation apply to your edits. Best of both worlds 🙂

By the way, if you've ever tried to manually find something in a Drupal 8 database, you will like ProcessWire's database structure MUCH more ...

  • Like 9
  • Thanks 1

Share this post


Link to post
Share on other sites
20 hours ago, fruid said:

While I have learned the PW-API and built the entire website, the data-handling and migration of an existing old joomla-database and its maintenance was planned to be done by the client via SQL, who would browse the tables and change a lot of data at once, all, hopefully, without the need to learn the PW-API.

As much as PW shows clear advantages for the developer in terms of building the website, these advantages pretty much take a backseat once the website is finished and goes live, isn't it?

One more addition to this:

The API is, in fact, one of the biggest advantages of ProcessWire. While the preferred term is "content management framework", one may as well think of it as an ORM of sorts. ProcessWire was never designed for direct database access, and that's the whole point: it's meant to (mostly) abstract the database layer away so that both developers and content editors can (as much as possible) avoid the complexities and inherent risks of dealing with SQL queries.

I must say that I don't agree at all with your point about ProcessWire's advantages taking a backseat once the website is finished. I get where you're coming with this, but I'd assume that we can both still agree that it's extremely rare for a client to actually prefer SQL over a GUI — let alone them being competent and meticulous enough to produce data that fulfils all expectations we as developers might have.

Obviously we come from different backgrounds and thus have different expectations. If I would've suggested to any of the clients I've ever worked with a platform that requires them to use SQL, that would've been a disaster 🙂

  • Like 3

Share this post


Link to post
Share on other sites

Apart from working with ProcessWire, my other area of work is SQL databases, where I tend to have nearly all application logic contained in the database through views, stored procedures, user defined functions, triggers etc, so I understand how coming to ProcessWire can take some getting used to.

If you simply want to import data from an existing system, then there's no problem to mix SQL queries in PHP and calls to the ProcessWire API to read data from an existing database using PHP's PDO or ODBC abstraction layers and then store fields to ProcessWire fields.

I've done this myself before importing data from another system, and it's not overly difficult.

Here's a bit of code I wrote a few years ago to import some data from another system into ProcessWire. 

Obviously the SQL will vary depending on what system you're importing from, and how you've named your ProcessWire fields will be different.

This is old code, and untested, although I think it worked at the time.

Having the Tracy Debugger module installed is a must have, as it provides a PHP console with code completion for the ProcessWire API, so that you can run ad-hoc bits of PHP code to do stuff like importing data without having to write a module or embed it in a template if it's just one off code that will only be run once.

$webpressiondb = new PDO('mysql:host=127.0.0.1;dbname=mol_webpression', $user, $pass);            
$sql = "SELECT {$site}_docs.docid, {$site}_docs.parentid,custhead, subindex, title, keywords, author, {$site}_docs.modified, created, wbdesc, name, content, template, folder FROM {$site}_docs INNER JOIN {$site}_pages ON ";
            $sql .= " {$site}_docs.docid = {$site}_pages.docid ORDER BY {$site}_docs.subindex";
            $wbdocs = $webpressiondb->query($sql)->fetchAll();
            foreach ($wbdocs as $record) {
                //Need to allow for old records with no title.
                if (preg_match("/[a-zA-Z0-9]/i", $record['name'])) {
                    print '<h1>' . $record['title'] . '</h1>';
                    $pwpage = new page();
                    $pwpage->parent = $pages->get($parent);
                    $pwpage->template = $templates->get("rosina");
                    $pwpage->title = $record['name'];
                    $pwpage->text3 = $record['title'];
                    $pwpage->text1 = $record['wbdesc'];
                    $pwpage->keywords = $record['keywords'];
                    $pwpage->textHtml1 = $record['content'];
                    $pwpage->wid = $record['docid'];
                    $pwpage->wparent = $record['parentid'];
                    $pwpage->worder = $record['subindex'];
                    $pwpage->wcreated = $record['created'];
                    $pwpage->wmodified = $record['modified'];
                    $pwpage->custhead = $record['custhead'];
                    $pwpage->save();
                }
            }

If you have cases where you really need traditional SQL tables with multiple fields in ProcessWire, you can create your own custom fieldtypes to do this, but you need to think carefully about whether you need this.

The ProcessWire approach can be slow at bulk inserts, as each field results in a database INSERT statement, however retrieving data, especially if you don't need all fields is fast as it doesn't need to load fields in a query if you don't use them.

If you're going to be doing a lot of multiple field inserts on an ongoing basis, and you're also going to want to retrieve most or all of the fields when you read the data, creating a custom fieldtype that's a traditional SQL table might give you performance advantage, but to date, I've generally found I've been able to work within ProcessWire's way of doing things.

  • Like 1

Share this post


Link to post
Share on other sites
5 hours ago, teppo said:

we can both still agree that it's extremely rare for a client to actually prefer SQL over a GUI

  I agree it's rare, and yet it's still the case. I don't know much SQL, am also not in charge of the database and the client IMO shouldn't have to learn the PW-API, as cool as it clearly is. 

On 8/5/2020 at 2:04 PM, MoritzLost said:

they are going to write raw SQL queries in front of a terminal

yes indeed that's what they used to do before PW.

1 hour ago, Kiwi Chris said:

If you have cases where you really need traditional SQL tables with multiple fields in ProcessWire, you can create your own custom fieldtypes to do this, but you need to think carefully about whether you need this.

That sounds like re-building huge parts of the site and I thought I'm almost done. What I wanted was for all fields to show more relation to other fields on the same page. Field-groups so to speak, as mentioned by @LostKobrakai

Just as an example (and possibly another issue).
I have a field on a template that always has a value (default-value).
I have 19 pages with that template, yet only 14 rows in the field-table in the database.
While I don't know why it wouldn't enter the value for 5 of those pages – like I said, different issue – I had a hard time figuring out which pages those are.
What I did is to write:

echo $page->id;

on the allways included _main.php to see which id the page I'm looking at has. Then open all the pages manually and see which page-id is not in the said column. I'm sure there's a more pro-approach to that conclusion using the API, but for someone who doesn't use it at all, it remains a hassle.

In any case, thanks @bernhard and @MoritzLost for your suggestions RockFinder3 and ListerPro respectively. I guess one of them (I reckon the latter) will be the solution and will have to replace the direct SQL-commands in the end.

Appreciate all your input.

 

Share this post


Link to post
Share on other sites

Just curious, what are you using from ProcessWire that is useful for your project?

Share this post


Link to post
Share on other sites

The client really is a SQL-wizard, they've been running and maintaining a website for decades, with new content coming in or needing to be updated on a monthly basis, much of which is far too large to be handled manually via a CMS. So convincing him to use the CMS-input instead is quite utopian.

And I think we are seeing the first symptoms of what was warned about when forgoing the CMS: many "pages" (to use PW way of thinking) have the same name. Upon SQL-import, the url defaults to the page's name which creates duplicates and the import thus breaks off. I guess that's something that wouldn't happen when using the CMS-input-method instead as newer entries with the same name (and the same url) would be serialized (i.e. appended with -1, -2, -3, etc.).
I can imagine that more problems would arise that I cannot think of just yet and since the client's knowledge of the framework is also limited, how would they know other than by trial and error? 

Can you use ListerPro to bulk import as well? @MoritzLost

Share this post


Link to post
Share on other sites

Quick idea, not well thought through: You could also create a custom DB table on which the client can work on and you write a script that pulls data from that table and populates pages via API (or synchronizes both).

  • Like 2

Share this post


Link to post
Share on other sites
On 8/6/2020 at 6:35 PM, elabx said:

Just curious, what are you using from ProcessWire that is useful for your project?

I am curious about this as well Was it Joomla? Why did you decide to stop using Joomla? Why ProcessWire?

Share this post


Link to post
Share on other sites
On 8/7/2020 at 3:15 PM, fruid said:

Can you use ListerPro to bulk import as well? @MoritzLost

Not sure, but I don't think so. It's primary function is to list and edit existing pages. Might be there's an action for bulk import / creation, but I haven't seen it yet. But again, if your client absolutely needs something like that, why not build a simple interface that will allows them to batch create pages? See the example code above by @Kiwi Chris, it's literally that simple. I've even done some imports directly from the Tracy Debugger Console before, and it took like 15 minutes ... all you need is a couple of lines to create a new page, and then assign values using the field names in your template. Add an file_get_contents to read from a CSV file or something, a foreach loop and you've got your reliable and simple pages import.

  • Like 1

Share this post


Link to post
Share on other sites
On 8/8/2020 at 1:15 AM, fruid said:

The client really is a SQL-wizard, they've been running and maintaining a website for decades, with new content coming in or needing to be updated on a monthly basis, much of which is far too large to be handled manually via a CMS. So convincing him to use the CMS-input instead is quite utopian.

And I think we are seeing the first symptoms of what was warned about when forgoing the CMS: many "pages" (to use PW way of thinking) have the same name. Upon SQL-import, the url defaults to the page's name which creates duplicates and the import thus breaks off. I guess that's something that wouldn't happen when using the CMS-input-method instead as newer entries with the same name (and the same url) would be serialized (i.e. appended with -1, -2, -3, etc.).
I can imagine that more problems would arise that I cannot think of just yet and since the client's knowledge of the framework is also limited, how would they know other than by trial and error? 

An SQL "expert" with records with duplicate keys? That starts alarm bells ringing for me. Generally any SQL table should have a primary key which by definition must be unique for each record. Of course it's possible they have some sort of composite key based on multiple fields that needs combining to correspond to a unique page name in ProcessWire. There are certainly cases where I've build SQL tables with composite keys, with one scenario being many-many relationships, and that's something that Processwire doesn't handle too well, although there is a module that makes many-many type relationships possible, although it doesn't compare to what you can do with pure SQL.

Processwire does handle One-Many relationships fine via page reference fields or Pagetable fields.

If you really must have a direct relationship between SQL commands and table structure and your CMS, I actually wonder whether ProcessWire is your best option. I've been doing a bit of investigation of Directus which looks promising, although it's headless, so no templates for output like ProcessWire, just a REST API, from what I can see, no full text indexing, and being more of a direct SQL - CMS mapping,  it also lacks the hierarchical parent-child structure that ProcessWire handles so well, as it doesn't make any assumptions about what sort of data structures you have, whereas ProcessWire, while generally very un-opinionated does treat everything as a page in a page hierarchy. For websites, that's generally a pretty reasonable assumption, but if you really don't want that, and just want pure SQL tables then there are alternatives. While something like Directus will give you a direct SQL to CMS mapping, it won't fix bad SQL data, so if you've already started down the ProcessWire path, you want to be really sure it's worth the effort to change.

I come from a pure SQL background myself, and it only took me about 20 minutes of reading the ProcessWire documentation to understand how it works, so I don't think it should be hard for someone from an SQL background to adapt. Maybe there's room for a blog post or tutorial showing how to "do it this way in SQL" and ProcessWire equivalent along with what's different.

  • Like 3

Share this post


Link to post
Share on other sites
On 8/7/2020 at 7:55 PM, kongondo said:

I am curious about this as well Was it Joomla? Why did you decide to stop using Joomla? Why ProcessWire?

The Joomla site was done decades ago. Sure we could have updated and stayed with Joomla, but it often feels like an overkill for what the site requires, leaving you with removing what you don't need instead of adding what you need. Same goes for Wordpress. Processwire gives you a blank markup to build on. That was the selling point. And we had to re-build the entire markup since it's so outdated. 

Lister itself (included with any new PW installation) is quite helpful and comes close to what a non-relational database looks like. Lister Pro, as mentioned, if I can figure out how to bulk import as well, I guess would be the best of both worlds, and would come close to the way the client is used to handle the data. I don't think we're well advised to yet again change horses, Directus would probably bring other complications, no CMS is perfect after all.

  • Like 1

Share this post


Link to post
Share on other sites
1 hour ago, fruid said:

if I can figure out how to bulk import as well,

 This particular feature doesn't exist in ListerPro as far as I know if you want to discard that already. (although it is a great product in bunch of other aspects).

Share this post


Link to post
Share on other sites

I'm currently trying to get the hang of the module "Lister" (the included version) before I buy the ListerPro. I have to say, it seems a little bit buggy, doesn't always deliver results when it should.

Particularly the selector "has_parent" doesn't seem to work properly. I'm using year-numbers as page-titles (2020, 2019, 2018) for the grandparents.

It works when doing:

parent… | Title | equals | 2019

but not:

has_parent | equals | 2019 

also, I can't seem to filter subfields – I'm using custom fields for images and also Profields Table – I wonder if that will work with ListerPro.

Appreciate your help.

Share this post


Link to post
Share on other sites
1 hour ago, fruid said:

I'm currently trying to get the hang of the module "Lister" (the included version) before I buy the ListerPro. I have to say, it seems a little bit buggy, doesn't always deliver results when it should.

ListerPro extends the Lister class, so it's a superset of functionality of Lister, so any bugs in Lister probably also apply in ListerPro.

API reference for both is here: https://processwire.com/api/ref/process-page-lister/

I've found the upgrade to ListerPro well worth it, however if you're prepared to write your own modules some of the things it provides like saved lister layouts can easily be achieved with Lister, with minimal coding.

I've used ListerPro to export (to CSV), but not to import.

1 hour ago, fruid said:

has_parent | equals | 2019 

I wonder if the issue here is that has_parent is referring to the page id field rather than the title?

  • Like 1

Share this post


Link to post
Share on other sites
8 hours ago, Kiwi Chris said:

I wonder if the issue here is that has_parent is referring to the page id field rather than the title?

yes, I figured, not a bug after all.

Share this post


Link to post
Share on other sites

the client insists that PW's database structure is not relational because some fields are redundant. Now I'm in doubt, did I screw up somewhere? As mentioned above, I thought some problems might arise when re-using the same fields on different templates, maybe that's what this is about?

Also, now I'm mixing ProFields Table and ListerPro, not sure if that is wise, ProFields Table, to my understanding, simulates (for lack of a better word) a non-relational database but it really just is a field with subfields? If you mostly avoid the CMS-backend of the page and use ListerPro instead, what's the point in using ProFields Table? Are there any technical advantages? Because in terms of visualisation, everything's a table somewhere.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Recently Browsing   0 members

    No registered users viewing this page.

×
×
  • Create New...