Jump to content
dfunk006

Graph databases and the future of processwire

Recommended Posts

Hi,

Processwire makes efficient use of mysql - every field is a table and field-specific tables are joined on demand. However, joins are used nonetheless, especially in performing queries for $pages->find("selector") and related db-querying selector functions.

So queries for $pages->find(field1=value1, field2=value2, field3=value3) will join field_field1, field_field2 and field_field3 tables. In this case, does it make sense to use graph databases (such as neo4j) compared to mysql?

With data being connected now more than ever and with all the hype around graph databases and neo4j, will we see a potential support for graph databases in future releases of processwire?

Share this post


Link to post
Share on other sites

There isn't even support for things like sqlite oder postgree, even though there where requests for it, so I'll doubt others will be adopted in a near future. One has to keep in mind, ProcessWire is developed by Ryan only. He really has to put his work into the meat and by now it really does not make sense for a cms to support an other database than mysql if there isn't the time to support multiple databases.

  • Like 3

Share this post


Link to post
Share on other sites

Processwire is not the usual cms but more like an engine, so,

the future of processwire is you, how you are going to use it.

  • Like 1

Share this post


Link to post
Share on other sites

At this point it's a bit like asking to drop php in favor of go. PW is a php/(my)sql framework by nature, and that's what it will continue to be. I would love to see other projects assumidely inspired by pw but built in other systems popping up.

  • Like 4

Share this post


Link to post
Share on other sites

At this point it's a bit like asking to drop php in favor of go. PW is a php/(my)sql framework by nature, and that's what it will continue to be. I would love to see other projects assumidely inspired by pw but built in other systems popping up, though.

I think the Analogy is wrong, Go is a concurrent language which is nothing compared to PHP, however I don't see the need for SQLite but that's my opinion, so prob a database wrapper to allow others write their own drivers or their various databases. something in the line of PDO

Share this post


Link to post
Share on other sites

At this point it's a bit like asking to drop php in favor of go.

That's an interesting analogy, although it's not quite the same. Graph databases can co-exist with mysql (a lot of people are using it that way), so its only an extension (maybe a core module) rather than a complete re-work.

I'm not sure if anyone has seen neo4j's model. It's exactly like processwire! (http://neo4j.com/developer/guide-importing-data-and-etl/)

Check out the part where it talks about the Graph Model. It says:

  • A row is a node
  • A table name is a label name

Now this sounds familiar, doesn't it? In processwire's model:

  • A row is a Page
  • A table name is a field name

Whoa! Looks like Ryan built a graph model in a relational database - absolutely brilliant!

And that's not it, you can use Page fields to create connections between pages (much like creating connections between nodes to form a graph). 

Also, it's interesting to see the php implementation of neo4j, which also, coincidently, is very similar to processwire's API calls (https://github.com/neo4j-contrib/developer-resources/blob/gh-pages/language-guides/php/neo4jphp/index.php).

$app->get('/graph', function (Request $request) use ($neo4j) {
	$limit = (integer)$request->get('limit', 50);
	$queryTemplate = <<<QUERY
MATCH (m:Movie)<-[:ACTED_IN]-(a:Person)
 RETURN m.title as movie, collect(a.name) as cast
 LIMIT {limit}
QUERY;
	$cypher = new Query($neo4j, $queryTemplate, array('limit'=>$limit));
	$results = $cypher->getResultSet();
	$actors = [];
	$nodes = [];
	$rels = [];
	foreach ($results as $result) {
		$target = count($nodes);
		$nodes[] = array('title' => $result['movie'], 'label' => 'movie');
		foreach ($result['cast'] as $name) {
			if (!isset($actors[$name])) {
				$actors[$name] = count($nodes);
				$nodes[] = array('title' => $name, 'label' => 'actor');
			}
			$rels[] = array('source' => $actors[$name], 'target' => $target);
		}
	}
	return json_encode(array(
		'nodes' => $nodes,
		'links' => $rels,
	));
}); 

If you notice, there's a cypher query which stores the results in a result array. Now, if we were to query a graph database and store the results in a PageArray, everything from there could be done processwire way (with access to processwire APIs).

Graph Database seems like a natural fit to me for PW. Whether it's worth it or not is up for debate. Maybe there's performance improvements, maybe more flexibility or maybe just an alternative to mysql - i dunno. I pointed out a use case in my earlier post where graph db could be more useful, but I'm not sure - hence the discussion.

Thoughts?

  • Like 1

Share this post


Link to post
Share on other sites

The thing is: Ryan most likely won't rebuild something that's already working really great. I think there are really more important features on the roadmap. This should not hold somebody else of building a wrapper for other databases, I'm just talking about a core implementation. 

  • Like 1

Share this post


Link to post
Share on other sites

Yeah in terms of request this is not a big priority. alot of things in the Pipeline also out of curiosity is it possible to Contribute codes to Processwire, I think it'd be much easier and relieve Ryan a lot of stress. 

Share this post


Link to post
Share on other sites

Guys, that was not meant to be an analogy between php/(other languages) vs sql/(other DBs). What I meant is that PW is built to work with PHP and MySQL, and it does it very well. I don't think Ryan felt at any time that these two are limiting the functionality of PW, and as a big bonus, these are the easiest conditions to find in any hosting. IF there was an analogy, than that's the one I used.
 
 

Yeah in terms of request this is not a big priority.

 
Exactly! I think we all agree with this.

 

alot of things in the Pipeline also out of curiosity is it possible to Contribute codes to Processwire, I think it'd be much easier and relieve Ryan a lot of stress.

At this point, Ryan filters and rewrites any piece of code that enters the core. Until now he's proved that this is the best way of keeping the reliability of the code https://processwire.com/talk/topic/9254-dev-and-master-versions/

  • Like 2

Share this post


Link to post
Share on other sites

The thing is: Ryan most likely won't rebuild something that's already working really great. I think there are really more important features on the roadmap. This should not hold somebody else of building a wrapper for other databases, I'm just talking about a core implementation. 

I wasn't asking for this to be built by Ryan. If I wanted to request this feature for the roadmap, I would have posted in the respective forum. I just wanted to discuss if there was any value to building something like this or not.

Share this post


Link to post
Share on other sites

The thing is: Ryan most likely won't rebuild something that's already working really great.

I vote for that. For expansions, we should concentrate on api, php, modules

and tuto's, not the core. At least that's what I hope for the future.

Share this post


Link to post
Share on other sites

AFAIK there are some efforts being made to be PDO compliant to support other database servers, but there are many challenges. PW is highly optimized for MySQL, so it would probably not be something you can expect in the near future. As for neo4j especially, unless there are PDO drivers in the works, don't put any hope on this.

Share this post


Link to post
Share on other sites

AFAIK there are some efforts being made to be PDO compliant to support other database servers, but there are many challenges. PW is highly optimized for MySQL, so it would probably not be something you can expect in the near future. As for neo4j especially, unless there are PDO drivers in the works, don't put any hope on this.

There are, in fact, PDO drivers available for neo4j - https://github.com/jakewins/neo4j-pdo

Share this post


Link to post
Share on other sites

Interesting timing: I was just about to ask the same question the OP did a few weeks ago  :)

TL;DR Processwire can be looked at as a frontend to a network of pages with a semi-arbitrary selection of fields, so using a schema-less graph database instead of an RDBMS (if it can provide for all the other necessary features) would naturally mean less impedance mismatch to compensate for (that is, simpler code and easier implementation), it would allow for superior selectors (no need for intermediate PageArrays and foreach and such), and it would come with an increase in performance. Therefore, if Neo4j keeps gaining popularity and becomes more available as a choice, I believe it would be the right choice to support it as a store. In the long term. Not as a priority. Though I would love to see it tomorrow =)

 

PW is a php/(my)sql framework by nature, and that's what it will continue to be. I would love to see other projects assumidely inspired by pw but built in other systems popping up.

I understand my opinion about what's natural for PW is slightly less important than somebody else's with thousands of posts here on the forum =)
Still, let me try to challenge it.
 
I always considered PW's main strength its API and built-in admin interface, not the underlying data store. MySQL, of course, is an obvious choice considering its availability, and that one can force almost anything into an RDBMS with enough magic. However...
 
Looking at how PW uses MySQL, I wouldn't call it a SQL framework by nature. I understand the analogy was more about the ubiquity of MySQL than about a supposed inherent superiority, but I think it's interesting to have a look at why SQL is not the best fit. PW uses MySQL perfectly to do the job, but it does it in ways that go against traditional RDBMS principles; I would say it's a good sign that it's not an RDBMS problem by nature. Examples:

  • templates don't have their own tables in which they would store fields that have a 1-to-1 relationship to the PK,
  • instead, fields are always stored in key/value stores (that happen to be implemented in a certain flavor of SQL),
  • much of the querying is done using textual search in automatically populated columns that describe structural relationships, instead of key lookups,
  • it's fundamentally hierarchical, and page fields extend this structure to a full directed graph.

The above points summarized: Processwire is a frontend to a graph database implemented in MySQL.

MySQL is a good choice because it is available and usable, not because it (or any other kind of RDBMS) is naturally the best choice for PW.
 

[...] PW is highly optimized for MySQL, [...]

The reason I was thinking about Neo4j is exactly because there is no need for any such optimization: Neo4j is a natural fit for the PW use case "as is," so why not make one of those "other projects [...] inspired by pw" a flavor of PW itself?

  • PW selectors are trivial to implement in Neo4j because filters, as probably all other PW concepts, correspond to native Neo4j concepts and Cypher ("Neo4j SQL") statements. 

    Let's assume a "student" has "student_in_course" descendants, each of which store things like grades and the like, and also use a page field to refer to the actual "course" record (page) itself:

    • let's find the actual course pages for a student:
      • $courses = new PageArray;
        foreach($student
        ->find('template=student_in_course') as $student_in_course)
          $courses->add($student_in_course->course);
      • Cypher query: MATCH (:Student{id:$student_id})-[:IS_IN_COURSE]->(x:Course) RETURN x;
        • Note that Neo4j connections, while otherwise analogous to page fields and parent/child relationships in PW, can also have properties, just as nodes can. So, the "IS_IN_COURSE" connection itself can (unlike a page field in PW could) store the "percent", "grade", "passed" properties we need. Basically, Neo4j connections are something like the Table module from ProFields and the PageFields in one. When you use it as a "pure" page field, you don't add properties. When you use it as a Table row, as you MUST have a destination for an edge, you'd direct it to a (let's say) special NULL page to signal the destination is irrelevant; but this is just an implementation detail for our not-yet-existent Neo4j port =P
    • let's find other students who are in any of the same courses that our student is in:
      • $other_students_in_the_same_courses = new PageArray;
        foreach($student->find('template=student_in_course') as $student_in_course)
          foreach($pages->find("template=student_course,course={$student_in_course->course}") as $other_student_in_course)
           $other_students_in_the_same_courses->add($other_student_in_course->closest('template=student'));
      • Cypher query:
        MATCH (:Student{id:$student_id})-[:IS_IN_COURSE]->(c:Course),
                             (x:Student)-[:IS_IN_COURSE]->( c )
        RETURN x;
      • more concisely: MATCH (:Student{id:$student_id})-[:IS_IN_COURSE]->(:Course)<-[:IS_IN_COURSE]-(x:Student) RETURN x;
    • other students who took courses by any of the professors that we did (forgive me for not writing the php):
      • MATCH (:Student(id:$student_id})-[:IS_IN_COURSE]->(:Course)<-[:TEACHES]-(:Professor)-[:TEACHES]->(:Course)<-[:IS_IN_COURSE]-(x:Student) RETURN x;
      • if we know there are no other kinds of connections between students, courses, and professors, we can simply write:
        MATCH (:Student{id:$student_id})->(:Professor)<-(x:Student) RETURN x;
      • list just the names:
        MATCH (:Student{id:$student_id})->(p:Professor)<-(x:Student) RETURN p.name, x.name;
      • sorry i'm not fluent in Cypher so i didn't try, but i know it's just a few more words to eliminate duplicates and self from the results
  • parent/child relationships and page fields directly correspond to connections (edges) in Neo4j,
  • traversing/matching these connections is a native (extremely fast) operation, unlike the id and path based matching/joining that's necessary with an RDBMS,
  • it's schema-less, so there's no need to worry about how to store fields (e.g. in separate tables),
  • similarly no more need for an autojoin-like option to guess and explicitly fine-tune what we think would be frequently required, yet staying safe from extensive/expensive joins,
  • while otherwise schema-less, Neo4j can have uniqueness constraints, so things like unique page names within a parent can be easily ensured,
  • filtering against aggregates (e.g. "children.count") is possible just as with MySQL, except more flexible,
  • textual properties can be queried using full-text search, just as in MySQL.

I admit MySQL has these distinct advantages:

  • it's already done,
  • it's available anywhere and everywhere.

I love PW the way it is; it's ridiculously amazing compared to anything else I've seen before. The only reason I started wishing for a Neo4j-based version is because I'm using PW for something it was probably not designed in the mind with =) It's a database that utilizes page fields with custom PHP code way more than I would consider it healthy, and I got to where I need to run several queries to collect certain kinds of parents, then use intersections from some of those resulting arrays, then in turn collect items based no queries on the items from those, and so on. If PW had Neo4j under the hood, we would have access to selectors that would render custom PHP for page fields all but unnecessary, and my messed up query would be so very simple, too... All at the "expense" of a simpler database layer =P Therefore, please forgive me if I can't but dream about being able to use Neo4j's capabilities from PW selectors =)

  • Like 3

Share this post


Link to post
Share on other sites

I used neo4j for two projects in my studies. One of them was to visualize the GitHub network with User/Repository nodes and their connections over edges like "follows", "stars", "contributes", "owns" etc. I do like neo4j and the cypher query syntax and it was the right choice for this project.

However, for ProcessWire I don't see any advantage in using this database, simply because 99% of the apps written in Pw are fine with the tree. With page references you have endless power in modeling other relations than parent-child. For me, a graph database makes sense if you need queries like this:

  • Shortest path between two nodes
  • Get me all common nodes over at most 4 hubs
  • Fire a query from a known start node and return an (unknown) amount of target nodes of type X over edges Y
  • ...

If the underlying data structure is a graph and I must answer those type of questions, I would not choose ProcessWire to handle the job.

Another problem of neo4j is that it's not that fast, at least in my experience. And they recommend 16-32 GB RAM, this is simply too much for a "normal" web app :)

Cheers

  • Like 2

Share this post


Link to post
Share on other sites

If the underlying data structure is a graph and I must answer those type of questions, I would not choose ProcessWire to handle the job.

It hindsight, yes. The thing is, PW does almost everything my app needs, it breaks down only at one particularly messy task. I wish it wasn't so good, then I could've resisted the temptation to use it for this project =P

Share this post


Link to post
Share on other sites

  • Recently Browsing   0 members

    No registered users viewing this page.

  • Similar Content

    • By Juergen
      Hello @ all,
      I am creating a new inputfield/fieldtype to store opening hours, but I am struggeling to save values from multiple dynamic created inputfields in 1 column of the database.
      Scenario:
      The user can enter one or more opening times per day in a UI.
      Fe:
      Monday open from 08:00 to 12:00 and from 14:00 to 17:00 Tuesday open from 08:00 to 12:00 and from 14:00 to 19:00 and so on
      Via a little JavaScript you can add as much opening times as you need per day - the additional inputfield will be created dynamically.
      After form submission all the values are in the POST array -> this works (see example below):
      ProcessWire\WireInputData Object ( [openinghours_mo-0-start] => 09:00 [openinghours_mo-0-finish] => 13:00 [openinghours_mo-1-start] => 14:00 [openinghours_mo-1-finish] => 18:00 [openinghours_mo-2-start] => 21:00 [openinghours_mo-2-finish] => 23:00 [openinghours_tu-0-start] => 09:00 [openinghours_tu-0-finish] => 13:00 [openinghours_tu-1-start] => 14:00 [openinghours_tu-1-finish] => 18:00 [openinghours_we-0-start] => 09:00 [openinghours_we-0-finish] => 13:00 [openinghours_we-1-start] => 14:00 [openinghours_we-1-finish] => 18:00 [openinghours_th-0-start] => 09:00 [openinghours_th-0-finish] => 13:00 [openinghours_th-1-start] => 14:00 [openinghours_th-1-finish] => 18:00 [openinghours_fr-0-start] => 09:00 [openinghours_fr-0-finish] => 13:00 [openinghours_fr-1-start] => 14:00 [openinghours_fr-1-finish] => 18:00 [openinghours_sa-0-start] => [openinghours_sa-0-finish] => [openinghours_so-0-start] => [openinghours_so-0-finish] => ) The property name is always the name attribute of the field 😉 . If the property is empty means closed on that day.
      Now I need to combine all those values into 1 array (or json array) and store it in the database in 1 column called 'hours' in my case (see screenshot below):

      In my ___processInput(WireInputData $input) method I have tried to make it work like this:
      public function ___processInput(WireInputData $input): self { $name = $this->attr('name'); $value = $this->attr('value'); //input object includes always every input on the page, so lets filter out only inputs from this field //we need to do this, because the number of values is variable - so extract only values that starts with $name.'_' $nameAttributes = []; foreach($input as $key=>$value){ if(substr($key, 0, strlen($name.'_')) === $name.'_'){ $nameAttributes[$key] = $value; } } // loop through all inputfields of this fieldtype $time_values = []; foreach($nameAttributes as $nameAttr => $value) { $time_values[$nameAttr] = $value; } } //save it in the database $input->set('hours', serialize($time_values)); return $this; } The only important part of this code is the last part with the serialize function.
      After saving it will create a record in the database, but the value is always NULL (default value) (see below).

      Checking $time_values returns all the values, but printing out "$this" shows me that the property "hours" inside the Openinghours object is empty (see below) - so the mistake must be there, but I dont know where?!?!?!?
      [title] => Home [openinghours] => ProcessWire\OpeningHours Object ( [data] => Array ( [hours] => ) ) If I check the sleepValue() method or the sanitizeValue() - they are also empty. So it seems that the values will not reach these methods. I havent found a clear documentation of whats going on behind the saving process of an inputfield.
      As far as I know the saving process starts with the form submission. The values are in the POST array and will be processed by the processInput() method. Before they will be saved in the database they will be sanitized by the sanitizeValue() mehtod and afterwards they will be prepared for storage in the sleepValue() method.  The last step is the storage itself.
      Has someone an idea what is missing by storing values from multiple fields into 1 database column or has someone a working example of such a scenario on github to help me out.
      A clear explanation of the storage process will be also helpful.
      Thanks and best regards
    • By Pip
      Hi Everyone 
      I've been working on Processwire for two months now. Structuring the website as needed. Unsure why but I'm getting this one now. Seems my fields has crashed. 
      I've tried googling some answers but can't seem to find a step by step guide on how to rectify this. Any advise? Practically new on this. 
      TIA. 

    • By Rodd
      Hi everyone!
      I have a website in a production environment and I want to duplicate it in a local environment. I exported the content of the website (with the 'Site Profile Exporter' module) but I cannot use it actually. I've got an issue with the database. I imported this one in MAMP then.

      I also exported the pages (with the 'ProcessPagesExportImport' module), but I cannot import it to my local website because the fields don't exist. So I created this fields, but I have this error :
      How can I use the elements that already exist and are presents in my database? How can I duplicate correctly the templates, fields and pages?
      Thanks by advance
      PS: Sorry if my english is bad
       
    • By jds43
      Hello,
      Does anyone have experience with migrating content from Django to Processwire? Or are there any suggestions for achieving this?
    • By DooM
      Hello guys,
      I'm trying to figure out how to sync fields and templates between staging and production environments.
      I've found Migrations module by Lostkobrakai, but with use of it all the fields and templates must be created by API, which is kind of uncomfortable.
      I also tried ProcessDatabaseBackups module which can export only certain tables, but I don't think it's the best practice to do that.
      How do you guys solve this problem? It's very annoying to setup everything three times (dev, staging, production).
      Thanks a lot :)
×
×
  • Create New...