Jump to content

Recommended Posts

Posted

This week we've got a big batch of issue fixes on the core dev branch (about 18 fixes), most related to GitHub issue reports. I've also been catching up on some client work this week, so not as many updates as in the last few weeks. 

As I use more AI in my work, I've been building up a real desire to understand how it works. To me it seems like magic. So recently I started building a language model in PHP. Admittedly PHP isn't the ideal language for building such a thing, but this is really about learning, and so I thought I should use the language I know best. Using something like PyTorch in Python would certainly make things much easier, but would also abstract away a lot of the understanding I want to get out of the project.

The new model (and future ProcessWire module) is called Rambler. Currently it rambles on, often incoherently, hence the name Rambler. Though it'll get more coherent as time goes on, no doubt (hopefully!). Rambler uses zero machine learning libraries, no black boxes, and in fact has no dependencies at all. It's just pure PHP implementing the same mathematical foundations that power modern AI systems like GPT, at least that's the goal. 

I'm writing all the code myself, but I do have Claude as my teacher, describing each step, teaching me the concepts and terminology, and telling me what I need to code. After I code each part, he looks at it and tells me what I did right and wrong. It's a slow process but I learn by doing, and so it's also a lot of fun. It certainly helps to have an infinitely patient teacher. 
  
Currently the project runs two models side by side (for comparison): 

  • A Markov n-gram model, which is a classical statistical approach that predicts the next word based on how often word sequences appeared in training data. 
     
  • A neural network model that learns distributed word representations called embeddings. The neural model passes those embeddings through a hidden layer with ReLU activation, then predicts the next word using a softmax output. As far as I understand it, these are the same core building blocks used in early neural language models.

The most time consuming part of doing this in PHP is the training. Python has libraries and functions that handle a lot of the hardcore math in ways that can take advantage of the hardware, like GPUs. But this is not the case with PHP, so training uses the CPU, and a lot of it!

As far as the training system goes, it uses mini-batch gradient descent with backpropagation. The model makes predictions, measures how wrong it was (which is the "loss"), and then works backwards through the network, computing gradients to adjust every weight in the right direction. 

Rambler also includes two tokenizers: A word-level tokenizer, and a BPE (byte pair encoding) tokenizer. BPE is the subword strategy that is used by GPT, Claude and other modern LLMs. But for the small scale that I'm working at, the word-level tokenizer works faster, so far.  
  
The next milestone is adding an attention mechanism in a RamblerTransformer subclass. This attention mechanism (a transformer) is the core innovation behind a lot of modern LLMs. I'm hoping to get started on that part this weekend. 

Beyond being a learning exercise, the longer-term goal is to train it on all of the ProcessWire documentation (which is what they call a "corpus" in this context) and release it on GitHub as a learning resource, and a PW module. Perhaps someday it'll be a tool in ProcessWire, or at least a really smart search engine for ProcessWire, we'll see. 

As far as I could tell, there aren't any other PHP-based language models that use the same technologies used by modern LLMs, so I figured, why not. I want to understand how they work under the hood without wading through Python frameworks, and I'm sure others do too. Once I get a little farther along with it, I look forward to getting it up on GitHub as a standalone project, but also as a ProcessWire module. 

The slowness of the training process (the model, not me... well okay, probably me too) is the hard part. I'm currently running a 30 hour training on all the text from a book. When the project is finished, I'm likely going to have Claude or Codex do a translation of the training code into C, that takes advantage of the much faster math capabilities available there. From what I understand, a 30 hour training in PHP will take about 30 minutes in C. I don't know if that's accurate or not, but it sounds good enough that I'm going to find out. 🙂

Thanks for reading and have a great weekend!

  • Like 8
  • Thanks 4
Posted

I love that rather than just consuming AI, you're trying to learn how it actually works.

Speaking of the translation of code - I wonder if AI would make it easy to provide a PostgreSQL database layer for ProcessWire? 

Also speaking of C, I wonder about translating the entire ProcessWire codebase into Rust. (It would probably be expensive in terms of token use) ProcessWire is fast as PHP apps go, but Rust is blindingly fast, but harder to learn than PHP. It's also memory safe whereas C isn't. 

It might not work, as PHP is interpreted so it makes it easy to deploy individual files, whereas Rust compiles to a single binary executable, so it's probably like comparing apples with pumpkins, but all kinds of things ight be possible of you don't have to painstakingly write all the code by hand.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...