
ProcessWire Forum Activity Tracker


netcarver

I've added a new sub-project to my PWGeeks site that turns the ephemeral activities shown on the PW forum's Online Users and All Activity pages into a live-updating, unified timeline of events, covering both what users are replying to and what they are viewing. Whilst it is not built with ProcessWire, it is related and was an interesting little project to build. I thought I'd post about it here in case it's of use to anyone else.

You can find it here: https://activity.pwgeeks.com/

pwf-all-activity-list-header.thumb.png.30d4d65b25fbfe3ed7a895fd15f6077d.png

Clicking on a username drills down into their activity, but more on that later.

Unlike the usual All Activity view on the forum, this integrates consumption activities (viewing stuff) with production activities (posting/replying). Unfortunately, because of a limit on what the forum shows to readers who are not logged in, reactions to posts cannot be tracked at this time.

pwf-all-activity-list.thumb.png.84b24d8c1da2e97389819677bf1c7843.png

I initially thought this might be useful just to make the ephemeral viewing activity more obvious, but I'm now hoping that it can be turned into a tool to more easily help forum moderators deal with spammers (new joiners who quickly start posting). But that's yet to be proven.

If you don't wish your read/write activities to be traceable, you can log in to the forum in anonymous mode.

Architecture

The architecture is split into a long-running ReactPHP Watcher Service and the Index Generator code, which creates the page you just viewed with the help of Caddy, PHP 8.2 and PHP-FPM. It all runs on a cheap Contabo VPS. Both Caddy and the watcher process are defined as systemd services that are automatically restarted if they fail or the box is rebooted.

332633881_ForumActivityMonitor.thumb.png.0a5596987961a53e8116fe5d7fc60353.png

Pusher is used to provide a Pub-Sub channel for immediate communication of changes found by the watcher service to anyone who is viewing the index pages - allowing the index to be updated as forum activity is detected by the watcher. Pusher takes care of any fan-out needed between the Watcher Service publishing the events, and any browsers subscribed, via Pusher-JS.

I used Pusher's free tier and created an "application" to get my channel and the needed credentials, which went in a .env file. I also turned on subscription counting on the channel within Pusher's dashboard to allow simple console logging of the number of clients connected to the channel at any one time.

All detected activity is also stored in a local SQLite DB which allows the Index Generator to build the initial table of activity shown in your browser. Once the page is loaded, JS events take over and continue populating the table in (almost) real time as they come in from Pusher.
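As a rough illustration of that storage layer, here is a minimal sketch using PDO with an in-memory SQLite database. The table layout, column names and function names are all assumptions made for the example (the real schema is not shown in this post), and the usernames are made up.

```php
<?php declare(strict_types=1);

// Hypothetical schema: the real table layout is not shown in the post.
function createActivityTable(PDO $db): void
{
    $db->exec(
        'CREATE TABLE IF NOT EXISTS activity (
            id       INTEGER PRIMARY KEY AUTOINCREMENT,
            uid      INTEGER NOT NULL,
            username TEXT    NOT NULL,
            activity TEXT    NOT NULL,
            seen_at  INTEGER NOT NULL
        )'
    );
}

// Store one detected activity (what the watcher would do after de-duplication).
function recordActivity(PDO $db, int $uid, string $username, string $activity, int $seenAt): void
{
    $stmt = $db->prepare(
        'INSERT INTO activity (uid, username, activity, seen_at) VALUES (?, ?, ?, ?)'
    );
    $stmt->execute([$uid, $username, $activity, $seenAt]);
}

// Read the most recent activities, newest first (what the index page would render).
function recentActivity(PDO $db, int $limit): array
{
    $stmt = $db->prepare(
        'SELECT uid, username, activity, seen_at
           FROM activity
          ORDER BY seen_at DESC, id DESC
          LIMIT ?'
    );
    $stmt->bindValue(1, $limit, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
createActivityTable($db);
recordActivity($db, 1, 'alice', 'Viewing a topic', 1700000000);
recordActivity($db, 2, 'bob', 'Replied to a topic', 1700000020);

foreach (recentActivity($db, 10) as $row) {
    echo $row['username'], ': ', $row['activity'], "\n";
}
```

The index page only needs the read side; anything newer than the page load arrives over Pusher instead.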

The Watcher Service

ReactPHP is used to create a long-running server process, in PHP, that has run for more than 2 months at a time with rock-solid (no, that's not a bernhard module) memory use of 10MB once all the user data is loaded from the DB. I am sure this process would have run indefinitely, but I recently restarted the service to add detection of write-activities on the forum.

The major Composer packages used are

  • pusher/pusher-php-server (to publish to my app's event channel)
  • vlucas/phpdotenv (to read the Pusher credentials from a .env file)
  • react/event-loop (to run the PHP app indefinitely)
  • fabpot/goutte (for scraping the 2 forum pages)
  • symfony/console (for CLI output formatting and logging)

I would probably choose a different scraping library if I were to do this again, but goutte works just fine for now.

The watcher only uses two public forum pages: the Online Users page, with the logged-in filter applied, and the All Activity page. It is worth noting that the service has no login details, so it sees these pages as a logged-out visitor to the forums would, which means significantly less information is available to it than to you if you view those pages while logged in to the forum.

All Activities Page Differences

When logged in, the All Activities page shows user reactions to posts (2 below). These are not available to guests, so cannot be tracked by the Watcher Service. If you have purchased Pro modules and have access to some of the VIP forums, then new posts or replies to posts in those forums are also shown (1).

pwf-all-activity-user.thumb.png.f91b2b37c2a872c9cc4956c55bab3ba8.png

Guests have no access to either of these - so VIP forum activities are not tracked. The Watcher Service does not have any login credentials, so this is the view it sees...

pwf-all-activity-guest.thumb.png.50cee581c11f9c395f3d31fbf43f430b.png

All activities listed on this page have a UTC timestamp in their HTML attributes that can be used to record the actual time of Joins, New topics and replies being posted.  I don't bother recording any "user started following..." activities.
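To illustrate reading such timestamps, here is a minimal sketch using PHP's built-in DOM extension instead of Goutte. It assumes the timestamp lives in the datetime attribute of a time element, which is an assumption about the forum's markup and not something taken from the watcher's code. The function name is also made up for the example.

```php
<?php declare(strict_types=1);

/**
 * Extract Unix timestamps from <time datetime="..."> elements.
 * The element and attribute names are assumptions about the page markup.
 *
 * @return int[]
 */
function extractTimestamps(string $html): array
{
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Collect parser warnings quietly: real pages are rarely valid HTML4.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $timestamps = [];
    foreach ($doc->getElementsByTagName('time') as $node) {
        $value = $node->getAttribute('datetime');
        if ($value === '') {
            continue; // no machine-readable time on this element
        }
        try {
            $timestamps[] = (new DateTimeImmutable($value))->getTimestamp();
        } catch (Exception $e) {
            // Skip values that cannot be parsed as a date.
        }
    }
    return $timestamps;
}

$html = '<ul><li>New topic <time datetime="2023-04-01T12:00:00Z">April 1</time></li></ul>';
print_r(extractTimestamps($html));
```

Because the attribute value carries its own UTC offset, the result does not depend on the server's timezone setting.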

Online Users (logged in)

You might not be aware of this page on the forum, but it's how the Watcher Service can tell who's viewing forums & posts, creating new topics, or using the personal messaging service.
If you have never visited this page on the forum, go to (1) Browse, then (2) Online Users, and then use the Filter-By dropdown to view logged-in users.

pwf-online-users.thumb.png.c1952702e78d7b51c30db862c84bd5ff.png

The activities listed (3) do not have a timestamp in the HTML - so the watcher limits itself to anything that happened "Just Now" that has not yet been recorded for that user and uses the server's time() to record the event as having occurred. When users view a VIP forum or post, or visit the All Activity page, the user list page does not show what the user is viewing - it just shows their activity as a blank string (See netcarver's activity in the above screenshot.)
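The "Just Now" rule can be sketched roughly as below. This is not the watcher's actual de-dupe code, which is still a work in progress. The row shape (uid, user, activity, when) and the idea of comparing against the last recorded activity per user are assumptions made for the example.

```php
<?php declare(strict_types=1);

/**
 * Keep only "Just now" rows whose activity differs from the last one
 * recorded for that user, and stamp them with the server's clock.
 *
 * @param array $rows     scraped rows: uid, user, activity, when (hypothetical shape)
 * @param array $lastSeen last recorded activity per uid, updated in place
 * @param int   $now      server time, e.g. time()
 */
function newJustNowEvents(array $rows, array &$lastSeen, int $now): array
{
    $events = [];
    foreach ($rows as $row) {
        if (strcasecmp(trim($row['when']), 'Just now') !== 0) {
            continue; // older rows carry no usable timestamp
        }
        if (($lastSeen[$row['uid']] ?? null) === $row['activity']) {
            continue; // already recorded for this user
        }
        $lastSeen[$row['uid']] = $row['activity'];
        $events[] = [
            'uid'      => $row['uid'],
            'user'     => $row['user'],
            'activity' => $row['activity'],
            'time'     => $now,
        ];
    }
    return $events;
}

$lastSeen = [];
$rows = [
    ['uid' => 1, 'user' => 'alice', 'activity' => 'Viewing Topic: Hello', 'when' => 'Just now'],
    ['uid' => 2, 'user' => 'bob',   'activity' => 'Viewing Forum: News',  'when' => '5 minutes ago'],
];
$first  = newJustNowEvents($rows, $lastSeen, time()); // one event, for alice
$second = newJustNowEvents($rows, $lastSeen, time()); // empty: nothing new since the last sample
```

A blank activity string (a VIP forum view, for example) passes through this filter like any other value.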

Other Limitations

The main loop of the service runs several times a minute and scrapes and de-duplicates activities from the forum. Any activity that happens on the forum between these samples is undetectable. So, if you visit a forum and then quickly click into a topic, and then back out, your activity will not be traceable.

The Event Loop

Using ReactPHP is conceptually quite simple, but there are a few things to keep note of.

Here are the basics of the Watcher Service...

<?php declare(strict_types=1);

namespace Netcarver\ForumActivityMonitor;

require_once 'vendor/autoload.php';

use React\EventLoop\Factory;
...
use Pusher\Pusher;
use Dotenv\Dotenv;

require_once __DIR__ . '/.format.php';  // output formatting helpers
require_once __DIR__ . '/.storage.php'; // storage class

$dotenv = Dotenv::createImmutable(__DIR__);
$dotenv->load();

// Create pusher publication connection
$pusher = new Pusher(
    $_ENV['PUSHER_KEY'],
    ...
);

$sample_period_seconds = 20;
$started_ts = time();

$output = new ConsoleOutput();
$output->write("\n>>> ProcessWire Forum Activity Monitor (sampling every $sample_period_seconds seconds) <<<\n\n");


// Open the local SQLite DB for storage layer...
$db = new \PDO('sqlite:path/to/database.sqlite');
$storage = new Storage($db);


$loop = Factory::create();
$loop->addPeriodicTimer(1, function () use (&$storage, $started_ts, $sample_period_seconds, &$pusher, &$output) {

    $now = time();
    $can_access_pw_forum = ($now % $sample_period_seconds === 0);
    $elapsed_time_seconds = $now - $started_ts;

    showStatusLine($output, $can_access_pw_forum, $storage->userCount(), $elapsed_time_seconds);

    if ($can_access_pw_forum) {
        $client = new Client();

        try {
            // Scrape, dedupe & store events
            ...
            // publish events via pusher...
            if (!empty($event_timeline)) {
                ksort($event_timeline);
                $table = new Table($output);
                $table->setHeaders(['#', 'Time', 'UID', 'Username', 'Activity']);
                $mem_use = memory_get_usage(true);
                $runtime_str = formatElapsedTime($elapsed_time_seconds);
                
                $pusher_events = [];
                foreach ($event_timeline as $events_at_time) {
                    foreach ($events_at_time as $event) {
                        $uid = $event['uid'];
                        $user_activity_count = $storage->getUserActivityCount($uid);
                        $table->addRow([$user_activity_count, $event['time'], $uid, $event['user'], $event['activity']]);
                        $pusher_events[$event['time']][] = [
                            'uid'    => $uid,
                            'url'    => $event['url'],
                            'user'   => $event['user'],
                            'act'    => $event['activity'],
                            'mem'    => $mem_use,
                            'uptime' => $runtime_str,
                            'type'   => $event['type'],
                        ];
                    }
                }
                $pusher->trigger('activities', 'update', $pusher_events);

                $table->render();
                $output->write("\n");
            }
        }
        catch (\Throwable $e) {
            $output->write("\nCaught Throwable: ".$e->getMessage()."\n\n");
        }
    }
});

$loop->run();

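Note that Pusher, the storage instance, and the console are all captured by the loop closure by reference, so state is maintained between each scheduled call to the loop function. (PHP objects are passed around as handles, so capturing them by value would share the same instances too.)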

The loop closure uses a try {} catch (\Throwable) {} block to ensure it keeps running without systemd having to restart it in case of a PHP error. The catch block does occasionally run - so far only when DNS resolution fails while scraping the forum.

I've omitted the scraper and de-dupe code from the above as they are still a work in progress, but they populate an $event_timeline array if anything new is detected.
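Judging by how the loop above consumes it, $event_timeline is keyed by timestamp, with a list of events under each key. Here is a minimal sketch of that shape. The helper names are made up for the example, and the real scraper may build the array differently.

```php
<?php declare(strict_types=1);

// Group an event under its timestamp (helper name is hypothetical).
function addToTimeline(array &$timeline, array $event): void
{
    $timeline[$event['time']][] = $event;
}

// Return events in chronological order, mirroring the ksort() + nested
// foreach used in the watcher loop.
function flattenTimeline(array $timeline): array
{
    ksort($timeline);
    $flat = [];
    foreach ($timeline as $eventsAtTime) {
        foreach ($eventsAtTime as $event) {
            $flat[] = $event;
        }
    }
    return $flat;
}

$timeline = [];
addToTimeline($timeline, ['time' => 20, 'uid' => 2, 'user' => 'bob',   'activity' => 'Viewing a topic']);
addToTimeline($timeline, ['time' => 10, 'uid' => 1, 'user' => 'alice', 'activity' => 'Replied to a topic']);
addToTimeline($timeline, ['time' => 20, 'uid' => 3, 'user' => 'carol', 'activity' => 'Viewing a forum']);

foreach (flattenTimeline($timeline) as $event) {
    echo $event['time'], ' ', $event['user'], ': ', $event['activity'], "\n";
}
```

Keying by time lets events scraped from the two different pages merge into one ordered list with a single ksort().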

The unified array of events (if any) is published via Pusher and each entry includes information about the Service uptime and memory usage.  The Index page simply console logs this meta-data from the first event in the array of activities it receives, so you can use your browser's console to track these (along with the number of subscribers to the channel)...

pwf-console-metainfo.png.f2398e5fed1db9cc8aae4bd1b1658e8d.png

I made the event loop run every second, even if it's not time to sample the forum, so the CLI output can be updated regularly. This was especially useful when initially running the Watcher from the command line. I could probably drop the status updates now that things are run via systemd.
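One thing to keep in mind with a one-second tick is that the `$now % $sample_period_seconds === 0` check only fires if a tick actually lands in that exact second. If the timer drifts, or a slow scrape delays a tick, a sample slot may be skipped. A possible alternative, sketched here with a made-up function name and not taken from the watcher itself, is to compare against the time of the last sample.

```php
<?php declare(strict_types=1);

// True when no sample has been taken yet, or at least $periodSeconds
// have passed since the last one.
function isSampleDue(int $now, ?int $lastSampleAt, int $periodSeconds): bool
{
    return $lastSampleAt === null || ($now - $lastSampleAt) >= $periodSeconds;
}

// Simulated ticks: there is no tick for second 20 or 40, as if the timer
// had drifted past them. A modulo check would only sample at second 0.
$lastSampleAt = null;
$sampledAt = [];
foreach ([0, 1, 19, 21, 22, 41] as $now) {
    if (isSampleDue($now, $lastSampleAt, 20)) {
        $lastSampleAt = $now;
        $sampledAt[] = $now;
    }
}
// $sampledAt is now [0, 21, 41]
```

The trade-off is that samples no longer align to round wall-clock seconds, which only matters if something else depends on that alignment.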

Running from the CLI on the server, or tailing the log file, gives a nicely formatted table of events as they occur thanks to Symfony's console library and table helper.

pwf-log-snapshot.thumb.png.6134f82ac6fba14b70749d79acdd8c62.png


Systemd Integration

To allow automatic restart of the watcher when the VPS is restarted, I added this service definition file to /etc/systemd/system/forumwatch.service that runs the watcher as an unprivileged user...

[Unit]
Description=ReactPHP Processwire Forum Watcher

[Service]
ExecStart=/usr/bin/php8.2 /home/pwfw/activity.pwgeeks.com/watcher/react.php
WorkingDirectory=/home/pwfw/activity.pwgeeks.com/watcher/
StandardOutput=file:/var/log/pwforumwatch.log
StandardError=file:/var/log/pwforumwatch.log
Restart=always
User=pwfw
Group=pwfw

[Install]
WantedBy=multi-user.target

A quick sudo systemctl enable forumwatch.service && sudo systemctl start forumwatch.service is then all that's needed to get things running.

As the output is logged to /var/log/pwforumwatch.log, I also gave it a logrotate.conf file to keep things under control.


The Index Generator

This is a single index.php file that takes care of reading the most recent user activities from the SQLite DB and generating the table of events from it for that point in time. JS is included (pusher-js) that subscribes to the application's activity channel.  Pusher's free tier allows up to 100 simultaneous connections to the event activity channel, and you can see how many users are connected via your browser's console.

NB: The following feature is now behind a basic auth login.

If you click on a user's name, you'll reload the page with a filter that lists only that user's activities. Here are Bernhard's as he has posted recently as I write this up.

pwf-bernhard-activity.thumb.png.90257e84f8f9d230bbe684ed7d8616ac.png
 
Click on the User's name or avatar (1) to be taken to their page on the forum, or click on the "Everyone" or PW logo (2) to be taken back to the all-inclusive index.

Trying It Out

I recommend opening the activity tracker in one browser on one side of your screen, then opening the Forum (and logging in) in another browser on the other side. As you (and anyone else) visit pages on the forum, you should see things update in the tracker. Also open up the console on the tracker page to see the uptime, memory and viewer-count data as they come in.

If you read this far, thank you for your time, and I hope this was of some interest or use to you.


Hi @bernhard, ReactPHP has been very solid for long-running server processes in my experience. There's also Framework-X, which is a nice layer on top, though I haven't used it much. And FrankenPHP, which I have never used, might allow long-running async stuff as well.

Sorry to make you feel tracked. I guess things like this, which simply snapshot publicly available ephemeral info and turn it into a timeline, are somewhat borderline. If people are worried about this, I'm happy to take this service down, or put it behind basic auth so only the forum mods have access.

Basically, I want to get to the point where I can detect Join Events followed by new topic creation prior to it being published in order to get an early warning of possible spammers. This tool might be able to do that (time will tell.)
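That early-warning idea could be sketched along these lines. This is purely illustrative and not something the tracker does today. The event type names ('join', 'new_topic'), the event shape and the time window are assumptions.

```php
<?php declare(strict_types=1);

/**
 * Return the uids that created a new topic within $windowSeconds of joining.
 *
 * @param array $events list of events with uid, type and time keys (hypothetical shape)
 * @return int[]
 */
function findJoinThenTopic(array $events, int $windowSeconds): array
{
    // Process events in chronological order.
    usort($events, fn (array $a, array $b): int => $a['time'] <=> $b['time']);

    $joinedAt = [];
    $flagged  = [];
    foreach ($events as $event) {
        $uid = $event['uid'];
        if ($event['type'] === 'join') {
            $joinedAt[$uid] = $event['time'];
        } elseif (
            $event['type'] === 'new_topic'
            && isset($joinedAt[$uid])
            && ($event['time'] - $joinedAt[$uid]) <= $windowSeconds
        ) {
            $flagged[$uid] = true; // keyed by uid so each user is listed once
        }
    }
    return array_keys($flagged);
}

$events = [
    ['uid' => 7, 'type' => 'join',      'time' => 100],
    ['uid' => 7, 'type' => 'new_topic', 'time' => 160],    // 60s after joining
    ['uid' => 8, 'type' => 'join',      'time' => 100],
    ['uid' => 8, 'type' => 'new_topic', 'time' => 100000], // long after joining
    ['uid' => 9, 'type' => 'new_topic', 'time' => 50],     // join not observed
];
print_r(findJoinThenTopic($events, 600)); // flags only uid 7
```

A flag like this is only a hint for a moderator to look, since plenty of genuine new members post soon after joining.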


This is really neat. If the action relates to a topic, can the topic name be linked to the actual topic? (I saw people viewing topics that I hadn't read in a long time and recall them being interesting; currently I have to go back to the forum and search for them.)

Related to PWGeeks - is the "last updated" status the last time the related project was (determined to be) updated, or the last time PWGeeks checked the project's status? I ask because the very last page (279 as of right now) shows a last update time based on my current clock time. Ordering by "Recently Modified" shows FieldtypeFolderOptions on the first paginated page, but it was last updated (on GitHub) 4 years ago.

I love the PWGeeks project, btw. ❤️

image.thumb.png.793664ab3c734d6d550362ed24ce90ab.png


@BrendonKoz Thank you for the feedback, and I'm glad the PWGeeks Catalog is of use to you!

WRT the activity tracker: whilst links to the forum topic could easily be added, I haven't done it yet. Leaving them out was a deliberate decision at the time. I don't think I wanted to store the additional URL data in the SQLite DB, but I can't recall exactly why now. As you say, it would be rather useful if they linked over to the actual posts on the forum. I'll take a look the next time I update this part of the site.

To answer your Q about the PWGeeks catalog: that timestamp is meant to be the last time GitHub/GitLab/PW etc. report the project as being updated - but it looks like I have a bug there and it's just showing the last time the cached data was updated for just about everything.


@BrendonKoz Topic and forum links added to the activity list. Please reload your page to see them.

I still need to add detection for the links to new topics and replies - but that will be another day, as will looking at the update date on PWGeeks.
