Crest Research - from WordPress to ProcessWire

millipedia · September 10, 2021

Ok - this write up got quite long so I reckon it counts as a case study.

Crest Research is a hub for academic articles and information about security research collated by the University of Lancaster and other universities in the UK.

Their old site had been running for several years on WordPress. There was a lot of content that wasn't brilliantly organised and there were lots of plugins that had been added to WP (honestly one of the things I like least about WP is that it's too easy for users to add plugins without really understanding the implications).

We persuaded the University that it would be much better to make the move to ProcessWire. No small part of that was being able to demonstrate that PW was a much better option from a security point of view. We also wanted to be able to develop an API that provides content to a native app we built for Crest a while back; that probably would have been doable in WP but much easier in PW.

This was our first reasonably large move from WP to PW so we learnt a lot on the way.

So - first step was to import all of the posts from WP. For this we headed to Adrian's ProcessMigrator module which worked well at getting the data over. Once we had the data over we used Wanze's ProcessBatcher module to do bulk updates and moves to try and organise things (including deleting a load of WP tags we didn't want to keep).

We found that we needed to import certain things manually as well, in particular some thumbnail images. For these we just created an import script that read through a CSV of data that we'd dumped from the WP database. Honestly PW is just great at this - we had a column of page aliases and a column of image URLs and with about 10 lines of code we managed to download the images and add them to our imported pages.
We've used this method of a CSV and an import script on a couple of other projects where we've needed to load content from other platforms and it's been very straightforward and effective.
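
As a rough sketch of that kind of import (not the actual project script): the CSV layout (a header row, then alias and image URL columns), the `thumbnail` field name and both function names below are assumptions made for illustration. The ProcessWire calls used are `$pages->get()`, `$sanitizer->pageName()`, `Pageimages::add()` (which accepts a URL and downloads the file) and `$page->save()`.

```php
<?php
// Sketch only: CSV layout and the "thumbnail" field name are assumptions.

/** Read alias/URL pairs from a CSV file, skipping the header and blank rows. */
function readImportRows(string $csvPath): array
{
    $handle = fopen($csvPath, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $csvPath");
    }
    $rows = [];
    fgetcsv($handle, 0, ',', '"', '\\'); // discard the header row
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if (count($row) < 2 || trim((string) $row[0]) === '') {
            continue; // blank or malformed line
        }
        $rows[] = ['alias' => trim($row[0]), 'url' => trim($row[1])];
    }
    fclose($handle);
    return $rows;
}

/**
 * Attach each remote image to the first page whose name matches the alias.
 * $pages and $sanitizer are the ProcessWire API variables.
 * Returns the number of pages updated.
 */
function importThumbnails($pages, $sanitizer, array $rows, string $field = 'thumbnail'): int
{
    $imported = 0;
    foreach ($rows as $row) {
        $page = $pages->get('name=' . $sanitizer->pageName($row['alias']));
        if (!$page->id) {
            continue; // no imported page with that alias
        }
        $page->of(false);                     // output formatting off before editing
        $page->get($field)->add($row['url']); // downloads the remote image
        $page->save($field);
        $imported++;
    }
    return $imported;
}
```

From a bootstrapped script (for example after including ProcessWire's index.php), this might be called as `importThumbnails($wire->pages, $wire->sanitizer, readImportRows('thumbnails.csv'));`, where `thumbnails.csv` is a hypothetical file name.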

Once we had the content over there were a couple of other bits of functionality from the WP site that we needed to replicate. One of these was a download manager. The old site kept statistics of the number of file downloads which we needed to replicate (and retain the old data) so we built a module to handle that. This was pretty much our first foray into PW module development and Bernhard's blog on building admin modules was very useful.

We tried to remove as many WP shortcodes as possible but those we needed to keep we replicated in the Hanna Code module.

The search on the new site was very important to the client - a lot of their target audience is researchers and academics. We ended up with a system of filters (author, tag etc) together with the text matching operators that appeared in ProcessWire 3.0.160.
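
To give a feel for how filters and a text operator can combine into a single selector, here is a minimal sketch. It is not the site's actual search code: the `article` template, the `author`, `tags`, `title`, `summary` and `body` field names and the function name are all assumptions. `~|*=` is one of the operators added in 3.0.160 (it matches pages containing any of the words, including partial words), and `$sanitizer->selectorValue()` is the standard way to make user input safe for a selector.

```php
<?php
// Sketch only: template and field names are assumptions.

/**
 * Build a selector string from a free-text query plus optional page-reference
 * filters given as page IDs, e.g. ['author' => 1234].
 */
function buildSearchSelector($sanitizer, string $query, array $filters = []): string
{
    $parts = ['template=article'];

    // Author and tag filters are assumed to be page reference fields.
    foreach (['author', 'tags'] as $field) {
        if (!empty($filters[$field])) {
            $parts[] = $field . '=' . (int) $filters[$field];
        }
    }

    // Text search: match any of the words, partial words allowed.
    $q = $sanitizer->selectorValue($query);
    if ($q !== '') {
        $parts[] = 'title|summary|body~|*=' . $q;
    }

    return implode(', ', $parts);
}
```

The result would then be passed to `$pages->find()`. Swapping `~|*=` for another operator such as `*=` (contains the exact phrase) changes how strict the text match is without touching the filters.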

The client also asked if we could add some kind of fuzzy searching for misspelled words and US / UK spelling differences, which we did by adding to the lemmas in Ryan's WireWordTools module. Our additions are available on GitHub. I think there's still plenty of refining to do on the search but it works well.

Another thing the client asked for was an indication of 'Reading Time' for an article as Medium have on their articles. We added a hook to calculate the reading time for an article when the page is saved. Can't seem to find the blogpost on Medium where they explained their formula but I've stuck the code we ended up with up on GitHub as a gist here.
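
For anyone who can't get to the gist, a reading-time calculation along these lines can be sketched as below. This is not the code from the gist and it is not Medium's formula: the 200 words-per-minute default, the function name and the `article`, `body` and `reading_time` names in the commented hook are all assumptions.

```php
<?php
// Sketch only: a plain words-per-minute estimate with an assumed reading speed.

/** Estimate reading time in whole minutes (never less than 1) for HTML content. */
function readingTimeMinutes(string $html, int $wordsPerMinute = 200): int
{
    // Replace tags with spaces so words in adjacent elements don't run together.
    $text = preg_replace('/<[^>]+>/', ' ', $html);
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    $words = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    return max(1, (int) ceil(count($words) / $wordsPerMinute));
}

// Possible hook in site/ready.php so the value is stored whenever a page is saved:
//
// $wire->addHookAfter('Pages::saveReady', function ($event) {
//     $page = $event->arguments(0);
//     if ($page->template != 'article') return;
//     $page->reading_time = readingTimeMinutes((string) $page->body);
// });
```

Calculating on save rather than on every page view keeps the front-end templates simple, since they only need to output the stored number.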

Other modules we used include:

AIOM+ - this was before we got a license for ProCache. We used AIOM and then some hooks to generate cached versions of some chunks of html using MarkupCache. Probably wouldn't bother now and just use ProCache.

Redirects - we tried to keep the site structure the same as the old site for SEO, but there was quite a lot of reorganising. We grabbed the top few hundred pages from Google Analytics and then ran those through a PHP script to check their status on the dev site (gist of that script on GitHub as well). We dumped those results into a spreadsheet and decided where they needed to be redirected to. Then we imported that list back into the Redirects module.
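
A status check like that can be sketched roughly as follows. This is not the script from the gist: the function names and the example hosts are made up for illustration, and it assumes PHP's curl extension is available.

```php
<?php
// Sketch only: function names and example hosts are hypothetical.

/** Map a live URL (or a bare path from an analytics export) onto the dev site. */
function toDevUrl(string $liveUrl, string $devBase): string
{
    $parts = parse_url($liveUrl);
    $path  = $parts['path'] ?? '/';
    $query = isset($parts['query']) ? '?' . $parts['query'] : '';

    return rtrim($devBase, '/') . $path . $query;
}

/** Return the HTTP status for a URL without following redirects (0 if the request fails). */
function httpStatus(string $url): int
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // HEAD-style request, headers only
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => false, // report 301/302 rather than the final page
        CURLOPT_TIMEOUT        => 10,
    ]);
    curl_exec($ch);

    return (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}

/** Check every URL against the dev site and return rows ready for a spreadsheet. */
function checkUrls(array $liveUrls, string $devBase): array
{
    $results = [];
    foreach ($liveUrls as $liveUrl) {
        $devUrl    = toDevUrl($liveUrl, $devBase);
        $results[] = [$liveUrl, $devUrl, httpStatus($devUrl)];
    }
    return $results;
}
```

Anything that comes back as a 404 on the dev site is then a candidate for an entry in the redirects list.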

Other honourable mentions go to Connect Page Fields, Page Field Edit Links, Dashboard, Schedule Pages and of course Tracy Debugger (which was particularly useful on this project).

So... I'm sure you're asking (assuming anyone has made it this far) what the end site was like and whether the client was happy?

Well comparing the old and new sites in Lighthouse gave us this:

And Mozilla Observatory gave these rather nice results (especially since it's a security focused site)

This resulted in a big upswing in traffic. We're seeing about a 500% increase in visitors compared to this time last year (and from pretty good numbers in the first place).

IMO the biggest factor in this increase was the improved page speeds.

Now - of course we probably could have got similar results in WordPress if we'd spent enough time and energy on the site but by using PW we've ended up with a much cleaner site which the client is happy to use. Logging into the old site with its upselling of plugins and so on is just painful. We've also educated the client as to why adding random plugins is not a good thing; the old site loaded 18 JavaScript files from various sites, most of which we didn't know anything about - we have 3 now (and one of those is analytics, which we tried to persuade them to lose).

Anyway - they're happy and we've got plans to keep developing the site over the next couple of years so hopefully it's just going to keep getting better.

s.

Zeka · September 10, 2021

Hi @millipedia.

Nice write-up.

How did you get forcing of PDF downloading?
Recently I was working on a project where they had a large number of science articles in PDFs and they wanted to force downloading of these files with custom file names (user provided).
Here is the code that I use, but for some reason some of the PDF files still open 'inline' in Chrome.

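$wire->addHook('/download/{pageid}/{basename}/', function ($event) {
  // Look up the page by the numeric ID taken from the URL.
  $page = $this->wire()->pages->get((int) $event->pageid);
  // getFile() returns null if the page has no file with that basename.
  $file = $page->id ? $page->filesManager->getFile($event->basename) : null;

  if ($file) {
    // Use the file description as the download name when one has been entered.
    $filename = $file->description ? $file->description . '.' . $file->ext : $file->basename;

    wireSendFile($file->filename, [
      'downloadFilename' => $filename,
      'forceDownload'    => true,
      'exit'             => true,
    ]);
  } else {
    wire404();
  }
});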

millipedia · September 10, 2021

50 minutes ago, Zeka said:

How did you get forcing of PDF downloading?

Honestly I didn't even try. I'm literally using

// download file.
wireSendFile($filepath);

in my download script, so no options at all.

I don't know the ins and outs of wireSendFile but in the past when I've had to implement my own downloads then I've had issues using "application/octet-stream" for large pdf files and had to switch to redirects. Does Chrome consistently open all your files inline whatever the size?

But hey, try taking out the options and see what happens...

Chris-PW · September 12, 2021

On 9/10/2021 at 2:18 PM, Zeka said:
How did you get forcing of PDF downloading?

Hello, I'm not sure if that's what you mean. But you can specify a download of files directly as an attribute in the HTML.

Use the download attribute on the a element:

<a href="/OldName.pdf" download="NewName"> Download PDF</a>

(The filename value is optional.)

eelkenet · September 13, 2021

On 9/12/2021 at 11:31 AM, Chris-PW said:

<a href="/OldName.pdf" download="NewName"> Download PDF</a>

(The value "Filename" is optional.)

Note: if you want to force a download but don't want to change the downloaded filename, you should omit the value of the download attribute like so: <a href="yourfile.pdf" download>link</a>
