ryan

Module: Page Link Abstractor

Page Link Abstractor module for ProcessWire 2

Plugin module that lets you move pages in your site without worry of ever breaking static links on other pages.

Download at:

https://github.com/ryancramerdesign/PW2-PageLinkAbstractor

What it does


Converts links in textarea/rich-text fields to an abstract format for storage, and converts them back at runtime. This means that if you move a page and another page is linking to it, the link won't be broken. It also means you can move your site from subdirectory to root (or the opposite) and not break links you may have created in your textarea fields. This applies to any kind of links: pages, files, images, etc.

This module will also notify you when you edit a page that has a link to a page that doesn't exist or is in the trash.

This module has been tested with ProcessWire 2.1 but should also work with 2.0.

Since this module has not yet had a lot of testing, it should be considered "beta" and use is at your own risk. Please let me know of any issues or bugs you run into.

How to install


1. Download the PageLinkAbstractor.module file from https://github.com/ryancramerdesign/PW2-PageLinkAbstractor and place it in: /site/modules/

2. In the admin control panel, go to Modules. At the bottom of the screen, click the "Check for New Modules" button.

3. Now scroll to the PageLinkAbstractor module and click "Install".

4. Edit any of your textarea fields in Setup > Fields. You'll see a new configuration option to enable this module for that field.

How it works


This is a technical explanation of how this module works, for those who are interested. You don't need to read it to use the module, but it may help you use it more effectively.

When you save a page that has a textarea field with this module enabled, it will look for URLs in HTML attributes by looking for an equals sign followed by a URL. It replaces instances of your site's root URL with a special tag: {~root_url}

Next it checks to see if any of the URLs it found can be loaded as pages in your site. If so, it replaces those URLs with this special tag: {~page_123_url} where "123" is the Page's ID.

When the page is loaded, ProcessWire does the opposite and converts those special tags back to their URLs. Because the URLs were abstracted to tags that are generated at runtime, when a page (or a site) is moved, no links are broken.
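The round trip described above can be sketched in plain PHP. This is only an illustration under stated assumptions, not the module's actual code: `abstractLinks()`, `restoreLinks()`, the `/mysite/` root URL and the `$pagesByPath` lookup array (standing in for loading pages by path) are all hypothetical, and the pattern only handles quoted attribute values.

```php
<?php
// Hypothetical stand-in for the page tree: path below the site root => page ID.
$pagesByPath = ['about/team/' => 123];

/** Convert local URLs in attribute values to abstract tags for storage. */
function abstractLinks(string $html, string $rootUrl, array $pagesByPath): string {
    // An equals sign, a quote, then a URL that begins with the site's root URL.
    $pattern = '!=(["\'])' . preg_quote($rootUrl, '!') . '([^"\']*)\1!';
    return preg_replace_callback($pattern, function (array $m) use ($pagesByPath) {
        [, $quote, $path] = $m;
        if (isset($pagesByPath[$path])) {
            // The URL resolves to a page, so store the page ID instead of the path.
            return sprintf('=%s{~page_%d_url}%s', $quote, $pagesByPath[$path], $quote);
        }
        // Any other local URL (file, image, ...) only has its root abstracted.
        return sprintf('=%s{~root_url}%s%s', $quote, $path, $quote);
    }, $html);
}

/** Convert stored tags back to URLs using the current root and page paths. */
function restoreLinks(string $stored, string $rootUrl, array $pagesByPath): string {
    $pathsById = array_flip($pagesByPath);
    $out = preg_replace_callback('!\{~page_(\d+)_url\}!', function (array $m) use ($rootUrl, $pathsById) {
        $id = (int) $m[1];
        // Leave the tag untouched when the page can no longer be found.
        return isset($pathsById[$id]) ? $rootUrl . $pathsById[$id] : $m[0];
    }, $stored);
    return str_replace('{~root_url}', $rootUrl, $out);
}

$html = '<a href="/mysite/about/team/">Team</a> <img src="/mysite/site/assets/logo.png">';
$stored = abstractLinks($html, '/mysite/', $pagesByPath);
echo $stored, "\n";
// <a href="{~page_123_url}">Team</a> <img src="{~root_url}site/assets/logo.png">

// Later the page is moved and the site is relocated from /mysite/ to the root.
$afterMove = ['company/team/' => 123];
echo restoreLinks($stored, '/', $afterMove), "\n";
// <a href="/company/team/">Team</a> <img src="/site/assets/logo.png">
```

Because the stored text holds only the page ID and a root placeholder, neither the page move nor the change of root URL requires touching the saved content.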

Note that this module only converts URLs to tags when you save a page, so it only affects pages saved after the module is installed.

Where to use it


This would be most useful on your main 'body' field that uses a rich text editor (like TinyMCE).

Where not to use it


There is some overhead in using this module, but it will be insignificant if you use it carefully. Here are a few cases where you should avoid using it:

Avoid use on fields that have the 'autojoin' option on, unless your site doesn't load lots of pages in a given request.

Don't use on textarea fields that can contain anonymous (guest) user input.

Avoid use on fields that aren't likely to contain links to local site pages in HTML markup. No need to have this module parsing things unnecessarily.

Avoid use on fields where you think you might disable it later. Once disabled, the abstract tags representing the URLs will still be in place. If the module is disabled, those tags will no longer be converted to URLs at runtime. You would have to correct them manually by editing each page.

Side benefit


The tags that this module abstracts to are intentionally fulltext indexable, so you can perform searches for these tags.

This means that you can find all pages linking to a given page by searching for its tag (minus the brackets and "~"). For example:

$links = $pages->find("body~=page_123_url");

That would return all [viewable and visible] pages linking to page ID 123.
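If you need this search for more than one page, the selector can be built from the page ID. The sketch below is an assumption-laden illustration: `linkTag()` and `linkSelector()` are hypothetical helper names, not part of the module or of ProcessWire, and `body` is simply the field name from the example above.

```php
<?php
// Hypothetical helpers (not part of PageLinkAbstractor or the ProcessWire API).

/** Build the searchable part of a page tag, without the brackets and "~". */
function linkTag(int $pageId): string {
    return "page_{$pageId}_url";
}

/** Build a selector matching pages whose $field contains a link to $pageId. */
function linkSelector(string $field, int $pageId): string {
    return $field . '~=' . linkTag($pageId);
}

echo linkSelector('body', 123), "\n"; // body~=page_123_url

// Inside a ProcessWire template the selector would be passed to $pages->find():
// foreach ($pages->find(linkSelector('body', $page->id)) as $linking) {
//     echo $linking->url, "\n";
// }
```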

Please note


To convert between URLs and page tags, this module has to load the linked pages to obtain their URLs. If you link to a hundred pages in your 'body' field, expect slower page 'load' and page 'save' times for pages containing that many links.

This module doesn't yet abstract local URLs that include a scheme/protocol and domain. It only works with path-type links like /path/to/page/ and not http://domain.com/path/to/page/.
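To make the distinction concrete, here is a small sketch of how a path-type link could be told apart from one with a scheme and domain, using PHP's built-in `parse_url()`. The function name `isPathTypeLink()` is hypothetical, and this is not necessarily how the module itself decides.

```php
<?php
/**
 * Hypothetical check: true for path-type links such as /path/to/page/,
 * false for URLs that carry a scheme and/or a domain.
 */
function isPathTypeLink(string $url): bool {
    $parts = parse_url($url);
    if ($parts === false) {
        return false; // seriously malformed URL
    }
    if (isset($parts['scheme']) || isset($parts['host'])) {
        return false; // e.g. http://domain.com/path/to/page/
    }
    // Must have a path that starts at the site root.
    return isset($parts['path']) && strncmp($parts['path'], '/', 1) === 0;
}

var_dump(isPathTypeLink('/path/to/page/'));                  // bool(true)
var_dump(isPathTypeLink('http://domain.com/path/to/page/')); // bool(false)
```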

This module hasn't yet been tested with migrating a site from subdirectory to root, but I will be testing this soon.


TL;DR so far, but it sounds awesome. I'll read this & comment more when I care again. Good job!


Yes, this sounds excellent. I think broken on-site links are one of the major pain points in many CMSs, at least when making structural changes to sites. Of course this is not such a big issue when the site is well built and the CMS supports clean relations (like PW does), but it's still an excellent addition.

I couldn't find the topic where this was discussed months ago - it would be nice to link it here if anyone remembers that topic?

I couldn't find the topic where this was discussed months ago - it would be nice to link it here if anyone remembers that topic?

I've been hunting for it too and can't find it. Now I'm wondering if this is something we were talking about via email rather than in the forum. I'll keep an eye out for it.


Great--glad you found it. Thanks for updating it.


I just began linking bits of a site together so thought I should check out this module. It's great as it's very similar to how I've worked in another CMS, plus I do like the added bonus of being able to find out what other pages link to a given page which could be useful on larger sites where there might be a possibility of pages being deleted and causing broken links.

Perhaps that might be a good addition to this module - a quick check on page delete to see if there are other pages linking to it first so the admin can then go and remove the links afterwards?


I think this would be possible with the way it's set up. Something to plan for the next version.


Thanks Ryan! This will come in handy when building my manual since each page will contain tons of links to other manual pages for additional reading.

The description above only seems to mention page moves, but I'd like to point out that this module also prevents link breakage when you change the URL of a page. It automatically fetches the new URL.

The description above only seems to mention page moves, but I'd like to point out that this module also prevents link breakage when you change the URL of a page. It automatically fetches the new URL.

ProcessWire will also handle this for you automatically (with redirects) if you install the Page Path History module (already included with the core).


Works like a charm with plain HTML input or WYSIWYG!

Though it would be killer if the plugin could detect Markdown link syntax as well! I could try to extend the plugin myself, but surely Ryan has better black-magic regex knowledge …

Here is [a question of love](/faq/misc/what-is-love) and you can convert it to an ID.
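A rough sketch of what such a Markdown extension might look like follows. It is not part of the module: `abstractMarkdownLinks()` is a hypothetical function, the `$idsByPath` array stands in for looking pages up by path, the page ID 1234 is made up, and the pattern ignores link titles and reference-style links.

```php
<?php
/**
 * Hypothetical sketch: replace local Markdown link targets with page tags.
 * $idsByPath maps a path (without surrounding slashes) to a page ID.
 */
function abstractMarkdownLinks(string $markdown, array $idsByPath): string {
    // [label](/local/path): the target must start with a slash and contain no spaces.
    $pattern = '!\[([^\]]*)\]\((/[^)\s]*)\)!';
    return preg_replace_callback($pattern, function (array $m) use ($idsByPath) {
        [$whole, $label, $path] = $m;
        $key = trim($path, '/');
        if (!isset($idsByPath[$key])) {
            return $whole; // not a known page, leave the link as written
        }
        return sprintf('[%s]({~page_%d_url})', $label, $idsByPath[$key]);
    }, $markdown);
}

$text = 'Here is [a question of love](/faq/misc/what-is-love) and you can convert it to an ID.';
echo abstractMarkdownLinks($text, ['faq/misc/what-is-love' => 1234]), "\n";
// Here is [a question of love]({~page_1234_url}) and you can convert it to an ID.
```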


There is a little overhead in using this module since it has to parse through the abstractions and translate them to/from URLs. There's also the aspect of being unable to stop using the module once you start (since abstracts get stored rather than URLs). As a result, I think it's best not to use this module and instead just do a search/replace on your SQL dump before you migrate it from a subdirectory install to a root install. Though I am hoping to have a native/core solution for this particular issue before long. 


@ryan: I understand your concerns, but I need an ID-based internal link system. This is not for launch/migration reasons, but for daily content business. My client will always be moving around pages in the site tree, so there will be changing URLs all the time (SEO is not a big issue here). So I’ll give it a try and advance your plugin to suit my needs!


I have to disagree with Ryan. It wouldn't be a problem to switch back from a module like this if really needed. The slight overhead isn't really something to worry about for what you get in return.

I have to disagree with Ryan. It wouldn't be a problem to switch back from a module like this if really needed.

The problem is that your links get converted to tags like {~page_123_url} and the only thing that knows what that is, is PageLinkAbstractor. So if you were to ever stop using PageLinkAbstractor, all those links would be broken with no clear way to fix them short of manually editing each page. But if people really like the module still, I do have a much newer version (compatible with PageLinkAbstractor) called LinkMonitor that I could release. The nice thing about LinkMonitor is that it alerts you when it finds a broken link too. It checks for broken image/page links in the background when rendering pages. The only reason I've not released it is that I've been thinking abstraction of links isn't a good idea since it is 1) a drug you can't stop using; and 2) ultimately makes the content less portable. But maybe these are worthwhile tradeoffs.

@ryan: I understand your concerns, but I need an ID-based internal link system. This is not for launch/migration reasons, but for daily content business. My client will always be moving around pages in the site tree, so there will be changing URLs all the time (SEO is not a big issue here). So I’ll give it a try and advance your plugin to suit my needs!

The 2.3 core comes with a module called PagePathHistory. It's not installed by default, but you can install it just by clicking "install" in the modules screen. It will keep track of all page movements and set up redirects to ensure links aren't broken.


Am I missing something obvious? If so, I'm sorry Ryan. Any abstraction can be reverted. If you really want or need to, just run a script that converts it back the same way PageLinkAbstractor does.


Of course, that's easy for you and me. But I'm guessing that most users would not know how to do that. It's a little bit of work to reverse what it does, and you have to know what you are doing. It's not feasible for the module itself to do it at uninstall just because it could take several executions to perform on a really large site. Basically, I don't like building/recommending solutions that can't be easily reversed just by uninstalling. But LinkMonitor/PageLinkAbstractor may be a special case.
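To give a rough idea of what such a reversal script involves, here is a sketch that reverts tags in limited batches so a large site could be processed over several executions. Everything in it is an assumption for illustration: `revertBatch()` is a hypothetical function, `$rows` stands in for stored field values keyed by page ID, and `$urlsById` stands in for looking up each page's current URL.

```php
<?php
/**
 * Hypothetical sketch: revert abstract tags in one batch of stored values.
 * $rows:     page ID => stored field text
 * $urlsById: page ID => current URL
 * Returns [updated rows, next offset, or null when everything is done].
 */
function revertBatch(array $rows, int $offset, int $limit, string $rootUrl, array $urlsById): array {
    $batch = array_slice($rows, $offset, $limit, true); // keep page-ID keys
    foreach ($batch as $pageId => $text) {
        $text = preg_replace_callback('!\{~page_(\d+)_url\}!', function (array $m) use ($urlsById) {
            // Keep the tag when the target page is unknown, so it can be fixed by hand.
            return $urlsById[(int) $m[1]] ?? $m[0];
        }, $text);
        $rows[$pageId] = str_replace('{~root_url}', $rootUrl, $text);
    }
    $next = $offset + $limit;
    return [$rows, $next < count($rows) ? $next : null];
}

$rows = [
    10 => '<a href="{~page_123_url}">Team</a>',
    11 => '<img src="{~root_url}site/assets/a.png">',
    12 => '<a href="{~page_999_url}">Gone</a>',
];
$urls = [123 => '/company/team/'];

// Each pass here would be a separate execution on a real site.
$offset = 0;
while ($offset !== null) {
    [$rows, $offset] = revertBatch($rows, $offset, 2, '/', $urls);
}
print_r($rows);
```

In this example the link to page 999 keeps its tag because no URL is known for it, which is the kind of leftover that would still need manual editing.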


 But if people really like the module still, I do have a much newer version (compatible with PageLinkAbstractor) called LinkMonitor that I could release.

Yes please!

Uninstalling the module would mean: have the current abstraction replaced with the last known real link. Done by the module uninstaller, if possible.

In a second step an admin could run a classic link checker online to identify broken links and fix them manually. - Is this thinking correct?

have the current abstraction replaced with the last known real link. Done by the module uninstaller, if possible.

This would not be scalable. A large quantity of abstracted links generated over a long period of time is not something that an uninstaller could remove in one request. That's one of the reasons why I'm not totally happy with abstracting links.

LinkMonitor actually does go further than just abstracting links. It also locates broken links to images and such, and then logs them (and emails you, if you want). So it's a nice upgrade from PageLinkAbstractor, and also compatible with it. I'll have to get back to work on it. 


Please note that Page Link Abstractor does not work with Multi-Language Fields. It works entirely with strings and does not return a proper object with multiple languages at the end of the abstraction process.

I’ve switched to Page Path History now, and I am looking forward to LinkMonitor. Hope this plugin will be able to change all links in all pages to the most recent version in page url history.


Found this topic and wanted to ask a few questions to revive it:

  1. The module has not been updated since version 2.2. Is it still usable? If it is, should the compatibility be updated?
  2. What about LinkMonitor? Has it ever been released?
  3. Is there a recommended solution for moving sites from subdirectory to root (excluding search and replace in db dump)?

Thank you!
