Module: Page Link Abstractor

ryan · May 27, 2011

Page Link Abstractor module for ProcessWire 2

Plugin module that lets you move pages in your site without worry of ever breaking static links on other pages.

Download at:

https://github.com/ryancramerdesign/PW2-PageLinkAbstractor

What it does

Converts links in textarea/rich-text fields to an abstract format for storage, and converts them back at runtime. This means that if you move a page and another page is linking to it, the link won't be broken. It also means you can move your site from subdirectory to root (or the opposite) and not break links you may have created in your textarea fields. This applies to any kind of links: pages, files, images, etc.

This module will also notify you when you edit a page that has a link to a page that doesn't exist or is in the trash.

This module has been tested with ProcessWire 2.1 but should also work with 2.0.

Since this module has not yet had a lot of testing, it should be considered "beta" and use is at your own risk. Please let me know of any issues or bugs you run into.

How to install

1. Download the PageLinkAbstractor.module file from https://github.com/ryancramerdesign/PW2-PageLinkAbstractor and place it in: /site/modules/

2. In the admin control panel, go to Modules. At the bottom of the screen, click the "Check for New Modules" button.

3. Now scroll to the PageLinkAbstractor module and click "Install".

4. Edit any of your textarea fields in Setup > Fields. You'll see a new configuration option to enable this module for that field.

How it works

This is a technical explanation for how this module works for those that are interested. Reading this is not required in order to use this module, but it may help you to use it more effectively.

When you save a page that has a textarea field with this module enabled, it will look for URLs in HTML attributes by looking for an equals sign followed by a URL. It replaces instances of your site's root URL with a special tag: {~root_url}

Next it checks to see if any of the URLs it found can be loaded as pages in your site. If so, it replaces those URLs with this special tag: {~page_123_url} where "123" is the Page's ID.

When the page is loaded, ProcessWire does the opposite and converts those special tags back to their URLs. Because the URLs were abstracted to tags that are generated at runtime, when a page (or a site) is moved, no links are broken.
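The save/load round trip described above can be sketched as follows. This is a minimal standalone illustration in Python, not the module's actual PHP code: `ROOT_URL`, the `PAGES` lookup, and both function names are hypothetical stand-ins, and the regular expression is only an assumption about what "an equals sign followed by a URL" might look like. It also does the root and page replacement in a single pass, which may differ from how the module does it.

```python
import re

# Hypothetical stand-ins for the site: its root URL and a path -> page ID lookup.
ROOT_URL = "/mysite/"
PAGES = {"/mysite/about/": 123, "/mysite/about/team/": 124}

# An equals sign, an opening quote, then a URL that begins with the root URL.
ATTR_URL = re.compile(r'(=\s*["\'])(' + re.escape(ROOT_URL) + r'[^"\']*)')


def abstract_links(html):
    """Replace local URLs with tags before the text is stored (on save)."""
    def to_tag(match):
        prefix, url = match.group(1), match.group(2)
        page_id = PAGES.get(url)
        if page_id is not None:
            # The URL resolves to a page: store the page ID instead of the path.
            return f"{prefix}{{~page_{page_id}_url}}"
        # Not a page (for example a file or image): abstract only the root part.
        return prefix + "{~root_url}" + url[len(ROOT_URL):]
    return ATTR_URL.sub(to_tag, html)


def restore_links(text, root_url, url_by_id):
    """Convert tags back to URLs when the page is loaded."""
    def to_url(match):
        page_id = int(match.group(1))
        # Unknown IDs are left as tags so they can be spotted later.
        return url_by_id.get(page_id, match.group(0))
    text = re.sub(r"\{~page_(\d+)_url\}", to_url, text)
    return text.replace("{~root_url}", root_url)


stored = abstract_links(
    '<a href="/mysite/about/">About</a> <img src="/mysite/site/assets/logo.png">'
)
print(stored)
# Simulate moving the site to the root and the page to a new parent.
print(restore_links(stored, "/", {123: "/company/about/"}))
```

Because only the tag is stored, the second call produces working URLs even though both the site root and the page's path have changed since the text was saved.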

Note that this module only converts URLs to tags when you save a page, so it only affects pages saved after the module is installed.

Where to use it

This would be most useful on your main 'body' field that uses a rich text editor (like TinyMCE).

Where not to use it

There is some overhead in using this module, though it will be insignificant if you use it carefully. Here are a few cases where you should avoid using it:

Avoid use on fields that have the 'autojoin' option on, unless your site doesn't load lots of pages in a given request.

Don't use on textarea fields that can contain anonymous (guest) user input.

Avoid use on fields that aren't likely to contain links to local site pages in HTML markup. No need to have this module parsing things unnecessarily.

Avoid use on fields where you think you might disable it later. Once disabled, the abstract tags representing the URLs will still be in place. If the module is disabled, those tags will no longer be converted to URLs at runtime. You would have to correct them manually by editing the page.

Side benefit

The tags that this module abstracts to are intentionally fulltext indexable, so you can perform searches for these tags.

This means that you can find all pages linking to another page by searching for its tag (minus the braces and "~"). For example:

$links = $pages->find("body~=page_123_url");

That would return all [viewable and visible] pages linking to page ID 123.

Please note

To convert URLs for pages, this module needs to load those pages to obtain their URLs. If you link to a hundred pages in your 'body' field, expect slower page 'load' and page 'save' times for pages containing that many links.

This module doesn't yet abstract local URLs that include a scheme/protocol and domain. It only works with path-type links like /path/to/page/ and not http://domain.com/path/to/page/.
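To make the distinction between the two kinds of link concrete, here is a small Python sketch using the standard library's `urllib.parse.urlsplit`. The function name `is_path_type` is hypothetical, and this is only one way of drawing the line, not the check the module itself performs.

```python
from urllib.parse import urlsplit


def is_path_type(url):
    """True for links like /path/to/page/ that carry no scheme or domain."""
    parts = urlsplit(url)
    return not parts.scheme and not parts.netloc and parts.path.startswith("/")


for candidate in ("/path/to/page/", "http://domain.com/path/to/page/"):
    print(candidate, is_path_type(candidate))
```

Under this sketch, protocol-relative links such as //domain.com/path/ also count as having a domain and would be left alone.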

This module hasn't yet been tested with migrating a site from subdirectory to root, but I will be testing this soon.

Adam Kiss · May 27, 2011

TL;DR so far, but it sounds awesome. I'll read this & comment more when I care again. Good job!

apeisa · May 28, 2011

Yes, this sounds excellent. I think broken on-site links are one of the major pain points in many CMSs, at least when doing structural changes to sites. Of course this is not such a big issue when a site is well built and the CMS supports clean relations (like PW does), but it's still an excellent addition.

I couldn't find the topic where this was discussed months ago - it would be nice to link it here if anyone remembers that topic?

ryan · May 29, 2011

apeisa wrote: "I couldn't find the topic where this was discussed months ago - it would be nice to link it here if anyone remembers that topic?"

I've been hunting for it too and can't find it. Now I'm wondering if this is something we were talking about via email rather than in the forum. I'll keep an eye out for it.

apeisa · May 30, 2011

ryan wrote: "I've been hunting for it too and can't find it. Now I'm wondering if this is something we were talking about via email rather than in the forum. I'll keep an eye out for it."

I did find it and added a cross link from my reply: http://processwire.com/talk/index.php/topic,171.msg1124.html#msg1124

ryan · May 31, 2011

Great--glad you found it. Thanks for updating it.

Pete · September 24, 2011

I just began linking bits of a site together so thought I should check out this module. It's great as it's very similar to how I've worked in another CMS, plus I do like the added bonus of being able to find out what other pages link to a given page which could be useful on larger sites where there might be a possibility of pages being deleted and causing broken links.

Perhaps that might be a good addition to this module - a quick check on page delete to see if there are other pages linking to it first so the admin can then go and remove the links afterwards?

ryan · September 24, 2011

I think this would be possible with the way it's setup. Something to plan for the next version.

woop · January 3, 2013

Thanks Ryan! This will come in handy when building my manual since each page will contain tons of links to other manual pages for additional reading.

The description above only seems to mention page moves, but I'd like to point out that this module also prevents link breakage when you change the URL of a page. It automatically fetches the new URL.

ryan · January 4, 2013

woop wrote: "The description above only seems to mention page moves, but I'd like to point out that this module also prevents link breakage when you change the URL of a page. It automatically fetches the new URL."

ProcessWire will also handle this for you automatically (with redirects) if you install the Page Path History module (already included with the core).

Nico Knoll · April 27, 2013

Used it today for the first time. It's working great!

gerritvanaaken · May 3, 2013

Works like a charm with plain HTML input or WYSIWYG!

Though, it would be killer if the plugin could detect Markdown link syntax as well! I could try to extend the plugin myself, but surely Ryan has got better black magic regex knowledge …

Here is [a question of love](/faq/misc/what-is-love) and you can convert it to an ID.
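One way such Markdown detection could work is sketched below in Python. This is an illustration only, not the code of the module or of any fork: the `PAGES` lookup, the page ID 123, and the function name are hypothetical, and the regular expression handles only simple inline links without titles or nested brackets.

```python
import re

# Hypothetical path -> page ID lookup standing in for the site's page tree.
PAGES = {"/faq/misc/what-is-love": 123}

# Inline Markdown link: [label](url), with no whitespace or ")" in the URL.
MD_LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")


def abstract_markdown_links(text):
    """Replace the URL of each Markdown link with a page tag when it is a known page."""
    def to_tag(match):
        label, url = match.group(1), match.group(2)
        page_id = PAGES.get(url)
        if page_id is None:
            return match.group(0)  # leave unknown or external links untouched
        return f"[{label}]({{~page_{page_id}_url}})"
    return MD_LINK.sub(to_tag, text)


print(abstract_markdown_links(
    "Here is [a question of love](/faq/misc/what-is-love) and you can convert it to an ID."
))
```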

ryan · May 3, 2013

There is a little overhead in using this module since it has to parse through the abstractions and translate them to/from URLs. There's also the aspect of being unable to stop using the module once you start (since abstracts get stored rather than URLs). As a result, I think it's best not to use this module and instead just do a search/replace on your SQL dump before you migrate it from a subdirectory install to a root install. Though I am hoping to have a native/core solution for this particular issue before long.

gerritvanaaken · May 6, 2013

@ryan: I understand your concerns, but I need an ID-based internal link system. This is not for launch/migration reasons, but for daily content business. My client will always be moving around pages in the site tree, so there will be changing URLs all the time (SEO is not a big issue here). So I’ll give it a try and advance your plugin to suit my needs!

Soma · May 6, 2013

I have to disagree with Ryan. It wouldn't be a problem to switch back from a module like this if really needed. The slight overhead isn't really something to worry about for what you get in return.

ryan · May 7, 2013

Soma wrote: "I have to disagree with Ryan. It wouldn't be a problem to switch back from a module like this if really needed."

The problem is that your links get converted to tags like {~page_123_url} and the only thing that knows what that is, is PageLinkAbstractor. So if you were to ever stop using PageLinkAbstractor, all those links would be broken with no clear way to fix them short of manually editing each page. But if people really like the module still, I do have a much newer version (compatible with PageLinkAbstractor) called LinkMonitor that I could release. The nice thing about LinkMonitor is that it alerts you when it finds a broken link too. It checks for broken image/page links in the background when rendering pages. The only reason I've not released it is that I've been thinking abstraction of links isn't a good idea since it is 1) a drug you can't stop using; and 2) ultimately makes the content less portable. But maybe these are worthwhile tradeoffs.

gerritvanaaken wrote: "@ryan: I understand your concerns, but I need an ID-based internal link system. This is not for launch/migration reasons, but for daily content business. My client will always be moving around pages in the site tree, so there will be changing URLs all the time (SEO is not a big issue here). So I’ll give it a try and advance your plugin to suit my needs!"

The 2.3 core comes with a module called PagePathHistory. It's not installed by default, but you can install it just by clicking "install" in the modules screen. It will keep track of all page movements and setup redirects to ensure links aren't broken.

Soma · May 7, 2013

Am I missing something obvious? If so, I'm sorry Ryan. Any abstraction can be reverted. If you really want or need to, just run a script that converts it back the same way PageLinkAbstractor does.

ryan · May 7, 2013

Of course, that's easy for you and me. But I'm guessing that most users would not know how to do that. It's a little bit of work to reverse what it does, and you have to know what you are doing. It's not feasible for the module itself to do it at uninstall just because it could take several executions to perform on a really large site. Basically, I don't like building/recommending solutions that can't be easily reversed just by uninstalling. But LinkMonitor/PageLinkAbstractor may be a special case.
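A reversal script of the kind discussed here might look roughly like the Python sketch below. It is an assumption-laden illustration, not part of the module: the stored texts are modeled as a plain dictionary keyed by page ID, `url_by_id` stands in for looking up each page's current URL, and all function names are hypothetical. It works in batches because, as noted above, a large site could need several executions.

```python
import re

TAG = re.compile(r"\{~page_(\d+)_url\}")


def revert_text(text, root_url, url_by_id):
    """Turn stored tags back into plain URLs so the text no longer needs the module."""
    # Tags whose page ID is unknown are left in place rather than guessed at.
    text = TAG.sub(lambda m: url_by_id.get(int(m.group(1)), m.group(0)), text)
    return text.replace("{~root_url}", root_url)


def revert_batch(bodies, root_url, url_by_id, start=0, batch_size=100):
    """Revert up to batch_size stored texts in place.

    Returns the start index for the next run, or None when everything is done.
    """
    page_ids = sorted(bodies)[start:start + batch_size]
    for page_id in page_ids:
        bodies[page_id] = revert_text(bodies[page_id], root_url, url_by_id)
    next_start = start + batch_size
    return next_start if next_start < len(bodies) else None


# Hypothetical stored field values keyed by page ID.
bodies = {
    1: '<a href="{~page_3_url}">x</a>',
    2: '<img src="{~root_url}a.png">',
    3: "no links",
}
position = 0
while position is not None:
    position = revert_batch(bodies, "/", {3: "/new/place/"}, position, batch_size=2)
print(bodies)
```

Returning the next start index lets each execution pick up where the previous one stopped, for example when the script is run repeatedly from a scheduled task.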

gerritvanaaken · May 10, 2013

Just for the record: I added Markdown compatibility as a fork of the original plugin:

https://github.com/gerritvanaaken/PW2-PageLinkAbstractor

ceberlin · May 11, 2013

ryan wrote: "But if people really like the module still, I do have a much newer version (compatible with PageLinkAbstractor) called LinkMonitor that I could release."

Yes please!

Uninstalling the module would mean: have the current abstraction replaced with the last known real link. Done by the module uninstaller, if possible.

In a second step, an admin could run a classic online link checker to identify broken links and fix them manually. Is this thinking correct?

ryan · May 14, 2013

ceberlin wrote: "...have the current abstraction replaced with the last known real link. Done by the module uninstaller, if possible."

This would not be scalable. A large quantity of abstracted links generated over a long period of time are not something that an uninstaller could remove in one request. That's one of the reasons why I'm not totally happy with abstracting links.

LinkMonitor actually does go further than just abstracting links. It also locates broken links to images and such, and then logs them (and emails you, if you want). So it's a nice upgrade from PageLinkAbstractor, and also compatible with it. I'll have to get back to work on it.

gerritvanaaken · June 3, 2013

Please note that Page Link Abstractor does not work with Multi-Language Fields. It works entirely with strings and does not return a proper object with multiple languages at the end of the abstraction process.

I’ve switched to Page Path History now, and I am looking forward to LinkMonitor. I hope that plugin will be able to change all links in all pages to the most recent version in the page URL history.

Ivan Gretsky · October 26, 2014

Found this topic and wanted to ask a few questions to revive it:

1. The module has not been updated since version 2.2. Is it still usable? If it is, should the compatibility be updated?
2. What about LinkMonitor? Has it ever been released?
3. Is there a recommended solution for moving sites from subdirectory to root (excluding search and replace in a DB dump)?

Thank you!

ceberlin · July 26, 2015

Uninstall the plugin? read here...
