Module: Page Link Abstractor


ryan

Page Link Abstractor module for ProcessWire 2

Plugin module that lets you move pages in your site without worry of ever breaking static links on other pages.

Download at:

https://github.com/ryancramerdesign/PW2-PageLinkAbstractor

What it does


Converts links in textarea/rich-text fields to an abstract format for storage, and converts them back at runtime. This means that if you move a page and another page is linking to it, the link won't be broken. It also means you can move your site from a subdirectory to root (or the reverse) without breaking links you may have created in your textarea fields. This applies to any kind of link: pages, files, images, etc.

This module will also notify you when you edit a page that has a link to a page that doesn't exist or is in the trash.

This module has been tested with ProcessWire 2.1 but should also work with 2.0.

Since this module has not yet had a lot of testing, it should be considered "beta" and use is at your own risk. Please let me know of any issues or bugs you run into.

How to install


1. Download the PageLinkAbstractor.module file from https://github.com/ryancramerdesign/PW2-PageLinkAbstractor and place it in: /site/modules/

2. In the admin control panel, go to Modules. At the bottom of the screen, click the "Check for New Modules" button.

3. Now scroll to the PageLinkAbstractor module and click "Install".

4. Edit any of your textarea fields in Setup > Fields. You'll see a new configuration option to enable this module for that field.

How it works


This is a technical explanation for how this module works for those that are interested. Reading this is not required in order to use this module, but it may help you to use it more effectively.

When you save a page that has a textarea field with this module enabled, it will look for URLs in HTML attributes by looking for an equals sign followed by a URL. It replaces instances of your site's root URL with a special tag: {~root_url}

Next it checks to see if any of the URLs it found can be loaded as pages in your site. If so, it replaces those URLs with this special tag: {~page_123_url} where "123" is the Page's ID.

When the page is loaded, ProcessWire does the opposite and converts those special tags back to their URLs. Because the URLs were abstracted to tags that are generated at runtime, when a page (or a site) is moved, no links are broken.

Note that this module only converts URLs to tags when you save a page, so it only affects pages saved after the module is installed.
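The save/load round trip described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the function names `abstractLinks` and `expandLinks`, the two lookup callbacks, and the regular expression are all assumptions made for the example. Only the tag formats `{~root_url}` and `{~page_123_url}` come from the description above.

```php
<?php
// Hypothetical sketch of the save/load round trip; not the module's actual code.

// On save: swap local URLs in HTML attributes for abstract tags.
// $rootUrl is assumed to end in a slash ("/" or "/subdir/").
// $pathToId is an assumed lookup: site-relative path => page ID, or 0 if no page matches.
function abstractLinks(string $html, string $rootUrl, callable $pathToId): string
{
    // An equals sign, a quote, the root URL, then the rest of the URL up to the closing quote.
    $pattern = '~=(["\'])' . preg_quote($rootUrl, '~') . '([^"\'\s>]*)\1~';
    return preg_replace_callback($pattern, function (array $m) use ($pathToId) {
        $id = $pathToId('/' . $m[2]);
        // Pages become ID tags; anything else local (files, images) keeps its path after a root tag.
        $tag = $id ? "{~page_{$id}_url}" : '{~root_url}' . $m[2];
        return '=' . $m[1] . $tag . $m[1];
    }, $html);
}

// On load: turn the tags back into URLs that are current at runtime.
// $idToUrl is an assumed lookup: page ID => that page's current URL.
function expandLinks(string $stored, string $rootUrl, callable $idToUrl): string
{
    $out = preg_replace_callback('/\{~page_(\d+)_url\}/', function (array $m) use ($idToUrl) {
        return $idToUrl((int) $m[1]);
    }, $stored);
    return str_replace('{~root_url}', $rootUrl, $out);
}

// Usage: abstract at save time, then expand after the page has moved.
$ids  = ['/about/contact/' => 123];
$html = '<a href="/about/contact/">Contact</a> <img src="/site/assets/files/1/a.jpg">';
$stored = abstractLinks($html, '/', function (string $path) use ($ids) {
    return $ids[$path] ?? 0;
});

$urls = [123 => '/company/contact/']; // page 123 now lives somewhere else
$live = expandLinks($stored, '/', function (int $id) use ($urls) {
    return $urls[$id] ?? '#';
});
```

Because the stored text holds only the page ID, the link follows the page wherever it moves. A real implementation would also need to decide what to do when an ID no longer resolves; the `'#'` fallback here is just a placeholder.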

Where to use it


This would be most useful on your main 'body' field that uses a rich text editor (like TinyMCE).

Where not to use it


There is some overhead in using this module that will be insignificant if you use it carefully. Here are a few instances to avoid using it:

Avoid use on fields that have the 'autojoin' option on, unless your site doesn't load lots of pages in a given request.

Don't use on textarea fields that can contain anonymous (guest) user input.

Avoid use on fields that aren't likely to contain links to local site pages in HTML markup. No need to have this module parsing things unnecessarily.

Avoid use on fields where you think you might disable it later. Once disabled, the abstract tags representing the URLs will still be in place. If the module is disabled, those tags will no longer be converted to URLs at runtime. You would have to correct them manually by editing the page.

Side benefit


The tags that this module abstracts to are intentionally fulltext indexable, so you can perform searches for these tags.

This means that you can find all pages linking to another by searching for its tag (minus the brackets and "~"). For example:

$links = $pages->find("body~=page_123_url");

That would return all [viewable and visible] pages linking to page ID 123.
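The opposite question, which pages a given field links out to, can be answered from the raw stored text before the tags are converted back. A minimal sketch, assuming the tag format described under "How it works"; the function name `linkedPageIds` is made up for this example and is not part of the module:

```php
<?php
// Hypothetical helper: list the page IDs referenced by abstract tags in stored text.
function linkedPageIds(string $stored): array
{
    // Capture the numeric ID from every {~page_<id>_url} tag.
    preg_match_all('/\{~page_(\d+)_url\}/', $stored, $matches);
    // Cast to int, drop repeats, and reindex in order of first appearance.
    return array_values(array_unique(array_map('intval', $matches[1])));
}

$stored = '<a href="{~page_123_url}">a</a> <a href="{~page_45_url}">b</a> <a href="{~page_123_url}">c</a>';
$ids = linkedPageIds($stored);
```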

Please note


In order to convert URLs for pages, this module needs to load those pages in order to obtain their URL. If you are linking to a hundred pages in your 'body' field, you should expect that it may slow down the page 'load' and page 'save' time for pages containing lots of links.

This module doesn't yet abstract local URLs that include a scheme/protocol and domain. It only works with path-type links like /path/to/page/ and not http://domain.com/path/to/page/.
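One possible workaround, sketched here purely as an assumption, is to reduce absolute links on your own domain to path-type links before the text is saved, so that they fall within what the module handles. The function name `localizeUrls` and the regular expression are hypothetical, and ports or other hostnames for the same site are not handled:

```php
<?php
// Hypothetical pre-processing step: strip scheme and host from links to our own domain.
function localizeUrls(string $html, string $host): string
{
    // An attribute value that starts with http(s)://<host>, followed by an optional path.
    $pattern = '~=(["\'])https?://' . preg_quote($host, '~') . '(/[^"\'\s>]*)?\1~i';
    return preg_replace_callback($pattern, function (array $m) {
        // A bare "http://domain.com" with no path becomes "/".
        $path = ($m[2] ?? '') !== '' ? $m[2] : '/';
        return '=' . $m[1] . $path . $m[1];
    }, $html);
}

$html  = '<a href="http://domain.com/path/to/page/">x</a> <a href="http://other.com/a/">y</a>';
$local = localizeUrls($html, 'domain.com');
```

Links to other domains are left alone because the host has to match exactly and be followed by a path or the closing quote.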

This module hasn't yet been tested with migrating a site from subdirectory to root, but I will be testing this soon.


Yes, this sounds excellent. I think broken on-site links are one of the major pain points in many CMSs, at least when making structural changes to a site. Of course this is not such a big issue when the site is well built and the CMS supports clean relations (like PW does), but it's still an excellent addition.

I couldn't find the topic where this was discussed months ago - it would be nice to link it here if anyone remembers that topic?


I couldn't find the topic where this was discussed months ago - it would be nice to link it here if anyone remembers that topic?

I've been hunting for it too and can't find it. Now I'm wondering if this is something we were talking about via email rather than in the forum. I'll keep an eye out for it.


  • 3 months later...

I just began linking bits of a site together so thought I should check out this module. It's great as it's very similar to how I've worked in another CMS, plus I do like the added bonus of being able to find out what other pages link to a given page which could be useful on larger sites where there might be a possibility of pages being deleted and causing broken links.

Perhaps that might be a good addition to this module - a quick check on page delete to see if there are other pages linking to it first so the admin can then go and remove the links afterwards?


  • 1 year later...

Thanks Ryan! This will come in handy when building my manual since each page will contain tons of links to other manual pages for additional reading.

The description above only seems to mention page moves, but I'd like to point out that this module also prevents link breakage when you change the URL of a page. It automatically fetches the new URL.


The description above only seems to mention page moves, but I'd like to point out that this module also prevents link breakage when you change the URL of a page. It automatically fetches the new URL.

ProcessWire will also handle this for you automatically (with redirects) if you install the Page Path History module (already included with the core).


  • 3 months later...

Works like a charm with plain HTML input or WYSIWYG!

Though it would be killer if the plugin could detect Markdown link syntax as well! I could try to extend the plugin myself, but surely Ryan has better black-magic regex knowledge …

Here is [a question of love](/faq/misc/what-is-love) and you can convert it to an ID.
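A rough idea of what detecting that syntax could look like, assuming the same `{~page_123_url}` tag format described in the first post. The function name `abstractMarkdownLinks`, the `$pathToId` lookup, and the regular expression are all made up for this sketch and are not part of the module; it only covers inline links with a path-type target, not reference-style links:

```php
<?php
// Hypothetical sketch: abstract inline Markdown links "[text](/path)" to page-ID tags.
// $pathToId is an assumed lookup: path => page ID, or 0 if no page matches.
function abstractMarkdownLinks(string $markdown, callable $pathToId): string
{
    // Group 1: "[text](" ; group 2: a target starting with a single slash, up to ")" or whitespace.
    $pattern = '~(\[[^\]]*\]\()(/(?!/)[^)\s]*)~';
    return preg_replace_callback($pattern, function (array $m) use ($pathToId) {
        $id = $pathToId($m[2]);
        // Leave the link untouched when the path is not a known page.
        return $m[1] . ($id ? "{~page_{$id}_url}" : $m[2]);
    }, $markdown);
}

$ids  = ['/faq/misc/what-is-love' => 77]; // made-up page ID
$text = 'Here is [a question of love](/faq/misc/what-is-love) and you can convert it to an ID.';
$out  = abstractMarkdownLinks($text, function (string $path) use ($ids) {
    return $ids[$path] ?? 0;
});
```

Whether the tags would then be converted back correctly at runtime depends on the order in which the Markdown formatter and the module run, which this sketch does not address.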


There is a little overhead in using this module since it has to parse through the abstractions and translate them to/from URLs. There's also the aspect of being unable to stop using the module once you start (since abstracts get stored rather than URLs). As a result, I think it's best not to use this module and instead just do a search/replace on your SQL dump before you migrate it from a subdirectory install to a root install. Though I am hoping to have a native/core solution for this particular issue before long. 


@ryan: I understand your concerns, but I need an ID-based internal link system. This is not for launch/migration reasons, but for daily content business. My client will always be moving around pages in the site tree, so there will be changing URLs all the time (SEO is not a big issue here). So I’ll give it a try and advance your plugin to suit my needs!


I have to disagree with Ryan. It wouldn't be a problem to switch back from a module like this if really needed. The slight overhead isn't really something to worry about for what you get in return.


I have to disagree with Ryan. It wouldn't be a problem to switch back from a module like this if really needed.

The problem is that your links get converted to tags like {~page_123_url} and the only thing that knows what that is, is PageLinkAbstractor. So if you were to ever stop using PageLinkAbstractor, all those links would be broken with no clear way to fix them short of manually editing each page. But if people really like the module still, I do have a much newer version (compatible with PageLinkAbstractor) called LinkMonitor that I could release. The nice thing about LinkMonitor is that it alerts you when it finds a broken link too. It checks for broken image/page links in the background when rendering pages. The only reason I've not released it is that I've been thinking abstraction of links isn't a good idea since it is 1) a drug you can't stop using; and 2) ultimately makes the content less portable. But maybe these are worthwhile tradeoffs.

@ryan: I understand your concerns, but I need an ID-based internal link system. This is not for launch/migration reasons, but for daily content business. My client will always be moving around pages in the site tree, so there will be changing URLs all the time (SEO is not a big issue here). So I’ll give it a try and advance your plugin to suit my needs!

The 2.3 core comes with a module called PagePathHistory. It's not installed by default, but you can install it just by clicking "install" in the modules screen. It will keep track of all page movements and setup redirects to ensure links aren't broken. 


Am I missing something obvious? If so, I'm sorry, Ryan. Any abstraction can be reverted. If you really want or need to, just run a script that converts the tags back the same way PageLinkAbstractor does.


Of course, that's easy for you and me. But I'm guessing that most users would not know how to do that. It's a little bit of work to reverse what it does, and you have to know what you are doing. It's not feasible for the module itself to do it at uninstall just because it could take several executions to perform on a really large site. Basically, I don't like building/recommending solutions that can't be easily reversed just by uninstalling. But LinkMonitor/PageLinkAbstractor may be a special case.


 But if people really like the module still, I do have a much newer version (compatible with PageLinkAbstractor) called LinkMonitor that I could release.

Yes please!

Uninstalling the module would mean: have the current abstraction replaced with the last known real link. Done by the module uninstaller, if possible.

In a second step an admin could run a classic link checker online to identify broken links and fix them manually. - Is this thinking correct?


have the current abstraction replaced with the last known real link. Done by the module uninstaller, if possible.

This would not be scalable. A large quantity of abstracted links generated over a long period of time is not something that an uninstaller could remove in one request. That's one of the reasons why I'm not totally happy with abstracting links.

LinkMonitor actually does go further than just abstracting links. It also locates broken links to images and such, and then logs them (and emails you, if you want). So it's a nice upgrade from PageLinkAbstractor, and also compatible with it. I'll have to get back to work on it. 


  • 3 weeks later...

Please note that Page Link Abstractor does not work with Multi-Language Fields. It works entirely with strings and does not return a proper object with multiple languages at the end of the abstraction process.

I’ve switched to Page History now, and I am looking forward to LinkMonitor. Hope this plugin will be able to change all links in all pages to the most recent version in page url history.


  • 1 year later...

Found this topic and wanted to ask a few questions to revive it:

  1. The module has not been updated since version 2.2. Is it still usable? If it is, should the compatibility be updated?
  2. What about LinkMonitor? Has it ever been released?
  3. Is there a recommended solution for moving sites from subdirectory to root (excluding search and replace in a DB dump)?

Thank you!
