teppo

Module: Process Link Checker

This is a beta release, so some extra caution is recommended. So far the module has been tested successfully on ProcessWire 2.7.2 and 3.0.18; in theory it should also work on the 2.4/2.5 versions of ProcessWire.
 
GitHub repo: https://github.com/teppokoivula/ProcessLinkChecker (see README.md for more techy details, settings etc.)
 
What you see is ...
 
This is a module that adds back-end tools for tracking down broken links and unnecessary redirects. That's pretty much all there is to these views right now; I'm still contemplating whether it should also provide a link text section (for SEO purposes etc.) and/or other features.
 
The magic behind the scenes
 
The admin tool (Process module) is about half of Link Checker; the other half is a PHP class called Link Crawler. This is a tool for collecting links from a ProcessWire site, analysing them and storing the outcome to custom database tables.
 
Link Crawler is intended to be triggered via a cron task, but there's also a GUI tool for running the checker. Running it through the GUI is slow and can result in issues, but for smaller sites and debugging purposes the GUI method works just fine. Just be patient; the data will be there once you wait long enough :)
 
Now what?
 
For the time being I'd appreciate any comments about the way this is heading and/or whether it's useful to you at all. What would you add to make it more useful for your own use cases? I'm going to continue working on this for sure (it's been a really fun project), but I wouldn't mind being pushed in the right direction early on.
 
This module is already in active use on two relatively big sites I manage. Lately I haven't had any issues with the module, but please consider this a beta release nevertheless; it hasn't been widely tested, and that alone is a reason to avoid calling it "stable" quite yet.

Screenshots

Dashboard:

link-checker-dashboard.png

List of broken links:

link-checker-broken-links.png

List of redirects:

link-checker-redirects.png

Check now tool/tab:

link-checker-check-now.png

Edited by teppo
Updated module description, status and screenshots.

Teppo, this looks fantastic, nice work! I haven't been able to test it out here yet, but I will soon, as I have a regular need for a tool like this. It's also one of those things that comes up with clients a lot: "how do I keep track of when a link no longer works?". I've been using Google Webmaster Tools for 404 discovery in the past, but it's often hard to separate the noise from the goods there, and it's not particularly client friendly either. Regarding the cron side of this, I immediately thought of IftRunner (which itself is triggered by cron) and how this might work great as a PageAction with IftRunner. PageActions can also be executed by ListerPro and presumably other tools in the future as well.


Thanks, Ryan. Let me know how it works out once you do test it; it would be interesting to know. My tests so far have been very limited in scope, so I'm fully expecting a pile of issues (and most likely a few things I've completely missed), though of course the opposite would be cool too :)

You've given me something new to consider there; I'll definitely take the IftRunner and PageAction part into consideration.


I have installed it on a 2.6.10 dev version. The installation was successful, but when I try to check the links I get the following messages:

2015-07-31 17:30:34	admin	    START: id!=2, has_parent!=2
2015-07-31 17:30:34	admin	    BATCH: 1/2 (pages 1-52/52)
2015-07-31 17:30:34	admin	        FOUND Page: /
2015-07-31 17:30:35	admin	            CHECKED URL: http://www.juergen-kern.at/site/templates/favicon.ico (200)

Warning: PDOStatement::execute(): MySQL server has gone away in /home/.sites/24/site1275/web/site/modules/ProcessLinkChecker/LinkCrawler.php on line 405

Warning: PDOStatement::execute(): Error reading result set's header in /home/.sites/24/site1275/web/site/modules/ProcessLinkChecker/LinkCrawler.php on line 405

Fatal error: Exception: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away (in /home/.sites/24/site1275/web/wire/core/Modules.php line 2264)

#0 /home/.sites/24/site1275/web/wire/core/Modules.php(2264): PDOStatement->execute()
#1 /home/.sites/24/site1275/web/wire/core/Modules.php(2523): Modules->getModuleConfigData(Object(ProcessPageSearch))
#2 /home/.sites/24/site1275/web/wire/core/Modules.php(446): Modules->setModuleConfigData(Object(ProcessPageSearch))
#3 /home/.sites/24/site1275/web/wire/core/Modules.php(1032): Modules->initModule(Object(ProcessPageSearch), false)
#4 /home/.sites/24/site1275/web/wire/core/Modules.php(939): Modules->getModule('ProcessPageSear...')
#5 /home/.sites/24/site1275/web/wire/modules/AdminTheme/AdminThemeReno/default.php(25): Modules->get('ProcessPageSear...')
#6 /home/.sites/24/site1275/web/wire/core/admin.php(148): require('/home/.sites/24...')
#7 /home/.sites/24/site1275/web/wire/modules/AdminTheme/AdminThemeReno/controller.php(13): require('/home/.sites/24...')
#8 /home/.sites/ in /home/.sites/24/site1275/web/index.php on line 254


This error message was shown because site is in debug mode ($config->debug = true; in /site/config.php). Error has been logged. 


Hangs for me too on a test server. Great Module though. Love the ability to run the check directly in the Admin.


Looks like I've missed some messages here. I'm currently using this on a couple of sites with no issues: ProcessWire 2.7.2 and 3.0.18, on two separate servers. It would be interesting to hear whether the aforementioned issues still exist.


Hello teppo,

I re-installed this module on a 3.25 dev version today and it works. I don't get any error messages :)


Can someone give me some example code showing how to trigger this module with a cron job? Do I need to create a cron job module, or can I use ready.php?

Thanks for your hints!


@SteveB: shouldn't require any tricks, but to be honest I've never used such a setup myself, so it's probably a mistake on my side. I'll take a closer look at that ASAP :)

@Juergen: README includes instructions for setting up a cron job. The gist of it is that you should make a cron job that runs the module's own Init.php file periodically.

To be honest I'm not entirely sure what you mean by a cron job module or ready.php in this context – but please let me know what I'm missing!


@Juergen

Setting up a cron job is really easy, but you have to understand some basics first.

The cron job part has nothing to do with ProcessWire. Cron is a separate program running on your server which is able to run commands at a certain time. It is either configured in your hosting admin panel (easiest, ask your hosting provider) or you can set it up yourself through the command line. You can follow this example if you're running a Linux based server.

You need to understand that you can execute a PHP file from the command line. Teppo has provided us with such a script that will activate the link checker. The file is "/ProcessLinkChecker/Init.php". This is the one the cron job needs to run. If you are unsure what the correct path is, you can ask your hosting provider, or log in to the shell, navigate to the "ProcessLinkChecker" folder and type "pwd". That will give you the current path. It will be something like:

/srv/username/apps/appname/public/site/modules/ProcessLinkChecker/
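
If you would rather not type the path by hand, the lookup described above can be sketched as a tiny shell helper. This is only an illustration: `init_path` is a hypothetical name, and the directory you pass in is whatever your own install uses.

```shell
#!/bin/sh
# Hypothetical helper: print the absolute path of Init.php inside the
# given ProcessLinkChecker module directory (the same result that "cd"
# followed by "pwd" gives you when done by hand).
init_path() {
    # Run in a subshell so the caller's working directory is untouched.
    # If the directory does not exist, nothing is printed and the
    # exit status is non-zero.
    ( cd "$1" 2>/dev/null && printf '%s/Init.php\n' "$(pwd)" )
}
```

Calling `init_path site/modules/ProcessLinkChecker` from the web root would print the full path to hand to cron. Note that it does not check whether Init.php itself is present.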

Combine the path with your new knowledge from the tutorial and you can set it up.

p.s. If you are on Windows you need to create a "Task" in "Windows Task Scheduler".

p.s. 2 You don't have to wait for the schedule to find out whether it works, since you can test the script by running:

/usr/bin/php /path/to/site/modules/ProcessLinkChecker/Init.php >/dev/null 2>&1

p.s. 3 This whole timing stuff can be pretty confusing, so use a tool like crontab.guru.

p.s. 4 After proofreading this post it now seems pretty hard O0, but believe me, after a few times you can set it up in a few minutes.
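
To tie the pieces above together, here is a rough sketch of the crontab entry itself. The nightly 03:00 schedule is just an assumption for illustration, and both paths are the placeholders from the test command above; adjust them to your server.

```shell
#!/bin/sh
# Placeholders: replace with the PHP binary and module path on your server.
PHP_BIN="/usr/bin/php"
INIT_SCRIPT="/path/to/site/modules/ProcessLinkChecker/Init.php"

# Five time fields (minute hour day-of-month month day-of-week), then the command.
# "0 3 * * *" means every day at 03:00 (an assumed schedule, pick your own).
CRON_LINE="0 3 * * * $PHP_BIN $INIT_SCRIPT >/dev/null 2>&1"

# Print the line so it can be pasted into "crontab -e".
printf '%s\n' "$CRON_LINE"

# To append it to the current user's crontab instead, uncomment:
# (crontab -l 2>/dev/null; printf '%s\n' "$CRON_LINE") | crontab -
```

The append variant is left commented out on purpose, so that running the sketch never changes an existing crontab by accident.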


Great module thanks teppo!!

I can't edit the crontab via SSH; I'm only able to add crons via the admin panel, and there I can only provide a URL, not a path. So without changing .htaccess I can't just run domain.com/site/modules/ProcessLinkChecker/Init.php, right?

But I have already set up crons, so I thought about copying the contents of Init.php into an existing cron script, which should trigger it:

$linkCrawlerPath = $config->paths->siteModules . 'ProcessLinkChecker/LinkCrawler.php';
if (file_exists($linkCrawlerPath)) {
	require $linkCrawlerPath;
	$crawler = new \LinkCrawler();
	$crawler->start();
}

But then I'm getting these:

Notice: Undefined variable: wire in site/modules/ProcessLinkChecker/LinkCrawler.php on line 144
Fatal error: Call to undefined function wire() in site/modules/ProcessLinkChecker/LinkCrawler.php on line 144

Oh, and I'm running 3.0.25, that's why the backslash is there.

Any ideas? Or alternative paths? I also tried the easier route of including Init.php in my cron script, with the same result.

EDIT: Same error (at least the top one "undefined variable wire") when running from ProcessLinkChecker admin page..


@Can: The issue you mentioned should be fixed in the latest version of LinkCrawler.php, though please let me know if it still persists. The problem was that LinkCrawler didn't have access to $wire from the global scope, but since PROCESSWIRE was already defined, it wasn't attempting to instantiate ProcessWire either.

I'm no longer entirely sure that current behaviour makes sense in this case (perhaps I should rather allow the user to pass an instance of ProcessWire to LinkCrawler when instantiating it) but at least this seems to fix the issue at hand :)


After removing the content of site/init.php and site/ready.php for now (they were throwing errors about redeclared functions), I'm getting this:

throw new Exception("Unrecognized render method");

I'm invoking LinkCrawler like this from an external script which bootstraps ProcessWire (so not from a template file, maybe that's the problem?):

if ($modules->get('ProcessLinkChecker')) {
	require $config->paths->siteModules . 'ProcessLinkChecker/LinkCrawler.php';
	$crawler = new \LinkCrawler();
	$crawler->start();
}

 


@Can: Sorry for the delay. So far I haven't been able to reproduce the issue you're seeing, which is making it quite difficult to debug. This is one of those cases where it would be tremendously useful to be able to check which values LinkCrawler gets from the Process module, what $this->config contains, what that "unrecognized" render method really is, and so on :) 

Not calling the module from a template file isn't a problem, but I'm a bit confused why it would throw the "unrecognized render method" error. Could you check what the module config page of ProcessLinkChecker lists as the render method?

This error should only happen if the render_method config setting contains something weird or if it's undefined. At this point I can only assume that either LinkCrawler doesn't have access to the ProcessLinkChecker module (it tries to get its config from there) or those config variables are somehow mishandled.

Just checking, but is the above snippet the only code in that file? I assume it's bootstrapping the same ProcessWire installation that has ProcessLinkChecker installed, right?

Share this post


Link to post
Share on other sites

Hey @Can,

I just ran into some small things myself while installing and configuring this module. Since I don't have shell access to the server (yet), I created a workaround: a template and page called "cronjob", so I can trigger the script from a URL (www.domainname.com/cronjobs/?key=123).

In the template file I do a simple check on a GET variable (key) to keep other people from triggering it. From there I include:

// Skip access since the guest user is loading the script
// Perhaps you might want to look into the permission check stuff since you're bootstrapping ProcessWire
$options = array('noPermissionCheck' => true);

// Load the Module to get the className
$linkCheckerModule = $this->modules->getModule("ProcessLinkChecker", $options);

// Include Teppo's LinkCrawler
require $config->paths->siteModules . $linkCheckerModule->className() . '/LinkCrawler.php';

// Start crawling
$crawler = new LinkCrawler();
$crawler->start();

// Stop ProcessWire from executing
$this->halt();

This seems to work fine for me. I've got a lot of data.

I still get some notices like Array to string conversion in */site/modules/ProcessLinkChecker/LinkCrawler.php on line 335*. I'll look into them tomorrow.
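
For hosts whose cron panel only accepts a URL, the request such a panel would make can be sketched from the shell as follows. The https scheme and the 15 minute timeout are assumptions; the host, page and key are the example values from above, and `build_trigger_url` is a hypothetical helper name.

```shell
#!/bin/sh
# Example values from the post above; the https scheme is an assumption.
BASE_URL="https://www.domainname.com/cronjobs/"
KEY="123"

# Hypothetical helper: append the key as a query parameter.
# The key is used as-is, so stick to characters that are safe in a URL.
build_trigger_url() {
    printf '%s?key=%s\n' "$1" "$2"
}

TRIGGER_URL=$(build_trigger_url "$BASE_URL" "$KEY")
printf '%s\n' "$TRIGGER_URL"

# The actual request is commented out so nothing is fetched by accident.
# -f fails on HTTP errors, -sS is quiet but still reports errors, and
# --max-time 900 gives a slow crawl up to 15 minutes (assumed value).
# curl -fsS --max-time 900 "$TRIGGER_URL" >/dev/null
```

A long, hard-to-guess key is worth using here, since anyone who knows the URL can start a crawl.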


You mean $this->config in LinkCrawler.php? I'd say it looks quite good. I put a var_dump($this->config) on line 151 (right after $this->config has been populated), and this is what I get in the error message after clicking "Check now" on /processwire/setup/link-checker/:

object(stdClass)#340 (16) {
  ["skipped_links"]=> array(0) { }
  ["cache_max_age"]=> string(5) "1 DAY"
  ["selector"]=> string(33) "status<8192, id!=2, has_parent!=2"
  ["http_host"]=> NULL
  ["log_level"]=> int(1)
  ["log_rotate"]=> int(0)
  ["log_on_screen"]=> bool(false)
  ["batch_size"]=> int(100)
  ["sleep_between_batches"]=> int(1)
  ["max_recursion_depth"]=> int(3)
  ["sleep_between_requests"]=> int(1)
  ["sleep_between_pages"]=> int(0)
  ["link_regex"]=> string(38) "/(?:href|src)=([\'"])([^#].*?)\g{-2}/i"
  ["skipped_links_regex"]=> NULL
  ["http_request_method"]=> string(11) "get_headers"
  ["http_user_agent"]=> string(120) "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
}

 

On 22.7.2016 at 3:39 PM, teppo said:

Could you check what the module config page of ProcessLinkChecker lists as the render method?

I don't know what you mean. Here at /processwire/module/edit?name=ProcessLinkChecker, right? I don't know what I'm supposed to look for.

On 22.7.2016 at 3:39 PM, teppo said:

Just checking, but is the above snippet the only code in that file? I assume it's bootstrapping the same ProcessWire installation that has ProcessLinkChecker installed, right?

No, but for the tests I call exit; right afterwards, and those lines are the very first lines right after the opening PHP tag and bootstrapping PW.

Thanks for your workaround @arjen, I think I'll give it a try soon :)
