
web crawler for cache warmup


maxf5

Hey guys,

I am using WireCache and the template cache. Is there already some kind of web crawler for ProcessWire that crawls your whole site to warm up the cache?
(In Shopware eCommerce, for example, you can warm up your cache with a crawler run from a cron job.)

It would be a nice feature :)

 

 


I found a function that could be run from a cron job, and there is also this library, which could be used as the basis for a module :) http://phpcrawl.cuab.de

function crawl_page($url, $depth = 5)
{
    static $seen = array();
    if (isset($seen[$url]) || $depth === 0) {
        return;
    }

    $seen[$url] = true;

    $dom = new DOMDocument('1.0');
    @$dom->loadHTMLFile($url);

    $anchors = $dom->getElementsByTagName('a');
    foreach ($anchors as $element) {
        $href = $element->getAttribute('href');
        if (0 !== strpos($href, 'http')) {
            $path = '/' . ltrim($href, '/');
            if (extension_loaded('http')) {
                $href = http_build_url($url, array('path' => $path));
            } else {
                $parts = parse_url($url);
                $href = $parts['scheme'] . '://';
                if (isset($parts['user']) && isset($parts['pass'])) {
                    $href .= $parts['user'] . ':' . $parts['pass'] . '@';
                }
                $href .= $parts['host'];
                if (isset($parts['port'])) {
                    $href .= ':' . $parts['port'];
                }
                $href .= $path;
            }
        }
        crawl_page($href, $depth - 1);
    }
    echo "URL:",$url,PHP_EOL,"CONTENT:",PHP_EOL,$dom->saveHTML(),PHP_EOL,PHP_EOL;
}
crawl_page($pages->get(1)->httpUrl, 2);
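One thing to watch with the function above: it also follows links to other sites, plus things like mailto: links and in-page anchors, which does nothing for your own cache. A minimal sketch of a filter, assuming you only want same-host HTTP links (`should_crawl` is a hypothetical helper name, not part of ProcessWire or phpcrawl):

```php
<?php
/**
 * Decide whether a link found on a page is worth visiting for cache warmup.
 * Hypothetical helper for illustration only.
 */
function should_crawl(string $href, string $baseUrl): bool
{
    $href = trim($href);

    // Skip empty links and in-page anchors.
    if ($href === '' || $href[0] === '#') {
        return false;
    }

    // Skip schemes that never lead to a cacheable page.
    if (preg_match('/^(mailto|tel|javascript):/i', $href)) {
        return false;
    }

    $host = parse_url($href, PHP_URL_HOST);
    if ($host === false) {
        return false; // seriously malformed URL
    }
    if ($host === null) {
        return true; // relative link, so it stays on the site
    }

    // Absolute link: only follow it when the host matches the start URL.
    $baseHost = (string) parse_url($baseUrl, PHP_URL_HOST);
    return strcasecmp($host, $baseHost) === 0;
}
```

Inside the foreach loop you could then put `if (!should_crawl($href, $url)) continue;` right after reading the href attribute, before the URL gets rebuilt.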

You can pass a function to $cache->get and it will generate the cache for you:

$expiration = 3600;
$cache->get("cache_name", $expiration, function() use($page) {
	$markup = "<div>$page->title</div>";
	return $markup;
});

https://processwire.com/api/ref/cache/get/

https://github.com/processwire/processwire/blob/57b297fd1d828961b20ef29782012f75957d6886/wire/core/WireCache.php#L136

Edit: I read your post again and I think that is not what you want. To prevent the cache from being generated by a visitor, you can generate it in a hook, depending on when you want the cache to update.


This is one of the scripts I use to quickly regenerate caches after I flush them. It won't work for dynamically generated URLs (i.e. urlSegments), obviously.

<?php
use ProcessWire\ProcessWire;
require_once 'vendor/autoload.php';

$wire = new ProcessWire();

// get URLs for all publicly accessible pages
$urls = [];
foreach($wire->pages('id>0, check_access=1') as $p) $urls[] = $p->httpUrl;

header("Content-Type: text/plain");

// visit all URLs
foreach($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);

    if(! curl_errno($ch)) {
        $info = curl_getinfo($ch);

        echo 'URL: ' . $url . "\n" .
             'Status: ' . $info['http_code'] . "\n";

        // sleep() only accepts whole seconds, so use usleep() for a 0.5 s pause
        usleep(500000);
    } else {
        echo 'ERROR: ' . $url . "\n";
    }

    // close the handle whether or not the request succeeded
    curl_close($ch);
}

Save this in the same directory as index.php, for example as cache.php, then access it at mydomain.com/cache.php. It might take a while before anything appears, because nothing is shown until the output buffer is flushed to the browser.
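If the wait for output bothers you, one option is to flush after every URL. A rough sketch, assuming at most one PHP output buffer is active (`emit_line` is a made-up helper name; the web server or compression may still hold output back, so this is not guaranteed to help):

```php
<?php
/**
 * Print one line and try to push it to the client straight away.
 * Hypothetical helper for illustration only.
 */
function emit_line(string $line): void
{
    echo $line, "\n";

    // Hand the innermost PHP output buffer on, if one is active.
    if (ob_get_level() > 0) {
        ob_flush();
    }

    // Ask PHP to flush its system write buffers towards the web server.
    flush();
}
```

In the script you would call `emit_line('URL: ' . $url);` where it currently uses echo.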


7 minutes ago, maxf5 said:

how can i get a cache when it's not even generated yet?

If you pass it a function/closure, the get method generates the cache itself whenever no cache is found.

And with a hook (on page save or wherever it makes sense for your use case) you can generate/update the cache.
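A minimal sketch of what such a hook could look like, assuming the cached value is the same small markup string as in the $cache->get example. The function name and the "page_markup_" name prefix are made up for illustration; `addHookAfter`, the `Pages::saved` hook, `$event->arguments(0)` and `$cache->save` are the ProcessWire calls it relies on:

```php
<?php
/**
 * Register a hook that rebuilds a page's cached markup right after the page is saved.
 * $wire and $cache are the ProcessWire API variables; the function name and the
 * cache name prefix are hypothetical.
 */
function register_cache_refresh_hook($wire, $cache, int $expiration = 3600): void
{
    $wire->addHookAfter('Pages::saved', function ($event) use ($cache, $expiration) {
        $page = $event->arguments(0); // the page that was just saved
        $markup = "<div>{$page->title}</div>";

        // Overwrite the entry now so the next visitor finds a warm cache.
        $cache->save('page_markup_' . $page->id, $markup, $expiration);
    });
}
```

You could call `register_cache_refresh_hook($wire, $cache);` from site/ready.php, and the template would read the same name with `$cache->get('page_markup_' . $page->id, ...)`.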
