Long Process to Be Run Once a Day


briangroce

Hi guys,

I have a script that takes quite a long time to run (10 minutes or so) and requires a few loops within loops. (That's a separate issue that I am trying to optimize.) In some of the loops I am also scraping information from another website, which is part of why the script takes a while. Anyway, I need this script to run once a day at around 2 AM. I know about LazyCron, but I don't want the run to be user-initiated. I need the script to run at the same time every day.

The script is grabbing Pages in ProcessWire, then scraping data, then updating the ProcessWire Page.

What do you guys suggest as the best way to have a script like this run efficiently?

These are just quick ideas off the top of my head about running at the same time every day. If you can add cron jobs on the server, that's probably the best way to go. Try:

  • Making the script executable by whatever user the web server runs as.
  • Setting up the cron job from the server's command prompt via "crontab -e". You will need to add a line to this file, and you can use a cronjob calculator to help you set it up.

However, if you have no access to cron on the server, install the LazyCron module instead, and then set up a cron job on another internet-connected box that is always on at 2 AM and uses wget to visit your site and hence trigger your script.

In this case, is running curl http://www.someurl.com/myPage/ reasonable?

Keep in mind that ProcessWire can run in shell scripts outside of Apache/http. But so long as you aren't dealing with timeout issues, it should also be fine to trigger it the way you are asking about (whether with curl, wget, or something else). However, you'll want to make sure you've got some good security through obscurity (an obscure URL), and/or a pass key in a GET/POST variable, to ensure nobody but you can trigger your script. This is always a concern with anything accessible over HTTP.
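
A minimal sketch of the pass key idea, in case it helps. The function name, the "key" parameter name, and the key value are all made up for illustration; only hash_equals() and http_response_code() are PHP built-ins.

```php
<?php
/**
 * Return true only if the request carries the expected pass key.
 * hash_equals() compares in constant time, so the check does not reveal
 * how much of a guessed key was correct.
 */
function has_valid_pass_key(array $query, string $expectedKey): bool
{
    $given = $query['key'] ?? '';   // 'key' is a hypothetical parameter name

    // Reject non-string input (e.g. ?key[]=x) and an empty configured key.
    return is_string($given)
        && $expectedKey !== ''
        && hash_equals($expectedKey, $given);
}

// Possible use at the top of the HTTP-triggered script (placeholder key):
//   if (!has_valid_pass_key($_GET, 'replace-with-a-long-random-string')) {
//       http_response_code(403);
//       exit('Forbidden');
//   }
```

The cron job on the other machine would then request the URL with the key appended as a query string, e.g. http://www.someurl.com/myPage/?key=... with your own value filled in.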

  • 1 year later...

Just wanted to mention that this doesn't seem to be true, at least not for me. I created a crontab entry and get these errors when the script runs:

PHP Notice:  Undefined index: SERVER_NAME in /home/www-data/processwire/wire/core/ProcessWire.php on line 93

Notice: Undefined index: SERVER_NAME in /home/www-data/processwire/wire/core/ProcessWire.php on line 93
PHP Notice:  Undefined index: HTTP_HOST in /home/www-data/processwire/wire/core/ProcessWire.php on line 94

Notice: Undefined index: HTTP_HOST in /home/www-data/processwire/wire/core/ProcessWire.php on line 94
PHP Fatal error:  Exception: SQLSTATE[28000] [1045] Access denied for user 'www-data'@'localhost' (using password: NO) (in /home/www-data/processwire/wire/core/ProcessWire.php line 143)

#0 /home/www-data/processwire/wire/core/ProcessWire.php(51): ProcessWire->load(Object(Config))
#1 /home/www-data/abc/index.php(183): ProcessWire->__construct(Object(Config))
#2 /home/www-data/abc/import/mitglieder/cron.php(5): include('/home/www-data/...')
#3 {main} in /home/www-data/abc/index.php on line 214

Fatal error: Exception: SQLSTATE[28000] [1045] Access denied for user 'www-data'@'localhost' (using password: NO) (in /home/www-data/processwire/wire/core/ProcessWire.php line 143)

#0 /home/www-data/processwire/wire/core/ProcessWire.php(51): ProcessWire->load(Object(Config))
#1 /home/www-data/abc/index.php(183): ProcessWire->__construct(Object(Config))
#2 /home/www-data/abc/import/mitglieder/cron.php(5): include('/home/www-data/...')
#3 {main} in /home/www-data/abc/index.php on line 214



This error message was shown because site is in debug mode ($config->debug = true; in /site/config.php). Error has been logged. Administrator has been notified. 
 

Any ideas how to get this working? The processwire directory is symlinked from within the webroot where I call the script.

I'm running PW from crontabs all over the place. You can ignore the undefined index notices in this case as they don't have anything to do with the errors that follow. The error messages you are seeing seem to indicate that the database settings were not defined. No idea how that could happen, but maybe something to do with the symlink. Or you may be hitting up against some server security here, as it appears you've got one account (sev-online) trying to access another (abc). Make sure your cron job is running as the user that owns these files, or one with greater access. 

No, abc was me trying to take out sev-online.ch, and I forgot two of them, but anyway. I'm not sure why it wouldn't be able to connect to the DB, as the config.php is local and clearly working.

There doesn't seem to be a security or symlink problem; the script works fine when run directly rather than from crontab. The crontab is set up for the user that also owns the websites.

OK, I tried again today, and after some time I glanced at my config.php... of course it can't work, because I had the DB connection info depend on the host name via $_SERVER, which of course isn't populated when the script runs from a crontab.

Seems to work fine for now.

So the problem was once again between chair and computer.
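
For anyone hitting the same thing, here is a rough sketch of a host-dependent config that still works from cron. It assumes the settings are keyed on HTTP_HOST, which is not set when PHP runs from a crontab or terminal; the function name, host names, and credentials are placeholders, not the poster's actual setup.

```php
<?php
/**
 * Pick DB settings by host name, falling back to a fixed default when
 * HTTP_HOST is missing (as it is under cron or in a terminal).
 * All host names and credentials below are placeholders.
 */
function db_settings_for(array $server): array
{
    $byHost = [
        'dev.example.com' => ['dbName' => 'site_dev',  'dbUser' => 'dev_user'],
        'www.example.com' => ['dbName' => 'site_live', 'dbUser' => 'live_user'],
    ];
    $default = 'www.example.com'; // assumption: cron runs against the live site

    $host = $server['HTTP_HOST'] ?? $default;

    // Unknown hosts also get the default instead of empty settings.
    return $byHost[$host] ?? $byHost[$default];
}

// In site/config.php this might be used as:
//   $db = db_settings_for($_SERVER);
//   $config->dbName = $db['dbName'];
//   $config->dbUser = $db['dbUser'];
```

Without a fallback like this, the DB user and password can end up empty under cron, which would be consistent with the "Access denied ... (using password: NO)" message above.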

This works great for me:

http://processwire.com/api/include/

In that example Ryan creates an executable file, but you can just as well create a PHP file anywhere you want on the server and do the same:

<?php

include("/path/to/processwire/index.php");

// Do anything you want with PW. Remember to use wire('pages') and all the likes instead of $pages

After that you can run it in the terminal like this:

php path/to/your/file.php

Or for your cronjob:

@hourly php path/to/your/file.php

You can even put some echo statements in the file for debugging purposes and see their output in the terminal. That worked great for me with scripts that created hundreds of pages in one go.
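
For the once-a-day case in the original question, a file like the following might be scheduled at 2 AM and echo its progress. This is only a sketch: the paths, the selector, and the helper name progress_line are placeholders, and wire('pages') is used as in the snippet above (newer, namespaced ProcessWire versions may need a namespace declaration).

```php
<?php
// Hypothetical crontab entry for a daily 02:00 run (paths are placeholders):
//   0 2 * * * php /path/to/your/file.php >> /path/to/your/cron.log 2>&1

/** Format one progress line; the timestamp makes a cron log easier to read. */
function progress_line(int $done, int $total, string $label): string
{
    return sprintf("[%s] %d/%d %s\n", date('Y-m-d H:i:s'), $done, $total, $label);
}

// Placeholder location of the ProcessWire bootstrap file.
$pwIndex = '/path/to/processwire/index.php';

// Only do the work when started from the command line and PW is present.
if (PHP_SAPI === 'cli' && is_file($pwIndex)) {
    include $pwIndex;

    $items = wire('pages')->find('template=basic-page'); // hypothetical selector
    $total = count($items);
    $done  = 0;

    foreach ($items as $page) {
        // ... scrape the remote data and update $page here ...
        $done++;
        echo progress_line($done, $total, $page->path);
    }
}
```

When PHP is run from the command line, max_execution_time defaults to 0 (no limit), so a 10-minute run should not hit the timeout a web request would.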

Re. "However, you'll want to make sure you've got some good security through obscurity (obscure URL), and/or a GET/POST variable pass key or something to ensure nobody else can trigger your script except you."

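Assuming your intended way to run this is via a cron job on the same server, have your PHP check that the request comes from localhost. Note that REMOTE_HOST is only set when the web server is configured to do reverse DNS lookups, so checking that REMOTE_ADDR is 127.0.0.1 (or ::1) is usually the more dependable test.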
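
A possible shape for that check, as a sketch only. It tests REMOTE_ADDR rather than REMOTE_HOST, which is my assumption about what works on a typical server, and the function name is made up.

```php
<?php
/**
 * Return true if the request appears to originate from the same machine.
 * Assumption: REMOTE_ADDR is checked because REMOTE_HOST is often unset.
 */
function is_local_request(array $server): bool
{
    $addr = $server['REMOTE_ADDR'] ?? '';

    // IPv4 and IPv6 loopback addresses.
    return in_array($addr, ['127.0.0.1', '::1'], true);
}

// Possible use at the top of the HTTP-triggered script:
//   if (!is_local_request($_SERVER)) {
//       http_response_code(403);
//       exit('Forbidden');
//   }
```

Two caveats: the cron job would have to request the page via a loopback address for REMOTE_ADDR to be 127.0.0.1, and if the site sits behind a reverse proxy on the same machine, every request may look local.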

Re. "OK, I tried again today, and after some time I glanced at my config.php... of course it can't work, because I had the DB connection info depend on the host name via $_SERVER, which of course doesn't work in a crontab."

@Soma: I use the same approach in site/config.php, but like this:

$config->dbHost = 'MyComputersName'==getenv('COMPUTERNAME') ? 'example.com' : 'localhost';

The system environment variable "COMPUTERNAME" is set on (every) Windows machine; if it is set on a *nix system at all, it will have a different value. This way it works in the terminal too.
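
A variation on the same idea, offered as a sketch: PHP's built-in gethostname() returns the machine name on both Windows and *nix, in web and terminal contexts alike, so it does not depend on a particular environment variable being set. The function name db_host_for is hypothetical; 'MyComputersName' and 'example.com' are the placeholders from the line above.

```php
<?php
/**
 * Choose the DB host from the machine name rather than from $_SERVER.
 * 'MyComputersName' and 'example.com' are placeholders.
 */
function db_host_for(string $machineName): string
{
    return $machineName === 'MyComputersName' ? 'example.com' : 'localhost';
}

// In site/config.php this might be used as:
//   $config->dbHost = db_host_for((string) gethostname());
```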
