
Importing pages and emptying trash incredibly slow...


NorbertH

I imported about 10,000 pages (really simple ones, two text fields only) into PW using the Batch Child Editor module, and it took about 1 hour with multiple script timeouts.

I moved them to the trash, but emptying the trash ran into a timeout again after about 10 minutes.

So I started a PHP script from the shell ($pages->emptyTrash()), but after half an hour it is still running. The page tree shows only about 4,000 pages removed from the trash so far.
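For reference, the whole script is essentially just a PW bootstrap plus one API call; a minimal sketch:

<?php
namespace ProcessWire;
include "index.php"; // bootstrap ProcessWire from the installation root

// Empty the trash via the API. With thousands of pages this can run
// for a long time, since every page is deleted individually.
$pages->emptyTrash();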

 

ProcessWire 3.0.98, running on InnoDB (MariaDB 10).

Any ideas?


Tracy Debugger can help a lot with monitoring such things. It has a log function that lets you dump something to a file, so you can follow in real time what's happening. E.g.

l("removing page $page->id from trash");

and monitor the log file, e.g.:

tail -f site/assets/logs/tracy/info.log

Not sure if that path is correct, though.

You might also flush the pages cache (https://processwire.com/api/ref/pages/uncache-all/) or use something like https://processwire.com/talk/topic/7572-processwire-recipes/?do=findComment&comment=88505
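A minimal sketch of chunked processing with a cache flush between chunks (the selector is just a placeholder):

<?php
namespace ProcessWire;
include "index.php"; // bootstrap PW

// Process pages in chunks and flush PW's in-memory page cache between
// chunks, so memory use stays flat during a long-running job.
$start = 0;
$limit = 500;
do {
    $chunk = $pages->find("parent=/some-parent/, start=$start, limit=$limit");
    foreach ($chunk as $p) {
        // ... process $p ...
    }
    $count = $chunk->count();
    $start += $count;
    $pages->uncacheAll(); // release all Page objects loaded so far
} while ($count == $limit);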

 


Wouldn't it have been easier in that case to first get all the IDs of the pages in the trash, then loop over all the field tables in the database and remove every row whose pages_id matches one of the trashed pages?

Maybe it's not that easy, though. Not sure... I'd do a backup first.
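Roughly, the idea would look like this (untested sketch; it bypasses PW's hooks, page cache, and file/image cleanup entirely, so a backup first is essential):

<?php
namespace ProcessWire;
include "index.php"; // bootstrap PW

// Collect the IDs of everything in the trash. In a default install the
// trash container is page ID 7, and has_parent matches all descendants.
$ids = $pages->find("has_parent=7, include=all")->explode('id');

if (count($ids)) {
    $idList = implode(',', array_map('intval', $ids));
    // Delete matching rows from every field table...
    foreach ($fields as $field) {
        $table = $field->getTable(); // e.g. field_title
        $database->exec("DELETE FROM `$table` WHERE pages_id IN ($idList)");
    }
    // ...and finally from the pages table itself.
    $database->exec("DELETE FROM pages WHERE id IN ($idList)");
}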


I have a ~40k-row CSV import/sync that runs for almost an hour. And it's not even importing, just syncing.

If you use the API to load and cycle through lots of pages, it's slow; there's no real way around it except doing it in SQL directly.
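For the read-only side, a minimal sketch of going straight to PDO via PW's $database (the parent ID is hypothetical):

<?php
namespace ProcessWire;
include "index.php"; // bootstrap PW

// Read ids and names straight from the pages table instead of loading
// full Page objects through the API; fine for read-only work.
$query = $database->prepare("SELECT id, name FROM pages WHERE parent_id=:pid");
$query->execute([':pid' => 1234]); // hypothetical parent ID
while ($row = $query->fetch(\PDO::FETCH_ASSOC)) {
    echo $row['id'] . " " . $row['name'] . "\n";
}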


@Soma I would expect a 40k-row -> 40k-page import to take a while, but a simple sync with only a low percentage of rows changed should be pretty fast. Of those 40k rows, approximately how many change between each cycle? I'd be tempted to have a script diff the last version against the current version and only process the rows that differ. That should be much faster if only a small set of rows changes each cycle.
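A minimal sketch of that diff idea (hypothetical file names; assumes the first CSV column is a stable ID):

<?php
// Hash every CSV row keyed by its ID and diff against the hashes from
// the previous run, so only changed rows get synced via the API.
$previous = is_file('hashes.json')
    ? json_decode(file_get_contents('hashes.json'), true)
    : [];
$current = [];
$changed = [];

if (($fh = fopen('data.csv', 'r')) !== false) {
    while (($row = fgetcsv($fh)) !== false) {
        $id = $row[0];                      // assumes column 0 is a stable ID
        $hash = md5(implode('|', $row));
        $current[$id] = $hash;
        if (!isset($previous[$id]) || $previous[$id] !== $hash) {
            $changed[] = $row;              // only these rows need syncing
        }
    }
    fclose($fh);
}

file_put_contents('hashes.json', json_encode($current));
// ... now sync only the rows in $changed ...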


  • 2 weeks later...

Hmm OK, did some testing.

The test server is a Synology DiskStation with 8 GB RAM.

The import script is pretty simple:

<?php
namespace ProcessWire;
include "index.php"; // Include PW

include "berufe.php"; // Simple array with about 25000 job names
/*
$berufe = array(
  array('id_beruf' => '1','berufsbezeichnung' => 'Aalbrutzüchter/in','kldb2010' => '11412'),
  array('id_beruf' => '2','berufsbezeichnung' => 'Aalfischer/in','kldb2010' => '11422'),
  array('id_beruf' => '3','berufsbezeichnung' => 'Aalräucherer/-räucherin','kldb2010' => '29242'),
...
*/

$i = 0;
foreach ($berufe as $beruf) {
    echo $i++ . " " . $beruf['kldb2010'] . " " . $beruf['berufsbezeichnung'] . "\n";

    // create one page per job title
    $p = new Page();
    $p->template = 'berufsbezeichnung';
    $p->parent = '/einstellungen/berufi/';
    $p->title = $beruf['berufsbezeichnung'];
    $p->schluessel = $beruf['kldb2010'];
    $p->save();
}

It took about 60 minutes to import using a PW installation with InnoDB.

It took about 40 minutes using an installation with MyISAM.

Now I changed the script to use transactions:

<?php
namespace ProcessWire;
include "index.php"; // Include PW
include "berufe.php"; // Simple array with about 25000 job names

$i = 0;
try {
    $database->beginTransaction();

    foreach ($berufe as $beruf) {
        echo $i++ . " " . $beruf['kldb2010'] . " " . $beruf['berufsbezeichnung'] . "\n";

        $p = new Page();
        $p->template = 'berufsbezeichnung';
        $p->parent = '/einstellungen/berufi/';
        $p->title = $beruf['berufsbezeichnung'];
        $p->schluessel = $beruf['kldb2010'];
        $p->save();
    }

    $database->commit();

} catch(\Exception $e) {
    $database->rollBack();
}

On InnoDB with transactions, the script finished in about 10 minutes.

It seemed like the actual DB operation at the commit only took a few seconds, as the script ended almost immediately after the last echo.


Transactions are possibly worth a closer look when doing massive data handling.
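If this pattern gets reused, PW can also report whether the storage engine supports transactions at all, so the same script still runs on MyISAM, just without the speedup; a sketch:

<?php
namespace ProcessWire;
include "index.php"; // bootstrap PW

// Only open a transaction when the engine (e.g. InnoDB) supports it;
// supportsTransaction() is PW's own check on $database.
$useTransaction = $database->supportsTransaction();
if ($useTransaction) $database->beginTransaction();

try {
    // ... many $p->save() calls ...
    if ($useTransaction) $database->commit();
} catch(\Exception $e) {
    if ($useTransaction) $database->rollBack();
    throw $e; // re-throw so the failure isn't silently swallowed
}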

 

 


@NorbertH

Out of curiosity, how long does your script take if you don't actually save the page?

<?php
namespace ProcessWire;
include "index.php"; // Include PW

include "berufe.php"; // Simple array with about 25000 job names
/*
$berufe = array(
  array('id_beruf' => '1','berufsbezeichnung' => 'Aalbrutzüchter/in','kldb2010' => '11412'),
  array('id_beruf' => '2','berufsbezeichnung' => 'Aalfischer/in','kldb2010' => '11422'),
  array('id_beruf' => '3','berufsbezeichnung' => 'Aalräucherer/-räucherin','kldb2010' => '29242'),
...
*/

$i = 0;
foreach ($berufe as $beruf) {
    echo $i++ . " " . $beruf['kldb2010'] . " " . $beruf['berufsbezeichnung'] . "\n";

    $p = new Page();
    $p->template = 'berufsbezeichnung';
    $p->parent = '/einstellungen/berufi/';
    $p->title = $beruf['berufsbezeichnung'];
    $p->schluessel = $beruf['kldb2010'];
    //$p->save();
}

...and...

<?php
namespace ProcessWire;
include "index.php"; // Include PW
include "berufe.php"; // Simple array with about 25000 job names

$i = 0;
try {
    $database->beginTransaction();

    foreach ($berufe as $beruf) {
        echo $i++ . " " . $beruf['kldb2010'] . " " . $beruf['berufsbezeichnung'] . "\n";

        $p = new Page();
        $p->template = 'berufsbezeichnung';
        $p->parent = '/einstellungen/berufi/';
        $p->title = $beruf['berufsbezeichnung'];
        $p->schluessel = $beruf['kldb2010'];
        //$p->save();
    }

    $database->commit();

} catch(\Exception $e) {
    $database->rollBack();
}

...? That should give us a handle on the overhead of creating 25,000 pages and setting the data on your test system.


The variant without transactions (and without save()) took about 45 seconds to run.

Using transactions (not really using them, as save() is commented out) took about 45 seconds to run, too.

Both throw an error:   

PHP Warning:  touch(): Utime failed: Permission denied in /volume1/web/personalverwaltung/wire/core/FileCompiler.php on line 1063

 

Edit:

The error message has nothing to do with the scripts, just an issue with the test server.


I finally deleted all those pages using transactions:

<?php
namespace ProcessWire;
include "index.php";

$dePages = wire('pages')->find("parent=/einstellungen/berufi/");

echo $dePages->count() . "\n";

$i = 0;
try {
    $database->beginTransaction();
    foreach ($dePages as $dePage) {
        echo $i++ . " " . $dePage->title . "\n";
        $dePage->delete();
    }
    $database->commit();
} catch(\Exception $e) {
    $database->rollBack();
}

The script took 7 minutes to run; the actual query execution took about 7-10 seconds at the end of the script.

I guess transactions are helpful for deleting pages too. What I did not try was trashing them first.
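For the record, the trash variant would look like this (untested here):

<?php
namespace ProcessWire;
include "index.php";

// Same find as above, but moving the pages to the trash instead of
// deleting them outright; trash() is the API counterpart to delete().
foreach (wire('pages')->find("parent=/einstellungen/berufi/") as $dePage) {
    $dePage->trash();
}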

