David Beesley Posted October 11, 2016

Hi all,

On one of our PW sites we are importing data from an external API into the ProcessWire database. The import contains around 450 "objects", and on average each of these objects has around 20 sub-objects. This means I am creating around 9,500 pages during the initial full import (subsequent imports only make amendments and additions).

The problem is that I seem to be hitting the PHP memory limit. I have already raised it to 256 MB and don't really want to raise it further. I am nulling and unsetting everything I feasibly can, so the memory bloat now appears to be in the page-creation process itself. I am nulling the page object after each page is fully saved, and I can see the destructor method running, so I cannot fathom why the memory keeps growing. The bloat is around 120 KB per page created, which means I'd currently need over 1 GB dedicated to this process for it to complete.

Is there something I have missed?
netcarver Posted October 11, 2016

@David Beesley Not sure if this will help, but if you are creating pages with some variable, say $p, inside a loop, try telling $pages to uncache $p at the end of the loop:

$pages->uncache($p);

Or omit the $p argument to clear all cached pages.
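A minimal sketch of that pattern in an import loop (the $items array, the 'import-item' template and the /imports/ parent are placeholder assumptions, not part of the original post):

foreach($items as $item) {
    $p = new Page();
    $p->template = 'import-item';
    $p->parent = wire('pages')->get('/imports/');
    $p->title = $item['title'];
    $p->save();
    wire('pages')->uncache($p); // release the saved page from the in-memory cache
    $p = null;
}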
David Beesley Posted October 11, 2016 (Author)

I have tried using wire('pages')->uncacheAll(); and that didn't seem to do anything. I'll try your more fine-grained approach.
David Beesley Posted October 11, 2016 (Author)

No joy, I'm afraid:

memory usage BEFORE page creation: 19656360
memory usage AFTER page creation: 19735712
memory usage BEFORE page creation: 19735712
memory usage AFTER page creation: 19815680
memory usage BEFORE page creation: 20493800
memory usage AFTER page creation: 20652664
BitPoet Posted October 11, 2016

Do you have $config->debug enabled? If yes, does setting it to false change memory consumption? Also, you may try calling gc_collect_cycles() at the end of the loop, after uncaching the page, to allow PHP to garbage-collect unused objects with circular references.
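A minimal illustration of both suggestions, assuming the import runs as a bootstrapped script (the loop body here is only a placeholder for the actual import code):

// in /site/config.php: debug mode keeps extra per-request data (such as query logs) in memory
$config->debug = false;

// in the import loop
foreach($items as $item) {
    // ... create and save the page as before ...
    wire('pages')->uncache($p); // drop the saved page from the cache
    gc_collect_cycles();        // let PHP reclaim objects with circular references
}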
kongondo Posted October 11, 2016

Similar to the above... this nugget by @diogo has served me well in the past. Also, is there a possibility to do it over the command line?
netcarver Posted October 11, 2016

@David Beesley One approach I've used before when importing thousands of records that need to be made into pages is to stash the data from the external API into a work queue (php-resque or one of the WireQueue implementations) and have cron-initiated workers regularly batch-process a certain number of queue entries, creating pages for them, before exiting. I've done this for a client in the past and it works very well.
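To illustrate the shape of this approach only (this is not the actual WireQueue or php-resque API: a newline-delimited JSON file stands in for the queue, and the 'import-item' template and /imports/ parent are placeholders), a cron-invoked worker might look roughly like this:

<?php namespace ProcessWire;
// Illustrative batch worker: process a limited number of queued items per run, then exit.
include("./index.php"); // bootstrap ProcessWire (adjust the path to your install)

$queueFile = __DIR__ . '/import-queue.ndjson'; // one JSON-encoded API object per line
$batchSize = 200; // pages to create per cron run

$lines = is_file($queueFile) ? file($queueFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) : array();
$batch = array_splice($lines, 0, $batchSize);

foreach($batch as $line) {
    $item = json_decode($line, true);
    if(!$item) continue;
    $p = new Page();
    $p->template = 'import-item';
    $p->parent = $pages->get('/imports/');
    $p->title = $item['title'];
    $p->save();
    $pages->uncache($p); // keep memory flat across the batch
}

// write the remaining entries back for the next cron run
file_put_contents($queueFile, implode("\n", $lines) . ($lines ? "\n" : ''));

A real implementation would also want some locking (or an actual queue module) so that overlapping cron runs don't process the same entries twice.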
David Beesley Posted October 12, 2016 (Author)

Hey @netcarver, thanks for bringing WireQueue to my attention. I'll have a look through the forum thread to see how this could be implemented in my case.

@BitPoet I have tried turning debugging off, and I end each foreach loop thusly:

$page = null; // null out page object
unset($page); // remove pointer to page object (null) in memory
wire('pages')->uncacheAll(); // clear the page cache
gc_collect_cycles(); // force-invoke the garbage collector

But there is no discernible change in memory usage.

@kongondo I'm unsure how I would implement your example. My page creation process is:

$page = wire('pages')->get("parent={$parent_page->id}, this_id={$page['id']}"); // where this_id & $page['id'] are this entity's id in the API
if(!$page->id) {
    // CREATE new page
    $new_page = new Page();
    ... // Some generic fields
    // save
    $new_page->save();
    $page = wire('pages')->get($new_page->id); // Set my page object to this new page
}
... // Set some more fields based on API data
// save
$page->save();

(nulling and unsetting omitted for clarity)

Thanks all for your help so far. This is a great community.
David Beesley Posted October 12, 2016 (Author)

@BitPoet I owe you an apology. I hadn't turned debug off properly. Once I did that, the memory usage held between 25 MB and 40 MB. A drastic reduction.
diogo Posted October 12, 2016

After seeing your code, the solution @kongondo pointed at doesn't really apply here, because you're not creating a page array but looping through the data that comes from the API. Any reason why you're not using the same variable to create the page, and to add the fields and save? You would spare a call to the database on each iteration if you didn't:

$page = wire('pages')->get("parent={$parent_page->id}, this_id={$page['id']}"); // where this_id & $page['id'] are this entity's id in the API
if(!$page->id) {
    // CREATE new page
    $page = new Page(); // <- removed the $new_page variable
    ... // Some generic fields
    // save
    $page->save();
    // not needed -> $page = wire('pages')->get($new_page->id);
}
... // Set some more fields based on API data
// save
$page->save();
MuchDev Posted October 12, 2016

One thing I did on a past site was to get a database copy from the live site, restore it on a local machine, do all the work locally, and then restore it back to the live server. I had around 20,000 items I was importing and ran into timeout/memory issues. I was also pre-processing images, which was helpful as I was able to upload those as well. My current server sometimes eats up the RAM due to image variations for large images, but I currently run that site on a server with 15 GB. If this is a one-time deal you may be able to think this way; otherwise there might be some more server/PHP/Apache memory settings you can tweak to get usage down.
David Beesley Posted October 13, 2016 (Author)

@diogo Thanks for the update. I'll be honest with you: like most things in this profession, I inherited it from a predecessor, and that bit was working so I left it alone. I have done loads of micro-optimisations through this process, so I may add yours to them. I'm now also thinking that because debugging is off, and that was the primary cause of the memory bloat (I hate the term "memory leak"), I may not need the multiple nulls, unsets and forced garbage-collector cycles.
bernhard Posted June 26, 2019

I just want to share my findings from today, when I wanted to create one million pages quickly.

First, via the Tracy Console:

ini_set('max_execution_time', 150);

$last = $pages->findOne("parent=/data,include=all,sort=-id");
$last = (int)$last->title ?: 0;

for($i=$last+1; $i<$last+1000; $i++) {
    // create page
    $p = new Page();
    $p->template = 'basic-page';
    $p->parent = '/data';
    $p->title = $i;
    $p->save();
    l("saved page $i");
    $pages->uncacheAll();
    gc_collect_cycles();
}
d('done');

It started at 23 pages per second, and at the end of the 150 seconds it was already down to 4 pages per second. I also tested without logging. Memory usage also increased proportionally.

Finally, running a PHP script from the command line that bootstrapped PW worked well:

<?php namespace ProcessWire;

$time_pre = microtime(true);
include("../../../index.php"); // bootstrap ProcessWire

$last = $pages->findOne("parent=/data,sort=-id");
$i = (int)(string)$last->title ?: 0; // multilang quickfix: convert to string

// foreach($pages->find('parent=/data') as $p) {
//     $p->delete();
//     echo "deleted $p\n";
// }
// return;

$i++;
$num = 0;
while($i <= 1000000) {
    // create page
    $p = new Page();
    $p->template = 'basic-page';
    $p->parent = '/data';
    $p->title = $i;
    $p->save();
    $pages->uncacheAll();
    gc_collect_cycles();
    $num++;
    echo "done: $num ($i)\n";
    $i++;
}

function convert($size) {
    $unit = array('b','kb','mb','gb','tb','pb');
    return @round($size/pow(1024,($i=floor(log($size,1024)))),2).' '.$unit[$i];
}

$time_post = microtime(true);
$t = $time_post - $time_pre;
$m = memory_get_usage();
echo "created $num pages in " . round($t, 3) . "s, " . round($t/$num, 3) . "s per page, used " . convert($m) . " memory\n";

Benefits:
- quick
- runs in the background and does not lock the site
- direct feedback via echo without creating huge log files
- cancel via ctrl+c at any time
- easy persistent saving of your scripts
tpr Posted June 26, 2019

53 minutes ago, bernhard said: "from today when I wanted to create one million pages quickly"

Meanwhile in Vienna...
bernhard Posted June 26, 2019

Without uncacheAll() and gc_collect_cycles():
created 500 pages in 24.82s, 0.05s per page, used 20.88 mb memory

Using only gc_collect_cycles():
created 500 pages in 24.074s, 0.048s per page, used 15.16 mb memory

Using only $pages->uncacheAll():
created 500 pages in 24.366s, 0.049s per page, used 20.88 mb memory
created 2000 pages in 96.191s, 0.048s per page, used 25.07 mb memory

Seems that gc_collect_cycles() is the best option?! Not sure if I'm using it correctly, though: https://stackoverflow.com/questions/19715048/what-gc-collect-cycles-function-is-useful-for
kongondo Posted June 26, 2019

2 hours ago, bernhard said: "Memory usage also increased proportionally."

Just be aware that if you have Tracy calls inside your script (d(), etc.), these also contribute to the total memory usage :-). In my test cases (some unrelated work), the amount used by Tracy can be significant.
bernhard Posted June 26, 2019

Thx, yes, I thought that would be the case. But without any logging you never know whether the script is still running or not (or what its progress is). Console + echo is really nice for situations where the Tracy console does not fit.
kongondo Posted June 26, 2019

8 minutes ago, bernhard said: "But without any logging you never know whether the script is still running or not"

True. In my case, I used $log to save the memory usage results to file to get an average. After that, I switched to Tracy, but was well aware of how much memory my script was actually using vs any additions from Tracy :-).
dragan Posted June 26, 2019

@bernhard Thanks for these tips and tests. How long did this one-million-page creation script finally take?
Jonathan Lahijani Posted June 26, 2019

How about deleting thousands of pages using the API? Right now I'm using this hackish approach, but only when syncing my live DB to my dev DB, so even though it's not clean and may leave orphan records (which aren't causing any problems, if they are even there), it's OK for my purposes.

GitHub request: https://github.com/processwire/processwire-requests/issues/131
bernhard Posted June 27, 2019

14 hours ago, dragan said: "How long did this one-million-page creation script finally take?"

I stopped at 200k; that's enough for my performance tests for now (RockTabulator).

9 hours ago, Jonathan Lahijani said: "How about deleting thousands of pages using the API?"

created 1000 pages in 52.123s, 0.052s per page, used 15.16 mb memory
deleted 1000 pages in 5.759s, 0.006s per page, used 19.35 mb memory

Quote: "script that deletes 15,000 pages takes about 30 minutes which is painful."

On my setup it would take 6*15 = 90 seconds to delete 15k pages. Maybe you have some hooks running? That's always a pain and slows things down when you need to modify huge amounts of pages...

Code used:

<?php namespace ProcessWire;

$time_pre = microtime(true);
include("../../../index.php"); // bootstrap ProcessWire

function convert($size) {
    $unit = array('b','kb','mb','gb','tb','pb');
    return @round($size/pow(1024,($i=floor(log($size,1024)))),2).' '.$unit[$i];
}

function showTime($time_pre, $num) {
    $time_post = microtime(true);
    $t = $time_post - $time_pre;
    $m = memory_get_usage();
    echo "processed $num pages in " . round($t, 3) . "s, " . round($t/$num, 3) . "s per page, used " . convert($m) . " memory\n";
}

$num = 0;
foreach($pages->find('parent=/data,id>257975') as $p) {
    $p->delete();
    echo "deleted $p\n";
    gc_collect_cycles();
    $num++;
}
showTime($time_pre, $num);
return;
Beluga Posted June 27, 2019

If using InnoDB, use transactions to batch the page creation and deletion. With Batch Child Editor and transactions, I am able to create 35.5k pages (with over a dozen fields) in 2min 44s on my local machine. So much faster than Bernhard's result.
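A rough sketch of what transaction batching could look like in a bootstrapped script (this is a generic illustration, not how Batch Child Editor does it; it assumes the relevant tables use InnoDB, and the $items array, template and parent are placeholders):

$batchSize = 500;
$i = 0;
$database->beginTransaction();
foreach($items as $item) {
    $p = new Page();
    $p->template = 'basic-page';
    $p->parent = '/data';
    $p->title = $item['title'];
    $p->save();
    $pages->uncache($p);
    if(++$i % $batchSize === 0) {
        $database->commit();           // flush this batch of inserts in one transaction
        $database->beginTransaction(); // start the next batch
    }
}
$database->commit(); // commit whatever remains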
dragan Posted July 13, 2019

Wow, the PW backend became absolutely unusable after I created some 14k pages. Absurdly long loading times.

On 6/27/2019 at 4:45 PM, Beluga said: "If using InnoDB, use transactions to batch the page creation and deletion."

Something like discussed here? Interesting. Gotta try it out too...
bernhard Posted July 14, 2019

18 hours ago, dragan said: "Wow, the PW backend became absolutely unusable after I created some 14k pages. Absurdly long loading times."

Maybe you have some hooks that are causing this? Or a module that does some calculations? A default PW installation should not have any problems with several thousand pages...
dragan Posted July 14, 2019

Yeah, I was thinking the same. tl;dr: there's probably something broken with my server, rather than PW. In my several tests, I had to shut down Laragon (and Apache manually via Task Manager) and restart the server. I also found an Apache error, the same as reported here (installing a new Apache didn't help), plus a PHP error:

#8 C:\laragon\www\pwdev\wire\core\ProcessWire.php(444): ProcessWire\Session->__construct(Obj in C:\laragon\www\pwdev\index.php on line 64
[13-Jul-2019 19:00:41 Europe/Zurich] PHP Fatal error: Exception: SQLSTATE[70100]: <<Unknown error>>: 1317 Query execution was interrupted (in C:\laragon\www\pwdev\wire\core\WireDatabasePDO.php line 480)

I also switched to InnoDB in the meantime... so most likely there's something broken now with my LAMP setup. But I'll disable every hook now, and also some modules, and see how it goes.

Does anyone know if it's safe to delete DB sessions manually in MySQL?