Jump to content
mtwebit

Executing long tasks in ProcessWire

Recommended Posts

I was looking for a module that allows the execution of long-running tasks (working with tens of thousands of pages) and could not find a suitable solution.

It started with the import problem. I have lots of data in XML form (20k+ complex entries) that I want to import into ProcessWire.
XMLReader() works fine but it takes a very long time to import all data so a simple "upload + process data on page save" would not work.

So I've created a new module for this. Meet my first (well, third) PW module: Tasker.
It's a simple module that executes long tasks in the background (using Cron or LazyCron) and reports their status to the user using a JQuery progressbar.

Any suggestions are welcome. :) How do you solve similar problems?

E.g. which is the best way to delete large number of pages? (max_exec_time will expire so I check it before the delete() call.)

$children = $page->children('template='.$this->template.',include=all'); // creates lot of page objects, may not have time to delete all
+ ...->delete()
OR
$childIDs = $this->pages->findIDs('parent='.$page->id.',template='.$this->template.',include=all'); // will create page objects later
+ $pages->get(..id..)->delete()

Be nice (I know you're always), I'm a PW-newbie, just started working with ProcessWire. I was developing sites with Drupal for a very long time but their non-existent module upgrade path finally has driven me away from it.

  • Like 10
  • Thanks 1

Share this post


Link to post
Share on other sites

Congrats to your new module. I'm sure it will be a godsend for large import (or utility) tasks. I hope I'll have time to try it out soon.

As for your question: I guess I'd try to first get the IDs. And I'll probably also use findMany() - apparently it's faster than find().

https://processwire.com/api/ref/pages/find-many/

https://processwire.com/blog/posts/find-and-iterate-many-pages-at-once/

Edited by dragan
typo
  • Like 2

Share this post


Link to post
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.


  • Recently Browsing   0 members

    No registered users viewing this page.

  • Similar Content

    • By maba
      Hello,
      I need to import regularly - every 15 or 30 days - a big .xslx file into my PW installation.
      This file now has 14 columns, 5.000 rows and grows every month.
      I'll need to group, order and work with these data to:
      analyse User monthly costs analyse User costs per Asset ... User (real AD account) has to match with a PW user - I can't join to the domain - but as you can see I have some services users (start with sca_*) or no user at all. Those rows have to be assigned to a specific user, e.g. account100.
      And:
      I would like to be able to have a kind of diff function to compare User assets between this and last month (and so on) other request is to have a notification when something change for a User between actual and latest import First request: which is the best solution to store those data in your opinion? Page, Table, Repeater Matrix, ...?
      Those are very repetitive data and I think a page reference is better than to import all the data every time but I have to understand how to manage those "dynamic" groups of software (AccType Det), hardware (Asset), ... For example Price will be imported and not stored with the description because it could be change in the future and I'll not have any control on it.
      Thanks!
      User,OE,productNmr,AccType1,AccType Det,Count,Price (€),Sum,ASNA,CC,AccType Info,Asset,AccGroup,,,,,,,,,,,,,
    • By dragan
      Is it by design that a site/ready.php is not included when creating a new site profile? Is it possible to include it with a hook? Or are there any security thoughts? (I don't want to redistribute it in public, it's just so I have my own boilerplate)
    • By Mike Rockett
      Docs & Download: rockettpw/markup-sitemap
      Modules Directory: MarkupSitemap
      Composer: rockett/sitemap
      MarkupSitemap is essentially an upgrade to MarkupSitemapXML by Pete. It adds multi-language support using the built-in LanguageSupportPageNames. Where multi-language pages are available, they are added to the sitemap by means of an alternate link in that page's <url>. Support for listing images in the sitemap on a page-by-page basis and using a sitemap stylesheet are also added.
      Example when using the built-in multi-language profile:
      <url> <loc>http://domain.local/about/</loc> <lastmod>2017-08-27T16:16:32+02:00</lastmod> <xhtml:link rel="alternate" hreflang="en" href="http://domain.local/en/about/"/> <xhtml:link rel="alternate" hreflang="de" href="http://domain.local/de/uber/"/> <xhtml:link rel="alternate" hreflang="fi" href="http://domain.local/fi/tietoja/"/> </url> It also uses a locally maintained fork of a sitemap package by Matthew Davies that assists in automating the process.
      The doesn't use the same sitemap_ignore field available in MarkupSitemapXML. Rather, it renders sitemap options fields in a Page's Settings tab. One of the fields is for excluding a Page from the sitemap, and another is for excluding its children. You can assign which templates get these config fields in the module's configuration (much like you would with MarkupSEO).
      Note that the two exclusion options are mutually exclusive at this point as there may be cases where you don't want to show a parent page, but only its children. Whilst unorthodox, I'm leaving the flexibility there. (The home page cannot be excluded from the sitemap, so the applicable exclusion fields won't be available there.)
      As of December 2017, you can also exclude templates from sitemap access altogether, whilst retaining their settings if previously configured.
      Sitemap also allows you to include images for each page at the template level, and you can disable image output at the page level.
      The module allows you to set the priority on a per-page basis (it's optional and will not be included if not set).
      Lastly, a stylesheet option has also been added. You can use the default one (enabled by default), or set your own.
      Note that if the module is uninstalled, any saved data on a per-page basis is removed. The same thing happens for a specific page when it is deleted after having been trashed.
          
    • By stanoliver
      My aim is to output a very basic xml document which should be styled with a few css-styles.
      <?xml version = "1.0"?> <contact-info> <name>Donal Duck</name> <company>Superducks</company> <phone>(011) 123-4567</phone> </contact-info> How do I implement it with processwire?
    • By karian
      I don't know why multiple instances (repeater_repeat_columns1, repeater_repeat_columns2, ...) of my repeater field are displayed inside Template field (see image).
      Is there a way to clean/reset it ?
       

×
×
  • Create New...