mtwebit

DataSet import modules

Recommended Posts

I've created a set of modules for importing (manipulating and displaying) data from external resources. A key requirement was to handle large (100k+) number of pages easily.

Main features

  • import data from CSV and XML sources in the background (using Tasker)
  • purge, update or overwrite existing pages using selectors
  • user configurable input <-> field mappings
  • on-the-fly data conversion and composition (e.g. joining CSV columns into a single field)
  • download external resources (files, images) during import
  • handle page references by any (even numeric) fields

How it works

You can upload CSV or XML files to DataSet pages and specify import rules in their description.
The module imports the content of the file and creates/updates child pages automatically.

How to use it

Create a DataSet page that stores the source file. The file's description field specifies how the import should be done:

Spoiler

name: Testing the import
input: # Source configuration
  type: csv
  delimiter: ','
  header: 1
  limit: 10  # import only 10 entries, uncomment this if the test was successful
fieldmappings: # specified as field_name: csv_column_id (1, 2, 3, ...)
  title: 1
pages:  # Config for child pages
  template: Data
  selector: 'title=@title'

After saving the DataSet page an import button should appear below the file description.

dataset_file_description.thumb.png.b92cf93c8a529d9750622ef08b67fcad.png

When you start the import the DataSet module creates a task (executed by Tasker) that will import the data in the background.

You can monitor its execution and check its logs for errors.

dataset_import_running.thumb.png.ad0e58d907dcf1b379060afa9bc928e9.png

See the module's wiki for more details.

The module was already used in three projects to import and handle large XML and CSV datasets. It has some rough edges and I'm sure it needs improvement :) so comments are welcome.

  • Like 16

Share this post


Link to post
Share on other sites

Thanks for sharing your modules @mtwebit!  This looks like it could be really useful.  Is there any way you could include a place to add a url to the file instead of an upload?  For example, I store staff's contact information in a Google Spreadsheet.  This spreadsheets gets updated all the time.  It would be cool to just add the url to csv file instead of having to download the file and upload it into Processwire.  The input could also remember it's previous value so I can run the import over and over again as needed.  Maybe it also could be somehow automated to run the same import everyday?

If not, no worries.  Thanks again.

Share this post


Link to post
Share on other sites
14 hours ago, gmclelland said:

Thanks for sharing your modules @mtwebit!  This looks like it could be really useful.  Is there any way you could include a place to add a url to the file instead of an upload?  For example, I store staff's contact information in a Google Spreadsheet.  This spreadsheets gets updated all the time.  It would be cool to just add the url to csv file instead of having to download the file and upload it into Processwire.  The input could also remember it's previous value so I can run the import over and over again as needed.  Maybe it also could be somehow automated to run the same import everyday?

If not, no worries.  Thanks again.

I was thinking about this too...

There was a dev branch that dropped the [file + rules in description] scheme and introduced a fieldset of [rule + (optional) file]. It turned out to be too complicated and it did not work well so I dropped it.

An easy solution is to allow source location override. So... see this commit and use the input:location configuration option.
Not the best solution as it still requires a (dummy) file to be uploaded (to create the import rules in its description), but it works.
You can even use this solution to refer to files uploaded to other pages using this URL scheme: wire://pageid/filename

Hope it helps.

14 hours ago, gmclelland said:

It looks like you might have already considered and built this type of functionality https://github.com/mtwebit/DataSet/wiki/Import-rules#data-conversion-during-import

That's different. It downloads data for a single field (e.g. a file to be stored in a filefield) not for an entire DataSet.

  • Thanks 1

Share this post


Link to post
Share on other sites

really like this, will be complete if can do bulk export too, currently i'm using custom php script in front end for huge data export, but prefer if i can do this in admin area.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Recently Browsing   0 members

    No registered users viewing this page.

  • Similar Content

    • By anderson
      Hi all,
      I'm a new to website building. Learned some CRASH course of js,jquery,php. Then I found CMS. Still learning around forum, youtube....
      Anyway, please help me with some beginer questions:
      1, About template - please correct me if I understand wrong : every page should be (or recommended) built on a template. So if in total I'll have 10 pages, 2 of them have same layout, I'll need 9 templates.  And, what fields a page includes, is not defined in page, but defined in the template that page uses. 
      2,  Where to see what modules I've installed? is it in "Modules - Site"? 
      3, I installed "PageTable Extended", then what?  As in a youtube tutorial, it should appear in Setup tab, but it doesn't.  What's in there: templates,fields,logs,comments. (I installed "Uikit 3 site_blog profile".)
      4, I did a search in Processwire website for the famous "repeater matrix" module, and can not find it, there's a Repeater, as well as a Matrix. Is it not a module?
      5, I watched this youtube tuts: https://www.youtube.com/watch?v=IHqnLQy9R1A
      Anybody familiar with this tuts please help: 
      After he analysed a target webpage layout he wanted to mimic, he created some fields, some template, then based on those he created a page and input some "content" in there, then clicked "view", it's just some text. So, here comes my question, he copied a folder "assets" (subfolders are: css,fonts,js,img) over, then the page have the appearance/layout he wanted to mimic. Where does that assets folder come from?
      Appreciate any help.
    • By dreerr
      TemplateEnginePug (formally TemplateEngineJade)
       
      This module adds Pug templates to the TemplateEngineFactory. It uses https://github.com/pug-php/pug to render templates.
      doctype html html(lang='en') head meta(http-equiv='content-type', content='text/html; charset=utf-8') title= $page->title link(rel='stylesheet', type='text/css', href=$config->urls->templates . 'styles/main.css') body include header.pug h1= $page->title if $page->editable() p: a(href=$page->editURL) Edit Project on GitHub: github.com/dreerr/TemplateEnginePug
      Project in modules directory: modules.processwire.com/modules/template-engine-pug/
       
      For common problems/features/questions about the Factory, use the TemplateEngineFactory thread.
       
    • By Robin S
      Pages At Bottom
      Keeps selected pages at the bottom of their siblings.
      A "bottom page" will stay at the bottom even if it is drag-sorted to a different location or another page is drag-sorted below it (after Page List is refreshed the bottom page will still be at the bottom).
      Newly added sibling pages will not appear below a bottom page.
      The module also prevents the API methods $pages->sort() and $pages->insertAfter() from affecting the position of bottom pages.
      Note: the module only works when the sort setting for children on the parent page/template is "Manual drag-n-drop".
      Why?
      Because you want some pages to always be at the bottom of their siblings for one reason or another. And someone requested it. 🙂
      Usage
      Install the Pages At Bottom module.
      Select one or more pages to keep at the bottom of their siblings. If you select more than one bottom page per parent then their sort order in the page list will be the same as the sort order in the module config.

       
      https://github.com/Toutouwai/PagesAtBottom
      https://modules.processwire.com/modules/pages-at-bottom/
    • By Robin S
      Another little admin helper module...
      Template Field Widths
      Adds a "Field widths" field to Edit Template that allows you to quickly set the widths of inputfields in the template.

      Why?
      When setting up a new template or trying out different field layouts I find it a bit slow and tedious to have to open each field individually in a modal just to set the width. This module speeds up the process.
      Installation
      Install the Template Field Widths module.
      Config options
      You can set the default presentation of the "Field widths" field to collapsed or open. You can choose Name or Label as the primary identifier shown for the field. The unchosen alternative will become the title attribute shown on hover. You can choose to show the original field width next to the template context field width.  
      https://github.com/Toutouwai/TemplateFieldWidths
      https://modules.processwire.com/modules/template-field-widths/
    • By horst
      Croppable Image 3
      for PW 3.0.20+
      Module Version 1.1.16
      Sponsored by http://dreikon.de/, many thanks Timo & Niko!
      You can get it in the modules directory!
      Please refer to the readme on github for instructions.
       
      -------------------------------------------------------------------------
       
      Updating from prior versions:
       
      Updating from Croppable Image 3 with versions prior to 1.1.7, please do this as a one time step:
      In the PW Admin, go to side -> modules -> new, use "install via ClassName" and use CroppableImage3 for the Module Class Name. This will update your existing CroppableImage3 module sub directory, even if it is called a new install. After that, the module will be recogniced by the PW updater module, what makes it a lot easier on further updates.
      -------------------------------------------------------------------------
       
      For updating from the legacy Thumbnail / CropImage to CroppableImage3 read on here.
       
      -------------------------------------------------------------------------