ProcessWire modules for importing and handling large data sets.

DataSet

DataSet is a set of ProcessWire modules for importing, manipulating and displaying large data sets (50k+ entries).
It was developed for [Mikes-dictionary] and other Digital Humanities projects.

Main features


  • import data from CSV and XML sources
  • user-configurable input <-> field mappings
  • on-the-fly field data composition
  • download external resources (files, images)
  • purge, extend or overwrite existing data (PW pages and their fields)
  • handle page references and option fields
  • fairly low resource requirements (uses Tasker to execute long-running jobs)
  • and many more (filtering, limits, default values, etc.)
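
The mapping-related features above can be illustrated with a small, language-neutral sketch. It is written in Python for brevity and is not DataSet's actual configuration syntax or API. The names `FIELD_MAP`, `DEFAULTS`, `map_row` and `import_csv`, as well as the column names, are hypothetical. It shows an input <-> field mapping, a field composed on the fly from two columns, default values, and an entry limit.

```python
import csv
import io

# Hypothetical mapping: target field name -> CSV column name, or a template
# that composes a value from several columns.
FIELD_MAP = {
    "title": "headword",
    "summary": "{headword} ({pos})",  # composed on the fly from two columns
    "year": "year",
}

# Hypothetical default values used when the mapped input is empty.
DEFAULTS = {"year": "unknown"}


def map_row(row, field_map=FIELD_MAP, defaults=DEFAULTS):
    """Turn one CSV row (a dict) into a dict of target field values."""
    fields = {}
    for field, source in field_map.items():
        if "{" in source:
            # Template: fill the placeholders with the row's column values.
            value = source.format(**row)
        else:
            # Plain column reference; treat a missing column as empty.
            value = row.get(source) or ""
        fields[field] = value.strip() or defaults.get(field, "")
    return fields


def import_csv(text, limit=None):
    """Map every row of a CSV string, stopping after `limit` rows if given."""
    reader = csv.DictReader(io.StringIO(text))
    pages = []
    for index, row in enumerate(reader):
        if limit is not None and index >= limit:
            break
        pages.append(map_row(row))
    return pages
```

In a real import each mapped dictionary would become a ProcessWire page. Here the function only returns the mapped values, so the mapping logic can be checked in isolation.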

How to use it


See the wiki.

Important notice


This module is under development.
It is now considered fairly stable, but things may break and the internal API may change at any time.

History


The first version was created in 2017 to import a large XML dataset into ProcessWire pages.
The CSV import sub-module was created in 2018. It was tested by importing a large dataset containing 200k+ entries and many kinds of references between them.
The CSV + PDF import was developed in 2019 to create a complete digital library using a single CSV upload.

License


The "github-version" of the software is licensed under MPL 2.0.

Install and use modules at your own risk. Always have a site and database backup before installing new modules.
