Search the Community
Showing results for tags 'web scraping'.
-
Would anyone here be able and inspired to develop a ProcessWire equivalent to the WordPress plugin WP All Import Pro? or to help me do so? This is what I'm envisioning... Upon installation the module creates an admin page titled "Import & Update". On the module config page you can specify allowed templates to run this on, otherwise allowing any. Include the following PHP libraries: hQuery for web scraping, Csv for CSV handling, and Parser for XML. Create template "import-update". On the "Import & Update" page, a list of current import/updaters will be displayed (0 initially), each with corresponding links to "edit" or "run". When you "Add New", this be the "import-update" template (with all module-specific fields tagged "impupd"): title destination (req.): parent, template source (req.): type (web, csv, xml) location (url, file, text) if web: opt. index URL & link selector, + paginator selector if csv: opt. ignore 1st row if xml: req. individual item node xpath actions (check): import (if none matching UID) update (if matching UID & field values differ) save() [req. here in flow] map (repeater): input (select fields from specified template to affect) intake (corresponding DOM selectors / CSV col. letters/headers / xpath per field) (req.) UID (unique ID; field reference to compare against, from selected input fields) (req.) Lazy Cron interval Scripts can be run via the import-update template; keep logs; show preview (iframe/ajax) for manual runs. ...