DataSet import modules


mtwebit


After installing the DataSet module I get this error:

Module reported error during install (DataSet): SQLSTATE[23000]: Integrity constraint violation: 1048 Column 'label' cannot be null

When I click an Import link, nothing happens, and the Tasks page is empty.

I tried a simple import for testing, using the config below.

Any ideas? Where can I see the error log?

JSON
{
  "name": "Testing the import",
  "input": {
    "type": "csv",
    "delimiter": ",",
    "header": 1,
    "limit": 10
  },
  "fieldmappings": {
    "title": 1
  },
  "pages": {
    "template": "basic-page",
    "selector": "title=@title"
  }
}

 


23 hours ago, double said:

After installing the DataSet module I get this error:


Module reported error during install (DataSet): SQLSTATE[23000]: Integrity constraint violation: 1048 Column 'label' cannot be null

 

Thanks for reporting this. It was probably a core compatibility issue. It's fixed now.
Pull the updated module from GitHub, or delete it from your site/modules directory and reinstall it.


  • 3 weeks later...

I'm not sure what's gone wrong. I had this module working perfectly earlier in the day, and I imported all but one of the data tables I needed from CSV, but now it fails to run, and my browser shows a 500 server error when I attempt to run it.

I did update to the latest ProcessWire dev version to address something else, and I'm not sure if that has broken it.

I tried copying the latest versions of this and the Tasker module from GitHub, as the modules directory doesn't seem to be up to date with the latest releases. I wondered if there was a compatibility issue with the latest build of ProcessWire, but that didn't fix it.

Chrome's debug tools show a resource like this: admin/page/tasks/api//?cmd=run&id=4820

Opening this resource directly in the browser says that the task has already been run and won't run again.

Manually updating the task page and setting 'task is running' to zero triggers an error.

Tracy Debugger shows: Call to undefined function ProcessWire\pcntl_signal() (/site/modules/Tasker/Tasker.module.php:697)

According to the PHP docs, pcntl functions shouldn't be called from within a web server environment like Apache, so I'm not sure if that is the issue.

The key thing is the module was working just fine, then suddenly stopped working, with import configurations that had previously worked.

Any ideas?

[Edit] I found the problem: I'm running on a VPS with Plesk, and PHP as an FPM application doesn't include pcntl, but PHP as FastCGI does. The site was one that hadn't been updated in a while and was still running on PHP 7.3, so I updated to 7.4 and, in the process, changed it to FPM without thinking, as most of my other sites use it, but this disabled pcntl support. Something to be aware of for anyone else running Plesk.
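If anyone else needs to check whether a given PHP handler has pcntl available, commands along these lines can confirm it (diagnostic one-liners, not part of the module; run them against the PHP binary your Plesk handler actually uses):

```shell
# Is the pcntl extension compiled into this PHP binary?
php -m | grep -i pcntl

# Does the specific function Tasker calls exist?
php -r 'var_dump(function_exists("pcntl_signal"));'
```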


@Kiwi Chris I think it may be the file compiler, as it seems to be translating the module so that PHP looks for pcntl_signal() in the ProcessWire namespace, hence the line from Tracy: "ProcessWire\pcntl_signal()".

It's also telling you where that line occurs. I've not used Tasker before, but I'm guessing you can edit that line and, wherever you see "pcntl_signal()" being called, prepend it with a backslash "\" character to put it in the global namespace.
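To illustrate the namespace mechanics (a minimal sketch, not Tasker's actual code): inside a file declaring the ProcessWire namespace, PHP resolves an unqualified function call against the current namespace first and only then falls back to the global one, and when the function exists in neither place the error reports the namespaced name, which is exactly what Tracy showed:

```php
<?php
namespace ProcessWire;

// Unqualified call: PHP tries ProcessWire\pcntl_signal() first, then
// the global pcntl_signal(). If neither exists (pcntl not loaded),
// the error names the namespaced function, as seen in Tracy.
// pcntl_signal(SIGTERM, $handler);

// Fully qualified call: skips the namespace lookup entirely, but it
// still fails if the pcntl extension itself is not loaded.
\pcntl_signal(SIGTERM, function (int $signo) {
    // hypothetical handler
});
```

Note that the backslash only changes how the name is resolved; if the extension is missing from the PHP build, as turned out to be the case here, even the qualified call fails.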

Doing so may reveal more errors about other pcntl functions, but if that's the case, we know we're on the right track, and you can do the same to them as they appear.


Although LazyCron and web-based task execution are supported, tasks should be invoked by command-line tools (e.g. Unix cron), not from the web server environment.
See the wiki.
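For example, a crontab entry along these lines would run tasks outside the web server (the script name and paths below are assumptions for illustration only; the actual command is documented in the wiki):

```
# Hypothetical crontab entry: process pending Tasker tasks every 10 minutes.
# Replace the PHP path and the runner script with the ones from the wiki.
*/10 * * * * /usr/bin/php /var/www/html/site/modules/Tasker/runTasks.php
```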
Tasker uses process control functions for checking signals (e.g. an execution timer), which should be safe to use in a web server environment, but the code also monitors the SIGINT and SIGTERM signals, and that should be moved to the command-line-specific part. It also seems I need to check the availability of that function.
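Guarding the registration could look something like this (a hypothetical sketch, not the actual Tasker code):

```php
<?php
namespace ProcessWire;

// Register POSIX signal handlers only when the pcntl extension is
// available (CLI builds usually include it; FPM builds often do not).
if (\function_exists('pcntl_signal')) {
    $handler = function (int $signo) {
        // request a graceful stop of the running task (hypothetical)
    };
    \pcntl_signal(SIGTERM, $handler); // kill / shutdown
    \pcntl_signal(SIGINT, $handler);  // Ctrl-C on the command line
}
```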

Thanks for pointing this out.


I recently did an import of about 15K records with these modules, and they worked well; however, there was one undesirable side effect of storing the log in a standard ProcessWire textarea field, log_messages.

ProcessWire creates a full-text index on textarea and text field types.

I had a few issues with field mappings and defining unique identifiers the first couple of times I ran the import, so I used Tracy Debugger to delete the pages, fixed the import definition, and ran the import again.

Unfortunately, over the course of these attempts, the full-text index on the log_messages field grew to over 2GB in size, even though the imported CSV file itself was only about 1MB.

It took me a while to figure out why the database had suddenly grown from a few MB to several GB, and once I located the problem, I ran OPTIMIZE TABLE in mySQLAdmin, which cleaned things up and got the database back down to a normal size.
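For anyone else hitting this, the cleanup itself is a standard MariaDB/MySQL operation. The table name below follows ProcessWire's usual field_<name> convention for a field called log_messages, so verify it against your actual schema first:

```sql
-- Rebuild the table and its full-text index to reclaim disk space
OPTIMIZE TABLE field_log_messages;

-- Check the data and index sizes afterwards
SHOW TABLE STATUS LIKE 'field_log_messages';
```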

I suspect that in this case the log_messages field probably doesn't need a full-text index, although I guess if someone has a large number of import pages, maybe there is a case for them being searchable.
If there isn't a need for the log_messages field to be searchable, then a custom text fieldtype without a full-text index might make the import run more quickly when the field is repeatedly updated, and would also avoid an enormous index file being created.

I've made a post in the Wishlist and roadmap thread suggesting an option for textarea fields to have indexing disabled if not required, as this would resolve the issue.

 

Edited by Kiwi Chris

  • 3 weeks later...
On 12/23/2020 at 7:40 AM, Kiwi Chris said:

I suspect in this case, the log_messages field probably doesn't need to have a full text index on it, although I guess if someone has a large number of import pages, maybe there is a case for them to be searchable?

Actually, the log_messages field is created and used by Tasker, not DataSet.
I don't see any easy solution to this problem. If the PW core gets noindex support, then I'll use that.
The best thing you can do at the moment is to turn off profiling and debugging in Tasker on a production system (this is independent of the PW settings).
It is also good practice to delete tasks after they have finished and create new ones when needed.


  • 1 year later...

@mtwebit I have gotten the module to work to import records, which is great, but I have a couple of questions. What is the point of the global config? I don't seem to be able to use a global config, only the file description. Also, my page save hook is not firing when pages are created. What hook can I use for pages created by the import?


The global config is used to delete or export existing data. E.g.:

pages:
  template: Family
export:
  fields: [ family_id, previous_name, family_name, family_name_variants, source_text, bibliography_ref: title, notes ]
  delimiter: ';'
  header: 1

The DataSet > Purge link below the global config can be used to clean up the dataset (i.e. the pages created during import).

Unfortunately, I did not use page save hooks with datasets in my projects. You may check the source code here.
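As a starting point, a generic ProcessWire hook in site/ready.php should also fire for pages added through an import. This is a sketch that assumes the import template is named 'Family', as in the config above; note that it runs once per created page, so it may fire thousands of times during a large import:

```php
<?php namespace ProcessWire;

// site/ready.php -- hooks registered here are active for all requests.
// Pages::added fires after a new page is first saved to the database.
$pages->addHookAfter('added', function (HookEvent $event) {
    $page = $event->arguments(0);
    // Only react to pages created with the template used by the import
    if ($page->template == 'Family') {
        wire('log')->save('dataset-import', "Created: {$page->path}");
    }
});
```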
