How to upload heavy data into Processwire?


Vineet Sawant

Hi,

I'm trying to import a large data set into ProcessWire, but I'm not sure what the best way to do it is.

Usually I use the CSV to Pages plugin, but this time the data set is too large (~40k rows with 10+ columns in an Excel sheet), so that plugin can't help.

I also tried the Tasker plugin, but I couldn't get through the setup. It requires some template setup and I have no idea how to do that, so that plugin is of no use to me either.

I'd like to know how you do it, and what the best way would be in future to migrate thousands of rows of data into PW.

Thanks.

I would also recommend writing your own shell script that bootstraps ProcessWire.

You could, for example, use the PHP League's CSV Composer package and write your own import script that saves the CSV entries as pages via the API. 😉

It is not that hard, and you can import large amounts of data this way. If you want, I could post an example.

Regards, Andreas
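
For a file of this size it can help to read the CSV row by row instead of loading it whole, and to do some housekeeping every few hundred rows. What follows is only a minimal sketch using PHP built-ins rather than League CSV. `readCsvRows()` and `importInBatches()` are hypothetical helper names, and the two callbacks stand in for whatever your own script does to save a page and to free memory.

```php
<?php
/**
 * Yield each CSV row as an associative array keyed by the header row.
 * Hypothetical helper; the ";" delimiter is an assumption about the file.
 */
function readCsvRows(string $path, string $delimiter = ";"): \Generator
{
    $handle = fopen($path, "r");
    if ($handle === false) {
        throw new \RuntimeException("Cannot open $path");
    }
    try {
        // The first line is treated as the header
        $header = fgetcsv($handle, 0, $delimiter, '"', "\\");
        if (!is_array($header)) {
            return;
        }
        while (($row = fgetcsv($handle, 0, $delimiter, '"', "\\")) !== false) {
            // Skip blank lines and rows whose column count does not match the header
            if ($row === [null] || count($row) !== count($header)) {
                continue;
            }
            yield array_combine($header, $row);
        }
    } finally {
        fclose($handle);
    }
}

/**
 * Pass every row to $saveRow and call $afterBatch after each full batch.
 * Returns the number of rows handled. Hypothetical helper.
 */
function importInBatches(iterable $rows, int $batchSize, callable $saveRow, callable $afterBatch): int
{
    $count = 0;
    foreach ($rows as $row) {
        $saveRow($row);
        $count++;
        if ($count % $batchSize === 0) {
            $afterBatch($count);
        }
    }
    return $count;
}
```

In a bootstrapped ProcessWire script, the `$saveRow` callback would create or update the page via the API, and `$afterBatch` could, for instance, call `pages()->uncacheAll()` so that pages already saved do not pile up in memory.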


Here is a script I wrote to import thousands of customers.

You have to save it as a shell script (for example sync-customers.php.sh), make it executable and run it from the command line (./sync-customers.php.sh).

#!/usr/bin/env php
<?php
namespace {
	// Bootstrap ProcessWire
	include("./../../index.php");
}

namespace ProcessWire {

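	// Fetch the log API variable explicitly (assumption: $log is not
	// otherwise defined when ProcessWire is bootstrapped this way)
	$log = wire("log");

	echo "Synchronisation started...\n";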

	// Source: http://csv.thephpleague.com/
	use League\Csv\Reader;

	$csv = Reader::createFromPath("./../assets/csv/customer.csv", "r");
	$csv->setDelimiter(";");
	$csv->setHeaderOffset(0);

	// $header = $csv->getHeader();
	$records = $csv->getRecords();

	// var_dump($header);
	// var_dump($records);

	/*
	 * Save records in new array
	 */
	$recordsArr = array();

	foreach ($records as $record) {

		// Save columns in variables
		$supplierID = 				$record["SupplierID"];
		$customername1 = 			$record["Customername1"];
		$street = 					$record["Street"];
		$postcode = 				$record["Postcode"];
		$city = 					$record["City"];
		$country = 					$record["Country"];
		/*
		$customername2 = 			$record["Customername2"];
		$email1 = 					$record["Email1"];
		$email2 = 					$record["Email2"];
		$additionalInformation = 	$record["AdditionalInformation"];
		$fieldworker = 				$record["Fieldworker"];
		$indoorservice = 			$record["Indoorservice"];
		$webseite = 				$record["IF:Webseite"];
		$verband = 					$record["IF:Verband"];
		$segment = 					$record["IF:Segment"];
		$unternehmenskette = 		$record["IF:Unternehmenskette"];
		*/

		$recordsArr[] = array(
			"supplierID" => 	$supplierID,
			"customername1" => 	$customername1,
			"street" => 		$street,
			"postcode" => 		$postcode,
			"city" => 			$city,
			"country" => 		$country
		);

	}

	// Remove duplicates
	$recordsArr = array_map("unserialize", array_unique(array_map("serialize", $recordsArr)));

	// var_dump($recordsArr);
	// echo count($recordsArr) . ".\n";

	// Get customers
	$customersPage = pages()->get("template=customers");
	$customers = pages()->find("parent=$customersPage, template=customer");

	/*
	 * Delete customers
	 */
	/*
	foreach ($customers as $customer) {

		$log->save("customers", "Customer " . $customer->title . " deleted.");
		echo("Customer " . $customer->title . " deleted.\n");

		pages()->delete($customer);

	}
	*/

	/*
	 * Create or update customers
	 */
	foreach ($recordsArr as $r => $record) {

		// Save columns in variables
		$supplierID = 				$record["supplierID"];
		$customername1 = 			$record["customername1"];
		$street = 					$record["street"];
		$postcode = 				$record["postcode"];
		$city = 					$record["city"];
		$country = 					$record["country"];

		// Create customer
		if (!$customers->has("title=$supplierID")) {

			$customer = new Page();
			$customer->parent = $customersPage;
			$customer->template = "customer";
			$customer->title = $supplierID;

			$customer->of(false);

			$customer->save();

			$customer->set("customer_name", $customername1);
			$customer->set("customer_postal_code", $postcode);
			$customer->set("customer_city", $city);

			// Create distribution country if it doesn't exist
			$distributionCountry = pages()->get("title=$country, template=distribution-country");

			if (!$distributionCountry->id) {

				$distributionCountry = new Page();
				$distributionCountry->parent = pages()->get("template=distribution-countries");
				$distributionCountry->template = "distribution-country";
				$distributionCountry->title = $country;

				$distributionCountry->of(false);

				$distributionCountry->save();

				$log->save("distribution-countries", "Distribution country " . $distributionCountry->title . " created.");
				echo("Distribution country " . $distributionCountry->title . " created.\n");

			}

			$customer->set("customer_distribution_country", $country);

			$customer->save();

			$log->save("customers", "Customer " . $customer->title . " created.");
			echo("Customer " . $customer->title . " created.\n");

		// Update customer
		} else {

			// Get customer
			$customer = $customers->get("parent=$customersPage, title=$supplierID, template=customer");

			if (($customer->customer_name !== $customername1) ||
				($customer->customer_postal_code !== $postcode) ||
				($customer->customer_city !== $city) ||
				((string)$customer->customer_distribution_country->title !== $country)) {



				$customer->of(false);

				$customer->set("customer_name", $customername1);
				$customer->set("customer_postal_code", $postcode);
				$customer->set("customer_city", $city);
				$customer->set("customer_distribution_country", $country);

				$customer->save();

				$log->save("customers", "Customer " . $customer->title . " updated.");
				echo("Customer " . $customer->title . " updated.\n");

			}

		}

	}

	/*
	 * Delete leftover customers
	 */
	$savedCustomersArr = array();
	$customersArr = array();

	foreach ($customers as $customer) {

		$savedCustomersArr[] = $customer->title->getLanguageValue("default");

	}

	foreach ($records as $record) {

		// Save columns in variables
		$supplierID = $record["SupplierID"];

		$customersArr[] = $supplierID;

	}

	$deletedCustomersArr = array_diff($savedCustomersArr, $customersArr);
	$deletedCustomersArr = array_unique($deletedCustomersArr);

	// var_dump($savedCustomersArr);
	// var_dump($customersArr);
	// var_dump($deletedCustomersArr);

	foreach ($deletedCustomersArr as $deletedCustomer) {

		$customer = pages()->findOne("parent=$customersPage, title=$deletedCustomer, template=customer");

		// Skip IDs that do not resolve to an existing page
		if (!$customer->id) continue;

		$log->save("customers", "Customer " . $deletedCustomer . " deleted.");
		echo("Customer " . $deletedCustomer . " deleted.\n");

		pages()->delete($customer);

	}

	echo "Synchronisation finished...\n";

}
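
The duplicate removal and the "leftover" detection in the script above are plain array operations, so they can be tried out without ProcessWire. Below is a small sketch with made-up sample data. `uniqueRows()`, `indexById()` and `leftoverIds()` are hypothetical helper names, not part of the script or of ProcessWire. One thing it illustrates: the serialize approach only drops rows that are identical in every column, so rows that share a SupplierID but differ elsewhere survive. Keying the rows by ID (last row wins) is shown as one possible way to deal with that, not as the only one.

```php
<?php
/** Remove rows that are identical in every column, keeping the first occurrence. */
function uniqueRows(array $rows): array
{
    $unique = array_map("unserialize", array_unique(array_map("serialize", $rows)));
    return array_values($unique);
}

/** Key rows by one column; a later row with the same ID replaces an earlier one. */
function indexById(array $rows, string $key): array
{
    $byId = [];
    foreach ($rows as $row) {
        $byId[$row[$key]] = $row;
    }
    return $byId;
}

/** Return the saved IDs that no longer appear in the import. */
function leftoverIds(array $savedIds, array $importedIds): array
{
    return array_values(array_unique(array_diff($savedIds, $importedIds)));
}

// Made-up sample data, for illustration only
$rows = [
    ["supplierID" => "1001", "city" => "Berlin"],
    ["supplierID" => "1001", "city" => "Berlin"], // exact duplicate
    ["supplierID" => "1002", "city" => "Pune"],
    ["supplierID" => "1002", "city" => "Mumbai"], // same ID, different data
];

$unique = uniqueRows($rows);               // exact duplicate removed
$byId = indexById($unique, "supplierID");  // one row per supplier ID

// IDs already stored as pages (sample values) versus IDs in the import
$savedIds = ["1001", "1002", "1003"];
$toDelete = leftoverIds($savedIds, array_column($unique, "supplierID"));
```

With the sample data, `$toDelete` contains only "1003", the one stored ID that is missing from the import.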

 

 
