How to import thousands of products of other retailers into your webshop?



Posted

So you built a webshop for a client... The client is selling products from other retailers... Now they want to import thousands of products into their webshop... How would you approach this? 🙈🤯

Posted

Nervously! 😆

On a more practical note, I built a CSV import routine that imports data in batches and requires the ImportPagesCSV module. In my client's case, it imported user data exported to CSV files from their accounting system. ImportPagesCSV imports the CSV data as pages under a page named "Imported Customers CSV".

The limit was hard-coded to 50 per run. In site/ready.php:

// Turn imported CSV customer list to users
$wire->addHookAfter('Pages::saved', function (HookEvent $event) {
    $pages = $event->object;
    $page = $event->arguments(0);
    if ($page->id !== 5251) return; // Import Customers - use your own Import data parent page
    if ($page->children->count === 0) return;
    $return = $event->return;
    $users = wire('users');

    foreach($page->children('limit=50') as $child) {
        // check user not already created
        $childEmail = $child->email;
        $emailAlready = $users->get("email=$childEmail");
        // get() returns a single User (or a NullPage with id 0), so test the id rather than a count
        if($emailAlready->id) {
            wire('log')->save('myclient', "User with email " . $child->email . " already exists");
            continue;
        }

        $customer = new User();
        $customer->of(false);
        $customer->template = 'reseller';
        $customer->name = $child->name;
        $customer->parent = 1033; // change to the 'real' parent, eg products
        $customer->addRole('login-register');
        $customer->pass = bin2hex(random_bytes(10));

        $fields = $child->getFields();

        foreach ($fields as $field) {
            $fieldName = $field->name;
            $customer->of(false);

            if ($customer->template->hasField($fieldName)) {
                $customer->set($fieldName, $child->$fieldName);
            } elseif ($field->children) {
                // for fields within fieldsets
                // separate loop variable, so the outer $child (deleted below) is not overwritten
                foreach ($field->children as $subfield) {
                    $subName = $subfield->name;
                    if ($customer->template->hasField($subName))
                        $customer->set($subName, $child->$subName);
                }
            }
        }
        if(!$customer->hasRole('login-register'))
            $customer->addRole('login-register');

        $customer->save();
        $pages->delete($child);


    }
    $event->return = $return;
});

The hard limit of 50 kept each run short enough to avoid timeouts and limited the damage if something went wrong.

Process is:

  1. Import the data under the "Imported Customers CSV"
  2. Check the data is OK
  3. If OK, edit and save the "Imported Customers CSV" page to invoke the hook

If the import didn't work, I could easily delete all the child pages of "Imported Customers CSV", make adjustments and try again.

 

Posted

https://github.com/teppokoivula/ImportTool was built to handle a similar need, in case you want to check it out. It’s most useful for imports that may be needed again later, though; the idea is that you define an import profile, which can then be executed via the admin while also providing an input file for the data.

Depending on the data, I usually set the limit somewhere between 50 and 500 pages and set the “on_duplicate” setting for the profile to “continue”. This way you can keep running the same import profile with the same data file until you’re done.
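
To illustrate why the "limit plus continue on duplicate" combination makes an import safely re-runnable, here is a minimal plain-PHP sketch. It is not ImportTool's code or API: the function importBatch(), the 'sku' key and the in-memory $imported array (standing in for pages that already exist) are all hypothetical.

```php
<?php

/**
 * Import at most $limit new rows. Rows whose key is already present in
 * $imported are skipped ("continue" on duplicate) and do not count
 * towards the limit. Returns the number of rows imported in this run.
 */
function importBatch(array $rows, array &$imported, int $limit): int
{
    $count = 0;
    foreach ($rows as $row) {
        if ($count >= $limit) break;
        $key = $row['sku'];
        if (isset($imported[$key])) continue; // duplicate: skip and carry on
        $imported[$key] = $row;               // stand-in for creating and saving a page
        $count++;
    }
    return $count;
}

// Hypothetical data file with 120 rows
$rows = [];
for ($i = 1; $i <= 120; $i++) {
    $rows[] = ['sku' => "SKU-$i"];
}

// Re-run the same import with the same data until nothing new is imported
$imported = [];
$runs = 0;
while (importBatch($rows, $imported, 50) > 0) {
    $runs++;
}
```

Each run picks up where the previous one stopped, because everything imported earlier is treated as a duplicate and skipped.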

This module is a bit unpolished, but in use on a few of our sites 🙂

Posted

Hey @teppo and @psy thx for your answers! Looks like I was not precise enough with my question. The import part is not the question - I've already imported 1000+ pages for this project from the old website using RockShell and a custom script. With RockShell it's super easy to create reusable scripts (which you correctly mentioned as an important need, @teppo) and it's super easy to watch what's happening and to abort in case anything unwanted happens (just hit CTRL+C).

The question is more related to e-commerce and different vendors providing different data. What we got was a total mess: everything from a CSV with thousands of entries in one huge table to Excel files with related keys, like an Excel copy of a relational database. Think of it as a MySQL dump of a ProcessWire installation... but without ProcessWire transforming it into something useful and understandable 😅

My guess is that the only way is to understand the data we get from the vendors and then try to transform it into the data format that RockCommerce needs. Sounds like a pain. But manually adding 15k products does not sound better either 😄 

So I also thought about scraping the data from their websites... but I'm not sure if that would be any easier than reading their messy CSVs...

Does anybody have experience with that? Also happy with "no idea/not easy/not possible" answers to help me back up what I told my client ^^

PS: A RockShell command to import from CSV is as simple as that if anybody has a similar need:

<?php

namespace Site;

use ProcessWire\Page;
use RockShell\Command;

use function ProcessWire\wire;

class ImportProducts extends Command
{
  public function handle()
  {
    while ($row = wire()->files->getCSV('/path/to/data.csv')) {
      // create page and save it
      $p = new Page();
      // ...
      echo "Saved page {$p->path}";
    }
    return self::SUCCESS;
  }
}

You can then run that command via "rockshell import:products" and watch it do its work 🙂 So the ProcessWire part is easy.......

Posted
1 hour ago, bernhard said:

My guess is that the only way is to understand the data we get from the vendors and then try to transform it into the data format that RockCommerce needs. Sounds like a pain. But manually adding 15k products does not sound better either 😄 

So I also thought about scraping the data from their websites... but I'm not sure if that would be any easier than reading their messy CSVs...

Does anybody have experience with that? Also happy with "no idea/not easy/not possible" answers to help me back up what I told my client ^^

It's possible that I'm still missing the point, but if the underlying problem is that there are multiple data sources with differing data formats, there are only two top level approaches that I can think of:

  • write a separate import script / profile / configuration for each data source, or
  • write an adapter per source that converts its data to a single, uniform format, which you can then import.
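
The second approach could look roughly like the sketch below: one small adapter function per vendor, each returning the same array shape, so that a single import routine can consume all of them. Everything vendor-specific here is invented for illustration: the column names (ArtNr, Bezeichnung, Preis, item_id, name, price_cents), the SKU prefixes and the assumptions about price formats would have to come from the real files and, ideally, from the vendors' documentation.

```php
<?php

// Hypothetical uniform format: ['sku' => string, 'title' => string, 'price' => float]

/** Adapter for a hypothetical vendor A (German column names, decimal comma). */
function adaptVendorA(array $row): array
{
    return [
        'sku'   => 'A-' . trim($row['ArtNr']),
        'title' => trim($row['Bezeichnung']),
        // Assumption: prices look like "12,50" and have no thousands separator
        'price' => (float) str_replace(',', '.', $row['Preis']),
    ];
}

/** Adapter for a hypothetical vendor B (prices given in cents). */
function adaptVendorB(array $row): array
{
    return [
        'sku'   => 'B-' . trim($row['item_id']),
        'title' => trim($row['name']),
        'price' => ((int) $row['price_cents']) / 100,
    ];
}

/** Run every source row through the adapter for that source. */
function normalize(array $rows, callable $adapter): array
{
    return array_map($adapter, $rows);
}

// Two sources with different shapes end up in one uniform list
$products = array_merge(
    normalize([['ArtNr' => ' 1001', 'Bezeichnung' => 'Widget', 'Preis' => '12,50']], 'adaptVendorA'),
    normalize([['item_id' => '77', 'name' => 'Gadget ', 'price_cents' => '1999']], 'adaptVendorB')
);
```

The import itself then only needs to know the uniform format, and adding another vendor means writing one more adapter rather than another import script.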

For me personally, scraping is never the preferred option, as it comes with a number of potential issues. For one, you may not be able to scrape all relevant data, or you may get malformed or partial data, and you may not know it before it is too late. Getting your hands on the raw data is almost always much, much better. At the very least I would contact each vendor before scraping to confirm that a) they think it is doable and b) it won't result in them blocking you due to your scraping tool exceeding rate limits etc.

If you can't make sense of the data format you've got, ask for some kind of documentation. The worst case is that you need to figure things out on your own — that can easily lead to nasty issues, as your assumptions could be completely wrong. If there is no way to get solid documentation for the data, let your client know that it's essentially a guessing game at this point. Especially when money is involved that's not a great situation to be in.
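
One way to make wrong assumptions visible early is to check every converted row before anything is imported, and to set aside rows that fail instead of guessing. The sketch below is plain PHP with made-up rules (non-empty sku and title, positive numeric price). The function names and the row format are hypothetical, and the real rules would have to come from the vendor's documentation.

```php
<?php

/**
 * Return a list of problems found in one product row.
 * The rules are illustrative assumptions, not vendor documentation.
 */
function validateProduct(array $p): array
{
    $problems = [];
    if (!isset($p['sku']) || $p['sku'] === '') {
        $problems[] = 'missing sku';
    }
    if (!isset($p['title']) || trim($p['title']) === '') {
        $problems[] = 'missing title';
    }
    if (!isset($p['price']) || !is_numeric($p['price']) || $p['price'] <= 0) {
        $problems[] = 'price missing or not positive';
    }
    return $problems;
}

/** Split rows into importable ones and rejected ones (keyed by row index). */
function splitByValidity(array $products): array
{
    $valid = [];
    $rejected = [];
    foreach ($products as $i => $p) {
        $problems = validateProduct($p);
        if ($problems) {
            $rejected[$i] = $problems;
        } else {
            $valid[] = $p;
        }
    }
    return [$valid, $rejected];
}

[$valid, $rejected] = splitByValidity([
    ['sku' => 'A-1001', 'title' => 'Widget', 'price' => 12.5],
    ['sku' => '', 'title' => 'Gadget', 'price' => 0],
]);
```

A long rejected list is a good hint that the data format was misunderstood, and it gives you concrete rows to send back to the vendor or the client.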

Anyway, from what I've heard so far this all seems completely doable, but could obviously get pretty time consuming — and hence costly 🙂

Posted
2 minutes ago, teppo said:

It's possible that I'm still missing the point

No, spot on! Thx that was very helpful 🙂 
