double

Import 8 million posts

Can I import 8 million posts? Is it possible? Hardware requirements?

I have 300 XML files (30k posts per file). Each post has these fields:

  1. SEO title
  2. Meta description
  3. H1 title
  4. Category
  5. Post content

What is the fastest way to import?

Edited by double


You can do this with a CLI script that bootstraps ProcessWire. It should be possible: I've imported millions of rows from a CSV using very little memory, although it took quite a bit of time. I'd also recommend using database transactions to save new pages in batches.
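
The batching idea above can be sketched in a language-neutral way. This is not ProcessWire code: it is a minimal Python sketch that streams an XML file element by element and commits rows to SQLite in batches, one transaction per batch. The `<post>` element, the tag names in `FIELDS`, the `posts` table, and the batch size are all assumptions made for illustration.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical tag names, one per field listed in the question.
FIELDS = ("seo_title", "meta_description", "h1_title", "category", "content")


def iter_posts(xml_source):
    """Yield one dict per <post> element while streaming through the file."""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "post":
            yield {field: elem.findtext(field, default="") for field in FIELDS}
            elem.clear()  # drop this post's text and children once it has been used


def flush(conn, batch):
    """Write one batch inside a single transaction."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.executemany(
            "INSERT INTO posts (seo_title, meta_description, h1_title, category, content) "
            "VALUES (?, ?, ?, ?, ?)",
            batch,
        )
    return len(batch)


def import_posts(conn, posts, batch_size=1000):
    """Insert posts in batches and return the number of rows written."""
    total, batch = 0, []
    for post in posts:
        batch.append(tuple(post[field] for field in FIELDS))
        if len(batch) >= batch_size:
            total += flush(conn, batch)
            batch = []
    if batch:  # write the final, partially filled batch
        total += flush(conn, batch)
    return total


if __name__ == "__main__":
    # Tiny in-memory demo with made-up data.
    sample = b"<posts>" + b"".join(
        b"<post><seo_title>T%d</seo_title><content>Body</content></post>" % i
        for i in range(5)
    ) + b"</posts>"
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts (seo_title, meta_description, h1_title, category, content)"
    )
    print(import_posts(conn, iter_posts(io.BytesIO(sample)), batch_size=2))
```

Only one batch is held in memory at a time, and each batch costs one commit instead of one commit per row. In a real ProcessWire import the insert step would be replaced by page creation through the ProcessWire API.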

3 minutes ago, elabx said:

You can do this with a CLI script that bootstraps ProcessWire. It should be possible: I've imported millions of rows from a CSV using very little memory, although it took quite a bit of time. I'd also recommend using database transactions to save new pages in batches.

How fast will the site be after importing 8 million posts?

I tried to use WordPress, but it requires serious hardware to handle 8 million posts.

I have a VDS with 4 cores, 4 GB of memory, and NVMe storage.

Edited by double

1 hour ago, double said:

How fast will the site be after importing 8 million posts?

That's basically impossible for anyone to answer, as there are so many variables involved beyond just the row count and your machine specs.

It will also depend on how much of that data needs to be loaded per page view, how many requests per second you expect to handle, whether you'll be using caching, whether background updates are happening, whether the tables are correctly indexed and using the most suitable storage engine, how many sessions will be active at peak, whether page views trigger external API calls, whether all assets are optimised, how often you'll need to update rows in the DB, whether the pages involve JS rendering on the frontend, and so on.

I think you'd be better off setting a target for acceptable page loading times and then asking "What do I need to do to get 80% of my page loads to this time or better?"
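
To make such a target checkable, you could record response times and compare a percentile against the goal. A minimal Python sketch, assuming you already have a list of measured response times in seconds; the function names and the nearest-rank percentile method are my own choices for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_target(samples, target_seconds, pct=80):
    """True if pct% of the measured page loads finished within target_seconds."""
    return percentile(samples, pct) <= target_seconds
```

For example, `meets_target(times, 1.0)` answers "do 80% of my page loads finish within one second?" for whatever measurements are in `times`.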

You also need to consider if PW's API is a good fit for your programming needs and if the Admin interface is suitable for you and any users who may need access to the admin.

I'd suggest setting your speed goals, then importing a subset of your data and seeing how your resource needs and page speeds scale going from, say, 100,000 to 200,000 rows, and then extrapolating from that.
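
The extrapolation step can be sketched as a straight line through two measurements. This Python sketch assumes linear scaling, which real systems often do not follow (indexes, caches, and memory limits can bend the curve), so treat the result as a rough first estimate and re-measure at larger sizes. The function name is hypothetical.

```python
def extrapolate_linear(point_a, point_b, target_rows):
    """Project a metric to target_rows from two (rows, value) measurements.

    Assumes the metric grows linearly with row count between and beyond the points.
    """
    (rows_a, value_a), (rows_b, value_b) = point_a, point_b
    if rows_a == rows_b:
        raise ValueError("need measurements at two different row counts")
    slope = (value_b - value_a) / (rows_b - rows_a)
    return value_a + slope * (target_rows - rows_a)
```

With invented numbers, if a page took 40 ms at 100,000 rows and 70 ms at 200,000 rows, `extrapolate_linear((100_000, 40.0), (200_000, 70.0), 8_000_000)` projects roughly 2,410 ms at 8 million rows under this linear assumption.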

If you do try out PW, please keep us updated with your results.

1 hour ago, double said:

How fast will the site be after importing 8 million posts?

@netcarver's answer pretty much covers this. My advice would be to watch out for anything involving counting, such as pagination and complex selectors using the Selector API.
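
One general way to avoid counting in pagination, shown here as plain SQL through Python's `sqlite3` rather than the ProcessWire API, is to fetch one row more than the page size and use that extra row only to decide whether a next page exists. The `posts` table and its `id` and `title` columns are assumptions for illustration.

```python
import sqlite3


def fetch_page(conn, after_id=0, page_size=20):
    """Return (rows, has_next) without running COUNT(*) over the whole table.

    Uses keyset pagination: rows are selected by id greater than the last id
    seen, so the database does not have to skip over earlier rows.
    """
    rows = conn.execute(
        "SELECT id, title FROM posts WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size + 1),  # one extra row signals that more pages exist
    ).fetchall()
    has_next = len(rows) > page_size
    return rows[:page_size], has_next
```

The trade-off is that you get "next" and "previous" links rather than a total page count, which is often acceptable on very large tables.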

On 11/26/2020 at 4:12 PM, double said:

Can I import 8 million posts? Is it possible? Hardware requirements?

Yes, Yes and 🤷‍♂️

 

This looks like a good candidate for the following.

👉 Try the modules developed by @mtwebit:

DataSet and Tasker

Configure them and let the import run overnight, or while you take a good nap.



I haven't used the DataSet XML import for a while. Let me know if it needs some polishing.

On 11/26/2020 at 7:25 PM, netcarver said:

If you do try out PW, please keep us updated with your results.

I've added 1.5 million pages. It works faster than WordPress, but the server response time is still high: 0.8-2 s. I have 4 cores, 4 GB RAM, an SSD, MySQL 8 and PHP 7.

I'll keep you updated

On 1/17/2021 at 6:42 AM, double said:

It works faster than WordPress, but the server response time is still high: 0.8-2 s.

Are you calling $pages->find() or $pages->get() on the pages that show these response times?
