
Real estate website from XML feed



I'm going to be building a real estate website soon and am exploring my options. The site will have some pages that will be native to PW, but all their property listing information will come from an XML feed from an external provider.

What's the best way to approach this? I've come across http://modules.processwire.com/modules/rss-feed-loader/ but is it possible to do searching, categorisation and pagination using that?

Or is there a better way to approach it?


Does the XML feed always contain *all* the data? I had to deal with a slightly different issue (including a Facebook news stream with search, pagination etc.), and I decided to create real PW pages from all feed items. This gave me the flexibility to use the built-in functions without building an extensive Facebook API connector.

How many items do you have to handle?

Happy guess: you can create a WireArray (https://processwire.com/api/arrays/) from the feed data, which already contains some built-in methods (unfortunately not the pagination thingy).
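To make the searching, categorisation and pagination question concrete, here is a minimal sketch that works on plain PHP arrays rather than a WireArray, so it runs without ProcessWire. The feed structure (`<listings>`, `<property>`, `title`, `type`, `price`) and the function names are invented for illustration; a real provider's feed will look different.

```php
<?php
// Hypothetical feed: element and attribute names are assumptions, not a real provider's format.
$feedXml = <<<XML
<listings>
  <property id="101"><title>Beach House</title><type>house</type><price>750000</price></property>
  <property id="102"><title>City Flat</title><type>apartment</type><price>420000</price></property>
  <property id="103"><title>Garden Cottage</title><type>house</type><price>380000</price></property>
  <property id="104"><title>Harbour Loft</title><type>apartment</type><price>610000</price></property>
</listings>
XML;

/** Convert the feed into a list of plain associative arrays. */
function parseListings(string $xml): array {
    $items = [];
    $root = new SimpleXMLElement($xml);
    foreach ($root->property as $node) {
        $items[] = [
            'id'    => (string) $node['id'],
            'title' => (string) $node->title,
            'type'  => (string) $node->type,
            'price' => (int) $node->price,
        ];
    }
    return $items;
}

/** Categorisation and search: optionally restrict to one type and to titles containing $query. */
function filterListings(array $items, ?string $type = null, string $query = ''): array {
    return array_values(array_filter($items, function (array $item) use ($type, $query): bool {
        if ($type !== null && $item['type'] !== $type) {
            return false;
        }
        return $query === '' || stripos($item['title'], $query) !== false;
    }));
}

/** Pagination: return the items for a 1-based page number. */
function paginate(array $items, int $page, int $perPage): array {
    return array_slice($items, ($page - 1) * $perPage, $perPage);
}

$all    = parseListings($feedXml);
$houses = filterListings($all, 'house');   // the two "house" listings
$page2  = paginate($all, 2, 3);            // second page when showing 3 per page
```

With only 100-150 items, filtering and slicing in memory like this is cheap. The trade-off compared with real PW pages is that selectors, the admin UI and page URLs are not available for the listings.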



The XML feed will contain all data related to property listings. Things like home page, about, contact, staff profiles etc will be handled in PW.

There'll be 100-150 items in the feed.


Does the data from the XML need to be shown in real time? I mean, does it change a lot every minute or so? I don't think so, as it apparently isn't an auction site.

So I think it's a good idea to import the data as PW pages and, using a cron job or a manual import, update only the fields that changed each day and create any new items that don't exist yet.
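The "only the fields that changed, plus new items" part of that idea can be sketched in plain PHP as follows. This is an illustration under assumptions, not a ProcessWire importer: `planImport`, the `p101`-style IDs and the field names are hypothetical, and each record is assumed to be a flat array of scalar values keyed by a stable listing ID from the feed.

```php
<?php
/**
 * Compare feed items with already stored records (both keyed by listing ID)
 * and work out what a daily import would need to do.
 */
function planImport(array $stored, array $feed): array {
    $plan = ['create' => [], 'update' => [], 'unchanged' => []];
    foreach ($feed as $id => $item) {
        if (!isset($stored[$id])) {
            $plan['create'][$id] = $item;   // new listing: would become a new page
            continue;
        }
        // Keep only the fields whose values differ from what is stored.
        $changed = array_diff_assoc($item, $stored[$id]);
        if ($changed) {
            $plan['update'][$id] = $changed;
        } else {
            $plan['unchanged'][] = $id;
        }
    }
    return $plan;
}

// Hypothetical data standing in for existing pages and today's feed.
$stored = [
    'p101' => ['title' => 'Beach House', 'price' => 750000],
    'p102' => ['title' => 'City Flat',   'price' => 420000],
];
$feed = [
    'p101' => ['title' => 'Beach House',    'price' => 750000],
    'p102' => ['title' => 'City Flat',      'price' => 399000],
    'p103' => ['title' => 'Garden Cottage', 'price' => 380000],
];

$plan = planImport($stored, $feed);
```

In a real import, each `create` entry would be saved as a new page and each `update` entry written to the matching page's fields, with the script triggered by cron. Listings that disappear from the feed are not handled here and would need a separate pass.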


17 minutes ago, Sérgio said:

So I think it's a good idea to import the data as PW pages and, using a cron job or a manual import, update only the fields that changed each day and create any new items that don't exist yet.

Since he's not the owner of the source, I don't think it's appropriate to create pages for feed items that will change over time. The only thing I can suggest is to fetch the XML feed and cache it, maybe for 1 hour, to keep the site fast.
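A minimal sketch of that one-hour cache, using a plain file and its modification time so it runs without ProcessWire. The function name `cachedFetch` and the stand-in fetcher are hypothetical; in practice the callback would do the HTTP request for the feed.

```php
<?php
/**
 * Return the cached data if the cache file is younger than $ttl seconds,
 * otherwise call $fetch, store its result and return it.
 */
function cachedFetch(string $cacheFile, int $ttl, callable $fetch): string {
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return file_get_contents($cacheFile);
    }
    $data = $fetch();
    file_put_contents($cacheFile, $data, LOCK_EX);
    return $data;
}

// Stand-in fetcher: returns a fixed string instead of requesting a real feed.
$calls = 0;
$fetch = function () use (&$calls): string {
    $calls++;
    return '<listings></listings>';
};

$file   = sys_get_temp_dir() . '/feed-cache-' . uniqid() . '.xml';
$first  = cachedFetch($file, 3600, $fetch);  // cache miss: calls $fetch
$second = cachedFetch($file, 3600, $fetch);  // cache hit: read from the file
unlink($file);                               // clean up the demo cache file
```

Inside ProcessWire the `$cache` API variable covers the same ground: `$cache->save()` accepts an expiration as its third argument, so passing 3600 there would give a one-hour lifetime.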


@Tyssen, if you decide on this approach of caching the XML, you can see how I did it when I was getting info from SlideShare using the GuzzleHttp client.

use GuzzleHttp\Client;

// This is a method of a class (the surrounding class is omitted here).
public function getSlideshows() {

        $client = new Client([
            // Base URI is used with relative requests
            'base_uri' => 'https://www.slideshare.net/api/2/',
            // Set timeout. SlideShare was taking up to 35 seconds to respond; the XML has 100+ items.
            'timeout'  => 35.0,
        ]);
        $api_key = 'xxx';
        $username = 'xxx';
        $time = time(); // current Unix timestamp, sent as "ts"
        $secret = 'xxx';
        $sha1 = sha1($secret.$time);

        $cache = wire('cache');
        $response = $cache->get("slideshare_xml");
        // fetch the XML and cache it if there is no cached copy yet
        if(!$response) {
            $result = $client->get('get_slideshows_by_user?username_for='.$username.'&detailed=1&api_key='.$api_key.'&ts='.$time.'&hash='.$sha1);
            $response = (string) $result->getBody(); // cast the body stream to a string before caching
            $cache->save('slideshare_xml', $response); //default is 24h
        }

        $xml = new \SimpleXMLElement($response);

        echo $xml->asXML(); // dump the whole XML document for inspection
        // $slide["secret_key"] = (string) $xml->Slideshow->SecretKey;
        // $slide["title"] = (string) $xml->Slideshow->Title;
        // $slide["description"] = (string) $xml->Slideshow->Description;
        // $slide["url"] = (string) $xml->Slideshow->URL;
        // $slide["thumbnail_url"] = (string) $xml->Slideshow->ThumbnailURL;
        // $slide["embed_url"] = (string) $xml->Slideshow->SlideshowEmbedUrl;
        // $slide["created"] = (string) $xml->Slideshow->Created;
        // $slide["language"] = (string) $xml->Slideshow->Language;
        // $slide["num_views"] = (int) $xml->Slideshow->NumViews;
}



  • 1 month later...
On 01/03/2017 at 1:36 PM, Sérgio said:

So I think it's a good idea to import the data as PW pages and, using a cron job or a manual import, update only the fields that changed each day and create any new items that don't exist yet.

I'm glad this thread is here. I might have to do exactly the same thing on a job I'm looking at. I imagine the pages (about, contact, etc.) being just PW pages, but the feed (I don't know the type yet) will contain the data required for the actual properties. The feed will come from a cloud CRM system where the properties are uploaded.

Imagine the feed changes once a week; the PW pages have to reflect this. The property pages displayed in PW will not be edited in PW.

So how do you go about getting this feed and making PW pages out of it? That's the part I'm confused about.


On 13/04/2017 at 9:47 AM, DaveP said:

There may be other solutions/modules but Ryan's RSS Loader Module would probably be a good starting point. It will obviously depend on the format of your source feed.

Thanks @DaveP. I've now found out it is a JSON feed, so I'll try to work out how to do this. I'll start a new thread instead of hijacking this one.

