
Load RSS Feeds (MarkupLoadRSS)


ryan


ProcessWire RSS Feed Loader


Given an RSS feed URL, this module will pull it, and let you foreach() it or render it. This module will also cache feeds that you retrieve with it. The module is designed for ProcessWire 2.1+, but may also work with 2.0 (haven't tried yet).

This module is the opposite of the MarkupRSS module that comes with ProcessWire: that module creates RSS feeds, whereas this one loads them and gives you easy access to the data to do whatever you want.

For a simple live example of this module in use, see the processwire.com homepage (and many of the inside pages) for the "Latest Forum Post" section in the sidebar.

Download at: https://github.com/r...n/MarkupLoadRSS

REQUIREMENTS


This module requires that your PHP installation have the 'allow_url_fopen' option enabled. By default, it is enabled in PHP. However, some hosts turn it off for security reasons. This module will prevent itself from being installed if your system doesn't have allow_url_fopen. If you run into this problem, let me know as we may be able to find some other way of making it work without too much trouble.
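To see whether a given host has this setting enabled, you can ask PHP directly. This is only a sketch and not part of the module: the function name canFetchRemoteFeeds() is made up for illustration, and it relies solely on PHP's built-in ini_get() and filter_var().

```php
<?php

// Hypothetical helper (not part of MarkupLoadRSS): reports whether
// PHP is allowed to open remote URLs with its file functions.
function canFetchRemoteFeeds() {
    // ini_get() returns the setting as a string ("1", "0", "On", "") or false
    $value = ini_get('allow_url_fopen');
    return filter_var($value, FILTER_VALIDATE_BOOLEAN);
}

echo canFetchRemoteFeeds()
    ? "allow_url_fopen is enabled\n"
    : "allow_url_fopen is disabled; remote feeds cannot be loaded this way\n";
```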

INSTALLATION


The MarkupLoadRSS module installs in the same way as all PW modules:

1. Copy the MarkupLoadRSS.module file to your /site/modules/ directory.

2. Log in to the ProcessWire admin, click 'Modules' and then 'Check for New Modules'.

3. Click 'Install' next to the Markup Load RSS module.

USAGE


The MarkupLoadRSS module is used from your template files. Usage is described with these examples:

Example #1: Cycling through a feed

<?php

  $rss = $modules->get("MarkupLoadRSS");
  $rss->load("http://www.di.net/articles/rss/");

  foreach($rss as $item) {
      echo "<p>";
      echo "<a href='{$item->url}'>{$item->title}</a> ";
      echo $item->date . "<br /> ";
      echo $item->description;
      echo "</p>";
  }

Example #2: Using the built-in rendering

<?php

  $rss = $modules->get("MarkupLoadRSS");
  echo $rss->render("http://www.di.net/articles/rss/");

Example #3: Specifying options and using channel titles

<?php

  $rss = $modules->get("MarkupLoadRSS");

  $rss->limit = 5;
  $rss->cache = 0;
  $rss->maxLength = 255;
  $rss->dateFormat = 'm/d/Y H:i:s';

  $rss->load("http://www.di.net/articles/rss/");

  echo "<h2>{$rss->title}</h2>";
  echo "<p>{$rss->description}</p>";
  echo "<ul>";

  foreach($rss as $item) {
       echo "<li>" . $item->title . "</li>";
  }

  echo "</ul>";

OPTIONS


Options MUST be set before calling load() or render().

<?php

  // specify that you want to load up to 3 items (default = 10)
  $rss->limit = 3;

  // set the feed to cache for an hour (default = 120 seconds)
  // if you want to disable the cache, set it to 0.
  $rss->cache = 3600;

  // set the max length of any field, i.e. description (default = 2048)
  // field values longer than this will be truncated
  $rss->maxLength = 255;

  // tell it to strip out any HTML tags (default = true)
  $rss->stripTags = true;

  // tell it to encode any entities in the feed (default = true);
  $rss->encodeEntities = true;

  // set the date format used for output (use PHP date string)
  $rss->dateFormat = "Y-m-d g:i a";

See the $options array in the class for more options. You can also customize all output produced by the render() method, though it is probably easier just to foreach() the $rss yourself. But see the module class file and $options array near the top to see how to change the markup that render() produces.
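To give a feel for what stripTags, encodeEntities and maxLength do to a field value, here is a rough stand-alone sketch. It is an assumption about the general approach, not the module's actual code: sketchCleanText() and its parameter order are hypothetical.

```php
<?php

// Hypothetical stand-in for the kind of cleanup the options describe.
// Not taken from the module; it only illustrates the three settings.
function sketchCleanText($value, $maxLength = 2048, $stripTags = true, $encodeEntities = true) {
    if ($stripTags) $value = strip_tags($value);
    $value = trim($value);
    // Truncate before encoding so an entity such as &amp; is never cut in half.
    // substr() counts bytes; multibyte text may call for mb_substr() instead.
    if (strlen($value) > $maxLength) $value = substr($value, 0, $maxLength);
    if ($encodeEntities) $value = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
    return $value;
}

// Tags are stripped, the text is cut to 12 characters, then "&" is encoded
echo sketchCleanText("<b>Fish & chips</b> are served daily", 12) . "\n";
```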

MORE DETAILS


This module loads the given RSS feed and all data from it. It then populates that data into a WireArray of Page-like objects. All of the fields in the feed's <item> elements are accessible, so you can use whatever the feed provides. The most common and expected field names in the RSS channel are:

  • $rss->title
  • $rss->pubDate (or $rss->date)
  • $rss->description (or $rss->body)
  • $rss->link (or $rss->url)
  • $rss->created (unix timestamp of pubDate)

The most common and expected field names for each RSS item are:

  • $item->title
  • $item->pubDate (or $item->date)
  • $item->description (or $item->body)
  • $item->link (or $item->url)
  • $item->created (unix timestamp of pubDate)

For convenience and consistency, ProcessWire translates some common RSS fields to the PW-equivalent naming style. You can choose to use either the ProcessWire-style name or the traditional RSS name, as shown above.
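Because created holds a unix timestamp, it can be reformatted independently of the dateFormat option. The stand-alone sketch below shows how a pubDate string relates to a timestamp and a formatted date using PHP's strtotime() and date(). The sample date and the fixed UTC timezone are assumptions chosen to keep the output predictable.

```php
<?php

date_default_timezone_set('UTC'); // fixed timezone so the output is predictable

$pubDate = 'Thu, 03 Nov 2011 16:03:16 +0000'; // as it might appear in a feed
$created = strtotime($pubDate);               // unix timestamp
$date    = date('Y-m-d g:i a', $created);     // formatted for output

echo $date . "\n"; // 2011-11-03 4:03 pm
```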

HANDLING ERRORS


If an error occurred when loading the feed, the $rss object will have 0 items in it:

<?php

  $rss->load("...");
  if(!count($rss)) { /* an error occurred: handle it here */ }

In addition, the $rss->error property always contains a detailed description of what error occurred:

<?php

  if($rss->error) { echo "<p>{$rss->error}</p>"; }

I recommend only checking for or reporting errors when you are developing and testing. On production sites you should skip error checking/testing, as blank output is a clear indication of an error. This module will not throw runtime exceptions, so if an error occurs, it's not going to halt the site.


Great to have such a module, thanks Ryan!

Though can't install:

Warning: mkdir() [function.mkdir]: No such file or directory in /Applications/XAMPP/xamppfiles/htdocs/pw2.ch/site/modules/MarkupLoadRSS.module on line 447


  • 3 months later...

I want to display an RSS feed that contains items like the one below, and it works well except for the author field (dc:creator), which isn't parsed. Is there a way to parse this value as well?

		<item>
	<title>Taalkundigen Uppsala ontcijferen geheimschrift</title>
	<link>http://www.wereldwijzerzweden.net/2011/11/03/uppsala-geheimschrift-taalkundige-copiale/</link>
	<comments>http://www.wereldwijzerzweden.net/2011/11/03/uppsala-geheimschrift-taalkundige-copiale/#comments</comments>
	<pubDate>Thu, 03 Nov 2011 16:03:16 +0000</pubDate>
	<dc:creator>Marcel Burger</dc:creator>
	<category><![CDATA[Actueel]]></category>
	<category><![CDATA[berlijn]]></category>
	<category><![CDATA[Copiale]]></category>
	<category><![CDATA[geheimschrift]]></category>
	<category><![CDATA[universiteit]]></category>
	<category><![CDATA[uppsala]]></category>

	<guid isPermaLink="false">http://www.wereldwijzerzweden.net/?p=7227</guid>
	<description><![CDATA[<a href="http://www.wereldwijzerzweden.net/2011/11/03/uppsala-geheimschrift-taalkundige-copiale/"><img align="left" hspace="5" width="150" src="http://www.wereldwijzerzweden.net/images/copiale_280.jpg" class="alignleft wp-post-image tfe" alt="Deel uit vrijgegeven beeld van het Copialeschrift" title="copiale_280.jpg" /></a>3 november 2011 &#124; Twee Zweedse taalkundigen en een Amerikaanse wetenschapper zijn erin geslaagd een 280 jaar oud geheimschrift uit Duitsland met voorheen onbegrijpelijke tekens te vertalen.]]></description>
	<wfw:commentRss>http://www.wereldwijzerzweden.net/2011/11/03/uppsala-geheimschrift-taalkundige-copiale/feed/</wfw:commentRss>
	<slash:comments>0</slash:comments>
	</item>

I output the $rss array with print_r() and it doesn't contain the dc:creator field (some others seem to be missing as well, but I don't need those ;))

/Jasper


If I recall correctly, SimpleXML doesn't work with the properties that have colons in them. But you can fix that by replacing the colon properties with underscore properties in the XML data. So in this case, you'd want to add this line in the load() function:

<?php
public function load($url) { 
    $this->items = new WireArray();
    $xmlData = $this->loadXmlData($url);
    $xmlData = str_replace('dc:creator', 'dc_creator', $xmlData); 

Or you may be able to cover all the colon properties at once using a regexp like this:

<?php
$xmlData = preg_replace('{(</?[_a-z0-9]+):([_a-z0-9]+>)}i', '$1_$2', $xmlData); 

What that does is convert properties like <dc:creator> to <dc_creator> so that SimpleXML will understand them and likewise you can access them in the module. Let me know if this works for you. I'm not in a place where I can update the source on this module today, but will plan to add something like the above soon.

I don't know why the <comments> property wouldn't be getting parsed, as that appears to just be a string (URL). I need to test and experiment with that one to find out why.
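An alternative that leaves the XML untouched is to ask SimpleXML for the namespaced children. The sketch below is not what the module does. The sample feed is trimmed from the item posted above, and the xmlns:dc declaration on the root element is an assumption, since the root of that feed was not shown. It also runs the renaming idea on the same data for comparison.

```php
<?php

$xmlData = <<<XML
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title>Taalkundigen Uppsala ontcijferen geheimschrift</title>
      <dc:creator>Marcel Burger</dc:creator>
    </item>
  </channel>
</rss>
XML;

// Option 1: read the element through its namespace (true = 'dc' is a prefix)
$item = simplexml_load_string($xmlData)->channel->item[0];
$creator = (string) $item->children('dc', true)->creator;

// Option 2: the renaming idea, turning <dc:creator> into <dc_creator>
$renamed = preg_replace('{(</?[_a-z0-9]+):([_a-z0-9]+>)}i', '$1_$2', $xmlData);
$item2 = simplexml_load_string($renamed)->channel->item[0];
$creator2 = (string) $item2->dc_creator;

echo $creator . "\n" . $creator2 . "\n";
```

Both lines print the same author name. The first option has the advantage of not depending on how the tags happen to be written in the raw XML.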


Thanks Ryan, replacing the colons works, both with the str_replace and the regexp.

"I'm not in a place where I can update the source on this module today, but will plan to add something like the above soon."

I also submitted (via GitHub) a double-encoding issue (I am good at finding these :P) in this module. You might want to take a look at that one at the same time. :-)

"I don't know why the <comments> property wouldn't be getting parsed, as that appears to just be a string (URL). I need to test and experiment with that one to find out why."

My fault :-[, the comments property is parsed. One that didn't get parsed was category, but that may be because it appears multiple times (a guess).

The exact feed I am using is also in the Github issue, so you can test with it if you want/like.

/Jasper


Thanks for submitting the issue, I will fix. Also I'd like to find a way to get Comments (and any multi-item properties) working as well, should be easy. The feeds I'd originally tested with were pretty basic and didn't have these extended properties.
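Repeated elements such as <category> are exposed by SimpleXML as a list, so one way to keep all of them is to collect the values into an array rather than a single string. This is a sketch under that assumption, not the module's code. The sample item is trimmed from the feed posted above.

```php
<?php

$xmlData = <<<XML
<rss version="2.0">
  <channel>
    <item>
      <title>Taalkundigen Uppsala ontcijferen geheimschrift</title>
      <category><![CDATA[Actueel]]></category>
      <category><![CDATA[berlijn]]></category>
      <category><![CDATA[Copiale]]></category>
    </item>
  </channel>
</rss>
XML;

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes
$rss = simplexml_load_string($xmlData, 'SimpleXMLElement', LIBXML_NOCDATA);
$item = $rss->channel->item[0];

// Casting $item->category to a string yields only the first value;
// iterating over it visits every <category> element.
$categories = array();
foreach ($item->category as $category) {
    $categories[] = (string) $category;
}

echo implode(', ', $categories) . "\n"; // Actueel, berlijn, Copiale
```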


  • 4 months later...

Great work Ryan! The only thing I might add is support for multiple feeds, though it might complicate this module too much?

I had a need for multiple feeds and it seemed to be a pretty straightforward implementation. Only a few modifications to the load method:

public function load($url) {
    $this->items = new WireArray();

    if(is_array($url)) {
        // multiple sources: collect the <item> elements from every feed
        $items = array();
        $rss = false;
        foreach($url as $feed) {
            $xmlData = $this->loadXmlData($feed);
            $xml = simplexml_load_string($xmlData);
            if(!$xml) continue; // skip feeds that could not be loaded or parsed
            $items = array_merge($items, $xml->xpath('/rss//item'));
            $rss = $xml; // channel details below come from the last feed that loaded
        }
    } else {
        $xmlData = $this->loadXmlData($url);
        $rss = simplexml_load_string($xmlData);
    }

    if(!$rss) {
        // $url may be an array here, so flatten it before encoding
        $urls = is_array($url) ? implode(', ', $url) : $url;
        $msg = "Unable to load RSS feed at " . htmlentities($urls) . ": \n";
        foreach(libxml_get_errors() as $error) $msg .= $error->message . " \n";
        $this->error($msg);
        return $this;
    }

    $this->channel['title'] = $this->cleanText((string) $rss->channel->title);
    $this->channel['description'] = $this->cleanText((string) $rss->channel->description);
    $this->channel['link'] = $this->cleanText((string) $rss->channel->link);
    $this->channel['created'] = strtotime((string) $rss->channel->pubDate);
    $this->channel['pubDate'] = date($this->options['dateFormat'], $this->channel['created']);
    $n = 0;

    // If we already have $items set, it means we are dealing with multiple sources. Let's sort them
    if(isset($items)) {
        usort($items, function ($x, $y) {
            return strtotime((string) $y->pubDate) - strtotime((string) $x->pubDate);
        });
    } else {
        $items = $rss->channel->item;
    }

    foreach($items as $item) {
        $a = new MarkupLoadRSSItem();
        foreach($item as $key => $value) {
            $value = (string) $value;
            if($key == 'pubDate') {
                $value = strtotime($value);
                $a->set('created', $value);
                $value = date($this->options['dateFormat'], $value);
            } else {
                $value = $this->cleanText($value);
            }
            $a->set($key, $value);
        }
        $this->items->add($a);
        if(++$n >= $this->options['limit']) break;
    }

    return $this;
}

What it does is check whether $url is an array, then load/cache all of those feeds and merge their RSS items into the $items array. Later on, $items is sorted by pubDate. So this is fully backwards compatible: just give it an array instead of a single URL if you need to parse multiple feeds.

If you guys can test it works for you too then maybe Ryan you can put this on your version. I can do pull request if you want to (although it seems that new and fancy GitHub for windows does mess up line endings..).
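The merge-and-sort step can be tried in isolation, outside the module. The sketch below uses two tiny inline feeds with made-up data and a hypothetical function, mergeFeedItems(). It follows the same xpath() plus usort() approach as the modified load() method but is not the module's code.

```php
<?php

// Hypothetical helper: merge the <item> elements from several RSS documents
// (given as XML strings) and sort them newest first by pubDate.
function mergeFeedItems(array $xmlStrings) {
    $items = array();
    foreach ($xmlStrings as $xmlData) {
        $xml = simplexml_load_string($xmlData);
        if (!$xml) continue; // skip anything that does not parse
        $items = array_merge($items, $xml->xpath('/rss//item'));
    }
    usort($items, function ($x, $y) {
        return strtotime((string) $y->pubDate) - strtotime((string) $x->pubDate);
    });
    return $items;
}

// Made-up sample data for illustration
$feedA = '<rss><channel>'
       . '<item><title>A1</title><pubDate>Thu, 03 Nov 2011 16:03:16 +0000</pubDate></item>'
       . '</channel></rss>';
$feedB = '<rss><channel>'
       . '<item><title>B1</title><pubDate>Fri, 04 Nov 2011 09:00:00 +0000</pubDate></item>'
       . '<item><title>B2</title><pubDate>Wed, 02 Nov 2011 09:00:00 +0000</pubDate></item>'
       . '</channel></rss>';

// Prints B1, A1, B2 (newest first), one title per line
foreach (mergeFeedItems(array($feedA, $feedB)) as $item) {
    echo $item->title . "\n";
}
```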


  • 3 weeks later...

I'm not sure that the W3 validator is picking it up right either? Seems like it is showing the whole thing as double entity encoded. Also tried loading in Safari, and it can't seem to read the feed correctly either. Firefox seems okay. Definitely something unusual going on with this feed, but I am not familiar enough with this particular format to know what's wrong. W3 validator isn't helping much since it's seeing the whole thing as double entity encoded.


  • 2 months later...

Yeah, there definitely was something strange going on with that feed. Now it seems to be working on my end too, so they must have fixed that.

Ryan: have you thought about adding that multisource functionality to this module? I am already using it in a couple of places, and it has been working great. Of course if you think the implementation should be different, or an altogether different module, then let me know (or if you prefer a GitHub pull request).

What I was thinking is that it might be more "pw" to have ->add(source_url) etc. and then load, instead of having all the URLs in an array, load($array_of_urls), like it is currently.


  • 4 months later...

Hi Ryan,

Thanks for this module.  Have been using it on our main site for awhile now.  Just wanted to let you know of an issue that I just discovered that others may run into, and see if there's a way to handle it.

I was trying to load a feed that for awhile was not responding.  The feed page wasn't throwing an error or even timing out, just loading for minutes on end.

This ended up causing a timeout on our site (the feed was loading on the main page) and producing this error in the PW log file:

Error Exception: MySQL server has gone away (in /mnt/stor7-wc2-dfw1/526843/www.agencypja.com/web/content/wire/core/Database.php line 118)

For now, we've just disabled that feed, but we are using the module to load other feeds. Do you (or anyone else) know of a way to address this issue? I don't see a timeout option in the module, but could certainly look into adding one if that is determined to be the best option.

Thanks.


Could you PM me the RSS feed you are working with? I can do some testing here. I believe we can get it working by switching MarkupLoadRSS to use the new WireHttp class in PW 2.3.10+, but I need an example to test with. 
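Independent of WireHttp, PHP's stream contexts accept a timeout option for HTTP requests, which is one way to keep a slow feed from stalling a page. This is only a sketch: fetchFeedWithTimeout() is a hypothetical helper, not part of MarkupLoadRSS or WireHttp, and it uses nothing beyond stream_context_create() and file_get_contents().

```php
<?php

// Hypothetical helper (not part of MarkupLoadRSS or WireHttp):
// fetch a URL but give up after $seconds instead of waiting indefinitely.
function fetchFeedWithTimeout($url, $seconds = 5) {
    $context = stream_context_create(array(
        'http' => array('timeout' => (float) $seconds), // read timeout in seconds
    ));
    // @ suppresses the warning on failure; false is returned instead
    $data = @file_get_contents($url, false, $context);
    return $data === false ? '' : $data;
}
```

A call such as fetchFeedWithTimeout("http://www.di.net/articles/rss/", 3) would then return the feed XML, or an empty string if the request failed or timed out.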


Hi Ryan,

Unfortunately, the feed that was causing problems is now back up and running normally.  I thought that I could recreate the issue by creating a php page on another server with a timeout set to at least 5 min, sleeping the script, and using that as the RSS feed, but that didn't work.

I'll be sure to let you know if I ever come across it again.

Thanks.


  • 1 year later...
  • 3 months later...
  • 11 months later...

I always get empty RSS feed output! On my page I call the RSS module via a URL segment like blog/rss, and for the output I need the same page array that I use for the /blog page. But the RSS feed has no content!

$blogposts = $pages->find("template=post, publish_date<$today, sort=-publish_date, limit=10");

if($input->urlSegment1 === 'rss'){
  // retrieve the RSS module
  $rss = $modules->get("MarkupRSS");

  // configure the feed. see the actual module file for more optional config options.
  $rss->title = "Letzte Blogeinträge";

  $rss->render($blogposts);
  return;
} else {
  $content = renderPosts($blogposts, true);
}

  • 3 months later...

I just downloaded MarkupLoadRSS module from M.Cramer's Github repo.

Here is the demo code I used for test purpose in my template :

        $rss = $modules->get("MarkupLoadRSS");
        $rss->load("http://rss.cbc.ca/lineup/canada.xml");

        foreach($rss as $item) {
            echo "<p>";
            echo "<a href='{$item->url}'>{$item->title}</a> ";
            echo $item->date . "<br /> ";
            echo $item->description;
            echo "</p>";
        }

All I get is this error :

Error: Call to a member function load() on a non-object (line 65 of C:\wamp\www\mysite\site\templates\home.php)

As if the module wants an object, like a $page or $config or something...

I looked up the code and the function is load($url). Would it be conflicting with something ?

I'm running v2.7.2

Edited by kongondo
merged your topic here, the module's support forum