
Load RSS Feeds (MarkupLoadRSS)


ryan


ProcessWire RSS Feed Loader


Given an RSS feed URL, this module will pull it, and let you foreach() it or render it. This module will also cache feeds that you retrieve with it. The module is designed for ProcessWire 2.1+, but may also work with 2.0 (haven't tried yet).

This module is the opposite of the MarkupRSS module that comes with ProcessWire: that module creates RSS feeds, whereas this one loads them and gives you easy access to the data to do whatever you want.

For a simple live example of this module in use, see the processwire.com homepage (and many of the inside pages) for the "Latest Forum Post" section in the sidebar.

Download at: https://github.com/r...n/MarkupLoadRSS

REQUIREMENTS


This module requires that your PHP installation have the 'allow_url_fopen' option enabled. By default, it is enabled in PHP. However, some hosts turn it off for security reasons. This module will prevent itself from being installed if your system doesn't have allow_url_fopen. If you run into this problem, let me know as we may be able to find some other way of making it work without too much trouble.

INSTALLATION


The MarkupLoadRSS module installs in the same way as all PW modules:

1. Copy the MarkupLoadRSS.module file to your /site/modules/ directory.

2. Log in to the ProcessWire admin, click 'Modules' and then 'Check for New Modules'.

3. Click 'Install' next to the Markup Load RSS module.

USAGE


The MarkupLoadRSS module is used from your template files. Usage is described with these examples:

Example #1: Cycling through a feed

<?php

  $rss = $modules->get("MarkupLoadRSS");
  $rss->load("http://www.di.net/articles/rss/");

  foreach($rss as $item) {
      echo "<p>";
      echo "<a href='{$item->url}'>{$item->title}</a> ";
      echo $item->date . "<br /> ";
      echo $item->description;
      echo "</p>";
  }

Example #2: Using the built-in rendering

<?php

  $rss = $modules->get("MarkupLoadRSS");
  echo $rss->render("http://www.di.net/articles/rss/");

Example #3: Specifying options and using channel titles

<?php

  $rss = $modules->get("MarkupLoadRSS");

  $rss->limit = 5;
  $rss->cache = 0;
  $rss->maxLength = 255;
  $rss->dateFormat = 'm/d/Y H:i:s';

  $rss->load("http://www.di.net/articles/rss/");

  echo "<h2>{$rss->title}</h2>";
  echo "<p>{$rss->description}</p>";
  echo "<ul>";

  foreach($rss as $item) {
       echo "<li>" . $item->title . "</li>";
  }

  echo "</ul>";

OPTIONS


Options MUST be set before calling load() or render().

<?php

  // specify that you want to load up to 3 items (default = 10)
  $rss->limit = 3;

  // set the feed to cache for an hour (default = 120 seconds)
  // if you want to disable the cache, set it to 0.
  $rss->cache = 3600;

  // set the max length of any field, i.e. description (default = 2048)
  // field values longer than this will be truncated
  $rss->maxLength = 255;

  // tell it to strip out any HTML tags (default = true)
  $rss->stripTags = true;

  // tell it to encode any entities in the feed (default = true)
  $rss->encodeEntities = true;

  // set the date format used for output (use PHP date string)
  $rss->dateFormat = "Y-m-d g:i a";

See the $options array in the class for more options. You can also customize all output produced by the render() method, though it is probably easier just to foreach() the $rss yourself. But see the module class file and $options array near the top to see how to change the markup that render() produces.
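As a rough illustration of what the stripTags, maxLength and encodeEntities options mean for a single field value, here is a minimal sketch in plain PHP. The function name cleanFieldSketch() and the order of the steps are assumptions made for illustration only; the module's own cleaning code (see the class file) may work differently.

```php
<?php
// Hypothetical helper, NOT part of MarkupLoadRSS: it only illustrates the
// effect of the stripTags, maxLength and encodeEntities options.
function cleanFieldSketch($value, $maxLength = 2048, $stripTags = true, $encodeEntities = true) {
    // remove any HTML tags from the feed value
    if($stripTags) $value = strip_tags($value);

    // truncate values longer than $maxLength
    // (byte-based; a multibyte-aware version would use mb_substr())
    if(strlen($value) > $maxLength) $value = substr($value, 0, $maxLength);

    // encode special characters so the value is safe to echo into HTML
    if($encodeEntities) $value = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

    return $value;
}

// tags stripped, cut to 11 characters, then "&" encoded
echo cleanFieldSketch('<b>Tom & Jerry</b> rule', 11), "\n";
```

Truncating before encoding keeps an entity such as &amp;amp; from being cut in half, which is why the sketch runs the steps in this order.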

MORE DETAILS


This module loads the given RSS feed and all data from it. It then populates that data into a WireArray of Page-like objects. All of the fields in the feed's <item> elements are accessible, so you can use whatever the feed provides. The most common and expected field names in the RSS channel are:

  • $rss->title
  • $rss->pubDate (or $rss->date)
  • $rss->description (or $rss->body)
  • $rss->link (or $rss->url)
  • $rss->created (unix timestamp of pubDate)

The most common and expected field names for each RSS item are:

  • $item->title
  • $item->pubDate (or $item->date)
  • $item->description (or $item->body)
  • $item->link (or $item->url)
  • $item->created (unix timestamp of pubDate)

For convenience and consistency, ProcessWire translates some common RSS fields to the PW-equivalent naming style. You can choose to use either the ProcessWire-style name or the traditional RSS name, as shown above.
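The relationship between pubDate, created and the formatted date can be sketched in plain PHP as follows. This is only an illustration using a sample pubDate string; the variable names are assumptions, and gmdate() is used instead of date() so that the result does not depend on the server's timezone.

```php
<?php
// Sample pubDate string in the RFC 822 style that RSS feeds use
$pubDate = 'Thu, 03 Nov 2011 16:03:16 +0000';

// "created": the unix timestamp of pubDate
$created = strtotime($pubDate);

// "date": the timestamp formatted with a PHP date string, like the
// dateFormat option. gmdate() formats in UTC; date() would use the
// server's local timezone instead.
$dateFormat = 'Y-m-d g:i a';
$date = gmdate($dateFormat, $created);

echo $date, "\n";
```

Because created is a plain timestamp, it is the more convenient field for sorting or comparing items, while date is meant for output.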

HANDLING ERRORS


If an error occurred when loading the feed, the $rss object will have 0 items in it:

<?php

  $rss->load("...");
  if(!count($rss)) {
      // an error occurred: no items were loaded
  }

In addition, the $rss->error property always contains a detailed description of what error occurred:

<?php

  if($rss->error) { echo "<p>{$rss->error}</p>"; }

I recommend only checking for or reporting errors when you are developing and testing. On production sites you should skip error checking/testing, as blank output is a clear indication of an error. This module will not throw runtime exceptions so if an error occurs, it's not going to halt the site.


Great to have such a module, thanks Ryan!

Though I can't install it:

Warning: mkdir() [function.mkdir]: No such file or directory in /Applications/XAMPP/xamppfiles/htdocs/pw2.ch/site/modules/MarkupLoadRSS.module on line 447


  • 3 months later...

I want to display an RSS feed that contains items like the one below. It works well, except for the author field (dc:creator), which isn't parsed. Is there a way to parse this value as well?

		<item>
	<title>Taalkundigen Uppsala ontcijferen geheimschrift</title>
	<link>http://www.wereldwijzerzweden.net/2011/11/03/uppsala-geheimschrift-taalkundige-copiale/</link>
	<comments>http://www.wereldwijzerzweden.net/2011/11/03/uppsala-geheimschrift-taalkundige-copiale/#comments</comments>
	<pubDate>Thu, 03 Nov 2011 16:03:16 +0000</pubDate>
	<dc:creator>Marcel Burger</dc:creator>
	<category><![CDATA[Actueel]]></category>
	<category><![CDATA[berlijn]]></category>
	<category><![CDATA[Copiale]]></category>
	<category><![CDATA[geheimschrift]]></category>
	<category><![CDATA[universiteit]]></category>
	<category><![CDATA[uppsala]]></category>

	<guid isPermaLink="false">http://www.wereldwijzerzweden.net/?p=7227</guid>
	<description><![CDATA[<a href="http://www.wereldwijzerzweden.net/2011/11/03/uppsala-geheimschrift-taalkundige-copiale/"><img align="left" hspace="5" width="150" src="http://www.wereldwijzerzweden.net/images/copiale_280.jpg" class="alignleft wp-post-image tfe" alt="Deel uit vrijgegeven beeld van het Copialeschrift" title="copiale_280.jpg" /></a>3 november 2011 &#124; Twee Zweedse taalkundigen en een Amerikaanse wetenschapper zijn erin geslaagd een 280 jaar oud geheimschrift uit Duitsland met voorheen onbegrijpelijke tekens te vertalen.]]></description>
	<wfw:commentRss>http://www.wereldwijzerzweden.net/2011/11/03/uppsala-geheimschrift-taalkundige-copiale/feed/</wfw:commentRss>
	<slash:comments>0</slash:comments>
	</item>

I printed $rss with print_r() and it doesn't contain the dc:creator field (some others seem to be missing as well, but I don't need those ;))

/Jasper


If I recall correctly, SimpleXML doesn't expose elements that have colons in their names (namespace prefixes such as dc:) as regular properties. But you can work around that by replacing the colon names with underscore names in the XML data. So in this case, you'd want to add the str_replace() line near the top of the load() function:

<?php
public function load($url) {
    $this->items = new WireArray();
    $xmlData = $this->loadXmlData($url);
    // new line: rename <dc:creator> tags to <dc_creator> before parsing
    $xmlData = str_replace('dc:creator', 'dc_creator', $xmlData);
    // ... the rest of load() stays unchanged

Or you may be able to cover all the colon properties at once using a regexp like this:

<?php
$xmlData = preg_replace('{(</?[_a-z0-9]+):([_a-z0-9]+>)}i', '$1_$2', $xmlData);

What that does is convert properties like <dc:creator> to <dc_creator> so that SimpleXML will understand them and likewise you can access them in the module. Let me know if this works for you. I'm not in a place where I can update the source on this module today, but will plan to add something like the above soon.
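To make the colon handling concrete, here is a small self-contained sketch in plain PHP that runs outside of ProcessWire. The sample feed is made up for illustration. It shows two routes: reading the element through its namespace prefix with SimpleXML's children() method, and the tag-rewriting approach described above. Neither is necessarily what the module ends up doing.

```php
<?php
// Sample feed (hypothetical data) with a namespaced <dc:creator> element
$xmlData = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First post</title>
      <dc:creator>Marcel Burger</dc:creator>
    </item>
  </channel>
</rss>
XML;

// Option 1: read the element through its namespace prefix, XML left untouched
$original = simplexml_load_string($xmlData);
$creatorViaNamespace = (string) $original->channel->item[0]->children('dc', true)->creator;

// Option 2: rewrite <prefix:name> tags to <prefix_name> before parsing
$rewritten = preg_replace('{(</?[_a-z0-9]+):([_a-z0-9]+>)}i', '$1_$2', $xmlData);
$rss = simplexml_load_string($rewritten);
$creator = (string) $rss->channel->item[0]->dc_creator;

echo $creatorViaNamespace, "\n", $creator, "\n";
```

The rewriting route has the advantage that a generic loop over each item's children picks up the renamed elements without knowing any prefixes in advance. Note that this pattern only matches tags without attributes.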

I don't know why the <comments> property wouldn't be getting parsed, as that appears to just be a string (URL). I need to test and experiment with that one to find out why.


Thanks Ryan, replacing the colons works, both with the str_replace and the regexp.

Ryan wrote: "I'm not in a place where I can update the source on this module today, but will plan to add something like the above soon."

I also submitted (via GitHub) a double encoding issue (I am good at finding these :P) in this module. You might want to take a look at that one at the same time. :-)

Ryan wrote: "I don't know why the <comments> property wouldn't be getting parsed, as that appears to just be a string (URL). I need to test and experiment with that one to find out why."

My fault :-[, the comments property is parsed. One that didn't get parsed was the category, but that may be because it appears multiple times (a guess).

The exact feed I am using is also in the Github issue, so you can test with it if you want/like.

/Jasper


Thanks for submitting the issue, I will fix. Also I'd like to find a way to get Comments (and any multi-item properties) working as well, should be easy. The feeds I'd originally tested with were pretty basic and didn't have these extended properties.
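One possible way to handle multi-item properties such as <category> is to collect repeated elements into an array so that no value is lost. The following is a minimal sketch in plain PHP and SimpleXML with a made-up sample item; the $fields array is purely illustrative and is not how the module stores its data.

```php
<?php
// Sample item (hypothetical data) in which <category> appears more than once
$xml = '<rss version="2.0"><channel><item>'
     . '<title>Example</title>'
     . '<category>Actueel</category>'
     . '<category>uppsala</category>'
     . '</item></channel></rss>';

$rss = simplexml_load_string($xml);
$fields = array();

foreach($rss->channel->item[0] as $key => $value) {
    $value = (string) $value;
    if(!array_key_exists($key, $fields)) {
        // first occurrence: store as a plain string
        $fields[$key] = $value;
    } else if(is_array($fields[$key])) {
        // third or later occurrence: append to the existing array
        $fields[$key][] = $value;
    } else {
        // second occurrence: convert the string into an array of values
        $fields[$key] = array($fields[$key], $value);
    }
}

echo implode(', ', (array) $fields['category']), "\n";
```

Single-value fields stay plain strings here, so existing code that echoes a field such as title keeps working; only repeated elements become arrays.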


  • 4 months later...

Great work Ryan! The only thing I might add is support for multiple feeds, though it might complicate this module too much?

I had a need for multiple feeds and it seemed to be a pretty straightforward implementation, with only a few modifications to the load() method:

public function load($url) {
    $this->items = new WireArray();
    $items = null;
    $rss = false;

    if(is_array($url)) {
        // multiple feeds: collect the <item> elements of every feed that loads
        $items = array();
        foreach($url as $feed) {
            $xmlData = $this->loadXmlData($feed);
            $xml = simplexml_load_string($xmlData);
            if($xml === false) continue; // skip feeds that failed to load or parse
            $items = array_merge($items, $xml->xpath('/rss//item'));
            $rss = $xml; // channel info comes from the last feed that loaded
        }
    } else {
        $xmlData = $this->loadXmlData($url);
        $rss = simplexml_load_string($xmlData);
    }

    if($rss === false) {
        // $url may be a string or an array of URLs
        $msg = "Unable to load RSS feed at " . htmlentities(implode(', ', (array) $url)) . ": \n";
        foreach(libxml_get_errors() as $error) $msg .= $error->message . " \n";
        $this->error($msg);
        return $this;
    }

    $this->channel['title'] = $this->cleanText((string) $rss->channel->title);
    $this->channel['description'] = $this->cleanText((string) $rss->channel->description);
    $this->channel['link'] = $this->cleanText((string) $rss->channel->link);
    $this->channel['created'] = strtotime((string) $rss->channel->pubDate);
    $this->channel['pubDate'] = date($this->options['dateFormat'], $this->channel['created']);
    $n = 0;

    // if $items is an array, we are dealing with multiple sources: sort newest first
    if(is_array($items)) {
        usort($items, function($x, $y) {
            return strtotime((string) $y->pubDate) - strtotime((string) $x->pubDate);
        });
    } else {
        $items = $rss->channel->item;
    }

    foreach($items as $item) {
        $a = new MarkupLoadRSSItem();
        foreach($item as $key => $value) {
            $value = (string) $value;
            if($key == 'pubDate') {
                $value = strtotime($value);
                $a->set('created', $value);
                $value = date($this->options['dateFormat'], $value);
            } else {
                $value = $this->cleanText($value);
            }
            $a->set($key, $value);
        }
        $this->items->add($a);
        if(++$n >= $this->options['limit']) break;
    }

    return $this;
}

It checks whether $url is an array; if so, it loads/caches each of those feeds and merges their RSS items into the $items array, which is later sorted by pubDate. So this is fully backwards compatible: just give it an array instead of a single URL if you need to parse multiple feeds.
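The merge-and-sort idea can be tried outside of ProcessWire with a small self-contained sketch. Two tiny in-memory feeds (made-up data) stand in for the remote URLs, and $limit plays the role of the module's limit option; everything here is an illustration of the technique, not the module's code.

```php
<?php
// Two tiny in-memory feeds stand in for the remote URLs (hypothetical data)
$feeds = array(
    '<rss version="2.0"><channel><title>A</title>'
  . '<item><title>A1</title><pubDate>Thu, 03 Nov 2011 16:03:16 +0000</pubDate></item>'
  . '<item><title>A2</title><pubDate>Tue, 01 Nov 2011 09:00:00 +0000</pubDate></item>'
  . '</channel></rss>',
    '<rss version="2.0"><channel><title>B</title>'
  . '<item><title>B1</title><pubDate>Wed, 02 Nov 2011 12:30:00 +0000</pubDate></item>'
  . '</channel></rss>',
);

// collect the <item> elements of every feed that parses
$items = array();
foreach($feeds as $xmlData) {
    $xml = simplexml_load_string($xmlData);
    if($xml === false) continue;
    $items = array_merge($items, $xml->xpath('/rss//item'));
}

// sort all items by pubDate, newest first
usort($items, function($x, $y) {
    return strtotime((string) $y->pubDate) - strtotime((string) $x->pubDate);
});

// keep only the first $limit items
$limit = 2;
$titles = array();
foreach(array_slice($items, 0, $limit) as $item) {
    $titles[] = (string) $item->title;
}

echo implode(', ', $titles), "\n";
```

Applying the limit after sorting matters: limiting each feed separately would not guarantee that the newest items overall are the ones that survive.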

If you guys can test that it works for you too, then maybe Ryan can put this in his version. I can do a pull request if you want (although it seems that the new and fancy GitHub for Windows messes up line endings...).


  • 3 weeks later...

I'm not sure that the W3 validator is picking it up right either? Seems like it is showing the whole thing as double entity encoded. Also tried loading in Safari, and it can't seem to read the feed correctly either. Firefox seems okay. Definitely something unusual going on with this feed, but I am not familiar enough with this particular format to know what's wrong. W3 validator isn't helping much since it's seeing the whole thing as double entity encoded.


  • 2 months later...

Yeah, there definitely was something strange going on with that feed. Now it seems to be working on my end too, so they must have fixed that.

Ryan: have you thought about adding that multisource functionality to this module? I am already using it in a couple of places, and it has been working great. Of course, if you think the implementation should be different or an altogether different module, then let me know (or if you prefer a GitHub pull request).

What I was thinking is that it might be more "pw" to have something like $rss->add($source_url) and then load(), instead of passing all the URLs in an array to load($array_of_urls) like it is currently.


  • 4 months later...

Hi Ryan,

Thanks for this module.  Have been using it on our main site for awhile now.  Just wanted to let you know of an issue that I just discovered that others may run into, and see if there's a way to handle it.

I was trying to load a feed that for awhile was not responding.  The feed page wasn't throwing an error or even timing out, just loading for minutes on end.

This ended up causing a timeout on our site (the feed was loading on the main page) and producing this error in the PW log file:

Error Exception: MySQL server has gone away (in /mnt/stor7-wc2-dfw1/526843/www.agencypja.com/web/content/wire/core/Database.php line 118)

For now, we've just disabled that feed, but we are using the module to load other feeds. Do you (or anyone else) know of a way to address this issue? I don't see a timeout option in the module, but could certainly look into adding one if that is determined to be the best option.

Thanks.


Could you PM me the RSS feed you are working with? I can do some testing here. I believe we can get it working by switching MarkupLoadRSS to use the new WireHttp class in PW 2.3.10+, but I need an example to test with. 
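Until something like that is in place, one general PHP technique for guarding against a feed that never responds is to fetch it with a stream context that sets a read timeout. The sketch below uses only standard PHP functions; the helper name fetchFeedWithTimeout() is hypothetical, and this is not how MarkupLoadRSS or WireHttp load feeds.

```php
<?php
// Hypothetical helper, NOT part of MarkupLoadRSS or ProcessWire.
// Returns the feed data as a string, or null if it could not be fetched.
function fetchFeedWithTimeout($url, $timeoutSeconds = 5) {
    // 'timeout' is the read timeout (in seconds) of PHP's http stream wrapper
    $context = stream_context_create(array(
        'http' => array('timeout' => $timeoutSeconds),
    ));

    // @ suppresses the warning on failure; failure is reported as null instead
    $data = @file_get_contents($url, false, $context);

    return $data === false ? null : $data;
}

// Example usage in a template (commented out so nothing is fetched here):
// $xmlData = fetchFeedWithTimeout('http://www.di.net/articles/rss/', 5);
// if($xmlData === null) echo "<p>Feed unavailable</p>";
```

The timeout applies to each read from the connection, so a server that keeps trickling data could in principle still take longer than the configured number of seconds in total.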


Hi Ryan,

Unfortunately, the feed that was causing problems is now back up and running normally.  I thought that I could recreate the issue by creating a php page on another server with a timeout set to at least 5 min, sleeping the script, and using that as the RSS feed, but that didn't work.

I'll be sure to let you know if I ever come across it again.

Thanks.


  • 1 year later...
  • 3 months later...
  • 11 months later...

I always get empty RSS feed output! On my page I call the RSS module via a URL segment such as blog/rss, and for the output I need the same PageArray that I use for the /blog page. But the RSS feed has no content!

$blogposts = $pages->find("template=post, publish_date<$today, sort=-publish_date, limit=10");

if($input->urlSegment1 === 'rss'){
  // retrieve the RSS module
  $rss = $modules->get("MarkupRSS");

  // configure the feed. see the actual module file for more optional config options.
  $rss->title = "Letzte Blogeinträge";

  $rss->render($blogposts);
  return;
} else {
  $content = renderPosts($blogposts, true);
}

  • 3 months later...

I just downloaded the MarkupLoadRSS module from M.Cramer's GitHub repo.

Here is the demo code I used for testing in my template:

        $rss = $modules->get("MarkupLoadRSS");
        $rss->load("http://rss.cbc.ca/lineup/canada.xml");

        foreach($rss as $item) {
            echo "<p>";
            echo "<a href='{$item->url}'>{$item->title}</a> ";
            echo $item->date . "<br /> ";
            echo $item->description;
            echo "</p>";
        }

All I get is this error :

Error: Call to a member function load() on a non-object (line 65 of C:\wamp\www\mysite\site\templates\home.php)

As if the module wants an object, like a $page or $config or something...

I looked at the code and the function is load($url). Could it be conflicting with something?

I'm running v2.7.2

Edited by kongondo
merged your topic here, the module's support forum