Module: Twitter Puller


apeisa

I created a simple module for fetching data from Twitter and saving the results as PW pages (it can also be used without saving). This is very much a work in progress and I don't recommend anyone use this on a live site just yet, since there are a few rough edges I want to solve first. But if you are interested, please try it out and comment.

Screencast: http://screencast.com/t/zBJOWzma

Usage:

  • Install the TwitterPuller module
  • Create a new page and give it the template twitter-puller
  • Give the fields the values you want
  • Edit one of your template files and add $page->renderTweets("/url/to/page/you/just/created/")
  • It should work :)

What it needs:

  • Caching: it supports caching now
  • Settings: there should be some editable defaults
  • json_decode mangles the tweet id value (the integers are too long or something like that; not sure if I'll even try to fix this...)
  • Maybe a way to use cron so we never fetch data during a live user's request?

I didn't implement caching yet since that would require the MarkupCache module and it's not installed by default. Do we have some way to declare module dependencies? Also Ryan - I am not sure if adding a custom method to all pages is the best way here? Is there a lot of overhead doing it this way?

UPDATE: New version. It now supports caching and can also pull tweets based on username.

TwitterPuller.module


Antti, this looks great, nice work! Really well put together code here too. I am looking forward to using this module in future projects. Great screencast too.

Maybe a way to use cron so we never fetch data during a live user's request?

I think we need an AutoCron module that provides 1-minute, 30-minute, 1-hour, 6-hour, 12-hour, daily, weekly and monthly hooks that any other module can hook into, and expect that whatever function it hooks will get executed according to the interval it requested. AutoCron would run on every page request, but only execute its timed hooks when the set amount of time has passed. This will actually be an easy module to build... I think I will get something started here, because we need a core module to provide this capability. I'm thinking this is something like Drupal's poor man's cron, though I haven't yet looked at that in Drupal.
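The core timing check in such a module is simple; here is a minimal sketch of the "has enough time passed?" logic using a timestamp file (the file path and function name are illustrative only, not an actual AutoCron API):

```php
<?php
// Return true if at least $intervalSeconds have passed since the last
// recorded run; record the current time when firing.
function intervalElapsed($file, $intervalSeconds) {
    $last = is_file($file) ? (int) file_get_contents($file) : 0;
    if(time() - $last < $intervalSeconds) return false;
    file_put_contents($file, (string) time());
    return true;
}

// Checked on every page request, but the hooked functions only
// actually execute when the interval has elapsed
if(intervalElapsed(sys_get_temp_dir() . '/autocron-hourly.txt', 3600)) {
    // run everything hooked to the hourly event here
}
```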

I didn't implement caching yet since that would require the MarkupCache module and it's not installed by default. Do we have some way to declare module dependencies?

Actually ProcessWire modules should automatically install when another module requests them. So when you do this:

$module = $this->modules->get("MarkupCache"); 

It should install the module if it isn't already installed. Since MarkupCache is a core module, you can count on it being there, so no additional error checking should be necessary. But for any required 3rd party modules, you should check that the returned $module is not null before using it. If it's null, you'd want to abort or throw an exception alerting the user that they need to install whatever module is missing. This is better than checking in your install() function because it covers the possibility that they might uninstall a required 3rd party module sometime later.
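For a 3rd party dependency, that null check might look something like this (SomeThirdPartyModule is a placeholder name, not a real module):

```php
<?php
// Core module: safe to assume it is available
$cache = $this->modules->get("MarkupCache");

// 3rd party module: verify it loaded before using it
$helper = $this->modules->get("SomeThirdPartyModule");
if(is_null($helper)) {
    throw new WireException("Please install SomeThirdPartyModule before using this module.");
}
```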

In 2.1 or 2.2, we should probably update the core to track module dependencies so that the above is handled internally by the system rather than by module developers.

Also Ryan - I am not sure if adding a custom method to all pages is the best way here? Is there a lot of overhead doing it this way?

I don't think there is any overhead in adding a custom method to all pages. Behind the scenes, it's only added once to the actual Page class rather than to each $page instance separately. So I think your approach here is good.
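For reference, attaching a method to all pages is done with a single hook registration in the module's init(); a rough sketch based on the renderTweets() method described above (not the module's exact code):

```php
<?php
public function init() {
    // Registered once on the Page class, so every $page gets
    // renderTweets() without per-instance overhead
    $this->addHook('Page::renderTweets', $this, 'renderTweets');
}

public function renderTweets(HookEvent $event) {
    $url = $event->arguments[0];   // the twitter-puller page path
    $event->return = "...";        // the rendered tweet markup
}
```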

I also put together a Twitter module for use on processwire.com a while back (MarkupTwitterFeed.module). The output it produces can be seen on processwire.com in the sidebar on most pages (minus the forum). It's a completely different thing from yours and doesn't do nearly as much, but I figured I'd post it in case you were interested. It does maintain its own cache file... I didn't use MarkupCache here because this module was written before MarkupCache existed. :)

MarkupTwitterFeed.module


Thanks Ryan - looking at your older module is very helpful (although these work pretty differently). I think that using pages will be a bit too much in most cases (it creates templates, fields, etc.), but it will allow a simple UI for hiding tweets (this is a requirement for one of our clients).

I actually started to think about some kind of data importer: you could give it any feed (well... at least XML, JSON and CSV) and after that map those values to your pages. That would be so great, but probably a little bit too much for my skills... although I do like challenges :)

Well... I'll finish this first and after that I'll create an RSS-to-pages module. If those aren't enough, then I'll start to think about some kind of data importer... They probably are, since PW makes it really simple to create import scripts. What do you guys think?


I actually started to think about some kind of data importer: you could give it any feed (well... at least XML, JSON and CSV) and after that map those values to your pages. That would be so great, but probably a little bit too much for my skills... although I do like challenges

I think it's a great idea. This does sound like a really useful thing to have. It sounds like it could be complex, but in writing out the steps (if I have them right) it seems more approachable:

Step 1:

- You enter a feed URL

- You select the parent page where it will create children

- You select the template it will use

- Click submit

Step 2:

- It loads the feed, looks at the first record, and saves all the field names in an array.

- It displays all the PW field names from the template you selected. Next to each is a select box containing all the feed's field names.

- You select one of the feed's fields for each field in your template.

- Click submit.

Step 3:

- It loads the feed again and iterates through all the records.

- It creates a new Page for each record, associating the fields you selected in Step 2.

- It displays the pages it created.

- Pat yourself on the back and drink a beer.

What do you think, am I missing anything? Where it gets harder to create a universal feed import tool is with Page relations and file/image fields. But maybe those are for more specific tools, where something more is known about the data. If you decide you want to build a tool like this, just let me know what I can do to help.
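A rough sketch of what Step 3's import loop might look like, assuming the Step 2 mappings are stored as an array of feed-field => PW-field pairs (all variable names here are illustrative):

```php
<?php
// $mappings: feed field name => ProcessWire field name (from Step 2)
foreach($feedRecords as $record) {
    $p = new Page();
    $p->template = $template;    // template selected in Step 1
    $p->parent = $parentPage;    // parent page selected in Step 1
    foreach($mappings as $feedField => $pwField) {
        $p->set($pwField, $record->$feedField);
    }
    $p->save();
}
```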


Yes, that is pretty much it! Very well planned, too.

Just add cron and we have it ready - this tool could then be used to import events from an event calendar feed, news from RSS, and tweets from a Twitter feed.

What this adds is a few things:

  • We have to tell it which field (or combination of fields, like title and date) is the "id" - so that we don't end up with duplicates.
  • We need a list view of all the data feeds, where we can choose how often each is fetched (or whether it's manual).
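The duplicate check from the first point could be a simple selector lookup before creating each page (the import_id field name here is hypothetical):

```php
<?php
// Look for a page that was already created from this record
$existing = $pages->get("parent=$parentPage, import_id=$record->id");
if(!$existing->id) {
    // no match found: safe to create a new page for this record
}
```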

I actually really want to build this, since I know we will have tons of imports coming in our projects.

About page relations: I think if these are needed, then many times the best way could be to run another script after the import, which adds the needed relations.

About images & files: I don't remember any import where we would have needed to import files, so I'm glad to drop this to keep things simple. Well... simple enough at least :)


Back to topic: I added caching and also the ability to pull tweets based on username. I think this is pretty safe for use now. There aren't any settings and it generates awful markup by default - so you will probably need to edit lines 238 or 247 for your usage. I am not sure if I will support or develop this much further (well, if people really find it useful, I can add the needed settings etc., but this is fine for my own needs now), so please feel free to fork it if you want (I'll put it on GitHub if someone requests it).

If you have installed this, please remove all the pages (from the trash as well) using the templates twitter-puller or twitter-tweets, and after that uninstall it. I haven't done any error checking in those install/uninstall methods, so it is pretty easy to get errors there...


If you decide you want to build a tool like this, just let me know what I can do to help.

I started coding and I have a pretty good start. It's still very much a work in progress, but it works as a proof of concept. It would be super helpful if you could find time to check it out at some point and give me some comments or tips on what I am doing wrong :)

https://github.com/apeisa/ProcessDataImport

It is actually pretty cool stuff (ugly, but still cool ;)). I can't wait to get this ready and start using it.

UPDATE: Oh, I forgot to say that it works only with JSON feeds at this point. Here are a couple of feeds you can test (it should "work" with any valid JSON though):

http://search.twitter.com/search.json?q=processwire

http://keikalle.com/api/city/helsinki

http://gdata.youtube.com/feeds/api/standardfeeds/most_popular?v=2&alt=json (this one is a monster)


Wow, this is really looking great! I tested on the feeds you indicated, and was able to perform the field mappings with no problem. It looks to me like everything you've implemented so far works. Great work! Keep doing what you are doing. Here are a couple of notes:

In your table structure, you may want to use IDs (integers) rather than strings for 'template' (in the main table) and 'to_field' (in the mappings table). This is in case either of these things gets renamed -- the ID is permanent, but the name is not.

One thought that occurred to me on the load screen was that it would be handy to see an example of the data for each field (like the data from the first record) when associating the fields, just to clarify what it is. Though I could just as easily load the feed in my browser and do that, so it's only a luxury feature (probably not something for v1).


Thanks for the feedback! I will change the field & template strings to IDs.

One thought that occurred to me on the load screen was that it would be handy to see an example of the data for each field (like the data from the first record) when associating the fields, just to clarify what it is.

This is actually implemented, but it is a very bare-bones implementation. I use <a> elements for the value keys - and I have the actual value there in the title attribute. So depending on the browser, you should see the value if you hover the cursor over the key.


This is actually implemented, but it is a very bare-bones implementation. I use <a> elements for the value keys - and I have the actual value there in the title attribute. So depending on the browser, you should see the value if you hover the cursor over the key.

That's awesome, I missed that before – you are a step ahead of me. I'm very excited about this module.


  • 4 weeks later...

I played a little bit more with this, but now I need some help. I still have a few bits missing, but the most important thing is giving me problems now. This is pure PHP, so no ProcessWire knowledge is needed to help me with it (I could ask for help from many other sources, but I'll start here).

I have an array where I keep my "data_root". This tells me where the "start" or "root" array is in my source data (the array which I want to loop over and take values from). If you take a look at the Twitter data (http://search.twitter.com/search.json?q=php) you will see that this "root" is "results". So I would loop over this data like this:

<?php
$out = '';
foreach($dataFromTwitter->results as $item) {
   $out .= $item->text; // this holds the actual tweet text
}

So in this example my data_root array holds just one value: ['results']. Sometimes we loop right from the root (http://keikalle.com/api/city/helsinki) and sometimes the "results" are deeper (http://gdata.youtube.com/feeds/api/standardfeeds/most_popular?v=2&alt=json, where it is feed -> entry).

OK, this has probably been a very badly written question so far, so I'll try to keep it simple now:

How can I turn this:

Array
(
   [0] => feed
   [1] => entry
)

into this:

<?php
foreach ($data->feed->entry as $item) {
//
}

So that I can loop over it.

Any help is appreciated (like, is there some function to map these, or am I completely lost? :)). This is probably something super simple, but I couldn't get my mind working on it...


I tried Stack Overflow on this one, and it took 2 minutes to get a working solution. Gotta love that site (and the people contributing).

// Walk down a chain of property names to reach the nested "root"
function map_property($obj, $array) {
  $ret = $obj;
  foreach($array as $prop) {
    $ret = $ret->$prop; // descend one level, e.g. $data->feed, then ->entry
  }
  return $ret;
}

// Equivalent to: foreach($data->feed->entry as $item) { }
foreach(map_property($data, array('feed', 'entry')) as $item) { }

  • 4 months later...

Antti and Ryan, I'm only just reading through this thread since I might use the Twitter module soon. Great work on the Twitter module, Antti, and some interesting comments on the cron and general data import tool. The cron module would definitely be useful, Ryan - that would be cool :)

On the idea of general data import - could a module like this piggyback on Yahoo Query Language? Not sure if either of you have used this, but I've played with it a bit and it's very, very cool; it uses an SQL-like syntax to simplify access to over 1,200 information feeds - BBC, Amazon, Apple, Flickr, Twitter and so on - plus it can format the results as XML or JSON via REST... http://developer.yahoo.com/yql/console - use the 'show community tables' link on the right to see the full list. I could be way off track here, but it kinda seems like it could be an uber-cool module and make ProcessWire the de-facto tool for data mashups ;D Either way, I might start another thread to try to understand how I might code something in PW to import YQL-produced data into PW articles, since I'd be very interested in getting that to work...


Actually my work-in-progress Data Import does work with any JSON you throw at it: https://github.com/apeisa/ProcessDataImport

So you could easily use YQL to generate a JSON feed for you, and then use Data Import to import those values into your pages. Data Import does need some work though (the title needs to be unique, there's no way to combine values, etc.), but in simple tests it always works. Note though: JSON always works; XML has some problems.

I'm not sure if you meant some deeper integration between YQL and PW? Something like being able to build those queries from the PW admin?


Cool, that's useful to know about the JSON stuff; it might be useful to bottom out the XML at some point. I know it's not Ryan's preference, but I think it ties in well with the strict hierarchical, tree-based approach of PW...

Well, I see the main advantage of YQL as pulling together such a huge number of diverse and varied APIs through a unified and straightforward query language, so you don't have to keep learning all the different quirks of each one. So yes, imagine a plugin which could work a bit like the YQL console for quickly putting together feeds, married to something like your module for when you need to not just display the data from a feed but also pull it into the back-end DB - once you had it working for one feed, it would seem fairly trivial to adapt for the others (but I could be way out there). I can see some interesting possibilities... it kind of feels like it would answer both ends of the equation - easily preparing custom feeds and then importing them. I've only very briefly looked at Yahoo Pipes, but possibly that would also be the easiest way of combining feeds, and of combining values within a single feed or across feeds before importing. Anyway, just an idea. I need to take a closer look at DataImport to understand how it functions...


YQL definitely sounds interesting to me. I look forward to learning more. I've been working with a lot of XML data feeds lately. I'm currently building a system that mirrors articles from a content provider a couple of times a day, and I'm pretty impressed by their web services. There are lots of feeds to pull from: articles, images, comments, categories and more. Regardless of the format, PHP's SimpleXML + ProcessWire makes it really straightforward. There's nothing more satisfying than pulling data from web services and watching a site run itself. :) So something like YQL sounds very intriguing.


  • 2 weeks later...

Antti, is the above still true?

/Jasper

Dohhhhhh! My bad, I totally missed that. I could swear I saw "Release" somewhere on this module. Once I get a firmer grasp of PW's ins and outs, I'd be happy to contribute what brain cells I have left to working out the kinks.


I am using this on the lukio.fi site and it works nicely. I have not looked at the codebase in a long time, but if you guys want to use this and run into any problems, I will definitely help you and improve this module.


  • 2 weeks later...

Ryan, I'm trying to use your minimalist Twitter module (Apeisa's looks great but has more features than I need) and for some reason I can't get my feed to load. Any idea what I might be doing wrong here?

This is what I have in the template file:

$t = new MarkupTwitterFeed(); 
echo $t->render("http://twitter.com/statuses/user_timeline/14601766.rss");

And this is what it renders:

MarkupTwitterFeed: Unable to load http://twitter.com/statuses/user_timeline/14601766.rss

The link resolves when I paste it into my browser, so I'm not sure what I'm doing wrong.


Statestreet, I just realized I didn't have this on GitHub, so I posted it there, just in case you've got an earlier version. Here is the version that I am currently using on processwire.com for the Twitter feed that appears throughout the site:

https://github.com/ryancramerdesign/MarkupTwitterFeed

I'm wondering if your PHP might have allow_url_fopen disabled? You can find out by looking at your phpinfo(), i.e. put this on your server in a PHP file (like test.php) and load it in your browser:

<?php phpinfo();

I am guessing that you might have the allow_url_fopen option disabled, and that would prevent this module from being able to load the RSS feed.
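Instead of scanning the full phpinfo() output, you can also read the setting directly with ini_get():

```php
<?php
// ini_get() returns "1" (or "On") when allow_url_fopen is enabled,
// and "" or "0" when it is disabled
$enabled = (bool) ini_get('allow_url_fopen');
echo $enabled ? "allow_url_fopen is enabled\n" : "allow_url_fopen is disabled\n";
```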

If you find that the option is enabled, then next check your Twitter RSS URL. Log out of Twitter and see if it still works. If I recall correctly, some of Twitter's RSS feeds don't work unless you are logged in with the same browser you are retrieving them from, and you may have to enable access to it somewhere in your Twitter settings... it's been a while since I've looked at this, so I'm not positive what the current deal is there. But it's something to look at. Please let me know what you find.

