Module: Twitter Puller


30 replies to this topic

#1 apeisa

Hero Member · Moderators · 3,154 posts
Location: Vihti, Finland

Posted 28 May 2011 - 07:47 PM

I created a simple module for fetching data from Twitter and saving the results as PW pages (it can also be used without saving). This is very much a work in progress and I don't recommend anyone use it on a live site just yet, since there are a few rough edges I want to solve first. But if you are interested, please try it out and comment.

Screencast: http://screencast.com/t/zBJOWzma

Usage:
  • Install the TwitterPuller module
  • Create a new page and give it the template: twitter-puller
  • Give the fields the values that you want
  • Edit one of your template files and add $page->renderTweets("/url/to/page/you/just/created/")
  • It should work :)

What it needs:
  • Caching: it supports caching now
  • Settings: there should be some editable defaults
  • json_decode messes up the tweet id value (the integers are too long, or something like that; not sure if I'll even try to fix this...)
  • Maybe a way to use cron, so data is never fetched during a live user's request?
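The id issue is PHP losing precision when decoding 64-bit integers from JSON. One possible workaround (an assumption on my part, not something the module currently does) is to quote the id in the raw JSON before decoding, so it survives as a string:

```php
<?php
// json_decode() converts oversized integers to floats, mangling the
// low digits; quoting the id first keeps it intact as a string.
$json  = '{"id":12345678901234567890,"text":"hello"}';
$fixed = preg_replace('/"id":\s*(\d+)/', '"id":"$1"', $json);
$tweet = json_decode($fixed);
echo $tweet->id; // "12345678901234567890"
```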

I didn't implement caching yet since that would require the MarkupCache module, and it isn't installed by default. Do we have some way to declare a module dependency? Also, Ryan: I am not sure if adding a custom method to all pages is the best way here? Is there a lot of overhead in doing it this way?

UPDATE: New version. Now it supports caching and also the possibility to pull tweets based on username.

Attached Files



#2 fenton

Full Member · Members · 69 posts

Posted 29 May 2011 - 02:19 AM

nice, i got a cameo role in that screencast in form of a tweet :) well done

#3 apeisa


Posted 29 May 2011 - 02:31 AM

nice, i got a cameo role in that screencast in form of a tweet :) well done


Yep! A few other PW stars are also featured :)

#4 ryan

Reiska · Administrators · 7,797 posts
Location: Atlanta, GA

Posted 29 May 2011 - 09:43 AM

Antti, this looks great, nice work! Really well put together code here too. I am looking forward to using this module in future projects. Great screencast too.

Maybe way to use cron and never fetch data for live user?


I think we need an AutoCron module that provides 1 min, 30 min, 1 hour, 6 hour, 12 hour, daily, weekly and monthly hooks that any other module can hook into, with the expectation that whatever function it hooks will get executed according to the time it requested. AutoCron would run on every page request, but only execute its timed hooks when a set amount of time has passed. This will actually be an easy module to build... I think I will get something started here, because we need a core module to provide this capability. I'm thinking this is something like Drupal's poor man's cron, though I haven't yet looked at that in Drupal.
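The idea can be sketched in plain PHP (this is an assumption of how such a module might work internally, not actual AutoCron code, which doesn't exist yet): persist the last-run time, and on each request run the callback only when the interval has elapsed.

```php
<?php
// Poor man's cron sketch: fire $callback at most once per $intervalSeconds,
// using a timestamp file as the persistent "last run" record.
function runIfDue($name, $intervalSeconds, $callback) {
    $file = sys_get_temp_dir() . "/autocron-$name.ts";
    $last = is_file($file) ? (int) file_get_contents($file) : 0;
    if (time() - $last < $intervalSeconds) {
        return false; // not due yet, skip on this request
    }
    file_put_contents($file, (string) time()); // record the run first
    call_user_func($callback);
    return true;
}

// e.g. a module could "hook" its fetch into the 30-minute slot:
runIfDue('fetch-tweets', 1800, function () {
    // fetch the Twitter feed and cache it here
});
```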

I didn't implement caching yet since that would require Markup Cache module and it's not installed by default. Do we have some way to tell about module dependency?


Actually ProcessWire modules should automatically install when another module requests them. So when you do this:

$module = $this->modules->get("MarkupCache");

It should install the module if it isn't already installed. Since MarkupCache is a core module, you can count on it being there, so no additional error checking should be necessary. But for any required 3rd party modules, you should check that the returned $module is not null whenever you are using it. If it's null, then you'd want to abort or throw an exception alerting them that they need to install whatever module is missing. This is better than checking in your install() function because it covers the possibility that they might uninstall a required 3rd party module sometime later.
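For instance (a sketch only: "SomeThirdPartyModule" and the message are hypothetical, and $this->modules assumes you're inside a ProcessWire module):

```php
<?php
// Core module: auto-installs on request, safe to use directly.
$cache = $this->modules->get("MarkupCache");

// Third-party module: might have been uninstalled later, so check each time.
$thirdParty = $this->modules->get("SomeThirdPartyModule"); // hypothetical name
if ($thirdParty === null) {
    throw new WireException("This module requires SomeThirdPartyModule to be installed.");
}
```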

In 2.1 or 2.2, we should probably update the core to track module dependencies so that the above is handled internally by the system rather than by module developers.

Also, Ryan: I am not sure if adding a custom method to all pages is the best way here? Is there a lot of overhead in doing it this way?


I don't think there is any overhead in adding a custom method to all pages. Behind the scenes, it's only added once to the actual Page class rather than all $page instances separately. So I think your method here is good.

I also put together a Twitter module for use on processwire.com awhile back (MarkupTwitterFeed.module). The output it produces can be seen on processwire.com in the sidebar on most pages (minus the forum). It's a completely different thing from yours and doesn't do nearly as much, but I figured I'd post it in case you were interested. It does maintain its own cache file... I didn't use MarkupCache to do it here because this module was written before MarkupCache existed. :)
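A standalone file cache along those lines might look like this (a sketch under assumed details, not MarkupTwitterFeed's actual code):

```php
<?php
// Serve the cached copy while it's fresher than $maxAge seconds;
// otherwise refetch from $source and refresh the cache file.
function cachedFetch($source, $cacheFile, $maxAge = 600) {
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $maxAge) {
        return file_get_contents($cacheFile); // cache hit
    }
    $data = file_get_contents($source); // e.g. the Twitter search URL
    if ($data !== false) {
        file_put_contents($cacheFile, $data); // refresh the cache
    }
    return $data;
}
```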





Attached Files



#5 apeisa


Posted 29 May 2011 - 01:08 PM

Thanks Ryan - looking at your older module is very helpful (although these work pretty differently). I think that using pages will be a bit too much in most cases (it creates templates, fields, etc.), but I think it will allow a simple UI for hiding tweets (this is a requirement for one of our clients).

I actually started to think about some kind of data importer: you could point it at any feed (well... at least XML, JSON and CSV) and after that map those values to your pages. That would be so great, but probably a little bit too much for my skills... although I do like challenges :)

Well... I'll finish this first and after that I'll create an RSS-to-pages module. If those aren't enough, then I'll start to think about some kind of data importer... Probably they are, since PW makes it really simple to create import scripts. What do you guys think?

#6 ryan


Posted 30 May 2011 - 10:34 AM

I actually started to think about some kind of data importer: you could point it at any feed (well... at least XML, JSON and CSV) and after that map those values to your pages. That would be so great, but probably a little bit too much for my skills... although I do like challenges


I think it's a great idea. This does sound like a really useful thing to have. It sounds like it could be complex, but in writing out the steps (if I have it right) it seems more approachable:

Step 1:

- You enter a feed URL
- You select the parent page where it will create children
- You select the template it will use
- Click submit

Step 2:

- It loads the feed, looks at the first record, and saves all the field names in an array.
- It displays all the PW field names from the template you selected. Next to each is a select box containing all the feed's field names.
- You select one of the feed's fields for each field in your template.
- Click submit.

Step 3:

- It loads the feed again and iterates through all the records.
- It creates a new Page for each record, associating the fields you selected in Step 2.
- It displays the pages it created.
- Pat yourself on the back and drink a beer.
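The steps above can be sketched in plain PHP (arrays stand in for real feed data and PW pages; the field names are made up for illustration):

```php
<?php
// A minimal sketch of steps 2-3: take the first record's keys as the
// feed's field names, then create one item per record using a field
// mapping the user chose.
$records = json_decode('[{"title":"Hello","body":"First"},{"title":"World","body":"Second"}]', true);

// Step 2: the feed's field names, read from the first record
$feedFields = array_keys($records[0]);

// The user-selected mapping: PW field name => feed field name
$mapping = array('title' => 'title', 'summary' => 'body');

// Step 3: build one "page" (a plain array here) per record
$pages = array();
foreach ($records as $record) {
    $page = array();
    foreach ($mapping as $pwField => $feedField) {
        $page[$pwField] = $record[$feedField];
    }
    $pages[] = $page;
}
```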

What do you think, am I missing anything? Where it gets harder to create a universal feed import tool is with Page relations and file/image fields. But maybe those are for more specific tools, where something more is known about the data. If you decide you want to build a tool like this, just let me know what I can do to help.

#7 apeisa


Posted 30 May 2011 - 01:40 PM

Yes, that is pretty much it! Very well planned too.

Just add cron and we have it ready - this tool could be used to import events from an event calendar feed, news from RSS, and tweets from a Twitter feed.

What this adds is a few things:

  • We have to tell it which field (or combination of fields, like title and date) is the "id" - so that we don't end up with duplicates.
  • We need a list view of all data feeds, where we can choose how often each is fetched (or whether it's manual).
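The duplicate check in the first point could be sketched like this (the field names are hypothetical; real code would compare against the ids of already-imported pages):

```php
<?php
// Combine the chosen "id" fields into one key and use it to
// skip records that have already been imported.
function recordKey(array $record, array $idFields) {
    $parts = array();
    foreach ($idFields as $field) {
        $parts[] = $record[$field]; // one piece of the compound id
    }
    return implode('|', $parts);
}

$seen = array('Hello|2011-05-28' => true); // keys of already-imported items

$record = array('title' => 'Hello', 'date' => '2011-05-28');
$key = recordKey($record, array('title', 'date'));
$isDuplicate = isset($seen[$key]); // true: skip this record
```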

I actually really want to build this, since I know we will have tons of imports coming in our projects.

About page relations: I think if these are needed, then many times the best way could be running another script after the import, which adds the needed relations.

About images & files: I don't remember any import where we would have needed to import files, so I'm glad to drop this to keep things simple. Well... simple enough at least :)

#8 apeisa


Posted 30 May 2011 - 02:23 PM

Back to topic: I added caching and also the ability to pull tweets based on username. I think this is pretty safe to use now. There aren't any settings and it generates awful markup by default - so you will probably need to edit lines 238 or 247 for your usage. I am not sure if I will support or develop this much further (well, if people really find this useful, I can add the needed settings etc... but this is fine for my own needs now), so please feel free to fork this if you want (I'll put it on GitHub if someone requests).

If you have installed this, please remove all the pages (from the trash also) using the templates twitter-puller or twitter-tweets and after that uninstall. I haven't done any error checking in those install / uninstall methods, so it is pretty easy to get errors there...



#9 ryan


Posted 31 May 2011 - 09:44 AM

Looks great, thanks for this update!

#10 apeisa


Posted 04 June 2011 - 04:58 PM

If you decide you want to build a tool like this, just let me know what I can do to help.


I started coding and I have a pretty good start. Still very much a work in progress, but it works as a proof of concept. It would be super helpful if you find time to check this out at some point and give me some comments or tips on what I am doing wrong :)

https://github.com/a...ocessDataImport

It is actually pretty cool stuff (ugly, but still cool ;)). I can't wait to get this ready and start using it.

UPDATE: Oh, forgot to say that it works only with JSON feeds at this point. Here are a couple of feeds you can test (it should "work" with any valid JSON though):
http://search.twitte...n?q=processwire
http://keikalle.com/api/city/helsinki
http://gdata.youtube...ar?v=2&alt=json (this is monster)

#11 ryan


Posted 05 June 2011 - 08:02 AM

Wow this is really looking great! I tested on the feeds you indicated, and was able to perform the field mappings no problem. It looks to me like everything you've implemented so far works. Great work! Keep doing what you are doing. Here are a couple notes:

In your table structure, you may want to use IDs (integers) rather than strings for 'template' (in the main table) and 'to_field' (in the mappings table). This is in case either of these things gets renamed -- the ID is permanent, but the name is not.

One thought occurred to me on the load screen: it would be handy to see an example of the data for each field (like the data from the first record) when associating the fields, just to clarify what each one is. Though I could just as easily load the feed in my browser and do that, so it's only a luxury feature (probably not something for v1).

#12 apeisa


Posted 05 June 2011 - 12:22 PM

Thanks for the feedback! I will change the field & template strings to IDs.

One thought occurred to me on the load screen, and that was that it would be handy to see an example of the data for each field (like the data from the first record) when associating the field, just to clarify what it is.


This is actually implemented, but it is a very bare-bones implementation. I use <a> elements for the value keys - and I have the actual value there in the title attribute. So depending on the browser, you should see the value if you hover the cursor over the key.
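As a sketch, the markup described might look like this (the variable names are made up; the browser renders the title attribute as a tooltip on hover):

```php
<?php
// Show the feed's key as a link, with a sample value from the
// first record in the title attribute (visible as a tooltip).
$key = 'text';
$exampleValue = 'Hello from the first record';
$html = "<a href='#' title='" . htmlspecialchars($exampleValue, ENT_QUOTES) . "'>$key</a>";
echo $html;
```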

#13 ryan


Posted 06 June 2011 - 08:22 AM

This is actually implemented, but it is a very bare-bones implementation. I use <a> elements for the value keys - and I have the actual value there in the title attribute. So depending on the browser, you should see the value if you hover the cursor over the key.


That's awesome, I missed that before – you are a step ahead of me. I'm very excited about this module.

#14 apeisa


Posted 02 July 2011 - 02:56 PM

I played a little bit more with this, but now I need some help. I still have a few bits missing, but the most important thing is giving me problems now. This is pure PHP, so no ProcessWire knowledge is needed to help me with it (I could ask for help from many other sources, but I'll start here).

I have an array where I keep my "data_root". This tells me where the "start" or "root" array is in my source data (the array which I want to loop over and take values from). If you take a look at the Twitter data (http://search.twitte...arch.json?q=php) you will see that this "root" is "results". So I would loop over this data like this:

<?php
$out = '';
foreach($dataFromTwitter->results as $item) {
    $out .= $item->text; // this holds the actual tweet text
}

So in this example my data_root array holds just one value: ['results']. Sometimes we loop right from the root (http://keikalle.com/api/city/helsinki) and sometimes the "results" are deeper (http://gdata.youtube...ar?v=2&alt=json, where it is feed -> entry).

Ok, this has probably been a very badly written question so far, so I'll try to keep it simple now:

How can I turn this:

Array
(
    [0] => feed
    [1] => entry
)

into this:

<?php
foreach ($data->feed->entry as $item) {
//
}

So that I can loop over it.

Any help appreciated (like, is there some function to map these, or am I all lost? :)). This is probably something super simple, but I couldn't get my mind working on it...

#15 apeisa


Posted 02 July 2011 - 03:35 PM

I tried Stack Overflow with this one, and it took 2 minutes to get a working solution. Gotta love that site (and the people contributing).

// Walk down an object following a list of property names, e.g.
// map_property($data, array('feed', 'entry')) returns $data->feed->entry.
function map_property($obj, $array) {
  $ret = $obj;
  foreach($array as $prop) {
    $ret = $ret->$prop; // descend one level per property name
  }
  return $ret;
}

foreach(map_property($data, array('feed', 'entry')) as $item) { }


#16 ryan


Posted 03 July 2011 - 08:46 AM

Glad you found it, and thanks for posting the solution. I love that site too... probably one of the most useful sites I come across regularly.

#17 martinluff

Full Member · Members · 79 posts
Location: Christchurch, NZ

Posted 06 November 2011 - 06:36 AM

Antti and Ryan, I'm only just reading through this thread since I might use the Twitter module soon. Great work on the Twitter module, Antti, and some interesting comments on the cron and general data import tools. A cron module would definitely be useful, Ryan - that would be cool :)

On the idea of general data import - could a module like this piggyback on Yahoo Query Language? Not sure if either of you have used it, but I've played with it a bit and it's very, very cool; it uses an SQL-like syntax to simplify access to over 1,200 information feeds - BBC, Amazon, Apple, Flickr, Twitter and so on - plus it can format output as XML or JSON, or serve it via REST... http://developer.yahoo.com/yql/console - use the 'show community tables' link on the right to see the full list. I could be way off track here, but it kinda seems like it could be an uber-cool module and make ProcessWire the de-facto tool for data mashups ;D Either way, I might start another thread to try and understand how I might code something in PW to import YQL-produced data into PW articles, since I'd be very interested in getting that to work...

#18 apeisa


Posted 06 November 2011 - 06:45 AM

Actually, my work-in-progress Data Import module does work with any JSON you throw at it: https://github.com/a...ocessDataImport

So you could easily use YQL to generate a JSON feed for you, and then use Data Import to import those values into your pages. Data Import does need some work though (the title needs to be unique, there's no way to combine values, etc.), but in simple tests it always works. Note though: JSON always works, XML has some problems.

Or did you mean some deeper integration between YQL and PW? Something where you could build those queries from the PW admin?

#19 martinluff


Posted 09 November 2011 - 08:38 PM

Cool, that's useful to know about the JSON stuff. It might be useful to bottom out the XML at some point; I know it's not Ryan's preference, but I think it ties in well with the strict hierarchical, tree-based approach of PW...

Well, I see the main advantage of YQL as being that it pulls together such a huge number of diverse and varied APIs through a unified and straightforward query language, so you don't have to keep learning the different quirks of each one. So yes: imagine a plugin which could work a bit like the YQL console for quickly putting together feeds, married with something like your module for when you need to not just display the data via a feed but also pull it into the back-end db. Once you had it working for one feed, it would seem fairly trivial to adapt for the others (but I could be way out there). I could see some interesting possibilities... and it kind of feels like it would answer both ends of the equation - easily preparing custom feeds and then importing them. I've only very briefly looked at Yahoo Pipes, but possibly that would also offer the easiest way of combining feeds, and of combining values within a single feed or across feeds before importing. Anyway, just an idea. I need to take a closer look at DataImport to understand how it functions...

#20 ryan


Posted 10 November 2011 - 10:18 AM

YQL definitely sounds interesting to me. I look forward to learning more. I've been working with a lot of XML data feeds lately. I'm currently building a system that mirrors articles from a content provider a couple of times a day, and I'm pretty impressed by their web services. Lots of feeds to pull from, between articles, images, comments, categories and more. Regardless of format, PHP's SimpleXML + ProcessWire make it really straightforward. There's nothing more satisfying than pulling data from web services and watching a site run itself. :) So something like YQL sounds very intriguing.



