Random Headlines block - (method and cache questions)


Hello All,

I've added a "Random Articles" block to the bottom of our article pages, at The Significato Journal.

The block is under the headline at the bottom, called:

"More Headlines You May be Interested In"

I have some questions:

a) Is my code efficient, and is it the best method to display 4 random headlines? It's of special concern if a site eventually has tens of thousands of articles, because I'm creating an array of all of the article pages and then grabbing 4 random ones. It would be better, I think, if I could build an array of only 4, but I don't see how I can do that and still randomly select from all of them. Here's my code:

// Selector: published article pages that have a headline image
$random_selectors = "custom_template_file.select_value=article_page.php," .
                    "publish_date<=$now, headline_image_name!=";

$random_keys      = wire('pages')->find($random_selectors); // loads ALL matches
$random_pages     = $random_keys->findRandom(4);            // then picks 4 of them

echo "<table width='100%' cellpadding='0' cellspacing='10'><tr>";

foreach ( $random_pages as $random_page )
      {
      # ... display headline and image
      }

echo "</tr></table>";

b) Every time I reload an article, the headlines change, which is what I want.

Does this break the 1-day template cache on the article, or does the headline block get passed through, with the rest of the page still cached? I'd like to keep the cache running.

Here's a sample url, so that you can see it in action:

http://significatojournal.com/columns/culture-of-heart/the-living-compass-of-kindness-and-compassionate-love/

Thanks for any feedback!

Peter


I would just use random sort:

$pa = $pages->find("template=basic-page, sort=random, limit=4");

findRandom() is more for in-memory page or image arrays.

As for the cache: no, they won't get through; that's the purpose of the cache. You could load them via AJAX and inject them; since they're at the bottom, nobody will notice that they're loaded afterwards.
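The AJAX idea amounts to a dedicated, uncached endpoint that echoes only the headlines fragment. As a rough sketch (plain PHP, with a hypothetical render helper and stand-in field names, not the actual Significato code), the endpoint could build its markup like this; in ProcessWire the items would come from something like `$pages->find("template=article_page, sort=random, limit=4")`:

```php
<?php
// Hypothetical helper: render the random-headlines fragment that an
// uncached endpoint could echo, for client-side injection after page load.
function renderRandomHeadlines(array $items): string {
    $html = "<table width='100%' cellpadding='0' cellspacing='10'><tr>";
    foreach ($items as $item) {
        // Each $item is a stand-in array; in PW you'd read Page fields instead.
        $html .= "<td><a href='{$item['url']}'>"
               . htmlspecialchars($item['headline'])
               . "</a></td>";
    }
    return $html . "</tr></table>";
}

// Example with stand-in data:
echo renderRandomHeadlines([
    ['url' => '/a/', 'headline' => 'First headline'],
    ['url' => '/b/', 'headline' => 'Second headline'],
]);
```

The cached article page would then fetch that endpoint's output with a small piece of JavaScript and insert it into the placeholder at the bottom.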


Dear Soma,

Thanks! That's a much cleaner coding method. I revised my code, per your suggestion above.

I'm curious: how does "sort=random" work internally? Does it use MySQL's "ORDER BY RAND()"? Do you think that method works okay even with thousands of records? It would seem to me that it would have to create a random number on the fly for each record.

Also, about the cache: thanks for that. I had forgotten that I had set the cache on that template to cache only for guests, and I was logged in, so I saw the headlines refresh every time I hit reload.

Since my pages are set to cache only for one day, I think that it will be okay, because the headlines will change once a day, which is fine.

Thanks for your help!

Peter


A quick find in files for 'random' shows (in PageFinder.php):

if($value == 'random') { 
    $value = 'RAND()';

and

else $query->orderby("$value", true); 

So it does seem to use ORDER BY RAND(), which is pretty slow when dealing with a lot of rows for the reasons you already mentioned.


I just ran this random query over 45k pages and don't see any slowdown.

$pa = $pages->find("template=basic-page, sort=random, limit=4");

With 43k pages
random query time: 0.0474

With 720 pages
random query time: 0.0326

Almost as fast as a $pages->count()
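For anyone wanting to reproduce this kind of measurement, the timing part is just a `microtime(true)` wrapper. A minimal, self-contained sketch (the shuffled-array workload below is only a stand-in; in a real template you would time the `$pages->find(...)` call itself):

```php
<?php
// Minimal timing helper in the spirit of the benchmark above.
// In a ProcessWire template you would time e.g.:
//   $pa = $pages->find("template=basic-page, sort=random, limit=4");
function timeIt(callable $fn): array {
    $start  = microtime(true);
    $result = $fn();
    return [$result, microtime(true) - $start];
}

// Stand-in workload so the example runs anywhere:
[$picked, $elapsed] = timeIt(function () {
    $ids = range(1, 43000); // pretend these are 43k page IDs
    shuffle($ids);          // crude stand-in for sort=random
    return array_slice($ids, 0, 4);
});

printf("random query time: %.4f (%d items)\n", $elapsed, count($picked));
```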


That's nice, Soma. PW seems to have optimized this as much as possible with the way the DB is laid out. Impressive results, to be honest; I would not have expected this to be almost as fast as $pages->count(). I was thinking about randomizing the offset and then grabbing a page based on that:

$pageCount = $pages->count("template=basic-page");
$startOffset = rand(0, $pageCount-1);
$results = $pages->find("template=basic-page, start=$startOffset, limit=1");

Dunno, maybe this is my noob thinking and it's probably not efficient at all.



This code would work, but it doesn't really give random entries, just a random contiguous slice of them. And query times are about the same.

The strange thing is that in my test I get a longer execution time when I use

$results = $pages->find("template=basic-page, start=$startOffset, limit=1"); // ~0.08

than with

$startOffset = rand(0, $pageCount-4);
$results = $pages->find("template=basic-page, start=$startOffset, limit=4"); // ~0.03

Not sure why.


Dear SiNNuT and Soma,

Very interesting benchmarks. I'm not anywhere near 45,000 pages yet, but some of my clients could be.

I'll Google around... I would think that this question of the efficiency of MySQL's random sort on massive databases would have come up before.

Thanks for all the input.

Peter


There's definitely enough reading material on the interwebs about the relatively poor performance of ORDER BY RAND(). For example: http://jan.kneschke.de/projects/mysql/order-by-rand/ . Quite interesting (but also from 2007; maybe MySQL has improved since?). But seeing Soma's test results, I wouldn't be too worried about it in most real-life scenarios with PW. And since you keep a cache for a day, I would say the problem is non-existent.

In theory, looping 4 times through the 'solution' I proposed (you only have to do the count once) should give random entries and possibly be faster on really large numbers of pages. Unfortunately I haven't got a way to test this properly.
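That loop idea can be sketched without ProcessWire at all: do the count once, draw N distinct random offsets, then fetch one page per offset. The `pickRandomOffsets()` helper below is hypothetical (not PW API); in a real template each offset would feed a query like `$pages->find("template=basic-page, start=$offset, limit=1")`:

```php
<?php
// Sketch of the count-once, loop-N-times approach discussed above.
// Draws $n DISTINCT offsets in [0, $count - 1]; duplicate draws collapse
// because they land on the same array key.
function pickRandomOffsets(int $count, int $n): array {
    $n = min($n, $count); // can't pick more distinct offsets than exist
    $offsets = [];
    while (count($offsets) < $n) {
        $offsets[rand(0, $count - 1)] = true;
    }
    return array_keys($offsets);
}

// e.g. with $pages->count("template=basic-page") === 45000:
$offsets = pickRandomOffsets(45000, 4);
print_r($offsets); // prints four distinct offsets between 0 and 44999
```

Each of the four `start=$offset, limit=1` queries uses a plain LIMIT clause rather than ORDER BY RAND(), which is the point of the approach; whether four small queries beat one ORDER BY RAND() query would still need measuring.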


Dear SiNNuT,

Thanks for that link. I see that your solution was based on this comment:

Teo wrote:

One thing I don't get. Why do we need to generate a random ID at all? Why can't we get the number of rows (rather than max(id)), generate a random POSITION (rather than a random id), and then use that position as the first argument of LIMIT (with 1 as the second argument)? Am I missing something?

And the author replied:

Did that once. Works quite well, even on not uniformly distributed ID values, but only if you want to fetch a single row. Still, you need to generate one random number per row either way.

I agree with your idea - it seems very practical on large data sets.

I'll give it a shot, and see how it performs.

Thanks again,

Peter

