Hello,
I'm developing a 'directory' web app which allows users to add their own listings, and admins to review and approve listings. All user interaction is done on a custom front-end - not using the processwire admin. After a deadline date, all listings will be reviewed and exported as custom markup for creating a print edition.
As part of that process the client needs to be able to identify potential duplicates. I created the script below, which does the following:
Takes in a GET variable assigned to $l - a letter of the alphabet
gets all the listings starting with $l
finds other listings in that PageArray that have the same surname and a matching telephone number (tels is a table field)
Returns an array of 'original' ids with duplicate ids within it
if ($l) {
$l = $sanitizer->selectorValue($l);
$all_listings = $pages->get(1070)->children('include=all,surname^='.$l);
$dups = array();
foreach ($all_listings as $a) {
$existing = $allr->find('surname='.$sanitizer->selectorValue($a->surname).',tels.value='.$a->tels->implode('|','value').',tels.count>0,id!='.$a->id);
if (count($existing)) {
$c = 0;
foreach ($existing as $e) {
if (!count($dups[$e->id]))
$dups[$a->id][$c++] = $e->id;
}
}
}
echo print_r($dups,true);
}
The script above does what it is meant to do. The problem is that there are nearly 5000 listings, and it is really slow - searching through 'A' takes about 15-20 seconds. I'd actually added in the letter filter as it was hitting the PHP timeout without it. I can't add a 'limit=' as in order to find the duplicates it needs to check all the listings.
Are there better ways to do this? I had initially tried a direct MySQL query, but my knowledge of this is pretty basic, and gave up when it got to trying to match the 'tels'.
Cheers (and thanks for the best system going!),
Chris