What I'm looking for is the clubs with the highest ratings average based on number of votes and obviously the highest ratings.
What I have done is the following:
- calculate the variance of the entire population (all votes && all clubs)
- determine the standard deviation of each club against the population variance
- calculate a new
club weighted median
based on subtracting one standard deviation to weed clubs with low votes
The problem is I cannot determine why my data is not showing correctly. I imagine there is an issue with my calcuations. I am getting numbers in the 10s and being negative when I should be receiving a value from 0-5 (as is the rating scores).
I'm not really quite sure where my logic fails.
Here is my ratings code logic:
$sql="SELECT SUM(rating) AS sumrating,COUNT(reviews.rating) AS countrating FROM reviews";
$rs=mysqli_fetch_array(mysqli_query($scx_dbh,$sql));
// get the total summation of ratings against all reviews
$ratingssum=(int)$rs['sumrating'];
// get the total number of ratings against all reviews
$ratingscount=(int)$rs['countrating'];
// get the population mediam
$mean = $ratingssum / $ratingscount;
// determine the variance of the population
$variance = 0;
$sql="SELECT rating AS score FROM reviews";
$rs=mysqli_query($scx_dbh,$sql);
while($row=mysqli_fetch_array($rs)){
$score = (int)$row['score'];
$variance += pow(($score-$mean),2);
}
$variance = $variance/$ratingscount;
// loop through all clubs and implement new rating
$scores=array();
$sql="SELECT locid,COUNT(reviewid) AS locationrecordcount,AVG(rating) AS locationmedian FROM reviews GROUP BY locid";
$rs=mysqli_query($scx_dbh,$sql);
/// begin loop
while($row=mysqli_fetch_array($rs)){
// get the number of review votes for this club
$numvotes=(int)$row['locationrecordcount'];
// get the location id
$locId = (int)$row['locid'];
// find the standard deviation for this club (total variance * numclubvotes)
$standarddev=sqrt($variance*$numvotes);
// create the new rating for this club with 1 standard deviation less
$oldRating=$row['locationmedian'];
$newRating=$oldRating-$standarddev;
$scores[$locId] = array(
'numvotes'=>$numvotes,
'standard-deviation'=>$standarddev,
'original-rating'=> $oldRating,
'weighted-rating'=>$newRating
);
}
usort($scores,function($a,$b){
return $a['weighted-rating']-$b['weighted-rating'];
});
And here are my results:
top 10
[0] => Array
(
[numvotes] => 1121
[standard-deviation] => 68.898321138853
[original-rating] => 4.415700267618207
[weighted-rating] => -64.482620871235
)
[1] => Array
(
[numvotes] => 909
[standard-deviation] => 62.042283630954
[original-rating] => 3.1290979097910174
[weighted-rating] => -58.913185721163
)
[2] => Array
(
[numvotes] => 594
[standard-deviation] => 50.153247058093
[original-rating] => 4.414225589225589
[weighted-rating] => -45.739021468868
)
[3] => Array
(
[numvotes] => 505
[standard-deviation] => 46.243587892712
[original-rating] => 4.090099009900985
[weighted-rating] => -42.153488882811
)
[4] => Array
(
[numvotes] => 517
[standard-deviation] => 46.78979093937
[original-rating] => 4.661025145067699
[weighted-rating] => -42.128765794302
)
[5] => Array
(
[numvotes] => 505
[standard-deviation] => 46.243587892712
[original-rating] => 3.2117821782178173
[weighted-rating] => -43.031805714494
)
[6] => Array
(
[numvotes] => 398
[standard-deviation] => 41.053233483774
[original-rating] => 4.231155778894469
[weighted-rating] => -36.822077704879
)
[7] => Array
(
[numvotes] => 340
[standard-deviation] => 37.944190471069
[original-rating] => 3.9102941176470547
[weighted-rating] => -34.033896353422
)
[8] => Array
(
[numvotes] => 323
[standard-deviation] => 36.983422110177
[original-rating] => 3.261145510835913
[weighted-rating] => -33.722276599341
)
[9] => Array
(
[numvotes] => 280
[standard-deviation] => 34.433791770728
[original-rating] => 3.36767857142857
[weighted-rating] => -31.066113199299
)
[10] => Array
(
[numvotes] => 254
[standard-deviation] => 32.796136967109
[original-rating] => 3.1411417322834665
[weighted-rating] => -29.654995234825
)
Worst 10
[232] => Array
(
[numvotes] => 2
[standard-deviation] => 2.9101865621466
[original-rating] => 4.95
[weighted-rating] => 2.0398134378534
)
[233] => Array
(
[numvotes] => 2
[standard-deviation] => 2.9101865621466
[original-rating] => 5
[weighted-rating] => 2.0898134378534
)
[234] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 4
[weighted-rating] => 1.9421873473882
)
[235] => Array
(
[numvotes] => 2
[standard-deviation] => 2.9101865621466
[original-rating] => 4.8
[weighted-rating] => 1.8898134378534
)
[236] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 3.25
[weighted-rating] => 1.1921873473882
)
[237] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 5
[weighted-rating] => 2.9421873473882
)
[238] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 5
[weighted-rating] => 2.9421873473882
)
[239] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 4.1
[weighted-rating] => 2.0421873473882
)
[240] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 5
[weighted-rating] => 2.9421873473882
)
[241] => Array
(
[numvotes] => 2
[standard-deviation] => 2.9101865621466
[original-rating] => 5
[weighted-rating] => 2.0898134378534
)
)
UPDATE
OK so I recalculated the standard deviation
against the entire population. It is 2.0578126526118
.
Here is my current code:
$sql="SELECT SUM(reviews.rating) AS sumrating,COUNT(reviews.rating) AS countrating FROM reviews";
$rs=mysqli_fetch_array(mysqli_query($scx_dbh,$sql));
$ratingssum=(int)$rs['sumrating'];
$ratingscount=(int)$rs['countrating'];
$mean = $ratingssum / $ratingscount;
$variance = 0;
$sql="SELECT rating AS score FROM reviews";
$rs=mysqli_query($scx_dbh,$sql);
while($row=mysqli_fetch_array($rs)){
$score = (int)$row['score'];
$variance += pow(($score-$mean),2);
}
$variance = $variance/$ratingscount;
$standarddev=sqrt($variance);
$scores=array();
$sql="SELECT locid,COUNT(reviewid) AS locationrecordcount,AVG(rating) AS locationmedian FROM reviews GROUP BY locid";
$rs=mysqli_query($scx_dbh,$sql);
while($row=mysqli_fetch_array($rs)){
$numvotes=(int)$row['locationrecordcount'];
$locId = (int)$row['locid'];
$oldRating=$row['locationmedian'];
$newRating=$oldRating-$standarddev;
$scores[$locId] = array(
'numvotes'=>$numvotes,
'standard-deviation'=>$standarddev,
'original-rating'=> $oldRating,
'weighted-rating'=>$newRating
);
}
usort($scores,function($a,$b){
return (int)($a['weighted-rating']-$b['weighted-rating']);
});
1./ I think my sorting function is incorrect. After sorting using my sorting function, these are the top 5:
[0] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 0.2
[weighted-rating] => -1.8578126526118
)
[1] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 0.05
[weighted-rating] => -2.0078126526118
)
[2] => Array
(
[numvotes] => 4
[standard-deviation] => 2.0578126526118
[original-rating] => 0.7625
[weighted-rating] => -1.2953126526118
)
[3] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 0.1
[weighted-rating] => -1.9578126526118
)
[4] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 0.4
[weighted-rating] => -1.6578126526118
)
As you can see, besides the fact that they are producing negative numbers, it looks like the weighted-average
of position 1 (index 0) is -1.85
and position 2 (index 1) is -2.00
. I imagine there's an issue with either my algorithm or my sorting function in my code or else why are there negative numbers being sorted as first
.
Also, i am getting clubs at positions 1 when they have 1 vote. The purpose of this algorithm was to weed these clubs out so I can focus on the clubs that have 1000's of votes.
Here are the bottom 5:
[237] => Array
(
[numvotes] => 29
[standard-deviation] => 2.0578126526118
[original-rating] => 4.112068965517241
[weighted-rating] => 2.0542563129054
)
[238] => Array
(
[numvotes] => 5
[standard-deviation] => 2.0578126526118
[original-rating] => 3.8800000000000003
[weighted-rating] => 1.8221873473882
)
[239] => Array
(
[numvotes] => 31
[standard-deviation] => 2.0578126526118
[original-rating] => 3.7499999999999996
[weighted-rating] => 1.6921873473882
)
[240] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 5
[weighted-rating] => 2.9421873473882
)
[241] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 4.45
[weighted-rating] => 2.3921873473882
)
The same behavior exhibits on the bottom 5. I've got a weighted-average
of 2.39
for position 5 (index 241) and a weighted-average
of 2.94
for position 4 (index 240)