CRS Pilot, A Problem is Identified - 8/10/23
Executive One-Page Summary (more detail to follow for those who want it).
We completed eight weeks of our Computerized Ratings Pilot. 152 players played 424 games, an average of 11.16 games per player (each doubles game puts four players on the court, so 424 games represent 1,696 player appearances, and 1,696 / 152 ≈ 11.16). In three weeks, our Pilot will end. What have we learned so far (in one-page Executive Format)?
We learned we can move players who are misclassified by 0.25 points (one color level) to the correct level within two weeks.
We learned that upsets are part of pickleball. Two red players will beat two purple players 20% of the time. Two indigo players will beat two green players 20% of the time. This is a good thing!
We learned that it takes 10-25 games for a rating to stabilize under normal circumstances. Most of our pilot players were not placed in normal circumstances; if we roll out a computer rating, circumstances will be normal.
We learned that in a competitive round robin or ladder, it will be important to play against players reasonably equal in ability. The computer rewards winning, but it rewards winning against equal/better players far more than it rewards beating players at a lower level (the sketch after this summary illustrates both this reward structure and the upset odds above).
We learned that players have “fluctuating” computer ratings. In other words, a 2.75 today could become a 3.07 in a month, then revert to 2.75. We all have good days and bad days, good partners and bad partners, good luck and bad luck. The pilot demonstrated that all these factors even out over time.
As an observer, I can see that players who perform well in a computer rating environment keep balls in play … they hit fewer drop shots into the net, they do not paint lines, they do not hit returns of serve out, and they do not hit out balls. These factors will be important if we adopt computer ratings, just as important as the soft skills that mattered before.
We learned we may need a backstop. In other words, a 3.00 player might play in a computerized environment and fall to 2.72. We likely need to do something to protect a player who participates in a computer rating environment; trying and losing should count for more than not trying.
We demonstrated that even if better players refuse to participate, good players can still raise their computer rating high enough to be classified with those better players.
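Since several of the lessons above hinge on how a rating responds to wins and losses, here is the promised sketch. To be clear, this is not the actual KPR formula (which isn't published here); it is a generic Elo-style model, calibrated so that a one-color-level gap of 0.25 points gives the stronger team an 80% expected win rate, matching the upset rates we observed. The step size K is an assumption.

```python
# A minimal sketch, NOT the actual KPR formula (which isn't published here).
# Generic Elo-style logistic model, calibrated so a one-color-level gap
# (0.25 rating points) gives the stronger team an 80% expected win rate,
# matching the upset rates observed in the pilot.
import math

SCALE = 0.25 / math.log10(4)  # calibration: a 0.25 gap -> 80/20 odds
K = 0.05                      # assumed per-game step size; the real value is unknown

def expected_win(team: float, opponents: float) -> float:
    """Probability the first team wins, given average team ratings."""
    return 1 / (1 + 10 ** ((opponents - team) / SCALE))

def update(rating: float, opponents: float, won: bool) -> float:
    """Nudge a rating toward the result. Beating a better team pays far
    more than beating a weaker one; losing to a better team costs little."""
    return rating + K * ((1.0 if won else 0.0) - expected_win(rating, opponents))

print(round(expected_win(3.00, 3.25), 2))          # 0.2 -- the ~20% upset rate
print(round(update(3.00, 3.25, True) - 3.00, 3))   # +0.04  win vs. a better team
print(round(update(3.00, 2.75, True) - 3.00, 3))   # +0.01  win vs. a weaker team
```

Whatever the real KPR formula looks like, any update rule of this general shape produces the behavior described above: beating better players pays far more than beating weaker ones, and losing to better players costs very little.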
An Experiment Regarding Opportunity
On Thursday, I asked all 16 players to participate in an experiment. I rank-ordered all players from 1 to 16 based on KPR. I had the top 8 players play in one group (courts 1-2) and the next 8 players play in another group (courts 3-4).
On courts 1-2, players with the 6th/7th/8th highest KPRs would be forced to play against better-than-average competition.
On courts 3-4, players with the 9th/10th/11th highest KPRs would not be able to play against better-than-average competition; they could not get to courts 1-2.
I measured KPR for these six players before play started, and then again at the end of the day.
Players ranked 6th/7th/8th:
Before KPR: 3.321
After KPR: 3.348
Change: +0.027
Players ranked 9th/10th/11th:
Before KPR: 2.952
After KPR: 2.969
Change: +0.017
Can we agree that the before/after change is essentially the same for both groups? Each group moved by roughly two hundredths of a point, a tiny fraction of the 0.25 that separates color levels.
It means that the players given an “Opportunity” (an opportunity to prove themselves against better players) performed at the same level as players who were shut out of that opportunity.
I ran experiments like this all season. That’s the purpose of the “firewall” I’ve put up this year. The firewall allowed our better players to have an environment where their games were competitive and even. But more importantly, the firewall allowed us to test what happens when you don’t let players play against better competition. Can a computer rating increase if you don’t let players play against better competition? All summer long, your work proved that yes, the computer will move your rating higher even if you cannot play against better competition.
Why is this important?
The biggest complaint from our Maroon/Indigo players in the pilot is this:
“The Aqua/Burgundy/Green players will not play against us. How can we demonstrate we are good enough to move up if the players we’d prove we are good enough against will shut us out this Fall and beyond?”
As I’ve told our Indigo/Maroon audience all summer, we cannot create a system that forces players to do things they do not want to do. If our current Aqua/Burgundy/Green players do not participate in a computerized rating system, fine. That will not stop a computer from properly evaluating our Indigo/Maroon players and rewarding them for good play. If you are an Indigo/Maroon player and you are worried about being shut out by our better players, do not worry about the computer side of the equation.
In other words, the computer will offer opportunities, regardless of who plays in a competitive environment.
An Experiment Regarding Opportunity, Part 2
On Tuesday, I performed an experiment with our Men. No firewall! This doesn’t usually go over well with our better players, but I wanted to see what would happen if I gave players a short-term opportunity … to get to Court 1.
Without a firewall, four players begin on Court 3. The winning team splits and moves up to Court 2. By default, one of those two players will end up on Court 1.
I wanted to see how many players beyond the default number (1) would make it to Court 1.
The answer?
Three! One didn’t get to play a fifth game, but he earned his way there regardless.
Combined, the three players saw their KPRs increase by 0.14 / 0.17 / 0.17 on Tuesday.
I realize these players likely played “keep away” from better players. But that was their job, wasn’t it? What should they do?
We’ve had a chorus of commentary this season about my format. “You can’t put a purple player up against a maroon player, it isn’t a fair test of skill.” We are not testing skill (if we were, we’d be raising players’ ratings or even demoting players); we are testing the probability of an upset. We are testing how a computer deals with odd matchups. We are testing whether a computer can manage our players in a fair manner. If we can learn things when the matchups aren’t even, we’ll be in great shape if we roll out computer ratings under reasonably equal circumstances.
For the Maroon/Indigo/Green players who may have been defeated by these players on Tuesday? Thank you. Thank you! You put your rear ends out there, and when you do that, it isn’t all rainbows and butterflies. I appreciate that you helped us on Tuesday.
Why Would I Risk Lowering My Rating?
One of our volunteers approached me this week with a simple statement.
“My friend says he won’t play in a computerized format if computer ratings are adopted. He says he doesn’t want to risk his status at his current color level. He quit playing in the pilot, and he won’t participate going forward if you adopt computer ratings. What are you going to do about him?”
You know what? I understand this viewpoint. My feet are killing me. If we had computer ratings today, I’d lose everything I “earned” over the past five years. It seems like a harsh penalty.
Of course, I’d get it right back if my feet felt better and I returned to playing at my previous level, but that’s a story for another day.
Here’s something we are all going to have to become comfortable with. All of us play pickleball for different reasons. Some of us want to play with our friends – this is essentially a version of “self-rating”. I play in a couple of groups with Burgundy / Green / Indigo players. We have a blast. So much fun.
Some of us play drop-in. This is also a version of self-rating. I know what the signs say on each court, but the Green/Burgundy/Aqua court always has Purple/Orange/Maroon players on it, regardless of what the sign says. Those players are self-rating, aren’t they?
Some players love playing in round robins. They want equal play, they want guaranteed court availability, they want somebody to organize the activity for them. There’s no reason in the future that there can’t be different versions of round robins, and I’m confident our Board is thinking through reasonable options.
We have 300-400 players each year who like a competitive environment (Ladders). I’ve repeatedly heard from this audience that Ladders is a place where a player can compete and be rewarded independent of color/rating … it’s the only place in our Club that rewards winning regardless of style of play on a weekly basis. It’s not hard to imagine the synergy between Ladders and a Computer Rating.
We have a couple hundred players who like playing in in-house tournaments, and maybe another 100 players who like playing in tournaments outside of PebbleCreek. Another 150ish players play in Leagues outside of PebbleCreek.
As you can see, we have many different players with many different desires. A computer rating can fit into this framework. But we’re going to have to become comfortable with the concept that not all players are going to participate.
Here’s another quote I heard multiple times … “Some players will play computer events if they want to hurt the computer rating of a player they don’t respect.” Yes, that could happen. If we adopt computer ratings, our Board will need to create events where players have opportunities to play many players of comparable skill, spreading the risk out. I’m confident our Board can work through various issues if they decide to implement computer ratings.
No system works perfectly. If we decide to implement computer ratings, we will have challenges, yes. This is my opinion only … the positives outlined during our pilot outweigh the “what ifs” and negatives identified to date. I’m proud of all of you and the work you’ve put in this summer to help our Club!
KPR Change by Color Level
To date, here is how KPRs have changed from the start of the pilot, by color level.
Aqua (3 players): -0.019
Burgundy (9 players): -0.039
Green (17 players): 0.000
Indigo (15 players): -0.009
Maroon (25 players): -0.022
Orange (26 players): -0.060
Purple (20 players): +0.066
Red (24 players): +0.042
Teal (11 players): +0.025
Unranked (2 players): +0.025
What we see is very little change by color level. In fact, if you weight each color’s change by its player count, the gains and losses cancel out exactly (a quick check follows below); ratings points are exchanged between players, not created. Again, we’ve proven all year that our colors are not bad. Yes, there are individuals who are not assigned to the proper color level … but the overall ranking by color is likely fair for 75% of our members.
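Here is that quick check, using nothing but the numbers in the table above:

```python
# Weight each color level's KPR change by its player count and sum.
# If ratings points are only exchanged between players, the total is zero.
changes = {
    "Aqua": (-0.019, 3),    "Burgundy": (-0.039, 9), "Green": (0.000, 17),
    "Indigo": (-0.009, 15), "Maroon": (-0.022, 25),  "Orange": (-0.060, 26),
    "Purple": (0.066, 20),  "Red": (0.042, 24),      "Teal": (0.025, 11),
    "Unranked": (0.025, 2),
}
total = sum(change * players for change, players in changes.values())
print(round(total, 6) == 0)  # True -- the pilot's gains and losses balance exactly
```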
We do have changes happening in Purple/Red … there are players “on the rise”, and these players performed increasingly well during the summer. These players are improving! These players are getting chances to play Orange/Maroon players, and they are winning some of those games. As a result, individuals are moving up, taking ratings points away from players above them.
Outside of learning about the prevalence of upsets, the most important thing we learned about a computer rating is its ability to catch a player on a meteoric rise and properly place that player with comparable competition. The player won’t have to repeatedly ask a rater to be evaluated, and the player won’t be held back for hitting 11 effective drop shots out of 20 instead of 12 effective drop shots out of 20. The computer will facilitate the meteoric rise of the player until the player hits a ceiling requiring acquisition of new skills, while at the same time helping players who are on a slower but more traditional trajectory of player development. The sketch below shows what that catch-up looks like.
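As a rough illustration, consider a player whose real ability jumps a full color level. Under the same assumed Elo-style model sketched earlier (again, not the actual KPR formula), the rating closes most of that gap within a couple dozen games:

```python
# Simulate a rating chasing a player whose true skill jumped from 3.00 to 3.25.
# Same assumed logistic/K model as the earlier sketch; numbers are illustrative.
import math
import random

SCALE = 0.25 / math.log10(4)   # a 0.25 gap -> 80/20 expected odds
K = 0.05                       # assumed per-game step size

def p_win(skill: float, opp: float) -> float:
    return 1 / (1 + 10 ** ((opp - skill) / SCALE))

random.seed(1)
rating, true_skill = 3.00, 3.25    # the rating lags the player's real ability
for game in range(1, 26):
    opp = rating                   # matched against players near the current rating
    won = random.random() < p_win(true_skill, opp)  # wins ~80% while underrated
    rating += K * (won - p_win(rating, opp))        # expected score vs. equals is 0.5
    if game % 5 == 0:
        print(f"game {game:2d}: rating {rating:.3f}")
# The rating climbs quickly while the player is underrated, then levels off
# near the new true skill -- no rater evaluation required.
```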
Did the Computer Fail, Did the Format Fail, or Was It Something Else?
One of our loyal players contacted me. His rating is not moving up very fast, in spite of winning 60% of his games.
I created what I call a “Player Resume”. For every player in our pilot, I can calculate wins/losses vs. each player and vs. players at different color levels. Combined, your performance is your “Resume”.
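For the curious, the calculation behind a Resume is simple aggregation. Here is a minimal sketch, with made-up names, records, and a made-up data layout, since the pilot’s actual data format isn’t shown here:

```python
# Roll raw game records up into win/loss splits vs. each opponent and
# vs. each color level. Names, offsets, and results below are invented.
from collections import defaultdict

# (opponent, opponent's color level relative to this player, did we win?)
games = [
    ("Pat", -1, True), ("Pat", -1, True), ("Sam", -1, False),
    ("Lee",  0, True), ("Lee",  0, False), ("Sam", 0, True),
]

vs_player = defaultdict(lambda: [0, 0])  # opponent -> [wins, losses]
vs_level = defaultdict(lambda: [0, 0])   # color offset -> [wins, losses]
for opponent, offset, won in games:
    vs_player[opponent][0 if won else 1] += 1
    vs_level[offset][0 if won else 1] += 1

for offset, (w, l) in sorted(vs_level.items()):
    label = {-1: "one level below", 0: "same level"}.get(offset, f"{offset:+d} levels")
    print(f"vs {label}: {w}-{l} ({w / (w + l):.0%})")
```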
I reviewed the “Resume” for this player.
This player won 85% of games against players one color level below him.
This player won 67% of games against players equal to him.
That’s a good resume! I mean, come on.
What went wrong?
Did the computer fail? Kind of. This player played 35% of his games against players one color level below him. He probably should have gotten more credit (however, the KPRs of the players he faced were more than 0.25 below his, leaving him very little room to earn ratings points).
Did the format fail? Kind of. This is on me. I repeatedly placed this player in a starting position against competition below his level … I didn’t even realize I was doing it. I tried to fix the problem this week by placing the player against much harder competition. The player won 1 game and lost 2 games … and saw his KPR INCREASE! Against opponents that much better, the computer expects roughly a 20% win rate, so winning one game of three beats the expectation and the rating rises. This is what I mean when I say that I put players in harm’s way this season. This is not a test of player ability; I have too much control over where your pilot-based rating ends up. I constantly hurt this player’s ability to earn KPR ratings points.
Did registrations fail? Yes. In other words, a summer pilot at 9:30 am during the hottest summer in Maricopa County history caused too few players at any color level to register. As a result, our players are not being given enough “equal” matches … we simply don’t have enough courts and enough registrants to provide a suitable quantity of “equal” matches.
So yes, the process (and I) failed this individual.
The pilot, in aggregate, succeeded. Averaging players who got the short end of the stick with players who got favorable matchups and everybody in-between yielded very good, usable results.
We learned some good stuff this week! Every week yielded something positive. We’ve also revealed problems (as mentioned above). You deserve to know everything that worked, and you deserve to know about the failures.