We completed eight weeks of our Computerized Ratings Pilot.
June 6/8 = Practice Week
June 13/15 = 1st Official Week, Scores Entered into DUPR via Phone
June 20/22 = 2nd Official Week, Scores Entered into DUPR via Administrator
June 27/29 = 3rd Official Week, All Players Must Participate in DUPR
July 11/13 = 4th Official Week, Thursday Women’s Session Cancelled due to Heat
July 18/20 = 5th Official Week, Team-Based Events
July 25/27 = 6th Official Week
August 1/3 = 7th Official Week, Pick-Your-Partner Events
146 of our Club Members have played at least one game from June 13 – August 3. Collectively, you have played a total of 384 games. Since each game involves four players, that works out to an average of 10.5 games per player (384 × 4 / 146).
There are four weeks left.
August 8/10 = King of the Hill Format
August 15/17 = Pick-Your-Partner Event
August 22/24 = King of the Hill Format
August 29/31 = CHAMPIONSHIP TUESDAY/THURSDAY
Mark August 29/31 on your Calendar. Originally, I scheduled a team-based championship, but the team format did not work well, so it has been replaced with CHAMPIONSHIP TUESDAY and CHAMPIONSHIP THURSDAY. Every player who signs up on 8/29 and 8/31 will be rank-ordered by our KPR rating, then assigned to Courts 1/2/3/4/5/6/7/8. The two players who win the final game on Court 1 will earn the Championship and will receive a Culver's Gift Card. More importantly … the player who advances the most courts (i.e. starts on Court 7 and ends on Court 3) will win the HARD CHARGER award and will earn a more valuable Culver's Gift Card.
It's just a little way for me to thank all of you for helping our Club pilot computerized ratings this summer. Your work has been so … darn … valuable. Meaningful. Important. I am indebted to every person who played one event. You volunteered your time (often in incomprehensibly hot conditions) to lose games while trying to help our club understand what role a computer plays in evaluating players.
We have learned that a computer does a pretty good job of evaluating players. This fact is unavoidable now. Our sample size is big enough to say this.
Allow me to take a brief detour here, one that will get us back to computer ratings. I spent 35 minutes on the phone this afternoon with a person in charge of ratings at another club, one that is about half of our size. This club went from the typical 2.5 / 3.0 / 3.5 / 4.0 / 4.5 rating (with three control players and one player being tested with a ratings observer, three games played in one hour) to a self-rating process (Beginner, Intermediate, Intermediate-Plus, Advanced). The person I spoke with told me about cheating happening (i.e. control players playing poorly to help get a player moved up, control players playing super hard to push somebody back down), which led this club to move to self-ratings. I asked him if self-ratings were better than 2.5 / 3.0 / 3.5 / 4.0 / 4.5 with one player and three control players, and he said “yes”. That statement baffled me. He told me that too many control players conspired with those being tested to either keep players out or get players moved up. Self-rating eliminated cheating.
I asked the individual whether any problems were created by self-ratings. He was very clear about the problems.
All players, even Beginners, thought they belonged one level higher.
The Beginner round robin dried up. People wouldn’t play because they thought they were better than “beginners”.
Intermediate players quit playing the Intermediate round robin because the Beginner players all moved into the Intermediate round robin – so the Intermediate players moved up to Intermediate-Plus.
The Intermediate-Plus players quit playing their round robin because they didn’t want to play with the Intermediate players, so they all joined the Advanced round robin.
The Advanced players revolted: they didn’t want to play with the Intermediate players, so they demanded an Advanced-Plus round robin or they’d just quit organized play altogether. An Advanced-Plus round robin was created to placate the best players.
Doesn’t that sound like fun? Welcome to self-ratings!
And this guy thought the self-rating system worked BETTER than having three control players with one player being tested and an observer judging performance because of the cheating involved with control players.
This individual asked me a lot of questions about our Pilot. At the end of the phone call he told me the following … these are direct quotes.
“If you have more than 140 players playing in a Computerized Ratings Pilot program with no benefit to them during the hot summer months, you have a pretty special Club.”
“I’ve talked to a lot of Clubs. It is obvious everybody is trying to solve the control-group issue and self-rating issue, and that a computer rating is likely to be adopted by clubs in the future as a solution to two methods that have flaws.”
Interestingly, this guy was thinking of implementing DUPR for all players 3.75 and above to get around the foolish decision DUPR made to start all players at 3.50.
Ok, that was a long detour, wasn’t it?
The point of the story is that we are working through this Pilot, and we are not alone. There are problems with control groups – it was fascinating to hear another ratings leader talk about control players either making it easy for friends or making it hard for people they didn’t like. A computer rating gets us past that problem. At the same time, the individual clearly outlined a reason for self-ratings to fail … players inflating their own value. A computer rating gets us past that problem.
Here's one final quote from the individual.
“We simply cannot participate in a world where we have volunteer raters doing a ton of work in good faith and then being yelled at by our members.”
This is another reason why we have to Pilot computerized ratings.
I’ve said this before … you are Pioneers. You are piloting something that is causing other clubs to have their eyes on us. Your work this summer matters.
Style vs. Winning
Have any of you watched the movie “Moneyball?” The movie tells the story of the 2002 Oakland A’s. The A’s lost their best players to free agency (i.e. they couldn’t afford to keep their best players). This team embraced “computer ratings” … they let a computer determine the value of players, then they spent their limited player budget on players who were undervalued. The team spent 1/6th as much as the New York Yankees and won the same number of games as the Yankees won.
There are so many parallels between the movie “Moneyball” and this Pilot.
A computer rating changes how we look at players. Any subjective ratings system incorporates “style.” Raters might ask you to hit drop shots perfectly to move up to a higher level within a subjective rating system. Hitting drop shots effectively is important. But it is a portion of pickleball, not all of pickleball. If you are bad at drop shots but are great at getting to the kitchen via good 5th shots and 7th shots, you are effective and you will win games. At some point, you’ll need to get better at drop shots to play against high-level competition, sure. Many of you told me early in the Pilot you didn’t like having somebody stand next to your court with a clipboard in hand, tallying drop shot effectiveness when drop shots were a fraction of all balls hit.
A subjective system rewards skills, then makes the assumption that skills translate to wins.
A computerized rating system cares about wins, and cares about how likely you are to win a game. Here is how a computer rating rewards teams:
3.50 Team Beats a 3.00 Team = 0.001 ratings point exchanged
3.50 Team Beats a 3.20 Team = 0.01 ratings points exchanged
3.50 Team Beats a 3.35 Team = 0.02 ratings points exchanged
3.50 Team Beats a 3.43 Team = 0.03 ratings points exchanged
3.50 Team Beats a 3.50 Team = 0.05 ratings points exchanged
3.43 Team Beats a 3.50 Team = 0.07 ratings points exchanged
3.35 Team Beats a 3.50 Team = 0.08 ratings points exchanged
3.20 Team Beats a 3.50 Team = 0.09 ratings points exchanged
3.00 Team Beats a 3.50 Team = 0.099 ratings points exchanged
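The pattern in the table (complementary exchanges always summing to 0.10) is what you would see from a standard Elo-style update. The actual KPR formula isn't spelled out here, so the logistic curve and the constants below are my assumptions, chosen to roughly reproduce the table:

```python
# A minimal Elo-style sketch that approximates the exchange table above.
# K and SCALE are assumed constants, not the actual KPR parameters.

K = 0.10      # maximum points exchanged in a single game
SCALE = 0.25  # how quickly a rating gap translates into win probability

def expected_win(team_avg: float, opp_avg: float) -> float:
    """Logistic estimate of the first team's chance of winning."""
    return 1.0 / (1.0 + 10 ** ((opp_avg - team_avg) / SCALE))

def exchange(winner_avg: float, loser_avg: float) -> float:
    """Ratings points transferred from the losing team to the winning team."""
    return K * (1.0 - expected_win(winner_avg, loser_avg))

print(round(exchange(3.50, 3.00), 3))  # heavy favorite wins: ~0.001
print(round(exchange(3.50, 3.50), 3))  # even match: 0.05
print(round(exchange(3.00, 3.50), 3))  # big upset: ~0.099
```

Notice that complementary results always sum to 0.10, exactly as in the table: the less likely the result, the larger the exchange.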
When we have mismatches in our pilot, the computer quickly diagnoses the mismatch and does not reward the winner.
When we have upsets in our pilot, the computer immediately rewards the winner by adding as much as 0.09 ratings points. However, the increase comes at a cost. Because the player won via upset, the player now carries a higher rating and will be favored in matches the player would previously have been expected to lose. If the player then loses, the player gives back more ratings points than before, resetting the rating closer to where it was before the original upset happened.
This is a big deal. The only way the player will be able to increase a computer rating is to win consistently or upset better teams. The way the player will win consistently is to be fundamentally sound in MANY ASPECTS of pickleball within a level, not just a handful of chosen attributes.
Can I show you a series of graphs of actual player ratings in our Pilot? What I am going to show you is what happens when we ignore “style” and focus instead on wins and losses.
This is the progression of a player in our pilot. I anonymized the rating to protect the player. But look at what happens to the computer rating over time. Remember, KPR is the computer rating we are piloting this summer, a rating that could eventually be imported into Court Reserve or will be similar to what Court Reserve will offer us later this year.
This player started at 3.00, fell quickly to about 2.93, then went on a tear and got the rating up as high as 3.225. Several bad games lowered the rating to 3.049, more good games raised the rating to 3.211, and as of this week the player is at 3.119.
After about 15 games, this player settled in at a level. The level is higher than our Club classified the player (3.000). But the player isn’t at a 3.250 level where the player would technically move up one color level. The player gets close to the higher level, then gets batted down. Over the last 15 games … the player wobbles around a 3.150 level. In other words, the player likely needs to acquire additional skills to improve the rating.
This is where the Computer Rating and Player Development fuse together.
The player is clearly better than where the player was assessed … but the player is not a full “color level” better. After the computer rating figured out where the player “should be”, the player performed as expected … some wins, some losses, some upsets earned, some upsets suffered.
This player won 72% of games when the player was favored to win.
This player won 45% of the games where the player was the underdog (was not supposed to win).
The data makes it clear, however, that if this player wants to “punch through” and get to a higher level, the player will likely need to add skills. As the player builds a “resume,” the player basically wobbles +/- 0.15 ratings points above/below a baseline. At first, the computer greatly helped the player move up from an initial starting point. Now the computer is holding the player back because the player is not beating better competition often enough.
Here’s another anonymized case study from our pilot.
It took the computer ten games to find where this player should be; since then, the player has had good days and not-as-good days. The player started at 3.000, got as high as 3.375, and is currently at 3.188.
We can see how the computer quickly fixes issues, then adjusts up-and-down as the player works through his/her journey.
Also notice an important point here … this player increased KPR by nearly 0.40 points (about 1.5 color levels). The player, however, didn’t stay there. This is something we are going to have to become comfortable with. In a color-based world, the player would move up and stay up. In a computer-based world, the player could move up and down. As a Club, we’re going to have to give thought about what “moving down” means.
Here is the story of a player that had less success (again, anonymized).
This player was upset four consecutive times (and six times in seven games), lowering the KPR from 3.025 to 2.780. The odds of being upset four consecutive times are really low (about 1 in 625, since a roughly 20% upset chance per game gives 0.20^4 = 1/625), so this player could be a victim of bad luck. Nonetheless, we need to pay attention to these cases. Did the heat impact the player? Is the player injured?
This player performed at expectations for seven games and then had a difficult stretch of games. As a club, do we penalize this player if we elect to have non-competitive/fun round robins? Or do we build a “cushion” so that this player can participate in competitive events and risk a lower KPR without risking status in a non-competitive/fun round robin? One could easily envision a world where this player would stop playing in competitive events for fear of losing status in a non-competitive/fun round robin.
What this player suffered through might be unique to our Pilot. In a post-Pilot world (if we adopt computerized ratings), players would be playing against competition reasonably close to equal to the player. In other words, the player is highly unlikely to be in a situation where four consecutive upsets happen.
This is why we run a summer-long Pilot. If we don’t run a Pilot, we don’t observe this individual case study, and we don’t have an answer for the problem if it happened in the real world.
One of the best things about executing a Pilot is learning something unexpected.
In prior writeups, I discussed the concept of Upsets. The computer averages the ratings of a partnership (3.75 and 4.00 = 3.875 average), then compares the average to the average of the opposition (4.00 and 4.25 = 4.125 average). In this case, the Indigo/Green team is the underdog, the Green/Burgundy team is the favorite.
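The averaging step is simple enough to sketch with the ratings from the example above (the function name is mine, not part of KPR):

```python
def team_rating(partner_a: float, partner_b: float) -> float:
    """A team's rating is the simple average of its two partners' ratings."""
    return (partner_a + partner_b) / 2.0

indigo_green = team_rating(3.75, 4.00)    # 3.875
green_burgundy = team_rating(4.00, 4.25)  # 4.125

# The lower-rated team is the underdog; the higher-rated team is the favorite.
underdog = "Indigo/Green" if indigo_green < green_burgundy else "Green/Burgundy"
print(underdog)
```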
Our Pilot proved that upsets happen. In fact, upsets are supposed to happen. If they didn’t happen, why would we bother playing the game? Upsets are not proof a player deserves to be at a higher level. Upsets are instead managed by the computer.
Upset percentages have remained constant throughout the Pilot.
Ratings Difference of 0.10 point to 0.25 point = 29% chance of upsetting the better team.
Ratings Difference of 0.25 point to 0.50 point = 18% chance of upsetting the better team.
Upset Percentage in the Pick-Your-Partner Event? 21%.
Across 384 games, the computer has done a fabulous job of picking winners/losers, rewarding players who win consistently, and rewarding players who upset the opposition at rates higher than expected. A Pilot, executed over time, smooths out the bumps, odd outcomes, strange matchups, and targeting issues.
If we move forward with Computerized Ratings, we need to teach those who didn’t participate in the Pilot that upsets happen. It will no longer be good enough to say “I beat Janette once last week, I deserve to be moved up.” No more. You are supposed to beat Janette occasionally. It will become important to build a resume over 20-30 recent games. If you play 30 games, and you are supposed to get beaten handily in ten of the games and you win two of those ten games, you are simply doing your job. That is what is supposed to happen. If you win four or five of those ten games where an upset is possible, the computer will reward you and you will likely move up 0.25 ratings points as a result.
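The "two of ten" arithmetic above can be checked with a simple binomial model. The flat 20% upset chance is my assumption, in line with the Pilot's observed 18–29% upset rates:

```python
from math import comb

P_UPSET = 0.20  # assumed chance the underdog wins any single game
N_GAMES = 10    # underdog games within a 30-game resume

def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability of winning at least k of n games, each won with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Expected upset wins: 10 * 0.20 = 2, so two wins in ten is "doing your job".
print(N_GAMES * P_UPSET)

# Winning four or more of those ten is uncommon, which is why the computer
# rewards it with a meaningful ratings bump.
print(round(prob_at_least(4, N_GAMES, P_UPSET), 2))  # ~0.12
```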
Your Starting Point Matters
The two people I’m about to reference will recognize what I’m about to say.
I made two mistakes while assigning starting ratings to our volunteers.
I mistakenly rated one player 0.25 points above his level. As we worked through the first six weeks of the pilot, his rating decreased by about 0.40 points. When I corrected his starting point and re-calculated his computer rating, his rating decreased by about 0.10 points (and has rebounded since).
This is important. It means that the computer worked hard to move this player to the place where the player actually resided. However, my mistake caused the players this person faced to be rated, in aggregate, 0.30 points higher than they should have been. This person played about ten people, meaning each of those players was rated roughly 0.03 points higher than deserved. I caused this problem. The players impacted did not cause this problem.
A second problem: one player was “unrated”, so I assigned a 2.50 rating. After 10-15 games, this player moved up to 2.80 while consistently playing 3.25 players and winning often. I re-ran my computer rating algorithm, starting this player at 3.25 instead; now, this player is working toward 3.50. Because I initialized this player incorrectly, the player advanced fast (but not fast enough), and the players the individual faced collectively lost 0.30 ratings points (about 0.05 points per player) to make up for my mistake. That is not fair to those players, and it is the primary reason why I don’t advocate moving anybody “up” a color level because of performance in our Pilot: the Pilot isn’t designed to move players up when issues like this are identified.
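Both mistakes come down to the same zero-sum bookkeeping: whatever a mis-seeded player gains on the way to the correct level comes out of the opponents' ratings. A toy simulation makes this visible (the logistic update here is an assumed Elo-style stand-in, not the actual KPR algorithm):

```python
# Toy simulation: a player seeded at 2.50 who actually plays at a 3.25 level
# repeatedly beats a 3.25 opponent. Every point the newcomer gains is a point
# the opponent loses; the total rating in the system never changes.

K = 0.10      # assumed maximum points exchanged per game
SCALE = 0.25  # assumed rating-difference scale

def expected_win(a: float, b: float) -> float:
    """Logistic estimate of player a's chance of beating player b."""
    return 1.0 / (1.0 + 10 ** ((b - a) / SCALE))

def play(winner: float, loser: float) -> tuple[float, float]:
    """Return updated (winner, loser) ratings after one game."""
    delta = K * (1.0 - expected_win(winner, loser))
    return winner + delta, loser - delta

newcomer, opponent = 2.50, 3.25
for _ in range(5):
    newcomer, opponent = play(newcomer, opponent)

# The newcomer's total gain exactly equals the opponent's total loss.
print(round(newcomer - 2.50, 2), round(3.25 - opponent, 2))
```

In the Pilot the loss was spread across roughly ten different opponents (about 0.05 points each) rather than concentrated on one, but the conservation principle is the same.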
If we adopt computer ratings, we need to be “reasonably” good at assigning an initial rating. We cannot be like DUPR and mistakenly start everybody at 3.50. We must get the initial rating within +/- 0.25 points if we want a computerized ratings system to be successful. Our colors represent a good starting point (though we have individuals who will vociferously disagree with me). New players to our Club will need to be evaluated and given an introductory rating that is directionally appropriate.