CRS Pilot, Upsets Happen - 6/30/23
We have completed four weeks of our Computerized Ratings Pilot.
June 6/8 = Practice Week
June 13/15 = 1st Official Week, Scores Entered into DUPR via Phone
June 20/22 = 2nd Official Week, Scores Entered into DUPR via Administrator
June 27/29 = 3rd Official Week, All Players Must Participate in DUPR
This week, we asked all players to participate in DUPR. We had 59 players participate this week, of whom 57 had DUPR accounts. One non-DUPR player was recruited from drop-in because a registered player did not show up. Overall, we conclude that our volunteers overwhelmingly supported our effort. I’m proud of all of you!
Through three weeks of officially counted games, 121 players supported this pilot. Our community rallied, helping us measure something that may or may not benefit each individual player. Make no mistake – what you’ve done over the past three weeks will make a difference. Your games have meaning; they contribute to a bigger goal. Many of you chose to sacrifice wins in an effort to get our Club to a better place. How many of you started on Court 5, lost two games, ended up on Court 7, and thought “this isn’t fun!”? Regardless, you helped us. Not every Club in the United States has players whose selflessness benefits somebody else.
DUPR
When we conceived this pilot a few months ago, DUPR used a different ratings process.
Players were introduced at varying levels … beginners around 2.50 – 3.00, advanced players between 4.50 – 5.00.
You could lose a game against better competition and your DUPR could improve.
Three weeks ago, DUPR changed their algorithm.
It is very similar to the KPR algorithm we are using in parallel.
DUPR is using what is called “Elo” to measure players … essentially determining the odds of a worse team beating a better team. This closely aligns with the UTPR formula used in USA Pickleball Tournaments.
All new players are assigned a 3.50 rating.
The last bullet point is a problem.
Almost none of us are 3.50 players. Some are 2.50 players. Some are 3.25 players. Some are 3.75 players. Some are 5.00 players. But few of us represent the rating that DUPR now assigns to players who claim an account.
This means that it takes a sample of games to properly adjust an introductory 3.50 rating. One of the reasons we went with a DUPR-only week was to see how many games it will take to adjust a 3.50 rating to a rating that is accurate and represents who the player is.
It is going to take a lot of games to adjust a 3.50 rating to a rating that is accurate and represents who the player is.
Through Tuesday/Thursday, most of our players made modest progress toward their actual DUPR. Of the 42 new players who started at 3.50, 17 saw their DUPR change by +/- 0.10 ratings points. In other words, if you were a 4.25 player, you started at 3.50 and you had close to a 50/50 chance of moving by 0.10 ratings points (to 3.60 if you were lucky, to 3.40 if you were unlucky). If everything went perfectly over time, it would take a few months of weekly play for your DUPR to adjust to the level where it should be.
DUPR appears to give favorites more credit than the KPR formula. If you are highly favored to win a match, DUPR appears to give you 0.01 – 0.03 ratings points if you win, whereas KPR (which is the UTPR formula used in USA Pickleball Tournaments) would give you 0.001 – 0.01 ratings points. One of our players was highly favored in all five games … winning three and losing two. This player’s DUPR increased by about 0.07 ratings points across the three wins and decreased by about 0.15 ratings points across the two losses. Overall, this player’s DUPR decreased by 0.08 ratings points despite winning three games and losing two games.
DUPR is not fixing old ratings. We have several players with DUPR ratings below 3.00. These players are competing against players who were assigned 3.50 ratings. The 3.00 player has to win five consecutive games against the 3.50 player in order for the two players to equalize, and once the players equalize, the player who drops from 3.50 to maybe 3.40 shouldn’t drop to 3.40 but has to in order for the 3.00 player to get up to 3.40. This creates another series of errors that require a large sample of DUPR games to self-correct.
Our pilot (to date) proved that upsets happen. Nothing happens according to plan. Purple players beat Indigo players. Red players beat Orange players. This comment does not prove that the color system was bad – in fact, it proves quite the opposite. Our data shows that we have feisty players who seize opportunities to defeat better players. In a DUPR world, this fact requires many more games in a sample to achieve an accurate outcome. When a better player drops 0.08 ratings points by losing to a player not rated as high, the player must win four (4) additional games to equalize for the upset. Think about it this way. We have had three (3) Aqua players support us by playing in our pilot. These players won 11 games and lost 3 games. According to the way that DUPR is scoring our players, our Aqua players would gain about 0.02 ratings points for each of the 11 wins (11*0.02 = 0.22 ratings points) while losing 0.08 ratings points for each of the 3 losses (3*0.08 = 0.24). Our Aqua players essentially do not experience a DUPR gain. Maybe they shouldn’t experience a DUPR gain. However, the same issues happen among Teal, Red, Purple, Orange, and Maroon players. If these players don’t play equal/better players, they will have a harder time increasing their DUPR given the current process.
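The Aqua arithmetic above can be checked with a few lines of Python. This is only a sketch: the per-game figures (about 0.02 gained per win, about 0.08 lost per upset loss) are the approximate values quoted in this post, not DUPR's actual formula, and the function names are made up for illustration.

```python
def net_change(wins, losses, gain_per_win, loss_per_loss):
    """Net ratings change after a run of games, using fixed per-game amounts."""
    return wins * gain_per_win - losses * loss_per_loss


def break_even_win_rate(gain_per_win, loss_per_loss):
    """Share of games a player must win for gains and losses to cancel out."""
    return loss_per_loss / (gain_per_win + loss_per_loss)


# Approximate per-game values quoted in this post (assumptions, not DUPR's formula).
GAIN_PER_WIN = 0.02
LOSS_PER_LOSS = 0.08

aqua_net = net_change(11, 3, GAIN_PER_WIN, LOSS_PER_LOSS)
needed = break_even_win_rate(GAIN_PER_WIN, LOSS_PER_LOSS)

print(f"Aqua net change: {aqua_net:+.2f}")
print(f"Win rate needed to break even: {needed:.0%}")
```

Under those assumed per-game amounts, a heavy favorite has to win about 80% of their games just to hold steady. Our Aqua players won 11 of 14 (about 79%), which is why their estimated net change comes out slightly negative.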
We’ve learned that it will truly take 30-40 matches (maybe more) for a player to record a DUPR that accurately reflects their actual skill level. This makes it very difficult for a Club like PebbleCreek to use DUPR.
Now, having said that … DUPR is important in inter-community play. If you wanted to go to Surprise or Mesa and play against comparable players, your competition will review your DUPR rating. There are very good reasons for PebbleCreek to encourage players to record matches via DUPR. You are always welcome to record your own matches via phone (which several groups are now doing). Your matches will be recorded at 50% of the value that they will be recorded at if I enter your matches (which I cannot do unless your matches are being supervised … by me). If you want to find a tournament partner outside of PebbleCreek, DUPR is an important metric to find a trusted partner. For players with 30+ games within DUPR, DUPR allows PebbleCreek to evaluate new players coming into our community.
In other words, DUPR is not going away. We need DUPR.
We may not need DUPR to evaluate all of our players, however (to be determined). Who knows, maybe DUPR will change their ratings rules again in three months and DUPR becomes the best ratings system available to us. As long as DUPR assigns new players at a 3.50 level, however, DUPR is problematic for use with nearly 1,700 players.
Didn’t You Promise us a CRPR Rating from Court Reserve?
Yes, we promised that Court Reserve will have a ratings system. We promised that their ratings system would be applied to your games, then entered into their system. We await their system.
Because their ratings system will likely employ the spirit of an Elo-based ratings system (like the KPR system we are running in parallel), we can learn how their system will behave by running our KPR system through this trial.
Did Three Weeks of KPR (pronounced “caper”) Tell Us Anything?
Absolutely!
Among players with 8+ games:
12% moved up at least 0.25 ratings points (i.e. one color level).
12% moved down at least 0.25 ratings points (i.e. one color level).
Red and Indigo players were most likely to be mis-classified (i.e. should be one color level higher than they currently are).
This proves two things at the same time:
Our Color-Based system is not as bad as some of our players perceive it to be. For the most part, colors are accurate. Individual placement of some players is inaccurate. Just because you (individually) might be better (or worse) does not invalidate the 75% of players who are close to properly classified.
Those who were negatively impacted by colors (i.e. their skills were not properly evaluated by our raters) can change their destiny within three weeks by participating in a process where a computer adjusts a rating.
Within just three weeks of our pilot, we’ve demonstrated that use of a computerized ratings process can help players who were under-valued by the color system. The under-valued player can earn 0.25 ratings points within three weeks, enough to be moved up one color level. This is probably the most important fact we could have hoped to prove … a computer can correct injustices.
Of course, you must do your part. You need to win your games, and you do not need to win games against better players for the computer to adjust your ratings level. You just need to win. Our pilot proved that you do not have to have better players participating to see your ratings level adjust up. You simply need to win.
Luck?
Yes, I was able to demonstrate that luck plays a weekly role in computerized ratings. Allow me to re-state that comment. Luck is prevalent in pickleball. Evaluating a player on a one-day basis is probably not the best formula to assess readiness for a new level.
If a player had a great week (3-1, 4-1, 4-0, 5-0) the player only has a 36% chance of having a great week the following week.
If the player had a poor week (1-3, 1-4, 0-4, 0-5) the player also has an approximately 36% chance of having a poor week the following week.
The odds of having back-to-back-to-back good weeks are 1 in 21. In other words, if you have three good weeks in a row, it probably isn’t by accident. If that happens there is a 20-in-21 chance you are better than the color level assigned to you.
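Here is a quick sketch of where a figure like 1 in 21 can come from. It assumes every week is an independent event with the same 36% chance of being a great week; that independence is a simplifying assumption of the sketch, and the names are illustrative.

```python
# 36% is the rate reported in this post; independence between weeks is assumed.
P_GREAT_WEEK = 0.36


def chance_of_streak(p, weeks):
    """Probability of the same outcome in `weeks` consecutive, independent weeks."""
    return p ** weeks


three_in_a_row = chance_of_streak(P_GREAT_WEEK, 3)
one_in = 1 / three_in_a_row

print(f"Three great weeks in a row: {three_in_a_row:.3f} (about 1 in {one_in:.0f})")
```

With those inputs the result is a little under 5%, which rounds to about 1 in 21.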
Realistically, after accounting for “bad partners” and bad luck, if you can jump a quarter point within five weeks, you’ve accomplished something that our Club should recognize.
Bad Partners?
In the past week, I was peppered by comments about having “bad partners”.
Over time (i.e. a month or five weeks), the distribution of what you consider to be “bad partners” equals out. Over time, you are accountable for your performance. If you aren’t winning, the computer will not increase your ratings level.
Several of you have told me that you cannot play with “players that are not as good as me”.
As you become better at pickleball, you end up in more situations where you do not get balls because your partner is a weaker player. It becomes your job to take games over. Your returns of serve should be hit in places that force your opposition into bad situations, feeding easy third-shot put-aways to your partner. You fully control the return of serve when the serve is hit to you. Use your return of serve to hit balls down lines to the backhand of the opposition.
Use third shots to your advantage. If the opposition chooses to hit the return of serve to you, make the opposition pay. Hit a ground stroke on a short return of serve. Hit your third shot drop to the backhand of the opposition. If you do not get balls, take full advantage of the balls that are hit to you.
Our three Aqua players, who are clearly better than everybody else, won 11 games and lost 3 games in the pilot. These players barely got any balls, period. How did they win 79% of their games?
I will not accept “weaker partner” arguments going forward. We now have a sample of games over three weeks. We demonstrated that players who were undervalued by the color system were able to increase their KPRs regardless of partner or opponent.
Upsets!
We continue to see upsets happening. Upsets happen in sports. An upset occurs when a team that is not supposed to win … wins! If upsets never happened, there would be no reason to play the games.
In our pilot, there are many upsets. This would be an example of an upset.
Player 1 = 3.30 KPR
Player 2 = 3.20 KPR
Player 3 = 3.24 KPR
Player 4 = 2.80 KPR
Players 1 and 2 = 3.25 average KPR
Players 3 and 4 = 3.02 average KPR
Players 3 and 4 win the match by a score of 11-7
In this example, players 3 and 4 would be rewarded by an increase of about 0.09(ish) ratings points. Their KPRs would increase to 3.33 and 2.89. The losing team would lose 0.09(ish) ratings points, falling to 3.21 and 3.11. Player 3 now has the highest KPR. In the next match, players 3 and 4 would still be small underdogs against players 1 and 2. However, the ratings system has more closely equalized the players, and if players 3 and 4 win again, they only earn 0.06 ratings points. If they win again, they only earn 0.04 ratings points (because they would be the favorite to win the match). As you can see, it only takes a few games for a properly calibrated computerized ratings system to fix “specific” problems. It takes many games for a properly calibrated computerized ratings system to fix problems across hundreds of players.
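The bookkeeping in this example can be replayed in a few lines of Python. This sketch only applies the rounded 0.09 swing quoted above to each player; it does not implement the KPR formula itself, and the function names are illustrative.

```python
def team_average(ratings):
    """Simple average of the two partners' ratings."""
    return sum(ratings) / len(ratings)


def apply_result(winners, losers, delta):
    """Add delta to each winner and subtract it from each loser."""
    new_winners = [round(r + delta, 2) for r in winners]
    new_losers = [round(r - delta, 2) for r in losers]
    return new_winners, new_losers


favorites = [3.30, 3.20]  # Players 1 and 2
underdogs = [3.24, 2.80]  # Players 3 and 4
DELTA = 0.09              # the "0.09(ish)" swing quoted above, rounded

new_underdogs, new_favorites = apply_result(underdogs, favorites, DELTA)

print("Players 3 and 4:", new_underdogs, "average", round(team_average(new_underdogs), 2))
print("Players 1 and 2:", new_favorites, "average", round(team_average(new_favorites), 2))
```

After the upset the team averages are 3.11 for players 3 and 4 and 3.16 for players 1 and 2, so players 3 and 4 are still underdogs, but only by about 0.05 ratings points instead of 0.23.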
In our Pilot, there have been many upsets. However, the color system did its job. If the color system failed, half of our matches would be upsets. This is not happening.
Equal KPRs = 56% chance of an upset.
KPR Difference of 0.10 to 0.25 points = 36% chance of an upset.
KPR Difference of 0.25 to 0.50 points = 21% chance of an upset.
KPR Difference of 0.50 to 0.75 points = 17% chance of an upset.
Our color system is doing exactly what it is supposed to do. If you are two color levels below your opponent, you only have roughly a 1 in 5 chance of upsetting your opponent. This does not mean, however, that if you upset your opponent you deserve to be two color levels higher. You are not better than the opponent; you simply upset your opponent. If you consistently beat better opponents across 10-30 games, well, the computer will increase your KPR and you will be rewarded accordingly.
Let’s look at a case where two 3.00 players keep beating two 3.50 players.
Game 1 = 3.00 players win. Their KPR increases by 0.099 points, to 3.099. The two 3.50 players fall to 3.401.
Game 2 = 3.099 players win. Their KPR increases by 0.094 points, to 3.193. The two 3.401 players fall to 3.307.
Game 3 = 3.193 players win. Their KPR increases by 0.074 points, to 3.267. The two 3.307 players fall to 3.233.
Within three games, KPRs equalize. FYI, DUPRs appear to follow the same process. However, the odds of pulling off three upsets in a row are low … about 0.21*0.21*0.36 = 1 in 63.
In other words, if there is a mistake in a Computerized Ratings System, the mistake can be corrected in a handful of games. If there isn’t a mistake, then the odds of three consecutive upsets are about 1 in 63 (in other words, very unlikely to happen).
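The three-game walk-through can be replayed with a short Python sketch. It uses the per-game swings quoted above (it does not compute them from the KPR formula) and looks up each game's upset chance from the pilot percentages. Two things are assumptions of the sketch: a gap that lands exactly on a bucket edge goes to the lower bucket, and each game is treated as independent.

```python
def observed_upset_rate(gap):
    """Upset rate seen in the pilot for a given KPR gap between teams.

    Bucket edges are treated as upper-inclusive (an assumption); the pilot
    figures do not say which bucket a gap of exactly 0.25 or 0.50 belongs to.
    """
    if 0.10 <= gap <= 0.25:
        return 0.36
    if 0.25 < gap <= 0.50:
        return 0.21
    if 0.50 < gap <= 0.75:
        return 0.17
    raise ValueError(f"no pilot figure for a gap of {gap:.3f}")


underdog, favorite = 3.00, 3.50
swings = [0.099, 0.094, 0.074]  # per-game KPR changes quoted above

streak_probability = 1.0
for swing in swings:
    gap = favorite - underdog
    streak_probability *= observed_upset_rate(gap)  # chance of this upset
    underdog += swing                               # winners move up
    favorite -= swing                               # losers move down

print(f"KPRs after three upsets: {underdog:.3f} vs {favorite:.3f}")
print(f"Chance of three straight upsets: about 1 in {1 / streak_probability:.0f}")
```

The gaps going into the three games are 0.50, about 0.30, and about 0.11, which is why the upset chances used are 21%, 21%, and 36%.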
Court Assignments
You should have noticed that some of the matchups in my initial court assignments are very odd. There will be two Oranges playing against two Red players, or two Indigos playing against two Maroon players. This is done on purpose, for several reasons.
I want to give you a chance to upset better players.
I want to create matches in the second game that are competitive.
I want to give a handful of players a chance to get in way above their heads, or have to fight their way back after losing a pair of games.
I want to see how a Computerized Ratings System deals with odd matchups.
The design of the “King of the Hill” format means that exactly two players on each starting court are going on a wild ride. Let’s say you and three other players begin on Court 3.
Math dictates that 1 of those players must win two games and end up on Court 1
Math dictates that 1 of those players must lose two games and end up on Court 5
No matter what happens, one player will perceive that they are gifted and will be on Court 1. One player will have a bad day and end up on Court 5. And honestly, much of it is due to luck. Math dictates the outcome, not the talent level of the players.
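To see why the format forces this outcome, here is a small Python sketch that lists every possible result for the four players who start on Court 3. It assumes a win moves a player up one court, a loss moves a player down one court, and the two players who move together from Game 1 are placed on opposite teams in Game 2. Those rules are my reading of the format for this sketch, and the player labels are made up.

```python
from itertools import product

START_COURT = 3
WIN, LOSE = -1, +1  # a win moves up one court (lower number), a loss moves down


def next_courts(game2_results):
    """Court each player heads to after two games.

    game2_results maps player -> WIN or LOSE for Game 2.
    Players A and B won Game 1; players C and D lost it.
    """
    after_game1 = {
        "A": START_COURT + WIN,
        "B": START_COURT + WIN,
        "C": START_COURT + LOSE,
        "D": START_COURT + LOSE,
    }
    return {player: court + game2_results[player] for player, court in after_game1.items()}


# Assumption: Game 1 partners are on opposite teams in Game 2, so exactly
# one of A/B wins Game 2 and exactly one of C/D wins Game 2.
outcomes = []
for a_result, c_result in product([WIN, LOSE], repeat=2):
    results = {"A": a_result, "B": -a_result, "C": c_result, "D": -c_result}
    outcomes.append(sorted(next_courts(results).values()))

for courts in outcomes:
    print(courts)
```

Under those assumptions, every one of the four possible outcomes leaves one player headed to Court 1, one headed to Court 5, and two back on Court 3. Who ends up where changes; the pattern does not.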
If I were to categorize complaints, the top category would be “This system isn’t fair, I shouldn’t have to play on Court “x” against inferior players just because I lost twice.” Well, this is part of the design of our pilot. By losing twice, you play in a match where you should win, and because you likely will win, we get to see how the computer rewards your efforts. Many players have won two games after losing two games, proving our hypothesis that the computer can properly assess players playing against competition they do not usually play against. I realize your day isn’t as much fun as you wanted it to be. Please remember, we are not testing your ability. We are testing the ability of a computer to evaluate odd situations.
No Games on July 4 / July 6
We will evaluate how our Pilot is performing during our off week (July 4, July 6), and communicate updates on our process moving forward.