Finally, A Challenge Stat. Introducing: The Challenge Domination Index.

Kyle
13 min read · Jan 18, 2021

There is a lot of discourse around The Challenge being the fifth major American sport.

That sentiment is mostly used ironically as a fun way to legitimize the loyalty and devotion that a lot of people feel towards the people they’ve watched compete in silly games on television for the last two decades, but it still always makes me think.

It makes me think about the ways I wish The Challenge was more like mainstream professional sports, and the ways I wish sports were more like The Challenge.

I imagine a world in which MTV released full, unedited footage of Challenges so we can study it without edits and find out who the truly best swimmers, runners and jumpers are and by how much.

I imagine a world where Adam Silver gets on the mic before game 1 of the NBA Finals, announces that he’s triggered a security breach, and the Portland Trail Blazers come running out of the tunnel after being eliminated in round one, with a final chance of beating the Lakers to take their place against the Heat in the Finals.

I imagine a world where I can fly to a mountainous former Soviet state, park a lawn chair down alongside a trail, and wait all day to see CT barrel toward the next checkpoint in a Final, and I chase him down, cheering him on like he’s a cyclist in the Tour De France.

I imagine a world where the best football team in the regular season gets to stand in the middle of an empty stadium before the playoffs start and pick which of the 8 teams sitting in the stands around them will have to come down onto the turf to play against them.

But most of all, I imagine a world where there are enough stats, data, and information about The Challenge to host 30 minute debate shows about it on ESPN. Will Wes really retire after an early exit? How much money is Ashley projected to win now that Cara Maria is out of the game? Who plays better in cold environments vs. tropical ones? Shit like that.

I don’t have enough pull at ESPN to make that show happen, but I still decided to play my part.

I’d like to introduce The Challenge Domination Index: the first of its kind (I hope), an advanced metric for competitive performance on The Challenge.

Here’s how it works.

The Goals

An interesting thing about The Challenge and the conversation around it is that, in the murky waters between sport and television, opinion often gets mistaken for fact. This phenomenon is both incredible and frustrating. Incredible because, in the absence of data, there is a wide range of opinions that aren’t disprovable, which leads to incredible online content of people arguing about meaningless nonsense for hours.

But it’s frustrating because there is no data with any basis in fact to look at. Sure, the flimsy metrics of wins and money won are available, but those don’t really hold up.

So I wanted to create a system to score these people based on all of the competitions they take part in (daily challenges, elimination rounds, and final challenges).

I didn’t want longevity to be emphasized too much, but I wanted it to be rewarded. If two people performed at similar levels, I wanted to reward the person who has been doing it for a longer period of time, much like any other sport. But I didn’t want people to rise to the top just because they’ve been bad at this for 10 seasons.

I also wanted to find a way to equally value each format, whether those seasons were played in teams, pairs, solo, or random arrangements.

The Sample

On The Challenge Wikipedia page there is a helpful “5 Timers Club” section. I figured that would be a helpful guideline for my work. Having 5 seasons’ worth of sample size seemed like a good amount.

However, when I got into the information I had to make some alterations. I needed to use only people who have been on 5 or more qualifying seasons. Non-qualifying seasons include the first 6 seasons, which were too difficult to fit into this model due to their formats. I also decided not to include the currently airing Double Agents season. These specifications eliminated some significant characters like Mark Long, Tori Deal, The Miz, Emily Schromm, and Syrus, but it had to be done.

There ended up being 23 men and 27 women whose data was collected for this exercise.

Expected Wins (xW)

The foundation of this metric is the concept of expected wins. Like expected goals in European football, the expected win metric evaluates the chance of an individual competitor winning each competition they compete in, then combines those numbers to create an expected win total for a season.

Since Emily Schromm did not qualify for the sample, let’s use her as an example.

The first step was to find Wikipedia’s Challenge cast member page and find out the first season Emily was on, which was Cutthroat.

Then we travel to the Cutthroat season page to see the season’s handy-dandy chart.

As we can see here, Emily competed in all 9 of the daily challenges. Since there were 3 teams on Cutthroat, Emily had a 33% chance of winning each of the 9 challenges, which brings her expected win total to 3 for the Cutthroat season. We can see that she only actually won two.

So we have all the information we need for Emily’s daily challenge data.

Daily Challenges Played: 9

Daily Challenges Won: 2

Daily Challenges expected wins (xW): 3
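For a season where the number of teams never changes, the daily xW is just a product. A quick sketch (the function and variable names are mine):

```python
def daily_xw(challenges_played, teams):
    # Each daily challenge is an equal 1-in-N draw, so expected wins
    # are just challenges played times the per-challenge win chance.
    return challenges_played * (1 / teams)

daily_xw(9, 3)  # Emily on Cutthroat: 3 expected wins
```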

Now we have to grab the same data for eliminations. We can see she competed in two, and won two. Since eliminations are head-to-head, the xW in an elimination is almost always .5, a 50% chance. Since Emily did two, her xW is 1 win.

Eliminations played: 2

Eliminations won: 2

Elimination xW: 1

(note: I included all “mercenary” eliminations for that person, so those mercenary seasons only have elimination data for those mercenaries.)

Now for the final, which is a little trickier. I wanted to reward the concept of making it to the final. Finals played is pretty straightforward: one final played. Final xW is also pretty straightforward: one divided by the number of teams/people who could win. So in two-team finals, xW = .5; in most of the ‘3 pairs in the final’ seasons, xW = .33.

Wins, however, are distributed differently for the final. If you lost a two-team final, you receive .25 wins. If you were second in a 3-way final, you earn .5 wins. If you are third in a 3-way final, you also get .25 wins. In finals with four or more teams, I worked to create a reasonable distribution of wins, usually fourth got .1. 0 wins are given to anyone who quit a final.

So here’s our ‘Final’ data for Emily in Cutthroat:

Final played: 1

Final wins: .25

Final xW: .33
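The placement-to-wins distribution described above can be written as a lookup. This sketch only encodes the cases the rules above spell out (the four-plus-team distributions were judgment calls, so I’ve left them out), and the names are mine:

```python
# Fractional "final wins" by (field size, placement), per the rules above.
FINAL_WIN_SHARES = {
    (2, 1): 1.0,   # won a two-team final
    (2, 2): 0.25,  # lost a two-team final
    (3, 1): 1.0,   # won a three-way final
    (3, 2): 0.5,   # second in a three-way final
    (3, 3): 0.25,  # third in a three-way final
}

def final_wins(field_size, placement, quit_final=False):
    # Anyone who quit a final gets 0 wins regardless of placement.
    if quit_final:
        return 0.0
    return FINAL_WIN_SHARES[(field_size, placement)]
```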

Now we have to find the data for Emily’s other two seasons.

Emily’s Battle of the Exes 1 chart looks like this.

Seasons like this are a little more confusing. As people leave the game, your expected wins go up: while you have a 1 in 13 chance (.077 xW) of winning in week one, you have a 1 in 4 chance (.25 xW) of winning the last daily challenge. So in seasons like this, where only one team is the winner, you have to take one divided by the number of participants every week, then add those up to find an expected win total for the season. In this case:

(1/13) + (1/12) + (1/10) + (1/8) + (1/7) + (1/6) + (1/5) + (1/4) = 1.14 xW for Exes 1

It got pretty tedious to type those entire calculations out every time, so with a simple function in the Python shell, I could calculate them much more easily.

def challenge(fields):
    # Sum 1/participants for each daily challenge played,
    # e.g. challenge([13, 12, 10, 8, 7, 6, 5, 4]) gives 1.14 for Exes 1.
    xW = []
    for numb in fields:
        xW.append(1 / numb)
    return sum(xW)

Where ‘fields’ is a list of integers: the number of participants in every daily challenge they played.

So we have all of Emily’s data for Exes 1, which leaves Rivals 2 as her only remaining season. So after we collect the Rivals 2 Data, we enter it all into a spreadsheet that looks like this.

Above xW

After we have all of our raw data, we can start to manipulate it in order to create some helpful metrics.

My first task is to find out how many more wins than xW Emily Schromm has. This will give us an idea of how much she under- or over-performed throughout her Challenges. Since Emily won 8 daily challenges and was expected to win 5.78, her ‘Above xW’ number is 2.22.

Above xW — 2.22

Above xW/Challenge

But that number ultimately doesn’t mean much on its own. Johnny Bananas has played in over 150 daily challenges. If he was expected to win 60 of them and won 2.22 more than 60, that number would mean less than Emily’s 2.22 in 27 challenges. In order to normalize that, we divide the player’s Above xW number by Total Dailies Played to get…

Above xW/Challenge = .0822

Once we apply that method to eliminations and finals we end up with this.

The Wins Above/Challenge metric is essentially saying, “This is how much more or less likely this person is to win a competition than if it were a random draw.” Emily is 8.2% more likely to win a daily challenge than if winners were drawn out of a hat, 50% more likely to win an elimination, and 17% more likely to win a final.
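That normalization is a one-liner (a sketch; the names are mine):

```python
def above_xw_per_challenge(wins, xw, played):
    # How far above (or below) expected wins, per contest played.
    return (wins - xw) / played

above_xw_per_challenge(8, 5.78, 27)  # Emily's dailies: ~0.0822
```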

Weighting for Frequency

But adding these numbers together to get any significant metric seems irresponsible. Each player plays significantly more daily challenges than eliminations and finals. Some players make a final without any eliminations. We need to weigh each type of contest against both its frequency and its significance. Here’s what I came up with.

60% — Dailies
25% — Eliminations
15% — Finals

This ratio properly weighs how many times people play in dailies vs. the other two, but also adds a little more weight to both based on their importance.

To weigh each of these numbers, you take each of the Wins Above/Challenge numbers and multiply them by their assigned weight. So:

(Daily Above xW/Challenge) x .6
(Elim Above xW/Challenge) x .25
(Final Above xW/Challenge) x .15

Which gives us even smaller numbers. Since no one loves working with tiny decimals, we can multiply those numbers by a thousand to give us a more easily digestible score for each of these.
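Putting the weights and the ×1000 scaling together, using Emily’s rounded rates from above (a sketch; names are mine):

```python
WEIGHTS = {"daily": 0.60, "elim": 0.25, "final": 0.15}

def real_contest_score(daily_rate, elim_rate, final_rate):
    # Weight each Above xW/Challenge rate by contest type,
    # then scale by 1000 for a more readable score.
    weighted = (daily_rate * WEIGHTS["daily"]
                + elim_rate * WEIGHTS["elim"]
                + final_rate * WEIGHTS["final"])
    return weighted * 1000

real_contest_score(0.0822, 0.50, 0.17)  # ≈ 199.8
```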

Now that we have:

-Found out how much more likely Emily is to win at each contest
-Weighted them appropriately
-Created a score based on how good she is at each of them

We can add those together to create our first meaningful metric: RealContestScore

RealContestScore

(we’ll talk about RealContestAdjusted in just a minute)

Real Contest Score is good. But it only measures these people as if they were all on one season. I think a metric that values only their performance when they are on the show is important, but any hopefully comprehensive Challenge metric should also include a component that reflects performance over the entirety of the Challenge’s history. Landon crushed 4 seasons, but in the grand scheme of things, has he dominated the Challenge for the past 15 years? Not really.

So I came up with an idea.

Winscore

Winscore would be the basis of a multiplier recognizing the total number of wins each player has had over their Challenge career, this time weighting them only for importance.

Winscore = (TotalDailyWins) + (TotalElimWins * 2) + (TotalFinalWins * 3)

Here’s what Emily’s looks like

Then, in order to merge Winscore with RealContestScore, we create a WinMultiplier metric, which takes 1.005 to the Winscore power. So Emily’s is:

1.005 ^ 14.5 (her Winscore) = 1.074998592
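In code, with Emily’s Winscore of 14.5 (a sketch; function names are mine):

```python
def winscore(daily_wins, elim_wins, final_wins):
    # Eliminations count double, finals triple, per the formula above.
    return daily_wins + elim_wins * 2 + final_wins * 3

def win_multiplier(ws):
    # Exponential, so each additional win compounds slightly.
    return 1.005 ** ws

win_multiplier(14.5)  # ≈ 1.075
```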

But wait!

We have two problems.

Problem #1

If we’re going to create a multiplier, it’s important that we eliminate negative numbers from the equation. If someone has a negative contest score, it means they are less likely to win a contest than if the contest were random. But if we apply a win multiplier to a negative ContestScore, their total metric will only go further into the negatives, which eliminates the benefit of the WinMultiplier for them. In order to make sure everyone’s numbers are positive (so they are able to reap the benefits of their Winscore), we have to equally adjust each participant’s score into the positive. That’s where the RealContestAdjusted metric comes in: it adds 250 to each player’s ContestScore.

Problem #2

This Winscore model weighs each daily win equally, which doesn’t seem fair. Some people have only been on Challenges where 15% of the cast wins on each day and you’d be lucky to win 2 times. Other folks have been on mostly seasons with two teams, where it would require complete failure to get anything less than 6 wins.

In order to treat that imbalance, I created a season multiplier, which divides the wins that a player can count towards their Winscore by two for seasons with only two teams. This creates a new data point, AdjustedDailyWinTotal, which replaces TotalDailyWins in our Winscore calculation.

Note: The season multiplier only applies to wins that affect the Winscore total; the xW numbers are unaffected by it.
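The adjustment might look like this (a sketch; the pairing format and names are mine, with the .5 multiplier for two-team seasons and 1 otherwise):

```python
def adjusted_daily_win_total(wins_by_season):
    # wins_by_season: (daily_wins, season_multiplier) pairs, where the
    # multiplier discounts wins from the easier two-team formats.
    return sum(wins * mult for wins, mult in wins_by_season)

# Six wins on a two-team season only count as three toward Winscore.
adjusted_daily_win_total([(6, 0.5), (2, 1.0)])  # 5.0
```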

Now that we’ve fixed our significant issues, we have this.

Because we have those two final pieces of data, we can calculate our comprehensive metric, our Challenge Domination Score.

So…

Emily Schromm’s Challenge Domination Score = 483.5
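As a sanity check, the whole pipeline reproduces that number from the rounded figures above (a sketch; names are mine):

```python
def domination_score(contest_score, winscore):
    # RealContestAdjusted shifts every score positive by 250, then the
    # Winscore multiplier rewards career wins.
    adjusted = contest_score + 250
    return adjusted * (1.005 ** winscore)

# Emily: RealContestScore ≈ 199.8, Winscore = 14.5
domination_score(199.8, 14.5)  # ≈ 483.5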

Specific Data Collection Points

  • The 3 Gauntlet Seasons, Battle of the Sexes 2, The Ruins, and War of the Worlds 2 were the seasons with a .5 daily win multiplier, as all daily challenges were two teams against one another.
  • All Inferno seasons had a .75 daily win multiplier. This is because, while each challenge was the red vs. the blue team. Every other challenge had a life shield winner. And for those challenges, the only “winners” were the winners of the life shield, not teams.
  • Bloodlines was given .75 daily win multiplier because about half the season was played in teams of two and half as pairs.
  • On The Island, there was a very limited number of contests and they were all 3-person contests. For players participating in those, I counted each as both a daily and an elimination.
  • There was the potential for another category of competition in terms of “purges” and ‘redemption house challenges” it would have turned into a mess, so I classified those all as daily challenges.
  • Free Agents, Dirty Thirty, and Vendettas were so complicated to figure out xW for with redemption houses, random formats, team switches, but I believe I did it accurately.
  • It was really nice when I got halfway through because 80% of the times I could just find the data I already collected for someone’s partner on that season and copy/paste.
  • Most of the time, when there was a DQ, someone quit, or someone was injured, I was able to remember at which point they left, however, some xW numbers may be slightly off if I misremembered a team/person competing in a daily or not. Those errors would be very small and have no tangible effect on the results.
  • If someone had been on 5 challenges and never made a final, they were incurred a penalty. I added 1 final played to their score, with 0 wins, and .5 xW.
  • I gave Johnny Bananas and Ashley Mitchell an extra “Final Played” and “Final Win” for their Rivals 3 and Final Reckoning final wins against their own partner. However, I thought it would be unfair to penalize their partners for simply getting money stolen from them, so I left their final performance from those seasons as played 1 won 1.

Results

And here is how individual’s unweighted elimination score compares to their daily challenge score.

Here’s how each players Total Winscore Relates to their Challenge Domination Score

Surely, experience helps people dominate more right, and as people go on more and more season’s they would surely improve. Here’s how season’s played interacts with Challenge Domination Score.

Thanks for reading!

--

--