Winrate Calculator

Well, we’ve got two Diamonds and they’ve only got one, but they’ve got more Golds than we do, but…

The Winrate Calculator lets you quickly assess how fair a match is, in terms of skill tiers, before you start it. One of my side projects has been taking the thousands of Factions matches and using that data to estimate how much various tier gaps matter. It’s not perfect, but it’s better than (e.g.) arbitrarily assigning point values to different tiers. It’s also better than having no balancing system at all, since it reduces gains from predictable stomps and properly rewards upset victories by lower-ranked teams.

To use the calculator, first figure out the ranks on each team. For instance, let’s say Demacia has two Diamonds, a Gold, and two Silvers, and Noxus has a Diamond, two Plats, and two Bronzes. Demacia would be DDGSS, and Noxus would be DPPBB. You’d then punch DDGSS versus DPPBB into the winrate calculator to get an estimated winrate for Demacia, with 50% meaning the match is expected to be even.
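The notation can be illustrated with a minimal Python sketch that turns a roster of tier names into a five-letter code. The helper name `team_code`, the spelling of the tier names, and the strongest-to-weakest letter ordering are assumptions made for this example; they are not taken from the calculator itself.

```python
# Hypothetical lookup tables, used only for this illustration.
TIER_ORDER = "DPGSB"  # assumed ordering, strongest to weakest
TIER_LETTERS = {
    "Diamond": "D",
    "Plat": "P",
    "Gold": "G",
    "Silver": "S",
    "Bronze": "B",
}

def team_code(tiers):
    """Return a code such as 'DDGSS' for a list of five tier names."""
    if len(tiers) != 5:
        raise ValueError("a team needs exactly five Summoners")
    letters = [TIER_LETTERS[tier] for tier in tiers]
    # Sort strongest first so the same roster always yields the same code.
    return "".join(sorted(letters, key=TIER_ORDER.index))

demacia = team_code(["Gold", "Diamond", "Silver", "Diamond", "Silver"])
noxus = team_code(["Plat", "Bronze", "Diamond", "Plat", "Bronze"])
print(demacia, "v", noxus)  # DDGSS v DPPBB
```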

Winrate Calculator (fill in the white boxes with the tiers of each team; don’t be shy about deleting previous calculations)

This system uses the database of Factions matches from previous arcs. Every Intermission Match result you submit makes the system smarter.

How It Works

I spent a lot of time thinking about how to balance skill gaps in Factions. Over many hours of pondering, and many consultations with friends with much stronger stats backgrounds than I have, I evaluated a number of options. One idea was to compute the winrate for each tier. Another was to implement a sort of modified Elo system, treating each rank as a “player” and tracking their performance over many matches.

In the end, I decided on something simpler and more direct: break matches down into particular matchups, and calculate winrates for those matchups. For example, let’s say we have GGSSS vs. GGGSS. Let’s then say that, out of ten such matches in the database, GGSSS won only three. We can then say that, skill-wise, the GGSSS side has a 30% chance of winning. Thus, if they win, they should gain more points, and if they lose, they should lose fewer points. The idea is to correct for rank, so that if the only difference between the teams is rank composition, they could play a hundred matches and neither would come out ahead: the dominant side would win more often but earn fewer points per win, while the weaker side would win less often but earn more points per win. By correcting for skill, we shift the focus onto the factors we want to matter: faction strength, teamwork, and individual effort to play above one’s usual tier.
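The arithmetic behind "neither would come out ahead" can be sketched in Python. The text above does not give the actual point formula, so the scoring rule here is purely an assumption chosen to show the idea: a win pays `stake * (1 - winrate)` and a loss costs `stake * winrate`. The names `matchup_winrate`, `point_change`, and `stake` are hypothetical.

```python
def matchup_winrate(results):
    """Fraction of matches won by the first-listed side.

    `results` is a list of booleans, True when that side won.
    """
    if not results:
        raise ValueError("no matches recorded for this matchup")
    return sum(results) / len(results)

def point_change(winrate, won, stake=10.0):
    """Assumed scoring rule (not the site's actual formula):
    gain stake * (1 - winrate) on a win, lose stake * winrate on a loss."""
    if won:
        return stake * (1 - winrate)
    return -stake * winrate

# Ten hypothetical GGSSS v GGGSS matches, three of them won by GGSSS.
history = [True] * 3 + [False] * 7
p = matchup_winrate(history)

gain = point_change(p, won=True)    # larger reward for the underdog's win
loss = point_change(p, won=False)   # smaller penalty for the expected loss
expected = p * gain + (1 - p) * loss
print(p, gain, loss, round(expected, 9))
```

Under this assumed rule the expected point change is p(1 - p) - (1 - p)p = 0 for either side, which is the "correct for rank" property described above. Any other rule with zero expectation at the estimated winrate would serve the same purpose.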

Simplifying Assumptions

The immediate problem is that there are 8,001 possible matchups in a 5v5, which spreads the data much too thinly. I adopted two simplifications to deal with this. A basic principle of social sciences is that one must sacrifice accuracy for workability. The “gold standard” would be a detailed subatomic analysis of every particle of every player on both teams. This is no more possible here than in (e.g.) an attempt to determine how an election will play out, or whether two countries will agree to lower tariffs. The important thing is to make note of these simplifying assumptions.
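The 8,001 figure can be reproduced with a short count, assuming five tiers (D, P, G, S, B), that a team composition is an unordered set of five tier letters, and that a matchup is an unordered pairing of two compositions (the same composition may appear on both sides). The function name `count_matchups` is made up for this sketch.

```python
from math import comb

def count_matchups(num_tiers, team_size=5):
    """Count team compositions and unordered pairings of them.

    A composition is a multiset of `team_size` tiers; a pairing may put
    the same composition on both sides.
    """
    compositions = comb(num_tiers + team_size - 1, team_size)
    pairings = compositions * (compositions + 1) // 2
    return compositions, pairings

compositions, matchups = count_matchups(num_tiers=5)
print(compositions, matchups)  # 126 8001
```

With thousands of matches spread across 8,001 possible pairings, most pairings would have only a handful of recorded results, or none at all.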

Mirror cancellation. On the theory that equally skilled Summoners will “cancel each other out”, we ignore mirror matchups. For example, let’s say the matchup is DSSSS v PSSSS. We’ll treat that as a D v P match. By doing so, we dramatically deepen the dataset. DSSSS v PSSSS may not play the same as DGGGG v PGGGG or DGGSS v PGGSS, but the idea here is that in all these cases the only net difference between the teams is that one side has a Diamond and the other has a Plat.
- Issue: It may be the case that, for example, Diamonds play better on high-tier teams than Plats do, and Plats play better on low-tier teams than Diamonds do. If so, then D v P will play out differently depending on the tier-levels of the “cancelled” Summoners. My assumption is that skill is essentially additive rather than multiplicative, and this “context” effect is secondary.
HGL. If the system lacks enough data even after mirror cancellation, it will try a second trick: HGL. Diamonds and Plats are collapsed into a single H(igh) tier, and Silvers and Bronzes into an L(ow) tier. This drastically reduces the number of possible matchups, from 8,001 to 231.
- Issue: To the extent that Diamonds and Plats are different, and Silvers and Bronzes are different, this reduces accuracy.
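Both simplifications can be sketched in a few lines of Python. This is only an illustration of the two ideas as described above, not the calculator's real implementation: the function names are hypothetical, and the choice to collapse to HGL first and then cancel mirrors again is an assumption, since the text does not say in which order the fallback applies them.

```python
from collections import Counter

# Mapping taken from the HGL description: D and P become H, S and B become L.
HGL_MAP = {"D": "H", "P": "H", "G": "G", "S": "L", "B": "L"}

def cancel_mirrors(team_a, team_b, order="DPGSB"):
    """Drop Summoners of the same tier that appear on both teams."""
    a, b = Counter(team_a), Counter(team_b)

    def fmt(counts):
        # Rebuild a code string, sorted by the given tier order.
        return "".join(sorted(counts.elements(), key=order.index))

    # Counter subtraction keeps only the positive leftovers on each side.
    return fmt(a - b), fmt(b - a)

def collapse_hgl(team):
    """Rewrite a five-tier code into the three-tier H/G/L code."""
    return "".join(HGL_MAP[tier] for tier in team)

print(cancel_mirrors("DSSSS", "PSSSS"))  # ('D', 'P')
print(cancel_mirrors("DGGSS", "PGGSS"))  # ('D', 'P')
print(cancel_mirrors("DDGSS", "DPPBB"))  # ('DGSS', 'PPBB')

# Assumed fallback order: collapse to HGL, then cancel mirrors again.
high_a, high_b = collapse_hgl("DDGSS"), collapse_hgl("DPPBB")
print(high_a, high_b)                                # HHGLL HHHLL
print(cancel_mirrors(high_a, high_b, order="HGL"))   # ('G', 'H')
```

The last example shows why HGL deepens the data: DDGSS v DPPBB only cancels down to DGSS v PPBB on five tiers, but on three tiers it reduces to a plain G v H matchup. For reference, three tiers give 21 team compositions, which is 231 pairings when counted the same way as the 8,001 figure, or 210 if pairings of identical compositions are left out.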

As noted, both of these key assumptions bring some amount of inaccuracy into the system. The more data we have, the less we need to rely on these assumptions.