At long last, the new Balance of Power has been posted. This new Balance of Power model reflects many, many, many hours of statistical analysis and data-gathering. (My thanks to everyone who helped play test matches to enhance our dataset.) In short, the skill-gap adjust applied in scoring Factions matches is now based on actual observed winrates (e.g. how often does an all-Gold team beat an all-Silver team?) rather than speculation about “which tiers have the biggest impact.” It’s also now an adaptive system, meaning it will become more and more accurate as more matches are played.
Special thanks to Banane des Bois, our lead Scoring staffer. Banane single-handedly inputted most of the 70 or so matches for the Nyroth arc that had built up while I worked on designing the new scoring system.
The Factions Scoreboard
In each storyline, the factions involved are fighting on the Fields of Justice to resolve a dispute. The Balance of Power represents how each faction is doing in that struggle. Ultimately, the Balance of Power will decide which faction is declared victorious. (There are some exceptions. For example, a faction that’s sufficiently dominant on the BoP can trigger a “victory tournament”; if they win there, they win the arc, regardless of current BoP.)
Determining the Value of a Match
Here’s how the value of a given match is determined.
Each map and mode has a different base value. The primary map/mode is Summoner’s Rift 5v5, which has a base value of 5 points.
So long as you don’t play against your own faction, you are allowed to “sub in” for another faction if they’re having trouble filling both teams.
However, for each substitute Summoner in a match, the value of the match is decreased by 20%. After all, it’s not fair for a faction to lose points from a match when half its Summoners were from other factions.
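As a rough sketch of the arithmetic (not the actual scoring code), the substitute penalty could be computed as below. The function name is made up for illustration, and the sketch assumes the 20% penalties simply add up and that the value cannot go below zero; the rules above do not say whether the penalty is additive or compounding.

```python
BASE_VALUE_SR_5V5 = 5.0  # Summoner's Rift 5v5 base value, from the post
SUB_PENALTY = 0.20       # 20% per substitute Summoner, from the post


def match_value(base_value: float, substitutes: int) -> float:
    """Value of a match after the substitute penalty.

    Assumption: penalties are additive (3 subs = 60% off) and the
    result is floored at zero. The post does not specify either detail.
    """
    if substitutes < 0:
        raise ValueError("substitutes must be non-negative")
    return base_value * max(0.0, 1.0 - SUB_PENALTY * substitutes)
```

Under these assumptions, a Summoner’s Rift 5v5 with two substitutes would be worth 60% of its base value.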
Skill Gap Adjust
We adjust matches based on the relative skill levels of the two factions. For example, if a bunch of Diamonds and Plats beat a bunch of Silvers and Bronzes, the match will not be worth very much.
Problem: in messy mixed-tier matches, how do you compute the adjust?
The tricky question is precisely how much of an adjust to apply. It might be very clear if it’s Diamonds and Plats versus Silvers and Bronzes, but what if it’s something more subtle, like three Golds and two Silvers versus a Diamond, a Gold, two Silvers, and a Bronze? Different people have different intuitions about which team has the advantage there: some say that the Diamond will probably carry the other team, while others say that the jump from Silver to Gold is more meaningful. Still others say that Ranked tier is not a strong predictor of winrate (for example, because not everyone plays Ranked), and so there’s not much need for a skill adjust.
Solution: apply an adjust such that, if each team merely plays to its skill level, there is no net change over many matches.
I spent a lot of time thinking about how to analyze this question. I decided that the gold standard would be computing winrates based on specific tier-based matchups.
The matchup above was 3 Golds and 2 Silvers versus 1 Diamond, 1 Gold, 2 Silvers, and 1 Bronze: i.e. GGGSS vs. DGSSB. If over a large number of matches we find that GGGSS beats DGSSB 75% of the time, then we should apply a proportionate adjust: if GGGSS wins, their faction’s gains (and the other faction’s losses) should be reduced to 50% of normal value, while if DGSSB wins, they should be picking up 150% of normal value.
Let’s say these two factions play 100 such matches, with Ionia fielding GGGSS and Demacia fielding DGSSB. If both sides merely play to their skill level, we’d expect to see Ionia win 75 times and Demacia win 25 times. Multiply it all out with the skill adjust: Ionia’s 75 wins at 50% value come to 37.5 matches’ worth of points, and Demacia’s 25 wins at 150% value also come to 37.5, so the two teams each walk away with a net score change of zero. In other words, we’ve corrected for skill. What will cause a net change in score? Faction strategies, teamwork, people learning to play their factions’ Champions, and so on.
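The zero-net property can be checked with a few lines of Python. This is only a sketch: the formula `2 * (1 - p)` is an assumption chosen because it reproduces the 50% and 150% figures in the example, and the function names are hypothetical rather than taken from the real scoring system.

```python
def skill_multiplier(winner_expected_winrate: float) -> float:
    """Multiplier applied to a match's value, given how often the
    winning lineup is expected to beat the losing lineup.

    Assumed formula: 2 * (1 - p). It gives 0.5 for p = 0.75 and
    1.5 for p = 0.25, matching the worked example.
    """
    p = winner_expected_winrate
    if not 0.0 <= p <= 1.0:
        raise ValueError("winrate must be between 0 and 1")
    return 2.0 * (1.0 - p)


def expected_net(p_a: float, matches: int, base_value: float = 5.0) -> float:
    """Net points for team A if it wins exactly its expected share."""
    wins_a = p_a * matches
    wins_b = matches - wins_a
    gained = wins_a * base_value * skill_multiplier(p_a)
    lost = wins_b * base_value * skill_multiplier(1.0 - p_a)
    return gained - lost
```

With `p_a = 0.75` over 100 matches, the gains and losses are both 187.5 points at a base value of 5, so the net is zero. The same cancellation holds for any winrate under this formula.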
Another problem: not enough data
Here’s a problem: even if we fold Unranked and Sub-30 Summoners into the Bronze category, categorizing Summoners as Bronze through Diamond gives five tiers. That makes 126 possible five-Summoner lineups, and 8,001 possible matchups between them. We just don’t have the data to analyze every single matchup: we’d need over 80,000 evenly distributed matches just to get n = 10 for each.
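For anyone who wants to check the count, here is a small sketch. It assumes a lineup is fully described by how many Summoners it has in each tier (so order within a team doesn’t matter) and that a matchup is an unordered pair of lineups, mirror matches included. The function names are illustrative.

```python
from math import comb


def team_compositions(tiers: int, team_size: int = 5) -> int:
    """Number of distinct lineups when only the tier counts matter
    (multisets of `team_size` drawn from `tiers` categories)."""
    return comb(tiers + team_size - 1, team_size)


def matchups(tiers: int, team_size: int = 5) -> int:
    """Unordered pairs of lineups, including a lineup against itself."""
    n = team_compositions(tiers, team_size)
    return n * (n + 1) // 2
```

With five tiers this gives 126 lineups and 8,001 matchups, so ten matches per matchup would take 80,010 matches.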
We have two means of addressing this problem.
First solution: cancel out mirrored pairs of Summoners
First, we cancel out mirrored pairs. For example, the GGGSS vs. DGSSB matchup could be simplified to GG v. DB, by cancelling out 1 Gold and 2 Silvers on each side. This is not an uncontroversial step: for example, this would treat DBBBB vs. GBBBB the same as DGGGG vs. GGGGG, but it could be that (e.g.) a Diamond plays a lot better with Gold teammates than with Bronze teammates. However, my intuition (which I need to critically examine in a future study) is that in most cases this doesn’t compromise accuracy too much. It also dramatically reduces the number of combinations, and amplifies the dataset.
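Cancellation is easy to express with a multiset. The sketch below is one possible way to do it, not the actual implementation; the one-letter code "P" for Plat and the function name are assumptions made for illustration.

```python
from collections import Counter

# Display order: Diamond, Plat, Gold, Silver, Bronze.
# The "P" code for Plat is an assumption; the post only shows D, G, S, B.
TIER_ORDER = "DPGSB"


def cancel_mirrored(team_a: str, team_b: str) -> tuple[str, str]:
    """Remove Summoners of the same tier that appear on both sides."""
    a, b = Counter(team_a), Counter(team_b)
    left_a = a - b  # Counter subtraction keeps only positive counts
    left_b = b - a

    def fmt(counts: Counter) -> str:
        return "".join(tier * counts[tier] for tier in TIER_ORDER)

    return fmt(left_a), fmt(left_b)
```

Here `cancel_mirrored("GGGSS", "DGSSB")` leaves GG against DB, and both DBBBB vs. GBBBB and DGGGG vs. GGGGG collapse to D against G, which is exactly the simplification the caveat above is about.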
Here are some winrates, using this cancellation method to simplify matchups. For example, it turns out that DB actually beats GS about 56% of the time.
Even with cancellation, though, we don’t have data on every matchup. For example, there are some very messy combinations (like “DGGGB vs. SBBBB”) that don’t allow much cancellation.
Second solution: combine Diamonds/Plats and Silvers/Bronzes
If the new Balance of Power system cannot find a documented winrate for the matchup, it condenses Diamond and Plat into a single “High-Tier” (“H”) category, and Silver/Bronze into a single “Low-Tier” (“L”) category. This reduces the total number of combinations down to 231, and provides more opportunities for cancellation. Why collapse these particular tiers? This post is already too long, but in short, I did some in-depth analysis of tier-by-tier winrates and found that these categorizations were the most reasonable.
Here are some winrates using the HGL system. For example, GGGSS vs. DGSSB is transformed into GGGLL vs. HGLLL, which cancels out to GG v. HL. HL beats GG 41% of the time. Thus, even if we didn’t have enough data on GG vs. DB, we could end up with a reasonably close estimate.
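The recoding step might look something like this. It is a sketch under assumptions: the "P" code for Plat and the helper names are invented for illustration, and it recodes first and cancels second, since that order gives the most opportunities for cancellation.

```python
from collections import Counter

# Diamond/Plat -> High-Tier, Gold stays Gold, Silver/Bronze -> Low-Tier.
# The "P" code for Plat is an assumption.
HGL = {"D": "H", "P": "H", "G": "G", "S": "L", "B": "L"}
HGL_ORDER = "HGL"


def to_hgl(team: str) -> str:
    """Recode a lineup such as 'DGSSB' into H/G/L letters ('HGLLL')."""
    return "".join(sorted((HGL[t] for t in team), key=HGL_ORDER.index))


def cancel(team_a: str, team_b: str) -> tuple[str, str]:
    """Remove Summoners of the same category found on both sides."""
    a, b = Counter(team_a), Counter(team_b)

    def fmt(counts: Counter) -> str:
        return "".join(c * counts[c] for c in HGL_ORDER)

    return fmt(a - b), fmt(b - a)
```

Running `cancel(to_hgl("GGGSS"), to_hgl("DGSSB"))` reproduces the GG v. HL matchup from the example.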
Long story short: the scoring system is now much more fair, because the skill gap adjust is based on how the tiers actually fare against one another, rather than on speculation.
I have several improvements planned for this system. Ultimately, I’d like the process to look something like this:
- First, check to see if the exact matchup (e.g. GGGSS vs. DGSSB) has occurred sufficiently often to generate reliable data. If so, use that.
- If not, try converting it to HGL (GGGLL v. HGLLL).
- Failing that, try the exact matchup with cancellation (GG v. DB).
- Finally, resort to cancellation and HGL recoding (GG v. HL).
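The planned fallback order could be sketched as follows. Everything here is hypothetical: the record format, the `MIN_SAMPLES` threshold of 10, the "P" code for Plat, and the function names are placeholders, not the real system.

```python
from collections import Counter
from typing import Optional

ORDER = "HDPGSBL"  # sort order covering both raw tiers and H/G/L codes
HGL = {"D": "H", "P": "H", "G": "G", "S": "L", "B": "L"}
MIN_SAMPLES = 10   # hypothetical cutoff for "reliable data"

# Hypothetical store: (team_a, team_b) -> (wins for team_a, games played)
Records = dict[tuple[str, str], tuple[int, int]]


def normalize(team) -> str:
    return "".join(sorted(team, key=ORDER.index))


def to_hgl(team: str) -> str:
    return normalize(HGL[t] for t in team)


def cancel(team_a: str, team_b: str) -> tuple[str, str]:
    a, b = Counter(team_a), Counter(team_b)
    return normalize((a - b).elements()), normalize((b - a).elements())


def lookup(records: Records, a: str, b: str) -> Optional[float]:
    """Winrate of a over b, or None if there are too few games."""
    if (a, b) in records:
        wins, games = records[(a, b)]
    elif (b, a) in records:  # stored from the other side's perspective
        losses, games = records[(b, a)]
        wins = games - losses
    else:
        return None
    return wins / games if games >= MIN_SAMPLES else None


def estimate_winrate(records: Records, team_a: str, team_b: str) -> Optional[float]:
    a, b = normalize(team_a), normalize(team_b)
    ha, hb = to_hgl(a), to_hgl(b)
    candidates = [
        (a, b),          # 1. exact matchup
        (ha, hb),        # 2. HGL recoding
        cancel(a, b),    # 3. exact matchup with cancellation
        cancel(ha, hb),  # 4. HGL recoding plus cancellation
    ]
    for cand_a, cand_b in candidates:
        rate = lookup(records, cand_a, cand_b)
        if rate is not None:
            return rate
    return None  # no usable data at any level
```

Each step trades precision for sample size, so the first level with enough games wins. For instance, with only a GG v. HL record on file, GGGSS vs. DGSSB falls all the way through to the fourth step.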
However, I believe this is already a pretty dramatic improvement over the old way of doing this. For the moment, I’m going to focus my energy on other areas of Factions, such as lore-writing.
Faction Standing Adjust
Finally, the system accounts for the gap in standings between the two factions. If the highest-ranked faction beats the lowest-ranked faction, the match will be worth fewer points than usual. In contrast, if the underdogs manage to beat the dominant faction, the value of the match will be increased. This encourages lower-ranked factions to take on the heavyweights.
This is a much smaller adjust than the skill gap adjust: we want some kind of “rubber-band” mechanism, but we don’t want to penalize success.
Okay! Time to write some long-overdue lore.