UPDATE: See this newer post.
We’re going to be running some test matches to gather data to improve our balancing system. Any help you can provide would be greatly appreciated. Read on for more.
If you want to volunteer as a lab rat, join the appropriate in-client chat for your rank (higher of S3 and S4 solo queue SR rank): e.g. “BRONZE”, “PLATINUM”, “SILVER”. We need lots of Silvers in particular.
Set the chat to auto-join by clicking the gear icon in the upper-right corner of the chatbox and selecting “auto-join”. Then you can just forget about it; you’ll eventually get a custom game invite or two. People who are setting up these test matches will pull from those chats once they reach sufficient size. You can also ask around in the general Factions chat.
Even better: If you’re willing to run some of these matches, read on for instructions.
Also, if you haven’t yet, you can take this poll to share your intuitions about the size of the tier gaps.
Primary Objective: Weight Mixed-Tier Matches Appropriately
Factions does not currently have the benefit of the automated Matchmaker system. Thus, unlike normal LoL games, Factions matches are usually not razor-balanced 50/50 win/loss matchups. We do encourage people to take the time to at least roughly balance the teams in pick-up matches, and we try to be strict about that for tournaments and featured matches, but it’s safe to say that one side has the skill advantage going into any particular Factions match.
In some ways, I think this has become a strength of Factions: you get a chance to play with and against Summoners of a broad range of skill levels, whereas normally in League you play 99% of games with people in the same narrow MMR band.
However, to make things fair when scoring matches, we do need to take skill gaps into account. This requires some non-trivial analysis of a few key questions:
- What are the relative “strength values” of various Tiers? If a typical Silver brings “3 points” of game-winning power, what’s the number for a Gold? A Bronze? A Diamond?
- How do you combine these strength values to generate a number for an entire team? Is it an average? Or more of a multiplication? Something more complex that factors in “outliers”?
- Ultimate question: what is the best formula for computing how slanted a match is, given the Tiers of Summoners making up each team?
When scoring Factions matches, we want to apply appropriate weighting. A predictable stomp (5 Diamonds beating 5 Silvers) should be worth few points, while an impressive upset victory (5 Bronzes beating 5 Golds) should be worth more points.
The Status Quo
We’ve done our best to jury-rig a rudimentary point-weighting system for Factions. For reference, I’ll briefly describe our current approach. We model team skill as a simple average (arithmetic mean) of per-Summoner point values, using the following values:
- Bronze/Unranked: 1
- Silver: 3
- Gold: 5
- Platinum: 8
- Diamond: 12
So, for example, the following teams have the following averages:
- Gold, Gold, Gold, Silver, Silver: 4.2
- Diamond, Plat, Bronze, Bronze, Bronze: 4.6
- Diamond, Diamond, Diamond, Silver, Silver: 8.4
- Gold, Gold, Bronze, Bronze, Bronze: 2.6
- Plat, Silver, Bronze, Bronze, Bronze: 2.8
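For anyone who wants to check a lineup quickly, here is a minimal sketch of the averaging described above. The names `TIER_POINTS` and `team_average` are made up for illustration and are not part of any Factions tooling; accepting both “Plat” and “Platinum” as spellings is likewise just a convenience assumption.

```python
# Point values as listed in the post. "Unranked" is scored like Bronze,
# and "Plat" is accepted as shorthand for "Platinum" (an assumption made
# here for convenience).
TIER_POINTS = {
    "Bronze": 1,
    "Unranked": 1,
    "Silver": 3,
    "Gold": 5,
    "Plat": 8,
    "Platinum": 8,
    "Diamond": 12,
}


def team_average(tiers):
    """Return the mean point value for a team, given a list of Tier names."""
    return sum(TIER_POINTS[tier] for tier in tiers) / len(tiers)


print(team_average(["Gold", "Gold", "Gold", "Silver", "Silver"]))      # 4.2
print(team_average(["Gold", "Gold", "Bronze", "Bronze", "Bronze"]))    # 2.6
```

Swapping in a different set of point values only requires editing the dictionary, which is handy when comparing alternative guesses about the Tier gaps.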
Though this produces nice tidy decimal-pointed numbers, it’s really just guesswork. For example, this model presumes that upgrading a Plat to a Diamond is a bigger boost for a team than upgrading a Bronze to a Silver. I based that on the non-linear distribution of Elo ratings, which is indirect evidence at best of actual game impact. Many Summoners have told me that their intuition runs the other way, i.e., that the low-Tier upgrades are worth more.
I think we’re going to need to perform some empirical investigation to move beyond guesswork and intuition here. So, after some thought, I’ve designed a first round of experiments to help us get a rudimentary idea of what might be going on here.
Initial Experiment Design
I want to run some matches and try to get a feel for the relative power gaps between the Tiers in mixed-Tier matches. To do so, I invite Summoners to run matches with the following parameters and submit the results. Apologies for the nitpicky details: obviously, I appreciate any help you can provide, but I really want to make sure we have good, clean data, so I’m being rather strict with these parameters.
We’re going to start with an assessment of the relative gaps between the Tiers, using Silver as a baseline. We’ll do this with large numbers of trial matches with the following configurations:
- Team A: Silver, Silver, Silver, Silver, Silver
- Team B: ____, ____, Silver, Silver, Silver
- Fill in the blanks with a pair of Summoners from the same non-Silver Tier: Bronze, Gold, Platinum, or Diamond.
The key statistics we’ll be tracking will be the win rates of the various Team B arrangements against the baseline all-Silver team. For example, if 2 Golds and 3 Silvers beat 5 Silvers 70% of the time, then there should be a large point difference between Golds and Silvers when weighting matches.
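To make the bookkeeping concrete, here is a rough sketch of how win rates per Team B arrangement could be tallied. The record format (a Tier name for the non-Silver pair plus a flag for whether Team B won) and the function name `win_rates` are assumptions for illustration, and the sample records are placeholders showing the shape of the data, not real results.

```python
from collections import defaultdict


def win_rates(results):
    """Tally Team B's win rate for each non-Silver pair Tier.

    `results` is an iterable of (pair_tier, team_b_won) tuples, one per match.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for pair_tier, team_b_won in results:
        games[pair_tier] += 1
        if team_b_won:
            wins[pair_tier] += 1
    return {tier: wins[tier] / games[tier] for tier in games}


# Placeholder records, invented only to demonstrate the input format.
sample = [("Gold", True), ("Gold", True), ("Gold", False), ("Bronze", False)]
print(win_rates(sample))
```

With only a handful of matches per arrangement these percentages will swing a lot, which is why large numbers of trial matches are needed.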
- Bump this thread to help recruit Silvers. Check the SILVER in-client chat for them.
- Here’s a list of people who have volunteered as balance testers.
- Matches are to be Tournament Draft. Use up to 5 minutes of pause time per side if someone d/cs or has similar problems.
- Use the full (100+) Champion roster, not Factions rosters. (This removes the potentially troublesome variable of faction strength, albeit at the cost of not being able to examine specifically how Tiers perform in the Factions context, a possible subject for future study.)
- Use bans as normal (as, e.g., in a Ranked game).
- Play your best: these matches are imba by design, and we need precise measurements of how much imbalance different Tiers create when added to the mix. Although your Elo isn’t at stake, the data will be compromised if people derp around with joke builds or whatnot. Don’t go easy on the other team if you’re the higher-Tier team: the goal here is not to have fair matches but to quantitatively measure the degree of imbalance resulting from different Tier variations. To do that, we need people to cast aside their weighted training clothing, eat any Senzu Beans they have, and fight at their true power level. Pretend you’re playing Ranked. Play to win. (Of course, you should still be sportsmanlike and friendly in chat and such.)
- Each Silver player should toss a coin to decide which side they’re on for each match. (Obviously, if one side fills up, any new entrants just go on the remaining side.)
- Use the following Tier arrangements:
  - One team should be all-Silver.
  - One team should be 3 Silvers plus a pair of Summoners from another Tier. (Both should be from the same Tier, e.g. 2 Bronzes or 2 Diamonds.)
  - Use a coin toss to decide whether the all-Silver team is Blue Side or Purple Side.
- Tier is defined as the higher of end-of-S3 solo queue SR rank and current S4 solo queue SR rank. In other words: use LoL Nexus and take the higher of the two ranks that appear.
  - op.gg can also show you at a glance a Summoner’s S3 and S4 Tiers.
  - Bronze can mean Bronze or Unranked. That said, for these experimental matches, we prefer “actual” Bronzes, and want to stay away from people who, e.g., were Plat in S2 but haven’t played Ranked since then.
  - Example: FrozhanKhetos was Plat in S3 and is now Gold. For our purposes, they are a Plat.
- Verify Tiers before the game starts.
- The match organizer is also responsible for submitting the results.
- The match organizer should open up LoL Nexus while the game loads and screenshot that.
- Submit results here at the end.
- Blue Side: Silver, Silver, Silver, Silver, Silver
- Purple Side: Plat, Plat, Silver, Silver, Silver
One side should always be all-Silver. The other side will have a pair of Summoners from another Tier.
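For organizers who like to double-check a lobby before starting, the Tier rule and the lineup rule above can be sketched as follows. This is only an illustration: `effective_tier` and `is_valid_test_match` are hypothetical helper names, Tiers are assumed to be typed in by hand after looking them up, and nothing here talks to LoL Nexus or op.gg.

```python
# Tiers from lowest to highest, as used in this post.
TIER_ORDER = ["Bronze", "Silver", "Gold", "Platinum", "Diamond"]


def normalize(tier):
    """Treat Unranked (or a missing rank) as Bronze, per the rules above."""
    return "Bronze" if tier in (None, "Unranked") else tier


def effective_tier(s3_tier, s4_tier):
    """Return the higher of the end-of-S3 and current S4 solo queue Tiers."""
    return max(normalize(s3_tier), normalize(s4_tier), key=TIER_ORDER.index)


def is_valid_test_match(team_a, team_b):
    """Check that one team is all-Silver and the other is 3 Silvers plus a
    pair of Summoners from the same non-Silver Tier."""
    def all_silver(team):
        return len(team) == 5 and all(t == "Silver" for t in team)

    def silver_plus_pair(team):
        others = [t for t in team if t != "Silver"]
        return len(team) == 5 and len(others) == 2 and others[0] == others[1]

    return (all_silver(team_a) and silver_plus_pair(team_b)) or (
        silver_plus_pair(team_a) and all_silver(team_b)
    )


# The example from the rules: Plat in S3, Gold now, so they count as a Plat.
print(effective_tier("Platinum", "Gold"))  # Platinum
```

An unrecognized Tier name raises a `ValueError` rather than being silently accepted, which seemed the safer choice for catching typos.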
You can copy and paste the following text into Factions chat to help you gather players (edited as appropriate for the specific Tiers you want to test):
Running a balance test match! We need lots of Silvers and two Golds. (Tier is higher of S3 and current rank, solo queue SR.) If you’re available, join the appropriate chat room: SILVER or GOLD.
Why This Design?
I thought a lot about this, and I think the best first step is to try to suss out where the various Tiers stand in relation to Silver in a mixed-Tier match. Our initial results from this round of tests will help us figure out where to go next.
This data will be very, very useful to us, and will help us make Factions more fair in future arcs. People who organize lots of these matches will be commended and may receive some kind of reward from CupcakeTrap as a token of appreciation.
There are many, many more issues we’re planning to investigate to improve balance. While this is only a rudimentary first step, the move from guesswork and intuition to empirical testing is a crucial one.