Combat Balancing Simulation Framework

Summary

For my master’s thesis, I developed a modular Monte Carlo simulation framework for analyzing and balancing extended Risk-like combat mechanics. The project grew out of balancing challenges in my own strategy board game, whose combat system extends standard Risk-style dice combat.

The framework supports heterogeneous dice setups, mixed dice pools, unit profiles, faction/class differences and custom effects such as rerolls. Its purpose is not to automatically generate perfectly balanced units, but to support designer-driven balancing decisions through simulations, metrics and visualizations.

I designed and implemented the framework myself, including the combat model, simulation logic, experiment setup, evaluation metrics and visualizations.

What This Project Shows

Simulation-based balancing
Combat systems analysis
Understanding of comparison-based dice mechanics
Monte Carlo simulation
Modular framework design
Baseline-based evaluation
Data-driven game design
Critical awareness of metric limitations

Design Problem

Combat system expansion with new factions, unit levels and dice setups.

Risk-like combat is resolved through repeated battle rounds. In each round, the attacker and defender roll a limited number of dice, sort the results and compare the highest dice pairwise. For each comparison, the side with the lower die loses one unit; in case of a tie, the defender wins. A full battle consists of multiple battle rounds and ends when one side retreats or is eliminated, or when the territory is conquered.
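As a minimal sketch of the round described above (not the thesis implementation), a single battle round with six-sided dice could look like this in Python. The function names `compare_rolls` and `resolve_round` and the fixed die size are illustrative assumptions.

```python
import random


def compare_rolls(attack_roll, defend_roll):
    """Compare two rolls pairwise, highest dice first.

    Returns (attacker_losses, defender_losses).
    """
    attack = sorted(attack_roll, reverse=True)
    defend = sorted(defend_roll, reverse=True)
    attacker_losses = defender_losses = 0
    # zip stops at the shorter roll, so surplus low dice are ignored.
    for a, d in zip(attack, defend):
        if a > d:
            defender_losses += 1
        else:
            # Lower result or tie: the defender wins the comparison.
            attacker_losses += 1
    return attacker_losses, defender_losses


def resolve_round(attacker_dice, defender_dice, rng=random):
    """Roll the given numbers of six-sided dice and resolve one round."""
    attack = [rng.randint(1, 6) for _ in range(attacker_dice)]
    defend = [rng.randint(1, 6) for _ in range(defender_dice)]
    return compare_rolls(attack, defend)


# Attacker rolls 6, 3, 1 and defender rolls 5, 3:
# 6 beats 5 (defender loses one unit), 3 ties 3 (attacker loses one unit).
print(compare_rolls([6, 3, 1], [5, 3]))
```

Keeping the comparison separate from the rolling makes the rule itself testable without randomness.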

The combat system is therefore comparison-based rather than additive: losses result from pairwise comparisons of individual dice, not from summed totals.

Why Balancing Becomes Difficult

Classic Risk-style combat: battle round.

In additive combat systems, dice strength can often be estimated more directly. For example, different dice combinations can be compared through expected values, which makes it easier to judge whether an additional die would bring a unit closer to the intended strength.

Balancing in additive combat mechanics through expected values

Additive combat systems can often be compared through expected values.
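To illustrate the expected-value reasoning for additive systems, the following sketch compares two dice pools by their expected sum. The pools shown are arbitrary examples, and the helper names `expected_die` and `expected_pool` are assumptions made for this illustration.

```python
from fractions import Fraction


def expected_die(sides):
    """Expected value of a fair die numbered 1..sides."""
    return Fraction(sides + 1, 2)


def expected_pool(pool):
    """Expected sum of a pool given as a list of die sizes, e.g. [6, 8]."""
    return sum(expected_die(sides) for sides in pool)


# In an additive system, swapping a d6 for a d8 raises the expected
# total by exactly one point, independent of the rest of the pool.
print(float(expected_pool([6, 6])))  # 7.0
print(float(expected_pool([6, 8])))  # 8.0
```

Because expectation is linear, each die contributes independently to the total, which is what makes additive pools comparatively easy to reason about.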

In comparison-based dice systems, this intuition does not transfer directly. Additional dice can change the number of pairwise comparisons and therefore the outcome distribution. In some cases, an additional die can even become disadvantageous.

Balancing comparison-based combat mechanics

In comparison-based combat, an additional die can change the number of comparisons and even become disadvantageous.
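For small pools, the effect of an extra die can be checked exactly by enumerating every possible roll. The sketch below assumes the tie rule described earlier (the defender wins ties) and computes the full outcome distribution of one round; `round_distribution` and `expected_losses` are hypothetical helper names, not part of the framework.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def round_distribution(attacker_pool, defender_pool):
    """Exact distribution of (attacker_losses, defender_losses) for one round.

    Pools are lists of die sizes, e.g. [6, 6, 6] for three six-sided dice.
    """
    n_att = len(attacker_pool)
    faces = [range(1, s + 1) for s in list(attacker_pool) + list(defender_pool)]
    counts = Counter()
    for roll in product(*faces):
        attack = sorted(roll[:n_att], reverse=True)
        defend = sorted(roll[n_att:], reverse=True)
        # The attacker loses every comparison that is not strictly higher.
        a_loss = sum(1 for a, d in zip(attack, defend) if a <= d)
        d_loss = min(len(attack), len(defend)) - a_loss
        counts[(a_loss, d_loss)] += 1
    total = sum(counts.values())
    return {outcome: Fraction(n, total) for outcome, n in counts.items()}


def expected_losses(dist):
    """Expected (attacker_losses, defender_losses) of a distribution."""
    att = sum(p * a for (a, _), p in dist.items())
    dfn = sum(p * d for (_, d), p in dist.items())
    return att, dfn


# One defender die gives one comparison; a second die adds a second
# comparison and therefore new outcomes such as losing two units at once.
for defender_pool in ([6], [6, 6]):
    dist = round_distribution([6, 6, 6], defender_pool)
    att, dfn = expected_losses(dist)
    print(defender_pool, sorted(dist), float(att), float(dfn))
```

Exact enumeration grows exponentially with the number of dice, which is one reason to fall back on Monte Carlo sampling for larger pools and full battles.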

This means that small changes to dice pools, unit profiles or special effects can significantly influence win probabilities, expected losses and combat duration. As the system grew with different factions, levels, dice types and effects, balancing through intuition alone became unreliable.

Technical Approach

Requirements for extended combat mechanics.

I built a modular framework with separate layers for combat modeling, simulation and evaluation. The model layer defines dice, effects, unit profiles and factions. The simulation layer resolves battle rounds and full battles through Monte Carlo experiments. The evaluation layer aggregates the results and creates metrics, scorecards and plots.

Modular architecture of the simulation framework.
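The three layers can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the actual framework code: `UnitProfile`, `simulate_battle` and `run_experiment` are hypothetical names, each side rolls at most one die per remaining unit, battles are fought to elimination without retreat, and the profiles in the usage example are placeholders rather than units from the game.

```python
import random
from dataclasses import dataclass


# --- Model layer: dice pools and effects (hypothetical structure) ---
@dataclass(frozen=True)
class UnitProfile:
    name: str
    dice: tuple                # die sizes, e.g. (6, 6, 8); must not be empty
    reroll_ones: bool = False  # example of a custom effect

    def roll(self, n_dice, rng):
        """Roll the first n_dice of the pool and return them highest first."""
        results = []
        for sides in self.dice[:n_dice]:
            value = rng.randint(1, sides)
            if self.reroll_ones and value == 1:
                value = rng.randint(1, sides)  # one reroll, result stands
            results.append(value)
        return sorted(results, reverse=True)


# --- Simulation layer: rounds and full battles ---
def simulate_battle(attacker, defender, att_units, def_units, rng):
    """Fight rounds until one side has no units left.

    Returns (attacker_won, attacker_losses, defender_losses, rounds).
    """
    a_left, d_left, rounds = att_units, def_units, 0
    while a_left > 0 and d_left > 0:
        rounds += 1
        attack = attacker.roll(min(a_left, len(attacker.dice)), rng)
        defend = defender.roll(min(d_left, len(defender.dice)), rng)
        for a, d in zip(attack, defend):
            if a > d:
                d_left -= 1
            else:  # ties go to the defender
                a_left -= 1
    return d_left == 0, att_units - a_left, def_units - d_left, rounds


# --- Evaluation layer: aggregate many battles into metrics ---
def run_experiment(attacker, defender, att_units, def_units, n=10_000, seed=0):
    rng = random.Random(seed)  # fixed seed keeps experiments reproducible
    results = [simulate_battle(attacker, defender, att_units, def_units, rng)
               for _ in range(n)]
    return {
        "attacker_win_rate": sum(r[0] for r in results) / n,
        "mean_attacker_losses": sum(r[1] for r in results) / n,
        "mean_defender_losses": sum(r[2] for r in results) / n,
        "mean_rounds": sum(r[3] for r in results) / n,
    }


# Placeholder profiles for illustration only.
baseline_attacker = UnitProfile("Baseline attacker", (6, 6, 6))
baseline_defender = UnitProfile("Baseline defender", (6, 6))
variant_attacker = UnitProfile("Variant attacker", (8, 6), reroll_ones=True)

for profile in (baseline_attacker, variant_attacker):
    stats = run_experiment(profile, baseline_defender, 5, 3, n=2_000)
    print(profile.name, stats)
```

Separating the layers this way means a new effect only touches the model layer, while the simulation loop and the metrics stay unchanged.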

The framework allows manually defined unit variants to be compared under controlled scenarios. For example, different orc level 2 variants can be tested against the human baseline to evaluate which setup stays closer to the intended balance range.

Evaluation

To make combat setups comparable, I used a baseline-based evaluation approach. In my thesis, the human faction served as the reference faction, and manually defined unit or faction variants were compared against it across selected combat scenarios.

The experiments were not intended to cover every possible combat situation. Instead, I selected representative scenarios, including small fights, symmetric matchups and larger reference setups. This reduced the complexity of the experiment space while still allowing meaningful comparisons.

I evaluated both homogeneous and heterogeneous setups. Homogeneous experiments compared units of the same level or type, while heterogeneous experiments tested mixed unit compositions. This was important because a unit variant can appear balanced in a simple isolated matchup but behave differently in mixed combat situations.

The evaluation focused on metrics such as attacker win rate, defender win rate, expected losses, combat duration, delta to baseline, outliers and tolerance-band violations.
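The delta-to-baseline and tolerance-band metrics can be sketched as follows. The function name `scorecard`, the tolerance of 0.05 and all win rates in the example are made-up placeholders for illustration; they are not results from the thesis.

```python
def scorecard(baseline, variant, tolerance=0.05):
    """Compare per-scenario attacker win rates of a variant with a baseline.

    Both arguments map scenario names to win rates between 0 and 1.
    """
    deltas = {s: variant[s] - baseline[s] for s in baseline}
    abs_deltas = [abs(d) for d in deltas.values()]
    return {
        "deltas": deltas,
        "mean_abs_delta": sum(abs_deltas) / len(abs_deltas),
        "max_abs_delta": max(abs_deltas),
        # Scenarios whose delta leaves the tolerance band around the baseline.
        "violations": [s for s, d in deltas.items() if abs(d) > tolerance],
    }


# Placeholder numbers, NOT measured results.
baseline_rates = {"1v1": 0.40, "2v1": 0.75, "3v2": 0.65}
variant_rates = {"1v1": 0.43, "2v1": 0.74, "3v2": 0.75}

card = scorecard(baseline_rates, variant_rates)
print(card["violations"])
print(round(card["mean_abs_delta"], 3), round(card["max_abs_delta"], 3))
```

Reporting the maximum delta and the list of violations next to the mean matters because a variant can look close to the baseline on average while still breaking the tolerance band in a single scenario.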

The results were aggregated into scorecards and plots, helping to identify which variants stayed close to the baseline on average and which setups created problematic outliers. The goal was not to find a mathematically perfect setup, but to provide better information for designer-driven balancing decisions.

Evaluation Results

Which orc variant stays closer to the baseline?

Aggregated scorecard across selected heterogeneous scenarios: 1v1, 2v1, 2v2, 3v2 and 5v3.

Heatmap of scenario-level results for each variant. These results are aggregated into the scorecard.

Design Role of the Framework

The framework is not an automatic balancing tool. It does not search for the “best” dice pool or generate optimal unit stats.

Instead, the designer defines candidate configurations, runs simulations and interprets the results. The data helps explain how a mechanical change affects combat outcomes, but the final balancing decision still depends on the intended faction identity, player experience, pacing and playtesting feedback.

Limitations

The framework focuses on controlled simulation experiments rather than exhaustive analysis of every possible setup. Important limitations include the chosen baseline, the reduced scenario space, isolated combat analysis and the fact that the framework is not an optimizer.

These limitations are important because balance is not an absolute value. A setup can be acceptable in one context but problematic in another. The framework supports balancing decisions, but it does not replace design judgement or playtesting.