Multiplayer

This guide covers the multiplayer features, with a focus on 3- and 4-player games.

Overview

While standard UNO is often played with 2 players, our implementation supports 2-4 players for more dynamic gameplay.

Key Differences from 2-Player

  • Turn Direction: The Reverse card reverses turn order instead of acting as a Skip

  • Skip Mechanics: Skip affects the immediate next player

  • Draw Effects: +2 and +4 target the next player in turn order

  • Strategy: Must consider multiple opponents

Multiplayer Environment

The multiplayer environment extends the base UNO environment:

from src.multiplayer_env import MultiplayerUnoEnv

# Create 4-player game
env = MultiplayerUnoEnv(num_players=4)

# Reset returns observation for player 0
obs = env.reset()

# Pick any legal action for the current player
action = env.get_valid_actions()[0]

# Step returns (obs, reward, done, truncated, info)
obs, reward, done, truncated, info = env.step(action)

Observation Space

Extended 25-dimensional observation:

| Feature | Dimensions | Description |
|---|---|---|
| Open card color | 4 | One-hot encoded |
| Number cards per color | 4 | Cards in hand |
| Special cards | 3 | Skip, Reverse, +2 |
| Wild cards | 2 | Wild, Wild +4 |
| Playable colors | 4 | Which colors can be played |
| Opponent 1 hand size | 4 | Bucketed (1-3, 4-6, 7+, UNO) |
| Opponent 2 hand size | 4 | Same buckets |
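As a rough illustration of the bucketed hand-size features, the sketch below one-hot encodes an opponent's hand size and checks that the listed feature sizes sum to 25. The names `FEATURE_SIZES` and `bucket_hand_size`, the bucket order, and the choice to treat exactly one card as the UNO bucket (so that the 1-3 bucket effectively covers 2-3 cards) are assumptions for illustration, not the project's actual encoder.

```python
# Hypothetical sketch: names, bucket order, and boundary handling are assumptions.
FEATURE_SIZES = {
    "open_card_color": 4,
    "number_cards_per_color": 4,
    "special_cards": 3,
    "wild_cards": 2,
    "playable_colors": 4,
    "opponent_1_hand_size": 4,
    "opponent_2_hand_size": 4,
}


def bucket_hand_size(num_cards: int) -> list[int]:
    """One-hot encode a hand size into the buckets [1-3, 4-6, 7+, UNO]."""
    if num_cards < 1:
        raise ValueError("a player still in the game holds at least one card")
    if num_cards == 1:
        index = 3  # UNO: assumed to take precedence over the 1-3 bucket
    elif num_cards <= 3:
        index = 0
    elif num_cards <= 6:
        index = 1
    else:
        index = 2
    one_hot = [0, 0, 0, 0]
    one_hot[index] = 1
    return one_hot


total_dims = sum(FEATURE_SIZES.values())
print(total_dims)            # 25
print(bucket_hand_size(5))   # [0, 1, 0, 0]
```

Bucketing keeps the observation compact while still signalling the case that matters most: an opponent who is one card away from winning.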

Action Space

Same 9 actions as 2-player:

  1. Play Red card

  2. Play Green card

  3. Play Blue card

  4. Play Yellow card

  5. Play Skip

  6. Play Reverse

  7. Play +2

  8. Play Wild +4

  9. Play Wild (choose color)
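For readability in scripts, the action list can be mirrored by an enum. This is only a sketch: the `UnoAction` name, the `describe` helper, and the zero-based indices (in the order listed above) are assumptions, so check the environment's action space for the real mapping.

```python
from enum import IntEnum


class UnoAction(IntEnum):
    # Assumed zero-based indices, in the order the actions are listed above.
    PLAY_RED = 0
    PLAY_GREEN = 1
    PLAY_BLUE = 2
    PLAY_YELLOW = 3
    PLAY_SKIP = 4
    PLAY_REVERSE = 5
    PLAY_DRAW_TWO = 6
    PLAY_WILD_DRAW_FOUR = 7
    PLAY_WILD = 8


def describe(valid_actions: list[int]) -> list[str]:
    """Turn raw action indices into readable names."""
    return [UnoAction(a).name for a in valid_actions]


print(describe([0, 5, 8]))  # ['PLAY_RED', 'PLAY_REVERSE', 'PLAY_WILD']
```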

Turn Direction

The Reverse card changes turn direction:

Clockwise (Normal):     1 → 2 → 3 → 4 → 1 → ...
Counter-clockwise:      1 → 4 → 3 → 2 → 1 → ...

Implementation:

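class MultiplayerUnoEnv:
    def __init__(self, num_players):
        self.num_players = num_players
        self.current_player = 0
        self.direction = 1  # 1 = clockwise, -1 = counter-clockwise

    def _handle_reverse(self):
        # Flip the direction of play
        self.direction *= -1

    def _next_player(self):
        # Python's % is non-negative for a positive modulus, so this wraps both ways
        return (self.current_player + self.direction) % self.num_players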
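The modular arithmetic above can be checked in isolation. The standalone helper below (`turn_order` is a hypothetical name, not part of the environment) lists who acts over a number of consecutive turns, using zero-based player indices.

```python
def turn_order(num_players: int, start: int, direction: int, turns: int) -> list[int]:
    """Return the players who act over `turns` consecutive turns."""
    order = []
    player = start
    for _ in range(turns):
        order.append(player)
        # Same wrap-around rule as _next_player
        player = (player + direction) % num_players
    return order


# Zero-based indices, so player 0 here is "1" in the diagram above.
print(turn_order(4, start=0, direction=1, turns=5))   # [0, 1, 2, 3, 0]
print(turn_order(4, start=0, direction=-1, turns=5))  # [0, 3, 2, 1, 0]
```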

Running Multiplayer Games

GUI Mode

Launch the multiplayer GUI:

python multiplayer_gui.py --players 4

Or from the main GUI:

  1. Open uno_gui.py

  2. Click “Multiplayer (3-4)”

  3. Select number of players

Programmatic Mode

from src.multiplayer_env import MultiplayerUnoEnv
from sb3_contrib import RecurrentPPO

# Load models
models = [
    RecurrentPPO.load("models/selfplay_champion.zip"),
    RecurrentPPO.load("models/best_recurrent_ppo.zip"),
    RecurrentPPO.load("models/optimal_recurrent_ppo.zip"),
    RecurrentPPO.load("models/sb3_recurrent_ppo.zip")
]

# Create environment
env = MultiplayerUnoEnv(num_players=4)

# Game loop
obs = env.reset()
lstm_states = [None] * 4

while True:
    player = env.current_player
    action, lstm_states[player] = models[player].predict(
        obs, state=lstm_states[player], deterministic=True
    )

    obs, reward, done, truncated, info = env.step(action)

    if done:
        print(f"Player {info['winner']} wins!")
        break
    if truncated:
        # Stop on truncation too, otherwise the loop never exits
        break

Battle Arena Multiplayer

In the battle arena:

  1. Click 3P or 4P button

  2. Configure a model for each player slot

  3. Run batch evaluation

  4. Compare multi-player statistics

Strategy Considerations

3-Player Strategy

  • Watch both opponents’ card counts

  • Reverse is defensive (buys time)

  • +2/+4 are more impactful, but hit only one opponent at a time (the next player)

  • Position matters (who’s before/after you)

4-Player Strategy

  • Track all three opponents

  • Reverse changes who gets targeted

  • Implicit alliances form naturally (e.g., several players targeting the current leader)

  • More variance in outcomes

Special Card Effects

| Card | 2-Player | 3-4 Player |
|---|---|---|
| Reverse | Acts as Skip | Changes direction |
| Skip | Opponent skipped | Next player skipped |
| +2 | Opponent draws 2 | Next player draws 2 |
| +4 | Opponent draws 4 | Next player draws 4 |
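The table can be expressed as a small resolver. The sketch below is an assumption-laden illustration: `Effect`, `resolve_card`, and the string card names are hypothetical and are not taken from the environment's code. It reports only what the table states (direction change, skipped player, draw target) and does not decide whether a player who draws also loses their turn.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Effect:
    direction: int              # turn direction after the card is played
    skipped: Optional[int]      # player whose turn is skipped, if any
    draw_target: Optional[int]  # player who must draw, if any
    draw_count: int = 0


def resolve_card(card: str, current: int, direction: int, num_players: int) -> Effect:
    """Apply the table's rules for 'reverse', 'skip', '+2', and '+4'."""
    if card == "reverse":
        if num_players == 2:
            # With two players, Reverse acts as a Skip on the opponent.
            return Effect(direction, (current + direction) % num_players, None)
        return Effect(-direction, None, None)
    target = (current + direction) % num_players  # next player in turn order
    if card == "skip":
        return Effect(direction, target, None)
    if card in ("+2", "+4"):
        return Effect(direction, None, target, int(card[1]))
    raise ValueError(f"unknown special card: {card}")


print(resolve_card("reverse", current=0, direction=1, num_players=2))
print(resolve_card("reverse", current=0, direction=1, num_players=4))
print(resolve_card("+2", current=3, direction=1, num_players=4))
```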

Training Multiplayer Agents

Training in multiplayer requires special considerations:

from src.multiplayer_env import MultiplayerUnoEnv
from sb3_contrib import RecurrentPPO

# Create multiplayer environment
env = MultiplayerUnoEnv(num_players=4)

# Train agent
model = RecurrentPPO(
    "MlpLstmPolicy",
    env,
    verbose=1,
    learning_rate=1e-4,
    n_steps=256,  # Rollout length per update; multiplayer episodes tend to run longer
    batch_size=64
)

model.learn(total_timesteps=2_000_000)
model.save("models/multiplayer_champion.zip")

Evaluation

The win rate that indicates equal skill depends on the number of players:

  • 2-player: 50% = equal skill

  • 3-player: 33% = equal skill

  • 4-player: 25% = equal skill

A 40% win rate in a 4-player game is excellent: 1.6x the 25% expected for equal skill.
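To compare results across player counts, the win rate can be normalized by the equal-skill baseline of 1/N. The helper names below are hypothetical, and the z-score uses a simple normal approximation to the binomial, which is only a rough guide for small batches of games.

```python
import math


def baseline_win_rate(num_players: int) -> float:
    """Expected win rate when all players are equally skilled."""
    return 1.0 / num_players


def relative_win_rate(wins: int, games: int, num_players: int) -> float:
    """Observed win rate as a multiple of the equal-skill baseline."""
    return (wins / games) / baseline_win_rate(num_players)


def z_score(wins: int, games: int, num_players: int) -> float:
    """Standard errors above the baseline (normal approximation)."""
    p0 = baseline_win_rate(num_players)
    standard_error = math.sqrt(p0 * (1 - p0) / games)
    return (wins / games - p0) / standard_error


print(relative_win_rate(40, 100, 4))   # 1.6
print(round(z_score(40, 100, 4), 2))   # 3.46
```

A large z-score over many games suggests the margin over the baseline is unlikely to be luck alone; with few games, the same win rate is much weaker evidence.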

API Reference

MultiplayerUnoEnv

class MultiplayerUnoEnv(gym.Env):
    """
    UNO environment supporting 2-4 players.

    Parameters
    ----------
    num_players : int
        Number of players (2-4)

    Attributes
    ----------
    current_player : int
        Index of current player (0 to num_players-1)
    direction : int
        Turn direction (1=clockwise, -1=counter-clockwise)
    hands : List[List[Card]]
        Cards in each player's hand
    """

    def reset(self):
        """Reset game and return initial observation."""

    def step(self, action):
        """Execute action and return (obs, reward, done, truncated, info)."""

    def get_valid_actions(self):
        """Return list of valid action indices."""

Known Limitations

Current multiplayer implementation notes:

  1. Human Player: Always player 0 in GUI

  2. Training: Agents trained in 2-player may underperform in 4-player

  3. Observation: Can see opponent hand sizes but not specific cards

  4. Stacking: +2/+4 stacking is not implemented, consistent with official UNO rules, which do not allow stacking

Future Improvements

Planned enhancements:

  • [ ] Online multiplayer support

  • [ ] Mixed human/AI games

  • [ ] Tournament mode

  • [ ] Elo rating system

  • [ ] Replay saving/loading