Multiplayer

This guide covers the multiplayer features supporting 3-4 player games.

Overview

While standard UNO is often played with 2 players, our implementation supports 2-4 players for more dynamic gameplay.

Key Differences from 2-Player

Turn Direction: Reverse changes the turn order instead of acting as a Skip
Skip Mechanics: Skip affects the immediate next player
Draw Effects: +2 and +4 target the next player in turn order
Strategy: Must consider multiple opponents

Multiplayer Environment

The multiplayer environment extends the base UNO environment:

from src.multiplayer_env import MultiplayerUnoEnv

# Create 4-player game
env = MultiplayerUnoEnv(num_players=4)

# Reset returns observation for player 0
obs = env.reset()

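# Pick any valid action index (see get_valid_actions in the API Reference)
action = env.get_valid_actions()[0]

# Step returns (obs, reward, done, truncated, info)
obs, reward, done, truncated, info = env.step(action)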

Observation Space

Extended 25-dimensional observation (the rows below sum to 25 and list two opponent slots):

Feature	Dimensions	Description
Open card color	4	One-hot encoded
Number cards per color	4	Cards in hand
Special cards	3	Skip, Reverse, +2
Wild cards	2	Wild, Wild +4
Playable colors	4	Which colors can be played
Opponent 1 hand size	4	Bucketed (1-3, 4-6, 7+, UNO)
Opponent 2 hand size	4	Same buckets
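
The hand-size bucketing in the table can be sketched as follows. This is a minimal illustration, not the project's actual encoder: the names `OBS_LAYOUT`, `OBS_DIM`, and `bucket_hand_size` are hypothetical. The listed buckets overlap at one card, so the sketch assumes a single card maps to the UNO bucket and 2-3 cards map to the first bucket.

```python
# Hypothetical layout mirroring the table above: (feature name, dimensions).
OBS_LAYOUT = [
    ("open_card_color", 4),
    ("number_cards_per_color", 4),
    ("special_cards", 3),
    ("wild_cards", 2),
    ("playable_colors", 4),
    ("opponent_1_hand_size", 4),
    ("opponent_2_hand_size", 4),
]

# Total observation size implied by the table.
OBS_DIM = sum(size for _, size in OBS_LAYOUT)


def bucket_hand_size(n_cards):
    """One-hot encode a hand size in the order (1-3, 4-6, 7+, UNO).

    Assumption: exactly one card counts as UNO, so the first bucket
    effectively covers 2-3 cards.
    """
    if n_cards < 1:
        raise ValueError("a player still in the game holds at least one card")
    if n_cards == 1:
        index = 3  # UNO
    elif n_cards <= 3:
        index = 0
    elif n_cards <= 6:
        index = 1
    else:
        index = 2
    one_hot = [0, 0, 0, 0]
    one_hot[index] = 1
    return one_hot
```

Keeping the layout in one list makes it easy to check that the feature sizes still add up to the documented dimension when a feature is added or removed.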

Action Space

Same 9 actions as 2-player:

Play Red card
Play Green card
Play Blue card
Play Yellow card
Play Skip
Play Reverse
Play +2
Play Wild +4
Play Wild (choose color)

Turn Direction

The Reverse card changes turn direction:

Clockwise (Normal):     0 → 1 → 2 → 3 → 0 → ...
Counter-clockwise:      0 → 3 → 2 → 1 → 0 → ...

Implementation:

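class MultiplayerUnoEnv:
    def __init__(self, num_players):
        self.num_players = num_players
        self.current_player = 0  # Player indices run from 0 to num_players-1
        self.direction = 1  # 1 = clockwise, -1 = counter-clockwise

    def _handle_reverse(self):
        # Flip the turn direction
        self.direction *= -1

    def _next_player(self):
        # Python's % never returns a negative result for a positive modulus,
        # so this wraps correctly in both directions
        return (self.current_player + self.direction) % self.num_players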

Running Multiplayer Games

GUI Mode

Launch the multiplayer GUI:

python multiplayer_gui.py --players 4

Or from the main GUI:

Open uno_gui.py
Click “Multiplayer (3-4)”
Select number of players

Programmatic Mode

from src.multiplayer_env import MultiplayerUnoEnv
from sb3_contrib import RecurrentPPO

# Load models
models = [
    RecurrentPPO.load("models/selfplay_champion.zip"),
    RecurrentPPO.load("models/best_recurrent_ppo.zip"),
    RecurrentPPO.load("models/optimal_recurrent_ppo.zip"),
    RecurrentPPO.load("models/sb3_recurrent_ppo.zip")
]

# Create environment
env = MultiplayerUnoEnv(num_players=4)

# Game loop
obs = env.reset()
lstm_states = [None] * 4

while True:
    player = env.current_player
    action, lstm_states[player] = models[player].predict(
        obs, state=lstm_states[player], deterministic=True
    )

    obs, reward, done, truncated, info = env.step(action)

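    if done:
        print(f"Player {info['winner']} wins!")
        break
    if truncated:
        # Stop instead of looping forever when the episode is cut short
        print("Game truncated without a winner")
        break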
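
For batch comparisons, the loop above can be wrapped in a function and repeated. The following is a minimal sketch under stated assumptions, not the project's arena code: `play_game` and `tally_wins` are hypothetical helpers, each policy is assumed to be a callable mapping an observation to an action, and the environment is assumed to expose `reset`, `step`, `current_player`, and an `info["winner"]` entry as shown in this guide.

```python
from collections import Counter


def play_game(env, policies, max_steps=10_000):
    """Play one game and return the winner's index, or None if there is none.

    `policies[i]` is a callable taking the observation for player i and
    returning an action. `max_steps` is a safety cap chosen for illustration.
    """
    obs = env.reset()
    for _ in range(max_steps):
        player = env.current_player
        action = policies[player](obs)
        obs, reward, done, truncated, info = env.step(action)
        if done:
            return info.get("winner")
        if truncated:
            return None
    return None


def tally_wins(make_env, policies, num_games):
    """Run `num_games` fresh games and count wins per player index."""
    wins = Counter()
    for _ in range(num_games):
        wins[play_game(make_env(), policies)] += 1
    return wins
```

A recurrent model can be adapted to this interface with a small closure that keeps its own LSTM state between calls and resets it at the start of each game.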

Battle Arena Multiplayer

In the battle arena:

Click 3P or 4P button
Configure a model for each player slot
Run batch evaluation
Compare multi-player statistics

Strategy Considerations

3-Player Strategy

Watch both opponents’ card counts
Reverse is defensive: it sends play away from the next player and buys time
+2/+4 are more impactful, since each one hits a single opponent at a time
Position matters (who’s before/after you)

4-Player Strategy

Track all three opponents
Reverse changes who gets targeted
Implicit alliances form naturally against whoever is closest to winning
More variance in outcomes

Special Card Effects

Card	2-Player	3-4 Player
Reverse	Acts as Skip	Changes direction
Skip	Opponent skipped	Next player skipped
+2	Opponent draws 2	Next player draws 2
+4	Opponent draws 4	Next player draws 4
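
The table above can be expressed as a small turn-resolution function. This is an illustrative sketch rather than the environment's own logic: `TurnOutcome`, `resolve_card`, and the card name strings are hypothetical. The guide does not say whether a player who draws from +2 or +4 also loses their turn, so the sketch assumes they do.

```python
from typing import NamedTuple, Optional

DRAW_CARDS = {"+2": 2, "+4": 4}


class TurnOutcome(NamedTuple):
    direction: int                  # 1 = clockwise, -1 = counter-clockwise
    next_player: int                # who acts next
    affected_player: Optional[int]  # who was skipped or made to draw
    cards_drawn: int


def _advance(player, direction, steps, num_players):
    """Move `steps` seats from `player` in the given direction."""
    return (player + direction * steps) % num_players


def resolve_card(card, current, direction, num_players):
    """Return the turn outcome after `current` plays `card`."""
    if card == "reverse":
        new_direction = -direction
        if num_players == 2:
            # With two players Reverse acts as Skip: the same player goes again.
            opponent = _advance(current, direction, 1, num_players)
            return TurnOutcome(new_direction, current, opponent, 0)
        following = _advance(current, new_direction, 1, num_players)
        return TurnOutcome(new_direction, following, None, 0)

    target = _advance(current, direction, 1, num_players)
    after_target = _advance(current, direction, 2, num_players)
    if card == "skip":
        return TurnOutcome(direction, after_target, target, 0)
    if card in DRAW_CARDS:
        # Assumption: the drawing player also loses their turn.
        return TurnOutcome(direction, after_target, target, DRAW_CARDS[card])
    # Any other card simply passes play to the next seat.
    return TurnOutcome(direction, target, None, 0)
```

For example, `resolve_card("reverse", 0, 1, 4)` flips the direction and hands play to player 3, while the same card in a 2-player game returns play to player 0.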

Training Multiplayer Agents

Training in multiplayer requires special considerations:

from src.multiplayer_env import MultiplayerUnoEnv
from sb3_contrib import RecurrentPPO

# Create multiplayer environment
env = MultiplayerUnoEnv(num_players=4)

# Train agent
model = RecurrentPPO(
    "MlpLstmPolicy",
    env,
    verbose=1,
    learning_rate=1e-4,
    n_steps=256,  # Steps collected per environment per update; multiplayer episodes run longer
    batch_size=64
)

model.learn(total_timesteps=2000000)
model.save("models/multiplayer_champion.zip")

Evaluation

Win rate interpretation changes:

2-player: 50% = equal skill
3-player: 33% = equal skill
4-player: 25% = equal skill

A 40% win rate in 4-player is excellent (1.6x expected)!
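
The comparison against chance can be computed directly. The helper names below are hypothetical. The standard error uses the usual binomial approximation and is included only as a rough guide to how many games a comparison needs.

```python
import math


def chance_win_rate(num_players):
    """Win rate expected when all players are equally skilled."""
    return 1.0 / num_players


def relative_to_chance(win_rate, num_players):
    """How many times the equal-skill baseline a win rate represents."""
    return win_rate / chance_win_rate(num_players)


def win_rate_stderr(win_rate, num_games):
    """Binomial standard error of a win rate measured over num_games."""
    return math.sqrt(win_rate * (1.0 - win_rate) / num_games)
```

With these helpers, `relative_to_chance(0.40, 4)` gives the 1.6x figure quoted above. Over 100 games, a 40% win rate carries a standard error of roughly 4.9 percentage points, so small differences between agents need larger batches to be meaningful.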

API Reference

MultiplayerUnoEnv

class MultiplayerUnoEnv(gym.Env):
    """
    UNO environment supporting 2-4 players.

    Parameters
    ----------
    num_players : int
        Number of players (2-4)

    Attributes
    ----------
    current_player : int
        Index of current player (0 to num_players-1)
    direction : int
        Turn direction (1=clockwise, -1=counter-clockwise)
    hands : List[List[Card]]
        Cards in each player's hand
    """

    def reset(self):
        """Reset game and return initial observation."""

    def step(self, action):
        """Execute action and return (obs, reward, done, truncated, info)."""

    def get_valid_actions(self):
        """Return list of valid action indices."""

Known Limitations

Current multiplayer implementation notes:

Human Player: Always player 0 in GUI
Training: Agents trained in 2-player may underperform in 4-player
Observation: Can see opponent hand sizes but not specific cards
Stacking: +2/+4 stacking is not implemented, which matches the official UNO rules (stacking is a common house rule)

Future Improvements

Planned enhancements:

[ ] Online multiplayer support
[ ] Mixed human/AI games
[ ] Tournament mode
[ ] Elo rating system
[ ] Replay saving/loading