Multiplayer
This guide covers the multiplayer features supporting 3-4 player games.
Overview
While standard UNO is often played with 2 players, our implementation supports 2-4 players for more dynamic gameplay.
Key Differences from 2-Player
Turn Direction: Reverse card actually reverses turn order
Skip Mechanics: Skip affects the immediate next player
Draw Effects: +2 and +4 target the next player in turn order
Strategy: Must consider multiple opponents
Multiplayer Environment
The multiplayer environment extends the base UNO environment:
from src.multiplayer_env import MultiplayerUnoEnv
# Create 4-player game
env = MultiplayerUnoEnv(num_players=4)
# Reset returns observation for player 0
obs = env.reset()
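# Pick any valid action (get_valid_actions() returns valid action indices)
action = env.get_valid_actions()[0]
# Step returns (obs, reward, done, truncated, info)
obs, reward, done, truncated, info = env.step(action)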
Observation Space
Extended 25-dimensional observation:
| Feature | Dimensions | Description |
|---|---|---|
| Open card color | 4 | One-hot encoded |
| Number cards per color | 4 | Cards in hand |
| Special cards | 3 | Skip, Reverse, +2 |
| Wild cards | 2 | Wild, Wild +4 |
| Playable colors | 4 | Which colors can be played |
| Opponent 1 hand size | 4 | Bucketed (1-3, 4-6, 7+, UNO) |
| Opponent 2 hand size | 4 | Same buckets |
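The bucketed hand-size features can be illustrated with a small sketch. The function name `bucket_hand_size` and the bucket order are assumptions, not taken from the environment's source. The listed buckets overlap (a one-card UNO hand also falls in 1-3), so the sketch assumes the UNO bucket takes priority for exactly one card. It also checks that the feature widths in the table add up to the stated 25 dimensions.

```python
from typing import Dict, List

# Hypothetical bucket order: [1-3, 4-6, 7+, UNO]; the real environment may differ.
def bucket_hand_size(num_cards: int) -> List[int]:
    """One-hot encode an opponent's hand size into 4 buckets."""
    if num_cards < 1:
        raise ValueError("an active opponent holds at least one card")
    one_hot = [0, 0, 0, 0]
    if num_cards == 1:
        one_hot[3] = 1  # UNO: assumed to take priority over the 1-3 bucket
    elif num_cards <= 3:
        one_hot[0] = 1
    elif num_cards <= 6:
        one_hot[1] = 1
    else:
        one_hot[2] = 1
    return one_hot

# Feature widths as listed in the observation table.
FEATURE_WIDTHS: Dict[str, int] = {
    "open_card_color": 4,
    "number_cards_per_color": 4,
    "special_cards": 3,
    "wild_cards": 2,
    "playable_colors": 4,
    "opponent_1_hand_size": 4,
    "opponent_2_hand_size": 4,
}

TOTAL_DIMENSIONS = sum(FEATURE_WIDTHS.values())
```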
Action Space
Same 9 actions as 2-player:
Play Red card
Play Green card
Play Blue card
Play Yellow card
Play Skip
Play Reverse
Play +2
Play Wild +4
Play Wild (choose color)
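For readability, the nine actions can be given names. The sketch below assumes the indices follow the order of the list above, starting at 0; the guide does not confirm this numbering, and `UnoAction` and `action_mask` are hypothetical helpers, not part of the environment.

```python
from enum import IntEnum
from typing import Iterable, List

class UnoAction(IntEnum):
    """Assumed index order; verify against the environment before relying on it."""
    PLAY_RED = 0
    PLAY_GREEN = 1
    PLAY_BLUE = 2
    PLAY_YELLOW = 3
    PLAY_SKIP = 4
    PLAY_REVERSE = 5
    PLAY_DRAW_TWO = 6
    PLAY_WILD_DRAW_FOUR = 7
    PLAY_WILD = 8

def action_mask(valid_actions: Iterable[int]) -> List[bool]:
    """Turn a list of valid action indices into a fixed-length boolean mask."""
    valid = set(valid_actions)
    return [index in valid for index in range(len(UnoAction))]
```

A mask like this could be built from the output of `get_valid_actions()` to filter a policy's choices.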
Turn Direction
The Reverse card changes turn direction:
Clockwise (Normal): 1 → 2 → 3 → 4 → 1 → ...
Counter-clockwise: 1 → 4 → 3 → 2 → 1 → ...
Implementation:
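class MultiplayerUnoEnv:
    def __init__(self, num_players):
        self.num_players = num_players
        self.current_player = 0
        self.direction = 1  # 1 = clockwise, -1 = counter-clockwise

    def _handle_reverse(self):
        # Flip the turn direction
        self.direction *= -1

    def _next_player(self):
        # Python's % never returns a negative value here, so this wraps both ways
        return (self.current_player + self.direction) % self.num_players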
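To see the modular arithmetic in action, the following standalone sketch lists the order of play for a given direction. `turn_sequence` is a hypothetical helper written for illustration; it uses 0-based player indices, so the diagram's players 1-4 correspond to 0-3 here.

```python
from typing import List

def turn_sequence(num_players: int, start: int, direction: int, turns: int) -> List[int]:
    """Return the players who act over `turns` consecutive turns."""
    order = []
    player = start
    for _ in range(turns):
        order.append(player)
        # Same update rule as _next_player: wraps around in either direction
        player = (player + direction) % num_players
    return order

clockwise = turn_sequence(num_players=4, start=0, direction=1, turns=5)
counter_clockwise = turn_sequence(num_players=4, start=0, direction=-1, turns=5)
```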
Running Multiplayer Games
GUI Mode
Launch the multiplayer GUI:
python multiplayer_gui.py --players 4
Or from the main GUI:
Open uno_gui.py
Click “Multiplayer (3-4)”
Select number of players
Programmatic Mode
from src.multiplayer_env import MultiplayerUnoEnv
from sb3_contrib import RecurrentPPO

# Load one model per player slot
models = [
    RecurrentPPO.load("models/selfplay_champion.zip"),
    RecurrentPPO.load("models/best_recurrent_ppo.zip"),
    RecurrentPPO.load("models/optimal_recurrent_ppo.zip"),
    RecurrentPPO.load("models/sb3_recurrent_ppo.zip"),
]

# Create environment
env = MultiplayerUnoEnv(num_players=4)

# Game loop
obs = env.reset()
lstm_states = [None] * 4  # one LSTM state per player
while True:
    player = env.current_player
    action, lstm_states[player] = models[player].predict(
        obs, state=lstm_states[player], deterministic=True
    )
    obs, reward, done, truncated, info = env.step(action)
    if done:
        print(f"Player {info['winner']} wins!")
        break
    if truncated:
        # Stop on truncation as well, so the loop cannot run forever
        print("Game truncated without a winner")
        break
Battle Arena Multiplayer
In the battle arena:
Click 3P or 4P button
Configure a model for each player slot
Run batch evaluation
Compare multi-player statistics
Strategy Considerations
3-Player Strategy
Watch both opponents’ card counts
Reverse is defensive (buys time)
+2/+4 more impactful (one enemy at a time)
Position matters (who’s before/after you)
4-Player Strategy
Track all three opponents
Reverse changes who gets targeted
Alliances form naturally
More variance in outcomes
Special Card Effects
| Card | 2-Player | 3-4 Player |
|---|---|---|
| Reverse | Acts as Skip | Changes direction |
| Skip | Opponent skipped | Next player skipped |
| +2 | Opponent draws 2 | Next player draws 2 |
| +4 | Opponent draws 4 | Next player draws 4 |
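The table can be read as a small decision procedure. The sketch below is one way to express it, under stated assumptions: `TurnOutcome` and `resolve_card` are hypothetical names, cards are plain strings, and players are 0-indexed. The guide does not say whether a player who draws also loses their turn, so the sketch leaves that unmodeled and simply hands the turn to the drawing player.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TurnOutcome:
    next_player: int
    direction: int
    draw_target: Optional[int] = None
    draw_count: int = 0

def resolve_card(card: str, current: int, direction: int, num_players: int) -> TurnOutcome:
    """Apply a special card's effect on turn order and draws."""
    def ahead(hops: int, way: int) -> int:
        return (current + hops * way) % num_players

    if card == "reverse":
        if num_players == 2:
            # Acts as Skip: the only opponent is skipped, so the current player goes again
            return TurnOutcome(next_player=ahead(2, direction), direction=direction)
        flipped = -direction
        return TurnOutcome(next_player=ahead(1, flipped), direction=flipped)
    if card == "skip":
        return TurnOutcome(next_player=ahead(2, direction), direction=direction)
    if card in ("+2", "+4"):
        target = ahead(1, direction)
        # Assumption: the target still takes their turn after drawing
        return TurnOutcome(target, direction, draw_target=target, draw_count=int(card[1]))
    # Ordinary card: play simply passes to the next player
    return TurnOutcome(next_player=ahead(1, direction), direction=direction)
```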
Training Multiplayer Agents
Training in multiplayer requires special considerations:
from src.multiplayer_env import MultiplayerUnoEnv
from sb3_contrib import RecurrentPPO

# Create multiplayer environment
env = MultiplayerUnoEnv(num_players=4)

# Train agent
model = RecurrentPPO(
    "MlpLstmPolicy",
    env,
    verbose=1,
    learning_rate=1e-4,
    n_steps=256,  # Longer episodes
    batch_size=64,
)
model.learn(total_timesteps=2000000)
model.save("models/multiplayer_champion.zip")
Evaluation
Win rate interpretation changes:
2-player: 50% = equal skill
3-player: 33% = equal skill
4-player: 25% = equal skill
A 40% win rate in 4-player is excellent (1.6x expected)!
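The comparison against chance is simple arithmetic: with equally skilled players the expected win rate is 1 / num_players, and the "1.6x" figure is the observed rate divided by that baseline. A minimal sketch, with hypothetical helper names:

```python
def chance_win_rate(num_players: int) -> float:
    """Expected win rate when all players are equally skilled."""
    return 1.0 / num_players

def relative_win_rate(win_rate: float, num_players: int) -> float:
    """How many times better than chance an observed win rate is."""
    return win_rate / chance_win_rate(num_players)

# 40% in a 4-player game: 0.40 / 0.25
ratio_4p = relative_win_rate(0.40, 4)
```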
API Reference
MultiplayerUnoEnv
class MultiplayerUnoEnv(gym.Env):
    """
    UNO environment supporting 2-4 players.

    Parameters
    ----------
    num_players : int
        Number of players (2-4)

    Attributes
    ----------
    current_player : int
        Index of current player (0 to num_players-1)
    direction : int
        Turn direction (1=clockwise, -1=counter-clockwise)
    hands : List[List[Card]]
        Cards in each player's hand
    """

    def reset(self):
        """Reset game and return initial observation."""

    def step(self, action):
        """Execute action and return (obs, reward, done, truncated, info)."""

    def get_valid_actions(self):
        """Return list of valid action indices."""
Known Limitations
Current multiplayer implementation notes:
Human Player: Always player 0 in GUI
Training: Agents trained in 2-player may underperform in 4-player
Observation: Can see opponent hand sizes but not specific cards
Stacking: +2/+4 stacking is not implemented (official UNO rules do not allow stacking)
Future Improvements
Planned enhancements:
[ ] Online multiplayer support
[ ] Mixed human/AI games
[ ] Tournament mode
[ ] Elo rating system
[ ] Replay saving/loading