Architecture
This document describes the overall architecture of the UNO Card Game RL project.
Project Structure
uno-card-game-rl/
├── src/ # Core source code
│ ├── __init__.py
│ ├── game.py # UNO game engine
│ ├── cards.py # Card definitions
│ ├── players.py # Player classes
│ ├── turn.py # Turn management
│ ├── agents.py # RL agent wrappers
│ ├── dqn_agent.py # DQN implementation
│ ├── sb3_agent.py # Stable-Baselines3 agents
│ ├── state_action_reward.py # RL interface
│ ├── utils.py # Utilities
│ └── multiplayer_env.py # Multiplayer environment
│
├── training/ # Training scripts
│ └── train_selfplay.py # Self-play training
│
├── models/ # Saved models
│ ├── selfplay_champion.zip
│ ├── best_recurrent_ppo.zip
│ └── ...
│
├── docs/ # Documentation
│ ├── conf.py
│ ├── index.rst
│ └── ...
│
├── logs/ # Training logs
├── assets/ # Data files
├── comparison_results/ # Evaluation results
│
├── uno_gui.py # Main game GUI
├── model_battle_gui.py # Battle arena GUI
├── multiplayer_gui.py # Multiplayer GUI
│
├── train_rl.py # General training script
├── compare_models.py # Model comparison
├── config.py # Configuration
├── run.py # Quick run script
└── requirements.txt # Dependencies
Core Components
Game Engine
The game engine (src/game.py) handles all UNO game logic:
```python
class UnoGame:
    """
    Main UNO game controller.

    Manages:
    - Deck and discard pile
    - Player hands
    - Turn order
    - Win conditions
    """

    def play_card(self, player, card):
        """Execute a card play."""

    def draw_card(self, player):
        """Player draws from deck."""

    def get_winner(self):
        """Return winner if game over, else None."""
```
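Turn-order bookkeeping is one of the engine's responsibilities. As a minimal sketch, it can be written as a pure function. This is an illustration, not the code in src/game.py: `next_player` and its parameters are hypothetical, and the function assumes standard UNO rules. Under those rules, reverse flips direction, skip/draw2/draw4 make the next player lose their turn, and reverse acts like skip in a two-player game.

```python
def next_player(current: int, direction: int, num_players: int, value: str) -> tuple[int, int]:
    """Return (next_player_index, new_direction) after a card with `value` is played.

    Hypothetical helper assuming standard UNO turn rules.
    `direction` is +1 (clockwise) or -1 (counter-clockwise).
    """
    steps = 1
    if value == "reverse":
        direction = -direction
        if num_players == 2:
            steps = 2  # with two players, reverse behaves like skip
    elif value in ("skip", "draw2", "draw4"):
        steps = 2  # the next player loses their turn
    # Python's % keeps the index in range even when direction is negative
    return (current + steps * direction) % num_players, direction
```

Because the function returns the new direction rather than mutating game state, each rule can be unit-tested in isolation.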
Card System
Cards (src/cards.py) are represented as:
```python
from dataclasses import dataclass


@dataclass
class Card:
    color: str  # 'red', 'green', 'blue', 'yellow', 'wild'
    value: str  # '0'-'9', 'skip', 'reverse', 'draw2', 'wild', 'draw4'


class Deck:
    """Standard 108-card UNO deck."""

    def shuffle(self): ...

    def draw(self): ...
```
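The 108-card total follows from the standard UNO composition. Each color has one 0, two each of 1-9, and two each of skip, reverse and draw2 (25 cards × 4 colors = 100). The deck adds four wild and four wild draw4 cards. The following minimal sketch uses the Card fields shown above. `build_deck`, the seeded `random.Random`, and drawing from the end of a list are assumptions, not necessarily how src/cards.py is written.

```python
import random
from dataclasses import dataclass

COLORS = ["red", "green", "blue", "yellow"]


@dataclass
class Card:
    color: str
    value: str


def build_deck() -> list[Card]:
    """Hypothetical builder for a standard 108-card UNO deck."""
    deck = []
    for color in COLORS:
        deck.append(Card(color, "0"))  # a single zero per color
        for value in [str(n) for n in range(1, 10)] + ["skip", "reverse", "draw2"]:
            deck.extend([Card(color, value), Card(color, value)])  # two of each
    for value in ("wild", "draw4"):
        deck.extend(Card("wild", value) for _ in range(4))  # four of each wild type
    return deck


class Deck:
    def __init__(self, seed=None):
        self.cards = build_deck()
        self._rng = random.Random(seed)  # seedable for reproducible games

    def shuffle(self):
        self._rng.shuffle(self.cards)

    def draw(self):
        return self.cards.pop()  # the end of the list is the top of the deck
```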
RL Environment
The Gymnasium environment is defined in src/state_action_reward.py:
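```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces


class UnoEnv(gym.Env):
    """
    UNO as a Gymnasium environment.

    Observation: 17-dim vector
    Actions: 9 discrete actions
    Reward: +1 win, -1 loss, 0 ongoing
    """

    observation_space = spaces.Box(low=0.0, high=1.0, shape=(17,), dtype=np.float32)
    action_space = spaces.Discrete(9)

    def step(self, action):
        # Execute action; return (obs, reward, terminated, truncated, info)
        ...

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random
        # Start a new game; return (initial_obs, info)
        ...
```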
Agent Wrappers
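Agents (src/agents.py) provide a unified interface:

```python
import numpy as np
from sb3_contrib import RecurrentPPO  # RecurrentPPO ships in sb3-contrib

class RLAgent:
    """Base class for all RL agents."""
    def select_action(self, obs, valid_actions):
        """Return action index."""
        raise NotImplementedError

class SB3Agent(RLAgent):
    """Wrapper for Stable-Baselines3 models."""
    def __init__(self, model_path):
        self.model = RecurrentPPO.load(model_path)
        self.state = None  # LSTM state; reset to None when a new game starts
        self.episode_start = np.ones((1,), dtype=bool)
    def select_action(self, obs, valid_actions):
        # valid_actions is unused here: the policy itself applies no action masking
        action, self.state = self.model.predict(
            obs, state=self.state, episode_start=self.episode_start)
        self.episode_start = np.zeros((1,), dtype=bool)
        return int(action)
```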
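The following hypothetical baseline illustrates the `select_action` contract. It chooses uniformly among the valid actions. `RandomAgent` is not described as part of src/agents.py. The base class is repeated so the snippet runs standalone, and `valid_actions` is assumed to be a sequence of action indices in 0-8.

```python
import random


class RLAgent:
    """Base class for all RL agents."""

    def select_action(self, obs, valid_actions):
        """Return action index."""
        raise NotImplementedError


class RandomAgent(RLAgent):
    """Hypothetical baseline: uniform choice over the legal actions."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def select_action(self, obs, valid_actions):
        if not valid_actions:
            raise ValueError("select_action needs at least one valid action")
        return self.rng.choice(list(valid_actions))
```

A baseline like this can serve as a sanity-check opponent when comparing trained models.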
Data Flow
Training Flow
```text
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ Environment │ ←── │    Agent    │ ←── │    Model    │
│  (UnoEnv)   │     │ (SB3Agent)  │     │  (RecPPO)   │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       │    observation    │                   │
       │ ─────────────────→│    observation    │
       │                   │ ─────────────────→│
       │                   │                   │
       │                   │      action       │
       │      action       │ ←─────────────────│
       │ ←─────────────────│                   │
       │                   │                   │
       │   reward, done    │                   │
       │ ─────────────────→│    experience     │
       │                   │ ─────────────────→│
       └───────────────────┴───────────────────┘
```
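The loop in the diagram (observation → action → reward/done → stored experience) can be sketched against the Gymnasium five-tuple step API. `CoinFlipEnv` is a hypothetical stand-in for UnoEnv that keeps the snippet standalone. It has the same 17-dim observation and a sparse ±1 reward at game end. A plain list stands in for the model's rollout buffer, since RecurrentPPO performs this collection internally during `learn()`.

```python
import random


class CoinFlipEnv:
    """Hypothetical stand-in for UnoEnv: 17-dim obs, 9 actions, ±1 at game end."""

    def __init__(self, max_turns=5, seed=None):
        self.max_turns = max_turns
        self.rng = random.Random(seed)

    def reset(self, *, seed=None, options=None):
        if seed is not None:
            self.rng.seed(seed)
        self.turn = 0
        return [0.0] * 17, {}  # (observation, info), as in Gymnasium

    def step(self, action):
        self.turn += 1
        terminated = self.turn >= self.max_turns
        reward = self.rng.choice([1.0, -1.0]) if terminated else 0.0
        return [0.0] * 17, reward, terminated, False, {}


def collect_episode(env, policy):
    """Play one game; return a list of (obs, action, reward, next_obs, done)."""
    experience = []
    obs, _info = env.reset()
    done = False
    while not done:
        action = policy(obs)
        next_obs, reward, terminated, truncated, _info = env.step(action)
        done = terminated or truncated
        experience.append((obs, action, reward, next_obs, done))
        obs = next_obs
    return experience
```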
Inference Flow
User Input → GUI → Agent.select_action() → Environment.step() → GUI Update
State Representation
17-Dimensional Observation
| Index | Feature | Range |
|---|---|---|
| 0-3 | Open card color (one-hot) | [0, 1] |
| 4-7 | Number cards per color | [0, 1] normalized |
| 8-10 | Special cards (Skip/Rev/+2) | [0, 1] normalized |
| 11-12 | Wild cards count | [0, 1] normalized |
| 13-16 | Playable colors | [0, 1] |
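The following sketch shows how the 17 features could be assembled from a hand and the open card, following the table's layout. Several details are assumptions rather than the project's documented encoding. Counts are normalized by dividing by hand size. Indices 8-10 hold one value per special type and 11-12 one per wild type. A color counts as playable when the hand holds a card of that color matching the active color or the open card's value. `encode_observation` is a hypothetical name, and cards are (color, value) tuples using the strings from the Card System section.

```python
COLORS = ["red", "green", "blue", "yellow"]
SPECIALS = ["skip", "reverse", "draw2"]
WILDS = ["wild", "draw4"]


def encode_observation(hand, active_color, open_value):
    """Build the 17-dim vector.

    hand: list of (color, value) tuples
    active_color: color in force (the chosen color if a wild is on top)
    open_value: value of the open card, e.g. '5' or 'skip'
    """
    n = max(len(hand), 1)  # avoid division by zero on an empty hand
    obs = [0.0] * 17
    obs[COLORS.index(active_color)] = 1.0  # 0-3: open card color (one-hot)
    for color, value in hand:
        if value.isdigit():
            obs[4 + COLORS.index(color)] += 1 / n  # 4-7: number cards per color
        elif value in SPECIALS:
            obs[8 + SPECIALS.index(value)] += 1 / n  # 8-10: skip / reverse / draw2
        elif value in WILDS:
            obs[11 + WILDS.index(value)] += 1 / n  # 11-12: wild / draw4
        if color in COLORS and (color == active_color or value == open_value):
            obs[13 + COLORS.index(color)] = 1.0  # 13-16: playable colors
    return obs
```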
Action Encoding
| Index | Action | Description |
|---|---|---|
| 0 | RED | Play any red card |
| 1 | GREEN | Play any green card |
| 2 | BLUE | Play any blue card |
| 3 | YELLOW | Play any yellow card |
| 4 | SKIP | Play skip card |
| 5 | REVERSE | Play reverse card |
| 6 | DRAW2 | Play +2 card |
| 7 | DRAW4 | Play wild +4 |
| 8 | WILD | Play wild card |
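Before each step, the action table implies a legality check. The following hedged sketch derives the valid action indices from a hand. The overlap between color and special actions is not spelled out above, so the sketch assumes that actions 0-3 cover a color's number cards and actions 4-6 cover a matching skip/reverse/draw2 of any color. It also treats both wild actions as always legal, ignoring the official Wild Draw Four restriction. `valid_actions` is a hypothetical helper and may not match the project's logic.

```python
# Index -> name, in the order of the action table
ACTIONS = ["RED", "GREEN", "BLUE", "YELLOW", "SKIP", "REVERSE", "DRAW2", "DRAW4", "WILD"]
COLORS = ["red", "green", "blue", "yellow"]
SPECIAL_ACTIONS = {"skip": 4, "reverse": 5, "draw2": 6}


def valid_actions(hand, active_color, open_value):
    """Return sorted legal action indices; `hand` is a list of (color, value) tuples."""
    legal = set()
    for color, value in hand:
        if value == "draw4":
            legal.add(7)  # wilds are playable on anything (assumption)
        elif value == "wild":
            legal.add(8)
        elif color == active_color or value == open_value:
            if value.isdigit():
                legal.add(COLORS.index(color))  # 0-3: number card of that color
            else:
                legal.add(SPECIAL_ACTIONS[value])  # 4-6: skip / reverse / draw2
    return sorted(legal)
```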
Reward Structure
```python
def get_reward(self, done, winner):
    if not done:
        return 0.0  # Game ongoing
    if winner == self.agent_player:
        return 1.0  # Win
    return -1.0  # Loss
```
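The reward is zero until the final move, so early moves receive credit only through discounting. The following short sketch computes per-step discounted returns G_t = r_t + γ·G_{t+1} for one episode. The discount factor `gamma` is an assumed hyperparameter here. PPO's actual advantage estimate (GAE) also relies on the value head.

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} backwards over one episode."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]
```

For example, with gamma=0.99 a win at the end of a 30-move game is worth 0.99**29 ≈ 0.75 at the first move.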
Neural Network Architecture
RecurrentPPO Network
```text
         ┌──────────────────┐
         │   Observation    │
         │     (17 dim)     │
         └────────┬─────────┘
                  │
         ┌────────▼─────────┐
         │       LSTM       │
         │   (256 hidden)   │
         │                  │
         │   h_t = LSTM(    │
         │    x_t, h_{t-1}) │
         └────────┬─────────┘
                  │
         ┌────────▼─────────┐
         │    MLP Layers    │
         │  256 → 128 → 64  │
         │      (ReLU)      │
         └────────┬─────────┘
                  │
            ┌─────┴─────┐
            │           │
         ┌──▼───┐   ┌───▼──┐
         │Policy│   │Value │
         │(9dim)│   │(1dim)│
         └──────┘   └──────┘
```
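The h_t = LSTM(x_t, h_{t-1}) box expands to the standard LSTM cell equations. Here x_t is the 17-dim observation, h_t and c_t are the 256-dim hidden and cell states, σ is the logistic sigmoid, and ⊙ is the elementwise product. Each input matrix W is therefore 256×17 and each recurrent matrix U is 256×256.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```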
Why LSTM?
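A single 17-dim observation describes only the current position, so the LSTM is what carries information across turns. It enables:
Memory: Track cards played earlier in the game
Inference: Estimate the opponent's likely hand from the play history
Strategy: Remember opponent patterns
Context: Follow how the game state evolves over time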
GUI Architecture
All GUIs use Pygame with a similar structure:
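```python
import pygame

class GameGUI:
    def __init__(self):
        pygame.init()
        self.screen = pygame.display.set_mode((1280, 720))
        self.clock = pygame.time.Clock()
        self.running = True  # cleared when the window is closed

    def handle_events(self):
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                self.running = False

    def update(self): ...  # advance game state (per-GUI)
    def draw(self): ...    # render the frame (per-GUI)

    def run(self):
        while self.running:
            self.handle_events()
            self.update()
            self.draw()
            pygame.display.flip()
            self.clock.tick(60)  # cap the frame rate at 60 FPS
        pygame.quit()
```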
Component Hierarchy
GameGUI
├── MenuScreen
│ ├── ModelSelector
│ └── Buttons
├── GameScreen
│ ├── CardRenderer
│ ├── DeckDisplay
│ └── ActionLog
└── EndScreen
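GameGUI could switch among its three top-level screens with a small state machine. The following is a hypothetical sketch. The event names ('start', 'game_over', 'play_again', 'menu') are invented for illustration and are not taken from the GUI code.

```python
from enum import Enum, auto


class Screen(Enum):
    MENU = auto()
    GAME = auto()
    END = auto()


# (current screen, event) -> next screen; unknown pairs leave the screen unchanged
TRANSITIONS = {
    (Screen.MENU, "start"): Screen.GAME,
    (Screen.GAME, "game_over"): Screen.END,
    (Screen.END, "play_again"): Screen.GAME,
    (Screen.END, "menu"): Screen.MENU,
}


def next_screen(current: Screen, event: str) -> Screen:
    return TRANSITIONS.get((current, event), current)
```

With a table like this, GameGUI's update and draw steps can simply delegate to whichever screen is active.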
Extension Points
Adding New Algorithms
Create an agent wrapper in src/agents.py
Add a training script
Register it in config.py
Add it to the GUI model list
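Registering a new algorithm can be as simple as mapping a name to a loader, so that training scripts and GUIs can look it up. The following is a hypothetical sketch of such a registry. The actual contents of config.py are not shown here, so `MODEL_REGISTRY`, `register_model`, `load_agent` and the placeholder loader are illustrative names only.

```python
from typing import Callable, Dict

MODEL_REGISTRY: Dict[str, Callable[[str], object]] = {}


def register_model(name: str):
    """Decorator that records a loader function under `name`."""
    def decorator(loader: Callable[[str], object]):
        if name in MODEL_REGISTRY:
            raise KeyError(f"model '{name}' is already registered")
        MODEL_REGISTRY[name] = loader
        return loader
    return decorator


@register_model("my_new_algo")
def load_my_new_algo(path: str):
    # Placeholder: a real loader would build an RLAgent subclass from `path`
    return {"algo": "my_new_algo", "path": path}


def load_agent(name: str, path: str):
    return MODEL_REGISTRY[name](path)
```

A GUI model list could then be populated from `sorted(MODEL_REGISTRY)`.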
Adding New Features
Extend the UnoEnv observation/action space
Update the UnoGame logic
Modify GUI rendering
Update documentation
Testing
Unit tests are in tests/ and can be run with:

```bash
pytest tests/ -v
```
Test coverage includes:
Card mechanics
Game rules
Environment interface
Agent behavior
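A test in this suite might look like the following sketch, which checks the rules from the Reward Structure section. `RewardStub` and the test name are hypothetical, and a real test would exercise UnoEnv from src/state_action_reward.py instead of a stub. pytest collects any `test_*` function and reports failing plain `assert` statements.

```python
class RewardStub:
    """Minimal stand-in exposing get_reward() with the documented rules."""

    def __init__(self, agent_player):
        self.agent_player = agent_player

    def get_reward(self, done, winner):
        if not done:
            return 0.0  # Game ongoing
        if winner == self.agent_player:
            return 1.0  # Win
        return -1.0  # Loss


def test_reward_structure():
    env = RewardStub(agent_player=0)
    assert env.get_reward(done=False, winner=None) == 0.0
    assert env.get_reward(done=True, winner=0) == 1.0
    assert env.get_reward(done=True, winner=1) == -1.0
```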