Architecture

This document describes the overall architecture of the UNO Card Game RL project.

Project Structure

uno-card-game-rl/
├── src/                    # Core source code
│   ├── __init__.py
│   ├── game.py            # UNO game engine
│   ├── cards.py           # Card definitions
│   ├── players.py         # Player classes
│   ├── turn.py            # Turn management
│   ├── agents.py          # RL agent wrappers
│   ├── dqn_agent.py       # DQN implementation
│   ├── sb3_agent.py       # Stable-Baselines3 agents
│   ├── state_action_reward.py  # RL interface
│   ├── utils.py           # Utilities
│   └── multiplayer_env.py # Multiplayer environment
│
├── training/              # Training scripts
│   └── train_selfplay.py  # Self-play training
│
├── models/                # Saved models
│   ├── selfplay_champion.zip
│   ├── best_recurrent_ppo.zip
│   └── ...
│
├── docs/                  # Documentation
│   ├── conf.py
│   ├── index.rst
│   └── ...
│
├── logs/                  # Training logs
├── assets/                # Data files
├── comparison_results/    # Evaluation results
│
├── uno_gui.py            # Main game GUI
├── model_battle_gui.py   # Battle arena GUI
├── multiplayer_gui.py    # Multiplayer GUI
│
├── train_rl.py           # General training script
├── compare_models.py     # Model comparison
├── config.py             # Configuration
├── run.py                # Quick run script
└── requirements.txt      # Dependencies

Core Components

Game Engine

The game engine (src/game.py) handles all UNO game logic:

class UnoGame:
    """
    Main UNO game controller.

    Manages:
    - Deck and discard pile
    - Player hands
    - Turn order
    - Win conditions
    """

    def play_card(self, player, card):
        """Execute a card play."""

    def draw_card(self, player):
        """Player draws from deck."""

    def get_winner(self):
        """Return winner if game over, else None."""

Card System

Cards (src/cards.py) are represented as:

from dataclasses import dataclass

@dataclass
class Card:
    color: str      # 'red', 'green', 'blue', 'yellow', 'wild'
    value: str      # '0'-'9', 'skip', 'reverse', 'draw2', 'wild', 'draw4'

class Deck:
    """Standard 108-card UNO deck."""

    def shuffle(self): ...
    def draw(self): ...
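
As an illustration of what "standard 108-card deck" means, the sketch below builds one from the usual UNO composition: per color, one 0 and two each of 1-9, Skip, Reverse and Draw Two, plus four Wild and four Wild Draw Four cards. The `build_deck` helper and the frozen `Card` variant are hypothetical and not taken from src/cards.py.

```python
import random
from dataclasses import dataclass

COLORS = ("red", "green", "blue", "yellow")


@dataclass(frozen=True)
class Card:
    color: str
    value: str


def build_deck() -> list[Card]:
    """Build an unshuffled standard UNO deck."""
    deck = []
    for color in COLORS:
        deck.append(Card(color, "0"))  # one zero per color
        for value in [str(n) for n in range(1, 10)] + ["skip", "reverse", "draw2"]:
            deck.extend([Card(color, value)] * 2)  # two of everything else
    deck.extend([Card("wild", "wild")] * 4)
    deck.extend([Card("wild", "draw4")] * 4)
    return deck


deck = build_deck()
random.Random(0).shuffle(deck)  # seeded only to make the example repeatable
print(len(deck))  # 108
```

Each color contributes 25 cards (1 + 12 × 2), so the four colors give 100 and the eight wilds bring the total to 108.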

RL Environment

The Gymnasium environment (src/state_action_reward.py):

import gymnasium as gym
from gymnasium import spaces

class UnoEnv(gym.Env):
    """
    UNO as a Gymnasium environment.

    Observation: 17-dim vector
    Actions: 9 discrete actions
    Reward: +1 win, -1 loss, 0 ongoing
    """

    observation_space = spaces.Box(low=0, high=1, shape=(17,))
    action_space = spaces.Discrete(9)

    def step(self, action):
        """Execute action, return (obs, reward, terminated, truncated, info)."""

    def reset(self, seed=None, options=None):
        """Start a new game, return (initial observation, info)."""
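
The reset/step contract above is easiest to see in an episode loop. The sketch below uses a toy stand-in environment written in plain Python so that it runs without Gymnasium installed; `ToyEnv` and `run_episode` are hypothetical names, and the toy dynamics (random termination, random win or loss) have nothing to do with real UNO rules. With the real environment, the loop body would presumably look the same.

```python
import random


class ToyEnv:
    """Stand-in with the same reset/step shape as a Gymnasium env (not UnoEnv)."""

    def __init__(self, max_steps=20):
        self.max_steps = max_steps
        self.rng = random.Random()
        self.t = 0

    def reset(self, seed=None, options=None):
        self.rng = random.Random(seed)
        self.t = 0
        return [0.0] * 17, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        terminated = self.rng.random() < 0.1   # pretend the game ended
        truncated = self.t >= self.max_steps    # step limit reached
        reward = self.rng.choice([1.0, -1.0]) if terminated else 0.0
        return [0.0] * 17, reward, terminated, truncated, {}


def run_episode(env, policy, seed=None):
    """Play one episode and return (total reward, number of steps)."""
    obs, info = env.reset(seed=seed)
    total, steps = 0.0, 0
    while True:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        steps += 1
        if terminated or truncated:
            return total, steps


total, steps = run_episode(ToyEnv(), policy=lambda obs: random.randrange(9), seed=0)
print(total, steps)
```

Keeping `terminated` (the game ended) separate from `truncated` (a step limit cut the episode short) matters for value bootstrapping, which is why the loop checks both.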

Agent Wrappers

Agents (src/agents.py) provide a unified interface:

from sb3_contrib import RecurrentPPO

class RLAgent:
    """Base class for all RL agents."""

    def select_action(self, obs, valid_actions):
        """Return action index."""

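class SB3Agent(RLAgent):
    """Wrapper for Stable-Baselines3 models."""

    def __init__(self, model_path):
        self.model = RecurrentPPO.load(model_path)
        self.state = None  # LSTM state; None makes predict() start fresh

    def select_action(self, obs, valid_actions):
        # valid_actions is not used here; the model's choice is returned as-is
        action, self.state = self.model.predict(obs, state=self.state)
        return int(action)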
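
The interface passes `valid_actions` to every agent, but the wrapper shown above does not say what happens when a model proposes an action that is not valid. One possible approach, sketched here under that assumption, is a wrapper that falls back to a random valid action. `RandomAgent`, `ValidActionFallback` and `AlwaysRed` are hypothetical classes for illustration, and the sketch assumes `valid_actions` is never empty.

```python
import random


class RLAgent:
    """Minimal copy of the interface described above."""

    def select_action(self, obs, valid_actions):
        raise NotImplementedError


class RandomAgent(RLAgent):
    """Baseline that picks uniformly among the valid actions."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def select_action(self, obs, valid_actions):
        return self.rng.choice(list(valid_actions))


class ValidActionFallback(RLAgent):
    """Wraps another agent; replaces invalid choices with a random valid one."""

    def __init__(self, inner, seed=None):
        self.inner = inner
        self.rng = random.Random(seed)

    def select_action(self, obs, valid_actions):
        action = self.inner.select_action(obs, valid_actions)
        if action in valid_actions:
            return action
        return self.rng.choice(list(valid_actions))


class AlwaysRed(RLAgent):
    """Toy agent that always proposes action 0 (RED)."""

    def select_action(self, obs, valid_actions):
        return 0


agent = ValidActionFallback(AlwaysRed(), seed=0)
print(agent.select_action([0.0] * 17, valid_actions=[0, 4]))  # 0, already valid
print(agent.select_action([0.0] * 17, valid_actions=[2, 8]))  # 2 or 8
```

A random baseline like `RandomAgent` is also a convenient opponent when evaluating trained models.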

Data Flow

Training Flow

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ Environment │ ←── │   Agent     │ ←── │   Model     │
│  (UnoEnv)   │     │ (SB3Agent)  │     │ (RecPPO)    │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       │    observation    │                   │
       │ ─────────────────→│   observation     │
       │                   │ ─────────────────→│
       │                   │                   │
       │                   │      action       │
       │      action       │ ←─────────────────│
       │ ←─────────────────│                   │
       │                   │                   │
       │ reward, done      │                   │
       │ ─────────────────→│   experience      │
       │                   │ ─────────────────→│
       └───────────────────┴───────────────────┘

Inference Flow

User Input → GUI → Agent.select_action() → Environment.step() → GUI Update

State Representation

17-Dimensional Observation

Index	Feature	Range
0-3	Open card color (one-hot)	[0, 1]
4-7	Number cards per color	[0, 1] normalized
8-10	Special cards (Skip/Rev/+2)	[0, 1] normalized
11-12	Wild cards count	[0, 1] normalized
13-16	Playable colors	[0, 1]
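
The table above can be read as an encoding function from a hand and the open card to a 17-element vector. The sketch below is one plausible reading, not the project's implementation, and rests on several assumptions: the hand is a list of (color, value) tuples, counts are normalized by dividing by a cap of 4 and clipping at 1, indices 11-12 are Wild and Wild Draw Four in that order, and a "playable color" means the hand holds a card of that color that matches the open card's color or value.

```python
COLORS = ("red", "green", "blue", "yellow")
SPECIALS = ("skip", "reverse", "draw2")
WILDS = ("wild", "draw4")
CAP = 4  # assumed normalization cap: counts of CAP or more map to 1.0


def norm(count):
    return min(count / CAP, 1.0)


def encode(hand, open_color, open_value):
    """Encode a hand and the open card as a 17-element list of floats in [0, 1].

    hand: list of (color, value) tuples; open_color is the color in force.
    """
    obs = [1.0 if open_color == c else 0.0 for c in COLORS]            # 0-3
    for c in COLORS:                                                    # 4-7
        obs.append(norm(sum(1 for col, v in hand if col == c and v.isdigit())))
    for s in SPECIALS:                                                  # 8-10
        obs.append(norm(sum(1 for _, v in hand if v == s)))
    for w in WILDS:                                                     # 11-12
        obs.append(norm(sum(1 for col, v in hand if col == "wild" and v == w)))
    for c in COLORS:                                                    # 13-16
        playable = any(
            col == c and (col == open_color or v == open_value)
            for col, v in hand
        )
        obs.append(1.0 if playable else 0.0)
    return obs


hand = [("red", "3"), ("red", "skip"), ("blue", "5"), ("wild", "draw4")]
obs = encode(hand, open_color="red", open_value="5")
print(len(obs))  # 17
print(obs)
```

In this example red is playable by color and blue by value (the blue 5 on an open 5), so indices 13 and 15 are set.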

Action Encoding

Index	Action	Description
0	RED	Play any red card
1	GREEN	Play any green card
2	BLUE	Play any blue card
3	YELLOW	Play any yellow card
4	SKIP	Play skip card
5	REVERSE	Play reverse card
6	DRAW2	Play +2 card
7	DRAW4	Play wild +4
8	WILD	Play wild card
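
Because each action names a category rather than a specific card, something has to map an action index back to concrete cards in the hand. The sketch below shows one way to do that with an `IntEnum` mirroring the table. The `candidates` helper is hypothetical, and it assumes that the four color actions refer to number cards of that color, which the table does not state explicitly. It does not check whether a candidate is legal on the current open card.

```python
from enum import IntEnum


class Action(IntEnum):
    RED = 0
    GREEN = 1
    BLUE = 2
    YELLOW = 3
    SKIP = 4
    REVERSE = 5
    DRAW2 = 6
    DRAW4 = 7
    WILD = 8


def candidates(action, hand):
    """Return the cards in `hand` (list of (color, value)) that `action` could mean."""
    action = Action(action)
    if action <= Action.YELLOW:
        color = action.name.lower()
        # Assumption: color actions refer to number cards of that color.
        return [c for c in hand if c[0] == color and c[1].isdigit()]
    if action == Action.WILD:
        return [c for c in hand if c == ("wild", "wild")]
    value = action.name.lower()  # 'skip', 'reverse', 'draw2', 'draw4'
    return [c for c in hand if c[1] == value]


hand = [("red", "3"), ("red", "9"), ("blue", "skip"), ("wild", "draw4")]
print(candidates(0, hand))  # [('red', '3'), ('red', '9')]
print(candidates(4, hand))  # [('blue', 'skip')]
print(candidates(8, hand))  # []
```

When several cards match, the environment still needs a tie-breaking rule (for example, the first match); that choice is outside what the action table specifies.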

Reward Structure

def get_reward(self, done, winner):
    if not done:
        return 0.0  # Game ongoing
    if winner == self.agent_player:
        return 1.0  # Win
    return -1.0  # Loss
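
Since every step before the end of the game yields 0, the only learning signal is the terminal +1 or -1, discounted back through the episode. The sketch below makes that concrete by computing the discounted return at each step. The discount factor `gamma` is an assumed hyperparameter, as the source does not state the value used in training.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step, computed backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A 4-step game the agent wins on the last step.
rewards = [0.0, 0.0, 0.0, 1.0]
print(discounted_returns(rewards, gamma=0.5))  # [0.125, 0.25, 0.5, 1.0]
```

The earlier a move is made, the weaker the signal it receives, which is one reason sparse-reward games tend to need many episodes of training.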

Neural Network Architecture

RecurrentPPO Network

┌──────────────────┐
│  Observation     │
│    (17 dim)      │
└────────┬─────────┘
         │
┌────────▼─────────┐
│      LSTM        │
│   (256 hidden)   │
│                  │
│  h_t = LSTM(     │
│    x_t, h_{t-1}) │
└────────┬─────────┘
         │
┌────────▼─────────┐
│    MLP Layers    │
│  256 → 128 → 64  │
│     (ReLU)       │
└────────┬─────────┘
         │
   ┌─────┴─────┐
   │           │
┌──▼───┐   ┌───▼──┐
│Policy│   │Value │
│(9dim)│   │(1dim)│
└──────┘   └──────┘
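
To give a sense of scale for the network drawn above, the sketch below counts trainable parameters. It assumes a single shared LSTM feeding one MLP trunk exactly as in the diagram, PyTorch's LSTM parameterization (two bias vectors per gate), and that the "256" in the MLP row is the LSTM output width rather than an extra layer. The actual model may be laid out differently, for example with separate actor and critic branches, so the total is illustrative only.

```python
def lstm_params(input_size, hidden_size):
    """Parameters of a single-layer LSTM with PyTorch's layout.

    Four gates, each with input weights, recurrent weights and two bias vectors.
    """
    return 4 * (input_size * hidden_size + hidden_size * hidden_size + 2 * hidden_size)


def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


counts = {
    "lstm": lstm_params(17, 256),
    "mlp_256_128": linear_params(256, 128),
    "mlp_128_64": linear_params(128, 64),
    "policy_head": linear_params(64, 9),
    "value_head": linear_params(64, 1),
}
for name, n in counts.items():
    print(f"{name:12s} {n:>8,d}")
print(f"{'total':12s} {sum(counts.values()):>8,d}")  # 323,402
```

Under these assumptions the LSTM accounts for 281,600 of the 323,402 parameters, so the recurrent layer dominates the model size.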

Why LSTM?

LSTM enables:

- Memory: Track cards played earlier in game
- Inference: Deduce opponent’s hand from play history
- Strategy: Remember opponent patterns
- Context: Understand game state evolution

GUI Architecture

All GUIs use Pygame with a similar structure:

import pygame

class GameGUI:
    def __init__(self):
        pygame.init()
        self.screen = pygame.display.set_mode((1280, 720))
        self.clock = pygame.time.Clock()
        self.running = True  # run() loops until this is set to False

    def run(self):
        while self.running:
            self.handle_events()
            self.update()
            self.draw()
            pygame.display.flip()
            self.clock.tick(60)

Component Hierarchy

GameGUI
├── MenuScreen
│   ├── ModelSelector
│   └── Buttons
├── GameScreen
│   ├── CardRenderer
│   ├── DeckDisplay
│   └── ActionLog
└── EndScreen

Extension Points

Adding New Algorithms

- Create agent wrapper in src/agents.py
- Add training script
- Register in config.py
- Add to GUI model list
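
The "register" and "add to GUI model list" steps above could be served by a simple name-to-class registry. The sketch below is a hypothetical design, not a description of what config.py actually contains; `AGENT_REGISTRY`, `register_agent`, `make_agent` and `FirstValidAgent` are all invented for illustration.

```python
# Hypothetical registry; the real config.py may organize this differently.
AGENT_REGISTRY = {}


def register_agent(name):
    """Class decorator that records an agent class under `name`."""
    def decorator(cls):
        if name in AGENT_REGISTRY:
            raise ValueError(f"agent {name!r} is already registered")
        AGENT_REGISTRY[name] = cls
        return cls
    return decorator


@register_agent("first_valid")
class FirstValidAgent:
    """Toy agent: always plays the lowest-numbered valid action."""

    def select_action(self, obs, valid_actions):
        return min(valid_actions)


def make_agent(name, **kwargs):
    """Instantiate a registered agent by name, e.g. from a GUI model list."""
    return AGENT_REGISTRY[name](**kwargs)


agent = make_agent("first_valid")
print(sorted(AGENT_REGISTRY))                      # ['first_valid']
print(agent.select_action([0.0] * 17, [3, 5, 8]))  # 3
```

With a registry like this, a GUI could populate its model list from `sorted(AGENT_REGISTRY)` instead of hard-coding agent names.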

Adding New Features

- Extend UnoEnv observation/action space
- Update UnoGame logic
- Modify GUI rendering
- Update documentation

Testing

Unit tests live in tests/ and run with pytest:

pytest tests/ -v

Test coverage includes:

- Card mechanics
- Game rules
- Environment interface
- Agent behavior
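
As a sketch of what a "game rules" test might look like, the example below checks the standard UNO matching rule: a card is playable if it is wild, or if it matches the color in force or the value of the open card. The `is_playable` helper is a hypothetical stand-in written for this example, not the project's function, and it ignores the official restriction on Wild Draw Four. pytest collects any function whose name starts with `test_`.

```python
def is_playable(card, active_color, top_value):
    """Standard UNO matching rule for a (color, value) tuple."""
    color, value = card
    if color == "wild":
        return True  # the official Wild Draw 4 restriction is not modeled
    return color == active_color or value == top_value


def test_same_color_is_playable():
    assert is_playable(("red", "skip"), active_color="red", top_value="5")


def test_same_value_is_playable():
    assert is_playable(("blue", "5"), active_color="red", top_value="5")


def test_wild_is_always_playable():
    assert is_playable(("wild", "draw4"), active_color="red", top_value="5")


def test_mismatch_is_rejected():
    assert not is_playable(("green", "7"), active_color="red", top_value="5")
```

Small rule-level tests like these run in milliseconds and catch engine regressions before they show up as unexplained changes in agent win rates.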