============
Architecture
============

This document describes the overall architecture of the UNO Card Game RL project.

Project Structure
=================

.. code-block:: text

   uno-card-game-rl/
   ├── src/                          # Core source code
   │   ├── __init__.py
   │   ├── game.py                   # UNO game engine
   │   ├── cards.py                  # Card definitions
   │   ├── players.py                # Player classes
   │   ├── turn.py                   # Turn management
   │   ├── agents.py                 # RL agent wrappers
   │   ├── dqn_agent.py              # DQN implementation
   │   ├── sb3_agent.py              # Stable-Baselines3 agents
   │   ├── state_action_reward.py    # RL interface
   │   ├── utils.py                  # Utilities
   │   └── multiplayer_env.py        # Multiplayer environment
   │
   ├── training/                     # Training scripts
   │   └── train_selfplay.py         # Self-play training
   │
   ├── models/                       # Saved models
   │   ├── selfplay_champion.zip
   │   ├── best_recurrent_ppo.zip
   │   └── ...
   │
   ├── docs/                         # Documentation
   │   ├── conf.py
   │   ├── index.rst
   │   └── ...
   │
   ├── logs/                         # Training logs
   ├── assets/                       # Data files
   ├── comparison_results/           # Evaluation results
   │
   ├── uno_gui.py                    # Main game GUI
   ├── model_battle_gui.py           # Battle arena GUI
   ├── multiplayer_gui.py            # Multiplayer GUI
   │
   ├── train_rl.py                   # General training script
   ├── compare_models.py             # Model comparison
   ├── config.py                     # Configuration
   ├── run.py                        # Quick run script
   └── requirements.txt              # Dependencies

Core Components
===============

Game Engine
-----------

The game engine (``src/game.py``) handles all UNO game logic:

.. code-block:: python

   class UnoGame:
       """
       Main UNO game controller.

       Manages:
       - Deck and discard pile
       - Player hands
       - Turn order
       - Win conditions
       """

       def play_card(self, player, card):
           """Execute a card play."""

       def draw_card(self, player):
           """Player draws from deck."""

       def get_winner(self):
           """Return winner if game over, else None."""

Card System
-----------

Cards (``src/cards.py``) are represented as:

.. code-block:: python

   from dataclasses import dataclass

   @dataclass
   class Card:
       color: str   # 'red', 'green', 'blue', 'yellow', 'wild'
       value: str   # '0'-'9', 'skip', 'reverse', 'draw2', 'wild', 'draw4'

   class Deck:
       """Standard 108-card UNO deck."""

       def shuffle(self): ...
       def draw(self): ...

RL Environment
--------------

The Gymnasium environment is defined in ``src/state_action_reward.py``:

.. code-block:: python

   import gymnasium as gym
   from gymnasium import spaces

   class UnoEnv(gym.Env):
       """
       UNO as a Gymnasium environment.

       Observation: 17-dim vector
       Actions: 9 discrete actions
       Reward: +1 win, -1 loss, 0 ongoing
       """

       observation_space = spaces.Box(low=0, high=1, shape=(17,))
       action_space = spaces.Discrete(9)

       def step(self, action):
           # Execute action, return (obs, reward, done, truncated, info)
           ...

       def reset(self):
           # Start new game, return initial observation
           ...

Agent Wrappers
--------------

Agents (``src/agents.py``) provide a unified interface:

.. code-block:: python

   from sb3_contrib import RecurrentPPO

   class RLAgent:
       """Base class for all RL agents."""

       def select_action(self, obs, valid_actions):
           """Return action index."""

   class SB3Agent(RLAgent):
       """Wrapper for Stable-Baselines3 models."""

       def __init__(self, model_path):
           self.model = RecurrentPPO.load(model_path)
           self.state = None  # LSTM state; None means start of an episode

       def select_action(self, obs, valid_actions):
           action, self.state = self.model.predict(obs, state=self.state)
           return action

Data Flow
=========

Training Flow
-------------

.. code-block:: text

   ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
   │ Environment │ ←── │    Agent    │ ←── │    Model    │
   │  (UnoEnv)   │     │ (SB3Agent)  │     │  (RecPPO)   │
   └──────┬──────┘     └──────┬──────┘     └──────┬──────┘
          │                   │                   │
          │ observation       │                   │
          │ ─────────────────→│ observation       │
          │                   │ ─────────────────→│
          │                   │                   │
          │                   │ action            │
          │ action            │ ←─────────────────│
          │ ←─────────────────│                   │
          │                   │                   │
          │ reward, done      │                   │
          │ ─────────────────→│ experience        │
          │                   │ ─────────────────→│
          └───────────────────┴───────────────────┘

Inference Flow
--------------

.. code-block:: text

   User Input → GUI → Agent.select_action() → Environment.step() → GUI Update

State Representation
====================

17-Dimensional Observation
--------------------------

.. list-table::
   :header-rows: 1

   * - Index
     - Feature
     - Range
   * - 0-3
     - Open card color (one-hot)
     - [0, 1]
   * - 4-7
     - Number cards per color
     - [0, 1] normalized
   * - 8-10
     - Special cards (Skip/Rev/+2)
     - [0, 1] normalized
   * - 11-12
     - Wild cards count
     - [0, 1] normalized
   * - 13-16
     - Playable colors
     - [0, 1]

Action Encoding
---------------

.. list-table::
   :header-rows: 1

   * - Index
     - Action
     - Description
   * - 0
     - RED
     - Play any red card
   * - 1
     - GREEN
     - Play any green card
   * - 2
     - BLUE
     - Play any blue card
   * - 3
     - YELLOW
     - Play any yellow card
   * - 4
     - SKIP
     - Play skip card
   * - 5
     - REVERSE
     - Play reverse card
   * - 6
     - DRAW2
     - Play +2 card
   * - 7
     - DRAW4
     - Play wild +4
   * - 8
     - WILD
     - Play wild card

Reward Structure
----------------

.. code-block:: python

   def get_reward(self, done, winner):
       if not done:
           return 0.0   # Game ongoing
       if winner == self.agent_player:
           return 1.0   # Win
       return -1.0      # Loss

Neural Network Architecture
===========================

RecurrentPPO Network
--------------------

.. code-block:: text

   ┌──────────────────┐
   │   Observation    │
   │    (17 dim)      │
   └────────┬─────────┘
            │
   ┌────────▼─────────┐
   │      LSTM        │
   │   (256 hidden)   │
   │                  │
   │  h_t = LSTM(     │
   │   x_t, h_{t-1})  │
   └────────┬─────────┘
            │
   ┌────────▼─────────┐
   │   MLP Layers     │
   │  256 → 128 → 64  │
   │     (ReLU)       │
   └────────┬─────────┘
            │
      ┌─────┴─────┐
      │           │
   ┌──▼───┐   ┌───▼──┐
   │Policy│   │Value │
   │(9dim)│   │(1dim)│
   └──────┘   └──────┘

Why LSTM?
---------

The LSTM layer enables:

1. **Memory**: Track cards played earlier in the game
2. **Inference**: Deduce the opponent's hand from play history
3. **Strategy**: Remember opponent patterns
4. **Context**: Understand how the game state evolves

GUI Architecture
================

All GUIs use Pygame and share a similar structure:

.. code-block:: python

   import pygame

   class GameGUI:
       def __init__(self):
           pygame.init()
           self.screen = pygame.display.set_mode((1280, 720))
           self.clock = pygame.time.Clock()
           self.running = True  # main-loop flag checked by run()

       def run(self):
           while self.running:
               self.handle_events()
               self.update()
               self.draw()
               pygame.display.flip()
               self.clock.tick(60)  # cap at 60 frames per second

Component Hierarchy
-------------------

.. code-block:: text

   GameGUI
   ├── MenuScreen
   │   ├── ModelSelector
   │   └── Buttons
   ├── GameScreen
   │   ├── CardRenderer
   │   ├── DeckDisplay
   │   └── ActionLog
   └── EndScreen

Extension Points
================

Adding New Algorithms
---------------------

1. Create an agent wrapper in ``src/agents.py``
2. Add a training script
3. Register it in ``config.py``
4. Add it to the GUI model list

Adding New Features
-------------------

1. Extend the ``UnoEnv`` observation/action space
2. Update the ``UnoGame`` logic
3. Modify GUI rendering
4. Update documentation

Testing
=======

Unit tests are in ``tests/``:

.. code-block:: bash

   pytest tests/ -v

Test coverage includes:

- Card mechanics
- Game rules
- Environment interface
- Agent behavior
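
As an illustration of the card-mechanics and reward checks listed above, here is a minimal pytest-style sketch. It assumes the standard 108-card UNO composition and the reward values documented under Reward Structure. ``build_deck`` and the standalone ``get_reward`` are hypothetical stand-ins written for this example, not the project's actual functions:

.. code-block:: python

   from dataclasses import dataclass

   COLORS = ("red", "green", "blue", "yellow")
   ACTIONS = ("skip", "reverse", "draw2")


   @dataclass(frozen=True)
   class Card:
       color: str
       value: str


   def build_deck():
       """Hypothetical helper: list the cards of a standard UNO deck."""
       deck = []
       for color in COLORS:
           deck.append(Card(color, "0"))  # one zero per color
           for number in range(1, 10):
               deck.extend([Card(color, str(number))] * 2)  # two each of 1-9
           for action in ACTIONS:
               deck.extend([Card(color, action)] * 2)  # two of each action card
       deck.extend([Card("wild", "wild")] * 4)   # four plain wilds
       deck.extend([Card("wild", "draw4")] * 4)  # four wild +4 cards
       return deck


   def get_reward(done, winner, agent_player):
       """Standalone copy of the documented rule: +1 win, -1 loss, 0 ongoing."""
       if not done:
           return 0.0
       return 1.0 if winner == agent_player else -1.0


   def test_deck_has_108_cards():
       deck = build_deck()
       assert len(deck) == 108
       assert sum(1 for card in deck if card.color == "wild") == 8


   def test_reward_matches_documented_values():
       assert get_reward(False, None, "agent") == 0.0
       assert get_reward(True, "agent", "agent") == 1.0
       assert get_reward(True, "opponent", "agent") == -1.0

Saved under a name pytest collects (for example a hypothetical ``tests/test_sketch.py``), the two ``test_`` functions run with the ``pytest tests/ -v`` command shown above. Real tests would import ``Card``, ``Deck``, and ``UnoEnv`` from ``src/`` instead of redefining them.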