Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog.

[1.0.0] - 2026-01-XX

Added

  • Self-Play Training: New training framework targeting win rates of 70% or higher

    • Population-based training with opponent pool

    • Curriculum learning that progresses from random opponents to self-play

    • Checkpoint management for diverse opponents

  • Multiplayer Support: Extended game to 2-4 players

    • MultiplayerUnoEnv with correct turn-direction handling

    • 25-dimensional observation space that includes opponent hand sizes

    • Skip and Reverse mechanics for 3-4 players

  • Model Battle Arena: New GUI for comparing models

    • 2-4 player battle support

    • Batch evaluation (10-1000 games)

    • CSV export functionality

    • Multiple model selectors

  • GUI Improvements

    • ModelSelector dropdown in main menu

    • Model discovery from filesystem

    • Multiplayer launcher button

    • Glassmorphism design updates

  • Documentation

    • Complete Read the Docs documentation

    • LaTeX report describing the methodology

    • Presentation slides (Beamer)

    • API reference documentation

Changed

  • Updated uno_gui.py with model selection and a multiplayer button

  • Extended model_battle_gui.py to support 2-4 players

  • Added Self-Play Champion to all comparison scripts

  • Improved the README with a model performance table

Fixed

  • Fixed gymnasium import in train_selfplay.py

  • Corrected action masking for invalid plays

  • Fixed card rendering for special cards

[0.2.0] - 2025-12-XX

Added

  • Recurrent PPO Implementation

    • LSTM-based policies for partial observability

    • Achieved a 60% win rate

    • Multiple training configurations

  • Training Scripts

    • train_recurrent_ppo.py

    • train_best_recurrent_ppo.py

    • train_optimal_recurrent_ppo.py

  • Model Comparison

    • compare_models.py for batch evaluation

    • CSV result export

    • TensorBoard logging

Changed

  • Switched from Gym to Gymnasium

  • Updated stable-baselines3 to v2.0+

  • Improved reward shaping

[0.1.0] - 2025-11-XX

Added

  • Core Game Engine

    • Implementation of the complete UNO rules

    • Card and Deck classes

    • Game state management

  • Basic RL Environment

    • Gymnasium-compatible UnoEnv

    • 17-dimensional observation space

    • 9 discrete actions

  • Initial Agents

    • Q-Learning (tabular)

    • DQN agent

    • PPO and A2C via stable-baselines3

  • Main GUI

    • Pygame-based interface

    • Human vs AI mode

    • AI vs AI spectator mode

  • Training Infrastructure

    • train_rl.py general-purpose trainer

    • train_sb3.py for Stable-Baselines3 (SB3) models

    • Evaluation callbacks