Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog.

[1.0.0] - 2026-01-XX

Self-Play Training: New training framework targeting win rates of 70% or higher
- Population-based training with opponent pool
- Curriculum learning from random to self-play
- Checkpoint management for diverse opponents
Multiplayer Support: Extended the game to support 2-4 players
- MultiplayerUnoEnv with proper turn direction
- 25-dimensional observation including opponent hand sizes
- Skip and Reverse mechanics for 3-4 players
Model Battle Arena: New GUI for comparing models
- 2-4 player battle support
- Batch evaluation (10-1000 games)
- CSV export functionality
- Multiple model selectors
GUI Improvements
- ModelSelector dropdown in main menu
- Model discovery from filesystem
- Multiplayer launcher button
- Glassmorphism design updates
Documentation
- Complete Read the Docs documentation
- LaTeX report with methodology
- Presentation slides (Beamer)
- API reference documentation
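
For readers new to the multiplayer turn logic mentioned under Multiplayer Support, the following is a minimal sketch of how turn direction, Skip, and Reverse can interact for 2-4 players. It is not taken from MultiplayerUnoEnv: the function name `advance_turn` and the string card labels are hypothetical, and the two-player handling of Reverse follows the standard UNO rule (Reverse acts like Skip), which this project may or may not apply.

```python
from typing import Optional, Tuple


def advance_turn(
    current: int,
    direction: int,
    num_players: int,
    card: Optional[str] = None,
) -> Tuple[int, int]:
    """Return (next_player, new_direction) after `card` has been played.

    `direction` is +1 (clockwise) or -1 (counter-clockwise).
    `card` is a hypothetical label: "skip", "reverse", or None for any
    card that does not affect turn order.
    """
    if card == "reverse":
        if num_players == 2:
            # Assumption: standard UNO rule, Reverse acts like Skip
            # with two players, so the same player moves again.
            return current, direction
        # Flip the direction, then move one seat in the new direction.
        direction = -direction
        return (current + direction) % num_players, direction
    if card == "skip":
        # Jump over the next player in the current direction.
        return (current + 2 * direction) % num_players, direction
    # Ordinary card: move one seat in the current direction.
    return (current + direction) % num_players, direction
```

With four players seated 0-3 and play running clockwise, `advance_turn(0, 1, 4, "reverse")` hands the turn to player 3 and flips the direction to -1.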

Recurrent PPO Implementation
- LSTM-based policies for partial observability
- Achieved a 60% win rate
- Multiple training configurations
Training Scripts
- train_recurrent_ppo.py
- train_best_recurrent_ppo.py
- train_optimal_recurrent_ppo.py
Model Comparison
- compare_models.py for batch evaluation
- CSV result export
- TensorBoard logging
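
As an illustration of the batch evaluation and CSV export listed under Model Comparison, here is a rough sketch using only the standard library. It does not reproduce compare_models.py: the `evaluate` and `to_csv` helpers, the `play_game` callback, and the column names are all assumptions made for this example.

```python
import csv
import io
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence


def evaluate(
    model_names: Sequence[str],
    play_game: Callable[[random.Random], int],
    num_games: int,
    seed: int = 0,
) -> List[Dict[str, object]]:
    """Play `num_games` games and tally wins per model.

    `play_game` is a hypothetical callback that plays one game and
    returns the index (into `model_names`) of the winner.
    """
    rng = random.Random(seed)
    wins = Counter(play_game(rng) for _ in range(num_games))
    return [
        {
            "model": name,
            "wins": wins[i],
            "games": num_games,
            "win_rate": wins[i] / num_games,
        }
        for i, name in enumerate(model_names)
    ]


def to_csv(rows: List[Dict[str, object]]) -> str:
    """Serialize the result rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["model", "wins", "games", "win_rate"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

A stand-in such as `play_game=lambda rng: rng.randrange(2)` is enough to exercise both helpers before real models are plugged in.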

Core Game Engine
- Complete UNO rules implementation
- Card and Deck classes
- Game state management
Basic RL Environment
- Gymnasium-compatible UnoEnv
- 17-dimensional observation
- 9 discrete actions
Initial Agents
- Q-Learning (tabular)
- DQN agent
- PPO and A2C via stable-baselines3
Main GUI
- Pygame-based interface
- Human vs AI mode
- AI vs AI spectator mode
Training Infrastructure
- train_rl.py general trainer
- train_sb3.py for SB3 models
- Evaluation callbacks
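
For context on the tabular Q-Learning agent listed under Initial Agents, the sketch below shows the standard Q-learning update over the 9 discrete actions noted above. It is an illustration only, not the project's agent: the `TabularQ` class, its hyperparameter defaults, and the use of a hashable state key (for example, a tuple built from the observation) are assumptions.

```python
import random
from collections import defaultdict
from typing import Hashable

NUM_ACTIONS = 9  # matches the "9 discrete actions" entry above


class TabularQ:
    """Epsilon-greedy tabular Q-learning over hashable state keys."""

    def __init__(self, alpha: float = 0.1, gamma: float = 0.99,
                 epsilon: float = 0.1, seed: int = 0) -> None:
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)
        # Unseen states start with all-zero action values.
        self.q = defaultdict(lambda: [0.0] * NUM_ACTIONS)

    def act(self, state: Hashable) -> int:
        """Pick a random action with probability epsilon, else the greedy one."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(NUM_ACTIONS)
        values = self.q[state]
        return max(range(NUM_ACTIONS), key=values.__getitem__)

    def update(self, state: Hashable, action: int, reward: float,
               next_state: Hashable, done: bool) -> None:
        """Apply Q(s,a) += alpha * (target - Q(s,a))."""
        if done:
            target = reward
        else:
            target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

In a training loop, `act` would choose the action passed to the environment's `step`, and `update` would be called with the resulting reward and next observation key.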