===========
Multiplayer
===========

This guide covers the multiplayer features supporting 3-4 player games.

Overview
========

Our base UNO environment models a 2-player game; the multiplayer environment supports 2-4 players for more dynamic gameplay.

Key Differences from 2-Player
-----------------------------

- **Turn Direction**: Reverse flips the turn order (with only 2 players it acts as a Skip)
- **Skip Mechanics**: Skip makes the next player in turn order lose their turn
- **Draw Effects**: +2 and +4 target the next player in turn order
- **Strategy**: Agents must consider several opponents, not just one

Multiplayer Environment
=======================

The multiplayer environment extends the base UNO environment:

.. code-block:: python

    from src.multiplayer_env import MultiplayerUnoEnv

    # Create a 4-player game
    env = MultiplayerUnoEnv(num_players=4)

    # Reset returns the observation for player 0
    obs = env.reset()

    # Pick a legal action (here simply the first valid action index)
    action = env.get_valid_actions()[0]

    # Step returns (obs, reward, done, truncated, info)
    obs, reward, done, truncated, info = env.step(action)

Observation Space
-----------------

The observation is an extended 25-dimensional vector:

.. list-table::
   :header-rows: 1
   :widths: 40 20 40

   * - Feature
     - Dimensions
     - Description
   * - Open card color
     - 4
     - One-hot encoded
   * - Number cards per color
     - 4
     - Cards in hand
   * - Special cards
     - 3
     - Skip, Reverse, +2
   * - Wild cards
     - 2
     - Wild, Wild +4
   * - Playable colors
     - 4
     - Which colors can be played
   * - Opponent 1 hand size
     - 4
     - Bucketed (1-3, 4-6, 7+, UNO)
   * - Opponent 2 hand size
     - 4
     - Same buckets
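The feature widths in the table add up to 25. A minimal sketch for slicing a raw observation by feature, assuming the vector is laid out in the table's row order; ``OBSERVATION_LAYOUT``, ``feature_slices`` and the feature names are hypothetical, not taken from ``src.multiplayer_env``:

.. code-block:: python

    from typing import Dict, List, Sequence, Tuple

    # Hypothetical layout: (feature name, width), ASSUMED to follow the table's row order.
    OBSERVATION_LAYOUT: List[Tuple[str, int]] = [
        ("open_card_color", 4),
        ("number_cards_per_color", 4),
        ("special_cards", 3),
        ("wild_cards", 2),
        ("playable_colors", 4),
        ("opponent_1_hand_size", 4),
        ("opponent_2_hand_size", 4),
    ]


    def feature_slices(layout: Sequence[Tuple[str, int]]) -> Tuple[Dict[str, slice], int]:
        """Map each feature name to its slice of the flat observation vector."""
        slices: Dict[str, slice] = {}
        offset = 0
        for name, width in layout:
            slices[name] = slice(offset, offset + width)
            offset += width
        # offset is now the total observation size
        return slices, offset


    SLICES, OBS_DIM = feature_slices(OBSERVATION_LAYOUT)  # OBS_DIM == 25

Under the assumed ordering, ``obs[SLICES["opponent_1_hand_size"]]`` would return the four hand-size bucket entries for opponent 1.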

Action Space
------------

The action space is the same 9 discrete actions used in the 2-player environment:

0. Play Red card
1. Play Green card  
2. Play Blue card
3. Play Yellow card
4. Play Skip
5. Play Reverse
6. Play +2
7. Play Wild +4
8. Play Wild (choose color)
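Policy and logging code often needs the valid actions as a fixed-length mask rather than a list of indices. The sketch below assumes ``get_valid_actions()`` (see the API reference) returns indices from the list above; ``ACTION_NAMES`` and ``action_mask`` are hypothetical helpers:

.. code-block:: python

    from typing import Iterable, List

    # Hypothetical constants mirroring the numbered action list (index -> meaning).
    ACTION_NAMES = (
        "Play Red card",
        "Play Green card",
        "Play Blue card",
        "Play Yellow card",
        "Play Skip",
        "Play Reverse",
        "Play +2",
        "Play Wild +4",
        "Play Wild (choose color)",
    )
    NUM_ACTIONS = len(ACTION_NAMES)  # 9


    def action_mask(valid_actions: Iterable[int]) -> List[bool]:
        """Turn a list of valid action indices into a fixed-length boolean mask."""
        mask = [False] * NUM_ACTIONS
        for a in valid_actions:
            if not 0 <= a < NUM_ACTIONS:
                raise ValueError(f"action index out of range: {a}")
            mask[a] = True
        return mask

For example, ``action_mask([0, 4, 8])`` is ``True`` only at positions 0, 4 and 8 (Red card, Skip, Wild).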

Turn Direction
==============

The Reverse card changes turn direction:

.. code-block:: text

    Clockwise (Normal):     1 → 2 → 3 → 4 → 1 → ...
    Counter-clockwise:      1 → 4 → 3 → 2 → 1 → ...

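Simplified direction-handling logic:

.. code-block:: python

    class MultiplayerUnoEnv:
        def __init__(self, num_players):
            self.num_players = num_players
            self.current_player = 0
            self.direction = 1  # 1 = clockwise, -1 = counter-clockwise

        def _handle_reverse(self):
            # Flip the direction of play
            self.direction *= -1

        def _next_player(self):
            # Python's % is never negative, so seat 0 going counter-clockwise wraps to num_players - 1
            return (self.current_player + self.direction) % self.num_players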
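The same modular arithmetic also covers Skip and the 2-player rule from the Special Card Effects table (Reverse acts as Skip). A self-contained sketch of how those rules could fit together; ``TurnOrder`` and its methods are hypothetical names, not the environment's actual API:

.. code-block:: python

    class TurnOrder:
        """Hypothetical helper: tracks whose turn it is and the direction of play."""

        def __init__(self, num_players: int, start: int = 0) -> None:
            if not 2 <= num_players <= 4:
                raise ValueError("num_players must be between 2 and 4")
            self.num_players = num_players
            self.current_player = start
            self.direction = 1  # 1 = clockwise, -1 = counter-clockwise

        def next_player(self, steps: int = 1) -> int:
            # Move `steps` seats in the current direction, wrapping around the table
            return (self.current_player + steps * self.direction) % self.num_players

        def advance(self) -> None:
            """Ordinary card: pass the turn to the next player."""
            self.current_player = self.next_player()

        def play_skip(self) -> None:
            """Skip: jump over the next player."""
            self.current_player = self.next_player(steps=2)

        def play_reverse(self) -> None:
            if self.num_players == 2:
                # With two players, Reverse behaves like Skip (same player goes again)
                self.play_skip()
            else:
                # Flip direction, then hand the turn to the next player in the new direction
                self.direction *= -1
                self.advance()

Starting from seat 0 with 4 players, four calls to ``advance()`` visit seats 1, 2, 3 and 0; ``play_reverse()`` from seat 0 then hands the turn to seat 3, matching the counter-clockwise diagram (which numbers players from 1).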

Running Multiplayer Games
=========================

GUI Mode
--------

Launch the multiplayer GUI:

.. code-block:: bash

    python multiplayer_gui.py --players 4

Or from the main GUI:

1. Run ``python uno_gui.py``
2. Click "Multiplayer (3-4)"
3. Select number of players

Programmatic Mode
-----------------

.. code-block:: python

    from src.multiplayer_env import MultiplayerUnoEnv
    from sb3_contrib import RecurrentPPO

    # Load one model per seat (players 0-3)
    models = [
        RecurrentPPO.load("models/selfplay_champion.zip"),
        RecurrentPPO.load("models/best_recurrent_ppo.zip"),
        RecurrentPPO.load("models/optimal_recurrent_ppo.zip"),
        RecurrentPPO.load("models/sb3_recurrent_ppo.zip"),
    ]

    # Create environment
    env = MultiplayerUnoEnv(num_players=4)

    # Game loop: each seat keeps its own LSTM state between its turns
    obs = env.reset()
    lstm_states = [None] * 4  # None = start from a zero hidden state

    while True:
        player = env.current_player
        action, lstm_states[player] = models[player].predict(
            obs, state=lstm_states[player], deterministic=True
        )

        # predict() returns a NumPy array; pass a plain int to the env
        obs, reward, done, truncated, info = env.step(int(action))

        if done:
            print(f"Player {info['winner']} wins!")
            break
        if truncated:
            print("Game truncated without a winner")
            break

Battle Arena Multiplayer
------------------------

In the battle arena:

1. Click **3P** or **4P** button
2. Configure a model for each player slot
3. Run batch evaluation
4. Compare the resulting multiplayer statistics

Strategy Considerations
=======================

3-Player Strategy
-----------------

- Watch both opponents' card counts
- Reverse is defensive (buys time)
- +2/+4 more impactful (one enemy at a time)
- Position matters (who's before/after you)

4-Player Strategy
-----------------

- Track all three opponents
- Reverse changes who gets targeted
- Alliances form naturally
- More variance in outcomes

Special Card Effects
--------------------

.. list-table::
   :header-rows: 1

   * - Card
     - 2-Player
     - 3-4 Player
   * - Reverse
     - Acts as Skip
     - Changes direction
   * - Skip
     - Opponent skipped
     - Next player skipped
   * - +2
     - Opponent draws 2
     - Next player draws 2
   * - +4
     - Opponent draws 4
     - Next player draws 4

Training Multiplayer Agents
===========================

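The example below trains a ``RecurrentPPO`` agent on the 4-player environment. Because multiplayer episodes run longer than 2-player ones, it collects more steps per rollout (``n_steps``) and uses a small learning rate:

.. code-block:: python

    from src.multiplayer_env import MultiplayerUnoEnv
    from sb3_contrib import RecurrentPPO

    # Create multiplayer environment
    env = MultiplayerUnoEnv(num_players=4)

    # LSTM policy so the agent can remember earlier turns
    model = RecurrentPPO(
        "MlpLstmPolicy",
        env,
        verbose=1,
        learning_rate=1e-4,
        n_steps=256,  # steps per rollout, sized for longer 4-player episodes
        batch_size=64,
    )

    model.learn(total_timesteps=2_000_000)
    model.save("models/multiplayer_champion.zip")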

Evaluation
----------

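With evenly matched players, each player's expected win rate is 1/N, so win rates must be read against that baseline:

- 2-player: 50% = equal skill
- 3-player: ~33% = equal skill
- 4-player: 25% = equal skill

A 40% win rate in 4-player is **excellent**: 1.6x the 25% expected at equal skill.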
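Win rates from a few games are noisy, so 40% over 20 games says much less than 40% over 1,000. One way to check whether a measured rate clears the 1/N baseline is a binomial confidence interval; the sketch below uses the Wilson score interval, a standard statistical choice that is not part of the project's documented tooling. ``win_rate_summary`` is a hypothetical helper:

.. code-block:: python

    import math


    def win_rate_summary(wins: int, games: int, num_players: int, z: float = 1.96) -> dict:
        """Compare a win rate with the 1/N equal-skill baseline (z=1.96 ~ 95% interval)."""
        if games <= 0:
            raise ValueError("games must be positive")
        rate = wins / games
        baseline = 1 / num_players  # expected win rate at equal skill
        # Wilson score interval for a binomial proportion
        denom = 1 + z**2 / games
        centre = (rate + z**2 / (2 * games)) / denom
        half = z * math.sqrt(rate * (1 - rate) / games + z**2 / (4 * games**2)) / denom
        return {
            "win_rate": rate,
            "baseline": baseline,
            "ratio": rate / baseline,
            "ci_low": centre - half,
            "ci_high": centre + half,
        }

For 400 wins in 1,000 four-player games the ratio is 1.6 and the interval's lower bound (about 0.37) stays above the 25% baseline; 3 wins in 10 games gives a 30% rate whose interval still includes 25%.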

API Reference
=============

MultiplayerUnoEnv
-----------------

.. code-block:: python

    class MultiplayerUnoEnv(gym.Env):
        """
        UNO environment supporting 2-4 players.
        
        Parameters
        ----------
        num_players : int
            Number of players (2-4)
            
        Attributes
        ----------
        current_player : int
            Index of current player (0 to num_players-1)
        direction : int
            Turn direction (1=clockwise, -1=counter-clockwise)
        hands : List[List[Card]]
            Cards in each player's hand
        """
        
        def reset(self):
            """Reset game and return initial observation."""
            
        def step(self, action):
            """Execute action and return (obs, reward, done, truncated, info)."""
            
        def get_valid_actions(self):
            """Return list of valid action indices."""
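A uniformly random player is a useful floor when reading the win rates discussed under Evaluation. A sketch of one such game, assuming only the interface listed in this reference (``reset``, ``step`` returning a 5-tuple, and ``get_valid_actions`` returning a non-empty list); ``play_random_game`` is a hypothetical helper:

.. code-block:: python

    import random
    from typing import Any, Optional


    def play_random_game(env: Any, rng: Optional[random.Random] = None, max_turns: int = 10_000) -> dict:
        """Play one game where every seat picks uniformly among its valid actions.

        Returns the final info dict, or an empty dict if max_turns is reached.
        """
        rng = rng or random.Random()
        env.reset()
        for _ in range(max_turns):
            # ASSUMES get_valid_actions() is never empty (rng.choice raises on an empty list)
            action = rng.choice(env.get_valid_actions())
            _obs, _reward, done, truncated, info = env.step(action)
            if done or truncated:
                return info
        return {}

Running it many times on ``MultiplayerUnoEnv(num_players=4)`` and counting ``info["winner"]`` gives a random-play baseline per seat to compare trained agents against.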

Known Limitations
=================

Current multiplayer implementation notes:

1. **Human Player**: Always player 0 in GUI
2. **Training**: Agents trained in 2-player may underperform in 4-player
3. **Observation**: Can see opponent hand sizes but not specific cards
4. **Stacking**: +2/+4 stacking is not implemented; the official UNO rules do not allow stacking (it is a common house rule)

Future Improvements
===================

Planned enhancements:

- [ ] Online multiplayer support
- [ ] Mixed human/AI games
- [ ] Tournament mode
- [ ] Elo rating system
- [ ] Replay saving/loading