| Algorithm | Number of Models | Update Method | Policy Type | Value-Based | Policy-Based | Policy Gradient | Discrete Actions | Continuous Actions |
|---|---|---|---|---|---|---|---|---|
| Deep Q Learning | 1 | Temporal Difference | Off-Policy | Yes | No | No | Yes | No |
| Double Deep Q Learning V1 (Randomly Chosen Network) | 1 (2 Parameter Sets) | Temporal Difference | Off-Policy | Yes | No | No | Yes | No |
| Double Deep Q Learning V2 (Target Network) | 1 (2 Parameter Sets) | Temporal Difference | Off-Policy | Yes | No | No | Yes | No |
| Deep State-Action-Reward-State-Action | 1 | Temporal Difference | On-Policy | Yes | No | No | Yes | No |
| Double Deep State-Action-Reward-State-Action V1 (Randomly Chosen Network) | 1 (2 Parameter Sets) | Temporal Difference | On-Policy | Yes | No | No | Yes | No |
| Double Deep State-Action-Reward-State-Action V2 (Target Network) | 1 (2 Parameter Sets) | Temporal Difference | On-Policy | Yes | No | No | Yes | No |
| Deep Expected State-Action-Reward-State-Action | 1 | Temporal Difference | On-Policy | Yes | No | No | Yes | No |
| Double Deep Expected State-Action-Reward-State-Action V1 (Randomly Chosen Network) | 1 (2 Parameter Sets) | Temporal Difference | On-Policy | Yes | No | No | Yes | No |
| Double Deep Expected State-Action-Reward-State-Action V2 (Target Network) | 1 (2 Parameter Sets) | Temporal Difference | On-Policy | Yes | No | No | Yes | No |
| MonteCarloControl | 1 | Both | On-Policy | Yes | No | No | Yes | No |
| OffPolicyMonteCarloControl | 1 | Both | Off-Policy | Yes | No | No | Yes | No |
| REINFORCE | 1 | Both | On-Policy | No | Yes | Yes | Yes | Yes |
| Vanilla Policy Gradient | 2 (Actor + Critic) | Both | On-Policy | Yes (Critic) | Yes (Actor) | Yes | Yes | Yes |
| Actor-Critic | 2 (Actor + Critic) | Both | On-Policy | Yes (Critic) | Yes (Actor) | Yes | Yes | Yes |
| Advantage Actor-Critic | 2 (Actor + Critic) | Both | On-Policy | Yes (Critic) | Yes (Actor) | Yes | Yes | Yes |
| Asynchronous Advantage Actor-Critic | 2 (Actor + Critic) | Both | On-Policy | Yes (Critic) | Yes (Actor) | Yes | Yes | Yes |
| Proximal Policy Optimization | 2 (Actor + Critic) | Both | On-Policy | Yes (Critic) | Yes (Actor) | Yes | Yes | Yes |
| Proximal Policy Optimization with Clipped Objective | 2 (Actor + Critic) | Both | On-Policy | Yes (Critic) | Yes (Actor) | Yes | Yes | Yes |
| Soft Actor Critic | 2 (Actor + Critic) | Temporal Difference | Off-Policy | Yes (Critic) | Yes (Actor) | Yes | Yes | Yes |
| Deep Deterministic Policy Gradient | 2 (Actor + Critic) | Temporal Difference | Off-Policy | Yes (Critic) | Yes (Actor) | Yes | No | Yes |
| Twin Delayed Deep Deterministic Policy Gradient | 2 (Actor + Critic) | Temporal Difference | Off-Policy | Yes (Critic) | Yes (Actor) | Yes | No | Yes |
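
To make the first three rows concrete: the only difference between Deep Q Learning and the Double Deep Q Learning variants is how the bootstrap target is built. V1 randomly chooses which of the two parameter sets selects the action and which evaluates it; the sketch below shows the V2 (target network) form next to the plain single-network target. This is a minimal PyTorch sketch, assuming hypothetical `q_net`, `q_online`, and `q_target` modules and illustrative batch tensors, not this repository's actual classes.

```python
import torch

def dqn_target(q_net, rewards, next_states, dones, gamma=0.99):
    # Plain DQN as listed in the table (one model): bootstrap with a max
    # over the same network that is being trained.
    with torch.no_grad():
        next_q = q_net(next_states).max(dim=1).values
    # dones is a 0/1 float tensor; terminal states contribute no bootstrap.
    return rewards + gamma * (1.0 - dones) * next_q

def double_dqn_target(q_online, q_target, rewards, next_states, dones, gamma=0.99):
    # Double DQN V2: the online parameters select the action, the target
    # parameters evaluate it, reducing the overestimation bias of the max.
    with torch.no_grad():
        best_actions = q_online(next_states).argmax(dim=1, keepdim=True)
        next_q = q_target(next_states).gather(1, best_actions).squeeze(1)
    return rewards + gamma * (1.0 - dones) * next_q
```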
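
The on-policy rows (Deep SARSA and Deep Expected SARSA) differ from Q-learning only in the bootstrap term: SARSA uses the action the current policy actually took next, while Expected SARSA averages over the policy's action distribution. A hedged sketch, assuming an epsilon-greedy behavior policy and the same illustrative tensor shapes as above:

```python
import torch

def sarsa_target(q_net, rewards, next_states, next_actions, dones, gamma=0.99):
    # On-policy: bootstrap with Q(s', a'), where a' (integer action indices)
    # is the action the agent actually selected under its current policy.
    with torch.no_grad():
        next_q = q_net(next_states).gather(1, next_actions.unsqueeze(1)).squeeze(1)
    return rewards + gamma * (1.0 - dones) * next_q

def expected_sarsa_target(q_net, rewards, next_states, dones, epsilon=0.1, gamma=0.99):
    # On-policy: bootstrap with the expectation of Q(s', .) under the
    # epsilon-greedy policy instead of a single sampled action.
    with torch.no_grad():
        q = q_net(next_states)                         # (batch, n_actions)
        n_actions = q.shape[1]
        probs = torch.full_like(q, epsilon / n_actions)
        greedy = q.argmax(dim=1, keepdim=True)
        probs.scatter_(1, greedy, 1.0 - epsilon + epsilon / n_actions)
        expected_q = (probs * q).sum(dim=1)
    return rewards + gamma * (1.0 - dones) * expected_q
```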
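
For the `2 (Actor + Critic)` rows, the Value-Based and Policy-Based columns map onto two losses: the critic is the value-based half, the actor the policy-based half. A minimal sketch of an advantage actor-critic update, assuming a hypothetical discrete-action actor that outputs logits and a critic that outputs a scalar state value:

```python
import torch
import torch.nn.functional as F

def actor_critic_losses(actor, critic, states, actions, returns):
    # Critic (value-based half): regress V(s) toward the sampled returns.
    values = critic(states).squeeze(-1)
    critic_loss = F.mse_loss(values, returns)
    # Actor (policy-based half): a policy-gradient step weighted by the
    # advantage, i.e. the return minus the critic's baseline.
    advantages = (returns - values).detach()
    log_probs = torch.log_softmax(actor(states), dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    actor_loss = -(chosen * advantages).mean()
    return actor_loss, critic_loss
```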
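
Proximal Policy Optimization with Clipped Objective adds one idea on top of the actor loss above: clip the probability ratio between the new and old policies so a single minibatch update cannot move the policy far from the policy that collected the data, which is what keeps it on-policy. A sketch, with `old_log_probs` and `advantages` assumed to come from the rollout:

```python
import torch

def ppo_clipped_loss(actor, states, actions, old_log_probs, advantages, clip_eps=0.2):
    # Clipped surrogate objective: take the pessimistic minimum of the
    # unclipped and clipped ratio-weighted advantages.
    new_log_probs = torch.log_softmax(actor(states), dim=-1)
    chosen = new_log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    ratio = torch.exp(chosen - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```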
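
Finally, the `Discrete Actions = No` entries for Deep Deterministic Policy Gradient (and its twin-delayed variant) follow from the target computation: there is no max over a discrete action set; a deterministic actor proposes the next continuous action and the critic evaluates it. A sketch under the same illustrative assumptions, with hypothetical `actor_target` and `critic_target` modules:

```python
import torch

def ddpg_target(actor_target, critic_target, rewards, next_states, dones, gamma=0.99):
    # The deterministic target actor replaces the max over actions used by
    # the value-based methods above, which is why DDPG handles continuous
    # action spaces but not discrete ones.
    with torch.no_grad():
        next_actions = actor_target(next_states)
        next_q = critic_target(next_states, next_actions).squeeze(-1)
    return rewards + gamma * (1.0 - dones) * next_q
```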