| Deep Q Learning | 1 | Temporal Difference | Off-Policy | Yes | No | No | Yes | No |
| Double Deep Q Learning V1 (Randomly Chosen Network) | 1 (2 Sets Of Model Parameters) | Temporal Difference | Off-Policy | Yes | No | No | Yes | No |
| Double Deep Q Learning V2 (Target Network) | 1 (2 Sets Of Model Parameters) | Temporal Difference | Off-Policy | Yes | No | No | Yes | No |
| Deep State-Action-Reward-State-Action | 1 | Temporal Difference | On-Policy | Yes | No | No | Yes | No |
| Double Deep State-Action-Reward-State-Action V1 (Randomly Chosen Network) | 1 (2 Sets Of Model Parameters) | Temporal Difference | On-Policy | Yes | No | No | Yes | No |
| Double Deep State-Action-Reward-State-Action V2 (Target Network) | 1 (2 Sets Of Model Parameters) | Temporal Difference | On-Policy | Yes | No | No | Yes | No |
| Deep Expected State-Action-Reward-State-Action | 1 | Temporal Difference | On-Policy | Yes | No | No | Yes | No |
| Double Deep Expected State-Action-Reward-State-Action V1 (Randomly Chosen Network) | 1 (2 Sets Of Model Parameters) | Temporal Difference | On-Policy | Yes | No | No | Yes | No |
| Double Deep Expected State-Action-Reward-State-Action V2 (Target Network) | 1 (2 Sets Of Model Parameters) | Temporal Difference | On-Policy | Yes | No | No | Yes | No |
| Monte Carlo Control | 1 | Both | On-Policy | Yes | No | No | Yes | No |
| Off-Policy Monte Carlo Control | 1 | Both | Off-Policy | Yes | No | No | Yes | No |
| REINFORCE | 1 | Both | On-Policy | No | Yes | Yes | Yes | Yes |
| Vanilla Policy Gradient | 2 (Actor + Critic) | Both | On-Policy | Yes (Critic) | Yes (Actor) | Yes | Yes | Yes |
| Actor-Critic | 2 (Actor + Critic) | Both | On-Policy | Yes (Critic) | Yes (Actor) | Yes | Yes | Yes |
| Advantage Actor-Critic | 2 (Actor + Critic) | Both | On-Policy | Yes (Critic) | Yes (Actor) | Yes | Yes | Yes |
| Asynchronous Advantage Actor-Critic | 2 (Actor + Critic) | Both | On-Policy | Yes (Critic) | Yes (Actor) | Yes | Yes | Yes |
| Proximal Policy Optimization | 2 (Actor + Critic) | Both | On-Policy | Yes (Critic) | Yes (Actor) | Yes | Yes | Yes |
| Proximal Policy Optimization with Clipped Objective | 2 (Actor + Critic) | Both | On-Policy | Yes (Critic) | Yes (Actor) | Yes | Yes | Yes |
| Soft Actor-Critic | 2 (Actor + Critic) | Temporal Difference | Off-Policy | Yes (Critic) | Yes (Actor) | Yes | Yes | Yes |
| Deep Deterministic Policy Gradient | 2 (Actor + Critic) | Temporal Difference | Off-Policy | Yes (Critic) | Yes (Actor) | Yes | No | Yes |
| Twin Delayed Deep Deterministic Policy Gradient | 2 (Actor + Critic) | Temporal Difference | Off-Policy | Yes (Critic) | Yes (Actor) | Yes | No | Yes |
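
To make the "2 Sets Of Model Parameters" rows more concrete, below is a minimal Python sketch (not code from this library; the tabular Q-arrays and function names such as `double_dqn_target` are illustrative stand-ins for the networks) contrasting the one-step targets behind Deep Q Learning, Double Deep Q Learning V2 (Target Network), and Deep Expected State-Action-Reward-State-Action.

```python
import numpy as np

def dqn_target(q_online, reward, s_next, gamma):
    # Vanilla Deep Q Learning: one parameter set both selects and
    # evaluates the greedy next action, which tends to overestimate values.
    return reward + gamma * np.max(q_online[s_next])

def double_dqn_target(q_online, q_target, reward, s_next, gamma):
    # Double Deep Q Learning V2: the online parameters select the action,
    # while the second (target) parameter set evaluates it.
    a_star = np.argmax(q_online[s_next])
    return reward + gamma * q_target[s_next, a_star]

def expected_sarsa_target(q_online, reward, s_next, gamma, epsilon):
    # Expected SARSA: average over the epsilon-greedy policy's action
    # distribution instead of taking the max, keeping the update on-policy.
    n_actions = q_online.shape[1]
    probs = np.full(n_actions, epsilon / n_actions)
    probs[np.argmax(q_online[s_next])] += 1.0 - epsilon
    return reward + gamma * np.dot(probs, q_online[s_next])

# Tiny worked example: 2 states x 2 actions, tabular stand-in for a network.
q_online = np.array([[1.0, 2.0], [0.5, 3.0]])
q_target = np.array([[0.8, 1.5], [0.7, 2.0]])
reward, s_next, gamma = 1.0, 1, 0.99

print(dqn_target(q_online, reward, s_next, gamma))                   # 1 + 0.99 * 3.0
print(double_dqn_target(q_online, q_target, reward, s_next, gamma))  # 1 + 0.99 * 2.0
print(expected_sarsa_target(q_online, reward, s_next, gamma, 0.1))   # 1 + 0.99 * 2.875
```

Decoupling action selection from action evaluation across the two parameter sets is what reduces the maximization bias in the Double variants, while Expected SARSA replaces the max with an expectation under the current policy, which is why it appears as On-Policy in the table.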