Reinforcement Learning

This page contains resources about Reinforcement Learning.

Subfields and Concepts

  • Action space
    • Discrete
    • Continuous (usually handled by Actor-Critic Methods)
  • Multi-Armed Bandit
  • Finite Markov Decision Process (MDP)
  • Partially Observable MDP (POMDP)
  • Model-based RL (i.e. model the environment)
  • Model-free RL
    • Value-based Methods
      • Temporal-Difference (TD) Learning
      • SARSA
      • Q-Learning
        • Deep Q-learning / Deep Q Network (DQN)
        • Double Q-learning / Double DQN
        • Dueling DQN
    • Policy-based methods / Policy Optimization
      • (Pure) Policy Gradient Methods
      • Trust Region Policy Optimization (TRPO)
    • Actor-Critic Methods (i.e. combination of Value-based and Policy-based Methods)
      • Advantage-Actor-Critic (A2C)
      • Asynchronous Advantage-Actor-Critic (A3C)
      • Soft Actor Critic (SAC)
      • Neural Fitted Q Iteration with Continuous Actions (NFQCA)
      • Deterministic Policy Gradient (DPG)
      • Deep DPG (DDPG)
      • Twin Delayed DDPG (TD3)
      • Proximal Policy Optimization (PPO)
  • Evolutionary Algorithms
    • Cross-Entropy Method (CEM)
    • Covariance Matrix Adaptation (CMA)
    • Genetic Algorithms
  • Adaptive Dynamic Programming
  • Deep Reinforcement Learning
    • Deep Q-learning / Deep Q Network (DQN)
    • Deep Recurrent Q-Network (DRQN)
    • Deep Soft Recurrent Q-Network (DSRQN)
    • Double Q-learning / Double DQN (DDQN)
    • Proximal Policy Optimization (PPO)
  • Multi-Agent Reinforcement Learning (MARL)
  • Connectionist Reinforcement Learning
    • Score function estimator / REINFORCE
  • Variance Reduction Techniques (VRT) for gradient estimates
  • Inverse Reinforcement Learning
  • On-policy Learning
    • Temporal-Difference (TD) Learning
    • SARSA
    • (Pure) Policy Gradient Methods
  • Off-policy Learning
    • Q-Learning
  • Exploration vs. Exploitation problem
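
The on-policy / off-policy split listed above can be illustrated with a minimal tabular sketch. It is not taken from any of the resources on this page; the function names (`epsilon_greedy`, `sarsa_target`, `q_learning_target`, `td_update`) and the list-of-lists Q-table layout are assumptions made purely for illustration. SARSA bootstraps from the action the behaviour policy actually takes next, while Q-Learning bootstraps from the greedy action regardless of what is taken.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def sarsa_target(q, reward, next_state, next_action, gamma):
    # On-policy: uses the value of the action actually selected in next_state.
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma):
    # Off-policy: uses the value of the greedy action in next_state.
    return reward + gamma * max(q[next_state])


def td_update(q, state, action, target, alpha):
    # Shared temporal-difference step: move Q(s, a) towards the target.
    q[state][action] += alpha * (target - q[state][action])


# Hypothetical 2-state, 2-action Q-table (illustrative values only).
q = [[0.0, 0.0], [1.0, 3.0]]
rng = random.Random(0)
next_action = epsilon_greedy(q[1], epsilon=0.1, rng=rng)
td_update(q, state=0, action=0,
          target=sarsa_target(q, 1.0, 1, next_action, gamma=0.9), alpha=0.5)
td_update(q, state=0, action=1,
          target=q_learning_target(q, 1.0, 1, gamma=0.9), alpha=0.5)
```

The two targets coincide whenever the sampled next action happens to be the greedy one; they differ only on exploratory steps, which is where the on-policy / off-policy distinction matters.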

Online Courses

Video Lectures

Lecture Notes

Books and Book Chapters

  • Lapan, M. (2018). Deep Reinforcement Learning Hands-On. Packt Publishing.
  • Dutta, S. (2018). Reinforcement Learning with TensorFlow. Packt Publishing.
  • Ravichandiran, S. (2018). Hands-On Reinforcement Learning with Python. Packt Publishing.
  • Russell, S. J., & Norvig, P. (2010). "Chapter 21: Reinforcement Learning". Artificial Intelligence: A Modern Approach. Prentice Hall.
  • Alpaydin, E. (2010). "Chapter 18: Reinforcement Learning". Introduction to Machine Learning. MIT Press.
  • Sutton, R. S., & Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press. (draft)
  • Mitchell, T. M. (1997). "Chapter 13: Reinforcement Learning". Machine Learning. McGraw Hill.

Scholarly Articles

  • Bard, N. ... (2019). The Hanabi Challenge: A New Frontier for AI Research. arXiv preprint arXiv:1902.00506.
  • Arulkumaran, K., Deisenroth, M. P., Brundage, M., & Bharath, A. A. (2017). A brief survey of deep reinforcement learning. arXiv preprint arXiv:1708.05866.
  • Lanctot, M., Zambaldi, V., Gruslys, A., Lazaridou, A., Perolat, J., Silver, D., & Graepel, T. (2017). A unified game-theoretic approach to multiagent reinforcement learning. In Advances in Neural Information Processing Systems (pp. 4193-4206).
  • Foerster, J., Assael, I. A., de Freitas, N., & Whiteson, S. (2016). Learning to communicate with deep multi-agent reinforcement learning. In Advances in Neural Information Processing Systems (pp. 2137-2145).
  • Ghavamzadeh, M., Mannor, S., Pineau, J., & Tamar, A. (2015). Bayesian reinforcement learning: A survey. Foundations and Trends® in Machine Learning, 8(5-6), 359-483.
  • Szepesvari, C. (2010). Algorithms for Reinforcement Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 4(1), 1-103.
  • Busoniu, L., Babuska, R., & De Schutter, B. (2008). A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 38(2).
  • Panait, L., & Luke, S. (2005). Cooperative multi-agent learning: The state of the art. Autonomous Agents and Multi-Agent Systems, 11(3), 387-434.
  • Bowling, M., & Veloso, M. (2000). An analysis of stochastic game theory for multiagent reinforcement learning (No. CMU-CS-00-165). Carnegie Mellon University, School of Computer Science.
  • Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 4, 237-285.
  • Harmon, M. E., & Harmon, S. S. (1996). Reinforcement Learning: A Tutorial.
  • Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4), 229-256.



See also

Other resources