Reinforcement Learning

From Ioannis Kourouklides
Revision as of 06:47, 30 May 2020 by Kourouklides

This page contains resources about Reinforcement Learning.

Subfields and Concepts

  • Action space
    • Discrete
    • Continuous (usually handled by Actor-Critic Methods)
  • Multi-Armed Bandit
  • Finite Markov Decision Process (MDP)
  • Partially Observable MDP (POMDP)
  • Model-based RL (i.e. model the environment)
  • Model-free RL
    • Value-based Methods
      • Temporal-Difference (TD) Learning
      • SARSA
      • Q-Learning
        • Deep Q-learning / Deep Q Network (DQN)
        • Double Q-learning / Double DQN
        • Dueling DQN
    • Policy-based methods / Policy Optimization
      • (Pure) Policy Gradients Methods
      • Trust Region Policy Optimization (TRPO)
    • Actor-Critic Methods (i.e. combination of Value-based and Policy-based Methods)
      • Advantage-Actor-Critic (A2C)
      • Asynchronous Advantage-Actor-Critic (A3C)
      • Soft Actor Critic (SAC)
      • Neural Fitted Q Iteration with Continuous Actions (NFQCA)
      • Deterministic Policy Gradient (DPG)
      • Deep DPG (DDPG)
      • Twin Delayed DDPG (TD3)
      • Proximal Policy Optimization (PPO)
  • Evolutionary Algorithms
    • Cross-Entropy Method (CEM)
    • Covariance Matrix Adaptation (CMA)
    • Genetic Algorithms
  • Adaptive Dynamic Programming
  • Deep Reinforcement Learning
    • Deep Q-learning / Deep Q Network (DQN)
    • Deep Recurrent Q-Network (DRQN)
    • Deep Soft Recurrent Q-Network (DSRQN)
    • Double Q-learning / Double DQN (DDQN)
    • Proximal Policy Optimization (PPO)
  • Multi-Agent Reinforcement Learning (MARL)
  • Connectionist Reinforcement Learning
    • Score function estimator / REINFORCE
  • Variance Reduction Techniques (VRT) for gradient estimates
  • Inverse Reinforcement Learning
  • On-policy Learning
    • Temporal-Difference (TD) Learning
    • SARSA
    • (Pure) Policy Gradients Methods
  • Off-policy Learning
    • Q-Learning
  • Exploration vs. Exploitation problem
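
The on-policy / off-policy distinction listed above can be illustrated with the tabular update rules for SARSA and Q-Learning. The following is a minimal sketch, not taken from any of the resources on this page: the function names, the list-of-lists Q-table layout, and the default values for `alpha` (step size) and `gamma` (discount factor) are assumptions chosen for illustration, and terminal states are not handled.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Explore with probability epsilon, otherwise exploit the best-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy TD update: the target uses the action actually chosen in s_next."""
    target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])


def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Off-policy TD update: the target uses the greedy action in s_next."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


# Hypothetical 2-state, 2-action Q-tables (illustrative values only).
q_sarsa = [[0.0, 0.0], [1.0, 2.0]]
q_qlearn = [[0.0, 0.0], [1.0, 2.0]]

# Same transition (s=0, a=0, r=1.0, s_next=1); SARSA's next action is a_next=0.
sarsa_update(q_sarsa, 0, 0, 1.0, 1, 0, alpha=0.5, gamma=0.5)
q_learning_update(q_qlearn, 0, 0, 1.0, 1, alpha=0.5, gamma=0.5)
```

The two updates differ only in the bootstrap term: SARSA evaluates the policy that is actually being followed (including its exploratory actions), whereas Q-Learning always bootstraps from the greedy action regardless of what the behaviour policy does next. The `epsilon_greedy` helper is one simple way to trade off exploration against exploitation when selecting actions.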

Online Courses

Video Lectures

Lecture Notes

Books and Book Chapters

  • Lapan, M. (2018). Deep Reinforcement Learning Hands-On. Packt Publishing.
  • Dutta, S. (2018). Reinforcement Learning with TensorFlow. Packt Publishing.
  • Ravichandiran, S. (2018). Hands-On Reinforcement Learning with Python. Packt Publishing.
  • Russell, S. J., & Norvig, P. (2010). "Chapter 21: Reinforcement Learning". Artificial Intelligence: A Modern Approach. Prentice Hall.
  • Alpaydin, E. (2010). "Chapter 18: Reinforcement Learning". Introduction to Machine Learning. MIT Press.
  • Sutton, R. S., & Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press. (draft)
  • Mitchell, T. M. (1997). "Chapter 13: Reinforcement Learning". Machine Learning. McGraw Hill.

Scholarly Articles

  • Bard, N. ... (2019). The Hanabi Challenge: A New Frontier for AI Research. arXiv preprint arXiv:1902.00506.
  • Arulkumaran, K., Deisenroth, M. P., Brundage, M., & Bharath, A. A. (2017). A brief survey of deep reinforcement learning. arXiv preprint arXiv:1708.05866.
  • Lanctot, M., Zambaldi, V., Gruslys, A., Lazaridou, A., Perolat, J., Silver, D., & Graepel, T. (2017). A unified game-theoretic approach to multiagent reinforcement learning. In Advances in Neural Information Processing Systems (pp. 4193-4206).
  • Foerster, J., Assael, I. A., de Freitas, N., & Whiteson, S. (2016). Learning to communicate with deep multi-agent reinforcement learning. In Advances in Neural Information Processing Systems (pp. 2137-2145).
  • Ghavamzadeh, M., Mannor, S., Pineau, J., & Tamar, A. (2015). Bayesian reinforcement learning: A survey. Foundations and Trends® in Machine Learning, 8(5-6), 359-483.
  • Szepesvari, C. (2010). Algorithms for Reinforcement Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 4(1), 1-103.
  • Busoniu, L., Babuska, R., & De Schutter, B. (2008). A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 38(2).
  • Panait, L., & Luke, S. (2005). Cooperative multi-agent learning: The state of the art. Autonomous Agents and Multi-Agent Systems, 11(3), 387-434.
  • Bowling, M., & Veloso, M. (2000). An analysis of stochastic game theory for multiagent reinforcement learning (No. CMU-CS-00-165). Carnegie Mellon University, School of Computer Science, Pittsburgh, PA.
  • Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 4, 237-285.
  • Harmon, M. E., & Harmon, S. S. (1996). Reinforcement Learning: A Tutorial.
  • Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4), 229-256.



See also

Other resources