Reinforcement Learning

This page contains resources about Reinforcement Learning.

Subfields and Concepts

 * Action space
 * Discrete
 * Continuous (usually handled by Actor-Critic Methods)
 * Multi-Armed Bandit
 * Finite Markov Decision Process (MDP)
 * Partially Observable MDP (POMDP)
 * Model-based RL (i.e. model the environment)
 * Model-free RL
 * Value-based Methods
 * Temporal-Difference (TD) Learning
 * SARSA
 * Q-Learning
 * Deep Q-learning / Deep Q Network (DQN)
 * Double Q-learning / Double DQN
 * Dueling DQN
 * Policy-based methods / Policy Optimization
 * (Pure) Policy Gradient Methods
 * Score function estimator / REINFORCE
 * Monte Carlo Policy Gradient
 * Bayesian Policy Gradient
 * Trust Region Policy Optimization (TRPO)
 * Actor-Critic Methods (i.e. combination of Value-based and Policy-based Methods)
 * Advantage-Actor-Critic (A2C)
 * Asynchronous Advantage-Actor-Critic (A3C)
 * Soft Actor Critic (SAC)
 * Neural Fitted Q Iteration with Continuous Actions (NFQCA)
 * Deterministic Policy Gradient (DPG)
 * Deep DPG (DDPG)
 * Twin Delayed DDPG (TD3)
 * Proximal Policy Optimization (PPO)
 * Evolutionary Algorithms
 * Cross-Entropy Method (CEM)
 * Covariance Matrix Adaptation (CMA)
 * Genetic Algorithms
 * Adaptive Dynamic Programming
 * Deep Reinforcement Learning
 * Deep Q-learning / Deep Q Network (DQN)
 * Deep Recurrent Q-Network (DRQN)
 * Deep Soft Recurrent Q-Network (DSRQN)
 * Double Q-learning / Double DQN (DDQN)
 * Proximal Policy Optimization (PPO)
 * Multi-Agent Reinforcement Learning (MARL)
 * Connectionist Reinforcement Learning
 * Score function estimator / REINFORCE
 * Variance Reduction Techniques (VRT) for gradient estimates
 * Inverse Reinforcement Learning
 * On-policy Learning
 * Temporal-Difference (TD) Learning
 * SARSA
 * (Pure) Policy Gradient Methods
 * Off-policy Learning
 * Q-Learning
 * Exploration vs. Exploitation problem
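
Several of the concepts listed above (TD learning, on-policy SARSA versus off-policy Q-learning, and the exploration/exploitation trade-off) can be illustrated together in a few lines. The following is only a minimal sketch under stated assumptions: the five-state chain environment, the function names (epsilon_greedy, step, train) and all hyperparameter values are hypothetical choices made for illustration and do not come from any of the resources on this page.

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])

def step(state, action, n_states):
    """Hypothetical chain environment: action 1 moves right, action 0 moves left.
    Reaching the last state yields reward 1 and ends the episode."""
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done

def train(method, n_states=5, episodes=500, alpha=0.1, gamma=0.9,
          epsilon=0.1, max_steps=10_000, seed=0):
    """Tabular TD control; method is "sarsa" (on-policy) or "q_learning" (off-policy)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(q[state], epsilon, rng)
        for _ in range(max_steps):
            next_state, reward, done = step(state, action, n_states)
            # The behaviour policy is epsilon-greedy for both methods.
            next_action = epsilon_greedy(q[next_state], epsilon, rng)
            if done:
                target = reward
            elif method == "sarsa":
                # On-policy: bootstrap from the action that will actually be taken.
                target = reward + gamma * q[next_state][next_action]
            else:
                # Off-policy: bootstrap from the greedy action, whatever is taken next.
                target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            if done:
                break
            state, action = next_state, next_action
    return q

if __name__ == "__main__":
    for method in ("sarsa", "q_learning"):
        q_table = train(method)
        greedy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(4)]
        print(method, greedy)
```

The two methods differ only in the bootstrap target, which is where the on-policy/off-policy distinction shows up. The epsilon parameter controls how often the agent explores instead of exploiting its current estimates.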

Video Lectures

 * Practical Reinforcement Learning - Coursera
 * Machine Learning and Reinforcement Learning in Finance Specialization by Igor Halperin - Coursera
 * Reinforcement Learning by Pascal Poupart

Lecture Notes

 * Reinforcement Learning by David Silver
 * Reinforcement Learning by Michael Herrmann
 * Deep Reinforcement Learning by Sergey Levine
 * Deep Reinforcement Learning and Control by Katerina Fragkiadaki and Ruslan Salakhutdinov
 * Reinforcement Learning by Mario Martin

Books and Book Chapters

 * Lapan, M. (2018). Deep Reinforcement Learning Hands-On. Packt Publishing.
 * Dutta, S. (2018). Reinforcement Learning with TensorFlow. Packt Publishing.
 * Ravichandiran, S. (2018). Hands-On Reinforcement Learning with Python. Packt Publishing.
 * Russell, S. J., & Norvig, P. (2010). "Chapter 21: Reinforcement Learning". Artificial Intelligence: A Modern Approach. Prentice Hall.
 * Alpaydin, E. (2010). "Chapter 18: Reinforcement Learning". Introduction to Machine Learning. MIT Press.
 * Sutton, R. S., & Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press. (draft)
 * Mitchell, T. M. (1997). "Chapter 13: Reinforcement Learning". Machine Learning. McGraw Hill.

Scholarly Articles

 * Bard, N. ... (2019). The Hanabi Challenge: A New Frontier for AI Research. arXiv preprint arXiv:1902.00506.
 * Arulkumaran, K., Deisenroth, M. P., Brundage, M., & Bharath, A. A. (2017). A brief survey of deep reinforcement learning. arXiv preprint arXiv:1708.05866.
 * Lanctot, M., Zambaldi, V., Gruslys, A., Lazaridou, A., Perolat, J., Silver, D., & Graepel, T. (2017). A unified game-theoretic approach to multiagent reinforcement learning. In Advances in Neural Information Processing Systems (pp. 4193-4206).
 * Foerster, J., Assael, I. A., de Freitas, N., & Whiteson, S. (2016). Learning to communicate with deep multi-agent reinforcement learning. In Advances in Neural Information Processing Systems (pp. 2137-2145).
 * Ghavamzadeh, M., Mannor, S., Pineau, J., & Tamar, A. (2015). Bayesian reinforcement learning: A survey. Foundations and Trends® in Machine Learning, 8(5-6), 359-483.
 * Szepesvári, C. (2010). Algorithms for Reinforcement Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 4(1), 1-103.
 * Busoniu, L., Babuska, R., & De Schutter, B. (2008). A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 38(2).
 * Panait, L., & Luke, S. (2005). Cooperative multi-agent learning: The state of the art. Autonomous Agents and Multi-Agent Systems, 11(3), 387-434.
 * Bowling, M., & Veloso, M. (2000). An analysis of stochastic game theory for multiagent reinforcement learning (No. CMU-CS-00-165). Carnegie Mellon University, School of Computer Science, Pittsburgh, PA.
 * Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, 4, 237-285.
 * Harmon, M. E., & Harmon, S. S. (1996). Reinforcement Learning: A Tutorial.
 * Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4), 229-256.

Software

 * PyBrain - Python
 * OpenAI Gym - A toolkit for developing and comparing Reinforcement Learning algorithms
 * Reinforcement-Learning-Toolkit - Python
 * DeepRL - Python
 * coach - Python
 * Griduniverse - Python
 * Retro (formerly Universe) - Python
 * RoomAI - Python
 * ViZDoom - Python, C++, Lua, Java and Julia
 * tensorforce - Python and TensorFlow
 * keras-gym - Python, TensorFlow and Gym
 * spinningup - Python, TensorFlow and Gym

Other resources

 * Awesome-RL (GitHub) - A curated list of Reinforcement Learning resources
 * awesome-deep-rl (GitHub) - A curated list of Deep Reinforcement Learning resources
 * awesome-rl (GitHub) - A curated list of Deep Reinforcement Learning resources
 * Practical_RL (GitHub) - A course in reinforcement learning in the wild
 * Deep RL Bootcamp - Videos and slides
 * Software Tools for RL, ANNs and Robotics - Python and MATLAB
 * Reinforcement Learning - blog post
 * Learning Diverse Skills via Maximum Entropy Deep Reinforcement Learning - blog post
 * I’m a bandit - Random topics in optimization, probability, and statistics. By Sébastien Bubeck - blog
 * Simple Reinforcement Learning with Tensorflow (Part 0 to 9) - blog posts
 * Paper Collection of Multi-Agent Reinforcement Learning (MARL)
 * AgentNet - GitHub
 * DataLab Cup 5: Deep Reinforcement Learning
 * Reinforcement learning tutorial using Python and Keras - blog post
 * Reinforcement Learning w/ Keras + OpenAI: Actor-Critic Models - blog post
 * Deep Q-Learning with Keras and Gym - blog post
 * deep-q-learning - GitHub
 * keras-rl - GitHub
 * Reinforcement Q-Learning from Scratch in Python with OpenAI Gym - blog post with code
 * A (Long) Peek into Reinforcement Learning - blog post
 * trfl (GitHub) - code
 * multiworld (GitHub) - code
 * Deep Reinforcement Learning Workshop at NIPS 2018
 * Spinning Up in Deep RL - educational resource produced by OpenAI
 * Horizon (GitHub) - code
 * Unsupervised Neural Networks Fight in a Minimax Game - blog post
 * Cases for Applying Multi-Agent Reinforcement Learning - blog post
 * surreal (GitHub) - code
 * robotics-rl-srl (GitHub) - code
 * Deep Reinforcement Learning with TensorFlow 2.0 - blog post
 * rllab (GitHub) - code
 * garage (GitHub) - code
 * metaworld (GitHub) - code
 * RLBench (GitHub) - code
 * stable-baselines (GitHub) - code
 * RLzoo (GitHub) - code
 * RLlib - code