Online Learning

This page contains resources about Online Learning and Sequential Prediction.

Subfields and Concepts

 * Recursive Least Squares
 * Mini-Batch Learning
   * Mini-Batch Gradient Descent Methods
 * Decision Theory
 * Information Theory
   * Entropy
   * Kullback-Leibler (KL) Divergence
 * Game Theory
   * Minimax Theorem
   * Blackwell's Approachability
 * Online Dictionary Learning
 * Online Algorithms
   * Wake-Sleep Algorithm
   * Auto-Encoding Variational Bayes (AEVB) Algorithm
 * Online Convex Optimization
   * Regret Bound
   * Bregman Divergence
   * No-regret Learning
   * Online Gradient Descent (see the sketch after this list)
   * Online Subgradient Descent
   * Mirror Descent
   * Stochastic Gradient Descent (SGD)
   * Mini-Batch Gradient Descent Methods
   * Follow The Regularized Leader (FTRL)
   * Multi-Armed Bandit (MAB)
 * Regularization
   * L2-regularization / Tikhonov regularization / Ridge regression
   * L1-regularization / Least absolute shrinkage and selection operator (LASSO)
   * Matrix Regularization
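
As a concrete illustration of several concepts in the list above (online convex optimization, online gradient descent, regret), here is a minimal NumPy sketch of projected Online Gradient Descent in the spirit of Zinkevich (2003), cited under Scholarly Articles below. The quadratic losses, the step size eta_t = 1/sqrt(t), and the unit-ball constraint are illustrative assumptions, not taken from any listed reference.

    import numpy as np

    def online_gradient_descent(grads, dim, radius=1.0):
        # Projected Online Gradient Descent: play x_t, observe a (sub)gradient,
        # take a step with eta_t = 1/sqrt(t), and project back onto the ball.
        x = np.zeros(dim)
        iterates = []
        for t, grad in enumerate(grads, start=1):
            iterates.append(x.copy())
            g = grad(x)                  # (sub)gradient of the loss revealed at round t
            x = x - g / np.sqrt(t)       # gradient step with eta_t = 1/sqrt(t)
            norm = np.linalg.norm(x)
            if norm > radius:            # Euclidean projection onto the feasible ball
                x *= radius / norm
        return iterates

    # Toy stream of quadratic losses f_t(x) = ||x - z_t||^2 (illustrative assumption).
    rng = np.random.default_rng(0)
    zs = 0.5 + 0.1 * rng.normal(size=(200, 3))
    losses = [lambda x, z=z: float(np.sum((x - z) ** 2)) for z in zs]
    grads = [lambda x, z=z: 2.0 * (x - z) for z in zs]

    iterates = online_gradient_descent(grads, dim=3)
    best_fixed = zs.mean(axis=0)         # best fixed decision in hindsight for these losses
    regret = sum(f(x) for f, x in zip(losses, iterates)) - sum(f(best_fixed) for f in losses)
    print(f"cumulative regret over 200 rounds: {regret:.3f}")

Online Subgradient Descent and Mirror Descent follow the same template: replace the gradient with a subgradient, or replace the Euclidean step and projection with a Bregman-divergence step.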

Video Lectures

 * Online Learning with a Memory Harness by Shai Shalev-Shwartz - VideoLectures.NET
 * Trading Regret Rate for Computational Efficiency in Online Learning with Limited Feedback by Shai Shalev-Shwartz - VideoLectures.NET

Lecture Notes

 * Statistical Learning Theory and Sequential Prediction by Alexander Rakhlin and Karthik Sridharan
 * Machine Learning Theory by Karthik Sridharan
 * Prediction and Learning: It's Only a Game by Jacob Abernethy
 * Learning Theory by Sham Kakade and Ambuj Tewari
 * Statistical Learning Theory by Dmitry Panchenko
 * Introduction to Machine Learning by Shai Shalev-Shwartz
 * Statistical Learning Theory by Maxim Raginsky
 * Introduction to Online Optimization by Sébastien Bubeck

Books and Book Chapters

 * Hazan, E. (2016). Introduction to Online Convex Optimization. Foundations and Trends® in Optimization, 2(3-4), 157-325.
 * Theodoridis, S. (2015). "Chapter 8: Parameter Learning: A Convex Analytic Path". Machine Learning: A Bayesian and Optimization Perspective. Academic Press.
 * Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press.
 * Sra, S., Nowozin, S., & Wright, S. J. (Eds.). (2011). Optimization for Machine Learning. MIT Press.
 * Hazan, E. (2011). "Chapter 10: The Convex Optimization Approach to Regret Minimization". In Sra, S., Nowozin, S., & Wright, S. J. (Eds.), Optimization for Machine Learning. MIT Press.
 * Shalev-Shwartz, S. (2011). Online Learning and Online Convex Optimization. Foundations and Trends® in Machine Learning, 4(2), 107-194.

Scholarly Articles

 * Villa, S., Rosasco, L., & Poggio, T. (2013). On Learning, Complexity and Stability. arXiv preprint arXiv:1303.5976.
 * Arora, S., Hazan, E., & Kale, S. (2012). The Multiplicative Weights Update Method: A Meta-Algorithm and Applications. Theory of Computing, 8(1), 121-164. (A sketch of this method follows the list.)
 * Abernethy, J., Bartlett, P. L., & Hazan, E. (2011). Blackwell Approachability and No-Regret Learning are Equivalent. In COLT (pp. 27-46).
 * Mairal, J., Bach, F., Ponce, J., & Sapiro, G. (2009). Online dictionary learning for sparse coding. In Proceedings of the 26th International Conference on Machine Learning (pp. 689-696).
 * Ying, Y., & Pontil, M. (2008). Online gradient descent learning algorithms. Foundations of Computational Mathematics, 8(5), 561-596.
 * Shalev-Shwartz, S. (2007). Online Learning: Theory, Algorithms, and Applications. PhD dissertation, Hebrew University of Jerusalem.
 * Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, Learning, and Games. Cambridge University Press.
 * Zinkevich, M. (2003). Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (pp. 928-936).
 * Dietterich, T. G. (2002). Machine learning for sequential data: A review. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR) (pp. 15-30). Springer Berlin Heidelberg.
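
As a companion to the Arora, Hazan & Kale (2012) survey above, here is a minimal sketch of the Multiplicative Weights Update method for prediction with expert advice, in its exponential-weights (Hedge) form. The loss matrix and the learning rate eta are illustrative assumptions.

    import numpy as np

    def multiplicative_weights(loss_matrix, eta=0.1):
        # Multiplicative Weights Update: keep one weight per expert and
        # decay each weight exponentially in the loss that expert incurs.
        T, n = loss_matrix.shape          # T rounds, n experts, losses in [0, 1]
        w = np.ones(n)
        total_loss = 0.0
        for t in range(T):
            p = w / w.sum()               # play the normalized weights as a distribution
            total_loss += float(p @ loss_matrix[t])
            w *= np.exp(-eta * loss_matrix[t])
        return total_loss

    # Toy losses for 3 experts over 500 rounds (illustrative assumption).
    rng = np.random.default_rng(1)
    losses = rng.uniform(size=(500, 3))
    losses[:, 0] *= 0.5                   # expert 0 is better on average
    alg_loss = multiplicative_weights(losses)
    best_expert_loss = losses.sum(axis=0).min()
    print(f"algorithm: {alg_loss:.1f}, best expert in hindsight: {best_expert_loss:.1f}")

With eta of order sqrt(log(n)/T), the gap between the two printed numbers (the regret) grows only as O(sqrt(T log n)), the no-regret guarantee discussed in several of the references above.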

Other Resources

 * Wiki for research in Online Prediction
 * How large should the batch size be for stochastic gradient descent? - Cross Validated Stack Exchange
 * Should training samples randomly drawn for mini-batch training neural nets be drawn without replacement? - Cross Validated Stack Exchange (see the sketch below)
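
Both Cross Validated threads above concern mini-batch stochastic gradient descent, listed among the concepts at the top of this page. The sketch below shows mini-batch SGD on a synthetic least-squares problem, sampling without replacement by reshuffling the data once per epoch; the data, batch size, and learning rate are illustrative assumptions.

    import numpy as np

    def minibatch_sgd(X, y, batch_size=32, lr=0.1, epochs=5, seed=0):
        # Mini-batch SGD for least squares. Each epoch shuffles the data once,
        # so within an epoch every sample is drawn exactly once (without replacement).
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(epochs):
            order = rng.permutation(n)    # draw without replacement via shuffling
            for start in range(0, n, batch_size):
                idx = order[start:start + batch_size]
                Xb, yb = X[idx], y[idx]
                grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)   # gradient of mean squared error
                w -= lr * grad
        return w

    # Synthetic regression problem (illustrative assumption).
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 5))
    w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
    y = X @ w_true + 0.01 * rng.normal(size=1000)
    print(minibatch_sgd(X, y))            # should be close to w_true

Per-epoch reshuffling implements drawing without replacement within an epoch; drawing with replacement would instead sample the batch indices independently at every step.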