Reinforcement Learning and Optimal Control Book (RIP Dimitri Bertsekas)

REINFORCEMENT LEARNING AND OPTIMAL CONTROL

BOOKS, VIDEOLECTURES, AND COURSE MATERIAL

Dimitri P. Bertsekas

Recently Posted Videolectures

Reinforcement Learning, Model Predictive Control, and the Newton step for solving Bellman's equation. Lecture at Harvard University, June 2025. Slides.

Abstract Dynamic Programming, Reinforcement Learning, Newton's Method, and Gradient Optimization. Lecture at the ASU Mathematics Department, April 2025. Slides.

Computer Chess with Model Predictive Control and Reinforcement Learning. At the Reinforcement Learning Workshop, The Indian Institute of Science, Bengaluru, India, Jan. 2025. Slides.

Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming. Plenary talk at IFAC NMPC, Kyoto, Japan, Aug. 2024. Slides.

Distributed and Multiagent Reinforcement Learning. At the Workshop on Intersections between Control, Learning and Optimization, IPAM, 2020. Slides.

Abandon reinforcement learning in favor of model predictive control: An interesting statement by deep learning pioneer Yann LeCun. See the follow-up comments and further discussion.

The Textbook for my Reinforcement Learning Course at ASU

2nd Edition, 2025

Free download

A Review by Antonio Montano

Author's Description

This is the main textbook I use for my course at ASU. It is based on the class notes and the books I wrote over the years 2019-2025. It is a standalone book, but can also be used in conjunction with my videolectures and slides, available at this site.

The book can be downloaded and used freely for instructional purposes. It is available in digital form from Google Play, and will soon be available in print from the publishing company.

The book is about 500 pages long and includes solved end-of-chapter exercises. It places primary emphasis on intuitive reasoning, based on the mathematical framework of dynamic programming. While mathematical proofs are deemphasized, the textbook relies on the theoretical development and analysis given in my Dynamic Programming (DP) and Reinforcement Learning (RL) books listed at this site. All of these books share a consistent notation and terminology.

An important structural characteristic of the textbook is that it is organized in a modular way, with a view towards flexibility, so it can accommodate variations in course content. In particular, it is divided into two parts:

(1) A foundational platform, which consists of Chapter 1. It contains a selective overview of the approximate DP/RL landscape, and a starting point for a more detailed in-class development of other RL topics, whose choice can be at the instructor's discretion. It includes the conceptual framework and visualizations for approximation in value space based on Newton's method for solving the Bellman equation. It also sets a foundation for the development of the Model Predictive Control (MPC) methodology.

(2) An in-depth coverage of selected methodologies. In Chapter 2, we discuss methods of approximation in value space with one-step or multi-step lookahead. Methods of deterministic and stochastic rollout, and lookahead tree search receive special attention. Other topics of interest include multiagent rollout, adaptive control by rollout reoptimization, Bayesian optimization, and minimax problems. In Chapter 3, we discuss off-line training of neural networks and other approximation architectures, in conjunction with policy iteration/self-learning, Q-learning, policy gradient, and aggregation methods.
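As a rough illustration of the approximation in value space with one-step lookahead mentioned in both parts above, the idea can be written for a discounted infinite horizon problem as follows. The notation is an assumption made for this sketch and is not given in this description: state x, control u in a constraint set U(x), random disturbance w, system function f, stage cost g, discount factor \alpha, optimal cost function J^*, and an approximation \tilde J of J^*.

```latex
% Bellman's equation (assumed discounted-cost notation)
J^*(x) = \min_{u \in U(x)} E_w\Big\{ g(x,u,w) + \alpha J^*\big(f(x,u,w)\big) \Big\}

% One-step lookahead policy obtained by replacing J^* with an approximation \tilde J
\tilde\mu(x) \in \arg\min_{u \in U(x)} E_w\Big\{ g(x,u,w) + \alpha \tilde J\big(f(x,u,w)\big) \Big\}
```

In the Newton's method interpretation referred to above, the cost function of the lookahead policy \tilde\mu can be viewed as the result of a single Newton step for solving Bellman's equation, started at \tilde J. Rollout corresponds to the case where \tilde J is the cost function of some known base policy.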

In a different course, other choices for in-depth coverage may be made, using the same foundational platform. For example, an optimal control/MPC/adaptive control course can be built upon the platform of Chapter 1. Similarly, more or less mathematically oriented courses can be built upon this platform.

Chapter 1, Exact and Approximate Dynamic Programming. Contents: AlphaZero Off-Line Training, and On-Line Play, Deterministic Dynamic Programming, Stochastic Exact and Approximate Dynamic Programming, Infinite Horizon Problems - An Overview, Infinite Horizon Linear Quadratic Problems, Conceptual Framework for Approximation in Value Space - Newton's Method, Examples, Reformulations, and Simplifications (POMDP, Model Predictive and Adaptive Control), Relations of Reinforcement Learning and Decision/Control.

Chapter 2, Approximation in Value Space - Rollout Algorithms. Contents: Deterministic Finite Horizon Problems, Approximation in Value Space - Deterministic Problems, Rollout Algorithms for Discrete Optimization, Rollout and Approximation in Value Space with Multistep Lookahead, Constrained Forms of Rollout Algorithms, Small Stage Costs and Long Horizon - Continuous-Time Rollout, Stochastic Rollout and Monte Carlo Tree Search, Rollout for Infinite-Spaces Problems - Optimization, Multiagent Rollout, Rollout for Bayesian Optimization and Sequential Estimation, Adaptive Control by Rollout with a POMDP Formulation, Rollout for Minimax Control, Application to Computer...
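The idea behind rollout for discrete optimization, one of the Chapter 2 topics listed above, can be sketched on a tiny traveling salesman instance. This is a minimal sketch under stated assumptions and is not taken from the book: the distance matrix `DIST`, the function names, and the choice of nearest neighbor as the base heuristic are all hypothetical. At each step, rollout tries every feasible next city, completes the tour with the base heuristic, and commits to the city whose completed tour is cheapest.

```python
from typing import List, Sequence

# Hypothetical symmetric distance matrix for 5 cities (illustration only).
DIST = [
    [0, 2, 9, 10, 7],
    [2, 0, 6, 4, 3],
    [9, 6, 0, 8, 5],
    [10, 4, 8, 0, 6],
    [7, 3, 5, 6, 0],
]


def tour_cost(dist: Sequence[Sequence[int]], tour: List[int]) -> int:
    """Cost of a closed tour that returns to its first city."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))


def nearest_neighbor_completion(dist: Sequence[Sequence[int]],
                                partial: List[int]) -> List[int]:
    """Base heuristic: extend a partial tour by always moving to the
    nearest unvisited city (ties broken by the lowest city index)."""
    tour = list(partial)
    unvisited = [c for c in range(len(dist)) if c not in tour]
    while unvisited:
        nxt = min(unvisited, key=lambda c: (dist[tour[-1]][c], c))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def rollout_tour(dist: Sequence[Sequence[int]], start: int = 0) -> List[int]:
    """Rollout: score each candidate next city by the cost of the tour
    that the base heuristic produces after moving there, then commit to
    the best-scoring candidate and repeat."""
    n = len(dist)
    tour = [start]
    while len(tour) < n:
        candidates = [c for c in range(n) if c not in tour]
        best = min(
            candidates,
            key=lambda c: (
                tour_cost(dist, nearest_neighbor_completion(dist, tour + [c])),
                c,
            ),
        )
        tour.append(best)
    return tour


base = nearest_neighbor_completion(DIST, [0])
improved = rollout_tour(DIST, start=0)
print("base heuristic:", base, tour_cost(DIST, base))
print("rollout:       ", improved, tour_cost(DIST, improved))
```

On this hypothetical instance the base heuristic's tour costs 28 and the rollout tour costs 26. More generally, because the nearest neighbor heuristic with fixed tie-breaking always continues a partial tour in the same way, the rollout tour is never costlier than the base tour started from the same city.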
