Q-learning

Q-learning is a model-free, off-policy reinforcement learning algorithm: it learns the optimal action-value function (Q-function) through trial and error, so that acting greedily on it maximizes cumulative future reward.

Q-learning trains an agent to act optimally without a model of the environment (model-free). The core mechanism iteratively updates a Q-table, which stores the 'quality' (Q) value of taking a specific action in a given state. Each update applies the Temporal Difference (TD) learning rule, derived from the Bellman equation: the stored value moves toward the observed reward plus the discounted maximum Q-value of the next state. The algorithm is off-policy: it learns the value of the greedy (optimal) policy while following a separate, often $\epsilon$-greedy, behavior policy for exploration. For any finite Markov Decision Process, the Q-values converge to the optimal action-value function, and hence yield an optimal action-selection policy, provided every state-action pair continues to be visited and the learning rate decays appropriately.
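
As a rough illustration of the update described above, here is a minimal tabular sketch. The rule it applies is Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]. Everything else is an assumption made for the example: the five-state corridor environment, the function name `q_learning`, and the hyperparameter values (`alpha`, `gamma`, `epsilon`) are hypothetical and not taken from any particular project or library.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are 0..n_states-1; actions are 0 (left) and 1 (right).
    Entering the rightmost state ends the episode with reward 1;
    every other step gives reward 0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # Q-table: one row per state, one column per action.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Behavior policy: epsilon-greedy, with random tie-breaking.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1

            # Assumed deterministic dynamics: move left or right,
            # staying in place when moving left from state 0.
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0

            # Off-policy TD target: uses the max over next actions,
            # regardless of which action the behavior policy takes next.
            best_next = 0.0 if s_next == goal else max(q[s_next])
            q[s][a] += alpha * (r + gamma * best_next - q[s][a])
            s = s_next
    return q


if __name__ == "__main__":
    for state, (left, right) in enumerate(q_learning()):
        print(f"state {state}: Q(left)={left:.3f}  Q(right)={right:.3f}")
```

In this toy setting the learned table should favor "right" in every non-terminal state. Larger problems typically replace the table with a function approximator (as in Deep Q-Learning), which this sketch does not attempt to show.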

https://en.wikipedia.org/wiki/Q-learning

1 project · 1 city

Related technologies

Python 611 PyTorch 263

Recent Talks & Demos

Members-Only

Chess via Deep Q-Learning & MCTS

Los Angeles Jun 9

PyTorch Python