Technology
Q-learning
Q-learning is a model-free, off-policy reinforcement learning algorithm that learns the optimal action-value function (Q-function) through trial and error, so that an agent can maximize its cumulative future reward.
Q-learning is a fundamental reinforcement learning algorithm that trains an agent to act optimally without a model of the environment (model-free). In its tabular form, the core mechanism is the iterative update of a Q-table, which stores the 'quality' (Q) value of taking a specific action in a given state. Each update applies the Temporal Difference (TD) learning rule, derived from the Bellman optimality equation: the stored value is moved toward the observed reward plus the discounted maximum Q-value of the next state. The algorithm is off-policy: it learns the value of the greedy (optimal) policy while following a separate behavior policy, often $\epsilon$-greedy, for exploration. For any finite Markov Decision Process, Q-learning converges to an optimal action-selection policy that maximizes long-term reward, provided every state-action pair continues to be visited and the learning rate decays appropriately.
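As a rough illustration of the update described above, the tabular rule can be written as $Q(s,a) \leftarrow Q(s,a) + \alpha\,[\,r + \gamma \max_{a'} Q(s',a') - Q(s,a)\,]$, where $\alpha$ is the learning rate and $\gamma$ the discount factor. The sketch below applies it to a hypothetical five-state corridor. The environment, the function names (`step`, `choose_action`, `train`), and the hyperparameter values are assumptions chosen for illustration, not part of any particular library. A fixed learning rate is used for simplicity, which is adequate here only because the toy environment is deterministic.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the terminal goal (assumed toy setup)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical deterministic corridor: reward 1.0 only on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy behavior policy; ties between equal Q-values are broken randomly."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning; all hyperparameter defaults are illustrative assumptions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Off-policy TD target: bootstrap from the best next action,
            # regardless of which action the behavior policy takes next.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy policy read off the learned Q-table for the non-terminal states.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

In this toy setup the greedy policy should select action 1 (move right) in every non-terminal state, since that is the shortest path to the only reward. The `max(q[next_state])` term in the target is what makes the method off-policy: the agent explores with $\epsilon$-greedy actions but evaluates the greedy one.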