Reinforcement learning is an area of machine learning that addresses how an autonomous agent can learn behavior that is successful in the long term through interaction with its environment. The term reinforcement learning has its roots in behavioral psychology, in particular in Pavlovian models of reward learning in animals. The modern theory of reinforcement learning, however, is influenced far more by mathematical theories of optimal control from operations research, such as dynamic programming.
Reinforcement learning differs from the better-studied problem of
supervised learning in that the learner is not given input-output
samples of the desired behavior. Instead, the learner receives only
scalar feedback on the appropriateness of its actions, after
they have been carried out. The learner must therefore work out which
of its actions deserve credit or blame for that feedback. This credit
assignment problem becomes even harder when the feedback is
significantly delayed (e.g. win/loss in an extended game).
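
The delayed-feedback setting can be illustrated with a minimal sketch. The game, the function names (play_episode, discounted_returns) and the discount factor gamma are all assumptions made for illustration and do not come from the text above. Discounting is only one common way of spreading a delayed reward back over earlier actions, not the only one.

```python
import random


def play_episode(policy, n_steps=5, rng=None):
    """Run one episode of a hypothetical toy game.

    At each step the agent picks action 0 or 1. It is never shown the
    desired action. Its only feedback is one scalar at the very end:
    +1.0 if it chose action 1 on a majority of steps, otherwise -1.0.
    """
    rng = rng or random.Random(0)
    actions, rewards = [], []
    for t in range(n_steps):
        actions.append(policy(t, rng))
        rewards.append(0.0)  # no feedback while the game is in progress
    # Delayed scalar feedback (win/loss), delivered only on the last step.
    rewards[-1] = 1.0 if 2 * sum(actions) > n_steps else -1.0
    return actions, rewards


def discounted_returns(rewards, gamma=0.9):
    """Assign each step the discounted sum of the rewards that follow it.

    gamma is an assumed discount factor; actions taken long before the
    final reward receive a smaller share of the credit.
    """
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


if __name__ == "__main__":
    # A purely random policy, standing in for an untrained learner.
    actions, rewards = play_episode(lambda t, rng: rng.randint(0, 1))
    print("actions:", actions)
    print("rewards:", rewards)
    print("returns:", discounted_returns(rewards))
```

Here the reward list is zero everywhere except at the final step, so nothing in the raw feedback says which individual action mattered. The returns computed afterwards are one simple heuristic for assigning credit to earlier steps.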