A key question regarding reinforcement learning
methods is whether they will converge, in the limit, to the optimal
value function. Several of the most popular discounted algorithms,
such as Q-learning and TD(lambda), have convergence proofs, assuming
that the value function is represented in a tabular
fashion. Theoretical analysis of these methods in the general case,
where a nonlinear function approximator is used, remains an open
problem, as does the analysis of undiscounted methods.
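
As a concrete illustration of the tabular setting, the following minimal Python sketch runs Q-learning on a small hypothetical three-state chain. The environment, the function name q_learning, the constant step size, and the uniformly random behaviour policy are all assumptions made for this example and are not taken from the text. The constant step size is adequate here only because the toy problem is deterministic; the convergence proofs typically require suitably decaying step sizes and repeated visits to every state-action pair.

```python
import random


def q_learning(num_steps=5000, gamma=0.9, alpha=0.5, seed=0):
    """Tabular Q-learning on a hypothetical deterministic 3-state chain.

    States 0 and 1 are non-terminal; state 2 is the terminal goal.
    Action 0 moves left (staying at state 0 if already there),
    action 1 moves right. Entering the goal yields reward 1.
    """
    rng = random.Random(seed)
    n_states, n_actions, goal = 3, 2, 2
    # The "table": one entry per (state, action) pair.
    q = [[0.0] * n_actions for _ in range(n_states)]
    state = 0
    for _ in range(num_steps):
        # Uniformly random behaviour policy (an assumption of this sketch),
        # so every state-action pair keeps being visited.
        action = rng.randrange(n_actions)
        next_state = state + 1 if action == 1 else max(state - 1, 0)
        reward = 1.0 if next_state == goal else 0.0
        # The terminal state has value 0, so no bootstrap term is added there.
        if next_state == goal:
            target = reward
        else:
            target = reward + gamma * max(q[next_state])
        # Q-learning update toward the one-step greedy target.
        q[state][action] += alpha * (target - q[state][action])
        # Restart the episode at state 0 after reaching the goal.
        state = 0 if next_state == goal else next_state
    return q


if __name__ == "__main__":
    for s, row in enumerate(q_learning()[:2]):
        print(s, [round(v, 4) for v in row])
```

Under these assumptions the optimal values are Q(1, right) = 1, Q(0, right) = 0.9, and Q(0, left) = Q(1, left) = 0.81 for a discount factor of 0.9, and the learned table should approach them. Replacing the table with a nonlinear function approximator is exactly the step for which no comparable guarantee is claimed.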