These methods are based on an
optimality criterion in which the agent tries to maximize the expected
payoff per step (rather than the expected discounted sum of
rewards). Undiscounted methods are currently less well understood than
discounted methods, and are also significantly
harder to analyze theoretically.
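
To make the contrast concrete, the two criteria can be written side by side. This is only a sketch in one common notation: the symbols (policy \pi, reward r_t at step t, discount factor \gamma, average payoff \rho^{\pi}, discounted value V^{\pi}_{\gamma}) are assumptions introduced here for illustration and do not come from the text above.

```latex
% Average-payoff (undiscounted) criterion: expected reward per step
% in the long run, assuming the limit exists.
\rho^{\pi} \;=\; \lim_{N \to \infty} \frac{1}{N}\,
    \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{N-1} r_t \right]

% Discounted criterion, shown for comparison, with 0 \le \gamma < 1.
V^{\pi}_{\gamma} \;=\;
    \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_t \right]
```

Under the first criterion, any finite initial stretch of rewards has no effect on the value, since it is divided by N as N grows. Under the second, early rewards carry the most weight because later ones are scaled down by powers of \gamma.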