Publications

The following is a list of publications in reverse chronological order. Many of the references have links that allow you to access that publication. Please contact the lab or a specific author if you have any questions. Links associated with a particular reference (e.g., a conference website) were current at the time of publication.


2017

  • Durugkar, I., Gemp, I., Mahadevan, S.
    Generative Multi-Adversarial Networks
    In International Conference on Learning Representations, 2017.
    [ pdf ]

  • Guo, X., Klinger, T., Rosenbaum, C., Bigus, J. P., Campbell, M., Kawas, B., Talamadupula, K., Tesauro, G., Singh, S.
    Learning to Query, Reason, and Answer Questions On Ambiguous Texts
    In International Conference on Learning Representations, 2017.
    [ pdf ]

2016

  • Dernbach, S., Taft, N., Kurose, J., Weinsberg, U., Diot, C., and Ashkan, A.
    Cache Content-Selection Policies for Streaming Video Services
    Proceedings of the IEEE International Conference on Computer Communications, 2016.
    [ pdf ]

2015

  • Wang, L., Feng, M., Zhou, B., Xiang, B. and Mahadevan, S.
    Efficient hyper-parameter optimization for NLP applications
    Empirical Methods in Natural Language Processing (EMNLP), Portugal, September 2015.
    [ pdf ]

  • Mahadevan, S. and Chandar, S.
    Reasoning about Linguistic Regularities in Word Embeddings using Matrix Manifolds
    arXiv, July 2015.
    [ pdf ]

  • Theocharous, G., Thomas, P. and Ghavamzadeh, M.
    Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees
    Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2015.
    [ pdf ]

  • Giguere, S., Carey, C., Boucher, T., Mahadevan, S. and Dyar, M.D.
    An Optimization Perspective on Baseline Removal for Spectroscopy
    Proceedings of the 5th IJCAI Workshop on Artificial Intelligence in Space, 2015.

  • Boucher, T., Carey, C., Giguere, S., Mahadevan, S., Dyar, M.D., Clegg, S. and Wiens, R.
    Manifold Learning for Regression of Mars Spectra
    Proceedings of the 5th IJCAI Workshop on Artificial Intelligence in Space, 2015.

  • Carey, C., Boucher, T., Giguere, S., Mahadevan, S. and Dyar, M.D.
    Automatic Whole-Spectrum Matching
    Proceedings of the 5th IJCAI Workshop on Artificial Intelligence in Space, 2015.

  • Theocharous, G., Thomas, P. and Ghavamzadeh, M.
    Ad Recommendation Systems for Life-Time Value Optimization
    TargetAd 2015: Ad Targeting at Scale, at the World Wide Web Conference, 2015.
    [ pdf ]

  • Liu, B., Liu, J., Ghavamzadeh, M., Mahadevan, S., and Petrik, M.
    Finite-Sample Analysis of Proximal Gradient TD Algorithms
    Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence (UAI), 2015.
    [ pdf ]

  • Thomas, P., Theocharous, G. and Ghavamzadeh, M.
    High Confidence Policy Improvement
    Proceedings of the 32nd International Conference on Machine Learning (ICML), 2015.
    [ pdf ]

  • Boucher, T., Ozanne, M., Carmosino, M., Dyar, M.D., Mahadevan, S., Breves, E., Lepore, K. and Clegg, S.
    A study of machine learning regression methods for major elemental analysis of rocks using laser-induced breakdown spectroscopy
    Spectrochimica Acta Part B, vol. 107, pp. 1-10, 2015.
    [ pdf ]

  • Gemp, I. and Mahadevan, S.
    Finding Equilibria in Large Games using Variational Inequalities
    AAAI Spring Symposium on Applied Computational Game Theory, Stanford, CA, 2015.
    [ pdf ]

  • Boucher, T., Carey C., Mahadevan, S., and Dyar, M.D.
    Aligning Mixed Manifolds
    Proceedings of the AAAI Conference, Austin, Texas, 2015.
    [ pdf ]

  • Thomas, P., Theocharous, G., and Ghavamzadeh, M.
    High Confidence Off-Policy Evaluation
    Proceedings of the AAAI Conference, Austin, Texas, 2015.
    [ pdf ]

  • Gemp, I. and Mahadevan, S.
    Solving Large Scale Sustainable Supply Chain Networks using Variational Inequalities
    AAAI Workshop on Computational Sustainability, Austin, TX, 2015.
    [ pdf ]

2014

  • da Silva, B.C., Konidaris, G., and Barto, A.G.
    Active Learning of Parameterized Skills
    Proceedings of the 31st International Conference on Machine Learning (ICML 2014). Beijing, China, 2014.
    [ pdf ]

  • Thomas, P. S.
    GeNGA: A Generalization of Natural Gradient Ascent with Positive and Negative Convergence Results
    Proceedings of the 31st International Conference on Machine Learning (ICML 2014). Beijing, China, 2014.
    [ pdf ]

  • Thomas, P. S.
    Bias in Natural Actor-Critic Algorithms
    Proceedings of the 31st International Conference on Machine Learning (ICML 2014). Beijing, China, 2014.
    [ pdf ]

  • da Silva, B.C., Baldassarre, G., Konidaris, G., and Barto, A.G.
    Learning Parameterized Motor Skills on a Humanoid Robot
    Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA 2014). Hong Kong, China, 2014.
    [ pdf | video ]

  • Mahadevan, S., Liu, B., Thomas, P. S., Dabney, W., Giguere, S., Jacek, N., Gemp, I., and Liu, J.
    Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces
    arXiv, May 26, 2014.
    [ pdf | arxiv ]

  • Carey, C. and Mahadevan, S.
    Manifold Spanning Graphs
    Proceedings of the 28th Conference on Artificial Intelligence (AAAI), 2014.
    [ pdf ]

  • Dabney, W., and Thomas, P. S.
    Natural Temporal Difference Learning
    Proceedings of the 28th Conference on Artificial Intelligence (AAAI), 2014.
    [ pdf ]

2013

  • Thomas, P. S., Dabney, W., Mahadevan, S., and Giguere, S.
    Projected natural actor-critic.
    Advances in Neural Information Processing Systems 26, 2013.
    [ pdf ]

  • Shah, A., Barto, A.G., and Fagg, A.H.
    A Dual Process Account of Coarticulation in Motor Skill Acquisition.
    Journal of Motor Behavior, volume 45, pages 531--549.
    [ DOI link ]

  • Wang, C. and Mahadevan, S.
    Manifold Alignment Preserving Global Geometry.
    Proceedings of the IJCAI Conference, August 3-9, 2013, Beijing, China.
    [ pdf ]

  • Mahadevan, S., Giguere, S., and Jacek, N.
    Basis Adaptation for Sparse Nonlinear Reinforcement Learning.
    Proceedings of the AAAI Conference, July 14-18, 2013, Bellevue, Washington.
    [ pdf ]

  • Wang, C. and Mahadevan, S.
    Multiscale Manifold Learning.
    Proceedings of the AAAI Conference, July 14-18, 2013, Bellevue, Washington.
    [ pdf ]

  • Kuindersma, S., Grupen, R., and Barto, A.
    Variable risk control via stochastic optimization.
    The International Journal of Robotics Research, vol. 32, no. 7, pp. 806-825, June 2013.
    [ pdf ]

2012

  • Thomas, P. and Barto, A. G.
    Motor primitive discovery
    Proceedings of the IEEE Conference on Development and Learning and Epigenetic Robotics, 2012.
    [ pdf ]

  • Liu, B., Mahadevan, S., and Liu, J.
    Regularized Off-Policy TD-Learning.
    Proceedings of the Conference on Neural Information Processing Systems (NIPS), December 1-3, 2012, Lake Tahoe, CA.
    [ pdf ]

  • S. Niekum, S. Osentoski, G.D. Konidaris, and Andrew G. Barto.
    Learning and Generalization of Complex Tasks from Unstructured Demonstrations.
    IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5239-5246, October 2012.
    [ pdf ]

  • Mahadevan, S. and Liu, B.
    Sparse Q-learning with Mirror Descent.
    Proceedings of the Conference on Uncertainty in AI (UAI), August 15-17, 2012, Catalina Island, CA.
    [ pdf ]

  • Vu, H., Carey, C., and Mahadevan, S.
    Manifold Warping: Manifold Alignment over Time.
    Proceedings of the 26th Conference on Artificial Intelligence (AAAI), July 22-26, 2012, Toronto, Canada.
    [ pdf ]

  • Wang, C. and Mahadevan, S.
    Manifold Alignment Preserving Global Geometry.
    Technical Report, UMass Computer Science Department UM-CS-2012-031, 2012.
    [ pdf ]

  • Wang, C., Liu, B., Vu, H., and Mahadevan, S.
    Sparse Manifold Alignment.
    Technical Report, UMass Computer Science UM-2012-030, 2012.
    [ pdf ]

  • Kuindersma, S.R.
    Variable Risk Policy Search for Dynamic Robot Control.
    PhD thesis, Department of Computer Science, University of Massachusetts Amherst.
    [ pdf ]

  • da Silva, B.C. and Barto, A.G.
    TD-Δπ: A Model-Free Algorithm for Efficient Exploration
    Proceedings of the 26th Conference on Artificial Intelligence (AAAI 2012). Toronto, Canada, July 2012.
    [ pdf ]

  • Kuindersma, S., Grupen, R., and Barto, A.
    Variational Bayesian Optimization for Runtime Risk-Sensitive Control.
    In Robotics: Science and Systems VIII, Sydney, Australia, July 2012.
    [ pdf ]

  • Kuindersma, S., Grupen, R., and Barto, A.
    Variable Risk Dynamic Mobile Manipulation.
    In RSS 2012 Mobile Manipulation Workshop, Sydney, Australia, July 2012.
    [ pdf ]

  • da Silva, B.C., Konidaris, G., and Barto, A.G.
    Learning Parameterized Skills.
    In Proceedings of the 29th International Conference on Machine Learning (ICML 2012). Edinburgh, Scotland, June 2012.
    [ pdf ]

  • da Silva, B.C., Barto, A.G., and Kurose, J.
    Designing Adaptive Sensing Policies for Meteorological Phenomena via Spectral Analysis of Radar Images.
    Technical Report UM-CS-2012-006, Department of Computer Science, University of Massachusetts Amherst, March 2012.
    [ pdf ]

  • Konidaris, G., Kuindersma, S., Grupen, R., and Barto, A.
    Robot Learning from Demonstration by Constructing Skill Trees.
    The International Journal of Robotics Research 31(3), pages 260-275, March 2012.
    [ pdf | ijrr ]

2011

  • Konidaris, G.D., Niekum, S., and Thomas, P.S.
    TDγ: Reevaluating Complex Backups in Temporal Difference Learning.
    In Advances in Neural Information Processing Systems 24 (NIPS). Granada, Spain. December 2011.
    [ pdf ]

  • Niekum, S., and Barto, A.G.
    Clustering via Dirichlet Process Mixture Models for Portable Skill Discovery.
    In Advances in Neural Information Processing Systems 24 (NIPS). Granada, Spain. December 2011.
    [ pdf ]

  • Thomas, P.S.
    Policy Gradient Coagent Networks.
    In Advances in Neural Information Processing Systems 24 (NIPS). Granada, Spain. December 2011.
    [ pdf ]

  • Kuindersma, S., Grupen, R., and Barto, A.
    Learning Dynamic Arm Motions for Postural Recovery.
    Proceedings of the 11th IEEE-RAS International Conference on Humanoid Robots (Humanoids '11). Bled, Slovenia, October 2011.
    [ pdf ]

  • Thomas, P. and Barto, A.
    Conjugate Markov Decision Processes.
    Proceedings of the Twenty-Eighth International Conference on Machine Learning (ICML '11), June 2011.
    [ pdf ] [ source code ]

  • Konidaris, G., Osentoski, S., and Thomas, P.
    Value Function Approximation in Reinforcement Learning using the Fourier Basis.
    Proceedings of the Twenty-Fifth Conference on Artificial Intelligence (AAAI '11), August 2011.
    [ pdf ]

  • Konidaris, G., Kuindersma, S., Grupen, R., and Barto, A.
    CST: Constructing Skill Trees by Demonstration.
    In Proceedings of the ICML Workshop on New Developments in Imitation Learning, July 2011.
    [ pdf ]

  • Konidaris, G., Kuindersma, S., Grupen, R., and Barto, A.
    Acquiring Transferrable Mobile Manipulation Skills.
    In the RSS 2011 Workshop on Mobile Manipulation: Learning to Manipulate, June 2011.
    [ pdf ]

  • Konidaris, G.
    Autonomous Robot Skill Acquisition.
    PhD thesis, Computer Science, University of Massachusetts Amherst.
    [ pdf ]

  • Wang, C. and Mahadevan, S.
    Heterogeneous Domain Adaptation using Manifold Alignment.
    In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-11). July 18-23, 2011, Barcelona, Spain.
    [ pdf ]

  • Wang, C. and Mahadevan, S.
    Jointly Learning Data-Dependent Label and Locality-Preserving Projections.
    In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-11). July 18-23, 2011, Barcelona, Spain.
    [ pdf ]

  • Foster, B., Mahadevan, S., and Wang, R.
    GPU-Based Approximate SVD Algorithm.
    9th International Conference on Parallel Processing and Applied Mathematics, Torun, Poland, September 11-14, 2011.
    [ pdf ]
    (also available as Technical Report UM-CS-2011-025, Univ. of Massachusetts, Amherst)

  • Wang, C., Krafft, P., and Mahadevan, S.
    Manifold Alignment.
    In Manifold Learning: Theory and Applications, Taylor and Francis CRC Press, 2011.
    [ pdf ]

  • Liu, B. and Mahadevan, S.
    Compressive Reinforcement Learning with Oblique Random Projections.
    Technical Report UM-CS-2011-024, Department of Computer Science, University of Massachusetts at Amherst, 2011.
    [ pdf ]

  • Konidaris, G., Kuindersma, S., Grupen, R., and Barto, A.
    Autonomous Skill Acquisition on a Mobile Manipulator.
    In Proceedings of the Twenty-Fifth Conference on Artificial Intelligence (AAAI-11). San Francisco, CA. August 2011.
    [ pdf ]

2010

  • Kuindersma, S., Konidaris, G., Grupen, R., and Barto, A. (2010).
    Learning from a Single Demonstration: Motion Planning with Skill Segmentation (poster abstract).
    NIPS Workshop on Learning and Planning in Batch Time Series Data. Whistler, BC. December 2010.
    [ pdf ]

  • Konidaris, G., Kuindersma, S., Barto, A., and Grupen, R. (2010).
    Constructing Skill Trees for Reinforcement Learning Agents from Demonstration Trajectories.
    In Advances in Neural Information Processing Systems 23 (NIPS). Vancouver, BC. December 2010.
    [ pdf ]

  • Wang, C. and Mahadevan, S. (2010)
    Multiscale Manifold Alignment.
    Technical Report UM-CS-2010-049, Department of Computer Science, University of Massachusetts at Amherst.
    [ pdf ]

  • Wang, C. and Mahadevan, S. (2010)
    Learning Locality Preserving Discriminative Features.
    Technical Report UM-CS-2010-048, Department of Computer Science, University of Massachusetts at Amherst.
    [ pdf ]

  • Wang, C. (2010)
    A Geometric Framework for Transfer Learning Using Manifold Alignment.
    PhD thesis, Computer Science, University of Massachusetts Amherst.
    [ pdf ]

  • Theocharous, G. and Mahadevan, S. (2010)
    Compressing POMDPs using Locality Preserving Non-Negative Matrix Factorization.
    24th Conference on Artificial Intelligence (AAAI '10), Atlanta, GA, July 11-15, 2010.
    [ pdf ]

  • Mahadevan, S. (2010)
    Representation Discovery in Sequential Decision Making.
    24th Conference on Artificial Intelligence (AAAI '10), Atlanta, GA, July 11-15, 2010.
    [ pdf ]

  • Osentoski, S. and Mahadevan, S. (2010)
    Basis Function Construction in Hierarchical Reinforcement Learning.
    9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS '10), Toronto, Canada, May 10-14, 2010.
    [ pdf ]

  • Vigorito, C.M. and Barto, A.G. (2010)
    Intrinsically Motivated Hierarchical Skill Learning in Structured Environments.
    IEEE Transactions on Autonomous Mental Development (IEEE TAMD). Volume 2, Issue 2.
    [ pdf ]

  • Kuindersma, S. (2010)
    Control Model Learning for Whole-Body Mobile Manipulation (extended abstract).
    Proceedings of the Twenty-Fourth Conference on Artificial Intelligence (AAAI-10). Atlanta, GA. July, 2010.
    [ pdf ]

  • Niekum, S. (2010)
    Evolved Intrinsic Reward Functions for Reinforcement Learning (extended abstract).
    Proceedings of the Twenty-Fourth Conference on Artificial Intelligence (AAAI-10). Atlanta, GA. July, 2010.

  • Wolfe, A.P. (2010)
    Paying Attention To What Matters: Observation Abstraction In Partially Observable Environments.
    PhD thesis, Computer Science, University of Massachusetts Amherst.
    [ pdf ]

  • Johns, J.T. (2010)
    Basis Construction and Utilization for Markov Decision Processes Using Graphs.
    PhD thesis, Computer Science, University of Massachusetts Amherst.
    [ pdf ]

2009

  • Osentoski, S. (2009)
    Action-Based Representation Discovery in Markov Decision Processes.
    PhD thesis, Computer Science, University of Massachusetts Amherst.
    [ pdf ]

  • Konidaris, G.D. and Barto, A.G. (2009)
    Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining.
    In Y. Bengio, D. Schuurmans, J. Lafferty, C.K.I. Williams and A. Culotta (Eds.), Advances in Neural Information Processing Systems 22 (NIPS '09), pp. 1015-1023.
    [ pdf ]

  • Vigorito, C.M. and Barto, A.G. (2009)
    Incremental Structure Learning in Factored MDPs with Continuous States and Actions.
    Technical Report UM-CS-2009-029, Department of Computer Science, University of Massachusetts at Amherst.
    [ pdf ]

  • Singh, S., Lewis, R.L., and Barto, A.G. (2009)
    Where Do Rewards Come From?
    In N.A. Taatgen & H. van Rijn (Eds.), Proceedings of the 31st Annual Conference of the Cognitive Science Society, pp. 2601-2606. Austin, TX.
    [ pdf ]

  • Johns, J. and Mahadevan, S. (2009)
    Sparse Approximate Policy Evaluation using Graph-based Basis Functions.
    Technical Report UM-CS-2009-041, Department of Computer Science, University of Massachusetts at Amherst.
    [ pdf ]

  • Wang, C. and Mahadevan, S. (2009)
    A General Framework for Manifold Alignment.
    AAAI Fall Symposium on Manifold Learning, Arlington, VA, November 5-7.
    [ pdf ]

  • Konidaris, G.D. and Barto, A.G. (2009)
    Towards the Autonomous Acquisition of Robot Skill Hierarchies (poster abstract).
    In the Robotics: Science and Systems Workshop on Bridging the Gap Between High-Level Discrete Representations and Low-Level Continuous Behaviors (RSS Workshop), Seattle, WA, June 2009.
    [ pdf ]

  • Konidaris, G.D. and Osentoski, S. (2009)
    Value Function Approximation using the Fourier Basis (extended abstract).
    In the Multidisciplinary Symposium on Reinforcement Learning (MSRL '09), Montreal, Canada, June 2009.
    [ pdf ]

  • Konidaris, G.D. and Barto, A.G. (2009)
    Skill Chaining: Skill Discovery in Continuous Domains (extended abstract).
    In the Multidisciplinary Symposium on Reinforcement Learning (MSRL '09), Montreal, Canada, June 2009.
    [ pdf ]

  • Mahadevan, S. (2009)
    Learning Representation and Control in Markov Decision Processes: New Frontiers.
    Foundations and Trends in Machine Learning (editor: Michael Jordan), Vol. 1, No. 4, pp. 403-565 (163 pages), 2009.
    [ pdf ]

  • Johns, J., Petrik, M., and Mahadevan, S. (to appear)
    Hybrid Least-Squares Algorithms for Approximate Policy Evaluation.
    Machine Learning journal.
    (One of only 7 papers selected to appear in Machine Learning journal from those to be presented at European Conference on Machine Learning (ECML), Bled, Slovenia, 2009.)

  • Wang, C. and Mahadevan, S. (2009)
    Manifold Alignment without Correspondence.
    Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI '09), July 14-17, Pasadena, CA.
    [ pdf ]

  • Wang, C. and Mahadevan, S. (2009)
    Multiscale Analysis of Document Corpora based upon Diffusion Models.
    Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI '09), July 14-17, Pasadena, CA.
    [ pdf ]

  • Wang, C. and Mahadevan, S. (2009)
    Multiscale Dimensionality Reduction with Diffusion Wavelets.
    Technical Report UM-CS-2009-030, Department of Computer Science, University of Massachusetts at Amherst, June 2009.
    [ pdf ]

  • Osentoski, S. and Mahadevan, S. (2009)
    Basis Function Construction for Hierarchical Reinforcement Learning.
    ICML '09 Workshop on Abstraction in Reinforcement Learning
    [ pdf ]

  • Vigorito, C.M. (2009)
    Temporal-Difference Networks for Dynamical Systems with Continuous Observations and Actions.
    Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI '09), June 19-21, Montreal, Canada.
    [ pdf ]

  • Konidaris, G.D. and Barto, A.G. (2009)
    Efficient Skill Learning Using Abstraction Selection.
    In C. Boutilier (Ed.), Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI '09), pp. 1107-1112.
    [ pdf ]

  • Botvinick, M.M., Niv, Y., and Barto, A.G. (2009)
    Hierarchically organized behavior and its neural foundations: A reinforcement-learning perspective.
    Cognition, vol. 113 (special issue on Reinforcement Learning and Higher Cognition, edited by Michael Frank and Nathaniel Daw), pages 262--280.
    [ pdf ]

  • Shah, A. and Barto, A.G. (2009)
    Effect on movement selection of an evolving sensory representation: a multiple controller model of skill acquisition.
    Brain Research, vol. 1299 (special issue on Computational Cognitive Neuroscience II, edited by Sue Becker and Nathaniel Daw), pages 55--73.
    [ pdf ]

2008

  • Şimşek, Ö. (2008)
    Behavioral building blocks for autonomous agents: description, identification, and learning.
    PhD thesis, University of Massachusetts Amherst, 2008.
    [ pdf ]

  • Shah, A. (2008)
    Biologically-Based Functional Mechanisms of Motor Skill Acquisition
    PhD thesis, Neuroscience and Behavior Program, University of Massachusetts Amherst
    [ abstract | pdf ]

  • Şimşek, Ö. and Barto, A.G. (2008)
    Skill characterization based on betweenness
    Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS-08), December 8-11, Vancouver, B.C., Canada.
    [ pdf ]

  • Mahadevan, S. (2008)
    Representation Discovery using Harmonic Analysis
    Synthesis Lectures on Artificial Intelligence and Machine Learning (series editors: Brachman, R. and Dietterich, T.), Vol. 2, No. 1, Morgan and Claypool.
    [ publisher's website for this book ] NOTE: this book is available through the publisher as both a hard copy and a PDF. Be aware that some figures use color; the colors appear in the PDF, but the hard copy is printed in grey-scale.

  • Wang, C. and Mahadevan, S. (2008)
    Multiscale Analysis of Document Corpora Using Diffusion Models
    University of Massachusetts Technical Report 16.
    [ pdf ]

  • Konidaris, G.D. and Barto, A.G. (2008)
    Sensorimotor Abstraction Selection for Efficient, Autonomous Robot Skill Acquisition
    Proceedings of the 7th IEEE International Conference on Development and Learning (ICDL-08), August 9 - 12, Monterey, CA
    [ pdf ]

  • Konidaris, G.D. (2008)
    Autonomous Robot Skill Acquisition (thesis summary)
    Doctoral Symposium, 23rd National Conference on Artificial Intelligence (AAAI-08), July 13 - 17, Chicago, Illinois

  • Nelson, E.L., Konidaris, G.D., and Berthier, N.E. (2008)
    Using Real-Time Motion Capture to Measure Handedness in Infants
    Poster at the XVIth Biennial International Conference on Infant Studies, June, Vancouver, Canada

  • Konidaris, G.D. and Osentoski, S. (2008)
    Value Function Approximation in Reinforcement Learning using the Fourier Basis
    Technical Report UM-CS-2008-19, Department of Computer Science, University of Massachusetts at Amherst, June 2008
    [ pdf ]

  • Vigorito, C.M. and Barto, A.G. (2008)
    Autonomous Hierarchical Skill Acquisition in Factored MDPs
    Proceedings of The Fourteenth Yale Workshop on Adaptive and Learning Systems, June 2 - 4, New Haven, CT
    [ pdf ]

  • Vigorito, C.M. and Barto, A.G. (2008)
    Hierarchical Representations of Behavior for Efficient Creative Search
    AAAI Spring Symposium on Creative Intelligent Systems, March 26 - 28, Palo Alto, CA
    [ pdf ] [ poster ]

  • Mahadevan, S. (2008)
    Fast Spectral Learning using Lanczos Eigenspace Projections
    Proceedings from the Twenty-Third Conference on Artificial Intelligence (AAAI-08), July 13 - 17, Chicago, Illinois
    [ pdf ]

  • Wang, C. and Mahadevan, S. (2008)
    Manifold Alignment using Procrustes Analysis
    Proceedings from the Twenty-Fifth International Conference on Machine Learning (ICML-08), July 5 - 9, Helsinki, Finland
    [ pdf ]

2007

  • Jonsson, A. and Barto, A.G. (2007)
    Active Learning of Dynamic Bayesian Networks in Markov Decision Processes
    Proceedings of the Seventh Symposium on Abstraction, Reformulation, and Approximation (SARA 2007), Whistler, British Columbia, Canada, July 18 - 21, 2007.
    Also published in Miguel, I. and Ruml, W. (editors), Lecture Notes in Artificial Intelligence: Abstraction, Reformulation, and Approximation, vol. 4612, pages 273-284, Springer, New York, NY, USA.
    [ pdf ]

  • Barto, A.G. (2007)
    Temporal difference learning
    Scholarpedia, 2(11):1604
    [ webpage for this article ]

  • Ghavamzadeh, M., and Mahadevan, S. (2007)
    Hierarchical Average Reward Reinforcement Learning
    Journal of Machine Learning Research (JMLR), 8(Nov):2629--2669, 2007
    [ pdf ]

  • Shah, A., and Barto, A.G. (2007)
    Effect on Movement Selection of Evolving Sensory Representation
    poster presented at the Third Annual Computational Cognitive Neuroscience Conference (CCNC07), in conjunction with Dynamical Neuroscience XV, November 1 -- 2, San Diego, CA.
    [ pdf ]

  • Mahadevan, S., and Maggioni, M. (2007)
    Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes
    Journal of Machine Learning Research (JMLR), 8(Oct):2169--2231, 2007, MIT Press
    [ pdf ] NOTE: this is a revised version that corrects some errors found in the original published version.

  • Shah, A., and Barto, A.G. (2007)
    Functional Mechanisms of Motor Skill Acquisition
    poster presented at the Sixteenth Annual Computational Neuroscience Meeting (CNS*2007), July 7th - 12th, Toronto, Ontario, Canada. Abstract published in BMC Neuroscience 2007, 8(Suppl 2):P203 (6 July 2007)
    [ abstract | pdf of poster ]

  • Johns, J., Mahadevan, S., and Wang, C. (2007)
    Compact Spectral Bases for Value Function Approximation Using Kronecker Factorization
    Proceedings of the Twenty-second National Conference on Artificial Intelligence (AAAI-07) Vancouver, British Columbia, Canada
    [ pdf ]

  • Mahadevan, S. (2007)
    New Frontiers in Representation Discovery
    Tutorial given at the Twenty-second National Conference on Artificial Intelligence (AAAI-07), July 23, 2007, Vancouver, British Columbia, Canada
    [ tutorial website ]

  • Johns, J., Osentoski, S., and Mahadevan, S. (2007)
    Representation Discovery in Planning using Harmonic Analysis
    AAAI Fall Symposium on Computational Approaches to Representation Change during Learning and Development, Nov. 8-11, 2007, Washington, D.C.
    [ pdf ]

  • Mahadevan, S. (2007)
    Adaptive Mesh Compression in 3D Computer Graphics using Multiresolution Manifold Learning
    Proceedings of the International Conference on Machine Learning (ICML-07), Corvallis, OR June 2007
    [ pdf ]

  • Johns, J., and Mahadevan, S. (2007)
    Constructing Basis Functions from Directed Graphs for Value Function Approximation
    Proceedings of the International Conference on Machine Learning (ICML-07), Corvallis, OR June 2007
    [ pdf ]

  • Osentoski, S., and Mahadevan, S. (2007)
    Learning State-Action Basis Functions for Hierarchical MDPs
    Proceedings of the International Conference on Machine Learning (ICML-07), Corvallis, OR June 2007
    [ pdf ]

  • Mahadevan, S., Osentoski, S., Johns, J., Ferguson, K., and Wang, C. (2007)
    Learning to Plan using Harmonic Analysis of Diffusion Models
    Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS-07), Providence, RI, September.
    [ pdf ]

  • Arroyo, I., Ferguson, K., Johns, J., Dragon, T., Meheranian, H., Fisher, D., Barto, A.G., Mahadevan, S., and Woolf, B. (2007)
    Repairing Disengagement with Non-Invasive Interventions
    Proceedings of the 13th International Conference of Artificial Intelligence in Education (AIED-07)
    [ pdf ]

  • Vigorito, C. (2007)
    Distributed Path Planning for Mobile Robots using a Swarm of Interacting Reinforcement Learners
    Proceedings of the Sixth Annual International Conference on Autonomous Agents and Multiagent Systems (AAMAS-07), Honolulu, HI.
    [ pdf ]

  • Vigorito, C., Ganesan, D., and Barto, A.G. (2007)
    Adaptive Control of Duty-Cycling in Energy-Harvesting Wireless Sensor Networks
    Proceedings of the Fourth Annual IEEE Communications Society Conference on Sensor, Mesh, and Ad Hoc Communications and Networks (SECON-07), San Diego, CA.
    [ pdf ] (Note: this paper received the best paper award)

  • Şimşek, Ö. and Barto, A.G. (2007)
    Betweenness Centrality as a Basis for Forming Skills
    University of Massachusetts, Department of Computer Science Technical Report TR-2007-26, 2007
    [ pdf ]

  • Barringer, C.W. and Barto, A.G. (2007)
    Discrete Submovements Using Predictive Models
    poster presented at the Neural Control of Movement Conference, March 25-30, Seville, Spain
    [ pdf ]

  • Mahadevan, S. (2007)
    Learning Representations for Markov Decision Processes and Reinforcement Learning
    A tutorial given at the Twentieth International Joint Conference on Artificial Intelligence (IJCAI-07), Hyderabad, India, January 6-12, 2007.
    [ pdf slides ]

  • Konidaris, G.D. and Barto, A.G. (2007)
    Building Portable Options: Skill Transfer in Reinforcement Learning
    Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI-07), Hyderabad, India, January 6-12, 2007.
    [ pdf ]

      NOTE: an earlier version appeared as a 2006 tech report: Konidaris, G.D. and Barto, A.G. (2006), Building Portable Options: Skill Transfer in Reinforcement Learning, University of Massachusetts Department of Computer Science Technical Report UM-CS-2006-17, March, 2006 [ pdf ]


2006

  • Jonsson, A. and Barto, A.G. (2006)
    Causal Graph Based Decomposition of Factored MDPs
    Journal of Machine Learning Research, vol 7, pages 2259--2301, Nov., 2006.
    [ pdf ]

  • Rohanimanesh, K. (2006)
    Concurrent Decision Making in Markov Decision Processes
    PhD Dissertation, Department of Computer Science, University of Massachusetts Amherst.
    [ pdf ]

  • Mahadevan, S. and Maggioni, M. (2006)
    Proto-Value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes
    University of Massachusetts, Department of Computer Science Technical Report TR-2006-35, 2006
    [ pdf ]

  • Maggioni, M. and Mahadevan, S. (2006)
    A Multiscale Framework For Markov Decision Processes using Diffusion Wavelets
    University of Massachusetts, Department of Computer Science Technical Report TR-2006-36, 2006
    [ pdf ]

  • Mahadevan, S. and Maggioni, M. (2006)
    Learning Representation And Behavior: Manifold and Spectral Methods for Markov Decision Processes and Reinforcement Learning
    A tutorial given at ICML-06, Carnegie Mellon University, June 25, 2006
    [ pdf slides ]

  • Ghavamzadeh, M., Mahadevan, S., and Makar, R. (2006)
    Hierarchical Multiagent Reinforcement Learning
    Journal of Autonomous Agents and Multiagent Systems, vol. 13(2), pages 197-229, September
    [ pdf ]

  • Mahadevan, S. and Maggioni, M. (2006)
    Value Function Approximation using Diffusion Wavelets and Laplacian Eigenfunctions
    Neural Information Processing Systems (NIPS), MIT Press, 2006.
    [ pdf ]

  • Maggioni, M. and Mahadevan, S. (2006)
    Fast Direct Policy Evaluation using Multiscale Analysis of Markov Diffusion Processes
    Proceedings of the Twenty Third International Conference on Machine Learning (ICML 2006), Pittsburgh, PA, June 2006
    [ pdf ]

  • Polewan, R.J., Vigorito, C.M., Nason, C.D., Block, R.A., and Moore, J.W. (2006)
    A Cartesian Reflex Assessment of Face Processing
    Behavioral and Cognitive Neuroscience Reviews, vol. 5(1), pages 3-23
    [ pdf ] NOTE: This paper is part of the UMass Cartesian Reflex Project

  • Mahadevan, S., Maggioni, M., Ferguson, K., and Osentoski, S. (2006)
    Learning Representation and Control In Continuous Markov Decision Processes
    Proceedings of The 21st National Conference on Artificial Intelligence (AAAI-06), Boston, MA, July 16-20, 2006
    [ pdf ]

  • Ferguson, K. and Mahadevan, S. (2006)
    Proto-transfer Learning in Markov Decision Processes Using Spectral Methods
    Proceedings of the ICML-06 Workshop on Structural Knowledge Transfer for Machine Learning, Pittsburgh, PA, June 2006
    [ pdf ]

  • Ferguson, K., Arroyo, A., Mahadevan, S., Woolf, B., and Barto, A.G. (2006)
    Improving Intelligent Tutoring Systems: Using Expectation Maximization To Learn Student Skill Levels
    Proceedings of the Eighth International Conference on Intelligent Tutoring Systems (ITS-06), Jhongli, Taiwan, June 26 - 30, 2006
    [ pdf ]

  • Johns, J. and Woolf, B. (2006)
    A Dynamic Mixture Model to Detect Student Motivation and Proficiency
    Proceedings of The 21st National Conference on Artificial Intelligence (AAAI-06), Boston, MA, July 16-20, 2006
    [ pdf | ppt slides ]

  • Johns, J., Mahadevan, S., and Woolf, B. (2006)
    Estimating Student Proficiency Using an Item Response Theory Model
    Proceedings of the Eighth International Conference on Intelligent Tutoring Systems (ITS-06), Jhongli, Taiwan, June 26 - 30, 2006
    [ pdf ]

  • Konidaris, G.D. and Barto, A.G. (2006)
    An Adaptive Robot Motivational System
    From Animals to Animats 9: Proceedings of the 9th International Conference on Simulation of Adaptive Behavior (SAB-06), CNR, Roma, Italy, September 25 - 29, 2006.
    [ pdf ]

  • Konidaris, G.D. (2006)
    A Framework for Transfer in Reinforcement Learning
    Proceedings of the ICML-06 Workshop on Structural Knowledge Transfer for Machine Learning, Pittsburgh, PA, June 2006
    [ pdf ]

  • Konidaris, G.D. and Barto, A.G. (2006)
    Autonomous Shaping: Knowledge Transfer in Reinforcement Learning
    Proceedings of the Twenty Third International Conference on Machine Learning (ICML 2006), Pittsburgh, PA, June 2006
    [ pdf ]

      NOTE: an earlier version appeared as a 2005 tech report: Konidaris, G.D. and Barto, A.G. (2005), Autoshaping: Learning to Predict Reward for Novel States, University of Massachusetts Department of Computer Science Technical Report UM-CS-2005-58, September, 2005 [ ps.gz ]

  • Konidaris, G.D. and Barto, A.G. (2006)
    Building Portable Options: Skill Transfer in Reinforcement Learning
    University of Massachusetts Department of Computer Science Technical Report UM-CS-2006-17, March, 2006
    [ pdf ]

  • Wolfe, A.P. and Barto, A.G. (2006)
    Decision Tree Methods for Finding Reusable MDP Homomorphisms
    Proceedings of The 21st National Conference on Artificial Intelligence (AAAI-06), Boston, MA, July 16 - 20, 2006
    [ pdf ]

      NOTE: parts of this work were later presented as a poster [ pdf ] at the Women in Machine Learning Workshop, San Diego, CA 2006

  • Wolfe, A.P. and Barto, A.G. (2006)
    Defining Object Types and Options Using MDP Homomorphisms
    Proceedings of the ICML-06 Workshop on Structural Knowledge Transfer for Machine Learning, Pittsburgh, PA, June, 2006
    [ pdf paper | pdf slides ]

  • Shah, A., Barto, A.G., and Fagg, A.H. (2006)
    Biologically-Based Functional Mechanisms of Coarticulation
    poster presented at the Neural Control of Movement Conference, May 2-7, 2006, Key Biscayne, FL
    [ pdf ]

  • Şimşek, Ö. and Barto, A.G. (2006)
    An Intrinsic Reward Mechanism for Efficient Exploration
    Proceedings of the Twenty-Third International Conference on Machine Learning (ICML 06), Pittsburgh, PA, June, 2006
    [ pdf ]

  • Ferguson, K. (2006)
    Improving Intelligent Tutoring Systems: Using Expectation Maximization To Learn About Student Skill Levels
    University of Massachusetts Department of Computer Science Technical Report UM-CS-2006-09, February, 2006
    [ pdf ]

2005

  • Barto, A.G. and Şimşek, Ö. (2005)
    Intrinsic Motivation for reinforcement learning systems.
    Proceedings of the Thirteenth Yale Workshop on Adaptive and Learning Systems
    [ pdf ]

  • Ghavamzadeh, M. (2005)
    Hierarchical Reinforcement Learning in Continuous State and Multi-Agent Environments
    PhD Dissertation, Department of Computer Science, University of Massachusetts Amherst.
    [ pdf ]

  • Jonsson, A. (2005)
    A Causal Approach to Hierarchical Decomposition in Reinforcement Learning
    PhD Dissertation, Department of Computer Science, University of Massachusetts Amherst.
    [ pdf ]

  • Konidaris, G.D. and Barto, A.G. (2005)
    Autoshaping: Learning to Predict Reward for Novel States
    University of Massachusetts Department of Computer Science Technical Report UM-CS-2005-58, September, 2005
    [ ps.gz ]

      NOTE: a later version appeared in ICML-06: Konidaris, G.D. and Barto, A.G. (2006), Autonomous Shaping: Knowledge Transfer in Reinforcement Learning, Proceedings of the Twenty Third International Conference on Machine Learning (ICML 2006), Pittsburgh, PA, June 2006 [ pdf ]

  • Stout, A., Konidaris, G.D., and Barto, A.G. (2005)
    Intrinsically Motivated Reinforcement Learning: A Promising Framework For Developmental Robot Learning
    Proceedings of the AAAI Spring Symposium on Developmental Robotics, Stanford University, Stanford, CA, March 21-23, 2005.
    [ pdf ]

  • Jonsson, A. and Barto, A.G. (2005)
    A Causal Approach to Hierarchical Decomposition of Factored MDPs
    Proceedings of the Twenty-Second International Conference on Machine Learning ICML 05, Bonn, Germany, August 7-13
    [ pdf | ps ]

  • Mahadevan, S. (2005)
    Representation Policy Iteration: A Unified Framework for Learning Representation and Behavior
    Invited talk given at National Conference on Artificial Intelligence AAAI05, Pittsburgh, PA, July 9-13, 2005
    [ pdf slides ]

  • Rohanimanesh, K. and Mahadevan, S. (2005)
    Coarticulation: An Approach for Generating Concurrent Plans in Markov Decision Processes
    Proceedings of the Twenty-Second International Conference on Machine Learning ICML 05, Bonn, Germany, August 7-13
    [ pdf | ps ]

  • Mahadevan, S. and Maggioni, M. (2005)
    Value Function Approximation using Diffusion Wavelets and Laplacian Eigenfunctions
    University of Massachusetts, Department of Computer Science Technical Report TR-2005-38, 2005
    [ pdf ]

  • Maggioni, M. and Mahadevan, S. (2005)
    Fast Direct Policy Evaluation Using Multiscale Markov Diffusion Processes
    University of Massachusetts, Department of Computer Science Technical Report TR-2005-39, 2005
    [ pdf ]

  • Theocharous, G., Mahadevan, S., and Kaelbling, L. (2005)
    Spatial and Temporal Abstraction in POMDPs for Robot Navigation
    submitted (soon to appear as MIT CSAIL TR)
    [ ps.gz ]

  • Johns, J. and Mahadevan, S. (2005)
    A Variational Learning Algorithm for the Abstract Hidden Markov Model
    Proceedings of the National Conference on Artificial Intelligence AAAI05, Pittsburgh, PA, July 9-13, 2005
    [ pdf ]

  • Mahadevan, S. (2005)
    Samuel Meets Amarel: Automating Value Function Approximation using Global State Space Analysis
    Proceedings of the National Conference on Artificial Intelligence AAAI05, Pittsburgh, PA, July 9-13, 2005
    [ pdf ]

  • Mahadevan, S. (2005)
    Representation Policy Iteration
    Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence UAI05, Edinburgh, Scotland, July 26-29, 2005
    [ pdf ]

  • Mahadevan, S. (2005)
    Proto-Value Functions: Developmental Reinforcement Learning
    Proceedings of the International Conference on Machine Learning ICML05, Bonn, Germany, August 7-13, 2005
    [ pdf ]

  • Manfredi, V. and Mahadevan, S. (2005)
    Hierarchical Reinforcement Learning using Graphical Models
    ICML Workshop on Rich Representation for Reinforcement Learning, Bonn, August 7th, 2005.
    [ pdf ]

  • Manfredi, V. and Mahadevan, S. (2005)
    Dynamic Abstraction Networks
    University of Massachusetts, Amherst, Technical Report TR 2005-33, 2005
    [ ps ]

  • Manfredi, V. and Mahadevan, S. (2005)
    Kalman Filters for Prediction and Tracking in an Adaptive Sensor Network
    University of Massachusetts, Amherst, Technical Report 2005-7, 2005
    [ ps ]

  • Jonsson, A., Johns, J., Mehranian, H., Arroyo, I., Woolf, B., Barto, A.G., Fisher, D., and Mahadevan, S. (2005)
    Evaluating the Feasibility of Learning Student Models from Data
    AAAI Workshop on Educational Data Mining, Pittsburgh, PA, July 9, 2005
    [ ps ]

  • Şimşek, Ö., Wolfe, A.P., and Barto, A.G. (2005)
    Identifying useful subgoals in reinforcement learning by local graph partitioning.
    Proceedings of the Twenty-Second International Conference on Machine Learning ICML 05, Bonn, Germany, August 7-13
    [ pdf | bibtex ]

  • Berthier, N. E., Rosenstein, M. T., and Barto, A. G. (2005)
    Approximate Optimal Control as a Model for Motor Learning
    Psychological Review, vol. 112, pages 329 - 346

2004

  • Ghavamzadeh, M. and Mahadevan, S. (2004)
    Hierarchical Multiagent Reinforcement Learning
    Technical Report UM-CS-2004-02, Department of Computer Science, University of Massachusetts, Amherst, MA
    [ ps | pdf ]

  • Singh, S., Barto, A.G., and Chentanez, N. (2004)
    Intrinsically Motivated Reinforcement Learning
    18th Annual Conference on Neural Information Processing Systems (NIPS), Vancouver, B.C., Canada, December 2004
    [ pdf ]

  • Barto, A.G., Singh, S., and Chentanez, N. (2004)
    Intrinsically Motivated Learning of Hierarchical Collections of Skills
    International Conference on Development and Learning (ICDL), La Jolla, CA, USA
    [ pdf ]

  • Şimşek, Ö., Wolfe, A.P., and Barto, A.G. (2004)
    Local Graph Partitioning as a Basis for Generating Temporally-Extended Actions in Reinforcement Learning
    In Proceedings of the AAAI-04 Workshop on Learning and Planning in Markov Processes - Advances and Challenges 2004.
    [ ps | pdf | bibtex ]

  • Osentoski, S., Manfredi, V., Mahadevan, S. (2004)
    Learning Hierarchical Models of Activity
    IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2004)
    [ pdf ]

  • Si, J., Barto, A. G., Powell, W. B., Wunsch, D., editors. (2004)
    Handbook of Learning and Approximate Dynamic Programming
    Wiley-IEEE Press, Piscataway, NJ.
    [ publisher's website for this book ]

  • Barto, A.G. and Dietterich, T.G. (2004)
    Reinforcement Learning and Its Relationship to Supervised Learning
    In Si, J., Barto, A.G., Powell, W.B., and Wunsch, D., editors, Handbook of Learning and Approximate Dynamic Programming, Chapter 2, pages 47 - 64. Wiley-IEEE Press, Piscataway, NJ.
    [ pdf ]

  • Mahadevan, S., Ghavamzadeh, M., Rohanimanesh, K., and Theocharous G. (2004)
    Hierarchical Approaches to Concurrency, Multiagency, and Partial Observability
    In Si, J., Barto, A.G., Powell, W.B., and Wunsch, D., editors, Handbook of Learning and Approximate Dynamic Programming, Chapter 11, pages 285 - 310. Wiley-IEEE Press, Piscataway, NJ.

  • Rosenstein, M.T. and Barto, A.G. (2004)
    Supervised Actor-Critic Reinforcement Learning
    In Si, J., Barto, A.G., Powell, W.B., and Wunsch, D., editors, Handbook of Learning and Approximate Dynamic Programming, Chapter 14, pages 359 - 380. Wiley-IEEE Press, Piscataway, NJ.
    [ pdf ]

  • Rosenstein, M.T. and Barto, A.G. (2004)
    Reinforcement learning with supervision by a stable controller
    Proceedings of the 2004 American Control Conference, pages 4517-4522
    NOTE: for an expanded version of this work, see the book chapter just above (Supervised Actor-Critic Reinforcement Learning).

  • Rohanimanesh, K., Platt, R., Mahadevan, S., and Grupen, R. (2004)
    Coarticulation in Markov Decision Processes
    18th Annual Conference on Neural Information Processing Systems (NIPS), Vancouver, B.C., Canada, December 2004
    [ ps ]

  • Rohanimanesh, K., Platt, R., Mahadevan, S., and Grupen, R. (2004)
    A Framework for Coarticulation in Markov Decision Processes
    Technical Report 04-33, Department of Computer Science, University of Massachusetts, Amherst, Massachusetts
    [ ps | pdf ]

  • Şimşek, Ö. and Barto, A.G. (2004)
    Using Relative Novelty to Identify Useful Temporal Abstractions in Reinforcement Learning
    Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004).
    [ ps | pdf | bibtex ]

  • Ghavamzadeh, M. and Mahadevan, S. (2004)
    Learning to Act and Communicate in Cooperative Multiagent Systems using Hierarchical Reinforcement Learning
    Autonomous Agents and Multiagent Systems (AAMAS 2004).
    [ pdf ]

  • Saria, S. and Mahadevan, S. (2004)
    Probabilistic Plan Recognition in Multiagent Systems
    International Conference on AI and Planning Systems (ICAPS 2004).
    [ pdf ]

  • Shah, A., Fagg, A. H., and Barto, A. G. (2004)
    Cortical Involvement in the Recruitment of Wrist Muscles
    Journal of Neurophysiology vol. 91, pages 2445 - 2456
    [ pdf ]

  • Ravindran, B. and Barto, A.G. (2004)
    Approximate Homomorphisms: A Framework for Non-exact Minimization in Markov Decision Processes
    Proceedings of the Fifth International Conference on Knowledge Based Computer Systems (KBCS 04), Hyderabad, India, December 19--22
    [ pdf ]

  • Ravindran, B. (2004)
    An Algebraic Approach to Abstraction in Reinforcement Learning
    PhD Dissertation, Department of Computer Science, University of Massachusetts Amherst.
    [ pdf ]

2003

  • Ghavamzadeh, M., Mahadevan, S., and Makar, R. (2003)
    Extending Hierarchical Reinforcement Learning to Continuous-Time, Average-Reward, and Multi-Agent Models
    Technical Report UM-CS-2003-23, Department of Computer Science, University of Massachusetts, Amherst, MA
    [ ps | pdf ]

  • Ghavamzadeh, M. and Mahadevan, S. (2003)
    Hierarchical Average Reward Reinforcement Learning
    Technical Report UM-CS-2003-19, Department of Computer Science, University of Massachusetts, Amherst, MA
    [ ps | pdf ]

  • Barto, A. G. and Mahadevan, S. (2003)
    Recent Advances in Hierarchical Reinforcement Learning
    Discrete Event Dynamic Systems vol. 13(4), pages 341 - 379
    [ pdf ]

  • Rosenstein, M.T. (2003)
    Learning To Exploit Dynamics For Robot Motor Coordination
    PhD Dissertation, Department of Computer Science, University of Massachusetts Amherst.
    [ abstract | pdf ]

  • Ghavamzadeh, M., Mahadevan, S. (2003)
    Hierarchical Policy Gradient Algorithms
    Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003).
    [ pdf ]

  • Ravindran, B. and Barto, A. G. (2003)
    Relativized Options: Choosing the Right Transformation
    Proceedings of the Twentieth International Conference on Machine Learning (ICML 2003)
    [ pdf ]

  • Ravindran, B. and Barto, A.G. (2003)
    SMDP Homomorphisms: An Algebraic Approach to Abstraction in Semi Markov Decision Processes
    The Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03).
    [ pdf ]

  • Ravindran, B. and Barto, A. G. (2003)
    An Algebraic Approach to Abstraction in Reinforcement Learning
    In the Proceedings of the Twelfth Yale Workshop on Adaptive and Learning Systems, pp. 109-114, Yale University.
    [ pdf ]

2002

  • Barto, A. G. (2002)
    Reinforcement learning
    In Handbook of Brain Theory and Neural Networks, Second Edition, M.A. Arbib (Ed.), pages 963-968. Cambridge, MA: MIT Press.

  • Barto, A. G. (2002)
    Reinforcement learning in motor control
    In Handbook of Brain Theory and Neural Networks, Second Edition, M.A. Arbib (Ed.), pages 968-972. Cambridge, MA: MIT Press.

  • McGovern, Amy, Moss, Eliot, and Barto, Andrew G. (2002)
    Building a Basic Block Instruction Scheduler using Reinforcement Learning and Rollouts
    Machine Learning, Special Issue on Reinforcement Learning. Volume 49, Numbers 2/3, Pages 141-160.
    [ ps (200K) | gzipped ps (60K) | pdf (160K)]

  • Fagg, A. H., Shah, A., and Barto, A. G. (2002)
    A Computational Model of Muscle Recruitment for Wrist Movements
    Journal of Neurophysiology, 88:3348 - 3358
    [ pdf ]

  • Perkins, T.J. (2002)
    Lyapunov Methods for Safe Intelligent Agent Design
    PhD Dissertation, Department of Computer Science, University of Massachusetts Amherst.
    [ ps | pdf ]

  • Perkins, T.J. and Barto, A.G. (2002)
    Lyapunov Design for Safe Reinforcement Learning
    Journal of Machine Learning Research (JMLR), vol. 3, pp. 803--832, 2002.
    [ pdf ]

  • Rohanimanesh, K., and Mahadevan, S. (2002)
    Learning to Take Concurrent Actions
    16th Annual Conference on Neural Information Processing Systems (NIPS), Vancouver, Canada, December 2002
    [ ps ]

  • McGovern, Amy (2002)
    Autonomous Discovery of Temporal Abstractions from Interaction with an Environment
    PhD Dissertation, Department of Computer Science, University of Massachusetts Amherst.
    [ ps | gzipped ps | pdf ]

  • Pickett, M., and Barto, A. G (2002)
    PolicyBlocks: An Algorithm for Creating Useful Macro-Actions in Reinforcement Learning
    In Proceedings of the Nineteenth International Conference on Machine Learning
    [ ps ]

  • Perkins, T.J., and Pendrith, M.D. (2002)
    On the Existence of Fixed Points for Q-Learning and Sarsa in Partially Observable Domains
    In Proceedings of the Nineteenth International Conference on Machine Learning, pp. 490--497.
    [ ps | pdf ]

  • Perkins, T.J. (2002)
    Reinforcement Learning for POMDPs based on Action Values and Stochastic Optimization
    In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pp. 199--204.
    [ ps | pdf ]

  • Ravindran, B. and Barto, A. G. (2002)
    Model Minimization in Hierarchical Reinforcement Learning
    In the Proceedings of the Fifth Symposium on Abstraction, Reformulation and Approximation (SARA 2002), pp.196-211, LNCS, Springer Verlag.
    [ gzipped pdf ]

  • Bernstein, D.S., Perkins, T.J., Zilberstein, S., and Finkelstein, L. (2002)
    Scheduling Contract Algorithms on Multiple Processors
    In Proceedings of the Eighteenth National Conference on Artificial Intelligence, pp. 702--706.
    [ ps | pdf ]

  • Shah, A., Fagg, A. H., and Barto, A. G. (2002)
    Cortical Involvement in the Recruitment of Wrist Muscles
    poster presented at the Neural Control of Movement Conference, April 14-21, 2002, Naples, FL
    [ abstract | pdf ]

  • Kositsky, M. and Barto, A.G. (2002)
    The emergence of movement units through learning with noisy efferent signals and delayed sensory feedback
    Neurocomputing, 44-46, pp. 889-895, 2002.
    [ pdf ]

  • Kositsky, M. and Barto, A.G. (2002)
    Emergence of Multiple Movement Units in the Presence of Noise and Feedback Delay
    in Dietterich, T.G., Becker, S., and Ghahramani, Z. (eds.) Advances in Neural Information Processing Systems 14 (NIPS) 2002.
    [ pdf ]

  • Rosenstein, M.T. and Grupen, R.A. (2002)
    Velocity-dependent dynamic manipulability
    In Proceedings of the IEEE International Conference on Robotics and Automation, vol. 3, 2424-2429.
    [ gzipped ps | pdf ]

  • Ghavamzadeh, M., Mahadevan, S. (2002)
    Hierarchically Optimal Average Reward Reinforcement Learning
    Proceedings of the Nineteenth International Conference on Machine Learning (ICML-2002).
    [ ps ]

  • Ghavamzadeh, M., Mahadevan, S. (2002)
    A Multiagent Reinforcement Learning Algorithm by Dynamically Merging Markov Decision Processes
    Proceedings of the First International Joint Conference on Autonomous Agents & Multiagent Systems (AAMAS-2002)
    [ ps ]

  • Arbib, M. A., Fagg, A. H., and Grafton, S. T. (2002)
    Synthetic PET Imaging for Grasping: From Primate Neurophysiology to Human Behavior
    in Explorative analysis and data modelling in functional neuroimaging, (F. Sommer and A. Wichert, Eds.), Cambridge, MA: The MIT Press, pp. 231-250.
    [ pdf ]

  • Houk, J. C., Fagg, A. H., Barto, A. G. (2002)
    Fractional Power Damping Model of Joint Motion
    in Progress in Motor Control: Structure-Function Relations in Voluntary Movements, (M. Latash, Ed.), vol. II, pages 147-178
    [ ps ]

  • Fagg, A. H. and Weitzenfeld, A. (2002)
    A Model of Primate Visual-Motor Conditional Learning
    in NSL - Neural Simulation Language: Systems and Applications, (A. Weitzenfeld, M. A. Arbib, and A. Alexander, Eds.), MIT Press
    [ ps | pdf ]

2001

  • Perkins, T.J., and Barto, A.G. (2001)
    Lyapunov-Constrained Action Sets for Reinforcement Learning
    In Proceedings of the Eighteenth International Conference on Machine Learning, pp. 409--416.
    [ ps ]

  • Perkins, T.J., and Barto, A.G. (2001)
    Heuristic Search in Infinite State Spaces Guided by Lyapunov Analysis
    In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, pp. 242--247.
    [ ps | pdf ]

  • Rosenstein, M.T. and Barto, A.G. (2001)
    Robot weightlifting by direct policy search
    In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, vol. 2, 839-844.
    [ gzipped ps | pdf ]

  • Rosenstein, M.T. and Barto, A.G. (2001)
    A robotic weightlifter that learns to exploit dynamics
    In Studies in Perception and Action VI: Eleventh International Conference on Perception and Action , 25-28.

  • Rosenstein, M.T. and Barto, A.G. (2001)
    From elementary movements to coordination for a robotic weightlifter
    In Abstracts of the Third International Symposium on Progress in Motor Control: From Basic Science to Applications, p. 40.

  • Rohanimanesh, K. and Mahadevan, S. (2001)
    Decision-Theoretic Planning with Concurrent Temporally Extended Actions
    17th Conference on Uncertainty in Artificial Intelligence (UAI '01), August 3-5, 2001, University of Washington, Seattle, WA, USA
    [ ps ]

  • Ravindran, B. and Barto, A. G. (2001)
    Symmetries and Model Minimization of Markov Decision Processes
    Computer Science Technical Report 01-43, University of Massachusetts, Amherst, MA.
    [ gzipped ps ]

  • McGovern, Amy, and Barto, Andrew G. (2001)
    Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density
    In 2001 International Conference on Machine Learning
    [ ps (252K) | gzipped ps (160K) ]

  • Kositsky, M. and Barto, A. G. (2001)
    Nonlinear Damping Dynamics and the Variability of Rapid Aimed Movements
    Technical Report 01-15, Department of Computer Science, University of Massachusetts, Amherst.
    [ gzipped ps | gzipped pdf ]

  • Kositsky, M. and Barto, A. G. (2001)
    Nonlinear Damping Dynamics and the Variability of Rapid Aimed Movements
    Poster presented at the 2001 Conference on Neural Control of Movement, Seville, Spain.
    one-page poster: [ gzipped ps | gzipped pdf ]

  • Kositsky, M. and Barto, A. G. (2001)
    The emergence of multiple movement units through learning with noisy efferent signals and delayed sensory feedback
    Tenth Annual Computational Neuroscience Meeting, San Francisco and Pacific Grove, California, 2001.
    [ pdf ]

  • Kositsky, M. and Barto, A. G. (2001)
    Reinforcement learning model for noisy environment and delayed feedback: natural emergence of movement units
    Fifth International Conference on Cognitive and Neural Systems, Boston, Massachusetts, 2001.

  • Shah, A., Fagg, A. H., and Barto, A. G. (2001)
    A Computational Model of Muscle Recruitment for Wrist Movements
    poster presented at Neural Control of Movement Conference, March 25-30, 2001, Seville, Spain
    [ abstract | pdf ]

  • Jonsson, A. and Barto, A. G. (2001)
    Automated State Abstraction for Options using the U-Tree Algorithm
    In Advances in Neural Information Processing Systems 13, Cambridge, MA: MIT Press.
    [ gzipped ps ]

  • Schlesinger, M., and Parisi, D. (2001)
    The agent-based approach: A new direction for computational models of development
    Developmental Review, 21, pp 121--146.
    [ gzipped ps ]

  • Engelbrecht, S.E. (2001)
    Minimum Principles in Motor Control
    Journal of Mathematical Psychology, 45:497-542.
    [ pdf | ps ]

  • Ghavamzadeh, M., Mahadevan, S. (2001)
    Continuous-Time Hierarchical Reinforcement Learning
    Proceedings of the Eighteenth International Conference on Machine Learning (ICML-2001)
    [ ps ]

  • Makar, R., Mahadevan, S., Ghavamzadeh, M. (2001)
    Hierarchical Multi-Agent Reinforcement Learning
    Proceedings of the Fifth International Conference on Autonomous Agents 2001.
    [ ps ]

  • Davis, J.A., Fagg, A.H., and Levine, B.N. (2001)
    Wearable Computers as Packet Transport Mechanisms in Highly-Partitioned Ad-Hoc Networks
    Proceedings of the International Symposium on Wearable Computing, Zurich, Switzerland, October 2001, pp. 141-148.
    [ ps ]

2000

  • J. Randløv, A.G. Barto, and M.T. Rosenstein (2000)
    Combining reinforcement learning with a local control algorithm
    In Proceedings of the Seventeenth International Conference on Machine Learning, 775-782.
    [ gzipped ps | pdf ]

  • Precup, D. (2000)
    Temporal Abstraction in Reinforcement Learning
    Ph.D. Dissertation, Department of Computer Science, University of Massachusetts, Amherst.
    [ gzipped ps ]

  • R. Moll, T. Perkins, and A. Barto (2000)
    Machine Learning for Subproblem Selection
    Proceedings of the Seventeenth International Conference on Machine Learning (ICML-2000), P. Langley (Ed.), Morgan Kaufmann, San Francisco, CA, pp. 615-622.
    [ ps ]

  • Precup, D., Sutton, R. S. and Singh, S. (2000)
    Eligibility Traces for Off-Policy Policy Evaluation
    In Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), pp. 759--766. Morgan Kaufmann.
    [ ps ]

  • Schlesinger, M., Parisi, D., and Langer, J. (2000)
    Learning to reach by constraining the movement search space
    Developmental Science, 3, 67-80.
    [ gzipped ps ]

  • Berthier, N.E., Barto, A.G., and Schlesinger, M. (2000)
    Learning and dynamics
    Proceedings of the NSF DARPA Conference on Learning and Development
    [ pdf ]

  • Fagg, A.H. (2000)
    A Model of Muscle Geometry for a Two Degree-Of-Freedom Planar Arm
    Technical Report #00-03, Department of Computer Science, University of Massachusetts, Amherst
    [ ps ]

1999

  • R. Moll, A. Barto, T. Perkins, and R. Sutton (1999)
    Learning Instance-Independent Value Functions to Enhance Local Search
    Advances in Neural Information Processing Systems 11 (NIPS11), M. S. Kearns, S. A. Solla, and D. A. Cohn (Eds.), Cambridge, MA: MIT Press, 1999, pp. 1017-1023.
    [ ps ]

  • Sutton, R. S., Precup, D., and Singh, S. (1999)
    Between MDPs and semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
    In Artificial Intelligence, vol. 112, pp. 181-211.
    [ pdf | gzipped ps ]
    An earlier version appeared as Technical Report UM-CS-1998-74, Department of Computer Science, University of Massachusetts, Amherst, MA 01003-4610.
    [ ps ]

  • Schlesinger, M., and Langer, J. (1999)
    Infants' developing expectations of possible and impossible tool-use events between ages 8 and 12 months
    Developmental Science, 2, 195-205.
    [ gzipped ps ]

  • Schlesinger, M., and Barto, A. (1999)
    Optimal control methods for simulating the perception of causality in young infants
    Proceedings of the Twenty First Annual Conference of the Cognitive Science Society, pp. 625-630. New Jersey: Lawrence Erlbaum.
    [ gzipped ps ]

  • McGovern, Amy, Moss, Eliot, and Barto, Andrew G. (1999)
    Basic-block Instruction Scheduling Using Reinforcement Learning and Rollouts
    Proceedings of the 1999 IJCAI workshop on learning and optimization.
    [ ps ]

  • Sutton, R. S., Singh, S., Precup, D., and Ravindran, B. (1999)
    Improved Switching among Temporally Abstract Actions
    In Advances in Neural Information Processing Systems 11 (Proceedings of NIPS'98), pp. 1066-1072. MIT Press.
    [ ps ]

  • McGovern, Amy, and Moss, Eliot (1999)
    Scheduling Straight-Line Code Using Reinforcement Learning and Rollouts
    Proceedings of the 11th Neural Information Processing Systems Conference (NIPS '98), pages 903-909.
    [ ps ]

  • Barto, A. G., Fagg, A. H., Sitkoff, N., and Houk, J. C. (1999)
    A Cerebellar Model of Timing and Prediction in the Control of Reaching
    Neural Computation, vol. 11, pp. 565-594.
    [ ps | pdf ]

1998

  • Sutton, Richard S., and Barto, Andrew G. (1998)
    Reinforcement Learning: An Introduction
    MIT Press.
    [ HTML Version | MIT Press Site for this book ]

  • McGovern, Amy (1998)
    acQuire-macros: An Algorithm for Automatically Learning Macro-actions
    In the Neural Information Processing Systems Conference (NIPS '98) workshop on Abstraction and Hierarchy in Reinforcement Learning
    [ ps ]

  • Crites, R. H., and Barto, A. G. (1998)
    Elevator Group Control Using Multiple Reinforcement Learning Agents
    Machine Learning 33: 235-262.
    [ gzipped ps ]

  • McGovern, Amy, and Sutton, Richard S. (1998)
    Macro-Actions in Reinforcement Learning: An Empirical Analysis
    Master's thesis; also available as University of Massachusetts, Amherst, Technical Report 98-70.
    [ ps ]

  • Fagg, A. H., Zelevinsky, L., Barto, A. G., and Houk, J. C. (1998)
    A Pulse-Step Model of Control for Arm Reaching Movements
    Proceedings of the Spring Meeting on the Neural Control of Movement.

  • Fagg, A. H., Barto, A. G., and Houk, J. C. (1998)
    Learning to Reach Via Corrective Movements
    Proceedings of the Tenth Yale Workshop on Adaptive and Learning Systems, New Haven, CT.
    [ ps | html ]

  • McGovern, Amy, Precup, Doina, Ravindran, B., Singh, Satinder, and Sutton, Richard S. (1998)
    Hierarchical Optimal Control of MDPs
    Proceedings of the Tenth Yale Workshop on Adaptive and Learning Systems, pp. 186-191.
    [ ps ]

  • Sutton, R. S., Precup, D., and Singh, S. (1998)
    Intra-Option Learning about Temporally Abstract Actions
    In Proceedings of the Fifteenth International Conference on Machine Learning (ICML'98), pp. 556-564. Morgan Kaufmann.
    [ ps ]

  • Precup, D., Sutton, R. S., and Singh, S. (1998)
    Theoretical Results on Reinforcement Learning with Temporally Abstract Behaviors
    In Machine Learning: ECML-98. 10th European Conference on Machine Learning, Chemnitz, Germany, April 1998. Proceedings, pp. 382-393. Springer Verlag.
    [ ps ]

  • Precup, D., and Sutton, R. S. (1998)
    Multi-Time Models for Temporally Abstract Planning
    In Advances in Neural Information Processing Systems 10 (Proceedings of NIPS'97), pp. 1050-1056. MIT Press.
    [ ps ]

  • Fagg, A. H., Lotspeich, D. L., Hoff, J., and Bekey, G. A. (1998)
    Rapid Reinforcement Learning for Reactive Control Policy Design for Autonomous Robots
    In Artificial Life in Robotics (T. Shibata and T. Fukuda, Eds.)
    [ ps | pdf ]

1997

  • McGovern, Amy, Sutton, Richard S., and Fagg, Andrew H. (1997)
    Roles of Macro-Actions in Accelerating Reinforcement Learning
    1997 Grace Hopper Celebration of Women in Computing.
    [ ps ]

  • Precup, D., Sutton, R. S., and Singh, S. (1997)
    Planning with Closed-Loop Macro Actions
    In Working Notes of the AAAI Fall Symposium '97 on Model-directed Autonomous Systems, pp. 70-76.
    [ ps ]

  • Precup, D., and Sutton, R. S. (1997)
    Multi-Time Models for Reinforcement Learning
    Proceedings of the ICML'97 Workshop on Modelling in Reinforcement Learning.
    [ ps ]

  • Barto, A.G. and Sutton, R.S. (1997)
    Reinforcement Learning in Artificial Intelligence
    in Neural-Network Models of Cognition, Volume 121: Biobehavioral Foundations (Advances in Psychology) (eds. Donahoe, J.W. and Dorsel, V.P.), Elsevier, North-Holland, The Netherlands.
    [ pdf ]

  • Precup, D., and Sutton, R. S. (1997)
    Exponentiated Gradient Methods for Reinforcement Learning
    Proceedings of the 14th International Conference on Machine Learning, ICML'97, Morgan Kaufmann, pp. 272-277.
    [ ps ]

  • R. E. Kettner, S. Mahamud, H.-C. Leung, N. Sitkoff, J. C. Houk, B. W. Peterson, and A. G. Barto (1997)
    Prediction of Complex Two-Dimensional Trajectories by the Eye and by a Cerebellar Model of Smooth Eye Movements
    Journal of Neurophysiology, 77:2115-2130
    [ ps | pdf ]

  • Fagg, A. H., Sitkoff, N., Barto, A. G., and Houk, J. C. (1997)
    Cerebellar Learning for Control of a Two-Link Arm in Muscle Space
    Proceedings of the IEEE Conference on Robotics and Automation, May, pp. 2638-2644.
    [ gzipped ps ]

  • Fagg, A. H., Zelevinsky, L., Barto, A. G., and Houk, J. C. (1997)
    Using Crude Movements to Learn Accurate Motor Programs for Reaching
    Presented at the NIPS workshop: Can Artificial Cerebellar Models Compete to Control Robots? Dec. 5, Breckenridge, CO.
    [ ps ]

  • Fagg, A. H., Sitkoff, N., Barto, A. G., and Houk, J. C. (1997)
    A Computational Model of Cerebellar Learning for Limb Control
    Proceedings of the Spring 1997 Meeting of the Neural Control of Movement.
    [ gzipped ps poster text ]

  • Fagg, A. H., Sitkoff, N., Barto, A. G., and Houk, J. C. (1997)
    A Model of Cerebellar Learning for Control of Arm Movements Using Muscle Synergies
    Proceedings of the IEEE International Symposium on Computational Intelligence in Robotics and Automation, July 10-11, pp. 6-12.
    [ gzipped ps ]

1996

  • Bradtke, S.J. and Barto, A.G. (1996)
    Linear Least-Squares Algorithms for Temporal Difference Learning
    Machine Learning 22(1-3):33--57, 1996
    [ pdf ]

  • Singh, S.P. and Sutton, R.S. (1996)
    Reinforcement Learning with Replacing Eligibility Traces
    Machine Learning 22(1-3):123--158, 1996
    [ pdf ]

  • Precup, D., and Sutton, R. S. (1996)
    Empirical Comparison of Gradient Descent and Exponentiated Gradient Descent in Supervised and Reinforcement Learning
    Technical Report UM-CS-1996-070, Department of Computer Science, University of Massachusetts, Amherst, MA 01003.
    [ ps ]

  • Houk, J.C., Buckingham, J.T., and Barto, A.G. (1996)
    Models of the Cerebellum and Motor Learning
    Behavioral and Brain Sciences, vol. 19, pp. 368-383.
    [ pdf | ps ] (NOTE: These are pdf/ps copies of an unofficial online version that does not include figures. They were provided to give you an idea of the paper, but please go through the journal for the official version. Thanks.)

  • Hansen, E.A., Barto, A.G., and Zilberstein, S. (1996)
    Reinforcement Learning for Mixed Open-Loop and Closed-Loop Control
    Proceedings of the Ninth Annual Neural Information Processing Systems Conference (NIPS), Denver, Colorado, USA
    [ pdf ]

1995

  • A. G. Barto, J. T. Buckingham, and J. C. Houk (1995)
    A Predictive Switching Model of Cerebellar Movement Control
    Neural Information Processing Systems 8, MIT Press, 1995, pp. 138-144.
    [ gzipped ps ]

  • R. H. Crites, and A. G. Barto (1995)
    Improving Elevator Performance Using Reinforcement Learning
    Neural Information Processing Systems 8, MIT Press, 1995, pp. 1017-1023.
    [ zipped ps ]

  • J. C. Houk, J. L. Adams, and A. G. Barto (1995)
    A model of how the basal ganglia generate and use neural signals that predict reinforcement
    In Models of Information Processing in the Basal Ganglia, J. C. Houk, J. Davis, and D. Beiser (Eds.), Cambridge, MA: MIT Press, 1995, pp. 249-270.

  • A. G. Barto (1995)
    Adaptive critics and the basal ganglia
    In Models of Information Processing in the Basal Ganglia, J. C. Houk, J. Davis, and D. Beiser (Eds.), Cambridge, MA: MIT Press, 1995, pp. 215-232.
    [ ps | pdf ] (note: this version is missing one figure)

  • J. T. Buckingham, A. G. Barto, and J. C. Houk (1995)
    Adaptive Predictive Control with a Cerebellar Model
    In Proceedings of the 1995 World Congress on Neural Networks, Volume 1, Lawrence Erlbaum Associates, Inc: Mahwah, NJ, 1995, pp. 373-380

  • M. Duff (1995)
    Q-Learning for bandit problems
    In A. Prieditis and S. Russell, editors, Machine Learning: Proceedings of the Twelfth International Conference on Machine Learning (ML95), Morgan Kaufmann: Tahoe City, CA, 1995, pp. 209-217.

  • Barto, A. G. (1995)
    Reinforcement learning and dynamic programming
    Presented at IFAC'95, Conference on Man-Machine Systems, Cambridge, MA, June 1995.

  • A. G. Barto, S. J. Bradtke, and S. P. Singh (1995)
    Learning to act using real-time dynamic programming
    Artificial Intelligence, Special Volume on Computational Research on Interaction and Agency, 72(1): 81-138, 1995.
    [ gzipped ps ]
    • (Reprinted in Computational Theories of Interaction and Agency, P. E. Agre & S. J. Rosenschein (Eds.), Cambridge, MA: MIT Press, 1996.)
    • (Also appeared as CMPSCI Technical Report 93-02, University of Massachusetts, January 1993. (Supersedes TR 91-57.))

  • Crites, R. H., and Barto, A. G. (1995)
    An Actor/Critic Algorithm that is Equivalent to Q-Learning
    NIPS 7.
    [ pdf ]

  • Bradtke, Steven J. and Duff, Michael O. (1995)
    Reinforcement Learning Methods for Continuous-Time Markov Decision Problems
    NIPS 7.

1994

  • Singh, S. P. (1994)
    Learning to Solve Markovian Decision Processes
    Ph.D. Thesis
    [ zipped ps ]

  • Singh, S. P., Jaakkola, T., and Jordan, M. (1994)
    Reinforcement Learning With Soft State Aggregation
    NIPS-7
    [ zipped ps ]

  • Singh, S. P. (1994)
    Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes
    AAAI-94
    [ zipped ps ]

  • Singh, S. P., Jaakkola, T., and Jordan, M. (1994)
    Learning Without State-Estimation in Partially Observable Markovian Decision Processes
    ML-94
    [ zipped ps ]

  • Sutton, R. S., and Singh, S. P. (1994)
    On Step-Size and Bias in Temporal-Difference Learning
    Eighth Yale Workshop
    [ zipped ps ]

  • Jaakkola, T., Jordan, M., and Singh, S. P. (1994)
    On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
    Neural Computation
    [ zipped ps ]

  • J. T. Buckingham, J. C. Houk, and A. G. Barto (1994)
    Controlling a nonlinear spring-mass system with a cerebellar model
    8th Yale Workshop on Adaptive and Learning Systems, Yale University, June 1994. pp. 1-6.

  • S. J. Bradtke, A. G. Barto, and B. E. Ydstie (1994)
    A reinforcement learning method for direct adaptive linear quadratic control
    8th Yale Workshop on Adaptive and Learning Systems, Yale University, June 1994. pp. 85-96.

  • V. Gullapalli and A. Barto (1994)
    Convergence of indirect adaptive asynchronous value iteration algorithms
    In Advances in Neural Information Processing Systems 6, J.D. Cowan, G. Tesauro and J. Alspector (Eds.), San Francisco: Morgan Kaufmann, 1994. pp. 695-702.

  • A. Barto and M. Duff (1994)
    Monte Carlo matrix inversion and reinforcement learning
    In Advances in Neural Information Processing Systems 6, J.D. Cowan, G. Tesauro and J. Alspector (Eds.), San Francisco: Morgan Kaufmann, 1994. pp. 687-694.

  • S. P. Singh, A. G. Barto, R. Grupen, and C. Connolly (1994)
    Robust reinforcement learning in motion planning
    In Advances in Neural Information Processing Systems 6, J.D. Cowan, G. Tesauro and J. Alspector (Eds.), San Francisco: Morgan Kaufmann, 1994. pp. 655-662.

  • V. Gullapalli, A. G. Barto, and R. A. Grupen (1994)
    Learning admittance mappings for force-guided assembly
    Proceedings of the 1994 International Conference on Robotics and Automation, 1994, pp. 2633-2638.

  • V. Gullapalli, J. A. Franklin, and H. Benbrahim (1994)
    Acquiring robot skills via reinforcement learning
    IEEE Control Systems Special Issue on Robotics: Capturing Natural Motion, 4(1): 13-24, Feb. 1994.

  • Grupen, R. A., Coelho, J. A., Connolly, C. I., Gullapalli, V., Huber, M., and Souccar, K. (1994)
    Toward Physical Interaction and Manipulation: Screwing in a Light Bulb
    AAAI 1994 Spring Symposium on Physical Interaction and Manipulation.
    [ zipped ps ] (large file)

  • T. W. Sandholm and R. H. Crites (1994)
    Multiagent Reinforcement Learning in the Iterated Prisoner's Dilemma
    Submitted to Biosystems Journal, November 1994.

  • S. J. Bradtke and A. G. Barto (1994)
    New Algorithms for Temporal Difference Learning
    Machine Learning, 108, Special Issue on Reinforcement Learning.

  • Singh, S. P., and Yee, R. C. (1994)
    An Upper Bound on the Loss from Approximate Optimal-Value Functions
    Machine Learning
    [ zipped ps ]

  • Barto, A. G. (1994)
    Reinforcement Learning Control
    Current Opinion in Neurobiology, 4:888-893, 1994.

  • V. Gullapalli (1994)
    Skillful Control Under Uncertainty via Direct Reinforcement Learning
    (Submitted to Robotics and Autonomous Systems.)

  • N. Berthier, R. Clifton, V. Gullapalli, D. McCall, and D. Rubin (1994)
    Visual information and object size in the control of reaching
    (Submitted.)

  • V. Gullapalli (1994)
    Direct associative reinforcement learning methods for dynamic systems control
    (Submitted to Neurocomputing.)

  • S. J. Bradtke (1994)
    Incremental Dynamic Programming for On-Line Adaptive Optimal Control
    (Ph.D. Thesis) CMPSCI Technical Report 94-62, University of Massachusetts, August 1994.

  • Barto, A. G. (1994)
    Learning as hillclimbing in weight space
    In Handbook of Brain Theory and Neural Networks, M.A. Arbib (Ed.), Cambridge, MA: MIT Press.

  • Barto, A. G. (1994)
    Reinforcement learning in motor control
    In Handbook of Brain Theory and Neural Networks, M.A. Arbib (Ed.), Cambridge, MA: MIT Press.

  • Barto, A. G.(1994)
    Reinforcement Learning
    In Handbook of Brain Theory and Neural Networks, M.A. Arbib (Ed.), Cambridge, MA: MIT Press.

  • M. Duff (1994)
    Solving Bellman's Equation by the method of continuity
    Proceedings of the 1994 American Control Conference, Baltimore, June 1994.

  • S. J. Bradtke, B. E. Ydstie, and A. G. Barto (1994)
    Adaptive linear quadratic control using policy iteration
    CMPSCI Technical Report 94-49, University of Massachusetts, June 1994. Submitted to IEEE Transactions on Automatic Control, April 1994.

  • S. Bradtke and M. Duff (1994)
    Reinforcement learning methods for continuous-time Markov decision problems
    7th Annual Conference on Neural Information Processing Systems (NIPS 7), November 1994.

before 1994