
Autonomous Learning Laboratory

College of Information and Computer Sciences
University of Massachusetts Amherst

The Autonomous Learning Laboratory (ALL) conducts foundational artificial intelligence (AI) research, with emphases on AI safety and reinforcement learning (RL), particularly the intersection of these two areas.

The long-term goals of the laboratory are to develop more capable artificial agents, to ensure that systems that use artificial intelligence methods are safe and well-behaved, to improve our understanding of biological learning and its neural basis, and to forge stronger links between the studies of learning conducted by computer scientists, engineers, neuroscientists, and psychologists.

For an overview of machine learning papers from UMass in 2021, see our 2021 retrospective here.

People

Directors

Bruno Castro da Silva, Co-director, bsilva@cs.umass.edu
Philip S. Thomas, Co-director, pthomas@cs.umass.edu

Staff

Matt Lustig, Grants and Contracts Coordinator, mllustig@cs.umass.edu

Doctoral Students

Blossom Metevier, PhD Student, bmetevier@cs.umass.edu
James Kostas, PhD Student, jekostas@cs.umass.edu
Aline Weber, PhD Student, alineweber@cs.umass.edu
Dhawal Gupta, PhD Student, dgupta@cs.umass.edu
Shreyas Chaudhari, PhD Student, schaudhari@cs.umass.edu
Will Schwarzer, PhD Student, wschwarzer@cs.umass.edu
John Raisbeck, PhD Student, jraisbeck@cs.umass.edu
Alexandra Burushkina, PhD Student, aburushkina@cs.umass.edu
Norman Renhao Zhang, PhD Student, renhaozhang@cs.umass.edu
Alumni

Directors

Sridhar Mahadevan, Director (not accepting new students), mahadeva@cs.umass.edu
Andrew Barto, Founder (not accepting new students), barto@cs.umass.edu

Doctoral Students

Name Adviser Year Current Website
Chris Nota Philip S. Thomas 2023 link
Scott Jordan Philip S. Thomas 2022 link
Yash Chandak Philip S. Thomas 2022 link
Stephen Giguere Philip S. Thomas 2021 link
Francisco Garcia Philip S. Thomas 2019 link
Clemens Rosenbaum Sridhar Mahadevan 2019 link
Ian Gemp Sridhar Mahadevan 2019 link
Thomas Boucher Sridhar Mahadevan 2018 link
CJ Carey Sridhar Mahadevan 2017 link
Bo Liu Sridhar Mahadevan 2015 link
Chris Vigorito Andrew Barto 2015 link
Philip Thomas Andrew Barto 2015 link
Bruno Castro da Silva Andrew Barto 2015 link
William Dabney Andrew Barto 2014 link
Scott Niekum Andrew Barto 2013 link
Yariv Z. Levy Andrew Barto 2012 link
Scott Kuindersma Andrew Barto 2012 link
George Konidaris Andrew Barto 2011 link
Jeffrey Johns Sridhar Mahadevan 2010 link
Chang Wang Sridhar Mahadevan 2010 link
Alicia "Pippin" Peregrin Wolfe Andrew Barto 2010 link
Sarah Osentoski Sridhar Mahadevan 2009 link
Ashvin Shah Andrew Barto 2008 link
Özgür Şimşek Andrew Barto 2008 link
Khashayar Rohanimanesh Sridhar Mahadevan 2006
Mohammad Ghavamzadeh Sridhar Mahadevan 2005 link
Anders Jonsson Andrew Barto 2005 link
Thomas Kalt Andrew Barto 2005
Balaraman Ravindran Andrew Barto 2004 link
Michael Rosenstein Andrew Barto 2003
Michael Duff Andrew Barto 2002
Amy McGovern Andrew Barto 2002 link
Theodore Perkins Andrew Barto 2002 link
Doina Precup Andrew Barto 2000 link
Bob Crites Andrew Barto 1996
S. J. Bradtke Andrew Barto 1994
Satinder Singh Andrew Barto 1993 link
J. R. Bachrach Andrew Barto 1992 link
Vijaykumar Gullapalli Andrew Barto 1992
Robert A. Jacobs Andrew Barto 1990 link
J. S. Judd Andrew Barto 1988
Charles W. Anderson Andrew Barto 1986 link
Richard S. Sutton Andrew Barto 1984 link

Postdocs

Name Adviser Year Current Website
Jay Buckingham Andrew Barto
Michael Kositsky Andrew Barto 1998 - 2001
Matthew Schlesinger Andrew Barto 1998 - 2000 link
Andrew H. Fagg Andrew Barto 1998 - 2004 link
Sascha E. Engelbrecht Andrew Barto 1996 - 2002
Vijaykumar Gullapalli Andrew Barto 1992 - 1994
Michael Jordan Andrew Barto link

Master's and Bachelor's Students

Name Adviser Year Degree
Sarah Brockman Philip S. Thomas 2019 BS
Michael Amirault Philip S. Thomas 2018 BS
Stefan Dernbach Sridhar Mahadevan 2015 MS
Jonathan Leahey Sridhar Mahadevan 2013 MS
Jie Chen Sridhar Mahadevan 2013 MS
Andrew Stout Andrew Barto 2011 MS
Armita Kaboli Andrew Barto 2011 MS
Peter Krafft Andrew Barto 2010 BS
Colin Barringer Andrew Barto 2007 MS
Suchi Saria Sridhar Mahadevan 2002 - 2004 BS
Eric Sondhi Sridhar Mahadevan BS
Ilya Scheidwasser Sridhar Mahadevan BS

Publications

2024

  • J. P. Hanna, Y. Chandak, P. S. Thomas, M. White, P. Stone, and S. Niekum
    Data-Efficient Policy Evaluation Through Behavior Policy Search
    In Journal of Machine Learning Research, vol. 25, Issue 313, pages 1–58, 2024.
    [pdf]
  • S. Chaudhari, A. Deshpande, B. Castro da Silva, and P. S. Thomas
    Abstract Reward Processes: Leveraging State Abstraction for Consistent Off-Policy Evaluation
    In Advances in Neural Information Processing Systems (NeurIPS), 2024.
    [arXiv]
  • A. Ayoub, D. Szepesvari, F. Zanini, D. Gupta, B. Chan, B. C. da Silva, D. Schuurmans
    Mitigating the Curse of Horizon in Monte-Carlo Returns
    In The 1st Reinforcement Learning Conference (RLC), 2024.
    [pdf]
  • K. Choudhary, D. Gupta, and P. S. Thomas
    ICU-Sepsis: A Benchmark MDP Built from Real Medical Data
    Reinforcement Learning Journal, vol. 4, pages 1546–1566, September 2024.
    [pdf, arXiv]
  • S. M. Jordan, S. Neumann, J. E. Kostas, A. White, and P. S. Thomas
    The Cliff of Overcommitment with Policy Gradient Step Sizes
    Reinforcement Learning Journal, vol. 2, pages 864–883, September 2024.
    [pdf]
  • S. M. Jordan, B. Castro da Silva, A. White, M. White, and P. S. Thomas
Position: Benchmarking in Reinforcement Learning is Limited and Alternatives are Needed
    In Proceedings of the International Conference on Machine Learning (ICML), 2024.
    [pdf, arXiv]
V. Samuel, H. P. Zou, Y. Zhou, S. Chaudhari, A. Kalyan, T. Rajpurohit, A. Deshpande, K. Narasimhan, and V. Murahari
PersonaGym: Evaluating Persona Agents and LLMs
    [arXiv]
  • S. Yeh, B. Metevier, A. Hoag, and P. S. Thomas
    Analyzing the Relationship Between Difference and Ratio-Based Fairness Metrics
    In Proceedings of the ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT), 2024.
    [pdf]
  • D. M. Bossens and P. S. Thomas
    Low Variance Off-policy Evaluation with State-based Importance Sampling
    In Proceedings of the IEEE Conference on Artificial Intelligence (IEEE CAI), 2024.
    [pdf, arXiv]
S. Chaudhari, P. Aggarwal, V. Murahari, T. Rajpurohit, A. Kalyan, K. Narasimhan, A. Deshpande, and B. Castro da Silva
    RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs
    [arXiv]
A. Neelakanteswara, S. Chaudhari, and H. Zamani
    RAGs to Style: Personalizing LLMs with Style Embeddings
    In Workshop on Personalization of Generative AI Systems @ EACL '24.
    [pdf]
  • D. Gupta, S. M. Jordan, S. Chaudhari, B. Liu, P. S. Thomas, and B. Castro da Silva
    From Past to Future: Rethinking Eligibility Traces
    In The 38th Annual AAAI Conference on Artificial Intelligence (AAAI), 2024.
    [pdf, arXiv]
S. Chaudhari, D. Arbour, G. Theocharous, and N. Vlassis
    Distributional Off-Policy Evaluation for Slate Recommendations
In AAAI, 2024.
    [pdf]

2023

  • D. Gupta, Y. Chow, M. Ghavamzadeh, C. Boutilier
    Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management
In Advances in Neural Information Processing Systems (NeurIPS), 2023.
    [arXiv]
  • D. Gupta, Y. Chandak, S. M. Jordan, P. S. Thomas, and B. Castro da Silva
    Behavior Alignment via Reward Function Optimization
    In Advances in Neural Information Processing Systems (NeurIPS), 2023.
    [pdf, arXiv]
  • S. Chaudhari, P. S. Thomas, and B. Castro da Silva
    Learning Models and Evaluating Policies with Offline Off-Policy Data under Partial Observability
    In NeurIPS 2023 Workshop on Adaptive Experimental Design and Active Learning in the Real World (RealML-2023), 2023.
    [pdf]
  • Y. Luo, A. Hoag, and P. S. Thomas
    Learning Fair Representations with High-Confidence Guarantees
    arXiv:2310.15358, 2023.
    [pdf, arXiv]
  • J. E. Kostas, S. M. Jordan, Y. Chandak, G. Theocharous, D. Gupta, M. White, B. Castro da Silva, and P. S. Thomas
    Coagent Networks: Generalized and Scaled
    arXiv:2305.09838, 2023.
    [pdf, arXiv]
  • C. Nota
    On the Convergence of Discounted Policy Gradient Methods
    arXiv:2212.14066, 2023.
    [arXiv]
  • A. Hoag, J. Kostas, B. da Silva, P. S. Thomas, and Y. Brun
    Seldonian Toolkit: Building Software with Safe and Fair Machine Learning
    At ICSE 2023.
  • V. Liu, Y. Chandak, P. S. Thomas, and M. White
Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments
At AISTATS 2023.
Y. Chow, A. Tulepbergenov, O. Nachum, D. Gupta, M. Ryu, M. Ghavamzadeh, C. Boutilier
    A Mixture-of-Expert Approach to RL-based Dialogue Management
    In International Conference on Learning Representations (ICLR), 2023.
    [pdf]

2022

Y. Chandak, S. Niekum, B. Castro da Silva, E. Learned-Miller, E. Brunskill, P. S. Thomas
    Universal Off-Policy Evaluation
    RLDM 2022 Best Paper Winner!
    [arXiv]
A. Weber*, B. Metevier*, Y. Brun, P. S. Thomas, B. Castro da Silva
    Enforcing Delayed-Impact Fairness Guarantees
    At RLDM 2022.
  • J. E. Kostas, S. M. Jordan, Y. Chandak, G. Theocharous, D. Gupta, P. S. Thomas
    A Generalized Learning Rule for Asynchronous Coagent Networks
    At RLDM 2022.
C. Nota, C. Wong, and P. S. Thomas
    Auto-Encoding Recurrent Representations
    At RLDM 2022.
    [pdf]
  • W. Tan, D. Koleczek, S. Pradhan, N. Perello, V. Chettiar, N. Ma, A. Rajaram, V. Rohra, S. Srinivasan, H. M. S. Hossain, Y. Chandak
    On Optimizing Interventions in Shared Autonomy
    In AAAI 2022.
    [arXiv]
  • C. Yuan, Y. Chandak, S. Giguere, P. S. Thomas, S. Niekum
    SOPE: Spectrum of Off-Policy Estimators
    At AAAI 2022.
    [arXiv]
  • S. Giguere, B. Metevier, Y. Brun, B. Castro da Silva, P. S. Thomas, and S. Niekum
    Fairness Guarantees under Demographic Shift
    In ICLR 2022.
    [pdf]
  • J. Yeager, E. Moss, M. Norrish, and P. S. Thomas
    Mechanizing Soundness of Off-Policy Evaluation
    In ITP 2022.
  • A. Bhatia, P. S. Thomas, S. Zilberstein
    Adaptive Rollout Length for Model-Based RL using Model-Free Deep RL
    arXiv:2206.02380, 2022.
  • Y. Chandak, S. Shankar, N. Bastian, B. Castro da Silva, E. Brunskill, and P. S. Thomas.
    Off-Policy Evaluation for Action-Dependent Non-stationary Environments
    In NeurIPS 2022.

2021

  • J. Kostas, Y. Chandak, S. Jordan, G. Theocharous, and P. S. Thomas
    High Confidence Generalization for Reinforcement Learning
    In ICML 2021.
    [pdf] [link]
  • C. Nota, B. Castro da Silva, and P. S. Thomas
    Posterior Value Functions: Hindsight Baselines for Policy Gradient Methods
    In ICML 2021.
    [pdf] [link]
Y. Chandak, S. Shankar, P. S. Thomas
    High Confidence Off-Policy (or Counterfactual) Variance Estimation
    In AAAI 2021.
    [pdf] [link] [arXiv]
  • M. Phan, P. S. Thomas, and E. Learned-Miller
    Towards Practical Mean Bounds for Small Samples
    In ICML 2021.
    [link] [arXiv]
  • L. Alegre, A. L. Bazzan, and B. Castro da Silva
    Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online High-Confidence Change-Point Detection
    In AAMAS 2021.
    [pdf] [link]
  • W. Tan, D. Koleczek, S. Pradhan, N. Perello, V. Chettiar, N. Ma, A. Rajaram, V. Rohra, S. Srinivasan, H. M. S. Hossain, Y. Chandak
    Intervention Aware Shared Autonomy
    HumanAI@ICML, 2021.
    [pdf]
Y. Chandak, S. Niekum, B. Castro da Silva, E. Learned-Miller, E. Brunskill, P. S. Thomas
    Universal Off-Policy Evaluation
    In NeurIPS 2021.
    [arXiv]
  • C. Yuan, Y. Chandak, S. Giguere, P. S. Thomas, S. Niekum
    SOPE: Spectrum of Off-Policy Estimators
    In NeurIPS 2021.
    [arXiv]
  • E. Lobo, Y. Chandak, D. Subramanian, J. Hanna, M. Petrik
    Behavior Policy Search for Risk Estimators in Reinforcement Learning
    At SafeRL@NeurIPS 2021.
    [arXiv]

2020

  • Y. Chandak, S. Jordan, G. Theocharous, M. White, P. S. Thomas
    Towards Safe Policy Improvement for Non-Stationary MDPs
    In NeurIPS 2020.
    [pdf] [link] [arXiv]
  • J. Kostas, C. Nota, and P. S. Thomas
    Asynchronous Coagent Networks
    In ICML 2020.
    [pdf] [supplementary materials]
  • S. M. Jordan, Y. Chandak, D. Cohen, M. Zhang, and P. S. Thomas
    Evaluating the Performance of Reinforcement Learning Algorithms
    In ICML 2020.
    [pdf] [arXiv] [code]
  • Y. Chandak, G. Theocharous, S. Shankar, M. White, S. Mahadevan, and P. S. Thomas
    Optimizing for the Future in Non-Stationary MDPs
    In ICML 2020.
    [pdf] [arXiv]
  • C. Nota and P. S. Thomas
    Is the Policy Gradient a Gradient?
    In AAMAS 2020.
    [pdf] [arXiv]
  • Y. Chandak, G. Theocharous, C. Nota, and P. S. Thomas
    Lifelong Learning with a Changing Action Set
    In AAAI 2020.
    [pdf] [arXiv]
  • Y. Chandak, G. Theocharous, B. Metevier, P. S. Thomas
    Reinforcement Learning When All Actions are Not Always Available
    In AAAI 2020.
    [pdf] [arXiv]
  • G. Theocharous, Y. Chandak, P. S. Thomas, and F. de Nijs.
    Reinforcement Learning for Strategic Recommendations.
    arXiv:2009.07346, 2020.
    [pdf] [arXiv]

2019

  • P. S. Thomas, B. Castro da Silva, A. G. Barto, S. Giguere, Y. Brun, and E. Brunskill.
    Preventing Undesirable Behavior of Intelligent Machines
    In Science vol. 366, Issue 6468, pages 999–1004, 2019.
    [link] [supplementary materials]
  • B. Metevier, S. Giguere, S. Brockman, A. Kobren, Y. Brun, E. Brunskill, and P. S. Thomas
    Offline Contextual Bandits with High Probability Fairness Guarantees
    In NeurIPS, 2019.
    [pdf] [link]
  • F. Garcia and P. S. Thomas
    A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning
In NeurIPS, 2019.
    [pdf] [link]
  • Y. Chandak, G. Theocharous, J. Kostas, S. M. Jordan, and P. S. Thomas
    Learning Action Representations for Reinforcement Learning
    In ICML, 2019.
    [pdf] [arXiv]
  • P. S. Thomas and E. Learned-Miller
    Concentration Inequalities for Conditional Value at Risk
    In ICML, 2019.
    [pdf]
  • S. Tiwari and P. S. Thomas
    Natural Option Critic
    In AAAI, 2019.
    [pdf] [arXiv]
  • S. M. Jordan, Y. Chandak, M. Zhang, D. Cohen, P. S. Thomas
Evaluating Reinforcement Learning Algorithms Using Cumulative Distributions of Performance
    At RLDM, 2019.
  • Y. Chandak, G. Theocharous, J. Kostas, S. M. Jordan, and P. S. Thomas
    Improving Generalization over Large Action Sets
    At RLDM, 2019.
  • P. S. Thomas, S. M. Jordan, Y. Chandak, C. Nota, and J. Kostas
    Classical Policy Gradient: Preserving Bellman's Principle of Optimality
    [arXiv]
E. Learned-Miller and P. S. Thomas
    A New Confidence Interval for the Mean of a Bounded Random Variable
    [pdf] [arXiv]
  • J. Kostas, C. Nota, and P. S. Thomas
    Asynchronous Coagent Networks: Stochastic Networks for Reinforcement Learning without Backpropagation or a Clock
    [pdf] [arXiv]

2018

  • P. S. Thomas, C. Dann, and E. Brunskill.
    Decoupling Gradient-Like Learning Rules from Representations
    In ICML, 2018.
[pdf]
  • C. Rosenbaum, T. Klinger, and M. Riemer
    Routing Networks: Adaptive Selection of Non-linear Functions for Multi-Task Learning
    In ICLR, 2018.
[pdf]
  • M. Machado, C. Rosenbaum, X. Guo, M. Liu, G. Tesauro, and M. Campbell
    Eigenoption Discovery through the Deep Successor Representation
    In ICLR, 2018.
[pdf]
  • Y. Chandak, G. Theocharous, J. Kostas, and P. S. Thomas
    Reinforcement Learning with a Dynamic Action Set
In the Continual Learning Workshop, NeurIPS 2018.
  • S. M. Jordan, D. Cohen, and P. S. Thomas
    Using Cumulative Distribution Based Performance Analysis to Benchmark Models
In the Critiquing and Correcting Trends in ML Workshop, NeurIPS 2018.
[pdf]
  • S. Giguere and P. S. Thomas.
    Classification with Probabilistic Fairness Guarantees
    Presented at FairWare, 2018.
  • A. Jagannatha, P. S. Thomas, and H. Yu.
    Towards High Confidence Off-Policy Reinforcement Learning for Clinical Applications
    Presented at CausalML, 2018.
[pdf]

2017

  • I. Durugkar, I. Gemp, and S. Mahadevan
    Generative Multi-Adversarial Networks
    In ICLR, 2017.
[pdf]
  • X. Guo, T. Klinger, C. Rosenbaum, J. P. Bigus, M. Campbell, B. Kawas, K. Talamadupula, G. Tesauro, and S. Singh
    Learning to Query, Reason, and Answer Questions On Ambiguous Texts
    In ICLR, 2017.
[pdf]
  • C. Rosenbaum, T. Gao, and T. Klinger
    e-QRAQ: A Multi-turn Reasoning Dataset and Simulator with Explanations
    In WHI@ICML, 2017.
[pdf]

1978 – 2016

Click here for a listing of older publications.

Joining

Prospective Doctoral Students:

Prof. da Silva will be accepting one new doctoral student. Prof. Thomas will not be recruiting doctoral students for Fall 2024. In years when we are recruiting, submit your application here. If you mention the lab directors and your interest in the lab in your application, we will be notified and will review your application materials.


Prospective Interns:

The Autonomous Learning Laboratory is not accepting applications for interns at any level at this time.


Prospective Master's Students:

The Autonomous Learning Laboratory is not accepting applications for master's-level positions at this time.


Prospective Postdoctoral Researchers:

The Autonomous Learning Laboratory is not accepting applications for postdoctoral researchers at this time.