Autonomous Learning Laboratory

College of Information and Computer Sciences
University of Massachusetts

The Autonomous Learning Laboratory (ALL) conducts foundational artificial intelligence (AI) research, with emphases on AI safety and reinforcement learning (RL), and particularly the intersection of these two areas.

The long-term goals of the laboratory are to develop more capable artificial agents, ensure that systems that use artificial intelligence methods are safe and well-behaved, improve our understanding of biological learning and its neural basis, and forge stronger links between studies of learning by computer scientists, engineers, neuroscientists, and psychologists.

For an overview of the ML papers from UMass in 2021, see our 2021 retrospective here.

People

Directors

Bruno Castro da Silva, Co-director, bsilva@cs.umass.edu
Philip S. Thomas, Co-director, pthomas@cs.umass.edu

Staff

Matt Lustig, Grants and Contracts Coordinator, mllustig@cs.umass.edu

Doctoral Students

Blossom Metevier, PhD Student, bmetevier@cs.umass.edu
James Kostas, PhD Student, jekostas@cs.umass.edu
Aline Weber, PhD Student, alineweber@cs.umass.edu
Dhawal Gupta, PhD Student, dgupta@cs.umass.edu
Shreyas Chaudhari, PhD Student, schaudhari@cs.umass.edu
Will Schwarzer, PhD Student, wschwarzer@cs.umass.edu
John Raisbeck, PhD Student, jraisbeck@cs.umass.edu
Alexandra Burushkina, PhD Student, aburushkina@cs.umass.edu
Norman Renhao Zhang, PhD Student, renhaozhang@cs.umass.edu

Alumni

Directors

Sridhar Mahadevan, Director (not accepting new students), mahadeva@cs.umass.edu
Andrew Barto, Founder (not accepting new students), barto@cs.umass.edu

Doctoral Students

Name | Adviser | Year | Current Website
Chris Nota | Philip S. Thomas | 2023 | link
Scott Jordan | Philip S. Thomas | 2022 | link
Yash Chandak | Philip S. Thomas | 2022 | link
Stephen Giguere | Philip S. Thomas | 2021 | link
Francisco Garcia | Philip S. Thomas | 2019 | link
Clemens Rosenbaum | Sridhar Mahadevan | 2019 | link
Ian Gemp | Sridhar Mahadevan | 2019 | link
Thomas Boucher | Sridhar Mahadevan | 2018 | link
CJ Carey | Sridhar Mahadevan | 2017 | link
Bo Liu | Sridhar Mahadevan | 2015 | link
Chris Vigorito | Andrew Barto | 2015 | link
Philip Thomas | Andrew Barto | 2015 | link
Bruno Castro da Silva | Andrew Barto | 2015 | link
William Dabney | Andrew Barto | 2014 | link
Scott Niekum | Andrew Barto | 2013 | link
Yariv Z. Levy | Andrew Barto | 2012 | link
Scott Kuindersma | Andrew Barto | 2012 | link
George Konidaris | Andrew Barto | 2011 | link
Jeffrey Johns | Sridhar Mahadevan | 2010 | link
Chang Wang | Sridhar Mahadevan | 2010 | link
Alicia "Pippin" Peregrin Wolfe | Andrew Barto | 2010 | link
Sarah Osentoski | Sridhar Mahadevan | 2009 | link
Ashvin Shah | Andrew Barto | 2008 | link
Özgür Şimşek | Andrew Barto | 2008 | link
Khashayar Rohanimanesh | Sridhar Mahadevan | 2006 |
Mohammad Ghavamzadeh | Sridhar Mahadevan | 2005 | link
Anders Jonsson | Andrew Barto | 2005 | link
Thomas Kalt | Andrew Barto | 2005 |
Balaraman Ravindran | Andrew Barto | 2004 | link
Michael Rosenstein | Andrew Barto | 2003 |
Michael Duff | Andrew Barto | 2002 |
Amy McGovern | Andrew Barto | 2002 | link
Theodore Perkins | Andrew Barto | 2002 | link
Doina Precup | Andrew Barto | 2000 | link
Bob Crites | Andrew Barto | 1996 |
S. J. Bradtke | Andrew Barto | 1994 |
Satinder Singh | Andrew Barto | 1993 | link
J. R. Bachrach | Andrew Barto | 1992 | link
Vijaykumar Gullapalli | Andrew Barto | 1992 |
Robert A. Jacobs | Andrew Barto | 1990 | link
J. S. Judd | Andrew Barto | 1988 |
Charles W. Anderson | Andrew Barto | 1986 | link
Richard S. Sutton | Andrew Barto | 1984 | link

Postdocs

Name | Adviser | Year | Current Website
Jay Buckingham | Andrew Barto | |
Michael Kositsky | Andrew Barto | 1998 - 2001 |
Matthew Schlesinger | Andrew Barto | 1998 - 2000 | link
Andrew H. Fagg | Andrew Barto | 1998 - 2004 | link
Sascha E. Engelbrecht | Andrew Barto | 1996 - 2002 |
Vijaykumar Gullapalli | Andrew Barto | 1992 - 1994 |
Michael Jordan | Andrew Barto | | link

Master's and Bachelor's Students

Name | Adviser | Year | Degree
Sarah Brockman | P. S. Thomas | 2019 | BS
Michael Amirault | P. S. Thomas | 2018 | BS
Stefan Dernbach | Sridhar Mahadevan | 2015 | MS
Jonathan Leahey | Sridhar Mahadevan | 2013 | MS
Jie Chen | Sridhar Mahadevan | 2013 | MS
Andrew Stout | Andrew Barto | 2011 | MS
Armita Kaboli | Andrew Barto | 2011 | MS
Peter Krafft | Andrew Barto | 2010 | BS
Colin Barringer | Andrew Barto | 2007 | MS
Suchi Saria | Sridhar Mahadevan | 2002 - 2004 | BS
Eric Sondhi | Sridhar Mahadevan | | BS
Ilya Scheidwasser | Sridhar Mahadevan | | BS

Publications

2023

  • C. Nota
    On the Convergence of Discounted Policy Gradient Methods
    arXiv:2212.14066, 2023.
    [arXiv]
  • A. Hoag, J. Kostas, B. da Silva, P. S. Thomas, and Y. Brun
    Seldonian Toolkit: Building Software with Safe and Fair Machine Learning
    At ICSE 2023.
  • V. Liu, Y. Chandak, P. S. Thomas, and M. White
    Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments
    To appear in AISTATS 2023.

2022

  • Y. Chandak, S. Niekum, B. Castro da Silva, E. Learned-Miller, E. Brunskill, P. S. Thomas
    Universal Off-Policy Evaluation
    RLDM 2022 Best Paper Winner!
    [arXiv]
  • A. Weber*, B. Metevier*, Y. Brun, P. S. Thomas, B. C. da Silva
    Enforcing Delayed-Impact Fairness Guarantees
    At RLDM 2022.
  • J. E. Kostas, S. M. Jordan, Y. Chandak, G. Theocharous, D. Gupta, P. S. Thomas
    A Generalized Learning Rule for Asynchronous Coagent Networks
    At RLDM 2022.
  • C. Nota, C. Wong, and P. S. Thomas
    Auto-Encoding Recurrent Representations
    At RLDM 2022.
    [pdf]
  • W. Tan, D. Koleczek, S. Pradhan, N. Perello, V. Chettiar, N. Ma, A. Rajaram, V. Rohra, S. Srinivasan, H. M. S. Hossain, Y. Chandak
    On Optimizing Interventions in Shared Autonomy
    In AAAI 2022.
    [arXiv]
  • C. Yuan, Y. Chandak, S. Giguere, P. S. Thomas, S. Niekum
    SOPE: Spectrum of Off-Policy Estimators
    At AAAI 2022.
    [arXiv]
  • S. Giguere, B. Metevier, Y. Brun, B. Castro da Silva, P. S. Thomas, and S. Niekum
    Fairness Guarantees under Demographic Shift
    In ICLR 2022.
    [pdf]
  • J. Yeager, E. Moss, M. Norrish, and P. S. Thomas
    Mechanizing Soundness of Off-Policy Evaluation
    In ITP 2022.
  • A. Bhatia, P. S. Thomas, S. Zilberstein
    Adaptive Rollout Length for Model-Based RL using Model-Free Deep RL
    arXiv:2206.02380, 2022.
  • Y. Chandak, S. Shankar, N. Bastian, B. Castro da Silva, E. Brunskill, and P. S. Thomas
    Off-Policy Evaluation for Action-Dependent Non-stationary Environments
    In NeurIPS 2022.

2021

  • J. Kostas, Y. Chandak, S. Jordan, G. Theocharous, and P. S. Thomas
    High Confidence Generalization for Reinforcement Learning
    In ICML 2021.
    [pdf] [link]
  • C. Nota, B. Castro da Silva, and P. S. Thomas
    Posterior Value Functions: Hindsight Baselines for Policy Gradient Methods
    In ICML 2021.
    [pdf] [link]
  • Y. Chandak, S. Shankar, P. S. Thomas
    High Confidence Off-Policy (or Counterfactual) Variance Estimation
    In AAAI 2021.
    [pdf] [link] [arXiv]
  • M. Phan, P. S. Thomas, and E. Learned-Miller
    Towards Practical Mean Bounds for Small Samples
    In ICML 2021.
    [link] [arXiv]
  • L. Alegre, A. L. Bazzan, and B. Castro da Silva
    Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online High-Confidence Change-Point Detection
    In AAMAS 2021.
    [pdf] [link]
  • W. Tan, D. Koleczek, S. Pradhan, N. Perello, V. Chettiar, N. Ma, A. Rajaram, V. Rohra, S. Srinivasan, H. M. S. Hossain, Y. Chandak
    Intervention Aware Shared Autonomy
    HumanAI@ICML, 2021.
    [pdf]
  • Y. Chandak, S. Niekum, B. Castro da Silva, E. Learned-Miller, E. Brunskill, P. S. Thomas
    Universal Off-Policy Evaluation
    In NeurIPS 2021.
    [arXiv]
  • C. Yuan, Y. Chandak, S. Giguere, P. S. Thomas, S. Niekum
    SOPE: Spectrum of Off-Policy Estimators
    In NeurIPS 2021.
    [arXiv]
  • E. Lobo, Y. Chandak, D. Subramanian, J. Hanna, M. Petrik
    Behavior Policy Search for Risk Estimators in Reinforcement Learning
    At SafeRL@NeurIPS 2021.
    [arXiv]

2020

  • Y. Chandak, S. Jordan, G. Theocharous, M. White, P. S. Thomas
    Towards Safe Policy Improvement for Non-Stationary MDPs
    In NeurIPS 2020.
    [pdf] [link] [arXiv]
  • J. Kostas, C. Nota, and P. S. Thomas
    Asynchronous Coagent Networks
    In ICML 2020.
    [pdf] [supplementary materials]
  • S. M. Jordan, Y. Chandak, D. Cohen, M. Zhang, and P. S. Thomas
    Evaluating the Performance of Reinforcement Learning Algorithms
    In ICML 2020.
    [pdf] [arXiv] [code]
  • Y. Chandak, G. Theocharous, S. Shankar, M. White, S. Mahadevan, and P. S. Thomas
    Optimizing for the Future in Non-Stationary MDPs
    In ICML 2020.
    [pdf] [arXiv]
  • C. Nota and P. S. Thomas
    Is the Policy Gradient a Gradient?
    In AAMAS 2020.
    [pdf] [arXiv]
  • Y. Chandak, G. Theocharous, C. Nota, and P. S. Thomas
    Lifelong Learning with a Changing Action Set
    In AAAI 2020.
    [pdf] [arXiv]
  • Y. Chandak, G. Theocharous, B. Metevier, P. S. Thomas
    Reinforcement Learning When All Actions are Not Always Available
    In AAAI 2020.
    [pdf] [arXiv]
  • G. Theocharous, Y. Chandak, P. S. Thomas, and F. de Nijs
    Reinforcement Learning for Strategic Recommendations
    arXiv:2009.07346, 2020.
    [pdf] [arXiv]

2019

  • P. S. Thomas, B. Castro da Silva, A. G. Barto, S. Giguere, Y. Brun, and E. Brunskill
    Preventing Undesirable Behavior of Intelligent Machines
    In Science vol. 366, Issue 6468, pages 999–1004, 2019.
    [link] [supplementary materials]
  • B. Metevier, S. Giguere, S. Brockman, A. Kobren, Y. Brun, E. Brunskill, and P. S. Thomas
    Offline Contextual Bandits with High Probability Fairness Guarantees
    In NeurIPS, 2019.
    [pdf] [link]
  • F. Garcia and P. S. Thomas
    A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning
    In NeurIPS 2019.
    [pdf] [link]
  • Y. Chandak, G. Theocharous, J. Kostas, S. M. Jordan, and P. S. Thomas
    Learning Action Representations for Reinforcement Learning
    In ICML, 2019.
    [pdf] [arXiv]
  • P. S. Thomas and E. Learned-Miller
    Concentration Inequalities for Conditional Value at Risk
    In ICML, 2019.
    [pdf]
  • S. Tiwari and P. S. Thomas
    Natural Option Critic
    In AAAI, 2019.
    [pdf] [arXiv]
  • S. M. Jordan, Y. Chandak, M. Zhang, D. Cohen, P. S. Thomas
    Evaluating Reinforcement Learning Algorithms Using Cumulative Distributions of Performance
    At RLDM, 2019.
  • Y. Chandak, G. Theocharous, J. Kostas, S. M. Jordan, and P. S. Thomas
    Improving Generalization over Large Action Sets
    At RLDM, 2019.
  • P. S. Thomas, S. M. Jordan, Y. Chandak, C. Nota, and J. Kostas
    Classical Policy Gradient: Preserving Bellman's Principle of Optimality
    [arXiv]
  • E. Learned-Miller and P. S. Thomas
    A New Confidence Interval for the Mean of a Bounded Random Variable
    [pdf] [arXiv]
  • J. Kostas, C. Nota, and P. S. Thomas
    Asynchronous Coagent Networks: Stochastic Networks for Reinforcement Learning without Backpropagation or a Clock
    [pdf] [arXiv]

2018

  • P. S. Thomas, C. Dann, and E. Brunskill
    Decoupling Gradient-Like Learning Rules from Representations
    In ICML, 2018.
    [pdf]
  • C. Rosenbaum, T. Klinger, and M. Riemer
    Routing Networks: Adaptive Selection of Non-linear Functions for Multi-Task Learning
    In ICLR, 2018.
    [pdf]
  • M. Machado, C. Rosenbaum, X. Guo, M. Liu, G. Tesauro, and M. Campbell
    Eigenoption Discovery through the Deep Successor Representation
    In ICLR, 2018.
    [pdf]
  • Y. Chandak, G. Theocharous, J. Kostas, and P. S. Thomas
    Reinforcement Learning with a Dynamic Action Set
    In Continual Learning workshop, NIPS 2018.
  • S. M. Jordan, D. Cohen, and P. S. Thomas
    Using Cumulative Distribution Based Performance Analysis to Benchmark Models
    In Critiquing and Correcting Trends in ML workshop, NIPS 2018.
    [pdf]
  • S. Giguere and P. S. Thomas
    Classification with Probabilistic Fairness Guarantees
    Presented at FairWare, 2018.
  • A. Jagannatha, P. S. Thomas, and H. Yu
    Towards High Confidence Off-Policy Reinforcement Learning for Clinical Applications
    Presented at CausalML, 2018.
    [pdf]

2017

  • I. Durugkar, I. Gemp, and S. Mahadevan
    Generative Multi-Adversarial Networks
    In ICLR, 2017.
    [pdf]
  • X. Guo, T. Klinger, C. Rosenbaum, J. P. Bigus, M. Campbell, B. Kawas, K. Talamadupula, G. Tesauro, and S. Singh
    Learning to Query, Reason, and Answer Questions On Ambiguous Texts
    In ICLR, 2017.
    [pdf]
  • C. Rosenbaum, T. Gao, and T. Klinger
    e-QRAQ: A Multi-turn Reasoning Dataset and Simulator with Explanations
    In WHI@ICML, 2017.
    [pdf]

1978 – 2016

Click here for a listing of older publications.

Joining

Prospective Doctoral Students:

Prof. da Silva will be accepting one student. Prof. Thomas will not be recruiting doctoral students for Fall 2024. During years that we are recruiting, submit your application here. If you mention the lab directors and your interest in the lab in your application, we will be notified and will look through your application materials.


Prospective Interns:

The Autonomous Learning Laboratory is not accepting applications for interns at any level at this time.


Prospective Master's Students:

The Autonomous Learning Laboratory is not accepting applications for master's-level positions at this time.


Prospective Postdoctoral Researchers:

The Autonomous Learning Laboratory is not accepting applications for postdoctoral researchers at this time.