
Autonomous Learning Laboratory

College of Information and Computer Sciences
University of Massachusetts Amherst

The Autonomous Learning Laboratory (ALL) conducts foundational artificial intelligence (AI) research, with emphases on AI safety and reinforcement learning (RL), particularly the intersection of these two areas.

The long-term goals of the laboratory are to develop more capable artificial agents, to ensure that systems that use artificial intelligence methods are safe and well-behaved, to improve our understanding of biological learning and its neural basis, and to forge stronger links between the studies of learning conducted by computer scientists, engineers, neuroscientists, and psychologists.

For an overview of machine learning papers from UMass in 2021, see our 2021 retrospective here.

People

Directors

Bruno Castro da Silva, Co-director, bsilva@cs.umass.edu
Philip S. Thomas, Co-director, pthomas@cs.umass.edu

Staff

Matt Lustig, Grants and Contracts Coordinator, mllustig@cs.umass.edu

Doctoral Students

Blossom Metevier, PhD Student, bmetevier@cs.umass.edu
James Kostas, PhD Student, jekostas@cs.umass.edu
Aline Weber, PhD Student, alineweber@cs.umass.edu
Dhawal Gupta, PhD Student, dgupta@cs.umass.edu
Shreyas Chaudhari, PhD Student, schaudhari@cs.umass.edu
Will Schwarzer, PhD Student, wschwarzer@cs.umass.edu
John Raisbeck, PhD Student, jraisbeck@cs.umass.edu
Alexandra Burushkina, PhD Student, aburushkina@cs.umass.edu
Norman Renhao Zhang, PhD Student, renhaozhang@cs.umass.edu
Alumni

Directors

Sridhar Mahadevan, Director (not accepting new students), mahadeva@cs.umass.edu
Andrew Barto, Founder (not accepting new students), barto@cs.umass.edu

Doctoral Students

Name Adviser Year Current Website
Chris Nota Philip S. Thomas 2023 link
Scott Jordan Philip S. Thomas 2022 link
Yash Chandak Philip S. Thomas 2022 link
Stephen Giguere Philip S. Thomas 2021 link
Francisco Garcia Philip S. Thomas 2019 link
Clemens Rosenbaum Sridhar Mahadevan 2019 link
Ian Gemp Sridhar Mahadevan 2019 link
Thomas Boucher Sridhar Mahadevan 2018 link
CJ Carey Sridhar Mahadevan 2017 link
Bo Liu Sridhar Mahadevan 2015 link
Chris Vigorito Andrew Barto 2015 link
Philip Thomas Andrew Barto 2015 link
Bruno Castro da Silva Andrew Barto 2015 link
William Dabney Andrew Barto 2014 link
Scott Niekum Andrew Barto 2013 link
Yariv Z. Levy Andrew Barto 2012 link
Scott Kuindersma Andrew Barto 2012 link
George Konidaris Andrew Barto 2011 link
Jeffrey Johns Sridhar Mahadevan 2010 link
Chang Wang Sridhar Mahadevan 2010 link
Alicia "Pippin" Peregrin Wolfe Andrew Barto 2010 link
Sarah Osentoski Sridhar Mahadevan 2009 link
Ashvin Shah Andrew Barto 2008 link
Özgür Şimşek Andrew Barto 2008 link
Khashayar Rohanimanesh Sridhar Mahadevan 2006
Mohammad Ghavamzadeh Sridhar Mahadevan 2005 link
Anders Jonsson Andrew Barto 2005 link
Thomas Kalt Andrew Barto 2005
Balaraman Ravindran Andrew Barto 2004 link
Michael Rosenstein Andrew Barto 2003
Michael Duff Andrew Barto 2002
Amy McGovern Andrew Barto 2002 link
Theodore Perkins Andrew Barto 2002 link
Doina Precup Andrew Barto 2000 link
Bob Crites Andrew Barto 1996
S. J. Bradtke Andrew Barto 1994
Satinder Singh Andrew Barto 1993 link
J. R. Bachrach Andrew Barto 1992 link
Vijaykumar Gullapalli Andrew Barto 1992
Robert A. Jacobs Andrew Barto 1990 link
J. S. Judd Andrew Barto 1988
Charles W. Anderson Andrew Barto 1986 link
Richard S. Sutton Andrew Barto 1984 link

Postdocs

Name Adviser Year Current Website
Jay Buckingham Andrew Barto
Michael Kositsky Andrew Barto 1998 - 2001
Matthew Schlesinger Andrew Barto 1998 - 2000 link
Andrew H. Fagg Andrew Barto 1998 - 2004 link
Sascha E. Engelbrecht Andrew Barto 1996 - 2002
Vijaykumar Gullapalli Andrew Barto 1992 - 1994
Michael Jordan Andrew Barto link

Master's and Bachelor's Students

Name Adviser Year Degree
Sarah Brockman Philip S. Thomas 2019 BS
Michael Amirault Philip S. Thomas 2018 BS
Stefan Dernbach Sridhar Mahadevan 2015 MS
Jonathan Leahey Sridhar Mahadevan 2013 MS
Jie Chen Sridhar Mahadevan 2013 MS
Andrew Stout Andrew Barto 2011 MS
Armita Kaboli Andrew Barto 2011 MS
Peter Krafft Andrew Barto 2010 BS
Colin Barringer Andrew Barto 2007 MS
Suchi Saria Sridhar Mahadevan 2002 - 2004 BS
Eric Sondhi Sridhar Mahadevan BS
Ilya Scheidwasser Sridhar Mahadevan BS

Publications

2024

  • J. P. Hanna, Y. Chandak, P. S. Thomas, M. White, P. Stone, and S. Niekum
    Data-Efficient Policy Evaluation Through Behavior Policy Search
    In Journal of Machine Learning Research, vol. 25, Issue 313, pages 1–58, 2024.
    [pdf]
  • S. Chaudhari, A. Deshpande, B. Castro da Silva, and P. S. Thomas
    Abstract Reward Processes: Leveraging State Abstraction for Consistent Off-Policy Evaluation
    In Advances in Neural Information Processing Systems (NeurIPS), 2024.
    [arXiv]
  • A. Ayoub, D. Szepesvari, F. Zanini, D. Gupta, B. Chan, B. C. da Silva, D. Schuurmans
    Mitigating the Curse of Horizon in Monte-Carlo Returns
    In The 1st Reinforcement Learning Conference (RLC), 2024.
    [pdf]
  • K. Choudhary, D. Gupta, and P. S. Thomas
    ICU-Sepsis: A Benchmark MDP Built from Real Medical Data
    Reinforcement Learning Journal, vol. 4, pages 1546–1566, September 2024.
    [pdf, arXiv]
  • S. M. Jordan, S. Neumann, J. E. Kostas, A. White, and P. S. Thomas
    The Cliff of Overcommitment with Policy Gradient Step Sizes
    Reinforcement Learning Journal, vol. 2, pages 864–883, September 2024.
    [pdf]
  • S. M. Jordan, B. Castro da Silva, A. White, M. White, and P. S. Thomas
Position: Benchmarking in Reinforcement Learning is Limited and Alternatives are Needed
    In Proceedings of the International Conference on Machine Learning (ICML), 2024.
    [pdf, arXiv]
V. Samuel, H. P. Zou, Y. Zhou, S. Chaudhari, A. Kalyan, T. Rajpurohit, A. Deshpande, K. Narasimhan, and V. Murahari
PersonaGym: Evaluating Persona Agents and LLMs
    [arXiv]
  • S. Yeh, B. Metevier, A. Hoag, and P. S. Thomas
    Analyzing the Relationship Between Difference and Ratio-Based Fairness Metrics
    In Proceedings of the ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT), 2024.
    [pdf]
  • D. M. Bossens and P. S. Thomas
    Low Variance Off-policy Evaluation with State-based Importance Sampling
    In Proceedings of the IEEE Conference on Artificial Intelligence (IEEE CAI), 2024.
    [pdf, arXiv]
S. Chaudhari, P. Aggarwal, V. Murahari, T. Rajpurohit, A. Kalyan, K. Narasimhan, A. Deshpande, and B. Castro da Silva
    RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs
    [arXiv]
A. Neelakanteswara, S. Chaudhari, and H. Zamani
    RAGs to Style: Personalizing LLMs with Style Embeddings
    In Workshop on Personalization of Generative AI Systems @ EACL '24.
    [pdf]
  • D. Gupta, S. M. Jordan, S. Chaudhari, B. Liu, P. S. Thomas, and B. Castro da Silva
    From Past to Future: Rethinking Eligibility Traces
    In The 38th Annual AAAI Conference on Artificial Intelligence (AAAI), 2024.
    [pdf, arXiv]
S. Chaudhari, D. Arbour, G. Theocharous, and N. Vlassis
    Distributional Off-Policy Evaluation for Slate Recommendations
In AAAI, 2024.
    [pdf]

2023

  • D. Gupta, Y. Chow, M. Ghavamzadeh, C. Boutilier
    Offline Reinforcement Learning for Mixture-of-Expert Dialogue Management
In Advances in Neural Information Processing Systems (NeurIPS), 2023.
    [arXiv]
  • D. Gupta, Y. Chandak, S. M. Jordan, P. S. Thomas, and B. Castro da Silva
    Behavior Alignment via Reward Function Optimization
    In Advances in Neural Information Processing Systems (NeurIPS), 2023.
    [pdf, arXiv]
  • S. Chaudhari, P. S. Thomas, and B. Castro da Silva
    Learning Models and Evaluating Policies with Offline Off-Policy Data under Partial Observability
    In NeurIPS 2023 Workshop on Adaptive Experimental Design and Active Learning in the Real World (RealML-2023), 2023.
    [pdf]
  • Y. Luo, A. Hoag, and P. S. Thomas
    Learning Fair Representations with High-Confidence Guarantees
    arXiv:2310.15358, 2023.
    [pdf, arXiv]
  • J. E. Kostas, S. M. Jordan, Y. Chandak, G. Theocharous, D. Gupta, M. White, B. Castro da Silva, and P. S. Thomas
    Coagent Networks: Generalized and Scaled
    arXiv:2305.09838, 2023.
    [pdf, arXiv]
  • C. Nota
    On the Convergence of Discounted Policy Gradient Methods
    arXiv:2212.14066, 2023.
    [arXiv]
  • A. Hoag, J. Kostas, B. da Silva, P. S. Thomas, and Y. Brun
    Seldonian Toolkit: Building Software with Safe and Fair Machine Learning
    At ICSE 2023.
  • V. Liu, Y. Chandak, P. S. Thomas, and M. White
Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments
At AISTATS 2023.
Y. Chow, A. Tulepbergenov, O. Nachum, D. Gupta, M. Ryu, M. Ghavamzadeh, C. Boutilier
    A Mixture-of-Expert Approach to RL-based Dialogue Management
    In International Conference on Learning Representations (ICLR), 2023.
    [pdf]

2022

Y. Chandak, S. Niekum, B. Castro da Silva, E. Learned-Miller, E. Brunskill, P. S. Thomas
    Universal Off-Policy Evaluation
    RLDM 2022 Best Paper Winner!
    [arXiv]
A. Weber*, B. Metevier*, Y. Brun, P. S. Thomas, B. Castro da Silva
    Enforcing Delayed-Impact Fairness Guarantees
    At RLDM 2022.
  • J. E. Kostas, S. M. Jordan, Y. Chandak, G. Theocharous, D. Gupta, P. S. Thomas
    A Generalized Learning Rule for Asynchronous Coagent Networks
    At RLDM 2022.
C. Nota, C. Wong, and P. S. Thomas
    Auto-Encoding Recurrent Representations
    At RLDM 2022.
    [pdf]
  • W. Tan, D. Koleczek, S. Pradhan, N. Perello, V. Chettiar, N. Ma, A. Rajaram, V. Rohra, S. Srinivasan, H. M. S. Hossain, Y. Chandak
    On Optimizing Interventions in Shared Autonomy
    In AAAI 2022.
    [arXiv]
  • C. Yuan, Y. Chandak, S. Giguere, P. S. Thomas, S. Niekum
    SOPE: Spectrum of Off-Policy Estimators
    At AAAI 2022.
    [arXiv]
  • S. Giguere, B. Metevier, Y. Brun, B. Castro da Silva, P. S. Thomas, and S. Niekum
    Fairness Guarantees under Demographic Shift
    In ICLR 2022.
    [pdf]
  • J. Yeager, E. Moss, M. Norrish, and P. S. Thomas
    Mechanizing Soundness of Off-Policy Evaluation
    In ITP 2022.
  • A. Bhatia, P. S. Thomas, S. Zilberstein
    Adaptive Rollout Length for Model-Based RL using Model-Free Deep RL
    arXiv:2206.02380, 2022.
  • Y. Chandak, S. Shankar, N. Bastian, B. Castro da Silva, E. Brunskill, and P. S. Thomas.
    Off-Policy Evaluation for Action-Dependent Non-stationary Environments
    In NeurIPS 2022.

2021

  • J. Kostas, Y. Chandak, S. Jordan, G. Theocharous, and P. S. Thomas
    High Confidence Generalization for Reinforcement Learning
    In ICML 2021.
    [pdf] [link]
  • C. Nota, B. Castro da Silva, and P. S. Thomas
    Posterior Value Functions: Hindsight Baselines for Policy Gradient Methods
    In ICML 2021.
    [pdf] [link]
Y. Chandak, S. Shankar, P. S. Thomas
    High Confidence Off-Policy (or Counterfactual) Variance Estimation
    In AAAI 2021.
    [pdf] [link] [arXiv]
  • M. Phan, P. S. Thomas, and E. Learned-Miller
    Towards Practical Mean Bounds for Small Samples
    In ICML 2021.
    [link] [arXiv]
  • L. Alegre, A. L. Bazzan, and B. Castro da Silva
    Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online High-Confidence Change-Point Detection
    In AAMAS 2021.
    [pdf] [link]
  • W. Tan, D. Koleczek, S. Pradhan, N. Perello, V. Chettiar, N. Ma, A. Rajaram, V. Rohra, S. Srinivasan, H. M. S. Hossain, Y. Chandak
    Intervention Aware Shared Autonomy
    HumanAI@ICML, 2021.
    [pdf]
Y. Chandak, S. Niekum, B. Castro da Silva, E. Learned-Miller, E. Brunskill, P. S. Thomas
    Universal Off-Policy Evaluation
    In NeurIPS 2021.
    [arXiv]
  • C. Yuan, Y. Chandak, S. Giguere, P. S. Thomas, S. Niekum
    SOPE: Spectrum of Off-Policy Estimators
    In NeurIPS 2021.
    [arXiv]
  • E. Lobo, Y. Chandak, D. Subramanian, J. Hanna, M. Petrik
    Behavior Policy Search for Risk Estimators in Reinforcement Learning
    At SafeRL@NeurIPS 2021.
    [arXiv]

2020

  • Y. Chandak, S. Jordan, G. Theocharous, M. White, P. S. Thomas
    Towards Safe Policy Improvement for Non-Stationary MDPs
    In NeurIPS 2020.
    [pdf] [link] [arXiv]
  • J. Kostas, C. Nota, and P. S. Thomas
    Asynchronous Coagent Networks
    In ICML 2020.
    [pdf] [supplementary materials]
  • S. M. Jordan, Y. Chandak, D. Cohen, M. Zhang, and P. S. Thomas
    Evaluating the Performance of Reinforcement Learning Algorithms
    In ICML 2020.
    [pdf] [arXiv] [code]
  • Y. Chandak, G. Theocharous, S. Shankar, M. White, S. Mahadevan, and P. S. Thomas
    Optimizing for the Future in Non-Stationary MDPs
    In ICML 2020.
    [pdf] [arXiv]
  • C. Nota and P. S. Thomas
    Is the Policy Gradient a Gradient?
    In AAMAS 2020.
    [pdf] [arXiv]
  • Y. Chandak, G. Theocharous, C. Nota, and P. S. Thomas
    Lifelong Learning with a Changing Action Set
    In AAAI 2020.
    [pdf] [arXiv]
  • Y. Chandak, G. Theocharous, B. Metevier, P. S. Thomas
    Reinforcement Learning When All Actions are Not Always Available
    In AAAI 2020.
    [pdf] [arXiv]
  • G. Theocharous, Y. Chandak, P. S. Thomas, and F. de Nijs.
    Reinforcement Learning for Strategic Recommendations.
    arXiv:2009.07346, 2020.
    [pdf] [arXiv]

2019

  • P. S. Thomas, B. Castro da Silva, A. G. Barto, S. Giguere, Y. Brun, and E. Brunskill.
    Preventing Undesirable Behavior of Intelligent Machines
    In Science vol. 366, Issue 6468, pages 999–1004, 2019.
    [link] [supplementary materials]
  • B. Metevier, S. Giguere, S. Brockman, A. Kobren, Y. Brun, E. Brunskill, and P. S. Thomas
    Offline Contextual Bandits with High Probability Fairness Guarantees
    In NeurIPS, 2019.
    [pdf] [link]
  • F. Garcia and P. S. Thomas
    A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning
In NeurIPS, 2019.
    [pdf] [link]
  • Y. Chandak, G. Theocharous, J. Kostas, S. M. Jordan, and P. S. Thomas
    Learning Action Representations for Reinforcement Learning
    In ICML, 2019.
    [pdf] [arXiv]
  • P. S. Thomas and E. Learned-Miller
    Concentration Inequalities for Conditional Value at Risk
    In ICML, 2019.
    [pdf]
  • S. Tiwari and P. S. Thomas
    Natural Option Critic
    In AAAI, 2019.
    [pdf] [arXiv]
  • S. M. Jordan, Y. Chandak, M. Zhang, D. Cohen, P. S. Thomas
Evaluating Reinforcement Learning Algorithms Using Cumulative Distributions of Performance
    At RLDM, 2019.
  • Y. Chandak, G. Theocharous, J. Kostas, S. M. Jordan, and P. S. Thomas
    Improving Generalization over Large Action Sets
    At RLDM, 2019.
  • P. S. Thomas, S. M. Jordan, Y. Chandak, C. Nota, and J. Kostas
    Classical Policy Gradient: Preserving Bellman's Principle of Optimality
    [arXiv]
E. Learned-Miller and P. S. Thomas
    A New Confidence Interval for the Mean of a Bounded Random Variable
    [pdf] [arXiv]
  • J. Kostas, C. Nota, and P. S. Thomas
    Asynchronous Coagent Networks: Stochastic Networks for Reinforcement Learning without Backpropagation or a Clock
    [pdf] [arXiv]

2018

  • P. S. Thomas, C. Dann, and E. Brunskill.
    Decoupling Gradient-Like Learning Rules from Representations
    In ICML, 2018.
[pdf]
  • C. Rosenbaum, T. Klinger, and M. Riemer
    Routing Networks: Adaptive Selection of Non-linear Functions for Multi-Task Learning
    In ICLR, 2018.
[pdf]
  • M. Machado, C. Rosenbaum, X. Guo, M. Liu, G. Tesauro, and M. Campbell
    Eigenoption Discovery through the Deep Successor Representation
    In ICLR, 2018.
[pdf]
  • Y. Chandak, G. Theocharous, J. Kostas, and P. S. Thomas
    Reinforcement Learning with a Dynamic Action Set
In the Continual Learning Workshop, NeurIPS 2018.
  • S. M. Jordan, D. Cohen, and P. S. Thomas
    Using Cumulative Distribution Based Performance Analysis to Benchmark Models
In the Critiquing and Correcting Trends in ML Workshop, NeurIPS 2018.
[pdf]
  • S. Giguere and P. S. Thomas.
    Classification with Probabilistic Fairness Guarantees
    Presented at FairWare, 2018.
  • A. Jagannatha, P. S. Thomas, and H. Yu.
    Towards High Confidence Off-Policy Reinforcement Learning for Clinical Applications
    Presented at CausalML, 2018.
[pdf]

2017

  • I. Durugkar, I. Gemp, and S. Mahadevan
    Generative Multi-Adversarial Networks
    In ICLR, 2017.
[pdf]
  • X. Guo, T. Klinger, C. Rosenbaum, J. P. Bigus, M. Campbell, B. Kawas, K. Talamadupula, G. Tesauro, and S. Singh
    Learning to Query, Reason, and Answer Questions On Ambiguous Texts
    In ICLR, 2017.
[pdf]
  • C. Rosenbaum, T. Gao, and T. Klinger
    e-QRAQ: A Multi-turn Reasoning Dataset and Simulator with Explanations
    In WHI@ICML, 2017.
[pdf]

1978 – 2016

Click here for a listing of older publications.

Joining

Prospective Doctoral Students:

Prof. da Silva will be accepting one new doctoral student. Prof. Thomas will not be recruiting doctoral students for Fall 2024. In years when we are recruiting, submit your application here. If you mention the lab directors and your interest in the lab in your application, we will be notified and will review your application materials.


Prospective Interns:

The Autonomous Learning Laboratory is not accepting applications for interns at any level at this time.


Prospective Master's Students:

The Autonomous Learning Laboratory is not accepting applications for master's-level positions at this time.


Prospective Postdoctoral Researchers:

The Autonomous Learning Laboratory is not accepting applications for postdoctoral researchers at this time.