I am a final-year Ph.D. student in Computer Sciences at the University of Wisconsin-Madison, advised by Josiah Hanna. My research focuses on improving the data efficiency of reinforcement learning (RL) algorithms by designing better data collection strategies. Since RL algorithms typically require an impractical amount of interaction with a task to perform well, my work asks: What data should we collect to learn as efficiently as possible? And how do we efficiently collect such data? Towards this end, I currently work on:
- Adaptive sampling algorithms for on-policy policy gradient methods
- Synthetic data generation for off-policy and offline RL
Previously, I was a research intern with Amazon's Rufus team (working on multi-objective alignment for LLMs) and Sandia National Laboratories (working on RL for power grid management). During the first year of my Ph.D., I worked in databases with Jignesh Patel. I received a BPhil in physics and a B.S. in mathematics from the University of Pittsburgh, where I studied high-energy physics with Vladimir Savinov.
I am looking for postdoc or research scientist opportunities starting Summer/Fall 2026.
Feel free to drop me an email if you're interested in chatting!
News
- [July 2025] 🏆 "When Can Model-Free Reinforcement Learning be Enough for Thinking?" was awarded Most-Thought-Provoking Paper at Finding the Frame Workshop @ RLC 2025 (Oral)!
- [July 2025] "When Can Model-Free Reinforcement Learning be Enough for Thinking?" and "On-Policy Policy Gradient Learning Without On-Policy Sampling" accepted at Finding the Frame Workshop @ RLC 2025!
- [May 2025] "AutoMixAlign: Adaptive Data Mixing for Multi-Task Preference Optimization in LLMs" accepted at ACL 2025 Main Conference!
- [November 2024] 🏆 I received the Top Reviewer Award at NeurIPS 2024!
- [July 2024] I joined Amazon's Rufus Team as a research intern working with Julian Katz-Samuels!
- [May 2024] 1 paper accepted at RLC 2024!
- [January 2024] 1 paper accepted at ICLR 2024!
- [October 2023] I gave a talk on adaptive off-policy sampling for on-policy learning at the University of Edinburgh RL reading group!
- [January 2023] 🏆 I received the Sandia Employee Recognition Award!
Publications and Preprints
When Can Model-Free Reinforcement Learning be Enough for Thinking?
Josiah P. Hanna, Nicholas E. Corrado
Under Review
🏆 Most-Thought-Provoking Paper in Finding the Frame Workshop @ RLC, 2025 (Oral)
arXiv / bibtex
TLDR: This paper answers the question: Under what conditions will model-free reinforcement learning give rise to thinking as a strategy for reward maximization?
Another special thanks to Julian Katz-Samuels. Much of what I know about LLMs comes from working with him, and that foundation made it easy to contribute to this project.
Nicholas E. Corrado, Josiah P. Hanna
Under Review
arXiv
TLDR: We identify a subtle failure mode of independent on-policy MARL: on-policy sampling can produce data that deviates from the expected joint on-policy distribution, yielding inaccurate gradients that can make agents converge suboptimally, even when the expected gradient of each agent aligns with optimal behavior. We introduce an adaptive sampling algorithm that reduces this sampling error w.r.t. the joint on-policy distribution, enabling agents to more reliably converge to an optimal equilibrium.
AutoMixAlign: Adaptive Data Mixing for Multi-Task Preference Optimization in LLMs
Nicholas E. Corrado, Julian Katz-Samuels, Adithya Devraj, Hyokun Yun, Chao Zhang, Yi Xu, Yi Pan, Bing Yin, Trishul Chilimbi
Association for Computational Linguistics (ACL, Main Conference), 2025
arXiv / bibtex
TLDR: Naively aligning LLMs across many datasets, each targeting a different task, often yields a model that performs well on some tasks but not others. We introduce AutoMixAlign, a theoretically grounded data mixing algorithm that adaptively mixes datasets during training to balance performance across tasks.
Special thanks to Julian Katz-Samuels, my mentor at Amazon. I came into this project with no prior LLM experience, and Julian's guidance, support, and confidence in me made all the difference.
On-Policy Policy Gradient Learning Without On-Policy Sampling
Nicholas E. Corrado, Josiah P. Hanna
Under Review
Also in Finding the Frame Workshop @ Reinforcement Learning Conference (RLC), 2025
arXiv / bibtex
TLDR: On-policy learning requires on-policy data, not on-policy sampling! We introduce an adaptive, off-policy sampling algorithm that produces on-policy data more efficiently than on-policy sampling, improving data efficiency.
Nicholas E. Corrado, Yuxiao Qu, John U. Balis, Adam Labiosa, & Josiah P. Hanna
Reinforcement Learning Conference (RLC), 2024
arXiv / bibtex
TLDR: While offline RL algorithms can in principle learn from highly suboptimal data, they nevertheless perform much better with near expert-quality data. Taking a "data first" perspective, we introduce a data augmentation framework that automatically generates near expert-quality synthetic data.
Special thanks to John U. Balis and Adam Labiosa for helping with the physical robotics experiments.
Nicholas E. Corrado, Josiah P. Hanna
International Conference on Learning Representations (ICLR), 2024
arXiv / bibtex
TLDR: Several prior works introduce data augmentation techniques that improve the data efficiency of RL. Rather than proposing yet another method, we ask: when and why does data augmentation help? Our work offers practical insights to help RL practitioners apply data augmentation more effectively.
Nicholas E. Corrado, Michael Livesay, Tyson Bailey, & Drew Levin
IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (IEEE SmartGridComm), 2023
paper / bibtex
Nicholas E. Corrado, Yuxiao Qu, & Josiah P. Hanna
Conference on Lifelong Learning Agents (CoLLAs), 2022
paper / bibtex