Nicholas E. Corrado
(at RLC 2025)

Ph.D. student, University of Wisconsin-Madison

I am a final-year Ph.D. student in Computer Sciences at the University of Wisconsin-Madison, advised by Josiah Hanna. My research focuses on improving the data efficiency of reinforcement learning (RL) algorithms by designing better data collection strategies. Since RL algorithms typically require an impractical amount of interaction with a task to perform well, my work asks: What data should we collect to learn as efficiently as possible? and How do we efficiently collect such data? Towards this end, I currently work on:

  1. Adaptive sampling algorithms for on-policy policy gradient methods
  2. Synthetic data generation for off-policy and offline RL

Previously, I was a research intern with Amazon's Rufus team (working on multi-objective alignment for LLMs) and Sandia National Laboratories (working on RL for power grid management). During the first year of my Ph.D., I worked in databases with Jignesh Patel. I received a BPhil in physics and a B.S. in mathematics from the University of Pittsburgh, where I studied high-energy physics with Vladimir Savinov.

I am looking for postdoc or research scientist opportunities starting Summer/Fall 2026.

Feel free to drop an email if you're interested in chatting!

News

Publications and Preprints

When Can Model-Free Reinforcement Learning be Enough for Thinking?
Josiah P. Hanna, Nicholas E. Corrado
Under Review
🏆 Most Thought-Provoking Paper at the Finding the Frame Workshop @ RLC, 2025 (Oral)
arXiv / bibtex

TLDR: This paper answers the question: Under what conditions will model-free reinforcement learning give rise to thinking as a strategy for reward maximization?

Another special thanks to Julian Katz-Samuels. Much of what I know about LLMs comes from working with him, and that foundation made it easy to contribute to this project.

Centralized Adaptive Sampling for Reliable Co-training of Independent Multi-Agent Policies
Nicholas E. Corrado, Josiah P. Hanna
Under Review
arXiv

TLDR: We identify a subtle failure mode of independent on-policy MARL: on-policy sampling can produce data that deviates from the expected joint on-policy distribution, yielding inaccurate gradients that can make agents converge suboptimally—even when the expected gradient of each agent aligns with optimal behavior. We introduce an adaptive sampling algorithm that reduces this sampling error w.r.t. the joint on-policy distribution, enabling agents to more reliably converge to an optimal equilibrium.

AutoMixAlign: Adaptive Data Mixing for Multi-Task Preference Optimization in LLMs
Nicholas E. Corrado, Julian Katz-Samuels, Adithya Devraj, Hyokun Yun, Chao Zhang, Yi Xu, Yi Pan, Bing Yin, Trishul Chilimbi
Association for Computational Linguistics (ACL, Main Conference), 2025
arXiv / bibtex

TLDR: Naively aligning LLMs across many datasets each targeting different tasks often yields a model that performs well on some tasks but not others. We introduce AutoMixAlign, a theoretically-grounded data mixing algorithm that adaptively mixes datasets during training to balance performance across tasks.

Special thanks to Julian Katz-Samuels, my mentor at Amazon. I came into this project with no prior LLM experience, and Julian's guidance, support, and confidence in me made all the difference.

On-Policy Policy Gradient Learning Without On-Policy Sampling
Nicholas E. Corrado, Josiah P. Hanna
Under Review
Also in Finding the Frame Workshop @ Reinforcement Learning Conference (RLC), 2025
arXiv / bibtex

TLDR: On-policy learning requires on-policy data, not on-policy sampling! We introduce an adaptive, off-policy sampling algorithm that produces on-policy data more efficiently than on-policy sampling, improving data efficiency.

Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning
Nicholas E. Corrado, Yuxiao Qu, John U. Balis, Adam Labiosa, & Josiah P. Hanna
Reinforcement Learning Conference (RLC), 2024
arXiv / bibtex

TLDR: While offline RL algorithms can in principle learn from highly suboptimal data, they nevertheless perform much better with near expert-quality data. Taking a "data first" perspective, we introduce a data augmentation framework that automatically generates near expert-quality synthetic data.

Special thanks to John U. Balis and Adam Labiosa for helping with the physical robotics experiments.

Understanding when Dynamics-Invariant Data Augmentations Benefit Model-free Reinforcement Learning Updates
Nicholas E. Corrado, Josiah P. Hanna
International Conference on Learning Representations (ICLR), 2024
arXiv / bibtex

TLDR: Several prior works introduce data augmentation techniques that improve the data efficiency of RL. Rather than proposing yet another method, we ask: when and why does data augmentation help? Our work offers practical insights to help RL practitioners apply data augmentation more effectively.

Deep Reinforcement Learning for Distribution Power System Cyber-Resilience via Distributed Energy Resource Control
Nicholas E. Corrado, Michael Livesay, Tyson Bailey, & Drew Levin
IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (IEEE SmartGridComm), 2023
paper / bibtex

Simulation-Acquired Latent Action Spaces for Dynamics Generalization
Nicholas E. Corrado, Yuxiao Qu, & Josiah P. Hanna
Conference on Lifelong Learning Agents (CoLLAs), 2022
paper / bibtex

Personal

Outside of research, I am a jazz guitarist — in fact, I almost pursued music professionally after high school. I am heavily influenced by gypsy jazz (or jazz manouche) but also draw inspiration from bossa, Dixieland, and modal jazz styles. When I was 13, I had the great fortune of meeting and playing with Joe Negri (aka "Handyman Negri" from Mister Rogers' Neighborhood). He is a jazz legend in my hometown of Pittsburgh and arguably one of the best jazz guitarists in the US.