I'm a Senior Research Scientist at Databricks Mosaic, NYC, where I build efficient, high-performance agentic systems for enterprise deployment. My research centers on synthetic data generation and reinforcement learning for post-training large language models, with a focus on agentic reasoning.

Prior to Databricks, I was a Senior Research Scientist at NVIDIA, where I contributed to the Nemotron model family and led the OpenMath series of datasets and models — including work that earned 1st place at the AI Math Olympiad 2 among 2,212 teams. Before NVIDIA, I was a Research Scientist at FAIR, Meta AI, where I worked on advancing reasoning capabilities in language models.

I received my Ph.D. in Computer Science from TTI Chicago, advised by Kevin Gimpel and Karen Livescu, and my B.Tech. in Computer Science from IIT Kanpur.

Research Interests

My research examines how large language models can reason reliably in complex environments. I focus on agentic reasoning: how models use tools and maintain and update world state over long horizons, and how we can make such reasoning scalable, efficient, and robust for real-world deployment.

Synthetic Data for Scalable Post-Training

High-quality training data is central to advancing large language models, particularly during post-training. I study how to generate large-scale, targeted synthetic data to improve reasoning capabilities. At Databricks, this translates into designing scalable pipelines that enhance efficiency, robustness, and performance in enterprise applications.

Previously at NVIDIA, I developed post-training strategies for reasoning-centric models, specifically large-scale synthetic data generation for mathematical reasoning. I was a core contributor to the OpenMath series of datasets and models, including OpenMathInstruct-1 (NeurIPS 2024, Oral), OpenMathInstruct-2 (ICLR 2025), and OpenMathReasoning, which earned 1st place at the AI Math Olympiad 2 (among 2,212 teams). These datasets and models have been widely adopted by the research community and contributed to the reasoning performance of the Nemotron model family.
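The core loop behind this style of synthetic data generation is to sample many candidate solutions per problem and keep only those whose final answer can be verified against a reference. The sketch below is purely illustrative, not the actual OpenMath pipeline; `sample_solutions` is a hypothetical stand-in for a call to a teacher LLM.

```python
# Minimal sketch of answer-filtered synthetic data generation for math
# reasoning. Illustrative only; a real pipeline samples from a strong
# teacher model and verifies answers far more carefully.

def sample_solutions(problem: dict, n: int) -> list[tuple[str, str]]:
    """Stand-in for an LLM: return n candidate (reasoning, answer) pairs.

    Here we deterministically alternate correct and incorrect answers
    so the example is runnable without any model.
    """
    return [
        ("step-by-step reasoning ...", problem["answer"] if i % 2 == 0 else "0")
        for i in range(n)
    ]

def build_dataset(problems: list[dict], samples_per_problem: int = 4) -> list[dict]:
    """Keep only candidates whose final answer matches the reference."""
    dataset = []
    for problem in problems:
        for reasoning, answer in sample_solutions(problem, samples_per_problem):
            if answer == problem["answer"]:  # cheap, verifiable filter
                dataset.append(
                    {
                        "question": problem["question"],
                        "solution": reasoning,
                        "answer": answer,
                    }
                )
    return dataset

problems = [{"question": "What is 2 + 2?", "answer": "4"}]
dataset = build_dataset(problems)
```

The verifiable-answer filter is what makes this recipe scale: correctness checking is much cheaper than solution generation, so one can sample aggressively and discard most of the output.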

World Modeling and Memory Mechanisms

Robust reasoning requires that models and agents represent and track world state across long contexts. A recurring thread in my research is understanding how language models build such internal representations and how memory mechanisms support state tracking over extended horizons.

In Chess as a Testbed for Language Model State Tracking (AAAI 2022), we demonstrated that transformers trained purely on move sequences develop a latent representation of the underlying board state despite receiving no explicit supervision. In Code Pretraining Improves Entity Tracking Abilities of Language Models (arXiv 2024), we found that training on code significantly improves entity tracking across model families. At FAIR, we proposed Learning to Reason and Memorize with Self-Notes (NeurIPS 2023), which introduced a simple note-taking mechanism that externalizes intermediate reasoning steps and state details, improving performance on long and complex reasoning tasks.

Coreference and Entity Tracking

Understanding entities and their references is a fundamental problem in natural language understanding. During my Ph.D., I developed neural models with explicit memory mechanisms that achieved state-of-the-art results in coreference resolution, including PeTra: A Sparsely Supervised Memory Model for People Tracking (ACL 2020), Learning to Ignore: Long Document Coreference with Bounded Memory Neural Networks (EMNLP 2020), and On Generalization in Coreference Resolution (Best Paper Award at CRAC@EMNLP 2021).

As large language models reshaped NLP, we revisited coreference through more practical formulations. In Major Entity Identification: A Generalizable Alternative to Coreference Resolution (EMNLP 2024), we restricted the coreference task to pre-selected major entities. In IdentifyMe: A Challenging Long-Context Mention Resolution Benchmark for LLMs (NAACL 2025), we introduced a long-context benchmark for mention resolution to evaluate the referential abilities of frontier models.

This progression reflects my sustained focus on foundational questions in representation, memory, and reasoning — specifically, how models construct structured world state, maintain it over long horizons, and leverage it to enable scalable agentic systems.


Recent Highlights

Feb 2026 Our work Learning Generative Selection for Best-of-N is on arXiv!
Oct 2025 Joined Databricks Mosaic. Excited to work on agentic reasoning!
Jul 2025 Attended ICML 2025 and presented three papers at the AI for Math Workshop: GenSelect: A Generative Approach to Best-of-N, The Challenge of Teaching Reasoning to LLMs Without RL or Distillation, and Scaling Mathematical Reasoning through Data, Tools, and Generative Selection.
Apr 2025 Our team NemoSkills won the AIMO-2 competition among 2200+ teams! We also released the OpenMathReasoning dataset, models, and report.