I'm a Senior Research Scientist at NVIDIA, NYC. Prior to NVIDIA, I was a Research Scientist at FAIR, Meta AI. I received my Ph.D. in Computer Science from the Toyota Technological Institute at Chicago (TTIC), where I worked with Kevin Gimpel and Karen Livescu. Before TTIC, I was at IBM Research India, working on IBM Watson-related research. I received my B.Tech. in Computer Science and Engineering from IIT Kanpur.
Research Interests
My research focuses primarily on natural language processing (NLP). Some of the key topics that drive my research are:
- Reasoning with Language Models: Reasoning with LLMs; generating synthetic data with LLMs; and the impact of pretraining data on the emergence of LLM capabilities. [1][2][3]
- Entity Tracking with Language Models: Understanding text requires building an implicit/explicit representation of the underlying world. Are language models trained on mere surface form capable of representing that underlying world? (See the toy sketch below.) [3][4][5]
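To make the entity-tracking question concrete, here is a minimal, self-contained Python sketch of a "boxes"-style probe in the spirit of this line of work. The task format, item names, and wording are illustrative assumptions, not the exact benchmark from the cited papers:

```python
# Toy "boxes"-style entity-tracking probe (illustrative format only).
# We build a short narrative of world-state updates, maintain the
# ground-truth state in a dict, and ask where an item ends up. A model
# that truly tracks entities should answer from the final state, not
# from surface co-occurrence in the text.

operations = [
    ("put", "apple", None, "Box 1"),
    ("put", "key", None, "Box 2"),
    ("move", "apple", "Box 1", "Box 3"),
]

state = {}       # ground-truth world state: item -> containing box
sentences = []   # the surface-form narrative shown to the model

for action, item, src, dst in operations:
    state[item] = dst
    if action == "put":
        sentences.append(f"Put the {item} in {dst}.")
    else:  # move
        sentences.append(f"Move the {item} from {src} to {dst}.")

prompt = " ".join(sentences) + " Where is the apple now?"
print(prompt)
print("Gold answer:", state["apple"])  # a model that tracks entities should say "Box 3"
```

A model relying on surface statistics may latch onto "Box 1" (the box most recently mentioned alongside "apple" in a put sentence), whereas correct entity tracking requires applying the move update to reach "Box 3".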
Recent Highlights
- Oct 2024 - Released OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data. Links: [Code] [Models and Dataset]
- Sep 2024 - Major Entity Identification: A Generalizable Alternative to Coreference Resolution has been accepted to EMNLP 2024 and CRAC 2024!
- Sep 2024 - OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset has been accepted to the NeurIPS 2024 Datasets and Benchmarks Track as an oral. See you in Vancouver!
- June 2024 - Nemotron-4 340B is out!
- May 2024 - Released an arXiv report on our ongoing work studying the effect of code pretraining on entity tracking.