About me

I'm a Senior Research Scientist at NVIDIA in NYC. My research focuses primarily on natural language processing (NLP). Some of the key topics that drive my research are:

  • Long-context models: How should models efficiently process long documents? What should the structure of the working memory be? How should LMs be pretrained to encourage the modeling of latent long-context dependencies? [1][2][3]
  • State Tracking with Language Models: Understanding text requires models to build an implicit/explicit representation of the underlying world. Are language models trained on mere surface form capable of representing the underlying world? [1][2]

Short Bio

I was a Research Scientist at FAIR, Meta AI before joining NVIDIA. I received my Ph.D. in Computer Science from TTI Chicago, working with Kevin Gimpel and Karen Livescu. During my Ph.D., I did internships at Google Research NYC and Google Research SF. Before my Ph.D., I spent two years at IBM Research India working on IBM Watson-related research. I received my B.Tech. in Computer Science and Engineering from IIT Kanpur.

Recent Highlights