Gunshi Gupta

Deep Learning Researcher

Biography

Hey! I’ll soon be graduating as a Machine Learning D.Phil student in the OATML group at the University of Oxford. I’m supervised by Prof. Yarin Gal.

I’m currently working on designing methods, architectures and benchmarks that enable transformer-based agents to perform long-horizon tasks by creating and accessing memories, trained through large-scale RL.

Some of the topics I have done research on over the previous three years are:

Leveraging advances in visual diffusion modeling for robotics
Mechanistic interpretability in transformer-based world models
Training generative world models for video games and robotics, and
Causally-correct, sample-efficient learning from imbalanced data.

I also collaborated closely with researchers from Toyota Research (Adrien Gaidon and Rowan McAllister) on topics related to causal robot learning.

Prior to my Ph.D, I was a deep learning researcher at Wayve, exploring reinforcement learning algorithms on autonomous driving data. I graduated from a Machine Learning Research Master’s at Mila (Sept 2020), where I did research on meta learning, continual learning and inverse reinforcement learning. I was also an ED&I Fellow with the MPLS department at the University of Oxford in the 2022-2023 cohort.

Download my resumé.

Interests

Policy Learning, Reinforcement Learning
Diffusion modeling
Continual Learning, Meta Learning
Memory-augmented models

Education

D.Phil Machine Learning (AIMS CDT), 2024
University of Oxford
Research Master's in Machine Learning, 2020
Montreal Institute for Learning Algorithms
B.Tech in Maths and Computing (Applied Mathematics), 2016
Delhi Technological University (DTU/DCE)

Experience

Deep Learning Intern (RL/IL)

Microsoft Research

Apr 2023 – Jul 2023 Cambridge

I contributed to a team submission to NeurIPS titled “WHAM: World and Human Action Modelling in a Modern Xbox Game” exploring a VQGAN-transformer based world-and-action model trained on 3 years of gameplay trajectories in a high-fidelity multi-player game.
I developed an evaluation suite for mechanistic interpretability of transformer representations to track the emergence of game-relevant concepts such as the locations of adversaries and the player's health resources.

Deep Learning Researcher

Wayve

Jul 2020 – Sep 2021 London

I was part of the policy learning team, which focused on exploring algorithms that can learn in a robust and sample-efficient manner, aided by expert demonstrations.

Graduate Research Assistant

Robotics Research Center, IIITH

Feb 2017 – Apr 2018 Hyderabad

Here I:

Developed a Multi Robot Visual SLAM framework for the Center for Artificial Intelligence and Robotics (CAIR, India) that was tested using the Husky UGV platform
Published “View-Invariant Intersection Recognition from Videos using Deep Network Ensembles” at IROS 2018

Software Developer

Microsoft

Jun 2016 – Feb 2017 Hyderabad

Built prediction and summarisation modules for employee performance feedback using deep learning and NLP
Organised workshops on ‘Machine Learning Fundamentals’ for Microsoft employees.

Publications & Preprints

FindingDory: A Benchmark to Evaluate Memory in Embodied Agents

Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning

Recurrent Attention-based Token Selection for Efficient Streaming Video-LLMs

WHAM: World and Human Action Modelling in a Modern Xbox Game

Pretrained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control

ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages

Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning?

La-MAML: Look-Ahead Meta-Learning for Continual Learning

Probabilistic Object Detection: Strengths, Weaknesses, and Opportunities

Unifying Variational Inference and PAC-Bayes for Supervised Learning that Scales

Viewpoint Invariant Junction Recognition using Deep Network Ensembles

Geometric Consistency for Self-Supervised End-to-End Visual Odometry
