
Aligning Text, Images, and 3D Structure Token-by-Token
We present a unified LLM that aligns language, images, and structured 3D scenes, demonstrating applications in rendering, recognition, instruction following, and 3D QA.
I am a Ph.D. student at Caltech, advised by Prof. Georgia Gkioxari and Prof. Pietro Perona.
Previously, I was a student researcher at the MIT–IBM Watson AI Lab, where I worked on pretraining large language models and on vision–language models with Dr. Rameswar Panda, Dr. Rogerio Feris, and Prof. Yoon Kim. I completed a dual degree (B.Tech + M.Tech) at IIT Kharagpur, working in the Computer Vision and Intelligence Research Lab under Prof. Abir Das.
In Summer 2021, I worked with Prof. Kate Saenko (Boston University) and Prof. Trevor Darrell (UC Berkeley) as a research intern for the DARPA LwLL project.
My research interests lie in understanding the principles of learning across multiple modalities and how knowledge transfers between them, with the goal of designing embodied multimodal agents that benefit society. Questions like “Do toddlers use similar principles to learn new languages as they do to learn to walk?” excite me.
We propose a semi-supervised prompt-learning framework that leverages unlabeled data to improve vision–language model (VLM) adaptation via cross-model consistency.
We propose a domain-alignment approach with switchable depth, width, and input resolution to realize accuracy–efficiency trade-offs under different resource constraints.
We introduce a dataset repair strategy combining a classifier with a GAN to augment minority-class examples.
Email: aadarsh.sahoo.99@gmail.com
Profiles: Google Scholar · LinkedIn · GitHub · X/Twitter