
Aadarsh Sahoo — Caltech Ph.D. Student

Aadarsh Sahoo

I am a Ph.D. student at Caltech advised by Prof. Georgia Gkioxari and Prof. Pietro Perona.

Prior to this, I was a student researcher at the MIT-IBM Watson AI Lab, where I pretrained large language models and worked on vision-language models with Dr. Rameswar Panda, Dr. Rogerio Feris, and Prof. Yoon Kim. I completed my dual degree (B.Tech + M.Tech) at IIT Kharagpur, where I worked as an undergraduate researcher at the Computer Vision and Intelligence Research Lab under the supervision of Prof. Abir Das.

In the summer of 2021, I was fortunate to work as a research intern on the DARPA LwLL project under the guidance of Prof. Kate Saenko (Boston University) and Prof. Trevor Darrell (UC Berkeley).

Email  /  Google Scholar  /  Resume  /  LinkedIn  /  Github

News
  • Preprint “Aligning Text, Images, and 3D Structure Token-by-Token (Kyvo)” released on arXiv (project page).
  • “XPL: A Cross-Model framework for Semi-Supervised Prompt Learning in VLMs” accepted by TMLR.
  • Paper on Anytime Domain Adaptation accepted at ICLR 2023.
  • Select, Label, and Mix (SLM) received the Best Paper Honorable Mention Award at WACV 2023!
  • Paper on Partial Domain Adaptation accepted at WACV 2023.
  • Started working as a student researcher at the MIT-IBM Watson AI Lab in Cambridge, MA.
  • Extended Abstract on Partial Domain Adaptation accepted at NeurIPS DistShift Workshop 2021.
  • Paper on Domain Adaptation in Action Recognition accepted at NeurIPS 2021.
  • Volunteered at the workshop on Dynamic Neural Networks Meets Computer Vision (DNetCV) at CVPR 2021.
  • Started my internship at UC Berkeley and Boston University.
  • One paper accepted at the ECCV 2020 Workshop on Imbalance Problems in Computer Vision (IPCV).
  • Joined CVIR, IIT Kharagpur as an undergraduate researcher in Computer Vision.
  • Got Computer Science and Engineering as my major (less than 1% acceptance rate).
  • Got accepted into IIT Kharagpur for my undergraduate studies through JEE Advanced 2017.
Research

My research interests lie in understanding the principles of learning from multiple modalities and exploring how knowledge from one modality can be transferred to applications in others, with the goal of designing embodied multimodal agents that benefit humanity. Answering questions like "Do toddlers use the same principles in learning new languages as they do in learning how to walk?" should be fun!

Publications
Aligning Text, Images, and 3D Structure Token-by-Token (Kyvo)
Aadarsh Sahoo, Vansh Tibrewal, Georgia Gkioxari
arXiv, 2025.
project page / arXiv / code / bibtex

We present a unified LLM that aligns language, images, and structured 3D scenes and demonstrate it across rendering, recognition, instruction-following, and 3D QA.

XPL: A Cross-Model framework for Semi-Supervised Prompt Learning in Vision-Language Models
Omprakash Chakraborty, Aadarsh Sahoo, Rameswar Panda, Abir Das
Transactions on Machine Learning Research (TMLR), 2024.
paper / TMLR

A semi-supervised prompt learning framework that leverages unlabeled data to improve VLM adaptation via cross-model consistency.

AnyDA: Anytime Domain Adaptation
Omprakash Chakraborty, Aadarsh Sahoo, Rameswar Panda, Abir Das
11th International Conference on Learning Representations (ICLR), 2023.
project page / code

We introduce a novel approach for anytime domain adaptation that performs domain alignment with switchable depth, width, and input resolution, achieving accuracy-efficiency trade-offs in the target domain under different resource constraints.

Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing
Aadarsh Sahoo, Rutav Shah, Rameswar Panda, Kate Saenko, Abir Das
35th Conference on Neural Information Processing Systems (NeurIPS), 2021.
project page / poster / video presentation / slides / code

We introduce a novel temporal contrastive learning approach for unsupervised video domain adaptation, achieved by jointly leveraging video speed, background mixing, and target pseudo-labels.

Select, Label, and Mix: Learning Discriminative Invariant Feature Representations for Partial Domain Adaptation
Aadarsh Sahoo, Rameswar Panda, Rogerio Feris, Kate Saenko, Abir Das
NeurIPS DistShift Workshop (NeurIPS-W), 2021.
Winter Conference on Applications of Computer Vision (WACV), 2023.
(Best Paper Honorable Mention).
project page / poster / video presentation / slides / code

We develop a novel 'Select, Label, and Mix' (SLM) framework that aims to learn discriminative invariant feature representations for partial domain adaptation.

Mitigating Dataset Imbalance via Joint Generation and Classification
Aadarsh Sahoo, Ankit Singh, Rameswar Panda, Rogerio Feris, Abir Das
ECCV Workshop on Imbalance Problems in Computer Vision (ECCV-W), 2020 (Oral).
project page / code / live talk

We introduce a joint dataset repairment strategy that combines a classifier with a GAN, which generates additional examples to make up for the deficit of training examples from the minority class.

Services

Webpage template courtesy: Jon Barron