Hi! I am a second-year MS student in the CVIT group at IIIT Hyderabad, advised by Prof. C. V. Jawahar and Prof. Makarand Tapaswi. I work on multimodal learning (jointly learning from the vision and language modalities).

Before this, I was an engineer at Mercedes-Benz Research & Development India.

I am broadly interested in problems in computer vision, natural language processing, and multimodal representation learning (especially with self-supervision).

CV / Google Scholar / GitHub / LinkedIn

News


May, 2024 : Submitted two papers to NeurIPS 2024.

April, 2024 : Serving as a reviewer for ECCV 2024; reviewed four papers, including one emergency review. Also submitted a paper to ECCV 2024.

December, 2023 : Excited to announce FigCLIP, our new work on improving fine-grained understanding in CLIP.

January, 2023 : One paper accepted at WACV 2023: Unsupervised Audio-Visual Lecture Segmentation.

August, 2022 : Joined IIIT Hyderabad as a full-time MS by Research student at CVIT, jointly advised by Prof. C. V. Jawahar and Prof. Makarand Tapaswi.

See all news

Publications


FigCLIP: Fine-Grained CLIP Adaptation via Densely Annotated Videos

Addressed the lack of fine-grained and syntactic information in CLIP's representations by adapting CLIP on holistic, multidimensional, densely annotated video-text data, using a lightweight adaptation strategy based on LoRA adapters.

Darshan Singh S, Zeeshan Khan, Makarand Tapaswi

Paper (under review)

Unsupervised Audio Visual Lecture Segmentation

Proposed an approach to video lecture segmentation that splits lectures into bite-sized topics. We first learn lecture-clip representations that leverage visual, textual, and OCR cues, using a self-supervised pretext task of matching lecture narrations with temporally aligned visual content. We then use these learned representations to temporally segment the lectures with the TW-FINCH clustering algorithm. We also introduced AVLectures, a large-scale dataset of 86 courses with over 2,350 lectures covering various STEM subjects from MIT OpenCourseWare, which we used for pre-training, fine-tuning, and evaluating segmentation performance.

Darshan Singh S, Anchit Gupta, C. V. Jawahar, and Makarand Tapaswi

Winter Conference on Applications of Computer Vision (WACV), 2023

Paper / Code (GitHub)


See all publications