Hi! I am a second-year MS student in the CVIT group at IIIT Hyderabad, advised by Prof. C. V. Jawahar and Prof. Makarand Tapaswi. I work on multimodal learning (jointly learning from the vision and language modalities).
Prior to this, I was an engineer at Mercedes-Benz Research & Development India.
I am broadly interested in problems in computer vision, natural language processing, and multimodal representation learning (especially using self-supervision).
CV / Google Scholar / GitHub / LinkedIn
May, 2024 : Submitting two exciting papers to NeurIPS 2024.
April, 2024 : Serving as a reviewer for ECCV 2024. Reviewed four papers, including one emergency review. Also submitted a paper to ECCV 2024.
December, 2023 : Excited to announce our new work on improving fine-grained understanding in CLIP: FigCLIP.
January, 2023 : One paper accepted at WACV 2023: Unsupervised Audio-Visual Lecture Segmentation.
August, 2022 : Joining IIIT Hyderabad as a full-time MS by Research student at CVIT. I will be jointly advised by Prof. C. V. Jawahar and Prof. Makarand Tapaswi.
Addressed the lack of fine-grained and syntactic information in CLIP’s representations by adapting CLIP on holistic, multidimensional, and densely annotated video-text data, using a lightweight adaptation strategy based on LoRA adapters.
Darshan Singh S, Zeeshan Khan, Makarand Tapaswi
Proposed the task of video lecture segmentation, which splits lectures into bite-sized topics. We first learn lecture-clip representations from visual, textual, and OCR cues, using a self-supervised pretext task that matches lecture narrations with temporally aligned visual content. We then use these learned representations to temporally segment the lectures with the TW-FINCH algorithm. We also introduced AVLectures, a large-scale dataset of 86 courses with over 2,350 lectures covering various STEM subjects from MIT OpenCourseWare, which we used for pre-training, fine-tuning, and evaluating segmentation performance.
Darshan Singh S, Anchit Gupta, C. V. Jawahar, Makarand Tapaswi
Winter Conference on Applications of Computer Vision (WACV), 2023