
Two Papers in ECCV 2022

Video Computing Group members have two papers in ECCV 2022. The first, a collaboration with MERL accepted as an oral, is on cross-modal knowledge transfer; the second is on temporally localizing video moments based on text queries.
  1. Cost-effective depth and infrared sensors are now practical alternatives to the usual RGB sensors and offer advantages over RGB in domains like autonomous navigation and remote sensing. Building computer vision and deep learning systems for depth and infrared data is therefore crucial, but large labeled datasets for these modalities are still lacking. In such cases, transferring knowledge from a neural network trained on a large, well-labeled dataset in the source modality (RGB) to a neural network that operates on a target modality (depth, infrared, etc.) is of great value. Since the source data may be inaccessible for reasons such as memory and privacy, knowledge transfer has to work with only the source models. We describe a practical solution, SOCKET: SOurce-free Cross-modal KnowledgE Transfer, for this challenging task of transferring knowledge from a source modality to a different target modality without access to task-relevant source data (an illustrative sketch of this setting appears after the list below).

    Title: Cross-Modal Knowledge Transfer Without Task-Relevant Source Data

    Sk M. Ahmed, S. Lohit, K.-C. Peng, M. J. Jones, and A. Roy-Chowdhury, European Conference on Computer Vision (ECCV), 2022

  2. Although recent works on text-based localization of moments have shown high accuracy, these approaches are trained and evaluated under the assumption that, during testing, the localization system will only encounter events available in the training set (i.e., seen events). However, acquiring videos and text covering all possible scenarios for training is not practical. Our work therefore introduces and tackles the problem of text-based temporal localization of novel/unseen events: the goal is to temporally localize video moments based on text queries, where neither the video moments nor the text queries are observed during training. Towards solving this problem, the inference task is formulated as a relational prediction problem, hypothesizing a conceptual relation between semantically relevant moments (a generic localization sketch appears after the list below).

    Title: Text-based Temporal Localization of Novel Events

    S. Paul, N. C. Mithun, A. Roy-Chowdhury, and M. S. Asif, European Conference on Computer Vision (ECCV), 2022
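
To make the source-free setting in the first paper concrete, here is a minimal sketch, not the SOCKET method itself: a frozen source (RGB) classifier provides the only supervision, and a copy of it is adapted on unlabeled target-modality (e.g., depth) images. The adaptation losses used here, entropy minimization plus matching the source model's stored batch-norm statistics, and all model choices (ResNet-18, 10 classes, depth replicated to three channels) are illustrative assumptions, not the paper's recipe.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

# Frozen source model trained on RGB; its weights are available but the
# RGB training data is not. (ResNet-18 / 10 classes are placeholder choices.)
source_model = torchvision.models.resnet18(num_classes=10)
source_model.eval()
for p in source_model.parameters():
    p.requires_grad_(False)

# The target model starts as a copy of the source model and is adapted on
# unlabeled target-modality images.
target_model = copy.deepcopy(source_model)
target_model.train()

# Forward hooks record how far each batch-norm input drifts from the source
# model's running statistics, a stand-in for the inaccessible source data.
bn_penalties = []

def make_hook(src_bn):
    def hook(module, inputs, output):
        x = inputs[0]
        mean = x.mean(dim=(0, 2, 3))
        var = x.var(dim=(0, 2, 3), unbiased=False)
        bn_penalties.append(
            F.mse_loss(mean, src_bn.running_mean)
            + F.mse_loss(var, src_bn.running_var)
        )
    return hook

for tgt_m, src_m in zip(target_model.modules(), source_model.modules()):
    if isinstance(tgt_m, nn.BatchNorm2d):
        tgt_m.register_forward_hook(make_hook(src_m))

optimizer = torch.optim.SGD(target_model.parameters(), lr=1e-3)

# One adaptation step on a batch of unlabeled target-modality images; random
# tensors stand in for a real depth/infrared loader (depth replicated to
# three channels so the RGB-trained architecture accepts it).
depth_batch = torch.randn(8, 3, 224, 224)
bn_penalties.clear()
logits = target_model(depth_batch)
probs = F.softmax(logits, dim=1)
entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
loss = entropy + 0.1 * sum(bn_penalties)  # 0.1 is an arbitrary weight
optimizer.zero_grad()
loss.backward()
optimizer.step()
```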
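
Similarly, for the second paper, here is a minimal sketch of text-based temporal localization framed as cross-modal matching, not the paper's relational-prediction model: the query and every candidate temporal window are embedded into a shared space, and the highest-scoring window is returned. The feature dimensions, the mean-pooled moment encoder, and the sliding-window candidates are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MomentLocalizer(nn.Module):
    """Embed a text query and candidate video moments in a shared space
    and score them by cosine similarity (all sizes are placeholders)."""

    def __init__(self, video_dim=512, text_dim=300, joint_dim=256):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, joint_dim)
        self.text_proj = nn.Linear(text_dim, joint_dim)

    def forward(self, clip_feats, query_feat, window_sizes=(2, 4, 8)):
        """clip_feats: (T, video_dim) per-clip features for one video.
        query_feat: (text_dim,) embedding of the sentence query.
        Returns the best-scoring (start, end) clip span and its score."""
        q = F.normalize(self.text_proj(query_feat), dim=-1)
        best_score, best_span = float("-inf"), (0, 0)
        T = clip_feats.size(0)
        for w in window_sizes:  # sliding-window moment candidates
            for s in range(T - w + 1):
                # Mean-pool the clips in the window into a moment embedding.
                m = clip_feats[s:s + w].mean(dim=0)
                m = F.normalize(self.video_proj(m), dim=-1)
                score = (q * m).sum().item()  # cosine similarity
                if score > best_score:
                    best_score, best_span = score, (s, s + w)
        return best_span, best_score

localizer = MomentLocalizer()
clip_feats = torch.randn(32, 512)  # e.g., 32 clips of pre-extracted features
query_feat = torch.randn(300)      # e.g., an averaged word embedding
with torch.no_grad():
    span, score = localizer(clip_feats, query_feat)
print(f"best moment: clips {span[0]}..{span[1]} (score {score:.3f})")
```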