Efficient Learning and Inference

The recent successes in image and video analysis have largely been in supervised learning. Supervised methods assume that large amounts of manually annotated training data are available, which limits their applicability to complex and unseen environments. This has motivated growing interest in semi-supervised, and even unsupervised, methods for image and video analysis, i.e., methods that require little or no manually annotated data.

In our CVPR 2021 paper, we showed how to combine information from multiple source models and adapt it to an unlabeled target domain. Moreover, the method does not require the data used to train the source models, which may be unavailable (e.g., due to loss of data or privacy issues). It automatically weights the source models based on the distribution of the target data, and is the first work on multi-source adaptation without access to source data.
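To make the idea of weighting source models from unlabeled target data concrete, here is a minimal sketch. It is not the method from the paper: it assumes a simple stand-in heuristic in which a source model that is more confident (lower average prediction entropy) on the target data receives a larger weight. The function names (mean_entropy, source_weights, combine) and the temperature parameter are hypothetical and chosen only for illustration.

```python
import numpy as np

def mean_entropy(probs, eps=1e-12):
    """Average Shannon entropy (in nats) over the rows of a
    (n_samples, n_classes) array of class probabilities."""
    return float(-(probs * np.log(probs + eps)).sum(axis=1).mean())

def source_weights(source_probs, temperature=1.0):
    """Assumed heuristic: softmax over the negative mean entropy of each
    source model's predictions on the unlabeled target data."""
    scores = np.array([-mean_entropy(p) for p in source_probs]) / temperature
    scores -= scores.max()  # subtract the max for numerical stability
    w = np.exp(scores)
    return w / w.sum()

def combine(source_probs, weights):
    """Weighted average of the per-model predictions."""
    stacked = np.stack(source_probs)  # (n_models, n_samples, n_classes)
    return np.tensordot(weights, stacked, axes=1)  # (n_samples, n_classes)

# Toy target predictions from two hypothetical source models.
confident = np.array([[0.9, 0.1], [0.1, 0.9]])
uncertain = np.array([[0.5, 0.5], [0.5, 0.5]])
weights = source_weights([confident, uncertain])
combined = combine([confident, uncertain], weights)
```

In this toy case the confident model receives the larger weight, and each row of the combined prediction still sums to one. Only model outputs on target data are used, so no source data is needed at any point.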

We have also explored source-free knowledge transfer across modalities. In our ECCV 2022 paper, we showed how to transfer models learned on RGB source data (which is plentiful) to depth or infrared modalities in the target domain, without the need for source data or any task-relevant paired data.

Another problem in learning with limited supervision is deciding what to label. If one can identify the optimal subset to label, learning is likely to be more efficient than when a human labels randomly chosen samples. We have worked on active learning approaches for identifying this subset, relying on information-theoretic measures and exploiting the structure in the data.
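As a rough illustration of an information-theoretic selection criterion, the sketch below implements plain entropy-based uncertainty sampling: the samples whose predicted class distribution has the highest entropy are sent to the annotator. This is a generic baseline, assumed here for illustration only; it does not exploit structure in the data and is not the specific approach proposed in our work. The function name select_for_labeling and the budget parameter are hypothetical.

```python
import numpy as np

def select_for_labeling(probs, budget, eps=1e-12):
    """Return the indices of the `budget` samples with the highest
    predictive entropy, most uncertain first.

    probs: (n_samples, n_classes) array of predicted class probabilities.
    """
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    order = np.argsort(-entropy, kind="stable")  # descending entropy
    return order[:budget]

# Toy predictions for three unlabeled samples.
probs = np.array([
    [0.99, 0.01],  # confident prediction, low entropy
    [0.50, 0.50],  # maximally uncertain
    [0.70, 0.30],  # moderately uncertain
])
chosen = select_for_labeling(probs, budget=2)
```

With a budget of two, the sketch selects the maximally uncertain sample first and the moderately uncertain one second, and leaves the confident prediction unlabeled.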

We have applied learning with limited supervision to a number of problems, including person re-identification, video enhancement, and video retrieval.

This work has been supported by NSF and ONR.