group 2024 banner
header-image_0001_layer_4.jpg
header-image_0003_layer_0.jpg
header-image_0002_layer_1.jpg

The problem setting of our paper. Our goal is to utilize web images associated with noisy tags to learn a robustvisual-semantic embedding from a dataset of clean images with ground truth sentences. We test the learned latent space byprojecting images and