Connecting Multi-modal Contrastive Representations

Zehan Wang1, Yang Zhao2, Xize Cheng1, Haifeng Huang1, Jiageng Liu1, Li Tang1, Linjun Li1, Yongqi Wang1, Aoxiong Yin1, Ziang Zhang1, Zhou Zhao1,3

1Zhejiang University 2ByteDance 3Shanghai AI Laboratory

[paper][github]
Using audio to retrieve images
Clock

Diving

Beach

Seagull

Shooting

Piano

Fire

Alarm

Bell

Singing Kid

Plane Engine

Fork Singing

Lecture

Marching Band

Racing Car

Excavator

Dogs

Children's Chorus

Church

Sheep and Goose