Connecting Multi-modal Contrastive Representations

Zehan Wang1, Yang Zhao2, Xize Cheng1, Haifeng Huang1, Jiageng Liu1, Li Tang1, Linjun Li1, Yongqi Wang1, Aoxiong Yin1, Ziang Zhang1, Zhou Zhao1,3

1Zhejiang University 2ByteDance 3Shanghai AI Laboratory

Using image to retrieve audio
Volleyball Game

Piano

Fish

Man

Marching Band

Guitar

Shooting

Bell

Chorus

Duck

Swimming

Fireworks

Forklift

Band

Cutting

Horse and Motorbike

Two Men

Concert Party

Singing

Cooking