Connecting Multi-modal Contrastive Representations
Zehan Wang¹, Yang Zhao², Xize Chen¹, Haifeng Huang¹, Jiageng Liu¹, Li Tang¹, Linjun Li¹, Yongqi Wang¹, Aoxiong Yin¹, Ziang Zhang¹, Zhou Zhao¹,³
¹ Zhejiang University
² ByteDance
³ Shanghai AI Laboratory
[paper]
[github]
Comparisons
More Examples
Audio to Image Retrieval
Image to Audio Retrieval
Using an image to retrieve audio
Volleyball Game
Piano
Fish
Man
Marching Band
Guitar
Shooting
Bell
Chorus
Duck
Swimming
Fireworks
Forklift
Band
Cutting
Horse and Motorbike
Two Men
Concert Party
Singing
Cooking