Connecting Multi-modal Contrastive Representations
Zehan Wang¹, Yang Zhao², Xize Chen¹, Haifeng Huang¹, Jiageng Liu¹, Li Tang¹, Linjun Li¹, Yongqi Wang¹, Aoxiong Yin¹, Ziang Zhang¹, Zhou Zhao¹,³
¹ Zhejiang University
² ByteDance
³ Shanghai AI Laboratory
[paper]
[github]
Comparisons
More Examples
Tasks: Audio to Image Retrieval, Image to Audio Retrieval, Audio-Visual Source Localization
Audio examples: Fireworks, Train, Female Speaker, Truck, Male Speaker, Cat, Football Game, Ducks, Recorder