Connecting Multi-modal Contrastive Representations
Zehan Wang¹, Yang Zhao², Xize Chen¹, Haifeng Huang¹, Jiageng Liu¹, Li Tang¹, Linjun Li¹, Yongqi Wang¹, Aoxiong Yin¹, Ziang Zhang¹, Zhou Zhao¹,³
¹Zhejiang University  ²ByteDance  ³Shanghai AI Laboratory
Comparisons
More Examples
Demo tasks:
Audio to Image Retrieval
Image to Audio Retrieval
Audio-Visual Source Localization
Demo audio clips:
Bell
Sewing Machine
Racing Car
Cartoon Truck
Tractor
Air Blower
Cello
Dog
Scratch
Cat
Bird
Popcorn
Saxophone
Chorus
Explosion
Cello
Puppy
Bird
Cartoon Sheep
Bird
Electronic Organ