Connecting Multi-modal Contrastive Representations
Zehan Wang¹, Yang Zhao², Xize Chen¹, Haifeng Huang¹, Jiageng Liu¹, Li Tang¹, Linjun Li¹, Yongqi Wang¹, Aoxiong Yin¹, Ziang Zhang¹, Zhou Zhao¹,³
¹Zhejiang University, ²ByteDance, ³Shanghai AI Laboratory
[paper]
[github]
Comparisons
More Examples
Audio to Image Retrieval
Image to Audio Retrieval
Using audio to retrieve images
Clock
Diving
Beach
Seagull
Shooting
Piano
Fire
Alarm
Bell
Singing Kid
Plane Engine
Folk Singing
Lecture
Marching Band
Racing Car
Excavator
Dogs
Children's Chorus
Church
Sheep and Goose