TY - GEN
T1 - One-shot image recognition using prototypical encoders with reduced hubness
AU - Xiao, Chenxi
AU - Madapana, Naveen
AU - Wachs, Juan
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/1/1
Y1 - 2021/1/1
N2 - Humans have the innate ability to recognize new objects just by looking at sketches of them (also referred to as prototype images). Similarly, prototypical images can be used as effective visual representations of unseen classes to tackle few-shot learning (FSL) tasks. Our main goal is to recognize unseen hand signs (gestures), traffic signs, and corporate logos from their iconographic images or prototypes. Previous works proposed to utilize variational prototypical-encoders (VPE) to address FSL problems. While VPE learns an image-to-image translation task efficiently, we discovered that its performance is significantly hampered by the so-called hubness problem and that it fails to regulate the representations in the latent space. Hence, we propose a new model (VPE++) that inherently reduces hubness and incorporates contrastive and multi-task losses to increase the discriminative ability of FSL models. Results show that the VPE++ approach generalizes better to unseen classes and achieves superior accuracies on the logos, traffic signs, and hand gestures datasets compared to the state-of-the-art.
AB - Humans have the innate ability to recognize new objects just by looking at sketches of them (also referred to as prototype images). Similarly, prototypical images can be used as effective visual representations of unseen classes to tackle few-shot learning (FSL) tasks. Our main goal is to recognize unseen hand signs (gestures), traffic signs, and corporate logos from their iconographic images or prototypes. Previous works proposed to utilize variational prototypical-encoders (VPE) to address FSL problems. While VPE learns an image-to-image translation task efficiently, we discovered that its performance is significantly hampered by the so-called hubness problem and that it fails to regulate the representations in the latent space. Hence, we propose a new model (VPE++) that inherently reduces hubness and incorporates contrastive and multi-task losses to increase the discriminative ability of FSL models. Results show that the VPE++ approach generalizes better to unseen classes and achieves superior accuracies on the logos, traffic signs, and hand gestures datasets compared to the state-of-the-art.
UR - http://www.scopus.com/inward/record.url?scp=85106684131&partnerID=8YFLogxK
U2 - 10.1109/WACV48630.2021.00230
DO - 10.1109/WACV48630.2021.00230
M3 - Conference contribution
AN - SCOPUS:85106684131
T3 - Proceedings - 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021
SP - 2251
EP - 2260
BT - Proceedings - 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021
PB - Institute of Electrical and Electronics Engineers
T2 - 2021 IEEE Winter Conference on Applications of Computer Vision, WACV 2021
Y2 - 5 January 2021 through 9 January 2021
ER -