TY - GEN
T1 - Fusing visual and range imaging for object class recognition
AU - Bar-Hillel, Aharon
AU - Hanukaev, Dmitri
AU - Levi, Dan
PY - 2011/12/1
Y1 - 2011/12/1
AB - Category-level object recognition has improved significantly in the last few years, but machine performance remains unsatisfactory for most real-world applications. We believe this gap may be bridged using additional depth information obtained from range imaging, which was recently used to overcome similar problems in body shape interpretation. This paper presents a system which successfully fuses visual and range imaging for object category classification. We explore fusion at multiple levels: using depth as an attention mechanism, high-level fusion at the classifier level, and low-level fusion of local descriptors, and show that each mechanism makes a unique contribution to performance. For low-level fusion we present a new algorithm for training of local descriptors, the Generalized Image Feature Transform (GIFT), which generalizes current representations such as SIFT and spatial pyramids and allows for the creation of new representations based on multiple channels of information. We show that our system improves on state-of-the-art visual-only and depth-only methods on a diverse dataset of everyday objects.
UR - http://www.scopus.com/inward/record.url?scp=84856645388&partnerID=8YFLogxK
U2 - 10.1109/ICCV.2011.6126226
DO - 10.1109/ICCV.2011.6126226
M3 - Conference contribution
AN - SCOPUS:84856645388
SN - 9781457711015
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 65
EP - 72
BT - 2011 International Conference on Computer Vision, ICCV 2011
T2 - 2011 IEEE International Conference on Computer Vision, ICCV 2011
Y2 - 6 November 2011 through 13 November 2011
ER -