DeViSE: A deep visual-semantic embedding model

Authors

Andrea Frome, Greg S Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Marc'Aurelio Ranzato, Tomas Mikolov

Publication date

2013

Journal

Advances in neural information processing systems

Description

Modern visual recognition systems are often limited in their ability to scale to large numbers of object categories. This limitation is in part due to the increasing difficulty of acquiring sufficient training data in the form of labeled images as the number of object categories grows. One remedy is to leverage data from other sources, such as text data, both to train visual models and to constrain their predictions. In this paper we present a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data and semantic information gleaned from unannotated text. We demonstrate that this model matches state-of-the-art performance on the 1000-class ImageNet object recognition challenge while making more semantically reasonable errors, and also show that the semantic information can be exploited to make predictions about tens of thousands of image labels not observed during training. Semantic knowledge improves such zero-shot predictions by up to 65%, achieving hit rates of up to 10% across thousands of novel labels never seen by the visual model.

Total citations

Cited by 3220

Citations per year: 2013: 8; 2014: 44; 2015: 118; 2016: 172; 2017: 236; 2018: 341; 2019: 402; 2020: 452; 2021: 440; 2022: 416; 2023: 409; 2024: 137
