Two-stream convolutional networks for action recognition in videos

Authors

Karen Simonyan, Andrew Zisserman

Publication date

2014

Journal

Advances in neural information processing systems

Description

We investigate architectures of discriminatively trained deep Convolutional Networks (ConvNets) for action recognition in video. The challenge is to capture the complementary information on appearance from still frames and motion between frames. We also aim to generalise the best performing hand-crafted features within a data-driven learning framework. Our contribution is three-fold. First, we propose a two-stream ConvNet architecture which incorporates spatial and temporal networks. Second, we demonstrate that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data. Finally, we show that multi-task learning, applied to two different action classification datasets, can be used to increase the amount of training data and improve the performance on both. Our architecture is trained and evaluated on the standard video actions benchmarks of UCF-101 and HMDB-51, where it is competitive with the state of the art. It also exceeds by a large margin previous attempts to use deep nets for video classification.
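The description above names a spatial stream and a temporal stream but does not say how their outputs are combined. As a minimal sketch, assuming late fusion by a weighted average of each stream's softmax class scores (an assumption for illustration, not something stated in this description), the combination step could look like the following. The function names `softmax`, `fuse_streams`, and `predict`, and the `temporal_weight` parameter, are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Late fusion (assumed here): weighted average of per-stream softmax scores.

    spatial_logits  -- class scores from the still-frame (appearance) network
    temporal_logits -- class scores from the optical-flow (motion) network
    temporal_weight -- hypothetical mixing weight in [0, 1]
    """
    if len(spatial_logits) != len(temporal_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= temporal_weight <= 1.0:
        raise ValueError("temporal_weight must lie in [0, 1]")
    spatial = softmax(spatial_logits)
    temporal = softmax(temporal_logits)
    w = temporal_weight
    return [(1.0 - w) * s + w * t for s, t in zip(spatial, temporal)]


def predict(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Return the index of the class with the highest fused score."""
    fused = fuse_streams(spatial_logits, temporal_logits, temporal_weight)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Toy scores for three classes: appearance favours class 0,
    # motion strongly favours class 1.
    print(predict([2.0, 1.0, 0.0], [0.0, 3.0, 0.0]))
```

With equal weights the confident motion stream decides the toy example; setting `temporal_weight=0.0` reduces the prediction to the spatial stream alone, which makes it easy to compare each stream's contribution.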

Total citations

Cited by 9159

Citations per year: 2015: 144, 2016: 386, 2017: 719, 2018: 1051, 2019: 1265, 2020: 1326, 2021: 1380, 2022: 1164, 2023: 1175, 2024: 406
