2014 Michalski, V., Memisevic, R., Konda, K.
Modeling Deep Temporal Dependencies with Recurrent “Grammar Cells”
Neural Information Processing Systems (NIPS 2014)
[pdf][supplementary][bibtex]
Abstract
We propose modeling time series by representing the transformations that take a frame at time t to a frame at time t+1. To this end we show how a bi-linear model of transformations, such as a gated autoencoder, can be turned into a recurrent network, by training it to predict future frames from the current one and the inferred transformation using backprop-through-time. We also show how stacking multiple layers of gating units in a recurrent pyramid makes it possible to represent the "syntax" of complicated time series, and that it can outperform standard recurrent neural networks in terms of prediction accuracy on a variety of tasks.
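To make the one-step building block concrete, here is a minimal numpy sketch of a factored gated autoencoder used as a predictor. All names, dimensions, and initializations are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Illustrative sizes: D pixels per frame, F factors, M mapping units.
D, F, M = 64, 32, 16
rng = np.random.RandomState(0)
U = rng.randn(F, D) * 0.1   # filters applied to the earlier frame
V = rng.randn(F, D) * 0.1   # filters applied to the later frame
W = rng.randn(M, F) * 0.1   # pools products of filter responses

def infer_mapping(x_prev, x_curr):
    # Mapping units encode the transformation from x_prev to x_curr
    # via a multiplicative (bilinear) interaction of filter responses.
    return sigmoid(W.dot(U.dot(x_prev) * V.dot(x_curr)))

def predict_next(x_curr, m):
    # Re-apply the inferred transformation to the current frame to
    # extrapolate one step into the future.
    return V.T.dot(U.dot(x_curr) * W.T.dot(m))

x0, x1 = rng.randn(D), rng.randn(D)
m = infer_mapping(x0, x1)      # "how did the frame change?"
x2_hat = predict_next(x1, m)   # assume the change persists
```

Training such a predictor amounts to minimizing the squared prediction error over unrolled sequences with backprop-through-time; the pyramid in the paper stacks the same construction, with higher layers modeling how the mapping units themselves change over time.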
Supplementary Material
Bottom-layer PGP filter pairs
Filter pairs (left/right input receptive fields) of PGP models trained on the accelerated transformation data sets introduced in our paper, the bouncing balls data set [1] and NORBvideos [2]:
[Filter-pair visualizations for: Accelerated Rotations, Accelerated Shifts, Bouncing Balls, NORBvideos]
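Filter pairs like the ones above can be rendered by reshaping corresponding rows of the two input filter matrices into image patches and plotting them side by side. A minimal matplotlib sketch with stand-in random filters (the function and names are ours, not from the released code):

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in random filters; in practice these are the learned input
# filter matrices (one row per factor, one column per pixel).
rng = np.random.RandomState(0)
U, V = rng.randn(32, 64), rng.randn(32, 64)

def show_filter_pairs(U, V, patch_size, n_pairs=8):
    # Row i of U (left input) and row i of V (right input) form one
    # pair; plotting them side by side reveals the relative shift or
    # rotation that the factor encodes.
    fig, axes = plt.subplots(n_pairs, 2, figsize=(3, 1.5 * n_pairs))
    for i in range(n_pairs):
        for j, filters in enumerate((U, V)):
            axes[i, j].imshow(filters[i].reshape(patch_size, patch_size),
                              cmap="gray")
            axes[i, j].axis("off")
    axes[0, 0].set_title("left")
    axes[0, 1].set_title("right")
    plt.show()

show_filter_pairs(U, V, patch_size=8)
```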
Generated Sequences
Bouncing Balls
Some sequences generated by a three-layer PGP on the bouncing balls data set (the data was generated with the script released with [1]); the first 4 frames are seeded, the remaining frames are generated by the model:
Some shorter predicted sequences (right) together with ground truth (left) from preliminary experiments with a two-layer PGP (the 3 seed frames are not shown):
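The seeded-generation loop can be sketched by reusing `infer_mapping` and `predict_next` from the one-layer example above. This is a simplification: the actual multi-layer PGP also carries higher-level mapping units forward, which this sketch omits:

```python
def generate(seed_frames, n_steps):
    # Closed-loop generation: after the seed, each new frame is produced
    # by the model and fed back in as input for the next step.
    frames = list(seed_frames)
    for _ in range(n_steps):
        m = infer_mapping(frames[-2], frames[-1])  # transformation estimate
        frames.append(predict_next(frames[-1], m))
    return frames

# Seed with 4 frames (here random; in the experiment, data frames),
# then let the model generate the rest of the sequence on its own.
sequence = generate([rng.randn(D) for _ in range(4)], n_steps=20)
```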
Chirps
Additional chirp predictions that did not fit in the paper because of space restrictions. After seeing five windows of 10 frames each (frames 1-50), the models predicted the remaining sequence. The comparison models are a Conditional Restricted Boltzmann Machine (CRBM) [3] trained with contrastive divergence and a vanilla RNN trained with backpropagation through time.
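The input construction for this experiment is straightforward to approximate. A sketch with made-up chirp parameters (the paper's exact signal is not reproduced here):

```python
import numpy as np

# A chirp: a sinusoid whose frequency increases over time.
t = np.linspace(0.0, 1.0, 500)
signal = np.sin(2 * np.pi * (2.0 + 18.0 * t) * t)  # frequency sweeps upward

# Cut the signal into non-overlapping windows of 10 frames each.
# The models see the first 5 windows (frames 1-50) and must predict
# the remaining windows.
windows = signal.reshape(-1, 10)   # shape (50, 10)
seed, target = windows[:5], windows[5:]
```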
References
[1] I. Sutskever, G. E. Hinton, and G. W. Taylor. The recurrent temporal restricted Boltzmann machine. In Advances in Neural Information Processing Systems 21, pages 1601–1608, 2008.
[2] R. Memisevic and G. Exarchakis. Learning invariant features by harnessing the aperture problem. In Proceedings of the 30th International Conference on Machine Learning, 2013.
[3] G. W. Taylor, G. E. Hinton, and S. T. Roweis. Modeling human motion using binary latent variables. In Advances in Neural Information Processing Systems 20, pages 1345–1352, 2007.