The use of deep generative models for image compression has led to impressive performance gains over classical codecs, whereas neural video compression is still in its infancy. Here, we propose an end-to-end, deep generative modeling approach to compressing temporal sequences, with a focus on video. Our approach builds upon variational autoencoder (VAE) models for sequential data and combines them with recent work on neural image compression.
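The core idea of sequential compression (transmit only what the previous reconstruction cannot predict) can be illustrated with a toy sketch. This is not the paper's VAE model: it replaces learned latents with the frames themselves and uses a plain temporal predictor plus uniform quantization, just to show that coding residuals against the decoder-side reconstruction keeps the error bounded by half the quantization step.

```python
import numpy as np

def compress_sequence(frames, step=0.1):
    """Toy temporal predictive coder: transmit quantized residuals
    between each frame and the decoder-side reconstruction of the
    previous one (a stand-in for sequential latent coding)."""
    codes, recon = [], []
    prev = np.zeros_like(frames[0])
    for f in frames:
        residual = f - prev                        # temporal prediction error
        q = np.round(residual / step).astype(int)  # uniform quantization -> symbols
        prev = prev + q * step                     # what the decoder reconstructs
        codes.append(q)
        recon.append(prev.copy())
    return codes, recon

# Slowly drifting synthetic "video": consecutive frames are highly correlated,
# so the residual symbols are small and cheap to entropy-code.
rng = np.random.default_rng(0)
frames = list(np.cumsum(rng.normal(0, 0.05, size=(5, 4, 4)), axis=0))
codes, recon = compress_sequence(frames, step=0.1)
max_err = max(np.abs(r - f).max() for r, f in zip(recon, frames))
```

Because each residual is computed against the decoder-side reconstruction, quantization error does not accumulate across frames; `max_err` stays at or below `step / 2`.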
We present a light field video synthesis technique that achieves accurate reconstruction from a low-cost, wide-baseline camera rig. Our system, called INDiuM, integrates optical flow with methods for rectification, disparity estimation, and feature extraction, which we then feed to a neural view synthesis solver with wide-baseline capability. A new bi-directional warping approach resolves reprojection ambiguities that would result from either backward or forward warping alone. The system and method enable the use of off-the-shelf surveillance camera hardware in a simplified and expedited capture workflow. We provide a thorough analysis of the refinement process and of the resulting view synthesis accuracy relative to the state of the art.
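The complementarity of the two warping directions can be shown with a minimal 1-D sketch (integer disparities, no sub-pixel interpolation; this is an illustration of the general idea, not the INDiuM pipeline). Forward warping splats source pixels and leaves holes where nothing lands; backward warping samples the source at displaced locations and fails where the sample falls out of range; fusing the two fills each method's gaps with the other's values.

```python
import numpy as np

def forward_warp(img, disp):
    """Splat each source pixel to its displaced target; unwritten targets stay NaN."""
    out = np.full(len(img), np.nan)
    for x in range(len(img)):
        t = x + disp[x]
        if 0 <= t < len(img):
            out[t] = img[x]
    return out

def backward_warp(img, disp):
    """Sample the source at the displaced location; out-of-range samples stay NaN."""
    out = np.full(len(img), np.nan)
    for x in range(len(img)):
        s = x - disp[x]
        if 0 <= s < len(img):
            out[x] = img[s]
    return out

def bidirectional_warp(img, disp):
    """Fill forward-warp holes with backward-warp values."""
    img = img.astype(float)
    fwd, bwd = forward_warp(img, disp), backward_warp(img, disp)
    return np.where(np.isnan(fwd), bwd, fwd)

img = np.arange(8, dtype=float)
disp = np.array([-1, 0, 0, 0, 0, 0, 0, 1])  # pixels at the borders warp out of view
fwd = forward_warp(img, disp)
fused = bidirectional_warp(img, disp)
```

Here `fwd` has holes at both borders, while `fused` is hole-free; in the real system the same fusion logic operates on 2-D flow fields with sub-pixel interpolation and occlusion reasoning.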
Our normalized cut loss approach to segmentation brings the quality of weakly-supervised training significantly closer to fully supervised methods.
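A common soft relaxation of the normalized cut criterion, which such a loss is typically built on, can be written down compactly: for a soft segmentation matrix S (one row per pixel, one column per class) and an affinity matrix W, the loss is K minus the sum over classes of assoc(k)/degree(k). The sketch below is a generic NumPy formulation of that relaxation, not the paper's exact training loss.

```python
import numpy as np

def normalized_cut_loss(S, W):
    """Soft normalized-cut relaxation: K - sum_k (S_k^T W S_k) / (d^T S_k).
    S: (N, K) soft class assignments (rows sum to 1); W: (N, N) affinity."""
    d = W.sum(axis=1)                          # node degrees
    assoc = np.einsum('ik,ij,jk->k', S, W, S)  # within-class association per class
    deg = S.T @ d                              # soft degree mass per class
    return S.shape[1] - np.sum(assoc / (deg + 1e-12))

# A perfect hard partition of two disconnected affinity blocks gives zero loss.
W = np.block([[np.ones((3, 3)), np.zeros((3, 2))],
              [np.zeros((2, 3)), np.ones((2, 2))]])
S = np.zeros((5, 2))
S[:3, 0] = 1.0
S[3:, 1] = 1.0
loss = normalized_cut_loss(S, W)
```

Minimizing this quantity over the soft assignments pushes high-affinity pixels into the same class, which is how the loss supplements sparse (weak) supervision during training.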
We investigate the advantages of a stereo, multi-spectral acquisition system for material classification in ground-level landscape images.
This paper presents underwater 3D capture using a commercial depth camera.
Videos of multi-player team sports provide a challenging domain for dynamic scene analysis. Player actions and interactions are complex as they are driven by many factors, such as the short-term goals of the individual player, the overall team strategy, the rules of the sport, and the current context of the game.
In this paper, we address the problem of jointly summarizing large-scale Flickr images and YouTube user videos.
We address this issue by introducing a simple, low-level, patch-consistency assumption that leverages the extra information present in video data to resolve this ambiguity.