We propose a weakly-supervised approach that takes image-sentence pairs as input and learns to visually ground (i.e., localize) arbitrary linguistic phrases, in the form of spatial attention masks.
In this paper, we study non-linear tensor factorization methods based on deep variational autoencoders.
We present a robust, unbiased technique for intelligent light-path construction in path-tracing algorithms.
Our experiments illustrate the merits of the proposed approach in challenging re-identification scenarios including crowded public spaces.
We propose an end-to-end solution for presenting movie quality animated graphics to the user while still allowing the sense of presence afforded by free viewpoint head motion.
We propose the concept of blendmaterials to give artists an intuitive means to account for changing material properties due to muscle activation.
We present a real-time multi-view facial capture system facilitated by synthetic training imagery.
We propose the first system for live dynamic augmentation of human faces.
We present the first method for capturing dynamic hair and automatically determining the physical properties for simulating the observed hairstyle in motion.
We propose a predictor that is based on a number of category specific features ( e.g., sample size, entropy, etc.) for whether independent or joint composite detector may be more accurate for a given conjunction.
Page 7 of 32