Weakly-Supervised Visual Grounding of Phrases with Linguistic Structures
We propose a weakly-supervised approach that takes image-sentence pairs as input and learns to visually ground (i.e., localize) arbitrary linguistic phrases, in the form of spatial attention masks.
Factorized Variational Autoencoders for Modeling Audience Reactions to Movies
In this paper, we study non-linear tensor factorization methods based on deep variational autoencoders.
Practical Path Guiding for Efficient Light-Transport Simulation
We present a robust, unbiased technique for intelligent light-path construction in path-tracing algorithms.
Groups Re-identification with Temporal Context
Our experiments illustrate the merits of the proposed approach in challenging re-identification scenarios including crowded public spaces.
Real-time Rendering with Compressed Animated Light Fields
We propose an end-to-end solution for presenting movie-quality animated graphics to the user while still allowing the sense of presence afforded by free-viewpoint head motion.
Enriching Facial Blendshape Rigs with Physical Simulation
We propose the concept of blendmaterials to give artists an intuitive means to account for changing material properties due to muscle activation.
Real-Time Multi-View Facial Capture with Synthetic Training
We present a real-time multi-view facial capture system facilitated by synthetic training imagery.
Makeup Lamps: Live Augmentation of Human Faces via Projection
We propose the first system for live dynamic augmentation of human faces.
Simulation-Ready Hair Capture
We present the first method for capturing dynamic hair and automatically determining the physical properties for simulating the observed hairstyle in motion.
Learn How to Choose: Independent Detectors versus Composite Visual Phrases
We propose a predictor, based on a number of category-specific features (e.g., sample size, entropy), of whether an independent or a joint composite detector is likely to be more accurate for a given conjunction.