Abstract
Real-time facial performance capture has recently been gaining popularity in virtual film production, driven by advances in machine learning, which allows for fast inference of facial geometry from video streams. These learning-based approaches are significantly influenced by the quality and amount of labelled training data. Tedious construction of training sets from real imagery can be replaced by rendering a facial animation rig under on-set conditions expected at runtime. We learn a synthetic actor-specific prior by adapting a state-of-the-art facial tracking method. Synthetic training significantly reduces the capture and annotation burden and in theory allows generation of an arbitrary amount of data. But practical realities such as training time and compute resources still limit the size of any training set. We construct better and smaller training sets by investigating which facial image appearances are crucial for tracking accuracy, covering the dimensions of expression, viewpoint, and illumination. A reduction of training data in 1-2 orders of magnitude is demonstrated whilst tracking accuracy is retained for challenging on-set footage.
Additional Content
Copyright Notice
The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author’s copyright. These works may not be reposted without the explicit permission of the copyright holder.