In this paper, we address the problem of jointly summarizing large-scale collections of Flickr images and YouTube user videos. Starting from the intuition that the characteristics of the two media are different yet complementary, we develop a fast, easily parallelizable approach that creates not only high-quality video summaries but also a novel structural summary of online images as storyline graphs, which illustrate the various events or activities associated with a topic in the form of a branching network. In our approach, video summarization is achieved by diversity ranking on the similarity graphs between images and video frames. The reconstruction of storyline graphs is formulated as the inference of sparse time-varying directed graphs from a set of photo streams with the assistance of videos. For evaluation, we create a dataset of 20 outdoor recreational activities, consisting of 2.7M Flickr images and 16K YouTube user videos. Because of the large scale of the problem, we evaluate our algorithm via crowdsourcing on Amazon Mechanical Turk. In our experiments, we demonstrate that the proposed joint summarization approach outperforms other important baselines as well as our own methods that use videos or images only.
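To make the idea of diversity ranking on a similarity graph concrete, the following is a minimal sketch and not the procedure used in the paper, whose details the abstract does not give. It assumes a symmetric similarity matrix with entries in [0, 1], scores nodes with a PageRank-style random walk, and then greedily selects nodes while discounting candidates that resemble those already chosen. The function name `diversity_rank`, the parameters `damping` and `iters`, and the toy matrix are all hypothetical.

```python
import numpy as np

def diversity_rank(sim, k, damping=0.85, iters=100):
    """Greedily pick k central but mutually dissimilar nodes.

    sim: symmetric (n, n) similarity matrix with entries in [0, 1]
         (an assumption of this sketch, not a detail from the paper).
    """
    n = sim.shape[0]
    w = sim.astype(float).copy()
    np.fill_diagonal(w, 0.0)            # ignore self-similarity in the walk
    col_sums = w.sum(axis=0)
    col_sums[col_sums == 0.0] = 1.0     # avoid division by zero for isolated nodes
    p = w / col_sums                    # column-stochastic transition matrix

    # Centrality via power iteration of a damped random walk.
    score = np.full(n, 1.0 / n)
    for _ in range(iters):
        score = (1.0 - damping) / n + damping * (p @ score)

    penalty = np.ones(n)                # shrinks for nodes similar to chosen ones
    chosen = np.zeros(n, dtype=bool)
    selected = []
    for _ in range(min(k, n)):
        candidate = score * penalty
        candidate[chosen] = -np.inf     # never pick the same node twice
        i = int(np.argmax(candidate))
        selected.append(i)
        chosen[i] = True
        penalty *= 1.0 - sim[i]         # discount near-duplicates of node i
    return selected

# Toy graph: nodes 0-2 form one tight cluster, nodes 3-4 another.
sim = np.full((5, 5), 0.05)
sim[:3, :3] = 0.9
sim[3:, 3:] = 0.9
np.fill_diagonal(sim, 1.0)

summary = diversity_rank(sim, k=2)
print(summary)  # one node from each cluster
```

In this toy graph the first pick comes from the larger cluster because its nodes are more central, and the second comes from the other cluster because the remaining near-duplicates are heavily discounted. In a setting like the one described above, the nodes would be video frames and the similarities would be derived from image features, but that mapping is only an illustration.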