Presentation
Transcription
Presentation
Total Cluster: A person agnostic clustering method for broadcast videos Makarand Tapaswi, Omkar M. Parkhi, Esa Rahtu, Eric Sommerlade, Rainer Stiefelhagen, Andrew Zisserman Face track clustering Input Output Video with face tracks Face track clusters 2 Why cluster face tracks? • Large number of videos • Good basis for manual or automatic labelling • Facilitates video retrieval or summarization 3 Overview • Editing structure in videos • Shots, threads and scenes • Using editing structure for clustering • Negative pairs • Scene level clustering • Episode level clustering • Dataset and evaluation • Scrubs, Buffy • Weighted clustering purity • Clustering results T1 T2 T3 T4 Related work Person Identification in TV series (Everingham et al., 2006) (Cour et al., 2009) (Bojanowski et al., 2013) (Cinbis et al., 2011) (Khoury et al., 2013) (Ramanathan et al., 2014) Face Clustering (Ramanan et al., 2007) (Wu et al., 2013, 2013b) Novelty of the current work • Error free clustering • Utilize video-editing structure • Use state-of-the-art face track descriptor 5 Overview • Editing structure in videos • Shots, threads and scenes • Using editing structure for clustering • Negative pairs • Scene level clustering • Episode level clustering • Dataset and evaluation • Scrubs, Buffy • Weighted clustering purity • Clustering results T1 T2 T3 T4 Video editing overview Episode Scenes Threads … Shots 7 Shots and Threads shot thread (n.)(n.) A sequence set of shots oftaken uninterrupted from the frames same camera for a short with the period same ofviewing time angle and content A B A C D C Scrubs S01 E01 D … 8 Scenes scene (n.) A sequence of shots in a video with the same characters, at the same location and time Scene 4 Scene 5 Scene 6 Scene boundary detection (Tapaswi et al. 2014) Maximize within-scene visual similarity Don’t break shot threads 9 Overview • Editing structure in videos • Shots, threads and scenes • Using editing structure for clustering • • • • Face tracking Negative pairs Scene level clustering Episode level clustering • Dataset and evaluation • Scrubs, Buffy • Weighted clustering purity • Clustering results T1 T2 T3 T4 Face tracks Face tracking Face track descriptor (Parkhi et al. 2014) • • • • frontal+profile VJ detector multi-pose head detector KLT tracking false positive removal • dense SIFT • Fisher encoding • discriminative dimensionality reduction 11 Negative constraints Tracks in the same frame (Cinbis et al. 2011, Wu et al. 2013) do NOT merge these pairs! Tracks in a thread A B A C B 12 Overview • Editing structure in videos • Shots, threads and scenes • Using editing structure for clustering • Negative pairs • Scene level clustering • Episode level clustering • Dataset and evaluation • Scrubs, Buffy • Weighted clustering purity • Clustering results T1 T2 T3 T4 for tracks not in negative pairs Within scene clustering Strong face similarity In a thread, • area overlap • relaxed face similarity T1 T2 T2 T1 128D T4 Dr. Kelso T3 Turk 14 Overview • Editing structure in videos • Shots, threads and scenes • Using editing structure for clustering • Negative pairs • Scene level clustering • Episode level clustering • Dataset and evaluation • Scrubs, Buffy • Weighted clustering purity • Clustering results T1 T2 T3 T4 Cluster descriptor compare frontal vs. frontal, profile vs. profile Frontal Frontal descriptor Mixed Both descriptors Profile Profile descriptor 16 Full episode clustering Exemplar SVMs Cluster across scenes • one positive cluster • against negative clusters + YouTube Faces • score clusters with e-SVM • merge high-scoring 17 Recap A A B B • Video editing structure • Do not merge track pairs • Scene level clustering T2 T1 Turk • Episode level clustering 18 Overview • Editing structure in videos • Shots, threads and scenes • Using editing structure for clustering • Negative pairs • Scene level clustering • Episode level clustering • Dataset and evaluation • Scrubs, Buffy • Weighted clustering purity • Clustering results T1 T2 T3 T4 Data set • episodes 1..5, 23 • ~20 minutes each • sitcom, interns at a hospital average episode # named characters # tracks # tracks in thread in shots # do not merge pairs in threads • episodes 1..6 • ~40 minutes each • supernatural drama, action SCRUBS BUFFY 16 412 280 214 851 15 756 515 440 1891 20 Evaluation metrics Goal: minimize #clusters, don’t make errors! WCP: weighted clustering purity • Cluster 1: Purity p1 = 3/3 • Cluster 2: Purity p2 = 4/5 1 • WCP: 𝑐 𝑛𝑐 𝑝𝑐 = 0.875 Cluster 1 Purity: 1 𝑁 Trade-off WCP vs. #clusters Cluster 2 Purity: 0.8 Purity 1 # clusters 21 Within scene clustering SCRUBS Tracks Best HAC Ours #Clusters #Clusters 450 350 250 BUFFY 950 850 750 650 550 450 150 SCR-1 SCR-2 SCR-3 SCR-4 SCR-5 Lower number of clusters @ purity = 1 BF-1 BF-2 BF-3 BF-5 BF-6 Prevent errors 22 Full episode clustering SCRUBS 500 Best HAC Ours 400 #Clusters #Clusters Tracks 1000 300 200 100 BUFFY 850 700 550 400 SCR-1 SCR-2 SCR-3 SCR-4 SCR-5 BF-1 BF-2 BF-3 BF-5 BF-6 Full episode clustering • hard to choose threshold for HAC • proposed method reduces #clusters @ purity → 1 23 Summary Purity 1 • Minimize #clusters at very high purity • Affords manual and automatic annotation A # clusters B A • Leverage video editing structure • Face tracks are not independent data points • 4x number of do not merge track pairs • Stage-wise clustering • Small number of characters in one scene • At purity 1, #clusters is about half #tracks 24 Thank you! Questions? Intro, Related work, Video-editing, Negative pairs Scene clustering, Episode clustering, Dataset, Evaluation
Similar documents
Complexity-Adaptive Distance Metric for Object Proposals Generation
First we evaluate the performance on PASCAL VOC 2012 dataset. Figure 1 shows the recall versus IoU threshold curves with candidates number varying from 500-3000. The quantitative results of corresp...
More information