Transcription
Multi-View People Surveillance Using 3D Information

Davide Baltieri, Roberto Vezzani, Rita Cucchiara
Ákos Utasi, Csaba Benedek, Tamás Szirányi
{davide.baltieri,roberto.vezzani,rita.cucchiara}@unimore.it
{utasi,bcsaba,sziranyi}@sztaki.hu

Motivation
Goal: localize and track people in the scene.
Assumptions:
• Scene monitored by multiple cameras
• Cameras: calibrated + overlapping FOV

Work-flow of the System
(pipeline figure: people detection → short-term tracking → long-term tracking by re-identification)

People Detection
Extended version of [1]:
1. Multi-plane projection: the foreground masks are projected to the ground plane P_0 and to the parallel planes P_z.
2. Pixel-level feature extraction from the projected foreground masks.

Pixel-Level Features
(a sketch of these features and of their fusion follows the Experiments section)
Head feature:
  f_h^i(p) = [Area(A_i ∩ S_h^+(p)) − Area(A_i ∩ S_h^−(p))] / Area(S_h^+(p))
Closed-leg feature:
  f_cl^i(p) = [Area(A_i^0 ∩ S_cl^+(p)) − Area(A_i^0 ∩ S_cl^−(p))] / Area(S_cl^+(p))
Open-leg feature:
  f_ol^i(p) = [Area(A_i^0 ∩ S_ol^+(p)) − Area(A_i^0 ∩ S_ol^−(p))] / Area(S_ol^+(p))
Joint leg feature:
  f_l^i(p) = max(f_cl^i(p), f_ol^i(p))
Dynamic range: truncate f_l^i(p) and f_h^i(p) to [0, f̂], then normalize by f̂.
(figure legend: features of [1] vs. new feature)

Feature Fusion
The per-camera features of the N views are fused into a single evidence map:
  f(p, h) = sqrt( (1/N ∑_{i=1}^{N} f_l^i(p)) · (1/N ∑_{i=1}^{N} f_h^i(p)) )
Original method [1]: f(p, 168cm), i.e. a fixed reference height; extended version: f(p, h), using the estimated height h.
(figure: fusion of the P_0 and P_168cm projections)

3-D Marked Point Process Model
Person object: cylinder u, with constant radius R.
Optimal object configuration: minimize the energy
  Φ_D(ω) = ∑_{u∈ω} J_D(u) + γ · ∑_{u,v∈ω, u∼v} I(u, v)
• Data term: J_D(u) ∈ [−1, 1]
• Prior term: I(u, v) ∈ [0, 1]
(figure: cylinder slices at z < h, z = h, z > h)
Optimization:
• Multiple Birth-and-Death Dynamics [2]
(sketches of the energy and of the optimization loop follow the Experiments section)

Short-Term Tracking
Constant-velocity Kalman filter (a sketch follows the Experiments section):
• Unmatched detection ⇒ create a new object
• Detection assigned to an object ⇒ update the object state
• Unmatched track ⇒ Kalman prediction or deletion
• Output: segmented/broken trajectories

Long-Term Tracking
Extended version of [3]:
• Connects broken tracks by people re-identification
• A 3D body model is placed and oriented:
  – Height: from the people detection
  – Orientation: from the last K positions
• Appearance features are extracted for matching body models

Feature Extraction
1. Project each vertex to the camera image.
2. Initialize the vertex features:
• Normal vector n⃗_i: static, pre-computed
• Mean color c_i
• Local HSV histogram H_i
• Optical reliability: θ_i = n⃗_i · p⃗, i.e. front-viewed vertices are favoured
• Saliency s_i: uniqueness (e.g. a logo)
3. Vertices outside the person silhouette:
• Copy the features from the nearest vertex
• Use θ_i = 0

People Re-Identification
Find correspondences between models (a sketch of these distances follows the Experiments section):
• Vertex-to-vertex Hellinger distance:
  d(v_i^p, v_i^t) = d_H(H_i^p, H_i^t) = sqrt( 1 − ∑_{h,s,v} sqrt( H_i^p(h,s,v) · H_i^t(h,s,v) ) )
• Model-to-model distance:
  D(Γ^p, Γ^t) = [ ∑_{i=1…M} w_i · d(v_i^p, v_i^t) ] / [ ∑_{i=1…M} w_i ]
• Weights are computed from reliability and saliency:
  w_i = f(θ_i^p) · f(θ_i^t) · s_i^p,  with  s_i^p ∝ min_t d_H(H_i^p, H_i^t) + s_0

Experiments
Improved localization accuracy:
• 5% improvement of the extended method f(p, h) over the original method [1] with f(p, 168cm)
Long-term tracking performance:
• Recall: 88.8%
• Precision: 72.73%
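The pixel-level features and their fusion can be written down compactly. The following numpy sketch is a minimal illustration, assuming the binary foreground masks have already been projected onto the ground plane (for the leg features) and onto the head plane (for the head feature); the supporter regions S^+/S^- are approximated here by a disc and a surrounding ring, the closed/open leg distinction is omitted, and all names, radii and the truncation value f̂ are illustrative placeholders rather than the exact choices of [1].

```python
import numpy as np

def area_feature(mask, cx, cy, r_pos=6, r_neg=10):
    """Normalized foreground-area difference between a positive supporter
    disc S^+(p) of radius r_pos and a surrounding negative ring S^-(p) of
    outer radius r_neg, both centred at p = (cx, cy).  Approximates the
    per-camera feature f^i(p) on a projected foreground mask."""
    h, w = mask.shape
    yy, xx = np.ogrid[:h, :w]
    dist2 = (xx - cx) ** 2 + (yy - cy) ** 2
    s_pos = dist2 <= r_pos ** 2                            # S^+(p)
    s_neg = (dist2 > r_pos ** 2) & (dist2 <= r_neg ** 2)   # S^-(p)
    a_pos = np.logical_and(mask, s_pos).sum()              # Area(A ∩ S^+)
    a_neg = np.logical_and(mask, s_neg).sum()              # Area(A ∩ S^-)
    return (a_pos - a_neg) / max(s_pos.sum(), 1)

def fused_feature(head_masks, leg_masks, p, f_hat=0.25):
    """f(p, h): geometric mean of the leg and head features averaged over
    the N cameras, after truncation to [0, f_hat] and normalization."""
    cx, cy = p
    clip = lambda v: np.clip(v, 0.0, f_hat) / f_hat        # dynamic-range step
    f_l = np.mean([clip(area_feature(m, cx, cy)) for m in leg_masks])
    f_h = np.mean([clip(area_feature(m, cx, cy)) for m in head_masks])
    return np.sqrt(f_l * f_h)
```

With this convention a position p that is covered by foreground in most views on both planes receives a fused score close to 1, while a blob seen on only one plane or in few views is suppressed by the geometric mean.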
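The configuration energy of the marked point process can be sketched in a few lines. Only the structure of Φ_D(ω), a unary data sum plus a γ-weighted pairwise interaction over neighbouring cylinders, is taken from the poster; the concrete j_d and i_pair functions, the radius value and the Cylinder container below are assumed placeholders, not the actual terms of [1].

```python
from dataclasses import dataclass
import math

R = 0.3  # constant cylinder radius in metres (value assumed for illustration)

@dataclass
class Cylinder:
    x: float        # ground-plane position of the person object u
    y: float
    h: float        # person height
    fitness: float  # fused pixel-level evidence f(p, h), assumed in [0, 1]

def j_d(u):
    """Data term J_D(u) in [-1, 1]: well-supported objects get negative energy."""
    return 1.0 - 2.0 * u.fitness

def i_pair(u, v):
    """Prior term I(u, v) in [0, 1]: penalizes overlapping cylinders."""
    d = math.hypot(u.x - v.x, u.y - v.y)
    return max(0.0, 1.0 - d / (2.0 * R))

def phi_d(omega, gamma=1.0):
    """Phi_D(omega) = sum_{u in omega} J_D(u) + gamma * sum_{u~v} I(u, v)."""
    data = sum(j_d(u) for u in omega)
    prior = sum(i_pair(u, v) for k, u in enumerate(omega) for v in omega[k + 1:])
    return data + gamma * prior
```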
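The energy above is minimized with Multiple Birth-and-Death dynamics [2]. The loop below, which reuses the Cylinder, j_d and i_pair sketch above, only outlines the general scheme (a birth step driven by the image evidence, an energy-driven death step, and geometric cooling of temperature and birth rate); the probabilities, schedules and stopping rule are illustrative and do not reproduce the exact parameters of [1] or [2].

```python
import math
import random

def contribution(u, others, gamma=1.0):
    """Energy that object u adds to Phi_D given the rest of the configuration."""
    return j_d(u) + gamma * sum(i_pair(u, v) for v in others)

def mbd_optimize(candidates, n_iter=100, birth_rate=0.5, temp=1.0, cooling=0.96,
                 gamma=1.0):
    """Schematic Multiple Birth-and-Death loop over cylinder configurations."""
    omega = []
    for _ in range(n_iter):
        # Birth: add candidate objects, favouring positions with strong evidence.
        for c in candidates:
            if c not in omega and random.random() < birth_rate * c.fitness:
                omega.append(c)
        # Death: remove objects with a probability that grows with the energy
        # they contribute to the configuration.
        survivors = []
        for u in omega:
            others = [v for v in omega if v is not u]
            a = math.exp(contribution(u, others, gamma) / temp)
            p_death = birth_rate * a / (1.0 + birth_rate * a)
            if random.random() >= p_death:
                survivors.append(u)
        omega = survivors
        # Cooling: lower the temperature and the birth rate between iterations.
        temp *= cooling
        birth_rate *= cooling
    return omega
```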
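Short-term tracking is plain constant-velocity Kalman filtering combined with the three association rules listed in the Short-Term Tracking section. The sketch below uses a ground-plane state (x, y, vx, vy), greedy nearest-neighbour association, and arbitrary noise, gate and track-deletion values; none of these settings come from the poster.

```python
import numpy as np

DT = 1.0                                    # frame interval (assumed)
F = np.array([[1, 0, DT, 0],                # constant-velocity state transition
              [0, 1, 0, DT],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
H = np.array([[1, 0, 0, 0],                 # only (x, y) is observed
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 0.01                        # process noise (placeholder)
R_MEAS = np.eye(2) * 0.1                    # measurement noise (placeholder)

class Track:
    def __init__(self, xy):
        self.x = np.array([xy[0], xy[1], 0.0, 0.0])
        self.P = np.eye(4)
        self.missed = 0

    def predict(self):
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + Q
        return self.x[:2]

    def update(self, z):
        y = np.asarray(z, dtype=float) - H @ self.x     # innovation
        S = H @ self.P @ H.T + R_MEAS
        K = self.P @ H.T @ np.linalg.inv(S)             # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ H) @ self.P
        self.missed = 0

def track_step(tracks, detections, gate=1.0, max_missed=5):
    """One frame of short-term tracking; detections are (x, y) tuples."""
    unmatched = list(detections)
    for t in tracks:
        pred = t.predict()                              # Kalman prediction
        if unmatched:
            z = min(unmatched, key=lambda d: np.linalg.norm(np.asarray(d) - pred))
            if np.linalg.norm(np.asarray(z) - pred) < gate:
                t.update(z)                             # detection assigned -> update
                unmatched.remove(z)
                continue
        t.missed += 1                                   # unmatched track -> coast
    tracks[:] = [t for t in tracks if t.missed <= max_missed]   # ... or delete
    tracks.extend(Track(z) for z in unmatched)          # unmatched detection -> new object
    return tracks
```

In this scheme a track that stays unmatched coasts on the Kalman prediction for a few frames before being deleted, which is what produces the segmented trajectories handed to the long-term tracker.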
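The re-identification distances reduce to a few lines once the per-vertex HSV histograms, reliabilities θ_i and saliencies s_i are available. The data layout, the default saliency s_0 and the choice f(θ) = max(θ, 0) in the sketch below are assumptions; only the Hellinger distance and the weighted model-to-model distance mirror the formulas above.

```python
import numpy as np

def hellinger(hp, ht):
    """d_H between two normalized HSV histograms of identical shape."""
    bc = np.sum(np.sqrt(hp * ht))            # Bhattacharyya coefficient
    return float(np.sqrt(max(0.0, 1.0 - bc)))

def model_distance(probe, gallery, s0=0.1):
    """D(Gamma^p, Gamma^t): weighted mean of the per-vertex Hellinger distances.
    Each model is a list of per-vertex dicts with keys 'hist' (HSV histogram),
    'theta' (optical reliability) and, for the probe, 'saliency'."""
    num, den = 0.0, 0.0
    for vp, vt in zip(probe, gallery):
        d = hellinger(vp['hist'], vt['hist'])
        s = vp.get('saliency', s0)                        # s_i^p, defaults to s_0
        # w_i = f(theta_i^p) * f(theta_i^t) * s_i^p; f is taken here as max(., 0)
        w = max(vp['theta'], 0.0) * max(vt['theta'], 0.0) * s
        num += w * d
        den += w
    return num / den if den > 0 else 1.0
```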
References
[1] Á. Utasi, C. Benedek. A 3-D Marked Point Process Model for Multi-View People Detection. CVPR, 2011.
[2] X. Descombes, R. Minlos, E. Zhizhina. Object Extraction Using a Stochastic Birth-and-Death Dynamics in Continuum. J. of Math. Imaging and Vision, 2009.
[3] D. Baltieri, R. Vezzani, R. Cucchiara. 3D Body Model Construction and Matching for Real Time People Reidentification. EG-IT, 2010.

Acknowledgement
This work has been done within the THIS project, with the support of the Prevention, Preparedness and Consequence Management of Terrorism and other Security-related Risks Programme of the European Commission, Directorate-General Justice, Freedom and Security.