Automatic Panoramic Image Stitching
Transcription
Automatic Panoramic Image Stitching
Dr. Matthew Brown, University of Bath

Automatic 2D Stitching
• The old days: panoramic stitchers were limited to a 1-D sweep
• 2005: 2-D stitchers use object recognition for discovery of overlaps

AutoStitch iPhone
“Create gorgeous panoramic photos on your iPhone” - Cult of Mac
“Raises the bar on iPhone panoramas” - TUAW
“Magically combines the resulting shots” - New York Times

4F12 class of ’99: Projection
Case study: image mosaicing. Any two images of a general scene taken with the same camera centre are related by a planar projective transformation:

    w̃′ = K R K⁻¹ w̃

where K is the camera calibration matrix and R is the rotation between the views. This projective transformation is also known as the homography induced by the plane at infinity. A minimum of four image correspondences can be used to estimate the homography and to warp the images onto a common image plane. This is known as mosaicing.

Meanwhile in 1999...
• David Lowe publishes “Scale Invariant Feature Transform”
• 11,572 citations on Google Scholar
• A breakthrough solution to the correspondence problem
• SIFT is capable of operating over much wider baselines than previous methods

[Reproduced figures from the paper: model images for 3D objects with outlines found by background segmentation; recognition results for 3D objects with model outlines and the image keys used for matching; examples of 3D object recognition with occlusion; plus an excerpt of the experiments section on planar objects.]
[ Lowe ICCV 1999 ]

Local Feature Matching
• Given a point in the world, compute a description of that point that can easily be found in other images

Scale Invariant Feature Transform
• Start by detecting points of interest (blobs)
• Find maxima of the image Laplacian over scale and space:

    L(I(x)) = ∇·∇I = ∂²I/∂x² + ∂²I/∂y²

[Figure after T. Lindeberg: original image, and scale-space maxima of (∇²norm L)², (trace Hnorm L)² and (det Hnorm L)².]

Scale Invariant Feature Transform
• Describe the local region by the distribution (over angle) of gradients

[Figure 7, Lowe: a keypoint descriptor is created by first computing the gradient magnitude and orientation at each image sample point in a region around the keypoint location. These are weighted by a Gaussian window, indicated by the overlaid circle. The samples are then accumulated into orientation histograms summarizing the contents over 4x4 subregions, with the length of each arrow corresponding to the sum of the gradient magnitudes near that direction within the region. The figure shows a 2x2 descriptor array computed from an 8x8 set of samples, whereas the experiments in the paper use 4x4 descriptors computed from a 16x16 sample array.]
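The 4x4-subregion, 8-bin layout described in that caption can be sketched as a toy accumulation. This is only an illustration of the bookkeeping: the gradient data is random, and the Gaussian weighting, trilinear interpolation, and normalization of the real descriptor are omitted.

```python
import math, random

# Toy sketch of the descriptor layout: accumulate gradient samples from a
# 16x16 patch into a 4x4 grid of 8-bin orientation histograms (128 numbers).
random.seed(0)
# Fake gradients for a 16x16 patch: (magnitude, orientation in radians).
grads = [[(random.random(), random.uniform(-math.pi, math.pi))
          for _ in range(16)] for _ in range(16)]

desc = [0.0] * 128
for y in range(16):
    for x in range(16):
        mag, ori = grads[y][x]
        cell = (y // 4) * 4 + (x // 4)                   # which 4x4 subregion
        b = int((ori + math.pi) / (2 * math.pi) * 8) % 8  # orientation bin
        desc[cell * 8 + b] += mag                        # vote by magnitude

print(len(desc))   # 128
```

Every sample votes exactly once, so the descriptor entries sum to the total gradient magnitude in the patch.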
These samples are then accumulated into orientation histograms summarizing the contents over 4x4 subregions, as shown on the right, with into orientation histograms summarizing the contents over 4x4 subregions, as shown on the right, with the length of each arrow corresponding to the sum of the gradient magnitudes near that direction within the length of each This arrowfigure corresponding to the sum of array the gradient magnitudes nearset thatofdirection the region. shows a 2x2 descriptor computed from an 8x8 samples,within whereas the region. This figure shows a 2x2 descriptor array computed from an 8x8 set of samples, whereas the experiments in this paper use 4x4 descriptors computed from a 16x16 sample array. the experiments in this paper use 4x4 descriptors computed from a 16x16 sample array. Each descriptor: 4 x 4 grid x 8 orientations = 128 dimensions 8 Scale Invariant Feature Transform • Extract SIFT features from an image • Each image might generate 100’s or 1000’s of SIFT descriptors [ A. Vedaldi ] 9 Feature Matching • Goal: Find all correspondences between a pair of images ? • Extract and match all SIFT descriptors from both images [ A. 
Feature Matching
• Each SIFT feature is represented by 128 numbers
• Feature matching becomes the task of finding a nearby 128-d vector
• All nearest neighbours:

    ∀j  NN(j) = arg min_{i, i≠j} ||xᵢ − xⱼ||

• Solving this exactly is O(n²), but good approximate algorithms exist
• e.g., the best-bin-first k-d tree [Beis, Lowe ’97]:
• Construct a binary tree in 128-d, splitting on the coordinate dimensions
• Find approximate nearest neighbours by successively exploring nearby branches of the tree

2-view Rotational Geometry
• Feature matching returns a set of noisy correspondences
• To get further, we will have to understand something about the geometry of the setup

2-view Rotational Geometry
• Recall the projection equation for a pinhole camera:

    ũ = K [ R | t ] X̃

    ũ ∼ [u, v, 1]ᵀ : homogeneous image position
    X̃ ∼ [X, Y, Z, 1]ᵀ : homogeneous world coordinates
    K (3 × 3) : intrinsic (calibration) matrix
    R (3 × 3) : rotation matrix
    t (3 × 1) : translation vector

2-view Rotational Geometry
• Consider two cameras at the same position; WLOG we can put the origin of coordinates there

    ũ1 = K1 [ R1 | t1 ] X̃

• Set the translation to 0:

    ũ1 = K1 [ R1 | 0 ] X̃

• Remember X̃ ∼ [X, Y, Z, 1]ᵀ, so

    ũ1 = K1 R1 X   (where X = [X, Y, Z]ᵀ)

2-view Rotational Geometry
• Add a second camera (same translation but different rotation and intrinsic matrix):

    ũ1 = K1 R1 X
    ũ2 = K2 R2 X

• Now eliminate X:

    X = R1ᵀ K1⁻¹ ũ1

• Substitute into the second equation:

    ũ2 = K2 R2 R1ᵀ K1⁻¹ ũ1

[Figure A.4, Appendix A (Multiple View Geometry): for a camera that rotates about its centre, there is a one-to-one mapping between the points in the two images.]
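The derivation above can be verified numerically: project one world point through two cameras sharing a centre, then check that H = K2 R2 R1ᵀ K1⁻¹ maps the first projection onto the second. All intrinsics, rotations, and the 3-D point are made-up values.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def K_and_inv(f, cx, cy):
    """Zero-skew calibration matrix and its closed-form inverse."""
    K  = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
    Ki = [[1/f, 0.0, -cx/f], [0.0, 1/f, -cy/f], [0.0, 0.0, 1.0]]
    return K, Ki

def rot_y(t):
    return [[math.cos(t), 0.0, math.sin(t)], [0.0, 1.0, 0.0],
            [-math.sin(t), 0.0, math.cos(t)]]

K1, K1i = K_and_inv(800.0, 320.0, 240.0)
K2, _   = K_and_inv(700.0, 300.0, 250.0)
R1, R2  = rot_y(0.1), rot_y(0.3)

X  = [1.0, 2.0, 5.0]                    # world point (cameras at the origin)
u1 = matvec(matmul(K1, R1), X)          # projection in image 1
u2 = matvec(matmul(K2, R2), X)          # projection in image 2

H  = matmul(K2, matmul(R2, matmul(transpose(R1), K1i)))
u2_pred = matvec(H, u1)                 # image 1 point mapped by the homography

p = (u2[0] / u2[2], u2[1] / u2[2])
q = (u2_pred[0] / u2_pred[2], u2_pred[1] / u2_pred[2])
print(abs(p[0] - q[0]) < 1e-9 and abs(p[1] - q[1]) < 1e-9)   # True
```

Because both cameras see the same ray through the shared centre, the mapping is exact for every scene point, regardless of depth: this is why rotational panoramas need no 3-D reconstruction.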
• This is a 3x3 matrix -- a (special form of) homography

Computing H: Quiz

    s [u, v, 1]ᵀ = [h11 h12 h13; h21 h22 h23; h31 h32 h33] [x, y, 1]ᵀ

• Each correspondence between 2 images generates _____ equations
• A homography has _____ degrees of freedom
• _____ point correspondences are needed to compute the homography
• Rearranging to make H the subject leads to an equation of the form Mh = 0
• This can be solved by _____

Finding Consistent Matches
• Raw SIFT correspondences (contains outliers)
• SIFT matches consistent with a rotational homography
• Warp images to a common coordinate frame

RANSAC
• RAndom SAmple Consensus [Fischler-Bolles ’81]
• Allows us to robustly estimate the best-fitting homography despite noisy correspondences
• Basic principle: select the smallest random subset that can be used to compute H
• Calculate the support for this hypothesis by counting the number of inliers to the transformation
• Repeat sampling, choosing the H that maximises the number of inliers

RANSAC

    H = eye(3,3);
    nBest = 0;
    for (int i = 0; i < nIterations; i++)
    {
        P4 = SelectRandomSubset(P);
        Hi = ComputeHomography(P4);
        nInliers = ComputeInliers(Hi);
        if (nInliers > nBest)
        {
            H = Hi;
            nBest = nInliers;
        }
    }

Recognising Panoramas
[ Brown, Lowe ICCV’03 ]

Global Alignment
• The pairwise image relationships are given by homographies
• But over time, multiple pairwise mappings will accumulate errors
• Notice: gap in panorama before it is closed...
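Returning to the estimation step: the RANSAC pseudocode above can be made runnable. This sketch uses synthetic correspondences and a simplified 4-point homography solver (h33 fixed to 1 and an 8x8 linear system solved by Gaussian elimination, rather than the usual SVD-based DLT); all data, thresholds, and iteration counts are made up.

```python
import random

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def compute_homography(pairs):
    """4-point homography estimate with h33 fixed to 1."""
    A, b = [], []
    for (x, y), (u, v) in pairs:
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = solve(A, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]

def project(H, x, y):
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def count_inliers(H, pairs, tol=2.0):
    n = 0
    for (x, y), (u, v) in pairs:
        pu, pv = project(H, x, y)
        du, dv = pu - u, pv - v
        if du * du + dv * dv < tol * tol:
            n += 1
    return n

random.seed(2)
H_true = [[1.0, 0.02, 5.0], [-0.02, 1.0, -3.0], [1e-4, 0.0, 1.0]]
pairs = []
for _ in range(40):                      # 40 true correspondences (noiseless)
    x, y = random.uniform(0, 640), random.uniform(0, 480)
    pairs.append(((x, y), project(H_true, x, y)))
for _ in range(20):                      # 20 gross outliers
    pairs.append(((random.uniform(0, 640), random.uniform(0, 480)),
                  (random.uniform(0, 640), random.uniform(0, 480))))

best_H, n_best = None, 0
for _ in range(200):                     # the loop from the slide
    sample = random.sample(pairs, 4)
    try:
        Hi = compute_homography(sample)
        n = count_inliers(Hi, pairs)
    except (ZeroDivisionError, OverflowError):
        continue                         # degenerate sample
    if n > n_best:
        best_H, n_best = Hi, n
print(n_best >= 40)                      # True: all true matches recovered
```

With a third of the matches being outliers, a random 4-tuple is all-inlier roughly 19% of the time, so 200 iterations find a clean hypothesis with near certainty.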
Gap Closing

Bundle Adjustment
• Minimise the sum of robustified residuals:

    e(Θ) = Σ_{i=1..np} Σ_{j ∈ V(i)} f(u_ij(Θ) − m_ij)

    u_ij = projected position of point i in image j
    m_ij = measured position of point i in image j
    V(i) = set of images where point i is visible
    np = number of points/tracks (mutual feature matches across images)
    Θ = camera parameters

• Robust error function (Huber):

    f(x) = |x|²          if |x| < σ
           2σ|x| − σ²    if |x| ≥ σ

Aerial Image Stitching
[ Images: SenseFly, Zufferey, Beleyer EPFL ]

Yellowstone Geysers
[ B. Planar-Friedrich (U. Bayreuth) and Yellowstone National Park Service ]

Endoscope Imaging
[ E. Seibel, J. Wong (U. Washington) ]

Other topics
• Blending / gain compensation / HDR -- how to combine multiple images to give a consistent, seamless result
• Gravity direction estimation -- how do we know which way is up?
• Different projections -- rectilinear, spherical, cylindrical, fisheye...
• More details: http://cs.bath.ac.uk/brown

iPhone Computer Vision Apps
Also good: Kooaba Déjà vu, Photosynth, Word Lens

Aardman, Disney, Double Negative, EA, Frontier, BBC, Buro Happold, Smoke and Mirrors...
PhD and EngD positions starting September 2012
m.brown@bath.ac.uk
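As a footnote to the bundle adjustment slide: the Huber function quoted there translates directly to code, quadratic for small residuals and linear beyond σ, so gross outliers get bounded influence on the sum.

```python
# Huber robust error, as written on the bundle adjustment slide:
# quadratic inside sigma, linear (with matched value and slope) outside.
def huber(x, sigma=1.0):
    ax = abs(x)
    return ax * ax if ax < sigma else 2 * sigma * ax - sigma * sigma

print(huber(0.5), huber(3.0))   # 0.25 5.0
```

The two branches agree in value and slope at |x| = σ (both give σ²), which keeps the objective smooth enough for the nonlinear least-squares solvers used in bundle adjustment.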