Automatic Panoramic Image Stitching

Transcription

Automatic Panoramic Image Stitching
Dr. Matthew Brown, University of Bath
Automatic 2D Stitching
• The old days: panoramic stitchers were limited to a 1-D sweep
• 2005: 2-D stitchers use object recognition for discovery of overlaps
AutoStitch iPhone
“Create gorgeous panoramic photos on your iPhone” - Cult of Mac
“Raises the bar on iPhone panoramas” - TUAW
“Magically combines the resulting shots” - New York Times
4F12 class of ’99
Projection
Case study – Image mosaicing
Any two images of a general scene with the same camera centre are related by a planar projective transformation given by:

w̃′ = K R K⁻¹ w̃

where K represents the camera calibration matrix and R is the rotation between the views. This projective transformation is also known as the homography induced by the plane at infinity. A minimum of four image correspondences can be used to estimate the homography and to warp the images onto a common image plane. This is known as mosaicing.
Meanwhile in 1999...
• David Lowe publishes “Scale Invariant Feature Transform”
• 11,572 citations on Google Scholar
• A breakthrough solution to the correspondence problem
• SIFT is capable of operating over much wider baselines than previous methods
[ Screenshot of the paper, partially legible: Figure 4 (top row: model images for 3D objects with outlines found by background segmentation; bottom: recognition results for 3D objects with model outlines and image keys used for matching), Figure 5 (examples of 3D object recognition with occlusion), and the start of Section 7, Experiments ]
[ Lowe ICCV 1999 ]
Local Feature Matching
• Given a point in the world... compute a description of that point that can be easily found in other images
Scale Invariant Feature Transform
• Start by detecting points of interest (blobs)
[ Figure: original image; scale-space maxima of (∇²norm L)²; (trace Hnorm L)²; (det Hnorm L)² ]
• Find maxima of image Laplacian over scale and space

L(I(x)) = ∇·∇I = ∂²I/∂x² + ∂²I/∂y²
[ T. Lindeberg ]
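The detection step above can be sketched numerically with a scale-normalised Laplacian of Gaussian. This is a minimal illustration, not the lecture's code: the synthetic blob, the image size, and the sigma grid are made-up choices.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def detect_strongest_blob(image, sigmas):
    """Return (sigma, y, x) of the strongest scale-space LoG extremum."""
    # Scale-normalised Laplacian response at each scale: sigma^2 * (LoG * I)
    stack = np.stack([(s ** 2) * gaussian_laplace(image, s) for s in sigmas])
    k, y, x = np.unravel_index(np.argmax(np.abs(stack)), stack.shape)
    return sigmas[k], y, x

# Synthetic test image: one bright Gaussian blob of scale 4 at (32, 32)
yy, xx = np.mgrid[0:64, 0:64]
img = np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / (2 * 4.0 ** 2))

sigma, y, x = detect_strongest_blob(img, sigmas=[2.0, 3.0, 4.0, 5.0, 6.0])
```

The scale-normalised response is largest when the filter scale matches the blob scale, so the detector should report sigma ≈ 4 at the blob centre; this is exactly the "maxima over scale and space" idea on the slide.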
Scale Invariant Feature Transform
• Describe local region by distribution (over angle) of gradients
[ Figure: image gradients (left) accumulated into a keypoint descriptor (right) ]
Figure 7: A keypoint descriptor is created by first computing the gradient magnitude and orientation at each image sample point in a region around the keypoint location, as shown on the left. These are weighted by a Gaussian window, indicated by the overlaid circle. These samples are then accumulated into orientation histograms summarizing the contents over 4x4 subregions, as shown on the right, with the length of each arrow corresponding to the sum of the gradient magnitudes near that direction within the region. This figure shows a 2x2 descriptor array computed from an 8x8 set of samples, whereas the experiments in this paper use 4x4 descriptors computed from a 16x16 sample array.
• Each descriptor: 4 x 4 grid x 8 orientations = 128 dimensions
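The histogram construction can be sketched as follows. This is a simplified illustration of the idea, not Lowe's exact implementation: there is no interpolation between bins, and the Gaussian window width (half the patch size) is an assumed value.

```python
import numpy as np

def sift_descriptor(patch, grid=4, num_bins=8):
    """Build a grid x grid array of num_bins-bin orientation histograms."""
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)                    # image gradients
    mag = np.hypot(gx, gy)                         # gradient magnitude
    ori = np.arctan2(gy, gx) % (2 * np.pi)         # gradient orientation in [0, 2pi)
    n = patch.shape[0]
    c = (n - 1) / 2.0
    yy, xx = np.mgrid[0:n, 0:n]
    # Gaussian window centred on the keypoint (width n/2 is an assumed choice)
    mag = mag * np.exp(-((xx - c) ** 2 + (yy - c) ** 2) / (2 * (n / 2.0) ** 2))
    cell = n // grid
    desc = np.zeros((grid, grid, num_bins))
    for i in range(grid):
        for j in range(grid):
            m = mag[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell].ravel()
            b = (ori[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell].ravel()
                 / (2 * np.pi) * num_bins).astype(int) % num_bins
            for k in range(num_bins):
                desc[i, j, k] = m[b == k].sum()    # accumulate magnitudes per bin
    v = desc.ravel()
    v /= np.linalg.norm(v) + 1e-12                 # normalise for illumination invariance
    v = np.minimum(v, 0.2)                         # clip large values, as in Lowe's paper
    return v / (np.linalg.norm(v) + 1e-12)         # renormalise

desc = sift_descriptor(np.random.default_rng(0).random((16, 16)))
```

With grid=4 and num_bins=8 this yields the 128-dimensional vector described above.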
Scale Invariant Feature Transform
• Extract SIFT features from an image
• Each image might generate 100s or 1000s of SIFT descriptors
[ A. Vedaldi ]
Feature Matching
• Goal: Find all correspondences between a pair of images
• Extract and match all SIFT descriptors from both images
[ A. Vedaldi ]
Feature Matching
• Each SIFT feature is represented by 128 numbers
• Feature matching becomes the task of finding a nearby 128-d vector
• All nearest neighbours:

∀j  NN(j) = arg min_i ||xi − xj||,  i ≠ j

• Solving this exactly is O(n²), but good approximate algorithms exist
• e.g., [Beis, Lowe ’97] Best-bin first k-d tree
• Construct a binary tree in 128-d, splitting on the coordinate dimensions
• Find approximate nearest neighbours by successively exploring nearby branches of the tree
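The all-nearest-neighbours computation can be sketched in Python. This uses SciPy's generic (exact) k-d tree rather than the best-bin-first variant, and random vectors stand in for real SIFT descriptors:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
X = rng.random((500, 128))              # 500 mock 128-d SIFT descriptors

# Exact all-nearest-neighbours by brute force: O(n^2) distance computations
D = cdist(X, X)
np.fill_diagonal(D, np.inf)             # enforce i != j
nn_brute = D.argmin(axis=1)

# k-d tree over the 128-d points; query k=2 since each point's nearest hit is itself
tree = cKDTree(X)
_, idx = tree.query(X, k=2)
nn_tree = idx[:, 1]
```

In 128 dimensions an exact k-d tree search degenerates towards brute force, which is why SIFT matching uses the approximate best-bin-first scheme: it explores only a bounded number of leaves, visiting bins in increasing order of distance, and returns the best candidate found.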
2-view Rotational Geometry
• Feature matching returns a set of noisy correspondences
• To get further, we will have to understand something about the geometry of the setup
2-view Rotational Geometry
• Recall the projection equation for a pinhole camera

ũ = K [ R | t ] X̃

ũ ∼ [u, v, 1]ᵀ : Homogeneous image position
X̃ ∼ [X, Y, Z, 1]ᵀ : Homogeneous world coordinates
K (3 × 3) : Intrinsic (calibration) matrix
R (3 × 3) : Rotation matrix
t (3 × 1) : Translation vector
2-view Rotational Geometry
• Consider two cameras at the same position (translation)
• WLOG we can put the origin of coordinates there

ũ1 = K1 [ R1 | t1 ] X̃

• Set translation to 0

ũ1 = K1 [ R1 | 0 ] X̃

• Remember X̃ ∼ [X, Y, Z, 1]ᵀ, so

ũ1 = K1 R1 X   (where X = [X, Y, Z]ᵀ)

[ Figure A.4, Panoramic Geometry: for a camera that rotates about its centre there is a one-to-one mapping between the points in the two images ]
2-view Rotational Geometry
• Add a second camera (same translation but different rotation and intrinsic matrix)

ũ1 = K1 R1 X
ũ2 = K2 R2 X

• Now eliminate X

X = R1ᵀ K1⁻¹ ũ1

• Substitute into the equation for ũ2

ũ2 = K2 R2 R1ᵀ K1⁻¹ ũ1

This is a 3x3 matrix -- a (special form) of homography
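The relation ũ2 = K2 R2 R1ᵀ K1⁻¹ ũ1 is easy to verify numerically. A small NumPy sketch, where the intrinsics, rotations, and test point are made-up values:

```python
import numpy as np

def rot_y(theta):
    """Rotation about the camera's y-axis (a pan)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

# Made-up intrinsics and rotations for two cameras sharing a centre
K1 = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1.0]])
K2 = np.array([[750, 0, 320], [0, 750, 240], [0, 0, 1.0]])
R1, R2 = rot_y(0.10), rot_y(0.35)

# The induced homography
H = K2 @ R2 @ R1.T @ np.linalg.inv(K1)

# Check: project a world point into both views, then map view 1 -> view 2 via H
X = np.array([1.0, 0.5, 5.0])
u1 = K1 @ R1 @ X
u1 = u1 / u1[2]
u2 = K2 @ R2 @ X
u2 = u2 / u2[2]
u2_from_H = H @ u1
u2_from_H = u2_from_H / u2_from_H[2]
```

After normalising the homogeneous coordinates, u2_from_H agrees with the direct projection u2, confirming that points in the two views are related by this 3x3 matrix.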
Computing H: Quiz
    ⎡u⎤   ⎡h11 h12 h13⎤ ⎡x⎤
  s ⎢v⎥ = ⎢h21 h22 h23⎥ ⎢y⎥
    ⎣1⎦   ⎣h31 h32 h33⎦ ⎣1⎦

• Each correspondence between 2 images generates _____ equations
• A homography has _____ degrees of freedom
• _____ point correspondences are needed to compute the homography
• Rearranging to make H the subject leads to an equation of the form

Mh = 0

• This can be solved by _____
Finding Consistent Matches
• Raw SIFT correspondences (contains outliers)
Finding Consistent Matches
• SIFT matches consistent with a rotational homography
Finding Consistent Matches
• Warp images to common coordinate frame
RANSAC
• RANdom SAmple Consensus [Fischler-Bolles ’81]
• Allows us to robustly estimate the best-fitting homography despite noisy correspondences
• Basic principle: select the smallest random subset that can be used to compute H
• Calculate the support for this hypothesis by counting the number of inliers to the transformation
• Repeat sampling, choosing the H that maximises the # inliers
RANSAC
H = eye(3,3); nBest = 0;                  // best hypothesis so far and its support
for (int i = 0; i < nIterations; i++)
{
    P4 = SelectRandomSubset(P);           // smallest subset that determines H: 4 correspondences
    Hi = ComputeHomography(P4);           // hypothesis from the minimal sample
    nInliers = ComputeInliers(Hi);        // support: correspondences consistent with Hi
    if (nInliers > nBest)
    {
        H = Hi;
        nBest = nInliers;
    }
}
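The same loop in runnable form: a Python sketch with a DLT-based ComputeHomography and synthetic correspondences. The threshold, iteration count, and test data are illustrative choices, not values from the lecture.

```python
import numpy as np

def compute_homography(p1, p2):
    """DLT: each correspondence contributes two rows to M in M h = 0;
    h is the right singular vector with the smallest singular value."""
    M = []
    for (x, y), (u, v) in zip(p1, p2):
        M.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        M.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    return np.linalg.svd(np.array(M))[2][-1].reshape(3, 3)

def ransac_homography(p1, p2, n_iterations=500, threshold=2.0, seed=0):
    rng = np.random.default_rng(seed)
    H, n_best = np.eye(3), 0
    P1 = np.hstack([p1, np.ones((len(p1), 1))])          # homogeneous coordinates
    for _ in range(n_iterations):
        sample = rng.choice(len(p1), 4, replace=False)   # minimal random subset
        Hi = compute_homography(p1[sample], p2[sample])
        q = P1 @ Hi.T
        q = q[:, :2] / q[:, 2:3]                         # reproject all points
        n_inliers = int(np.sum(np.linalg.norm(q - p2, axis=1) < threshold))
        if n_inliers > n_best:                           # keep hypothesis with most support
            H, n_best = Hi, n_inliers
    return H, n_best

# Synthetic data: 70 points mapped by a known homography, 30 outliers
rng = np.random.default_rng(1)
p1 = rng.random((100, 2)) * [640, 480]
H_true = np.array([[1.01, 0.02, 5.0], [-0.01, 0.99, -3.0], [1e-5, 2e-5, 1.0]])
q = np.hstack([p1, np.ones((100, 1))]) @ H_true.T
p2 = q[:, :2] / q[:, 2:3]
p2[70:] = rng.random((30, 2)) * [640, 480]               # corrupt 30 matches

H_est, n_inliers = ransac_homography(p1, p2)
```

Because any minimal sample drawn entirely from the 70 clean matches reproduces the true homography, the returned hypothesis should gather at least those 70 inliers despite the 30% outlier rate.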
Recognising Panoramas
[ Brown, Lowe ICCV’03 ]
Global Alignment
• The pairwise image relationships are given by homographies
• But over time multiple pairwise mappings will accumulate errors
• Notice: gap in panorama before it is closed...
Gap Closing
Bundle Adjustment
Bundle Adjustment
• Minimise sum of robustified residuals
e(Θ) = Σ_{i=1..np} Σ_{j ∈ V(i)} f(uij(Θ) − mij)

uij = projected position of point i in image j
mij = measured position of point i in image j
V(i) = set of images where point i is visible
np = # points/tracks (mutual feature matches across images)
Θ = camera parameters
• Robust error function (Huber)

f(x) = |x|²,         if |x| < σ
     = 2σ|x| − σ²,   if |x| ≥ σ
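As a sanity check, the Huber function above in code, a direct transcription of the two cases:

```python
def huber(x, sigma):
    """Quadratic for small residuals, linear (outlier-tolerant) beyond sigma."""
    ax = abs(x)
    if ax < sigma:
        return ax * ax
    return 2.0 * sigma * ax - sigma * sigma
```

Both branches agree at |x| = σ (value σ²), so f is continuous; the linear tail is what stops gross outliers from dominating the sum of residuals during bundle adjustment.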
Aerial Image Stitching
[ Images: SenseFly, Zufferey, Beleyer EPFL]
Yellowstone Geysers
[ B. Planar-Friedrich (U. Bayreuth) and Yellowstone National Park Service ]
Endoscope Imaging
[ E. Seibel, J. Wong (U. Washington) ]
Other topics
• Blending/Gain compensation/HDR -- how to combine multiple images to give a consistent, seamless result
• Gravity direction estimate -- how do we know which way is up?
• Different projections -- rectilinear, spherical, cylindrical, fisheye...

More details: http://cs.bath.ac.uk/brown
iPhone Computer Vision Apps
Also good: Kooaba Déjà vu, Photosynth, Word Lens
Aardman, Disney, Double Negative, EA, Frontier,
BBC, Buro Happold, Smoke and Mirrors...
PhD and EngD positions starting September 2012
m.brown@bath.ac.uk