Sample Problem Set 2 – SOLUTIONS

CSCI 510 / EENG 510
The exam covers these sections in the Gonzalez and Woods textbook:
• Ch 10 (10.1-10.3.3)
• Ch 11 (11.1.2, 11.2.2-11.2.4, 11.3.3-11.3.4)
• Ch 5 (5.1-5.3, 5.5-5.8)
• Ch 7 (7.1.1)
• Ch 8 (8.1-8.2.9)
It will be closed book, but handwritten notes are allowed. The problems below are
representative of exam problems (although there may be more problems than would appear on
the actual exam). Some of the problems below are drawn from previous exams.
1. Explain why the edges found by the Laplacian of a Gaussian edge operator form closed
contours.
Solution:
The edges are the zero crossings of the image convolved with the Laplacian of Gaussian operator. In other words, the edges are the boundaries between regions where the response is negative and regions where it is positive. Each such region is a connected component. The boundary surrounding a connected component is a closed path (unless the region runs into the image border), so the edges form closed contours.
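The argument can be checked on a synthetic example. This pure-Python sketch makes several assumptions: a 32x32 image containing one bright 8x8 square, Gaussian smoothing with σ = 1.5 (kernel truncated at radius 4), and the 4-neighbor discrete Laplacian. It marks zero crossings on the positive side only, which is one simple convention chosen here. It then tests closure with a flood fill: starting from the center of the square, the fill cannot reach the image border without entering a marked pixel. All function names are illustrative.

```python
import math

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    w = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def smooth(img, kernel):
    """Separable Gaussian smoothing of a square image; pixels outside count as 0."""
    n, r = len(img), len(kernel) // 2
    rows = [[sum(kernel[k + r] * img[i][j + k] for k in range(-r, r + 1) if 0 <= j + k < n)
             for j in range(n)] for i in range(n)]
    return [[sum(kernel[k + r] * rows[i + k][j] for k in range(-r, r + 1) if 0 <= i + k < n)
             for j in range(n)] for i in range(n)]

def laplacian(s):
    """4-neighbor discrete Laplacian; the outermost ring of pixels is left at 0."""
    n = len(s)
    out = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            out[i][j] = s[i - 1][j] + s[i + 1][j] + s[i][j - 1] + s[i][j + 1] - 4 * s[i][j]
    return out

def zero_crossings(L):
    """Mark positive pixels that have at least one negative 4-neighbor."""
    n = len(L)
    return {(i, j) for i in range(n) for j in range(n)
            if L[i][j] > 0 and any(0 <= i + di < n and 0 <= j + dj < n and L[i + di][j + dj] < 0
                                   for di, dj in NEIGHBORS)}

def enclosed(marks, start, n):
    """4-connected flood fill from start that never enters a marked pixel.
    Returns True if the image border is never reached."""
    stack, seen = [start], {start}
    while stack:
        i, j = stack.pop()
        if i in (0, n - 1) or j in (0, n - 1):
            return False
        for di, dj in NEIGHBORS:
            p = (i + di, j + dj)
            if p not in marks and p not in seen:
                seen.add(p)
                stack.append(p)
    return True

n = 32
img = [[1.0 if 12 <= i <= 19 and 12 <= j <= 19 else 0.0 for j in range(n)] for i in range(n)]
L = laplacian(smooth(img, gaussian_kernel(1.5, 4)))
marks = zero_crossings(L)
print(len(marks), enclosed(marks, (15, 15), n))
```

The LoG response is negative at the center of the bright square. Any path that leaves the negative region must first step onto a positive pixel next to it, and every such pixel is marked, so the fill stays trapped inside the contour.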
2. The left image below is a noisy 200x200 image of two squares, rotated by 45 degrees. An edge detection operation is performed to yield the binary edge image on the right. Where are the peaks in the Hough transform; i.e., what values of (θ, ρ)? Describe the relative heights of the peaks.
Note: assume the normal representation of a line, x cos θ + y sin θ = ρ, where the origin is at the upper left, the x axis points to the right, and the y axis points down.
[Figure: the noisy image (left) and the binary edge image (right), with the x axis pointing right and the y axis pointing down.]
Solution:
The geometry is as shown below. The lines at θ = +45° have ρ equal to 50√2, 100√2, or 150√2. The lines at θ = -45° have ρ equal to 50√2, 0, or -50√2. (Equivalently, you could have one line at θ = -45°, ρ = 50√2, another at θ = -45°, ρ = 0, and the third at θ = 135°, ρ = 50√2.)
The peaks at (+45°, 100√2) and (-45°, 0) are twice as high as the others, because each of these two lines contains one side of each square.
[Figure: the two squares and their edge lines at +45° and -45°, with x pointing right and y pointing down.]
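The relative peak heights can be checked with a small accumulator. The original figure is not reproduced, so this sketch assumes one arrangement consistent with the ρ values above. It uses two squares rotated 45° whose outlines satisfy |x - 50| + |y - 100| = 50 and |x - 150| + |y - 100| = 50, so the squares touch at (100, 100). θ is sampled in 1° steps and ρ is binned to the nearest pixel. The helper names are hypothetical.

```python
import math
from collections import Counter

def diamond_outline(cx, cy, r):
    """Integer pixels on |x - cx| + |y - cy| == r: the outline of a square rotated 45 degrees."""
    pts = set()
    for t in range(-r, r + 1):
        d = r - abs(t)
        pts.add((cx + t, cy + d))
        pts.add((cx + t, cy - d))
    return pts

def hough_lines(edge_points, thetas_deg):
    """Vote in (theta, rho) space for rho = x cos(theta) + y sin(theta), rho rounded to 1 pixel."""
    acc = Counter()
    for x, y in edge_points:
        for theta in thetas_deg:
            t = math.radians(theta)
            acc[(theta, round(x * math.cos(t) + y * math.sin(t)))] += 1
    return acc

# Assumed geometry (x to the right, y down): two rotated squares touching at (100, 100).
edges = diamond_outline(50, 100, 50) | diamond_outline(150, 100, 50)
acc = hough_lines(edges, range(-90, 90))
for (theta, rho), votes in acc.most_common(6):
    print(f"theta={theta:4d} deg  rho={rho:4d}  votes={votes}")
```

Under these assumptions the bins (45°, 141 ≈ 100√2) and (-45°, 0) each receive 101 votes: 51 from each square, minus the shared corner. The other four sides (ρ ≈ 71, 212 at +45° and ρ ≈ ±71 at -45°) receive 51 each.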
3. The steps for the basic global thresholding algorithm are given below. Restate the
algorithm to use the histogram of the image h(rk), instead of the image itself.
1. Select an initial estimate for the threshold T.
2. Segment the image using T, to obtain two groups of pixels: G1 is all pixels with
values < T and G2 is all pixels with values >= T.
3. Compute the mean values m1 and m2, for groups G1 and G2 respectively.
4. The new threshold is T = (m1 + m2)/2.
5. Repeat steps 2 through 4 until there is no further change in T.
Solution:
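We first compute the probabilities of the pixel values from the histogram: p_k = h(r_k)/N, where N is the total number of pixels in the image. Steps 2 and 3 are then replaced by computing the two group means directly from the probabilities, without segmenting the image:
$$m_1 = \frac{\sum_{i=0}^{T-1} i\,p_i}{\sum_{i=0}^{T-1} p_i}, \qquad m_2 = \frac{\sum_{i=T}^{L-1} i\,p_i}{\sum_{i=T}^{L-1} p_i}$$
Here L is the number of gray levels. If T is not an integer, the sums for m1 run over the levels below T and those for m2 over the remaining levels. Steps 1, 4, and 5 are unchanged.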
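The histogram version can be sketched directly. This is a minimal sketch, not a reference implementation. It assumes the histogram is given as a list indexed by gray level, and it uses a small tolerance in place of "no further change in T". The function name is illustrative. The test data is the histogram from Problem 4.

```python
def threshold_from_histogram(hist, T, eps=1e-6):
    """Basic global thresholding driven by the histogram h(r_k).

    hist[k] is the number of pixels with gray level k, k = 0..L-1.
    T is the initial estimate; it must lie between the min and max gray levels.
    """
    N = sum(hist)
    p = [h / N for h in hist]                     # p_k = h(r_k) / N
    levels = range(len(p))
    while True:
        w1 = sum(p[i] for i in levels if i < T)   # probability of G1 (values < T)
        w2 = sum(p[i] for i in levels if i >= T)  # probability of G2 (values >= T)
        if w1 == 0 or w2 == 0:
            raise ValueError("T must split the gray levels into two nonempty groups")
        m1 = sum(i * p[i] for i in levels if i < T) / w1
        m2 = sum(i * p[i] for i in levels if i >= T) / w2
        T_new = (m1 + m2) / 2                     # step 4
        if abs(T_new - T) < eps:                  # step 5: stop when T no longer changes
            return T_new
        T = T_new

hist = [0] * 256
hist[100], hist[150], hist[180] = 50, 10, 40      # histogram from Problem 4
print(threshold_from_histogram(hist, 140))
```

The image is never touched: each iteration costs O(L) regardless of the number of pixels, which is the point of working from the histogram.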
4. The basic global thresholding algorithm is applied to an image with the histogram as
shown below.
i. What value for the threshold does the algorithm find? Assume that the initial
estimate for the threshold is somewhere between the minimum and maximum
values in the image.
ii. Is the threshold obtained with the algorithm dependent on the initial estimate
(assuming that the initial estimate is somewhere between the minimum and
maximum values)? Give an example to show your conclusion.
[Figure: histogram H(r) with three nonzero values: H(100) = 50, H(150) = 10, H(180) = 40.]
Solution:
The basic global thresholding algorithm is:
(1) Select an initial estimate of the threshold T.
(2) Find the mean of the pixels less than or equal to T and the mean of the pixels greater than T.
(3) Compute the new threshold as the midpoint between the means.
(4) Repeat steps 2 and 3 until there are no more changes to T.
(i) There are only three values in the image: 100, 150, and 180. Pick the initial threshold T somewhere between 100 and 150, say 140 (it doesn't matter where; the results will be the same). The mean of the pixels less than T is μ1 = 100. The mean of the pixels greater than T is
$$\mu_2 = \frac{150(10) + 180(40)}{50} = 174$$
The new threshold is
$$T = \frac{100 + 174}{2} = 137$$
Obviously μ1 and μ2 will be the same in the next iteration, so this is the final answer for T.
(ii) No, the initial estimate of T doesn't matter. Suppose you chose T somewhere between 150 and 180 (say, 160). The algorithm would quickly move T to a value between 100 and 150 and then converge to the answer found above. As an example, the mean of the pixels less than T is
$$\mu_1 = \frac{100(50) + 150(10)}{60} = 108.33$$
The mean of the pixels greater than T is μ2 = 180. The new threshold is
$$T = \frac{108.33 + 180}{2} = 144.17$$
The next iteration of the algorithm will move T to its final answer of 137.
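Part (ii) can also be checked numerically. This sketch runs the four steps listed in the solution on a flat list of pixel values with the histogram above: 50 pixels at 100, 10 at 150, and 40 at 180. It tries several initial estimates between the minimum and maximum values. The function name and stopping tolerance are illustrative choices.

```python
def basic_global_threshold(pixels, T, eps=1e-6):
    """Steps (1)-(4) of the solution, applied to a flat list of pixel values."""
    while True:
        low = [v for v in pixels if v <= T]     # pixels less than or equal to T
        high = [v for v in pixels if v > T]     # pixels greater than T
        T_new = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(T_new - T) < eps:
            return T_new
        T = T_new

image = [100] * 50 + [150] * 10 + [180] * 40
for T0 in (110, 140, 149, 151, 160, 179):
    print(T0, basic_global_threshold(image, T0))   # every start converges to 137.0
```

Starts above 150 take one extra iteration (through T ≈ 144.17), which matches the hand calculation above.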
5. The Hough transform algorithm for detecting lines (of the form ρ = x cos θ + y sin θ) can be written as follows:

Input a binary edge image E(x,y)
Initialize accumulator array A(i,j) to zeros
for all values of (x,y)
    if E(x,y) == 1
        for all values of θ between θmin and θmax
            Compute ρ = x cos θ + y sin θ
            Increment A(i,j) where (i,j) corresponds to the cell associated with (ρ_i, θ_j)
        end
    end
end
Search for peaks in A(i,j): the corresponding values of (ρ_i, θ_j) are the parameters of the detected lines
Rewrite the algorithm to detect ellipses of a specific size and shape. Recall that the equation of an ellipse (centered at the origin) is
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$$
The ellipses can be centered at any location in the image. As in the example above, you can assume that the input is a binary edge image E(x,y).
Solution: The ellipses will be parameterized by their center (x0, y0) in the equation
$$\frac{(x-x_0)^2}{a^2} + \frac{(y-y_0)^2}{b^2} = 1$$
Given a hypothesized value for x0, we can solve for y0 using:
$$y_0 = y \pm b\sqrt{1 - \frac{(x-x_0)^2}{a^2}}$$
which has real solutions only when |x - x0| ≤ a.

for all pixels (x,y) in the edge image E
    if E(x,y) is an edge point
        for each possible value of x0 with |x - x0| ≤ a
            compute the corresponding two values of y0 (using the equation above)
            increment the accumulator array A at those locations (x0, y0)
        end
    end
end
Search for peaks in A: each peak (x0, y0) is the center of a detected ellipse
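Here is a minimal pure-Python sketch of the ellipse algorithm. It assumes integer accumulator cells for (x0, y0), with y0 rounded to the nearest pixel. The test data are twelve integer points that lie exactly on an ellipse with hypothetical parameters a = 10, b = 5, centered at (30, 20). The function name is illustrative.

```python
import math
from collections import Counter

def hough_ellipse_centers(edge_points, a, b, width, height):
    """Vote for centers (x0, y0) of ellipses with known semi-axes a (along x) and b (along y)."""
    acc = Counter()
    for x, y in edge_points:
        for x0 in range(width):
            u = (x - x0) / a
            if abs(u) > 1:                               # no real y0 for this x0
                continue
            dy = b * math.sqrt(1.0 - u * u)
            for y0 in {round(y - dy), round(y + dy)}:    # a set, so dy == 0 votes only once
                if 0 <= y0 < height:
                    acc[(x0, y0)] += 1
    return acc

# Twelve integer points lying exactly on an ellipse with a = 10, b = 5, centered at (30, 20)
offsets = [(10, 0), (-10, 0), (0, 5), (0, -5), (6, 4), (6, -4), (-6, 4), (-6, -4),
           (8, 3), (8, -3), (-8, 3), (-8, -3)]
edges = [(30 + dx, 20 + dy) for dx, dy in offsets]
acc = hough_ellipse_centers(edges, a=10, b=5, width=64, height=48)
print(acc.most_common(3))
```

Only x0 = 30 lies within a of both (20, 20) and (40, 20). That makes the true center (30, 20) the only cell that can collect all 12 votes.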
6. Recall the definition of Fourier descriptors: The coordinates of the points on a contour
are expressed as complex numbers, and the discrete Fourier transform (DFT) is taken
of that series of numbers. What kind of contour shape would have a DFT consisting of
real numbers? Hint: If x_0, …, x_{N-1} are real numbers, then the DFT obeys the symmetry $X_k = X^*_{N-k}$, where the star denotes complex conjugation.
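Solution: Let the points (x_k, y_k) on the contour be expressed as complex numbers s_k = x_k + j y_k. The symmetry in the hint also holds in the other direction: the DFT of s_k consists of real numbers exactly when the sequence is conjugate symmetric, $s_k = s^*_{N-k}$ (indices taken modulo N). Taking the conjugate of a complex number changes the sign of its imaginary part, so $x_k + j y_k = x_{N-k} - j y_{N-k}$. Therefore $x_k = x_{N-k}$ and $y_k = -y_{N-k}$: every point (x, y) on the contour is matched by a point (x, -y). So the shapes are symmetric about the x-axis.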
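This can be checked numerically with a direct DFT. The six-point contour below is hypothetical: (3,0), (2,1), (0,2), (-1,0), (0,-2), (2,-1), which is symmetric about the x-axis. A copy with one point moved shows that breaking the symmetry produces complex coefficients.

```python
import cmath

def dft(s):
    """Direct O(N^2) DFT: X_k = sum_n s_n exp(-2*pi*j*k*n/N)."""
    N = len(s)
    return [sum(s[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)) for k in range(N)]

# Contour points (x, y) written as x + jy; symmetric about the x-axis,
# i.e. s_k = conj(s_{N-k}) with indices taken mod N.
symmetric = [3 + 0j, 2 + 1j, 0 + 2j, -1 + 0j, 0 - 2j, 2 - 1j]
# Same contour with the last point moved, which breaks the symmetry.
asymmetric = [3 + 0j, 2 + 1j, 0 + 2j, -1 + 0j, 0 - 2j, 1 - 1j]

for name, s in (("symmetric", symmetric), ("asymmetric", asymmetric)):
    print(name, max(abs(X.imag) for X in dft(s)))
```

For the symmetric contour the largest imaginary part is at the level of floating-point rounding. For the modified contour it is clearly nonzero.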
7. The two checkerboard images below are each 8x8 binary images, where black=0 and
white=1.
[Figure: Image 1 and Image 2, two 8x8 black-and-white checkerboard patterns.]
(a) Compute the co-occurrence matrices for each of the images, using the position
operator “one pixel to the right”.
(b) Compute the “contrast” feature for each image. The contrast feature is derived from the co-occurrence matrix and is defined as
$$\sum_{i=1}^{K}\sum_{j=1}^{K} (i-j)^2\, p_{ij}$$
where p_ij is the probability that a pair of points satisfying the position relationship will have the values (z_i, z_j).
Solution:
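(a) Each image has 8 rows x 7 horizontal pairs = 56 pixel pairs.
Co-occurrence matrix for image 1:
    24    4
     4   24
Co-occurrence matrix for image 2:
    16   12
    12   16
(b)
The normalized matrix (p_ij) for image 1:
    24/56 = 3/7     4/56 = 1/14
    4/56 = 1/14     24/56 = 3/7
The normalized matrix (p_ij) for image 2:
    16/56 = 2/7     12/56 = 3/14
    12/56 = 3/14    16/56 = 2/7
Only the off-diagonal entries contribute to the contrast, each weighted by (i - j)² = 1.
Contrast for image 1: (4 + 4)/56 = 8/56 ≈ 0.14
Contrast for image 2: (12 + 12)/56 = 24/56 ≈ 0.43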
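The original images are not reproduced, so this sketch assumes Image 1 is a checkerboard of 4x4-pixel squares and Image 2 a checkerboard of 2x2-pixel squares. Those choices reproduce the matrices above. It uses exact fractions for the contrast, and the helper names are illustrative.

```python
from fractions import Fraction

def checkerboard(n, square):
    """n x n binary checkerboard whose squares are square x square pixels."""
    return [[(i // square + j // square) % 2 for j in range(n)] for i in range(n)]

def cooccurrence(img, levels=2):
    """Count pairs (z_i, z_j) for the position operator 'one pixel to the right'."""
    C = [[0] * levels for _ in range(levels)]
    for row in img:
        for left, right in zip(row, row[1:]):
            C[left][right] += 1
    return C

def contrast(C):
    """sum_i sum_j (i - j)^2 p_ij, with p_ij = C[i][j] / total (exact fractions)."""
    total = sum(map(sum, C))
    K = len(C)
    return sum(Fraction((i - j) ** 2 * C[i][j], total) for i in range(K) for j in range(K))

for name, img in (("image 1", checkerboard(8, 4)), ("image 2", checkerboard(8, 2))):
    C = cooccurrence(img)
    print(name, C, contrast(C))
```

The contrasts come out as 1/7 and 3/7, i.e. 8/56 and 24/56 as computed by hand.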
8. A 4-level Laplacian image pyramid is formed from a 256x256 image as shown.
(a) How much storage is required to represent a 4-level pyramid, assuming 1 byte per
pixel?
Solution: 256x256 + 128x128 + 64x64 + 32x32 = 87040 bytes. Note that if you generate the complete pyramid of an NxN image (down to 1x1), the total storage is about (4/3)N² ≈ 87381 bytes, only slightly more.
(b) Assume that the smallest gray scale image can be compressed to 5.5 bits per pixel
without loss of data, and the Laplacian images can be compressed to 1.5 bits per pixel
without loss of data. What is the storage required now? Give the compression ratio
with respect to the original image.
Solution:
The smallest gray scale image is 32x32 = 1024 pixels. At 5.5 bits per pixel, this is 5632 bits.
The Laplacian images are 256x256 + 128x128 + 64x64 = 86016 pixels. At 1.5 bits per pixel,
this is 129024 bits.
The total for the pyramid is 134656 bits.
The original gray scale image was 256x256 = 65536 pixels. With no compression, this is
524288 bits.
The compression ratio for the compressed pyramid is 524288/134656 ≈ 3.9, or roughly 4:1.
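The arithmetic in parts (a) and (b) can be reproduced with a few lines. This is a sketch with an illustrative function name. It assumes, as in the problem, that the three larger levels are Laplacian images and the smallest level is the gray-scale image.

```python
def pyramid_storage(n, levels, bits_smallest, bits_laplacian):
    """Return (uncompressed bytes, compressed bits) for a Laplacian pyramid of an n x n image."""
    pixels = [(n >> k) ** 2 for k in range(levels)]     # 256^2, 128^2, 64^2, 32^2 for n = 256
    uncompressed_bytes = sum(pixels)                     # 1 byte per pixel
    compressed_bits = sum(pixels[:-1]) * bits_laplacian + pixels[-1] * bits_smallest
    return uncompressed_bytes, compressed_bits

stored_bytes, bits = pyramid_storage(256, 4, bits_smallest=5.5, bits_laplacian=1.5)
original_bits = 256 * 256 * 8
print(stored_bytes, bits, original_bits / bits)
```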
9. The adaptive local noise reduction filter is defined as:
$$\hat{f}(x,y) = g(x,y) - \frac{\sigma_\eta^2}{\sigma_L^2}\left[g(x,y) - m_L\right]$$
where m_L is the local mean, σ_L² is the local variance, and σ_η² is the variance of the noise.
This filter has which of the following properties (choose two)?
(a) It adapts its window (kernel) size depending on the image statistics.
(b) It acts like a mean filter in areas where the local image variance is close to the
overall noise variance.
(c) It acts like a median filter in areas where the local image variance is close to zero.
(d) It maintains the power spectrum of the original image.
(e) It returns the original image in areas where the local image variance is high.
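Solution: (b) and (e). When σ_L² ≈ σ_η², the ratio is about 1 and the filter returns the local mean m_L, so it behaves like a mean filter. When σ_L² ≫ σ_η², the ratio is close to 0 and the filter returns g(x,y) nearly unchanged.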
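Here is a minimal sketch of the filter on a small list-of-lists image, assuming a square window with replicated borders. The ratio σ_η²/σ_L² is clamped to 1 whenever σ_η² > σ_L². That clamp is a safeguard often used in practice, not part of the definition above, and it keeps the output between g and the local mean. The function name is illustrative.

```python
def adaptive_local_filter(g, noise_var, size=3):
    """f_hat = g - (noise_var / local_var) * (g - local_mean) over a size x size window.

    Border pixels are replicated. The ratio is clamped to 1 (an assumption, not part of
    the definition) so the output always lies between g and the local mean.
    """
    h, w, r = len(g), len(g[0]), size // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            win = [g[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                   for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
            m_L = sum(win) / len(win)
            var_L = sum((v - m_L) ** 2 for v in win) / len(win)
            # In a flat window g and m_L coincide, so any ratio works; use 1.
            ratio = 1.0 if var_L == 0 else min(noise_var / var_L, 1.0)
            out[y][x] = g[y][x] - ratio * (g[y][x] - m_L)
    return out

g = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
print(adaptive_local_filter(g, noise_var=0.0))   # noise-free: g is returned unchanged
print(adaptive_local_filter(g, noise_var=1e6))   # noise dominates: local means are returned
```

The two calls show the two limiting behaviors. With no noise the image passes through unchanged, as in property (e). When the noise variance reaches or exceeds the local variance, the output is the local mean, as in property (b).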
10. Consider a linear, position-invariant degradation system with impulse response
$$h(x,y) = e^{-(x^2+y^2)}$$
(a) Assume that the input image is a bright spot at point (a,b) in the image, and zero
everywhere else. Namely, f(x,y) = δ(x-a, y-b). What is the degraded output image g(x,y)?
Solution:
The shift-invariant system performs a convolution of the input image f(x,y) with the impulse response function h(x,y). The definition of convolution is:
$$h(x,y) * f(x,y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x',y')\, f(x-x',\, y-y')\, dx'\, dy'$$
We are told that the input image is an impulse, or delta function. By definition, the delta function δ(x,y) is zero everywhere except at (x,y) = (0,0). At the origin, its value is infinite (a spike, or impulse). Its area (or volume, in the 2D case) is defined to be equal to 1:
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \delta(x,y)\, dx\, dy = 1$$
Convolving the delta function with another function h(x,y) just returns the function h(x,y):
$$h(x,y) * \delta(x,y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x',y')\, \delta(x-x',\, y-y')\, dx'\, dy' = h(x,y)$$
because the delta function is zero everywhere except at x' = x, y' = y.
If we translate the impulse to x = a, y = b, we get the same result except that h is translated to (a,b):
$$h(x,y) * \delta(x-a,\, y-b) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x',y')\, \delta\big((x-a)-x',\, (y-b)-y'\big)\, dx'\, dy' = h(x-a,\, y-b)$$
because the delta function is zero everywhere except at x' = x-a, y' = y-b.
So the output image g(x,y) is the same as the impulse response but translated to (a,b); namely,
$$g(x,y) = e^{-\left[(x-a)^2 + (y-b)^2\right]}$$
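Here is a discrete analogue of part (a). The sketch samples h on the integer grid and convolves it with a unit impulse at (a, b), which returns h shifted to (a, b). The impulse location (3, -2) and the truncation radius R are arbitrary illustrative values.

```python
import math

a, b = 3, -2      # hypothetical location of the impulse
R = 6             # sum over offsets x', y' in [-R, R] (truncates the infinite sum)

def h(x, y):
    """Impulse response h(x, y) = exp(-(x^2 + y^2)), sampled on integers."""
    return math.exp(-(x * x + y * y))

def f(x, y):
    """Discrete impulse at (a, b)."""
    return 1.0 if (x, y) == (a, b) else 0.0

def g(x, y):
    """Discrete convolution g = h * f: sum over x', y' of h(x', y') f(x - x', y - y')."""
    return sum(h(xp, yp) * f(x - xp, y - yp)
               for xp in range(-R, R + 1) for yp in range(-R, R + 1))

for x, y in [(3, -2), (4, -2), (4, -1)]:
    print((x, y), g(x, y), h(x - a, y - b))     # the two columns agree
```

Only the term with x' = x - a, y' = y - b is nonzero, which is the discrete counterpart of the sifting argument above.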
(b) Assume that the input image is a bright vertical line located at x=a in the image, and
zero everywhere else.
Namely, f(x,y) = δ(x-a). What is the degraded output image g(x,y)?
Solution:
$$h(x,y) * \delta(x-a) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x',y')\, \delta\big((x-a)-x'\big)\, dx'\, dy' = \int_{-\infty}^{\infty} h(x-a,\, y')\, dy'$$
$$= \int_{-\infty}^{\infty} e^{-\left[(x-a)^2 + y'^2\right]}\, dy' = e^{-(x-a)^2}\int_{-\infty}^{\infty} e^{-y'^2}\, dy'$$
The integral of e^{-y²} over all y is equal to √π. The output image g(x,y) is a vertical line that is blurred in the x direction. Namely,
$$g(x,y) = \sqrt{\pi}\, e^{-(x-a)^2}$$
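Here is a numerical check of part (b). The sketch uses a midpoint-rule estimate of the y'-integral, truncated to |y'| ≤ 10 where the Gaussian is negligible, and compares it with √π e^{-(x-a)²}. The step count and truncation limit are illustrative choices.

```python
import math

def blurred_line(x, a=0.0, half_width=10.0, n=20000):
    """Midpoint-rule estimate of the integral over y' of exp(-((x - a)^2 + y'^2)).

    The result does not depend on y, which is why the output is a vertical band.
    """
    dy = 2 * half_width / n
    total = sum(math.exp(-((x - a) ** 2 + (-half_width + (k + 0.5) * dy) ** 2))
                for k in range(n))
    return total * dy

for x in (0.0, 0.5, 1.0):
    print(x, blurred_line(x), math.sqrt(math.pi) * math.exp(-x * x))
```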
11. An image is degraded by a Gaussian blur whose Fourier transform H(u,v) is a Gaussian,
$$H(u,v) = e^{-(u^2+v^2)/(2\sigma^2)}$$
where σ = 20. A one-dimensional cross section of H through the u-axis is shown below.
[Figure: cross section H(u) versus u for -50 ≤ u ≤ 50, with the vertical axis from 0 to 1.]
The blurred image is corrupted by noise with power spectrum S_η(u,v). We know that S_η(u,v) ≫ S_f(u,v) for values of u² + v² > 2σ², where S_f is the power spectrum of the original image. Also, S_η(u,v) ≪ S_f(u,v) for all values of u² + v² < 2σ², except at the frequencies (u,v) = (±σ, 0), where S_η(u,v) = S_f(u,v) (this is due to an additional periodic noise component).
A Wiener filter R_W(u,v) is used to restore the signal, such that
$$\hat{F}(u,v) = R_W(u,v)\, G(u,v)$$
where F̂(u,v) is the estimated Fourier transform of the restored signal and G(u,v) is the Fourier transform of the measured (degraded) signal. Recall that R_W(u,v) is defined as
$$R_W(u,v) = \frac{H^*(u,v)}{|H(u,v)|^2 + S_\eta(u,v)/S_f(u,v)}$$
What are the values of R_W(u,v) at (u,v) = (0,0) and (u,v) = (σ,0)?
Sketch a cross section of R_W(u,v) through the u-axis.
Solution:
At values of (u,v) such that u² + v² < 2σ² (except the two points (±σ, 0)), S_η/S_f ≈ 0, so the filter is just the inverse of H(u,v):
$$R_W(u,v) \approx \frac{1}{H(u,v)} = e^{(u^2+v^2)/(2\sigma^2)}$$
So at (0,0), it is equal to 1, and it grows to about e ≈ 2.72 just inside the circle u² + v² = 2σ². At values of (u,v) such that u² + v² > 2σ², S_η/S_f is very large and the filter drops to nearly zero. At the two points (u,v) = (±σ, 0), S_η/S_f = 1 and |H|² = e^{-1}, so
$$R_W(\pm\sigma, 0) = \frac{e^{-1/2}}{e^{-1} + 1} \approx 0.44$$
A one-dimensional cross section:
[Figure: cross section of R_W along the u-axis, with the vertical axis from 0 to 3.]
[Figures: R as a two-dimensional image and R as a surface plot.]
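Here is a sketch that evaluates R_W along the u-axis. The problem gives S_η/S_f only qualitatively, so the function noise_to_signal is hypothetical. It uses 0 to stand for "≪", 10^6 for "≫", and exactly 1 at (±σ, 0). H is real, so H* = H.

```python
import math

SIGMA = 20.0

def H(u, v, sigma=SIGMA):
    """Gaussian blur transfer function H(u, v) = exp(-(u^2 + v^2) / (2 sigma^2))."""
    return math.exp(-(u * u + v * v) / (2 * sigma * sigma))

def noise_to_signal(u, v, sigma=SIGMA):
    """Hypothetical S_eta/S_f: 0 stands for '<<', 1e6 for '>>', and 1 at (+-sigma, 0)."""
    if abs(u) == sigma and v == 0:
        return 1.0
    return 0.0 if u * u + v * v < 2 * sigma * sigma else 1e6

def wiener(u, v):
    """R_W = H* / (|H|^2 + S_eta/S_f); H is real here, so H* = H."""
    Huv = H(u, v)
    return Huv / (Huv * Huv + noise_to_signal(u, v))

for u in (0, 10, 20, 25, 28, 30, 40):
    print(u, round(wiener(u, 0), 3))
```

The printed values trace the sketched cross section. R_W is 1 at u = 0, dips to about 0.44 at u = σ = 20, rises toward e as u approaches σ√2 ≈ 28.3, and is essentially zero beyond that.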
12. The images below are quite different, but their histograms are the same. Suppose each
image is blurred with a 3x3 box filter smoothing mask.
(a) Would the histograms still be identical after blurring?
Solution: No. The number of boundary points between black and white regions is much larger in the image on the right. When the images are blurred, the pixels near each boundary take intermediate values between 0 and 1. The image on the right therefore ends up with many more such values, and the histograms of the two blurred images will be different.
(b) If the answer is yes, sketch the histograms. If the answer is no, just sketch the
histogram of the left image. Note: Assume that the white regions have value=1 and the
dark regions have value=0. You can ignore the border effects around the outside border of
the image, by assuming that the pixels outside the border are replicated from the values
just inside the border.
Solution: Assume that the image is of size N x N. Blurring is implemented by a 3 x 3 mask whose coefficients are 1/9. The mask does not change the image except for the two columns immediately adjacent to the center border between the black and white regions. We still have almost half the pixels with value 0 and half with value 1. There is one column with value 1/3 (= 3/9) and one column with value 2/3 (= 6/9). So, ignoring border effects, the approximate number of pixels with each value is:

Value    No. of points
0        ≈ N²/2
3/9      N
6/9      N
1        ≈ N²/2

A histogram is easily constructed from the entries in this table.
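The counts can be checked on a small case. This sketch assumes N = 8 with the left half black (0) and the right half white (1). Which half is black is an assumption; swapping them gives the same histogram. Exact fractions keep 3/9 and 6/9 free of floating-point rounding, and the function name is illustrative.

```python
from collections import Counter
from fractions import Fraction

def box_blur_3x3(img):
    """3x3 mean filter with exact fractions; border pixels are replicated."""
    n = len(img)

    def clamp(k):
        return min(max(k, 0), n - 1)

    return [[Fraction(sum(img[clamp(i + di)][clamp(j + dj)]
                          for di in (-1, 0, 1) for dj in (-1, 0, 1)), 9)
             for j in range(n)] for i in range(n)]

N = 8
left = [[0 if j < N // 2 else 1 for j in range(N)] for i in range(N)]   # half black, half white
hist = Counter(v for row in box_blur_3x3(left) for v in row)
for value in sorted(hist):
    print(value, hist[value])
```

For N = 8 this gives 24 pixels at 0, 8 at 1/3, 8 at 2/3, and 24 at 1, i.e. N²/2 - N, N, N, and N²/2 - N. The table above rounds the two extreme counts to N²/2.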
13. We have an image that contains 3-bit gray values with the following probabilities.

Gray level     0     1     2     3     4      5     6      7
Probability    0.4   0     0     0.1   0.25   0.2   0.05   0
(a) Calculate the entropy of this image, in units of bits/pixel.
Solution: The entropy is defined as
$$H(z) = -\sum_{j=1}^{J} P(a_j)\log_2 P(a_j)$$
where terms with zero probability contribute 0. In our case, this is
H = -[(0.4) log2(0.4) + (0.1) log2(0.1) + (0.25) log2(0.25) + (0.2) log2(0.2) + (0.05) log2(0.05)]
  = 2.04 bits/pixel
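Here is a quick numerical check of the entropy. It uses base-2 logarithms so the result is in bits per pixel. Zero-probability levels are skipped, following the usual 0 log 0 = 0 convention.

```python
import math

def entropy_bits(probs):
    """H = -sum p log2 p over the nonzero probabilities (0 log 0 is taken as 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = [0.4, 0, 0, 0.1, 0.25, 0.2, 0.05, 0]     # gray levels 0..7
print(round(entropy_bits(probs), 2))              # 2.04
```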
(b) Calculate a Huffman code for this image, and give the average length of the code in
units of bits/pixel.
Solution:
Source reductions (code words in parentheses; gray levels 1, 2, and 7 have zero probability and receive no code):

Sorted          Reduction 1     Reduction 2     Reduction 3
0.4  (1)        0.4  (1)        0.4  (1)        0.6 (0)
0.25 (01)       0.25 (01)       0.35 (00)       0.4 (1)
0.2  (000)      0.2  (000)      0.25 (01)
0.1  (0010)     0.15 (001)
0.05 (0011)

The Huffman code is:

Gray level    Probability    Huffman code
0             0.4            1
1             0              -
2             0              -
3             0.1            0010
4             0.25           01
5             0.2            000
6             0.05           0011
7             0              -
Average length = P(0) * len(0) + P(1) * len(1) + … + P(7) * len(7)
= (0.4)(1) + (0.1)(4) + (0.25)(2) + (0.2)(3) + (0.05)(4)
= 2.1 bits/pixel
So we are still a little above the entropy, which is the lower bound of 2.04 bits/pixel.
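Here is a compact Huffman construction using Python's heapq, offered as one possible implementation. Ties between equal probabilities would be broken by insertion order, though there are none here. Giving bit 1 to the less probable branch at each merge is an assumption. It happens to reproduce the code words in the table above, and for these probabilities any bit assignment gives the same code lengths.

```python
import heapq
import itertools

def huffman_code(probs):
    """probs: dict symbol -> probability. Zero-probability symbols receive no code."""
    tiebreak = itertools.count()
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items() if p > 0]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # least probable group
        p2, _, c2 = heapq.heappop(heap)   # second least probable group
        merged = {s: "1" + code for s, code in c1.items()}
        merged.update({s: "0" + code for s, code in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

probs = {0: 0.4, 1: 0, 2: 0, 3: 0.1, 4: 0.25, 5: 0.2, 6: 0.05, 7: 0}
code = huffman_code(probs)
avg = sum(probs[s] * len(c) for s, c in code.items())
print(code, avg)
```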