Sample Problem Set 2 – SOLUTIONS
CSCI 510 / EENG 510

The exam covers these sections in the Gonzalez and Woods textbook: Ch 10 (10.1-10.3.3), Ch 11 (11.1.2, 11.2.2-11.2.4, 11.3.3-11.3.4), Ch 5 (5.1-5.3, 5.5-5.8), Ch 7 (7.1.1), and Ch 8 (8.1-8.2.9). It will be closed book, but handwritten notes are allowed. The problems below are representative of exam problems (although there may be more problems than would appear on the actual exam). Some of the problems below are drawn from previous exams.

1. Explain why the edges found by the Laplacian of a Gaussian edge operator form closed contours.

Solution: The edges are the zero crossings of the values resulting from convolving the image with the Laplacian of Gaussian operator. In other words, the edges are the boundaries between regions with negative values and regions with positive values. Each of those regions is a connected component, and a contour surrounding a connected component must be a closed path.

2. The left image below is a noisy 200x200 image of two squares, rotated at 45 degrees. An edge detection operation is performed to yield the binary edge image on the right. Where are the peaks in the Hough transform; i.e., what values of $(\rho, \theta)$? Describe the heights of the peaks relative to each other. Note: assume the normal representation of a line, $x\cos\theta + y\sin\theta = \rho$, where the origin is at the upper left, the x axis points to the right, and the y axis points down.

[Figure: noisy input image (left) and binary edge image (right), with x pointing right and y pointing down.]

Solution: The geometry is as shown below. The lines at $\theta = +45°$ have $\rho$ equal to $50\sqrt{2}$, $100\sqrt{2}$, or $150\sqrt{2}$. The lines at $\theta = -45°$ have $\rho$ equal to $-50\sqrt{2}$, $0$, or $50\sqrt{2}$. (Or equivalently, you could have one line at $\theta = -45°, \rho = 50\sqrt{2}$, another at $\theta = -45°, \rho = 0$, and the third at $\theta = 135°, \rho = 50\sqrt{2}$.) The peaks at $(\theta, \rho) = (+45°, 100\sqrt{2})$ and $(\theta, \rho) = (-45°, 0)$ are twice as high as the others, because each of those two lines contains an edge of both squares.

[Figure: geometry of the two rotated squares, with the -45° and +45° lines drawn in x-y coordinates.]

3. The steps for the basic global thresholding algorithm are given below. Restate the algorithm to use the histogram of the image, h(r_k), instead of the image itself.
   1. Select an initial estimate for the threshold T.
   2. Segment the image using T to obtain two groups of pixels: G1 is all pixels with values < T and G2 is all pixels with values >= T.
   3. Compute the mean values m1 and m2 for groups G1 and G2, respectively.
   4. The new threshold is T = (m1 + m2)/2.
   5. Repeat steps 2 through 4 until there is no further change in T.

Solution: We first compute the probabilities of the pixel values from the histogram, $p_k = h(r_k)/N$, where N is the total number of pixels in the image. The means can then be computed directly from the probabilities:

$$m_1 = \frac{\sum_{i=0}^{T-1} i\,p_i}{\sum_{i=0}^{T-1} p_i}, \qquad m_2 = \frac{\sum_{i=T}^{L-1} i\,p_i}{\sum_{i=T}^{L-1} p_i}$$

where the gray levels are $0, 1, \ldots, L-1$. Steps 1, 4, and 5 are unchanged, so each iteration works on the L histogram entries instead of on all N pixels.

4. The basic global thresholding algorithm is applied to an image with the histogram shown below.
   (a) What value for the threshold does the algorithm find? Assume that the initial estimate for the threshold is somewhere between the minimum and maximum values in the image.
   (b) Is the threshold obtained with the algorithm dependent on the initial estimate (assuming that the initial estimate is somewhere between the minimum and maximum values)? Give an example to support your conclusion.

[Figure: histogram H(r) with bars H(100) = 50, H(150) = 10, and H(180) = 40.]

Solution: The basic global thresholding algorithm is: (1) Select an initial estimate of the threshold T. (2) Find the mean of the pixels less than or equal to T and the mean of the pixels greater than T. (3) Compute the new threshold as the midpoint between the means. (4) Repeat steps 2 and 3 until there are no more changes to T.

(a) There are only three values in the image: 100, 150, and 180. Pick the initial threshold T somewhere between 100 and 150, say 140 (it doesn't matter where; the results will be the same). The mean of the pixels less than T is $\mu_1 = 100$. The mean of the pixels greater than T is

$$\mu_2 = \frac{150(10) + 180(40)}{50} = 174.$$

The new threshold is $T = (\mu_1 + \mu_2)/2 = (100 + 174)/2 = 137$. Since 137 also lies between 100 and 150, $\mu_1$ and $\mu_2$ will be the same in the next iteration, so this is the final answer for T.

(b) The initial estimate of T doesn't matter. If you chose T somewhere between 150 and 180 (say, 160), the algorithm would quickly move T to between 100 and 150 and then converge to the answer found above. As an example, the mean of the pixels less than T is

$$\mu_1 = \frac{100(50) + 150(10)}{60} = 108.33.$$

The mean of the pixels greater than T is $\mu_2 = 180$. The new threshold is $T = (108.33 + 180)/2 = 144.17$. The next iteration of the algorithm will move T to its final answer of 137.

5. The Hough transform algorithm for detecting lines (of the form $x\cos\theta + y\sin\theta = \rho$) can be written as follows:

   Input a binary edge image E(x,y)
   Initialize accumulator array A(i,j) to zeros
   for all values of (x,y)
     if E(x,y) == 1
       for all values of θ_j between θ_min and θ_max
         Compute ρ = x cos θ_j + y sin θ_j
         Increment A(i,j), where (i,j) corresponds to the cell associated with (ρ_i, θ_j)
       end
     end
   end
   Search for peaks in A(i,j); the corresponding values of (ρ_i, θ_j) are the parameters of the detected lines

Rewrite the algorithm to detect ellipses of a specific size and shape. Recall that the equation of an ellipse (centered at the origin) is $x^2/a^2 + y^2/b^2 = 1$. The ellipses can be centered at any location in the image. As in the example above, you can assume that the input is a binary edge image E(x,y).

Solution: The ellipses will be parameterized by their center $(x_0, y_0)$ in the equation

$$\frac{(x - x_0)^2}{a^2} + \frac{(y - y_0)^2}{b^2} = 1.$$

Given a hypothesized value for $x_0$, we can solve for $y_0$ using

$$y_0 = y \pm b\sqrt{1 - \frac{(x - x_0)^2}{a^2}},$$

which has real solutions only when $|x - x_0| \le a$.

   for all pixels (x,y) in the edge image E
     if E(x,y) is an edge point
       for each possible value of x0 (with |x - x0| <= a)
         compute the corresponding values of y0 (using the equation above)
         increment accumulator array A at those locations (x0,y0)
       end
     end
   end
   Search for peaks in A(x0,y0); each peak is the center of a detected ellipse

6. Recall the definition of Fourier descriptors: the coordinates of the points on a contour are expressed as complex numbers, and the discrete Fourier transform (DFT) is taken of that series of numbers. What kind of contour shape would have a DFT consisting of real numbers?
Hint: If $x_0, \ldots, x_{N-1}$ are real numbers, then the DFT obeys the symmetry $X_k = X^*_{N-k}$, where the star denotes complex conjugation.

Solution: The points $(x_k, y_k)$ on the contour are expressed as the complex numbers $s_k = x_k + j y_k$. Applying the same symmetry argument to the inverse DFT, if the DFT values $S_k$ are all real, then the contour points satisfy $s_k = s^*_{N-k}$. Taking the conjugate of a complex number changes the sign of its imaginary part, so $x_k + j y_k = x_{N-k} - j y_{N-k}$. Therefore $x_k = x_{N-k}$ and $y_k = -y_{N-k}$, so the shapes are symmetric about the x-axis.

7. The two checkerboard images below are each 8x8 binary images, where black = 0 and white = 1.

[Figure: Image 1 and Image 2.]

(a) Compute the co-occurrence matrices for each of the images, using the position operator "one pixel to the right".
(b) Compute the "contrast" feature for each image. The contrast feature is derived from the co-occurrence matrix and is defined as

$$\sum_{i=1}^{K} \sum_{j=1}^{K} (i - j)^2 p_{ij}$$

where $p_{ij}$ is the probability that a pair of points satisfying the position relationship will have the values $(z_i, z_j)$.

Solution:
(a) Each image has 8 rows x 7 horizontal pairs = 56 pairs, so each matrix sums to 56.
Co-occurrence matrix for image 1: $\begin{bmatrix} 24 & 4 \\ 4 & 24 \end{bmatrix}$
Co-occurrence matrix for image 2: $\begin{bmatrix} 16 & 12 \\ 12 & 16 \end{bmatrix}$

(b) The normalized matrix $(p_{ij})$ for image 1: $\begin{bmatrix} 24/56 = 3/7 & 4/56 = 1/14 \\ 4/56 = 1/14 & 24/56 = 3/7 \end{bmatrix}$
The normalized matrix $(p_{ij})$ for image 2: $\begin{bmatrix} 16/56 = 2/7 & 12/56 = 3/14 \\ 12/56 = 3/14 & 16/56 = 2/7 \end{bmatrix}$
Only the off-diagonal entries contribute to the contrast, each with $(i - j)^2 = 1$.
Contrast for image 1: 4/56 + 4/56 = 8/56 ≈ 0.14
Contrast for image 2: 12/56 + 12/56 = 24/56 ≈ 0.43

8. A 4-level Laplacian image pyramid is formed from a 256x256 image as shown.
(a) How much storage is required to represent a 4-level pyramid, assuming 1 byte per pixel?

Solution: 256x256 + 128x128 + 64x64 + 32x32 = 87040 bytes. Note that if you generate the complete pyramid of an NxN image, the total storage is about $(4/3)N^2 \approx 87381$ bytes, only a little bit more.

(b) Assume that the smallest gray scale image can be compressed to 5.5 bits per pixel without loss of data, and the Laplacian images can be compressed to 1.5 bits per pixel without loss of data. What is the storage required now?
Give the compression ratio with respect to the original image.

Solution: The smallest gray scale image is 32x32 = 1024 pixels. At 5.5 bits per pixel, this is 5632 bits. The Laplacian images are 256x256 + 128x128 + 64x64 = 86016 pixels. At 1.5 bits per pixel, this is 129024 bits. The total for the pyramid is 134656 bits (16832 bytes). The original gray scale image was 256x256 = 65536 pixels. With no compression, this is 524288 bits. The compression ratio for the compressed pyramid is 524288/134656 ≈ 3.9, or roughly 4:1.

9. The adaptive local noise reduction filter is defined as:

$$\hat{f}(x,y) = g(x,y) - \frac{\sigma_\eta^2}{\sigma_L^2}\left[g(x,y) - m_L\right]$$

where $m_L$ is the local mean, $\sigma_L^2$ is the local variance, and $\sigma_\eta^2$ is the variance of the noise. This filter has which of the following properties (choose two)?
(a) It adapts its window (kernel) size depending on the image statistics.
(b) It acts like a mean filter in areas where the local image variance is close to the overall noise variance.
(c) It acts like a median filter in areas where the local image variance is close to zero.
(d) It maintains the power spectrum of the original image.
(e) It returns the original image in areas where the local image variance is high.

Solution: (b) and (e). When $\sigma_L^2 \approx \sigma_\eta^2$, the ratio is close to 1 and $\hat{f}(x,y) \approx m_L$, the local mean. When $\sigma_L^2 \gg \sigma_\eta^2$, the ratio is close to 0 and $\hat{f}(x,y) \approx g(x,y)$.

10. Consider a linear, position invariant degradation system with impulse response $h(x,y) = e^{-(x^2 + y^2)}$.
(a) Assume that the input image is a bright spot at point (a,b) in the image, and zero everywhere else. Namely, $f(x,y) = \delta(x-a, y-b)$. What is the degraded output image g(x,y)?

Solution: The shift invariant system performs a convolution of the input image f(x,y) with the impulse response function h(x,y). The definition of convolution is (all integrals here run from $-\infty$ to $\infty$):

$$h(x,y) \star f(x,y) = \iint h(x',y')\, f(x - x', y - y')\, dx'\, dy'$$

We are told that the input image is an impulse, or delta function. By definition, the delta function $\delta(x,y)$ is zero everywhere except at (x,y) = (0,0). At the origin, its value is infinite (a spike, or impulse).
Also, its area (or volume, in the 2D case) is defined to be equal to 1:

$$\iint \delta(x,y)\, dx\, dy = 1$$

Convolving the delta function with another function h(x,y) just returns the function h(x,y):

$$h(x,y) \star \delta(x,y) = \iint h(x',y')\, \delta(x - x', y - y')\, dx'\, dy' = h(x,y)$$

because the delta function is zero everywhere except at x' = x, y' = y.

If we translate the impulse to x = a, y = b, we get the same result except that h is translated to (a,b):

$$h(x,y) \star \delta(x-a, y-b) = \iint h(x',y')\, \delta(x - a - x', y - b - y')\, dx'\, dy' = h(x-a, y-b)$$

because the delta function is zero everywhere except at x' = x - a, y' = y - b. So the output image g(x,y) is the same as the impulse response but translated to (a,b); namely,

$$g(x,y) = e^{-[(x-a)^2 + (y-b)^2]}$$

(b) Assume that the input image is a bright vertical line located at x = a in the image, and zero everywhere else. Namely, $f(x,y) = \delta(x-a)$. What is the degraded output image g(x,y)?

Solution:

$$h(x,y) \star \delta(x-a) = \iint h(x',y')\, \delta(x - a - x')\, dx'\, dy' = \int h(x-a, y')\, dy' = \int e^{-[(x-a)^2 + y'^2]}\, dy' = e^{-(x-a)^2} \int e^{-y'^2}\, dy'$$

The integral of $e^{-y^2}$ over all y is equal to $\sqrt{\pi}$. The output image g(x,y) is a vertical line that is blurred in the x direction. Namely, $g(x,y) = \sqrt{\pi}\, e^{-(x-a)^2}$.

11. An image is degraded by a Gaussian blur whose Fourier transform H(u,v) is a Gaussian, $H(u,v) = e^{-(u^2 + v^2)/(2\sigma^2)}$, where $\sigma = 20$. A one-dimensional cross section of H through the u-axis is shown below.

[Figure: cross section H(u) for -50 ≤ u ≤ 50, with peak value H(0) = 1.]

The blurred image is corrupted by noise with power spectrum $S_\eta(u,v)$. We know that $S_\eta(u,v) \gg S_f(u,v)$ for values of $(u^2 + v^2) > 2\sigma^2$, where $S_f$ is the power spectrum of the original image. Also, $S_\eta(u,v) \ll S_f(u,v)$ for all values of $(u^2 + v^2) < 2\sigma^2$, except for the frequencies $(u,v) = (\pm\sigma, 0)$, where $S_\eta(u,v) = S_f(u,v)$ (this is due to an additional periodic noise component).
A Wiener filter $R_W(u,v)$ is used to restore the signal, such that $\hat{F}(u,v) = R_W(u,v)\, G(u,v)$, where $\hat{F}(u,v)$ is the estimated Fourier transform of the restored signal and $G(u,v)$ is the Fourier transform of the measured (degraded) signal. Recall that $R_W(u,v)$ is defined as

$$R_W(u,v) = \frac{H^*(u,v)}{|H(u,v)|^2 + S_\eta(u,v)/S_f(u,v)}$$

What are the values of $R_W(u,v)$ at $(u,v) = (0,0)$ and $(u,v) = (\pm\sigma, 0)$? Sketch a cross section of $R_W(u,v)$ through the u-axis.

Solution: At values of (u,v) such that $(u^2 + v^2) < 2\sigma^2$, the noise term is negligible and the filter is just the inverse of H(u,v), or $R_W(u,v) = e^{(u^2 + v^2)/(2\sigma^2)}$, except at the two points $(u,v) = (\pm\sigma, 0)$. So at (0,0), it is equal to 1. At the two points $(\pm\sigma, 0)$, $|H|^2 = e^{-1}$ and $S_\eta/S_f = 1$, so

$$R_W(\pm\sigma, 0) = \frac{e^{-1/2}}{e^{-1} + 1} \approx 0.44.$$

At values of (u,v) such that $(u^2 + v^2) > 2\sigma^2$, the filter drops to close to zero.

Along the u-axis, the cross section therefore rises from 1 at u = 0 toward $e \approx 2.72$ as |u| approaches $\sqrt{2}\,\sigma \approx 28$, with a notch down to about 0.44 at $u = \pm\sigma = \pm 20$, and is close to zero beyond $|u| = \sqrt{2}\,\sigma$.

[Figure: one-dimensional cross section of R_W through the u-axis; R_W as a two-dimensional image; R_W as a surface plot.]

12. The images below are quite different, but their histograms are the same. Suppose each image is blurred with a 3x3 box filter smoothing mask.
(a) Would the histograms still be identical after blurring?

Solution: No. The number of boundary points between black and white regions is much larger in the image on the right. When the images are blurred, the boundary points give rise to intermediate gray values, and the image on the right has many more of them, so the histograms of the two blurred images will be different.

(b) If the answer is yes, sketch the histograms. If the answer is no, just sketch the histogram of the left image. Note: Assume that the white regions have value = 1 and the dark regions have value = 0. You can ignore the border effects around the outside border of the image by assuming that the pixels outside the border are replicated from the values just inside the border.

Solution: Assume that the image is of size N x N. Blurring is implemented by a 3 x 3 mask whose coefficients are 1/9.
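The blur of the left image is simple enough to simulate directly, which gives a quick check on the pixel counts in this solution. This is a minimal sketch, assuming the left image is an N x N array whose left half is 0 and right half is 1, with a single vertical boundary at the center; the helper names `box_blur_3x3`, `half_and_half`, and `histogram` are hypothetical. Exact fractions are used so that values such as 3/9 are counted without floating-point rounding.

```python
from collections import Counter
from fractions import Fraction


def box_blur_3x3(img):
    """3x3 box (mean) filter; pixels outside the border replicate the nearest edge pixel."""
    rows, cols = len(img), len(img[0])

    def pixel(y, x):
        # Clamp the coordinates into the image (border replication).
        return img[min(max(y, 0), rows - 1)][min(max(x, 0), cols - 1)]

    # Fraction(total, 9) keeps each output exact (3/9 is stored as 1/3).
    return [
        [
            Fraction(sum(pixel(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)), 9)
            for x in range(cols)
        ]
        for y in range(rows)
    ]


def half_and_half(n):
    """Hypothetical stand-in for the left image: left half 0 (black), right half 1 (white)."""
    return [[0 if x < n // 2 else 1 for x in range(n)] for _ in range(n)]


def histogram(img):
    """Number of pixels taking each value."""
    return Counter(value for row in img for value in row)


hist = histogram(box_blur_3x3(half_and_half(8)))
for value in sorted(hist):
    print("value", value, "->", hist[value], "pixels")
```

For N = 8 this reports 24, 8, 8, and 24 pixels at the values 0, 1/3, 2/3, and 1, that is, N²/2 - N, N, N, and N²/2 - N.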
The mask does not change the image except for the columns immediately adjacent to the center border between the black and white regions. We still have almost half the pixels with value 0 and half with value 1. There is one column with value 1/3 (3/9) and one column with value 2/3 (6/9). So if you ignore the border effects, the approximate number of pixels with each value is:

Value     No. of points
0         ≈ N²/2  (exactly N²/2 - N)
3/9       N
6/9       N
1         ≈ N²/2  (exactly N²/2 - N)

A histogram is easily constructed from the entries in this table.

13. We have an image that contains 3-bit gray values with the following probabilities.

Gray level    0     1    2    3     4      5     6      7
Probability   0.4   0    0    0.1   0.25   0.2   0.05   0

(a) Calculate the entropy of this image, in units of bits/pixel.

Solution: The entropy is defined as $H(z) = -\sum_{j=1}^{J} P(a_j) \log_2 P(a_j)$. Gray levels with zero probability contribute nothing, so in our case this is

$$H = -\left[(0.4)\log_2(0.4) + (0.1)\log_2(0.1) + (0.25)\log_2(0.25) + (0.2)\log_2(0.2) + (0.05)\log_2(0.05)\right] \approx 2.04 \text{ bits/pixel}$$

(b) Calculate a Huffman code for this image, and give the average length of the code in units of bits/pixel.

Solution: Sort the nonzero probabilities and repeatedly combine the two smallest (source reduction), then assign code bits working back from the last reduction. The codeword built for each entry is shown in parentheses:

Sorted         Reduction 1    Reduction 2    Reduction 3
0.4  (1)       0.4  (1)       0.4  (1)       0.6 (0)
0.25 (01)      0.25 (01)      0.35 (00)      0.4 (1)
0.2  (000)     0.2  (000)     0.25 (01)
0.1  (0010)    0.15 (001)
0.05 (0011)

The Huffman code is:

Gray level   Probability   Huffman code
0            0.4           1
1            0             -
2            0             -
3            0.1           0010
4            0.25          01
5            0.2           000
6            0.05          0011
7            0             -

Average length = P(0)·len(0) + P(1)·len(1) + … + P(7)·len(7) = (0.4)(1) + (0.1)(4) + (0.25)(2) + (0.2)(3) + (0.05)(4) = 2.1 bits/pixel

So we are still a little above the absolute minimum of 2.04 bits/pixel given by the entropy.
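Both numbers in problem 13 can be reproduced with a short script. The sketch below is one possible implementation, not the course's reference code: it assumes base-2 logarithms, skips zero-probability gray levels, and builds the Huffman code with a heap in which the less probable group of each merge receives the bit 1 and the more probable group receives 0. That is the bit convention used in the source-reduction table above, and with these probabilities it yields the same codewords. The names `PROBS`, `entropy`, `huffman_code`, and `average_length` are hypothetical.

```python
import heapq
import itertools
import math

# Gray level -> probability, taken from the table in problem 13.
PROBS = {0: 0.4, 1: 0.0, 2: 0.0, 3: 0.1, 4: 0.25, 5: 0.2, 6: 0.05, 7: 0.0}


def entropy(probs):
    """First-order entropy in bits/pixel; levels with P = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def huffman_code(probs):
    """Binary Huffman code (level -> bit string) for the levels with nonzero probability."""
    order = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(p, next(order), {level: ""}) for level, p in probs.items() if p > 0]
    heapq.heapify(heap)
    while len(heap) > 1:
        p_low, _, low = heapq.heappop(heap)    # least probable group
        p_high, _, high = heapq.heappop(heap)  # second least probable group
        # Prepend a bit: 1 for the less probable group, 0 for the more probable one.
        merged = {level: "1" + code for level, code in low.items()}
        merged.update({level: "0" + code for level, code in high.items()})
        heapq.heappush(heap, (p_low + p_high, next(order), merged))
    return heap[0][2]


def average_length(probs, codes):
    """Expected codeword length in bits/pixel."""
    return sum(probs[level] * len(code) for level, code in codes.items())


codes = huffman_code(PROBS)
print(f"entropy        = {entropy(PROBS):.2f} bits/pixel")
print(f"average length = {average_length(PROBS, codes):.2f} bits/pixel")
for level in sorted(codes):
    print(level, codes[level])
```

This prints an entropy of 2.04 bits/pixel and an average length of 2.10 bits/pixel. With these probabilities no two groups ever tie during merging, so the code lengths (1, 2, 3, 4, 4) are unambiguous; when ties occur, a different but equally short code can result.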