Detection of Scene Obstructions and Persistent View Changes in Transportation Camera Systems*
2012 15th International IEEE Conference on Intelligent Transportation Systems, Anchorage, Alaska, USA, September 16-19, 2012

Ajay Raghavan, Robert Price, and Juan Liu

Abstract— Unattended camera devices are increasingly used in intelligent transportation systems (ITS) for applications such as surveillance, toll collection, and photo enforcement. In these fielded systems, a variety of factors can cause camera obstructions and persistent view changes that may adversely affect performance. Examples include camera misalignment, intentional blockage resulting from vandalism, and natural elements causing obstruction, such as foliage growing into the scene and ice forming on the porthole. In addition, other persistent view changes resulting from new scene elements of interest being captured, such as stalled cars or suspicious packages, might warrant alarms. Since these systems are often unattended, it is important to detect such incidents automatically and early. In this paper, we describe algorithms to address these problems. A novel approach that uses the image edge map to detect near-field obstructions without a reference image of the unobstructed scene is presented. A second algorithm that detects more generic obstructions and persistent view changes using a learned scene element cluster map is then discussed. Lastly, an approach to detect and distinguish persistent view changes from changes in the orientation of the fixed camera system is explained. Together, these algorithms can be useful in a variety of camera-based ITS.

*Research supported by Xerox Corporation. Ajay Raghavan (corresponding author; phone: 1-650-812-4724), Robert Price, and Juan Liu are with the Palo Alto Research Center (PARC, A Xerox Company), 3333 Coyote Hill Road, Palo Alto, CA 94304, USA (email: {raghavan, bprice, jjliu}@parc.com).
I. INTRODUCTION

Unattended fixed camera-based devices are now key components of various intelligent transportation systems. They find application in automated toll collection, traffic incident monitoring, and photo enforcement of traffic rules such as red lights, speed limits, and stop signs. Since these are unattended fielded systems, it is important to check them periodically for image/video quality problems that might interfere with their intended functionality. Without such checks, downtime and device degradation losses can quickly add up to a substantial portion of operating expenses. Presently, various intelligent transportation solution service providers perform such checks manually. As the fleet of fielded camera devices grows, these checks contribute significant operational overhead, so there is interest in automating them. In addition, for surveillance applications, persistent scene element changes are of interest for detecting suspicious activity or backups due to accidents and other incidents warranting attention. Automated detection of such incidents can help operators monitoring these cameras manage a larger fleet of devices without accidentally missing key events.

Two important classes of problems that affect such fixed camera systems are obstructions and persistent view changes from causes such as undesirable tilting of the camera. Obstructions can cause the transportation scene of interest to be partially blocked or out of view. These can result from various factors: spray paint or blockage due to vandalism, foliage growing into the scene, ice forming on the porthole in winter, construction signage, soot, etc. In addition, for fixed cameras, maintaining the orientation is important to ensure that the scene of interest is framed correctly. Often, subtle unintentional tilting of the camera can cause certain scene elements of interest (e.g., a turn lane) to go out of view. Such tilting from the nominal orientation can be caused by technicians periodically cleaning the device viewing porthole, intentional vandalism, or accidental collisions with vehicles. On some occasions, new scene elements of interest might appear and cause persistent view changes; examples include stalled cars, suspicious packages, or an accident. These might not be problems affecting the camera function per se, but they might be cause for alerting operators, particularly in surveillance applications.

In order to implement automated checks for the above-mentioned problems and incidents, it is desirable to use algorithms that can operate without a full reference image of the same scene without the problem/incident. Transportation scenes captured by fixed cameras are constantly changing due to vehicles, pedestrians, seasonal and daily lighting changes, weather elements, and longer-term changes in background elements (e.g., trees shedding leaves, changing billboards). It is therefore typically not practical to compare identical images with and without particular problems, and there is a need for no-reference or reduced-reference image quality algorithms that can detect these problems.

Among works that have examined camera diagnostics for image quality, Hallowell et al. [1] explored the use of surveillance cameras for airport visibility-related weather and roadway condition monitoring. They discussed the concept of composite images for a camera aimed at a site, using a map of "expected" edges based on a series of training images for that site and ignoring "transient" edges to avoid data quality issues from obstructions such as rain drops and dust. Harasse et al. [2] discussed the problem of detecting surveillance video camera issues such as obstructions, displacements, and focus problems using a similar idea of "stable" edges (also learned for each site) and examining how their characteristics change from frame to frame. In this respect, it is arguably easier to deal with a continuous stream of video frames, where the previous frame serves as a "reference" against which significant, abrupt changes can be examined. The algorithms presented in this paper are not restricted to such camera devices; they are also applicable to devices that capture photographs and/or video clips intermittently when triggered by vehicle detection or other events of interest.

Automated detection of persistent changes caused by incidents of interest is useful in many domains. For instance, one may wish to detect the introduction of a suspicious piece of luggage (a persistent change) against the background scene of a busy airport (many transient changes). Many camera monitoring systems provide functions to detect transient changes, but the detection of persistent changes in dynamically varying scenes is still largely performed by human agents examining images and videos, a process that is tedious and error prone. Automated site-specific persistent change detection is thus a subject of significant interest. However, such detection in outdoor scenes, with their naturally varying illumination due to sun position and weather effects, is difficult. Local distribution models and intensity-invariant image spaces have been found to partially address some of these challenges [3]. The segmentation of images into regions and the detection of changes in images are both well-explored topics in computer vision. For instance, an SVM classifier has been used to classify pixels based on color into sky and non-sky segments for aircraft monitoring [4]. This method is simple, but requires training with labeled sky regions. Many algorithms have been proposed in the domain of security monitoring; typically these are based on simple frame differences [5] or incorporate specific dynamic models for tracking people [6]. Algorithms that can find persistent changes in scenes without training specific models for specific object types are desirable, since we may not know a priori what type of changes we are looking for.

In the rest of this paper, we present three algorithms that address these issues. First, a novel approach for reference-image-independent near-field obstruction detection is discussed. The second algorithm characterizes the scene of interest by grouping feature-level descriptors into classes (without any manual tagging) and detects persistent view changes by looking for class changes at the feature level. The third algorithm detects shifts in the framing based on static scene elements. Each of these is described in the following sections.
II. REFERENCE IMAGE-FREE OBSTRUCTION DETECTION

A. Edge Density Metric for Near-Field Obstructions

To address the above-mentioned blockage problems, we propose a near-field scene obstruction detection method that does not require a reference image of the scene of interest from the particular camera being monitored. It uses edge detection [7] over the image captured by the camera and computes the local edge density over an appropriately sized window. The key assumptions made are: a) the scene of interest is largely in focus under nominal conditions, so scene features are sharply captured and the image exhibits a certain homogeneity in edge density (or, more generally, in regions with high local gradients or sharper focus), and b) an obstruction is in the near field relative to the focusing range (see Fig. 1) and hence out of focus. Consequently, the features of the obstruction are blurred, and there is a significant reduction in edge density or in a local focus metric. Given these assumptions, we propose an image quality metric that computes the edge density of an observed image. The edge density metric can then be compared to a threshold value learned from a nominal set of images to detect near-field obstruction. These assumptions are satisfied for a wide variety of obstructions commonly encountered by fixed transportation cameras, and because the method needs no reference image, it extends to pan-tilt-zoom cameras as well. The edge density metric is thus a fast and efficient way of detecting such blockage patterns. Figs. 2 and 3 show representative images with scene obstructions and their corresponding edge maps.

Figure 1. Camera device imaging a typical scene and a near-field obstruction.

Figure 2. (a) Transportation scene captured by a camera with a deliberate obstruction introduced over the porthole; (b) Sobel edge map of the image in (a); (c) deduced binary obstructed-region map overlaid on the grayscale original image.

Figure 3. (a) Foliage growth into a traffic intersection scene causing partial obstruction over the region of interest; (b) Sobel edge map of the image in (a); (c) deduced binary obstructed-region map overlaid on the grayscale original image.
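The edge density computation described above can be prototyped with standard image processing tools. The sketch below is a minimal illustration under assumed settings, not the authors' implementation: it builds a Sobel edge map, marks each pixel as "covered" if any edge pixel falls within a local window centered on it, and reports the fraction of edge-free pixels. The Sobel threshold, window size, and alarm threshold shown are illustrative values only.

```python
import cv2
import numpy as np

def edge_free_fraction(gray, sobel_thresh=40, window=500):
    """Fraction of pixels whose local window contains no edge pixels.

    A high fraction suggests large blurred (possibly obstructed) regions.
    sobel_thresh and window are illustrative, not the paper's exact settings.
    """
    # Sobel gradient magnitude with a hard threshold gives a binary edge map
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    edges = (cv2.magnitude(gx, gy) > sobel_thresh).astype(np.float32)

    # A pixel is "edge-covered" if any edge pixel lies in the window centered on it;
    # a normalized box filter followed by a >0 test implements this efficiently.
    covered = cv2.boxFilter(edges, -1, (window, window), normalize=True) > 0

    # Edge-free pixels are those with no nearby edge
    return 1.0 - float(covered.mean())

# Usage: raise an obstruction alarm when the edge-free fraction exceeds a learned threshold
img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
if edge_free_fraction(img) > 0.04:   # e.g., a 3-5% threshold learned from nominal images
    print("Possible near-field obstruction")
```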
B. Implementation for Obstruction Detection and Training

Fig. 4(a) shows the computation of the edge density metric. First, the image goes through an edge detector to generate an edge map. In practice, we favor a gradient-based edge detector such as the Sobel detector over a soft-edge detector such as the Canny detector [7]. The latter can detect softer edges (i.e., edges with lower gradient intensity) attached to harder edges and enforces edge continuity; in the present context this is disadvantageous, since subtle features within obstructions, such as tree leaves, might be detected as edges, which is undesirable. The second step uses a local window of suitable size (a 500-pixel square worked well on images of size 2700 × 4080) to find larger local pixel neighborhoods of the image without edges. Any given pixel (i, j) is labeled 1 if there is an edge pixel within the local window centered at (i, j), and 0 otherwise. The local windows used for averaging are shown as dashed red boxes on the edge maps in Figs. 2(b) and 3(b). This generates a binary image whose pixel values indicate local edge content. The algorithm then applies connected component analysis to identify clusters of pixels corresponding to edge-free regions. We then summarize the overall edge content as a scalar value that measures the percentage of edge-free regions.

Fig. 4(b) shows the learning phase. The input is a set of images, nominal and/or with blockage. The edge density metric is computed for each image, and the metric values are analyzed over this training set to obtain its nominal distribution. These statistics are used to determine a proper threshold value (3-5% worked well for daytime images). Blockage detection is then straightforward: the percentage of edge-free regions is compared against the learned threshold, and an alarm is raised if it exceeds that threshold. This approach was validated over a set of 23 examples of traffic scenes with a variety of obstructions; of these, only 3 examples with non-homogeneous obstructions were not successfully detected (discussed below). In addition, approximately 200 examples of unobstructed traffic scenes were tested under various traffic and lighting conditions, and no false alarms were raised (except on foggy days). These outlier cases are discussed next.

Figure 4. (a) Flowchart for the computation of the edge density metric; (b) learning to obtain the threshold value from a set of training images.
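The learning phase amounts to estimating the nominal distribution of the metric and choosing an operating point from it. A minimal sketch, assuming the `edge_free_fraction` helper above and a simple percentile-based rule (the paper does not specify the exact statistic used):

```python
import numpy as np

def learn_threshold(nominal_fractions, percentile=99.0, floor=0.03):
    """Pick an alarm threshold from edge-free fractions of nominal (unobstructed) images.

    Uses a high percentile of the nominal distribution, with a floor so the threshold
    never drops below a sensible minimum (e.g., 3%). The percentile and floor are
    illustrative assumptions, not the paper's exact procedure.
    """
    return max(float(np.percentile(nominal_fractions, percentile)), floor)

# Usage with precomputed metrics from a site's daytime training images (hypothetical values)
train_metrics = [0.004, 0.010, 0.007, 0.012, 0.006]
threshold = learn_threshold(train_metrics)   # ~0.03 here, after applying the floor
```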
The assumption that the region of interest exhibits homogeneity in edge density is not always true. For instance, sun glare, overexposure, and scenes with significant portions of clear blue sky during daytime may reduce the overall or local edge density (see Fig. 5). In our implementation, we identify sun glare and overexposure problems and do not check for near-field blockage in those (minority) cases, to avoid false alarms. Sun glare and overexposure can be detected by looking for saturated portions of the image (see Figs. 5(c) and 5(d)). Blue-sky regions are identifiable by their distinctive color signature. In addition, daytime versus nighttime classification needs to be done in the preprocessing stage (addressed by us in [8]); nighttime images tend to have larger featureless regions due to the lack of ambient lighting and therefore need larger obstruction thresholds (at the cost of lower detection sensitivity).

Figure 5. Issues that need to be checked during preprocessing before the edge density method can be used for obstruction detection: (a) sun glare in the scene and (b) overexposure; (c) histogram of the top quarter of the image in (a), which indicates sun glare based on the percentage of clipped/saturated pixels; (d) histogram of the overall image in (b), which similarly indicates overexposure.
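The glare/overexposure gate described above can be approximated by measuring the fraction of near-saturated pixels, over the top quarter of the frame for sun glare and over the whole frame for overexposure. The sketch below is an illustrative assumption about that check; the saturation level and fractions are not the paper's calibrated values.

```python
import numpy as np

def saturation_fraction(gray, level=250):
    """Fraction of pixels at or above a near-saturation intensity (8-bit input)."""
    return float((gray >= level).mean())

def skip_blockage_check(gray, glare_frac=0.25, overexp_frac=0.40):
    """Heuristic gate: skip the edge-density check when glare or overexposure is likely.

    Checks the top quarter of the frame for sun glare and the full frame for
    overexposure. The fractions are illustrative, not the paper's settings.
    """
    top = gray[: gray.shape[0] // 4, :]
    return (saturation_fraction(top) > glare_frac or
            saturation_fraction(gray) > overexp_frac)
```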
There are some limitations of this method. Traffic scenes captured on foggy days tend to lack sufficient hard edges. However, on such days the cameras may not capture clear images of some vehicles, as shown in Fig. 6(a); this is an example of an uncontrollable obstruction, in which case the algorithm alerts to the fact that the camera may be unable to capture the scene of interest with the desired clarity. The algorithm can also miss some subtle obstructions that do not cause significant drops in edge density, such as the one in Fig. 6(b), although such obstructions are less common in transportation applications. Arguably, some of these obstructions cannot be detected without at least some site-specific training done a priori. An algorithm that learns the composition of a specific scene by grouping it into feature clusters is described next. Such methods fall into the class of "reduced-reference" image quality algorithms: while the original image of the scene without degradation is not needed, some of its reduced features are extracted and used for subsequent quality analysis by comparing them with the corresponding features from the test image.

Figure 6. Examples of limitations of the proposed reference-image-independent edge-density-based obstruction detection method: (a) fog in the scene, which is likely to be detected as an obstruction (but might warrant an alarm all the same, since vehicles cannot be seen clearly in such situations), and (b) a small tree with low edge density that cannot be detected.

III. PERSISTENT SCENE CHANGE DETECTION THROUGH A REDUCED-REFERENCE METHOD

To address the above-mentioned persistent view change detection problem, we propose a reduced-reference algorithm in this section. The key requisite element is a method of filtering out transient changes to leave behind persistent changes. One way to separate these types of changes is to exploit their time scale: we classify changes into transient changes lasting a short duration and persistent changes that are visible over an extended time. In the following sections, we show how to combine this idea of filtering at various time scales with methods for abstracting pixels into classes that cope with variations in illumination and appearance, creating a robust mechanism for persistent change detection in dynamically changing scenes. The algorithm proposed in this section can detect both scene obstructions and other persistent scene view changes (the latter not necessarily in the near field of the camera). It can therefore be used in conjunction with the algorithm described in Section II to distinguish between near-field obstructions and other persistent changes.

A. Pixel Features

Our solution has two main components. The first component computes a new representation of the image in an abstract feature space. The features represent pixels in terms of intensity, hue, texture, and location, and naturally discriminate between objects of interest. Foliage tends to have predominantly green and brown tones and a richly textured appearance, and generally lies around the outside of the image. Automobiles tend to have lighter colors and smooth textures, and tend to be in the middle of the picture.

The abstraction process starts with conversion of the RGB image from the camera into L*a*b* space. The L*a*b* color space is a well-known color-opponent space with dimension L* for lightness and a* and b* for the color-opponent dimensions, based on nonlinearly compressed CIE XYZ color space coordinates. L*a*b* space is preferred because unit changes in L*a*b* space correspond to perceptually salient changes in visual stimuli for the human visual system. The L*a*b* space also separates the image intensity dimension from the hue components, providing a natural basis for intensity-invariant features that are robust to changes in illumination. In addition to local intensity and hue features, we also include a feature that captures the local texture around the pixel. The current embodiment employs the local entropy of pixel intensities in each of the three L*a*b* components as a measure of local texture. To compute point-wise entropy, a histogram is obtained over a local neighborhood to get counts of the various intensities, and these counts are normalized by the total number of pixels in the neighborhood to obtain probabilities; the entropy of this distribution measures how uniform the neighborhood of the pixel is. Finally, the x and y coordinates of the pixel are used to associate certain regions with obstacles and to encourage smoothness of classification. The result is a matrix V of pixel features with one row per pixel in the image, each row having the form

V(x, y) = [x, y, L*(x, y), a*(x, y), b*(x, y), H(L*(x, y)), H(a*(x, y)), H(b*(x, y))]   (1)

where H(·) denotes the local entropy computed around pixel (x, y).
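A pixel-feature extractor along these lines can be sketched with common Python imaging tools. The version below is a rough illustration under assumed neighborhood-size and rescaling choices, not the authors' exact implementation: it builds the per-pixel vector of Eq. (1) from an RGB frame using scikit-image for the L*a*b* conversion and the local entropy.

```python
import numpy as np
from skimage import color
from skimage.filters.rank import entropy
from skimage.morphology import square

def pixel_features(rgb, neigh=9):
    """Per-pixel feature vectors [x, y, L*, a*, b*, H(L*), H(a*), H(b*)] as in Eq. (1).

    neigh is the local-entropy neighborhood size (an assumed value); the rank-filter
    entropy needs 8-bit input, so each channel is rescaled to [0, 255] first.
    """
    lab = color.rgb2lab(rgb)
    h, w, _ = lab.shape
    ys, xs = np.mgrid[0:h, 0:w]          # row (y) and column (x) coordinate grids
    feats = [xs, ys]
    for c in range(3):
        chan = lab[..., c]
        feats.append(chan)
        # Rescale the channel to uint8 for the rank-filter entropy computation
        lo, hi = chan.min(), chan.max()
        chan_u8 = np.uint8(255 * (chan - lo) / (hi - lo + 1e-9))
        feats.append(entropy(chan_u8, square(neigh)))
    # Stack into an (h*w, 8) matrix: one row of features per pixel
    return np.stack(feats, axis=-1).reshape(-1, 8)
```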
The abstract pixel features capture important properties of the pixels, but they are still hard to compare across images. For instance, a tree might show small variations in pixel hue and texture across images that are not significant. In the next step, we therefore quantize, or cluster, these pixel features into a small number of classes that are robust to local changes. We accomplish this by modeling the image features as a mixture of Gaussian components. The location features provide useful additional information in the context of pixel clustering, and the location parameter also forces generalization of the Gaussian components. Given several images of the same intersection from a fixed camera, we would expect a particular pixel to be generated by the same object class. The Gaussian components must therefore orient themselves to handle variations in lighting across the scenes over different days and seasons: a foliage component has to represent a range of appearances of the same leaf objects, and a roadway component must be broad enough to represent both the road and passing cars. The result of the Gaussian mixture segmentation is the classification of the pixels in the original image into a few large regions corresponding to different types of objects. These tend to group elements such as foliage, sky, roads, and cars into clusters. Note that manual cluster tagging is not needed.

In Fig. 7, the pixel features found in a set of training images have been clustered into two classes corresponding to "low-lightness objects with lots of texture" and "high-lightness pixels with a broad range of textures" ranging from smooth to rough; here texture is captured by a local entropy calculation. Given a pixel from a new image, we can calculate its features in this space and then examine which cluster it is closest to. This cluster gives us the class of the pixel. In this example, the new point is closest to the cluster on the right, namely the brighter objects with a variety of textures, so it would receive the label of that group. When classifying pixels in a new image that was not present in the training set, a pixel may fall very far from any cluster, or at roughly equal distance from several clusters; in such cases we are unsure which cluster it belongs to and assign the pixel the "unknown" class.

Figure 7. Frequency of image pixel types by brightness L and local entropy H(L). Higher count frequencies are indicated in red and lower ones in blue.

In our experiments on detecting persistent changes in transportation images, we found that a Gaussian mixture with 5 components, corresponding to five possible pixel types, provided sufficient distinctions to detect the kinds of changes we are concerned with. Applied to our transportation domain, we obtain images like those in Figs. 8(a)-(c). The top of each frame contains the original RGB image; the bottom contains a false-color image displaying the abstract class of each pixel. Distinct classes of pixels are separated in the images, corresponding to distinct scene elements (sky, trees, road, and cars).

Figure 8. (a)-(c) Three training images converted to abstract pixel-class representations; (d) persistent reference feature-class image generated from the images in (a) through (c).
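The clustering step maps naturally onto an off-the-shelf Gaussian mixture model. The sketch below, assuming scikit-learn and the `pixel_features` helper above, fits a 5-component mixture over training frames and labels new pixels, assigning an "unknown" class when no component clearly dominates or the point is an outlier; the specific unknown-class test is an assumption, since the paper does not spell it out.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

UNKNOWN = -1

def fit_pixel_classes(training_feature_matrices, n_classes=5, seed=0):
    """Fit a Gaussian mixture over pixel features pooled from several training images."""
    X = np.vstack(training_feature_matrices)
    return GaussianMixture(n_components=n_classes, covariance_type="full",
                           random_state=seed).fit(X)

def classify_pixels(gmm, feats, shape, min_resp=0.6, min_logp=None):
    """Label each pixel with its most likely class, or UNKNOWN when ambiguous/outlying."""
    resp = gmm.predict_proba(feats)           # per-pixel component responsibilities
    labels = resp.argmax(axis=1)
    ambiguous = resp.max(axis=1) < min_resp   # no clearly dominant component
    if min_logp is not None:
        ambiguous |= gmm.score_samples(feats) < min_logp   # far from every component
    labels[ambiguous] = UNKNOWN
    return labels.reshape(shape)
```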
B. Persistent Change Detection

Given an image represented in terms of abstract pixel classes, the next step is to differentiate transient and persistent changes. First, we create a "persistent reference image". When the camera is set up at a new installation, we gather a number of images (~5 to 10) of the same scene taken from the fixed camera viewpoint and convert them into our abstract representation (Fig. 8). We then apply a consistency operator (such as the statistical mode, or conjunction) to the sequence of abstractions to find the most common class at each location. This is easily generalized to movable cameras (such as pan-tilt-zoom ones) by forming a reference image for each camera viewpoint. The persistent reference image shown in Fig. 8(d) was generated from the set of abstract pixel-class images above. Even with only a few images, the transient cars passing through the frame disappear, while persistent features such as the trees remain clearly segmented. This gives us a clean reference image free from transient artifacts, obtained without any human labeling of images. An illustrative flowchart of the concept is shown in Fig. 9.

Figure 9. Flowchart for computing the persistent feature-level image from a set of images of a scene from a single site.

During deployment, a set of images is taken to characterize the current state of the intersection. Each of these images is again converted into an abstract pixel-class image, and the persistent test image is constructed from this set by the same process described for the training images. Again, the mode operator gives us a stable picture free of transient automobile artifacts while the persistent features of the image are preserved (Fig. 10(b)). In this case, an automobile in the far left-hand lane persists over a large number of frames and is captured in the abstract image. Notice that the persistent change detected actually sits over a transient change in the original image set (there is a gray car in the far left lane in the first image of the training set). The persistent reference model and the persistent snapshot are shown in Fig. 10. Traditional background subtraction methods can then be applied to the abstract persistent reference and test images to detect changes in the scene between camera setup and the current time. A threshold can then be applied to the persistent-difference image to detect changes of sufficient size and intensity to be worth notifying a human operator that a review is required. In Fig. 10, the persistent vehicle appearing in the left-hand lane is clearly highlighted, whereas the transient cars passing through the scene have been completely suppressed.

Figure 10. Stable scene change (marked with a blue patch) obtained using the proposed persistent view change detection algorithm.
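The consistency operator and the persistent comparison can be prototyped directly on stacks of class maps. The sketch below, assuming the `classify_pixels` output above, takes the per-pixel mode over several class maps to build the persistent reference and test images, then flags regions whose class has changed while ignoring the "unknown" class; the connected-area threshold is an assumed stand-in for the paper's size/intensity threshold.

```python
import numpy as np
from scipy import ndimage, stats

def persistent_class_image(class_maps):
    """Per-pixel mode over a stack of abstract class maps (the consistency operator)."""
    stack = np.stack(class_maps, axis=0)               # (n_images, h, w)
    mode, _ = stats.mode(stack, axis=0, keepdims=False)
    return mode

def persistent_changes(reference, test, min_area=500, unknown=-1):
    """Connected regions whose persistent class differs between camera setup and now."""
    changed = (reference != test) & (test != unknown) & (reference != unknown)
    labeled, n = ndimage.label(changed)
    sizes = ndimage.sum(changed, labeled, index=np.arange(1, n + 1))
    keep = np.isin(labeled, 1 + np.flatnonzero(sizes >= min_area))
    return keep   # boolean mask of persistent-change regions worth operator review
```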
Thus, the algorithm described in this section makes it possible to automate the detection of persistent changes occurring anywhere within images across a large-scale deployment of monitoring cameras. Furthermore, configuring the detection system requires no specific programming, model building, or other input from the operator; the system merely needs a few (5-10) images across one or more days from each scene to be monitored. It should be noted that, in addition to obstructions and stable scene element changes, the persistent view change algorithm would also detect camera misalignment/tilting, since tilting shifts the scene in the fixed camera view. To distinguish such changes, another algorithm that relies on the locations of static scene elements in the view is described next.

IV. CAMERA MISALIGNMENT DETECTION

To detect unintended camera misalignments, we propose the use of the location coordinates of automatically detectable static scene elements. Since the scene captured by such camera devices is dynamic and exhibits significant variation over time, it is essential to rely on one or more elements that are known a priori to be static and unchanging. In traffic intersection cameras, for example, traffic lights, signs, road lane markings, and stop bar markings are naturally occurring static scene elements. It is possible to train for element-specific features (either human-perceived or machine-learned) that isolate them; their locations are then used to detect camera misalignment.

We have implemented algorithms that check for camera misalignments using traffic light locations within the image. We start by detecting objects that match the color templates for the red, yellow, or green light phases (done over a region of interest, say the top quarter of the frame, where signals are typically framed for enforcement applications); typically, multiple templates are needed to accommodate a variety of ambient lighting conditions. This is followed by filtering to remove objects that fall outside learned size limits. Subsequently, connected components that meet a circularity criterion are preserved as candidate traffic light phases (Fig. 11). For circular phases, a circularity metric C = 4πA/P² (where P is the object's perimeter and A its area) is used, while for turn signals a template match is used. If the detected candidate light-phase objects consistently fail to match one or more of the signal phase locations marked during training, within some tolerance (to avoid false alarms from wind oscillations), the camera is judged to be out of its nominal alignment. This concept has been verified for accurate misalignment detection using traffic lights over an initial database, made available to us by our project partners, of approximately 500 intersection images taken under a broad variety of lighting conditions (day/night) at various sites.

Figure 11. Illustration of the misalignment detection algorithm: (a) original night-time region of interest within an intersection scene for traffic signals; (b) binary image of candidate objects matching the color template for the red light phase; (c) filtering by size and other criteria preserves only the light phases.
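A simplified version of the traffic-light-based check can be written as follows. It is a sketch under assumed color-threshold, size, and circularity settings (the paper's templates and learned limits are site-specific), using OpenCV connected contours over the region of interest.

```python
import cv2
import numpy as np

def detect_light_candidates(bgr_roi, hsv_lo=(0, 120, 120), hsv_hi=(10, 255, 255),
                            min_area=30, max_area=2000, min_circularity=0.7):
    """Centroids of blobs matching a (hypothetical) red-phase color template in the ROI."""
    hsv = cv2.cvtColor(bgr_roi, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(hsv_lo), np.array(hsv_hi))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    centers = []
    for c in contours:
        area, perim = cv2.contourArea(c), cv2.arcLength(c, True)
        if not (min_area <= area <= max_area) or perim == 0:
            continue
        if 4 * np.pi * area / perim ** 2 < min_circularity:   # circularity C = 4*pi*A/P^2
            continue
        m = cv2.moments(c)
        centers.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
    return centers

def is_misaligned(candidates, trained_locations, tolerance_px=25):
    """Flag misalignment when no candidate falls near any location marked during training."""
    for cx, cy in candidates:
        for tx, ty in trained_locations:
            if np.hypot(cx - tx, cy - ty) <= tolerance_px:
                return False
    return True
```

In practice, as the paper notes, the decision would be made only when candidates consistently miss the trained locations over multiple frames, to avoid false alarms from wind-induced signal head oscillations.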
V. CONCLUSION

In summary, we presented three algorithms to address the general problem of persistent view change detection for transportation cameras. These algorithms do not need a full reference image from the camera to evaluate a scene, and they have shown promising results in initial tests. They can detect undesirable problems such as obstructions and displacements, as well as alert operators to scene element changes of interest from a traffic monitoring or surveillance perspective.

ACKNOWLEDGMENTS

The authors gratefully acknowledge our project partners within Xerox Transportation Services for frequent consultations and database aggregation: Natesh Manikoth, Marco Bressan, John Kowalsky, Todd Jackson, Juan Suarez, Michael Peterson, Allison McCann, Thomas Wright, Adil Attari, and Michael Shollenberger. We also appreciate the support of our program managers at Xerox, Norm Zeck and Kenneth Mihalyov, and the early-stage project leadership of Serdar Uckun (PARC, presently at CyDesign).

REFERENCES

[1] R. G. Hallowell, M. P. Matthews, and P. A. Pisano, "Automated extraction of weather variables from camera imagery," Proc. Mid-Continent Transportation Research Symposium, Ames, Iowa, 2005.
[2] S. Harasse, L. Bonnaud, A. Caplier, and M. Desvignes, "Automated camera dysfunctions detection," 6th IEEE Southwest Symposium on Image Analysis and Interpretation, March 28-30, 2004, pp. 36-40.
[3] T. Horprasert, D. Harwood, and L. Davis, "A robust background subtraction and shadow detection," Proc. Asian Conference on Computer Vision, 2000.
[4] T. G. McGee, R. Sengupta, and K. Hedrick, "Obstacle detection for small autonomous aircraft using sky segmentation," Proc. IEEE Intl. Conference on Robotics and Automation, 2005, pp. 4679-4684.
[5] H. Woo, Y. M. Jung, J.-G. Kim, and J. K. Seo, "Environmentally robust motion detection for video surveillance," IEEE Transactions on Image Processing, Nov. 2010, vol. 19, no. 11, pp. 2838-2848.
[6] L. Snidaro, C. Micheloni, and C. Chiavedale, "Video security for ambient intelligence," IEEE Transactions on Systems, Man and Cybernetics, Part A, Jan. 2005, vol. 35, no. 1, pp. 133-144.
[7] J. Lim, Two-Dimensional Signal and Image Processing, Prentice Hall, 1990.
[8] A. Raghavan, J. Liu, B. Saha, and R. Price, "Reference image-independent fault detection in transportation camera systems for nighttime scenes," Paper #198, Proc. IEEE ITSC, Anchorage, AK, 2012.