Using eye movements to study visual search and to improve tumor detection
PERCEPTION AND DISPLAY

Calvin F. Nodine, Ph.D.*, Harold L. Kundel, M.D.†

RadioGraphics index terms: Imaging technology
Cumulative Index terms: Radiology, diagnostic observer performance

From the Department of Educational Psychology, Temple University (*), and the Pendergrass Diagnostic Research Laboratory, University of Pennsylvania (†), Philadelphia, PA. This work was supported in part by NIH Research Grant CA 32870. Address reprint requests to C.F. Nodine, Ph.D., Pendergrass Radiology Laboratory, University of Pennsylvania, 3 Medical Education Building, Philadelphia, PA 19104-6068.

Volume 7, Number 6, Monograph • November 1987 • RadioGraphics 1241

Introduction

Picture perception consists of assigning meaning to a picture by labeling areas of the picture with the names of objects that exist in the real world. Thus, an experienced observer can point to a chest image and say, "This is a rib" or "This is the heart". Both "rib" and "heart" are the names of anatomic objects, stored in memory, that are used as labels for certain groupings of picture elements. The mechanism for this grouping and labeling, which we shall call object recognition, is one of the great puzzles of human perception, and no solution to it will be offered here. Recognition, which implies assigning a meaning, is distinguished from detection, which is deciding about the presence or absence of something that is expected.

Radiological investigators have modelled object detection using statistical decision theory, in which the object of interest (the signal) must be distinguished from everything else in the image (the noise). When the object is just a blob on a clear background and the noise is random variation, object detectability can be described by a signal-to-noise ratio equation derived from the principles of imaging physics and physiological optics. Few situations outside the laboratory are this simple. Usually, the object of interest (the target) is mixed in with other objects. The target may either stand out or be hidden by the background objects, depending upon their structural characteristics and arrangement. Statistical-decision theory still provides a useful method for quantifying the detection of target objects (i.e., receiver operating characteristic (ROC) analysis), but the relationship to the theory is stretched thin.

Detecting an object that is hidden in a natural scene is not the same as detecting an object displayed against a background of random noise (1). The objects in the background of a scene camouflage the target, interfering with object recognition and forcing the observer to sift through extraneous objects in order to find the target. This is illustrated by a game devised by artist Al Hirschfeld of the New York Times, who challenges the viewers of his cartoons by hiding the name NINA somewhere in the scene. Can you find NINA in Hirschfeld's depiction of a scene from "The Apartment"? Looking for a lung tumor camouflaged by the anatomical structures of the chest presents the same type of problem to the perceptual system.

Figure 1. (A) A scene from the film "The Apartment" by Al Hirschfeld. Hirschfeld's daughter's name, NINA, is embedded in a pictorial detail of the scene. (B) A close-up of the pictorial detail containing the name NINA. Notice how Hirschfeld uses the natural contours of the lamp to hide the letters. This illustrates the camouflaging effect known as mimicry.

Figure 2. (A) A portion of the right chest exposed in full inspiration, showing a lung tumor. (B) The same chest exposed at partial inspiration. Notice how the vasculature camouflages the tumor. This illustrates the camouflage principle of "dazzle", in which target-object contours are broken by counterforms (17).
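The statistical-decision framework mentioned above can be made concrete with the textbook equal-variance Gaussian model of signal detection theory. The sketch below is an illustration under that assumption, not the authors' analysis: it computes the detectability index d' = z(TPF) - z(FPF) from a true-positive fraction and a false-positive fraction, and traces the ROC points that an observer with a given d' would produce as the decision criterion shifts. For a signal known exactly in white Gaussian noise, the ideal observer's d' equals the signal-to-noise ratio, which is the simple laboratory case described above. The function names `d_prime` and `roc_curve` are illustrative, and the ROC index d_e' that the article reports for Figure 6 is a related measure that need not coincide numerically with this d'.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def d_prime(hit_rate, false_alarm_rate):
    """Equal-variance Gaussian detectability index: d' = z(H) - z(F).

    hit_rate is the true-positive fraction and false_alarm_rate the
    false-positive fraction. Both must lie strictly between 0 and 1,
    because z(0) and z(1) are infinite.
    """
    for p in (hit_rate, false_alarm_rate):
        if not 0.0 < p < 1.0:
            raise ValueError("rates must lie strictly between 0 and 1")
    return STD_NORMAL.inv_cdf(hit_rate) - STD_NORMAL.inv_cdf(false_alarm_rate)


def roc_curve(d, criteria):
    """(false-positive fraction, true-positive fraction) pairs for an
    observer with detectability d, one pair per decision criterion c.

    Assumed model: noise-only responses ~ N(0, 1), target-present
    responses ~ N(d, 1); the observer reports "target present" whenever
    the internal response exceeds c.
    """
    points = []
    for c in criteria:
        fpf = 1.0 - STD_NORMAL.cdf(c)
        tpf = 1.0 - STD_NORMAL.cdf(c - d)
        points.append((fpf, tpf))
    return points


if __name__ == "__main__":
    for fpf, tpf in roc_curve(1.2, [-1.0, 0.0, 1.0]):
        print(f"FPF={fpf:.2f}  TPF={tpf:.2f}  d'={d_prime(tpf, fpf):.2f}")
```

Every point produced by `roc_curve(1.2, ...)` gives back d' = 1.2 when passed to `d_prime`, which is a quick consistency check; an observer performing at chance (d' = 0) lies on the diagonal of the ROC plot.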
The lung tumor that is easily detectable in Figure 2A is more difficult to detect in Figure 2B because of the camouflaging effect of overlapping vascular structures. The practical consequence of this situation is diagnostic error, which does not make much difference to a "Ninamaniac" but may make a great difference to the people consulting with the radiologist. Table I presents a summary of data from three recent NIH-supported lung-screening programs, showing the percent of lung cancers that were missed on the original reading of the image but were visible in retrospect.

Table I
Lung Cancers Visible in Retrospect in Three Large Screening Programs

Institution (ref.)    Screening   People     Cancers   Visible in Retrospect
                      Interval    Screened   Found     No.       %
Memorial 1984 (2)     Annual      10,040     168       78/156    46
Mayo 1983 (3)         4 mos.       4,618      92       70/92     76
Hopkins 1978 (4)      Annual      10,362      78       14/78     18

Eye Movements

Detection of camouflaged lung tumors in chest x-ray images demands identifying places where tumors might be hiding, examining those places for tumor features, and perceiving a tumor when only part of it is showing. We have studied how perception and cognition interact in the detection of camouflaged objects in diagnostic imaging by recording the eye movements of experts and non-experts searching for targets in natural pictorial backgrounds (5,6,7). The eye-movement recordings can be used to determine where viewers focus their visual attention. Eye movements are measured by having the viewer wear a specially designed spectacle frame containing infrared emitters and sensors that measure changes in light reflected from the border between the iris and sclera of each eye (8).

Figure 3. A frontal view of a viewer wearing the eye-movement spectacles. The sensors are mounted below the field of view of the eyes.

The eye moves in jumps (saccades) with intervening pauses (fixations). The reflectance changes are converted to x,y coordinates, which indicate the location and duration of fixations. We have found that fixations occur in clusters, and we have used fixation clusters to infer where attention is directed.

Figure 4. A series of eye-fixation records showing the locus of the axis of gaze during fixation and the path of the eye during movement. (A) The raw eye-position record. The small squares show the x,y locations of the eyes during a 15-sec viewing period. Each square represents 1/60 sec. (B) The eye-fixation record. The raw data points are grouped into fixations. The small circles show the x,y centers of the raw data groupings; it is the centers of these groupings that define individual fixations. The lines between fixations are added to show the scanpath. Each fixation has a duration, given by the number of raw data points it contains times 1/60 sec, that varies from 1/60 to 1 sec. (C) The same fixation record showing the foveal field of view of each fixation. Circles with a radius of 0.5 degree have been drawn around the center of each fixation to indicate how much image detail is picked up by the high-resolution fovea. The range of the fovea has been estimated at 1 degree of visual angle on the chest image. (D) The same fixation record showing fixation clusters. Fixations are grouped into clusters using a running-mean rule, and circles with a radius of 2.5 degrees have been drawn around the mean of each fixation cluster. The clusters contain differing numbers of fixations, reflecting differences in the degree to which the underlying image features are scrutinized.

Figure 5. The model of visual search. The initial glance results in a global impression that provides the viewer with information about orientation, symmetry and anatomic layout of distinctive image features.
Scanning tests diagnostic hypotheses by scrutinizing distinctive image features for significant anatomical perturbations. The evidence gathered from these tests is used to generate plausible perceptual interpretations that lead to a diagnostic decision.

[Figure 5 diagram: Global impression (expectations, cognitive schema; orientation, symmetry, anatomic layout); Scanning (focal attention: scrutiny of image perturbations, testing of diagnostic hypotheses)]

The Visual Search and Detection Model

From over a decade of measuring the eye movements of radiologists as they scan x-ray images searching for abnormalities, we have developed a model of visual search and detection that has three major components: overall pattern recognition; focal attention to image detail; and decision making. This model provides a basic organizational framework that we have applied to studying and understanding the perceptual processes of lung tumor detection. It has been useful for classifying detection failures and has suggested methods for improving perceptual performance. We are particularly interested in detection failures that occur when the signal-to-noise ratio is well above that required for "threshold" detectability, as in Hirschfeld's NINAs and most missed lung tumors. The model depicts the events that occur from the first glance at the image to the decision about the image. It assumes that the search task has been defined prior to viewing.

Overall pattern recognition. The first glance at the image produces a global impression in which the viewer brings his cognitive schema to bear on the image data obtained by the retina.
The cognitive schema consists of knowledge about the mapping of anatomy and pathology onto radiographic images, together with expectations about the to-be-seen image. This initial interaction, which takes a few hundred milliseconds, leaves the viewer with a fairly accurate conception of the content of the image. It has been shown that radiologists can make reasonably accurate diagnostic interpretations from the information obtained in a single, brief glance (9,10). The global impression provides the perceptual system with the information needed to carry out the diagnostic task. Potential target sites are flagged, and deviations from the viewer's cognitive schema, called perturbations, are noted. This initial impression sets the stage for detailed focal analysis of the image by central vision (11).

Figure 6. Receiver operating characteristic (ROC) curves comparing detection performance on a set of 22 normal and abnormal chest images after one 0.2-sec flash and after unlimited free search. Percent true positives is plotted on the ordinate and percent false positives on the abscissa. The dotted diagonal line represents chance performance (d_e' = 0). The index of detectability for the flash condition (d_e' = 1.2) is significantly greater than chance. The true-positive rate was 70 percent for the flash condition, compared with 97 percent for free viewing (based on 10).

Scanning. Following the initial impression, the eyes are moved over the image so that central vision can be used to examine potential target sites and perturbations. The examination is accomplished by clusters of closely spaced fixations. After the places identified during the initial global impression have been scrutinized, the viewer may follow a stereotyped scanning pattern aimed at discovering anything that was missed, or may simply scan at random while thinking about the image.

Decision. The fixations that cluster at perturbations or potential target sites are presumably collecting the data necessary to test for the presence of an abnormal object. If testing reveals a target object, a decision that a tumor is present may result. If testing is negative or inconclusive, search continues. We consider each fixation cluster as a decision node. Thus, the report "normal chest" is an overall impression based on a series of local decisions, which are needed because the relevant anatomic features can be resolved only by central vision. The viewer is not aware of all of the decisions, positive and negative, made during scanning. The eye-movement record, however, reveals where the eye lingered, providing indirect evidence that prolonged covert decisions were made about image detail. We believe that multiple fixations that cluster signal the testing and decision-making activity associated with the interpretation of anatomical perturbations that have potential as tumor targets.

Errors in Detecting and Interpreting Targets

Following this model, we hypothesize three sources of error: sampling, recognition, and decision making (12).

Sampling Error. If the purpose of scanning is to sample the image with the high-resolution region of the central retina called the fovea, then it is likely that some parts of the image will be neglected. Maps of fixation clusters show that they are unevenly distributed over a chest image (see Figure 4D). We have hypothesized that prolonged scrutiny is accomplished by increasing the number of fixations instead of just increasing the duration of a single fixation. A cluster, then, extends the limited foveal vision to a wider circular field of ±2.5 degrees of visual angle (10). Typically 80-90 percent of the lung image is covered by fixation clusters of this size (13). Stated another way, it takes about 18 fixation clusters to sample a chest image adequately. Coverage is not exhaustive, because the main purpose of scanning is the testing of perceptual inferences. In the process, some locations that are considered perceptually uninteresting are not covered (14).

Recognition Error. Many targets are looked at directly but are not reported. Looking at a region containing a target does not guarantee that it will be recognized, especially when the target is camouflaged. It has been shown that fixating a region for one third of a second is sufficient for a negative decision, but a deeply embedded target can require a cluster of fixations lasting up to 3 seconds (5,15). It is not clear whether the negative decision is made actively or by default. When the viewer spends no more time attending to an unreported target than is spent attending to a normal anatomical structure, it is assumed that the local picture elements were not synthesized into a recognizable object.

Decision-Making Error. Often, parts of camouflaged objects are detected, but the viewer decides that they are normal variants rather than the target. These errors are relatively easy to identify in the eye-movement record, because the increased visual scrutiny raises the number of fixations clustering on the target site. This is the most prevalent type of error. A study of lung nodule detection showed that 10 percent of the misses were due to sampling, 30 percent to recognition, and 60 percent to decision making (10).

Figure 7. Examples of three types of errors. (A) Sampling error. The chest image is scanned by fixation clusters, the boundaries of which are represented by circles, but the lung tumor in the left upper lobe is not fixated, as indicated by the absence of clusters on or near the target.
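The three error categories can be expressed as a small decision rule over fixation clusters. The sketch below assumes fixation centres are already available in degrees of visual angle, groups them with one plausible reading of the running-mean rule from Figure 4D (a fixation joins the current cluster if it falls within 2.5 degrees of that cluster's mean), and then labels a missed target. The cut-off of 3 fixations separating "recognition" from "decision-making" errors is a hypothetical choice, loosely motivated by the fixation-count modes the article reports for true negatives (2) and false negatives (3-4); the article gives no numeric rule, and it states its recognition criterion in terms of dwell time rather than fixation count. All names are illustrative.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass
class Cluster:
    """A fixation cluster: fixation centres (x, y) in degrees of visual angle."""
    fixations: list = field(default_factory=list)

    def mean(self):
        n = len(self.fixations)
        return (sum(x for x, _ in self.fixations) / n,
                sum(y for _, y in self.fixations) / n)


def cluster_fixations(fixations, radius=2.5):
    """Assumed running-mean rule: a time-ordered fixation joins the current
    cluster if it lies within `radius` degrees of that cluster's mean;
    otherwise it starts a new cluster."""
    clusters = []
    for x, y in fixations:
        if clusters:
            mx, my = clusters[-1].mean()
            if hypot(x - mx, y - my) <= radius:
                clusters[-1].fixations.append((x, y))
                continue
        clusters.append(Cluster([(x, y)]))
    return clusters


def classify_miss(target, clusters, radius=2.5, scrutiny_threshold=3):
    """Label an unreported target from the clusters that cover it.

    Hypothetical rules:
      no cluster mean within `radius` degrees          -> "sampling"
      covered only by clusters with fewer than
      `scrutiny_threshold` fixations                    -> "recognition"
      covered by a cluster with at least
      `scrutiny_threshold` fixations                    -> "decision"
    """
    tx, ty = target
    covering = []
    for c in clusters:
        mx, my = c.mean()
        if hypot(mx - tx, my - ty) <= radius:
            covering.append(c)
    if not covering:
        return "sampling"
    if max(len(c.fixations) for c in covering) >= scrutiny_threshold:
        return "decision"
    return "recognition"


if __name__ == "__main__":
    scan = [(0.0, 0.0), (0.5, 0.2),                  # brief look
            (10.0, 10.0), (10.3, 9.8), (10.1, 10.2),
            (9.9, 10.1), (10.2, 9.9),                # prolonged scrutiny
            (20.0, 0.0)]                             # single fixation
    clusters = cluster_fixations(scan)
    for target in [(30.0, 30.0), (20.5, 0.0), (10.0, 10.0)]:
        print(target, classify_miss(target, clusters))
```

Run as a script, the example labels the three targets as a sampling, a recognition and a decision-making error, respectively, mirroring panels A, B and C of Figure 7.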
Figure 7 (continued). (B) Recognition error. The lung tumor is fixated by one fixation cluster, but the target is scrutinized by only a single fixation, indicating lack of visual interest in the local image features. (C) Decision-making error. The lung tumor is fixated by a fixation cluster containing multiple fixations; five fixations are shown. Despite this evidence of extensive scrutiny, the viewer decided that the local image features did not meet his criteria for defining a true tumor target and called the image "normal".

Feedback-Assisted Visual Search

The number of fixations that cluster at decision nodes varies with the decision made at that local image site. True negatives have the fewest fixations per cluster, with a mode of 2. True positives have the most, with a mode of 5. False negatives fall in between, with a mode of 3-4 fixations per cluster, indicating that these decisions receive increased visual scrutiny compared with true negative decisions. Given that many false negative decision nodes can be identified on the basis of multi-fixation clusters, an interesting question arises: if the viewer is given feedback about the locations of these multi-fixation clusters, can re-evaluation of decisions at these potential target sites improve performance? Feeding back the image locations that received intensive visual scrutiny gives the viewer an opportunity to review areas that aroused suspicion but were dismissed as normal. The original decision can then be revised or confirmed on the basis of the second look.

Preliminary Results. An experiment using Feedback-Assisted Visual Search is now in progress (16). A computer-display system has been developed that provides visual feedback to the viewer. The feedback, in the form of highlights on the display, is based on data obtained by monitoring eye position during scanning. The highlighted image locations are determined by an algorithm that identifies multi-fixation clusters. The image features receiving these clusters are presumed to have perceptual significance to the viewer: they represent perceptually suspicious aspects of the image.

Figure 8. The distributions of the number of fixations per cluster for three types of decisions. The decision types were determined by measuring all fixation locations leading up to an overt decision by the viewer. If the decision is positive (tumor present), fixations clustering on truly abnormal image features are categorized as true positives. If the decision is negative (tumor absent), fixations clustering on truly abnormal image features are categorized as false negatives, and fixations clustering on truly normal image features are categorized as true negatives. The true negative decisions peak at 2 fixations per cluster, the false negatives at 3 fixations per cluster, and the true positives at 5 fixations per cluster.

In Phase 1, eye movements are recorded as the radiologist searches for lung tumors in chest images. The radiologist then gives his decision. In Phase 2, the image is re-presented either with the locations of intense scrutiny, indicated by multi-fixation clusters, highlighted (feedback condition) or, as a control, with random locations highlighted (pseudofeedback condition). The viewer examines each highlighted location and gives a second decision. About 6-8 locations are highlighted. Preliminary tests were carried out on three viewers, each examining 120 chest images, 60 with tumors and 60 normals.
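The Phase 2 highlight selection can be sketched as follows. The article states only that about 6-8 locations are highlighted and that clusters must meet "a certain numerical criterion"; the minimum of 3 fixations and the cap of 8 highlights used here are assumptions, as are the names `select_feedback` and `pseudo_feedback`. The pseudofeedback control in this sketch simply draws uniformly random locations and, unlike the experiment, does not check that the highlights avoid tumor sites.

```python
import random
from dataclasses import dataclass


@dataclass
class Cluster:
    x: float            # cluster mean, image coordinates
    y: float
    n_fixations: int    # number of fixations in the cluster


def select_feedback(clusters, min_fixations=3, max_highlights=8):
    """Feedback condition: highlight the Phase 1 multi-fixation clusters,
    most heavily scrutinized first, up to `max_highlights` locations."""
    candidates = [c for c in clusters if c.n_fixations >= min_fixations]
    # sorted() is stable, so clusters with equal counts keep their Phase 1 order.
    candidates = sorted(candidates, key=lambda c: c.n_fixations, reverse=True)
    return [(c.x, c.y) for c in candidates[:max_highlights]]


def pseudo_feedback(n, width, height, rng=None):
    """Pseudofeedback control: `n` highlight locations drawn uniformly at
    random over a width x height image."""
    rng = rng if rng is not None else random.Random()
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


if __name__ == "__main__":
    phase1 = [Cluster(120, 80, 2), Cluster(300, 210, 5), Cluster(64, 400, 3),
              Cluster(410, 150, 1), Cluster(250, 330, 4)]
    print(select_feedback(phase1))
    print(pseudo_feedback(3, 512, 512, random.Random(0)))
```

With the five example clusters, `select_feedback` returns the locations of the clusters with 5, 4 and 3 fixations, in that order. Ranking by fixation count means that, if the cap truncates the list, the most heavily scrutinized sites are the ones that survive.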
The results show that when feedback was given where at least one highlight fell on a tumor target, 19 percent of false negative decisions were revised to true positive decisions, compared with 8 percent for pseudofeedback, where none of the highlights fell on a tumor target. The conversion of true negatives to false positives was the same in both conditions (8 percent). This finding indicates that informative feedback has a positive effect on nodule detection. Encouraged by this result, we are continuing research on the identification of perceptually suspicious areas derived from eye-movement recordings and on methods for displaying visual feedback.

Figure 9. A schematic diagram of the Feedback-Assisted Visual Search system. The viewer wears a pair of spectacles containing the eye-movement sensors. These sensors record the viewer's eye fixations and send them to a computer that analyzes and stores them in Phase 1. The viewer is given 15 sec to scan the image and make a decision. The image is re-presented in Phase 2, where multi-fixation clusters from Phase 1 that meet a certain numerical criterion are fed back by highlighting their locations on the image. The viewer re-evaluates the highlighted areas and revises or confirms his original decision.

Conclusions

The eye-brain system is presently the best target detector known, despite the fact that it is occasionally fooled by the veil of camouflage that hides relevant targets. Medical education and training can program the eye-brain system to make plausible perceptual interpretations even from medical images containing the most meager perceptual data. It may be possible to further improve viewer performance by using direct measures of attention to identify perceptually suspicious image features for re-evaluation during image interpretation and diagnostic decision making. The unique experimental tool that has enabled us to measure and quantify the pattern of human attention has also given us a glimpse of the fundamental workings of the mind of the radiologist.

References

1. Kundel HL, Nodine CF, Thickman DI, Carmody D, Toto L. Nodule detection with and without a chest image. Invest Radiol 1985; 20:94-99.
2. Heelan RT, Flehinger BJ, Melamed MR, et al. Non-small-cell lung cancer: Results of the New York screening program. Radiology 1984; 151:289-293.
3. Muhm JR, Miller WE, Fontana RS, et al. Lung cancer detected during a screening program using four-month chest radiographs. Radiology 1983; 148:609-615.
4. Stitik FP, Tockman MS. Radiographic screening in the early detection of lung cancer. Radiol Clin North Am 1978; 16:347-366.
5. Kundel HL, Nodine CF. Studies of eye movements and visual search in radiology. In: Senders JW, Fisher DF, Monty RA, eds. Eye movements and the higher psychological functions. Hillsdale, N.J.: Erlbaum, 1978.
6. Nodine CF, Carmody DP, Kundel HL. Searching for NINA. In: Senders JW, Fisher DF, Monty RA, eds. Eye movements and the higher psychological functions. Hillsdale, N.J.: Erlbaum, 1978.
7. Kundel HL, La Follette PS Jr. Visual search patterns and experience with radiological images. Radiology 1972; 103:523-528.
8. Carmody DP, Kundel HL, Nodine CF. Performance of a computer system for recording eye fixations using limbus reflection. Behav Res Meth Instr 1980; 12:63-66.
9. Gale A, Vernon J, Millar K, Worthington BS. Interpreting radiographs in a single glance (abstr.). Radiology 1983; 149(P):253.
10. Kundel HL, Nodine CF. Interpreting chest radiographs without visual search. Radiology 1975; 116:527-532.
11. Kundel HL, Nodine CF. A visual concept shapes image perception. Radiology 1983; 146:363-368.
12. Kundel HL, Nodine CF, Carmody DP. Visual scanning, pattern recognition and decision-making in pulmonary nodule detection. Invest Radiol 1978; 13:175-181.
13. Nodine CF, Kundel HL. The cognitive side of visual search in radiology. In: O'Regan JK, Levy-Schoen A, eds. Eye movements: From physiology to cognition. Amsterdam: Elsevier, 1987: 573-582.
14. Kundel HL, Nodine CF, Thickman D, Toto L. Searching for lung nodules: A comparison of human performance with random and systematic scanning methods. Invest Radiol 1987; 22:417-422.
15. King MG, Stanley GV, Burrows GD. Visual search processes in camouflage detection. Hum Factors 1984; 26:223-234.
16. Nodine CF, Kundel HL. Using eye movements to study decision-making processes of radiologists. Presented at the Fourth European Conference on Eye Movements, Göttingen, W. Germany, 1987.
17. Behrens RR. Art and camouflage. Cedar Falls, Iowa: North American Review, 1981.