COGNITIVE PSYCHOLOGY 9, 353-383 (1977)
Forest Before Trees: The Precedence of Global Features
in Visual Perception
University of Haifa, Haifa, Israel
The idea that global structuring of a visual scene precedes analysis of local features is suggested, discussed, and tested. In the first two experiments subjects were asked to respond to an auditorily presented name of a letter while looking at a visual stimulus that consisted of a large character (the global level) made out of small characters (the local level). The subjects' auditory discrimination responses were subject to interference only by the global level and not by the local one. In Experiment 3 subjects were presented with large characters made out of small ones, and they had to recognize either just the large characters or just the small ones. Whereas the identity of the small characters had no effect on recognition of the large ones, global cues which conflicted with the local ones did inhibit the responses to the local level. In Experiment 4 subjects were asked to judge whether pairs of simple patterns of geometrical forms which were presented for a brief duration were the same or different. The patterns within a pair could differ either at the global or at the local level. It was found that global differences were detected more often than local differences.
The Principle of Global Precedence
Do we perceive a visual scene feature-by-feature? Or is the process instantaneous and simultaneous as some Gestalt psychologists believed? Or is it somewhere in between? The Gestaltists' view of the perceptual system as a perfectly elastic device that can swallow and digest all visual information at once, no matter how rich it is, is probably too naive. There is ample evidence that people extract from a picture more and more as they keep looking at it (e.g., Helson & Fehrer, 1932; Bridgen, 1933; Yarbus, 1967). But does this mean that interpreting the picture is done by integrating information collected in a piecemeal fashion? Is the perceptual whole literally constructed out of the percepts of its elements?
This paper is based on parts of a doctoral dissertation submitted to the Department of Psychology, University of California, San Diego. The research was supported by Grant No. NS07454 from the National Institutes of Health. I am greatly indebted to Lynn Cooper, David Rumelhart, and many members of the LNR research group at UCSD for lots of useful comments. Special thanks to Donald Norman for his guidance, really valuable suggestions, comments, and careful review of many drafts. I also thank Donald Broadbent, Ralph Haber, and Yaakov Kareev for useful comments on a draft. Requests for reprints should be addressed to: David Navon, Department of Psychology, University of Haifa, Haifa 31999, Israel.
Copyright © 1977 by Academic Press Inc.
All rights of reproduction in any form reserved. ISSN 0010-0285
354 DAVID NAVON
My approach to the problem is, in a sense, in the tradition of the early studies of Aktualgenese (see review in Flavell & Draguns, 1957). The idea put forward in this paper is that perceptual processes are temporally organized so that they proceed from global structuring towards more and more fine-grained analysis. In other words, a scene is decomposed rather than built up. Thus the perceptual system treats every scene as if it were in a process of being focused or zoomed in on, where at first it is relatively indistinct and then it gets clearer and sharper.
Some Definitional Framework
The interpreted contents of a scene can be viewed as a hierarchy of subscenes interrelated by spatial relationships (cf. Winston, 1973; Palmer, 1975). The decomposition of a scene into parts, each of which corresponds to exactly one node of the hierarchical network, is conceivably done in accordance with some laws of Gestalt such as proximity, connectedness, good continuation, and so forth (see Palmer, Note 1). This statement does not explain the process of decomposition, but it gives an idea of its product. The globality of a visual feature corresponds to the place it occupies in the hierarchy: The nodes and arcs at the top of the hierarchy are more global than the nodes and arcs at the bottom. The latter are said to be more local. We cannot claim, however, that one visual feature is more global than another one, unless we know that both correspond to actual nodes in the network. The operational test is to try to construct networks in which feature "x" will dominate feature "y" (or vice versa). If this can be done in just one direction, it can be argued that if "x" and "y" constitute perceptual units, then "x" must be more global than "y" (or vice versa). For an experimental test of the principle of global precedence one should use as stimuli figures in which the spatial hierarchy is intuitively transparent.
It is claimed that processing of a scene proceeds from the top of the hierarchy to the bottom; that is to say, it is global-to-local. It follows that the global features of a visual object that is within an observer's effective visual span (i.e., none of its parts is either viewed peripherally or below the threshold of visual acuity) will be apprehended before its local features.
As an example, consider how a picture like the one in Fig. 1 may be processed. The structuring of the picture that comes first in the process of perception is something like L(blob-1, frame), where blob-1 is the region where the figure is, and L is the spatial relationship that holds between blob-1 and the overall frame. At this point more processing effort is directed to the analysis of blob-1, so that the structure of the scene is refined into something like L(R(blob-2, blob-3), frame), where blob-2 is the region of the crescent, blob-3 is the region of the star, and R is the spatial relationship of those two. During the next stage more effort goes into differentiating blob-2 and blob-3 until they are recognized as a crescent and a star, respectively, so that the final structure is: L(R(crescent, star), frame).
FIG. 1. An example of a simple picture and its parts.
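The three stages of refinement described for Fig. 1 can be sketched as successive rewrites of a small tree. This is only an illustrative sketch: the `Node` class and the `refine` helper are hypothetical names introduced here, and nothing in the text implies that the perceptual system uses such a data structure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A region of the scene; `relation` names the spatial relationship among its children."""
    label: str
    relation: str = ""
    children: List["Node"] = field(default_factory=list)

    def render(self) -> str:
        # A leaf prints as its label; an inner node prints as relation(child, child, ...).
        if not self.children:
            return self.label
        return f"{self.relation}({', '.join(c.render() for c in self.children)})"


def refine(node: Node, target: str, replacement: Node) -> Node:
    """Return a copy of the tree in which the leaf labelled `target` is replaced."""
    if not node.children:
        return replacement if node.label == target else node
    return Node(node.label, node.relation,
                [refine(c, target, replacement) for c in node.children])


# Stage 1: only the figure region and the frame are structured.
stage1 = Node("scene", "L", [Node("blob-1"), Node("frame")])
# Stage 2: blob-1 is decomposed into two regions related by R.
stage2 = refine(stage1, "blob-1", Node("blob-1", "R", [Node("blob-2"), Node("blob-3")]))
# Stage 3: the two regions are identified.
stage3 = refine(refine(stage2, "blob-2", Node("crescent")), "blob-3", Node("star"))

for stage in (stage1, stage2, stage3):
    print(stage.render())
```

Each stage keeps the structure established by the previous one and only elaborates a leaf, which is the sense in which the scene is decomposed rather than built up.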
The view presented here does not amount to a distinction between stages of attention (cf. Neisser, 1967). It is, rather, a claim about perceptual analysis of whatever is attended to. Note that perceptual processing is viewed as a unified process, in that both the "where" and the "what" questions are answered while the scene is structured. Spatial organization is treated as a sort of crude figural analysis which sometimes may even be sufficient for recognition.
Functional Importance of Global-to-Local Processing
In most real situations the task of the human perceptual processor is not just to account for given input but also to select which part of the surrounding stimulation is worth receiving, attending to, and processing. The constraints imposed by the optical limits of our eyes and by the nature of the surroundings have a twofold implication for the processing structure in the visual domain. One, the resolution of most of the stimuli in the picture plane (or the largest part of their visible surface) is low by default. The crude information extracted from the low-resolution parts of the visual field should be used for determining the course of further processing. Two, in an ecology where uncertainty is the rule, there is little to be gained from being set for a particular type of input. The system should be flexible enough to allow for gross initial cues to suggest the special way for processing a given set of incoming data. These two observations suggest that a multipass system, in which fine-grained processing is guided by prior cursory processing, may be superior to a system that tries to find a coherent structure for all pieces of data simultaneously.
One important function of the first pass is that of locating the stimuli, an obvious prerequisite for any figural analysis. Note, however, that finding the location of stimuli provides the system with some very global figural information.
Since perception is basically dynamic, there is often only time for partial analysis because of the constant change in input. In that case a rough idea about general structure is more valuable than a few isolated details. Furthermore, often we do not place the same importance on every portion of the input. It was found (Yarbus, 1967; Mackworth & Bruner, 1970) that people tend to spend more time on the more informative sectors of the picture. In this case we ought to have the results of some initial gross analysis in order to determine which part of the field is likely to bear more on our behavior or thinking. But even when we focus on an important sector of the field, we may not need to build a very elaborate structure for it. Details are detected only to the degree that they are essential for determining contents.
As pointed out by Palmer (1975) and Norman and Bobrow (1976) and supported by much empirical evidence, perceptual processing must be both input-driven and concept-driven. That is, the activity in the system is triggered by the sensory input but is guided by expectancies formed by context and early indications from sensory data. Thus, perception is regarded as a two-way process: The hypotheses about what a stimulus may be interact with what the stimulus actually is in determining what the stimulus is finally perceived to be. Now, since local features often serve as constituents in more global structures, the identification of the global features is a very useful device for narrowing down the range of candidates for accounting for a certain local region. Moreover, as pointed out by Palmer (1975, Figure 11.6), sometimes the identification of a part of the picture merely on the basis of its own features is almost impossible, yet it can be easily recognized within the appropriate context. In general, the more definite the output of global analysis, the more concept-driven local analysis is, so the less effort has to be expended in overcoming deficiencies in data quality. This is a substantive advantage in view of the limited acuity of the visual sensory mechanisms.
The point that spatial organization precedes interpretation of details is essential for resolving ambiguities stemming from rotation, projection, and interposition. By the time a particular stimulus in the field is interpreted, the more global processes have generated a hypothesis about the angle from which that stimulus is viewed. Thus, the expected object is not any instance of a category, but rather an actual object as seen from a certain point of view and probably partly concealed either by other objects or by parts of itself.
Some Empirical Evidence
It is well supported that perceiving the whole facilitates the perception of its parts.
The word-letter phenomenon (Reicher, 1969; Wheeler, 1970) is an excellent demonstration of how the mere presence of a higher-level perceptual unit improves later forced-choice recognition of its individual constituents over the case when they are presented alone (or just being focused upon, as shown by Johnston and McClelland, 1974). Whether it is the discriminability or the codability of the constituents which is enhanced is yet an open question.
Selfridge (in Neisser, 1967, p. 47) illustrated how the same pattern can be interpreted as two different letters depending on the context. In many cases it appears that the perceptual system ignores details that are inconsistent with the interpretation indicated by the context or even completes features that are missing in the actual scene. Pillsbury (1897) demonstrated how readers may not be disturbed at all by omission or substitution of letters in texts they read. Warren (1970) reported a similar effect in speech perception. Huey (1908) and recently Johnson (1975) provided some more examples in this vein. Palmer (1975a) showed that interpretation of ambiguous elements of a picture tends to conform to the semantic structure of the whole scene, even when it involves some distortion or deletion of a few details.
In those examples, the perception of the global unit or the overall theme is more veridical than the perception of the elements. However, note that in all of these examples (excluding, perhaps, the word-letter phenomenon) the whole is more predictable than the elements, especially when the target element is incongruent to some extent with the rest. Therefore, it is not clear whether the veridical identification of the whole is due to very potent extraction of global features or to highly redundant inference made on the basis of a sample of local features. (See Johnson, 1975; Rumelhart & Siple, 1974).
What is the evidence that global features or relationships are perceived first? First, there are indications that people can take advantage of peripheral information. The subjects in an experiment by Williams (1966) were able to utilize peripherally viewed size or color to direct their search for targets. Rayner (1975) showed that readers seem to perceive the gross shape of peripherally viewed words to the right of the word being fixated at the time. Since peripheral information must be of low resolution, then to the extent that recognition is aided by peripheral cues, the precedence of the gross features falls out.
Second, even within the angular span that can be perceived with high acuity in just one fixation, there seems to be progression with exposure time from very gross global perception to very fine-grained recognition. In many early studies of the development of percepts (comprehensively reviewed in Flavell & Draguns, 1957) subjects were presented with visual stimuli for very short durations. The general finding is that as the duration of exposure got longer, subjects progressed from perceiving just the location of the object, through differentiating figure and ground, then to some inaccurate apprehension of the global form, and finally to good figural sensation. In another experiment (Navon, Note 2) subjects were presented for a brief duration with a picture of a clock with Greek letters as hour markers and arms in sleeves as hands. Whereas all of them identified the clock correctly, recognition of the details was below chance level. Thus, not only is the perception of global structure earlier than detailed figural analysis, but it is often sufficient for identifying an object or a scene with a fair amount of confidence.
Another relevant finding comes from motion perception. Since motion perception has to keep up with the continual change in the visual field, it must be affected mostly by those properties of the visual stimuli that are processed first. It was found (Navon, 1976) that in situations of ambiguous apparent motion, figural identity of the elements did not have any effect on determining the type of motion experienced, whereas more global features did.
On the developmental level, Meili-Dworetzki (1956) had children of different ages respond to several ambiguous figures in which whole and parts suggest different interpretations (e.g., a man made out of fruit). She found that children perceived wholes at an earlier age than parts. On the other hand, Elkind, Koegler, and Go (1964) devised a set of figures that produced the opposite effect. The source of conflict in those findings resides, obviously, in the stimuli. Ambiguous figures may vary in the relative plausibility of their alternative interpretations. If young children tend to make just one decision about a stimulus, then they are likely to overlook the duality of ambiguous figures and to detect just the more salient aspect, whichever the case may be.
It seems, thus, that the general problem with the experimental treatment discussed so far is lack of proper control over the stimulus material. Global and local structures may differ in complexity, salience, familiarity, recognizability, or relative diagnosticity for determining the identity of the whole, and they do differ in some of these properties in all the studies mentioned so far. Hence, the two major principles of the experimental attack I used were: (a) control of all these properties of global and local features; and (b) independence of global and local features, so that the whole cannot be predicted from the elements and vice versa.
GLOBAL PRECEDENCE IN A DIFFUSE-ATTENTION SITUATION: EXPERIMENTS 1 AND 2
The best way to equate the properties of global and local features is to use stimuli in which the set of possible global features is identical with the set of possible local ones. For this purpose I constructed large letters that were made out of small letters (see Fig. 5A). When one looks at these stimuli in normal viewing conditions, one cannot miss either the identity of the whole stimuli or the fact that they are made out of letters whose identities are also definitely recognizable.
I constructed a task in which visual perception is slightly restricted both by visibility conditions and by limited attention (or, using the terminology of Norman & Bobrow, 1975, the quality of the data and the availability of processing resources). My prediction was that in such a situation subjects' performance will be insensitive to the figural identity of the local features. Perceptual awareness was measured by means of an indirect method: an intermodality Stroop task. Stroop tasks are named after Stroop (1935), who found that when subjects have to name the color of the ink in which a word is written, their responses are inhibited when the word is the name of a color different from the ink color.
Subjects were asked to respond to a name of a letter while looking at a visual stimulus of the type shown in Fig. 5A. The rationale was that recognition of a visual stimulus may interfere with discrimination of an equivalent auditory stimulus or with performing the appropriate responses for it. Experiment 1 was done in order to test the validity of this rationale and to determine the optimal parameters for such auditory-visual interference. One of the problems with Stroop tasks, which is especially pronounced in an intermodality task, is ensuring that the subject is actually exposed to the secondary channel or aspect of the stimulus. I have taken several measures in order to do that, and they are described below.
Apparatus. The equipment consisted of a display Tektronix oscilloscope with a fast decay phosphor (decays to 90% in .63 msec), a typewriter keyboard, a Krohn-Hite 3550R filter, a Shure microphone mixer, two Hewlett-Packard 350D attenuator sets, a chin rest, and a pair of headphones. Auditory and visual stimuli were generated and controlled by a PDP-9 computer equipped with digital-to-analog converters. The subject sat alone in an acoustically isolated booth in front of a table wearing the headphones; his chin was on a chin rest and his hands were on the keyboard. The display oscilloscope was positioned in front of the subject at eye level. Viewing distance was 50 cm. The intensity of the picture on the oscilloscope was adjusted so that when plotting a test square with side of 13 mm containing 51 x 51 dots, the luminance of the square was 1.37 cd/m². The room illumination was such that the luminance of the periphery of the screen was .65 cd/m².
Design and procedure. The major characteristics of the experimental task are schematized in Fig. 2. The temporal structure of the stimuli is presented in Fig. 3. The subject listened through the headphones to a sequence of utterances of equal duration evenly spaced in time. The utterances could be either of the names of the letters H and S (namely “ach” and “es”) sequenced at random with equal probabilities. While listening to the sequence of auditory stimuli, the subject was monitoring the oscilloscope. A visual stimulus of the set shown in Fig. 2A could either be flashed on the oscilloscope or not (randomly with probability .5) in very close temporal proximity to the auditory stimulus. The primary task of the subject was to indicate after each utterance which of the two letters he had heard by depressing either of two keys with the second or third finger of his right hand. The secondary task was to respond by depressing a key with his left hand to the appearance of any visual stimulus regardless of what it was.
FIG. 2. A schematization of the experimental task in Experiment 1.
The subject was instructed to perform his response to the visual display only after he had made the auditory discrimination response. It was also emphasized that although he only had to respond to the presence of the visual stimulus, he would be questioned about the identity of the visual stimulus later. He was not told before the experiment what the visual stimuli would look like.
Each trial was preceded by a short warning beep. Simultaneously with the onset of the beep a square frame with a side slightly larger than the longer side of the visual stimuli appeared on the screen and persisted until the scheduled offset time of a visual stimulus regardless of whether or not such a stimulus was actually presented. The warning signals served to minimize both the temporal and spatial uncertainty of the subject with regard to the stimuli. Accuracy and latency for both the auditory discrimination response and the visual detection response were recorded. Latency was measured from the start of the auditory stimulus.
FIG. 3. A diagram of the temporal structure of the stimuli in Experiment 1.
Each subject was run individually in one session of 504 trials. After every block of 72 consecutive trials the subject was given a rest period of about 40 sec. The necessary randomizations were done for each block independently. The auditory stimuli and the presence or absence of visual stimuli were factorially crossed in each block, as were the visual stimuli with the auditory ones in those trials where visual stimuli appeared. A visual stimulus was said to be consistent with the auditory stimulus if they both consisted of the same letter; it was said to be conflicting with the auditory stimulus if they were different letters (namely an H and an S); and it was said to be neutral with regard to the auditory stimulus if it consisted of a rectangle.¹ It falls out that these levels of consistency were randomized and balanced in each block.
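The block structure just described can be sketched as follows. This is a minimal sketch under stated assumptions: the text gives the block length (72 trials), the .5 probability of a visual stimulus, and the factorial crossing, but not the repetition counts, which are inferred here from those figures (6 repetitions of each auditory-visual pairing and 18 no-flash trials per utterance). The names `make_block` and `consistency` are hypothetical.

```python
import random
from collections import Counter

AUDITORY = ["H", "S"]
VISUAL = ["H", "S", "rectangle"]


def consistency(auditory, visual):
    """Classify a trial by the relation between the heard and the seen character."""
    if visual is None:
        return "no visual"
    if visual == "rectangle":
        return "neutral"
    return "consistent" if visual == auditory else "conflicting"


def make_block(n_trials=72, rng=None):
    """Build one independently randomized block of (auditory, visual) trials."""
    rng = rng or random.Random()
    half = n_trials // 2  # a visual stimulus appears on half of the trials
    # Assumed repetition counts, derived from the block length:
    reps_visual = half // (len(AUDITORY) * len(VISUAL))  # 6 with 72 trials
    reps_blank = half // len(AUDITORY)                   # 18 with 72 trials
    trials = [(a, v) for a in AUDITORY for v in VISUAL for _ in range(reps_visual)]
    trials += [(a, None) for a in AUDITORY for _ in range(reps_blank)]
    rng.shuffle(trials)
    return trials


block = make_block(rng=random.Random(0))
print(Counter(consistency(a, v) for a, v in block))
```

Because visual and auditory stimuli are fully crossed, the three consistency levels come out balanced within the block without being sampled explicitly.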
There were three conditions of temporal overlap between the auditory and the visual stimuli. As seen in Fig. 3, the exposure duration of the visual stimuli was always the same, but the delay of their onset with respect to the start of the auditory stimuli was either -40, 0, or 40 msec. The overlap conditions were administered in different blocks. The order of the administration of the conditions for half of the subjects was: 1, 2, 3, 1, 2, 3, in blocks 2 through 7, respectively. For the other half, the reverse order was used. The first block was considered to be practice and the temporal overlap used during it was identical to the one in condition 2.
Stimuli. The auditory stimuli were generated in the following way: Each stimulus was uttered by a male native English speaker, and its wave form was digitized by the PDP-9 computer at a sampling rate of 10 kHz. It was stored in digitized form and converted back into its analog form and played to the subject whenever needed. The quality of the sound and the signal-to-noise ratio were sufficient to preclude any possibility of acoustic confusion.
The longest vertical diameter of each of the visual stimuli was 28 mm, thus subtending about 3° 12' visual angle with viewing distance of 50 cm. The side of the square frame was 33 mm, thus subtending about 3° 47' visual angle.
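The visual angles quoted above follow from the stimulus size and the viewing distance by the standard relation angle = 2 arctan(size / (2 x distance)). A short check, where the function name `visual_angle` is introduced only for illustration:

```python
import math


def visual_angle(size_mm, distance_mm):
    """Return the subtended visual angle as (degrees, arc minutes), to the nearest minute."""
    degrees = math.degrees(2 * math.atan(size_mm / (2 * distance_mm)))
    total_minutes = round(degrees * 60)
    return divmod(total_minutes, 60)


print(visual_angle(28, 500))  # stimulus height at 50 cm
print(visual_angle(33, 500))  # side of the square frame at 50 cm
```

With the values given in the text this yields 3 degrees 12 minutes for the stimulus and 3 degrees 47 minutes for the frame.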
Subjects. Eight subjects were used, all undergraduates at the University of California, San Diego, who participated in the experiment as part of their course requirement. The subjects were also paid a monetary bonus that depended heavily on accuracy for both auditory discrimination and visual detection responses and slightly on the speed of the first one. The subjects were asked to try to be as fast as they could without making errors at all. All had normal vision or fully corrected vision.
Results and Discussion
The error percentages for the individual subjects were mostly between 1 and 2% and never exceeded 4%. Latencies for incorrect responses did not depart much from latencies for correct responses. There was no indication of a speed-accuracy tradeoff. Thus, it was decided to use in the analysis all the latency data for both correct and incorrect responses.
Letters were considered as a random factor in the analysis and preliminary tests were performed to determine for each systematic source whether or not its interaction with the letters factor, and its triple interaction with the letters and subjects factors, should be included in the error term (see Winer, 1971, pp. 378-384). In all the tests described here and later in the paper, those interaction terms were found nonsignificant; thus the error term for each within-subject source was its interaction with subjects.
¹ All the subjects in this experiment and the following one referred to the rectangle in their later verbal description as the letter O.
The mean latencies to auditory discrimination responses for each of the three temporal overlap conditions are plotted in Fig. 4 as a function of the consistency between the auditory and the visual input. The effect of consistency on latency is highly significant, F(2,12) = 39.09; p < .001. The agreement among individual subjects with respect to the order of mean latencies for the different levels of consistency is very high (Kendall coefficient of concordance of .89). The different conditions of temporal overlap not only vary with regard to their effect on mean latency, F(2,12) = 26.30; p < .001, but also interact with the variable of consistency, F(4,24) = 3.45; p < .025. A closer inspection of the data by means of the Newman-Keuls procedure for post hoc comparisons (see Table 1) suggests that the differences between all pairs of levels of consistency are significant and that there is a significant difference between the negative delay of the flash and the two other delays. The difference between the last two is not significant. It also appears that the effect of consistency is stronger when the auditory stimulus starts after the onset of the flash.
FIG. 4. Mean auditory latencies in Experiment 1 as a function of consistency level and temporal overlap condition. The delay is of the visual stimulus with respect to the auditory stimulus.
TABLE 1. Mean Latencies (Milliseconds) in Experiment 1 Tabulated by Consistency Levels and Temporal Overlap Conditions
Note. Results of post hoc pairwise comparisons are represented by equality (=) and inequality (<) signs. An equal sign denotes nonsignificant comparison. One inequality sign denotes comparison significant to the .05 level. Two inequality signs denote comparison significant to the .01 level.
This experiment has the main characteristics of Experiment 1 with regard to procedure, apparatus, and setting. The major difference is in the set of visual stimuli. Two sessions were administered, a test session and then a control session. The visual stimuli used in the test session were a large H, S, or rectangle. These global characters were made out of local characters: small Hs, Ss, or rectangles. The shapes of the local and global characters were identical. The arrangement of the centers of the small characters making up a large one was the same as the arrangement of the dots making up the small characters proper. The set of actual stimuli is shown in Fig. 5A.
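The construction of such compound characters can be sketched in a few lines. The 5 x 5 templates below are hypothetical stand-ins (the actual dot matrices are shown only in Fig. 5A and are not reproduced in the text), "O" stands for the rectangle, and `compound` is a name introduced here for illustration. The sketch respects the one property the text does state: the small characters are laid out in the same arrangement as the dots of a small character.

```python
# Hypothetical 5 x 5 dot templates; "X" marks a dot, "." an empty cell.
TEMPLATES = {
    "H": ["X...X", "X...X", "XXXXX", "X...X", "X...X"],
    "S": ["XXXXX", "X....", "XXXXX", "....X", "XXXXX"],
    "O": ["XXXXX", "X...X", "X...X", "X...X", "XXXXX"],  # the rectangle
}


def compound(global_char, local_char, gap=1):
    """Render a large character whose dots are replaced by small characters."""
    big = TEMPLATES[global_char]
    small = TEMPLATES[local_char]
    blank = [" " * len(row) for row in small]
    rows = []
    for big_row in big:
        # Each row of the large template expands into one band of small characters.
        for i in range(len(small)):
            cells = []
            for cell in big_row:
                source = small if cell == "X" else blank
                cells.append(source[i].replace("X", local_char).replace(".", " "))
            rows.append((" " * gap).join(cells).rstrip())
        rows.extend([""] * gap)  # vertical spacing between bands
    return rows[:-gap] if gap else rows


# The global level is factorially crossed with the local level: 3 x 3 = 9 stimuli.
stimuli = {(g, l): compound(g, l) for g in "HSO" for l in "HSO"}
print("\n".join(stimuli[("H", "S")]))
```

Because the same templates serve at both levels, the set of possible global characters is identical with the set of possible local ones, which is the control the design requires.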
The global level was factorially crossed with the local level, and each visual stimulus was presented twice with each auditory stimulus, within each block. The test session consisted of one practice block and three test blocks. The control session included two parts of two blocks each. In the first part the stimuli used were the ones used in Experiment 1. The stimuli used in the second part (see Fig. 5B) were just single-element characters of the stimuli from the test session presented at the center of the display field. The square frame used for delimiting the field of the visual stimulus was smaller during the second part of the control session. The ratio between its size and the size of the stimuli was the same as in the test session and in the first part of the control session.
The temporal overlap between the auditory and the visual stimuli was identical to that in condition 1 of Experiment 1: The flash was 80 msec long and was turned on 40 msec before the start of the auditory stimulus.
FIG. 5. The set of stimuli used in the test session of Experiment 2 is presented in A. The set of stimuli used in the second part of the control session is presented in B.
Eighteen subjects were run, none of whom had served in Experiment 1. The size of the visual display was varied between subjects. For nine of them the size of the whole stimulus was the same as in Experiment 1; the size of the element characters (or of the whole stimulus in the second part of the control session) was 1/8 of that size. The other nine subjects were presented with stimuli that were 1.5 times as large as the respective stimuli the first nine subjects were presented with.
The same type of analysis as in Experiment 1 was applied. The data of two subjects, one from each size condition, were eliminated because their error rates were too high (greater than 4%).
The mean latencies to auditory discrimination responses during the test session are plotted in Fig. 6 as a function of the consistency between the auditory input and both the global level and the local level of the visual input. The effect of consistency of the global level with the auditory stimuli is highly significant, F(2,28) = 94.90; p < .001, whereas the consistency of the local level does not have any significant effect, F(2,28) = 1.64; p > .20.
FIG. 6. Mean auditory latencies in the test session of Experiment 2 as a function of global consistency and local consistency.
The agreement among individual subjects with respect to the order of mean latencies is very high for the global level but very low for the local one (Kendall coefficients of concordance were .94 and .09, respectively). The post hoc pairwise comparisons between the levels of global consistency done by means of the Newman-Keuls procedure were all significant at the .01 level.
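For readers unfamiliar with the statistic, Kendall's coefficient of concordance for m subjects each ranking n conditions (without ties) is W = 12S / (m²(n³ - n)), where S is the sum of squared deviations of the conditions' rank totals from their mean. The sketch below uses made-up rankings purely to show the computation; they are not the data of this experiment.

```python
def kendall_w(rankings):
    """Kendall's coefficient of concordance for untied rankings.

    `rankings` holds one list per subject, giving the rank (1..n) that the
    subject's mean latency assigns to each of the n conditions.
    """
    m = len(rankings)      # number of subjects
    n = len(rankings[0])   # number of conditions ranked
    totals = [sum(r[j] for r in rankings) for j in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical rankings of (consistent, neutral, conflicting) by four subjects.
perfect = [[1, 2, 3]] * 4                  # all subjects agree
mixed = [[1, 2, 3]] * 3 + [[3, 2, 1]]      # one subject reverses the order
print(kendall_w(perfect), kendall_w(mixed))
```

W ranges from 0 (no agreement among subjects) to 1 (identical orderings), so values such as .94 and .09 indicate near-unanimous and near-random orderings, respectively.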
No other factor in the design, including size, was found to be significant, except for the factor of subjects and for two quadruple interactions.
The mean latencies to auditory discrimination responses during each of the parts of the control session are plotted in Fig. 7 as a function of the consistency between the auditory and the visual input. The effect of consistency is highly significant, F(2,28) = 103.74; p < .001, and it does not interact with the type of stimuli, F(2,28) = 0.17. The effect of the type of stimuli proper fell short of significance, F(1,14) = 2.89; p > .10. The agreement among individual subjects with respect to the order of mean latencies was high for either of the two types (Kendall coefficients were .79 for the larger bold type and .74 for the small thin type). The post hoc pairwise comparisons were all significant at the .01 level. The factors of subjects and letters were found significant as well as five interactions involving at least one of these factors.
FIG. 7. Mean auditory latencies in the control session of Experiment 2 as a function of consistency level and type of stimuli.
The difference between overall mean latency in the test session and overall mean latency in the control session was nonsignificant, F(1,15) = 1.21; p > .25, for trials without visual input but significant, F(1,15) = 10.44; p < .01, for trials with visual input. The significant drop in average latency from test to control may be due to the fact that the subjects may have been more practiced during the control session since it was administered after the test session. Another possibility is that as the visual stimuli in the control session were simpler, they required less processing, thus enabling earlier completion of auditory processing and response selection.
The interference effect applied just to the global visual pattern and not at all to the elements of which it was made. The same effect holds for stimuli as small as the elements when they stand alone in the visual field. Hence it is not the smaller size of the elements per se that makes them relatively or absolutely unnoticed.
The results of this experiment suggest that there are situations in which visual processing is carried just to a limited depth. The global pattern is apprehended but not its components. All but three subjects did not even notice that the stimuli were made of small letters. When asked after the experiment was over, they said that the stimuli may have been made of dots or of blocks.
There seems to be no sensory limit responsible for the superficial account for the visual data; the second part of the control session of this experiment, as well as the results of a later experiment, Experiment 3, indicate that one can voluntarily attend to the local features as well. The visual system seems rather to have made a decision to neglect the processing of the elements in view of the structure of the task, although it might have had enough capacity for performing a more thorough analysis. I believe that such economy in processing effort is very characteristic of human vision. Since usually the informative contribution of local features over the information gained by processing global structures is small or negligible, whereas the reverse is not true, the system will quit after a scene is interpreted on the global level.