Research shows that object-location binding errors can occur in visual working memory (VWM), indicating a failure to store bound representations rather than mere forgetting (Bays et al., 2009; Pertzov et al., 2012). Here we investigated how categorical similarity between real-world objects influences the probability of object-location binding errors. Our observers memorized three objects (image set: Konkle et al., 2010) presented for 3 seconds and located around an invisible circumference. After a 1-second delay they had to (1) place one of those objects on the circumference at its original position (localization task), or (2) recognize an old object when paired with a new object (recognition task). On each trial, the three encoded objects could be drawn from the same category or from different categories, providing two levels of categorical similarity. For the localization task, we used the mixture model (Zhang & Luck, 2008) with a swap component (Bays et al., 2009) to estimate the probabilities of correct and swapped object-location conjunctions, as well as the precision of localization and the guess rate (responses made when locations are forgotten). We found that categorical similarity had no effect on localization precision or guess rate. However, the observers made more swaps when the encoded objects were drawn from the same category. Importantly, there were no correlations between the probabilities of these binding errors and the probabilities of false recognition in the recognition task, which suggests that the binding errors cannot be explained solely by poor memory for objects. Rather, remembering objects and binding them to locations appear to be partially distinct processes. We suggest that categorical similarity impairs the ability to store objects attached to their locations in VWM.
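As an illustration of the mixture model with swaps mentioned above, the sketch below writes out the response density as a weighted sum of three components: a von Mises distribution centered on the target location, von Mises distributions centered on the non-target locations (swaps), and a uniform distribution (guesses). This is a minimal sketch under stated assumptions, not the authors' fitting code: the function names, the parameterization by a concentration parameter `kappa`, and the equal split of the swap probability among non-targets are assumptions in the spirit of Bays et al. (2009).

```python
import math


def bessel_i0(x, terms=50):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total = 0.0
    term = 1.0  # k = 0 term of sum_k (x/2)^(2k) / (k!)^2
    for k in range(terms):
        if k > 0:
            term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def von_mises_pdf(x, kappa):
    """Von Mises density on the circle, centered at 0, concentration kappa."""
    return math.exp(kappa * math.cos(x)) / (2.0 * math.pi * bessel_i0(kappa))


def swap_mixture_pdf(response, target, nontargets, kappa, p_swap, p_guess):
    """Density of a reported location (radians) under the three-component mixture.

    Assumes `nontargets` is non-empty whenever p_swap > 0, and that the swap
    probability is shared equally among the non-target locations.
    """
    p_target = 1.0 - p_swap - p_guess
    density = p_target * von_mises_pdf(response - target, kappa)
    if nontargets:
        swap_part = sum(von_mises_pdf(response - nt, kappa) for nt in nontargets)
        density += p_swap * swap_part / len(nontargets)
    density += p_guess / (2.0 * math.pi)  # uniform guessing component
    return density
```

In practice the three parameters (`kappa`, `p_swap`, `p_guess`) would be estimated per observer and condition by maximizing the summed log of this density over trials; the estimation procedure itself is not shown here.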
It has been shown that multiple objects can be efficiently represented as ensemble summary statistics, such as the average. Recently, Kanaya et al. (2018) demonstrated an amplification effect in the perception of the average. Their participants judged the mean size or temporal frequency of ensembles, and they tended to exaggerate their estimates, especially at larger set sizes. Kanaya et al. explained this by a non-exhaustive sampling mechanism favoring the ~sqrt(N) most salient items, which are either the largest or the most frequent ones. But how do the rest of the elements contribute to ensemble perception? In our study, we used orientation averaging (which does not have any inevitably salient values) and manipulated the salience of individual items via size. Participants had to adjust the average orientation of 4, 8, or 16 triangles. We measured systematic biases, like Kanaya et al. (2018), and the SD of errors, which is known to correlate with the physical ensemble range. In Experiment 1, the bigger elements could be the most clockwise, the most counterclockwise, or the middle ones, or all elements were the same size. We found strong clockwise and counterclockwise biases in the corresponding conditions. The biases increased with set size, replicating Kanaya et al. (2018). But we found no SD difference between the conditions, suggesting that all items were somehow taken into account. In Experiment 2, we compared distributions with the same ranges (full sets) but with the salient elements being either the middle or the extreme ones (most clockwise and most counterclockwise). We used distributions with only middle elements or only extremes as controls (half sets). We found that the SD in the full sets was greater than in the middle half sets and smaller than in the extreme half sets, suggesting that all items were taken into account. We also found that the SD in the extreme full sets was greater than in the middle full sets when the set size was large. We conclude that both exhaustive and amplification types of sampling work in averaging.
Increased distractor heterogeneity complicates visual search, but only when the set of distractors has high dissimilarity. However, if a gap between those dissimilar distractors in the feature space is filled with numerous intermediate feature values, it paradoxically improves the salience of a target singleton despite increased distractor heterogeneity. To explain this paradox we suggested that the distractor heterogeneity effect is mediated by "segmentability". This predicts different heterogeneity effects on singleton search depending on the smoothness of transition between neighboring features.
Ensemble summary statistics represent multiple objects at a high level of abstraction, that is, without representing individual features and ignoring spatial organization. This makes them especially useful for the rapid visual categorization of multiple objects of different types that are intermixed in space. Rapid categorization implies our ability to judge at one brief glance whether all visible objects represent different types or just variants of one type. A framework presented here states that processes resembling statistical tests can underlie that categorization. At an early stage (primary categorization), when independent ensemble properties are distributed along a single sensory dimension, the shape of that distribution is tested in order to establish whether all features can be represented by a single peak or by multiple peaks. When primary categories are separated, the visual system either reiterates the shape test to recognize subcategories (in-depth processing) or implements mean comparison tests to match several primary categories along a new dimension. Rapid categorization is not free from processing limitations; the role of selective attention in categorization is discussed in light of these limitations.
The knowledge of target features can be used to guide attention in many conjunction searches in a top-down manner. For example, in search for a red vertical line among blue vertical and red horizontal lines, observers can guide attention toward all red items and all vertical items. Items with both features would gain greater activation. It could be that attention is guided to the group of red items and the group of vertical items, with items neatly divided into those with a target feature and those without. Alternatively, attention might be guided to any reddish and relatively vertical items, with no grouping. We tested whether clear, categorical groups were useful in guided search. Observers searched for color-orientation (Experiment 1) or length-orientation (Experiment 2) conjunction targets. Distractors could form two segmentable groups (e.g., blue steep and red flat), or distractors could be “non-segmentable”, varying from red to blue and from steep to flat, which discouraged grouping and increased overall heterogeneity. We found that, when the target was present, the searches were quite efficient in Experiment 1 (~9–14 ms/item) and more efficient in Experiment 2 (~0–6 ms/item). Target-present slopes were not affected by “segmentability” manipulations. However, target-absent searches were less efficient if one of the dimensions was “non-segmentable” (especially in length-orientation conjunctions). In Experiment 3, we demonstrated that search in “non-segmentable” conjunction sets was no less efficient, and could be even more efficient, than “non-segmentable” feature search. Our results suggest that attention is directly guided by the overlap between top-down activation signals corresponding to target features. The guidance mechanism bypasses grouping and segmentation cues that are very important in other tasks like scene parsing and object recognition.
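The efficiency values above are search slopes in ms/item, that is, the slope of the line relating mean reaction time to set size. The sketch below shows how such a slope can be obtained by ordinary least squares. It is an illustration only: the function name is hypothetical, and the set sizes and reaction times in the usage example are made-up numbers, not data from the study.

```python
def search_slope(set_sizes, mean_rts):
    """Least-squares slope (ms/item) and intercept (ms) of RT as a function of set size."""
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(mean_rts) / n
    # Sum of squares of set size and cross-products with RT
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_sizes, mean_rts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical example: mean RTs (ms) at three illustrative set sizes
slope, intercept = search_slope([4, 8, 16], [540.0, 580.0, 660.0])
```

A shallower slope means that each additional item adds less time to the search, which is what "more efficient" refers to in the text.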
Top-down guidance of visual search is a matter of ongoing discussion (e.g., Wolfe & Horowitz, 2017). However, it is still unclear when guidance emerges in the course of individual development, and whether the fronto-parietal brain network, which underpins attentional control, is necessary for attentional guidance. Although a number of experiments have studied visual search in children, to our knowledge no study has directly contrasted, in younger populations, the conditions under which adults do and do not demonstrate guided search. In our experiment, we compared feature search, guided conjunction search, and unguided conjunction search in 20 young adults (university students, mean age 18.5) and 20 junior schoolchildren (7.5–9.5 years old, mean age 8.5). The two groups performed three randomized blocks of the standard visual search task, searching for a target “fox’s house” among distractor houses and receiving feedback after each trial. The target house differed from distractors only in color (feature search), in color and shape (conjunction search), or was defined as a specific combination of two colors (conjunction search with no possibility of top-down guidance). Set sizes of 4, 7, and 10 stimuli were used, with only half of the trials containing a target. Our hypothesis was that in adults we would observe top-down regulation of the conjunction search, whereas in children search in all conditions other than feature search would be equally inefficient because of the immaturity of the fronto-parietal network (e.g., Astle et al., 2015). Surprisingly, the overall pattern of results in all three conditions was the same in children and adults, with markedly more efficient conjunction search as compared to the unguided search, although children were significantly (and proportionally) slower in all types of search. We conclude that top-down attentional guidance is already fully present in junior schoolchildren.
The visual system can represent multiple objects in a compressed form of ensemble summary statistics (such as object numerosity, mean, and feature variance/range). Yet the relationships between the different types of visual statistics remain relatively unclear. Here, we tested whether two summaries (mean and numerosity, or mean and range) are calculated independently from each other and in parallel. Our participants performed dual tasks requiring a report about two summaries in each trial, and single tasks requiring a report about one of the summaries. We estimated trial-by-trial correlations between the precision of reports as well as correlations across observers. Both analyses showed the absence of correlations between different types of ensemble statistics, suggesting their independence. We also found no decrement (except that related to the order of report explained by memory retrieval) in performance in dual compared to single tasks, which suggests that two statistics of one ensemble can be processed in parallel.
The visual system can represent multiple objects in a compressed form of ensemble summary statistics (such as object numerosity, mean, and variance of their features). Yet, the relationships between the different types of visual statistics remain relatively unclear. Here, we tested whether two summaries (mean and numerosity – Experiment 1, and mean and variance – Experiment 2) are calculated independently from each other and in parallel, that is, without a cost of dividing attention. Our participants performed dual tasks requiring a report about two summaries in each trial, and single tasks requiring a report about only one of the summaries. Observers were briefly shown sample sets of circles of various sizes. At test, they had to report the number of circles, their mean size, or the variance of sizes using the adjustment method. The relative difference between an adjusted value and a correct answer was used as a measure of precision. We estimated trial-by-trial correlations between the precision of reports in the dual task separately for each observer, as well as correlations between averaged errors in reporting summaries in different conditions across all observers. Both analyses showed (1) the absence of correlations between different types of ensemble statistics, suggesting their independence, and (2) strong auto-correlations of same-type statistics in different tasks (dual vs. single), suggesting good between-test consistency. We also found no decrement (except that related to the order of report, explained by memory retrieval) in performance in dual compared to single tasks, which suggests that two statistics of one ensemble can be processed in parallel. In an additional experiment, we found that the precision of variance reports did not change even when mean size and spatial density changed substantially between sample and adjustment sets. This finding also speaks for independence between the ensemble statistics.
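The precision measure and the trial-by-trial correlation described above can be sketched as follows. This is an illustrative sketch, not the authors' analysis code: the function names are hypothetical, and the abstract does not say whether the relative difference was signed or absolute, so the signed version used here is an assumption.

```python
import math


def relative_error(adjusted, correct):
    """Relative difference between an adjusted value and the correct answer.

    Assumption: signed error; an absolute version would wrap this in abs().
    """
    return (adjusted - correct) / correct


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def dual_task_correlation(trials):
    """Correlate the errors of the two reports within one observer's dual-task trials.

    `trials` is a list of (adjusted_a, correct_a, adjusted_b, correct_b) tuples,
    for example mean-size and numerosity reports on the same trial.
    """
    errors_a = [relative_error(a, ca) for a, ca, _, _ in trials]
    errors_b = [relative_error(b, cb) for _, _, b, cb in trials]
    return pearson_r(errors_a, errors_b)
```

Under independence, the value returned by `dual_task_correlation` should be close to zero for each observer, whereas errors for the same statistic in the dual and single tasks should correlate across observers.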
An uninformative exogenous cue speeds target detection if cue and target appear in the same location separated by a brief temporal interval. This finding is usually ascribed to the orienting of spatial attention to the cued location. Here we examine the role of perceptual merging of the two trial events in speeded target detection. That is, the cue and target may be perceived as a single event when they appear in the same location. If so, cueing effects could reflect, in part, the binding of the perceived target onset to the earlier cue onset. We observed the traditional facilitation of cued over uncued targets and asked the same observers to judge target onset time by noting the time on a clock when the target appeared. Observers consistently judged the onset time of the target as being earlier than it actually was, with cued targets judged as earlier than uncued targets. When the event order was reversed so that the target preceded the cue, perceived onset was accurate in both cued and uncued locations. This pattern of results suggests that perceptual merging does occur in exogenous cueing. A modified attention account is discussed that proposes reentrant processing, evident through perceptual merging, as the underlying mechanism of reflexive orienting of attention.
When storing multiple objects in visual working memory (VWM), observers sometimes misattribute perceived features to incorrect locations or objects. These "swaps" are usually explained by a failure to store object representations in a bound form. Swap errors have been demonstrated mostly in simple objects whose features (color, orientation, shape) are easy to encode independently. Here, we tested whether similar swaps can occur with real-world objects where the connections between features are meaningful. In Experiment 1, observers were simultaneously shown four items from two object categories (two exemplars per category). Within a category, the exemplars could be presented in either the same state (two open boxes) or different states (one open box, one closed box). After a delay, two exemplars drawn from one category were shown in both possible states. Participants had to recognize which exemplar went with which state. In a control task, they had to discriminate two old from two new exemplars. Participants showed good memory for exemplars when no binding was required. However, when the tested objects were shown in different states, participants were less accurate. Good memory for state information and for exemplar information on their own, together with a significant memory decrement for exemplar-state combinations, suggests that binding was difficult for observers and that "swap" errors occurred even for real-world objects. In Experiment 2 we used the same tasks, but on half of the trials the locations of the exemplars were swapped at test. We found that participants ascribed incorrect states to exemplars more frequently when the locations were swapped. We conclude that the internal features of real-world objects are not perfectly bound in VWM and can be attached to locations independently. Overall, we provide evidence that even real-world objects are not stored in an entirely bound representation in working memory.
Observers are good at rapid estimation of the average size of multiple objects (Ariely, 2001; Chong & Treisman, 2003). We tested whether the average is calculated from the "raw" (proximal) stimulus size (where only visual angle is important) or relies on the distal size of an object (which requires taking distance information into account). Our participants performed a size averaging task by adjusting the size of a probe circle. Using a stereoscope, we changed the apparent distance of ensemble members from the observer. In Experiment 1, all ensemble members were shifted by the same disparity angle in both eyes, so that they appeared at different distances across trials but always in one plane. The probe was always in the same plane (zero disparity). We found that presenting ensembles in apparently remote planes made observers overestimate their mean size in comparison to what is expected from simple visual angle averaging. In Experiment 2, ensemble members were presented in different planes so that (1) visual angle decreased with apparent distance, making the apparent sizes of individual members more similar, (2) visual angle increased with apparent distance, increasing their apparent dissimilarity, or (3) all members were presented in the zero disparity plane. We found that the mean error in probe averaging in condition (1) was significantly smaller than in the other conditions. This finding is in line with previous studies showing that similarity between ensemble members in one plane reduces the error. As the items in condition (1) could look more similar than in the other conditions only due to the distance cues, we conclude that observers took these cues into account. Our main theoretical conclusion is that the visual system appears to work with bound objects rather than their separate features when representing their global properties such as the average size.
Meeting abstract presented at VSS 2016
Numerous studies report that observers are good at evaluating various ensemble statistics, such as mean or range. Recent studies have shown that, in the perception of mean size, the visual system relies on size information individually rescaled to distance for each item (Utochkin & Tiurina, 2018). Here, we directly tested this rescaling mechanism on the perception of variance. In our experiment, participants were stereoscopically shown a sample set of circles of different sizes at different apparent depths. Then they had to adjust a test set so that its range of sizes matched the range of the sample. We manipulated the correlation between sizes and depth for both samples and tests. In the positive size-depth correlation, bigger circles were presented farther away and should have seemed larger, whereas smaller circles were presented closer and should have seemed smaller; therefore, the apparent range should have increased. In the negative size-depth correlation, the apparent range should have decreased, since bigger circles should have seemed smaller, and vice versa. We tested all possible couplings of correlation conditions between samples and tests. We found that in general, observers tended to overestimate the range of the sample (over-adjusted it on the test). Yet, the strongest underestimation was shown when the sample had a negative correlation and the test had a positive correlation. This pattern is consistent with the prediction following from the idea of rescaling. As the negative correlation reduced the apparent range, participants had to under-adjust the range of a positively correlated test to compensate for the difference in variance impressions. We conclude, therefore, that multiple sizes are automatically rescaled in accordance with their distances and that this rescaling can be used to judge ensemble variance.
Illusory conjunctions (ICs) refer to errors in which an observer correctly reports features present in the display, but incorrectly pairs features or parts from multiple objects. There is a long-standing debate in the literature about the nature of ICs; for example, whether they arise from the lack of focused attention (Treisman & Schmidt, 1982) or from lossy peripheral representations (Rosenholtz et al., 2012). Here, we test the hypothesis that the occurrence of ICs relates to spatial uncertainty of features falling within the same noisy “window”. According to this idea, ICs occur when the spatial uncertainty is large compared to the distance between items, causing confusion over which features belong to which item. In Experiment 1, we directly measured the spatial noise at 3°, 6°, 9°, and 12° from fixation. A compact “crowd” of four dots briefly appeared, followed by the presentation of a probe circle at various distances from the “crowd”. Observers had to respond whether any dot had fallen within the probed region. The probability of perceiving the dots as outside the probe as a function of distance provides a measure of spatial noise as a function of eccentricity. In Experiment 2, we presented four differently colored and oriented bars, located on an invisible circle with a diameter varying from 1° to 3.5° (the “separation”), and centered at one of three eccentricities (4°, 8°, 12°). Participants had to report the color, orientation, and location of any of the bars. The numbers of correct answers, guesses (reports of non-presented features), and ICs were estimated. The number of ICs increased with eccentricity and decreased with separation. There was a good resemblance between the spatial noise and the IC pattern. We conclude that there can be an overlap between the mechanisms of spatial localization and ICs in peripheral vision.
Visual search for multiple targets can cause errors called subsequent search misses (SSM), a decrease in accuracy at detecting a second target after a first target has been found (e.g., Adamo, Cain, & Mitroff, 2013). One possible explanation is perceptual set: after the first target is found, the observer becomes biased toward perceptually similar targets and is therefore more likely to find perceptually similar targets and less likely to find perceptually dissimilar ones. The present experiment investigated the role of perceptual similarity in SSM errors. The search array in each trial consisted of 20 stimuli (ellipses and crosses, black and white, small and big, oriented horizontally and vertically), which could contain one, two, or no targets. In the case of two targets, they could have two, three, or four shared features (in the last case they were identical). The features of the target stimuli were indicated at the beginning of each trial. The participants' task was to find all the target stimuli or report their absence. We compared accuracy across the conditions with two target stimuli sharing two, three, or four features and the condition with one target stimulus. In the case of two targets, a correct answer required finding both stimuli. Repeated measures ANOVA revealed a main effect of the shared features factor, F(1, 19) = 15.71, p < .001. Pairwise comparisons (with Holm-Bonferroni adjustment) revealed significant differences between all the conditions, except between the conditions with one stimulus and with two identical stimuli. SSM errors were found for all conditions, except the condition with fully identical stimuli. The size of the SSM effect decreased as the similarity between the targets increased. The results indicate the role of perceptual similarity and have implications for the perceptual set theory.
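For readers unfamiliar with the Holm-Bonferroni adjustment used in the pairwise comparisons above, the sketch below shows the standard step-down procedure on a list of raw p-values. It is a generic illustration, not the authors' analysis script; the function name is hypothetical and the p-values in the usage line are invented for demonstration.

```python
def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p never falls below an earlier one
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from three pairwise comparisons
adjusted = holm_bonferroni([0.01, 0.04, 0.03])
```

A comparison is declared significant when its adjusted p-value falls below the chosen alpha level (for example, .05).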