Research shows that object-location binding errors can occur in VWM, indicating a failure to store bound representations rather than mere forgetting (Bays et al., 2009; Pertzov et al., 2012). Here we investigated how categorical similarity between real-world objects influences the probability of object-location binding errors. Our observers memorized three objects (image set: Konkle et al., 2010) presented for 3 seconds and located around an invisible circumference. After a 1-second delay, they had to (1) locate one of those objects on the circumference according to its original position (localization task) or (2) recognize an old object when it was paired with a new object (recognition task). On each trial, the three encoded objects could be drawn from the same category or from different categories, providing two levels of categorical similarity. For the localization task, we used the mixture model (Zhang & Luck, 2008) with a swap component (Bays et al., 2009) to estimate the probabilities of correct and swapped object-location conjunctions, the precision of localization, and the guess rate (locations forgotten). We found that categorical similarity had no effect on localization precision or guess rate. However, observers made more swaps when the encoded objects had been drawn from the same category. Importantly, there were no correlations between the probabilities of these binding errors and the probabilities of false recognition in the recognition task, which suggests that the binding errors cannot be explained solely by poor memory for the objects themselves. Rather, remembering objects and binding them to locations appear to be partially distinct processes. We suggest that categorical similarity impairs the ability to store objects attached to their locations in VWM.
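The mixture-model fit described above can be sketched in Python. This is a minimal illustration, not the authors' analysis code: it simulates localization errors under known target, swap, and guess probabilities (all parameter values invented for the demo) and recovers them by maximum likelihood over a three-component mixture of von Mises and uniform distributions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import vonmises

rng = np.random.default_rng(0)

# Simulate localization errors (radians, relative to the target) under
# known mixture parameters -- values are assumptions for the demo
n, p_target, p_swap, kappa = 2000, 0.7, 0.2, 8.0
# one non-target (swap candidate) location per trial, relative to the target
nontarget = rng.uniform(-np.pi, np.pi, n)
comp = rng.choice(3, size=n, p=[p_target, p_swap, 1 - p_target - p_swap])
err = np.where(
    comp == 0, vonmises.rvs(kappa, size=n, random_state=rng),          # correct report
    np.where(comp == 1,
             nontarget + vonmises.rvs(kappa, size=n, random_state=rng),  # swap
             rng.uniform(-np.pi, np.pi, n)))                             # guess
err = (err + np.pi) % (2 * np.pi) - np.pi  # wrap to (-pi, pi]

def negll(theta):
    """Negative log-likelihood of the target + swap + uniform-guess mixture."""
    pt, ps, k = theta
    pg = 1 - pt - ps
    if pt < 0 or ps < 0 or pg < 0 or k <= 0:
        return np.inf
    like = (pt * vonmises.pdf(err, k)
            + ps * vonmises.pdf(err - nontarget, k)  # von Mises pdf is periodic
            + pg / (2 * np.pi))
    return -np.sum(np.log(like + 1e-12))

res = minimize(negll, x0=[0.5, 0.1, 5.0], method="Nelder-Mead")
pt_hat, ps_hat, k_hat = res.x
```

Because the non-target location is known on every trial, the swap component (concentrated near the non-target) is identifiable from the uniform guess component.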
Eye-tracking is widely used in research on attentional strategies in tasks with visual representations. Strategies improve with learning, and many studies have examined differences in attention allocation between experts and novices. Research shows that when math problems are presented on a screen together with response options, novices fixate more on response options that include distractors, whereas experts fixate more on the math problem and the correct answer. If experts and novices allocate their attention to different parts of the screen, these strategy differences should also be observable when comparing high and low performers. Participants (N = 26; 20-30 years) were non-math university majors who completed the Parametric Math Task (PMT; Konopkina, 2019) while their eye movements were recorded in a remote head-free-to-move mode. The PMT contains mathematical problems of addition, subtraction, multiplication, and division at three levels of difficulty. Individuals who scored above the median were classified as high performers and those below as low performers. Data were analysed by evaluating dwell time (total duration of fixations) in the math problem area (top of the screen) and the response option areas (bottom of the screen). Results showed that high and low performers differed significantly in their dwell times for two interest areas: the problem area and the distractor responses (problem area: p = 0.029, Cohen’s d = 0.92; distractor responses: p = 0.018, Cohen’s d = 0.99). The findings indicate that high performers spent significantly more time on the math problem area of the screen, whereas low performers spent more time on the distractor options. In educational practice, knowledge of looking times and locations may be indicative of the strategies used by problem solvers.
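The dwell-time comparison can be illustrated with a hedged sketch: an independent-samples t-test plus a pooled-SD Cohen's d on invented dwell times (the group means, SDs, and group sizes below are assumptions, not the study's data).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical dwell times (s) on the problem area, 13 participants per group
high = rng.normal(6.0, 1.5, 13)  # assumed high-performer mean/SD
low = rng.normal(4.0, 1.5, 13)   # assumed low-performer mean/SD

# Independent-samples t-test
t, p = stats.ttest_ind(high, low)

# Cohen's d with the pooled standard deviation
n1, n2 = len(high), len(low)
sp = np.sqrt(((n1 - 1) * high.std(ddof=1) ** 2
              + (n2 - 1) * low.std(ddof=1) ** 2) / (n1 + n2 - 2))
d = (high.mean() - low.mean()) / sp
```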
It has been shown that multiple objects can be efficiently represented as ensemble summary statistics, such as the average. Recently, Kanaya et al. (2018) demonstrated an amplification effect in the perception of the average. Their participants judged the mean size or temporal frequency of ensembles and tended to exaggerate their estimates, especially at larger set sizes. Kanaya et al. explained this by a non-exhaustive sampling mechanism favoring the ~sqrt(N) most salient items, which are either the largest or the most frequent ones. But how do the rest of the elements contribute to ensemble perception? In our study, we used orientation averaging (which does not have any inevitably salient values) and manipulated the salience of individual items via size. Participants had to adjust the average orientation of 4, 8, or 16 triangles. We measured systematic biases, as in Kanaya et al. (2018), and the SD of errors, which is known to correlate with the physical ensemble range. In Experiment 1, the most clockwise elements, the most counterclockwise elements, or the middle elements could be bigger than the rest, or all elements were the same size. We found strong clockwise and counterclockwise biases in the corresponding conditions. The biases increased with set size, replicating Kanaya et al. (2018). But we found no SD difference between the conditions, suggesting that all items were somehow taken into account. In Experiment 2, we compared distributions with the same ranges (full-sets) but with the salient elements being either middle or extreme (most clockwise and counterclockwise). We used distributions with only middle elements or only extremes as controls (half-sets). We found that SDs in the full-sets were greater than in the middle half-sets and smaller than in the extreme half-sets, suggesting that all items were taken into account. We also found that SDs in the extreme full-sets were greater than in the middle full-sets at the large set size. We conclude that both exhaustive and amplification types of sampling work in averaging.
Knowledge of target features can guide attention in many conjunction searches in a top-down manner. For example, in search of a red vertical line among blue vertical and red horizontal lines, observers can guide attention toward all red items and all vertical items. In typical conjunction searches, distractors often form perceptually vivid, categorical groups of identical objects. This could favor efficient search via the guidance of attention to these “segmentable” groups. Can attention be guided if the distractors are not neatly segmentable (e.g., if colors vary continuously from red through purple to blue)? We tested search for conjunctions of color × orientation (Experiments 1, 3, 4, 5) or length × orientation (Experiment 2). In segmentable conditions, distractors could form two clear groups (e.g., blue steep and red flat). In non-segmentable conditions, distractors varied smoothly from red to blue and/or steep to flat, thus discouraging grouping and increasing overall heterogeneity. We found that the efficiency of conjunction search was reasonably high and unaffected by segmentability. The same lack of segmentability had a detrimental effect on feature search (Experiment 4) and on conjunction search if target information was limited to one feature (e.g., find the odd item in the red set, “subset search,” Experiment 3). Guidance in conjunction search may not require the grouping and segmentation cues that are very important in other tasks like texture discrimination. Our results support the idea of simultaneous, parallel top-down guidance by multiple features and argue against models suggesting sequential guidance by each feature in turn.
Increased distractor heterogeneity complicates visual search, but only when the set of distractors is highly dissimilar. However, if the gap between those dissimilar distractors in feature space is filled with numerous intermediate feature values, the salience of a target singleton paradoxically improves despite the increased distractor heterogeneity. To explain this paradox, we suggest that the distractor heterogeneity effect is mediated by "segmentability". This view predicts different heterogeneity effects on singleton search depending on the smoothness of the transition between neighboring features.
Cognitive effort is a subjective phenomenon, generally defined as the amount of sustained mental activity exerted during a cognitive task. A well-established eye movement index of cognitive effort is blink rate. Many studies show that in cognitive tasks involving visual stimuli, blink rate decreases as a function of difficulty (Maffei & Angrilli, 2018). Working memory (WM) is a core cognitive ability and refers to the number of items or schemes that can be simultaneously held and manipulated in mind. While a great deal of research has explored behavioral correlates of WM load and task complexity, little is known about how these relate to eye movements across development. We implemented an eye-tracking paradigm to study the effects of complexity and WM load on eye movements from a developmental perspective. 57 healthy adults (23 male, age = 23.25 ± 3.6) and 26 children (10 male, age = 9.53 ± 0.76) participated in the study. Eye-tracking data were recorded with the EyeLink Portable Duo while participants performed the Colour Matching Task (CMT; Arsalidou et al., 2010). During the CMT, the participant is shown a picture with multiple colours for 3 s and then reports whether the colours in a following picture are the same or different. The CMT has six levels of WM load (the number of relevant colours) and two levels of task complexity (low and high interference conditions). Analyses of variance showed a significant main effect of age group on blink rate (p < 0.01, F = 9.091, η2 = 0.009), with children making fewer blinks at all levels of WM load, as well as a significant main effect of WM load (p < 0.001, F = 130.5, η2 = 0.021), with blink rate decreasing as WM load increased. No significant effects were observed for task complexity. Results will be discussed in terms of cognitive development and implications for education.
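As a toy illustration of the load analysis, the sketch below runs a one-way ANOVA (scipy's f_oneway) on invented blink rates that decrease across six WM-load levels. It is a simplification of the mixed-design ANOVA reported above, and all group means, SDs, and sample sizes are assumptions.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(2)

# Hypothetical blink rates (blinks/min) for 30 adults at six WM-load levels,
# with the mean decreasing as load increases (made-up means; SD = 3)
means = [14, 13, 12, 11, 10, 9]
groups = [rng.normal(m, 3.0, 30) for m in means]

# One-way ANOVA across the six load levels
F, p = f_oneway(*groups)
```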
Ensemble summary statistics represent multiple objects at a high level of abstraction—that is, without representing individual features and ignoring spatial organization. This makes them especially useful for the rapid visual categorization of multiple objects of different types that are intermixed in space. Rapid categorization implies our ability to judge at one brief glance whether all visible objects represent different types or just variants of one type. The framework presented here states that processes resembling statistical tests can underlie this categorization. At an early stage (primary categorization), when independent ensemble properties are distributed along a single sensory dimension, the shape of that distribution is tested to establish whether all features can be represented by a single peak or by multiple peaks. Once primary categories are separated, the visual system either reiterates the shape test to recognize subcategories (in-depth processing) or implements mean comparison tests to match several primary categories along a new dimension. Rapid categorization is not free from processing limitations; the role of selective attention in categorization is discussed in light of these limitations.
The knowledge of target features can be used to guide attention in many conjunction searches in a top-down manner. For example, in search for a red vertical line among blue vertical and red horizontal lines, observers can guide attention toward all red items and all vertical items. Items with both features would gain greater activation. It could be that attention is guided to the group of red items and the group of vertical items, with items neatly divided into those with a target feature and those without. Alternatively, attention might be guided to any reddish and relatively vertical items, with no grouping. We tested whether clear, categorical groups were useful in guided search. Observers searched for color-orientation (Experiment 1) or length-orientation (Experiment 2) conjunction targets. Distractors could form two segmentable groups (e.g., blue steep and red flat), or distractors could be “non-segmentable,” varying from red to blue and steep to flat, discouraging grouping and increasing overall heterogeneity. We found that, when the target was present, the searches were quite efficient in Experiment 1 (~9–14 ms/item) and more efficient in Experiment 2 (~0–6 ms/item). Target-present slopes were not affected by the “segmentability” manipulations. However, target-absent slopes were less efficient if one of the dimensions was “non-segmentable” (especially in length-orientation conjunctions). In Experiment 3, we demonstrated that search in “non-segmentable” conjunction sets was no less, and could be even more, efficient than “non-segmentable” feature search. Our results suggest that attention is directly guided by the overlap between top-down activation signals corresponding to target features. The guidance mechanism bypasses grouping and segmentation cues that are very important in other tasks like scene parsing and object recognition.
Top-down guidance of visual search is a topic of ongoing discussion (e.g., Wolfe & Horowitz, 2017). However, it is still unclear when guidance emerges in the course of individual development, and whether the fronto-parietal brain network, which underpins attentional control, is necessary for attentional guidance. Although a number of experiments have studied visual search in children, to our knowledge no study has directly compared, in younger populations, the conditions under which adults do and do not demonstrate guided search. In our experiment, we compared feature search, guided conjunction search, and unguided conjunction search in 20 young adults (university students, mean age 18.5) and 20 junior schoolchildren (7.5–9.5 years old, mean age 8.5). The two groups performed three randomized blocks of a standard visual search task, searching for a target “fox’s house” among distractor houses and receiving feedback after each trial. The target house differed from the distractors only in color (feature search), in color and shape (conjunction search), or was defined as a specific combination of two colors (conjunction search with no possibility of top-down guidance). Set sizes of 4, 7, and 10 stimuli were used, with only half of the trials containing a target. Our hypothesis was that adults would show top-down guidance of conjunction search, whereas in children all conditions except feature search would be equally inefficient because of fronto-parietal network immaturity (e.g., Astle et al., 2015). Surprisingly, the overall pattern of results in all three conditions was the same in children and adults, with pronouncedly more efficient conjunction search as compared to unguided search, although children were significantly (and proportionally) slower in all types of search. We conclude that top-down attentional guidance is already fully present in junior schoolchildren.
The visual system can represent multiple objects in a compressed form of ensemble summary statistics (such as object numerosity, mean, and feature variance/range). Yet the relationships between the different types of visual statistics remain relatively unclear. Here, we tested whether two summaries (mean and numerosity, or mean and range) are calculated independently from each other and in parallel. Our participants performed dual tasks requiring a report about two summaries in each trial, and single tasks requiring a report about one of the summaries. We estimated trial-by-trial correlations between the precision of reports, as well as correlations across observers. Both analyses showed an absence of correlations between the different types of ensemble statistics, suggesting their independence. We also found no decrement (except one related to the order of report, explained by memory retrieval) in performance in dual compared to single tasks, which suggests that two statistics of one ensemble can be processed in parallel.
The visual system can represent multiple objects in a compressed form of ensemble summary statistics (such as object numerosity, mean, and variance of their features). Yet, the relationships between the different types of visual statistics remain relatively unclear. Here, we tested whether two summaries (mean and numerosity in Experiment 1, and mean and variance in Experiment 2) are calculated independently from each other and in parallel, that is, without the cost of dividing attention. Our participants performed dual tasks requiring a report about two summaries in each trial, and single tasks requiring a report about only one of the summaries. Observers were briefly shown sample sets of circles of various sizes. At test, they had to report the number of circles, their mean size, or the variance of their sizes using the adjustment method. The relative difference between the adjusted value and the correct answer was used as a measure of precision. We estimated trial-by-trial correlations between the precision of reports in the dual task separately for each observer, as well as correlations between averaged errors in reporting summaries in different conditions across all observers. Both analyses showed (1) the absence of correlations between different types of ensemble statistics, suggesting their independence, and (2) strong auto-correlations of same-type statistics in different tasks (dual vs. single), suggesting good between-test consistency. We also found no decrement (except one related to the order of report, explained by memory retrieval) in performance in dual compared to single tasks, which suggests that two statistics of one ensemble can be processed in parallel. In an additional experiment, we found that the precision of variance reports did not change even when mean size and spatial density changed substantially between the sample and adjustment sets. This finding also speaks for the independence of the ensemble statistics.
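The trial-by-trial correlation analysis can be sketched as follows. The error magnitudes, numbers of observers, and numbers of trials are invented, and the two error series are simulated as independent, which is what the independence account predicts.

```python
import numpy as np

rng = np.random.default_rng(3)
n_obs, n_trials = 20, 200  # assumed sample sizes

# Hypothetical relative report errors for two summaries on each trial
# (e.g., mean size and numerosity), simulated as independent
mean_err = rng.normal(0, 0.15, (n_obs, n_trials))
num_err = rng.normal(0, 0.20, (n_obs, n_trials))

# Pearson r between the two error series, computed per observer
rs = np.array([np.corrcoef(mean_err[i], num_err[i])[0, 1]
               for i in range(n_obs)])
mean_r = rs.mean()  # near zero under independence
```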
Major discoveries in technology and science often rely on mathematical skills. Mathematical knowledge is founded on basic math problem solving such as addition, subtraction, multiplication, and division. Research shows that problem solving is associated with eye movements that index the allocation of attention. Machine learning has been used with eye-tracking metrics to predict performance on real-life user efficiency tasks and classic puzzle games. Critically, no study to date has evaluated eye-tracking metrics associated with mathematical operations using machine learning approaches to classify trial correctness and predict task difficulty level. Participants (n = 26, 20-30 years) viewed mathematical problems at three levels of difficulty indexed by 1-, 2-, and 3-digit problems, along with four possible answers, while their eye movements were recorded. Eye-tracking data were acquired with an EyeLink Portable Duo (SR Research) eye-tracker with 1 ms temporal resolution (1000 Hz sampling frequency) in remote head-free-to-move mode. Results show that trial correctness can be classified with a 0.81 ROC AUC score based on 5-fold cross-validation. Predicting the task difficulty level of each trial was attained with 72% accuracy, which is significantly better than random prediction (i.e., 50%). The most important features for both machine learning models include metrics associated with the current pupil fixation, current saccade amplitude, and current fixation duration. Theoretically, the findings contribute to theories of mathematical cognition. Practically, the algorithms can contribute to further research in mathematical problem solving and machine learning, with potential applications in education in terms of assessment and personalized learning.
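A minimal version of such a correctness classifier can be sketched without any ML library: logistic regression fit by gradient descent on invented eye-tracking features, scored with ROC AUC computed via the rank-sum identity. Feature names, effect sizes, and sample sizes are all assumptions for the illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical z-scored per-trial features (fixation duration, saccade
# amplitude, pupil size, fixation count); correct trials shifted on two of them
n = 400
X0 = rng.normal(0, 1, (n, 4))                          # incorrect trials
X1 = rng.normal(0, 1, (n, 4)) + [1.0, 0.8, 0.0, 0.0]   # correct trials
X = np.vstack([X0, X1])
y = np.r_[np.zeros(n), np.ones(n)]

# Logistic regression by batch gradient descent
Xb = np.c_[np.ones(len(X)), X]          # add intercept column
w = np.zeros(Xb.shape[1])
for _ in range(2000):
    pr = 1 / (1 + np.exp(-Xb @ w))      # predicted probabilities
    w -= 0.1 * Xb.T @ (pr - y) / len(y)  # gradient step on log-loss

scores = Xb @ w
# ROC AUC via the rank-sum (Mann-Whitney U) identity
ranks = scores.argsort().argsort() + 1
auc = (ranks[y == 1].sum() - n * (n + 1) / 2) / (n * n)
```

The rank-sum identity avoids building an explicit ROC curve: AUC equals the probability that a randomly chosen positive trial outscores a randomly chosen negative one.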
An uninformative exogenous cue speeds target detection if the cue and target appear in the same location separated by a brief temporal interval. This finding is usually ascribed to the orienting of spatial attention to the cued location. Here we examine the role of perceptual merging of the two trial events in speeded target detection. That is, the cue and target may be perceived as a single event when they appear in the same location. If so, cueing effects could reflect, in part, the binding of the perceived target onset to the earlier cue onset. We observed the traditional facilitation of cued over uncued targets and asked the same observers to judge target onset time by noting the time on a clock when the target appeared. Observers consistently judged the onset time of the target as earlier than it actually appeared, with cued targets judged as earlier than uncued targets. When the event order was reversed so that the target preceded the cue, perceived onset was accurate in both cued and uncued locations. This pattern of results suggests that perceptual merging does occur in exogenous cueing. A modified attention account is discussed that proposes reentrant processing, evident through perceptual merging, as the underlying mechanism of the reflexive orienting of attention.
Cognitively challenging tasks require complex coordination of information beyond visual input. Predicting accuracy on such tasks has potential applications in education and industry. Task difficulty is associated with increases in reaction time and variation in eye-tracking indices. Critically, machine learning has not yet been used to predict accuracy on cognitive tasks with multiple difficulty levels. We report data on 57 participants (34 female; 20-30 years) who completed visuospatial tasks of mental attentional capacity with six levels of difficulty while their eye movements were recorded using an EyeLink Portable Duo (SR Research) eye-tracker with 1 ms temporal resolution (1000 Hz sampling frequency) in remote head-free-to-move mode. Results show that task accuracy scores can be robustly predicted when all variables (e.g., eye-tracking, difficulty level, and reaction time) are considered together (R2 = .80). Reaction time, difficulty level, and eye-tracking metrics are also effective independent predictors, with R2 equaling .73, .58, and .36, respectively. Analyses of feature importance suggest that the eye-tracking indices most important for the models include the number of fixations, number of saccades, duration of the current fixation, and pupil size. Notably, our machine learning algorithms target a prediction question rather than a classification one, and the current algorithm can be useful for future research and applications in other contexts where visuospatial processing is required. Theoretically, the findings show common and distinct metrics that can inform theories of cognition and vision science.
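The prediction (rather than classification) setup can be sketched as ordinary least squares with an R2 score. The predictors, coefficients, and noise level below are invented for the illustration and do not reproduce the reported models.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical z-scored predictors per trial: reaction time, difficulty
# level, and two eye-tracking metrics (all values invented)
n = 300
X = rng.normal(0, 1, (n, 4))
# accuracy score modeled as a noisy linear combination of the predictors
y = 0.5 * X[:, 0] - 0.4 * X[:, 1] + 0.2 * X[:, 2] + rng.normal(0, 0.4, n)

# Ordinary least squares via numpy's least-squares solver
Xb = np.c_[np.ones(n), X]  # add intercept column
beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# Coefficient of determination (R^2) of the fitted model
y_hat = Xb @ beta
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
```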
When storing multiple objects in visual working memory, observers sometimes misattribute perceived features to incorrect locations or objects. These "swaps" are usually explained by a failure to store object representations in a bound form. Swap errors have been demonstrated mostly in simple objects whose features (color, orientation, shape) are easy to encode independently. Here, we tested whether similar swaps can occur with real-world objects, where the connections between features are meaningful. In Experiment 1, observers were simultaneously shown four items from two object categories (two exemplars per category). Within a category, the exemplars could be presented in either the same state (two open boxes) or different states (one open and one closed box). After a delay, two exemplars drawn from one category were shown in both possible states. Participants had to recognize which exemplar went with which state. In a control task, they had to distinguish two old from two new exemplars. Participants showed good memory for exemplars when no binding was required. However, when the tested objects had been shown in different states, participants were less accurate. Good memory for state information and for exemplar information on their own, together with a significant memory decrement for exemplar-state combinations, suggests that binding was difficult for observers and that "swap" errors occurred even for real-world objects. In Experiment 2, we used the same tasks, but on half of the trials the locations of the exemplars were swapped at test. We found that participants ascribed incorrect states to exemplars more frequently when the locations were swapped. We conclude that the internal features of real-world objects are not perfectly bound in VWM and can be attached to locations independently. Overall, we provide evidence that even real-world objects are not stored as entirely bound representations in working memory.
When storing multiple objects in visual working memory, observers sometimes misattribute perceived features to incorrect locations or objects. These misattributions are called binding errors (or swaps) and have previously been demonstrated mostly in simple objects whose features are arbitrarily chosen and easy to encode independently, like colors and orientations. Here, we tested whether similar swaps can occur with real-world objects, where the connection between features is meaningful rather than arbitrary. In Experiments 1 and 2, observers were simultaneously shown four items from two object categories. Within a category, the two exemplars could be presented in either the same or different states (e.g., open/closed; full/empty). After a delay, both exemplars from one of the categories were probed, and participants had to recognize which exemplar went with which state. We found good memory for state information and exemplar information on their own, but a significant memory decrement for exemplar–state combinations, suggesting that binding was difficult for observers and that swap errors occurred even for meaningful real-world objects. In Experiment 3, we used the same task, but in one half of the trials the locations of the exemplars were swapped at test. We found more errors in general when the locations of the exemplars were swapped. We conclude that the internal features of real-world objects are not perfectly bound in working memory, and that location updates impair object and feature representations. Overall, we provide evidence that even real-world objects are not stored in an entirely unitized format in working memory.