A Generative Model of Cognitive State from Task and Eye Movements
Background / introduction
The early eye-tracking studies of Yarbus provided descriptive evidence that an observer’s task influences
patterns of eye movements, raising the tantalizing prospect that an observer’s intentions could be
inferred from their saccade behavior. We investigate the predictive value of task and eye movement
properties by creating a computational cognitive model of saccade selection, based on instructed task
and internal cognitive state, using a Dynamic Bayesian Network (DBN). Understanding how humans
generate saccades under different conditions and cognitive sets links recent work on salience models of
low-level vision with higher-level cognitive goals. This model provides a Bayesian, cognitive account of
top-down transitions in attentional set in prefrontal areas, together with vector-based saccade generation
in the superior colliculus.
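The joint distribution of a DBN of this kind can be written out, which makes the roles of task and state explicit. The symbols and the dependency structure below are assumptions introduced for illustration, not notation taken from the model: $\tau$ is the instructed task, $S_t$ the hidden cognitive state (attentional set) before the $t$-th saccade, $\mathbf{x}_t$ the $t$-th saccade vector, and $T$ the number of saccades. Under this assumed structure the task shapes the initial state and its transitions, while each saccade depends only on the current state. The second line is the standard forward-filtering recursion that such a factorization permits.

```latex
P(S_{1:T}, \mathbf{x}_{1:T} \mid \tau)
  = P(S_1 \mid \tau) \prod_{t=2}^{T} P(S_t \mid S_{t-1}, \tau) \prod_{t=1}^{T} P(\mathbf{x}_t \mid S_t)

P(S_t = s \mid \mathbf{x}_{1:t}, \tau)
  \propto P(\mathbf{x}_t \mid S_t = s) \sum_{s'} P(S_t = s \mid S_{t-1} = s', \tau)\, P(S_{t-1} = s' \mid \mathbf{x}_{1:t-1}, \tau)
```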
Our approach begins with eye movement data that has previously been shown to differ across tasks.
We first analyze the extent to which individual saccadic features are diagnostic of an
observer’s task. Second, we use those features to infer an underlying cognitive state that may
differ from the instructed task. Finally, we demonstrate how changes of cognitive state over time can
be incorporated into a generative model of eye movement vectors without resorting to an external
homunculus. Modeling an internal cognitive state frees the model from the assumption that instructed task is the only factor
influencing observers’ saccadic behavior. Although the inclusion of hidden temporal state does not
improve the model’s classification accuracy, it does allow accurate prediction of the saccadic sequence
effects observed in search paradigms.
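Inferring a hidden cognitive state from saccade features, as described above, can be illustrated with the simplest discrete DBN: a hidden Markov model updated by the forward algorithm. This is a minimal sketch under stated assumptions, not the authors’ implementation. The two state labels, the single feature (saccade amplitude), the Gaussian emission densities, and every number are hypothetical. The instructed task enters only through the prior `PRIOR`, so the filtered state is free to drift away from it when the observed saccades disagree.

```python
import math

# Hypothetical two-state example: labels, parameters, and the single
# feature (saccade amplitude in degrees) are illustrative assumptions.
STATES = ["search", "memorize"]

# P(s_1 | instructed task): here the instructed task only sets the prior.
PRIOR = {"search": 0.8, "memorize": 0.2}

# P(s_t | s_{t-1}): high self-transition probabilities keep states "sticky".
TRANS = {
    "search": {"search": 0.9, "memorize": 0.1},
    "memorize": {"search": 0.2, "memorize": 0.8},
}

# P(amplitude | s_t) as a Gaussian with (mean, std) per state.
EMIT = {"search": (6.0, 2.0), "memorize": (3.0, 1.5)}


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def forward_filter(amplitudes):
    """Return P(s_t | x_1..x_t) for each t (normalized forward algorithm)."""
    beliefs = []
    belief = dict(PRIOR)
    for t, x in enumerate(amplitudes):
        if t > 0:
            # Predict: push the previous posterior through the transition model.
            belief = {
                s: sum(beliefs[-1][r] * TRANS[r][s] for r in STATES) for s in STATES
            }
        # Update: weight each state by the likelihood of the observed amplitude.
        unnorm = {s: belief[s] * gaussian_pdf(x, *EMIT[s]) for s in STATES}
        total = sum(unnorm.values())
        belief = {s: p / total for s, p in unnorm.items()}
        beliefs.append(belief)
    return beliefs


if __name__ == "__main__":
    for t, b in enumerate(forward_filter([7.1, 5.8, 2.4, 2.9, 3.1])):
        print(t, {s: round(p, 3) for s, p in b.items()})
```

With these made-up numbers, the early long saccades keep the belief on `search`, while the later short saccades shift it toward `memorize` despite the search-biased prior.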
Because the model is generative, it can simulate saccades in real time. We
demonstrate that the properties of its generated saccade vectors closely match those of human
observers given a particular task and cognitive state. Many current models of vision focus entirely on
bottom-up salience to produce estimates of spatial ‘areas of interest’ within a visual scene. While a few
recent models do add top-down knowledge and task information, we believe our contribution is
important in three key ways. First, we incorporate task as learned attentional sets that are capable
of self-transition given only information available to the visual system. This matches influential theories
of bias signals by Miller & Cohen (2001) and implements state selection without simply shifting the
decision to an external homunculus. Second, our model is generative and capable of predicting
sequence artifacts in saccade generation like those found in visual search. Third, our model generates
relative saccadic vector information rather than absolute spatial coordinates. This more closely matches
the internal representation of saccades as they are generated in the superior colliculus.
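The generative claims above (self-transitioning attentional sets and relative saccade vectors) can be made concrete with a small sampler. This is a hypothetical illustration, not the model described here: the state names, transition table, amplitude distributions, and the use of a von Mises distribution for saccade direction are all assumptions. State changes come only from the sampler’s own transition probabilities, with no external controller choosing them, and each output is a displacement from the current fixation rather than a screen coordinate.

```python
import math
import random

# Hypothetical parameters: state names, transition probabilities, and the
# amplitude/direction distributions are illustrative assumptions only.
TRANS = {
    "search": {"search": 0.9, "memorize": 0.1},
    "memorize": {"search": 0.2, "memorize": 0.8},
}
# Per-state saccade statistics: mean and std of amplitude (degrees),
# preferred direction (radians), and von Mises concentration kappa.
SACCADE_STATS = {
    "search": (6.0, 2.0, 0.0, 0.5),
    "memorize": (3.0, 1.5, 0.0, 0.1),
}


def sample_next_state(rng, state):
    """Draw s_t from P(s_t | s_{t-1}); the state transitions on its own."""
    r = rng.random()
    cumulative = 0.0
    for nxt, p in TRANS[state].items():
        cumulative += p
        if r < cumulative:
            return nxt
    return nxt  # guard against floating-point round-off in the cumulative sum


def generate_saccades(n, start_state, seed=0):
    """Generate n relative saccade vectors (dx, dy) plus the hidden state path."""
    rng = random.Random(seed)
    state = start_state
    states, vectors = [], []
    for t in range(n):
        if t > 0:
            state = sample_next_state(rng, state)
        mean_amp, sd_amp, mu_dir, kappa = SACCADE_STATS[state]
        amplitude = max(0.1, rng.gauss(mean_amp, sd_amp))  # keep amplitudes positive
        direction = rng.vonmisesvariate(mu_dir, kappa)
        # Relative vector: displacement from the current fixation, not a position.
        vectors.append((amplitude * math.cos(direction), amplitude * math.sin(direction)))
        states.append(state)
    return states, vectors
```

For example, `generate_saccades(20, "search", seed=1)` returns the hidden state path and 20 displacement vectors. Absolute fixation positions, if needed, can be recovered by cumulatively summing the vectors from a known starting fixation.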