Predicting First-Language and Second-Language Proficiency Using Eye Fixation Data and Demographic Information: Assumptions, Data Representations, and Methods
ABSTRACT Studying first-language (L1) acquisition, second-language (L2) acquisition, and bilingualism using
eye-movement data has become a popular topic in the psycholinguistic and educational research communities. The
current research uses eye-fixation data, along with demographic information, to investigate the following five
research questions (RQs). Q1: Is it possible to predict L1 from eye-fixation data using artificial intelligence
(AI) methods? Q2: Is it possible to predict second-language proficiency (L2P) from eye-fixation data using AI
methods? Q3: Which of the six L2P assessment batteries under consideration is the most effective in predicting
L2P? Q4: How informative is eye-fixation data, alone or combined with demographic information, in predicting
L1 and L2P? Q5: How can eye-fixation data be represented for training AI models to predict L1 and L2P?
We used the MECO L2 data set and scrutinized the performance of three families of AI methods. With respect to
each RQ, the results showed that 1) using only eye-fixation data, it is possible to predict L1 with a ROC-AUC
of 0.755; 2) using only eye-fixation data, it is not possible to predict L2P accurately (an R² score
of only 0.216 was obtained); 3) L2 Lexical Skills is the most effective L2P assessment battery; 4) combining
the eye-fixation data with demographic features significantly improved the performance of the
models, yielding a ROC-AUC of 0.997 in predicting L1 and an R² score of 0.899 in predicting L2P,
while at the same time reducing the importance of the eye-fixation parameters; 5) 2D scatter-plot
images can be considered an appropriate representation for training AI models using only eye-fixation data,
at least for predicting L1.
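
To make the idea of an image-like representation of fixations concrete, the following minimal sketch rasterizes a list of (x, y) fixation coordinates into a small grid of counts, which is one simple way a 2D scatter plot could be turned into model input. It is an illustration under stated assumptions, not the procedure used in this study: the function name fixations_to_image, the screen size, and the grid resolution are all hypothetical.

```python
from typing import List, Sequence, Tuple


def fixations_to_image(
    fixations: Sequence[Tuple[float, float]],
    screen_size: Tuple[float, float],
    grid_size: Tuple[int, int] = (32, 32),
) -> List[List[int]]:
    """Rasterize (x, y) fixation points into a rows-by-cols grid of counts.

    Hypothetical helper: coordinates are assumed to be in screen pixels
    with the origin at the top-left corner.
    """
    width, height = screen_size
    rows, cols = grid_size
    image = [[0] * cols for _ in range(rows)]
    for x, y in fixations:
        # Skip fixations that fall outside the assumed screen area.
        if not (0 <= x < width and 0 <= y < height):
            continue
        col = int(x / width * cols)
        row = int(y / height * rows)
        image[row][col] += 1
    return image


# Example with made-up fixations on an assumed 800x600 screen.
example = fixations_to_image(
    [(10, 10), (12, 11), (790, 590), (900, 10)],
    screen_size=(800, 600),
    grid_size=(4, 4),
)
```

A grid of counts keeps only the spatial layout of the fixations; fixation duration or order would have to be encoded separately (for example, as additional channels) if a model is meant to use them.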