Investigating the Potential of Generative Models for Essay Evaluation and Feedback Provision
In an era of rapid development of generative language models, these tools are increasingly used by both students and instructors. This paper investigates the potential of generative models that interact with users via the chatbots ChatGPT and PerplexityAI for evaluating standardised essays in English and providing feedback on their quality. Taking into account the specific features of each chatbot and the standardised assessment criteria, we developed prompts, which were subsequently fed to the chatbots together with 19 students’ essays. The chatbots awarded overall grades and also gave points and feedback on specific aspects. The chatbots’ grades were compared with those provided by the instructor and with each other. Cronbach’s alpha was used to measure the consistency of grading, whereas Cohen’s and Fleiss’s kappas were used to evaluate inter-rater agreement. The consistency of grading among the raters ranged from acceptable to excellent across aspects, which indicates that the instructor and the chatbots interpreted the assessment criteria similarly; inter-rater agreement, however, was slight. Qualitative analysis revealed that the chatbots’ feedback sometimes ignored instructions in the prompt, reported non-existent errors, or awarded different grades in consecutive queries. We conclude that chatbots can be used for rough evaluation of standardised essays; however, their output cannot be considered reliable and requires expert editing.
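To illustrate how grading consistency can be high while exact agreement is low, the following is a minimal Python sketch of Cronbach's alpha (treating each rater as an "item") and Cohen's kappa for two raters. The function names and the score lists are hypothetical and invented purely for illustration; they are not the study's data, and the sketch is not necessarily the procedure the authors used.

```python
from collections import Counter
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha with raters as items.

    `ratings` is a list of score lists, one per rater, aligned by essay.
    """
    k = len(ratings)
    # Sum of each rater's sample variance across essays.
    item_variances = sum(variance(scores) for scores in ratings)
    # Sample variance of the per-essay total score across raters.
    totals = [sum(per_essay) for per_essay in zip(*ratings)]
    return k / (k - 1) * (1 - item_variances / variance(totals))


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters assigning categorical grades."""
    n = len(rater_a)
    # Observed proportion of essays given exactly the same grade.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal grade frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[c] / n * counts_b[c] / n for c in categories)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy grades (NOT from the study): the chatbot ranks the essays
# exactly like the instructor but always awards one point more.
instructor = [3, 4, 5, 2, 4]
chatbot = [4, 5, 6, 3, 5]

print("Cronbach's alpha:", cronbach_alpha([instructor, chatbot]))
print("Cohen's kappa:", cohen_kappa(instructor, chatbot))
```

In this toy case alpha is 1.0 (up to floating-point rounding) because the two raters order the essays identically, whereas kappa is -0.25 because they never award the same grade. The same pattern, in a milder form, is consistent with a finding of acceptable-to-excellent consistency alongside only slight agreement.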