Large Language Model-Based Automated Item Generation in STEM Assessments: Historical Mapping and a Scoping Review of Empirical Studies


JOURNAL OF EDUCATIONAL TECHNOLOGY DEVELOPMENT AND EXCHANGE. 2026. Vol. 19. No. 2. P. 141–165.

Educational assessments, from low-stakes classroom tests to high-stakes national examinations, require item pools that are valid, fair, and secure. Automated Item Generation (AIG) aims to efficiently produce large pools of calibrated test items. This paper adopts a two-part design: (1) a brief historical mapping situating LLM-based AIG within the broader AIG trajectory; and (2) a scoping review of empirical studies on LLM-based AIG for STEM assessments, published between January 2022 and January 2026. A structured search of ERIC, Lens and OpenAlex yielded 1,267 records; after deduplication and screening, 7 studies were retained for synthesis. In all studies, LLMs were primarily used to draft stems, keys, distractors, and explanations by instruction-tuned prompting, sometimes enhanced with retrieval and human-in-the-loop review. Empirical evidence on item quality is generally promising. Multiple investigations have documented acceptable expert evaluations and, in a subset of studies, psychometric properties comparable to those of human-authored items. Nevertheless, recurrent limitations have been observed, including factual inaccuracies, construct drift, low calibration of item difficulty, and variable distractor plausibility. Few studies reported robust fairness audits or provided reproducible details, such as complete prompts and decoding settings. In general, LLM-based AIG can substantially increase throughput in STEM item development, but high-stakes deployment requires layered validation protocols (expert review, pilot testing, psychometrics, and bias audits) and governance controls to ensure traceability and item security.

Research target: Education Psychology

Language: English


Parameters of a Salutogenic University Environment: A Pilot Study

Hachaturova M. R., Safonova A., Экспериментальная психология 2026 Vol. 19 No. 2 P. 51–66

Context and relevance. The influence of the environment on psychological well-being has been studied in a wide variety of contexts, including workplaces, schools, healthcare facilities, and digital spaces. However, the psychological well-being of university faculty, particularly in relation to the university environment in the broad sense, remains virtually unrepresented in psychological research. ...

Added: July 25, 2026

Proceedings of the International Science Conference “Scientific research of the SCO countries: synergy and integration” - Reports in English (June 3, 2026. Beijing, PRC)

Scientific publishing house Infinity, 2026.

These Conference Proceedings combine materials of the conference – research papers and thesis reports of scientific workers. They examine technical, juridical and sociological aspects of research issues. Some articles deal with theoretical and methodological approaches and principles of research questions of personality professionalization. ...

Added: July 24, 2026

Student Engagement at Universities in the Russian Far East: Regional Features and Challenges for Education Policy

Maloshonok N., Журнал исследований социальной политики 2026 Vol. 24 No. 2 P. 331–352

The Far Eastern Federal District (FEFD) faces a shortage of highly qualified personnel, which brings to the fore questions of higher education quality and human capital formation in the region. One tool used internationally to assess the educational environment is the concept of student engagement, treated as an indicator of students' participation in educational practices that promote learning and development. The article aims to identify the distinctive features of student engagement at FEFD universities and to determine their significance for the region's education policy. ...

Added: July 23, 2026

The Relationship Between Team Aspects and Satisfaction with Work Outcomes and Process in Project Teams

Vasiliev F., Vasilieva E., Мир психологии. Научно-методический журнал 2026 Vol. 2 No. 125 P. 113–125

This study examines how various team aspects influence members' evaluation of teamwork outcomes. The research was conducted on a targeted sample in Russian organizations. Using structural equation modeling, three groups of factors were identified: team aspects that affected cognitive evaluation, emotional evaluation of work, and satisfaction with the work process (team reputation and commitment); team aspects that did not ...

Added: July 22, 2026

Value Foundations of Classical and Restorative Mediation: A Comparative Analysis of Two Mediation Models

Грудников Н. С., Пастухова Е. Г., Психология и право 2026 Vol. 16 No. 2 P. 198–214

Context and relevance. Mediation is a widespread form of conflict resolution with its own unique principles. Restorative justice, as a new paradigm in criminal justice, utilizes mediation as a form of implementation while adapting it to its own core concepts. Objective. The study aims to identify the differences between the value foundations of classical and ...

Added: July 22, 2026

Creative vs Routine: The Potential Demand for Higher Education in Russia

Abankina I., Зиньковский К. В., Креативные индустрии 2025 Vol. 1 No. 1 P. 53–66

The article analyzes the demand for higher education across fields of study in the context of public policy. The study is based on data from the HSE research project “Monitoring of admission quality” for 2018–2024. The article shows that higher education in the creative industries sector directly depends on the income level ...

Added: July 20, 2026

Trends in Demand and Supply in Higher Education

Abankina I., Зиньковский К. В., Журнал Новой экономической ассоциации 2025 Vol. 3 No. 68 P. 299–308

The article presents an analysis of the demand and supply in higher education in various fields in the context of public policy. By studying the demand from families for various higher education programs, three areas of public policy are identified that directly affect the supply and demand in higher education. The first ...

Added: July 20, 2026

A Neuropsychoanalytic View of Addiction Disorders

Sokolova A., Журнал клинического и прикладного психоанализа 2024 Vol. 1 P. 66–84

The article examines neuropsychoanalytic views on the etiology of psychoactive substance addiction disorders. Neuropsychoanalysis explores the relationship between recent advances in the neurosciences and psychoanalytic models of the mind. It revises psychoanalytic views on disturbances of human development and functioning on the basis of a new understanding of how the brain works. The neuropsychoanalytic perspective on the etiology of addiction disorders (AD) grows out of affective neuroscience and the seven emotional ...

Added: July 18, 2026

Working Through Resistance: A Neuropsychoanalytic Model of Defensive and Structural Resistance to Change

Sokolova A., Psychoanalytic Psychology 2026

Resistance to change remains central to psychoanalytic practice: Patients often maintain dysfunctional beliefs and relational patterns even when they recognize their destructive nature. While psychoanalysis has richly described resistance as a defense against painful affect and threats to internal object relations, the process-level dynamics through which resistance persists over time have remained incompletely articulated. This ...

Added: July 18, 2026

English for Students of Pedagogical Universities = English for Pre-Service Teachers (B2-C1)

Stognieva O., Новикова В. П., Moscow: Флинта, 2026.

This innovative English for Specific Purposes course for students of pedagogical universities offers immersion in current educational discourse: from questions of upbringing and the cognitive development of children and adolescents to rethinking the role of the school in the digital age. The course content is built on authentic multimodal materials that allow analysis of global trends in contemporary educational systems and approaches. The publication is ideally suited to universities seeking to prepare ...

Added: July 16, 2026

Personality as attribution: Report-based traits and the problem of what personality research measures

Shchebetenko S., PsyArXiv. 2026.

Personality psychology owes much of its cumulative success to self- and informant-report assessment. Report-based trait constructs are reliable, replicable, predictive, and useful for organising a wide range of findings. This paper argues that their success should be understood not only methodologically, but also theoretically. I propose the attributional model of personality reporting, according to which self- and informant-reported traits are ...

Added: July 16, 2026

Shared environment, shared mechanisms: comparing pathways to mental health outcomes among indigenous youth and youth with other ethnic backgrounds

Arina Bukina, Eritsyan K., Antonova N. et al., Frontiers in Psychology 2026 Vol. 17 Article 1824428

Background: Numerous studies have shown that indigenous populations experience poorer health outcomes compared to people with other backgrounds. However, the interpretation of these disparities remains challenging due to differences in living conditions and social environments, as well as potential measurement-related biases. Little is known about whether the underlying mechanisms of mental health outcomes differ between ...

Added: July 15, 2026

Cognitive Distortions Are Not Logical Errors: A Conceptual Clarification

Denisova V., Petrović N., Journal of Rational-Emotive and Cognitive-Behavior Therapy 2026 Vol. 44 No. 35 P. 1–16

Cognitive distortions are routinely described in cognitive-behavioral (CBT) and rational-emotive behavior (REBT) therapies as “errors in logic” or “illogical thinking.” While this terminology is pedagogically convenient, it often obscures a crucial conceptual distinction between violations of logical inference and problems related to the justification, scope, or evaluative force of belief content. In this paper, we do not ...

Added: July 15, 2026

The Phenomenon of Subjectness in Higher Education: Theoretical and Methodological Approaches and Practical Aspects of Development

Shmelev I., Мир психологии. Научно-методический журнал 2026 Vol. 125 No. 2 P. 269–287

The article is devoted to the study of the phenomenon of subjectness in higher education. Subjectness is considered as a person’s ability for active creative actions, self-determination, and transformation of reality. The main theoretical approaches to understanding subjectness in psychological and pedagogical discourse are analyzed. Techniques for developing students’ subjectness are presented, including coaching, techniques for developing critical thinking, dialogic learning, collaborative ...

Added: July 15, 2026

Strengthening the Sovereignty of African Countries in the Context of an Emerging Multipolar World

Абрамова И. О., Аду Я. Н., Анише Т. Э. et al., Институт Африки РАН, 2025.

This monograph analyzes the concept of the sovereignty of African states in both theoretical and practical terms. It examines the economic, mental, linguistic and cultural, informational, and educational aspects of African sovereignty. It explores the role of regional integration groupings, as well as external actors, in the context of Africa's sovereignization. Particular attention is paid to the role of the Russian Federation in strengthening the sovereignty of the continent's countries. The monograph forms a holistic view of ...

Added: July 13, 2026

Integrative profiling of glymphatic dysfunction in adolescent subthreshold depression

Myachykov A., Qiwei G., Ruisi W. et al., Journal of Affective Disorders 2026 Vol. 412 Article 122110

Subthreshold depression (StD) in adolescence is clinically important, but its neurobiological substrates remain unclear. We examined whether adolescents with StD show multimodal MRI alterations related to glymphatic function. ...

Added: July 13, 2026

The Temporal Flow of Interaction: Interpersonal Synchronization and Alliance Perception Across Session Parts

Galina Oreshina, Journal of Nonverbal Behavior 2026 Article 10919

Interpersonal synchrony is increasingly conceptualized not as a static marker of rapport but as a dynamic, context-sensitive process that fluctuates across interactional phases to serve distinct relational functions. This study examined whether movement synchrony—absolute, non-absolute, client-leading, and counsellor-leading—was associated with therapeutic alliance in naturally occurring counselling sessions. Twenty-seven video recordings were analysed using motion energy ...

Added: July 13, 2026

Metacognitive Regulation as a Factor Influencing Learning Effectiveness in the Use of Digital Educational Technologies: A Systematic Literature Review

Samoilov O., Морозов З. А., Петухова Д. Р. et al., Психология человека в образовании 2023 Vol. 4 No. 5 P. 519–535

Introduction. The article presents a systematic review of psychological research focusing on metacognitive regulation of learning effectiveness through digital educational technologies (DET). Digital technologies are often used in education. Over the past years, self-regulated learning, i.e., the practice of using digital technologies to manage learning activities, has been gaining momentum. However, the specifics of such ...

Added: July 13, 2026

AI writes, we collaborate—or vice versa? Group strategies for using generative AI in collaborative writing assignments

Korchak A., Costley J., Fanguy M., COMPUTERS AND EDUCATION OPEN 2026 Vol. 11 Article 100390

Generative AI (Gen-AI) tools have gained immense popularity among university students, offering advantages such as reduced time for assignment preparation, enhanced product quality, and accelerated learning. When combined with the benefits of collaborative learning (CL), the group use of Gen-AI holds significant potential. The present study aims to explore how students use Gen-AI during the ...

Added: July 13, 2026

Prompt Design for GPT-4 Assessments of EFL Student Reports

Stognieva O., Murashova N., Journal of Asia TEFL 2026 Vol. 23 No. 2 P. 490–505

This study investigates the impact of different prompt design strategies on the performance of GPT-4 in assessing undergraduate reports within an English as a Foreign Language (EFL) context. As Large Language Models (LLMs) increasingly integrate into educational assessment, understanding how prompt engineering affects grading accuracy and alignment with human judgment is crucial. Three prompt design methods—TELeR Taxonomy, Six strategies ...

Added: July 12, 2026

Development and validation of the Mental Health Self-Care Scale for the general adult population

Mikhaylova O., Zhyrgalbek J., Sofia Yanis et al., Public Health 2026 Vol. 258 Article 106405

Objectives: To develop and psychometrically validate the Mental Health Self-Care Scale (MHSCS) for assessing mental health self-care behaviours in the general adult population. Study design: Cross-sectional survey. Methods: A 62-item initial pool grounded in the updated Middle Range Theory of Self-Care was refined through expert review (n = 11) and cognitive interviewing (n = 24), then administered to 600 Russian ...

Added: June 25, 2026

Use Case 5: LLM-driven creation of natural hazard geodatabase from digital mass media

Derkacheva A., Sakirkina M., Kraev G. et al., in: AI for good innovate for impact report 2025. Geneva: International Telecommunication Union, 2025. P. 167–169.

Added: May 26, 2026

On the Ideological Biases of Generative AI: The Russian-Ukrainian Conflict as Represented by ChatGPT

Baysha O., Trofimov V., Российская школа связей с общественностью 2026 No. 40 P. 171–191

A growing number of scholars are warning that generative AI may reproduce the socio-political and ideological biases that models absorb from the texts on which they were trained. If a given model was trained on Western media texts, it may generate narratives that reproduce West-centric views of world events. This ...

Added: April 21, 2026

Matching Restaurant and Supplier Product Nomenclatures Using LLMs: A Case Study for a Restaurant Holding

Jin S., Panfilov P., Сулейкин А. С., Труды Института системного программирования РАН 2025 Vol. 37 No. 6 P. 163–176

In the modern restaurant business, accurate mapping of product nomenclatures between restaurants and suppliers is a critical task. Effective inventory management and procurement optimization directly impact business profitability. As the number of suppliers and the variety of products grow, traditional mapping methods become less efficient. This study proposes using large language models (LLMs) to automate and improve the ...

Added: April 17, 2026