Automatic generation of physics items with Large Language Models (LLMs)

Moses Oluoke Omopekunola; Elena Yu. Kardanova

doi:10.21831/reid.v10i2.76864

REID (Research and Evaluation in Education). 2024. Vol. 10. No. 2. P. 168–185.

High-quality items are essential for producing reliable and valid assessments, which in turn inform decision-making. As demand grows for items with strong psychometric properties in both summative and formative assessment, automatic item generation (AIG) has gained prominence. Research highlights the potential of large language models (LLMs) in the AIG process and notes the positive impact on educational assessment of generative AI tools such as ChatGPT, which can generate various item types across different languages and subjects. This study addresses a research gap by examining how well AI-generated secondary (high school) physics items align with an educational taxonomy. It uses Bloom's taxonomy, a well-known framework for designing and categorizing assessment items across cognitive levels, from low to high, and focuses on a preliminary assessment of LLMs' ability to generate physics items that match the Application level of Bloom's taxonomy. Two leading LLMs, ChatGPT (GPT-4) and Gemini, were chosen for their strong performance in creating high-quality educational content. The study used various prompts to generate items at different cognitive levels of Bloom's taxonomy. The items were assessed against multiple criteria: clarity, accuracy, absence of misleading content, appropriate complexity, correct language use, alignment with the intended level of Bloom's taxonomy, solvability, and assurance of a single correct answer. The findings indicated that both ChatGPT and Gemini were capable of generating physics assessment items, though their effectiveness varied with the prompting method used. Instructional prompts in particular yielded excellent outputs from both models, producing items that were clear, precise, and consistently aligned with the Application level of Bloom's taxonomy.

Research target: Education

Keywords: Bloom's taxonomy; LLM; ChatGPT; AIG; Gemini; physics items

Heritability of Functional Literacy: Evidence from a Classical Twin Design

Kolachev N., Kovaleva G., Behavior Genetics 2026

Functional literacy—the ability to apply reading, mathematical, and scientific knowledge in authentic contexts as operationalized by the PISA framework—is a key predictor of educational attainment, labour-market outcomes, and economic growth. Despite extensive behavioral-genetic research on cognitive ability, the heritability of competency-based literacy measures remains largely unexamined, particularly outside Western populations. The present study addresses this ...

Added: May 26, 2026

Menstrual health as a public concern in Ghana: A conceptual review of symbolic and structural ownership

Вохойие Х., Anikin V. A., Tomsk State University Journal of Philosophy, Sociology and Political Science 2026 Vol. 90 P. 204–211

Menstrual health serves as a critical litmus test for state intervention in contexts where women’s bodily experiences are historically shaped by stigma, cultural taboos, and structural violence. Drawing on Joseph Gusfield’s theory of social problems, this review traces the moral passage of menstruation in Ghana (1992–2025) from a privately managed, stigmatized phenomenon to a matter ...

Added: May 26, 2026

Optimizing Computational Infrastructure for Large Language Models in Bioinformatics: A Case Study

Beknazarov N., in: Parallel Computational Technologies, 19th International Conference, PCT 2025, Moscow, Russia, April 8–10, 2025, Revised Selected Papers (CCIS, Vol. 2891). Springer, 2026. P. 3–16.

This paper addresses the challenge of efficiently training Large Language Models (LLMs) on large-scale, sparse omics datasets in high-performance computing (HPC) environments. Using over 1000 BED tracks as a representative data source, we propose a method combining interval-based chunked storage, sparse matrix transformation, and parallel data loading, integrated within a PyTorch Lightning training framework. Our ...

Added: May 19, 2026

Problems and prospects of implementing bilingual educational programs in the Republic of Serbia

Zamkovaya M., Педагогика и психология образования 2026 No. 1 P. 57–67

The article examines the development of the bilingual education system in the Republic of Serbia, with an emphasis on its goals, achievements, and current challenges. It notes that, despite positive results, educational institutions face a number of problems, including insufficient funding, a shortage of qualified specialists, and the lack of a clear legal and regulatory framework governing bilingual instruction. The article also analyzes the results of empirical studies, ...

Added: May 18, 2026

Mom, Dad, Give Me Money! How to Raise Children with a Sensible Attitude to Finances

Андреева О. С., Просвещение, 2023.

This book is about how to raise financially literate children. Your family's income level matters less: the ability to earn, save, and spend will help your children achieve well-being in the future. You yourselves must give your children all the skills they need for real life, which is complex and at times hard. This book is a guide that will help parents clearly, consistently, and ...

Added: May 18, 2026

"Money. 250 Facts. An Encyclopedia"

Андреева О. С., Росмэн-Пресс, 2026.

"Money. An Encyclopedia for the Russian Schoolchild" explains briefly and informatively: · What people used to pay for goods in ancient times · What the first money looked like, what it was made of, and where it was kept · How to find treasure and whether you can win the lottery · How a bank works · How children and teenagers can earn money and manage a personal budget · And hundreds more interesting, scientifically grounded facts. The book will help boys ...

Added: May 18, 2026

Mom, Dad, Teach Me! How to Manage Money and Avoid Mistakes: A Guide to Financial Education for Parents of Foster Children

Андреева О. С., Просвещение, 2024.

The book brings together recommendations and practical advice for foster parents that can help children form a sound attitude to money, and it explains what words to use when conveying truths about personal and family finances, obvious to adults, to a child of any age. One of the book's aims was to explain this phenomenon: the consumerist attitude of foster children to the work of their substitute parents, to ...

Added: May 18, 2026

Do thesis topics matter? How thesis topic characteristics relate to doctoral experience and self-confidence in defence

Pavliuk D., Higher Education 2026

The literature on doctoral students’ experience rarely examines the importance of thesis topic characteristics, even though this is one of the central decisions both at the start and throughout the doctoral journey. Although studies examine how students choose their thesis topics, there is little research on how different topic characteristics are linked to doctoral experience ...

Added: May 16, 2026

Differences and associations between students’ and teachers’ intelligence: evidence relies on a PISA-based cognitive assessment tool

Kolachev N., Kovaleva G., Educational Research and Evaluation 2026 P. 1–24

This study investigates the psychometric structure and school-level associations between cognitive abilities of teachers and students. Using a cross-sectional quantitative design, we administered a PISA-based cognitive assessment tool measuring reading, mathematics, science literacy, and global competence to 5,391 eighth-grade students and 2,385 teachers from 84 schools in a Russian region. Bifactor modeling, measurement invariance testing, ...

Added: May 15, 2026

Archimedes: a scientific and methodological collection

Moscow: ООО «Макс Пресс», 2026.

This collection presents abstracts of talks by participants of the seminar "Integration of Core and Supplementary Physics and Mathematics Education", held on February 11, 2026 at GBOU School No. 2007 FMSh, Moscow, as well as other publications devoted to supplementary physics and mathematics education. ...

Added: May 11, 2026

Foreign experience in forming student scientific societies

Киселева Н. А., Михайлов К. А., Чернилевская О. Н. et al., Отечественная и зарубежная педагогика 2024 Vol. 1 No. 4 P. 109–123

The article studies foreign experience in developing schoolchildren's scientific knowledge and their design and research skills within student scientific communities. The authors studied and analyzed foreign experience in organizing scientific communities of students, which illustrates specific practices. As part of this work, the ...

Added: May 11, 2026

A modern model of student scientific societies in the Moscow education system

Ольшанская С. С., Чернилевская О. Н., Михайлов К. А. et al., Научные исследования и разработки. Социально-гуманитарные исследования и технологии 2024 Vol. 13 No. 4 P. 17–24

The article is devoted to the study of development of scientific cognition of modern schoolchildren. Based on the analysis of the situation and taking into account the peculiarities of successful Russian and international practices, the authors of the article have designed a model of scientific community of students in Moscow. ...

Added: May 11, 2026

Comparative Analysis of Students’ Perceptions of Programming Puzzles: Parson’s and Wordle-Like

Varnavsky A., IEEE Access 2026 Vol. 14 P. 37487–37508

Puzzles are an excellent tool for learning computer science and programming, fostering increased interest, engagement, and motivation among students, as well as developing logical, critical, and computational thinking. Among beginner programmers, Parson's Programming Puzzles are quite popular, aimed at mastering the basic syntactic and logical constructs of programming languages. However, as students' skills grow, their ...

Added: May 7, 2026

Systemic barriers and multilevel collaboration in inclusive education: perspectives of stakeholders from Nigeria

Абимбола О. А., Балогун А. Д., Мир психологии. Научно-методический журнал 2025 No. 3(122) P. 200–211

Children with learning disabilities in Nigeria face systemic exclusion from quality education due to fragmented collaboration between educators, caregivers and social policymakers. This study examines collaborative approaches that support children with learning disabilities and identifies systemic barriers to the effectiveness of these approaches. 45 semi-structured interviews and focus groups of stakeholders from urban and rural ...

Added: May 6, 2026

Factors promoting and hindering inclusive education for learners with disabilities in low- and lower-middle-income countries: a comparative review

Абимбола О. А., Психологическая наука и образование 2026 Vol. 31 No. 2 P. 244–255

Context and relevance. The article discusses the lack of scientific knowledge about the barriers and factors of systemic failure in adapting inclusive education models to the ethnolinguistic diversity and acute resource constraints characteristic of low- and middle-income countries (LMICs). Objective. The aim is to identify key factors that promote and hinder the implementation of inclusive ...

Added: May 6, 2026

Assessing university students' critical thinking: the contribution of subject knowledge to test results

Talov D., Tarasova K., Высшее образование в России 2026 Vol. 35 No. 4 P. 85–104

The development of critical thinking is currently regarded as one of the key objectives of higher education. In this context, there is a growing number of instruments aimed at assessing critical thinking in domain-specific and more authentic educational settings. However, considerably less attention has been paid to the question of the extent to which the ...

Added: May 6, 2026

On the ideological biases of generative AI: the Russian-Ukrainian conflict as represented by ChatGPT

Baysha O., Trofimov V., Российская школа связей с общественностью 2026 No. 40 P. 171–191

A growing number of scholars are warning about the danger that generative AI reproduces socio-political and ideological biases absorbed from the texts on which the models were trained. If a given model was trained on Western media texts, it may generate narratives that reproduce West-centric views of world events. This ...

Added: April 21, 2026

When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs

Seleznyov M., Chaichuk M., Ershov G. et al., in: Findings of the Association for Computational Linguistics: EMNLP 2025. Association for Computational Linguistics, 2025. P. 20370–20385.

Large Language Models (LLMs) are highly sensitive to subtle, non-semantic variations in prompt phrasing and formatting. In this work, we present the first systematic evaluation of 4 methods for improving prompt robustness within a unified experimental framework. We benchmark these techniques on 8 models from Llama, Qwen and Gemma families across 52 tasks from Natural ...

Added: February 3, 2026

Measuring Chemical LLM robustness to molecular representations: a SMILES variation-based framework

Ganeeva V., Khrabrov K., Kadurin A. et al., Journal of Cheminformatics 2025 No. 17 Article 164

The recent integration of natural language processing into chemistry has advanced drug discovery. Molecule representations in language models (LMs) are crucial to enhance chemical understanding. We explored the ability of models to match the same chemical structures despite their different representations. Recognizing the same substance in different representations is an important component of emulating the ...

Added: February 3, 2026

Aspect-Based Sentiment Analysis Using Large Language Models on Museum Visitor Reviews

Anastasia V. Kolmogorova, Elizaveta R. Kulikova, Vladislav V. Lobanov, Supercomputing Frontiers and Innovations 2025 Vol. 12 No. 3 P. 121–140

Museum reviews provide rich insight into visitor preferences and can drive useful change within institutions, yet they have attracted little attention in sentiment research owing to limited commercial interest and the multi-thematic nature of reviews. In this study we analysed over 12 000 reviews in Russian for 15 museum sites collected from nine different platforms. ...

Added: November 30, 2025

AutoJudge: Judge Decoding Without Manual Annotation

Roman Garipov, Fedor Velikonivtsev, Ivan Ermakov et al., in: 39th Conference on Neural Information Processing Systems (NeurIPS 2025). NeurIPS, 2025. P. 94605–94642.

We introduce AutoJudge, a method that accelerates large language model (LLM) inference with task-specific lossy speculative decoding. Instead of matching the original model output distribution token-by-token, we identify the generated tokens that affect the downstream quality of the response, relaxing the distribution match guarantee so that the "unimportant" tokens can be generated faster. Our approach relies ...

Added: November 6, 2025

Strategizing with AI: Insights from a Beauty Contest Experiment

Iuliia Alekseenko, Dagaev D., Sofiia Paklina et al., Journal of Economic Behavior and Organization 2025 Vol. 240 Article 107330

Added: November 6, 2025

LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers

Anton R., Mikhalchuk M., Rahmatullaev T. et al., in: Findings of the Association for Computational Linguistics: NAACL 2025. Association for Computational Linguistics, 2025. P. 7757–7764.

We introduce methods to quantify how Large Language Models (LLMs) encode and store contextual information, revealing that tokens often seen as minor (e.g., determiners, punctuation) carry surprisingly high context. Notably, removing these tokens — especially stopwords, articles, and commas — consistently degrades performance on MMLU and BABILong-4k, even if removing only irrelevant tokens. Our analysis ...

Added: November 6, 2025

Well-being research using advanced natural language processing (NLP) methods: prospects and limitations

Voevodina E., Современная зарубежная психология 2025 Vol. 14 No. 3 P. 172–181

Context and relevance. Well-being research faces the methodological limitations of conventional psychometric measures, which are criticized for poor ecological validity, limited information yield, and inadequate capture of the multidimensional construct of well-being. Advanced natural language processing (NLP) technologies offer solutions to these constraints. Objective. To evaluate the opportunities and challenges of transformer-based NLP for well-being research. Methods and materials. We conducted an analytical review of ...

Added: October 9, 2025