Response Time in Computerized Adaptive Testing
This paper describes ways to develop a computerized adaptive test that uses item response times as collateral information. It shows that introducing item response times into the measurement model has the same effect on the reliability of computerized adaptive tests as on the reliability of linear tests. Nonetheless, the presence of missing responses may bias the ability estimates.
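The abstract does not specify the measurement model. For orientation only, a standard way to treat response times as collateral information is van der Linden's hierarchical lognormal model, sketched below; this is offered as a plausible reference point, not as the paper's actual specification.

```latex
% Lognormal response-time model (van der Linden, 2006), shown as a standard
% reference point; the paper's own model may differ.
% T_{ij}: response time of person j on item i; \tau_j: person speed;
% \beta_i: item time intensity; \alpha_i: item time discrimination.
\ln T_{ij} = \beta_i - \tau_j + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim \mathcal{N}\!\left(0,\; \alpha_i^{-2}\right)
% At the second level, ability and speed are modeled jointly,
% (\theta_j, \tau_j) \sim \mathcal{N}(\mu, \Sigma); a nonzero
% ability-speed correlation is what lets response times sharpen
% the ability estimate.
```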
REALEC (Vinogradova, 2016) is the first open-access collection of English texts (mainly essays) written by students with Russian as their native language who are learning English at university. The project team working with the corpus over the last two years has been developing computational tools to make the use of REALEC efficient for both students and their English instructors in preparation for the university EFL examination. This paper considers four tools designed to enhance corpus-mediated work in the classroom:
• easy access to statistics of student errors in one text, in all texts written by the same author, or in all texts in a given folder, which provides on-the-spot feedback on the quality of a text uploaded to the corpus;
• automated evaluation of lexical proficiency (a minimal sketch of such feature extraction is given after this list), which includes commonly used features such as word length; sentence length; distribution of words across the Common European Framework of Reference levels (A1-C2); use of academic vocabulary checked against one of two lists, the Coxhead Academic Word List and the academic vocabulary list of the Corpus of Contemporary American English; number of repetitions; use of linking words; and use of collocations (as attested by comparison with the Pearson Academic Collocation List);
• automated test-maker, which extracts sentences from the corpus and turns them into questions for placement and progress testing (also illustrated in the sketch after this list);
• automated evaluation of the syntactic complexity of the text, which takes into account features such as mean sentence depth and the average number of relative and adverbial clauses.
The opportunity to get an automated evaluation of the variety of syntactic means used in a student text is an important feature for both instructors and learners.
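The abstract names these features without showing how they are computed. The following minimal sketch, in plain standard-library Python, illustrates the kind of surface statistics and gap-fill item generation described in the second and third bullets; all function names are hypothetical, and this is not REALEC's actual implementation.

```python
# Illustrative sketch only: demonstrates simple lexical statistics and
# gap-fill item generation of the kind the abstract describes.
import re
from collections import Counter

def lexical_stats(text: str) -> dict:
    """Compute simple surface features of a learner text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    counts = Counter(words)
    return {
        "mean_word_length": sum(len(w) for w in words) / len(words),
        "mean_sentence_length": len(words) / len(sentences),
        # crude repetition measure: distinct words used more than twice
        "repeated_words": sum(1 for c in counts.values() if c > 2),
    }

def gap_fill(sentence: str, target: str) -> tuple[str, str]:
    """Turn a corpus sentence into a gap-fill item; the target is the key."""
    stem = re.sub(rf"\b{re.escape(target)}\b", "_____", sentence, count=1)
    return stem, target

text = "Students upload essays. Instructors annotate the essays and give feedback."
print(lexical_stats(text))
print(gap_fill("Instructors annotate the essays.", "annotate"))
```

A production tool would of course replace the regex tokenizer with a proper NLP pipeline and check words against the actual CEFR, Coxhead, and Pearson lists; the sketch only fixes the shape of the computation.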
One way to control the effect of social desirability on respondents' answers is to introduce social desirability scales into the questionnaire. The social desirability scale included in the Teaching and Learning International Survey (TALIS) and administered to the Russian-speaking sample of teachers was not cross-culturally adapted. Moreover, this tool is based on the Marlowe-Crowne Scale, whose psychometric characteristics have been assessed only within Classical Test Theory, with ambiguous results. To fill this gap in our knowledge of the validity of the TALIS social desirability scale, the authors conducted a psychometric analysis using Item Response Theory. The results showed good reliability and substantial unidimensionality, but poor scale functioning. Based on the obtained results, including simulated data, the authors propose measures to improve the psychometric characteristics of the scale. Key findings concerning the structure of the social desirability construct are presented.
School climate is one of the significant factors determining educational achievement. However, the lack of instruments to measure it has complicated the study of this concept in Russia. We review the history of the study of the concept of “school climate,” and we discuss approaches to how it can be defined. We describe the most widely used questionnaires for studying school climate and analyze the set of components that have been included in them. To conduct the empirical study, we chose the student questionnaire that is used in the PISA international study, which provides a theoretical basis for measuring a number of dimensions of school climate. We conducted a psychometric analysis using methods from confirmatory factor analysis and modern test theory. It turned out that the structure of the indices that are used to measure school climate is not what the framers of the questionnaire assumed it would be. It is unclear whether the questions reflect the school climate indicators that are specifically proposed in the questionnaires. Some of the judgments in the questionnaire have been worded in such a way as to elicit most students’ agreement or disagreement with them without revealing any differences in how students perceive the subject of the question. The answer categories are unbalanced for most of the judgments. Respondents tended to fill them out in a one-sided fashion. We propose steps for how the instrument can be further improved.
Formative assessments are an important component of massive open online courses (MOOCs), online courses with open access and unlimited student participation. Drawing accurate conclusions about students' proficiency from formative assessments, however, faces several challenges: (a) students are typically allowed to make several attempts, and (b) student performance might be affected by other variables, such as interest. Thus, neglecting the effects of attempts and interest in proficiency evaluation might lead to biased conclusions. In this study we address this limitation and propose two extensions of a common psychometric model, the Rasch model, that include the effects of attempts and interest. We illustrate these extensions using real MOOC data and evaluate them using cross-validation. We found that (a) the effects of attempts and interest on performance are positive on average, but both vary among students; (b) part of the variance in the proficiency parameters is due to between-student variation in the effect of interest; and (c) the overall accuracy of predicting students' item responses with the extensions is 4.3% higher than with the Rasch model.
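The abstract does not spell out the extended models. As a sketch only, assuming a standard logit parameterization (the paper's exact specification may differ), the baseline Rasch model and an extension with student-varying effects of attempt number and interest could be written as follows.

```latex
% Baseline Rasch model: ability \theta_p of student p, difficulty \beta_i of item i.
P(X_{pi} = 1) = \frac{\exp(\theta_p - \beta_i)}{1 + \exp(\theta_p - \beta_i)}
% Assumed extension (illustrative, not necessarily the paper's form):
% k = attempt number, z_p = interest; the slopes \delta_p and \gamma_p
% vary over students, matching finding (a) in the abstract.
P(X_{pik} = 1) =
  \frac{\exp\bigl(\theta_p + \delta_p (k - 1) + \gamma_p z_p - \beta_i\bigr)}
       {1 + \exp\bigl(\theta_p + \delta_p (k - 1) + \gamma_p z_p - \beta_i\bigr)}
```

Under this form, setting \delta_p = \gamma_p = 0 recovers the Rasch model, which is what makes a cross-validated comparison of predictive accuracy between the baseline and the extensions meaningful.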