Feasibility of using science scores from different educational assessments as proxy measures of reading literacy to measure and monitor SDG 4.1.1

Research output: Book/Report › Commissioned report

Abstract

This document evaluates several arguments concerning the use of science scores as a proxy measure for reading literacy when measuring and monitoring SDG 4.1.1 at a global scale. SDG indicator 4.1.1 measures the proportion of children and young people (a) in grades 2/3; (b) at the end of primary; and (c) at the end of lower secondary achieving at least a minimum proficiency level in (i) reading and (ii) mathematics, by sex. The arguments presented here are evaluated in relation to the validity problems that arise when scores from different subjects (e.g., literacy, science) are used interchangeably. Our evaluation focuses on three aspects: differences in the conceptual frameworks on which the tests are based; differences in the interpretations that the analysed scores can support; and differences observed when these measures are correlated with student background factors such as gender.

Multiple efforts have been made to define standards for the quality of educational assessments. The main standards concern the validity, reliability and fairness of tests. The Standards for Educational and Psychological Testing defines validity as the “degree to which evidence and theory support the interpretations of test scores for proposed uses of tests” (AERA et al., 1999, 2014). This definition reminds us that a central aspect of validity concerns what we intend to do with test results (i.e., the scores). In that sense, the validation process involves accumulating relevant evidence to provide a scientific basis for the proposed interpretation of the scores. In other words, “a clear articulation of each intended test score interpretation for a specified use should be set forth, and appropriate validity evidence in support of each intended interpretation should be provided”. Based on these definitions, a series of specific considerations are derived regarding the theoretical framework and the possible uses of scores from International Large-Scale Assessments (ILSA) to measure SDG 4.1.1. Arguably, the most relevant are the following:

- Conceptual framework: the conceptual definition of the construct(s) the test intends to assess. The construct or constructs that the test is intended to assess should be clearly described.

- Intended interpretation: the test developers' intended interpretation and use of test scores (e.g., disaggregation). A rationale should be presented for each intended interpretation of test scores for a given use.

- Correlation with relevant factors: the association of different test scores with relevant background/sociodemographic factors (e.g., gender).

Original language: English
Place of Publication: Montreal
Publisher: UNESCO Institute for Statistics
Commissioning body: UNESCO
Number of pages: 10
Volume: WG/GAML/6
Publication status: Published - 24 Nov 2022
Event: Global Alliance to Monitor Learning (GAML). Technical Cooperation Group 9th meeting
Duration: 22 Nov 2022 to 24 Nov 2022
https://tcg.uis.unesco.org/9th-meeting-of-the-tcg/

Publication series

Name: Global Alliance to Monitor Learning (GAML)

Keywords

  • SDG
  • SDG 4
  • Measurement
