Digital Assessment Research: Marking Consistency


We believe that exams should be a fair and accurate reflection of students’ performance – regardless of whether an exam is taken onscreen or on paper.

That’s why our research programme not only focuses on comparability of student performance, but also explores whether and how marking compares between paper and digital exam formats.

At a glance

  • We asked markers to score handwritten and typed versions of the same exam answers to evaluate consistency of marking across exam formats.
  • Findings show a high level of marking consistency across answer formats – supporting fairness and reliability across paper-based and onscreen exam modes.
  • Insights from interviews and surveys highlighted opportunities to further enhance marker training.
  • We’ll continue to broaden our research to validate these findings and share what we learn to support how we and the wider sector evolve digital assessments.

What was the research about?

Robust research and insights from schools and colleges across the globe have been at the heart of making onscreen GCSE, International GCSE and International A level exam options available alongside paper-based formats since 2022.

It’s crucial that exams in paper and digital formats are fair and marked consistently. Ensuring this not only delivers on our commitment to students but also means that we meet regulatory requirements and can confidently offer a choice of exam formats that schools, colleges and the education community can trust.

That’s why our research study focused on:

  • evaluating the consistency of marking across handwritten and typed exam scripts
  • identifying practical opportunities to support comparability and fairness between paper-based and onscreen assessments.

By doing so, we’re better able to understand and ensure comparability between assessment formats and gain practical insights that can continue to support dual-mode exam delivery.

How was the research conducted?

We used a two-phase mixed-method design for this research.

Phase 1: quantitative data to help assess the consistency of markers across formats

  • We selected 72 International GCSE English Language A exam scripts from summer 2022, covering a range of genders and ability levels. We manually digitised these scripts to create typed versions, applying quality assurance checks to ensure accurate transcription.
  • We asked ten experienced markers to score the same exam scripts in both handwritten and typed formats.
  • Using a within-subjects experimental design,1 each marker completed two marking sessions allocated in a randomised order. Half of the markers scored the handwritten scripts first and then the typed ones; the other half started with the typed scripts, followed by the handwritten ones (see the sketch after this list).
  • There was at least a one-week gap between the two marking sessions to prevent memory-based influences on marking.
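
The allocation above follows a standard counterbalanced within-subjects design. As a purely illustrative sketch – not the study’s actual procedure or code – the ordering of the two marking sessions could be randomised along these lines; the marker labels and session names are hypothetical:

```python
# Illustrative sketch of counterbalanced order allocation for a
# within-subjects marking study: every marker scores both formats,
# but half start with handwritten scripts and half with typed ones.
import random

markers = [f"marker_{i:02d}" for i in range(1, 11)]  # ten markers (hypothetical labels)
random.shuffle(markers)  # random assignment to the two order groups

allocation = {}
for position, marker in enumerate(markers):
    if position < len(markers) // 2:
        allocation[marker] = ["handwritten", "typed"]  # session 1 then session 2
    else:
        allocation[marker] = ["typed", "handwritten"]

for marker, order in sorted(allocation.items()):
    print(f"{marker}: {order[0]} first, {order[1]} second")
```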

Phase 2: qualitative data to explore marking experiences across modes

  • Post-trial surveys and semi-structured interviews were conducted to gather markers’ experiences across handwritten and typed responses. These provided qualitative data to complement and interpret the quantitative findings.

What were the key findings?

The study indicated a high level of consistency in marking between the handwritten and typed exam scripts.

  • Overall consistency – marks were generally consistent across handwritten and typed formats. After applying statistical controls2 in our linear mixed-effects model, our results showed that exam script format, gender, and student ability did not significantly affect marks.
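
As an illustration of the kind of analysis described above – not the study’s actual code – a linear mixed-effects model could be fitted along these lines; the data file, column names and model structure here are assumptions made for the sketch:

```python
# Minimal sketch: testing whether script format predicts marks while
# controlling for gender and ability, with a random intercept per marker
# to reflect the repeated-measures (within-subjects) structure.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format data: one row per marker/script/format combination,
# with columns mark, script_format, gender, ability and marker_id.
df = pd.read_csv("marking_data.csv")

model = smf.mixedlm(
    "mark ~ C(script_format) + C(gender) + ability",  # fixed effects
    data=df,
    groups=df["marker_id"],                           # random intercept per marker
)
result = model.fit()

# A non-significant coefficient for C(script_format) would be consistent with
# the finding that handwritten and typed scripts are marked consistently.
print(result.summary())
```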

We did find some nuances that could provide springboards for further investigation:

  • Qualitative insights – from our Phase 2 interviews, markers self-reported some factors that could impact marking consistency across both typed and handwritten scripts – including the length and legibility of answers (e.g. typed answers seem shorter). We’ve already taken next steps to address this (see the 'What are Pearson’s next steps on this?' section).
  • Question-specific trends – marking of the large-tariff question3 showed small variations, with handwritten responses sometimes scoring slightly higher – in line with the qualitative insights above.

What do these findings mean for Pearson and the wider sector?

This research significantly advances our understanding of marker consistency in high-stakes International GCSE English Language exams. It is also a foundational study that illuminates factors that can affect marking consistency and provides a springboard for more detailed investigations.

What further research is needed?

Given the nature of this study and its focus on a specific subject and a small group of markers, broader research is essential.

Our findings underscore the need to further consider variables such as student characteristics, gender, ability levels, and physical aspects of the student scripts like handwriting and spacing. This knowledge is crucial for further refining assessment practices and continuing to ensure that students' grades reflect their true capabilities – regardless of the medium of their responses.

Future studies should include a more diverse pool of examiners to better represent the spectrum of marking behaviours across different levels of expertise. Additionally, this research focused on International GCSE English Language A, and the findings may not generalise to other subjects, particularly non-essay-based ones. As such, future research should take place across a range of subjects.

What are Pearson’s next steps on this?

This research is part of a comprehensive series of studies considering various aspects of onscreen exams – from accessibility and inclusion through to comparability and feedback from teachers and students.

As such, we’ll not only be looking at this study in isolation but in relation to our wider research and evidence base as we create a full and informed picture of assessment opportunities.

Our next steps include (but are not limited to):

  • validating our research findings by replicating the experiments with a broader range of markers with different levels of marking experience and evaluating marking consistency across formats for a wider range of subjects.
  • enhancing marker training by sharing our findings with markers at training sessions and using these insights to support further guidance and best practice (such as encouraging marking focus on the quality of content rather than the perceived length of responses across all formats).
  • informing sector-wide conversations and recommendations – we’ll continue to share our research and use the insights gained to inform recommendations that can improve assessment practices and ensure fair and consistent marking across all exam modes.

While the study can give the education community confidence in digital and paper exams, we’ll continue to research and refine assessment practices so we can help uphold rigorous standards and ensure that students can best show what they know and can do in exams and be recognised accordingly.

References

1. within-subjects design: in a ‘within-subjects’ experiment, the same participants are exposed to all of the treatments or conditions (here, marking both typed and handwritten versions of the responses), so that differences in outcomes can be attributed to the treatments rather than to differences between participants.

2. statistical controls: to ‘control for’ certain variables means to account for their potential influence on the outcomes of an analysis. By controlling for variables, researchers isolate the effect of the main variable of interest, minimising the impact of these other variables. This helps to ensure that any observed relationships or effects are not confounded by external factors.

3. large-tariff question: within the discipline of English Language, the large-tariff question usually asks students to compare and evaluate texts critically and in more depth than a short-answer question.

About the research

Date: November 2024

Authors: Dr Liyuan Liu (Senior Assessment Researcher, Pearson), Dr Ana Ulicheva (Principal Researcher, Pearson), and Hayley Dalton (Head of Vocational and Assessment Research, Pearson)

Citation: Liu, L., Ulicheva, A. and Dalton, H. (2024) Digital Assessment Research: Marking Consistency. Available at: https://www.pearson.com/en-gb/schools/insights-and-events/schools-blog/2024/12/marketing-consistency-research.html (Accessed: 22 Nov 2024)
