Advisor: Dr. June Ahn

Toolbox: R, Figma, Qualtrics, HTML, CSS, JavaScript


The design challenge for this research is to quickly and effectively analyze "practical analytics": locally grounded education data that can more directly inform teachers' instructional improvements. Two challenges arise in utilizing practical analytics:

(1) Such data are often expensive and time-consuming to process at scale.

(2) Developing usable analytics that validly inform actionable improvement decisions is difficult to achieve in practice.

Observations of these challenges motivate my projects with Dr. June Ahn in the Design and Partnership Lab: developing systems that improve the processing of school-based education data, using a variety of human-computation and machine learning approaches.

Within this context, I designed an experiment to answer the questions:

(1) To what extent can crowdsourcing effectively process open-ended, middle-grade science assessment data?

(2) To what extent can crowdsourcing influence participants’ understanding of task-related concepts, compared to a control group who did not participate in crowdsourcing?

My Roles

  • Design the crowdsourcing training tasks for 168 participants, Education undergraduates with prior experience in assessment grading.
  • Apply learning sciences research to design three experimental conditions: (1) Grade & Explain: participants grade and explain their scores; (2) Grade Only: participants grade only; and (3) Control: participants do not grade, but watch an instructional video about a related science concept.
  • Develop a coding rubric for the assessment tasks, and design the rubric to be clear enough for graders with no prior experience with the assessment.
  • Develop a website for the experiment, where participants completed a training task, were randomly assigned to a crowdsourcing condition, and completed pre- and post-assessments.
  • Conduct multivariate analyses and Monte Carlo simulations to evaluate crowdsourced scores against expert scores.


Crowdsourced assessments can work!

Participants graded with high accuracy and close agreement with expert scores.

In subsequent simulations, we found that aggregating scores from multiple crowdworkers consistently moved results closer to the expert scores, suggesting a potential benefit of leveraging collective intelligence in educational domains.
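To illustrate the aggregation idea, here is a minimal Monte Carlo sketch, written in Python rather than the project's R, and not our actual analysis: it assumes hypothetical 0–4 item scores and Gaussian grader noise (the noise level is an assumption, not an estimate from our data), and shows that averaging more simulated graders shrinks the mean absolute error against the expert scores.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical setup: 200 items scored 0-4 by an expert; each simulated
# crowdworker reports the expert score plus Gaussian noise (sd = 1.0 is
# an illustrative assumption), rounded and clipped to the 0-4 scale.
N_ITEMS = 200
EXPERT = [rng.randrange(5) for _ in range(N_ITEMS)]

def crowd_score(true_score):
    """One simulated crowdworker score for an item."""
    return min(4, max(0, round(true_score + rng.gauss(0, 1.0))))

def mean_abs_error(k, n_trials=300):
    """Monte Carlo estimate of error after averaging k workers per item."""
    total = 0.0
    for _ in range(n_trials):
        for true in EXPERT:
            avg = statistics.mean(crowd_score(true) for _ in range(k))
            total += abs(avg - true)
    return total / (n_trials * N_ITEMS)

for k in (1, 3, 5, 9):
    print(f"k={k}: MAE ~ {mean_abs_error(k):.3f}")
```

The error should fall as k grows, mirroring the pattern we observed when aggregating real crowdworker scores.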

Crowdsourced participants learned from the tasks.

Compared to those who did not participate in the crowdsourcing tasks, both Grading groups showed positive learning outcomes, with a statistically significant effect for the Grade Only group, after accounting for pre-test science understanding and attitudes toward science.
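As a rough sketch of this kind of adjusted comparison (the study itself used R; the data, condition effects, and coefficients below are entirely synthetic assumptions), one can regress a simulated post-test on pre-test plus condition dummies, an ANCOVA-style setup:

```python
import random

rng = random.Random(1)

# Synthetic data: post-test depends on pre-test plus a condition effect.
# Effect sizes here are made up for illustration, not our study's estimates.
def make_row(cond):
    pre = rng.gauss(50, 10)
    effect = {"control": 0.0, "grade_only": 5.0, "grade_explain": 3.0}[cond]
    post = 10 + 0.8 * pre + effect + rng.gauss(0, 5)
    return pre, cond, post

data = [make_row(c) for c in ("control", "grade_only", "grade_explain") * 50]

def ols(X, y):
    """Least squares via normal equations and Gaussian elimination."""
    n, p = len(X), len(X[0])
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         for a in range(p)]
    b = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    for c in range(p):                       # forward elimination w/ pivoting
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for k in range(c, p):
                A[r][k] -= f * A[c][k]
            b[r] -= f * b[c]
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][k] * beta[k]
                              for k in range(r + 1, p))) / A[r][r]
    return beta

# Design matrix: intercept, pre-test, dummy(grade_only), dummy(grade_explain);
# control is the reference category, as in a standard dummy coding.
X = [[1.0, pre, float(c == "grade_only"), float(c == "grade_explain")]
     for pre, c, _ in data]
y = [post for _, _, post in data]
beta = ols(X, y)
print("adjusted condition effects:", beta[2], beta[3])
```

The dummy coefficients are the condition effects adjusted for pre-test, which is the comparison reported above.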

Opportunities for HCI and learning analytics research.

We did not find significant differences between the two grading conditions (Grade Only and Grade & Explain), contrary to prior work in learning sciences and crowdsourcing in general domains.

This suggests a need to attend to different task design features for education assessments, such as providing more timely feedback, segmenting tasks by difficulty, or creating opportunities for peer assessment.

Because crowdsourced scores can deviate widely from one another, we are also exploring ways to represent the variance in data output to end users (i.e., teachers), so they can make more informed instructional decisions at scale.
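One simple direction, sketched below with made-up scores, is to pair each item's mean crowd score with a spread estimate and flag low-agreement items for review; the item names, grader counts, and the 1.0-point threshold are all illustrative assumptions, not part of our system.

```python
import statistics

# Hypothetical crowd scores per assessment item (5 graders each); in a
# real deployment these would come from the crowdsourcing platform.
item_scores = {
    "item_01": [3, 3, 4, 3, 3],
    "item_02": [1, 4, 2, 0, 3],
    "item_03": [2, 2, 2, 2, 2],
}

# Summarize each item with a mean score plus a spread estimate, so a
# teacher-facing display can surface low-agreement items for review.
for item, scores in item_scores.items():
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    flag = "review" if sd > 1.0 else "ok"
    print(f"{item}: mean={mean:.1f} sd={sd:.2f} [{flag}]")
```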

The preliminary findings from our work can be found in a paper we presented at the Learning@Scale '20 conference.