TY - JOUR
T1 - Validating automated assessments of teaching effectiveness using multimodal data
AU - Fütterer, Tim
AU - Hou, Ruikun
AU - Bühler, Babette
AU - Bozkir, Efe
AU - Bell, Courtney
AU - Kasneci, Enkelejda
AU - Gerjets, Peter
AU - Trautwein, Ulrich
N1 - Publisher Copyright:
© 2025 The Authors
PY - 2026/2
Y1 - 2026/2
N2 - Background: High-quality teaching is essential for enhancing student learning in classrooms. Research has highlighted core dimensions of effective teaching, including classroom management, student support, and cognitive activation. However, traditional methods of assessing these dimensions (e.g., student surveys) have limitations, including rating biases and resource intensiveness. Aims: To overcome these challenges, we explored machine learning (ML) algorithms for the automated assessment of teaching effectiveness. Sample: The study analyzed multimodal data—video, audio, and transcripts—from the Global Teaching Insights study, which included video recordings and transcripts from 46 teachers and 1,132 students in Germany. Method: Scores for 18 teaching effectiveness subdimensions from three core dimensions were automatically generated by training attention-based ML models on multimodal features extracted with pretrained encoders. These ML-generated scores were compared with scores provided by human experts. A content validity study was conducted in which human experts evaluated the plausibility of ML-generated scores against human-generated scores. Structural equation models were used to assess the relationship between teaching effectiveness subdimensions and students’ tested achievement. Results: ML-generated scores were more reliable for some subdimensions (e.g., nature of discourse), and they were also plausible and content valid. ML-generated scores achieved higher absolute accuracy than human scores in 11 of 18 subdimensions. Limitations include the reliance on human ratings as ground truth and inconsistent predictive validity, underscoring the need for refined models to generate actionable insights, such as real-time feedback systems. Conclusions: The findings provide valuable insights for the development of automated feedback, enhancing the practical application of teaching effectiveness assessments.
KW - Artificial intelligence
KW - Automated assessment
KW - Machine learning
KW - Multimodal data
KW - Teaching effectiveness
UR - https://www.scopus.com/pages/publications/105021656258
U2 - 10.1016/j.learninstruc.2025.102264
DO - 10.1016/j.learninstruc.2025.102264
M3 - Article
AN - SCOPUS:105021656258
SN - 0959-4752
VL - 101
JO - Learning and Instruction
JF - Learning and Instruction
M1 - 102264
ER -