
The research team asked four different large language models (LLMs) to review 600 essays by middle school students and give writing feedback. They then submitted each essay to the LLMs 12 more times, each time adding information about the writer’s race, gender, and motivation level, and whether the writer had a learning disability.
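The resubmission protocol described above can be sketched in a few lines. This is a minimal illustration only: the attribute categories, their values, and the prompt wording below are placeholders, not the study’s actual materials.

```python
from itertools import product

def build_prompts(essay: str) -> list[str]:
    """Build one baseline prompt plus identity-augmented variants.

    The gender labels and writer profiles below are illustrative
    assumptions; the article does not give the study's exact wording.
    """
    base = f"Please give writing feedback on this middle school essay:\n\n{essay}"
    prompts = [base]  # baseline condition: no demographic information
    genders = ["a girl", "a boy"]
    profiles = ["is highly motivated", "has a learning disability"]
    # Cross the attributes to produce the identity-augmented conditions.
    for gender, profile in product(genders, profiles):
        prompts.append(f"The writer is {gender} and {profile}. {base}")
    return prompts
```

Each prompt in the returned list would then be sent to each of the four models, and the resulting feedback compared across conditions for the same underlying essay.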
Compared with the feedback given to boys, feedback on essays attributed to girls contained more praise and less constructive criticism, demonstrating both feedback-withholding and positive-feedback biases. Feedback to girls often used first-person pronouns and affective language that positioned the AI model as personally engaged with the student’s work. The models were also more likely to encourage girls to link their arguments to empathy, respect, and relational responsibility.
In contrast, feedback on essays attributed to boys was more objective and task-oriented, focusing on evidence, reasoning, and clarity, and signaling trust in students’ capacity to handle criticism without relational cushioning.
“Taken together, these patterns suggest that LLMs enact Marked Pedagogies guided by stereotypes rather than pedagogical best practices,” the authors conclude. “Instead of providing feedback at multiple levels of writing or consistently signaling constructive trust in students’ capacity to revise, LLMs differentially calibrate feedback based on presumed identities, judging not only students’ current ability, but also their capability — placing lower ceilings on growth for students of marked attributes and, by implication, constraining their educational futures.”


