Hey {{ first_name | human }},
If you are back in school, I hope you had a strong start to the term. If you still have a week off, enjoy it to the max.
TL;DR: The 60-second briefing
⚡️AI Marks Mocks: A news site published a case study of a school that used AI to mark GCSE mocks.
🧪80%: According to a new study, common LLMs fail at differential diagnosis more than 80% of the time.
🚨Mythos: A new Anthropic model reportedly so powerful it is not being released to the public. Instead, access is being given to select partners to bolster internet security.
📚 AI+education news
⚡️The School Using AI to Mark Mocks > What it is: A Yorkshire secondary school used AI to mark mock GCSEs in subjective subjects with longer-form answers.
Why this matters: Case studies offer first-hand experience, and learning from early implementation errors benefits any school looking to replicate a similar project. Two things stand out for me:
The acknowledgement that there was an initial workload increase.
The headteacher’s misconception about bias. While bias based on prior knowledge of individual pupils would not be present, the training data for all commercial LLMs is human-generated and still contains biases.
🚨AI Tutors not living up to the hype > What it is: Despite the hype, AI tutors are failing to motivate pupils, something educators could have predicted before Khanmigo’s launch.
“It doesn’t necessarily make students motivated to learn or fill in gaps in knowledge needed to ask questions.”
Why this matters: If you expect pupils to use AI tutors due to novelty, you may see initial uptake, but it’ll likely decrease quickly. Schools must consider how to encourage long-term use to ensure a return on investment.
🌍 Wider AI updates
🧪 AI Diagnosis > What it is: A new study revealed that large language models performed well in final diagnosis but struggled with differential diagnosis and uncertainty. Differential diagnosis involves evaluating plausible alternatives before selecting the most likely answer. In essence, the models were more effective at arriving at an answer than rigorously considering the alternatives.
Why it matters: Across all 21 models, differential diagnosis failure rates often exceeded 80%, while final diagnosis failure rates were typically below 40%. So although some models looked strong on the final answer, that often masked weaker reasoning underneath. That is why current off-the-shelf models are still not reliable enough for safe clinical use.
🚨Project Glasswing > Anthropic has unveiled Project Glasswing, a cross-industry cybersecurity initiative centred around the unreleased Claude Mythos Preview model. The model can identify and exploit software vulnerabilities at a level comparable to or surpassing human experts, and partners will utilise it for defensive security work on critical software and open-source systems.
Why it matters: The headline isn’t just “AI can help with cyber”. Leading labs think frontier models are close to changing the balance between attackers and defenders. This matters for education indirectly: schools are part of this wider digital infrastructure. As AI systems become more capable, the important questions are not only about lesson planning and marking, but also about resilience, procurement, privacy, and security.
Do this next: Review your cybersecurity protocols and policies to ensure they are up to date and that staff know them.
🎯Prompt/Tip:
A short quality-of-life tip: if you ask ChatGPT to help draft an email, the text appears in the chat interface. On a mobile device, tapping the ‘paper aeroplane’ icon opens your default mail client and populates the subject and body of the email, saving you from selecting, copying and pasting between two applications.

’Til next week.
Mr A 🦾
Help a colleague save time by sharing this newsletter; distributing these ideas helps a friend get home on time and keeps our energy focused on what matters most: great teaching.
Safety & Privacy Notice
The tools and workflows mentioned are intended for professional productivity and educational enhancement. Users must ensure that any AI implementation remains compliant with their local data protection regulations and institutional safeguarding policies.
Data Privacy: Do not enter personally identifiable information (PII), sensitive student records, or confidential institutional data into public AI models.
Verification Required: AI-generated content can be inaccurate, biased, or out of date. Always maintain a "human-in-the-loop" approach by reviewing and fact-checking all outputs before use.
Professional Judgement: These suggestions do not substitute for formal legal, clinical, or safeguarding advice. Final responsibility for accuracy and appropriateness remains with the professional user.
