Can AI Pass the USMLE? What That Means for Medical Licensing Exams

Yes, AI models have scored at or above passing thresholds on USMLE Step 1 practice exams. One widely cited evaluation put a large language model at 86%, with no medical school, clinical rotations, or supervised patient care.

That result alone doesn't make licensing exams meaningless, but it does expose a real problem: medical assessment was built to verify what a human knows and can do unaided, and AI has made "unaided" a much harder condition to guarantee. New 2026 research on AI's effect on trainee critical thinking, combined with structural changes already underway at the USMLE, shows the field is starting to respond.

What exactly does it mean that AI can pass the USMLE?

For more than sixty years, licensing exams have been medicine's gold standard for competency verification, the mechanism we use to decide who is permitted to treat patients unsupervised. (They are also a major factor in determining residency competitiveness, but that's a story for another day.) When a generative AI model can score competitively on that exam from a standing start, it doesn't prove the AI understands medicine the way a physician does. It proves that the exam format, multiple-choice recall and pattern-matching against a known answer key, is exactly the kind of task large language models are built to excel at.

Is this just about cheating?

Academic integrity concerns are real and well documented, including coordinated-cheating investigations across test centers in recent years, but framing this purely as a cheating problem undersells what's happening. The deeper issue is that AI is changing what test-takers know how to do before they ever sit down for the exam, and that shift is something a proctoring camera can't detect.

While it's unlikely AI tools could be used in proctored exam centers, these developments do raise the question of whether we are teaching the right things in medical education: are we continuing to focus on memorization and regurgitation of isolated facts, rather than laying the foundation for critical thinking and clinical reasoning?

What does 2026 research say about AI and clinical reasoning?

This is where the assessment crisis gets more concrete and more personal for educators. Coverage of a 2026 study on AI overuse among young physicians found measurable erosion of independent critical thinking tied to heavy, unscaffolded AI use during training. A systematic review of early experimental evidence on generative AI and clinical reasoning assessment reached a similarly mixed verdict: AI can be a powerful diagnostic aid, but its effect on a learner's reasoning depends entirely on how it's used.

The more encouraging counterpoint comes from an npj Digital Medicine study that followed 372 medical students over twelve months: more active participation in AI-assisted diagnosis was associated with gains in critical thinking, but only when mediated by AI literacy. In other words, AI didn't help or hurt reasoning on its own. Whether a student's AI literacy was strong enough to use the tool as a thinking partner rather than an answer machine determined the outcome. That single finding may be the most important data point to come out of this research in the past year, because it reframes the goal: the fix isn't banning AI from training, it's making sure AI literacy is taught before AI access is granted.

Is the USMLE itself changing?

Yes, though it's worth being precise about why. Starting with the 2026 testing cycle, the USMLE program restructured Step 1 from seven sixty-minute blocks into fourteen thirty-minute blocks, alongside broader test-delivery software updates and a January 2026 consolidation of registration processes between the NBME and FSMB. The stated rationale is testing-experience and delivery modernization, not an AI-specific countermeasure, but it's a useful signal that the exam infrastructure underpinning U.S. medical licensure is actively being rebuilt for the first time in years, at the exact moment AI is forcing every other part of assessment to be re-examined. Could we see free-text response items in the future?

So what should assessment actually measure now?

The recommendation gaining the most traction among medical educators is straightforward to state and hard to execute: grade the reasoning process, not just the final answer. That means assuming learners will use AI and designing assessments around that reality, with structured case write-ups that show the differential-building steps, oral exams and direct observation that can't be outsourced, and OSCEs and supervised practical exams that test bedside communication, physical examination, and professional judgment in ways no chatbot can stand in for.

FREQUENTLY ASKED QUESTIONS

Q: Can AI actually pass the USMLE?

A: Yes. Evaluations of large language models on USMLE Step 1 practice questions have reported scores at or above typical passing thresholds, including one widely cited result of 86%, despite the model having no clinical training or supervised patient experience.

Q: Does AI use make medical students worse at clinical reasoning?

A: The evidence is mixed and depends heavily on how AI is used. Heavy, unscaffolded use is associated with measurable declines in independent critical thinking in some 2026 research, while a 372-student, twelve-month study found that active AI-assisted diagnosis was linked to gains in critical thinking when mediated by strong AI literacy.

Q: What is "grading the reasoning process"?

A: It's an assessment philosophy gaining traction in medical education that evaluates how a learner arrives at a clinical answer, through case write-ups, oral exams, and direct observation, rather than evaluating only the final answer, which AI can now generate convincingly on its own.

Q: Is the USMLE changing because of AI?

A: The USMLE program restructured Step 1's block format and updated test-delivery software and registration processes starting in the 2026 cycle. The stated reasons are testing-experience modernization rather than an AI-specific response, though the changes arrive amid broader pressure on assessment models industry-wide.

SOURCES & CITATIONS

• Medical Science Educator, 2025 (LLM performance on USMLE-style items).
• Generative artificial intelligence for clinical reasoning assessment in medical students: a systematic review of early experimental evidence. ScienceDirect, 2026.
• AI literacy mediates AI-assisted diagnosis participation and critical thinking among medical students under supervision. npj Digital Medicine, 2026.
• AI overuse undermines young doctors' critical thinking, study finds — coverage via ACDIS / Medscape / ICT&Health, 2026.
• USMLE 2026 Bulletin of Information — test format and registration updates, NBME/FSMB.
• AI Cheating and Assessment Integrity in 2026 — testing-security industry analysis.
