Lesson 07 of 10
AI Healthcare Quality & Safety

Safety & Risk in Healthcare AI

AI systems can cause harm. Understanding how, why, and with what frequency AI-related harm events occur — and how to prevent, detect, and respond to them — is the most urgent knowledge gap in healthcare AI governance today.

What you will learn
Identify the primary categories of AI-related patient safety events and near misses
Explain how model drift degrades AI performance over time and the monitoring required to detect it
Describe the failure modes unique to AI systems compared to traditional clinical software
Apply a structured framework for investigating and reporting AI-related safety events
Define the governance infrastructure required to sustain AI safety monitoring across an organization

Categories of AI-related patient safety events

AI-related patient safety events do not fit neatly into existing incident taxonomies developed for traditional clinical errors. They require a distinct classification framework that reflects the novel failure modes of algorithmic systems. The primary categories are:
Model malfunction: the AI system throws an error, crashes, or otherwise fails to operate as designed.
Unexpected or incorrect output: the system functions as designed but produces a clinically inappropriate recommendation.
Bias-related differential harm: the system produces systematically worse outcomes for specific patient groups.
Human-AI teaming failure: the interaction between clinician and AI system breaks down in ways that cause harm.
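One way to make the bias-related category concrete is to compare a performance measure across patient groups. The sketch below is illustrative only: the function names, the group labels, and the choice of sensitivity as the measure are assumptions, and the records are synthetic.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """Return {group: sensitivity} from (group, predicted_positive, actual_positive) tuples.

    Sensitivity is true positives divided by all actual positives. Groups with
    no actual positives are omitted because the ratio is undefined for them.
    """
    true_pos = defaultdict(int)
    actual_pos = defaultdict(int)
    for group, predicted, actual in records:
        if actual:
            actual_pos[group] += 1
            if predicted:
                true_pos[group] += 1
    return {g: true_pos[g] / actual_pos[g] for g in actual_pos}


def sensitivity_gap(records):
    """Largest difference in sensitivity between any two groups."""
    rates = sensitivity_by_group(records)
    return max(rates.values()) - min(rates.values())


# Synthetic records for illustration; not real patient data.
example = [
    ("group_a", True, True), ("group_a", True, True),
    ("group_a", True, True), ("group_a", False, True),
    ("group_b", True, True), ("group_b", False, True),
    ("group_b", False, True), ("group_b", False, True),
    ("group_a", False, False), ("group_b", True, False),
]
rates = sensitivity_by_group(example)
gap = sensitivity_gap(example)
print(rates, gap)
```

In this synthetic data the model catches three of four true cases in one group and one of four in the other. What size of gap should trigger review is a governance decision, not something the code can settle.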

Alert fatigue events — where clinicians dismiss genuinely important AI-generated alerts due to habituation — represent one of the most prevalent and most preventable AI safety event categories. Automation complacency events — where clinicians reduce their own clinical vigilance because of trust in an AI system — represent the complementary risk. Data drift events occur when the patient population, clinical workflows, or data inputs to an AI system change in ways that degrade model performance without triggering any system error.

The reporting of AI-related safety events is critically underdeveloped. Most existing safety reporting systems have no specific category for AI-related events. Events are frequently attributed to individual clinical error rather than to the AI system that contributed to the conditions for that error. This attribution gap makes it impossible to learn systematically from AI-related harm events or to identify patterns that should trigger governance responses.

The Attribution Gap

When an AI-related safety event is attributed to individual clinical error rather than to the AI system's contribution, the learning opportunity disappears. Organizations must develop AI-specific safety reporting categories that capture the AI system, its version, the nature of its output, and how it contributed to the event — or AI safety improvement will remain reactive.
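As a minimal sketch of what AI-specific reporting fields could look like, the record below captures the four items named above: the AI system, its version, the nature of its output, and how it contributed. The class and field names are hypothetical, not taken from any existing reporting system, and the example values are invented.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class EventCategory(Enum):
    MODEL_MALFUNCTION = "model malfunction"
    INCORRECT_OUTPUT = "unexpected or incorrect output"
    BIAS_RELATED_HARM = "bias-related differential harm"
    HUMAN_AI_TEAMING = "human-AI teaming failure"


@dataclass
class AISafetyEventReport:
    event_date: date
    ai_system: str        # which AI system was involved
    model_version: str    # version in use at the time of the event
    category: EventCategory
    ai_output: str        # what the system recommended or displayed
    contribution: str     # how that output contributed to the event
    near_miss: bool = False

    def missing_fields(self):
        """Names of required free-text fields that were left blank."""
        required = ("ai_system", "model_version", "ai_output", "contribution")
        return [name for name in required if not getattr(self, name).strip()]


# Hypothetical report with two AI-specific fields left empty.
example_report = AISafetyEventReport(
    event_date=date(2024, 3, 14),
    ai_system="sepsis-alert",
    model_version="",
    category=EventCategory.HUMAN_AI_TEAMING,
    ai_output="Low-risk score displayed; no alert fired",
    contribution="",
)
print(example_report.missing_fields())  # ['model_version', 'contribution']
```

A completeness check like `missing_fields` is one simple way to keep the AI contribution from being dropped at the point of reporting.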

Model drift: the silent performance degradation

Model drift describes the gradual degradation of an AI system's performance over time as the real-world data environment diverges from the training data environment. It occurs because healthcare is not static: patient populations change, clinical practices evolve, documentation patterns shift, new medications are introduced, and disease prevalence rises and falls with seasons and outbreaks.

A sepsis prediction model trained before the COVID-19 pandemic may perform poorly on patients with COVID-19-associated sepsis, whose presentation differs from the sepsis phenotypes in the training data. A documentation AI trained on one version of ICD-10 coding guidelines may produce incorrect suggestions when coding guidelines are updated. A predictive readmission model trained in a pre-telehealth era may systematically underperform for patients who have access to remote monitoring and virtual follow-up.

Model drift is insidious because it does not trigger any system error. The AI continues to function and generate outputs. The outputs are simply less accurate than they were — and in many organizations, nobody is monitoring performance closely enough to detect the degradation before it causes harm. Ongoing performance monitoring — comparing current model predictions to actual clinical outcomes in the local patient population — is the only reliable mechanism for detecting model drift.
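The comparison of predictions to outcomes described above can be sketched as a per-period AUC check. This is a minimal illustration under stated assumptions: the function names, the period labels, the data, and the `max_drop` threshold are all hypothetical, and the pairwise AUC calculation is chosen for readability rather than speed.

```python
def auc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case scores higher
    than a randomly chosen negative case, with ties counted as 0.5.
    """
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative outcome")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def drift_alerts(windows, baseline_auc, max_drop=0.05):
    """Return (label, auc) for each window whose AUC fell more than max_drop below baseline.

    `windows` is a list of (label, scores, outcomes); max_drop is an assumed
    threshold that each organization would need to set for itself.
    """
    alerts = []
    for label, scores, outcomes in windows:
        current = auc(scores, outcomes)
        if baseline_auc - current > max_drop:
            alerts.append((label, current))
    return alerts


# Synthetic data for illustration; real windows would hold many more cases.
windows = [
    ("period_1", [0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]),
    ("period_2", [0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]),
]
alerts = drift_alerts(windows, baseline_auc=1.0, max_drop=0.1)
print(alerts)  # [('period_2', 0.75)]
```

In practice each window needs enough cases and enough observed outcomes for the estimate to be stable, so the review interval depends on local patient volume.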

Governing AI safety: the organizational infrastructure required

Effective AI safety governance requires organizational infrastructure that most healthcare institutions have not yet developed. The minimum governance requirements include: a mechanism for reporting AI-related safety events and near misses with AI-specific data fields; regular performance monitoring comparing model predictions to actual outcomes in the local patient population; a defined process for investigating AI-related safety events that can distinguish between model failure, deployment failure, and human-AI teaming failure; and clear escalation pathways when AI safety concerns are identified.

The AI Oversight & Governance Committee (covered in detail in Lesson 09) is the governance body responsible for maintaining this infrastructure. But governance committee oversight alone is not sufficient. Safety monitoring must be embedded in clinical operations, with front-line staff empowered and supported to report AI-related concerns without fear that the event will be attributed to their own individual error.

Post-market surveillance — the ongoing monitoring of AI performance after regulatory clearance and clinical deployment — is an emerging regulatory requirement in multiple jurisdictions. Organizations that build safety monitoring infrastructure proactively will be better positioned to meet these requirements and, more importantly, to protect their patients from AI-related harm.

Post-Market Surveillance

Regulatory clearance of an AI device is based on pre-market performance data. Real-world clinical deployment under diverse conditions is where the true performance of any AI system is determined. Post-market surveillance — ongoing monitoring of real-world performance — is both an ethical obligation and an emerging regulatory requirement.

Key concepts from this lesson

Key Concept

Model Drift

Gradual degradation of AI performance as the real-world data environment diverges from the training data environment.

Key Concept

AI Safety Event

A patient safety event in which an AI system contributed to the conditions for harm, whether through malfunction, incorrect output, bias-related differential harm, or human-AI teaming failure.

Key Concept

Post-Market Surveillance

Ongoing monitoring of AI performance after deployment — the mechanism for detecting model drift and identifying safety signals.

Key Concept

Human-AI Teaming Failure

Safety events arising from the interaction between clinician and AI system — automation complacency, alert fatigue, override without assessment.

Key Concept

Attribution Gap

The tendency to attribute AI-related safety events to individual clinical error rather than to AI system contribution — preventing systematic learning.

Key Concept

Performance Monitoring Dashboard

A structured system for tracking AI model performance metrics against actual clinical outcomes — the primary tool for detecting model drift.

Case Study

The algorithm that nobody was watching

A hospital deploys a readmission prediction model in the discharge planning workflow. At deployment, the model achieves an AUC of 0.81 in internal validation — strong enough performance to justify integration into the care transitions workflow. The deployment is considered successful.

Twenty-two months later, a patient safety researcher conducting an unrelated project notices that the readmission model's high-risk predictions show very poor concordance with actual readmission outcomes in recent data. She requests a formal performance audit. The AUC on the previous six months of local data is 0.61, down 0.20 from the 0.81 measured at deployment.

Investigation reveals that the hospital had introduced a telehealth-based post-discharge follow-up program 14 months earlier (about eight months after the model was deployed), which had significantly changed the readmission rates and the characteristics of the patients who were readmitted. The model had never been updated or monitored after initial deployment. No one had been assigned responsibility for monitoring its performance. The governance framework that approved the initial deployment had no post-deployment performance monitoring requirements.

What this illustrates

Deployment approval is not the end of AI governance — it is the beginning of ongoing governance obligations. A model performing adequately at deployment can become a patient safety risk over time without any malfunction, if the world it was trained on diverges from the world it is deployed in. Post-deployment performance monitoring is not optional governance — it is essential safety infrastructure.

Reflection Prompt

Who is monitoring the AI systems in your organization right now?

Think about the AI systems that currently influence clinical or operational decisions in your organization. For each one: is there an assigned owner responsible for monitoring its performance? Is there a defined schedule for performance review? Is there a mechanism for clinical staff to report concerns about AI output quality? If the answers are no — which is common — your organization has AI deployment governance but not AI safety governance. These are different things.
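The three questions above can be turned into a simple inventory check. The sketch below is one hypothetical way to do it: the class, field names, gap labels, and example entries are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class DeployedModel:
    name: str
    owner: Optional[str]                  # person or team accountable for monitoring
    review_interval_days: Optional[int]   # None means no schedule is defined
    last_review: Optional[date]
    has_reporting_channel: bool           # can clinical staff report concerns?


def governance_gaps(model, today):
    """List the safety-governance gaps for one deployed model."""
    gaps = []
    if not model.owner:
        gaps.append("no assigned owner")
    if model.review_interval_days is None:
        gaps.append("no review schedule")
    elif model.last_review is None:
        gaps.append("never reviewed")
    elif today - model.last_review > timedelta(days=model.review_interval_days):
        gaps.append("review overdue")
    if not model.has_reporting_channel:
        gaps.append("no staff reporting channel")
    return gaps


# Hypothetical inventory entries.
today = date(2024, 6, 1)
unmonitored = DeployedModel("readmission-risk", None, None, None, False)
overdue = DeployedModel("sepsis-alert", "clinical informatics", 90, date(2024, 1, 1), True)

print(governance_gaps(unmonitored, today))
print(governance_gaps(overdue, today))
```

An empty list for every deployed system would be one rough indicator that safety governance exists alongside deployment governance.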

Further Learning

The work of the Institute for Healthcare Improvement (IHI) on patient safety event reporting and learning systems provides foundational context for developing AI-specific safety reporting infrastructure. Adapting existing reporting frameworks for AI-related events is a current challenge that the field is actively working to address.

Knowledge Check — Lesson 07

1. A sepsis prediction model that performed well at deployment begins generating significantly more false-positive alerts 18 months later. No system error has been reported. The most likely explanation is:

A. The EHR vendor has made changes to the underlying IT infrastructure affecting model performance
B. Model drift: the patient population or clinical environment has changed in ways that degrade model performance
C. The model has reached its design lifespan and needs to be replaced
D. Clinical staff have learned to override alerts, which changes the apparent false-positive rate

2. An AI clinical decision support system contributes to a medication error, but the incident report attributes the error entirely to 'nurse did not verify the AI recommendation.' This reflects:

A. An accurate root cause analysis: individual verification failure is the proximate cause
B. The attribution gap: AI system contribution is not captured, preventing systematic learning from AI-related events
C. Just Culture principles: the AI system cannot be held accountable for individual clinical decisions
D. Standard incident reporting practice that accurately reflects clinical governance responsibilities

3. Which of the following best describes the minimum organizational infrastructure required for AI safety governance?

A. A qualified AI officer and an IT team responsible for AI system maintenance
B. AI-specific safety event reporting, performance monitoring, investigation processes, and escalation pathways
C. An annual AI audit conducted by the patient safety committee
D. Vendor-provided performance reports reviewed by the clinical informatics team quarterly

4. Post-market surveillance of a clinical AI system is best described as:

A. The process of monitoring AI vendor marketing claims after the product is commercially available
B. Ongoing monitoring of AI performance in real-world deployment to detect safety signals and model drift
C. The regulatory review process conducted after an AI system receives initial clearance
D. Annual reporting to the regulatory authority on the number of clinical events involving AI systems

5. Automation complacency in AI deployment is best distinguished from alert fatigue by:

A. Alert fatigue involves ignoring alerts that fire; automation complacency involves reducing clinical vigilance for patients who receive low or no alerts
B. Alert fatigue is a technology problem; automation complacency is a training problem
C. Alert fatigue affects nurses; automation complacency affects physicians
D. They are the same phenomenon described by different professional communities