Radiology triage AI rarely succeeds or fails because of model architecture alone. In real-world deployments, outcomes are shaped by a single operational decision that often receives far less scrutiny than it deserves: the alert threshold that determines which scans are flagged as urgent and pushed to the top of the radiologist’s worklist. Andrew Ting, MD, highlights that this threshold setting is not a technical afterthought but a clinical policy decision encoded in software. Once deployed, it quietly governs workload distribution, turnaround times, and which patients receive faster attention.
Why the Threshold Matters More Than the Model
Most triage algorithms produce a continuous risk or abnormality score. That score becomes actionable only when it crosses a threshold that triggers an alert or priority flag. Cases below the cutoff go into the regular queue; cases above it are escalated. This binary split is where probabilistic output becomes operational reality.
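As a concrete illustration, here is a minimal sketch of that split in Python; the 0.72 cutoff and the field names are hypothetical, not drawn from any particular product:

```python
from dataclasses import dataclass

@dataclass
class Study:
    accession: str
    ai_score: float  # model's abnormality probability, 0.0-1.0

ALERT_THRESHOLD = 0.72  # hypothetical policy value, not a vendor default

def triage(study: Study) -> str:
    """Collapse a probabilistic score into a binary worklist action."""
    return "URGENT" if study.ai_score >= ALERT_THRESHOLD else "ROUTINE"

worklist = [Study("A1001", 0.91), Study("A1002", 0.45), Study("A1003", 0.73)]
for s in worklist:
    print(s.accession, triage(s))  # -> URGENT, ROUTINE, URGENT
```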
If the threshold is set too low, radiologists are flooded with “urgent” cases that are not actually urgent. If it is set too high, genuinely important findings wait in the queue alongside routine studies. In both situations the model can be statistically sound yet clinically disruptive. The threshold, not the validation spreadsheet, dictates how the model behaves inside the worklist.
Thresholds as Workflow Design, Not Math
The cutoff is frequently selected during implementation rather than during development. Vendors may recommend a preset, or teams may choose a value that “seems fair” based on retrospective analytics. What gets overlooked is that the threshold determines how the AI reshapes everyday reading habits.
According to Dr. Andrew Ting, radiologists do not experience AI as a probability distribution or an ROC curve. They experience it as interruptions, reordered worklists, and time pressure. The threshold determines how often those interruptions occur and whether they feel warranted. A well-chosen threshold fades into the background and quietly accelerates care; a poorly chosen one becomes a constant source of friction.
The Hidden Trade-Off Embedded in Every Threshold
Every threshold trades sensitivity against operational burden. Lowering the cutoff catches more true positives but generates more false alarms; raising it reduces noise but risks delaying care for edge cases. This trade-off cannot be optimized in isolation from case mix, staffing levels, and the institution’s tolerance for interruptions.
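One way to make the trade-off visible is to sweep candidate cutoffs over a labeled retrospective set and report sensitivity alongside expected alert volume. The scores, labels, and daily-volume figure below are fabricated for illustration:

```python
def sweep(scores, labels, thresholds, studies_per_day=50):
    """Report sensitivity and approximate daily alert volume per cutoff."""
    n_positive = sum(labels)
    for t in thresholds:
        flagged = [(s, y) for s, y in zip(scores, labels) if s >= t]
        tp = sum(y for _, y in flagged)
        sensitivity = tp / n_positive if n_positive else 0.0
        alert_rate = len(flagged) / len(scores)
        print(f"cutoff={t:.2f}  sensitivity={sensitivity:.2f}  "
              f"alerts/day~{alert_rate * studies_per_day:.0f}")

scores = [0.95, 0.88, 0.81, 0.74, 0.66, 0.52, 0.41, 0.33, 0.21, 0.12]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]  # 1 = truly urgent on review
sweep(scores, labels, thresholds=[0.3, 0.5, 0.7, 0.9])
```

Even this toy sweep shows the shape of the dilemma: moving the cutoff from 0.5 to 0.7 cuts alert volume by a third while sensitivity drops from 1.00 to 0.75.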
Crucially, the “correct” threshold varies by setting. A high-volume outpatient imaging center needs a different cutoff than a trauma center with overnight staffing constraints. Treating the threshold as a fixed value ignores the context in which it operates.
Common Ways Alert Thresholds Are Implemented
While the underlying model may be complex, the threshold is typically implemented in one of a few practical ways. Each approach changes outcomes in predictable but often overlooked ways (a sketch of all four follows the list):
- Fixed probability cutoff: A single risk score value determines urgency. This is simple to deploy but brittle, as it assumes stable prevalence and consistent reading capacity across shifts and sites. When case mix changes, alert volume can swing dramatically.
- Top-N prioritization: Only the highest N scoring cases in a given time window are flagged. This caps alert volume but can suppress true positives during surges, allowing critical cases to slip through if many studies score similarly high.
- Percentile-based thresholds: Alerts trigger for cases above a certain percentile of recent scores. This adapts to volume but can normalize risk inflation, where “urgent” becomes relative rather than clinically anchored.
- Time-to-read optimization thresholds: Cutoffs are adjusted to meet turnaround targets rather than clinical severity alone. This improves metrics but can blur the distinction between operational urgency and medical urgency.
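To make the differences concrete, here is a hedged sketch of the four strategies applied to one batch of scored studies; every parameter value is an illustrative placeholder, and the turnaround policy shown is just one possibility:

```python
def fixed_cutoff(batch, cutoff=0.7):
    # Flag every study whose score clears a single fixed value.
    return {acc for acc, s in batch if s >= cutoff}

def top_n(batch, n=3):
    # Flag only the N highest-scoring studies in the window, capping volume.
    return {acc for acc, _ in sorted(batch, key=lambda x: -x[1])[:n]}

def percentile_cutoff(batch, pct=90):
    # Flag studies above a percentile of recent scores; adapts to volume
    # but makes "urgent" relative rather than clinically anchored.
    scores = sorted(s for _, s in batch)
    cut = scores[min(int(len(scores) * pct / 100), len(scores) - 1)]
    return {acc for acc, s in batch if s >= cut}

def turnaround_tuned(batch, cutoff=0.7, minutes_behind=0, bump=0.05):
    # One possible policy: raise the cutoff when the queue is running
    # behind, so fewer studies jump the line while readers catch up.
    adjusted = cutoff + bump if minutes_behind > 30 else cutoff
    return {acc for acc, s in batch if s >= adjusted}

batch = [("A1", 0.92), ("A2", 0.85), ("A3", 0.84), ("A4", 0.71), ("A5", 0.40)]
print("fixed:     ", fixed_cutoff(batch))                         # A1, A2, A3, A4
print("top-N:     ", top_n(batch))                                # A1, A2, A3
print("percentile:", percentile_cutoff(batch))                    # A1
print("turnaround:", turnaround_tuned(batch, minutes_behind=45))  # A1, A2, A3
```

Note that the same five studies produce four different urgent lists; the model never changed, only the policy wrapped around it.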
Each implementation encodes a philosophy about what “urgent” means. Choosing one without acknowledging its implications is how well-performing models create disappointing outcomes.
How Thresholds Shape Radiologist Behavior
Radiologists learn patterns quickly. Alerts that fire too frequently become background noise; alerts that fire rarely but reliably command attention. Which behavior emerges depends on the threshold.
Trust calibration is a subtler effect. When radiologists regularly see flagged cases that turn out not to be urgent, they begin to disregard the signal. Conversely, when truly urgent cases arrive without warning, confidence in the system erodes. In both failure modes the threshold, not the model, is the primary cause.
Thresholds also shape cognitive load. Constant reprioritization fragments reading flow, increasing the risk of error and fatigue. A well-calibrated cutoff respects the finite nature of human attention.
Threshold Drift Over Time
Even a carefully considered threshold can degrade. Changes in scanning protocols, referral patterns, or disease prevalence shift the score distribution. Without monitoring, alert volume gradually drifts away from the original equilibrium.
This is why threshold governance matters. Treating the cutoff as an adjustable policy rather than a fixed configuration lets organizations respond to drift before clinicians feel it. Routine review of alert rates, turnaround times, and override behavior provides early warning that the threshold is no longer appropriate.
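A simple way to operationalize that review is to track the rolling alert rate against the rate accepted at go-live. The baseline, window size, and tolerance below are assumptions for the sketch, not standards:

```python
from collections import deque

class AlertRateMonitor:
    """Flags drift when the rolling alert rate leaves a tolerance band."""

    def __init__(self, baseline_rate=0.15, window=500, tolerance=0.05):
        self.baseline = baseline_rate       # alert fraction accepted at go-live
        self.tolerance = tolerance          # acceptable deviation either way
        self.recent = deque(maxlen=window)  # rolling window of True/False alerts

    def record(self, alerted: bool) -> None:
        self.recent.append(alerted)

    def drifted(self) -> bool:
        if len(self.recent) < self.recent.maxlen:
            return False  # wait for a full window before judging
        rate = sum(self.recent) / len(self.recent)
        return abs(rate - self.baseline) > self.tolerance

# Usage: call monitor.record(...) after each study is scored, and route
# monitor.drifted() to the governance group rather than to radiologists.
monitor = AlertRateMonitor()
```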
Making Threshold Setting a Clinical Decision
The most effective approach treats threshold selection as a joint clinical and operational decision. In practice, radiologists, operations teams, and informatics leaders define urgency together, then test candidate cutoffs against simulated worklists rather than ROC curves alone, so the threshold’s behavior is examined under realistic constraints.
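A minimal version of such a simulation might replay a batch of scored studies under a candidate cutoff and measure how long truly urgent cases wait. The scores, urgency labels, and fixed read time below are fabricated for illustration:

```python
def urgent_wait_times(studies, cutoff, read_minutes=10):
    # studies: (ai_score, truly_urgent) tuples in arrival order. Flagged
    # cases are read first by score; routine cases follow in FIFO order.
    flagged = sorted((s for s in studies if s[0] >= cutoff), key=lambda s: -s[0])
    routine = [s for s in studies if s[0] < cutoff]
    ordered = flagged + routine
    return [i * read_minutes for i, (_, urgent) in enumerate(ordered) if urgent]

studies = [(0.60, True), (0.95, True), (0.30, False), (0.76, True), (0.80, False)]
for cutoff in (0.5, 0.7, 0.9):
    print(f"cutoff={cutoff:.1f}  urgent waits (min):",
          urgent_wait_times(studies, cutoff))
```

Even a toy replay like this surfaces behavior an ROC curve hides: moving the cutoff reshuffles which urgent cases wait longest, not just how many alerts fire.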
Framed this way, AI becomes a workflow tool rather than a diagnostic oracle. The question shifts from “How accurate is the model?” to “Does this cutoff get the right patients seen sooner without overloading readers?”
Final Thoughts
For radiology triage AI, the point where probability turns into action is where success is decided. By concentrating on this single lever, organizations can avoid many downstream problems that are mistakenly attributed to model quality. Andrew Ting, MD, emphasizes that the alert cutoff is not a minor configuration item but the operational core of triage AI. When it is carefully set, continuously monitored, and aligned with clinical realities, AI can improve radiology workflows rather than disrupt them.