Aug 2, 2020 — 12 min read
How cognitive biases affect diagnostic problems in radiology and how AI can help out
“An apple a day keeps the doctor away”: an aphorism often told to kids to entice them to eat more fruit. Although it may contain an element of truth, it also serves as a nice example of the rhyme-as-reason effect, a cognitive bias describing people’s tendency to perceive statements as more likely to be true when they are put in the form of a rhyme.
Biases in human reasoning affect all aspects of our lives and have been studied extensively. Daniel Kahneman, an Israeli psychologist, was one of the first and most prominent researchers in this field and received the Nobel Memorial Prize in Economics in 2002 for his work on cognitive biases and their effect on economic decision-making.
Apart from economics, cognitive biases also matter in medicine. For example, humans consistently overestimate the likelihood of rare diseases (a rare disease is referred to as a ‘zebra’ in American medical slang), causing overdiagnosis, which results in unnecessary stress for patients and an economic burden for clinics.
Shortcomings in human cognitive capacities can have even more serious consequences than overdiagnosis: human error in medicine has recently been identified as the third leading cause of death in the US. In radiology, errors stem not only from faults in reasoning, but also from faults in perception [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13].
Some important perceptual biases in radiology are:
Satisfaction of search bias This bias can cause errors when the clinician stops searching an image for pathologies once an initial abnormality has been identified. As a result, more dangerous diseases may be overlooked, or a differential diagnosis may fail for lack of information.
Confirmation bias Related to the satisfaction of search bias. In this case, however, the clinician has preconceived ideas about the disease that should be present and only looks for evidence confirming that hypothesis.
Prevalence effect The prevalence effect describes the fact that people are far more likely to miss rare events than common events, all other things being equal. An example of this is in airport security: bags containing weapons are luckily very rare, but officers tend to miss them when they do appear. However, when the same signal is presented in a setting where it occurs more frequently, they have no problem detecting it. This bias has also been identified as a potential cause of human error in screening settings for relatively rare diseases.
Inattentional blindness This phenomenon describes the failure to notice a completely obvious sign that is very different from what the person is trained or instructed to look for. A famous example of this bias at work is a study in which researchers inserted a picture of a gorilla into a slice of a chest CT scan; 83% of radiologists looking for lung nodules in the scan failed to see the obvious gorilla.
Biases causing misinterpretation in radiology (and medical diagnosis in general) are:
Anchoring bias Related to the satisfaction of search bias, except that the clinician keeps looking for new information but stays locked into an initial diagnosis in spite of that new evidence.
Automation bias The automation bias describes the behavior of people who rely too much on technology. For instance, if readers of radiological exams know that a system is helping them in the background, they may become complacent and trust the system blindly.
Zebra retreat Similar to the prevalence effect, this applies to the detection of rare diseases. In the case of the zebra retreat, the clinician did notice the abnormality but dismissed it as normal, because the particular disease is so rare. (This is the opposite of the zebra effect, the tendency to overestimate the probability of a rare disease.)
To prevent these biases, procedures such as checklists, decision trees and standardized reporting systems have been proposed. Computers, in the form of computer aided diagnosis (CAD), can also mitigate these effects. What better way to aid someone than to compensate for their mistakes? This post discusses some common setups in which radiologists interact with AI systems and postulates how errors resulting from perceptual and cognitive biases could be mitigated.
Barring a few exceptions, it will likely be some time before AI systems are allowed to read medical images completely autonomously. Until then, systems should complement radiologists and compensate for the mistakes humans make. The final performance is a function of the radiologist, the system and the interface between them. This paradigm is often referred to as ‘augmented intelligence’ or ‘complementary intelligence’ and was eloquently phrased by Gilbert et al.:
“[…] Instead, we must focus on promoting the model of the “centaur”, a highly trained human working together with an AI to achieve more than would be possible alone.”
At the moment there are roughly three different setups in which the computer interacts with a radiologist. In common parlance, all of these are referred to as ‘computer aided diagnosis’ (CAD), although this term also denotes one specific subtype of general CAD. The setups require different ‘levels’ of automation. Similar to the automation levels defined for self-driving cars, one could construct a (somewhat hand-wavy) hierarchy, with the doctor on one end and an autonomous AI system on the other. A depiction is given in figure 1.
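Such a hand-wavy hierarchy could be sketched as an ordered enumeration; the level names below are made up for illustration and loosely mirror the SAE levels for self-driving cars:

```python
from enum import IntEnum

class AutomationLevel(IntEnum):
    """Hypothetical automation hierarchy for radiology AI,
    loosely analogous to the SAE levels for self-driving cars."""
    HUMAN_ONLY = 0   # radiologist reads unaided
    DETECTION = 1    # CADe: system marks suspicious regions
    DIAGNOSIS = 2    # CADx: system scores regions or images
    TRIAGE = 3       # system orders or filters the worklist
    AUTONOMOUS = 4   # system reads without human oversight

# Higher levels shift more of the reading task to the machine.
assert AutomationLevel.DETECTION < AutomationLevel.AUTONOMOUS
```

The ordering makes the trade-off explicit: each step up delegates more of the reading task, and therefore more of the responsibility for error, to the machine.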
Computer aided detection (also referred to as CADe)
In this setting, the radiologist opens an image and queries the system, which subsequently shows markers or heat maps on suspicious areas. The US Food and Drug Administration (FDA) describes CADe systems as:
“CADe devices are computerized systems that incorporate pattern recognition and data analysis capabilities (i.e., combine values, measurements, or features extracted from the patient radiological data) and are intended to identify, mark, highlight, or in any other manner direct attention to portions of an image, or aspects of radiology device data, that may reveal abnormalities during interpretation of patient radiology images or patient radiology device data by the intended use […]”
Although promising, early systems were shown to be effective mostly for catching errors in search, in particular for small pathologies such as calcifications in mammograms, but simply distracting for findings that would have been spotted anyway. Because of poor standalone performance, these systems turned out to be ineffective in the clinic.
Figure 2. Example of computer aided detection systems. An input image, a mammogram in this case (left), is fed through a machine learning algorithm that adds markers (middle) or generates a heatmap (right) on suspicious areas. (image by author)
The reason for this lack of ‘augmented intelligence’ has been attributed to the automation bias, the tendency of humans to trust automated systems too much. This bias can be a blessing and a curse: any system operating below human performance will drag the radiologist down, while any system operating above it will lift them up.
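The marker step of a CADe pipeline, as in figure 2, can be sketched in a few lines; the per-pixel suspicion map and the 0.5 threshold are toy values, and a real system would learn the map with a detection model and post-process it far more carefully:

```python
import numpy as np

def cade_markers(prob_map: np.ndarray, threshold: float = 0.5):
    """Toy CADe post-processing: threshold a per-pixel suspicion
    map and return the coordinates of suspicious pixels as markers.
    A real system would run connected-component analysis and
    non-maximum suppression on a learned model's output."""
    ys, xs = np.where(prob_map >= threshold)
    return list(zip(ys.tolist(), xs.tolist()))

# A 3x3 'image' where one pixel looks suspicious.
prob_map = np.array([[0.1, 0.2, 0.1],
                     [0.1, 0.9, 0.1],
                     [0.1, 0.1, 0.1]])
print(cade_markers(prob_map))  # [(1, 1)]
```

The choice of threshold is exactly where the automation bias bites: set it low and the reader drowns in distracting prompts, set it high and the reader learns to stop searching wherever no marker appears.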
Computer aided diagnosis (also referred to as CADx)
CADx systems do not (only) mark suspicious areas in the image. Instead, the system provides additional information relevant to the diagnosis, such as a score for the image or for part of the image. The FDA describes CADx as:
“CADx devices are computerized systems intended to provide information beyond identifying, marking, highlighting, or in any other manner directing attention to portions of an image, or aspects of radiology device data, that may reveal abnormalities during interpretation of patient radiology images or patient radiology device data by the clinician.”
Some examples of CADx systems are:
A. Interactive decision support In this setting the radiologist queries a region in the image and the system shows a score representing the degree of suspicion of that region.
B. Content based image retrieval Content based image retrieval (CBIR) was first introduced in search engines to help people find similar content. In a medical context, the user queries an area in the image and the system shows a set of similar patches, for instance five positive and five negative cases that all look similar.
Although the idea has been discussed extensively [18, 19], to date few (if any) clinical applications exist. It is also difficult to say whether displaying similar images with their classes actually improves the reader’s performance over something like simple decision support.
Figure 4. Illustration of the use of a content based image retrieval (CBIR) system for computer aided diagnosis. A user could query the image and the system will look for similar regions along with their respective diagnosis.
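The core of such a CBIR system is a nearest-neighbor search in some learned feature space. A minimal sketch, where the 2-D embeddings and benign/malignant labels are made up for illustration:

```python
import numpy as np

def retrieve_similar(query_feat, db_feats, db_labels, k=4):
    """Toy CBIR: rank archived patches by Euclidean distance to the
    query's feature vector and return the k nearest along with their
    diagnoses, so the reader can compare look-alike cases."""
    dists = np.linalg.norm(db_feats - query_feat, axis=1)
    order = np.argsort(dists)[:k]
    return [(int(i), db_labels[i]) for i in order]

# Hypothetical 2-D embeddings of archived patches.
db_feats = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [5.0, 5.0]])
db_labels = ["benign", "malignant", "benign", "malignant"]
print(retrieve_similar(np.array([0.02, 0.0]), db_feats, db_labels, k=2))
# [(0, 'benign'), (2, 'benign')]
```

In practice the embeddings would come from a model trained on the imaging modality in question, and the quality of the retrieval stands or falls with how well distances in that space track diagnostic similarity.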
CADx systems will mostly target errors in interpretation (such as the anchoring bias and zebra retreat); search errors will largely remain unaffected, because the radiologist still has to search for suspicious areas. The automation bias may be less likely to gain a foothold, because the user typically has to query the system first.
Computer assisted triaging
Triaging systems rank patients based on urgency by estimating an outcome such as their condition or probability of recovery. The idea of using computers for triaging is sometimes referred to as computer assisted simple triage (CAST). This was initially proposed for emergency room settings, but has recently caught on in other domains [21, 22]. The FDA describes computer aided triaging systems as:
“Computer-triage devices are computerized systems intended to, in any way, reduce or eliminate any aspect of clinical care currently provided by a clinician, such as a device for which the output indicates that a subset of patients (i.e., one or more patients in the target population) are normal and therefore do not require interpretation of their radiological data by a clinician.”
At the moment there are roughly two different settings:
A. Soft triaging Here, all cases are ordered by estimated urgency and presented to the doctor in that order. This allows the clinician to focus on the most pressing cases first. Figure 5 shows an illustration of such a system.
In a soft triaging setting, essentially all the cognitive biases that are present on a case level still apply. However, radiologists may be less likely to miss essential information due to fatigue, since they can schedule the urgent cases at times when they feel most rested, provided the diagnostic task allows them to.
If implemented well, the algorithms are expected to generate a better triage than humans would, so on a case-list level, biases stemming from misinterpretation are expected to be mitigated.
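Soft triaging amounts to nothing more than sorting the worklist by the model's urgency score; the case IDs and scores below are invented for illustration:

```python
def soft_triage(cases):
    """Toy soft triage: sort the worklist by the model's urgency
    score (descending) so the most pressing cases are read first.
    'cases' is a list of (case_id, urgency) pairs."""
    return sorted(cases, key=lambda c: c[1], reverse=True)

worklist = [("A", 0.20), ("B", 0.95), ("C", 0.55)]
print(soft_triage(worklist))  # [('B', 0.95), ('C', 0.55), ('A', 0.20)]
```

Note that every case still reaches a radiologist; only the reading order changes, which is why the case-level biases survive intact.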
B. Hard triaging (or rule-out systems) Similar to the soft triaging approach, the cases are ordered, but the bottom x% is no longer presented to a doctor, to free up time. This is particularly useful in low-incidence settings such as screening, where large proportions of cases could be diagnosed automatically. Figure 6 displays an illustration of this setup.
Again, similar to soft triaging, cognitive biases that apply on a case level are still present for all the cases that are shown to the radiologists. One could argue that the automation effect is eliminated for the cases not shown to radiologists, or that it is simply the extreme case of the automation effect: those cases follow the diagnosis of the system completely. An advantage, however, is that the system is easier to evaluate for this subset, because user interactions do not have to be taken into account.
Errors caused by fatigue are likely to be mitigated, as a lot of reading time is freed up (unless this time is again allocated to different tasks). The prevalence effect, the phenomenon whereby readers are more likely to miss signals in a low-incidence setting, is also likely to be mitigated, as the remaining cases will have a higher incidence.
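The rule-out mechanism is the same sort as in soft triaging plus a cut-off; the 30% rule-out fraction and the scores here are made-up illustration values, and a deployed system would set the threshold from a validated operating point, not an arbitrary fraction:

```python
def hard_triage(cases, rule_out_fraction=0.3):
    """Toy hard triage: sort cases by urgency, then drop the bottom
    fraction from the radiologist's worklist; those cases follow the
    system's 'normal' call. Returns (to_read, ruled_out)."""
    ranked = sorted(cases, key=lambda c: c[1], reverse=True)
    n_out = int(len(ranked) * rule_out_fraction)
    cut = len(ranked) - n_out
    return ranked[:cut], ranked[cut:]

worklist = [("A", 0.20), ("B", 0.95), ("C", 0.55), ("D", 0.05),
            ("E", 0.70), ("F", 0.10), ("G", 0.40), ("H", 0.02),
            ("I", 0.60), ("J", 0.30)]
to_read, ruled_out = hard_triage(worklist, rule_out_fraction=0.3)
print([c for c, _ in ruled_out])  # ['F', 'D', 'H']
```

The ruled-out tail also makes the prevalence shift concrete: removing the lowest-scoring cases raises the incidence among what remains on the radiologist's list.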
In 2018, the FDA approved the first ever autonomous AI system, a system used in screening for diabetic retinopathy. For most applications in medical image analysis it may be some time before something similar is realized, as development takes years and regulation is strict. Some intermediate steps could be taken in the meantime, though.
In some screening settings, such as lung and breast cancer screening, exams are sometimes read by two radiologists, typically independently. One of the two could be replaced by a system operating autonomously. In this case the ‘augmented intelligence’ component is still there: concepts from ensemble learning apply, and the setup is somewhat simpler to analyze. An illustration of this setup is provided in figure 7.
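Combining the two independent reads is where the ensemble-learning view becomes concrete. A minimal sketch, assuming the common screening convention of recalling a case if either reader flags it (the function and rule names are illustrative, not from any particular protocol):

```python
def double_read(human_call: bool, ai_call: bool, rule: str = "or") -> bool:
    """Toy double-reading combination: two independent reads are
    combined with an OR rule (recall if either reader flags the case)
    or an AND rule, which trades sensitivity for specificity. Here
    one 'reader' is the autonomous AI system."""
    if rule == "or":
        return human_call or ai_call
    if rule == "and":
        return human_call and ai_call
    raise ValueError(f"unknown rule: {rule}")

print(double_read(True, False, "or"))   # True  -> case is recalled
print(double_read(True, False, "and"))  # False -> case is not recalled
```

Because the combination rule is fixed and the AI reader is deterministic, the behavior of this half-automated pair is much easier to evaluate than a free-form human-AI interaction.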
With a completely autonomous AI system, all cognitive biases are alleviated. That does not mean the system is unbiased, though: if it was trained on biased data, you have a different problem. Biases in the data, such as the center where the data was collected, the manufacturer of the scanner and the habits of the annotator, will still be reflected in the output.
To summarize, computers are powerful tools with great potential for medical diagnosis. Until systems can operate independently, they should help radiologists and compensate for the errors radiologists make. Carefully analyzing radiologists’ errors, for instance by looking at cognitive biases, could help boost the joint performance of the radiologist and the system.
Kundel, H.L., Nodine, C.F. and Carmody, D., 1978. Visual scanning, pattern recognition and decision-making in pulmonary nodule detection. Investigative radiology, 13(3), pp.175–181.
Pinto, A. and Brunese, L., 2010. Spectrum of diagnostic errors in radiology. World journal of radiology, 2(10), p.377.
Kim, Y.W. and Mansfield, L.T., 2014. Fool me twice: delayed diagnoses in radiology with emphasis on perpetuated errors. American journal of roentgenology, 202(3), pp.465–470.
Bruno, M.A., Walker, E.A. and Abujudeh, H.H., 2015. Understanding and confronting our mistakes: the epidemiology of error in radiology and strategies for error reduction. Radiographics, 35(6), pp.1668–1676.
Berbaum, K.S., Franken Jr, E.A., Dorfman, D.D., Rooholamini, S.A., Kathol, M.H., Barloon, T.J., Behlke, F.M., Sato, Y., Lu, C.H., El-Khoury, G.Y. and Flickinger, F.W., 1990. Satisfaction of search in diagnostic radiology. Investigative radiology, 25(2), pp.133–140.
Akgül, C.B., Rubin, D.L., Napel, S., Beaulieu, C.F., Greenspan, H. and Acar, B., 2011. Content-based image retrieval in radiology: current status and future directions. Journal of digital imaging, 24(2), pp.208–222.
Graber, M., 2005. Diagnostic errors in medicine: a case of neglect. The Joint Commission Journal on Quality and Patient Safety, 31(2), pp.106–113.
Busby, L.P., Courtier, J.L. and Glastonbury, C.M., 2018. Bias in radiology: the how and why of misses and misinterpretations. Radiographics, 38(1), pp.236–247.
Bornstein, B.H. and Emler, A.C., 2001. Rationality in medical decision making: a review of the literature on doctors’ decision‐making biases. Journal of evaluation in clinical practice, 7(2), pp.97–107.
Saposnik, G., Redelmeier, D., Ruff, C.C. and Tobler, P.N., 2016. Cognitive biases associated with medical decisions: a systematic review. BMC medical informatics and decision making, 16(1), p.138.
Drew, T., Võ, M.L.H. and Wolfe, J.M., 2013. The invisible gorilla strikes again: Sustained inattentional blindness in expert observers. Psychological science, 24(9), pp.1848–1853.
Evans, K.K., Birdwell, R.L. and Wolfe, J.M., 2013. If you don’t find it often, you often don’t find it: why some cancers are missed in breast cancer screening. PloS one, 8(5).
Gilbert, F.J., Smye, S.W. and Schönlieb, C.B., 2020. Artificial intelligence in clinical imaging: a health system approach. Clinical radiology, 75(1), pp.3–6.
SAE On-Road Automated Vehicle Standards Committee, 2018. Taxonomy and definitions for terms related to driving automation systems for on-road motor vehicles. SAE International: Warrendale, PA, USA.
FDA, U. (2012). Guidance for Industry and Food and Drug Administration Staff: Computer-Assisted Detection Devices Applied to Radiology Images and Radiology Device Data — Premarket Notification [510 (k)] Submissions.
Hupse, R., Samulski, M., Lobbes, M.B., Mann, R.M., Mus, R., den Heeten, G.J., Beijerinck, D., Pijnappel, R.M., Boetes, C. and Karssemeijer, N., 2013. Computer-aided detection of masses at mammography: interactive decision support versus prompts. Radiology, 266(1), pp.123–129.
Cai, C.J., Reif, E., Hegde, N., Hipp, J., Kim, B., Smilkov, D., Wattenberg, M., Viegas, F., Corrado, G.S., Stumpe, M.C. and Terry, M., 2019, May. Human-centered tools for coping with imperfect algorithms during medical decision-making. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (pp. 1–14).
Akgül, C.B., Rubin, D.L., Napel, S., Beaulieu, C.F., Greenspan, H. and Acar, B., 2011. Content-based image retrieval in radiology: current status and future directions. Journal of digital imaging, 24(2), pp.208–222.
Goldenberg, R. and Peled, N., 2011. Computer-aided simple triage. International journal of computer assisted radiology and surgery, 6(5), p.705.
Yala, A., Schuster, T., Miles, R., Barzilay, R. and Lehman, C., 2019. A deep learning model to triage screening mammograms: a simulation study. Radiology, 293(1), pp.38–46.
Annarumma, M., Withey, S.J., Bakewell, R.J., Pesce, E., Goh, V. and Montana, G., 2019. Automated triaging of adult chest radiographs with deep artificial neural networks. Radiology, 291(1), pp.196–202.
Abràmoff, M.D., Lavin, P.T., Birch, M., Shah, N. and Folk, J.C., 2018. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ digital medicine, 1(1), pp.1–8.
(Docket No. FDA‐2019‐N‐5592) “Public Workshop ‐ Evolving Role of Artificial Intelligence in Radiological Imaging;” Comments of the American College of Radiology, 2020