
Using artificial intelligence to read chest radiographs for tuberculosis detection: A multi-site evaluation of the diagnostic accuracy of three deep learning systems

Zhi Zhen Qin et al. — Scientific Reports


Deep learning (DL) neural networks have only recently been employed to interpret chest radiographs (CXR) to screen and triage people for pulmonary tuberculosis (TB). No published studies have compared multiple DL systems across multiple populations. We conducted a retrospective evaluation of three DL systems (CAD4TB, Lunit INSIGHT, and qXR) for detecting TB-associated abnormalities in chest radiographs from outpatients in Nepal and Cameroon. All 1196 individuals received an Xpert MTB/RIF assay and a CXR read by two groups of radiologists and by the DL systems. Xpert was used as the reference standard. The areas under the curve (AUC) of the three systems were similar: Lunit (0.94, 95% CI: 0.93–0.96), qXR (0.94, 95% CI: 0.92–0.97) and CAD4TB (0.92, 95% CI: 0.90–0.95). When matched to the sensitivity of the radiologists, the DL systems had significantly higher specificities, with one exception. Using DL systems to read CXRs could reduce the number of Xpert MTB/RIF tests needed by 66% while maintaining sensitivity at 95% or better. Using a universal cutoff score resulted in different performance at each site, highlighting the need to select cutoff scores based on the population screened. These DL systems should be considered by TB programs where human resources are constrained and automated technology is available.
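
The accuracy measures reported above (AUC, and specificity at a fixed sensitivity) can be illustrated with a short sketch. It assumes, for illustration only, that each DL system outputs one continuous abnormality score per radiograph and that the Xpert result is coded 1 (positive) or 0 (negative). The function names and toy scores are hypothetical and are not taken from CAD4TB, Lunit INSIGHT, qXR, or the study data. AUC is computed with the Mann-Whitney formulation (the probability that a randomly chosen Xpert-positive case scores higher than a randomly chosen Xpert-negative one), and no confidence intervals are computed.

```python
from typing import List, Tuple


def auc_mann_whitney(scores: List[float], labels: List[int]) -> float:
    """AUC as the probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sens_spec(scores: List[float], labels: List[int], cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when a score >= cutoff is called abnormal."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def specificity_at_sensitivity(
    scores: List[float], labels: List[int], target_sens: float
) -> Tuple[float, float, float]:
    """Highest cutoff whose sensitivity reaches target_sens, returned as (cutoff, sensitivity, specificity)."""
    for cutoff in sorted(set(scores), reverse=True):
        sens, spec = sens_spec(scores, labels, cutoff)
        if sens >= target_sens:
            return cutoff, sens, spec
    raise ValueError("target sensitivity cannot be reached")


# Toy example with made-up scores (1 = Xpert-positive, 0 = Xpert-negative)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(auc_mann_whitney(scores, labels))                  # 0.75
print(specificity_at_sensitivity(scores, labels, 0.75))  # (0.6, 0.75, 0.75)
```

Because sensitivity can only fall as the cutoff rises, scanning cutoffs from highest to lowest and stopping at the first one that reaches the target gives the best specificity available at that sensitivity. Matching the radiologists would mean passing their observed sensitivity as `target_sens`.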


It is almost impossible to talk about the future of medicine without stumbling upon two letters that bring many hopes, fears, and confusions to the topic. Artificial intelligence (AI) is not new, but it has gained traction in healthcare in the last decade, due in part to advances in deep learning neural networks. Neural networks are sets of algorithms, organized in nodes and layers, that loosely mimic human cognitive functions and are designed to infer rules for recognizing patterns automatically1,2. After being trained on labeled datasets, neural networks help us cluster and classify images, sound, text, and time series1. Deep-learning networks are distinguished from earlier neural networks by having more than one hidden layer1: each layer learns a distinct set of features and combines the outputs of the previous layers, so that successive layers represent increasingly complex features. This enables complex tasks such as reading medical images and autonomous driving1,3.
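
To make the idea of multiple hidden layers concrete, the sketch below runs a forward pass through a tiny fully connected network with two hidden layers. It is a toy: the weights are arbitrary hand-picked numbers rather than trained values, the input is two numbers instead of an image, and it is not meant to describe the architecture of any of the products evaluated here. Each hidden layer computes a weighted sum plus bias followed by a ReLU activation, and the final layer squashes its output into a 0 to 1 score with a sigmoid.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    # Each output node is a weighted sum of every input plus its own bias
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Vector) -> Vector:
    # Non-linearity applied after each hidden layer
    return [max(0.0, z) for z in v]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x: Vector, layers: List[Tuple[Matrix, Vector]]) -> float:
    """All layers except the last are hidden layers; the last maps to a single 0-1 score."""
    h = x
    for weights, bias in layers[:-1]:
        h = relu(dense(h, weights, bias))  # output of one layer feeds the next
    out_w, out_b = layers[-1]
    return sigmoid(dense(h, out_w, out_b)[0])


# Hypothetical hand-picked weights: two hidden layers (3 and 2 nodes), one output node
layers = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]], [0.0, 0.0, 0.0]),
    ([[1.0, 1.0, 1.0], [-1.0, 0.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [-3.0]),
]
print(forward([1.0, 2.0], layers))  # 0.5
```

In a trained network these weights would be learned from labeled examples, and image models typically use convolutional layers rather than the plain weighted sums shown here.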

Deep neural networks provide opportunities for new solutions to tackle tuberculosis (TB), which kills more people worldwide than any other single infectious disease4. A major reason for this high mortality is the persistent gap in detection: more than one third of the estimated 10 million incident TB cases each year are not diagnosed or not reported4. Chest x-ray (CXR) has historically been used in TB detection, first for mass screening5 and, more recently, for prevalence surveys and active case-finding interventions6,7. The World Health Organization (WHO) recommends it as a triage test prior to the use of Xpert MTB/RIF8. However, CXR is of limited use for TB diagnosis because of its modest specificity (many diseases present with similar radiologic patterns9,10), its high inter- and intra-reader variability and poor reproducibility11,12, and the paucity of skilled radiologists in many high TB burden countries12.
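
The triage role described above can be sketched as a simple rule: only people whose CXR abnormality score reaches a cutoff are referred for Xpert MTB/RIF. The sketch below uses hypothetical names and made-up data; it reports how many people would be referred, what share of Xpert tests would be avoided compared with testing everyone, and what share of Xpert-positive people would still be referred. It assumes, as in a retrospective evaluation where everyone received Xpert, that the Xpert result is known for every person; in routine triage, people below the cutoff would not be tested.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TriageResult:
    referred: int           # people whose score reaches the cutoff and who go on to Xpert
    tests_saved_pct: float  # % of Xpert tests avoided versus testing everyone
    sensitivity: float      # share of Xpert-positive people who were referred


def triage(scores: List[float], xpert_positive: List[bool], cutoff: float) -> TriageResult:
    """Refer for Xpert only when the CXR abnormality score is at or above the cutoff."""
    n = len(scores)
    n_positive = sum(xpert_positive)
    if n == 0 or n_positive == 0:
        raise ValueError("need at least one person and one Xpert-positive case")
    referred = [s >= cutoff for s in scores]
    caught = sum(1 for r, p in zip(referred, xpert_positive) if r and p)
    n_referred = sum(referred)
    return TriageResult(
        referred=n_referred,
        tests_saved_pct=100.0 * (n - n_referred) / n,
        sensitivity=caught / n_positive,
    )


# Made-up data for ten people
scores = [0.95, 0.90, 0.85, 0.60, 0.40, 0.30, 0.20, 0.15, 0.10, 0.05]
xpert = [True, True, False, True, False, False, False, False, False, False]
print(triage(scores, xpert, cutoff=0.5))
# TriageResult(referred=4, tests_saved_pct=60.0, sensitivity=1.0)
```

Because score distributions differ between populations, a cutoff that gives high sensitivity at one site may give lower sensitivity or fewer saved tests at another, which is why the cutoff would be chosen separately for each screened population.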

Several deep-learning (DL) systems have been developed in recent years to analyze digital chest radiographs for TB-related abnormalities. These systems could potentially address current shortcomings, for example by reducing inter-reader variability, improving reproducibility, and supplying radiologic services where radiologists are not available. However, current evidence is limited to a single product, CAD4TB (Delft Imaging Systems, Netherlands)6,13,14, which has been evaluated only in its non-DL versions, as DL was introduced only in the current version (version 6). No peer-reviewed evaluations of the performance of any DL system for detecting TB abnormalities exist, nor do any studies compare multiple DL systems with human readers. WHO has not made a recommendation on the use of automated reading systems for TB because of the current lack of evidence8. To fill this evidence gap, we compared the performance of three different DL applications in detecting bacteriologically confirmed TB with that of radiologists experienced in detecting TB, using datasets from two countries.


Zhi Zhen Qin, Melissa S. Sander, Bishwa Rai, Collins N. Titahong, Santat Sudrungrot, Sylvain N. Laah, Lal Mani Adhikari, E. Jane Carter, Lekha Puri, Andrew J. Codlin & Jacob Creswell

Scientific Reports
