Publications
Deep-learning algorithms for the interpretation of chest radiographs to aid in the triage of COVID-19 patients: A multicenter retrospective study
Se Bum Jang et al., PLOS ONE
The recent medical applications of deep-learning (DL) algorithms have demonstrated their clinical efficacy in improving the speed and accuracy of image interpretation. If a DL algorithm achieves a performance equivalent to that of physicians in chest radiography (CR) diagnosis of Coronavirus disease 2019 (COVID-19) pneumonia, automatic interpretation of CRs with DL algorithms could significantly reduce the burden on clinicians and radiologists during sudden surges of suspected COVID-19 patients. The aim of this study was to evaluate the efficacy of a DL algorithm for detecting COVID-19 pneumonia on CRs compared with formal radiology reports. This is a retrospective study of adult patients who were diagnosed as COVID-19 positive by reverse transcription polymerase chain reaction among all patients admitted to five emergency departments and one community treatment center in Korea from February 18, 2020 to May 1, 2020. The CR images were evaluated with a publicly available DL algorithm. For the reference standard, CR images from patients without chest computed tomography (CT) scans were classified as positive for COVID-19 pneumonia when a radiologist identified ground-glass opacity, consolidation, or other infiltration on retrospective review; patients with evidence of pneumonia on chest CT scans were also classified as positive for COVID-19 pneumonia. The overall sensitivity and specificity of the DL algorithm for detecting COVID-19 pneumonia on CRs were 95.6% and 88.7%, respectively. The area under the curve of the DL algorithm for the detection of COVID-19 pneumonia was 0.921. The DL algorithm demonstrated a satisfactory diagnostic performance, comparable with that of formal radiology reports, in the CR-based diagnosis of pneumonia in COVID-19 patients. The DL algorithm may offer fast and reliable examinations that can facilitate patient screening and isolation decisions, thereby reducing the medical staff workload during COVID-19 pandemic situations.
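For readers who want to reproduce this style of evaluation, here is a minimal Python sketch of how per-image sensitivity, specificity, and AUC are derived from model outputs. The scores, labels, and 0.5 operating point are hypothetical placeholders, not the paper's data or cutoff.

    import numpy as np
    from sklearn.metrics import roc_auc_score, confusion_matrix

    # Hypothetical inputs: model probability scores and reference labels
    # (1 = COVID-19 pneumonia positive per the reference standard).
    scores = np.array([0.91, 0.62, 0.78, 0.05, 0.33, 0.12])
    labels = np.array([1, 0, 1, 0, 1, 0])
    threshold = 0.5  # assumed operating point for illustration

    preds = (scores >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(labels, preds).ravel()
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    auc = roc_auc_score(labels, scores)
    print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} AUC={auc:.3f}")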
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0242759
Validation of a Deep Learning Algorithm for the Detection of Malignant Pulmonary Nodules in Chest Radiographs
Hyunsuk Yoo et al., JAMA Network Open
Importance The improvement of pulmonary nodule detection, which is a challenging task when using chest radiographs, may help to elevate the role of chest radiographs for the diagnosis of lung cancer.
Objective To assess the performance of a deep learning–based nodule detection algorithm for the detection of lung cancer on chest radiographs from participants in the National Lung Screening Trial (NLST).
Design, Setting, and Participants This diagnostic study used data from participants in the NLST to assess the performance of a deep learning–based artificial intelligence (AI) algorithm for the detection of pulmonary nodules and lung cancer on chest radiographs using separate training (in-house) and validation (NLST) data sets. Baseline (T0) posteroanterior chest radiographs from 5485 participants (full T0 data set) were used to assess lung cancer detection performance, and a subset of 577 of these images (nodule data set) were used to assess nodule detection performance. Participants aged 55 to 74 years who currently or formerly (ie, quit within the past 15 years) smoked cigarettes for 30 pack-years or more were enrolled in the NLST at 23 US centers between August 2002 and April 2004. Information on lung cancer diagnoses was collected through December 31, 2009. Analyses were performed between August 20, 2019, and February 14, 2020.
Exposures Abnormality scores produced by the AI algorithm.
Main Outcomes and Measures The performance of an AI algorithm for the detection of lung nodules and lung cancer on radiographs, with lung cancer incidence and mortality as primary end points.
Results A total of 5485 participants (mean [SD] age, 61.7 [5.0] years; 3030 men [55.2%]) were included, with a median follow-up duration of 6.5 years (interquartile range, 6.1-6.9 years). For the nodule data set, the sensitivity and specificity of the AI algorithm for the detection of pulmonary nodules were 86.2% (95% CI, 77.8%-94.6%) and 85.0% (95% CI, 81.9%-88.1%), respectively. For the detection of all cancers, the sensitivity was 75.0% (95% CI, 62.8%-87.2%), the specificity was 83.3% (95% CI, 82.3%-84.3%), the positive predictive value was 3.8% (95% CI, 2.6%-5.0%), and the negative predictive value was 99.8% (95% CI, 99.6%-99.9%). For the detection of malignant pulmonary nodules in all images of the full T0 data set, the sensitivity was 94.1% (95% CI, 86.2%-100.0%), the specificity was 83.3% (95% CI, 82.3%-84.3%), the positive predictive value was 3.4% (95% CI, 2.2%-4.5%), and the negative predictive value was 100.0% (95% CI, 99.9%-100.0%). In digital radiographs of the nodule data set, the AI algorithm had higher sensitivity (96.0% [95% CI, 88.3%-100.0%] vs 88.0% [95% CI, 75.3%-100.0%]; P = .32) and higher specificity (93.2% [95% CI, 89.9%-96.5%] vs 82.8% [95% CI, 77.8%-87.8%]; P = .001) for nodule detection compared with the NLST radiologists. For malignant pulmonary nodule detection on digital radiographs of the full T0 data set, the sensitivity of the AI algorithm was higher (100.0% [95% CI, 100.0%-100.0%] vs 94.1% [95% CI, 82.9%-100.0%]; P = .32) compared with the NLST radiologists, and the specificity (90.9% [95% CI, 89.6%-92.1%] vs 91.0% [95% CI, 89.7%-92.2%]; P = .91), positive predictive value (8.2% [95% CI, 4.4%-11.9%] vs 7.8% [95% CI, 4.1%-11.5%]; P = .65), and negative predictive value (100.0% [95% CI, 100.0%-100.0%] vs 99.9% [95% CI, 99.8%-100.0%]; P = .32) were similar to those of NLST radiologists.
Conclusions and Relevance In this study, the AI algorithm performed better than NLST radiologists for the detection of pulmonary nodules on digital radiographs. When used as a second reader, the AI algorithm may help to detect lung cancer.
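The contrast in the results above between high sensitivity and a positive predictive value of only a few percent is a direct consequence of Bayes' rule at screening prevalence. A small worked example in Python; the 1% prevalence figure is an assumption for illustration, not the study's exact rate.

    def ppv_npv(sens, spec, prevalence):
        """Positive and negative predictive value from Bayes' rule."""
        tp = sens * prevalence
        fp = (1 - spec) * (1 - prevalence)
        fn = (1 - sens) * prevalence
        tn = spec * (1 - prevalence)
        return tp / (tp + fp), tn / (tn + fn)

    # Illustrative numbers close to the abstract: sensitivity 75%,
    # specificity 83.3%, and an assumed ~1% cancer prevalence.
    ppv, npv = ppv_npv(0.75, 0.833, 0.01)
    print(f"PPV={ppv:.3f}  NPV={npv:.4f}")  # roughly 0.043 and 0.997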
https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2770952
Effect of artificial intelligence-based triaging of breast cancer screening mammograms on cancer detection and radiologist workload: a retrospective simulation study
Mattie Salim et al., Lancet Digital Health
Background
We examined the potential change in cancer detection when using artificial intelligence (AI) cancer-detection software to triage certain screening examinations into a no-radiologist work stream and, after regular radiologist assessment of the remainder, to triage certain examinations into an enhanced assessment work stream. The purpose of enhanced assessment was to simulate the selection of women for more sensitive screening, promoting early detection of cancers that would otherwise be diagnosed as interval cancers or as next-round screen-detected cancers. The aim of the study was to examine how AI could reduce radiologist workload and increase cancer detection.
Methods
In this retrospective simulation study, all women diagnosed with breast cancer who attended two consecutive screening rounds were included. Healthy women were randomly sampled from the same cohort; their observations were given elevated weight to mimic a frequency of 0·7% incident cancer per screening interval. Based on the prediction score from a commercially available AI cancer detector, various cutoff points for the decision to channel women to the two new work streams were examined in terms of missed and additionally detected cancer.
Findings
7364 women were included in the study sample: 547 were diagnosed with breast cancer and 6817 were healthy controls. When including 60%, 70%, or 80% of women with the lowest AI scores in the no radiologist stream, the proportion of screen-detected cancers that would have been missed were 0, 0·3% (95% CI 0·0–4·3), or 2·6% (1·1–5·4), respectively. When including 1% or 5% of women with the highest AI scores in the enhanced assessment stream, the potential additional cancer detection was 24 (12%) or 53 (27%) of 200 subsequent interval cancers, respectively, and 48 (14%) or 121 (35%) of 347 next-round screen-detected cancers, respectively.
Interpretation
Using a commercial AI cancer detector to triage mammograms into no radiologist assessment and enhanced assessment could potentially reduce radiologist workload by more than half, and pre-emptively detect a substantial proportion of cancers otherwise diagnosed later.
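A minimal sketch of the triage logic described above, assuming only an array of AI prediction scores. The percentile cutoffs mirror the abstract's 60% no-radiologist and 5% enhanced-assessment scenarios, though in the study the enhanced stream was selected after regular assessment of the remainder; this simplified version routes on scores alone.

    import numpy as np

    # Hypothetical AI scores for one screening round (higher = more suspicious).
    rng = np.random.default_rng(0)
    scores = rng.random(10_000)

    no_radiologist_cut = np.quantile(scores, 0.60)  # bottom 60%: no human read
    enhanced_cut = np.quantile(scores, 0.95)        # top 5%: enhanced assessment

    no_rad_stream = scores <= no_radiologist_cut
    enhanced_stream = scores >= enhanced_cut
    regular_stream = ~no_rad_stream & ~enhanced_stream

    print(f"no-radiologist: {no_rad_stream.mean():.0%}, "
          f"regular double reading: {regular_stream.mean():.0%}, "
          f"enhanced: {enhanced_stream.mean():.0%}")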
https://www.thelancet.com/journals/landig/article/PIIS2589-7500(20)30185-0/fulltext
External Evaluation of 3 Commercial Artificial Intelligence Algorithms for Independent Assessment of Screening Mammograms
Mattie Salim et al., JAMA Oncology
Importance
A computer algorithm that performs at or above the level of radiologists in mammography screening assessment could improve the effectiveness of breast cancer screening.
Objective
To perform an external evaluation of 3 commercially available artificial intelligence (AI) computer-aided detection algorithms as independent mammography readers and to assess the screening performance when combined with radiologists.
Design, Setting, and Participants
This retrospective case-control study was based on a double-reader population-based mammography screening cohort of women screened at an academic hospital in Stockholm, Sweden, from 2008 to 2015. The study included 8805 women aged 40 to 74 years who underwent mammography screening and who did not have implants or prior breast cancer. The study sample included 739 women who were diagnosed as having breast cancer (positive) and a random sample of 8066 healthy controls (negative for breast cancer).
Main Outcomes and Measures
Positive follow-up findings were determined by pathology-verified diagnosis at screening or within 12 months thereafter. Negative follow-up findings were determined by a 2-year cancer-free follow-up. Three AI computer-aided detection algorithms (AI-1, AI-2, and AI-3), sourced from different vendors, yielded a continuous score for the suspicion of cancer in each mammography examination. For a decision of normal or abnormal, the cut point was defined by the mean specificity of the first-reader radiologists (96.6%).
Results
The median age of study participants was 60 years (interquartile range, 50-66 years) for 739 women who received a diagnosis of breast cancer and 54 years (interquartile range, 47-63 years) for 8066 healthy controls. The cases positive for cancer comprised 618 (84%) screen detected and 121 (16%) clinically detected within 12 months of the screening examination. The area under the receiver operating curve for cancer detection was 0.956 (95% CI, 0.948-0.965) for AI-1, 0.922 (95% CI, 0.910-0.934) for AI-2, and 0.920 (95% CI, 0.909-0.931) for AI-3. At the specificity of the radiologists, the sensitivities were 81.9% for AI-1, 67.0% for AI-2, 67.4% for AI-3, 77.4% for first-reader radiologist, and 80.1% for second-reader radiologist. Combining AI-1 with first-reader radiologists achieved 88.6% sensitivity at 93.0% specificity (abnormal defined by either of the 2 making an abnormal assessment). No other examined combination of AI algorithms and radiologists surpassed this sensitivity level.
Conclusions and Relevance
To our knowledge, this study is the first independent evaluation of several AI computer-aided detection algorithms for screening mammography. The results of this study indicated that a commercially available AI computer-aided detection algorithm can assess screening mammograms with a sufficient diagnostic performance to be further evaluated as an independent reader in prospective clinical trials. Combining the first readers with the best algorithm identified more cases positive for cancer than combining the first readers with second readers.
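A sketch of the two operations the study describes: fixing the algorithm's cut point at the first readers' mean specificity (96.6%) and declaring an examination abnormal if either the AI or the first reader flags it. All data and the cut_at_specificity helper are simulated placeholders.

    import numpy as np

    def cut_at_specificity(scores, labels, target_spec):
        """Pick the score cutoff whose specificity on cancer-negative
        exams matches target_spec (here, the first readers' 96.6%)."""
        neg = scores[labels == 0]
        return np.quantile(neg, target_spec)

    # Simulated placeholders: continuous AI scores, cancer labels,
    # and first-reader assessments.
    rng = np.random.default_rng(1)
    labels = (rng.random(5000) < 0.08).astype(int)
    scores = rng.normal(loc=2.0 * labels, scale=1.0)
    first_reader_abnormal = rng.random(5000) < 0.05

    cut = cut_at_specificity(scores, labels, 0.966)
    ai_abnormal = scores >= cut
    combined = ai_abnormal | first_reader_abnormal  # abnormal if either flags
    print(f"sensitivity={combined[labels == 1].mean():.3f}, "
          f"specificity={1 - combined[labels == 0].mean():.3f}")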
https://jamanetwork.com/journals/jamaoncology/article-abstract/2769894
Deep learning–based automated detection algorithm for active pulmonary tuberculosis on chest radiographs: diagnostic performance in systematic screening of asymptomatic individuals
Jong Hyuk Lee et al., European Radiology
Objectives
The performance of deep learning–based automated detection (DLAD) algorithms in systematic screening for active pulmonary tuberculosis is unknown. We aimed to validate a DLAD algorithm for the detection of active pulmonary tuberculosis and of any radiologically identifiable relevant abnormality on chest radiographs (CRs) in this setting.
Methods
We performed out-of-sample testing of a pre-trained DLAD algorithm, using CRs from 19,686 asymptomatic individuals (mean age, 21.3 ± 1.9 years) obtained as part of systematic screening for tuberculosis between January 2013 and July 2018. Areas under the receiver operating characteristic curve (AUC) for the diagnosis of tuberculosis and of any relevant abnormality were measured. Accuracy measures, including sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), were calculated at pre-defined operating thresholds (high-sensitivity threshold, 0.16; high-specificity threshold, 0.46).
Results
All five CRs from four individuals with active pulmonary tuberculosis were correctly classified as having abnormal findings by DLAD, with specificities of 0.959 and 0.997, PPVs of 0.006 and 0.068, and NPVs of 1.000 at the high-sensitivity and high-specificity thresholds, respectively. At the high-specificity threshold, DLAD showed diagnostic measures comparable with those of the pooled radiologists (p values > 0.05). For the radiologically identifiable relevant abnormality (n = 28), DLAD showed an AUC value of 0.967 (95% confidence interval, 0.938–0.996), with sensitivities of 0.821 and 0.679, specificities of 0.960 and 0.997, PPVs of 0.028 and 0.257, and NPVs of 0.999 at the high-sensitivity and high-specificity thresholds, respectively.
Conclusions
In systematic screening for tuberculosis in a low-prevalence setting, the DLAD algorithm demonstrated excellent diagnostic performance, comparable with that of the radiologists, in the detection of active pulmonary tuberculosis.
https://link.springer.com/article/10.1007/s00330-020-07219-4
Performance of a Deep Learning Algorithm Compared with Radiologic Interpretation for Lung Cancer Detection on Chest Radiographs in a Health Screening Population
Jong Hyuk Lee et al., Radiology
Abstract
A deep learning algorithm detected lung cancer nodules on chest radiographs with a performance comparable to that of radiologists, which may be helpful for radiologists reading radiographs from healthy populations with a low prevalence of lung cancer.
Background
The performance of a deep learning algorithm for lung cancer detection on chest radiographs in a health screening population is unknown.
Purpose
To validate a commercially available deep learning algorithm for lung cancer detection on chest radiographs in a health screening population.
Materials and Methods
Out-of-sample testing of a deep learning algorithm was retrospectively performed using chest radiographs from individuals undergoing a comprehensive medical check-up between July 2008 and December 2008 (validation test). To evaluate the algorithm performance for visible lung cancer detection, the area under the receiver operating characteristic curve (AUC) and diagnostic measures, including sensitivity and false-positive rate (FPR), were calculated. The algorithm performance was compared with that of radiologists using the McNemar test and the Moskowitz method. Additionally, the deep learning algorithm was applied to a screening cohort undergoing chest radiography between January 2008 and December 2012, and its performance was calculated.
Results
In a validation test comprising 10 285 radiographs from 10 202 individuals (mean age, 54 years ± 11 [standard deviation]; 5857 men) with 10 radiographs of visible lung cancers, the algorithm’s AUC was 0.99 (95% confidence interval: 0.97, 1), and it showed comparable sensitivity (90% [nine of 10 radiographs]) to that of the radiologists (60% [six of 10 radiographs]; P = .25) with a higher FPR (3.1% [319 of 10 275 radiographs] vs 0.3% [26 of 10 275 radiographs]; P < .001). In the screening cohort of 100 525 chest radiographs from 50 070 individuals (mean age, 53 years ± 11; 28 090 men) with 47 radiographs of visible lung cancers, the algorithm’s AUC was 0.97 (95% confidence interval: 0.95, 0.99), and its sensitivity and FPR were 83% (39 of 47 radiographs) and 3% (2999 of 100 478 radiographs), respectively.
Conclusion
A deep learning algorithm detected lung cancers on chest radiographs with a performance comparable to that of radiologists, which may be helpful for radiologists reading radiographs from healthy populations with a low prevalence of lung cancer.
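The abstract compares the algorithm with radiologists using the McNemar test on paired reads. A minimal sketch of the exact (binomial) form of that test, with hypothetical paired calls; mcnemar_exact is an illustrative helper, not the paper's code.

    from scipy.stats import binomtest

    def mcnemar_exact(alg_correct, reader_correct):
        """Exact McNemar test on paired correct/incorrect calls:
        a binomial test on the discordant pairs."""
        b = sum(a and not r for a, r in zip(alg_correct, reader_correct))
        c = sum(r and not a for a, r in zip(alg_correct, reader_correct))
        if b + c == 0:
            return 1.0
        return binomtest(min(b, c), b + c, 0.5, alternative="two-sided").pvalue

    # Hypothetical paired reads on the same radiographs (True = correct call).
    alg    = [True, True, False, True, True, False, True, True]
    reader = [True, False, False, True, False, False, True, True]
    print(f"P = {mcnemar_exact(alg, reader):.3f}")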
Implementation of a Deep Learning-Based Computer-Aided Detection System for the Interpretation of Chest Radiographs in Patients Suspected for COVID-19
Eui Jin Hwang et al., Korean Journal of Radiology
Objective
To describe the experience of implementing a deep learning-based computer-aided detection (CAD) system for the interpretation of chest X-ray radiographs (CXR) of patients with suspected coronavirus disease (COVID-19) and to investigate the diagnostic performance of CXR interpretation with CAD assistance.
Materials and Methods
In this single-center retrospective study, initial CXR of patients with suspected or confirmed COVID-19 were investigated. A commercialized deep learning-based CAD system that can identify various abnormalities on CXR was implemented for the interpretation of CXR in daily practice. The diagnostic performance of radiologists with CAD assistance was evaluated based on two different reference standards: 1) real-time reverse transcriptase-polymerase chain reaction (rRT-PCR) results for COVID-19 and 2) pulmonary abnormality suggesting pneumonia on chest CT. The turnaround times (TATs) of radiology reports for CXR and rRT-PCR results were also evaluated.
Results
Among 332 patients (male:female, 173:159; mean age, 57 years) with available rRT-PCR results, 16 patients (4.8%) were diagnosed with COVID-19. Using CXR, radiologists with CAD assistance identified rRT-PCR positive COVID-19 patients with sensitivity and specificity of 68.8% and 66.7%, respectively. Among 119 patients (male:female, 75:44; mean age, 69 years) with available chest CTs, radiologists assisted by CAD reported pneumonia on CXR with a sensitivity of 81.5% and a specificity of 72.3%. The TATs of CXR reports were significantly shorter than those of rRT-PCR results (median 51 vs. 507 minutes; p < 0.001).
Conclusion
Radiologists with CAD assistance could identify patients with rRT-PCR-positive COVID-19 or pneumonia on CXR with reasonably acceptable performance. In patients with suspected COVID-19, CXR had much faster TATs than rRT-PCR.
Deep Learning–based Automatic Detection Algorithm for Reducing Overlooked Lung Cancers on Chest Radiographs
Sowon Jang et al., Radiology
Abstract
Chest radiograph interpretation, assisted by a deep learning–based automatic detection algorithm, can reduce the number of overlooked lung cancers without increasing the frequency of chest CT follow-up.
Background
It is uncertain whether a deep learning–based automatic detection algorithm (DLAD) for identifying malignant nodules on chest radiographs will help diagnose lung cancers.
Purpose
To evaluate the efficacy of using a DLAD in observer performance for the detection of lung cancers on chest radiographs.
Materials and Methods
Among patients diagnosed with lung cancers between January 2010 and December 2014, 117 patients (median age, 69 years; interquartile range [IQR], 64–74 years; 57 women) were retrospectively identified in whom lung cancers were visible on previous chest radiographs. For the healthy control group, 234 patients (median age, 58 years; IQR, 48–68 years; 123 women) with normal chest radiographs were randomly selected. Nine observers reviewed each chest radiograph, with and without a DLAD. They detected potential lung cancers and determined whether they would recommend chest CT for follow-up. Observer performance was compared with use of the area under the alternative free-response receiver operating characteristic curve (AUC), sensitivity, and rates of chest CT recommendation.
Results
In total, 105 of the 117 patients had lung cancers that were overlooked on their original radiographs. The average AUC for all observers significantly rose from 0.67 (95% confidence interval [CI]: 0.62, 0.72) without a DLAD to 0.76 (95% CI: 0.71, 0.81) with a DLAD (P < .001). With a DLAD, observers detected more overlooked lung cancers (average sensitivity, 53% [56 of 105 patients] with a DLAD vs 40% [42 of 105 patients] without a DLAD) (P < .001) and recommended chest CT for more patients (62% [66 of 105 patients] with a DLAD vs 47% [49 of 105 patients] without a DLAD) (P < .001). In the healthy control group, no difference existed in the rate of chest CT recommendation (10% [23 of 234 patients] without a DLAD and 8% [20 of 234 patients] with a DLAD) (P = .13).
Conclusion
Using a deep learning–based automatic detection algorithm may help observers reduce the number of overlooked lung cancers on chest radiographs, without a proportional increase in the number of follow-up chest CT examinations.
Automated identification of chest radiographs with referable abnormality with deep learning: need for recalibration
Eui Jin Hwang et al., European Radiology
Objectives
To evaluate the calibration of a deep learning (DL) model in a diagnostic cohort and to improve the model's calibration through recalibration procedures.
Methods
Chest radiographs (CRs) from 1135 consecutive patients (M:F = 582:553; mean age, 52.6 years) who visited our emergency department were included. A commercialized DL model was utilized to identify abnormal CRs, with a continuous probability score for each CR. After evaluation of the model calibration, eight different methods were used to recalibrate the original model based on the probability score. The original model outputs were recalibrated using 681 randomly sampled CRs and validated using the remaining 454 CRs. The Brier score for overall performance, average and maximum calibration error, absolute Spiegelhalter’s Z for calibration, and area under the receiver operating characteristic curve (AUROC) for discrimination were evaluated in 1000-times repeated, randomly split datasets.
Results
The original model tended to overestimate the likelihood for the presence of abnormalities, exhibiting average and maximum calibration error of 0.069 and 0.179, respectively; an absolute Spiegelhalter’s Z value of 2.349; and an AUROC of 0.949. After recalibration, significant improvements in the average (range, 0.015–0.036) and maximum (range, 0.057–0.172) calibration errors were observed in eight and five methods, respectively. Significant improvement in absolute Spiegelhalter’s Z (range, 0.809–4.439) was observed in only one method (the recalibration constant). Discriminations were preserved in six methods (AUROC, 0.909–0.949).
Conclusion
The calibration of a DL algorithm can be improved through simple recalibration procedures. Improved calibration may enhance the interpretability and credibility of the model for users.
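As one example of the kind of recalibration procedure the paper evaluates, the sketch below applies Platt scaling (a logistic fit on the model's logits; one common method, not necessarily one of the paper's eight) on a fitting split and reports the Brier score and Spiegelhalter's Z on the held-out split. Scores and outcomes are simulated.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def brier(p, y):
        return np.mean((p - y) ** 2)

    def spiegelhalter_z(p, y):
        # Calibration Z statistic: near 0 under perfect calibration.
        num = np.sum((y - p) * (1 - 2 * p))
        den = np.sqrt(np.sum(((1 - 2 * p) ** 2) * p * (1 - p)))
        return num / den

    # Simulated model scores and outcomes, split as in the paper
    # (recalibrate on 681 cases, validate on the remaining 454).
    rng = np.random.default_rng(2)
    y = (rng.random(1135) < 0.3).astype(int)
    p_raw = np.clip(y * 0.4 + rng.random(1135) * 0.6, 1e-6, 1 - 1e-6)

    fit, val = slice(0, 681), slice(681, None)
    logit = np.log(p_raw / (1 - p_raw)).reshape(-1, 1)
    platt = LogisticRegression().fit(logit[fit], y[fit])  # Platt scaling
    p_cal = platt.predict_proba(logit[val])[:, 1]

    print(f"Brier raw={brier(p_raw[val], y[val]):.3f} cal={brier(p_cal, y[val]):.3f}")
    print(f"|Z| raw={abs(spiegelhalter_z(p_raw[val], y[val])):.2f} "
          f"cal={abs(spiegelhalter_z(p_cal, y[val])):.2f}")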
https://link.springer.com/article/10.1007/s00330-020-07062-7
Clinical Validation of a Deep Learning Algorithm for Detection of Pneumonia on Chest Radiographs in Emergency Department Patients with Acute Febrile Respiratory Illness
Jae Hyun Kim et al., Journal of Clinical Medicine
Early identification of pneumonia is essential in patients with acute febrile respiratory illness (FRI). We evaluated the performance and added value of a commercial deep learning (DL) algorithm in detecting pneumonia on chest radiographs (CRs) of patients visiting the emergency department (ED) with acute FRI. This single-centre, retrospective study included 377 consecutive patients who visited the ED between August 2018 and January 2019 and their resulting 387 CRs. The performance of a DL algorithm in detection of pneumonia on CRs was evaluated based on the area under the receiver operating characteristic (AUROC) curve, sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV). In an observer performance test, three ED physicians independently reviewed CRs to detect pneumonia and re-evaluated them with the algorithm eight weeks later. AUROC, sensitivity, and specificity measurements were compared between “DL algorithm” vs. “physicians-only” and between “physicians-only” vs. “physicians aided with the algorithm”. Among 377 patients, 83 (22.0%) had pneumonia. AUROC, sensitivity, specificity, PPV, and NPV of the algorithm for detection of pneumonia on CRs were 0.861, 58.3%, 94.4%, 74.2%, and 89.1%, respectively. For the detection of ‘visible pneumonia on CR’ (60 CRs from 59 patients), AUROC, sensitivity, specificity, PPV, and NPV were 0.940, 81.7%, 94.4%, 74.2%, and 96.3%, respectively. In the observer performance test, the algorithm performed better than the physicians for pneumonia (AUROC, 0.861 vs. 0.788, p = 0.017; specificity, 94.4% vs. 88.7%, p < 0.0001) and visible pneumonia (AUROC, 0.940 vs. 0.871, p = 0.007; sensitivity, 81.7% vs. 73.9%, p = 0.034; specificity, 94.4% vs. 88.7%, p < 0.0001). Detection of pneumonia (sensitivity, 82.2% vs. 53.2%, p = 0.008; specificity, 98.1% vs. 88.7%; p < 0.0001) and ‘visible pneumonia’ (sensitivity, 82.2% vs. 73.9%, p = 0.014; specificity, 98.1% vs. 88.7%, p < 0.0001) significantly improved when the algorithm was used by the physicians. Mean reading time for the physicians decreased from 165 to 101 min with the assistance of the algorithm. Thus, the DL algorithm showed better detection of pneumonia, particularly visible pneumonia on CR, and improved the diagnoses made by ED physicians in patients with acute FRI.
Deep learning algorithm for surveillance of pneumothorax after lung biopsy: a multicenter diagnostic cohort study
Eui Jin Hwang et al., European Radiology
Objectives
Pneumothorax is the most common and potentially life-threatening complication arising from percutaneous lung biopsy. We evaluated the performance of a deep learning algorithm for detection of post-biopsy pneumothorax in chest radiographs (CRs), in consecutive cohorts reflecting actual clinical situation.
Methods
We retrospectively included post-biopsy CRs of 1757 consecutive patients (1055 men, 702 women; mean age of 65.1 years) undergoing percutaneous lung biopsies from three institutions. A commercially available deep learning algorithm analyzed each CR to identify pneumothorax. We compared the performance of the algorithm with that of radiology reports made in the actual clinical practice. We also conducted a reader study, in which the performance of the algorithm was compared with those of four radiologists. Performances of the algorithm and radiologists were evaluated by area under receiver operating characteristic curves (AUROCs), sensitivity, and specificity, with reference standards defined by thoracic radiologists.
Results
Pneumothorax occurred in 17.5% (308/1757) of cases, out of which 16.6% (51/308) required catheter drainage. The AUROC, sensitivity, and specificity of the algorithm were 0.937, 70.5%, and 97.7%, respectively, for identification of pneumothorax. The algorithm exhibited higher sensitivity (70.2% vs. 55.5%, p < 0.001) and lower specificity (97.7% vs. 99.8%, p < 0.001), compared with those of radiology reports. In the reader study, the algorithm exhibited lower sensitivity (77.3% vs. 81.8–97.7%) and higher specificity (97.6% vs. 81.7–96.0%) than the radiologists.
Conclusion
The deep learning algorithm appropriately identified pneumothorax in post-biopsy CRs in consecutive diagnostic cohorts. It may assist in accurate and timely diagnosis of post-biopsy pneumothorax in clinical practice.
https://link.springer.com/article/10.1007%2Fs00330-020-06771-3
Evaluation of Combined Artificial Intelligence and Radiologist Assessment to Interpret Screening Mammograms
Thomas Schaffter et al., JAMA Network Open
Importance
Mammography screening currently relies on subjective human interpretation. Artificial intelligence (AI) advances could be used to increase mammography screening accuracy by reducing missed cancers and false positives.
Objective
To evaluate whether AI can overcome human mammography interpretation limitations with a rigorous, unbiased evaluation of machine learning algorithms.
Design, Setting, and Participants
In this diagnostic accuracy study conducted between September 2016 and November 2017, an international, crowdsourced challenge was hosted to foster AI algorithm development focused on interpreting screening mammography. More than 1100 participants comprising 126 teams from 44 countries participated. Analysis began November 18, 2016.
Main Outcomes and Measures
Algorithms used images alone (challenge 1) or combined images, previous examinations (if available), and clinical and demographic risk factor data (challenge 2) and output a score that translated to cancer yes/no within 12 months. Algorithm accuracy for breast cancer detection was evaluated using area under the curve and algorithm specificity compared with radiologists’ specificity with radiologists’ sensitivity set at 85.9% (United States) and 83.9% (Sweden). An ensemble method aggregating top-performing AI algorithms and radiologists’ recall assessment was developed and evaluated.
Results
Overall, 144 231 screening mammograms from 85 580 US women (952 cancer positive ≤12 months from screening) were used for algorithm training and validation. A second independent validation cohort included 166 578 examinations from 68 008 Swedish women (780 cancer positive). The top-performing algorithm achieved an area under the curve of 0.858 (United States) and 0.903 (Sweden) and 66.2% (United States) and 81.2% (Sweden) specificity at the radiologists’ sensitivity, lower than community-practice radiologists’ specificity of 90.5% (United States) and 98.5% (Sweden). Combining top-performing algorithms and US radiologist assessments resulted in a higher area under the curve of 0.942 and achieved a significantly improved specificity (92.0%) at the same sensitivity.
Conclusions and Relevance
While no single AI algorithm outperformed radiologists, an ensemble of AI algorithms combined with radiologist assessment in a single-reader screening environment improved overall accuracy. This study underscores the potential of using machine learning methods for enhancing mammography screening interpretation.
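A toy sketch of the ensemble idea: aggregate the scores of several algorithms and combine the result with the radiologist's recall decision. The study's actual ensemble was a trained aggregation of top-performing models, so the simple average and OR rule here are illustrative assumptions, as is the ensemble_recall helper name.

    import numpy as np

    def ensemble_recall(algorithm_scores, radiologist_recall, threshold):
        """Recall the woman if either the averaged (pre-normalized)
        algorithm score exceeds the threshold or the radiologist
        flags the examination."""
        ensemble = np.mean(algorithm_scores, axis=0)
        return (ensemble >= threshold) | radiologist_recall

    scores = np.array([[0.2, 0.8, 0.4],    # algorithm 1, three exams
                       [0.1, 0.9, 0.6]])   # algorithm 2, same exams
    rad = np.array([False, False, True])
    print(ensemble_recall(scores, rad, threshold=0.5))  # [False True True]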
https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2761795?resultClick=1
Changes in cancer detection and false-positive recall in mammography using artificial intelligence: a retrospective, multireader study
H.E. Kim et al., Lancet Digital Health
Background
Mammography is the current standard for breast cancer screening. This study aimed to develop an artificial intelligence (AI) algorithm for diagnosis of breast cancer in mammography, and explore whether it could benefit radiologists by improving accuracy of diagnosis.
Methods
In this retrospective study, an AI algorithm was developed and validated with 170 230 mammography examinations collected from five institutions in South Korea, the USA, and the UK, including 36 468 cancer positive confirmed by biopsy, 59 544 benign confirmed by biopsy (8827 mammograms) or follow-up imaging (50 717 mammograms), and 74 218 normal. For the multicentre, observer-blinded, reader study, 320 mammograms (160 cancer positive, 64 benign, 96 normal) were independently obtained from two institutions. 14 radiologists participated as readers and assessed each mammogram in terms of likelihood of malignancy (LOM), location of malignancy, and necessity to recall the patient, first without and then with assistance of the AI algorithm. The performance of AI and radiologists was evaluated in terms of LOM-based area under the receiver operating characteristic curve (AUROC) and recall-based sensitivity and specificity.
Findings
The AI standalone performance was AUROC 0·959 (95% CI 0·952–0·966) overall, and 0·970 (0·963–0·978) in the South Korea dataset, 0·953 (0·938–0·968) in the USA dataset, and 0·938 (0·918–0·958) in the UK dataset. In the reader study, the performance level of AI was 0·940 (0·915–0·965), significantly higher than that of the radiologists without AI assistance (0·810, 95% CI 0·770–0·850; p<0·0001). With the assistance of AI, radiologists' performance was improved to 0·881 (0·850–0·911; p<0·0001). AI was more sensitive to detect cancers with mass (53 [90%] vs 46 [78%] of 59 cancers detected; p=0·044) or distortion or asymmetry (18 [90%] vs ten [50%] of 20 cancers detected; p=0·023) than radiologists. AI was better in detection of T1 cancers (73 [91%] vs 59 [74%] of 80; p=0·0039) or node-negative cancers (104 [87%] vs 88 [74%] of 119; p=0·0025) than radiologists.
Interpretation
The AI algorithm developed with large-scale mammography data showed better diagnostic performance in breast cancer detection compared with radiologists. The significant improvement in radiologists' performance when aided by AI supports application of AI to mammograms as a diagnostic support tool.
https://www.thelancet.com/journals/landig/article/PIIS2589-7500(20)30003-0/fulltext
Test-retest reproducibility of a deep learning–based automatic detection algorithm for the chest radiograph
H.J. Kim et al., European Radiology
Objectives
To perform test-retest reproducibility analyses for deep learning–based automatic detection algorithm (DLAD) using two stationary chest radiographs (CRs) with short-term intervals, to analyze influential factors on test-retest variations, and to investigate the robustness of DLAD to simulated post-processing and positional changes.
Methods
This retrospective study included patients with pulmonary nodules resected in 2017. Preoperative CRs without interval changes were used. Test-retest reproducibility was analyzed in terms of median differences of abnormality scores, intraclass correlation coefficients (ICC), and 95% limits of agreement (LoA). Factors associated with test-retest variation were investigated using univariable and multivariable analyses. Shifts in classification between the two CRs were analyzed using pre-determined cutoffs. Radiograph post-processing (blurring and sharpening) and positional changes (translations in x- and y-axes, rotation, and shearing) were simulated and agreement of abnormality scores between the original and simulated CRs was investigated.
Results
Our study analyzed 169 patients (median age, 65 years; 91 men). The median difference of abnormality scores was 1–2% and ICC ranged from 0.83 to 0.90. The 95% LoA was approximately ± 30%. Test-retest variation was negatively associated with solid portion size (β, − 0.50; p = 0.008) and good nodule conspicuity (β, − 0.94; p < 0.001). A small fraction (15/169) showed discordant classifications when the high-specificity cutoff (46%) was applied to the model outputs (p = 0.04). DLAD was robust to the simulated positional change (ICC, 0.984, 0.996), but relatively less robust to post-processing (ICC, 0.872, 0.968).
Conclusions
DLAD was robust to the test-retest variation. However, inconspicuous nodules may cause fluctuations of the model output and subsequent misclassifications.
Key Points
• The deep learning–based automatic detection algorithm was robust to the test-retest variation of the chest radiographs in general.
• The test-retest variation was negatively associated with solid portion size and good nodule conspicuity.
• High-specificity cutoff (46%) resulted in discordant classifications of 8.9% (15/169; p = 0.04) between the test-retest radiographs.
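The two agreement statistics reported above can be computed as follows. The paired scores are simulated, and ICC(2,1) is one common single-measure form; the abstract does not state which ICC variant the paper used.

    import numpy as np

    def bland_altman_loa(s1, s2):
        """95% limits of agreement for paired abnormality scores."""
        d = s1 - s2
        return d.mean() - 1.96 * d.std(ddof=1), d.mean() + 1.96 * d.std(ddof=1)

    def icc_2_1(s1, s2):
        """Two-way random-effects, single-measure ICC(2,1) for two raters."""
        x = np.stack([s1, s2], axis=1)
        n, k = x.shape
        ms_r = k * x.mean(axis=1).var(ddof=1)   # between-subjects mean square
        ms_c = n * x.mean(axis=0).var(ddof=1)   # between-measures mean square
        ss_err = ((x - x.mean()) ** 2).sum() - ms_r * (n - 1) - ms_c * (k - 1)
        ms_e = ss_err / ((n - 1) * (k - 1))
        return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)

    # Simulated paired scores (percent) from two same-day radiographs.
    rng = np.random.default_rng(3)
    first = rng.random(169) * 100
    second = np.clip(first + rng.normal(0, 12, 169), 0, 100)
    print(f"ICC(2,1)={icc_2_1(first, second):.2f}, "
          f"95% LoA={bland_altman_loa(first, second)}")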
https://link.springer.com/article/10.1007%2Fs00330-019-06589-8
Deep Learning for Chest Radiograph Diagnosis in the Emergency Department
E.J. Hwang et al., Radiology
Background
The performance of a deep learning (DL) algorithm should be validated in actual clinical situations, before its clinical implementation.
Purpose
To evaluate the performance of a DL algorithm for identifying chest radiographs with clinically relevant abnormalities in the emergency department (ED) setting.
Materials and Methods
This single-center retrospective study included consecutive patients who visited the ED and underwent initial chest radiography between January 1 and March 31, 2017. Chest radiographs were analyzed with a commercially available DL algorithm. The performance of the algorithm was evaluated by determining the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity at predefined operating cutoffs (high-sensitivity and high-specificity cutoffs). The sensitivities and specificities of the algorithm were compared with those of the on-call radiology residents who interpreted the chest radiographs in the actual practice by using McNemar tests. If there were discordant findings between the algorithm and resident, the residents reinterpreted the chest radiographs by using the algorithm’s output.
Results
A total of 1135 patients (mean age, 53 years ± 18; 582 men) were evaluated. In the identification of abnormal chest radiographs, the algorithm showed an AUC of 0.95 (95% confidence interval [CI]: 0.93, 0.96), a sensitivity of 88.7% (227 of 256 radiographs; 95% CI: 84.1%, 92.3%), and a specificity of 69.6% (612 of 879 radiographs; 95% CI: 66.5%, 72.7%) at the high-sensitivity cutoff and a sensitivity of 81.6% (209 of 256 radiographs; 95% CI: 76.3%, 86.2%) and specificity of 90.3% (794 of 879 radiographs; 95% CI: 88.2%, 92.2%) at the high-specificity cutoff. Radiology residents showed lower sensitivity (65.6% [168 of 256 radiographs; 95% CI: 59.5%, 71.4%], P < .001) and higher specificity (98.1% [862 of 879 radiographs; 95% CI: 96.9%, 98.9%], P < .001) compared with the algorithm. After reinterpretation of chest radiographs with use of the algorithm’s outputs, the sensitivity of the residents improved (73.4% [188 of 256 radiographs; 95% CI: 68.0%, 78.8%], P = .003), whereas specificity was reduced (94.3% [829 of 879 radiographs; 95% CI: 92.8%, 95.8%], P < .001).
Conclusion
A deep learning algorithm used with emergency department chest radiographs showed diagnostic performance for identifying clinically relevant abnormalities and helped improve the sensitivity of radiology residents’ evaluation.
Development and Validation of a Deep Learning–Based Automated Detection Algorithm for Major Thoracic Diseases on Chest Radiographs
E.J. Hwang et al., JAMA Network Open
Importance
Interpretation of chest radiographs is a challenging task prone to errors, requiring expert readers. An automated system that can accurately classify chest radiographs may help streamline the clinical workflow.
Objective
To develop a deep learning–based algorithm that can classify normal and abnormal results from chest radiographs with major thoracic diseases, including pulmonary malignant neoplasm, active tuberculosis, pneumonia, and pneumothorax, and to validate the algorithm’s performance using independent data sets.
Design, Setting, and Participants
This diagnostic study developed a deep learning–based algorithm using single-center data collected between November 1, 2016, and January 31, 2017. The algorithm was externally validated with multicenter data collected between May 1 and July 31, 2018. A total of 54 221 chest radiographs with normal findings from 47 917 individuals (21 556 men and 26 361 women; mean [SD] age, 51 [16] years) and 35 613 chest radiographs with abnormal findings from 14 102 individuals (8373 men and 5729 women; mean [SD] age, 62 [15] years) were used to develop the algorithm. A total of 486 chest radiographs with normal results and 529 with abnormal results (1 from each participant; 628 men and 387 women; mean [SD] age, 53 [18] years) from 5 institutions were used for external validation. Fifteen physicians, including nonradiology physicians, board-certified radiologists, and thoracic radiologists, participated in observer performance testing. Data were analyzed in August 2018.
Main Outcomes and Measures
Image-wise classification performance measured by area under the receiver operating characteristic curve; lesion-wise localization performance measured by area under the alternative free-response receiver operating characteristic curve.
Results
The algorithm demonstrated a median (range) area under the curve of 0.979 (0.973-1.000) for image-wise classification and 0.972 (0.923-0.985) for lesion-wise localization; the algorithm demonstrated significantly higher performance than all 3 physician groups in both image-wise classification (0.983 vs 0.814-0.932; all P < .005) and lesion-wise localization (0.985 vs 0.781-0.907; all P < .001). Significant improvements in both image-wise classification (0.814-0.932 to 0.904-0.958; all P < .005) and lesion-wise localization (0.781-0.907 to 0.873-0.938; all P < .001) were observed in all 3 physician groups with assistance of the algorithm.
Conclusions and Relevance
The algorithm consistently outperformed physicians, including thoracic radiologists, in the discrimination of chest radiographs with major thoracic diseases, demonstrating its potential to improve the quality and efficiency of clinical practice.
https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2728630
Development and Validation of a Deep Learning–based Automatic Detection Algorithm for Active Pulmonary Tuberculosis on Chest Radiographs
E.J. Hwang et al., Clinical Infectious Diseases (CID)
Background
Detection of active pulmonary tuberculosis (TB) on chest radiographs (CR) is critical for the diagnosis and screening of TB. An automated system may help streamline the TB screening process and improve diagnostic performance.
Methods
We developed a deep-learning-based automatic detection (DLAD) algorithm, using 54,221 normal CRs and 6,768 CRs with active pulmonary TB, which were labeled and annotated by 13 board-certified radiologists. The performance of DLAD was validated using six external multi-center, multi-national datasets. To compare the performances of DLAD with physicians, an observer performance test was conducted by 15 physicians including non-radiology physicians, board-certified radiologists, and thoracic radiologists. Image-wise classification and lesion-wise localization performances were measured using area under the receiver operating characteristic (ROC) curves, and area under the alternative free-response ROC curves, respectively. Sensitivities and specificities of DLAD were calculated using two cutoffs [high sensitivity (98%) and high specificity (98%)] obtained through in-house validation.
Results
DLAD demonstrated classification performances of 0.977–1.000 and localization performance of 0.973–1.000. Sensitivities and specificities for classification were 94.3–100% and 91.1–100% using the high sensitivity cutoff and 84.1–99.0% and 99.1–100% using the high specificity cutoff. DLAD showed significantly higher performance in both classification (0.993 vs. 0.746–0.971) and localization (0.993 vs. 0.664–0.925) compared to all groups of physicians.
Conclusions
Our DLAD demonstrated excellent and consistent performance in the detection of active pulmonary TB on CR, outperforming physicians including thoracic radiologists.
https://academic.oup.com/cid/advance-article/doi/10.1093/cid/ciy967/5174137
Development and Validation of Deep Learning–based Automatic Detection Algorithm for Malignant Pulmonary Nodules on Chest Radiographs
S.G. Park et al., Radiology
Purpose
To develop and validate a deep learning–based automatic detection algorithm (DLAD) for malignant pulmonary nodules on chest radiographs and to compare its performance with physicians including thoracic radiologists.
Materials and Methods
For this retrospective study, DLAD was developed by using 43 292 chest radiographs (normal radiograph–to–nodule radiograph ratio, 34 067:9225) in 34 676 patients (healthy-to-nodule ratio, 30 784:3892; 19 230 men [mean age, 52.8 years; age range, 18–99 years]; 15 446 women [mean age, 52.3 years; age range, 18–98 years]) obtained between 2010 and 2015, which were labeled and partially annotated by 13 board-certified radiologists, in a convolutional neural network. Radiograph classification and nodule detection performances of DLAD were validated by using one internal and four external data sets from three South Korean hospitals and one U.S. hospital. For internal and external validation, radiograph classification and nodule detection performances of DLAD were evaluated by using the area under the receiver operating characteristic curve (AUROC) and jackknife alternative free-response receiver-operating characteristic (JAFROC) figure of merit (FOM), respectively. An observer performance test involving 18 physicians, including nine board-certified radiologists, was conducted by using one of the four external validation data sets. Performances of DLAD, physicians, and physicians assisted with DLAD were evaluated and compared.
Results
According to one internal and four external validation data sets, radiograph classification and nodule detection performances of DLAD were a range of 0.92–0.99 (AUROC) and 0.831–0.924 (JAFROC FOM), respectively. DLAD showed a higher AUROC and JAFROC FOM at the observer performance test than 17 of 18 and 15 of 18 physicians, respectively (P < .05), and all physicians showed improved nodule detection performances with DLAD (mean JAFROC FOM improvement, 0.043; range, 0.006–0.190; P < .05).
Conclusion
This deep learning–based automatic detection algorithm outperformed physicians in radiograph classification and nodule detection performance for malignant pulmonary nodules on chest radiographs, and it enhanced physicians’ performances when used as a second reader.
Applying Data-driven Imaging Biomarker in Mammography for Breast Cancer Screening: Preliminary Study
E.K. Kim et al., Scientific Reports
We assessed the feasibility of a data-driven imaging biomarker based on weakly supervised learning (DIB; an imaging biomarker derived from large-scale medical image data with deep learning technology) in mammography (DIB-MG). A total of 29,107 digital mammograms from five institutions (4,339 cancer cases and 24,768 normal cases) were included. After matching patients’ age, breast density, and equipment, 1,238 and 1,238 cases were chosen as validation and test sets, respectively, and the remainder were used for training. The core algorithm of DIB-MG is a deep convolutional neural network; a deep learning algorithm specialized for images. Each sample (case) is an exam composed of 4-view images (RCC, RMLO, LCC, and LMLO). For each case in a training set, the cancer probability inferred from DIB-MG is compared with the per-case ground-truth label. Then the model parameters in DIB-MG are updated based on the error between the prediction and the ground-truth. At the operating point (threshold) of 0.5, sensitivity was 75.6% and 76.1% when specificity was 90.2% and 88.5%, and AUC was 0.903 and 0.906 for the validation and test sets, respectively. This research showed the potential of DIB-MG as a screening tool for breast cancer.
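A hedged sketch of a per-case, 4-view classifier of the kind described: a shared CNN backbone scores each view (RCC, RMLO, LCC, LMLO) and a case-level score is pooled across views, trained against the per-case label. The backbone choice, grayscale stem, and max-pooling over views are illustrative assumptions, not DIB-MG's actual design.

    import torch
    import torch.nn as nn
    import torchvision.models as models

    class FourViewMammoNet(nn.Module):
        """Per-case classifier over the four standard mammography views."""
        def __init__(self):
            super().__init__()
            backbone = models.resnet18(weights=None)
            backbone.conv1 = nn.Conv2d(1, 64, 7, 2, 3, bias=False)  # grayscale
            backbone.fc = nn.Linear(backbone.fc.in_features, 1)
            self.backbone = backbone

        def forward(self, views):            # views: (B, 4, 1, H, W)
            b, v, c, h, w = views.shape
            logits = self.backbone(views.reshape(b * v, c, h, w)).reshape(b, v)
            return logits.max(dim=1).values  # per-case cancer logit

    case_logit = FourViewMammoNet()(torch.randn(2, 4, 1, 224, 224))  # shape (2,)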
Learning Visual Context by Comparison
M. Kim et al., ECCV 2020
Finding diseases from an X-ray image is an important yet highly challenging task. Current methods for solving this task exploit various characteristics of the chest X-ray image, but one of the most important characteristics is still missing: the necessity of comparison between related regions in an image. In this paper, we present Attend-and-Compare Module (ACM) for capturing the difference between an object of interest and its corresponding context. We show that explicit difference modeling can be very helpful in tasks that require direct comparison between locations from afar. This module can be plugged into existing deep learning models. For evaluation, we apply our module to three chest X-ray recognition tasks and COCO object detection & segmentation tasks and observe consistent improvements across tasks. The code is available at https://github.com/mk-minchul/attend-and-compare.
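A simplified sketch of the attend-and-compare idea: two learned spatial attentions pool an "object" vector and a "context" vector from the same feature map, and their explicit difference is projected and added back to the input. The real ACM differs in detail (see the linked code); this module is an assumption-based reduction.

    import torch
    import torch.nn as nn

    class AttendCompare(nn.Module):
        """Inject an explicit difference between two attended regions."""
        def __init__(self, channels):
            super().__init__()
            self.attn_k = nn.Conv2d(channels, 1, kernel_size=1)
            self.attn_q = nn.Conv2d(channels, 1, kernel_size=1)
            self.proj = nn.Conv2d(channels, channels, kernel_size=1)

        def _pool(self, x, attn_logits):
            b, c, h, w = x.shape
            a = attn_logits.reshape(b, 1, h * w).softmax(dim=-1)  # spatial attention
            v = (x.reshape(b, c, h * w) * a).sum(dim=-1)          # attended vector
            return v.reshape(b, c, 1, 1)

        def forward(self, x):
            k = self._pool(x, self.attn_k(x))  # e.g., the object of interest
            q = self._pool(x, self.attn_q(x))  # e.g., its corresponding context
            return x + self.proj(k - q)        # add the explicit comparison

    y = AttendCompare(64)(torch.randn(2, 64, 32, 32))  # same shape as input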
SRM: A Style-based Recalibration Module for Convolutional Neural Networks
H.J. Lee et al., ICCV 2019
Following the advance of style transfer with Convolutional Neural Networks (CNNs), the role of styles in CNNs has drawn growing attention from a broader perspective. In this paper, we aim to fully leverage the potential of styles to improve the performance of CNNs in general vision tasks. We propose a Style-based Recalibration Module (SRM), a simple yet effective architectural unit, which adaptively recalibrates intermediate feature maps by exploiting their styles. SRM first extracts the style information from each channel of the feature maps by style pooling, then estimates per-channel recalibration weights via channel-independent style integration. By incorporating the relative importance of individual styles into feature maps, SRM effectively enhances the representational ability of a CNN. The proposed module plugs directly into existing CNN architectures with negligible overhead. We conduct comprehensive experiments on general image recognition as well as tasks related to styles, which verify the benefit of SRM over recent approaches such as Squeeze-and-Excitation (SE). To explain the inherent difference between SRM and SE, we provide an in-depth comparison of their representational properties.
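A sketch of SRM as the abstract describes it: style pooling extracts a per-channel mean and standard deviation, and a channel-independent integration turns them into a sigmoid recalibration weight. Layer choices such as the BatchNorm on the integrated style and the zero initialization are assumptions, not necessarily the paper's exact design.

    import torch
    import torch.nn as nn

    class SRM(nn.Module):
        """Style pooling + channel-independent style integration."""
        def __init__(self, channels):
            super().__init__()
            # One weight per style feature per channel; no cross-channel mixing.
            self.cfc = nn.Parameter(torch.zeros(channels, 2))
            self.bn = nn.BatchNorm1d(channels)

        def forward(self, x):
            b, c, h, w = x.shape
            flat = x.reshape(b, c, -1)
            style = torch.stack([flat.mean(-1), flat.std(-1)], dim=-1)  # (B, C, 2)
            z = (style * self.cfc).sum(-1)  # channel-wise style integration
            g = torch.sigmoid(self.bn(z))   # per-channel recalibration weight
            return x * g.reshape(b, c, 1, 1)

    y = SRM(64)(torch.randn(4, 64, 32, 32))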
Photometric Transformer Networks and Label Adjustment for Breast Density Prediction
J.H. Lee et al., ICCV 2019 Workshop
Grading breast density is highly sensitive to the normalization settings of the digital mammogram, as the density is tightly correlated with the distribution of pixel intensity. Also, the grade varies with readers due to uncertain grading criteria. These issues are inherent in the density assessment of digital mammography. They are problematic when designing a computer-aided prediction model for breast density and become worse if the data comes from multiple sites. In this paper, we proposed two novel deep learning techniques for breast density prediction: 1) photometric transformation, which adaptively normalizes the input mammograms, and 2) label distillation, which adjusts the labels by using the model's own output predictions. The photometric transformer network predicts optimal parameters for photometric transformation on the fly, learned jointly with the main prediction network. The label distillation, a type of pseudo-label technique, is intended to mitigate the grading variation. We experimentally showed that the proposed methods are beneficial in terms of breast density prediction, resulting in significant performance improvement compared to various previous approaches.
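A hedged sketch of the photometric-transformer idea: a tiny network regresses transformation parameters per image, and the transformed mammogram feeds the main prediction network, with both trained jointly. The gamma-and-gain parameterization and the network size are illustrative assumptions.

    import torch
    import torch.nn as nn

    class PhotometricTransformer(nn.Module):
        """Predict per-image normalization parameters on the fly."""
        def __init__(self):
            super().__init__()
            self.head = nn.Sequential(
                nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2))

        def forward(self, x):                  # x in [0, 1], shape (B, 1, H, W)
            params = self.head(x)
            gamma = torch.exp(params[:, 0]).reshape(-1, 1, 1, 1)      # positive
            gain = 2 * torch.sigmoid(params[:, 1]).reshape(-1, 1, 1, 1)
            return (x.clamp(min=1e-6) ** gamma) * gain

    x_norm = PhotometricTransformer()(torch.rand(2, 1, 128, 128))
    # x_norm then feeds the main density-prediction network during training.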
Learning Loss for Active Learning
D.G. Yoo et al., CVPR 2019
More annotated data improves the performance of deep neural networks. The problem is the limited budget for annotation. One solution to this is active learning, where a model asks a human to annotate data that it perceives as uncertain. A variety of recent methods have been proposed to apply active learning to deep networks, but most of them are either designed specifically for their target tasks or computationally inefficient for large networks. In this paper, we propose a novel active learning method that is simple but task-agnostic and works efficiently with deep networks. We attach a small parametric module, named the ``loss prediction module,'' to a target network, and train it to predict the target losses of unlabeled inputs. This module can then suggest data for which the target model is likely to produce a wrong prediction. The method is task-agnostic, as networks are learned from a single loss regardless of target tasks. We rigorously validate our method through image classification, object detection, and human pose estimation with recent network architectures. The results demonstrate that our method consistently outperforms previous methods across the tasks.
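A sketch of the loss prediction module and a pairwise ranking objective of the kind the paper uses: pooled mid-level features are fused to regress the target loss, and only the ordering of losses within a batch is supervised. Feature dimensions and the margin are placeholders.

    import torch
    import torch.nn as nn

    class LossPredictionModule(nn.Module):
        """Regress the target loss from pooled mid-level features."""
        def __init__(self, feature_channels=(64, 128, 256), hidden=128):
            super().__init__()
            self.fcs = nn.ModuleList(
                [nn.Linear(c, hidden) for c in feature_channels])
            self.out = nn.Linear(hidden * len(feature_channels), 1)

        def forward(self, feature_maps):       # list of (B, C_i, H_i, W_i)
            h = [torch.relu(fc(f.mean(dim=(2, 3))))
                 for fc, f in zip(self.fcs, feature_maps)]
            return self.out(torch.cat(h, dim=1)).squeeze(1)  # predicted loss

    def ranking_loss(pred, target, margin=1.0):
        """Supervise only the order of losses between batch pairs."""
        i, j = pred[0::2], pred[1::2]
        sign = torch.sign(target[0::2] - target[1::2])
        return torch.clamp(margin - sign * (i - j), min=0).mean()

    feats = [torch.randn(8, c, 16, 16) for c in (64, 128, 256)]
    loss = ranking_loss(LossPredictionModule()(feats), torch.rand(8))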
PseudoEdgeNet: Nuclei Segmentation only with Point Annotations
I.W. Yoo et al., MICCAI 2019
Nuclei segmentation is one of the important tasks for whole slide image analysis in digital pathology. With the drastic advance of deep learning, recent deep networks have demonstrated successful performance on the nuclei segmentation task. However, a major bottleneck to achieving good performance is the cost of annotation. A large network requires a large number of segmentation masks, and this annotation task falls to pathologists, not the public. In this paper, we propose a weakly supervised nuclei segmentation method, which requires only point annotations for training. This method can scale to large training sets, as marking a point on a nucleus is much cheaper than drawing a fine segmentation mask. To this end, we introduce a novel auxiliary network, called PseudoEdgeNet, which guides the segmentation network to recognize nuclei edges even without edge annotations. We evaluate our method on two public datasets, and the results demonstrate that the method consistently outperforms other weakly supervised methods.
Batch-Instance Normalization for Adaptively Style-Invariant Neural Networks
H.S. Nam et al., NeurIPS 2018
Real-world image recognition is often challenged by the variability of visual styles including object textures, lighting conditions, filter effects, etc. Although these variations have been deemed to be implicitly handled by more training data and deeper networks, recent advances in image style transfer suggest that it is also possible to explicitly manipulate the style information. Extending this idea to general visual recognition problems, we present Batch-Instance Normalization (BIN) to explicitly normalize unnecessary styles from images. Considering certain style features play an essential role in discriminative tasks, BIN learns to selectively normalize only disturbing styles while preserving useful styles. The proposed normalization module is easily incorporated into existing network architectures such as Residual Networks, and surprisingly improves the recognition performance in various scenarios. Furthermore, experiments verify that BIN effectively adapts to completely different tasks like object classification and style transfer, by controlling the tradeoff between preserving and removing style variations. BIN can be implemented with only a few lines of code using popular deep learning frameworks.
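BIN is compact enough to sketch directly from the abstract: a learnable per-channel gate, clipped to [0, 1], interpolates between batch-normalized and instance-normalized responses, with a shared affine transform on top. Initialization details here are assumptions.

    import torch
    import torch.nn as nn

    class BatchInstanceNorm2d(nn.Module):
        """Gate between style-preserving BN and style-removing IN."""
        def __init__(self, channels):
            super().__init__()
            self.bn = nn.BatchNorm2d(channels, affine=False)
            self.inorm = nn.InstanceNorm2d(channels, affine=False)
            self.rho = nn.Parameter(torch.full((1, channels, 1, 1), 0.5))
            self.gamma = nn.Parameter(torch.ones(1, channels, 1, 1))
            self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))

        def forward(self, x):
            rho = self.rho.clamp(0, 1)  # keep the gate in [0, 1]
            y = rho * self.bn(x) + (1 - rho) * self.inorm(x)
            return y * self.gamma + self.beta

    y = BatchInstanceNorm2d(64)(torch.randn(8, 64, 16, 16))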
Distort-and-Recover: Color Enhancement Using Deep Reinforcement Learning
J.C. Park et al., CVPR 2018
Learning-based color enhancement approaches typically learn to map from input images to retouched images. Most of existing methods require expensive pairs of input-retouched images or produce results in a noninterpretable way. In this paper, we present a deep reinforcement learning (DRL) based method for color enhancement to explicitly model the step-wise nature of human retouching process. We cast a color enhancement process as a Markov Decision Process where actions are defined as global color adjustment operations. Then we train our agent to learn the optimal global enhancement sequence of the actions. In addition, we present a ‘distort-and-recover’ training scheme which only requires high-quality reference images for training instead of input and retouched image pairs. Given high-quality reference images, we distort the images’ color distribution and form distorted-reference image pairs for training. Through extensive experiments, we show that our method produces decent enhancement results and our DRL approach is more suitable for the ‘distort-and-recover’ training scheme than previous supervised approaches. Supplementary material and code are available at https://sites.google.com/view/distort-and-recover/
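A minimal sketch of how distorted-reference training pairs can be formed from high-quality references alone, as the training scheme above requires. The particular distortion family (gamma, brightness, per-channel color shift) is an assumption for illustration.

    import numpy as np

    def distort(image, rng):
        """Create a training input by distorting a high-quality reference."""
        img = image.astype(np.float32) / 255.0
        img = img ** rng.uniform(0.6, 1.6)                  # random gamma
        img = np.clip(img * rng.uniform(0.7, 1.3), 0, 1)    # random brightness
        img = np.clip(img + rng.uniform(-0.1, 0.1, (1, 1, 3)), 0, 1)  # color shift
        return (img * 255).astype(np.uint8)

    rng = np.random.default_rng(0)
    reference = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
    pair = (distort(reference, rng), reference)  # (agent input, recovery target)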
CBAM: Convolutional Block Attention Module
J.C. Park et al., ECCV 2018
We propose Convolutional Block Attention Module (CBAM), a simple and effective attention module that can be integrated with any feed-forward convolutional neural networks. Given an intermediate feature map, our module sequentially infers attention maps along two separate dimensions, channel and spatial, then the attention maps are multiplied to the input feature map for adaptive feature refinement. Because CBAM is a lightweight and general module, it can be integrated into any CNN architecture seamlessly with negligible overheads. Our module is end-to-end trainable along with base CNNs. We validate our CBAM through extensive experiments on ImageNet-1K, MS COCO detection, and VOC 2007 detection datasets. Our experiments show consistent improvements on classification and detection performances with various models, demonstrating the wide applicability of CBAM. The code and models will be publicly available.
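CBAM's two attention steps can be sketched directly from the description above: a shared MLP over average- and max-pooled channel descriptors for channel attention, followed by a convolution over channel-wise average and max maps for spatial attention. Hyperparameters mirror common choices and are assumptions here.

    import torch
    import torch.nn as nn

    class CBAM(nn.Module):
        """Sequential channel and spatial attention refinement."""
        def __init__(self, channels, reduction=16, kernel_size=7):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(channels, channels // reduction), nn.ReLU(),
                nn.Linear(channels // reduction, channels))
            self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

        def forward(self, x):
            b, c, h, w = x.shape
            flat = x.reshape(b, c, -1)
            ca = torch.sigmoid(self.mlp(flat.mean(-1)) + self.mlp(flat.amax(-1)))
            x = x * ca.reshape(b, c, 1, 1)  # channel refinement
            sa = torch.sigmoid(self.conv(torch.cat(
                [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
            return x * sa                   # spatial refinement

    y = CBAM(64)(torch.randn(2, 64, 32, 32))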
AUTHORS
URL
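A compact PyTorch sketch of the module as the abstract describes it, with channel attention applied before spatial attention; the reduction ratio and 7x7 spatial kernel are common defaults and should be treated as assumptions here.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Shared MLP applied to both average- and max-pooled channel descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        return torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)   # channel-wise average map
        mx = x.amax(dim=1, keepdim=True)    # channel-wise max map
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ca, self.sa = ChannelAttention(channels), SpatialAttention()

    def forward(self, x):
        x = x * self.ca(x)     # refine along the channel dimension first, ...
        return x * self.sa(x)  # ... then along the spatial dimension
```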
BAM: Bottleneck Attention Module
J.C. Park et al.BMVC 2018ABSTRACT
Recent advances in deep neural networks have come from architecture search over depth, width, and cardinality. In this work, we focus on the effect of attention in general deep neural networks. We propose a simple and effective attention module, named the Bottleneck Attention Module (BAM), that can be integrated with any feed-forward convolutional neural network. Our module infers an attention map along two separate pathways, channel and spatial. We place the module at each bottleneck of a model, where the downsampling of feature maps occurs, so that it constructs hierarchical attention with a modest number of additional parameters; it is trainable end-to-end jointly with any feed-forward model. We validate BAM through extensive experiments on the CIFAR-100, ImageNet-1K, VOC 2007, and MS COCO benchmarks. Our experiments show consistent improvements in classification and detection performance with various models, demonstrating the wide applicability of BAM. The code and models will be publicly available.
AUTHORS
URL
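For contrast with CBAM above, here is a hedged PyTorch sketch of BAM's parallel design: channel and spatial pathways are computed side by side, summed, squashed with a sigmoid, and applied residually. Layer shapes and hyperparameters below are assumptions.

```python
import torch
import torch.nn as nn

class BAM(nn.Module):
    # Channel (B, C, 1, 1) and spatial (B, 1, H, W) maps broadcast when summed,
    # then refine the input residually: F' = F + F * sigmoid(M_c(F) + M_s(F)).
    def __init__(self, channels, reduction=16, dilation=4):
        super().__init__()
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.Conv2d(channels // reduction, channels // reduction, 3,
                      padding=dilation, dilation=dilation),  # large receptive field
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 1, 1),
        )

    def forward(self, x):
        attn = torch.sigmoid(self.channel(x) + self.spatial(x))
        return x + x * attn

y = BAM(64)(torch.randn(2, 64, 32, 32))  # placed at a network bottleneck
```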
A Robust and Effective Approach Towards Accurate Metastasis Detection and pN-stage Classification in Breast Cancer
B.J. Lee et al.MICCAI 2018ABSTRACT
The TNM stage is the major determinant of breast cancer prognosis and treatment. An essential part of TNM stage classification is whether the cancer has metastasized to the regional lymph nodes (N-stage). Pathologic N-stage (pN-stage) is commonly determined by pathologists detecting metastases in histological slides. However, this diagnostic procedure is prone to misinterpretation and normally requires extensive pathologist time because of the sheer volume of data that needs thorough review. Automated detection of lymph node metastasis and pN-stage prediction therefore has great potential to reduce pathologists' workload. Recent advances in convolutional neural networks (CNNs) have shown significant improvements in histological slide analysis, but accuracy remains limited by the difficulty of handling gigapixel images. In this paper, we propose a robust and effective method for metastasis detection and pN-stage classification in breast cancer from multiple gigapixel pathology images. The pN-stage is predicted by combining a patch-level CNN-based metastasis detector with a slide-level lymph node classifier. The proposed framework achieves a state-of-the-art quadratic weighted kappa score of 0.9203 on the Camelyon17 dataset, outperforming the previous winning method of the Camelyon17 challenge.
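The last step of such a pipeline, mapping per-node slide labels to a patient-level pN-stage, can be sketched as a small rule function. The rules below follow the Camelyon17 challenge's simplified pN-staging as best recalled, so treat them as an illustration and verify against the challenge specification.

```python
def pn_stage(slide_labels):
    """slide_labels: list of per-lymph-node-slide labels, each one of
    'negative', 'itc' (isolated tumor cells), 'micro', or 'macro'."""
    n_macro = slide_labels.count('macro')
    n_micro = slide_labels.count('micro')
    n_itc = slide_labels.count('itc')
    if n_macro > 0:
        positive = n_macro + n_micro        # nodes with micro- or macro-metastases
        return 'pN2' if positive >= 4 else 'pN1'
    if n_micro > 0:
        return 'pN1mi'                      # micro-metastases only
    if n_itc > 0:
        return 'pN0(i+)'                    # isolated tumor cells only
    return 'pN0'

print(pn_stage(['negative', 'micro', 'macro', 'macro', 'macro']))  # -> 'pN2'
```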
Keep and Learn: Continual Learning by Constraining the Latent Space for Knowledge Preservation in Neural Networks
H.E. Kim et al.MICCAI 2018ABSTRACT
Data is one of the most important factors in machine learning. However, even when high-quality data exists, access to it may be restricted; for example, outside access to medical data is strictly limited due to privacy issues. In such cases, a model must be learned sequentially, using only the data accessible at each stage. In this work, we propose a new method for preserving learned knowledge by modeling the high-level feature space and the output space to be mutually informative, and by constraining feature vectors to lie in the modeled space during training. The proposed method is easy to implement, as it amounts to simply adding a reconstruction loss to the objective function. We evaluate the proposed method on CIFAR-10/100 and a chest X-ray dataset, and show benefits in terms of knowledge preservation compared to previous approaches.
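Since the method reduces to adding a reconstruction loss to the objective, a minimal sketch is short. The architecture, loss weight, and use of MSE below are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU())
classifier = nn.Linear(128, 10)
decoder = nn.Linear(128, 784)  # ties the latent space to the input via reconstruction
ce, mse, lam = nn.CrossEntropyLoss(), nn.MSELoss(), 0.1

x = torch.randn(32, 1, 28, 28)
y = torch.randint(0, 10, (32,))
h = encoder(x)
# Task loss plus a reconstruction term that constrains the latent space.
loss = ce(classifier(h), y) + lam * mse(decoder(h), x.flatten(1))
loss.backward()
```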
Accurate Lung Segmentation via Network-Wise Training of Convolutional Networks
S.H. Hwang et al.MICCAI 2017 DLMIA WorkshopABSTRACT
We introduce an accurate lung segmentation model for chest radiographs based on deep convolutional neural networks. Our model uses atrous convolutional layers to increase the field-of-view of the filters efficiently. To further improve segmentation performance, we also propose a multi-stage training strategy, network-wise training, in which the current-stage network is fed both the input images and the outputs of the previous-stage network. This strategy is shown to reduce falsely predicted labels and to produce smooth boundaries of the lung fields. We evaluate the proposed model on a common benchmark dataset, JSRT, and achieve state-of-the-art segmentation performance with far fewer model parameters.
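A minimal sketch of the network-wise idea: the stage-2 network sees the image concatenated with the stage-1 prediction. The tiny dilated-convolution stand-ins below are placeholders for the actual segmentation networks.

```python
import torch
import torch.nn as nn

stage1 = nn.Conv2d(1, 1, 3, padding=2, dilation=2)  # stand-in for the stage-1 segmenter
stage2 = nn.Conv2d(2, 1, 3, padding=1)              # takes image + stage-1 output

image = torch.randn(4, 1, 256, 256)
with torch.no_grad():
    prior = torch.sigmoid(stage1(image))             # stage-1 lung probability map
# Stage 2 refines: it can suppress false labels and smooth the lung boundaries.
refined = torch.sigmoid(stage2(torch.cat([image, prior], dim=1)))
```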
A Unified Framework for Tumor Proliferation Score Prediction in Breast Histopathology
K.H. Paeng et al.MICCAI 2017 DLMIA WorkshopABSTRACT
The tumor proliferation score is an important biomarker indicative of breast cancer patients' prognosis. In this paper, we present a unified framework for predicting tumor proliferation scores from whole slide images in breast histopathology. The proposed system offers a fully automated solution for predicting both a molecular-data-based and a mitosis-counting-based tumor proliferation score. The framework integrates three modules, each fine-tuned to maximize overall performance: an image processing component for handling whole slide images, a deep-learning-based mitosis detection network, and a proliferation score prediction module. We achieved a quadratic weighted Cohen's kappa of 0.567 for mitosis-counting-based score prediction and an F1-score of 0.652 for mitosis detection; on Spearman's correlation coefficient, which evaluates prediction of the molecular-data-based score, the system obtained 0.6171. Our system won first place in all three tasks of the Tumor Proliferation Assessment Challenge at MICCAI 2016, outperforming all other approaches.
AUTHORS
URL
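The challenge metric quoted above, quadratic weighted Cohen's kappa, is worth seeing computed from scratch; scikit-learn's cohen_kappa_score with weights='quadratic' should agree with this implementation.

```python
import numpy as np

def quadratic_weighted_kappa(a, b, n_classes):
    a, b = np.asarray(a), np.asarray(b)
    # Observed agreement matrix and expected matrix under independence.
    obs = np.zeros((n_classes, n_classes))
    for i, j in zip(a, b):
        obs[i, j] += 1
    exp = np.outer(np.bincount(a, minlength=n_classes),
                   np.bincount(b, minlength=n_classes)) / len(a)
    # Quadratic disagreement weights: (i - j)^2 normalized by the max distance.
    w = np.fromfunction(lambda i, j: (i - j) ** 2 / (n_classes - 1) ** 2,
                        (n_classes, n_classes))
    return 1 - (w * obs).sum() / (w * exp).sum()

print(quadratic_weighted_kappa([0, 1, 2, 2], [0, 2, 2, 2], 3))  # ~0.833
```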
Transferring Knowledge to Smaller Network With Class-Distance Loss
S.W. Kim et al.ICLR 2017 WorkshopABSTRACT
Training a small-capacity network that performs as well as a larger-capacity network is an important problem for real-life applications that require fast inference and a small memory footprint. Previous approaches that transfer knowledge from a bigger network to a smaller one show little benefit when applied to state-of-the-art convolutional neural network architectures such as Residual Networks trained with batch normalization. We propose a class-distance loss that helps the teacher network form a densely clustered vector space, making it easier for the student network to learn from it. We show that a small network with half the size of the original network, trained with the proposed strategy, can perform close to the original network on the CIFAR-10 dataset.
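The abstract does not give the loss's exact form, so the following is a hypothetical reading of "class-distance" as an intra-class clustering penalty on the teacher's feature vectors; it illustrates the mechanism, not the paper's formula.

```python
import torch

def class_distance_loss(features, labels):
    """Pull features of the same class toward their class centroid."""
    loss = features.new_zeros(())
    for c in labels.unique():
        cluster = features[labels == c]
        loss = loss + ((cluster - cluster.mean(dim=0)) ** 2).sum(dim=1).mean()
    return loss / labels.unique().numel()

feats = torch.randn(32, 64, requires_grad=True)  # teacher penultimate features
labels = torch.randint(0, 10, (32,))
class_distance_loss(feats, labels).backward()
```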
Semantic Noise Modeling for Better Representation Learning
H.E. Kim et al.arXivABSTRACT
The latent representations learned by multi-layered neural networks via hierarchical feature abstraction underpin the recent success of deep learning. In the deep learning framework, generalization performance depends heavily on the learned latent representation, which is obtained from an appropriate training scenario with a task-specific objective on a suitably designed network. In this work, we propose a novel latent space modeling method to learn better latent representations. We design a neural network model based on the assumption that a good base representation can be attained by maximizing the total correlation between the input, latent, and output variables. From this base model, we introduce a semantic noise modeling method that enables class-conditional perturbation of the latent space to enhance the representational power of the learned latent features. During training, a latent vector representation can be stochastically perturbed by a modeled class-conditional additive noise while maintaining its original semantics, which implicitly provides semantic augmentation in the latent space. The proposed model can be easily learned by back-propagation with common gradient-based optimization algorithms. Experimental results show that the proposed method yields performance benefits over various previous approaches. We also provide empirical analyses of the proposed class-conditional perturbation process, including t-SNE visualizations.
AUTHORS
URL
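A hypothetical sketch of class-conditional latent perturbation: each class gets its own learnable noise scale, and latent vectors are perturbed additively during training. The paper's actual noise model differs in detail; this only illustrates the mechanism.

```python
import torch
import torch.nn as nn

n_classes, dim = 10, 128
class_log_std = nn.Parameter(torch.zeros(n_classes, dim))  # learned noise scale per class

z = torch.randn(32, dim)                     # latent vectors from an encoder
y = torch.randint(0, n_classes, (32,))
noise = torch.randn_like(z) * class_log_std[y].exp()
z_perturbed = z + noise                      # semantic augmentation in latent space
```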
Self-Transfer Learning for Fully Weakly Supervised Object Localization
S.H. Hwang et al.MICCAI 2016ABSTRACT
Recent advances in deep learning have achieved remarkable performance on various challenging computer vision tasks. In object localization especially, deep convolutional neural networks outperform traditional approaches by extracting data- and task-driven features instead of hand-crafted features. Although location information for regions of interest (ROIs) provides a good prior for object localization, it requires heavy annotation effort, which motivates weakly supervised frameworks for object localization. The term "weakly" means that the framework trains a network using only image-level labeled datasets. With the help of transfer learning, which adopts the weight parameters of a pre-trained network, weakly supervised localization performs well because the pre-trained network already provides well-trained, class-specific features. However, such approaches cannot be used in applications for which pre-trained networks or large-scale, well-localized images are unavailable; medical image analysis is a representative example, because suitable pre-trained networks are practically impossible to obtain. In this work, we present a "fully" weakly supervised framework for object localization ("semi"-weakly is the counterpart that uses pre-trained filters), named self-transfer learning (STL). It jointly optimizes both classification and localization networks simultaneously. By controlling the supervision level of the localization network, STL helps the localization network focus on correct ROIs without any type of prior. We evaluate the proposed STL framework on two medical image datasets, chest X-rays and mammograms, and achieve significantly better localization performance than previous weakly supervised approaches.
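A hedged sketch of the joint objective: a shared backbone feeds a classifier head and a localization head, and a weight alpha controls the supervision level of the localization branch. Shapes, the max-pooling readout, and the value of alpha are assumptions.

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU())
classifier = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2))
localizer = nn.Conv2d(8, 2, 1)                # per-class activation maps
ce = nn.CrossEntropyLoss()

x = torch.randn(4, 1, 64, 64)
y = torch.randint(0, 2, (4,))                 # image-level labels only
feat = backbone(x)
cls_logits = classifier(feat)
loc_logits = localizer(feat).amax(dim=(2, 3))  # image-level score via max over the map
alpha = 0.7                                    # supervision level of the localizer
loss = (1 - alpha) * ce(cls_logits, y) + alpha * ce(loc_logits, y)
loss.backward()
```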
Pixel-Level Domain Transfer
D.G. Yoo et al.ECCV 2016ABSTRACT
We present an image-conditional image generation model. The model transfers an input domain to a target domain at the semantic level and generates the target image at the pixel level. To generate realistic target images, we employ the real/fake discriminator from Generative Adversarial Nets, and additionally introduce a novel domain discriminator to make the generated image relevant to the input image. We verify our model on the challenging task of generating a piece of clothing from an input image of a dressed person. We present a high-quality clothing dataset covering the two domains and demonstrate decent results.
AUTHORS
URL
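A minimal sketch of the two-discriminator objective: the generator must fool a real/fake discriminator on realism and a domain discriminator on source-target relevance. All networks below are single-layer placeholders.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))        # source -> target generator
D_real = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1))   # real vs. fake target image
D_domain = nn.Sequential(nn.Conv2d(6, 1, 3, padding=1)) # is (source, target) a valid pair?
bce = nn.BCEWithLogitsLoss()

src = torch.randn(4, 3, 64, 64)
fake = G(src)
ones = torch.ones(4, 1, 64, 64)
# Generator step: both discriminators should judge the output "real"/"associated".
g_loss = bce(D_real(fake), ones) + bce(D_domain(torch.cat([src, fake], dim=1)), ones)
g_loss.backward()
```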
A novel approach for tuberculosis screening based on deep convolutional neural networks
S.H. Hwang et al.SPIE Medical Imaging 2016ABSTRACT
We propose an automatic tuberculosis (TB) screening system based on a deep CNN. Since a CNN extracts the most discriminative features for the target objective from the given data by itself, the proposed system does not require manually designed features for TB screening. We also show that transfer learning from the lower convolutional layers of pre-trained networks resolves the difficulties of handling high-resolution medical images and of training a huge number of parameters with a limited number of images. Experiments are conducted on three real field datasets, the KIT, MC, and Shenzhen sets, and the results show that the proposed system achieves high screening performance in terms of AUC and accuracy.
AUTHORS
URL
https://spie.org/Publications/Proceedings/Paper/10.1117/12.2216198?SSO=1
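The transfer-learning recipe above, reusing and freezing lower convolutional layers of a pretrained network, looks roughly like the following in PyTorch; the ResNet-18 backbone and the exact freeze point are assumptions, not the paper's configuration.

```python
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(weights="IMAGENET1K_V1")     # ImageNet-pretrained backbone
for name, param in model.named_parameters():
    if name.startswith(("conv1", "bn1", "layer1", "layer2")):
        param.requires_grad = False            # keep low-level filters fixed
model.fc = nn.Linear(model.fc.in_features, 2)  # new head: TB vs. normal
```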
Deconvolutional Feature Stacking for Weakly-Supervised Semantic Segmentation
H.E. Kim et al.arXivABSTRACT
A weakly-supervised semantic segmentation framework with a tied deconvolutional neural network is presented. Each deconvolution layer in the framework consists of unpooling and deconvolution operations. 'Unpooling' upsamples the input feature map based on unpooling switches defined by the corresponding convolution layer's pooling operation. 'Deconvolution' convolves the unpooled features using convolutional weights tied to the corresponding convolution layer's weights. The unpooling-deconvolution combination helps to eliminate less discriminative features in the feature extraction stage, since the output features of a deconvolution layer are reconstructed from the most discriminative unpooled features rather than the raw ones. This reduces false positives in the pixel-level inference stage. The feature maps restored across all deconvolution layers constitute a rich, discriminative feature set spanning different abstraction levels; these features are stacked and selectively used to generate class-specific activation maps. Under weak supervision (image-level labels), the proposed framework shows promising results on lesion segmentation in medical images (chest X-rays) and achieves state-of-the-art performance on the PASCAL VOC segmentation dataset under the same experimental conditions.
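One tied unpooling-deconvolution step can be written directly with PyTorch's functional API: unpooling reuses the pooling switches from the forward pass, and the deconvolution reuses (ties) the convolution's own weights via conv_transpose2d. A minimal sketch:

```python
import torch
import torch.nn.functional as F

conv_w = torch.randn(8, 3, 3, 3)                 # weights shared by conv and deconv
x = torch.randn(1, 3, 32, 32)
feat = F.conv2d(x, conv_w, padding=1)
pooled, switches = F.max_pool2d(feat, 2, return_indices=True)

unpooled = F.max_unpool2d(pooled, switches, 2)   # restore only the max locations
recon = F.conv_transpose2d(unpooled, conv_w, padding=1)  # tied deconvolution
print(recon.shape)  # torch.Size([1, 3, 32, 32])
```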
Deep Convolutional Neural Network-based Mitosis Detection in Invasive Carcinoma of Breast by Smartphone-based Histologic Image Acquisition
S.H. Kim et al.USCAP 2016ABSTRACT
Mitosis counting is time-consuming, labor-intensive work that frequently shows inter-observer variability. Although deep convolutional neural networks, the most accurate image classification algorithms, have been used for detecting mitoses, they have been tested only on public datasets and never applied to routine histologic slide images. Recently, smartphone cameras with microscope adaptors have been tried for easier image acquisition, significantly lowering the barrier to applying computer algorithms to histologic images. Histologic slides of 70 invasive ductal carcinomas of the breast were selected, and 1761 high-power-field histologic images (400x) were acquired using a smartphone application with a microscope adaptor manufactured by us. Mitoses were annotated blindly by four pathologists; concordance among at least three pathologists was regarded as ground truth. 2004 mitotic cells and 801,600 non-mitotic cells from 60 cases were divided into 10 sets, and the algorithm was trained sequentially by fine-tuning. After training, images from the remaining ten patients were tested for concordance of detection with the pathologists. During training, sensitivity for mitosis detection ranged from 75% to 83%, and specificity increased to 97% as the algorithm was trained with more images. The trained algorithm identified 189 mitoses in 748 images from the 10 test cases, showing 79% sensitivity and 96% specificity for detecting mitoses compared with the pathologists. Detected mitoses were displayed in the application within 14 seconds on average. The proposed deep convolutional neural network-based mitosis detection system showed remarkable sensitivity and specificity, and its performance improved as more images were used for training. Together with the smartphone application and the adaptor we manufactured, it can assist pathologists in identifying mitoses, reducing time and labor costs while supporting objective diagnosis.
AUTHORS
URL
No Link
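The sequential training over 10 sets described above amounts to repeated fine-tuning of one model. A toy sketch, with a placeholder classifier and random stand-in data:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 2))  # stand-in mitosis classifier
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

subsets = [(torch.randn(64, 1, 32, 32), torch.randint(0, 2, (64,)))
           for _ in range(10)]        # the 10 sequential training sets
for x, y in subsets:                  # each round warm-starts from the previous weights
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
```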
AttentionNet: Aggregating Weak Directions for Accurate Object Detection
D.G. Yoo et al.ICCV 2015ABSTRACT
We present a novel detection method using a deep convolutional neural network (CNN), named AttentionNet. We cast object detection as an iterative classification problem, the form most suited to a CNN. AttentionNet predicts quantized weak directions pointing toward a target object, and the ensemble of iterative predictions from AttentionNet converges to an accurate object bounding box. Since AttentionNet is a unified network for object detection, it detects objects without any separate models for object proposal or post-hoc bounding-box regression. We evaluate AttentionNet on a human detection task and achieve state-of-the-art performance of 65% AP on PASCAL VOC 2007/2012 with only an 8-layer architecture.
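A hypothetical sketch of the iterative detection loop: at each step the network votes a quantized direction (or "stop") for the top-left and bottom-right corners, and the box is nudged until both corners vote stop. The untrained stand-in network, step size, and class layout below are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

net = nn.Linear(3 * 32 * 32, 10)   # logits: 5 classes x 2 corners (class 0 = "stop")
STEP, STOP = 4, 0
MOVES = {1: (-STEP, 0), 2: (STEP, 0), 3: (0, -STEP), 4: (0, STEP)}

image = torch.rand(3, 128, 128)
box = [8, 8, 120, 120]              # x1, y1, x2, y2
for _ in range(10):                 # iterate until both corners vote "stop"
    x1, y1, x2, y2 = box
    patch = F.interpolate(image[None, :, y1:y2, x1:x2], size=(32, 32))
    tl, br = net(patch.flatten(1)).view(2, 5).argmax(dim=1).tolist()
    if tl == STOP and br == STOP:
        break
    dx1, dy1 = MOVES.get(tl, (0, 0))
    dx2, dy2 = MOVES.get(br, (0, 0))
    box = [min(128, max(0, v)) for v in (x1 + dx1, y1 + dy1, x2 + dx2, y2 + dy2)]
```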