Learning Loss for Active Learning


More annotated data improves the performance of deep neural networks. The problem is the limited budget for annotation. One solution is active learning, where a model asks a human to annotate the data it perceives as uncertain. A variety of recent methods have been proposed to apply active learning to deep networks, but most of them are either designed specifically for their target tasks or computationally inefficient for large networks. In this paper, we propose a novel active learning method that is simple but task-agnostic, and works efficiently with deep networks. We attach a small parametric module, named the ``loss prediction module,'' to a target network, and train it to predict the target losses of unlabeled inputs. This module can then suggest data on which the target model is likely to produce a wrong prediction. The method is task-agnostic since networks are learned from a single loss regardless of the target task. We rigorously validate our method on image classification, object detection, and human pose estimation with recent network architectures. The results demonstrate that our method consistently outperforms previous methods across the tasks.
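
The acquisition step described above can be sketched in a few lines: once a loss prediction module has been trained, the unlabeled samples with the highest predicted loss are the ones sent for annotation. This is only an illustrative sketch; the function name and toy values are assumptions, not the paper's code.

```python
# Hypothetical sketch of the acquisition step: rank unlabeled samples by
# predicted loss and query the top-K within the annotation budget.

def select_for_annotation(predicted_losses, budget):
    """Return indices of the `budget` samples with the highest predicted
    loss (i.e., the samples the target model is most likely to get wrong)."""
    ranked = sorted(range(len(predicted_losses)),
                    key=lambda i: predicted_losses[i],
                    reverse=True)
    return ranked[:budget]

# Toy usage: sample 2 has the largest predicted loss, so it is queried first.
losses = [0.1, 0.5, 2.3, 0.9]
print(select_for_annotation(losses, 2))  # -> [2, 3]
```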

Development and Validation of a Deep Learning–Based Automated Detection Algorithm for Major Thoracic Diseases on Chest Radiographs


Interpretation of chest radiographs is a challenging task prone to errors and requiring expert readers. An automated system that can accurately classify chest radiographs may help streamline the clinical workflow. The objectives were to develop a deep learning–based algorithm that can classify normal and abnormal results from chest radiographs with major thoracic diseases, including pulmonary malignant neoplasm, active tuberculosis, pneumonia, and pneumothorax, and to validate the algorithm’s performance using independent data sets. This diagnostic study developed a deep learning–based algorithm using single-center data collected between November 1, 2016, and January 31, 2017. The algorithm was externally validated with multicenter data collected between May 1 and July 31, 2018. A total of 54 221 chest radiographs with normal findings from 47 917 individuals (21 556 men and 26 361 women; mean [SD] age, 51 [16] years) and 35 613 chest radiographs with abnormal findings from 14 102 individuals (8373 men and 5729 women; mean [SD] age, 62 [15] years) were used to develop the algorithm. A total of 486 chest radiographs with normal results and 529 with abnormal results (1 from each participant; 628 men and 387 women; mean [SD] age, 53 [18] years) from 5 institutions were used for external validation. Fifteen physicians, including nonradiology physicians, board-certified radiologists, and thoracic radiologists, participated in observer performance testing. Data were analyzed in August 2018. Image-wise classification performance was measured by area under the receiver operating characteristic curve; lesion-wise localization performance was measured by area under the alternative free-response receiver operating characteristic curve.
The algorithm demonstrated a median (range) area under the curve of 0.979 (0.973-1.000) for image-wise classification and 0.972 (0.923-0.985) for lesion-wise localization; the algorithm demonstrated significantly higher performance than all 3 physician groups in both image-wise classification (0.983 vs 0.814-0.932; all P < .005) and lesion-wise localization (0.985 vs 0.781-0.907; all P < .001). Significant improvements in both image-wise classification (0.814-0.932 to 0.904-0.958; all P < .005) and lesion-wise localization (0.781-0.907 to 0.873-0.938; all P < .001) were observed in all 3 physician groups with assistance of the algorithm. The algorithm consistently outperformed physicians, including thoracic radiologists, in the discrimination of chest radiographs with major thoracic diseases, demonstrating its potential to improve the quality and efficiency of clinical practice.

@article{hwang2019development, title={Development and Validation of a Deep Learning--Based Automated Detection Algorithm for Major Thoracic Diseases on Chest Radiographs}, author={Hwang, Eui Jin and Park, Sunggyun and Jin, Kwang-Nam and Im Kim, Jung and Choi, So Young and Lee, Jong Hyuk and Goo, Jin Mo and Aum, Jaehong and Yim, Jae-Joon and Cohen, Julien G and others}, journal={JAMA network open}, volume={2}, number={3}, pages={e191095--e191095}, year={2019}, publisher={American Medical Association} }

Development and Validation of a Deep Learning-Based Automatic Detection Algorithm for Active Pulmonary Tuberculosis on Chest Radiographs


Detection of active pulmonary tuberculosis (TB) on chest radiographs (CR) is critical for the diagnosis and screening of TB. An automated system may help streamline the TB screening process and improve diagnostic performance. We developed a deep-learning-based automatic detection (DLAD) algorithm, using 54,221 normal CRs and 6,768 CRs with active pulmonary TB, which were labeled and annotated by 13 board-certified radiologists. The performance of DLAD was validated using six external multi-center, multi-national datasets. To compare the performances of DLAD with physicians, an observer performance test was conducted by 15 physicians including non-radiology physicians, board-certified radiologists, and thoracic radiologists. Image-wise classification and lesion-wise localization performances were measured using area under the receiver operating characteristic (ROC) curves, and area under the alternative free-response ROC curves, respectively. Sensitivities and specificities of DLAD were calculated using two cutoffs [high sensitivity (98%) and high specificity (98%)] obtained through in-house validation. DLAD demonstrated classification performances of 0.977–1.000 and localization performance of 0.973–1.000. Sensitivities and specificities for classification were 94.3–100% and 91.1–100% using the high sensitivity cutoff and 84.1–99.0% and 99.1–100% using the high specificity cutoff. DLAD showed significantly higher performance in both classification (0.993 vs. 0.746–0.971) and localization (0.993 vs. 0.664–0.925) compared to all groups of physicians. Our DLAD demonstrated excellent and consistent performance in the detection of active pulmonary TB on CR, outperforming physicians including thoracic radiologists.

@article{doi:10.1093/cid/ciy967, author = {Hwang, Eui Jin and Park, Sunggyun and Jin, Kwang-Nam and Kim, Jung Im and Choi, So Young and Lee, Jong Hyuk and Goo, Jin Mo and Aum, Jaehong and Yim, Jae-Joon and Park, Chang Min and DLAD Development and Evaluation Group}, title = {Development and Validation of a Deep Learning-Based Automatic Detection Algorithm for Active Pulmonary Tuberculosis on Chest Radiographs}, journal = {Clinical Infectious Diseases}, pages = {ciy967}, year = {2018}, doi = {10.1093/cid/ciy967} }

Development and Validation of Deep Learning–based Automatic Detection Algorithm for Malignant Pulmonary Nodules on Chest Radiographs


The purpose of this study was to develop and validate a deep learning–based automatic detection algorithm (DLAD) for malignant pulmonary nodules on chest radiographs and to compare its performance with physicians, including thoracic radiologists. For this retrospective study, DLAD was developed by using 43 292 chest radiographs (normal radiograph–to–nodule radiograph ratio, 34 067:9225) in 34 676 patients (healthy-to-nodule ratio, 30 784:3892; 19 230 men [mean age, 52.8 years; age range, 18–99 years]; 15 446 women [mean age, 52.3 years; age range, 18–98 years]) obtained between 2010 and 2015, which were labeled and partially annotated by 13 board-certified radiologists, in a convolutional neural network. Radiograph classification and nodule detection performances of DLAD were validated by using one internal and four external data sets from three South Korean hospitals and one U.S. hospital. For internal and external validation, radiograph classification and nodule detection performances of DLAD were evaluated by using the area under the receiver operating characteristic curve (AUROC) and the jackknife alternative free-response receiver operating characteristic (JAFROC) figure of merit (FOM), respectively. An observer performance test involving 18 physicians, including nine board-certified radiologists, was conducted by using one of the four external validation data sets. Performances of DLAD, physicians, and physicians assisted with DLAD were evaluated and compared. Across the one internal and four external validation data sets, radiograph classification and nodule detection performances of DLAD ranged over 0.92–0.99 (AUROC) and 0.831–0.924 (JAFROC FOM), respectively. DLAD showed a higher AUROC and JAFROC FOM in the observer performance test than 17 of 18 and 15 of 18 physicians, respectively (P < .05), and all physicians showed improved nodule detection performance with DLAD (mean JAFROC FOM improvement, 0.043; range, 0.006–0.190; P < .05).
This deep learning–based automatic detection algorithm outperformed physicians in radiograph classification and nodule detection performance for malignant pulmonary nodules on chest radiographs, and it enhanced physicians’ performances when used as a second reader.

@article{nam2018development, title={Development and Validation of Deep Learning--based Automatic Detection Algorithm for Malignant Pulmonary Nodules on Chest Radiographs}, author={Nam, Ju Gang and Park, Sunggyun and Hwang, Eui Jin and Lee, Jong Hyuk and Jin, Kwang-Nam and Lim, Kun Young and Vu, Thienkai Huy and Sohn, Jae Ho and Hwang, Sangheum and Goo, Jin Mo and others}, journal={Radiology}, pages={180237}, year={2018}, publisher={Radiological Society of North America} }

CBAM: Convolutional Block Attention Module


We propose the Convolutional Block Attention Module (CBAM), a simple yet effective attention module that can be integrated with any feed-forward convolutional neural network. Given an intermediate feature map, our module sequentially infers attention maps along two separate dimensions, channel and spatial, and the attention maps are then multiplied with the input feature map for adaptive feature refinement. Because CBAM is a lightweight and general module, it can be integrated into any CNN architecture seamlessly with negligible overhead. The module is end-to-end trainable along with the base CNN. We validate CBAM through extensive experiments on the ImageNet-1K, MS COCO detection, and VOC 2007 detection datasets. Our experiments show consistent improvements in classification and detection performance with various models, demonstrating the wide applicability of CBAM. The code and models will be publicly available.
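
The sequential channel-then-spatial refinement can be sketched without any framework. This is a much-simplified illustration on a C x H x W feature map stored as nested lists: the channel branch's shared MLP and the spatial branch's 7x7 convolution are omitted, with sigmoid(avg + max) standing in for both, so only the overall structure matches the module described above.

```python
import math

# Simplified, dependency-free sketch of CBAM's two sequential attention steps.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(x):
    # One scalar gate per channel from its average- and max-pooled descriptors.
    gates = []
    for ch in x:
        flat = [v for row in ch for v in row]
        gates.append(sigmoid(sum(flat) / len(flat) + max(flat)))
    return [[[v * g for v in row] for row in ch] for ch, g in zip(x, gates)]

def spatial_attention(x):
    # One scalar gate per spatial position from cross-channel avg and max.
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for i in range(H):
        for j in range(W):
            col = [x[c][i][j] for c in range(C)]
            g = sigmoid(sum(col) / C + max(col))
            for c in range(C):
                out[c][i][j] = x[c][i][j] * g
    return out

def cbam(x):
    # Channel attention first, then spatial attention, applied sequentially.
    return spatial_attention(channel_attention(x))

feature_map = [[[1.0, -1.0], [0.5, 0.0]],   # channel 0
               [[0.2, 0.3], [-0.4, 0.1]]]   # channel 1
refined = cbam(feature_map)
```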

@inproceedings{woo2018cbam, title={CBAM: Convolutional Block Attention Module}, author={Woo, Sanghyun and Park, Jongchan and Lee, Joon-Young and So Kweon, In}, booktitle={Proceedings of the European Conference on Computer Vision (ECCV)}, pages={3--19}, year={2018} }

BAM: Bottleneck Attention Module


Recent advances in deep neural networks have been driven by architecture search over depth, width, and cardinality. In this work, we focus on the effect of attention in general deep neural networks. We propose a simple and effective attention module, named the Bottleneck Attention Module (BAM), that can be integrated with any feed-forward convolutional neural network. Our module infers an attention map along two separate pathways, channel and spatial. We place the module at each bottleneck of a model, where downsampling of feature maps occurs. The module thus constructs hierarchical attention at the bottlenecks with a small number of additional parameters, and is trainable end-to-end jointly with any feed-forward model. We validate BAM through extensive experiments on the CIFAR-100, ImageNet-1K, VOC 2007, and MS COCO benchmarks. Our experiments show consistent improvements in classification and detection performance with various models, demonstrating the wide applicability of BAM. The code and models will be publicly available.
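
Structurally, BAM differs from a sequential design in that its channel and spatial attention maps are combined by broadcast addition and a single sigmoid, then applied residually as x * (1 + sigmoid(A_c + A_s)). The dependency-free sketch below shows that combine-then-gate structure only; the dilated convolutions and bottleneck MLP of the real module are replaced with plain pooling, as an illustrative assumption.

```python
import math

# Dependency-free sketch of BAM's parallel-branch structure on a C x H x W map.

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def bam(x):
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # Channel branch: one logit per channel (global average pooling).
    ch = [sum(v for row in x[c] for v in row) / (H * W) for c in range(C)]
    # Spatial branch: one logit per position (cross-channel average).
    sp = [[sum(x[c][i][j] for c in range(C)) / C for j in range(W)]
          for i in range(H)]
    # Broadcast-add the two maps, squash, and apply with a residual connection.
    return [[[x[c][i][j] * (1.0 + sigmoid(ch[c] + sp[i][j]))
              for j in range(W)] for i in range(H)] for c in range(C)]

x = [[[1.0, 0.0], [0.0, 1.0]]]  # 1 x 2 x 2 toy feature map
y = bam(x)
```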

@inproceedings{park2018bam, title={BAM: bottleneck attention module}, author={Park, Jongchan and Woo, Sanghyun and Lee, Joon-Young and Kweon, In So}, booktitle={Proceedings of the British Machine Vision Conference (BMVC)}, year={2018} }

Distort-and-Recover: Color Enhancement using Deep Reinforcement Learning


Learning-based color enhancement approaches typically learn a mapping from input images to retouched images. Most existing methods require expensive pairs of input and retouched images, or produce results in a non-interpretable way. In this paper, we present a deep reinforcement learning (DRL) based method for color enhancement that explicitly models the step-wise nature of the human retouching process. We cast the color enhancement process as a Markov Decision Process in which actions are defined as global color adjustment operations, and train our agent to learn the optimal sequence of enhancement actions. In addition, we present a ‘distort-and-recover’ training scheme that requires only high-quality reference images for training, instead of input and retouched image pairs. Given high-quality reference images, we distort the images’ color distribution and form distorted–reference image pairs for training. Through extensive experiments, we show that our method produces decent enhancement results and that our DRL approach is better suited to the ‘distort-and-recover’ training scheme than previous supervised approaches. Authors: Jongchan Park (Lunit), Joon-Young Lee (Adobe Research), Donggeun Yoo (Lunit), and In So Kweon (KAIST)
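
The 'distort-and-recover' idea can be sketched as a toy: training pairs are manufactured by distorting a high-quality reference with global color actions, and an agent then recovers the reference step by step. In the sketch below a greedy search over a tiny action set stands in for the learned DRL policy, and the two brightness actions and pixel representation are illustrative assumptions only.

```python
import random

# Toy 'distort-and-recover': global actions distort a reference; a greedy
# stand-in for the agent undoes them step by step.

ACTIONS = {
    "brighten": lambda img: [min(1.0, p + 0.1) for p in img],
    "darken":   lambda img: [max(0.0, p - 0.1) for p in img],
}

def distort(reference, steps, rng):
    img = reference
    for _ in range(steps):
        img = ACTIONS[rng.choice(list(ACTIONS))](img)
    return img

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def recover(img, reference, max_steps=20):
    # Greedily apply whichever global action most reduces the error.
    for _ in range(max_steps):
        best = min(ACTIONS.values(), key=lambda act: mse(act(img), reference))
        if mse(best(img), reference) >= mse(img, reference):
            break  # no action improves: stop, like the agent's 'done' action
        img = best(img)
    return img

rng = random.Random(0)
ref = [0.3, 0.5, 0.6]               # a tiny grayscale "image"
distorted = distort(ref, steps=3, rng=rng)
recovered = recover(distorted, ref)
```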

@InProceedings{Park_2018_CVPR, author = {Park, Jongchan and Lee, Joon-Young and Yoo, Donggeun and So Kweon, In}, title = {Distort-and-Recover: Color Enhancement Using Deep Reinforcement Learning}, booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, month = {June}, year = {2018} }

A Robust and Effective Approach Towards Accurate Metastasis Detection and pN-stage Classification in Breast Cancer


The TNM stage is the major determinant of breast cancer prognosis and treatment. An essential part of TNM stage classification is whether the cancer has metastasized to the regional lymph nodes (N-stage). Pathologic N-stage (pN-stage) is commonly determined by pathologists detecting metastasis in histological slides. However, this diagnostic procedure is prone to misinterpretation and normally requires extensive time from pathologists because of the sheer volume of data that needs thorough review. Automated detection of lymph node metastasis and pN-stage prediction has great potential to reduce this workload and assist pathologists. Recent advances in convolutional neural networks (CNNs) have brought significant improvements in histological slide analysis, but accuracy remains limited by the difficulty of handling gigapixel images. In this paper, we propose a robust and effective method for metastasis detection and pN-stage classification in breast cancer from multiple gigapixel pathology images. The pN-stage is predicted by combining a patch-level CNN-based metastasis detector with a slide-level lymph node classifier. The proposed framework achieves a state-of-the-art quadratic weighted kappa score of 0.9203 on the Camelyon17 dataset, outperforming the previous winning method of the Camelyon17 challenge.

@InProceedings{lee2018robust, title={A Robust and Effective Approach Towards Accurate Metastasis Detection and pN-stage Classification in Breast Cancer}, author={Lee, Byungjae and Paeng, Kyunghyun}, booktitle={The International Conference On Medical Image Computing & Computer Assisted Intervention (MICCAI)}, year={2018} }

Keep and Learn: Continual Learning by Constraining the Latent Space for Knowledge Preservation in Neural Networks


Data is one of the most important factors in machine learning. However, even when high-quality data exists, access to it may be restricted. For example, outside access to medical data is strictly limited due to privacy issues. In such cases, a model must be learned sequentially, using only the data accessible at each stage. In this work, we propose a new method for preserving learned knowledge by modeling the high-level feature space and the output space to be mutually informative, and by constraining feature vectors to lie in the modeled space during training. The proposed method is easy to implement, as it amounts to simply adding a reconstruction loss to the objective function. We evaluate the proposed method on CIFAR-10/100 and a chest X-ray dataset, and show benefits in terms of knowledge preservation compared to previous approaches.

@InProceedings{kim2018keep, title={Keep and Learn: Continual Learning by Constraining the Latent Space for Knowledge Preservation in Neural Networks}, author={Kim, Hyo-Eun and Kim, Seungwook and Lee, Jaehwan}, booktitle={The International Conference On Medical Image Computing & Computer Assisted Intervention (MICCAI)}, year={2018} }

Batch-Instance Normalization for Adaptively Style-Invariant Neural Networks


Real-world image recognition is often challenged by the variability of visual styles, including object textures, lighting conditions, and filter effects. Although these variations have been assumed to be handled implicitly by more training data and deeper networks, recent advances in image style transfer suggest that it is also possible to manipulate the style information explicitly. Extending this idea to general visual recognition problems, we present Batch-Instance Normalization (BIN) to explicitly normalize unnecessary styles out of images. Considering that certain style features play an essential role in discriminative tasks, BIN learns to selectively normalize only disturbing styles while preserving useful ones. The proposed normalization module is easily incorporated into existing network architectures such as Residual Networks, and surprisingly improves recognition performance in various scenarios. Furthermore, experiments verify that BIN effectively adapts to completely different tasks, such as object classification and style transfer, by controlling the trade-off between preserving and removing style variations.
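
The gating idea above can be sketched in plain Python: for each channel, a learnable rho in [0, 1] blends batch-normalized features (statistics shared across the batch, so style is kept) with instance-normalized ones (per-sample statistics, so style is removed). The single-channel N x L inputs and fixed gamma/beta below are simplifying assumptions to keep the sketch short.

```python
import math

# Illustrative, dependency-free sketch of the Batch-Instance Normalization gate.

def stats(values):
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)

def normalize(values, mean, var, eps=1e-5):
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def batch_instance_norm(x, rho, gamma=1.0, beta=0.0):
    # Batch norm: one mean/var over every element of the mini-batch.
    bm, bv = stats([v for sample in x for v in sample])
    out = []
    for sample in x:
        x_bn = normalize(sample, bm, bv)
        # Instance norm: mean/var of this sample alone.
        im, iv = stats(sample)
        x_in = normalize(sample, im, iv)
        out.append([gamma * (rho * b + (1 - rho) * i) + beta
                    for b, i in zip(x_bn, x_in)])
    return out

x = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
pure_in = batch_instance_norm(x, rho=0.0)  # style fully normalized away
```

With rho = 0 the two samples, which differ only in scale and offset ("style"), collapse to the same normalized shape; rho = 1 recovers plain batch normalization.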

@inproceedings{nam2018batch, title={Batch-Instance Normalization for Adaptively Style-Invariant Neural Networks}, author={Nam, Hyeonseob and Kim, Hyo-Eun}, booktitle={Advances in neural information processing systems (NIPS)}, year={2018} }

Applying Data-driven Imaging Biomarker in Mammography for Breast Cancer Screening: Preliminary Study


We assessed the feasibility of a data-driven imaging biomarker based on weakly supervised learning (DIB; an imaging biomarker derived from large-scale medical image data with deep learning technology) in mammography (DIB-MG). A total of 29,107 digital mammograms from five institutions (4,339 cancer cases and 24,768 normal cases) were included. After matching patients’ age, breast density, and equipment, 1,238 cases each were chosen as the validation and test sets, and the remainder were used for training. The core algorithm of DIB-MG is a deep convolutional neural network, a deep learning algorithm specialized for images. Each sample (case) is an exam composed of four view images (RCC, RMLO, LCC, and LMLO). For each case in the training set, the cancer probability inferred by DIB-MG is compared with the per-case ground-truth label, and the model parameters of DIB-MG are updated based on the error between the prediction and the ground truth. At an operating point (threshold) of 0.5, sensitivity was 75.6% and 76.1% at specificities of 90.2% and 88.5%, and AUC was 0.903 and 0.906 for the validation and test sets, respectively. This research shows the potential of DIB-MG as a screening tool for breast cancer.
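
The operating-point evaluation described above can be sketched as follows: a threshold turns per-case cancer probabilities into decisions, from which sensitivity and specificity follow. This is purely illustrative with made-up numbers, not the study's evaluation code.

```python
# Sensitivity/specificity at a fixed operating point (threshold).

def sensitivity_specificity(probs, labels, threshold=0.5):
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 0)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)

probs  = [0.9, 0.4, 0.8, 0.2, 0.6]  # toy per-case cancer probabilities
labels = [1,   1,   1,   0,   0]    # toy ground-truth labels
sens, spec = sensitivity_specificity(probs, labels)
```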

@article{kim2018applying, title={Applying Data-driven Imaging Biomarker in Mammography for Breast Cancer Screening: Preliminary Study}, author={Kim, Eun-Kyung and Kim, Hyo-Eun and Han, Kyunghwa and Kang, Bong Joo and Sohn, Yu-Mee and Woo, Ok Hee and Lee, Chan Wha}, journal={Scientific reports}, volume={8}, number={1}, pages={2762}, year={2018}, publisher={Nature Publishing Group} }

Accurate Lung Segmentation via Network-Wise Training of Convolutional Networks


We introduce an accurate lung segmentation model for chest radiographs based on deep convolutional neural networks. Our model uses atrous convolutional layers to increase the field-of-view of filters efficiently. To further improve segmentation performance, we also propose a multi-stage training strategy, network-wise training, in which the current-stage network is fed both the input images and the outputs of the previous-stage network. We show that this strategy reduces falsely predicted labels and produces smooth boundaries of the lung fields. We evaluate the proposed model on a common benchmark dataset, JSRT, and achieve state-of-the-art segmentation performance with far fewer model parameters.

@InProceedings{hwang2017accurate, title={Accurate Lung Segmentation via Network-Wise Training of Convolutional Networks}, author={Hwang, Sangheum and Park, Sunggyun}, booktitle={The International Conference On Medical Image Computing & Computer Assisted Intervention (MICCAI) DLMIA Workshop}, year={2017} }

A Unified Framework for Tumor Proliferation Score Prediction in Breast Histopathology


The tumor proliferation score is an important biomarker indicative of breast cancer patients' prognosis. In this paper, we present a unified framework to predict tumor proliferation scores from whole slide images in breast histopathology. The proposed system offers a fully automated solution for predicting both a molecular-data-based and a mitosis-counting-based tumor proliferation score. The framework integrates three modules, each fine-tuned to maximize the overall performance: an image processing component for handling whole slide images, a deep learning based mitosis detection network, and a proliferation score prediction module. We achieved a quadratic weighted Cohen's kappa of 0.567 in mitosis-counting-based score prediction and an F1-score of 0.652 in mitosis detection. On Spearman's correlation coefficient, which evaluates prediction of the molecular-data-based score, the system obtained 0.6171. Our system won first place in all three tasks of the Tumor Proliferation Assessment Challenge at MICCAI 2016, outperforming all other approaches.
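
The challenge metric quoted above, quadratic weighted Cohen's kappa, can be computed as below; this is a small self-contained sketch of the standard formula, not the challenge organizers' code.

```python
# Quadratic weighted Cohen's kappa for ordinal labels in {0, ..., n_classes-1}.

def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    # Observed confusion matrix.
    O = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        O[t][p] += 1
    n = len(y_true)
    hist_t = [sum(row) for row in O]
    hist_p = [sum(O[i][j] for i in range(n_classes)) for j in range(n_classes)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = ((i - j) ** 2) / ((n_classes - 1) ** 2)  # quadratic weight
            E = hist_t[i] * hist_p[j] / n                # chance-expected count
            num += w * O[i][j]
            den += w * E
    return 1.0 - num / den

# Perfect agreement gives kappa = 1; disagreement lowers it.
print(quadratic_weighted_kappa([0, 1, 2, 2], [0, 1, 2, 2], 3))  # -> 1.0
```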

@InProceedings{paeng2017unified, title={A unified framework for tumor proliferation score prediction in breast histopathology}, author={Paeng, Kyunghyun and Hwang, Sangheum and Park, Sunggyun and Kim, Minsoo}, booktitle={The International Conference On Medical Image Computing & Computer Assisted Intervention (MICCAI) DLMIA Workshop}, year={2017} }

Transferring Knowledge to Smaller Network with Class-Distance Loss


Training a small-capacity network that performs as well as a larger-capacity network is an important problem for real-world applications that require fast inference and a small memory footprint. Previous approaches that transfer knowledge from a bigger network to a smaller one show little benefit when applied to state-of-the-art convolutional neural network architectures, such as Residual Networks trained with batch normalization. We propose a class-distance loss that helps the teacher network form a densely clustered vector space, making it easier for the student network to learn from. We show that a small network with half the size of the original network, trained with the proposed strategy, can perform close to the original network on the CIFAR-10 dataset.
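
One plausible form of such a clustering penalty is the mean squared distance of each teacher feature vector to its class centroid, so that classes form dense clusters for the student to mimic. The sketch below is only an illustration of that idea; the exact loss in the paper may differ.

```python
# Hypothetical class-distance penalty: mean squared distance to class centroids.

def class_distance_loss(features, labels):
    by_class = {}
    for f, y in zip(features, labels):
        by_class.setdefault(y, []).append(f)
    centroids = {
        y: [sum(col) / len(fs) for col in zip(*fs)]
        for y, fs in by_class.items()
    }
    total = 0.0
    for f, y in zip(features, labels):
        total += sum((a - b) ** 2 for a, b in zip(f, centroids[y]))
    return total / len(features)

# Tightly clustered classes give a small penalty; spread-out ones do not.
tight = class_distance_loss([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]], [0, 0, 1])
loose = class_distance_loss([[0.0, 0.0], [4.0, 0.0], [5.0, 5.0]], [0, 0, 1])
```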

@inproceedings{kim2017transferring, title={Transferring Knowledge to Smaller Network with Class-Distance Loss}, author={Kim, Seungwook and Kim, Hyo-Eun}, booktitle={International Conference on Learning Representations (ICLR) Workshop}, year={2017} }

Semantic Noise Modeling for Better Representation Learning


Latent representations learned by multi-layered neural networks via hierarchical feature abstraction underlie the recent success of deep learning. Under the deep learning framework, generalization performance depends heavily on the learned latent representation, which is obtained from an appropriate training scenario with a task-specific objective on a designed network model. In this work, we propose a novel latent space modeling method to learn better latent representations. We design a neural network model based on the assumption that a good base representation can be attained by maximizing the total correlation between the input, latent, and output variables. From this base model, we introduce a semantic noise modeling method that enables class-conditional perturbation of the latent space to enhance the representational power of the learned latent features. During training, the latent vector representation can be stochastically perturbed by modeled class-conditional additive noise while maintaining its original semantics, implicitly bringing the effect of semantic augmentation to the latent space. The proposed model can be easily learned by back-propagation with common gradient-based optimization algorithms. Experimental results show that the proposed method yields performance benefits over various previous approaches. We also provide empirical analyses of the proposed class-conditional perturbation process, including t-SNE visualizations.
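
A loose sketch of class-conditional perturbation: estimate per-class noise statistics from the latent vectors of each class, then perturb a training-time latent code with noise drawn from its own class's statistics. The Gaussian noise model, the scale factor, and all names below are illustrative assumptions, not the paper's exact formulation.

```python
import random

# Illustrative class-conditional additive noise on latent vectors.

def class_noise_stats(latents, labels):
    """Per-class, per-dimension standard deviations of the latent vectors."""
    by_class = {}
    for z, y in zip(latents, labels):
        by_class.setdefault(y, []).append(z)
    stats = {}
    for y, zs in by_class.items():
        dims = list(zip(*zs))
        means = [sum(d) / len(d) for d in dims]
        stats[y] = [(sum((v - m) ** 2 for v in d) / len(d)) ** 0.5
                    for d, m in zip(dims, means)]
    return stats

def perturb(z, label, stats, rng, scale=0.5):
    # Add zero-mean noise whose per-dimension spread follows the class.
    return [v + rng.gauss(0.0, scale * s) for v, s in zip(z, stats[label])]

rng = random.Random(0)
latents = [[0.0, 1.0], [0.2, 1.1], [5.0, -1.0], [5.2, -0.9]]
labels = [0, 0, 1, 1]
stats = class_noise_stats(latents, labels)
z_aug = perturb(latents[0], 0, stats, rng)  # semantically similar variant
```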

@article{DBLP:journals/corr/KimHC16, author = {Hyo{-}Eun Kim and Sangheum Hwang and Kyunghyun Cho}, title = {Semantic Noise Modeling for Better Representation Learning}, journal = {CoRR}, volume = {abs/1611.01268}, year = {2016} }

Self-Transfer Learning for Fully Weakly Supervised Object Localization


Recent advances in deep learning have achieved remarkable performance on various challenging computer vision tasks. In object localization in particular, deep convolutional neural networks outperform traditional approaches by extracting data- and task-driven features instead of hand-crafted ones. Although location information of regions-of-interest (ROIs) provides a good prior for object localization, it requires heavy annotation effort. Thus, weakly supervised frameworks for object localization have been introduced; "weakly" means that such a framework trains a network using only image-level labels. With the help of transfer learning, which adopts the weight parameters of a pre-trained network, weakly supervised localization performs well because the pre-trained network already contains well-trained class-specific features. However, these approaches cannot be used for applications lacking pre-trained networks or large-scale, well-localized images. Medical image analysis is a representative example, since such pre-trained networks are unavailable there. In this work, we present a "fully" weakly supervised framework for object localization ("semi"-weakly is the counterpart that uses pre-trained filters), named self-transfer learning (STL). It jointly optimizes both classification and localization networks simultaneously. By controlling the supervision level of the localization network, STL helps the localization network focus on correct ROIs without any priors. We evaluate the proposed STL framework on two medical image datasets, chest X-rays and mammograms, and achieve significantly better localization performance than previous weakly supervised approaches.

@InProceedings{hwang2016self, title={Self-transfer learning for fully weakly supervised object localization}, author={Hwang, Sangheum and Kim, Hyo-Eun}, booktitle={The International Conference On Medical Image Computing & Computer Assisted Intervention (MICCAI)}, year={2016} }

Pixel-level Domain Transfer


We present an image-conditional image generation model. The model transfers an input domain to a target domain at the semantic level, and generates the target image at the pixel level. To generate realistic target images, we employ the real/fake discriminator from Generative Adversarial Nets, and also introduce a novel domain discriminator that makes the generated image relevant to the input image. We verify our model on the challenging task of generating a piece of clothing from an input image of a dressed person. We present a high-quality clothing dataset covering the two domains, and succeed in demonstrating decent results.

@inproceedings{yoo2016pixel, title={Pixel-level domain transfer}, author={Yoo, Donggeun and Kim, Namil and Park, Sunggyun and Paek, Anthony S and Kweon, In So}, booktitle={European Conference on Computer Vision (ECCV)}, pages={517--532}, year={2016}, organization={Springer} }

A Novel Approach for Tuberculosis Screening Based on Deep Convolutional Neural Networks


We propose an automatic TB screening system based on a deep convolutional neural network (CNN). Since a CNN extracts the most discriminative features for the target objective from the given data by itself, the proposed system does not require manually designed features for TB screening. We also show that transfer learning from the lower convolutional layers of pre-trained networks resolves the difficulties of handling high-resolution medical images and of training huge numbers of parameters with a limited number of images. Experiments are conducted on three real field datasets, the KIT, MC, and Shenzhen sets, and the results show that the proposed system achieves high screening performance in terms of AUC and accuracy.

@inproceedings{doi:10.1117/12.2216198, author = {Hwang, Sangheum and Kim, Hyo-Eun and Jeong, Jihoon and Kim, Hee-Jin}, title = {A Novel Approach for Tuberculosis Screening Based on Deep Convolutional Neural Networks}, booktitle = {Proc. SPIE}, volume = {9785}, pages = {97852W}, year = {2016} }

AttentionNet: Aggregating Weak Directions for Accurate Object Detection


We present a novel detection method using a deep convolutional neural network (CNN), named AttentionNet. We cast the object detection problem as an iterative classification problem, the form most suitable for a CNN. AttentionNet provides quantized weak directions pointing toward a target object, and an ensemble of iterative predictions from AttentionNet converges to an accurate object bounding box. Since AttentionNet is a unified network for object detection, it detects objects without any separate models, from object proposal to post bounding-box regression. We evaluate AttentionNet on a human detection task and achieve state-of-the-art performance of 65% (AP) on PASCAL VOC 2007/2012 with only an 8-layer architecture.

@inproceedings{yoo2015attentionnet, title={Attentionnet: Aggregating weak directions for accurate object detection}, author={Yoo, Donggeun and Park, Sunggyun and Lee, Joon-Young and Paek, Anthony S and So Kweon, In}, booktitle={Proceedings of the IEEE International Conference on Computer Vision (ICCV)}, pages={2659--2667}, year={2015} }

Scale-Invariant Feature Learning using Deconvolutional Neural Networks for Weakly-Supervised Semantic Segmentation


A weakly-supervised semantic segmentation framework with a tied deconvolutional neural network is presented. Each deconvolution layer in the framework consists of unpooling and deconvolution operations. 'Unpooling' upsamples the input feature map based on unpooling switches defined by the corresponding convolution layer's pooling operation. 'Deconvolution' convolves the unpooled input features using convolutional weights tied with the corresponding convolution layer's convolution operation. The unpooling–deconvolution combination helps to eliminate less discriminative features in the feature extraction stage, since output features of the deconvolution layer are reconstructed from the most discriminative unpooled features rather than the raw ones. This reduces false positives in the pixel-level inference stage. The feature maps restored across all deconvolution layers constitute a rich discriminative feature set at different abstraction levels. These features are stacked and selectively used for generating class-specific activation maps. Under weak supervision (image-level labels), the proposed framework shows promising results on lesion segmentation in medical images (chest X-rays) and achieves state-of-the-art performance on the PASCAL VOC segmentation dataset under the same experimental conditions.
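
The switch-based unpooling described above can be sketched without any framework: max pooling records which position won (the switch), and unpooling places each value back at its recorded position while zeroing the rest, so only the most discriminative activations survive. A 1-D signal with a pool size of 2 keeps the example minimal; this is an illustration, not the paper's implementation.

```python
# Max pooling with switches, and switch-based unpooling, in one dimension.

def max_pool_with_switches(x, size=2):
    pooled, switches = [], []
    for start in range(0, len(x), size):
        window = x[start:start + size]
        k = max(range(len(window)), key=lambda i: window[i])
        pooled.append(window[k])
        switches.append(start + k)  # remember where the max came from
    return pooled, switches

def unpool(pooled, switches, length):
    out = [0.0] * length
    for v, pos in zip(pooled, switches):
        out[pos] = v  # restore only the most discriminative positions
    return out

x = [0.1, 0.9, 0.4, 0.3]
pooled, switches = max_pool_with_switches(x)
restored = unpool(pooled, switches, len(x))
print(restored)  # -> [0.0, 0.9, 0.4, 0.0]
```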

@article{2016arXiv160204984K, author = {Kim, Hyo-Eun and Hwang, Sangheum}, title = {Scale-Invariant Feature Learning using Deconvolutional Neural Networks for Weakly-Supervised Semantic Segmentation}, journal = {CoRR}, volume = {abs/1602.04984}, year = {2016} }



Mitosis counting is time-consuming, labor-intensive work that frequently shows inter-observer variability. Although deep convolutional neural networks, the most accurate image classification algorithms, have been used for detecting mitoses, they had been tested only on public datasets and never applied to routine histologic slide images. Recently, smartphone cameras with adaptors for the microscope have been tried for easier image acquisition, significantly lowering the barrier to applying computer algorithms to histologic image analysis. Histologic slides of 70 invasive ductal carcinomas of the breast were selected, and 1,761 high-power-field histologic images (400x) were acquired using a smartphone application with a microscope adaptor manufactured by us. Mitoses were annotated blindly by four pathologists; concordance among three or more pathologists was regarded as ground truth. A total of 2,004 mitotic cells and 801,600 non-mitotic cells from 60 cases were divided into 10 sets, and the algorithm was trained sequentially using fine-tuning. After training, images from ten patients were tested for concordance of detection with the pathologists. During training, sensitivity for mitosis detection was 75–83%, and specificity increased to 97% as the algorithm was trained with more images. The trained algorithm identified 189 mitoses in 748 images from the 10 test cases, showing 79% sensitivity and 96% specificity for detecting mitosis compared to the pathologists. Detected mitoses were displayed in the application within 14 seconds on average. The proposed deep convolutional neural network-based mitosis detection system showed remarkable sensitivity and specificity, and its performance improved as more images were used for training.
Together with the smartphone application and the adaptor we manufactured, the system assists pathologists in identifying mitoses, reducing time and labor costs while yielding more objective diagnoses.
