Paper download: link
3D MRI brain tumor segmentation using autoencoder regularization
Abstract. Automated segmentation of brain tumors from 3D magnetic resonance images (MRIs) is necessary for the diagnosis, monitoring, and treatment planning of the disease. Manual delineation practices require anatomical knowledge, are expensive, time consuming and can be inaccurate due to human error. Here, we describe a semantic segmentation network for tumor subregion segmentation from 3D MRIs based on an encoder-decoder architecture. Due to a limited training dataset size, a variational auto-encoder branch is added to reconstruct the input image itself in order to regularize the shared encoder and impose additional constraints on its layers. The current approach won 1st place in the BraTS 2018 challenge.
1 Introduction
Brain tumors are categorized into primary and secondary tumor types. Primary brain tumors originate from brain cells, whereas secondary tumors metastasize into the brain from other organs. The most common type of primary brain tumors are gliomas, which arise from brain glial cells. Gliomas can be of low-grade (LGG) and high-grade (HGG) subtypes. High-grade gliomas are an aggressive type of malignant brain tumor that grow rapidly, usually require surgery and radiotherapy, and have poor survival prognosis. Magnetic Resonance Imaging (MRI) is a key diagnostic tool for brain tumor analysis, monitoring and surgery planning. Usually, several complementary 3D MRI modalities are acquired - such as T1, T1 with contrast agent (T1c), T2 and Fluid Attenuated Inversion Recovery (FLAIR) - to emphasize different tissue properties and areas of tumor spread. For example, the contrast agent, usually gadolinium, emphasizes hyperactive tumor subregions in the T1c MRI modality.
Automated segmentation of 3D brain tumors can save physicians time and provide an accurate, reproducible solution for further tumor analysis and monitoring. Recently, deep-learning-based segmentation techniques have surpassed traditional computer vision methods for dense semantic segmentation. Convolutional neural networks (CNNs) are able to learn from examples and demonstrate state-of-the-art segmentation accuracy both in 2D natural images [6] and in 3D medical image modalities [19].
The Multimodal Brain Tumor Segmentation Challenge (BraTS) aims to evaluate state-of-the-art methods for the segmentation of brain tumors by providing a 3D MRI dataset with ground truth tumor segmentation labels annotated by physicians [5,18,4,2,3]. This year, the BraTS 2018 training dataset included 285 cases (210 HGG and 75 LGG), each with four 3D MRI modalities (T1, T1c, T2 and FLAIR), rigidly aligned, resampled to 1x1x1 mm isotropic resolution and skull-stripped. The input image size is 240x240x155. The data were collected from 19 institutions, using various MRI scanners. Annotations include 3 tumor subregions: the enhancing tumor, the peritumoral edema, and the necrotic and non-enhancing tumor core. The annotations were combined into 3 nested subregions: whole tumor (WT), tumor core (TC) and enhancing tumor (ET), as shown in Figure 2. Two additional datasets without the ground truth labels were provided for validation and testing. These datasets required participants to upload the segmentation masks to the organizers' server for evaluation. The validation dataset (66 cases) allowed multiple submissions and was designed for intermediate evaluations. The testing dataset (191 cases) allowed only a single submission, and was used to calculate the final challenge ranking.
In this work, we describe our semantic segmentation approach for volumetric 3D brain tumor segmentation from multimodal 3D MRIs, which won the BraTS 2018 challenge. We follow the encoder-decoder structure of a CNN, with an asymmetrically large encoder to extract deep image features and a decoder part that reconstructs dense segmentation masks. We also add a variational autoencoder (VAE) branch to the network to reconstruct the input images jointly with segmentation in order to regularize the shared encoder. At inference time, only the main segmentation encoder-decoder part is used.
2 Related work
Last year's top-performing BraTS 2017 submissions included Kamnitsas et al. [13], who proposed to ensemble several models for robust segmentation (EMMA), and Wang et al. [21], who proposed to segment tumor subregions in cascade using anisotropic convolutions. EMMA takes advantage of an ensemble of several independently trained architectures. In particular, EMMA combined DeepMedic [14], FCN [16] and U-net [20] models and ensembled their segmentation predictions. During training they used a batch size of 8 and a 64x64x64 3D patch crop. EMMA's ensemble of different models demonstrated good generalization performance, winning the BraTS 2017 challenge. The second-place paper by Wang et al. [21] took a different approach, training 3 networks, one for each tumor subregion, in cascade, with each subsequent network taking the (cropped) output of the previous network as its input. Each network was similar in structure and consisted of a large encoder part (with dilated convolutions) and a basic decoder. They also decomposed the 3x3x3 convolution kernel into intra-slice (3x3x1) and inter-slice (1x1x3) kernels to save on both GPU memory and computational time.
This year, the top-performing BraTS 2018 submissions (in addition to the current work) included Isensee et al. [12] in 2nd place, and McKinley et al. [17] and Zhou et al. [23], who shared 3rd place. Isensee et al. [12] demonstrated that a generic U-net architecture with a few minor modifications is enough to achieve competitive performance. The authors used a batch size of 2 and a crop size of 128x128x128. Furthermore, the authors used additional training data from their own institution (which yielded some improvements for the enhancing tumor dice). McKinley et al. [17] proposed a segmentation CNN in which a DenseNet [11] structure with dilated convolutions was embedded in a U-net-like network. The authors also introduced a new loss function, a generalization of binary cross-entropy, to account for label uncertainty. Finally, Zhou et al. [23] proposed to use an ensemble of different networks: taking into account multi-scale context information, segmenting 3 tumor subregions in cascade with shared backbone weights, and adding an attention block.
Compared to the related works, we use the largest crop size of 160x192x128 but compromise on a batch size of 1 to fit the network within GPU memory limits. We also output all 3 nested tumor subregions directly after a sigmoid (instead of using several networks or a softmax over the number of classes). Finally, we add an additional branch to regularize the shared encoder, used only during training. We did not use any additional training data and used only the provided training set.
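Because the three sigmoid channels encode nested regions rather than mutually exclusive classes, recovering the standard BraTS label convention requires a small thresholding step. The paper does not spell this step out, so the following is a hypothetical NumPy sketch; the 0.5 threshold and the WT -> TC -> ET overwrite order are assumptions:

```python
import numpy as np

def nested_probs_to_brats_labels(probs, threshold=0.5):
    """Convert 3-channel nested sigmoid outputs (WT, TC, ET) into a BraTS
    label map. Label convention: 1 = necrotic/non-enhancing core,
    2 = peritumoral edema, 4 = enhancing tumor.

    probs: float array of shape (3, D, H, W), channels ordered WT, TC, ET.
    """
    wt, tc, et = probs[0] > threshold, probs[1] > threshold, probs[2] > threshold
    labels = np.zeros(probs.shape[1:], dtype=np.uint8)
    labels[wt] = 2  # whole tumor defaults to edema
    labels[tc] = 1  # tumor core overrides edema
    labels[et] = 4  # enhancing tumor overrides core
    return labels
```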
Fig. 1. Schematic visualization of the network architecture. The input is a four-channel 3D MRI crop, followed by an initial 3x3x3 3D convolution with 32 filters. Each green block is a ResNet-like block with GroupNorm normalization. The output of the segmentation decoder has three channels (with the same spatial size as the input) followed by a sigmoid for segmentation maps of the three tumor subregions (WT, TC, ET). The VAE branch reconstructs the input image into itself, and is used only during training to regularize the shared encoder.
3 Methods
Our segmentation approach follows an encoder-decoder based CNN architecture with an asymmetrically larger encoder to extract image features and a smaller decoder to reconstruct the segmentation mask [6,7,9,20,19]. We add an additional branch to the encoder endpoint to reconstruct the original image, similar to an autoencoder architecture. The motivation for using the auto-encoder branch is to add additional guidance and regularization to the encoder part, since the training dataset size is limited. We follow the variational auto-encoder (VAE) approach to better cluster/group the features of the encoder endpoint. We describe the building blocks of our network in the next subsections (see also Figure 1).
3.1 Encoder part
The encoder part uses ResNet [10] blocks, where each block consists of two convolutions with normalization and ReLU, followed by an additive identity skip connection. For normalization, we use Group Normalization (GN) [22], which performs better than BatchNorm when the batch size is small (a batch size of 1 in our case). We follow a common CNN approach to progressively downsize image dimensions by 2 and simultaneously increase the feature size by 2. For downsizing we use strided convolutions. All convolutions are 3x3x3 with an initial number of filters equal to 32. The encoder endpoint has size 256x20x24x16, and is 8 times spatially smaller than the input image. We decided against further downsizing to preserve more spatial content.
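As a concrete illustration, here is a minimal PyTorch sketch of such a block and of the strided downsizing convolution. The authors implemented their network in TensorFlow, and the number of GroupNorm groups (8 here) and the exact normalization/activation ordering are assumptions:

```python
import torch.nn as nn

class GNResBlock(nn.Module):
    """ResNet-like encoder block: two 3x3x3 convolutions, each preceded by
    Group Normalization and ReLU, plus an additive identity skip connection."""
    def __init__(self, channels, groups=8):
        super().__init__()
        self.gn1 = nn.GroupNorm(groups, channels)
        self.conv1 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.gn2 = nn.GroupNorm(groups, channels)
        self.conv2 = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.conv1(self.relu(self.gn1(x)))
        y = self.conv2(self.relu(self.gn2(y)))
        return x + y  # identity skip connection

# Downsizing between spatial levels: a strided 3x3x3 convolution that halves
# each spatial dimension while the feature count doubles (e.g. 32 -> 64).
downsize = nn.Conv3d(32, 64, kernel_size=3, stride=2, padding=1)
```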
3.2 Decoder part
The decoder structure is similar to the encoder one, but with a single block per spatial level. Each decoder level begins with upsizing: reducing the number of features by a factor of 2 (using 1x1x1 convolutions) and doubling the spatial dimension (using 3D bilinear upsampling), followed by the addition of the encoder output of the equivalent spatial level. The end of the decoder has the same spatial size as the original image and a number of features equal to the initial input feature size, followed by a 1x1x1 convolution into 3 channels and a sigmoid function.
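A minimal PyTorch sketch of one such upsizing step follows; note that the paper's "3D bilinear" interpolation corresponds to PyTorch's "trilinear" mode, and the align_corners setting is an assumption:

```python
import torch.nn as nn
import torch.nn.functional as F

class DecoderUp(nn.Module):
    """One decoder upsizing step: a 1x1x1 convolution halves the feature
    count, interpolation doubles each spatial dimension, and the encoder
    output of the equivalent spatial level is added."""
    def __init__(self, in_channels):
        super().__init__()
        self.reduce = nn.Conv3d(in_channels, in_channels // 2, kernel_size=1)

    def forward(self, x, skip):
        x = self.reduce(x)
        x = F.interpolate(x, scale_factor=2, mode='trilinear', align_corners=False)
        return x + skip  # additive skip from the matching encoder level

# Segmentation head at the end of the decoder: 1x1x1 convolution into the
# 3 nested subregion channels (WT, TC, ET), followed by a sigmoid.
head = nn.Sequential(nn.Conv3d(32, 3, kernel_size=1), nn.Sigmoid())
```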
3.3 VAE part
Starting from the encoder endpoint output, we first reduce the input to a low-dimensional space of 256 (128 to represent the mean, and 128 to represent the std). Then, a sample is drawn from the Gaussian distribution with the given mean and std, and reconstructed into the input image dimensions following the same architecture as the decoder, except that we do not use the inter-level skip connections from the encoder here. The VAE part structure is shown in Table 1.
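The sampling step is the standard VAE reparameterization trick. Here is a hedged PyTorch sketch of the latent head; the single fully-connected reduction and the log-variance parameterization (rather than predicting the std directly) are implementation assumptions:

```python
import torch
import torch.nn as nn

class VAEHead(nn.Module):
    """Reduces the flattened 256x20x24x16 encoder endpoint to a
    256-dimensional space (128 means + 128 log-variances) and draws a
    sample z ~ N(mu, sigma^2) via the reparameterization trick."""
    def __init__(self, in_features=256 * 20 * 24 * 16, latent=128):
        super().__init__()
        self.to_stats = nn.Linear(in_features, 2 * latent)
        self.latent = latent

    def forward(self, x):
        stats = self.to_stats(x.flatten(1))
        mu, logvar = stats[:, :self.latent], stats[:, self.latent:]
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)  # differentiable sampling
        return z, mu, logvar
```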
3.4 Loss
Our loss function consists of 3 terms:

L = L_dice + 0.1 * L_L2 + 0.1 * L_KL

where L_dice is a soft dice loss applied to the decoder output to match the segmentation ground truth, L_L2 is an L2 loss on the VAE branch reconstruction of the input image, and L_KL is the standard VAE penalty term, the KL divergence between the estimated normal distribution N(mu, sigma^2) and a unit Gaussian N(0, 1).
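A sketch of these three terms in PyTorch, under the assumptions that the dice term takes the common 1 - dice form, the L2 term is reduced as a mean, and the KL term is normalized by the total number of image voxels N:

```python
import torch

def soft_dice_loss(pred, target, eps=1e-8):
    """Soft dice loss over the sigmoid outputs:
    1 - 2*sum(p*t) / (sum(p^2) + sum(t^2) + eps)."""
    num = 2.0 * torch.sum(pred * target)
    den = torch.sum(pred * pred) + torch.sum(target * target) + eps
    return 1.0 - num / den

def kl_loss(mu, logvar, n_voxels):
    """KL divergence between N(mu, sigma^2) and N(0, 1):
    (1/N) * sum(mu^2 + sigma^2 - log(sigma^2) - 1)."""
    return torch.sum(mu.pow(2) + logvar.exp() - logvar - 1.0) / n_voxels

def total_loss(pred_seg, target_seg, recon, image, mu, logvar):
    # L = L_dice + 0.1 * L_L2 + 0.1 * L_KL, with the 0.1 weights from the paper.
    l2 = torch.mean((recon - image) ** 2)  # L2 reconstruction term
    kl = kl_loss(mu, logvar, image.numel())
    return soft_dice_loss(pred_seg, target_seg) + 0.1 * l2 + 0.1 * kl
```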
4 Results
We implemented our network in Tensorflow [1] and trained it on an NVIDIA Tesla V100 32GB GPU using the BraTS 2018 training dataset (285 cases) without any additional in-house data. During training we used a random crop of size 160x192x128, which ensures that most image content remains within the crop area. We concatenated the 4 available 3D MRI modalities into a 4-channel image as input. The output of the network is the 3 nested tumor subregions (after the sigmoid).
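A minimal NumPy sketch of this input preparation, assuming co-registered 240x240x155 volumes; the uniform crop sampling shown here, and the omission of intensity normalization, are assumptions:

```python
import numpy as np

def random_crop_and_stack(modalities, crop=(160, 192, 128)):
    """Stack the four 3D MRI modalities (T1, T1c, T2, FLAIR) into a
    4-channel volume and take a random spatial crop of 160x192x128."""
    volume = np.stack(modalities, axis=0)  # (4, D, H, W)
    _, d, h, w = volume.shape
    sd = np.random.randint(0, d - crop[0] + 1)
    sh = np.random.randint(0, h - crop[1] + 1)
    sw = np.random.randint(0, w - crop[2] + 1)
    return volume[:, sd:sd + crop[0], sh:sh + crop[1], sw:sw + crop[2]]
```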
We report the results of our approach on the BraTS 2018 validation (66 cases) and testing (191 cases) sets. These datasets were provided with unknown glioma grade and without segmentation labels. We uploaded our segmentation results to the BraTS 2018 server for evaluation of per-class dice, sensitivity, specificity and Hausdorff distances.
Aside from evaluating a single model, we also applied test time augmentation (TTA) by mirror flipping the input 3D image axes, and averaged the output of the resulting 8 flipped segmentation probability maps. Finally, we ensembled a set of 10 models (trained from scratch) to further improve the performance.
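A PyTorch sketch of this mirror-flip TTA follows, assuming the model outputs sigmoid probability maps of the same spatial shape as the input:

```python
import itertools
import torch

def tta_mirror_predict(model, image):
    """Run the model on all 8 combinations of flips over the three spatial
    axes, un-flip each output, and average the probability maps.
    `image` is a (1, 4, D, H, W) tensor."""
    dims = (2, 3, 4)  # the three spatial axes
    probs = None
    for flips in itertools.product([False, True], repeat=3):
        axes = [d for d, f in zip(dims, flips) if f]
        x = torch.flip(image, axes) if axes else image
        with torch.no_grad():
            y = model(x)
        y = torch.flip(y, axes) if axes else y  # undo the flip
        probs = y if probs is None else probs + y
    return probs / 8.0
```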
Table 2 shows the results of our model on the BraTS 2018 validation dataset. At the time of the initial short paper submission (Jul 13, 2018), our dice accuracy was second best (team name NVDLMED) for all 3 segmentation labels (ET, WT, TC). The TTA only marginally improved the performance, but the ensemble of 10 models resulted in a 1% improvement, which is consistent with the literature results on using ensembles.
For the testing dataset, only a single submission was allowed. Our results, shown in Table 3, won 1st place in the BraTS 2018 challenge.
Time-wise, each training epoch (285 cases) on a single GPU (NVIDIA Tesla V100 32GB) takes 9 minutes, and training the model for 300 epochs takes 2 days. We also trained the model on an NVIDIA DGX-1 server (which includes 8 V100 GPUs interconnected with NVLink); this allowed the model to be trained in 6 hours (a 7.8x speed-up). The inference time is 0.4 sec for a single model on a single V100 GPU.
5 Discussion and Conclusion
In this work, we described a semantic segmentation network for brain tumor segmentation from multimodal 3D MRIs, which won the BraTS 2018 challenge. While experimenting with network architectures, we tried several alternative approaches. For instance, we tried a larger batch size of 8 to be able to use BatchNorm (and take advantage of batch statistics); however, due to the GPU memory limits, this modification required a smaller image crop size and resulted in worse performance. We also experimented with more sophisticated data augmentation techniques, including random histogram matching, affine image transforms, and random image filtering, which did not demonstrate any additional improvements. We tried several data post-processing techniques to fine-tune the segmentation predictions with a CRF [14], but did not find it beneficial (it helped for some images, but made some other image segmentation results worse). Increasing the network depth further did not improve the performance, but increasing the network width (the number of features/filters) consistently improved the results. Using the NVIDIA Volta V100 32GB GPU we were able to double the number of features compared to the V100 16GB version. Finally, the additional VAE branch helped to regularize the shared encoder (in the presence of limited data), which not only improved the performance, but also helped to consistently achieve good training accuracy for any random initialization. Our BraTS 2018 testing dataset results are 0.7664, 0.8839 and 0.8154 average dice for enhancing tumor, whole tumor and tumor core, respectively.