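This post continues from the previous one, in which the first model was trained.
The main job of that model is to sort CT images into three types:
- NiCT: non-informative CT images;
- pCT: images potentially related to COVID-19;
- nCT: images unrelated to COVID-19;
We now apply this model to the cohort1 and cohort2 data.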
- # load model
- # the model and its initial settings are saved in VGG_Simple.py
- from VGG_Simple import *
- %matplotlib inline
- %config InlineBackend.figure_format = 'svg'
- # load model parameter
- version = 'version_14' # choose best model path
- path = os.path.join(os.getcwd(), 'check_point', version)
- ckpt = str(os.listdir(path)[2]) # choose best model
- print(ckpt)
- # Create model
- ckpt = os.path.join(path, ckpt)
- class_weight = torch.FloatTensor([2, 5, 2]).cuda()
- myloss = nn.CrossEntropyLoss(weight=class_weight)
- model = VGG_Simple(myloss=myloss)
- epoch=30-val_acc=0.99097-val_loss=0.03236.ckpt
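Picking the checkpoint by position (`os.listdir(path)[2]`) depends on the directory listing order, which is not guaranteed. The following is a minimal sketch of an alternative, assuming checkpoint files follow the `epoch=...-val_acc=...-val_loss=....ckpt` naming seen in the printed output; `best_checkpoint` is a hypothetical helper and not part of VGG_Simple.py.

```python
import re

# Hypothetical helper: choose the checkpoint with the highest val_acc
# encoded in its file name (naming pattern assumed from the printed output).
_VAL_ACC = re.compile(r"val_acc=(\d+(?:\.\d+)?)")

def best_checkpoint(filenames):
    scored = []
    for name in filenames:
        match = _VAL_ACC.search(name)
        if match and name.endswith(".ckpt"):
            scored.append((float(match.group(1)), name))
    if not scored:
        raise ValueError("no checkpoint with a val_acc field found")
    # max() compares the val_acc value first, then the name as a tie-breaker
    return max(scored)[1]

names = [
    "epoch=12-val_acc=0.97010-val_loss=0.09120.ckpt",  # made-up example name
    "epoch=30-val_acc=0.99097-val_loss=0.03236.ckpt",  # name printed in the post
]
print(best_checkpoint(names))
```

In the notebook this could be called as `best_checkpoint(os.listdir(path))`, so the choice no longer depends on how many files sit in the folder.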
Note that the data was downloaded in bulk into the patientCT folder, so we first need to separate the cohort1 and cohort2 data.
cohort1: Patient 1 - 1170
cohort2: Patient 1171 - 1521
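The split rule can be sketched as a small function. This is only an illustration: the folder naming `Patient N` is taken from the code in this post, while `cohort_of` is a hypothetical helper name.

```python
import re

# Cohort boundaries taken from the text: 1-1170 is cohort1, 1171-1521 is cohort2.
def cohort_of(folder_name):
    """Return 'cohort1', 'cohort2', or None for a folder named 'Patient N'."""
    match = re.fullmatch(r"Patient (\d+)", folder_name)
    if match is None:
        return None  # not a patient folder
    number = int(match.group(1))
    if 1 <= number <= 1170:
        return "cohort1"
    if 1171 <= number <= 1521:
        return "cohort2"
    return None  # patient number outside both cohorts

print(cohort_of("Patient 1170"), cohort_of("Patient 1171"))
```

Folders that do not match the pattern, or whose number lies outside both ranges, map to `None` and can simply be skipped.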
- batch_size = 64
- # get patient CT-file path
- cohort1 = ['Patient {}'.format(i) for i in range(1,1171)] # cohort1 is Patient 1-1170
- cohort2 = ['Patient {}'.format(i) for i in range(1171,1522)]
- patient_list = os.listdir('../patientCT/')
- # attention! not all samples have CT-file
- cohort1 = set(cohort1) & set(patient_list)
- cohort2 = set(cohort2) & set(patient_list)
- cohort1 = [os.path.join('..','patientCT',cohort,cohort) for cohort in cohort1]
- cohort2 = [os.path.join('..','patientCT',cohort,cohort) for cohort in cohort2]
- cohort1 = sorted(cohort1)
- cohort2 = sorted(cohort2)
- # make an empty dataframe
- result = pd.DataFrame(columns=['NiCT', 'nCT', 'pCT', 'patientId', 'label'])
Load the model (part 1)
- for i in range(len(cohort1)):
-     print(i)
-     cohort_sample_path = cohort1[i] # get patient CT-file path
-     cohort_sample = os.path.basename(cohort_sample_path) # get patientId (works with any path separator)
-     # import patients one by one
-     cohort = torchvision.datasets.ImageFolder(cohort_sample_path,
-         transform=torchvision.transforms.Compose([
-             torchvision.transforms.Grayscale(num_output_channels=1),
-             torchvision.transforms.Resize(256), # Resize replaces the deprecated Scale
-             torchvision.transforms.CenterCrop(200),
-             torchvision.transforms.ToTensor()])
-     )
-     # load all CT files
-     cohort = torch.utils.data.DataLoader(cohort, batch_size=batch_size,shuffle=False,num_workers =24)
-     XRawData = []
-     YRawData = []
-     for x, y in cohort: # no index needed; avoids shadowing the outer i
-         XRawData.append(x)
-         YRawData.append(y)
-     XRawData = torch.cat(XRawData)
-     YRawData = torch.cat(YRawData)
-     cohort_iter = makeDataiter(XRawData, YRawData, batch_size=batch_size,shuffle=False)
-     # reset the buffers that the model fills during testing
-     model.test_predict = []
-     model.test_sample_label = []
-     model.test_decoder = []
-     model.test_matrix = []
-     model.test_conv = []
-     model.test_primary = []
-     model.test_digitcaps = []
-     trainer = pl.Trainer(resume_from_checkpoint=ckpt, gpus=-1)
-     # predict patient ct
-     trainer.test(model, cohort_iter)
-     predict = torch.cat(model.test_predict)
-     predict_label = toLabel(predict).cpu().numpy()
-     # get each picture id: "<patientId> <image file name>"
-     sample = [cohort_sample + ' ' + os.path.basename(img_path)
-               for img_path, _ in cohort.dataset.samples]
-     predict = pd.DataFrame(predict.cpu().numpy()) # .cpu() is a no-op if already on CPU
-     predict.columns = ['NiCT','nCT', 'pCT']
-     predict.index = sample
-     predict['patientId'] = cohort_sample
-     predict['label'] = predict_label
-     predict['label'] = predict['label'].replace(
-         [0,1,2], # map class indices to label names
-         ['NiCT','nCT', 'pCT']).astype(str)
-     # keep the 10 images with the highest pCT score
-     predict = predict.sort_values(by='pCT',ascending = False)
-     predict = predict.iloc[0:10,]
-     result = pd.concat([result,predict])
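The last step of the loop keeps, for each patient, only the 10 images with the highest pCT score. A minimal standard-library sketch of that selection, using made-up probabilities; `top_k_by_pct` is a hypothetical name, and the post itself does this with `sort_values` and `iloc`.

```python
import heapq

def top_k_by_pct(rows, k=10):
    """rows: list of dicts holding a 'pCT' probability; returns the k highest."""
    return heapq.nlargest(k, rows, key=lambda row: row["pCT"])

rows = [  # made-up probabilities for illustration only
    {"image": "IMG-0001", "NiCT": 0.7, "nCT": 0.2, "pCT": 0.1},
    {"image": "IMG-0002", "NiCT": 0.1, "nCT": 0.1, "pCT": 0.8},
    {"image": "IMG-0003", "NiCT": 0.2, "nCT": 0.3, "pCT": 0.5},
]
print([row["image"] for row in top_k_by_pct(rows, k=2)])
# prints ['IMG-0002', 'IMG-0003']
```

Note that the selection ranks by pCT score only, so for a patient with few suspicious slices the kept rows can still carry an NiCT or nCT label.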
Run the prediction on each of the two datasets separately, then save the results.
- # result2 = pd.DataFrame(columns=['NiCT', 'nCT', 'pCT', 'patientId', 'label'])
- #
- # for i in range(len(cohort2)):
- # print(i)
- # cohort_sample_path = cohort2[i]
- # cohort_sample = cohort_sample_path.split('\\')[3]
- # # import patients one by one
- # cohort = torchvision.datasets.ImageFolder(cohort_sample_path,
- # transform=torchvision.transforms.Compose([
- # torchvision.transforms.Grayscale(num_output_channels=1),
- # torchvision.transforms.Scale(256),
- # torchvision.transforms.CenterCrop(200),
- # torchvision.transforms.ToTensor()])
- # )
- # cohort = torch.utils.data.DataLoader(cohort, batch_size=batch_size,shuffle=False,num_workers =24)
- #
- # # convert the data type
- # XRawData = []
- # YRawData = []
- # for i,(x,y) in enumerate(cohort):
- # XRawData.append(x)
- # YRawData.append(y)
- #
- # XRawData = torch.cat(XRawData)
- # YRawData = torch.cat(YRawData)
- #
- # cohort_iter = makeDataiter(XRawData, YRawData, batch_size=batch_size,shuffle=False)
- #
- # model.test_predict = []
- # model.test_sample_label = []
- # model.test_decoder = []
- # model.test_matrix = []
- # model.test_conv = []
- # model.test_primary = []
- # model.test_digitcaps = []
- #
- # trainer = pl.Trainer(resume_from_checkpoint=ckpt, gpus=-1)
- #
- # # predict patient ct
- # trainer.test(model, cohort_iter)
- # predict = torch.cat(model.test_predict)
- # predict_label = toLabel(predict).cpu().numpy()
- #
- # # get each picture id
- # sample = [cohort_sample +' ' + cohort.dataset.samples[i][0].split(
- # '\\')[5] for i in range(len(cohort.dataset.samples))]
- # predict = pd.DataFrame(predict.numpy())
- # predict.columns = ['NiCT','nCT', 'pCT']
- # predict.index = sample
- # predict['patientId'] = cohort_sample
- # predict['label'] = predict_label
- # predict['label'] = predict['label'].replace(
- # [0,1,2], # map class indices to label names
- # ['NiCT','nCT', 'pCT']).astype(str)
- #
- # # get top 10 pct
- #
- # predict = predict.sort_values(by='pCT',ascending = False)
- # predict = predict.iloc[0:10,]
- #
- # result2 = pd.concat([result2,predict])
- # save result
- # result.to_csv('cohort1_pCT_detection.csv')
- # result2.to_csv('cohort2_pCT_detection.csv')
Load the saved results and check the reliability of the model.
- cohort1_pCT = pd.read_csv('cohort1_pCT_detection.csv',index_col=0)
- cohort2_pCT = pd.read_csv('cohort2_pCT_detection.csv',index_col=0)
- meta = pd.read_csv('metadata.csv',index_col=0)
Here meta needs a little filtering, because not every sample has CT images.
- cohort1 = set(cohort1_pCT.index.map(lambda x: x.split(' IMG')[0]))
- cohort2 = set(cohort2_pCT.index.map(lambda x: x.split(' IMG')[0]))
- patientIds = list(cohort1)+list(cohort2)
- meta = meta.loc[patientIds]
- # cohort1 control pictures
- cohort1_predict_control = set(cohort1_pCT[cohort1_pCT['label'] != 'pCT'].index.map(
- lambda x: x.split(' IMG')[0]))
- # cohort2 control pictures
- cohort2_predict_control = set(cohort2_pCT[cohort2_pCT['label'] != 'pCT'].index.map(
- lambda x: x.split(' IMG')[0]))
- # cohort1 pCT pictures
- cohort1_predict_pCT = set(cohort1_pCT[cohort1_pCT['label'] == 'pCT'].index.map(
- lambda x: x.split(' IMG')[0]))
- # cohort2 pCT pictures
- cohort2_predict_pCT = set(cohort2_pCT[cohort2_pCT['label'] == 'pCT'].index.map(
- lambda x: x.split(' IMG')[0]))
- cohort1_ambiguous = cohort1_predict_control & cohort1_predict_pCT
- cohort2_ambiguous = cohort2_predict_control & cohort2_predict_pCT
- cohort1_control = cohort1_predict_control - cohort1_ambiguous
- cohort2_control = cohort2_predict_control - cohort2_ambiguous
- cohort1_pCT = cohort1_predict_pCT - cohort1_ambiguous
- cohort2_pCT = cohort2_predict_pCT - cohort2_ambiguous
- true_control = set(meta[meta['Morbidity outcome'] == 'Control'].index)
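The set logic above assigns each patient to one of three groups based on the labels of their kept images: pCT only, control only, or ambiguous (both kinds present). A minimal sketch with made-up patient ids; `split_patients` is a hypothetical helper, not the author's code.

```python
def split_patients(image_labels):
    """image_labels: iterable of (patient_id, label) pairs, one per kept image.
    Returns (pct_only, control_only, ambiguous) as sets of patient ids."""
    image_labels = list(image_labels)  # allow generators, since we iterate twice
    predicted_pct = {pid for pid, label in image_labels if label == "pCT"}
    predicted_control = {pid for pid, label in image_labels if label != "pCT"}
    ambiguous = predicted_pct & predicted_control
    return predicted_pct - ambiguous, predicted_control - ambiguous, ambiguous

pairs = [  # made-up example data
    ("Patient 1", "pCT"), ("Patient 1", "pCT"),
    ("Patient 2", "nCT"), ("Patient 2", "NiCT"),
    ("Patient 3", "pCT"), ("Patient 3", "nCT"),
]
pct_only, control_only, ambiguous = split_patients(pairs)
print(pct_only, control_only, ambiguous)
```

A patient counts as pCT only when every kept image is labeled pCT, so this is a strict definition; the ambiguous group absorbs everyone with mixed labels.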
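Note that the control group in the original data has two parts: one is Community-acquired pneumonia, i.e. ordinary pneumonia, and the other is Control, which we treat for now as completely healthy CT.
By the definition given in the paper, pCT should in principle not appear in the completely healthy group.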
- Counter(meta['Morbidity outcome'])
- Counter({'Regular': 542,
- 'Severe': 170,
- 'Suspected': 259,
- 'Control': 96,
- 'Control (Community-acquired pneumonia)': 218,
- 'Mild': 22,
- 'Critically ill': 35})
- len(set(meta[meta['Morbidity outcome'] == 'Control'].index) & cohort1_pCT)
- len(set(meta[meta['Morbidity outcome'] == 'Control'].index) & cohort1_ambiguous)
- len(set(meta[meta['Morbidity outcome'] == 'Control'].index) & cohort1_control)
- len(set(meta[meta['Morbidity outcome'] == 'Control'].index) & cohort2_pCT)
- len(set(meta[meta['Morbidity outcome'] == 'Control'].index) & cohort2_ambiguous)
- len(set(meta[meta['Morbidity outcome'] == 'Control'].index) & cohort2_control)
11
15
70
0
0
0
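According to this output, the false-positive rate among the healthy controls is roughly 10%: 11 of the 96 Control patients land in the pCT-only set.
We also need to look at the positive rate.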
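The arithmetic behind that estimate can be checked directly from the three cohort1 counts printed above. Whether ambiguous patients should count as false positives is an assumption made here for comparison only; the post does not say.

```python
# cohort1 counts for 'Control' patients, in the order printed above
pct_only, ambiguous, control_only = 11, 15, 70

total = pct_only + ambiguous + control_only
strict_fp = pct_only / total                # only pCT-only patients count
loose_fp = (pct_only + ambiguous) / total   # assumption: ambiguous also counts

print(total, round(strict_fp, 3), round(loose_fp, 3))
# prints 96 0.115 0.271
```

The total of 96 matches the number of Control entries in the metadata, and the strict rate corresponds to the figure of about 10%.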
- disease = list(
- set(meta[(meta['Morbidity outcome'] != 'Control') & (
- meta['Morbidity outcome'] != 'Control (Community-acquired pneumonia)') & (
- meta['Morbidity outcome'] != 'Suspected')
- ].index))
- len(set(disease) & cohort1_control)
- len(set(disease) & cohort2_control)
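212
102
(211 + 102) / 769
These counts are confirmed patients who ended up in the control sets, about 41% of the 769 confirmed cases. The positive rate is therefore about 60%, which is acceptable.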
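A short check of this calculation, using the counts from the expression above and the Counter output for the denominator. It also shows why the parentheses matter: typed without them, Python divides before it adds.

```python
missed = 211 + 102                # confirmed patients found in the control sets
confirmed = 542 + 170 + 22 + 35   # Regular + Severe + Mild + Critically ill

as_typed = 211 + 102 / 769        # division binds tighter than addition
miss_rate = missed / confirmed
positive_rate = 1 - miss_rate

print(confirmed, round(as_typed, 2), round(miss_rate, 3), round(positive_rate, 3))
# prints 769 211.13 0.407 0.593
```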
So we can now start training the second model.
- Best Regards,
- Yuan.SH
- ---------------------------------------
- School of Basic Medical Sciences,
- Fujian Medical University,
- Fuzhou, Fujian, China.
- please contact with me via the following ways:
- (a) e-mail :yuansh3354@163.com
Source: https://blog.csdn.net/qq_40966210/article/details/114021462