医工互联

 找回密码
 注册[Register]

手机动态码快速登录

手机号快速登录

微信登录

微信扫一扫,快速登录

QQ登录

只需一步,快速开始

查看: 143|回复: 0
收起左侧

pRRophetic 通过基因表达水平预测临床化疗反应的R包

[复制链接]

  离线 

发表于 2022-11-3 06:41:17 | 显示全部楼层 |阅读模式 <

一、pRRophetic 是干什么用的?

pRRophetic是明尼苏达大学Paul Geeleher, Nancy Cox, R. Stephanie Huang做的包,主要算法是基于自己课题组2014年的发的Genome Biology,Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines (https://link.springer.com/article/10.1186/gb-2014-15-3-r47)。这个R包的算法的输入主要是较大的Project的cell line expression profiles 与对应的IC50信息,然后通过ridge regression建立一个模型,再用这个模型去预测临床样本的Chemotherapeutic Response。2021年,这个课题组又出了
oncoPredict (oncoPredict: an R package for predicting in vivo or cancer patient drug response and biomarkers from cell line screening data), 算是对pRRophetic的一次升级。
(注:R 4.2.1 版本会报错,建议使用4.0.2以下版本)
二、使用指南

1. 包的安装

github地址:https://github.com/paulgeeleher/pRRophetic
但是大家好像下载tar本地安装
   OSF | pRRophetic R package Wiki
https://osf.io/5xvsg/wiki/home/
  
074505ebao7ljvk5qlbkwk.png

需要 “car”, “ridge”, “preprocessCore”, “genefilter”, “sva” 四个前置包。
  1. # 如果不缺环境 直接安装就行,不过我喜欢Rstudio,点点鼠标就完事了。
  2. install.packages("pRRophetic_0.5.tar.gz", repos = NULL, dependencies = TRUE)
复制代码
值得一提的是,这个包里除了源代码,包含很多现成的数据集,这个就很人性化。
  1. # 数据集在这里
  2. find.package("pRRophetic") %>% paste0("/data") %>% dir()
  3. [1] "bortezomibData.RData"                     
  4. [2] "ccleData.RData"                           
  5. [3] "Cell_line_RMA_proc_basalExp.txt"           
  6. [4] "Cell_Lines_Details-1.csv"                  
  7. [5] "cgp2016ExprRma.RData"                     
  8. [6] "datalist"                                 
  9. [7] "drugAndPhenoCgp.RData"                     
  10. [8] "PANCANCER_IC_Tue Aug  9 15_28_57 2016.csv"
  11. [9] "PANCANCER_IC_Tue_Aug_9_15_28_57_2016.RData"
复制代码
2. 使用范例

核心的函数 pRRopheticPredict
  1. pRRopheticPredict {pRRophetic}        R Documentation
  2. Given a gene expression matrix, predict drug senstivity for a drug in CGP
  3. Description
  4. Given a gene expression matrix, predict drug senstivity for a drug in CGP.
  5. Usage
  6. pRRopheticPredict(testMatrix, drug, tissueType = "all", batchCorrect = "eb",
  7.   powerTransformPhenotype = TRUE, removeLowVaryingGenes = 0.2,
  8.   minNumSamples = 10, selection = -1, printOutput = TRUE,
  9.   removeLowVaringGenesFrom = "homogenizeData", dataset = "cgp2014")
  10. Arguments
  11. testMatrix       
  12. a gene expression matrix with gene names as row ids and sample names as column ids.
  13. drug       
  14. the name of the drug for which you would like to predict sensitivity, one of A.443654, A.770041, ABT.263, ABT.888, AG.014699, AICAR, AKT.inhibitor.VIII, AMG.706, AP.24534, AS601245, ATRA, AUY922, Axitinib, AZ628, AZD.0530, AZD.2281, AZD6244, AZD6482, AZD7762, AZD8055, BAY.61.3606, Bexarotene, BI.2536, BIBW2992, Bicalutamide, BI.D1870, BIRB.0796, Bleomycin, BMS.509744, BMS.536924, BMS.708163, BMS.754807, Bortezomib, Bosutinib, Bryostatin.1, BX.795, Camptothecin, CCT007093, CCT018159, CEP.701, CGP.082996, CGP.60474, CHIR.99021, CI.1040, Cisplatin, CMK, Cyclopamine, Cytarabine, Dasatinib, DMOG, Docetaxel, Doxorubicin, EHT.1864, Elesclomol, Embelin, Epothilone.B, Erlotinib, Etoposide, FH535, FTI.277, GDC.0449, GDC0941, Gefitinib, Gemcitabine, GNF.2, GSK269962A, GSK.650394, GW.441756, GW843682X, Imatinib, IPA.3, JNJ.26854165, JNK.9L, JNK.Inhibitor.VIII, JW.7.52.1, KIN001.135, KU.55933, Lapatinib, Lenalidomide, LFM.A13, Metformin, Methotrexate, MG.132, Midostaurin, Mitomycin.C, MK.2206, MS.275, Nilotinib, NSC.87877, NU.7441, Nutlin.3a, NVP.BEZ235, NVP.TAE684, Obatoclax.Mesylate, OSI.906, PAC.1, Paclitaxel, Parthenolide, Pazopanib, PD.0325901, PD.0332991, PD.173074, PF.02341066, PF.4708671, PF.562271, PHA.665752, PLX4720, Pyrimethamine, QS11, Rapamycin, RDEA119, RO.3306, Roscovitine, Salubrinal, SB.216763, SB590885, Shikonin, SL.0101.1, Sorafenib, S.Trityl.L.cysteine, Sunitinib, Temsirolimus, Thapsigargin, Tipifarnib, TW.37, Vinblastine, Vinorelbine, Vorinostat, VX.680, VX.702, WH.4.023, WO2009093972, WZ.1.84, X17.AAG, X681640, XMD8.85, Z.LLNle.CHO, ZM.447439.
  15. tissueType       
  16. specify if you would like to traing the models on only a subset of the CGP cell lines (based on the tissue type from which the cell lines originated). This be one any of "all" (for everything, default option), "allSolidTumors" (everything except for blood), "blood", "breast", "CNS", "GI tract" ,"lung", "skin", "upper aerodigestive"
  17. batchCorrect       
  18. How should training and test data matrices be homogenized. Choices are "eb" (default) for ComBat, "qn" for quantiles normalization or "none" for no homogenization.
  19. powerTransformPhenotype       
  20. Should the phenotype be power transformed before we fit the regression model? Default to TRUE, set to FALSE if the phenotype is already known to be highly normal.
  21. removeLowVaryingGenes       
  22. What proportion of low varying genes should be removed? 20 percent be default
  23. minNumSamples       
  24. How many training and test samples are requried. Print an error if below this threshold
  25. selection       
  26. How should duplicate gene ids be handled. Default is -1 which asks the user. 1 to summarize by their or 2 to disguard all duplicates.
  27. printOutput       
  28. Set to FALSE to supress output
  29. Value
  30. a gene expression matrix that does not contain duplicate gene ids
复制代码
  1. # 加载数据集 硼替佐米(Bortezomib)
  2. data(bortezomibData)
复制代码
这个数据集来自与GSE9782。exprDataBortezomib数据集里面包含264的样本,pRRphetic里预处理后有22283个基因变量。另一个人重要的输入,就是drug response了。
  1. # studyResponse的level 有三个 Levels: PGx_Responder = IE PGx_Responder = NR PGx_Responder = R
  2. table(studyResponse)
  3. PGx_Responder = IE PGx_Responder = NR  PGx_Responder = R
  4. 25                126                113
复制代码
首先QQplot看一下IC50是否符合正态分布。
  1. pRRopheticQQplot("Bortezomib")
复制代码
074506js4g8kapgpgzz4m4.png

然后执行了一个 5-fold cross-validation,验证一下是否可以预测临床药物敏感性。可能朋友们会有疑问,为啥没有train dataset,只有test dataset,因为train dataset其实是默认好的,是CGP cell lines,可以用tissueType执行肿瘤类型(This be one any of “all” (for everything, default option), “allSolidTumors” (everything except for blood), “blood”, “breast”, “CNS”, “GI tract” ,“lung”, “skin”, “upper aerodigestive”)。
  1. cvOut <- pRRopheticCV("Bortezomib", cvFold=5, testExprData=exprDataBortezomib)
  2. summary(cvOut)
复制代码
看看结果
  1. Summary of cross-validation results:
  2. Pearsons correlation: 0.43 , P =  3.57572505890629e-14
  3. R-squared value: 0.19
  4. Estimated 95% confidence intervals: -4.41, 3.56
  5. Mean prediction error: 1.61
复制代码
  1. #Plot the cross validation predicted phenotype against the measured IC50s.
  2. plot(cvOut)
复制代码
074506kgz27z9forwfrhlw.png

pRRopheticPredict
Given a gene expression matrix, predict drug senstivity for a drug in CGP
Based on the qqplot it is likely acceptable to use these data for prediction of
bortezomib sensitivity. Predict bortezomib sensitivity using all cell lines, then
only cell lines from hematological cancers and then only cell lines from derived from solid tumors. (selection = ?,How should duplicate gene ids be handled. Default is -1 which asks the user. 1 to summarize by their or 2 to disguard all duplicates.)
  1. predictedPtype <- pRRopheticPredict(exprDataBortezomib, "Bortezomib",
  2.                                     selection=1)
  3. predictedPtype_blood <- pRRopheticPredict(exprDataBortezomib, "Bortezomib",
  4.                                           "blood", selection=1)
  5. predictedPtype_solid <- pRRopheticPredict(exprDataBortezomib, "Bortezomib",
  6.                                           "allSolidTumors", selection=1)
复制代码
  1. t.test(predictedPtype[((studyResponse == "PGx_Responder = NR") & bortIndex)],
  2.        predictedPtype[((studyResponse == "PGx_Responder = R") & bortIndex)],
  3.        alternative="greater")
  4.         Welch Two Sample t-test
  5. data:  predictedPtype[((studyResponse == "PGx_Responder = NR") & bortIndex)] and predictedPtype[((studyResponse == "PGx_Responder = R") & bortIndex)]
  6. t = 3.0109, df = 163.75, p-value = 0.001509
  7. alternative hypothesis: true difference in means is greater than 0
  8. 95 percent confidence interval:
  9. 0.1826465       Inf
  10. sample estimates:
  11. mean of x mean of y
  12. -4.926576 -5.331926
  13. t.test(predictedPtype_blood[((studyResponse == "PGx_Responder = NR") & bortIndex)],
  14.        predictedPtype_blood[((studyResponse == "PGx_Responder = R") & bortIndex)],
  15.        alternative="greater")
  16.         Welch Two Sample t-test
  17. data:  predictedPtype_blood[((studyResponse == "PGx_Responder = NR") & bortIndex)] and predictedPtype_blood[((studyResponse == "PGx_Responder = R") & bortIndex)]
  18. t = 4.1204, df = 165.24, p-value = 2.984e-05
  19. alternative hypothesis: true difference in means is greater than 0
  20. 95 percent confidence interval:
  21. 0.3975589       Inf
  22. sample estimates:
  23. mean of x mean of y
  24. -4.372173 -5.036370
  25. t.test(predictedPtype_solid[((studyResponse == "PGx_Responder = NR") & bortIndex)],
  26.        predictedPtype_solid[((studyResponse == "PGx_Responder = R") & bortIndex)],
  27.        alternative="greater")
  28. Welch Two Sample t-test
  29. data:  predictedPtype_solid[((studyResponse == "PGx_Responder = NR") & bortIndex)] and predictedPtype_solid[((studyResponse == "PGx_Responder = R") & bortIndex)]
  30. t = 1.1167, df = 165.87, p-value = 0.1329
  31. alternative hypothesis: true difference in means is greater than 0
  32. 95 percent confidence interval:
  33. -0.07325769         Inf
  34. sample estimates:
  35. mean of x mean of y
  36. -5.381191 -5.533412
复制代码
  1. allTissuesPpv <- getPPV(predResponders=predictedPtype[((studyResponse == "PGx_Responder = R") & bortIndex)], predNonResponders=predictedPtype[((studyResponse == "PGx_Responder = NR") & bortIndex)], drug="Bortezomib")
  2. PPV:  0.57
  3. NPV:  0.59
  4. Cutpoint:  -5.04
  5. bloodPpv <- getPPV(predResponders=predictedPtype_blood[((studyResponse == "PGx_Responder = R") & bortIndex)], predNonResponders=predictedPtype_blood[((studyResponse == "PGx_Responder = NR") & bortIndex)], drug="Bortezomib", tissue="blood")
  6. PPV:  0.63
  7. NPV:  0.69
  8. Cutpoint:  -4.54
复制代码
  1. df <- stack(list(NR=predictedPtype_blood[((studyResponse == "PGx_Responder = NR")
  2.                                           & bortIndex)], R=predictedPtype_blood[((studyResponse == "PGx_Responder = R") &
  3.                                                                                    bortIndex)]))
  4. ggplot(data=df, aes(y=values, x=ind)) + geom_boxplot(alpha=.3, fill=c("#CC0033", "#006633")) +
  5.   theme_bw() + ylab("Predicted Bortezomib Sensitivity") + xlab("Clinical Response")
复制代码
074507hh5nnognut0955h9.png

然后是CCLE数据库,
  1. data(ccleData)
复制代码
exprMatCcle是表达矩阵,里面有1037个cell line,18988个基因信息。
sensDataCcle里面有504的cell line的各种信息,包括“”CLE.Cell.Line.Name" ,"Primary.Cell.Line.Name”,“Compound”,还有最最重要的“IC50…uM.”
  1. cvOut_pd <- pRRopheticCV("PD.0325901", cvFold=5, testExprData=exprMatCcle)
  2. summary(cvOut_pd)
复制代码
  1. plot(cvOut_pd)
复制代码
074507ydudfcdzmsrzuhkq.png

  1. predictedPtype_ccle <- pRRopheticPredict(exprMatCcle, "PD.0325901", selection=1)
  2. 11897  gene identifiers overlap between the supplied expression matrices...
  3. Found2batches
  4. Adjusting for0covariate(s) or covariate level(s)
  5. Standardizing Data across genes
  6. Fitting L/S model and finding priors
  7. Finding parametric adjustments
  8. Adjusting the Data
  9. 2380 low variabilty genes filtered.
  10. Fitting Ridge Regression model... Done
  11. Calculating predicted phenotype...Done
复制代码
  1. cellLinesWithCcleIc50s <- names(predictedPtype_ccle)[names(predictedPtype_ccle) %in%
  2.                                                        sensDataCcle$CCLE.Cell.Line.Name]
  3. predCcleOrd <- predictedPtype_ccle[names(predictedPtype_ccle)]
  4. ccleActArea_pd <- -sensDataCcle$"ActArea"[sensDataCcle$Compound == "PD-0325901"]
  5. names(ccleActArea_pd) <- sensDataCcle$"CCLE.Cell.Line.Name"[sensDataCcle$Compound ==
  6.                                                               "PD-0325901"]
  7. ccleActAreaord <- ccleActArea_pd[cellLinesWithCcleIc50s]
复制代码
  1. cor.test(predictedPtype_ccle[cellLinesWithCcleIc50s], ccleActAreaord,
  2.          method="spearman")
  3. Spearman's rank correlation rho
  4. data:  predictedPtype_ccle[cellLinesWithCcleIc50s] and ccleActAreaord
  5. S = 7728298, p-value < 2.2e-16
  6. alternative hypothesis: true rho is not equal to 0
  7. sample estimates:
  8.       rho
  9. 0.5807112
复制代码
  1. df2 <- data.frame(predCcle=predictedPtype_ccle[cellLinesWithCcleIc50s],
  2.                   actAreaCcle=ccleActAreaord)
  3. ggplot(data=df2, aes(y=predCcle, x=actAreaCcle)) + geom_point(alpha=0.5) +
  4.   geom_smooth(method=lm) + theme_bw() + xlab("Measured Activity Area") +
  5.   ylab("Predicted Drug Sensitivity")
复制代码
074507jm9z4x4ucc4w9qti.png

  1. predictedPtype_ccle_erlotinib <- pRRopheticLogisticPredict(exprMatCcle, "Erlotinib", selection=1)
复制代码

Get the ActArea for the CCLE cell lines for which we have just predicted
IC50.
  1. cellLinesWithCcleIc50s <-
  2.   names(predictedPtype_ccle_erlotinib)[names(predictedPtype_ccle_erlotinib) %in%
  3.                                          sensDataCcle$CCLE.Cell.Line.Name]
  4. predCcleOrd <- predictedPtype_ccle_erlotinib[names(predictedPtype_ccle_erlotinib)]
  5. ccleActArea_pd <- sensDataCcle$"ActArea"[sensDataCcle$Compound == "Erlotinib"]
  6. names(ccleActArea_pd) <- sensDataCcle$"CCLE.Cell.Line.Name"[sensDataCcle$Compound ==
  7.                                                               "Erlotinib"]
  8. ccleActAreaord <- ccleActArea_pd[cellLinesWithCcleIc50s]
复制代码
There are a very large number of cell lines resistant to Erlotinib (within
the drug screening window), so a correlation is not an appropriate measure of concordance. So lets do a t-test between some of the most sensitive and resistant cell lines to assess whether signal is being captured by the predictions.
  1. resistant <- names(sort(ccleActAreaord))[1:55] #55 highly resistant cell lines.
  2. sensitive <- names(sort(ccleActAreaord, decreasing=TRUE))[1:15] #15 sensitive
  3. t.test(predictedPtype_ccle_erlotinib[resistant],
  4.        predictedPtype_ccle_erlotinib[sensitive])
  5.         Welch Two Sample t-test
  6. data:  predictedPtype_ccle_erlotinib[resistant] and predictedPtype_ccle_erlotinib[sensitive]
  7. t = 2.431, df = 32.127, p-value = 0.02081
  8. alternative hypothesis: true difference in means is not equal to 0
  9. 95 percent confidence interval:
  10. 0.2421538 2.7431351
  11. sample estimates:
  12. mean of x mean of y
  13. -5.184405 -6.677049
复制代码
Despite the fact that IC50 values are not correlated for this drug between
these studies, the most sensitive/resistant samples are separated highly significantly with this logistic models.
  1. boxplot(list(Resistant=predictedPtype_ccle_erlotinib[resistant],
  2.              Sensitive=predictedPtype_ccle_erlotinib[sensitive]), pch=20,
  3.         vertical=TRUE, method="jitter", ylab="Log-odds of sensitivity")
复制代码
074508r51rd1jvlggj5cr1.png

Include, is an example, prediction from the bortezomib clinical data where we try to predict CR, PR, MR, NC, PD from CR, PR, MR, NC, PD. This serves as both an example of prediction directly from clinical data and of using a dataset other than the CGP from which to predict.
First, prepare the training data and test expression data.
  1. trainExpr <- exprDataBortezomib[, (detailedResponse %in% c(1,2,3,4,5)) &
  2.                                   studyIndex %in% c("studyCode = 25", "studyCode = 40")]
  3. trainPtype <- detailedResponse[(detailedResponse %in% c(1,2,3,4,5)) &
  4.                                  studyIndex %in% c("studyCode = 25", "studyCode = 40")]
  5. testExpr <- exprDataBortezomib[, (detailedResponse %in% c(1,2,3,4,5)) &
  6.                                  studyIndex %in% c("studyCode = 39")]
复制代码
Calculate the predicted phenotype.
  1. ptypeOut <- calcPhenotype(trainExpr, trainPtype, testExpr, selection=1)
复制代码
  1. testPtype <- detailedResponse[(detailedResponse %in% c(1,2,3,4,5)) &
  2.                                 studyIndex %in% c("studyCode = 39")]
  3. cor.test(testPtype, ptypeOut, alternative="greater")
复制代码
  1. t.test(ptypeOut[testPtype %in% c(3,4,5)], ptypeOut[testPtype %in% c(1,2)],
  2.        alternative="greater")
复制代码
来源:https://blog.csdn.net/u013429737/article/details/125959971
免责声明:如果侵犯了您的权益,请联系站长,我们会及时删除侵权内容,谢谢合作!
回复

使用道具 举报

提醒:禁止复制他人回复等『恶意灌水』行为,违者重罚!
您需要登录后才可以回帖 登录 | 注册[Register] 手机动态码快速登录 微信登录

本版积分规则

发布主题 快速回复 收藏帖子 返回列表 客服中心 搜索
简体中文 繁體中文 English 한국 사람 日本語 Deutsch русский بالعربية TÜRKÇE português คนไทย french

QQ|RSS订阅|小黑屋|处罚记录|手机版|联系我们|Archiver|医工互联 |粤ICP备2021178090号 |网站地图

GMT+8, 2024-9-20 10:35 , Processed in 0.288794 second(s), 63 queries .

Powered by Discuz!

Copyright © 2001-2023, Discuz! Team.

快速回复 返回顶部 返回列表