Breast cancers with PIK3CA mutations can be treated with PIK3CA inhibitors in the hormone receptor-positive, HER2-negative subtype. We applied a supervised elastic net penalized logistic regression model to predict PIK3CA mutations from gene expression data. The model was built using the TCGA pan-cancer dataset, in which approximately 10,000 cases had both PIK3CA mutation and mRNA expression data. In 10-fold cross-validation, the model with $\lambda$ = 0.01 and $\alpha$ = 1.0 (ridge regression) showed the best performance in terms of the area under the receiver operating characteristic curve (AUROC). The final model was fitted with the selected hyper-parameters on the entire training set. The training set AUROC was 0.93, and the test set AUROC was 0.84. The area under the precision-recall curve (AUPR) was 0.66 for the training set and 0.39 for the test set. Cancer types were the most important predictors. Among the gene expression predictors, insulin-like growth factor 1 receptor (IGF1R) and phosphatase and tensin homolog (PTEN) were the most significant genes. Our study suggests that predicting genomic alterations from gene expression data is feasible, with good discriminative performance (test set AUROC 0.84).
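The workflow described above (elastic net penalized logistic regression, hyper-parameter selection by 10-fold cross-validation on AUROC, refitting on the full training set, then reporting AUROC and AUPR on a held-out test set) can be sketched as follows. This is a minimal illustration, not the study's code: it assumes scikit-learn, uses synthetic data in place of TCGA, and the grid values are hypothetical. Note that scikit-learn's `C` (inverse regularization strength) and `l1_ratio` use a different parameterization from the $\lambda$ and $\alpha$ reported in the abstract, so the values are not directly comparable.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an expression matrix (assumption: NOT TCGA data).
# Roughly 15% of samples are "mutated" to mimic class imbalance.
X, y = make_classification(
    n_samples=600, n_features=50, n_informative=8,
    n_clusters_per_class=1, weights=[0.85, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Standardize features, then fit elastic net penalized logistic regression.
pipeline = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="elasticnet", solver="saga", max_iter=5000),
)

# Hypothetical grid; l1_ratio=0 is pure L2 and l1_ratio=1 is pure L1 in scikit-learn.
param_grid = {
    "logisticregression__C": [0.01, 0.1, 1.0],
    "logisticregression__l1_ratio": [0.0, 0.5, 1.0],
}

# 10-fold cross-validation scored by AUROC; the best setting is refit
# on the entire training set (refit=True is the default).
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=10)
search.fit(X_train, y_train)

# Evaluate the refitted model on the held-out test set.
test_scores = search.predict_proba(X_test)[:, 1]
test_auroc = roc_auc_score(y_test, test_scores)
test_aupr = average_precision_score(y_test, test_scores)

print("best hyper-parameters:", search.best_params_)
print(f"test AUROC = {test_auroc:.3f}, test AUPR = {test_aupr:.3f}")
```

Reporting AUPR alongside AUROC is useful when the positive class is rare, because AUPR is sensitive to the class balance in a way that AUROC is not; this is consistent with the gap between the two metrics reported in the abstract.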
Growing evidence suggests that the efficacy of immunotherapy in non-small cell lung cancers (NSCLCs) is associated with the immune microenvironment within the tumor. We aimed to explore radiologic phenotyping using a radiomics approach to assess the immune microenvironment in NSCLC. Two independent NSCLC cohorts (training and test sets) were included. Single-sample gene set enrichment analysis was used to characterize the tumor immune microenvironment; type 1 helper T (Th1) cells, type 2 helper T (Th2) cells, and cytotoxic T cells were the targets for prediction from computed tomographic (CT) radiomic features. Multiple algorithms were used in the modeling, followed by final model selection. The training dataset comprised 89 NSCLCs, and the test set included 60 cases of lung squamous cell carcinoma and adenocarcinoma. A total of 239 CT radiomic features were used. A linear discriminant analysis model was selected as the final model for Th2 cell group prediction. The area under the curve (AUC) of the final model on the test set was 0.684. Predictors in the linear discriminant analysis model were skewness (total and outer pixels), kurtosis, variance (subsampled from delta [subtraction of inner pixels from outer pixels]), and informational measure of correlation. For Th1 and cytotoxic T cells, the test set performance of the radiomic models was not sufficient for reliable prediction. A radiomics approach can be used to interrogate an entire tumor in a noninvasive manner and provide added diagnostic value in identifying the immune microenvironment of NSCLC, in particular, Th2 cell signatures.
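To make the modeling step concrete, the sketch below computes a few first-order radiomic features of the kind named above (skewness, kurtosis, and a "delta" variance) and fits a linear discriminant analysis classifier, then scores it by AUC on a separate test cohort. It is only an illustration under stated assumptions: the pixel data are simulated, the helper names `simulate_tumor`, `first_order_features`, and `build_cohort` are hypothetical, and "delta" is assumed here to mean the outer-region feature value minus the inner-region value, which the abstract does not define precisely. Texture features such as the informational measure of correlation are omitted.

```python
import numpy as np
from scipy.stats import kurtosis, skew
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)


def simulate_tumor(label):
    """Simulate inner and outer pixel intensities (assumption: toy data only)."""
    inner = rng.normal(0.0, 1.0, size=200)
    # For label 1, add a right-skewed component to the outer region.
    outer = rng.normal(0.0, 1.0, size=200) + label * rng.exponential(1.0, size=200)
    return inner, outer


def first_order_features(inner, outer):
    """Return [skewness(total), skewness(outer), kurtosis(total), delta variance]."""
    total = np.concatenate([inner, outer])
    # Assumed definition of delta: outer-region value minus inner-region value.
    delta_variance = np.var(outer) - np.var(inner)
    # scipy's kurtosis returns excess (Fisher) kurtosis by default.
    return np.array([skew(total), skew(outer), kurtosis(total), delta_variance])


def build_cohort(n_cases):
    """Build a feature matrix and binary labels for a simulated cohort."""
    labels = rng.integers(0, 2, size=n_cases)
    features = np.vstack(
        [first_order_features(*simulate_tumor(label)) for label in labels]
    )
    return features, labels


# Cohort sizes mirror the abstract (89 training, 60 test); the data do not.
X_train, y_train = build_cohort(89)
X_test, y_test = build_cohort(60)

model = LinearDiscriminantAnalysis()
model.fit(X_train, y_train)

# AUC on the independent test cohort, using the posterior probability of class 1.
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"test AUC = {test_auc:.3f}")
```

Because the simulated groups are separated by construction, the AUC printed here says nothing about real radiomic performance; the point is only the shape of the pipeline, from region-wise pixel statistics to a linear classifier evaluated on an independent cohort.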