Statistics | Jun's Blog

Tidymodel and glmnet

When a penalized generalized linear model (lasso or ridge) is fitted in the tidymodels framework, finalizing the hyperparameter (lambda) and extracting the coefficients of the final model can be confusing. Here is an example that predicts PIK3CA mutation status from gene expression data, using the TCGA breast cancer dataset.

Modeling

    library(glmnet)
    library(themis)
    set.seed(930093)
    cv_splits <- rsample::vfold_cv(trainset_ahDiff, strata = PIK3CA_T)
    mod <- logistic_reg(penalty = tune(), mixture = tune()) %>%
      set_engine("glmnet")
    rec <- recipe(PIK3CA_T ~ .
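The finalizing step mentioned in this excerpt can be sketched as follows. This is a minimal illustration under stated assumptions, not the post's actual code: the TCGA data is not available here, so the built-in `two_class_dat` from {modeldata} stands in for it, and the object names `wf`, `res`, `best`, `final_wf`, `final_fit`, and `coefs` are chosen for illustration only.

```r
library(tidymodels)  # attaches parsnip, recipes, rsample, tune, workflows, yardstick
library(glmnet)

set.seed(1)

# Stand-in data (assumption): two predictors A, B and a two-level outcome Class
data(two_class_dat, package = "modeldata")

cv_splits <- vfold_cv(two_class_dat, v = 5, strata = Class)

# Penalized logistic regression; both lambda (penalty) and alpha (mixture) are tuned
mod <- logistic_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")

rec <- recipe(Class ~ ., data = two_class_dat) %>%
  step_normalize(all_numeric_predictors())

wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(mod)

# Evaluate 10 candidate (penalty, mixture) pairs on the resamples
res <- tune_grid(wf, resamples = cv_splits, grid = 10,
                 metrics = metric_set(roc_auc))

# Select the best candidate and substitute it for the tune() placeholders
best <- select_best(res, metric = "roc_auc")
final_wf <- finalize_workflow(wf, best)

# Refit on the full training data, then pull coefficients at the chosen penalty
final_fit <- fit(final_wf, data = two_class_dat)
coefs <- final_fit %>%
  extract_fit_parsnip() %>%
  tidy()
coefs
```

Calling `tidy()` on the extracted parsnip fit, rather than calling `coef()` on the raw glmnet object, avoids having to pass the selected lambda by hand, because the finalized specification already carries it.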

Tidymodel

Machine Learning and Tidymodel

Model setting, {parsnip}
The R package {parsnip} standardizes model specification. tidymodels follows the tidyverse concept of lazy evaluation: parsnip records a unified specification first and evaluates it later.

Feature engineering, {recipes}
{recipes} makes preprocessing easy with step_*() functions. A recipe is computed only after it has been specified.

Resampling, {rsample}
To choose a model and its hyperparameters, we must validate the different candidate models.

Making hyperparameter set, {dials}
The R package {dials} sets up hyperparameters in a way similar to {parsnip}.
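The four packages listed above can be sketched together in a few lines. This is only an illustration under assumptions: the built-in `mtcars` data and the names `spec`, `rec`, `prepped`, `baked`, `folds`, and `grid` are not from the original post.

```r
library(tidymodels)

# {parsnip}: a specification only; nothing is fitted at this point
spec <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

# {recipes}: the steps are declared here but not yet computed
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric_predictors())

# prep() estimates the means and standard deviations; bake() applies them
prepped <- prep(rec, training = mtcars)
baked <- bake(prepped, new_data = NULL)

# {rsample}: 5-fold cross-validation splits
set.seed(1)
folds <- vfold_cv(mtcars, v = 5)

# {dials}: a regular grid of 5 candidate penalty values
grid <- grid_regular(penalty(), levels = 5)
grid
```

Separating specification from computation is what allows the same recipe and model specification to be re-estimated inside every resample without leaking information between folds.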

Applied Machine Learning Workshop RStudio Conference 2020

These are notes from the Applied Machine Learning workshop at RStudio Conference 2020. Why is it hard to predict? (Domain knowledge.) purrr::map allows inline code. purrr::map and tidyr::nest were covered because they are used in resampling and tuning. Skewed data might look like it contains outliers. People look at data in many different ways: outliers, missingness, correlation, and suspicion of an important variable. ggplot is good for exploring variables by adding geoms and changing the plot.
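As a small sketch of the purrr::map and tidyr::nest pattern mentioned in these notes (the `mtcars` data and the names `nested` and `fits` are assumptions for illustration, not workshop material):

```r
library(dplyr)
library(tidyr)
library(purrr)

# One row per cylinder group; the remaining columns are stored in a list-column
nested <- mtcars %>%
  nest(data = -cyl)

# map() takes inline code via the ~ shorthand and runs it once per nested data frame
fits <- nested %>%
  mutate(
    model = map(data, ~ lm(mpg ~ wt, data = .x)),
    slope = map_dbl(model, ~ coef(.x)[["wt"]])
  )
fits
```

Resampling objects follow the same shape: one row per split, with the data held in a list-column and results added by mapping over it.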

Reproducing Kernel Hilbert Space

I have finally arrived at reproducing kernel Hilbert space (RKHS). https://nzer0.github.io/reproducing-kernel-hilbert-space.html The above post introduces RKHS in Korean, and it was helpful. I had struggled to understand some concepts in RKHS. What does Hilbert space mean in terms of feature expansion? (\(f:\mathcal{X} \to \mathbb{R}\), \(f \in \mathcal{H}_K\)) The difference between \(f\) and \(f(x)\) was confusing: \(f\) is the function, an element of the Hilbert space, and \(f(x)\) is its evaluation at \(x\). My understanding was that the function can be represented by the inner product of the feature-space basis \(K(\cdot,x)\) and the coefficients \(f\), where the coefficients are vectors in feature space.
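The relationship between \(f\), \(f(x)\), and \(K(\cdot,x)\) described above is usually written as the reproducing property. As a brief sketch in standard notation (the symbols \(\alpha_i\), \(x_i\), and \(y\) are introduced here for illustration and do not come from the post):

\[
f(x) = \langle f, K(\cdot, x) \rangle_{\mathcal{H}_K},
\qquad
K(x, y) = \langle K(\cdot, x), K(\cdot, y) \rangle_{\mathcal{H}_K}
\]

For a function in the span of kernel sections, \(f = \sum_i \alpha_i K(\cdot, x_i)\), linearity of the inner product then gives \(f(x) = \sum_i \alpha_i K(x_i, x)\), so evaluating \(f\) at \(x\) is an inner product between \(f\) and the feature \(K(\cdot, x)\).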

Bayesian

Abstract: An introduction to the Bayesian approach in base calling and copy number variation (CNV), prepared for an intradepartmental lecture. Slides: http://rpubs.com/JKang/492555