R

Use Korean font in pdf generated from rmarkdown on Windows

    ---
    mainfont: NanumGothic
    output:
      pdf_document:
        latex_engine: xelatex
    ---

mainfont is a LaTeX option, and its value is a font family name. The exact family name can be found in the Windows Fonts folder: expand the header of the Fonts folder to show the family names. Here, NanumGothic is the exact name of the font family. The font must have the Editable embedding type (https://docs.microsoft.com/en-us/typography/opentype/spec/os2#fstype). The LaTeX engine must be xelatex. Let's use another Korean font in a PDF generated by rmarkdown.
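As a minimal sketch, the header can also be written from R so that the indentation of the nested output options is explicit. The file name, the body text, and the `render_now` switch are assumptions for illustration; rendering only succeeds when the rmarkdown package, a TeX installation with xelatex, and the NanumGothic font are all installed.

```r
# YAML header from the post, one element per line; the two-space
# indentation under `output:` is what nests the pdf_document options.
header <- c(
  "---",
  "mainfont: NanumGothic",
  "output:",
  "  pdf_document:",
  "    latex_engine: xelatex",
  "---"
)

# Hypothetical document body with some Korean text to typeset.
body <- c("", "# 한글 제목", "", "나눔고딕으로 조판한 본문입니다.")

# Write the file as UTF-8 so the Korean characters survive on Windows.
rmd_path <- file.path(tempdir(), "korean-font-demo.Rmd")
writeLines(enc2utf8(c(header, body)), rmd_path, useBytes = TRUE)

# Rendering is opt-in because it depends on the local TeX and font setup.
render_now <- FALSE
if (render_now && requireNamespace("rmarkdown", quietly = TRUE)) {
  rmarkdown::render(rmd_path, quiet = TRUE)
}
```

Setting `render_now <- TRUE` on a machine with the font installed should produce a PDF next to the temporary .Rmd file.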

Managing publications of HUGO academic blog in RStudio

This blog uses the HUGO academic theme and is managed with blogdown (https://bookdown.org/yihui/blogdown/other-themes.html). The theme needs the bib file converted into a separate reference.md file for each entry. The bib2academic R package does this (https://github.com/petzi53/bib2academic). To mark a publication as featured, add featured: true to its reference.md file.
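The last step can be scripted in base R. The sketch below is not part of bib2academic; `mark_featured()` is a hypothetical helper, and it assumes each publication file starts with a YAML front matter block delimited by `---` lines.

```r
# Hypothetical helper: set `featured: true` in the YAML front matter
# of one publication .md file (assumes the file starts with `---`).
mark_featured <- function(path) {
  lines <- readLines(path, warn = FALSE)
  fence <- which(lines == "---")
  stopifnot(length(fence) >= 2, fence[1] == 1)

  # Line numbers strictly between the opening and closing fence.
  header <- seq_len(fence[2] - 1)[-1]
  hit <- header[grepl("^featured:", lines[header])]

  if (length(hit) > 0) {
    lines[hit[1]] <- "featured: true"          # overwrite an existing key
  } else {
    lines <- append(lines, "featured: true",   # or add it before the fence
                    after = fence[2] - 1)
  }
  writeLines(lines, path)
  invisible(path)
}

# Usage on a throwaway file.
demo <- tempfile(fileext = ".md")
writeLines(c("---", 'title: "Example paper"', "featured: false", "---",
             "Abstract text."), demo)
mark_featured(demo)
result <- readLines(demo)
```

Editing the file by hand works just as well; a helper like this only pays off when several publications need the flag.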

Tidymodel and glmnet

When a penalized generalized linear model (lasso or ridge) is fitted in the tidymodels framework, finalizing the hyperparameter (lambda) and getting the coefficients of the final model can be confusing. Here is an example. It predicts PIK3CA mutation status from gene expression data, using the TCGA breast cancer dataset.

Modeling

    library(glmnet)
    library(themis)
    set.seed(930093)
    cv_splits <- rsample::vfold_cv(trainset_ahDiff, strata = PIK3CA_T)
    mod <- logistic_reg(penalty = tune(), mixture = tune()) %>%
      set_engine("glmnet")
    rec <- recipe(PIK3CA_T ~ .
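The excerpt is cut off before the tuning step. As a rough sketch of the two steps described as confusing (finalizing lambda and reading the coefficients), here is one possible workflow on simulated data. The data frame `toy`, its columns, the grid size, and the choice of ROC AUC are assumptions for illustration and are not taken from the post.

```r
library(tidymodels)  # attaches parsnip, rsample, tune, workflows, ...

set.seed(930093)

# Simulated stand-in for the expression data (hypothetical, not TCGA):
# three numeric predictors, of which only x1 drives the outcome.
n <- 200
toy <- tibble(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$status <- factor(ifelse(toy$x1 + rnorm(n) > 0, "mutant", "wild"))

cv_splits <- vfold_cv(toy, v = 5, strata = status)

mod <- logistic_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")

wf <- workflow() %>%
  add_formula(status ~ .) %>%
  add_model(mod)

# Evaluate 10 candidate (penalty, mixture) combinations by resampling.
tuned <- tune_grid(wf, resamples = cv_splits, grid = 10,
                   metrics = metric_set(roc_auc))

# Finalize: plug the best combination into the workflow, then refit.
best <- select_best(tuned, metric = "roc_auc")
final_fit <- finalize_workflow(wf, best) %>% fit(data = toy)

# Coefficients of the final model at the selected penalty.
coefs <- final_fit %>% extract_fit_parsnip() %>% tidy()
print(coefs)
```

The point of going through `finalize_workflow()` is that the refitted parsnip model remembers the selected penalty, so `tidy()` reports coefficients at that lambda rather than for the whole glmnet path.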

Tidymodel

Machine Learning and Tidymodel

Model setting, {parsnip}
The R package parsnip standardizes model specification. Tidymodels follows the tidyverse concept of lazy evaluation: parsnip sets up a unified specification first and evaluates it later.

Feature engineering, {recipes}
Recipes make preprocessing easy with the step_*() functions. A recipe is specified first, and the calculations run afterwards.

Resampling, {rsample}
To choose a model and its hyperparameters, we must validate the candidate models.

Making hyperparameter set, {dials}
The R package {dials} sets up hyperparameters in a way similar to {parsnip}.
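The four packages can be seen side by side in a short sketch. The built-in mtcars data, the lasso specification, and the grid size are assumptions chosen only to keep the example small.

```r
library(tidymodels)

set.seed(2020)

# parsnip: the specification only describes the model; nothing is fitted.
spec <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

# recipes: preprocessing steps are declared first ...
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric_predictors())

# ... and the means and standard deviations are estimated by prep().
prepped <- prep(rec, training = mtcars)
baked <- bake(prepped, new_data = NULL)

# rsample: 5-fold cross-validation splits for comparing candidates.
folds <- vfold_cv(mtcars, v = 5)

# dials: a regular grid of candidate values for the penalty parameter.
grid <- grid_regular(penalty(), levels = 5)
```

Nothing here fits a model. The specification, recipe, folds, and grid are the inputs that a tuning function such as `tune_grid()` would later evaluate together.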

RStudio Conference 2020

Keynote

J.J. Allaire

RStudio becomes a Public B Corp.

J.J. Allaire's favorite book: Fooled by Randomness

RStudio has restructured as a 'benefit corporation,' legally allowing it to consider the needs of its users and the #rstats community and not just its shareholders, JJ Allaire announced just now at #rstudioconf #rstudioconf2020 https://t.co/uG6SjNeLei
Sharon Machlis (@sharon000), January 29, 2020

#Rstudio evolution #rstudioconf2020 pic.twitter.com/euZFBTpvVY
1LittleCoder💻 (@1littlecoder), January 29, 2020

Google AI PAIR team

Applied Machine Learning Workshop RStudio Conference 2020

These are notes from the Applied Machine Learning workshop at RStudio Conference 2020. Why is it hard to predict? (Domain knowledge.) purrr::map allows inline code. purrr::map and tidyr::nest were covered because they are used in resampling and tuning. Skewed data can look as if it contains outliers. People look at data in many different ways: outliers, missingness, correlation, and the suspicion that a variable is important. ggplot is good for exploring variables, because adding geoms changes the plot.
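A small sketch of the nest-then-map pattern mentioned in these notes, assuming the built-in mtcars data and a simple linear model as the inline code; neither comes from the workshop material.

```r
library(dplyr)
library(tidyr)
library(purrr)

by_cyl <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%          # one row per cyl value, the rest in a list column
  mutate(
    # Inline code: the formula after ~ is an anonymous function of .x.
    model = map(data, ~ lm(mpg ~ wt, data = .x)),
    slope = map_dbl(model, ~ coef(.x)[["wt"]])
  ) %>%
  ungroup()

print(select(by_cyl, cyl, slope))
```

Resampling objects have the same shape: one row per split with list columns, which is why map() is the natural tool for working with them.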

ion2cbioportal Rpackage

https://github.com/Jkang-alien/ion2cbioportal This R package builds a cBioPortal dataset from Ion Torrent NGS result files. The Ion Torrent result files are in VCF and TSV formats. The package converts them to MAF format and the other data files for cBioPortal. The package includes a vignette: vignette("ion2cbioportal")

Predictive Modeling

NGS interpretation database and search

An NGS pathology report contains an interpretation section that describes the clinical meaning of the genomic variants found in the patient's cancer sample. The interpretation differs by variant, cancer type, and other clinical factors. Pathologists write these interpretations and archive them in various ways, including tables such as Excel sheets. I made a Shiny app that archives the interpretations in a database and searches them. https://jkang.shinyapps.io/Interpretation/

Big Data TCGA

Abstract
TCGA is a large consortium for comprehensive cancer genomics research that makes its data publicly available. There are several ways to access the TCGA data. This talk introduces some of the simple ones.