Machine learning

Dimension Reduction

Spectral decomposition

The Gaussian kernel matrix can be factorized as \((\Phi \textbf{X})^\textbf{H} \Phi \textbf{X} =\textbf{X}^\textbf{H} \Phi^\textbf{H} \Phi \textbf{X} = \textbf{X}^\textbf{H}\textbf{X}\), where \(\Phi\) is the Gaussian kernel basis matrix and \(\textbf{X}\) is the coefficient matrix in the reproducing kernel Hilbert space, \(K(\cdot,x) \in \mathcal{H}_K\). A matrix is a system: it takes an input and gives an output. More precisely, a matrix is a linear system. Differentiation and integration are also linear systems. The Fourier transformation matches the input basis to the basis of the operator (differentiation).
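As a rough illustration of the factorization above, the sketch below builds a Gaussian kernel matrix for a few 1-D points and factors it as \(K = X^T X\) (the real-valued case of \(X^H X\)) with a Cholesky decomposition. The sample points, the bandwidth `sigma`, and the helper names `gaussian_kernel_matrix`, `cholesky_lower` and `gram` are assumptions made for this example. Cholesky is only one possible way to obtain such a factor and is not necessarily the one the post has in mind.

```python
import math

def gaussian_kernel_matrix(points, sigma=1.0):
    """K[i][j] = exp(-(x_i - x_j)^2 / (2 sigma^2)) for 1-D points."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in points]
            for a in points]

def cholesky_lower(K):
    """Return lower-triangular L with K = L L^T (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

def gram(X):
    """Compute X^T X for a real matrix X given as a list of rows."""
    n_rows, n_cols = len(X), len(X[0])
    return [[sum(X[k][i] * X[k][j] for k in range(n_rows)) for j in range(n_cols)]
            for i in range(n_cols)]

points = [0.0, 1.0, 2.5]             # hypothetical sample points
K = gaussian_kernel_matrix(points)
L = cholesky_lower(K)
X = [list(row) for row in zip(*L)]   # X = L^T, so K = X^T X
K_rebuilt = gram(X)
```

Comparing `K_rebuilt` with `K` entry by entry shows that the factor reproduces the kernel matrix up to floating-point error.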

Laplace transformation

The Fourier series represents a periodic function as a discrete vector of coefficients. The Fourier transformation turns a non-periodic time-domain function into a continuous frequency-domain function. Both change the single time basis \(t\) into an infinite frequency basis, \(e^{inx}\) or \(e^{iwx}\). A function on this infinite basis can be represented by a vector \(v_{n}\) or a function \(f(w)\) over the basis index. These are the coefficients of the Fourier series or of the Fourier transformation.
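To make the "discrete vector of coefficients" concrete, here is a minimal sketch that approximates the Fourier series coefficients \(c_n = \frac{1}{2\pi}\int_0^{2\pi} f(x) e^{-inx}\,dx\) with a Riemann sum. The function name `fourier_coefficient`, the sample count, and the choice of \(\cos(x)\) as the test function are assumptions for illustration.

```python
import cmath
import math

def fourier_coefficient(f, n, samples=256):
    """Approximate c_n = (1/2pi) * integral over [0, 2pi] of f(x) e^{-inx} dx."""
    total = 0j
    for k in range(samples):
        x = 2.0 * math.pi * k / samples
        total += f(x) * cmath.exp(-1j * n * x)
    return total / samples

# cos(x) = (e^{ix} + e^{-ix}) / 2, so only c_1 and c_{-1} should be nonzero.
coeffs = {n: fourier_coefficient(math.cos, n) for n in range(-2, 3)}
```

The dictionary `coeffs` plays the role of the vector \(v_n\): one complex number per basis function \(e^{inx}\).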

Convolution and Fourier transformation

Convolution is an operation on two vectors. \[ (c*d)_n = \sum_{i+j=n} c_i d_j = \sum_i c_i d_{n-i}, \qquad c * d = d * c. \] This is the same as multiplying polynomials: the coefficients of the product polynomial are the convolution of the coefficients of the two factors. The Fourier transformation expands the x basis into an infinite exponential basis \(e^{iwk}\). Multiplication in x (time) space becomes convolution in k (frequency) space. If time space is periodic, its Fourier transformation is discrete i.
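The link between convolution and polynomial multiplication can be sketched in a few lines. The function name `convolve` and the example polynomials are assumptions for illustration; the loop simply accumulates \(c_i d_j\) into position \(i+j\).

```python
def convolve(c, d):
    """Return the sequence with entries (c*d)_n = sum over i+j=n of c_i * d_j."""
    result = [0] * (len(c) + len(d) - 1)
    for i, ci in enumerate(c):
        for j, dj in enumerate(d):
            result[i + j] += ci * dj
    return result

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
product = convolve([1, 2], [3, 4])
# Swapping the arguments gives the same coefficients (commutativity).
swapped = convolve([3, 4], [1, 2])
```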

Lagrange dual problem and conjugate function

An optimization problem has two components: the objective function \(f_0 : \mathbb R ^n \rightarrow \mathbb R\) and the constraints. The objective function and the constraints keep each other in check and balance at a saddle point, i.e. the optimal point. The (Lagrange) dual problem also addresses the optimization problem by providing a lower bound. The dual problem can be explained through the conjugate function \(f^*(y) = \sup_x (x^Ty-f(x))\).
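A small worked example may make the conjugate function concrete. The choice \(f(x) = \tfrac{1}{2}x^2\) on \(\mathbb R\) is an assumption made here for illustration. By definition, \[ f^*(y) = \sup_x \left( xy - \tfrac{1}{2}x^2 \right). \] The expression inside the supremum is concave in \(x\), and setting its derivative \(y - x\) to zero gives \(x = y\). Substituting back, \[ f^*(y) = y^2 - \tfrac{1}{2}y^2 = \tfrac{1}{2}y^2, \] so this particular function is its own conjugate.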


The purpose of approximation is to find the optimal point \(x^*\), i.e. the point where \(\nabla F(x^*) = 0\). We need a step (search) direction \(\Delta x\) and a step size \(t\). The Taylor approximation is a polynomial whose argument is the step and whose parameters are the derivatives at the start point. The first-degree Taylor approximation adds one term to the value at the start point \((x_0, F(x_0))\). The added term \(\nabla F(x) \Delta x\) consists of a parameter (the gradient \(\nabla F(x)\)) and an argument (the step \(\Delta x\)).
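One way to turn the step direction and step size into an algorithm is plain gradient descent, sketched below. The names `gradient_descent`, `step_size` and `iterations`, the fixed step size, and the test function \(F(x) = (x-3)^2\) are all assumptions for illustration; the text does not prescribe a particular method.

```python
def gradient_descent(grad, x0, step_size=0.1, iterations=100):
    """Repeat x <- x + t * dx with the steepest-descent direction dx = -grad(x)."""
    x = x0
    for _ in range(iterations):
        dx = -grad(x)            # search direction from the first-order Taylor term
        x = x + step_size * dx   # move a step of size t along dx
    return x

# F(x) = (x - 3)^2 has gradient 2(x - 3) and its minimum at x* = 3.
x_star = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

With this step size the distance to the minimizer shrinks by a constant factor on every iteration, so `x_star` ends up very close to 3.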

Differential equations and Fourier transformation

Differential equations describe the change of a state, and the change depends on the state itself. The solutions of differential equations are the state equations. The initial conditions fix the time \(t\) and the state \(y\). The boundary conditions are the values at the boundary, \(y_0\) and \(y_1\). Consider \(\frac{dy}{dt} = ay + q(t)\), starting from \(y(0)\) at \(t = 0\), with the initial condition \(y = 1\) at \(t = 0\). Here \(q(t)\) is the input and \(y(t)\) is the response.
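The equation \(\frac{dy}{dt} = ay + q(t)\) can be explored numerically. The sketch below integrates it with the forward Euler method and, for the special case of a constant input \(q(t) = q_0\), compares the result with the closed form \(y(t) = e^{at}y(0) + \frac{q_0}{a}(e^{at} - 1)\). The function names, the values of `a` and `q0`, and the constant input are assumptions chosen for illustration.

```python
import math

def euler_solve(a, q, y0, t_end, steps=10000):
    """Integrate dy/dt = a*y + q(t) from t = 0 to t_end with forward Euler."""
    dt = t_end / steps
    t, y = 0.0, y0
    for _ in range(steps):
        y += dt * (a * y + q(t))   # state change depends on state and input
        t += dt
    return y

def exact_constant_input(a, q0, y0, t):
    """Closed form for constant input q(t) = q0 (requires a != 0)."""
    return math.exp(a * t) * y0 + (q0 / a) * (math.exp(a * t) - 1.0)

a, q0, y0 = -1.0, 2.0, 1.0                         # hypothetical parameters, y(0) = 1
numeric = euler_solve(a, lambda t: q0, y0, t_end=1.0)
exact = exact_constant_input(a, q0, y0, 1.0)
```

The two values agree to within the first-order accuracy of the Euler method; a smaller step narrows the gap.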


Information relates to uncertainty. The Shannon information content of an outcome \(x\) is \(h(x)=-\log_{2}P(x)\). A rare event carries more information than a common event. The unit of information is the bit (binary digit). Coding is a mapping from an outcome of an ensemble to binary digits \(\{0,1\}^+\). A symbol code is a code for a single ensemble. A block code is a code for a sequence ensemble. The set of sequences of an ensemble has a typical subset.
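The information content formula is easy to check numerically. In the sketch below, the function name `information_content` and the two example probabilities are assumptions for illustration.

```python
import math

def information_content(p):
    """Shannon information content h(x) = -log2 P(x), measured in bits."""
    return -math.log2(p)

common = information_content(0.5)    # a fair coin flip
rare = information_content(0.125)    # an event with probability 1/8
```

The rarer outcome yields the larger value, which matches the statement that rare events carry more information.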

Taylor series

\[ f(x) = \sum_{k=0}^\infty c_k x^k = c_0 + c_1 x + c_2 x^2 + \dotsb. \] The expansion around a point is an approximation that is a function of the step \(h\), with the derivatives of \(f(x)\) as its parameters: \(f(x \pm h) = f(x) \pm hf'(x) + \frac{h^2}{2}f''(x) \pm \frac{h^3}{6}f'''(x) + O(h^4)\). Let’s think about \(\sin(x)\). \[ f(x) = \sin(x),\quad f(0) = 0,\quad f'(x)=\cos(x),\quad f'(0)=1,\quad f''(x)=-\sin(x),\quad f''(0)=0. \] Thus, \[\begin{align*} \sin(x) &= 0 + \frac{1}{1!

Tidymodel and glmnet

When a penalized generalized linear model (Lasso or Ridge) is fitted in the tidymodels environment, finalizing the hyperparameter (lambda) and extracting the coefficients of the final model can be confusing. Here is an example. It predicts PIK3CA mutation status from gene expression data, using the TCGA breast cancer dataset.

Modeling

library(glmnet)
library(themis)
set.seed(930093)
cv_splits <- rsample::vfold_cv(trainset_ahDiff, strata = PIK3CA_T)
mod <- logistic_reg(penalty = tune(), mixture = tune()) %>% set_engine("glmnet")
rec <- recipe(PIK3CA_T ~ .