# Machine learning

## Laplace transformation

The Fourier series represents a periodic function as a discrete vector of coefficients. The Fourier transformation turns a non-periodic time-domain function into a continuous frequency-domain function. Both replace the single time basis $$t$$ with an infinite frequency basis, $$e^{inx}$$ for the series or $$e^{iwx}$$ for the transformation. On this infinite basis, the function is represented by a vector $$v_{n}$$ or a function $$f(w)$$. These are the coefficients of the Fourier series or the Fourier transformation.
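As a rough illustration of the "discrete vector" view, the sketch below samples one period of a signal and computes its coefficients with a plain discrete Fourier sum. The sampled sum stands in for the Fourier series integral, and the function name `dft` and the test signal are assumptions made for this example only.

```python
import cmath
import math


def dft(samples):
    """Coefficients c_n = (1/N) * sum_k x_k * exp(-2*pi*i*n*k/N)."""
    N = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * n * k / N) for k, x in enumerate(samples)) / N
        for n in range(N)
    ]


N = 8
# One period of cos(x) + 0.5*sin(2x), sampled at N equally spaced points.
signal = [
    math.cos(2 * math.pi * k / N) + 0.5 * math.sin(4 * math.pi * k / N)
    for k in range(N)
]
coeffs = dft(signal)

# Print the coefficient on each basis function e^{inx}.
for n, c in enumerate(coeffs):
    print(n, round(c.real, 6), round(c.imag, 6))
```

Only the entries for $$n = 1, 2$$ and their mirrored partners $$n = 7, 6$$ are non-zero, so the whole periodic signal is described by a handful of numbers.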

## Convolution and Fourier transformation

Convolution is an operation on two vectors. It is commutative, $$c * d = d * c$$, and its entries are $$(c*d)_n = \sum_{i+j=n} c_i d_j = \sum_i c_i d_{n-i}$$. This is the same as multiplying polynomials: the coefficients of the product polynomial are the convolution of the coefficient vectors of the two factors. Fourier transformation expands the x basis into an infinite exponential basis $$e^{iwk}$$. Multiplication in x (time) space becomes convolution in k (frequency) space. If the time space is periodic, its Fourier transformation is discrete.
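The link between convolution and polynomial multiplication can be checked with a few lines of code. This is a minimal sketch; the function name `convolve` and the two example polynomials are chosen here for illustration.

```python
def convolve(c, d):
    """Return c * d, where (c * d)_n is the sum of c_i * d_j over all i + j = n."""
    out = [0] * (len(c) + len(d) - 1)
    for i, ci in enumerate(c):
        for j, dj in enumerate(d):
            out[i + j] += ci * dj
    return out


# Coefficients are listed from the constant term upward.
p = [1, 2]      # 1 + 2x
q = [3, 4, 5]   # 3 + 4x + 5x^2

product = convolve(p, q)
print(product)  # coefficients of (1 + 2x)(3 + 4x + 5x^2)
```

Expanding the product by hand gives $$3 + 10x + 13x^2 + 10x^3$$, which matches the convolution of the two coefficient vectors.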

## Lagrange dual problem and conjugate function

An optimization problem has two components: the objective function $$f_0 : \mathbb R ^n \rightarrow \mathbb R$$ and the constraints. The objective function and the constraints keep each other in check and balance at a saddle point, i.e. the optimal point. The Lagrange dual problem approaches the same optimization problem from below, by providing a lower bound on the optimal value. The dual problem can be expressed through the conjugate function $$f^*(y) = \sup_x (x^Ty-f(x))$$.
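To make the conjugate function concrete, here is a small numerical sketch for a one-dimensional $$f$$. The supremum is approximated by a maximum over a grid, so the search interval, the grid size, and the choice $$f(x) = x^2$$ are all assumptions of this example. For that choice the exact conjugate is $$f^*(y) = y^2/4$$.

```python
def conjugate(f, y, lo=-10.0, hi=10.0, steps=20001):
    """Approximate f*(y) = sup_x (x*y - f(x)) by a maximum over a uniform grid."""
    best = float("-inf")
    for k in range(steps):
        x = lo + (hi - lo) * k / (steps - 1)
        best = max(best, x * y - f(x))
    return best


def f(x):
    return x * x


for y in (-3.0, 0.0, 2.0):
    print(y, conjugate(f, y), y * y / 4)  # numerical value next to the exact one

# Lower-bound property: x*y - f*(y) never exceeds f(x).
x, y = 0.5, 2.0
print(x * y - conjugate(f, y) <= f(x))
```

The last line is the inequality that makes the conjugate useful for duality: every $$y$$ gives a lower bound on $$f(x)$$.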

## Approximation

The purpose of approximation is to find the optimal point $$x^*$$, i.e. the point where $$\nabla F(x^*) = 0$$. We need a step (search) direction $$\Delta x$$ and a step size $$t$$. A Taylor approximation is a polynomial whose argument is the step and whose coefficients are the derivatives at the start point. The first-order Taylor approximation adds one term to the value at the start point $$(x_0, F(x_0))$$. The added term $$\nabla F(x) \Delta x$$ consists of a coefficient (the gradient $$\nabla F(x)$$) and an argument (the step $$\Delta x$$).
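The sketch below puts these pieces together for a simple quadratic: it compares the first-order Taylor approximation with the true value for a small step, then repeats the update $$x \leftarrow x - t \nabla F(x)$$ until the gradient vanishes. The function `F`, the fixed step size `t = 0.1`, and the choice of the negative gradient as the step direction are assumptions of this example, not the only options.

```python
def F(x):
    """Example objective with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + 2.0 * (x[1] + 2.0) ** 2


def grad_F(x):
    return [2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)]


def first_order(x, dx):
    """First-order Taylor approximation: F(x) + grad F(x) . dx"""
    return F(x) + sum(g * d for g, d in zip(grad_F(x), dx))


def gradient_descent(x0, t=0.1, iters=200):
    """Repeat the step x <- x - t * grad F(x) a fixed number of times."""
    x = list(x0)
    for _ in range(iters):
        x = [xi - t * gi for xi, gi in zip(x, grad_F(x))]
    return x


x0 = [0.0, 0.0]
dx = [0.01, -0.01]
true_value = F([x0[0] + dx[0], x0[1] + dx[1]])
print(true_value, first_order(x0, dx))  # close for a small step

x_star = gradient_descent(x0)
print(x_star, grad_F(x_star))  # gradient is approximately zero at the optimum
```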

## Singular value decomposition

Bases are the central idea of linear algebra. A square matrix has eigenvectors. A symmetric matrix has orthogonal eigenvectors with real eigenvalues; when the eigenvalues are also non-negative, the matrix is positive semidefinite. Any matrix has two sets of singular vectors, the left and the right singular vectors, $$A=U\Sigma V^{T}$$. When the rows of $$A$$ are data points, as in a data table, the right singular vectors $$V$$ form the basis, the singular values $$\Sigma$$ are the magnitudes along the basis directions, and the left singular vectors $$U$$ give the data points in the new basis.
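The data-table reading can be written out in one line, assuming the columns of $$V$$ are orthonormal so that $$V^{T}V = I$$:

$$A = U\Sigma V^{T} = \sum_{i} \sigma_i u_i v_i^{T}, \qquad AV = U\Sigma$$

Row $$k$$ of $$AV$$ holds the coordinates of data point $$k$$ along the directions $$v_i$$, and the same row of $$U\Sigma$$ says those coordinates are the entries of $$U$$ scaled by the singular values $$\sigma_i$$.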

## Low rank matrix and compressed sensing

This is a note on Part III of Linear Algebra and Learning from Data by Gilbert Strang. The main themes are sparsity (low rank), information theory (compression), and of course linear transformation. A full rank matrix is inefficient. Finding a low rank matrix that is close to the original matrix can save computation. The rank one matrix $$uv^{T}$$ is the building block of a matrix. A full rank matrix can be decomposed into a sum of rank one matrices.
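A hedged sketch of the low rank idea: power iteration on $$A^{T}A$$ finds the largest singular value with its singular vectors, and $$\sigma_1 u_1 v_1^{T}$$ is then the closest rank one matrix to $$A$$ in the Frobenius norm. The helper names, the example matrix, and the starting vector are assumptions of this example; the starting vector must not be orthogonal to the leading right singular vector.

```python
import math


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def norm(x):
    return math.sqrt(sum(v * v for v in x))


def top_singular_triple(A, iters=100):
    """Power iteration on A^T A; returns (sigma, u, v) for the largest singular value."""
    At = transpose(A)
    v = [1.0] + [0.0] * (len(A[0]) - 1)  # assumed not orthogonal to the answer
    for _ in range(iters):
        w = matvec(At, matvec(A, v))
        n = norm(w)
        v = [x / n for x in w]
    Av = matvec(A, v)
    sigma = norm(Av)
    u = [x / sigma for x in Av]
    return sigma, u, v


def rank_one(sigma, u, v):
    """Build the rank one matrix sigma * u v^T."""
    return [[sigma * ui * vj for vj in v] for ui in u]


A = [[4.0, 0.0], [3.0, -5.0]]
sigma, u, v = top_singular_triple(A)
R = rank_one(sigma, u, v)
residual = math.sqrt(
    sum((A[i][j] - R[i][j]) ** 2 for i in range(2) for j in range(2))
)
print(sigma, residual)  # residual equals the second singular value
```

For this matrix the singular values are $$\sqrt{40}$$ and $$\sqrt{10}$$, so the rank one matrix captures the larger part and leaves a residual of size $$\sqrt{10}$$.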

Topics: the meaning of $$A^{T}$$; steady state equilibrium; the graph Laplacian matrix $$A^{T}CA$$; differential equations and the Laplacian matrix.

A derivative is a graph without branches. Row space and column space are dual. $$A$$ and $$A^{T}$$ are dual.

Reference: Linear Algebra and Learning from Data, Part IV, Gilbert Strang.

## Duality

This is a summary of Boyd's Convex Optimization. The steepest descent method is an algorithm for convex optimization. The normalized steepest descent direction $$x_{nsd}$$ is the vector in the unit ball of a chosen norm that extends farthest in the direction $$-\nabla f(x)$$. In other words, the inner product of $$x_{nsd}$$ and $$-\nabla f(x)$$ is maximized. The first order Taylor approximation $$f(x+v) \approx f(x) + \nabla f(x)^{T} v$$ decreases the most when $$v = x_{nsd}$$. The unnormalized direction $$x_{sd}$$ is $$x_{nsd}$$ scaled by the dual norm of the gradient, $$x_{sd} = \lVert \nabla f(x) \rVert_{*} \, x_{nsd}$$.
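The direction depends on which norm defines the unit ball. The sketch below works this out for three common norms, assuming a non-zero gradient; the function names and the string labels `"l2"`, `"l1"`, `"linf"` are made up for this example.

```python
import math


def sign(x):
    return float((x > 0) - (x < 0))


def normalized_steepest_descent(grad, norm="l2"):
    """Unit-norm vector that maximizes the inner product with -grad."""
    if norm == "l2":
        n = math.sqrt(sum(g * g for g in grad))
        return [-g / n for g in grad]
    if norm == "l1":
        # Move only along the coordinate with the largest partial derivative.
        i = max(range(len(grad)), key=lambda k: abs(grad[k]))
        return [-sign(grad[k]) if k == i else 0.0 for k in range(len(grad))]
    if norm == "linf":
        # Move every coordinate by one unit against the sign of the gradient.
        return [-sign(g) for g in grad]
    raise ValueError("unknown norm: " + norm)


def dual_norm(grad, norm="l2"):
    """Dual of l2 is l2, dual of l1 is linf, dual of linf is l1."""
    if norm == "l2":
        return math.sqrt(sum(g * g for g in grad))
    if norm == "l1":
        return max(abs(g) for g in grad)
    if norm == "linf":
        return sum(abs(g) for g in grad)
    raise ValueError("unknown norm: " + norm)


def steepest_descent(grad, norm="l2"):
    """Unnormalized direction: the normalized one scaled by the dual norm."""
    scale = dual_norm(grad, norm)
    return [scale * d for d in normalized_steepest_descent(grad, norm)]


grad = [3.0, -4.0]
for name in ("l2", "l1", "linf"):
    print(name, normalized_steepest_descent(grad, name), steepest_descent(grad, name))
```

With the Euclidean norm the result is the plain negative gradient, so ordinary gradient descent is the special case `norm="l2"`.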

## Tidymodel

Model setting, {parsnip}: The R package parsnip standardizes model specification. Tidymodels follows the tidyverse concept of lazy evaluation: parsnip sets up a unified specification and evaluates it later.

Feature engineering, {recipes}: Recipes make preprocessing easy with step_*() functions. A recipe is likewise specified first and computed afterwards.

Resampling, {rsample}: To choose a model and its hyperparameters, we must validate the different models.

Making a hyperparameter set, {dials}: The R package dials sets hyperparameters in a way similar to parsnip.
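The resampling idea can be sketched without any package. The snippet below is a plain Python stand-in, not the rsample API: it splits row indices into k folds, and each fold is held out once for assessment while the remaining rows are used for fitting. The function name and the unshuffled, contiguous folds are assumptions of this example.

```python
def k_fold_indices(n_rows, k):
    """Yield (analysis, assessment) index lists for k-fold cross-validation."""
    # The first n_rows % k folds get one extra row so that all rows are used.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        assessment = list(range(start, start + size))
        analysis = [i for i in range(n_rows) if i < start or i >= start + size]
        yield analysis, assessment
        start += size


folds = list(k_fold_indices(10, 3))
for analysis, assessment in folds:
    print(len(analysis), assessment)
```

Each candidate model or hyperparameter setting would be fitted on the analysis rows and scored on the assessment rows, and the scores averaged over the folds.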

## Applied Machine Learning Workshop RStudio Conference 2020

These are notes from the Applied Machine Learning workshop at RStudio Conference 2020. Why is prediction hard? (Domain knowledge.) purrr::map allows inline code. purrr::map and tidyr::nest are covered because they are used in resampling and tuning. Skewed data might look like outliers. People look at data in many different ways, such as outliers, missingness, correlation, and suspicion of an important variable. ggplot is good for exploring variables by adding geoms and changing the plot.