Math

Differential equations and Fourier transformation

Differential equations describe how a state changes; the rate of change depends on the current state. The solutions of a differential equation are the state equations. Initial conditions fix the time \(t\) and the state \(y\) at the start; boundary conditions fix the values \(y_0\) and \(y_1\) at the boundary. For example, \({dy \over dt} = ay + q(t)\) starting from \(y(0)\) at \(t = 0\), with initial condition \(y(0) = 1\): here \(q(t)\) is an input and \(y(t)\) is the response.
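
This input/response view can be sketched numerically. Below is a minimal forward-Euler integration of \(dy/dt = ay + q(t)\) with assumed values \(a = 0.5\), \(q = 0\), and \(y(0) = 1\); with no input the exact response is \(y(t) = e^{at}\), so we can check the approximation against it.

```python
# Minimal sketch (assumed parameters): integrate dy/dt = a*y + q(t)
# with forward Euler, starting from the initial condition y(0) = 1.
import math

def solve_euler(a, q, y0=1.0, t_end=1.0, n=10000):
    """Forward-Euler approximation of dy/dt = a*y + q(t) on [0, t_end]."""
    dt = t_end / n
    t, y = 0.0, y0
    for _ in range(n):
        y += dt * (a * y + q(t))  # the change of state depends on the state
        t += dt
    return y

# With zero input (q = 0) the exact response is y(t) = y0 * e^{a t}.
y_numeric = solve_euler(a=0.5, q=lambda t: 0.0)
y_exact = math.exp(0.5)  # y0 = 1 evaluated at t = 1
```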

Information

Information relates to uncertainty. The Shannon information content of an outcome \(x\) is \(h(x) = -\log_{2} P(x)\). A rare event carries more information than a common event. The unit of information is the bit (binary digit). A code is a mapping from outcomes of an ensemble to binary strings \(\{0,1\}^+\). A symbol code is a code for a single ensemble; a block code is a code for a sequence ensemble. The set of sequences of an ensemble has a typical subset.
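
The rare-versus-common contrast is easy to see numerically. A minimal sketch, using assumed toy probabilities:

```python
# Minimal sketch: Shannon information content h(x) = -log2 P(x),
# showing that a rare event carries more bits than a common one.
import math

def information_content(p):
    """Bits of information in an outcome with probability p."""
    return -math.log2(p)

rare = information_content(1 / 1024)  # rare event: 10 bits
common = information_content(1 / 2)   # common event: 1 bit
```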

Entropy

This is a note on Elements of Information Theory by Thomas M. Cover. The entropy \(H\) is a measure of the uncertainty of a random variable, and it answers the question of the ultimate data compression. Can the conditional probability \(p(x|y)\) be viewed as the probability of the “conditional variable” \((X|Y=y)\)? Yes: it is the distribution of \(X\) restricted to the event \(Y=y\). For each fixed \(y\), the probabilities \(p(x|y)\) sum to 1 over \(x\); summing over all \(y\) as well gives the number of values of \(Y\), one for each conditional distribution.
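
The normalization of each conditional distribution, and the conditional entropy built from it, can be checked on an assumed toy joint distribution:

```python
# Minimal sketch (assumed joint distribution): verify that p(x|y) sums
# to 1 for each fixed y, and compute H(X|Y) = -sum p(x,y) log2 p(x|y).
import math

# Assumed toy joint distribution p(x, y).
p_xy = {("a", 0): 0.25, ("a", 1): 0.25, ("b", 0): 0.4, ("b", 1): 0.1}

# Marginal p(y).
p_y = {}
for (x, y), p in p_xy.items():
    p_y[y] = p_y.get(y, 0.0) + p

def p_cond(x, y):
    """Conditional probability p(x|y) = p(x, y) / p(y)."""
    return p_xy[(x, y)] / p_y[y]

# Each conditional distribution is normalized.
total_y0 = p_cond("a", 0) + p_cond("b", 0)

# Conditional entropy H(X|Y).
h_x_given_y = -sum(p * math.log2(p_cond(x, y)) for (x, y), p in p_xy.items())
```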

Duality

This is a summary of Boyd's Convex Optimization. The steepest descent method is a convex optimization algorithm. The normalized steepest descent direction \(x_{nsd}\) is the vector on the unit ball of a norm that maximizes the inner product with \(-\nabla f(x)\). Equivalently, the first-order Taylor approximation \(f(x+v) \approx f(x) + \nabla f(x)^{T} v\) decreases the most, over unit-norm \(v\), when \(v = x_{nsd}\). Scaling \(x_{nsd}\) by the dual norm of the gradient gives the unnormalized direction \(x_{sd}\).
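
For the Euclidean norm, \(x_{nsd} = -\nabla f(x) / \|\nabla f(x)\|_2\) and \(x_{sd} = -\nabla f(x)\), so steepest descent reduces to gradient descent. A minimal sketch on an assumed quadratic test function:

```python
# Minimal sketch (Euclidean norm, assumed test function f = x1^2 + 4*x2^2):
# normalized direction x_nsd = -g/||g||_2, unnormalized x_sd = ||g||_2 * x_nsd.
import math

def grad_quadratic(x):
    """Gradient of the assumed test function f(x) = x1^2 + 4*x2^2."""
    return [2 * x[0], 8 * x[1]]

def steepest_descent_step(x, step=0.1):
    g = grad_quadratic(x)
    norm_g = math.sqrt(sum(gi * gi for gi in g))
    x_nsd = [-gi / norm_g for gi in g]   # unit-norm steepest descent direction
    x_sd = [norm_g * v for v in x_nsd]   # rescaled by the dual norm (= ||g||_2)
    return [xi + step * di for xi, di in zip(x, x_sd)]

x = [1.0, 1.0]
for _ in range(100):
    x = steepest_descent_step(x)  # converges toward the minimizer (0, 0)
```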

Strong convexity and implications

This is a summary of the Boyd convex optimization book. The strong convexity assumption is useful for analyzing iterative minimization algorithms such as gradient descent, steepest descent, and Newton's method. Bounds on the smallest and largest eigenvalues of the Hessian, \(mI \preceq \nabla^{2}f(x) \preceq MI\), together with the gradient norm \(\| \nabla f(x)\|_2\), bound the optimal value \(p^{*}\): in particular, \(f(x) - p^{*} \leq {\|\nabla f(x)\|_2^2 \over 2m}\). The condition number of the \(\alpha\)-sublevel set satisfies cond(\(C_\alpha\)) \(\leq {M \over m}\).
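
The suboptimality bound can be checked directly on an assumed strongly convex quadratic, where \(m\) and \(M\) are just the Hessian eigenvalues:

```python
# Minimal sketch: verify f(x) - p* <= ||grad f(x)||_2^2 / (2m) on the
# assumed quadratic f(x) = x1^2 + 4*x2^2, whose Hessian eigenvalues
# are m = 2 and M = 8, with optimum p* = 0 at the origin.

def f(x):
    return x[0] ** 2 + 4 * x[1] ** 2

def grad(x):
    return [2 * x[0], 8 * x[1]]

m = 2.0        # smallest Hessian eigenvalue
p_star = 0.0   # optimal value, attained at (0, 0)

x = [0.7, -1.3]
gap = f(x) - p_star                              # true suboptimality
bound = sum(g * g for g in grad(x)) / (2 * m)    # strong convexity bound
```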

Convex set

There is an analogy between subsets of a line and the basic families of convex sets, and it is helpful for understanding convex sets: a line, a line segment, and a ray (one-sided line) correspond to an affine set, a convex set, and a cone. For the combination \(\{y \mid y=\theta_1 x_1 + \theta_2 x_2\}\), we get a line if \(\theta_1 + \theta_2 = 1\) with \(\theta_1, \theta_2 \in \mathbb{R}\), a line segment if additionally \(\theta_1, \theta_2 \geq 0\), and a cone if we require only \(\theta_1, \theta_2 \geq 0\) without the sum constraint.
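
The three constraint patterns can be sketched in one dimension, where the difference is easy to inspect (the points and coefficients below are assumed examples):

```python
# Minimal sketch: the combination theta1*x1 + theta2*x2 under the three
# constraint patterns (affine / convex / conic), using assumed 1-D points.

def combine(x1, x2, t1, t2):
    return t1 * x1 + t2 * x2

x1, x2 = 1.0, 3.0

# Affine (line): t1 + t2 = 1, coefficients may be negative.
on_line = combine(x1, x2, -0.5, 1.5)      # 4.0, outside the segment [1, 3]

# Convex (segment): t1 + t2 = 1 and t1, t2 >= 0.
on_segment = combine(x1, x2, 0.25, 0.75)  # 2.5, inside the segment [1, 3]

# Conic: t1, t2 >= 0 with no sum constraint.
in_cone = combine(x1, x2, 2.0, 0.0)       # 2.0, a nonnegative scaling of x1
```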

Reproducing Kernel Hilbert Space

Finally I arrive at reproducing kernel Hilbert space. https://nzer0.github.io/reproducing-kernel-hilbert-space.html The above post introduces RKHS in Korean; it was helpful, as I had struggled to understand some concepts in RKHS. What does the Hilbert space mean in terms of feature expansion (\(f:\mathcal{X} \to \mathbb{R}\), \(f \in \mathcal{H}_K\))? The difference between \(f\) and \(f(x)\) was confusing: \(f\) is a function in the Hilbert space, while \(f(x)\) is its evaluation at a point. I thought of the function as being represented by the inner product of the feature-space basis \(K(\cdot,x)\) with coefficients \(f\), so that the coefficients are vectors in feature space.
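
The \(f\) versus \(f(x)\) distinction can be made concrete with a finite kernel expansion. A minimal sketch, assuming a Gaussian kernel and assumed expansion points and coefficients: the object \(f = \sum_i \alpha_i K(\cdot, x_i)\) is the function, while calling it on a point performs the evaluation \(f(x)\).

```python
# Minimal sketch (assumed Gaussian kernel and coefficients): f is a
# finite expansion f = sum_i alpha_i K(., x_i); f(x) is its evaluation.
import math

def k(x, y, gamma=1.0):
    """Gaussian (RBF) kernel K(x, y) = exp(-gamma * (x - y)^2)."""
    return math.exp(-gamma * (x - y) ** 2)

centers = [0.0, 1.0, 2.0]    # assumed expansion points x_i
alphas = [1.0, -0.5, 0.25]   # assumed coefficients alpha_i

def f(x):
    """Evaluation f(x) = sum_i alpha_i K(x_i, x)."""
    return sum(a * k(c, x) for a, c in zip(alphas, centers))

value_at_zero = f(0.0)  # 1*K(0,0) - 0.5*K(1,0) + 0.25*K(2,0)
```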

Limit of inequality of sequence and epsilon

Here I summarize two tools used in the proof of the Riesz representation theorem: the limit of an inequality of sequences, and \(\epsilon\). Rudin's proof of the Riesz representation theorem constructs a measure \(\mu\) and a collection of measurable sets \(\mathfrak{M}\), then proves that \(\mu\) and \(\mathfrak{M}\) have the required properties. Countable additivity (not just subadditivity) is an important property. The strategy for proving the equality (additivity) is a bidirectional inequality. Taking the limit of an inequality of sequences gives us a tool that turns finite inequalities into infinite ones.
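
The two tools can be stated compactly; this is a sketch of the standard lemmas, not Rudin's exact wording:

```latex
% Limits preserve non-strict inequalities, and the epsilon tool turns
% "within every epsilon" into equality via bidirectional inequality.
\[
  a_n \le b_n \ (\forall n) \;\Longrightarrow\;
  \lim_{n\to\infty} a_n \le \lim_{n\to\infty} b_n
  \quad \text{(when both limits exist)},
\]
\[
  \bigl(\forall \epsilon > 0:\ a \le b + \epsilon\bigr)
  \;\Longrightarrow\; a \le b,
  \qquad
  a \le b \ \text{and} \ b \le a \;\Longrightarrow\; a = b.
\]
```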

Positive Borel measures

This is a note on Real and Complex Analysis, chapter 2. Chapter 2 is about measures; the measure itself was already defined in chapter 1. In chapter 2, every positive linear functional \(\Lambda f\) on the space of compactly supported continuous functions \(C_c(X)\) represents the integration of the function, \(\int f \, d\mu\) (the Riesz representation theorem). Let \(X\) be a locally compact Hausdorff space, and let \(\Lambda\) be a positive linear functional on \(C_c(X)\).
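
The conclusion of the theorem can be sketched as follows (a paraphrase of the standard statement, not Rudin's exact wording):

```latex
% Riesz representation theorem (sketch): there exist a sigma-algebra
% \mathfrak{M} containing the Borel sets of X and a unique positive
% measure \mu on \mathfrak{M} representing the functional:
\[
  \Lambda f = \int_X f \, d\mu
  \qquad \text{for every } f \in C_c(X).
\]
```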

Abstract integration

This is a note for Rudin's Real and Complex Analysis, chapter 1. The key concepts are the \(\sigma\)-algebra, measure (\(\mu\)) zero, and linear combinations; these three concepts lead to abstract integration. The \(\sigma\)-algebra makes countable sums and measures of complements (subtracting measures) possible. Measure zero completes the system. Linear combinations (of simple functions) integrate a measurable function. Once a measure space is established, Lebesgue's monotone convergence theorem, Fatou's lemma, and Lebesgue's dominated convergence theorem follow.
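
The three convergence results can be sketched as follows (abbreviated standard statements, for measurable \(f_n\) on a measure space):

```latex
% Monotone convergence: 0 \le f_1 \le f_2 \le \cdots, f_n \to f pointwise.
\[
  \int_X f_n \, d\mu \;\longrightarrow\; \int_X f \, d\mu.
\]
% Fatou's lemma: for measurable f_n \ge 0,
\[
  \int_X \liminf_{n\to\infty} f_n \, d\mu
  \;\le\; \liminf_{n\to\infty} \int_X f_n \, d\mu.
\]
% Dominated convergence: f_n \to f pointwise, |f_n| \le g, \int g \, d\mu < \infty.
\[
  \int_X f_n \, d\mu \;\longrightarrow\; \int_X f \, d\mu.
\]
```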