Convex optimization

Lagrange dual problem and conjugate function

An optimization problem has two components: the objective function \(f_0 : \mathbb R ^n \rightarrow \mathbb R\) and the constraints. The objective function and the constraints keep each other in check and balance at a saddle point, i.e. the optimal point. The Lagrange dual problem of the primal problem also solves the optimization problem by providing a lower bound on the optimal value. The dual problem can be expressed in terms of the conjugate function \(f^*(y) = \sup_x (y^T x - f(x))\).
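For example, for a problem with linear inequality constraints, minimize \(f(x)\) subject to \(Ax \preceq b\), the dual function can be written directly in terms of the conjugate (this worked step follows the standard derivation in Boyd and Vandenberghe and is not in the excerpt above):

\[\begin{align*} g(\lambda) &= \inf_x \big( f(x) + \lambda^T (Ax - b) \big) \\ &= -b^T\lambda + \inf_x \big( f(x) + (A^T\lambda)^T x \big) \\ &= -b^T\lambda - f^*(-A^T\lambda), \end{align*}\]

so every \(\lambda \succeq 0\) with \(-A^T\lambda \in \mathbf{dom}\, f^*\) gives a lower bound \(g(\lambda) \leq p^*\).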

Approximation

The purpose of approximation is to find the optimal point \(x^*\), i.e. the point where \(\nabla F(x^*) = 0\). We need a step/search direction \(\Delta x\) and a step size \(t\). A Taylor approximation is a polynomial whose argument is the step and whose coefficients are derivatives evaluated at the starting point. The first-order Taylor approximation adds one term to the starting point \((x_0, F(x_0))\). The added term \(\nabla F(x)^T \Delta x\) consists of a coefficient (the gradient \(\nabla F(x)\)) and an argument (the step \(\Delta x\)).
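As a minimal sketch of how the search direction and step size fit together (illustrative code, not from the text; `grad_descent`, its parameters, and the quadratic test function are my own choices), here is gradient descent with a backtracking line search:

```python
import numpy as np

def grad_descent(F, grad_F, x0, alpha=0.3, beta=0.8, tol=1e-8, max_iter=1000):
    """Gradient descent with backtracking line search.

    Search direction: delta_x = -grad_F(x); step size t is shrunk by beta
    until F(x + t*delta_x) <= F(x) + alpha * t * grad_F(x)^T delta_x.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_F(x)
        if np.linalg.norm(g) < tol:          # stop when gradient ~ 0 (optimality)
            break
        delta_x = -g                         # descent direction
        t = 1.0
        while F(x + t * delta_x) > F(x) + alpha * t * (g @ delta_x):
            t *= beta                        # backtracking line search
        x = x + t * delta_x
    return x

# Example: minimize F(x) = x1^2 + 10*x2^2, optimum at the origin.
F = lambda x: x[0]**2 + 10 * x[1]**2
grad_F = lambda x: np.array([2 * x[0], 20 * x[1]])
print(grad_descent(F, grad_F, x0=[5.0, 3.0]))  # approximately [0, 0]
```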

Taylor series

\[ f(x) = \sum_{k=0}^\infty c_k x^k = c_0 + c_1 x + c_2 x^2 + \dotsb. \]

The Taylor approximation is a function of the step \(h\) whose coefficients are the derivatives of \(f\) at the starting point:

\[ f(x \pm h) = f(x) \pm hf'(x) + \frac{h^2}{2}f''(x) \pm \frac{h^3}{6}f'''(x) + O(h^4). \]

Let's consider \(\sin(x)\).

\[ f(x) = \sin(x),\ f(0) = 0; \quad f'(x)=\cos(x),\ f'(0)=1; \quad f''(x)=-\sin(x),\ f''(0)=0; \quad f'''(x)=-\cos(x),\ f'''(0)=-1. \]

Thus,

\[\begin{align*} \sin(x) &= 0 + \frac{1}{1!}x + \frac{0}{2!}x^2 + \frac{-1}{3!}x^3 + \frac{0}{4!}x^4 + \frac{1}{5!}x^5 + \dotsb \\ &= x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dotsb. \end{align*}\]
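A quick numerical check (illustrative code, not from the text) shows how fast the partial sums converge to \(\sin(x)\) near 0:

```python
import math

def sin_taylor(x, n_terms=5):
    """Partial sum of the Taylor series of sin(x) around 0:
    sin(x) = x - x^3/3! + x^5/5! - ..."""
    return sum((-1)**k * x**(2*k + 1) / math.factorial(2*k + 1)
               for k in range(n_terms))

for x in (0.5, 1.0, 2.0):
    print(x, sin_taylor(x), math.sin(x))
# With 5 terms the partial sum already matches math.sin
# to roughly four decimal places for |x| <= 2.
```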

Duality

This is a summary of Boyd's Convex Optimization book. The steepest descent method is a convex optimization algorithm. The normalized steepest descent direction \(x_{nsd}\) is the vector in the unit ball of a given norm that extends farthest in the direction of \(-\nabla f(x)\); that is, the inner product of \(x_{nsd}\) and \(-\nabla f(x)\) is maximized over the unit ball. The first-order Taylor approximation \(f(x+v) \approx f(x) + \nabla f(x)^{T} v\) decreases most when \(v = x_{nsd}\). The unnormalized direction \(x_{sd}\) is obtained by scaling \(x_{nsd}\) by the dual norm of the gradient, \(x_{sd} = \|\nabla f(x)\|_{*}\, x_{nsd}\).
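As an illustrative sketch (not from the text; the quadratic norm \(\|z\|_P = (z^T P z)^{1/2}\) and all names are my own choices), the steepest descent directions for that norm work out to \(x_{sd} = \|\nabla f(x)\|_{*}\, x_{nsd} = -P^{-1}\nabla f(x)\):

```python
import numpy as np

def steepest_descent_direction(grad, P):
    """Steepest descent directions for the quadratic norm ||z||_P = sqrt(z^T P z).

    Returns (x_nsd, x_sd): the unit-ball vector maximizing the inner product
    with -grad, and the unnormalized direction x_sd = ||grad||_* * x_nsd.
    """
    P_inv_g = np.linalg.solve(P, grad)       # P^{-1} grad
    dual_norm = np.sqrt(grad @ P_inv_g)      # ||grad||_* = sqrt(g^T P^{-1} g)
    x_nsd = -P_inv_g / dual_norm             # normalized steepest descent direction
    x_sd = -P_inv_g                          # unnormalized: dual_norm * x_nsd
    return x_nsd, x_sd

# Example with P = diag(1, 4) and gradient (2, 2):
P = np.diag([1.0, 4.0])
g = np.array([2.0, 2.0])
x_nsd, x_sd = steepest_descent_direction(g, P)
print(x_nsd, x_sd)   # x_sd = -P^{-1} g = [-2, -0.5]
```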

Strong convexity and implications

This is a summary of the Boyd convex optimization book. The strong convexity assumption is useful for analyzing iterative minimization algorithms such as gradient descent, steepest descent, and Newton's method. Bounds on the smallest and largest eigenvalues of the Hessian, \(mI \preceq \nabla^{2}f(x) \preceq MI\), together with the gradient norm \(\| \nabla f(x)\|_2\), bound the gap between \(f(x)\) and the optimal value \(p^{*}\). The condition number of the \(\alpha\)-sublevel set satisfies \(\mathbf{cond}(C_\alpha) \leq {M \over m}\).
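Concretely, the lower bound \(mI \preceq \nabla^{2}f(x)\) turns the gradient norm into a suboptimality certificate (standard inequalities from chapter 9 of Boyd, added here for completeness):

\[\begin{align*} f(y) &\geq f(x) + \nabla f(x)^T (y - x) + \frac{m}{2}\|y - x\|_2^2 \quad \text{for all } x, y, \\ p^{*} &\geq f(x) - \frac{1}{2m}\|\nabla f(x)\|_2^2, \qquad \|x - x^{*}\|_2 \leq \frac{2}{m}\|\nabla f(x)\|_2, \end{align*}\]

where the second line follows by minimizing the right-hand side of the first over \(y\).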

Convex set

There is an analogy between a line segment and a convex set, and it is helpful for understanding convex sets. A line, a line segment, and a ray (one-sided line) correspond to an affine set, a convex set, and a cone, respectively. Consider \(\{y \mid y=\theta_1 x_1 + \theta_2 x_2,\ \theta_1 + \theta_2 = 1\}\): if \(\theta_1, \theta_2 \in \mathbb{R}\) this is the line through \(x_1\) and \(x_2\); if \(\theta_1, \theta_2 \geq 0\) it is the line segment between them; and if only \(\theta_1 \geq 0\) is required (so \(\theta_2 = 1 - \theta_1\) may be negative) it is the ray from \(x_2\) through \(x_1\).
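For reference, the analogy extends from two points to combinations of \(k\) points (the standard definitions from Boyd and Vandenberghe, not spelled out in the excerpt above); the cone corresponds to keeping \(\theta_i \geq 0\) but dropping the sum constraint:

\[\begin{align*} \text{affine:}\quad & \theta_1 x_1 + \dotsb + \theta_k x_k, \quad \theta_1 + \dotsb + \theta_k = 1, \\ \text{convex:}\quad & \theta_1 x_1 + \dotsb + \theta_k x_k, \quad \theta_1 + \dotsb + \theta_k = 1,\ \theta_i \geq 0, \\ \text{conic:}\quad & \theta_1 x_1 + \dotsb + \theta_k x_k, \quad \theta_i \geq 0. \end{align*}\]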