# Duality

This is a summary of Boyd's *Convex Optimization*. The steepest descent method is a convex optimization algorithm. The normalized steepest descent direction $$x_{nsd}$$ is the vector in the unit ball of a norm that extends furthest in the direction $$-\nabla f(x)$$; equivalently, it maximizes the inner product of $$x_{nsd}$$ and $$-\nabla f(x)$$. The first-order Taylor approximation $$f(x+v) \approx f(x) + \nabla f(x)^{T} v$$ decreases the most when $$v = x_{nsd}$$.
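As a minimal sketch of this idea, for the Euclidean norm the normalized steepest descent direction has a closed form: the unit vector along $$-\nabla f(x)$$. The function name `nsd_euclidean` is my own label, not from the text.

```python
import numpy as np

def nsd_euclidean(grad):
    """Normalized steepest descent direction for the Euclidean norm:
    the vector v minimizing grad^T v subject to ||v||_2 <= 1."""
    return -grad / np.linalg.norm(grad)

g = np.array([3.0, 4.0])
v = nsd_euclidean(g)
print(v)       # unit vector pointing along -g
print(g @ v)   # -||g||_2 = -5.0, the most negative value over the unit ball
```

For the Euclidean norm the dual norm is again the Euclidean norm, so this is the familiar gradient descent direction.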

The $$x_{nsd}$$ is then unnormalized into $$x_{sd}$$. The normalization is related to the unit ball of the norm. When $$x_{nsd}$$ is scaled by the dual norm of the gradient, $$x_{sd} = \lVert \nabla f(x) \rVert_{*} \, x_{nsd}$$, the second term of the Taylor approximation becomes $$\nabla f(x)^{T} x_{sd} = -\lVert \nabla f(x) \rVert_{*}^{2}$$, the negative square of the dual norm of the gradient. The unnormalized $$x_{sd}$$ thus encodes the amount of decrease of the approximation, because the inner product of the gradient and the unnormalized steepest descent direction is the negative square of the dual norm of the gradient.
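This scaling can be checked numerically for the $$\ell_1$$ norm, whose dual is the $$\ell_\infty$$ norm. In that case the steepest descent direction moves along the single coordinate with the largest gradient magnitude; `nsd_l1` is a hypothetical helper name for this sketch.

```python
import numpy as np

def nsd_l1(grad):
    # Steepest descent direction for the l1 norm: a unit step along the
    # coordinate with the largest |gradient| component.
    i = np.argmax(np.abs(grad))
    v = np.zeros_like(grad)
    v[i] = -np.sign(grad[i])
    return v

g = np.array([3.0, -4.0])
x_nsd = nsd_l1(g)
dual = np.max(np.abs(g))   # dual of the l1 norm is the l-infinity norm: 4.0
x_sd = dual * x_nsd        # unnormalized steepest descent step
print(g @ x_sd)            # -dual**2 = -16.0
```

The inner product equals $$-\lVert g \rVert_{\infty}^{2}$$, matching the general identity $$\nabla f(x)^{T} x_{sd} = -\lVert \nabla f(x) \rVert_{*}^{2}$$.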

The dual norm of the gradient, $$\lVert \nabla f(x) \rVert_{*}$$, is the main subject of this post. The simplest dual is the complement of a set: $$(C^{c})^{c} = C$$, and if $$C$$ is small, $$C^{c}$$ is large, and vice versa. The dual cone is defined through inner products and non-negativity. Let $$K$$ be a cone; its dual cone is the set $$K^{*} = \{\, y \mid x^{T}y \geq 0 \text{ for all } x \in K \,\}$$. If $$K$$ is large, $$K^{*}$$ is small, and vice versa.
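The dual cone definition can be illustrated with a small numerical check against samples of a cone. This is only an approximate membership test by sampling, not an exact one; the nonnegative orthant is used because it is known to be self-dual.

```python
import numpy as np

def in_dual_cone(y, cone_samples):
    # y is in K* if x^T y >= 0 for every x in K; here we can only
    # check the condition against a finite sample of K.
    return all(x @ y >= 0 for x in cone_samples)

rng = np.random.default_rng(0)
# Samples from the nonnegative orthant K = R^2_+, which is self-dual.
K = [rng.uniform(0, 1, size=2) for _ in range(100)]
print(in_dual_cone(np.array([1.0, 2.0]), K))   # True: nonnegative components
print(in_dual_cone(np.array([1.0, -2.0]), K))  # False: a negative component
```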

The dual norm is $$\left\lVert x \right\rVert _{*} = \sup \{\, x^{T}y \mid \lVert y \rVert \leq 1 \,\}$$. If $$x$$ points along the long axis of an ellipsoid norm's unit ball, the norm of $$x$$ is small. But the dual norm is large, because a $$y$$ with $$\lVert y \rVert \leq 1$$ can have a large Euclidean length in that direction, and vice versa.
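This inverse relationship can be verified with a quadratic (ellipsoid) norm, $$\lVert x \rVert_{P} = \sqrt{x^{T}Px}$$ for positive definite $$P$$, whose dual norm is known to be $$\sqrt{x^{T}P^{-1}x}$$. The particular matrix $$P$$ below is my own illustrative choice.

```python
import numpy as np

# Ellipsoid norm ||x||_P = sqrt(x^T P x); its dual is sqrt(x^T P^{-1} x).
P = np.diag([0.01, 1.0])   # unit ball stretches to 10 along axis 0 (long axis)
x = np.array([1.0, 0.0])   # x points along the long axis

norm_P = np.sqrt(x @ P @ x)                 # 0.1  -> small primal norm
dual_P = np.sqrt(x @ np.linalg.inv(P) @ x)  # 10.0 -> large dual norm
print(norm_P, dual_P)
```

Along the long axis the primal norm is small (0.1) while the dual norm is large (10.0), as the text describes.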

The main point is that the first-order Taylor approximation decreases the most along the steepest descent direction of the chosen norm, and the size of that decrease is governed by the dual norm of the gradient: $$\nabla f(x)^{T} x_{sd} = -\lVert \nabla f(x) \rVert_{*}^{2}$$.

##### Jun Kang
###### Clinical Assistant Professor of Hospital Pathology

My research interests include pathology, oncology and statistics.