DUALITY OF INNER AND OUTER LINEARIZATION
We have considered so far cutting plane and simplicial decomposition methods, and we will now aim to connect them via duality. To this end, we define in this section outer and inner linearizations, and we formalize their conjugacy relation and other related properties. An outer linearization of a closed proper convex function $f: \Re^n \mapsto(-\infty, \infty]$ is defined by a finite set of vectors $\{y_1, \ldots, y_{\ell}\}$ such that for every $j=1, \ldots, \ell$, we have $y_j \in \partial f(x_j)$ for some $x_j \in \Re^n$. It is given by $$ F(x)=\max_{j=1, \ldots, \ell}\left\{f(x_j)+(x-x_j)' y_j\right\}, \quad x \in \Re^n, \tag{4.14} $$
and it is illustrated in the left side of Fig. 4.3.1. The choices of $x_j$ such that $y_j \in \partial f(x_j)$ may not be unique, but all result in the same function $F$: the epigraph of $F$ is determined by the supporting hyperplanes to the epigraph of $f$ with normals defined by the $y_j$, and the points of support $x_j$ are immaterial. In particular, the definition (4.14) can be equivalently written in terms of the conjugate $f^{\star}$ of $f$ as $$ F(x)=\max_{j=1, \ldots, \ell}\left\{x' y_j-f^{\star}(y_j)\right\}, \tag{4.15} $$ using the relation $x_j' y_j=f(x_j)+f^{\star}(y_j)$, which is implied by $y_j \in \partial f(x_j)$ (the Conjugate Subgradient Theorem, Prop. 5.4.3 in Appendix B).
Note that $F(x) \leq f(x)$ for all $x$, so, as is true for any outer approximation of $f$, the conjugate $F^{\star}$ satisfies $F^{\star}(y) \geq f^{\star}(y)$ for all $y$. Moreover, it can be shown that $F^{\star}$ is an inner linearization of the conjugate $f^{\star}$, as illustrated in the right side of Fig. 4.3.1. Indeed we have, using Eq. (4.15), $$ \begin{aligned} F^{\star}(y) &= \sup_{x \in \Re^n}\left\{y' x-F(x)\right\} \\ &= \sup_{x \in \Re^n}\Big\{y' x-\max_{j=1, \ldots, \ell}\left\{y_j' x-f^{\star}(y_j)\right\}\Big\} \\ &= \sup_{\substack{x \in \Re^n,\, \xi \in \Re \\ y_j' x-f^{\star}(y_j) \leq \xi,\ j=1, \ldots, \ell}}\left\{y' x-\xi\right\}. \end{aligned} $$
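To make the conjugacy relation concrete, here is a minimal numerical sketch (not from the text) for $f(x)=x^2$, whose subgradients are $f'(x_j)=2x_j$ and whose conjugate is $f^{\star}(y)=y^2/4$; the cut points `x_cuts` are arbitrary choices for illustration. It checks that (4.14) and (4.15) define the same function $F$ and that $F \leq f$:

```python
import numpy as np

# Minimal sketch: outer linearization of f(x) = x^2 (so f'(x) = 2x and
# conjugate f*(y) = y^2/4), with cuts generated at arbitrary points x_j.
f = lambda x: x**2
fstar = lambda y: y**2 / 4.0

x_cuts = np.array([-1.0, 0.5, 2.0])   # points x_j
y_cuts = 2.0 * x_cuts                 # subgradients y_j = f'(x_j)

def F(x):
    # Eq. (4.14): max over the supporting hyperplanes at the x_j
    return np.max(f(x_cuts) + (x - x_cuts) * y_cuts)

def F_conj_form(x):
    # Eq. (4.15): the same function written via the conjugate f*
    return np.max(x * y_cuts - fstar(y_cuts))

for x in np.linspace(-3, 3, 13):
    assert abs(F(x) - F_conj_form(x)) < 1e-12   # (4.14) == (4.15)
    assert F(x) <= f(x) + 1e-12                 # outer approximation
print("outer linearization checks passed")
```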
We will now consider a unified framework for polyhedral approximation, which combines the cutting plane and simplicial decomposition methods. We consider the problem $$ \begin{array}{ll} \text{minimize} & \sum_{i=1}^m f_i(x_i) \\ \text{subject to} & x \in S, \end{array} $$ where $$ x \stackrel{\text{def}}{=}(x_1, \ldots, x_m) $$
is a vector in $\Re^{n_1+\cdots+n_m}$, with components $x_i \in \Re^{n_i}$, $i=1, \ldots, m$, where $f_i: \Re^{n_i} \mapsto(-\infty, \infty]$ is a closed proper convex function for each $i$, and $S$ is a subspace of $\Re^{n_1+\cdots+n_m}$.
We refer to this as an extended monotropic program (EMP for short).$\dagger$ A classical example of EMP is a single commodity network optimization problem, where $x_i$ represents the (scalar) flow of an arc of a directed graph and $S$ is the circulation subspace of the graph (see e.g., [Ber98]). Also, problems involving general linear constraints and an additive extended real-valued convex cost function can be converted to EMP. In particular, the problem $$ \begin{array}{ll} \text{minimize} & \sum_{i=1}^m f_i(x_i) \\ \text{subject to} & A x=b, \end{array} $$ where $A$ is a given matrix and $b$ is a given vector, is equivalent to $$ \begin{array}{ll} \text{minimize} & \sum_{i=1}^m f_i(x_i)+\delta_Z(z) \\ \text{subject to} & A x-z=0, \end{array} $$ where $z$ is a vector of artificial variables, and $\delta_Z$ is the indicator function of the set $Z=\{z \mid z=b\}$. This is an EMP with constraint subspace $$ S=\{(x, z) \mid A x-z=0\}. $$
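As a sanity check of this reformulation, the following sketch (assuming the cvxpy package is available; the data $A$, $b$ and the quadratic choice $f_i(x_i)=(x_i-i)^2$ are invented for illustration) solves the linearly constrained problem both directly and in the EMP form with artificial variables $z$ and the constraint subspace $S=\{(x, z) \mid A x-z=0\}$:

```python
import cvxpy as cp
import numpy as np

# Hypothetical data for min sum_i f_i(x_i) s.t. Ax = b, with f_i(x_i) = (x_i - i)^2.
rng = np.random.default_rng(0)
m_rows, n = 3, 5
A, b = rng.standard_normal((m_rows, n)), rng.standard_normal(m_rows)
target = np.arange(n, dtype=float)

# Original form.
x = cp.Variable(n)
p1 = cp.Problem(cp.Minimize(cp.sum_squares(x - target)), [A @ x == b])
p1.solve()

# EMP form: artificial z with the indicator delta_Z enforced as z == b,
# and the subspace constraint S = {(x, z) | Ax - z = 0}.
x2, z = cp.Variable(n), cp.Variable(m_rows)
p2 = cp.Problem(cp.Minimize(cp.sum_squares(x2 - target)),
                [A @ x2 - z == 0, z == b])
p2.solve()

print(p1.value, p2.value)  # the two optimal values coincide
```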
Modern game theory began with John von Neumann's idea of mixed-strategy equilibria in two-person zero-sum games and his proof of their existence. Von Neumann's original proof used Brouwer's fixed-point theorem on continuous mappings into compact convex sets, which became a standard method in game theory and mathematical economics. Following his paper, in 1944 he co-authored with Oskar Morgenstern the book Theory of Games and Economic Behavior, which considered cooperative games of several players. The second edition of this book provided an axiomatic theory of expected utility, which allowed mathematical statisticians and economists to treat decision-making under uncertainty.
This is a graduate-level course on optimization. The course covers mathematical programming and combinatorial optimization from the perspective of convex optimization, which is a central tool for solving large-scale problems. In recent years convex optimization has had a profound impact on statistical machine learning, data analysis, mathematical finance, signal processing, control, theoretical computer science, and many other areas. The first part will be dedicated to the theory of convex optimization and its direct applications. The second part will focus on advanced techniques in combinatorial optimization using machinery developed in the first part.
4.51 Monotone transformation of objective in vector optimization. Consider the vector optimization problem (4.56). Suppose we form a new vector optimization problem by replacing the objective $f_0$ with $\phi \circ f_0$, where $\phi: \mathbf{R}^q \rightarrow \mathbf{R}^q$ satisfies $$ u \preceq_K v,\ u \neq v \Longrightarrow \phi(u) \preceq_K \phi(v),\ \phi(u) \neq \phi(v). $$ Show that a point $x$ is Pareto optimal (or optimal) for one problem if and only if it is Pareto optimal (optimal) for the other, so the two problems are equivalent. In particular, composing each objective in a multicriterion problem with an increasing function does not affect the Pareto optimal points.
Problem 2.
4.52 Pareto optimal points and the boundary of the set of achievable values. Consider a vector optimization problem with cone $K$. Let $\mathcal{P}$ denote the set of Pareto optimal values, and let $\mathcal{O}$ denote the set of achievable objective values. Show that $\mathcal{P} \subseteq \mathcal{O} \cap \mathbf{b d} \mathcal{O}$, i.e., every Pareto optimal value is an achievable objective value that lies in the boundary of the set of achievable objective values.
Problem 3.
4.53 Suppose the vector optimization problem (4.56) is convex. Show that the set $$ \mathcal{A}=\mathcal{O}+K=\left\{t \in \mathbf{R}^q \mid f_0(x) \preceq_K t \text{ for some feasible } x\right\} $$ is convex. Also show that the minimal elements of $\mathcal{A}$ are the same as the minimal points of $\mathcal{O}$.
Problem 4.
4.54 Scalarization and optimal points. Suppose a (not necessarily convex) vector optimization problem has an optimal point $x^{\star}$. Show that $x^{\star}$ is a solution of the associated scalarized problem for any choice of $\lambda \succ_{K^*} 0$. Also show the converse: if a point $x$ is a solution of the scalarized problem for any choice of $\lambda \succ_{K^*} 0$, then it is an optimal point for the (not necessarily convex) vector optimization problem.
Textbooks
• An Introduction to Stochastic Modeling, Fourth Edition by Pinsky and Karlin (freely available through the university library here)
• Essentials of Stochastic Processes, Third Edition by Durrett (freely available through the university library here)
To reiterate, the textbooks are freely available through the university library. Note that you must be connected to the university Wi-Fi or VPN to access the ebooks from the library links. Furthermore, the library links take some time to populate, so do not be alarmed if the webpage looks bare for a few seconds.
Supremum and infimum
In mathematics, given a subset $S$ of a partially ordered set $T$, the supremum (sup) of $S$, if it exists, is the least element of $T$ that is greater than or equal to each element of $S$. Consequently, the supremum is also referred to as the least upper bound (lub or LUB). If the supremum exists, it may or may not belong to $S$. On the other hand, the infimum (inf) of $S$ is the greatest element in $T$, not necessarily in $S$, that is less than or equal to all elements of $S$. Consequently, the term greatest lower bound (also abbreviated as glb or GLB) is also commonly used. Consider a set $C \subseteq \mathbb{R}$.
A number $a$ is an upper bound (lower bound) on $C$ if for each $x \in C$, $x \leq a$ ($x \geq a$).
A number $b$ is the least upper bound (greatest lower bound) or the supremum (infimum) of $C$ if (i) $b$ is an upper bound (lower bound) on $C$, and (ii) $b \leq a$ ($b \geq a$) for every upper bound (lower bound) $a$ on $C$. Remark 1.8 An infimum is in a precise sense dual to the concept of a supremum and vice versa. For instance, $\sup C=\infty$ if $C$ is unbounded above and $\inf C=-\infty$ if $C$ is unbounded below.
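As a quick illustration of these definitions with $T=\mathbb{R}$ (an example added here, not from the text): $$ C=\left\{1-\tfrac{1}{n} \,\Big|\, n=1,2,\ldots\right\} \subset \mathbb{R}, \qquad \inf C=0 \in C \ (\text{attained at } n=1), \qquad \sup C=1 \notin C. $$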
Derivative and gradient
Since vector limits are computed by taking the limit of each coordinate function, we can write the function $f: \mathbb{R}^{n} \rightarrow \mathbb{R}^{m}$ at a point $\mathbf{x} \in \mathbb{R}^{n}$ as follows: $$ \boldsymbol{f}(\mathbf{x})=\left[\begin{array}{c} f_{1}(\mathbf{x}) \\ f_{2}(\mathbf{x}) \\ \vdots \\ f_{m}(\mathbf{x}) \end{array}\right]=\left(f_{1}(\mathbf{x}), f_{2}(\mathbf{x}), \ldots, f_{m}(\mathbf{x})\right), $$ where each $f_{i}(\mathbf{x})$ is a function from $\mathbb{R}^{n}$ to $\mathbb{R}$. Now, $\frac{\partial \boldsymbol{f}(\mathbf{x})}{\partial x_{j}}$ can be defined as $$ \frac{\partial \boldsymbol{f}(\mathbf{x})}{\partial x_{j}}=\left[\begin{array}{c} \frac{\partial f_{1}(\mathbf{x})}{\partial x_{j}} \\ \frac{\partial f_{2}(\mathbf{x})}{\partial x_{j}} \\ \vdots \\ \frac{\partial f_{m}(\mathbf{x})}{\partial x_{j}} \end{array}\right]=\left(\frac{\partial f_{1}(\mathbf{x})}{\partial x_{j}}, \frac{\partial f_{2}(\mathbf{x})}{\partial x_{j}}, \ldots, \frac{\partial f_{m}(\mathbf{x})}{\partial x_{j}}\right). $$ The above vector is a tangent vector at the point $\mathbf{x}$ of the curve of $\boldsymbol{f}$ obtained by varying only $x_{j}$ (the $j$th coordinate of $\mathbf{x}$) with $x_{i}$ fixed for all $i \neq j$.
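The following sketch (an assumed example $f: \mathbb{R}^2 \rightarrow \mathbb{R}^3$, not from the text) approximates this tangent vector by central differences and compares it with the partial derivatives computed by hand:

```python
import numpy as np

# Assumed example: f(x) = (x1*x2, sin(x1), x2^2); central differences
# approximate the j-th column of the Jacobian, i.e. the vector df/dx_j.
def f(x):
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

def partial(f, x, j, h=1e-6):
    e = np.zeros_like(x)
    e[j] = h
    return (f(x + e) - f(x - e)) / (2 * h)   # vector of m partials

x = np.array([1.0, 2.0])
exact_d0 = np.array([x[1], np.cos(x[0]), 0.0])   # df/dx_0 by hand
print(partial(f, x, 0), exact_d0)                # agree to ~1e-10
```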
The norm ball of a point $\mathbf{x} \in \mathbb{R}^{n}$ is defined as the following set${}^{1}$ $$ B(\mathbf{x}, r)=\left\{\mathbf{y} \in \mathbb{R}^{n} \mid \|\mathbf{y}-\mathbf{x}\| \leq r\right\}, $$ where $r$ is the radius and $\mathbf{x}$ is the center of the norm ball. It is also called the neighborhood of the point $\mathbf{x}$. For the case of $n=2$, $\mathbf{x}=\mathbf{0}_{2}$, and $r=1$, the 2-norm ball is $B(\mathbf{x}, r)=\{\mathbf{y} \mid y_{1}^{2}+y_{2}^{2} \leq 1\}$ (a circular disk of radius equal to 1), the 1-norm ball is $B(\mathbf{x}, r)=\{\mathbf{y} \mid |y_{1}|+|y_{2}| \leq 1\}$ (a 2-dimensional cross-polytope of area equal to 2), and the $\infty$-norm ball is $B(\mathbf{x}, r)=\{\mathbf{y} \mid |y_{1}| \leq 1, |y_{2}| \leq 1\}$ (a square of area equal to 4) (see Figure 1.1). Note that the norm ball is symmetric with respect to (w.r.t.) the origin, convex, closed, bounded, and has nonempty interior. Moreover, the 1-norm ball is a subset of the 2-norm ball, which is a subset of the $\infty$-norm ball, due to the following inequality: $$ \|\mathbf{v}\|_{p} \leq\|\mathbf{v}\|_{q}, \tag{1.17} $$ where $\mathbf{v} \in \mathbb{R}^{n}$, $p$ and $q$ are real and $p>q \geq 1$, and the equality holds when $\mathbf{v}=r\mathbf{e}_{i}$, i.e., all the $p$-norm balls of constant radius $r$ have intersections at $r \mathbf{e}_{i}$, $i=1, \ldots, n$. For instance, in Figure 1.1, $\|\mathbf{x}_{1}\|_{p}=1$ for all $p \geq 1$, and $\|\mathbf{x}_{2}\|_{\infty}=1<\|\mathbf{x}_{2}\|_{2}=\sqrt{2}<\|\mathbf{x}_{2}\|_{1}=2$. The inequality (1.17) is proven as follows.
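A quick numerical check of (1.17) (random test vectors and the particular $(p, q)$ pairs are assumptions for illustration):

```python
import numpy as np

# Check of inequality (1.17): ||v||_p <= ||v||_q for p > q >= 1,
# with equality when v = r*e_i.
rng = np.random.default_rng(1)
v = rng.standard_normal(6)
for p, q in [(2, 1), (np.inf, 2), (4, 3)]:
    assert np.linalg.norm(v, p) <= np.linalg.norm(v, q) + 1e-12

e = np.zeros(6); e[2] = 3.0                      # v = r*e_i with r = 3
print(np.linalg.norm(e, 1), np.linalg.norm(e, 2), np.linalg.norm(e, np.inf))
# all equal 3.0, so the p-norm balls of radius r intersect at r*e_i
```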
Interior point
A point $\mathbf{x}$ in a set $C \subseteq \mathbb{R}^{n}$ is an interior point of the set $C$ if there exists an $\epsilon>0$ for which $B(\mathbf{x}, \epsilon) \subseteq C$ (see Figure 1.3). In other words, a point $\mathbf{x} \in C$ is said to be an interior point of the set $C$ if the set $C$ contains some neighborhood of $\mathbf{x}$, that is, if all points within some neighborhood of $\mathbf{x}$ are also in $C$.
Remark 1.7 The set of all the interior points of $C$ is called the interior of $C$ and is represented as $\operatorname{int} C$, which can also be expressed as $$ \operatorname{int} C=\{\mathbf{x} \in C \mid B(\mathbf{x}, r) \subseteq C, \text{ for some } r>0\}, $$ which will be frequently used, directly or indirectly, in many proofs in the ensuing chapters.
Complement, scaled sets, and sum of sets
The complement of a set $C \subset \mathbb{R}^{n}$ is defined as follows (see Figure 1.3): $$ \mathbb{R}^{n} \backslash C=\left\{\mathbf{x} \in \mathbb{R}^{n} \mid \mathbf{x} \notin C\right\}, $$ where "$\backslash$" denotes the set difference, i.e., $A \backslash B=\{\mathbf{x} \in A \mid \mathbf{x} \notin B\}$. The set $C \subset \mathbb{R}^{n}$ scaled by a real number $\alpha$ is a set defined as $$ \alpha \cdot C \triangleq\{\alpha \mathbf{x} \mid \mathbf{x} \in C\}. $$
In mathematics, a matrix norm is a natural extension of the notion of a vector norm to matrices. Some useful matrix norms needed throughout the book are introduced next. The Frobenius norm of an $m \times n$ matrix $\mathbf{A}$ is defined as $$ \|\mathbf{A}\|_{\mathrm{F}}=\left(\sum_{i=1}^{m} \sum_{j=1}^{n}\left|[\mathbf{A}]_{ij}\right|^{2}\right)^{1 / 2}=\sqrt{\operatorname{Tr}\left(\mathbf{A}^{T} \mathbf{A}\right)}, $$ where $$ \operatorname{Tr}(\mathbf{X})=\sum_{i=1}^{n}[\mathbf{X}]_{ii} $$ denotes the trace of a square matrix $\mathbf{X} \in \mathbb{R}^{n \times n}$. When $n=1$, $\mathbf{A}$ reduces to a column vector of dimension $m$ and its Frobenius norm reduces to the 2-norm of the vector.
The other class of norm is known as the induced norm or operator norm. Suppose that $\|\cdot\|_{a}$ and $\|\cdot\|_{b}$ are norms on $\mathbb{R}^{m}$ and $\mathbb{R}^{n}$, respectively. Then the operator/induced norm of $\mathbf{A} \in \mathbb{R}^{m \times n}$, induced by the norms $\|\cdot\|_{a}$ and $\|\cdot\|_{b}$, is defined as $$ \|\mathbf{A}\|_{a, b}=\sup \left\{\|\mathbf{A} \mathbf{u}\|_{a} \mid\|\mathbf{u}\|_{b} \leq 1\right\}, $$ where $\sup (C)$ denotes the least upper bound of the set $C$. When $a=b$, we simply denote $\|\mathbf{A}\|_{a, b}$ by $\|\mathbf{A}\|_{a}$. Commonly used induced norms of an $m \times n$ matrix $$ \mathbf{A}=\left\{a_{ij}\right\}_{m \times n}=\left[\mathbf{a}_{1}, \ldots, \mathbf{a}_{n}\right] $$ are as follows: $$ \begin{aligned} \|\mathbf{A}\|_{1} &=\max_{\|\mathbf{u}\|_{1} \leq 1}\Big\|\sum_{j=1}^{n} u_{j} \mathbf{a}_{j}\Big\|_{1} \quad(a=b=1) \\ & \leq \max_{\|\mathbf{u}\|_{1} \leq 1} \sum_{j=1}^{n}\left|u_{j}\right| \cdot\left\|\mathbf{a}_{j}\right\|_{1} \quad \text{(by triangle inequality)} \\ &=\max_{1 \leq j \leq n}\left\|\mathbf{a}_{j}\right\|_{1}=\max_{1 \leq j \leq n} \sum_{i=1}^{m}\left|a_{ij}\right|. \end{aligned} $$
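A short check (random $\mathbf{A}$ assumed for illustration) that this maximum column 1-norm agrees with the induced 1-norm, and that the bound in the derivation is attained at $\mathbf{u}=\mathbf{e}_j$ for the maximizing column $j$:

```python
import numpy as np

# Check that the induced 1-norm equals the maximum column 1-norm.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))

max_col_sum = max(np.sum(np.abs(A[:, j])) for j in range(A.shape[1]))
print(max_col_sum, np.linalg.norm(A, 1))   # identical values

# The bound is attained at u = e_j for the maximizing column j,
# so the "<=" in the derivation above is actually an equality.
j = int(np.argmax(np.abs(A).sum(axis=0)))
u = np.zeros(3); u[j] = 1.0
print(np.linalg.norm(A @ u, 1))            # equals max_col_sum
```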
Inner product
The inner product of two real vectors $\mathbf{x} \in \mathbb{R}^{n}$ and $\mathbf{y} \in \mathbb{R}^{n}$ is a real scalar and is defined as $$ \langle\mathbf{x}, \mathbf{y}\rangle=\mathbf{y}^{T} \mathbf{x}=\sum_{i=1}^{n} x_{i} y_{i}. $$ If $\mathbf{x}$ and $\mathbf{y}$ are complex vectors, then the transpose in the above equation is replaced by the Hermitian transpose. Note that the square root of the inner product of a vector $\mathbf{x}$ with itself gives the Euclidean norm of that vector.
Cauchy–Schwarz inequality: For any two vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^{n}$, the Cauchy–Schwarz inequality $$ |\langle\mathbf{x}, \mathbf{y}\rangle| \leq\|\mathbf{x}\|_{2} \cdot\|\mathbf{y}\|_{2} $$ holds. Furthermore, the equality holds if and only if $\mathbf{x}=\alpha \mathbf{y}$ for some $\alpha \in \mathbb{R}$. Pythagorean theorem: If two vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^{n}$ are orthogonal, i.e., $\langle\mathbf{x}, \mathbf{y}\rangle=0$, then $$ \|\mathbf{x}+\mathbf{y}\|_{2}^{2}=(\mathbf{x}+\mathbf{y})^{T}(\mathbf{x}+\mathbf{y})=\|\mathbf{x}\|_{2}^{2}+2\langle\mathbf{x}, \mathbf{y}\rangle+\|\mathbf{y}\|_{2}^{2}=\|\mathbf{x}\|_{2}^{2}+\|\mathbf{y}\|_{2}^{2}. $$
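Both facts are easy to check numerically; the following sketch uses random vectors (an assumption for illustration) and a Gram–Schmidt step to manufacture an orthogonal pair:

```python
import numpy as np

rng = np.random.default_rng(3)
x, y = rng.standard_normal(5), rng.standard_normal(5)

# Cauchy-Schwarz: |<x,y>| <= ||x||_2 * ||y||_2
assert abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y) + 1e-12
# Equality when x = alpha*y:
print(abs((2.5 * y) @ y), np.linalg.norm(2.5 * y) * np.linalg.norm(y))

# Pythagorean theorem: make x orthogonal to y by Gram-Schmidt, then
# ||x+y||^2 = ||x||^2 + ||y||^2.
x_perp = x - (x @ y) / (y @ y) * y
print(np.linalg.norm(x_perp + y) ** 2,
      np.linalg.norm(x_perp) ** 2 + np.linalg.norm(y) ** 2)
```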
Definition and examples
A function $f: \mathbb{R}^{n} \rightarrow \mathbb{R}$ is said to be quasiconvex if its domain and all its $\alpha$-sublevel sets, defined as $$ S_{\alpha}=\{\mathbf{x} \mid \mathbf{x} \in \operatorname{dom} f, f(\mathbf{x}) \leq \alpha\} $$ (see Figure 3.8), are convex for every $\alpha$. Moreover,
$f$ is quasiconvex if $f$ is convex since every sublevel set of convex functions is a convex set (cf. Remark 3.5), but the converse is not necessarily true.
$f$ is quasiconcave if $-f$ is quasiconvex. It is also true that $f$ is quasiconcave if its domain and all the $\alpha$-superlevel sets, defined as $\{\mathbf{x} \in \operatorname{dom} f \mid f(\mathbf{x}) \geq \alpha\}$, are convex for every $\alpha$.
A function $f: \mathbb{R}^{n} \rightarrow \mathbb{R}$ is quasiconvex if and only if $$ f(\theta \mathbf{x}+(1-\theta) \mathbf{y}) \leq \max \{f(\mathbf{x}), f(\mathbf{y})\} \tag{3.106} $$ for all $\mathbf{x}, \mathbf{y} \in \operatorname{dom} f$ and $0 \leq \theta \leq 1$ (see Figure 3.8). Proof: Let us prove the necessity followed by the sufficiency.
Necessity: Let $\mathbf{x}, \mathbf{y} \in \operatorname{dom} f$. Choose $\alpha=\max \{f(\mathbf{x}), f(\mathbf{y})\}$. Then $\mathbf{x}, \mathbf{y} \in S_{\alpha}$. Since $f$ is quasiconvex by assumption, $S_{\alpha}$ is convex, that is, for $\theta \in[0,1]$, $$ \begin{aligned} &\theta \mathbf{x}+(1-\theta) \mathbf{y} \in S_{\alpha} \\ &\Rightarrow f(\theta \mathbf{x}+(1-\theta) \mathbf{y}) \leq \alpha=\max \{f(\mathbf{x}), f(\mathbf{y})\}. \end{aligned} $$
Sufficiency: For every $\alpha$, pick two points $\mathbf{x}, \mathbf{y} \in S_{\alpha} \Rightarrow f(\mathbf{x}) \leq \alpha, f(\mathbf{y}) \leq \alpha$. Since for $0 \leq \theta \leq 1$, $f(\theta \mathbf{x}+(1-\theta) \mathbf{y}) \leq \max \{f(\mathbf{x}), f(\mathbf{y})\} \leq \alpha$ (by (3.106)), we have $$ \theta \mathbf{x}+(1-\theta) \mathbf{y} \in S_{\alpha}. $$ Therefore, $S_{\alpha}$ is convex and thus the function $f$ is quasiconvex. Remark 3.33 $f$ is quasiconcave if and only if $$ f(\theta \mathbf{x}+(1-\theta) \mathbf{y}) \geq \min \{f(\mathbf{x}), f(\mathbf{y})\} \tag{3.107} $$ for all $\mathbf{x}, \mathbf{y} \in \operatorname{dom} f$ and $0 \leq \theta \leq 1$. This is also the modified Jensen's inequality for quasiconcave functions. If the inequality (3.106) holds strictly for $0<\theta<1$, then $f$ is strictly quasiconvex. Similarly, if the inequality (3.107) holds strictly for $0<\theta<1$, then $f$ is strictly quasiconcave. Remark 3.34 Since the rank of a PSD matrix is quasiconcave, $$ \operatorname{rank}(\mathbf{X}+\mathbf{Y}) \geq \min \{\operatorname{rank}(\mathbf{X}), \operatorname{rank}(\mathbf{Y})\}, \quad \mathbf{X}, \mathbf{Y} \in \mathbb{S}_{+}^{n} \tag{3.108} $$ holds true. This can be proved by (3.107), by which we get $$ \operatorname{rank}(\theta \mathbf{X}+(1-\theta) \mathbf{Y}) \geq \min \{\operatorname{rank}(\mathbf{X}), \operatorname{rank}(\mathbf{Y})\} $$ for all $\mathbf{X} \in \mathbb{S}_{+}^{n}$, $\mathbf{Y} \in \mathbb{S}_{+}^{n}$, and $0 \leq \theta \leq 1$. Then replacing $\mathbf{X}$ by $\mathbf{X} / \theta$ and $\mathbf{Y}$ by $\mathbf{Y} /(1-\theta)$, where $\theta \neq 0$ and $\theta \neq 1$, gives rise to (3.108).
Remark 3.35 $\operatorname{card}(\mathbf{x}+\mathbf{y}) \geq \min \{\operatorname{card}(\mathbf{x}), \operatorname{card}(\mathbf{y})\}$, $\mathbf{x}, \mathbf{y} \in \mathbb{R}_{+}^{n}$. Similar to the proof of (3.108) in Remark 3.34, this inequality can be shown to be true by using (3.107) again, since $\operatorname{card}(\mathbf{x})$ is quasiconcave.
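The following sketch (assumed examples, not from the text) checks (3.106) for the quasiconvex but nonconvex function $f(x)=\sqrt{|x|}$, whose sublevel sets are the intervals $[-\alpha^2, \alpha^2]$, and checks the rank inequality (3.108) on random low-rank PSD matrices:

```python
import numpy as np

rng = np.random.default_rng(4)

# f(x) = sqrt(|x|) is quasiconvex on R (its sublevel sets are intervals)
# but not convex; check the "max" inequality (3.106) on random pairs.
f = lambda x: np.sqrt(np.abs(x))
for _ in range(1000):
    x, y, th = rng.uniform(-5, 5), rng.uniform(-5, 5), rng.uniform()
    assert f(th * x + (1 - th) * y) <= max(f(x), f(y)) + 1e-12

# Rank inequality (3.108) for random PSD X, Y of low rank.
def rand_psd(n, r):
    B = rng.standard_normal((n, r))
    return B @ B.T                      # PSD with rank r (a.s.)

X, Y = rand_psd(6, 2), rand_psd(6, 3)
print(np.linalg.matrix_rank(X + Y),
      min(np.linalg.matrix_rank(X), np.linalg.matrix_rank(Y)))  # 5 >= 2
```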
Suppose that $f$ is differentiable. Then $f$ is quasiconvex if and only if $\operatorname{dom} f$ is convex and for all $\mathbf{x}, \mathbf{y} \in \operatorname{dom} f$, $$ f(\mathbf{y}) \leq f(\mathbf{x}) \Rightarrow \nabla f(\mathbf{x})^{T}(\mathbf{y}-\mathbf{x}) \leq 0, \tag{3.110} $$ that is, $\nabla f(\mathbf{x})$ defines a supporting hyperplane to the sublevel set $$ S_{\alpha=f(\mathbf{x})}=\{\mathbf{y} \mid f(\mathbf{y}) \leq \alpha=f(\mathbf{x})\} $$ at the point $\mathbf{x}$ (see Figure 3.11). Moreover, the first-order condition given by (3.110) means that the first-order term in the Taylor series of $f(\mathbf{y})$ at the point $\mathbf{x}$ is no greater than zero whenever $f(\mathbf{y}) \leq f(\mathbf{x})$. Proof: Let us prove the necessity followed by the sufficiency.
Necessity: Suppose $f(\mathbf{x}) \geq f(\mathbf{y})$. Then, by the modified Jensen's inequality, we have $$ f(t \mathbf{y}+(1-t) \mathbf{x}) \leq f(\mathbf{x}) \text{ for all } 0 \leq t \leq 1. $$ Therefore, $$ \begin{aligned} \lim_{t \rightarrow 0^{+}} \frac{f(\mathbf{x}+t(\mathbf{y}-\mathbf{x}))-f(\mathbf{x})}{t} &=\lim_{t \rightarrow 0^{+}} \frac{1}{t}\left(f(\mathbf{x})+t \nabla f(\mathbf{x})^{T}(\mathbf{y}-\mathbf{x})-f(\mathbf{x})\right) \\ &=\nabla f(\mathbf{x})^{T}(\mathbf{y}-\mathbf{x}) \leq 0, \end{aligned} $$ where we have used the first-order Taylor series approximation in the first equality.
Sufficiency: Suppose that $f(\mathbf{x})$ is not quasiconvex. Then there exists a nonconvex sublevel set of $f$, $$ S_{\alpha}=\{\mathbf{x} \mid f(\mathbf{x}) \leq \alpha\}, $$ and two distinct points $\mathbf{x}_{1}, \mathbf{x}_{2} \in S_{\alpha}$ such that $\theta \mathbf{x}_{1}+(1-\theta) \mathbf{x}_{2} \notin S_{\alpha}$ for some $0<\theta<1$, i.e., $$ f\left(\theta \mathbf{x}_{1}+(1-\theta) \mathbf{x}_{2}\right)>\alpha \text{ for some } 0<\theta<1. \tag{3.112} $$ Since $f$ is differentiable, hence continuous, (3.112) implies that, as illustrated in Figure 3.12, there exist distinct $\theta_{1}, \theta_{2} \in(0,1)$ such that $$ \begin{aligned} &f\left(\theta \mathbf{x}_{1}+(1-\theta) \mathbf{x}_{2}\right)>\alpha \text{ for all } \theta_{1}<\theta<\theta_{2}, \\ &f\left(\theta_{1} \mathbf{x}_{1}+\left(1-\theta_{1}\right) \mathbf{x}_{2}\right)=f\left(\theta_{2} \mathbf{x}_{1}+\left(1-\theta_{2}\right) \mathbf{x}_{2}\right)=\alpha. \end{aligned} $$ Let $\mathbf{x}=\theta_{1} \mathbf{x}_{1}+\left(1-\theta_{1}\right) \mathbf{x}_{2}$ and $\mathbf{y}=\theta_{2} \mathbf{x}_{1}+\left(1-\theta_{2}\right) \mathbf{x}_{2}$, and so $$ f(\mathbf{x})=f(\mathbf{y})=\alpha, \tag{3.113} $$ and $$ g(t)=f(t \mathbf{y}+(1-t) \mathbf{x})>\alpha \text{ for all } 0<t<1 \tag{3.114} $$
is a differentiable function of $t$ with $\partial g(t) / \partial t>0$ for $t \in[0, \varepsilon)$, where $0<\varepsilon \ll 1$, as illustrated in Figure 3.12. Then it can be inferred that $$ \begin{aligned} (1-t) \frac{\partial g(t)}{\partial t} &=\nabla f(\mathbf{x}+t(\mathbf{y}-\mathbf{x}))^{T}[(1-t)(\mathbf{y}-\mathbf{x})]>0 \text{ for all } t \in[0, \varepsilon) \\ &\Rightarrow \nabla f(\boldsymbol{x})^{T}(\mathbf{y}-\boldsymbol{x})>0 \text{ for all } t \in[0, \varepsilon), \end{aligned} $$ where $$ \boldsymbol{x}=\mathbf{x}+t(\mathbf{y}-\mathbf{x}), \quad t \in[0, \varepsilon) \ \Rightarrow\ g(t)=f(\boldsymbol{x}) \geq f(\mathbf{y})=\alpha \quad (\text{by } (3.113) \text{ and } (3.114)). $$ Therefore, if $f$ is not quasiconvex, there exist $\boldsymbol{x}, \mathbf{y}$ such that $f(\mathbf{y}) \leq f(\boldsymbol{x})$ and $\nabla f(\boldsymbol{x})^{T}(\mathbf{y}-\boldsymbol{x})>0$, which contradicts the implication (3.110). Thus we have completed the proof of sufficiency.
Nonnegative weighted sum
Let $f_{1}, \ldots, f_{m}$ be convex functions and $w_{1}, \ldots, w_{m} \geq 0$. Then $\sum_{i=1}^{m} w_{i} f_{i}$ is convex.
Proof: $\operatorname{dom}\left(\sum_{i=1}^{m} w_{i} f_{i}\right)=\bigcap_{i=1}^{m} \operatorname{dom} f_{i}$ is convex because $\operatorname{dom} f_{i}$ is convex for all $i$. For $0 \leq \theta \leq 1$ and $\mathbf{x}, \mathbf{y} \in \operatorname{dom}\left(\sum_{i=1}^{m} w_{i} f_{i}\right)$, we have $$ \begin{aligned} \sum_{i=1}^{m} w_{i} f_{i}(\theta \mathbf{x}+(1-\theta) \mathbf{y}) & \leq \sum_{i=1}^{m} w_{i}\left(\theta f_{i}(\mathbf{x})+(1-\theta) f_{i}(\mathbf{y})\right) \\ &=\theta \sum_{i=1}^{m} w_{i} f_{i}(\mathbf{x})+(1-\theta) \sum_{i=1}^{m} w_{i} f_{i}(\mathbf{y}). \end{aligned} $$ Hence proved. Remark 3.27 Suppose $f(\mathbf{x}, \mathbf{y})$ is convex in $\mathbf{x}$ for each $\mathbf{y} \in \mathcal{A}$, and $w(\mathbf{y}) \geq 0$. Then $$ g(\mathbf{x})=\int_{\mathcal{A}} w(\mathbf{y}) f(\mathbf{x}, \mathbf{y}) d \mathbf{y} $$ is convex on $\bigcap_{\mathbf{y} \in \mathcal{A}} \operatorname{dom} f(\cdot, \mathbf{y})$.
Composition with affine mapping
If $f: \mathbb{R}^{n} \rightarrow \mathbb{R}$ is a convex function, then for $\mathbf{A} \in \mathbb{R}^{n \times m}$ and $\mathbf{b} \in \mathbb{R}^{n}$, the function $g: \mathbb{R}^{m} \rightarrow \mathbb{R}$, defined as $$ g(\mathbf{x})=f(\mathbf{A} \mathbf{x}+\mathbf{b}), $$ is also convex and its domain can be expressed as $$ \begin{aligned} \operatorname{dom} g &=\left\{\mathbf{x} \in \mathbb{R}^{m} \mid \mathbf{A} \mathbf{x}+\mathbf{b} \in \operatorname{dom} f\right\} \\ &=\left\{\mathbf{A}^{\dagger}(\mathbf{y}-\mathbf{b}) \mid \mathbf{y} \in \operatorname{dom} f\right\}+\mathcal{N}(\mathbf{A}) \quad(\text{cf. } (2.62)) \end{aligned} \tag{3.72} $$ which is also a convex set by Remark 2.9. Proof (using epigraph): Since $g(\mathbf{x})=f(\mathbf{A x}+\mathbf{b})$ and $\operatorname{epi} f=\{(\mathbf{y}, t) \mid f(\mathbf{y}) \leq t\}$, we have $$ \begin{aligned} \operatorname{epi} g &=\left\{(\mathbf{x}, t) \in \mathbb{R}^{m+1} \mid f(\mathbf{A} \mathbf{x}+\mathbf{b}) \leq t\right\} \\ &=\left\{(\mathbf{x}, t) \in \mathbb{R}^{m+1} \mid(\mathbf{A} \mathbf{x}+\mathbf{b}, t) \in \operatorname{epi} f\right\}. \end{aligned} $$
Now, define $$ \mathcal{S}=\left\{(\mathbf{x}, \mathbf{y}, t) \in \mathbb{R}^{m+n+1} \mid \mathbf{y}=\mathbf{A} \mathbf{x}+\mathbf{b}, f(\mathbf{y}) \leq t\right\} $$ so that $$ \operatorname{epi} g=\left\{\left[\begin{array}{ccc} \mathbf{I}_{m} & \mathbf{0}_{m \times n} & \mathbf{0}_{m} \\ \mathbf{0}_{m}^{T} & \mathbf{0}_{n}^{T} & 1 \end{array}\right](\mathbf{x}, \mathbf{y}, t) \;\Big|\; (\mathbf{x}, \mathbf{y}, t) \in \mathcal{S}\right\}, $$ which is nothing but the image of $\mathcal{S}$ via an affine mapping. It can be easily shown, by the definition of convex sets, that $\mathcal{S}$ is convex if $f$ is convex. Therefore $\operatorname{epi} g$ is convex (being the image of the convex set $\mathcal{S}$ under an affine mapping), implying that $g$ is convex (by Fact 3.2). Alternative proof: For $0 \leq \theta \leq 1$, we have $$ \begin{aligned} g\left(\theta \mathbf{x}_{1}+(1-\theta) \mathbf{x}_{2}\right) &=f\left(\mathbf{A}\left(\theta \mathbf{x}_{1}+(1-\theta) \mathbf{x}_{2}\right)+\mathbf{b}\right) \\ &=f\left(\theta\left(\mathbf{A} \mathbf{x}_{1}+\mathbf{b}\right)+(1-\theta)\left(\mathbf{A} \mathbf{x}_{2}+\mathbf{b}\right)\right) \\ & \leq \theta f\left(\mathbf{A} \mathbf{x}_{1}+\mathbf{b}\right)+(1-\theta) f\left(\mathbf{A} \mathbf{x}_{2}+\mathbf{b}\right) \\ &=\theta g\left(\mathbf{x}_{1}\right)+(1-\theta) g\left(\mathbf{x}_{2}\right). \end{aligned} $$ Moreover, $\operatorname{dom} g$ (cf. (3.72)) is also a convex set, and so we conclude that $f(\mathbf{A} \mathbf{x}+\mathbf{b})$ is a convex function.
Suppose that $h: \operatorname{dom} h \rightarrow \mathbb{R}$ is a convex (concave) function and $\operatorname{dom} h \subset \mathbb{R}^{n}$. The extended-value extension of $h$, denoted as $\tilde{h}$, with $\operatorname{dom} \tilde{h}=\mathbb{R}^{n}$, aids in simple representation, as its domain is the entire $\mathbb{R}^{n}$ and so need not be explicitly mentioned. The extended-value extension $\tilde{h}$ is a function taking the same value as $h(\mathbf{x})$ for $\mathbf{x} \in \operatorname{dom} h$, and otherwise taking the value $+\infty$ ($-\infty$). Specifically, if $h$ is convex, $$ \tilde{h}(\mathbf{x})=\begin{cases} h(\mathbf{x}), & \mathbf{x} \in \operatorname{dom} h \\ +\infty, & \mathbf{x} \notin \operatorname{dom} h, \end{cases} $$ and if $h$ is concave, $$ \tilde{h}(\mathbf{x})=\begin{cases} h(\mathbf{x}), & \mathbf{x} \in \operatorname{dom} h \\ -\infty, & \mathbf{x} \notin \operatorname{dom} h. \end{cases} $$ Then the extended-value extension $\tilde{h}$ does not affect the convexity (or concavity) of the original function $h$, and $\text{Eff-dom}\,\tilde{h}=\text{Eff-dom}\,h$.
Some examples for illustrating properties of an extended-value extension of a function are as follows.
$h(x)=\log x$, $\operatorname{dom} h=\mathbb{R}_{++}$. Then $h(x)$ is concave and $\tilde{h}(x)$ is concave and nondecreasing.
$h(x)=x^{1 / 2}$, dom $h=\mathbb{R}_{+}$. Then $h(x)$ is concave and $\tilde{h}(x)$ is concave and nondecreasing.
For the function $$ h(x)=x^{2}, \quad x \geq 0, $$ i.e., $\operatorname{dom} h=\mathbb{R}_{+}$, $h(x)$ is convex and $\tilde{h}(x)$ is convex but neither nondecreasing nor nonincreasing.
Let $f(\mathbf{x})=h(g(\mathbf{x}))$, where $h: \mathbb{R} \rightarrow \mathbb{R}$ and $g: \mathbb{R}^{n} \rightarrow \mathbb{R}$. Then we have the following four composition rules about the convexity or concavity of $f$.
(a) $f$ is convex if $h$ is convex, $\tilde{h}$ nondecreasing, and $g$ convex. (3.76a)
(b) $f$ is convex if $h$ is convex, $\tilde{h}$ nonincreasing, and $g$ concave. (3.76b)
(c) $f$ is concave if $h$ is concave, $\tilde{h}$ nondecreasing, and $g$ concave. (3.76c)
(d) $f$ is concave if $h$ is concave, $\tilde{h}$ nonincreasing, and $g$ convex. (3.76d)
Consider the case that $g$ and $h$ are twice differentiable and $\tilde{h}(x)=h(x)$. Then $$ \nabla f(\mathbf{x})=h^{\prime}(g(\mathbf{x})) \nabla g(\mathbf{x}) $$ and $$ \begin{aligned} \nabla^{2} f(\mathbf{x}) &=D(\nabla f(\mathbf{x}))=D\left(h^{\prime}(g(\mathbf{x})) \cdot \nabla g(\mathbf{x})\right) \quad(\text{by } (1.46)) \\ &=\nabla g(\mathbf{x}) D\left(h^{\prime}(g(\mathbf{x}))\right)+h^{\prime}(g(\mathbf{x})) \cdot D(\nabla g(\mathbf{x})) \\ &=h^{\prime \prime}(g(\mathbf{x})) \nabla g(\mathbf{x}) \nabla g(\mathbf{x})^{T}+h^{\prime}(g(\mathbf{x})) \nabla^{2} g(\mathbf{x}). \end{aligned} $$ The composition rules (a) (cf. (3.76a)) and (b) (cf. (3.76b)) can be proven for convexity of $f$ by checking whether $\nabla^{2} f(\mathbf{x}) \succeq \mathbf{0}$, and the composition rules (c) (cf. (3.76c)) and (d) (cf. (3.76d)) for concavity of $f$ by checking whether $\nabla^{2} f(\mathbf{x}) \preceq \mathbf{0}$. Let us conclude this subsection with a simple example. Example 3.3 Let $g(\mathbf{x})=\|\mathbf{x}\|_{2}$ (convex) and $$ h(x)= \begin{cases}x^{2}, & x \geq 0 \\ 0, & \text{otherwise},\end{cases} $$ which is convex. So $\tilde{h}(x)=h(x)$ is nondecreasing. Then $f(\mathbf{x})=h(g(\mathbf{x}))=\|\mathbf{x}\|_{2}^{2}=\mathbf{x}^{T} \mathbf{x}$ is convex by (3.76a); alternatively, by the second-order condition $\nabla^{2} f(\mathbf{x})=2 \mathbf{I}_{n} \succ \mathbf{0}$, $f$ is indeed convex.
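As a numerical companion to Example 3.3 (the test point is an assumption, not from the text): away from the origin $g(\mathbf{x})=\|\mathbf{x}\|_2$ is smooth with $\nabla g=\mathbf{x}/\|\mathbf{x}\|_2$ and $\nabla^2 g=(\mathbf{I}-\mathbf{x}\mathbf{x}^T/\|\mathbf{x}\|_2^2)/\|\mathbf{x}\|_2$, and plugging these into the composition Hessian formula above should return $\nabla^2 f=2\mathbf{I}$:

```python
import numpy as np

# Check the composition Hessian formula
#   grad^2 f = h''(g) grad g grad g^T + h'(g) grad^2 g
# on f(x) = h(g(x)) = ||x||_2^2 with g(x) = ||x||_2, h(t) = t^2,
# at an assumed test point x != 0 (where g is smooth).
rng = np.random.default_rng(5)
x = rng.standard_normal(4)
nx = np.linalg.norm(x)
I = np.eye(4)

grad_g = x / nx
hess_g = (I - np.outer(x, x) / nx**2) / nx
h1, h2 = 2.0 * nx, 2.0                      # h'(g(x)), h''(g(x))

hess_f = h2 * np.outer(grad_g, grad_g) + h1 * hess_g
print(np.allclose(hess_f, 2 * I))           # True: grad^2 f = 2 I > 0
```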
Pointwise minimum and infimum
If $f(\mathbf{x}, \mathbf{y})$ is convex in $(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^{m} \times \mathbb{R}^{n}$ and $C \subset \mathbb{R}^{n}$ is convex and nonempty, then $$ g(\mathbf{x})=\inf_{\mathbf{y} \in C} f(\mathbf{x}, \mathbf{y}) \tag{3.87} $$ is convex, provided that $g(\mathbf{x})>-\infty$ for some $\mathbf{x}$. Similarly, if $f(\mathbf{x}, \mathbf{y})$ is concave in $(\mathbf{x}, \mathbf{y}) \in \mathbb{R}^{m} \times \mathbb{R}^{n}$, then $$ \tilde{g}(\mathbf{x})=\sup_{\mathbf{y} \in C} f(\mathbf{x}, \mathbf{y}) $$ is concave, provided that $C \subset \mathbb{R}^{n}$ is convex and nonempty and $\tilde{g}(\mathbf{x})<\infty$ for some $\mathbf{x}$. Next, we present the proof for the former.
Proof of (3.87): Since $f$ is continuous over $\operatorname{int}(\operatorname{dom} f)$ (cf. Remark 3.7), for any $\epsilon>0$ and $\mathbf{x}_{1}, \mathbf{x}_{2} \in \operatorname{dom} g$, there exist $\mathbf{y}_{1}, \mathbf{y}_{2} \in C$ (depending on $\epsilon$) such that $$ f\left(\mathbf{x}_{i}, \mathbf{y}_{i}\right) \leq g\left(\mathbf{x}_{i}\right)+\epsilon, \quad i=1,2. \tag{3.89} $$ Let $\left(\mathbf{x}_{1}, t_{1}\right),\left(\mathbf{x}_{2}, t_{2}\right) \in \operatorname{epi} g$. Then $g\left(\mathbf{x}_{i}\right)=\inf_{\mathbf{y} \in C} f\left(\mathbf{x}_{i}, \mathbf{y}\right) \leq t_{i}$, $i=1,2$. Then for any $\theta \in[0,1]$, we have $$ \begin{aligned} g\left(\theta \mathbf{x}_{1}+(1-\theta) \mathbf{x}_{2}\right) &=\inf_{\mathbf{y} \in C} f\left(\theta \mathbf{x}_{1}+(1-\theta) \mathbf{x}_{2}, \mathbf{y}\right) \\ & \leq f\left(\theta \mathbf{x}_{1}+(1-\theta) \mathbf{x}_{2}, \theta \mathbf{y}_{1}+(1-\theta) \mathbf{y}_{2}\right) \\ & \leq \theta f\left(\mathbf{x}_{1}, \mathbf{y}_{1}\right)+(1-\theta) f\left(\mathbf{x}_{2}, \mathbf{y}_{2}\right) \quad (\text{since } f \text{ is convex}) \\ & \leq \theta g\left(\mathbf{x}_{1}\right)+(1-\theta) g\left(\mathbf{x}_{2}\right)+\epsilon \quad (\text{by } (3.89)) \\ & \leq \theta t_{1}+(1-\theta) t_{2}+\epsilon. \end{aligned} $$ It can be seen that as $\epsilon \rightarrow 0$, $g\left(\theta \mathbf{x}_{1}+(1-\theta) \mathbf{x}_{2}\right) \leq \theta t_{1}+(1-\theta) t_{2}$, implying $\left(\theta \mathbf{x}_{1}+(1-\theta) \mathbf{x}_{2}, \theta t_{1}+(1-\theta) t_{2}\right) \in \operatorname{epi} g$. Hence $\operatorname{epi} g$ is a convex set, and thus $g(\mathbf{x})$ is a convex function by Fact 3.2.
Alternative proof of (3.87): Because $\operatorname{dom} g=\{\mathbf{x} \mid(\mathbf{x}, \mathbf{y}) \in \operatorname{dom} f, \mathbf{y} \in C\}$ is the projection of the convex set $\{(\mathbf{x}, \mathbf{y}) \mid(\mathbf{x}, \mathbf{y}) \in \operatorname{dom} f, \mathbf{y} \in C\}$ onto the $\mathbf{x}$-coordinates, it must be a convex set (cf. Remark 2.11).
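A small numerical sketch of (3.87) (the bivariate quadratic $f$ and the grid stand-in for $C=\mathbb{R}$ are assumptions for illustration), checking midpoint convexity of the partial minimum $g$:

```python
import numpy as np

# Assumed example: f(x, y) = (x - y)^2 + y^2 is jointly convex, and
# g(x) = inf_y f(x, y) = x^2 / 2 (minimizer y = x/2) should be convex.
f = lambda x, y: (x - y) ** 2 + y ** 2
ys = np.linspace(-10, 10, 4001)              # grid stand-in for C = R
g = lambda x: np.min(f(x, ys))

rng = np.random.default_rng(6)
for _ in range(200):                         # midpoint convexity of g
    x1, x2 = rng.uniform(-3, 3, size=2)
    assert g(0.5 * (x1 + x2)) <= 0.5 * g(x1) + 0.5 * g(x2) + 1e-4

print(g(2.0), 2.0 ** 2 / 2)                  # both approximately 2.0
```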
Basic properties and examples of convex functions
Prior to introducing the definition, properties, and various conditions of convex functions together with illustrative examples, we need to clarify the role of $+\infty$ and $-\infty$ for a function $f: \mathbb{R}^{n} \rightarrow \mathbb{R}$. In spite of $+\infty,-\infty \notin \mathbb{R}$, $f(\mathbf{x})$ is allowed to take a value of $+\infty$ or $-\infty$ for some $\mathbf{x} \in \operatorname{dom} f$, hereafter. For instance, the following functions $$ f_{1}(\mathbf{x})=\begin{cases} \|\mathbf{x}\|_{2}^{2}, & \|\mathbf{x}\|_{2} \leq 1 \\ +\infty, & 1<\|\mathbf{x}\|_{2} \leq 2, \end{cases} \quad \operatorname{dom} f_{1}=\left\{\mathbf{x} \in \mathbb{R}^{n} \mid \|\mathbf{x}\|_{2} \leq 2\right\} $$ $$ f_{2}(x)=\begin{cases} -\infty, & x=0 \\ \log x, & x>0, \end{cases} \quad \operatorname{dom} f_{2}=\mathbb{R}_{+} $$ are well-defined functions; $f_{1}$ is a convex function and $f_{2}$ is a concave function. The convexity of functions will be presented next in detail. Definition and fundamental properties A function $f: \mathbb{R}^{n} \rightarrow \mathbb{R}$ is said to be convex if the following conditions are satisfied
dom $f$ is convex.
For all $\mathbf{x}, \mathbf{y} \in \operatorname{dom} f$ and $\theta \in[0,1]$, $$ f(\theta \mathbf{x}+(1-\theta) \mathbf{y}) \leq \theta f(\mathbf{x})+(1-\theta) f(\mathbf{y}). $$
A convex function basically looks like a face-up bowl, as illustrated in Figure 3.1, and it may be differentiable, or continuous but nonsmooth, or a nondifferentiable function (e.g., with some discontinuities or with $f(\mathbf{x})=+\infty$ for some $\mathbf{x}$). Note that for a given $\theta \in[0,1]$, $\mathbf{z} \triangleq \theta \mathbf{x}+(1-\theta) \mathbf{y}$ is a point on the line segment from $\mathbf{x}$ to $\mathbf{y}$ with $$ \frac{\|\mathbf{z}-\mathbf{y}\|_{2}}{\|\mathbf{y}-\mathbf{x}\|_{2}}=\theta, \text{ and } \frac{\|\mathbf{z}-\mathbf{x}\|_{2}}{\|\mathbf{y}-\mathbf{x}\|_{2}}=1-\theta, \tag{3.4} $$ and $f(\mathbf{z})$ is upper bounded by the sum of $100 \times \theta \%$ of $f(\mathbf{x})$ and $100 \times(1-\theta) \%$ of $f(\mathbf{y})$ (i.e., the closer (further) $\mathbf{z}$ is to $\mathbf{x}$, the larger (smaller) the contribution of $f(\mathbf{x})$ to the upper bound of $f(\mathbf{z})$, and this also applies to the contribution of $f(\mathbf{y})$, as shown in Figure 3.1). Note that when $\mathbf{z}$ is given instead of $\theta$, the value of $\theta$ in the upper bound of $f(\mathbf{z})$ can also be determined by (3.4). Various convex function examples will be provided in Subsection 3.1.4.
Suppose that $f$ is differentiable. Then $f$ is convex if and only if $\operatorname{dom} f$ is convex and $$ f(\mathbf{y}) \geq f(\mathbf{x})+\nabla f(\mathbf{x})^{T}(\mathbf{y}-\mathbf{x}) \quad \forall \mathbf{x}, \mathbf{y} \in \operatorname{dom} f. \tag{3.16} $$ This is called the first-order condition, which means that the first-order Taylor series approximation of $f(\mathbf{y})$ around $\mathbf{y}=\mathbf{x}$ always lies below the original function (see Figure 3.3 for the one-dimensional case), i.e., the first-order condition (3.16) provides a tight lower bound (which is an affine function in $\mathbf{y}$) over the entire domain for a differentiable convex function. Moreover, it can be seen from (3.16) that $$ f(\mathbf{y})=\max_{\mathbf{x} \in \operatorname{dom} f}\left\{f(\mathbf{x})+\nabla f(\mathbf{x})^{T}(\mathbf{y}-\mathbf{x})\right\} \quad \forall \mathbf{y} \in \operatorname{dom} f. $$ For instance, as illustrated in Figure 3.3, $f(b) \geq f(a)+f^{\prime}(a)(b-a)$ for any $a$, and the equality holds only when $a=b$. Next, let us prove the first-order condition.
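Before the proof, the condition is easy to test numerically; the following sketch (using the convex log-sum-exp function as an assumed example) verifies the affine lower bound (3.16) at random pairs of points:

```python
import numpy as np

# Check the first-order condition (3.16) for the convex function
# f(x) = log(sum(exp(x))) (log-sum-exp), whose gradient is the softmax.
def f(x):
    return np.log(np.sum(np.exp(x)))

def grad_f(x):
    e = np.exp(x)
    return e / np.sum(e)

rng = np.random.default_rng(7)
for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    assert f(y) >= f(x) + grad_f(x) @ (y - x) - 1e-12
print("first-order lower bound holds at all sampled pairs")
```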
Proof of (3.16): Let us prove the sufficiency followed by necessity.
Sufficiency: (i.e., if (3.16) holds, then $f$ is convex) From (3.16), we have, for all $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \operatorname{dom} f$ (which is convex) and $0 \leq \lambda \leq 1$, $$ \begin{aligned} f(\mathbf{y}) & \geq f(\mathbf{x})+\nabla f(\mathbf{x})^{T}(\mathbf{y}-\mathbf{x}), \\ f(\mathbf{z}) & \geq f(\mathbf{x})+\nabla f(\mathbf{x})^{T}(\mathbf{z}-\mathbf{x}), \\ \Rightarrow \lambda f(\mathbf{y})+(1-\lambda) f(\mathbf{z}) & \geq f(\mathbf{x})+\nabla f(\mathbf{x})^{T}(\lambda \mathbf{y}+(1-\lambda) \mathbf{z}-\mathbf{x}). \end{aligned} $$
By setting $\mathbf{x}=\lambda \mathbf{y}+(1-\lambda) \mathbf{z} \in \operatorname{dom} f$ in the above inequality, we obtain $\lambda f(\mathbf{y})+(1-\lambda) f(\mathbf{z}) \geq f(\lambda \mathbf{y}+(1-\lambda) \mathbf{z})$. So $f$ is convex.
Necessity: (i.e., if $f$ is convex, then (3.16) holds) For $\mathbf{x}, \mathbf{y} \in \operatorname{dom} f$ and $0 \leq \lambda \leq 1$, $$ \begin{aligned} f((1-\lambda) \mathbf{x}+\lambda \mathbf{y}) &=f(\mathbf{x}+\lambda(\mathbf{y}-\mathbf{x})) \\ &=f(\mathbf{x})+\lambda \nabla f(\mathbf{x}+\theta \lambda(\mathbf{y}-\mathbf{x}))^{T}(\mathbf{y}-\mathbf{x}) \end{aligned} \tag{3.18} $$ for some $\theta \in[0,1]$ (from the first-order expansion of the Taylor series (1.53)). Since $f$ is convex, we have $$ f((1-\lambda) \mathbf{x}+\lambda \mathbf{y}) \leq(1-\lambda) f(\mathbf{x})+\lambda f(\mathbf{y}). $$ Substituting (3.18) on the left-hand side of this inequality yields $$ \lambda f(\mathbf{y}) \geq \lambda f(\mathbf{x})+\lambda \nabla f(\mathbf{x}+\theta \lambda(\mathbf{y}-\mathbf{x}))^{T}(\mathbf{y}-\mathbf{x}). $$ For $\lambda>0$, we get (after dividing by $\lambda$) $$ \begin{aligned} f(\mathbf{y}) & \geq f(\mathbf{x})+\nabla f(\mathbf{x}+\theta \lambda(\mathbf{y}-\mathbf{x}))^{T}(\mathbf{y}-\mathbf{x}) \\ &=f(\mathbf{x})+\nabla f(\mathbf{x})^{T}(\mathbf{y}-\mathbf{x}) \quad\left(\text{as } \lambda \rightarrow 0^{+}\right), \end{aligned} $$ because $\nabla f$ is continuous due to the fact that $f$ is differentiable and convex (cf. Remark 3.13 below). Hence (3.16) has been proved.
Suppose that $f$ is twice differentiable. Then $f$ is convex if and only if $\operatorname{dom} f$ is convex and the Hessian of $f$ is PSD for all $\mathbf{x} \in \operatorname{dom} f$, that is, $$ \nabla^{2} f(\mathbf{x}) \succeq \mathbf{0}, \quad \forall \mathbf{x} \in \operatorname{dom} f. \tag{3.27} $$ Proof: Let us prove the sufficiency followed by the necessity.
Sufficiency: (i.e., if $\nabla^{2} f(\mathbf{x}) \succeq \mathbf{0}$, $\forall \mathbf{x} \in \operatorname{dom} f$, then $f$ is convex) From the second-order expansion of the Taylor series of $f(\mathbf{x})$ (cf. (1.54)), we have $$ \begin{aligned} f(\mathbf{x}+\mathbf{v}) &=f(\mathbf{x})+\nabla f(\mathbf{x})^{T} \mathbf{v}+\frac{1}{2} \mathbf{v}^{T} \nabla^{2} f(\mathbf{x}+\theta \mathbf{v}) \mathbf{v} \\ & \geq f(\mathbf{x})+\nabla f(\mathbf{x})^{T} \mathbf{v} \quad(\text{by } (3.27)) \end{aligned} \tag{3.28} $$ for some $\theta \in[0,1]$. Let $\mathbf{y}=\mathbf{x}+\mathbf{v}$, i.e., $\mathbf{v}=\mathbf{y}-\mathbf{x}$. Then we have $$ f(\mathbf{y}) \geq f(\mathbf{x})+\nabla f(\mathbf{x})^{T}(\mathbf{y}-\mathbf{x}), $$ which is exactly the first-order condition for the convexity of $f(\mathbf{x})$, implying that $f$ is convex.
Necessity: Since $f(\mathbf{x})$ is convex, from the first-order condition we have $$ f(\mathbf{x}+\mathbf{v}) \geq f(\mathbf{x})+\nabla f(\mathbf{x})^{T} \mathbf{v}, $$ which together with the second-order expansion of the Taylor series of $f(\mathbf{x})$ given by (3.28) implies $$ \mathbf{v}^{T} \nabla^{2} f(\mathbf{x}+\theta \mathbf{v}) \mathbf{v} \geq 0. $$ By letting $\|\mathbf{v}\|_{2} \rightarrow 0$, it can be inferred that $\nabla^{2} f(\mathbf{x}) \succeq \mathbf{0}$, because $\nabla^{2} f(\mathbf{x})$ is continuous for a convex twice differentiable function $f(\mathbf{x})$.
Remark 3.16 If the second-order condition (3.27) holds with strict inequality for all $\mathbf{x} \in \operatorname{dom} f$, the function $f$ is strictly convex. Moreover, under the second-order condition (3.27), for the case $f: \mathbb{R} \rightarrow \mathbb{R}$,
the first derivative $f^{\prime}$ must be continuous and nondecreasing if $f$ is convex, and continuous and strictly increasing if $f$ is strictly convex.
Remark 3.17 (Strong convexity) A convex function $f$ is strongly convex on a set $C$ if there exists an $m>0$ such that $\nabla^{2} f(\mathbf{x}) \succeq m \mathbf{I}$ for all $\mathbf{x} \in C$, or equivalently the following second-order condition holds true: $$ f(\mathbf{y}) \geq f(\mathbf{x})+\nabla f(\mathbf{x})^{T}(\mathbf{y}-\mathbf{x})+\frac{m}{2}\|\mathbf{y}-\mathbf{x}\|_{2}^{2} \quad \forall \mathbf{x}, \mathbf{y} \in C, $$ which is directly implied by (3.28). So if $f$ is strongly convex, it must be strictly convex, but the reverse is not necessarily true.
Suppose that $C$ and $D$ are convex sets in $\mathbb{R}^{n}$ and $C \cap D=\emptyset$. Then there exists a hyperplane $H(\mathbf{a}, b)=\left\{\mathbf{x} \mid \mathbf{a}^{T} \mathbf{x}=b\right\}$, where $\mathbf{a} \in \mathbb{R}^{n}$ is a nonzero vector and $b \in \mathbb{R}$, such that $$ C \subseteq H_{-}(\mathbf{a}, b), \text{ i.e., } \mathbf{a}^{T} \mathbf{x} \leq b, \ \forall \mathbf{x} \in C \tag{2.122} $$ and $$ D \subseteq H_{+}(\mathbf{a}, b), \text{ i.e., } \mathbf{a}^{T} \mathbf{x} \geq b, \ \forall \mathbf{x} \in D. \tag{2.123} $$ Only the proof of (2.123) will be given below, since (2.122) can be proven similarly. In the proof, we implicitly assume that the two convex sets $C$ and $D$ are closed, without loss of generality. The reason is that $\operatorname{cl} C$, $\operatorname{int} C$, $\operatorname{cl} D$, and $\operatorname{int} D$ are also convex by Property 2.5 in Subsection 2.1.4, and thus the same hyperplane that separates $\operatorname{cl} C$ (or $\operatorname{int} C$) and $\operatorname{cl} D$ (or $\operatorname{int} D$) can also separate $C$ and $D$. Proof: Let $$ \operatorname{dist}(C, D)=\inf \left\{\|\mathbf{u}-\mathbf{v}\|_{2} \mid \mathbf{v} \in C, \mathbf{u} \in D\right\}. $$ Assume that $\operatorname{dist}(C, D)>0$, and that there exist a point $\mathbf{c} \in C$ and a point $\mathbf{d} \in D$ such that $$ \|\mathbf{c}-\mathbf{d}\|_{2}=\operatorname{dist}(C, D) \tag{2.125} $$ (as illustrated in Figure 2.20). These assumptions will be satisfied if $C$ and $D$ are closed and one of $C$ and $D$ is bounded. Note that if both $C$ and $D$ are unbounded, such $\mathbf{c} \in C$ and $\mathbf{d} \in D$ may not exist. For instance, $C=\left\{(x, y) \in \mathbb{R}^{2} \mid y \geq e^{-x}+1, x \geq 0\right\}$ and $D=\left\{(x, y) \in \mathbb{R}^{2} \mid y \leq-e^{-x}, x \geq 0\right\}$ are convex, closed, and unbounded with $\operatorname{dist}(C, D)=1$, but $\mathbf{c} \in C$ and $\mathbf{d} \in D$ satisfying (2.125) do not exist.
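The construction in this proof is easy to replicate numerically; the sketch below (two disjoint closed balls as an assumed example) takes the closest points $\mathbf{c}, \mathbf{d}$, sets $\mathbf{a}=\mathbf{d}-\mathbf{c}$ and $b=\mathbf{a}^T(\mathbf{c}+\mathbf{d})/2$, and checks (2.122) and (2.123) on sampled points:

```python
import numpy as np

# Assumed example: two disjoint closed balls C = B(p1, r1), D = B(p2, r2).
rng = np.random.default_rng(8)
p1, r1 = np.array([0.0, 0.0]), 1.0
p2, r2 = np.array([4.0, 1.0]), 1.5           # dist(C, D) > 0

u = (p2 - p1) / np.linalg.norm(p2 - p1)
c, d = p1 + r1 * u, p2 - r2 * u              # closest points achieving dist(C, D)
a = d - c                                    # hyperplane normal
b = a @ (c + d) / 2.0                        # passes through the midpoint

def sample_ball(p, r, k=500):
    z = rng.standard_normal((k, 2))
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return p + r * z * rng.uniform(0, 1, (k, 1))

assert np.all(sample_ball(p1, r1) @ a <= b + 1e-9)   # C in H_-
assert np.all(sample_ball(p2, r2) @ a >= b - 1e-9)   # D in H_+
print("hyperplane separates the two balls")
```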
For any nonempty convex set $C$ and for any $\mathbf{x}_{0} \in \mathbf{bd}\, C$, there exists an $\mathbf{a} \neq \mathbf{0}$ such that $\mathbf{a}^{T} \mathbf{x} \leq \mathbf{a}^{T} \mathbf{x}_{0}$ for all $\mathbf{x} \in C$; namely, the convex set $C$ is supported by the hyperplane $\{\mathbf{x} \mid \mathbf{a}^{T} \mathbf{x}=\mathbf{a}^{T} \mathbf{x}_{0}\}$ at the boundary point $\mathbf{x}_{0}$.
Proof: Assume that $C$ is a convex set, $A=\operatorname{int} C$ (which is open and convex), and $\mathbf{x}_{0} \in \mathbf{bd}\, C$. Let $B=\left\{\mathbf{x}_{0}\right\}$ (which is convex). Then $A \cap B=\emptyset$. By the separating hyperplane theorem, there exists a separating hyperplane $H=\left\{\mathbf{x} \mid \mathbf{a}^{T} \mathbf{x}=\mathbf{a}^{T} \mathbf{x}_{0}\right\}$ between $A$ and $B$ (since the distance between the set $A$ and the set $B$ is equal to zero), where $\mathbf{a} \neq \mathbf{0}$, such that $\mathbf{a}^{T}\left(\mathbf{x}-\mathbf{x}_{0}\right) \leq 0$ for all $\mathbf{x} \in C$ (i.e., $C \subseteq H_{-}$ by Remark 2.23). Therefore, the hyperplane $H$ is a supporting hyperplane of the convex set $C$ which passes through $\mathbf{x}_{0} \in \mathbf{bd}\, C$.
It is now easy to prove, by the supporting hyperplane theorem, that a closed convex set $S$ with $\operatorname{int} S \neq \emptyset$ is the intersection of all (possibly an infinite number of) closed halfspaces that contain it (cf. Remark 2.8). Let $$ \mathcal{H}\left(\mathbf{x}_{0}\right) \triangleq\left\{\mathbf{x} \mid \mathbf{a}^{T}\left(\mathbf{x}-\mathbf{x}_{0}\right)=0\right\} $$ be a supporting hyperplane of $S$ passing through $\mathbf{x}_{0} \in \mathbf{bd}\, S$. This implies, by the supporting hyperplane theorem, that the associated closed halfspace $\mathcal{H}_{-}\left(\mathbf{x}_{0}\right)$, which contains the closed convex set $S$, is given by $$ \mathcal{H}_{-}\left(\mathbf{x}_{0}\right) \triangleq\left\{\mathbf{x} \mid \mathbf{a}^{T}\left(\mathbf{x}-\mathbf{x}_{0}\right) \leq 0\right\}, \quad \mathbf{x}_{0} \in \mathbf{bd}\, S. $$ Thus it must be true that $$ S=\bigcap_{\mathbf{x}_{0} \in \mathbf{bd}\, S} \mathcal{H}_{-}\left(\mathbf{x}_{0}\right), \tag{2.136} $$ implying that a closed convex set $S$ can be defined by all of its supporting hyperplanes $\mathcal{H}\left(\mathbf{x}_{0}\right)$, though the expression (2.136) may not be unique, thereby justifying Remark 2.8. When the number of supporting halfspaces containing the closed convex set $S$ is finite, $S$ is a polyhedron. When $S$ is compact and convex,
the supporting hyperplane representation (2.136) can also be expressed as $$ S=\bigcap_{\mathbf{x}_{0} \in S_{\text{extr}}} \mathcal{H}_{-}\left(\mathbf{x}_{0}\right) \quad (\text{cf. } (2.24)), $$ where the intersection also contains those halfspaces whose boundaries may contain multiple extreme points of $S$. Let us conclude this section with the following three remarks.
Summary and discussion
In this chapter, we have introduced convex sets and their properties (mostly geometric properties). Various convexity-preserving operations were introduced together with many examples. In addition, the concepts of proper cones (on which the generalized inequality is defined), dual norms, and dual cones were introduced in detail. Finally, we presented the separating hyperplane theorem, which corroborates the existence of a hyperplane separating two disjoint convex sets, and the existence of the supporting hyperplane of any nonempty convex set. These fundamentals on convex sets, along with the convex functions to be introduced in the next chapter, will be highly instrumental in understanding the concepts of convex optimization. The convex geometry properties introduced in this chapter have been applied to blind hyperspectral unmixing for material identification in remote sensing; some applications will be introduced in Chapter 6.
If $S_{1}$ and $S_{2}$ are convex sets, then $S_{1} \cap S_{2}$ is also convex. This property extends to the intersection of an infinite number of convex sets, i.e., if $S_{\alpha}$ is convex for every $\alpha \in \mathcal{A}$, then $\bigcap_{\alpha \in \mathcal{A}} S_{\alpha}$ is convex. Let us illuminate the usefulness of this convexity-preserving operation with the following remarks and examples.
Remark 2.6 A polyhedron can be considered as the intersection of a finite number of halfspaces and hyperplanes (which are convex), and hence the polyhedron is convex.
Remark 2.7 Subspaces are closed under arbitrary intersections; so are affine sets and convex cones. So they all are convex sets.
Remark 2.8 A closed convex set $S$ is the intersection of all (possibly an infinite number of) closed halfspaces that contain $S$. This can be proven by the separating hyperplane theorem (to be introduced in Subsection 2.6.1).
Example 2.4 The PSD cone $\mathbb{S}_{+}^{n}$ is known to be convex. The proof of its convexity by the intersection property is given as follows. It is easy to see that $\mathbb{S}_{+}^{n}$ can be expressed as $$ \mathbb{S}_{+}^{n}=\left\{\mathbf{X} \in \mathbb{S}^{n} \mid \mathbf{z}^{T} \mathbf{X} \mathbf{z} \geq 0, \ \forall \mathbf{z} \in \mathbb{R}^{n}\right\}=\bigcap_{\mathbf{z} \in \mathbb{R}^{n}} S_{\mathbf{z}}, $$ where $$ \begin{aligned} S_{\mathbf{z}} &=\left\{\mathbf{X} \in \mathbb{S}^{n} \mid \mathbf{z}^{T} \mathbf{X} \mathbf{z} \geq 0\right\}=\left\{\mathbf{X} \in \mathbb{S}^{n} \mid \operatorname{Tr}\left(\mathbf{z}^{T} \mathbf{X} \mathbf{z}\right) \geq 0\right\} \\ &=\left\{\mathbf{X} \in \mathbb{S}^{n} \mid \operatorname{Tr}\left(\mathbf{X} \mathbf{z z}^{T}\right) \geq 0\right\}=\left\{\mathbf{X} \in \mathbb{S}^{n} \mid \operatorname{Tr}(\mathbf{X} \mathbf{Z}) \geq 0\right\}, \end{aligned} $$ in which $\mathbf{Z}=\mathbf{z z}^{T}$, implying that $S_{\mathbf{z}}$ is a halfspace if $\mathbf{z} \neq \mathbf{0}_{n}$. As the intersection of halfspaces is also convex, $\mathbb{S}_{+}^{n}$ (the intersection of an infinite number of halfspaces) is a convex set. It is even easier to prove the convexity of $S_{\mathbf{z}}$ by the definition of convex sets. Example 2.5 Consider $$ P(\mathbf{x}, \omega)=\sum_{i=1}^{n} x_{i} \cos (i \omega) $$ and a set $$ \begin{aligned} C &=\left\{\mathbf{x} \in \mathbb{R}^{n} \mid l(\omega) \leq P(\mathbf{x}, \omega) \leq u(\omega) \quad \forall \omega \in \Omega\right\} \\ &=\bigcap_{\omega \in \Omega}\left\{\mathbf{x} \in \mathbb{R}^{n} \mid l(\omega) \leq \sum_{i=1}^{n} x_{i} \cos (i \omega) \leq u(\omega)\right\}. \end{aligned} $$ Let $$ \mathbf{a}(\omega)=[\cos (\omega), \cos (2 \omega), \ldots, \cos (n \omega)]^{T}. $$
Then we have $C=\bigcap_{\omega \in \Omega}\left\{\mathbf{x} \in \mathbb{R}^{n} \mid \mathbf{a}^{T}(\omega) \mathbf{x} \geq l(\omega), \ \mathbf{a}^{T}(\omega) \mathbf{x} \leq u(\omega)\right\}$ (an intersection of halfspaces), which implies that $C$ is convex. Note that the set $C$ is a polyhedron only when the set size $|\Omega|$ is finite.
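As a numerical aside to Example 2.4 (the random symmetric $\mathbf{X}$ and the manufactured PSD matrix are assumptions for illustration), the trace identity $\mathbf{z}^T\mathbf{X}\mathbf{z}=\operatorname{Tr}(\mathbf{X}\mathbf{Z})$ with $\mathbf{Z}=\mathbf{z}\mathbf{z}^T$, and the membership of a PSD matrix in every halfspace $S_{\mathbf{z}}$, can be checked as follows:

```python
import numpy as np

# Identity behind Example 2.4: z^T X z = Tr(X Z) with Z = z z^T,
# so each S_z is a halfspace in the space of symmetric matrices.
rng = np.random.default_rng(9)
z = rng.standard_normal(4)
X = rng.standard_normal((4, 4)); X = (X + X.T) / 2      # X symmetric

print(z @ X @ z, np.trace(X @ np.outer(z, z)))          # equal values

# A PSD matrix lies in every S_z: X @ X is PSD for symmetric X,
# and adding I makes it positive definite.
P = X @ X + np.eye(4)
Z = rng.standard_normal((1000, 4))
print(np.all(np.einsum('ij,jk,ik->i', Z, P, Z) >= 0))   # True
```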
Affine function
A function $f: \mathbb{R}^{n} \rightarrow \mathbb{R}^{m}$ is affine if it takes the form $$ \boldsymbol{f}(\mathbf{x})=\mathbf{A} \mathbf{x}+\mathbf{b}, $$ where $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $\mathbf{b} \in \mathbb{R}^{m}$. The affine function, for which $\boldsymbol{f}(\operatorname{dom} \boldsymbol{f})$ is an affine set if $\operatorname{dom} \boldsymbol{f}$ is an affine set, also called the affine transformation or the affine mapping, has been implicitly used in defining the affine hull given by (2.7) in the preceding Subsection 2.1.2. It preserves points, straight lines, and planes, but does not necessarily preserve angles between lines or distances between points. The affine mapping plays an important role in a variety of convex sets, convex functions, and problem reformulations to be introduced in the subsequent chapters. Suppose $S \subseteq \mathbb{R}^{n}$ is convex and $f: \mathbb{R}^{n} \rightarrow \mathbb{R}^{m}$ is an affine function (see Figure 2.12). Then the image of $S$ under $f$, $$ f(S)=\{f(\mathbf{x}) \mid \mathbf{x} \in S\}, $$ is convex. Similarly, the inverse image of a convex set $C$, $$ \boldsymbol{f}^{-1}(C)=\{\mathbf{x} \mid \boldsymbol{f}(\mathbf{x}) \in C\}, $$ is convex. The proof is given below. Proof: Let $\mathbf{y}_{1}$ and $\mathbf{y}_{2} \in C$. Then there exist $\mathbf{x}_{1}$ and $\mathbf{x}_{2} \in f^{-1}(C)$ such that $\mathbf{y}_{1}=\mathbf{A} \mathbf{x}_{1}+\mathbf{b}$ and $\mathbf{y}_{2}=\mathbf{A} \mathbf{x}_{2}+\mathbf{b}$. Our aim is to show that the set $f^{-1}(C)$, the inverse image of $C$ under $\boldsymbol{f}$, is convex. For $\theta \in[0,1]$, $$ \begin{aligned} \theta \mathbf{y}_{1}+(1-\theta) \mathbf{y}_{2} &=\theta\left(\mathbf{A} \mathbf{x}_{1}+\mathbf{b}\right)+(1-\theta)\left(\mathbf{A} \mathbf{x}_{2}+\mathbf{b}\right) \\ &=\mathbf{A}\left(\theta \mathbf{x}_{1}+(1-\theta) \mathbf{x}_{2}\right)+\mathbf{b} \in C, \end{aligned} $$ which implies that the convex combination $\theta \mathbf{x}_{1}+(1-\theta) \mathbf{x}_{2}$ of $\mathbf{x}_{1}$ and $\mathbf{x}_{2}$ is in $f^{-1}(C)$, and hence $f^{-1}(C)$ is convex.
Remark 2.9 If $S_{1} \subset \mathbb{R}^{n}$ and $S_{2} \subset \mathbb{R}^{n}$ are convex and $\alpha_{1}, \alpha_{2} \in \mathbb{R}$, then the set $S=\left\{(\mathbf{x}, \mathbf{y}) \mid \mathbf{x} \in S_{1}, \mathbf{y} \in S_{2}\right\}$ is convex. Furthermore, the set $$ \alpha_{1} S_{1}+\alpha_{2} S_{2}=\left\{\mathbf{z}=\alpha_{1} \mathbf{x}+\alpha_{2} \mathbf{y} \mid \mathbf{x} \in S_{1}, \mathbf{y} \in S_{2}\right\} \quad(\text{cf. (1.22) and (1.23)}) $$ is also convex (since this set can be thought of as the image of the convex set $S$ through the affine mapping given by (2.58) from $S$ to $\alpha_{1} S_{1}+\alpha_{2} S_{2}$ with
Perspective function and linear-fractional function
Linear-fractional functions are more general than affine functions but still preserve convexity. The perspective function scales or normalizes vectors so that the last component is one, and then drops the last component.
The perspective function $\boldsymbol{p}: \mathbb{R}^{n+1} \rightarrow \mathbb{R}^{n}$, with $\operatorname{dom} \boldsymbol{p}=\mathbb{R}^{n} \times \mathbb{R}_{++}$, is defined as $$ \boldsymbol{p}(\mathbf{z}, t)=\frac{\mathbf{z}}{t}. $$ The perspective function $\boldsymbol{p}$ preserves the convexity of a convex set. Proof: Consider two points $\left(\mathbf{z}_{1}, t_{1}\right)$ and $\left(\mathbf{z}_{2}, t_{2}\right)$ in a convex set $C$, so that $\mathbf{z}_{1} / t_{1}$ and $\mathbf{z}_{2} / t_{2} \in \boldsymbol{p}(C)$. Then $$ \theta\left(\mathbf{z}_{1}, t_{1}\right)+(1-\theta)\left(\mathbf{z}_{2}, t_{2}\right)=\left(\theta \mathbf{z}_{1}+(1-\theta) \mathbf{z}_{2}, \ \theta t_{1}+(1-\theta) t_{2}\right) \in C $$ for any $\theta \in[0,1]$, implying $$ \frac{\theta \mathbf{z}_{1}+(1-\theta) \mathbf{z}_{2}}{\theta t_{1}+(1-\theta) t_{2}} \in \boldsymbol{p}(C). $$ Now, by defining $$ \mu=\frac{\theta t_{1}}{\theta t_{1}+(1-\theta) t_{2}} \in[0,1], $$ we get $$ \frac{\theta \mathbf{z}_{1}+(1-\theta) \mathbf{z}_{2}}{\theta t_{1}+(1-\theta) t_{2}}=\mu \frac{\mathbf{z}_{1}}{t_{1}}+(1-\mu) \frac{\mathbf{z}_{2}}{t_{2}} \in \boldsymbol{p}(C), $$ which implies $\boldsymbol{p}(C)$ is convex. A linear-fractional function is formed by composing the perspective function with an affine function. Suppose $\boldsymbol{g}: \mathbb{R}^{n} \rightarrow \mathbb{R}^{m+1}$ is affine, i.e., $$ \boldsymbol{g}(\mathbf{x})=\left[\begin{array}{c} \mathbf{A} \\ \mathbf{c}^{T} \end{array}\right] \mathbf{x}+\left[\begin{array}{l} \mathbf{b} \\ d \end{array}\right], $$
where $\mathbf{A} \in \mathbb{R}^{m \times n}$, $\mathbf{b} \in \mathbb{R}^{m}$, $\mathbf{c} \in \mathbb{R}^{n}$, and $d \in \mathbb{R}$. The function $\boldsymbol{f}: \mathbb{R}^{n} \rightarrow \mathbb{R}^{m}$ given by $\boldsymbol{f}=\boldsymbol{p} \circ \boldsymbol{g}$, i.e., $$ \boldsymbol{f}(\mathbf{x})=\boldsymbol{p}(\boldsymbol{g}(\mathbf{x}))=\frac{\mathbf{A} \mathbf{x}+\mathbf{b}}{\mathbf{c}^{T} \mathbf{x}+d}, \quad \operatorname{dom} \boldsymbol{f}=\left\{\mathbf{x} \mid \mathbf{c}^{T} \mathbf{x}+d>0\right\}, $$ is called a linear-fractional (or projective) function. Hence, linear-fractional functions preserve convexity.
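The identity with the weight $\mu$ used in the proof above, and a direct evaluation of a linear-fractional map on its domain, can be checked numerically (all data below are invented for illustration):

```python
import numpy as np

# Perspective identity: the perspective of a convex combination is a
# convex combination of the perspectives, with weight
# mu = theta*t1 / (theta*t1 + (1-theta)*t2).
rng = np.random.default_rng(10)
z1, t1 = rng.standard_normal(3), 0.7
z2, t2 = rng.standard_normal(3), 2.1
theta = 0.3

lhs = (theta * z1 + (1 - theta) * z2) / (theta * t1 + (1 - theta) * t2)
mu = theta * t1 / (theta * t1 + (1 - theta) * t2)
rhs = mu * (z1 / t1) + (1 - mu) * (z2 / t2)
print(np.allclose(lhs, rhs))                 # True

# A linear-fractional map f(x) = (A x + b) / (c^T x + d) on dom f:
A, b = rng.standard_normal((2, 3)), rng.standard_normal(2)
c, d = rng.standard_normal(3), 5.0
x = rng.standard_normal(3)
if c @ x + d > 0:                            # x in dom f
    print((A @ x + b) / (c @ x + d))
```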