## Convex Optimization Homework Help | IE3078

statistics-lab™ supports your study-abroad career. We have built a solid reputation for Convex Optimization homework help, guaranteeing reliable, high-quality, and original statistics writing services. Our experts are highly experienced with Convex Optimization assignments of every kind.

## Convex Optimization Homework Help | Maximum Likelihood Estimation

Suppose $\boldsymbol{x} \in \mathbb{R}^n$ follows a probability density function $f(\boldsymbol{x} ; \boldsymbol{\theta})$ that is governed by some parameter $\boldsymbol{\theta} \in \Theta$. We know the exact form of the function $f(\cdot)$ but not the value of $\boldsymbol{\theta}$. ${ }^1$

We need to determine the value of $\boldsymbol{\theta}$ from a series of data samples $\boldsymbol{x}_i \in \mathbb{R}^n$, $i=1, \ldots, n$. One popular method for reaching this goal is maximum likelihood estimation (MLE) $[2,3]$.

The likelihood function $L$ is usually defined as the function obtained by reversing the roles of $\boldsymbol{x}$ and $\boldsymbol{\theta}$ as
$$L\left(\boldsymbol{\theta} ; \boldsymbol{x}_1, \ldots, \boldsymbol{x}_n\right)=f\left(\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n ; \boldsymbol{\theta}\right)$$
In MLE, we try to find a value $\boldsymbol{\theta}^*$ of the parameter $\boldsymbol{\theta}$ that maximizes $L(\boldsymbol{\theta} ; \boldsymbol{x})$ over all the data samples. Here $\boldsymbol{\theta}^*$ is called a maximum likelihood estimator of $\boldsymbol{\theta}$. In this book, we assume the number of data samples is large enough to make a good estimation. Some further discussion of the maximum likelihood method can be found in [4].

If these data are independent and identically distributed (i.i.d.), the resulting likelihood function for the samples can be written as
$$L\left(\boldsymbol{\theta} ; \boldsymbol{x}_1, \ldots, \boldsymbol{x}_n\right)=\prod_{i=1}^n f\left(\boldsymbol{x}_i ; \boldsymbol{\theta}\right)$$

Thus, the parameter estimation problem can be formulated as the following optimization problem:

$$\max _{\boldsymbol{\theta}} L\left(\boldsymbol{\theta} ; \boldsymbol{x}_1, \ldots, \boldsymbol{x}_n\right)$$
Notice that the natural logarithm function $\ln (\cdot)$ is concave and strictly increasing. If the maximum value of $L(\boldsymbol{\theta} ; \boldsymbol{x})$ does exist, it will occur at the same points as that of $\ln [L(\boldsymbol{\theta} ; \boldsymbol{x})]$. This function is called the log likelihood function and in many cases is easier to work out than the likelihood function, since $L(\boldsymbol{\theta} ; \boldsymbol{x})$ has a product structure.
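Concretely, taking logarithms turns the product over i.i.d. samples into a sum, which is what makes the log likelihood easier to maximize:

```latex
\ln L\left(\boldsymbol{\theta} ; \boldsymbol{x}_1, \ldots, \boldsymbol{x}_n\right)
  = \ln \prod_{i=1}^n f\left(\boldsymbol{x}_i ; \boldsymbol{\theta}\right)
  = \sum_{i=1}^n \ln f\left(\boldsymbol{x}_i ; \boldsymbol{\theta}\right) .
```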

## Convex Optimization Homework Help | Measurements with iid Noise

In many cases, we need to determine a vector parameter $\vartheta \in \mathbb{R}^p$ from a series of input-output measurement pairs $\left(\boldsymbol{x}_i, y_i\right), \boldsymbol{x}_i \in \mathbb{R}^p, y_i \in \mathbb{R}, i=1, \ldots, n$. The linear relationship between these variables can be written as
$$y_i=\boldsymbol{\vartheta}^T \boldsymbol{x}_i+v_i$$
where $v_i \in \mathbb{R}$ are i.i.d. random variables whose probability density function $f(v)$ is known to us.

We can still use the maximum likelihood estimation (MLE) method to estimate $\vartheta$. The corresponding likelihood function is formulated as
$$L\left(\boldsymbol{\vartheta} ;\left(\boldsymbol{x}_1, y_1\right), \ldots,\left(\boldsymbol{x}_n, y_n\right)\right)=\prod_{i=1}^n f\left(y_i-\boldsymbol{\vartheta}^T \boldsymbol{x}_i ; \boldsymbol{\vartheta}\right)$$
If the probability density function $f(v)$ is log-concave, the estimation of $\vartheta$ can be formulated as a convex optimization problem in terms of $\vartheta$.

Example 3.7 If $v_i$ follows a normal distribution with mean $\mu$ and variance $\sigma^2$, we have the log likelihood function written as
$$\ln L\left(\boldsymbol{\vartheta} ;\left(\boldsymbol{x}_1, y_1\right), \ldots,\left(\boldsymbol{x}_n, y_n\right)\right)=-n \ln \sigma-\frac{n}{2} \ln 2 \pi-\frac{1}{2 \sigma^2} \sum_{i=1}^n\left(y_i-\boldsymbol{\vartheta}^T \boldsymbol{x}_i-\mu\right)^2$$

Thus, the maximum likelihood estimator of $\boldsymbol{\vartheta}$ is the optimal solution of the following least squares approximation problem; see also the discussion in Sect. 4.1:

$$\min _{\boldsymbol{\vartheta}} \sum_{i=1}^n\left(y_i-\boldsymbol{\vartheta}^T \boldsymbol{x}_i-\mu\right)^2$$
Example 3.8 If $v_i$ follows a Laplacian distribution with probability density function written as
$$f(v)=\frac{1}{2 \tau} e^{-|v-\mu| / \tau}$$
where $\tau>0$ and $\mu$ is the mean value. In this case the log likelihood is $-n \ln (2 \tau)-\frac{1}{\tau} \sum_{i=1}^n\left|y_i-\boldsymbol{\vartheta}^T \boldsymbol{x}_i-\mu\right|$, so the maximum likelihood estimator of $\boldsymbol{\vartheta}$ solves the least absolute deviations problem $\min _{\boldsymbol{\vartheta}} \sum_{i=1}^n\left|y_i-\boldsymbol{\vartheta}^T \boldsymbol{x}_i-\mu\right|$.
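As a numerical sketch (the data-generating parameters below are invented for illustration), the two noise models lead to different estimators: least squares under Gaussian noise, and least absolute deviations under Laplacian noise. The latter is solved here by iteratively reweighted least squares, one of several possible schemes:

```python
import numpy as np

# Hypothetical illustration: the MLE of theta in y_i = theta^T x_i + v_i
# is the least squares solution under Gaussian noise, and the least
# absolute deviations (l1) solution under Laplacian noise. Here mu = 0.
rng = np.random.default_rng(0)
n, p = 200, 3
theta_true = np.array([1.0, -2.0, 0.5])
X = rng.standard_normal((n, p))
y = X @ theta_true + rng.laplace(scale=0.3, size=n)

# Gaussian-noise MLE: closed-form least squares.
theta_ls = np.linalg.lstsq(X, y, rcond=None)[0]

# Laplacian-noise MLE: l1 minimization via iteratively reweighted
# least squares (a simple numerical scheme, not the only option).
theta_lad = theta_ls.copy()
for _ in range(100):
    r = y - X @ theta_lad
    w = 1.0 / np.maximum(np.abs(r), 1e-8)   # weights approximating the l1 loss
    Xw = X * w[:, None]
    theta_lad = np.linalg.solve(X.T @ Xw, Xw.T @ y)

print(theta_ls)
print(theta_lad)   # both should be close to theta_true
```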


## Convex Optimization Homework Help | Karush-Kuhn-Tucker (KKT) Conditions

Consider the inequality-constrained problem

$$\begin{aligned} & \min _{\boldsymbol{x}} f(\boldsymbol{x}) \\ & \text { s.t. } g_i(\boldsymbol{x}) \leq 0, i=1, \ldots, m . \end{aligned}$$

Its Lagrangian is

$$L(\boldsymbol{x}, \boldsymbol{\alpha})=f(\boldsymbol{x})+\sum_{i=1}^m \alpha_i g_i(\boldsymbol{x})$$

and the associated dual function is

$$q(\boldsymbol{\alpha})=\inf _{\boldsymbol{x} \in \Omega} L(\boldsymbol{x}, \boldsymbol{\alpha}) .$$
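As a worked sketch (the example problem is our own, not from the text), the dual function $q(\boldsymbol{\alpha})$ can be evaluated in closed form for $\min x^2$ subject to $1-x \leq 0$, and maximizing it recovers the primal optimal value:

```python
import numpy as np

# Illustrative example: min x^2 s.t. 1 - x <= 0, with primal optimum
# x* = 1 and f(x*) = 1. The Lagrangian is L(x, a) = x^2 + a(1 - x);
# minimizing over x in R gives q(a) = a - a^2/4, maximized at a* = 2.
def q(a):
    x = a / 2.0                  # argmin_x L(x, a), from dL/dx = 2x - a = 0
    return x**2 + a * (1 - x)

alphas = np.linspace(0, 4, 4001)
i = np.argmax([q(a) for a in alphas])
print(alphas[i], q(alphas[i]))   # close to (2.0, 1.0): strong duality holds
```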

## MATLAB Help

MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. Typical uses include: mathematical and computational algorithm development; modeling, simulation, and prototyping; data analysis, exploration, and visualization; scientific and engineering graphics; and application development, including graphical user interface building. MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. This lets you solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar noninteractive language such as C or Fortran. The name MATLAB stands for matrix laboratory. MATLAB was originally written to provide easy access to matrix software developed by the LINPACK and EISPACK projects, which together represent the state of the art in matrix computation software. MATLAB has evolved over the years with input from many users. In university environments, it is the standard instructional tool for introductory and advanced courses in mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-productivity research, development, and analysis. MATLAB features a family of application-specific solutions called toolboxes. Very important to most users of MATLAB, toolboxes allow you to learn and apply specialized techniques. Toolboxes are comprehensive collections of MATLAB functions (M-files) that extend the MATLAB environment to solve particular classes of problems. Areas in which toolboxes are available include signal processing, control systems, neural networks, fuzzy logic, wavelets, simulation, and many others.

## Convex Optimization Homework Help | CS168


## Convex Optimization Homework Help | Rate of Convergence

The following proposition describes how the convergence rate of the proximal algorithm depends on the magnitude of $c_k$ and on the order of growth of $f$ near the optimal solution set (see also Fig. 5.1.3).

Proposition 5.1.4: (Rate of Convergence) Assume that $X^*$ is nonempty and that for some scalars $\beta>0$, $\delta>0$, and $\gamma \geq 1$, we have
$$f^*+\beta\left(d(x)\right)^\gamma \leq f(x), \quad \forall x \in \Re^n \text { with } d(x) \leq \delta,$$
where
$$d(x)=\min _{x^* \in X^*}\left\|x-x^*\right\| .$$
Let also
$$\sum_{k=0}^{\infty} c_k=\infty,$$
so that the sequence $\left\{x_k\right\}$ generated by the proximal algorithm (5.1) converges to some point in $X^*$ by Prop. 5.1.3. Then:
(a) For all $k$ sufficiently large, we have
$$d\left(x_{k+1}\right)+\beta c_k\left(d\left(x_{k+1}\right)\right)^{\gamma-1} \leq d\left(x_k\right)$$
if $\gamma>1$, and

$$d\left(x_{k+1}\right)+\beta c_k \leq d\left(x_k\right),$$
if $\gamma=1$ and $x_{k+1} \notin X^*$.
(b) (Superlinear Convergence) Let $1<\gamma<2$ and $x_k \notin X^*$ for all $k$. Then if $\inf _{k \geq 0} c_k>0$,
$$\limsup _{k \rightarrow \infty} \frac{d\left(x_{k+1}\right)}{\left(d\left(x_k\right)\right)^{1 /(\gamma-1)}}<\infty .$$
(c) (Linear Convergence) Let $\gamma=2$ and $x_k \notin X^*$ for all $k$. Then if $\lim _{k \rightarrow \infty} c_k=\bar{c}$ with $\bar{c} \in(0, \infty)$,
$$\limsup _{k \rightarrow \infty} \frac{d\left(x_{k+1}\right)}{d\left(x_k\right)} \leq \frac{1}{1+\beta \bar{c}},$$
while if $\lim _{k \rightarrow \infty} c_k=\infty$,
$$\lim _{k \rightarrow \infty} \frac{d\left(x_{k+1}\right)}{d\left(x_k\right)}=0 .$$
(d) (Sublinear Convergence) Let $\gamma>2$. Then
$$\limsup _{k \rightarrow \infty} \frac{d\left(x_{k+1}\right)}{d\left(x_k\right)^{2 / \gamma}}<\infty .$$
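The $\gamma=1$ case can be checked numerically. The following sketch (our own example, not from the text) runs the proximal algorithm on $f(x)=|x|$, for which $X^*=\{0\}$ and the growth condition holds with $\beta=1$, $\gamma=1$; part (a) then predicts that the distance to $X^*$ decreases by $\beta c_k$ per iteration, i.e., finite convergence:

```python
import numpy as np

def prox_abs(z, c):
    # Proximal operator of f(x) = |x| with parameter c (soft-thresholding):
    # argmin_x { |x| + (1/(2c)) (x - z)^2 }
    return np.sign(z) * max(abs(z) - c, 0.0)

# Proximal algorithm x_{k+1} = prox_{c_k f}(x_k) on f(x) = |x|.
# f grows linearly away from X* = {0} (gamma = 1, beta = 1), so
# d(x_{k+1}) + c_k <= d(x_k) until X* is reached in finitely many steps.
x, c = 5.0, 1.0
trace = [x]
for k in range(10):
    x = prox_abs(x, c)
    trace.append(x)

print(trace)   # the distance to 0 drops by c each step, then stays at 0
```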

An interesting interpretation of the proximal iteration is obtained by considering the function
$$\phi_c(z)=\inf _{x \in \Re^n}\left\{f(x)+\frac{1}{2 c}\|x-z\|^2\right\}$$

for a fixed positive value of $c$. It can be seen that

$$\inf _{x \in \Re^n} f(x) \leq \phi_c(z) \leq f(z), \quad \forall z \in \Re^n,$$
from which it follows that the set of minima of $f$ and $\phi_c$ coincide (this is also evident from the geometric view of the proximal minimization given in Fig. 5.1.7). The following proposition shows that $\phi_c$ is a convex differentiable function, and derives its gradient.

Proposition 5.1.7: The function $\phi_c$ of Eq. (5.14) is convex and differentiable, and we have
$$\nabla \phi_c(z)=\frac{z-x_c(z)}{c} \quad \forall z \in \Re^n,$$
where $x_c(z)$ is the unique minimizer in Eq. (5.14). Moreover
$$\nabla \phi_c(z) \in \partial f\left(x_c(z)\right), \quad \forall z \in \Re^n$$
Proof: We first note that $\phi_c$ is convex, since it is obtained by partial minimization of $f(x)+\frac{1}{2 c}\|x-z\|^2$, which is convex as a function of $(x, z)$ (cf. Prop. 3.3.1 in Appendix B). Furthermore, $\phi_c$ is real-valued, since the infimum in Eq. (5.14) is attained.

Let us fix $z$, and for notational simplicity, denote $\bar{z}=x_c(z)$. To show that $\phi_c$ is differentiable with the given form of gradient, we note that by the optimality condition of Prop. 3.1.4, we have $v \in \partial \phi_c(z)$, or equivalently $0 \in \partial \phi_c(z)-v$, if and only if $z$ attains the minimum over $y \in \Re^n$ of
$$\phi_c(y)-v^{\prime} y=\inf _{x \in \Re^n}\left\{f(x)+\frac{1}{2 c}\|x-y\|^2\right\}-v^{\prime} y$$
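The gradient formula of Prop. 5.1.7 can be verified numerically. The sketch below (our own example) takes $f(x)=|x|$, for which the minimizer $x_c(z)$ in Eq. (5.14) is the soft-thresholding operator, and compares $\nabla \phi_c(z)=(z-x_c(z))/c$ with a finite-difference approximation:

```python
import numpy as np

# Numerical sketch of Prop. 5.1.7 for f(x) = |x|: the envelope
# phi_c(z) = min_x { |x| + (1/(2c)) (x - z)^2 } is differentiable with
# grad phi_c(z) = (z - x_c(z)) / c, where x_c(z) soft-thresholds z.
c = 0.5

def x_c(z):
    return np.sign(z) * max(abs(z) - c, 0.0)   # the unique minimizer

def phi(z):
    x = x_c(z)
    return abs(x) + (x - z) ** 2 / (2 * c)

z0 = 1.3
analytic = (z0 - x_c(z0)) / c                      # formula from the proposition
h = 1e-6
numeric = (phi(z0 + h) - phi(z0 - h)) / (2 * h)    # central finite difference
print(analytic, numeric)                           # the two agree closely
```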


## Convex Optimization Homework Help | ESE6050


## Convex Optimization Homework Help | GENERALIZED SIMPLICIAL DECOMPOSITION

In this section we will aim to highlight some of the applications and the fine points of the general algorithm of the preceding section. As a vehicle we will use the simplicial decomposition approach and the problem
$$\begin{aligned} & \text { minimize } f(x)+c(x) \\ & \text { subject to } x \in \Re^n, \end{aligned}$$
where $f: \Re^n \mapsto(-\infty, \infty]$ and $c: \Re^n \mapsto(-\infty, \infty]$ are closed proper convex functions. This is the Fenchel duality context, and it contains as a special case the problem to which the ordinary simplicial decomposition method of Section 4.2 applies (where $f$ is differentiable, and $c$ is the indicator function of a bounded polyhedral set). Here we will mainly focus on the case where $f$ is nondifferentiable and possibly extended real-valued.

We apply the polyhedral approximation scheme of the preceding section to the equivalent EMP
$$\begin{aligned} & \operatorname{minimize} f_1\left(x_1\right)+f_2\left(x_2\right) \\ & \text { subject to }\left(x_1, x_2\right) \in S, \end{aligned}$$

where
$$f_1\left(x_1\right)=f\left(x_1\right), \quad f_2\left(x_2\right)=c\left(x_2\right), \quad S=\left\{\left(x_1, x_2\right) \mid x_1=x_2\right\} .$$
Note that the orthogonal subspace has the form
$$S^{\perp}=\left\{\left(\lambda_1, \lambda_2\right) \mid \lambda_1=-\lambda_2\right\}=\left\{(\lambda,-\lambda) \mid \lambda \in \Re^n\right\} .$$
Optimal primal and dual solutions of this EMP problem are of the form $\left(x^{o p t}, x^{o p t}\right)$ and $\left(\lambda^{o p t},-\lambda^{o p t}\right)$, with
$$\lambda^{o p t} \in \partial f\left(x^{o p t}\right), \quad-\lambda^{o p t} \in \partial c\left(x^{o p t}\right),$$
consistently with the optimality conditions of Prop. 4.4.1. A pair of such optimal solutions $\left(x^{o p t}, \lambda^{o p t}\right)$ satisfies the necessary and sufficient optimality conditions of the Fenchel Duality Theorem [Prop. 1.2.1(c)] for the original problem.
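A quick numerical sanity check (our own, with arbitrary vectors) of the geometry above: every element $(x, x)$ of $S$ is orthogonal to every element $(\lambda,-\lambda)$ of $S^{\perp}$, which is what makes $\left(x^{o p t}, x^{o p t}\right)$ and $\left(\lambda^{o p t},-\lambda^{o p t}\right)$ a primal-dual pair for the EMP:

```python
import numpy as np

# S = {(x1, x2) | x1 = x2} and S_perp = {(lam, -lam) | lam in R^n}:
# any element of S is orthogonal to any element of S_perp.
rng = np.random.default_rng(0)
n = 4
x = rng.standard_normal(n)
lam = rng.standard_normal(n)

s_elem = np.concatenate([x, x])            # generic element of S
s_perp_elem = np.concatenate([lam, -lam])  # generic element of S_perp

print(np.dot(s_elem, s_perp_elem))         # zero up to rounding
```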

## Convex Optimization Homework Help | Dual/Cutting Plane Implementation

Let us also provide a dual implementation, which is an equivalent outer linearization/cutting plane-type of method. The Fenchel dual of the minimization of $f+c$ [cf. Eq. (4.31)] is
$$\begin{aligned} & \text { minimize } f^{\star}(\lambda)+c^{\star}(-\lambda) \\ & \text { subject to } \lambda \in \Re^n, \end{aligned}$$
where $f^{\star}$ and $c^{\star}$ are the conjugates of $f$ and $c$, respectively. According to the theory of the preceding section, the generalized simplicial decomposition algorithm (4.32)-(4.34) can alternatively be implemented by replacing $c^{\star}$ by a piecewise linear/cutting plane outer linearization, while leaving $f^{\star}$ unchanged, i.e., by solving at iteration $k$ the problem
$$\begin{aligned} & \text { minimize } f^{\star}(\lambda)+C_k^{\star}(-\lambda) \\ & \text { subject to } \lambda \in \Re^n, \end{aligned}$$
where $C_k^{\star}$ is an outer linearization of $c^{\star}$ (the conjugate of $C_k$ ). This problem is the (Fenchel) dual of problem (4.32) [or equivalently, the low-dimensional problem (4.36)].

Note that solutions of problem (4.37) are the subgradients $\lambda_k$ satisfying $\lambda_k \in \partial f\left(x_k\right)$ and $-\lambda_k \in \partial C_k\left(x_k\right)$, where $x_k$ is the solution of the problem (4.32) [cf. Eq. (4.33)], while the associated subgradient of $c^{\star}$ at $-\lambda_k$ is the vector $\tilde{x}_k$ generated by Eq. (4.34), as shown in Fig. 4.5.1. In fact, the function $C_k^{\star}$ has the form
$$C_k^{\star}(-\lambda)=\max _{j \in J_k}\left\{c^{\star}\left(-\lambda_j\right)-\tilde{x}_j^{\prime}\left(\lambda-\lambda_j\right)\right\}$$
where $\lambda_j$ and $\tilde{x}_j$ are vectors that can be obtained either by using the generalized simplicial decomposition method (4.32)-(4.34), or by using its dual, the cutting plane method based on solving the outer approximation problems (4.37). The ordinary cutting plane method, described in the beginning of Section 4.1, is obtained as the special case where $f^{\star}(\lambda) \equiv 0$ [or equivalently, $f(x)=\infty$ if $x \neq 0$, and $f(0)=0$ ].


## Convex Optimization Homework Help | EE364a


## Convex Optimization Homework Help | DUALITY OF INNER AND OUTER LINEARIZATION

We have considered so far cutting plane and simplicial decomposition methods, and we will now aim to connect them via duality. To this end, we define in this section outer and inner linearizations, and we formalize their conjugacy relation and other related properties. An outer linearization of a closed proper convex function $f: \Re^n \mapsto(-\infty, \infty]$ is defined by a finite set of vectors $\left\{y_1, \ldots, y_{\ell}\right\}$ such that for every $j=1, \ldots, \ell$, we have $y_j \in \partial f\left(x_j\right)$ for some $x_j \in \Re^n$. It is given by
$$F(x)=\max _{j=1, \ldots, \ell}\left\{f\left(x_j\right)+\left(x-x_j\right)^{\prime} y_j\right\}, \quad x \in \Re^n,$$

and it is illustrated in the left side of Fig. 4.3.1. The choices of $x_j$ such that $y_j \in \partial f\left(x_j\right)$ may not be unique, but result in the same function $F(x)$ : the epigraph of $F$ is determined by the supporting hyperplanes to the epigraph of $f$ with normals defined by $y_j$, and the points of support $x_j$ are immaterial. In particular, the definition (4.14) can be equivalently written in terms of the conjugate $f^{\star}$ of $f$ as
$$F(x)=\max _{j=1, \ldots, \ell}\left\{x^{\prime} y_j-f^{\star}\left(y_j\right)\right\},$$
using the relation $x_j^{\prime} y_j=f\left(x_j\right)+f^{\star}\left(y_j\right)$, which is implied by $y_j \in \partial f\left(x_j\right)$ (the Conjugate Subgradient Theorem, Prop. 5.4.3 in Appendix B).
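A small numerical sketch (the function and points are our own choices) of the outer linearization (4.14): for $f(x)=x^2$ with $y_j=\nabla f(x_j)=2 x_j$, the max of the tangent lines lies below $f$ everywhere and touches it at the points of support:

```python
import numpy as np

# Outer linearization of f(x) = x^2 built from subgradients y_j = 2*x_j:
# F is the pointwise max of the tangent lines at the x_j, so F <= f
# everywhere, with equality at the points of support.
x_pts = np.array([-1.0, 0.5, 2.0])   # points x_j defining the linearization

def f(x):
    return x ** 2

def F(x):
    # max_j { f(x_j) + (x - x_j) * y_j } with y_j = 2 x_j
    return np.max(f(x_pts) + (x - x_pts) * (2 * x_pts))

grid = np.linspace(-3, 3, 61)
print(F(0.5), f(0.5))                             # equal at a point of support
print(all(F(x) <= f(x) + 1e-12 for x in grid))    # F is an outer approximation
```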

Note that $F(x) \leq f(x)$ for all $x$, so as is true for any outer approximation of $f$, the conjugate $F^{\star}$ satisfies $F^{\star}(y) \geq f^{\star}(y)$ for all $y$. Moreover, it can be shown that $F^{\star}$ is an inner linearization of the conjugate $f^{\star}$, as illustrated in the right side of Fig. 4.3.1. Indeed we have, using Eq. (4.15),
$$\begin{aligned} F^{\star}(y) & =\sup _{x \in \Re^n}\left\{y^{\prime} x-F(x)\right\} \\ & =\sup _{x \in \Re^n}\left\{y^{\prime} x-\max _{j=1, \ldots, \ell}\left\{y_j^{\prime} x-f^{\star}\left(y_j\right)\right\}\right\} \\ & =\sup _{\substack{x \in \Re^n, \xi \in \Re \\ y_j^{\prime} x-f^{\star}\left(y_j\right) \leq \xi, j=1, \ldots, \ell}}\left\{y^{\prime} x-\xi\right\} . \end{aligned}$$

## Convex Optimization Homework Help | GENERALIZED POLYHEDRAL APPROXIMATION

We will now consider a unified framework for polyhedral approximation, which combines the cutting plane and simplicial decomposition methods. We consider the problem
$$\begin{array}{ll} \operatorname{minimize} & \sum_{i=1}^m f_i\left(x_i\right) \\ \text { subject to } & x \in S, \end{array}$$
where
$$x \stackrel{\text { def }}{=}\left(x_1, \ldots, x_m\right),$$

is a vector in $\Re^{n_1+\cdots+n_m}$, with components $x_i \in \Re^{n_i}, i=1, \ldots, m$, and
$f_i: \Re^{n_i} \mapsto(-\infty, \infty]$ is a closed proper convex function for each $i$, and $S$ is a subspace of $\Re^{n_1+\cdots+n_m}$.

We refer to this as an extended monotropic program (EMP for short). $\dagger$
A classical example of EMP is a single commodity network optimization problem, where $x_i$ represents the (scalar) flow of an arc of a directed graph and $S$ is the circulation subspace of the graph (see e.g., [Ber98]). Also, problems involving general linear constraints and an additive extended real-valued convex cost function can be converted to EMP. In particular, the problem
$$\begin{array}{ll} \text { minimize } & \sum_{i=1}^m f_i\left(x_i\right) \\ \text { subject to } & A x=b, \end{array}$$
where $A$ is a given matrix and $b$ is a given vector, is equivalent to
$$\begin{array}{ll} \text { minimize } & \sum_{i=1}^m f_i\left(x_i\right)+\delta_Z(z) \\ \text { subject to } & A x-z=0, \end{array}$$
where $z$ is a vector of artificial variables, and $\delta_Z$ is the indicator function of the set $Z=\{z \mid z=b\}$. This is an EMP with constraint subspace
$$S=\{(x, z) \mid A x-z=0\} .$$


## Convex Optimization Homework Help | Subgradient Methods with Iterate Averaging

If the stepsize $\alpha_k$ in the subgradient method
$$x_{k+1}=P_X\left(x_k-\alpha_k g_k\right)$$
is chosen to be large (such as constant, or such that the condition $\sum_{k=0}^{\infty} \alpha_k^2<\infty$ is violated), the method may not converge. This exercise shows that by averaging the iterates of the method, we may obtain convergence with larger stepsizes. Let the optimal solution set $X^*$ be nonempty, and assume that for some scalar $c$, we have
$$c \geq \sup \left\{\left\|g_k\right\| \mid k=0,1, \ldots\right\}$$
(cf. Assumption 3.2.1). Assume further that $\alpha_k$ is chosen according to
$$\alpha_k=\frac{\theta}{c \sqrt{k+1}}, \quad k=0,1, \ldots,$$
where $\theta$ is a positive constant. Show that
$$f\left(\bar{x}_k\right)-f^* \leq c\left(\frac{\min _{x^* \in X^*}\left\|x_0-x^*\right\|^2}{2 \theta}+\theta \ln (k+2)\right) \frac{1}{\sqrt{k+1}}, \quad k=0,1, \ldots,$$
where $\bar{x}_k$ is the averaged iterate, generated according to
$$\bar{x}_k=\frac{\sum_{\ell=0}^k \alpha_{\ell} x_{\ell}}{\sum_{\ell=0}^k \alpha_{\ell}} .$$

A similar analysis applies to incremental and to stochastic subgradient methods.
Abbreviated proof: Denote
$$\delta_k=\frac{1}{2} \min _{x^* \in X^*}\left\|x_k-x^*\right\|^2 .$$
Applying Prop. 3.2.2(a) with $y$ equal to the projection of $x_k$ onto $X^*$, we obtain
$$\delta_{k+1} \leq \delta_k-\alpha_k\left(f\left(x_k\right)-f^*\right)+\frac{1}{2} \alpha_k^2 c^2 .$$
Adding this inequality from 0 to $k$, and using the fact $\delta_{k+1} \geq 0$,
$$\sum_{\ell=0}^k \alpha_{\ell}\left(f\left(x_{\ell}\right)-f^*\right) \leq \delta_0+\frac{1}{2} c^2 \sum_{\ell=0}^k \alpha_{\ell}^2,$$

so by dividing with $\sum_{\ell=0}^k \alpha_{\ell}$,

$$\frac{\sum_{\ell=0}^k \alpha_{\ell} f\left(x_{\ell}\right)}{\sum_{\ell=0}^k \alpha_{\ell}}-f^* \leq \frac{\delta_0+\frac{1}{2} c^2 \sum_{\ell=0}^k \alpha_{\ell}^2}{\sum_{\ell=0}^k \alpha_{\ell}} .$$
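The scheme can be tried numerically. The sketch below (our own instance) applies the averaged subgradient method to $f(x)=|x|$ with $c=1$ and $\theta=1$; the stepsize violates $\sum_k \alpha_k^2<\infty$, yet $f(\bar{x}_k)$ stays within the stated bound:

```python
import numpy as np

# Subgradient averaging for f(x) = |x|: f* = 0, X* = {0}, subgradients
# bounded by c = 1, and alpha_k = theta/(c*sqrt(k+1)) is a "large"
# stepsize (sum of alpha_k^2 diverges), yet the averaged iterate works.
theta, c = 1.0, 1.0
x = 5.0
weighted_sum, weight = 0.0, 0.0
K = 5000
for k in range(K):
    alpha = theta / (c * np.sqrt(k + 1))
    weighted_sum += alpha * x
    weight += alpha
    g = np.sign(x)                 # a subgradient of |x| (0 works at x = 0)
    x -= alpha * g                 # X = R, so the projection is the identity
x_bar = weighted_sum / weight

# Bound from the exercise with min ||x0 - x*|| = 5:
bound = c * (5.0 ** 2 / (2 * theta) + theta * np.log(K + 1)) / np.sqrt(K)
print(abs(x_bar), bound)           # f(x_bar) = |x_bar| is within the bound
```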

## Convex Optimization Homework Help | Modified Dynamic Stepsize Rules

Consider the subgradient method

$$x_{k+1}=P_X\left(x_k-\alpha_k g_k\right)$$
with the stepsize chosen according to one of the two rules
$$\alpha_k=\frac{f\left(x_k\right)-f_k}{\max \left\{\gamma,\left\|g_k\right\|^2\right\}} \quad \text { or } \quad \alpha_k=\min \left\{\gamma, \frac{f\left(x_k\right)-f_k}{\left\|g_k\right\|^2}\right\}$$

where $\gamma$ is a fixed positive scalar and $f_k$ is given by the dynamic adjustment procedure (3.22)-(3.23). Show that the convergence result of Prop. 3.2.8 still holds. Abbreviated Proof: We proceed by contradiction, as in the proof of Prop. 3.2.8. From Prop. 3.2.2(a) with $y=\bar{y}$, we have for all $k \geq \bar{k}$,
$$\begin{aligned} \left\|x_{k+1}-\bar{y}\right\|^2 & \leq\left\|x_k-\bar{y}\right\|^2-2 \alpha_k\left(f\left(x_k\right)-f(\bar{y})\right)+\alpha_k^2\left\|g_k\right\|^2 \\ & \leq\left\|x_k-\bar{y}\right\|^2-2 \alpha_k\left(f\left(x_k\right)-f(\bar{y})\right)+\alpha_k\left(f\left(x_k\right)-f_k\right) \\ & =\left\|x_k-\bar{y}\right\|^2-\alpha_k\left(f\left(x_k\right)-f_k\right)-2 \alpha_k\left(f_k-f(\bar{y})\right) \\ & \leq\left\|x_k-\bar{y}\right\|^2-\alpha_k\left(f\left(x_k\right)-f_k\right) . \end{aligned}$$
Hence $\left\{x_k\right\}$ is bounded, which implies that $\left\{g_k\right\}$ is also bounded (cf. Prop. 3.1.2). Let $\bar{c}$ be such that $\left\|g_k\right\| \leq \bar{c}$ for all $k$. Assume that $\alpha_k$ is chosen according to the first rule in Eq. (3.47). Then from the preceding relation we have for all $k \geq \bar{k}$,
$$\left\|x_{k+1}-\bar{y}\right\|^2 \leq\left\|x_k-\bar{y}\right\|^2-\frac{\delta^2}{\max \left\{\gamma, \bar{c}^2\right\}} .$$
As in the proof of Prop. 3.2.8, this leads to a contradiction and the result follows. The proof is similar if $\alpha_k$ is chosen according to the second rule in Eq. (3.47).
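The two rules in Eq. (3.47) can be exercised on a toy problem. In the sketch below (our own setup) we minimize $f(x)=\|x\|_1$ over $X=\Re^n$ and, for simplicity, fix the target level at the true optimal value $f_k = f^* = 0$ instead of running the adjustment procedure (3.22)-(3.23):

```python
import numpy as np

# The two modified dynamic stepsize rules on f(x) = ||x||_1 with X = R^n.
# The target level f_k is fixed at f* = 0 here (a simplifying choice).
gamma = 1.0

def f(x):
    return np.abs(x).sum()

def run(rule, x0, iters=200):
    x = x0.copy()
    for _ in range(iters):
        fk = 0.0                       # target level (illustrative choice)
        if f(x) - fk <= 1e-12:
            break                      # already at the target level
        g = np.sign(x)                 # a subgradient of the l1 norm
        if rule == 1:
            alpha = (f(x) - fk) / max(gamma, g @ g)
        else:
            alpha = min(gamma, (f(x) - fk) / (g @ g))
        x = x - alpha * g              # projection onto X = R^n is the identity
    return x

x0 = np.array([3.0, -2.0, 0.5])
print(f(run(1, x0)), f(run(2, x0)))    # both rules drive f(x_k) toward f* = 0
```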


## Convex Optimization Homework Help | Connection with Incremental Subgradient Methods

We discussed in Section 2.1.5 incremental variants of gradient methods, which apply to minimization over a closed convex set $X$ of an additive cost function of the form
$$f(x)=\sum_{i=1}^m f_i(x),$$
where the functions $f_i: \Re^n \mapsto \Re$ are differentiable. Incremental variants of the subgradient method are also possible in the case where the $f_i$ are nondifferentiable but convex. The idea is to sequentially take steps along the subgradients of the component functions $f_i$, with intermediate adjustment of $x$ after processing each $f_i$. We simply use an arbitrary subgradient of $f_i$ at a point where $f_i$ is nondifferentiable, in place of the gradient that would be used if $f_i$ were differentiable at that point.

Incremental methods are particularly interesting when the number of cost terms $m$ is very large. Then a full subgradient step is very costly, and one hopes to make progress with approximate but much cheaper incremental steps. We will discuss in detail incremental subgradient methods and their combinations with other methods, such as incremental proximal methods, in Section 6.4. In this section we will discuss the most common type of incremental subgradient method, and highlight its connection with the $\epsilon$-subgradient method.

Let us consider the minimization of $\sum_{i=1}^m f_i$ over $x \in X$, for the case where each $f_i$ is a convex real-valued function. Similar to the incremental gradient methods of Section 2.1.5, we view an iteration as a cycle of $m$ subiterations. If $x_k$ is the vector obtained after $k$ cycles, the vector $x_{k+1}$ obtained after one more cycle is
$$x_{k+1}=\psi_{m, k},$$
where starting with $\psi_{0, k}=x_k$, we obtain $\psi_{m, k}$ after the $m$ steps
$$\psi_{i, k}=P_X\left(\psi_{i-1, k}-\alpha_k g_{i, k}\right), \quad i=1, \ldots, m,$$
with $g_{i, k}$ being an arbitrary subgradient of $f_i$ at $\psi_{i-1, k}$.
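The cycle structure above can be sketched as follows; the component functions $f_i(x)=|x-a_i|$ and the interval $X$ are our own illustrative choices:

```python
import numpy as np

# One cycle of the incremental subgradient method for minimizing
# f(x) = sum_i |x - a_i| over X = [lo, hi]. Each subiteration steps
# along a subgradient of a single component f_i.
a = np.array([1.0, 2.0, 6.0])      # component "anchors"; sum minimized at the median, 2.0
lo, hi = 0.0, 10.0

def project(x):
    return min(max(x, lo), hi)     # projection P_X onto the interval

def cycle(x, alpha):
    psi = x                        # psi_{0,k} = x_k
    for ai in a:                   # m subiterations, one per component f_i
        g = np.sign(psi - ai)      # a subgradient of |x - a_i| at psi
        psi = project(psi - alpha * g)
    return psi                     # x_{k+1} = psi_{m,k}

x = 9.0
for k in range(300):
    x = cycle(x, alpha=1.0 / (k + 1))   # diminishing stepsize
print(x)   # approaches the median of a
```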

## Convex Optimization Homework Help | NOTES, SOURCES, AND EXERCISES

Section 3.1: Subgradients are central in the work of Fenchel [Fen51]. The original theorem by Danskin [Dan67] provides a formula for the directional derivative of the maximum of a (not necessarily convex) directionally differentiable function. When adapted to a convex function $f$, this formula yields Eq. (3.10) for the subdifferential of $f$; see Exercise 3.5.

Another important subdifferential formula relates to the subgradients of an expected value function
$$f(x)=E\{F(x, \omega)\},$$

where $\omega$ is a random variable taking values in a set $\Omega$, and $F(\cdot, \omega): \Re^n \mapsto \Re$ is a real-valued convex function such that $f$ is real-valued (note that $f$ is easily verified to be convex). If $\omega$ takes a finite number of values with probabilities $p(\omega)$, then the formulas
$$f^{\prime}(x ; d)=E\left\{F^{\prime}(x, \omega ; d)\right\}, \quad \partial f(x)=E\{\partial F(x, \omega)\},$$
hold because they can be written in terms of finite sums as
$$f^{\prime}(x ; d)=\sum_{\omega \in \Omega} p(\omega) F^{\prime}(x, \omega ; d), \quad \partial f(x)=\sum_{\omega \in \Omega} p(\omega) \partial F(x, \omega),$$
so Prop. 3.1.3(b) applies. However, the formulas (3.32) hold even in the case where $\Omega$ is uncountably infinite, with appropriate mathematical interpretation of the integral of set-valued functions $E\{\partial F(x, \omega)\}$ as the set of integrals
$$\int_{\omega \in \Omega} g(x, \omega) d P(\omega)$$
where $g(x, \omega) \in \partial F(x, \omega), \omega \in \Omega$ (measurability issues must be addressed in this context). For a formal proof and analysis, see the author’s papers [Ber72], [Ber73], which also provide a necessary and sufficient condition for $f$ to be differentiable, even when $F(\cdot, \omega)$ is not. In this connection, it is important to note that the integration over $\omega$ in Eq. (3.33) may smooth out the nondifferentiabilities of $F(\cdot, \omega)$ if $\omega$ is a “continuous” random variable. This property can be used in turn in algorithms, including schemes that bring to bear the methodology of differentiable optimization; see e.g., Yousefian, Nedić, and Shanbhag [YNS10], [YNS12], Agarwal and Duchi [AgD11], Duchi, Bartlett, and Wainwright [DBW12], Brown and Smith [BrS13], Abernethy et al. [ALS14], and Jiang and Zhang [JiZ14].

## Connection with Incremental Subgradient Methods

For minimizing an additive cost function
$$f(x)=\sum_{i=1}^m f_i(x)$$
over a closed convex set $X$, the incremental subgradient method processes the components $f_i$ sequentially within each iteration: starting from $\psi_{0, k}=x_k$, it sets
$$\psi_{i, k}=P_X\left(\psi_{i-1, k}-\alpha_k g_{i, k}\right), \quad i=1, \ldots, m,$$
where $g_{i, k}$ is a subgradient of $f_i$ at $\psi_{i-1, k}$, and then takes
$$x_{k+1}=\psi_{m, k}.$$
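A minimal sketch of this incremental subgradient iteration; the components $f_i(x)=|x-a_i|$, the box constraint $X=[-1,3]$, and the diminishing stepsize are illustrative assumptions:

```python
import numpy as np

# Incremental subgradient method for f(x) = sum_i |x - a_i| over X = [-1, 3];
# the data a, the constraint, and the stepsize rule are assumptions chosen
# to make the iteration concrete.
a = np.array([0.0, 1.0, 2.0, 5.0])    # f_i(x) = |x - a_i|, i = 1,...,m

def project(x):                        # P_X for X = [-1, 3]
    return min(max(x, -1.0), 3.0)

def f(x):                              # f(x) = sum_i f_i(x)
    return float(np.sum(np.abs(x - a)))

x = -1.0                               # x_0
for k in range(200):
    alpha = 1.0 / (k + 1)              # diminishing stepsize alpha_k
    psi = x                            # psi_{0,k} = x_k
    for ai in a:                       # psi_{i,k} = P_X(psi_{i-1,k} - alpha g_{i,k})
        g = np.sign(psi - ai)          # subgradient of |x - a_i| at psi_{i-1,k}
        psi = project(psi - alpha * g)
    x = psi                            # x_{k+1} = psi_{m,k}
# x approaches the minimizing interval [1, 2] of f over X
```

With the diminishing stepsize, the iterates settle near the median interval $[1, 2]$ of the points $a_i$ intersected with $X$, the solution set of this instance.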

## Linear Convergence Rate of Incremental Gradient Method

This exercise quantifies the rate of convergence of the incremental gradient method to the “region of confusion” (cf. Fig. 2.1.11), for any order of processing the additive cost components, assuming these components are positive definite quadratic. Consider the incremental gradient method
$$x_{k+1}=x_k-\alpha \nabla f_k\left(x_k\right) \quad k=0,1, \ldots,$$
where $f_0, f_1, \ldots$ are quadratic functions whose Hessian matrices have eigenvalues lying within some interval $[\gamma, \Gamma]$, where $\gamma>0$. Suppose that for a given $\epsilon>0$, there is a vector $x^*$ such that
$$\left\|\nabla f_k\left(x^*\right)\right\| \leq \epsilon, \quad \forall k=0,1, \ldots$$

Show that for all $\alpha$ with $0<\alpha \leq 2 /(\gamma+\Gamma)$, the generated sequence $\left\{x_k\right\}$ converges to a $2 \epsilon / \gamma$-neighborhood of $x^*$, i.e.,
$$\limsup _{k \rightarrow \infty}\left\|x_k-x^*\right\| \leq \frac{2 \epsilon}{\gamma} .$$
Moreover the rate of convergence to this neighborhood is linear, in the sense that
$$\left\|x_k-x^*\right\|>\frac{2 \epsilon}{\gamma} \quad \Rightarrow \quad\left\|x_{k+1}-x^*\right\|<\left(1-\frac{\alpha \gamma}{2}\right)\left\|x_k-x^*\right\|,$$
while
$$\left\|x_k-x^*\right\| \leq \frac{2 \epsilon}{\gamma} \quad \Rightarrow \quad\left\|x_{k+1}-x^*\right\| \leq \frac{2 \epsilon}{\gamma} .$$
Hint: Let $f_k(x)=\frac{1}{2} x^{\prime} Q_k x-b_k^{\prime} x$, where $Q_k$ is positive definite symmetric, and write
$$x_{k+1}-x^*=\left(I-\alpha Q_k\right)\left(x_k-x^*\right)-\alpha \nabla f_k\left(x^*\right) .$$
For other related convergence rate results, see [NeB00] and [Sch14a].
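The claimed neighborhood can be illustrated numerically. The following sketch uses an assumed random instance (cyclically processed quadratic components with Hessian eigenvalues in $[\gamma, \Gamma]$), not a proof:

```python
import numpy as np

# Numerical illustration of the exercise: quadratic components
# f_k(x) = (1/2) x' Q_k x - b_k' x processed cyclically, with Hessian
# eigenvalues in [gamma, Gamma] and alpha <= 2/(gamma + Gamma).
rng = np.random.default_rng(0)
n, m = 2, 5
gamma, Gamma = 1.0, 3.0

Qs, bs = [], []
for _ in range(m):
    U, _ = np.linalg.qr(rng.standard_normal((n, n)))
    eig = rng.uniform(gamma, Gamma, n)           # eigenvalues in [gamma, Gamma]
    Qs.append(U @ np.diag(eig) @ U.T)            # Q_k positive definite symmetric
    bs.append(rng.standard_normal(n))

# take x* as the minimizer of the sum, and eps = max_k ||grad f_k(x*)||,
# so that ||grad f_k(x*)|| <= eps for all k as in the exercise
x_star = np.linalg.solve(sum(Qs), sum(bs))
eps = max(np.linalg.norm(Q @ x_star - b) for Q, b in zip(Qs, bs))

alpha = 2.0 / (gamma + Gamma)
x = np.zeros(n)
for k in range(2000):
    Q, b = Qs[k % m], bs[k % m]
    x = x - alpha * (Q @ x - b)                  # x_{k+1} = x_k - alpha grad f_k(x_k)

dist = np.linalg.norm(x - x_star)
# per the exercise, the iterates end up within 2 eps / gamma of x*
```

The final distance lands inside the $2\epsilon/\gamma$ "region of confusion" predicted by the exercise.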

## Proximal Gradient Method, $\ell_1$-Regularization, and the Shrinkage Operation

The proximal gradient iteration (2.27) is well suited for problems involving a nondifferentiable function component that is convenient for a proximal iteration. This exercise considers the important case of the $\ell_1$ norm. Consider the problem
$$\begin{aligned} & \operatorname{minimize} \; f(x)+\gamma\|x\|_1 \\ & \text {subject to } x \in \Re^n, \end{aligned}$$
where $f: \Re^n \mapsto \Re$ is a differentiable convex function, $\|\cdot\|_1$ is the $\ell_1$ norm, and $\gamma>0$. The proximal gradient iteration is given by the gradient step
$$z_k=x_k-\alpha \nabla f\left(x_k\right)$$
followed by the proximal step
$$x_{k+1} \in \arg \min _{x \in \Re^n}\left\{\gamma\|x\|_1+\frac{1}{2 \alpha}\left\|x-z_k\right\|^2\right\}$$
[cf. Eq. (2.28)]. Show that the proximal step can be performed separately for each coordinate $x^i$ of $x$, and is given by the so-called shrinkage operation:
$$x_{k+1}^i=\left\{\begin{array}{ll} z_k^i-\alpha \gamma & \text { if } z_k^i>\alpha \gamma, \\ 0 & \text { if }\left|z_k^i\right| \leq \alpha \gamma, \\ z_k^i+\alpha \gamma & \text { if } z_k^i<-\alpha \gamma, \end{array} \quad i=1, \ldots, n .\right.$$
Note: Since the shrinkage operation tends to set many coordinates $x_{k+1}^i$ to 0, it tends to produce "sparse" iterates.
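The shrinkage operation and the full proximal gradient iteration can be sketched as follows; the smooth part $f(x)=\frac{1}{2}\|Ax-b\|^2$, the data $A, b$, and the stepsize choice $\alpha = 1/\|A\|_2^2$ are illustrative assumptions:

```python
import numpy as np

# Proximal gradient (ISTA) for (1/2)||Ax - b||^2 + gamma ||x||_1;
# the quadratic f and the random data are assumptions for illustration.
def shrink(z, t):
    # coordinatewise shrinkage (soft-thresholding) with threshold t:
    # exactly the three-case formula in the exercise
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def proximal_gradient(A, b, gam, alpha, steps):
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        z = x - alpha * (A.T @ (A @ x - b))   # gradient step z_k
        x = shrink(z, alpha * gam)            # proximal (shrinkage) step
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 10))
x_true = np.zeros(10)
x_true[[1, 4]] = [2.0, -3.0]                  # sparse ground truth
b = A @ x_true
alpha = 1.0 / np.linalg.norm(A, 2) ** 2       # stepsize 1/L, L = ||A||_2^2
x_hat = proximal_gradient(A, b, gam=0.1, alpha=alpha, steps=500)
# shrinkage sets many coordinates of x_hat exactly to 0 ("sparse" iterates)
```

Because each proximal step thresholds coordinates to exactly zero, the iterates are sparse throughout, not just in the limit.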

## Armijo/Backtracking Stepsize Rule

Consider minimization of a continuously differentiable function $f: \Re^n \mapsto \Re$, using the iteration
$$x_{k+1}=x_k+\alpha_k d_k$$
where $d_k$ is a descent direction. Given fixed scalars $\beta$ and $\sigma$ with $0<\beta<1$, $0<\sigma<1$, and a sequence $\{s_k\}$ with $\inf _{k \geq 0} s_k>0$, the stepsize $\alpha_k$ is determined as follows: we set $\alpha_k=\beta^{m_k} s_k$, where $m_k$ is the first nonnegative integer $m$ for which
$$f\left(x_k\right)-f\left(x_k+\beta^m s_k d_k\right) \geq-\sigma \beta^m s_k \nabla f\left(x_k\right)^{\prime} d_k$$
Assume that there exist positive scalars $c_1, c_2$ such that for all $k$ we have
$$c_1\left\|\nabla f\left(x_k\right)\right\|^2 \leq-\nabla f\left(x_k\right)^{\prime} d_k, \quad\left\|d_k\right\|^2 \leq c_2\left\|\nabla f\left(x_k\right)\right\|^2$$
(a) Show that the stepsize $\alpha_k$ is well-defined, i.e., that it will be determined after a finite number of reductions if $\nabla f\left(x_k\right) \neq 0$. Proof: We have for all $s>0$
$$f\left(x_k+s d_k\right)-f\left(x_k\right)=s \nabla f\left(x_k\right)^{\prime} d_k+o(s)$$
Thus the test for acceptance of a stepsize $s>0$ is written as
$$s \nabla f\left(x_k\right)^{\prime} d_k+o(s) \leq \sigma s \nabla f\left(x_k\right)^{\prime} d_k,$$
or using Eq. (2.72),
$$\frac{o(s)}{s} \leq(1-\sigma) c_1\left\|\nabla f\left(x_k\right)\right\|^2,$$
which is satisfied for $s$ in some interval $\left(0, \bar{s}_k\right]$. Thus the test will be passed for all $m$ for which $\beta^m s_k \leq \bar{s}_k$.

(b) Show that every limit point $\bar{x}$ of the generated sequence $\left\{x_k\right\}$ satisfies $\nabla f(\bar{x})=0$. Proof: Assume, to arrive at a contradiction, that there is a subsequence $\left\{x_k\right\}_{\mathcal{K}}$ that converges to some $\bar{x}$ with $\nabla f(\bar{x}) \neq 0$. Since $\left\{f\left(x_k\right)\right\}$ is monotonically nonincreasing, $\left\{f\left(x_k\right)\right\}$ either converges to a finite value or diverges to $-\infty$. Since $f$ is continuous, $f(\bar{x})$ is a limit point of $\left\{f\left(x_k\right)\right\}$, so it follows that the entire sequence $\left\{f\left(x_k\right)\right\}$ converges to $f(\bar{x})$. Hence,
$$f\left(x_k\right)-f\left(x_{k+1}\right) \rightarrow 0$$
By the definition of the Armijo rule and the descent property $\nabla f\left(x_k\right)^{\prime} d_k \leq$ 0 of the direction $d_k$, we have
$$f\left(x_k\right)-f\left(x_{k+1}\right) \geq-\sigma \alpha_k \nabla f\left(x_k\right)^{\prime} d_k \geq 0,$$
so by combining the preceding two relations,
$$\alpha_k \nabla f\left(x_k\right)^{\prime} d_k \rightarrow 0$$
From the left side of Eq. (2.72) and the hypothesis $\nabla f(\bar{x}) \neq 0$, it follows that
$$\limsup _{\substack{k \rightarrow \infty \\ k \in \mathcal{K}}} \nabla f\left(x_k\right)^{\prime} d_k<0,$$
which together with Eq. (2.73) implies that
$$\left\{\alpha_k\right\}_{\mathcal{K}} \rightarrow 0 .$$
Since $s_k$, the initial trial value for $\alpha_k$, is bounded away from 0, the stepsize will be reduced at least once for all $k \in \mathcal{K}$ that are greater than some iteration index $\bar{k}$. Thus we must have for all $k \in \mathcal{K}$ with $k>\bar{k}$
$$f\left(x_k\right)-f\left(x_k+\left(\alpha_k / \beta\right) d_k\right)<-\sigma\left(\alpha_k / \beta\right) \nabla f\left(x_k\right)^{\prime} d_k$$
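The backtracking rule itself is short to implement. A minimal sketch with $\beta=1/2$, $\sigma=0.1$, $s_k=1$, and the illustrative (assumed) cost $f(x)=\frac{1}{2}x^{\prime}Qx$ with $Q=\operatorname{diag}(1, 10)$, using the steepest descent direction $d_k=-\nabla f(x_k)$:

```python
import numpy as np

# Armijo/backtracking stepsize rule: alpha_k = beta^m s_k for the first
# nonnegative integer m passing the sufficient-decrease test above.
# The quadratic cost is an assumption chosen for illustration.
Q = np.diag([1.0, 10.0])

def f(x):
    return 0.5 * float(x @ Q @ x)

def grad(x):
    return Q @ x

def armijo_stepsize(x, d, s=1.0, beta=0.5, sigma=0.1):
    # first m with f(x) - f(x + beta^m s d) >= -sigma beta^m s grad f(x)' d
    m = 0
    while f(x) - f(x + beta**m * s * d) < -sigma * beta**m * s * float(grad(x) @ d):
        m += 1
    return beta**m * s

x = np.array([4.0, -2.0])
for _ in range(1000):
    d = -grad(x)                       # descent direction (here c1 = c2 = 1)
    x = x + armijo_stepsize(x, d) * d
# the iterates converge to the unique minimizer x = 0
```

For this quadratic the test fails at the initial trial $s=1$ whenever the curvature along $d$ is too large, so the rule backtracks to an accepted stepsize bounded away from zero, consistent with part (a).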

## Convergence of Steepest Descent to a Single Limit

Let $f: \Re^n \mapsto \Re$ be a differentiable convex function, and assume that for some $L>0$, we have
$$\|\nabla f(x)-\nabla f(y)\| \leq L\|x-y\|, \quad \forall x, y \in \Re^n .$$
Let $X^*$ be the set of minima of $f$, and assume that $X^*$ is nonempty. Consider the steepest descent method
$$x_{k+1}=x_k-\alpha_k \nabla f\left(x_k\right)$$
Show that $\left\{x_k\right\}$ converges to a minimizing point of $f$ under each of the following two stepsize rule conditions:
(i) For some $\epsilon>0$, we have
$$\epsilon \leq \alpha_k \leq \frac{2(1-\epsilon)}{L}, \quad \forall k .$$
(ii) $\alpha_k \rightarrow 0$ and $\sum_{k=0}^{\infty} \alpha_k=\infty$.
Notes: The original source [BGI95] also shows convergence to a single limit for a variant of the Armijo rule. This should be contrasted with a result of [Gon00], which shows that the steepest descent method with the exact line minimization rule may produce a sequence with multiple limit points (all of which are of course optimal), even for a convex cost function. There is also a "local capture" theorem that applies to gradient methods for nonconvex continuously differentiable cost functions $f$ and an isolated local minimum of $f$ (a local minimum $x^*$ that is unique within a neighborhood of $x^*$). Under mild conditions it asserts that there is an open sphere $S_{x^*}$ centered at $x^*$ such that once the generated sequence $\left\{x_k\right\}$ enters $S_{x^*}$, it converges to $x^*$ (see [Ber82a], Prop. 1.12, or [Ber99], Prop. 1.2.5 and the references given there). Abbreviated Proof: Consider the stepsize rule (i). From the descent inequality (Exercise 2.2), we have for all $k$
$$f\left(x_{k+1}\right) \leq f\left(x_k\right)-\alpha_k\left(1-\frac{\alpha_k L}{2}\right)\left\|\nabla f\left(x_k\right)\right\|^2 \leq f\left(x_k\right)-\epsilon^2\left\|\nabla f\left(x_k\right)\right\|^2$$
so $\left\{f\left(x_k\right)\right\}$ is monotonically nonincreasing and converges. Adding the preceding relation for all values of $k$ and taking the limit as $k \rightarrow \infty$, we obtain for all $x^* \in X^*$,
$$f\left(x^*\right) \leq f\left(x_0\right)-\epsilon^2 \sum_{k=0}^{\infty}\left\|\nabla f\left(x_k\right)\right\|^2$$
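Stepsize rule (i) is easy to exercise numerically. A minimal sketch with a constant $\alpha$ satisfying $\epsilon \leq \alpha \leq 2(1-\epsilon)/L$; the quadratic $f(x)=\frac{1}{2}x^{\prime}Qx$ with $Q=\operatorname{diag}(1, 4)$ (so the gradient is $L$-Lipschitz with $L=4$) is an illustrative assumption:

```python
import numpy as np

# Steepest descent with the constant stepsize of rule (i):
# eps <= alpha <= 2(1 - eps)/L, on an assumed quadratic cost.
Q = np.diag([1.0, 4.0])
L = 4.0                              # Lipschitz constant of grad f
eps = 0.1
alpha = 2 * (1 - eps) / L            # satisfies eps <= alpha <= 2(1-eps)/L

x = np.array([5.0, -3.0])
for _ in range(300):
    x = x - alpha * (Q @ x)          # x_{k+1} = x_k - alpha grad f(x_k)
# {x_k} converges to the unique minimizer x* = 0 (a single limit)
```

Each coordinate contracts by the factor $|1-\alpha\lambda_i|<1$, so the whole sequence converges to the single point $x^*=0$, as the exercise asserts.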
