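Tags: CPD131

Convex Optimization | IE3078

Posted on August 21, 2023 (updated August 28, 2023) by statistics-lab

Convex optimization has attracted growing interest because of its wide applications in large-scale resource allocation, signal processing, and machine learning. This book aims to present recent, accessible developments in algorithms for solving convex optimization problems.

Unconstrained convex optimization problems can often be solved easily with gradient descent (a special case of steepest descent) or Newton's method, combined with a line search for a suitable step size; both can be proven mathematically to converge quickly, especially the latter. Convex optimization with linear equality constraints and a quadratic objective can also be solved with KKT matrix techniques (which generalize to a variant of Newton's method that works even when the initial point does not satisfy the constraints), but such problems can generally also be solved by eliminating the equality constraints with linear algebra or by solving the dual problem.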

Maximum Likelihood Estimation

Suppose $\boldsymbol{x} \in \mathbb{R}^n$ follows a probability density function $f(\boldsymbol{x} ; \boldsymbol{\theta})$ governed by some parameter $\boldsymbol{\theta} \in \Theta$. We know the exact form of the function $f(\cdot)$ but not the value of $\boldsymbol{\theta}$.${ }^1$

We need to determine the value of $\boldsymbol{\theta}$ from a series of data samples $\boldsymbol{x}_i \in \mathbb{R}^n$, $i=1, \ldots, n$. A popular method for this is maximum likelihood estimation (MLE) $[2,3]$.

The likelihood function $L$ is usually defined as the function obtained by reversing the roles of $\boldsymbol{x}$ and $\boldsymbol{\theta}$ as
$$
L\left(\boldsymbol{\theta} ; \boldsymbol{x}_1, \ldots, \boldsymbol{x}_n\right)=f\left(\boldsymbol{x}_1, \ldots, \boldsymbol{x}_n ; \boldsymbol{\theta}\right)
$$
In MLE, we seek a value $\boldsymbol{\theta}^*$ of the parameter $\boldsymbol{\theta}$ that maximizes $L(\boldsymbol{\theta} ; \boldsymbol{x}_1, \ldots, \boldsymbol{x}_n)$ for the given data samples. Such a $\boldsymbol{\theta}^*$ is called a maximum likelihood estimator of $\boldsymbol{\theta}$. In this book, we assume the number of data samples is large enough to yield a good estimate. Further discussion of the maximum likelihood method can be found in [4].

If these data are independent and identically distributed (i.i.d.), the likelihood function of the samples can be written as
$$
L\left(\boldsymbol{\theta} ; \boldsymbol{x}_1, \ldots, \boldsymbol{x}_n\right)=\prod_{i=1}^n f\left(\boldsymbol{x}_i ; \boldsymbol{\theta}\right)
$$
Thus, the parameter estimation problem can be formulated as the following optimization problem:
$$
\max_{\boldsymbol{\theta}} L\left(\boldsymbol{\theta} ; \boldsymbol{x}_1, \ldots, \boldsymbol{x}_n\right)
$$
Notice that the natural logarithm $\ln (\cdot)$ is concave and strictly increasing. Hence, if the maximum value of $L(\boldsymbol{\theta} ; \boldsymbol{x})$ exists, it is attained at the same points as the maximum of $\ln [L(\boldsymbol{\theta} ; \boldsymbol{x})]$. This function is called the log likelihood function. In many cases it is easier to work with than the likelihood function itself, because the logarithm turns the product structure of $L(\boldsymbol{\theta} ; \boldsymbol{x})$ into a sum:
$$
\ln L\left(\boldsymbol{\theta} ; \boldsymbol{x}_1, \ldots, \boldsymbol{x}_n\right)=\sum_{i=1}^n \ln f\left(\boldsymbol{x}_i ; \boldsymbol{\theta}\right)
$$
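The following minimal sketch assumes a hypothetical model: a one-dimensional normal density with known standard deviation σ and unknown mean θ, fitted to made-up data that are not from the text. The code maximizes the log likelihood, written as a sum of log densities, over a grid of candidate means. The grid maximizer coincides with the sample mean, which is the known closed-form MLE of the mean in this model. The direct product form is included only to show that its logarithm matches the sum.

```python
import math

def normal_log_likelihood(theta, sigma, samples):
    """ln L(theta; x_1..x_n) = sum_i ln f(x_i; theta) for a 1-D normal density
    with mean theta and known standard deviation sigma (i.i.d. samples)."""
    n = len(samples)
    squared_error = sum((x - theta) ** 2 for x in samples)
    return (-n * math.log(sigma)
            - 0.5 * n * math.log(2 * math.pi)
            - squared_error / (2 * sigma ** 2))

def likelihood(theta, sigma, samples):
    """L(theta; x_1..x_n) as a plain product of densities (may underflow for large n)."""
    prod = 1.0
    for x in samples:
        density = math.exp(-((x - theta) ** 2) / (2 * sigma ** 2))
        prod *= density / (sigma * math.sqrt(2 * math.pi))
    return prod

samples = [2.1, 1.9, 2.4, 2.0, 1.6]   # hypothetical data
sigma = 0.5                           # assumed known

# Brute-force maximization of the log likelihood over candidate means in [0, 4]
grid = [i / 1000 for i in range(4001)]
theta_hat = max(grid, key=lambda t: normal_log_likelihood(t, sigma, samples))
sample_mean = sum(samples) / len(samples)
print(theta_hat, sample_mean)
```

Working with the sum of logarithms avoids the numerical underflow that the plain product can suffer when there are many samples.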

Measurements with i.i.d. Noise

In many cases, we need to determine a vector parameter $\boldsymbol{\vartheta} \in \mathbb{R}^p$ from a series of input-output measurement pairs $\left(\boldsymbol{x}_i, y_i\right)$, $\boldsymbol{x}_i \in \mathbb{R}^p$, $y_i \in \mathbb{R}$, $i=1, \ldots, n$. The linear relationship between these variables can be written as
$$
y_i=\boldsymbol{\vartheta}^T \boldsymbol{x}_i+v_i
$$
where $v_i \in \mathbb{R}$ are i.i.d. random variables whose probability density function $f(v)$ is known to us.

We can again use maximum likelihood estimation (MLE) to estimate $\boldsymbol{\vartheta}$. Since $v_i=y_i-\boldsymbol{\vartheta}^T \boldsymbol{x}_i$, the corresponding likelihood function is
$$
L\left(\boldsymbol{\vartheta} ;\left(\boldsymbol{x}_1, y_1\right), \ldots,\left(\boldsymbol{x}_n, y_n\right)\right)=\prod_{i=1}^n f\left(y_i-\boldsymbol{\vartheta}^T \boldsymbol{x}_i\right)
$$
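If the probability density function $f(v)$ is log-concave, the estimation of $\boldsymbol{\vartheta}$ can be formulated as a convex optimization problem in $\boldsymbol{\vartheta}$. Each term $\ln f(y_i-\boldsymbol{\vartheta}^T \boldsymbol{x}_i)$ is a concave function composed with an affine function of $\boldsymbol{\vartheta}$, and is therefore concave. Maximizing the log likelihood is thus equivalent to minimizing the convex function $-\sum_{i=1}^n \ln f(y_i-\boldsymbol{\vartheta}^T \boldsymbol{x}_i)$.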

Example 3.7 If $v_i$ follows a normal distribution with mean $\mu$ and variance $\sigma^2$, the log likelihood function is
$$
\ln L\left(\boldsymbol{\vartheta} ;\left(\boldsymbol{x}_1, y_1\right), \ldots,\left(\boldsymbol{x}_n, y_n\right)\right)=-n \ln \sigma-\frac{n}{2} \ln 2 \pi-\frac{1}{2 \sigma^2} \sum_{i=1}^n\left(y_i-\boldsymbol{\vartheta}^T \boldsymbol{x}_i-\mu\right)^2
$$
Thus, the maximum likelihood estimator of $\boldsymbol{\vartheta}$ is the optimal solution of the following least squares approximation problem (see also the discussion in Sect. 4.1):
$$
\min_{\boldsymbol{\vartheta}} \sum_{i=1}^n\left(y_i-\boldsymbol{\vartheta}^T \boldsymbol{x}_i-\mu\right)^2
$$
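Below is a minimal sketch of Example 3.7 under stated assumptions. The noise mean μ is known, p is small, and the normal equations $(X^TX)\vartheta = X^T(y - \mu\mathbf{1})$ are solved by Gaussian elimination in plain Python. The function name `least_squares_mle` and the data are hypothetical. The data are noise-free by construction, so the estimate should reproduce the ϑ used to generate them. For real problems, a library routine such as NumPy's `numpy.linalg.lstsq` is likely the more robust choice.

```python
def least_squares_mle(X, y, mu):
    """Solve min_theta sum_i (y_i - theta^T x_i - mu)^2 via the normal equations
    (X^T X) theta = X^T (y - mu), assuming X^T X is nonsingular."""
    p = len(X[0])
    r = [yi - mu for yi in y]                       # shift out the known noise mean
    A = [[sum(x[a] * x[b] for x in X) for b in range(p)] for a in range(p)]
    rhs = [sum(x[a] * ri for x, ri in zip(X, r)) for a in range(p)]
    # Forward elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda row: abs(A[row][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for row in range(col + 1, p):
            factor = A[row][col] / A[col][col]
            for k in range(col, p):
                A[row][k] -= factor * A[col][k]
            rhs[row] -= factor * rhs[col]
    # Back substitution
    theta = [0.0] * p
    for row in reversed(range(p)):
        s = rhs[row] - sum(A[row][k] * theta[k] for k in range(row + 1, p))
        theta[row] = s / A[row][row]
    return theta

# Hypothetical noise-free data generated from theta = [2, -1] and mu = 0.5
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
mu = 0.5
y = [2 * x[0] - x[1] + mu for x in X]
theta_hat = least_squares_mle(X, y, mu)
print(theta_hat)
```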
Example 3.8 If $v_i$ follows a Laplacian distribution with probability density function written as
$$
f(v)=\frac{1}{2 \tau} e^{-|v-\mu| / \tau}
$$
where $\tau>0$ and $\mu$ is the mean value.
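Taking logarithms gives $\ln f(v) = -\ln(2\tau) - |v-\mu|/\tau$, which is concave in $v$. The Laplacian density is therefore log-concave, and the convexity remark above applies. The sketch below makes an extra assumption that the text does not: a scalar model $y_i = \vartheta + v_i$ (that is, $p=1$ and $x_i=1$) with $\mu = 0$ and $\tau = 1$. Under that assumption, maximizing the Laplacian log likelihood on a grid picks out the sample median, a standard property of this density. The measurements are hypothetical.

```python
import math

def laplace_log_pdf(v, mu, tau):
    """ln f(v) for f(v) = exp(-|v - mu| / tau) / (2 tau), with tau > 0."""
    return -math.log(2 * tau) - abs(v - mu) / tau

def log_likelihood(theta, y, mu=0.0, tau=1.0):
    """Sum of ln f(y_i - theta) for the assumed scalar model y_i = theta + v_i."""
    return sum(laplace_log_pdf(yi - theta, mu, tau) for yi in y)

y = [0.3, -1.2, 0.8, 5.0, 0.1]                # hypothetical measurements (one outlier)
grid = [i / 100 for i in range(-300, 601)]    # candidate theta in [-3, 6]
theta_hat = max(grid, key=lambda t: log_likelihood(t, y))
median = sorted(y)[len(y) // 2]
print(theta_hat, median)
```

Because the log likelihood penalizes $|y_i - \vartheta|$ rather than its square, the estimate stays at the median 0.3 despite the outlier 5.0, while the sample mean of these values is 1.0.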


Convex Optimization | ESE415

Posted on August 21, 2023 (updated August 28, 2023) by statistics-lab

Kernel SVM

In many situations, the data cannot be separated by a hyperplane. Instead of the linear classification function (2.1), we then design a nonlinear classification function
$$
H^{\prime}(\boldsymbol{x})=\boldsymbol{w}^T \phi(\boldsymbol{x})+b
$$
where $\phi(\cdot)$ is a nonlinear mapping of $\boldsymbol{x}$ into a feature space. Any point $\boldsymbol{x}$ with $H^{\prime}(\boldsymbol{x})>0$ is classified as Class I, and any point $\boldsymbol{x}$ with $H^{\prime}(\boldsymbol{x})<0$ is classified as Class II.

Suppose we use this nonlinear classification function and obtain the following convex optimization problem
$$
\begin{array}{ll}
\min_{\boldsymbol{w}, b} & \frac{1}{2}\|\boldsymbol{w}\|_2^2 \\
\text { s.t. } & y_i\left[\boldsymbol{w}^T \phi\left(\boldsymbol{x}_i\right)+b\right] \geq 1, \quad i=1, \ldots, L
\end{array}
$$
We form the generalized Lagrangian function as
$$
L(\boldsymbol{w}, b, \boldsymbol{\alpha})=\frac{1}{2}\|\boldsymbol{w}\|_2^2-\sum_{i=1}^L \alpha_i\left[y_i\left(\boldsymbol{w}^T \phi\left(\boldsymbol{x}_i\right)+b\right)-1\right]
$$
Setting its partial derivatives with respect to $\boldsymbol{w}$ and $b$ to zero, we have
$$
\begin{aligned}
& \frac{\partial L(\boldsymbol{w}, b, \boldsymbol{\alpha})}{\partial \boldsymbol{w}}=0 \Longrightarrow \boldsymbol{w}=\sum_{i=1}^L \alpha_i y_i \phi\left(\boldsymbol{x}_i\right) \\
& \frac{\partial L(\boldsymbol{w}, b, \boldsymbol{\alpha})}{\partial b}=0 \Longrightarrow \sum_{i=1}^L \alpha_i y_i=0
\end{aligned}
$$
Substituting these conditions to eliminate the primal decision variables $\boldsymbol{w}$ and $b$, we obtain the objective of the Lagrange dual problem:
$$
\begin{aligned}
L(\boldsymbol{w}, b, \boldsymbol{\alpha}) & =\frac{1}{2} \boldsymbol{w}^T\left[\sum_{i=1}^L \alpha_i y_i \phi\left(\boldsymbol{x}_i\right)\right]-\sum_{i=1}^L \alpha_i y_i \boldsymbol{w}^T \phi\left(\boldsymbol{x}_i\right)-\sum_{i=1}^L \alpha_i y_i b+\sum_{i=1}^L \alpha_i \\
& =-\frac{1}{2} \boldsymbol{w}^T \sum_{i=1}^L \alpha_i y_i \phi\left(\boldsymbol{x}_i\right)-\sum_{i=1}^L \alpha_i y_i b+\sum_{i=1}^L \alpha_i \\
& =-\frac{1}{2}\left[\sum_{i=1}^L \alpha_i y_i \phi\left(\boldsymbol{x}_i\right)\right]^T \sum_{i=1}^L \alpha_i y_i \phi\left(\boldsymbol{x}_i\right)-b\left(\sum_{i=1}^L \alpha_i y_i\right)+\sum_{i=1}^L \alpha_i \\
& =\sum_{i=1}^L \alpha_i-\frac{1}{2} \sum_{i=1}^L \sum_{j=1}^L y_i y_j \alpha_i \alpha_j\left[\phi\left(\boldsymbol{x}_i\right)\right]^T \phi\left(\boldsymbol{x}_j\right)
\end{aligned}
$$
where the last step uses $\sum_{i=1}^L \alpha_i y_i=0$. The dual problem maximizes this objective over $\boldsymbol{\alpha} \geq 0$ subject to $\sum_{i=1}^L \alpha_i y_i=0$. The data enter only through the inner products $\phi(\boldsymbol{x}_i)^T \phi(\boldsymbol{x}_j)$, which can be replaced by a kernel function $\Theta(\boldsymbol{x}_i, \boldsymbol{x}_j)$.
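The derivation shows that the dual objective depends on the data only through the inner products $\phi(\boldsymbol{x}_i)^T\phi(\boldsymbol{x}_j)$. As a minimal sketch, assume the degree-2 homogeneous polynomial kernel $\Theta(\boldsymbol{x},\boldsymbol{z}) = (\boldsymbol{x}^T\boldsymbol{z})^2$ on $\mathbb{R}^2$. This is a standard example chosen for illustration, not taken from the text, and its explicit feature map is $\phi(\boldsymbol{x}) = (x_1^2, \sqrt{2}x_1x_2, x_2^2)$. The code evaluates the dual objective twice, once through the kernel and once through the explicit features, and the two values agree. The points, labels, and multipliers are hypothetical, and no solver for the dual problem is included.

```python
import math

def phi(x):
    """Explicit feature map for the degree-2 homogeneous polynomial kernel on R^2."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

def poly_kernel(x, z):
    """Theta(x, z) = (x^T z)^2, which equals phi(x)^T phi(z)."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def dual_objective(alpha, y, X, kernel):
    """sum_i alpha_i - 1/2 sum_i sum_j y_i y_j alpha_i alpha_j Theta(x_i, x_j)."""
    n = len(alpha)
    quad = sum(y[i] * y[j] * alpha[i] * alpha[j] * kernel(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad

def dual_objective_explicit(alpha, y, X):
    """Same value as sum_i alpha_i - 1/2 ||w||^2 with w = sum_i alpha_i y_i phi(x_i)."""
    w = [0.0, 0.0, 0.0]
    for a, yi, xi in zip(alpha, y, X):
        w = [wk + a * yi * fk for wk, fk in zip(w, phi(xi))]
    return sum(alpha) - 0.5 * sum(wk * wk for wk in w)

X = [[1.0, 2.0], [-1.0, 0.5], [0.0, -1.0]]   # hypothetical training points
y = [1, -1, 1]
alpha = [0.2, 0.5, 0.3]                       # alpha >= 0 and sum_i alpha_i y_i = 0
print(dual_objective(alpha, y, X, poly_kernel), dual_objective_explicit(alpha, y, X))
```

The kernel form never builds $\boldsymbol{w}$ or $\phi(\boldsymbol{x})$ explicitly. This is what makes high- or infinite-dimensional feature maps usable in practice.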

Multi-kernel SVM

One problem of kernel methods is that the resulting decision function is sometimes hard to interpret, which makes it difficult to extract relevant knowledge about the problem. We can address this by considering weighted combinations of $K$ kernel functions, each of which has a distinct meaning. The resulting multi-kernel SVM then uses the kernel
$$
\begin{aligned}
\Theta(\boldsymbol{x}, \boldsymbol{y}) & =\phi(\boldsymbol{x})^T \phi(\boldsymbol{y})=\left[\sum_{k=1}^K \beta_k \phi_k(\boldsymbol{x})\right]^T\left[\sum_{k=1}^K \beta_k \phi_k(\boldsymbol{y})\right] \\
& =\sum_{k=1}^K \beta_k^2 \phi_k(\boldsymbol{x})^T \phi_k(\boldsymbol{y})=\sum_{k=1}^K \beta_k^2 \Theta_k(\boldsymbol{x}, \boldsymbol{y})
\end{aligned}
$$
with preselected coefficients $\beta_k \geq 0$. Each kernel function $\Theta_k(\boldsymbol{x}, \boldsymbol{y})$ uses only a distinct set of features, so the cross terms $\phi_k(\boldsymbol{x})^T \phi_l(\boldsymbol{y})$ with $k \neq l$ vanish.
Suppose the primal problem is written as
$$
\begin{array}{ll}
\min_{\boldsymbol{w}, b} & \frac{1}{2} \sum_{k=1}^K \beta_k\left\|\boldsymbol{w}_k\right\|_2^2 \\
\text { s.t. } & y_i\left\{\sum_{k=1}^K\left[\beta_k \boldsymbol{w}_k^T \phi_k\left(\boldsymbol{x}_i\right)\right]+b\right\} \geq 1, \quad i=1, \ldots, L
\end{array}
$$
We form the generalized Lagrangian function as
$$
L(\boldsymbol{w}, b, \boldsymbol{\alpha})=\frac{1}{2} \sum_{k=1}^K \beta_k\left\|\boldsymbol{w}_k\right\|_2^2-\sum_{i=1}^L \alpha_i\left\{y_i \sum_{k=1}^K\left[\beta_k \boldsymbol{w}_k^T \phi_k\left(\boldsymbol{x}_i\right)\right]+y_i b-1\right\}
$$
Setting its partial derivatives with respect to each $\boldsymbol{w}_k$ and $b$ to zero, we have (for $\beta_k>0$)
$$
\begin{aligned}
& \frac{\partial L(\boldsymbol{w}, b, \boldsymbol{\alpha})}{\partial \boldsymbol{w}_k}=0 \Longrightarrow \boldsymbol{w}_k=\sum_{i=1}^L \alpha_i y_i \phi_k\left(\boldsymbol{x}_i\right) \\
& \frac{\partial L(\boldsymbol{w}, b, \boldsymbol{\alpha})}{\partial b}=0 \Longrightarrow \sum_{i=1}^L \alpha_i y_i=0
\end{aligned}
$$
Eliminating the primal decision variables $\boldsymbol{w}$ and $b$, we obtain the objective of the Lagrange dual problem:
$$
\begin{aligned}
L(\boldsymbol{w}, b, \boldsymbol{\alpha}) & =\frac{1}{2} \sum_{k=1}^K \beta_k\left\|\boldsymbol{w}_k\right\|_2^2-\sum_{i=1}^L \alpha_i y_i \sum_{k=1}^K\left[\beta_k \boldsymbol{w}_k^T \phi_k\left(\boldsymbol{x}_i\right)\right]+\sum_{i=1}^L \alpha_i \\
& =\sum_{i=1}^L \alpha_i-\frac{1}{2} \sum_{i=1}^L \sum_{j=1}^L y_i y_j \alpha_i \alpha_j \sum_{k=1}^K \beta_k \phi_k\left(\boldsymbol{x}_i\right)^T \phi_k\left(\boldsymbol{x}_j\right)
\end{aligned}
$$
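The dual objective above weights each base kernel $\Theta_k(\boldsymbol{x}_i,\boldsymbol{x}_j)=\phi_k(\boldsymbol{x}_i)^T\phi_k(\boldsymbol{x}_j)$ by $\beta_k$. The minimal sketch below evaluates that objective for two hypothetical base kernels. A linear kernel sees features 0 and 1, and an RBF kernel sees feature 2, mirroring the requirement that each kernel uses a distinct set of features. The feature split, kernel choices, `gamma`, data, multipliers, and weights are all assumptions for illustration, and the code does not solve the dual problem.

```python
import math

def linear_kernel(u, v):
    return sum(a * b for a, b in zip(u, v))

def rbf_kernel(u, v, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical split of features: kernel k only sees the coordinates in FEATURE_BLOCKS[k]
FEATURE_BLOCKS = [[0, 1], [2]]
BASE_KERNELS = [linear_kernel, rbf_kernel]

def block_kernel(k, x, z):
    """Theta_k evaluated on the k-th block of features only."""
    idx = FEATURE_BLOCKS[k]
    return BASE_KERNELS[k]([x[i] for i in idx], [z[i] for i in idx])

def multi_kernel(x, z, beta):
    """Combined kernel sum_k beta_k * Theta_k(x, z), as weighted in the dual objective."""
    return sum(b * block_kernel(k, x, z) for k, b in enumerate(beta))

def dual_objective(alpha, y, X, beta):
    """sum_i alpha_i - 1/2 sum_{i,j} y_i y_j alpha_i alpha_j sum_k beta_k Theta_k(x_i, x_j)."""
    n = len(alpha)
    quad = sum(y[i] * y[j] * alpha[i] * alpha[j] * multi_kernel(X[i], X[j], beta)
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad

X = [[1.0, 0.0, 0.3], [0.0, 1.0, -0.2], [1.0, 1.0, 0.9]]   # hypothetical points
y = [1, -1, 1]
alpha = [0.4, 0.7, 0.3]       # alpha >= 0 with sum_i alpha_i y_i = 0
beta = [0.6, 1.5]             # preselected, nonnegative
print(dual_objective(alpha, y, X, beta))
```

Setting one $\beta_k$ to zero switches off the corresponding feature group entirely. This is one way the combined model can remain interpretable.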


Convex Optimization | IEMS458

Posted on August 21, 2023 (updated August 28, 2023) by statistics-lab

Gradient Descent and Coordinate Descent

Let us consider a minimization problem without constraints
$$
\min _{\boldsymbol{x}} f(\boldsymbol{x})
$$
where $\boldsymbol{x} \in \mathbb{R}^n$ is the optimization variable and the function $f: \mathbb{R}^n \mapsto \mathbb{R}$ is the objective function.

Suppose we start from a point $\boldsymbol{x}_0 \in \mathbb{R}^n$. If the function $f(\boldsymbol{x})$ is defined and differentiable in a neighborhood of $\boldsymbol{x}_0$, then $f(\boldsymbol{x})$ decreases fastest if we move from $\boldsymbol{x}_0$ in the direction of the negative gradient $-\nabla f(\boldsymbol{x}_0)$. If we move a small enough distance, that is, for a small enough step size $\gamma_0>0$, we reach a new point
$$
\boldsymbol{x}_1=\boldsymbol{x}_0-\gamma_0 \nabla f\left(\boldsymbol{x}_0\right)
$$

and it follows that
$$
f\left(\boldsymbol{x}_0\right) \geq f\left(\boldsymbol{x}_1\right)
$$
$$
Consider the sequence $\{\boldsymbol{x}_0, \boldsymbol{x}_1, \ldots, \boldsymbol{x}_k, \ldots\}$ generated by
$$
\boldsymbol{x}_{k+1}=\boldsymbol{x}_k-\gamma_k \nabla f\left(\boldsymbol{x}_k\right), \quad \gamma_k>0
$$
If each step size $\gamma_k$ is small enough, we have
$$
f\left(\boldsymbol{x}_0\right) \geq f\left(\boldsymbol{x}_1\right) \geq \cdots \geq f\left(\boldsymbol{x}_k\right) \geq f\left(\boldsymbol{x}_{k+1}\right) \geq \cdots
$$
and, under suitable conditions, this sequence converges to the desired local minimum [9]; see Fig. 1.1 for an illustration.

This search algorithm is called gradient descent. When the function $f(\boldsymbol{x})$ is convex, every local minimum is also a global minimum, so gradient descent can converge to a global solution.
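Here is a minimal sketch of the iteration above. It uses a constant step $\gamma_k = 0.1$ on a hypothetical smooth convex quadratic, $f(\boldsymbol{x}) = (x_1-3)^2 + 2(x_2+1)^2$, for which this step is small enough. The recorded objective values are nonincreasing, and the iterates approach the global minimizer $(3, -1)$. Both the function and the step size are assumptions for illustration. In practice, the step is often chosen by a line search instead.

```python
def f(x):
    """Hypothetical convex quadratic f(x) = (x1 - 3)^2 + 2 (x2 + 1)^2, minimizer (3, -1)."""
    return (x[0] - 3) ** 2 + 2 * (x[1] + 1) ** 2

def grad_f(x):
    return [2 * (x[0] - 3), 4 * (x[1] + 1)]

def gradient_descent(x0, step, iters):
    """Iterate x_{k+1} = x_k - step * grad f(x_k) with a constant step; return all iterates."""
    xs = [list(x0)]
    for _ in range(iters):
        g = grad_f(xs[-1])
        xs.append([xi - step * gi for xi, gi in zip(xs[-1], g)])
    return xs

iterates = gradient_descent([0.0, 0.0], step=0.1, iters=200)
values = [f(x) for x in iterates]
print(iterates[-1], values[-1])
```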

Another widely used search strategy is the coordinate descent algorithm. It is based on the idea that a multivariate function can be minimized by iteratively minimizing it along one coordinate direction at a time [10]. For problem (1.17) above, we cycle through the coordinates one at a time, minimizing the objective function with respect to each coordinate while the others are held fixed:
$$
x_i^{k+1}=\arg \min_{y \in \mathbb{R}} f\left(x_1^{k+1}, \ldots, x_{i-1}^{k+1}, y, x_{i+1}^k, \ldots, x_n^k\right)
$$
This generates a sequence $\{\boldsymbol{x}_0, \boldsymbol{x}_1, \ldots, \boldsymbol{x}_k, \ldots\}$ such that $f(\boldsymbol{x}_0) \geq f(\boldsymbol{x}_1) \geq \cdots \geq f(\boldsymbol{x}_k) \geq \cdots$. Suppose the function is convex and continuously differentiable, and each one-dimensional minimization has a unique solution. Then every limit point of this sequence is a global optimal solution; see Fig. 1.2 for an illustration.
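Take a hypothetical smooth, strictly convex quadratic $f(\boldsymbol{x}) = \tfrac{1}{2}\boldsymbol{x}^TA\boldsymbol{x} - \boldsymbol{b}^T\boldsymbol{x}$ with $A$ symmetric positive definite. Each one-dimensional minimization then has a closed form, so the coordinate update above can be written directly. For this particular $f$ the scheme is the Gauss-Seidel iteration for $A\boldsymbol{x} = \boldsymbol{b}$. The matrix and vector below are made up for illustration, and the sketch is not a general coordinate descent routine.

```python
def f(x, A, b):
    """f(x) = 1/2 x^T A x - b^T x for a symmetric positive definite A."""
    n = len(x)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return 0.5 * quad - sum(bi * xi for bi, xi in zip(b, x))

def coordinate_descent(A, b, x0, sweeps):
    """Cyclic coordinate descent: each update exactly minimizes f along coordinate i,
    using the newest values of the coordinates already updated in this sweep."""
    x = list(x0)
    history = [f(x, A, b)]
    n = len(x)
    for _ in range(sweeps):
        for i in range(n):
            # df/dx_i = (A x)_i - b_i = 0, solved for x_i with the other coordinates fixed
            off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - off_diag) / A[i][i]
        history.append(f(x, A, b))
    return x, history

A = [[4.0, 1.0], [1.0, 3.0]]   # hypothetical SPD matrix
b = [1.0, 2.0]
x_star, history = coordinate_descent(A, b, [0.0, 0.0], sweeps=50)
print(x_star)
```

The exact minimizer here is $A^{-1}\boldsymbol{b} = (1/11, 7/11)$, which the iterates approach while the recorded objective values do not increase.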

Karush-Kuhn-Tucker (KKT) Conditions

However, when the optimization problem has constraints, we cannot simply move in a direction that depends only on the gradient of the objective function. Let us consider the following constrained minimization problem
$$
\begin{aligned}
& \min_{\boldsymbol{x}} f(\boldsymbol{x}) \\
& \text { s.t. } g_i(\boldsymbol{x}) \leq 0, \quad i=1, \ldots, m
\end{aligned}
$$
where the functions $g_i: \mathbb{R}^n \mapsto \mathbb{R}$, $i=1, \ldots, m$, are the inequality constraint functions. Suppose the domain of this problem is denoted by $\Omega$. A point $\boldsymbol{x}^*$ is called an optimal solution of the optimization problem if its objective value is the smallest among all vectors satisfying the constraints.

The Lagrangian function associated with the optimization problem (1.25)-(1.26) is defined as
$$
L(\boldsymbol{x}, \boldsymbol{\alpha})=f(\boldsymbol{x})+\sum_{i=1}^m \alpha_i g_i(\boldsymbol{x})
$$

where $\boldsymbol{\alpha}=\left[\alpha_1, \ldots, \alpha_m\right]^T \in \mathbb{R}_{+}^m$ is called a dual variable or Lagrange multiplier vector. The scalar $\alpha_i$ is referred to as the Lagrange multiplier associated with the $i$th constraint.

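The Lagrange dual function is defined as the infimum of the Lagrangian function over $\boldsymbol{x}$:
$$
q(\boldsymbol{\alpha})=\inf_{\boldsymbol{x} \in \Omega} L(\boldsymbol{x}, \boldsymbol{\alpha})
$$
Denote the optimal value of the primal problem (1.25)-(1.26) by $f^*$. It is easy to show that $q(\boldsymbol{\alpha}) \leq f^*$ for every $\boldsymbol{\alpha} \geq 0$. For any feasible $\boldsymbol{x}$, each term $\alpha_i g_i(\boldsymbol{x}) \leq 0$, so $q(\boldsymbol{\alpha}) \leq L(\boldsymbol{x}, \boldsymbol{\alpha}) \leq f(\boldsymbol{x})$. Taking the infimum over feasible $\boldsymbol{x}$ gives the bound. This property is called weak duality.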
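The following small numeric check of weak duality uses a hypothetical one-dimensional instance of (1.25)-(1.26) with $m = 1$: minimize $f(x) = x^2$ subject to $g(x) = 1 - x \leq 0$, so $\Omega = \mathbb{R}$ and $f^* = 1$. Here the infimum defining $q(\alpha)$ has a closed form. The Lagrangian is minimized at $x = \alpha/2$, giving $q(\alpha) = \alpha - \alpha^2/4 = 1 - (1-\alpha/2)^2 \leq 1$. Scanning $\alpha \geq 0$ on a grid confirms the bound. In this convex example the largest dual value actually equals $f^*$, which is more than weak duality alone guarantees.

```python
def lagrangian(x, alpha):
    """L(x, alpha) = f(x) + alpha * g(x) with f(x) = x^2 and g(x) = 1 - x."""
    return x * x + alpha * (1.0 - x)

def dual_function(alpha):
    """q(alpha) = inf_x L(x, alpha); L is a convex quadratic in x, minimized at x = alpha / 2."""
    return lagrangian(alpha / 2.0, alpha)

f_star = 1.0   # primal optimum: minimize x^2 subject to x >= 1 gives x* = 1

alphas = [k / 100 for k in range(0, 1001)]   # alpha in [0, 10]
q_values = [dual_function(a) for a in alphas]
best_q = max(q_values)
print(best_q)
```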


Convex Optimization | CS168

Posted on July 31, 2023 (updated August 25, 2023) by statistics-lab

Rate of Convergence

The following proposition describes how the convergence rate of the proximal algorithm depends on the magnitude of $c_k$ and on the order of growth of $f$ near the optimal solution set (see also Fig. 5.1.3).

Proposition 5.1.4: (Rate of Convergence) Assume that $X^*$ is nonempty and that for some scalars $\beta>0$, $\delta>0$, and $\gamma \geq 1$, we have
$$
f^*+\beta(d(x))^\gamma \leq f(x), \quad \forall x \in \Re^n \text { with } d(x) \leq \delta,
$$
where
$$
d(x)=\min_{x^* \in X^*}\left\|x-x^*\right\|
$$
Let also
$$
\sum_{k=0}^{\infty} c_k=\infty,
$$
so that the sequence $\{x_k\}$ generated by the proximal algorithm (5.1) converges to some point in $X^*$ by Prop. 5.1.3. Then:

(a) For all $k$ sufficiently large, we have
$$
d\left(x_{k+1}\right)+\beta c_k\left(d\left(x_{k+1}\right)\right)^{\gamma-1} \leq d\left(x_k\right)
$$
if $\gamma>1$, and
$$
d\left(x_{k+1}\right)+\beta c_k \leq d\left(x_k\right),
$$
if $\gamma=1$ and $x_{k+1} \notin X^*$.

(b) (Superlinear Convergence) Let $1<\gamma<2$ and $x_k \notin X^*$ for all $k$. Then if $\inf_{k \geq 0} c_k>0$,
$$
\limsup_{k \rightarrow \infty} \frac{d\left(x_{k+1}\right)}{\left(d\left(x_k\right)\right)^{1 /(\gamma-1)}}<\infty .
$$

(c) (Linear Convergence) Let $\gamma=2$ and $x_k \notin X^*$ for all $k$. Then if $\lim_{k \rightarrow \infty} c_k=\bar{c}$ with $\bar{c} \in(0, \infty)$,
$$
\limsup_{k \rightarrow \infty} \frac{d\left(x_{k+1}\right)}{d\left(x_k\right)} \leq \frac{1}{1+\beta \bar{c}},
$$
while if $\lim_{k \rightarrow \infty} c_k=\infty$,
$$
\lim_{k \rightarrow \infty} \frac{d\left(x_{k+1}\right)}{d\left(x_k\right)}=0 .
$$

(d) (Sublinear Convergence) Let $\gamma>2$. Then
$$
\limsup_{k \rightarrow \infty} \frac{d\left(x_{k+1}\right)}{d\left(x_k\right)^{2 / \gamma}}<\infty .
$$
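Below is a one-dimensional sketch of parts (a) and (c). It assumes $X^* = \{0\}$, $f^* = 0$, and two hypothetical functions: $f(x) = \beta|x|$ with growth order $\gamma = 1$, and $f(x) = \beta x^2$ with $\gamma = 2$. Both satisfy the growth condition with equality. For these functions the proximal step $\arg\min_x \{f(x) + (x - x_k)^2/(2c)\}$ has a closed form, so the iteration can be run exactly with a constant $c_k = c$. The values of $\beta$, $c$, and the starting point are arbitrary choices.

```python
def prox_abs(z, c, beta):
    """argmin_x beta*|x| + (x - z)^2 / (2c): soft thresholding at level beta*c."""
    if z > beta * c:
        return z - beta * c
    if z < -beta * c:
        return z + beta * c
    return 0.0

def prox_sq(z, c, beta):
    """argmin_x beta*x^2 + (x - z)^2 / (2c), which is z / (1 + 2*beta*c)."""
    return z / (1.0 + 2.0 * beta * c)

def run(prox, x0, c, beta, iters):
    """Proximal iterations x_{k+1} = prox(x_k) with a constant parameter c."""
    xs = [x0]
    for _ in range(iters):
        xs.append(prox(xs[-1], c, beta))
    return xs

beta, c = 1.0, 0.5
sharp = run(prox_abs, 5.0, c, beta, 12)   # gamma = 1, d(x) = |x|
quad = run(prox_sq, 5.0, c, beta, 12)     # gamma = 2
ratios = [abs(b) / abs(a) for a, b in zip(quad, quad[1:])]
print(sharp[:11], ratios[:3])
```

With $\gamma = 1$ each step reduces $d$ by exactly $\beta c$, as in (a), and the iterates reach $X^*$ after finitely many steps. With $\gamma = 2$ the ratio $d(x_{k+1})/d(x_k)$ equals $1/(1+2\beta c) = 0.5$, which lies within the bound $1/(1+\beta c)$ of part (c).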

Gradient Interpretation

An interesting interpretation of the proximal iteration is obtained by considering the function
$$
\phi_c(z)=\inf_{x \in \Re^n}\left\{f(x)+\frac{1}{2 c}\|x-z\|^2\right\}
$$
for a fixed positive value of $c$. It can be seen that
$$
\inf_{x \in \Re^n} f(x) \leq \phi_c(z) \leq f(z), \quad \forall z \in \Re^n,
$$
from which it follows that the sets of minima of $f$ and $\phi_c$ coincide (this is also evident from the geometric view of the proximal minimization given in Fig. 5.1.7). The following proposition shows that $\phi_c$ is a convex differentiable function, and derives its gradient.

Proposition 5.1.7: The function $\phi_c$ of Eq. (5.14) is convex and differentiable, and we have
$$
\nabla \phi_c(z)=\frac{z-x_c(z)}{c} \quad \forall z \in \Re^n,
$$
where $x_c(z)$ is the unique minimizer in Eq. (5.14). Moreover
$$
\nabla \phi_c(z) \in \partial f\left(x_c(z)\right), \quad \forall z \in \Re^n
$$
Proof: We first note that $\phi_c$ is convex, since it is obtained by partial minimization of $f(x)+\frac{1}{2 c}\|x-z\|^2$, which is convex as a function of $(x, z)$ (cf. Prop. 3.3.1 in Appendix B). Furthermore, $\phi_c$ is real-valued, since the infimum in Eq. (5.14) is attained.

Let us fix $z$ and, for notational simplicity, denote $\bar{z}=x_c(z)$. To show that $\phi_c$ is differentiable with the given form of gradient, we note that by the optimality condition of Prop. 3.1.4, we have $v \in \partial \phi_c(z)$, or equivalently $0 \in \partial \phi_c(z)-v$, if and only if $z$ attains the minimum over $y \in \Re^n$ of
$$
\phi_c(y)-v^{\prime} y=\inf_{x \in \Re^n}\left\{f(x)+\frac{1}{2 c}\|x-y\|^2\right\}-v^{\prime} y
$$
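The following minimal numeric check of Prop. 5.1.7 works in one dimension and assumes the hypothetical choice $f(x) = |x|$, which is convex but not differentiable at 0. Then $x_c(z)$ is soft thresholding at level $c$, and $\phi_c$ is smooth. The code compares the gradient formula $(z - x_c(z))/c$ with a central finite difference of $\phi_c$ at a few made-up points. The accompanying checks also confirm $\inf f \leq \phi_c(z) \leq f(z)$ and that the gradient is a subgradient of $f$ at $x_c(z)$.

```python
def prox_abs(z, c):
    """x_c(z) = argmin_x |x| + (x - z)^2 / (2c), i.e. soft thresholding at level c."""
    if z > c:
        return z - c
    if z < -c:
        return z + c
    return 0.0

def phi_c(z, c):
    """phi_c(z) = min_x |x| + (x - z)^2 / (2c), evaluated at the minimizer x_c(z)."""
    x = prox_abs(z, c)
    return abs(x) + (x - z) ** 2 / (2 * c)

def grad_phi_c(z, c):
    """Gradient formula of Prop. 5.1.7: (z - x_c(z)) / c."""
    return (z - prox_abs(z, c)) / c

c = 0.5
h = 1e-6
for z in [-2.0, -0.3, 0.0, 0.2, 1.7]:
    fd = (phi_c(z + h, c) - phi_c(z - h, c)) / (2 * h)   # central finite difference
    print(z, grad_phi_c(z, c), fd)
```

For this $f$, $\phi_c$ is the Huber function: $z^2/(2c)$ for $|z| \leq c$ and $|z| - c/2$ otherwise. Its gradient is $z/c$ near the origin and $\pm 1$ elsewhere, which matches the formula.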


数学代写|凸优化作业代写Convex Optimization代考|ESE6050

Posted on 2023年7月31日2023年8月25日 by statistics-lab

GENERALIZED SIMPLICIAL DECOMPOSITION

In this section we will aim to highlight some of the applications and the fine points of the general algorithm of the preceding section. As vehicle we will use the simplicial decomposition approach, and the problem
$$
\begin{aligned}
& \text { minimize } f(x)+c(x) \\
& \text { subject to } x \in \Re^n,
\end{aligned}
$$
where $f: \Re^n \mapsto(-\infty, \infty]$ and $c: \Re^n \mapsto(-\infty, \infty]$ are closed proper convex functions. This is the Fenchel duality context, and it contains as a special case the problem to which the ordinary simplicial decomposition method of Section 4.2 applies (where $f$ is differentiable, and $c$ is the indicator function of a bounded polyhedral set). Here we will mainly focus on the case where $f$ is nondifferentiable and possibly extended real-valued.

We apply the polyhedral approximation scheme of the preceding section to the equivalent EMP
$$
\begin{aligned}
& \operatorname{minimize} f_1\left(x_1\right)+f_2\left(x_2\right) \\
& \text { subject to }\left(x_1, x_2\right) \in S,
\end{aligned}
$$

where
$$
f_1\left(x_1\right)=f\left(x_1\right), \quad f_2\left(x_2\right)=c\left(x_2\right), \quad S=\left\{\left(x_1, x_2\right) \mid x_1=x_2\right\} .
$$
Note that the orthogonal subspace has the form
$$
S^{\perp}=\left\{\left(\lambda_1, \lambda_2\right) \mid \lambda_1=-\lambda_2\right\}=\left\{(\lambda,-\lambda) \mid \lambda \in \Re^n\right\} .
$$
Optimal primal and dual solutions of this EMP problem are of the form $\left(x^{o p t}, x^{o p t}\right)$ and $\left(\lambda^{o p t},-\lambda^{o p t}\right)$, with
$$
\lambda^{o p t} \in \partial f\left(x^{o p t}\right), \quad-\lambda^{o p t} \in \partial c\left(x^{o p t}\right),
$$
consistently with the optimality conditions of Prop. 4.4.1. A pair of such optimal solutions $\left(x^{o p t}, \lambda^{o p t}\right)$ satisfies the necessary and sufficient optimality conditions of the Fenchel Duality Theorem [Prop. 1.2.1(c)] for the original problem.
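These optimality conditions can be checked on a one-dimensional instance of our own choosing, not taken from the text. Let $f(x)=\tfrac12(x-3)^2$ and let $c$ be the indicator function of $[0,1]$. Then $x^{opt}=1$ and $\lambda^{opt}=\nabla f(1)=-2$. Also, $-\lambda^{opt}=2$ lies in $\partial c(1)$, which is the normal cone $[0,\infty)$ of $[0,1]$ at $1$. The sketch below also minimizes the Fenchel dual $f^\star(\lambda)+c^\star(-\lambda)$ over a grid. It uses the conjugates $f^\star(\lambda)=\tfrac12\lambda^2+3\lambda$ and $c^\star(\mu)=\max\{0,\mu\}$, the latter being the support function of $[0,1]$. It then checks that the primal and dual optimal values are negatives of each other.

```python
def f(x):
    return 0.5 * (x - 3.0) ** 2


def grad_f(x):
    return x - 3.0


def f_conj(lam):
    # sup_x { lam*x - 0.5*(x - 3)**2 } is attained at x = lam + 3
    return 0.5 * lam ** 2 + 3.0 * lam


def c_conj(mu):
    # support function of [0, 1]: sup_{0 <= x <= 1} mu*x
    return max(0.0, mu)


def in_normal_cone(v, x, lo=0.0, hi=1.0):
    """True if v lies in the normal cone of [lo, hi] at x (x assumed in [lo, hi])."""
    if lo < x < hi:
        return v == 0.0
    if x == hi:
        return v >= 0.0
    return v <= 0.0  # x == lo


if __name__ == "__main__":
    x_opt = min(max(3.0, 0.0), 1.0)   # minimizer of f over [0, 1]: projection of 3
    lam_opt = grad_f(x_opt)            # lambda_opt, the (sub)gradient of f at x_opt
    print(x_opt, lam_opt, in_normal_cone(-lam_opt, x_opt))
    grid = [i / 1000 for i in range(-10000, 10001)]
    lam_best = min(grid, key=lambda lam: f_conj(lam) + c_conj(-lam))
    dual_value = f_conj(lam_best) + c_conj(-lam_best)
    print(lam_best, f(x_opt), dual_value)  # primal optimum equals minus the dual optimum
```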

Dual/Cutting Plane Implementation

Let us also provide a dual implementation, which is an equivalent outer linearization/cutting plane-type of method. The Fenchel dual of the minimization of $f+c$ [cf. Eq. (4.31)] is
$$
\begin{aligned}
& \text { minimize } f^{\star}(\lambda)+c^{\star}(-\lambda) \\
& \text { subject to } \lambda \in \Re^n,
\end{aligned}
$$
where $f^{\star}$ and $c^{\star}$ are the conjugates of $f$ and $c$, respectively. According to the theory of the preceding section, the generalized simplicial decomposition algorithm (4.32)-(4.34) can alternatively be implemented by replacing $c^{\star}$ by a piecewise linear/cutting plane outer linearization while leaving $f^{\star}$ unchanged. That is, at iteration $k$ we solve the problem
$$
\begin{aligned}
& \text { minimize } f^{\star}(\lambda)+C_k^{\star}(-\lambda) \\
& \text { subject to } \lambda \in \Re^n,
\end{aligned}
$$
where $C_k^{\star}$ is an outer linearization of $c^{\star}$ (the conjugate of $C_k$ ). This problem is the (Fenchel) dual of problem (4.32) [or equivalently, the low-dimensional problem (4.36)].

Note that the solutions of problem (4.37) are the subgradients $\lambda_k$ satisfying $\lambda_k \in \partial f\left(x_k\right)$ and $-\lambda_k \in \partial C_k\left(x_k\right)$. Here $x_k$ is the solution of problem (4.32) [cf. Eq. (4.33)]. The associated subgradient of $c^{\star}$ at $-\lambda_k$ is the vector $\tilde{x}_k$ generated by Eq. (4.34), as shown in Fig. 4.5.1. In fact, the function $C_k^{\star}$ has the form
$$
C_k^{\star}(-\lambda)=\max_{j \in J_k}\left\{c^{\star}\left(-\lambda_j\right)-\tilde{x}_j^{\prime}\left(\lambda-\lambda_j\right)\right\},
$$
where $\lambda_j$ and $\tilde{x}_j$ are vectors that can be obtained either by using the generalized simplicial decomposition method (4.32)-(4.34), or by using its dual, the cutting plane method based on solving the outer approximation problems (4.37). The ordinary cutting plane method, described in the beginning of Section 4.1, is obtained as the special case where $f^{\star}(\lambda) \equiv 0$ [or equivalently, $f(x)=\infty$ if $x \neq 0$, and $f(0)=0$ ].
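The ordinary cutting plane method referred to above can be sketched for a generic convex function in one dimension. The Python code below is an illustrative implementation of Kelley's cutting plane method, not the book's code. It minimizes a convex function over an interval $[a,b]$ by repeatedly minimizing the polyhedral model $\max_j\{c(x_j)+g_j(x-x_j)\}$, which is built from the subgradients $g_j$ collected so far. In one dimension the model minimum lies either at an endpoint or at the intersection of two cuts, so the sketch simply enumerates those candidates. The test function $(x-0.3)^2$ on $[-1,1]$ is an arbitrary choice.

```python
import math


def model_value(cuts, t):
    """Polyhedral outer approximation: max_j { f(x_j) + g_j * (t - x_j) }."""
    return max(fj + gj * (t - xj) for xj, fj, gj in cuts)


def kelley_1d(func, subgrad, a, b, x0, iters):
    """Kelley's cutting plane method for a convex func on [a, b] (one dimension)."""
    cuts = []
    x = x0
    best_x, best_f = x0, func(x0)
    lower = -math.inf
    for _ in range(iters):
        fx, gx = func(x), subgrad(x)
        cuts.append((x, fx, gx))
        if fx < best_f:
            best_x, best_f = x, fx
        # The minimum of the piecewise-linear model over [a, b] is attained at an
        # endpoint or where two cuts intersect.
        candidates = [a, b]
        for i in range(len(cuts)):
            xi, fi, gi = cuts[i]
            for j in range(i + 1, len(cuts)):
                xj, fj, gj = cuts[j]
                if gi != gj:
                    t = (fj - fi + gi * xi - gj * xj) / (gi - gj)
                    if a <= t <= b:
                        candidates.append(t)
        x = min(candidates, key=lambda t: model_value(cuts, t))
        lower = model_value(cuts, x)  # lower bound on the optimal value
    return best_x, best_f, lower


if __name__ == "__main__":
    sol = kelley_1d(lambda x: (x - 0.3) ** 2, lambda x: 2.0 * (x - 0.3), -1.0, 1.0, -1.0, 40)
    print(sol)
```

The model minimum gives a lower bound on the optimal value, and the best evaluated point gives an upper bound, so the gap between the two can serve as a stopping test.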

Convex Optimization | EE364a

Posted on July 31, 2023 (updated August 25, 2023) by statistics-lab

DUALITY OF INNER AND OUTER LINEARIZATION

We have so far considered cutting plane and simplicial decomposition methods, and we will now aim to connect them via duality. To this end, we define in this section outer and inner linearizations, and we formalize their conjugacy relation and other related properties. An outer linearization of a closed proper convex function $f: \Re^n \mapsto(-\infty, \infty]$ is defined by a finite set of vectors $\left\{y_1, \ldots, y_{\ell}\right\}$ such that for every $j=1, \ldots, \ell$, we have $y_j \in \partial f\left(x_j\right)$ for some $x_j \in \Re^n$. It is given by
$$
F(x)=\max _{j=1, \ldots, \ell}\left\{f\left(x_j\right)+\left(x-x_j\right)^{\prime} y_j\right\}, \quad x \in \Re^n,
$$

and it is illustrated in the left side of Fig. 4.3.1. The choices of $x_j$ such that $y_j \in \partial f\left(x_j\right)$ may not be unique, but result in the same function $F(x)$ : the epigraph of $F$ is determined by the supporting hyperplanes to the epigraph of $f$ with normals defined by $y_j$, and the points of support $x_j$ are immaterial. In particular, the definition (4.14) can be equivalently written in terms of the conjugate $f^{\star}$ of $f$ as
$$
F(x)=\max _{j=1, \ldots, \ell}\left\{x^{\prime} y_j-f^{\star}\left(y_j\right)\right\},
$$
using the relation $x_j^{\prime} y_j=f\left(x_j\right)+f^{\star}\left(y_j\right)$, which is implied by $y_j \in \partial f\left(x_j\right)$ (the Conjugate Subgradient Theorem, Prop. 5.4.3 in Appendix B).

Note that $F(x) \leq f(x)$ for all $x$, so as is true for any outer approximation of $f$, the conjugate $F^{\star}$ satisfies $F^{\star}(y) \geq f^{\star}(y)$ for all $y$. Moreover, it can be shown that $F^{\star}$ is an inner linearization of the conjugate $f^{\star}$, as illustrated in the right side of Fig. 4.3.1. Indeed we have, using Eq. (4.15),
$$
\begin{aligned}
F^{\star}(y) & =\sup_{x \in \Re^n}\left\{y^{\prime} x-F(x)\right\} \\
& =\sup_{x \in \Re^n}\left\{y^{\prime} x-\max_{j=1, \ldots, \ell}\left\{y_j^{\prime} x-f^{\star}\left(y_j\right)\right\}\right\} \\
& =\sup_{\substack{x \in \Re^n,\, \xi \in \Re \\ y_j^{\prime} x-f^{\star}\left(y_j\right) \leq \xi,\ j=1, \ldots, \ell}}\left\{y^{\prime} x-\xi\right\} .
\end{aligned}
$$
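The conjugacy relation can be checked numerically on an example of our own. The Python sketch below assumes $f(x)=\tfrac12 x^2$ on $\Re$, so that $f^{\star}(y)=\tfrac12 y^2$ and $y_j=\nabla f(x_j)=x_j$, and uses four arbitrarily chosen slopes $y_j$. It confirms that the two forms (4.14) and (4.15) of $F$ agree and lie below $f$. It then computes $F^{\star}$ directly from the definition of the conjugate. On $[\min_j y_j,\max_j y_j]$, the result coincides with the piecewise-linear interpolation of $f^{\star}$ through the points $(y_j, f^{\star}(y_j))$, which is an inner linearization of $f^{\star}$. Outside that interval it equals $+\infty$.

```python
import math

YS = [-2.0, -0.5, 1.0, 3.0]  # hypothetical slopes y_j; for f = x^2/2, y_j = f'(x_j) = x_j


def f(x):
    return 0.5 * x * x


def f_conj(y):
    return 0.5 * y * y


def F_tangent_form(x):
    """Outer linearization (4.14): max_j { f(x_j) + (x - x_j) * y_j }, with x_j = y_j."""
    return max(f(yj) + (x - yj) * yj for yj in YS)


def F_conjugate_form(x):
    """Equivalent form (4.15): max_j { x * y_j - f*(y_j) }."""
    return max(x * yj - f_conj(yj) for yj in YS)


def F_star(y):
    """Conjugate of F: sup_x { y*x - F(x) }.

    For y inside [min y_j, max y_j] the concave piecewise-linear function y*x - F(x)
    attains its sup at a breakpoint of F; for f = x^2/2 the breakpoint between the
    tangents with slopes s and t is (s + t) / 2.
    """
    s = sorted(YS)
    if y < s[0] or y > s[-1]:
        return math.inf
    breakpoints = [(s[i] + s[i + 1]) / 2 for i in range(len(s) - 1)]
    return max(y * xb - F_tangent_form(xb) for xb in breakpoints)


def inner_linearization(y):
    """Piecewise-linear interpolation of f* through the points (y_j, f*(y_j))."""
    s = sorted(YS)
    if y < s[0] or y > s[-1]:
        return math.inf
    for lo, hi in zip(s, s[1:]):
        if lo <= y <= hi:
            w = (y - lo) / (hi - lo)
            return (1 - w) * f_conj(lo) + w * f_conj(hi)


if __name__ == "__main__":
    for y in (-2.0, -1.0, 0.0, 2.0, 3.0, 4.0):
        print(y, F_star(y), inner_linearization(y), f_conj(y))
```

The printed columns show $F^{\star}\ge f^{\star}$ with equality at the slopes $y_j$. This matches the statement that an outer approximation of $f$ has a conjugate that lies above $f^{\star}$.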

GENERALIZED POLYHEDRAL APPROXIMATION

We will now consider a unified framework for polyhedral approximation, which combines the cutting plane and simplicial decomposition methods. We consider the problem
$$
\begin{array}{ll}
\operatorname{minimize} & \sum_{i=1}^m f_i\left(x_i\right) \\
\text { subject to } & x \in S,
\end{array}
$$
where
$$
x \stackrel{\text { def }}{=}\left(x_1, \ldots, x_m\right),
$$

is a vector in $\Re^{n_1+\cdots+n_m}$ with components $x_i \in \Re^{n_i}$, $i=1, \ldots, m$. Here $f_i: \Re^{n_i} \mapsto(-\infty, \infty]$ is a closed proper convex function for each $i$, and $S$ is a subspace of $\Re^{n_1+\cdots+n_m}$.

We refer to this as an extended monotropic program (EMP for short). $\dagger$
A classical example of EMP is a single commodity network optimization problem, where $x_i$ represents the (scalar) flow of an arc of a directed graph and $S$ is the circulation subspace of the graph (see e.g., [Ber98]). Problems involving general linear constraints and an additive extended real-valued convex cost function can also be converted to EMP. In particular, the problem
$$
\begin{array}{ll}
\text { minimize } & \sum_{i=1}^m f_i\left(x_i\right) \\
\text { subject to } & A x=b,
\end{array}
$$
where $A$ is a given matrix and $b$ is a given vector, is equivalent to
$$
\begin{array}{ll}
\text { minimize } & \sum_{i=1}^m f_i\left(x_i\right)+\delta_Z(z) \\
\text { subject to } & A x-z=0,
\end{array}
$$
where $z$ is a vector of artificial variables, and $\delta_Z$ is the indicator function of the set $Z=\{z \mid z=b\}$. This is an EMP with constraint subspace
$$
S=\{(x, z) \mid A x-z=0\} .
$$

Convex Optimization | Subgradient Methods with Iterate Averaging

Posted on July 14, 2023 by statistics-lab

Subgradient Methods with Iterate Averaging

If the stepsize $\alpha_k$ in the subgradient method
$$
x_{k+1}=P_X\left(x_k-\alpha_k g_k\right)
$$
is chosen to be large (such as constant, or such that the condition $\sum_{k=0}^{\infty} \alpha_k^2<\infty$ is violated), the method may not converge. This exercise shows that by averaging the iterates of the method, we may obtain convergence with larger stepsizes. Let the optimal solution set $X^*$ be nonempty, and assume that for some scalar $c$, we have
$$
c \geq \sup \left\{\left\|g_k\right\| \mid k=0,1, \ldots\right\}, \quad \forall k \geq 0,
$$
(cf. Assumption 3.2.1). Assume further that $\alpha_k$ is chosen according to
$$
\alpha_k=\frac{\theta}{c \sqrt{k+1}}, \quad k=0,1, \ldots,
$$
where $\theta$ is a positive constant. Show that
$$
f\left(\bar{x}_k\right)-f^* \leq c\left(\frac{\min_{x^* \in X^*}\left\|x_0-x^*\right\|^2}{2 \theta}+\theta \ln (k+2)\right) \frac{1}{\sqrt{k+1}}, \quad k=0,1, \ldots,
$$
where $\bar{x}_k$ is the averaged iterate, generated according to
$$
\bar{x}_k=\frac{\sum_{\ell=0}^k \alpha_{\ell} x_{\ell}}{\sum_{\ell=0}^k \alpha_{\ell}} .
$$

A similar analysis applies to incremental and to stochastic subgradient methods.

Abbreviated proof: Denote
$$
\delta_k=\frac{1}{2} \min_{x^* \in X^*}\left\|x_k-x^*\right\|^2 .
$$
Applying Prop. 3.2.2(a) with $y$ equal to the projection of $x_k$ onto $X^*$, we obtain
$$
\delta_{k+1} \leq \delta_k-\alpha_k\left(f\left(x_k\right)-f^*\right)+\frac{1}{2} \alpha_k^2 c^2 .
$$
Adding this inequality from 0 to $k$ and using the fact $\delta_{k+1} \geq 0$, we obtain
$$
\sum_{\ell=0}^k \alpha_{\ell}\left(f\left(x_{\ell}\right)-f^*\right) \leq \delta_0+\frac{1}{2} c^2 \sum_{\ell=0}^k \alpha_{\ell}^2,
$$
so by dividing with $\sum_{\ell=0}^k \alpha_{\ell}$,
$$
\frac{\sum_{\ell=0}^k \alpha_{\ell} f\left(x_{\ell}\right)}{\sum_{\ell=0}^k \alpha_{\ell}}-f^* \leq \frac{\delta_0+\frac{1}{2} c^2 \sum_{\ell=0}^k \alpha_{\ell}^2}{\sum_{\ell=0}^k \alpha_{\ell}} .
$$
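The bound can be checked on a small example of our own, not taken from the text. Take $f(x)=|x^1-1|+|x^2-2|$ over the box $X=[-5,5]^2$. Then $X^*=\{(1,2)\}$ and $f^*=0$. Every subgradient satisfies $\|g_k\|\le\sqrt2$, so $c=\sqrt2$ is a valid constant. The Python sketch below runs the projected subgradient method with the stepsize $\alpha_k=\theta/(c\sqrt{k+1})$. It accumulates the weighted average $\bar{x}_k$ and returns both the last iterate and the average. The names and data are hypothetical.

```python
import math

TARGET = (1.0, 2.0)  # hypothetical example: f(x) = |x1 - 1| + |x2 - 2|, X* = {TARGET}
LO, HI = -5.0, 5.0   # X = [LO, HI]^2


def sign(t):
    return (t > 0) - (t < 0)


def f(x):
    return sum(abs(xi - ti) for xi, ti in zip(x, TARGET))


def subgradient(x):
    # sign(0) = 0 is a valid choice, since 0 lies in the subdifferential of |.| at 0
    return [sign(xi - ti) for xi, ti in zip(x, TARGET)]


def project(x):
    return [min(max(xi, LO), HI) for xi in x]


def averaged_subgradient(x0, theta, iters):
    """Projected subgradient method with alpha_k = theta / (c sqrt(k+1)) and iterate averaging.

    Returns (last iterate, averaged iterate xbar_{iters-1}).
    """
    c = math.sqrt(len(x0))  # ||g_k|| <= sqrt(n) for this f
    x = list(x0)
    weighted = [0.0] * len(x0)
    alpha_total = 0.0
    for k in range(iters):
        alpha = theta / (c * math.sqrt(k + 1))
        weighted = [w + alpha * xi for w, xi in zip(weighted, x)]  # adds alpha_k * x_k
        alpha_total += alpha
        g = subgradient(x)
        x = project([xi - alpha * gi for xi, gi in zip(x, g)])
    return x, [w / alpha_total for w in weighted]


if __name__ == "__main__":
    x_last, x_bar = averaged_subgradient([4.0, -3.0], theta=1.0, iters=1000)
    print(f(x_last), f(x_bar))
```

After $1000$ iterations, the averaged iterate returned here is $\bar{x}_{999}$. The right-hand side of the exercise's bound is then evaluated with $k=999$.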

Modified Dynamic Stepsize Rules

Consider the subgradient method
$$
x_{k+1}=P_X\left(x_k-\alpha_k g_k\right)
$$
with the stepsize chosen according to one of the two rules
$$
\alpha_k=\frac{f\left(x_k\right)-f_k}{\max \left\{\gamma,\left\|g_k\right\|^2\right\}} \quad \text { or } \quad \alpha_k=\min \left\{\gamma, \frac{f\left(x_k\right)-f_k}{\left\|g_k\right\|^2}\right\}
$$

where $\gamma$ is a fixed positive scalar and $f_k$ is given by the dynamic adjustment procedure (3.22)-(3.23). Show that the convergence result of Prop. 3.2.8 still holds. Abbreviated Proof: We proceed by contradiction, as in the proof of Prop. 3.2.8. From Prop. 3.2.2(a) with $y=\bar{y}$, we have for all $k \geq \bar{k}$,
$$
\begin{aligned}
\left\|x_{k+1}-\bar{y}\right\|^2 & \leq\left\|x_k-\bar{y}\right\|^2-2 \alpha_k\left(f\left(x_k\right)-f(\bar{y})\right)+\alpha_k^2\left\|g_k\right\|^2 \\
& \leq\left\|x_k-\bar{y}\right\|^2-2 \alpha_k\left(f\left(x_k\right)-f(\bar{y})\right)+\alpha_k\left(f\left(x_k\right)-f_k\right) \\
& =\left\|x_k-\bar{y}\right\|^2-\alpha_k\left(f\left(x_k\right)-f_k\right)-2 \alpha_k\left(f_k-f(\bar{y})\right) \\
& \leq\left\|x_k-\bar{y}\right\|^2-\alpha_k\left(f\left(x_k\right)-f_k\right) .
\end{aligned}
$$
Hence $\left\{x_k\right\}$ is bounded, which implies that $\left\{g_k\right\}$ is also bounded (cf. Prop. 3.1.2). Let $\bar{c}$ be such that $\left\|g_k\right\| \leq \bar{c}$ for all $k$. Assume that $\alpha_k$ is chosen according to the first rule in Eq. (3.47). Then from the preceding relation we have for all $k \geq \bar{k}$,
$$
\left\|x_{k+1}-\bar{y}\right\|^2 \leq\left\|x_k-\bar{y}\right\|^2-\frac{\delta^2}{\max \left\{\gamma, \bar{c}^2\right\}} .
$$
As in the proof of Prop. 3.2.8, this leads to a contradiction and the result follows. The proof is similar if $\alpha_k$ is chosen according to the second rule in Eq. (3.47).
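The two stepsize formulas of Eq. (3.47) can be sketched as follows. The sketch does not implement the dynamic adjustment procedure (3.22)-(3.23) for the target level $f_k$. As a simplifying assumption, it uses the fixed target $f_k=f^*$, which is known for the hypothetical test function $f(x)=|x^1-1|+|x^2-2|$ on $X=[-5,5]^2$. The example also shows why the safeguard $\max\{\gamma,\|g_k\|^2\}$ matters. At the minimizer, the subgradient returned below is $0$, and the first rule then yields $\alpha_k=0$ rather than a division by zero. For the second rule, we simply return $0$ once the gap $f(x_k)-f_k$ is nonpositive.

```python
TARGET = (1.0, 2.0)  # hypothetical test problem: f(x) = |x1 - 1| + |x2 - 2|
LO, HI = -5.0, 5.0   # X = [LO, HI]^2
F_TARGET = 0.0       # assumed fixed target level f_k = f* (not the procedure (3.22)-(3.23))


def sign(t):
    return (t > 0) - (t < 0)


def f(x):
    return sum(abs(xi - ti) for xi, ti in zip(x, TARGET))


def subgradient(x):
    return [sign(xi - ti) for xi, ti in zip(x, TARGET)]


def step_rule_max(fx, f_k, g_sq, gamma):
    """First rule: (f(x_k) - f_k) / max{gamma, ||g_k||^2}."""
    return (fx - f_k) / max(gamma, g_sq)


def step_rule_min(fx, f_k, g_sq, gamma):
    """Second rule: min{gamma, (f(x_k) - f_k) / ||g_k||^2}, returning 0 at the target."""
    if fx - f_k <= 0.0:
        return 0.0
    return min(gamma, (fx - f_k) / g_sq)


def run(rule, x0, gamma, iters):
    """Projected subgradient method x_{k+1} = P_X(x_k - alpha_k g_k) with the given rule."""
    x = list(x0)
    for _ in range(iters):
        g = subgradient(x)
        alpha = rule(f(x), F_TARGET, sum(gi * gi for gi in g), gamma)
        x = [min(max(xi - alpha * gi, LO), HI) for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    print(run(step_rule_max, [4.0, -3.0], gamma=0.5, iters=10))
    print(run(step_rule_min, [4.0, -3.0], gamma=0.5, iters=20))
```

On this sharp piecewise-linear function, the first rule reaches $(1,2)$ in two steps. The second rule is capped at $\gamma=0.5$ per step, so it takes longer.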

Convex Optimization | Connection with Incremental Subgradient Methods

Posted on July 14, 2023 by statistics-lab

Connection with Incremental Subgradient Methods

We discussed in Section 2.1.5 incremental variants of gradient methods, which apply to minimization over a closed convex set $X$ of an additive cost function of the form
$$
f(x)=\sum_{i=1}^m f_i(x),
$$
where the functions $f_i: \Re^n \mapsto \Re$ are differentiable. Incremental variants of the subgradient method are also possible in the case where the $f_i$ are nondifferentiable but convex. The idea is to sequentially take steps along the subgradients of the component functions $f_i$, with intermediate adjustment of $x$ after processing each $f_i$. We simply use an arbitrary subgradient of $f_i$ at a point where $f_i$ is nondifferentiable, in place of the gradient that would be used if $f_i$ were differentiable at that point.

Incremental methods are particularly interesting when the number of cost terms $m$ is very large. Then a full subgradient step is very costly, and one hopes to make progress with approximate but much cheaper incremental steps. We will discuss in detail incremental subgradient methods and their combinations with other methods, such as incremental proximal methods, in Section 6.4. In this section we will discuss the most common type of incremental subgradient method, and highlight its connection with the $\epsilon$-subgradient method.

Let us consider the minimization of $\sum_{i=1}^m f_i$ over $x \in X$, for the case where each $f_i$ is a convex real-valued function. Similar to the incremental gradient methods of Section 2.1.5, we view an iteration as a cycle of $m$ subiterations. If $x_k$ is the vector obtained after $k$ cycles, the vector $x_{k+1}$ obtained after one more cycle is
$$
x_{k+1}=\psi_{m, k},
$$
where starting with $\psi_{0, k}=x_k$, we obtain $\psi_{m, k}$ after the $m$ steps
$$
\psi_{i, k}=P_X\left(\psi_{i-1, k}-\alpha_k g_{i, k}\right), \quad i=1, \ldots, m,
$$
with $g_{i, k}$ being an arbitrary subgradient of $f_i$ at $\psi_{i-1, k}$.
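Here is a minimal Python sketch of the cycle above, for a one-dimensional example chosen for illustration. Let $f_i(x)=|x-a_i|$ with data $a=(1,2,7,8,9)$ and $X=[0,10]$, so that $\sum_i f_i$ is minimized at the median $7$. Each subiteration steps along the subgradient $\operatorname{sign}(\psi_{i-1,k}-a_i)$ of a single component and then projects back onto $X$. The diminishing stepsize $\alpha_k=1/(k+1)$, held fixed within a cycle, is our assumption, since the text above does not fix a stepsize rule.

```python
DATA = [1.0, 2.0, 7.0, 8.0, 9.0]  # hypothetical data a_i; f(x) = sum_i |x - a_i|
LO, HI = 0.0, 10.0                # X = [LO, HI]


def sign(t):
    return (t > 0) - (t < 0)


def project(x):
    return min(max(x, LO), HI)


def incremental_subgradient(x0, cycles):
    x = x0
    for k in range(cycles):
        alpha = 1.0 / (k + 1)      # same stepsize for all m subiterations of cycle k
        psi = x                    # psi_{0,k} = x_k
        for a in DATA:             # i = 1, ..., m
            g = sign(psi - a)      # a subgradient of f_i(x) = |x - a_i| at psi_{i-1,k}
            psi = project(psi - alpha * g)
        x = psi                    # x_{k+1} = psi_{m,k}
    return x


def total_cost(x):
    return sum(abs(x - a) for a in DATA)


if __name__ == "__main__":
    x = incremental_subgradient(0.0, 2000)
    print(x, total_cost(x), total_cost(7.0))
```

Each subiteration touches only one data point, which is the point of the incremental approach when $m$ is large. Near the solution, the iterate oscillates within a few multiples of $\alpha_k$ of the median, and this oscillation shrinks as the stepsize diminishes.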

NOTES, SOURCES, AND EXERCISES

Section 3.1: Subgradients are central in the work of Fenchel [Fen51]. The original theorem by Danskin [Dan67] provides a formula for the directional derivative of the maximum of a (not necessarily convex) directionally differentiable function. When adapted to a convex function $f$, this formula yields Eq. (3.10) for the subdifferential of $f$; see Exercise 3.5.

Another important subdifferential formula relates to the subgradients of an expected value function
$$
f(x)=E\{F(x, \omega)\},
$$

where $\omega$ is a random variable taking values in a set $\Omega$, and $F(\cdot, \omega): \Re^n \mapsto \Re$ is a real-valued convex function such that $f$ is real-valued (note that $f$ is easily verified to be convex). If $\omega$ takes a finite number of values with probabilities $p(\omega)$, then the formulas
$$
f^{\prime}(x ; d)=E\left\{F^{\prime}(x, \omega ; d)\right\}, \quad \partial f(x)=E\{\partial F(x, \omega)\},
$$
hold because they can be written in terms of finite sums as
$$
f^{\prime}(x ; d)=\sum_{\omega \in \Omega} p(\omega) F^{\prime}(x, \omega ; d), \quad \partial f(x)=\sum_{\omega \in \Omega} p(\omega) \partial F(x, \omega),
$$
so Prop. 3.1.3(b) applies. However, the formulas (3.32) hold even in the case where $\Omega$ is uncountably infinite, with appropriate mathematical interpretation of the integral of set-valued functions $E\{\partial F(x, \omega)\}$ as the set of integrals
$$
\int_{\omega \in \Omega} g(x, \omega) d P(\omega)
$$
where $g(x, \omega) \in \partial F(x, \omega), \omega \in \Omega$ (measurability issues must be addressed in this context). For a formal proof and analysis, see the author’s papers [Ber72], [Ber73], which also provide a necessary and sufficient condition for $f$ to be differentiable, even when $F(\cdot, \omega)$ is not. In this connection, it is important to note that the integration over $\omega$ in Eq. (3.33) may smooth out the nondifferentiabilities of $F(\cdot, \omega)$ if $\omega$ is a “continuous” random variable. This property can be used in turn in algorithms, including schemes that bring to bear the methodology of differentiable optimization; see e.g., Yousefian, Nedić, and Shanbhag [YNS10], [YNS12], Agarwal and Duchi [AgD11], Duchi, Bartlett, and Wainwright [DBW12], Brown and Smith [BrS13], Abernethy et al. [ALS14], and Jiang and Zhang [JiZ14].
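In the finite case, the formula $\partial f(x)=\sum_{\omega} p(\omega)\,\partial F(x,\omega)$ is easy to check by machine. The sketch below uses $F(x,\omega)=|x-\omega|$ on $\Re$, with three hypothetical values of $\omega$ and their probabilities. It relies on exact rational arithmetic from Python's `fractions` module. It compares the weighted sum of the interval subdifferentials with the one-sided derivatives $-f'(x;-1)$ and $f'(x;1)$, which in one dimension are the endpoints of $\partial f(x)$.

```python
from fractions import Fraction as Fr

OMEGA = [Fr(0), Fr(1), Fr(3)]          # hypothetical values of the random variable omega
PROB = [Fr(1, 2), Fr(1, 4), Fr(1, 4)]  # their probabilities p(omega)


def f(x):
    """Expected value f(x) = E{ |x - omega| }."""
    return sum(p * abs(x - w) for p, w in zip(PROB, OMEGA))


def subdiff_F(x, w):
    """Subdifferential of F(., w) = |. - w| at x, as an interval (lo, hi)."""
    if x > w:
        return (Fr(1), Fr(1))
    if x < w:
        return (Fr(-1), Fr(-1))
    return (Fr(-1), Fr(1))


def subdiff_f(x):
    """Finite-sum formula: sum_w p(w) * subdiff F(x, w), a weighted sum of intervals."""
    parts = [subdiff_F(x, w) for w in OMEGA]
    lo = sum(p * a for p, (a, _) in zip(PROB, parts))
    hi = sum(p * b for p, (_, b) in zip(PROB, parts))
    return lo, hi


def one_sided_derivatives(x, h=Fr(1, 100)):
    """Left and right derivatives via exact difference quotients.

    Exact as long as f has no kink in [x - h, x + h] other than possibly at x itself.
    """
    left = (f(x) - f(x - h)) / h
    right = (f(x + h) - f(x)) / h
    return left, right


if __name__ == "__main__":
    for x in (Fr(-1), Fr(0), Fr(1), Fr(2), Fr(3)):
        print(x, subdiff_f(x), one_sided_derivatives(x))
```

At a point $x$ that equals one of the $\omega$ values, the interval has positive length, so $f$ is not differentiable there. This is the finite-support situation, in which the averaging does not smooth out the kinks.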

Convex Optimization | Linear Convergence Rate of Incremental Gradient Method

Posted on July 14, 2023 by statistics-lab

Linear Convergence Rate of Incremental Gradient Method

This exercise quantifies the rate of convergence of the incremental gradient method to the “region of confusion” (cf. Fig. 2.1.11), for any order of processing the additive cost components, assuming these components are positive definite quadratic. Consider the incremental gradient method
$$
x_{k+1}=x_k-\alpha \nabla f_k\left(x_k\right) \quad k=0,1, \ldots,
$$
where $f_0, f_1, \ldots$ are quadratic functions with eigenvalues lying within some interval $[\gamma, \Gamma]$, where $\gamma>0$. Suppose that for a given $\epsilon>0$, there is a vector $x^*$ such that
$$
\left\|\nabla f_k\left(x^*\right)\right\| \leq \epsilon, \quad \forall k=0,1, \ldots
$$

Show that for all $\alpha$ with $0<\alpha \leq 2 /(\gamma+\Gamma)$, the generated sequence $\left\{x_k\right\}$ converges to a $2 \epsilon / \gamma$-neighborhood of $x^*$, i.e.,
$$
\limsup_{k \rightarrow \infty}\left\|x_k-x^*\right\| \leq \frac{2 \epsilon}{\gamma} .
$$
Moreover, the rate of convergence to this neighborhood is linear, in the sense that
$$
\left\|x_k-x^*\right\|>\frac{2 \epsilon}{\gamma} \quad \Rightarrow \quad\left\|x_{k+1}-x^*\right\|<\left(1-\frac{\alpha \gamma}{2}\right)\left\|x_k-x^*\right\|,
$$
while
$$
\left\|x_k-x^*\right\| \leq \frac{2 \epsilon}{\gamma} \quad \Rightarrow \quad\left\|x_{k+1}-x^*\right\| \leq \frac{2 \epsilon}{\gamma} .
$$
Hint: Let $f_k(x)=\frac{1}{2} x^{\prime} Q_k x-b_k^{\prime} x$, where $Q_k$ is positive definite symmetric, and write
$$
x_{k+1}-x^*=\left(I-\alpha Q_k\right)\left(x_k-x^*\right)-\alpha \nabla f_k\left(x^*\right) .
$$
For other related convergence rate results, see [NeB00] and [Sch14a].
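Here is a one-dimensional numerical check of these claims on an example we made up. It uses two quadratic components $f(x)=\tfrac{q}{2}x^2-bx$ with $(q,b)=(1,1)$ and $(3,3.3)$, processed cyclically, so $\gamma=1$ and $\Gamma=3$. With $x^*=1.05$, we get $|\nabla f_k(x^*)|\le\epsilon=0.15$, and we take $\alpha=0.25\le 2/(\gamma+\Gamma)=0.5$. The sketch records every pair $(|x_k-x^*|,|x_{k+1}-x^*|)$ so that both implications can be checked step by step.

```python
# Hypothetical components f_i(x) = 0.5*q_i*x**2 - b_i*x, processed in cyclic order.
Q = [1.0, 3.0]
B = [1.0, 3.3]
GAMMA, BIG_GAMMA = min(Q), max(Q)   # eigenvalue bounds gamma and Gamma
ALPHA = 0.25                        # satisfies 0 < alpha <= 2 / (gamma + Gamma)
X_STAR = 1.05


def grad(i, x):
    return Q[i] * x - B[i]


EPS = max(abs(grad(i, X_STAR)) for i in range(len(Q)))  # |grad f_k(x*)| <= eps
RADIUS = 2 * EPS / GAMMA


def error_pairs(x0, iters):
    """Run x_{k+1} = x_k - alpha * grad f_k(x_k); return (|x_k - x*|, |x_{k+1} - x*|) pairs."""
    x, pairs = x0, []
    for k in range(iters):
        x_next = x - ALPHA * grad(k % len(Q), x)
        pairs.append((abs(x - X_STAR), abs(x_next - X_STAR)))
        x = x_next
    return pairs


if __name__ == "__main__":
    pairs = error_pairs(50.0, 200)
    print(EPS, RADIUS, pairs[-1])
```

The iterates do not converge to $x^*$ or to any single point. They settle into a two-point cycle inside the $2\epsilon/\gamma$ neighborhood, which is the "region of confusion" behavior that the exercise quantifies.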

Proximal Gradient Method, $\ell_1$-Regularization, and the Shrinkage Operation

The proximal gradient iteration (2.27) is well suited for problems involving a nondifferentiable function component that is convenient for a proximal iteration. This exercise considers the important case of the $\ell_1$ norm. Consider the problem
$$
\begin{aligned}
& \operatorname{minimize} f(x)+\gamma\|x\|_1 \\
& \text { subject to } x \in \Re^n,
\end{aligned}
$$
where $f: \Re^n \mapsto \Re$ is a differentiable convex function, $\|\cdot\|_1$ is the $\ell_1$ norm, and $\gamma>0$. The proximal gradient iteration is given by the gradient step
$$
z_k=x_k-\alpha \nabla f\left(x_k\right)
$$
followed by the proximal step
$$
x_{k+1} \in \arg \min_{x \in \Re^n}\left\{\gamma\|x\|_1+\frac{1}{2 \alpha}\left\|x-z_k\right\|^2\right\}
$$
[cf. Eq. (2.28)]. Show that the proximal step can be performed separately for each coordinate $x^i$ of $x$, and is given by the so-called shrinkage operation:
$$
x_{k+1}^i=\begin{cases}
z_k^i-\alpha \gamma & \text { if } z_k^i>\alpha \gamma, \\
0 & \text { if }\left|z_k^i\right| \leq \alpha \gamma, \\
z_k^i+\alpha \gamma & \text { if } z_k^i<-\alpha \gamma,
\end{cases} \quad i=1, \ldots, n .
$$
Note: Since the shrinkage operation tends to set many coordinates $x_{k+1}^i$ to 0, it tends to produce “sparse” iterates.
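The iteration can be sketched in a few lines of Python. This is an illustrative implementation under assumptions of our own. We take the separable smooth term $f(x)=\tfrac12\sum_i d_i(x^i-b^i)^2$ with $d_i>0$. For this choice, the minimizer of $f(x)+\gamma\|x\|_1$ is known coordinatewise as the shrinkage of $b^i$ at level $\gamma/d_i$, so the output can be checked. The stepsize $\alpha=1/\max_i d_i$ is the reciprocal of the Lipschitz constant of $\nabla f$.

```python
def shrink(z, t):
    """Shrinkage (soft-thresholding) of a scalar z at level t >= 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def proximal_gradient(d, b, gamma, iters):
    """Minimize 0.5*sum_i d_i (x_i - b_i)^2 + gamma*||x||_1 by the proximal gradient method."""
    alpha = 1.0 / max(d)
    x = [0.0] * len(b)
    for _ in range(iters):
        # gradient step z_k = x_k - alpha * grad f(x_k)
        z = [xi - alpha * di * (xi - bi) for xi, di, bi in zip(x, d, b)]
        # proximal step for gamma*||x||_1: coordinatewise shrinkage at level alpha*gamma
        x = [shrink(zi, alpha * gamma) for zi in z]
    return x


if __name__ == "__main__":
    d, b, gamma = [1.0, 4.0], [3.0, 0.2], 1.0
    x = proximal_gradient(d, b, gamma, 500)
    exact = [shrink(bi, gamma / di) for bi, di in zip(b, d)]
    print(x, exact)  # both close to [2.0, 0.0]
```

The second coordinate is set exactly to $0$ at the first iteration and stays there. This is the sparsity effect mentioned in the note.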

凸优化代写

数学代写|凸优化作业代写Convex Optimization代考|Linear Convergence Rate of Incremental Gradient Method

这个练习量化了增量梯度法在“混乱区域”的收敛速度(参见图2.1.11)，对于任何处理附加成本分量的顺序，假设这些分量是正定的二次元。考虑增量梯度法
$$
x_{k+1}=x_k-\alpha \nabla f_k\left(x_k\right) \quad k=0,1, \ldots,
$$
式中$f_0, f_1, \ldots$为特征值位于某区间$[\gamma, \Gamma]$内的二次函数，式中$\gamma>0$。假设对于给定的$\epsilon>0$，有一个向量$x^$满足 $$ \left|\nabla f_k\left(x^\right)\right| \leq \epsilon, \quad \forall k=0,1, \ldots
$$

证明对于所有$\alpha$和$0<\alpha \leq 2 /(\gamma+\Gamma)$，生成的序列$\left{x_k\right}$收敛到$x^$的$2 \epsilon / \gamma$邻域，即$$ \limsup {k \rightarrow \infty}\left|x_k-x^\right| \leq \frac{2 \epsilon}{\gamma} . $$，并且收敛到该邻域的速度是线性的，即$$ \left|x_k-x^\right|>\frac{2 \epsilon}{\gamma} \quad \Rightarrow \quad\left|x{k+1}-x^\right|<\left(1-\frac{\alpha \gamma}{2}\right)\left|x_k-x^\right|, $$和$$ \left|x_k-x^\right| \leq \frac{2 \epsilon}{\gamma} \quad \Rightarrow \quad\left|x_{k+1}-x^\right| \leq \frac{2 \epsilon}{\gamma} . $$提示:设$f_k(x)=\frac{1}{2} x^{\prime} Q_k x-b_k^{\prime} x$，其中$Q_k$是正定对称的，并写$$ x_{k+1}-x^=\left(I-\alpha Q_k\right)\left(x_k-x^\right)-\alpha \nabla f_k\left(x^\right) .
$$
其他相关的收敛速率结果参见[NeB00]和[Sch14a]。

数学代写|凸优化作业代写Convex Optimization代考|Proximal Gradient Method, £1-Regularization, and the Shrinkage Operation

近端梯度迭代(2.27)非常适合于涉及不可微函数分量的问题，这便于近端迭代。这个练习考虑了$\ell_1$规范的重要情况。考虑这个问题
$$
\begin{aligned}
& \operatorname{minimize} f(x)+\gamma|x|1 \ & \text { subject to } x \in \Re^n, \end{aligned} $$ 其中$f: \Re^n \mapsto \Re$为可微凸函数，$|\cdot|_1$为$\ell_1$范数，$\gamma>0$。近端梯度迭代由梯度步长给出 $$ z_k=x_k-\alpha \nabla f\left(x_k\right) $$ 接着是近端步骤 $$ x{k+1} \in \arg \min {x \in \Re^n}\left{\gamma|x|1+\frac{1}{2 \alpha}\left|x-z_k\right|^2\right} $$[参见式(2.28)]。表明可以对$x$的每个坐标$x^i$分别执行近端步骤，并由所谓的收缩操作:$$ x{k+1}^i=\left{\begin{array}{ll} z_k^i-\alpha \gamma & \text { if } z_k^i>\alpha \gamma, \ 0 & \text { if }\left|z_k^i\right| \leq \alpha \gamma, \ z_k^i+\alpha \gamma & \text { if } z_k^i<-\alpha \gamma, \end{array} \quad i=1, \ldots, n .\right. $$给出注意:由于收缩操作倾向于将许多坐标$x{k+1}^i$设置为0，因此它倾向于产生“稀疏”迭代。


Convex Optimization|Armijo/Backtracking Stepsize Rule

Posted on June 30, 2023 (updated June 30, 2023) by statistics-lab


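Convex optimization is a subfield of mathematical optimization that studies the problem of minimizing convex functions over convex sets. Many classes of convex optimization problems admit polynomial-time algorithms, whereas mathematical optimization is in general NP-hard. Convex optimization has applications in a wide range of disciplines, such as automatic control systems, estimation and signal processing, communications and networks, electronic circuit design, data analysis and modeling, finance, statistics (optimal experimental design), and structural optimization, where the approximation concept has proven to be efficient.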

Convex Optimization|Armijo/Backtracking Stepsize Rule

Consider minimization of a continuously differentiable function $f: \Re^n \mapsto \Re$, using the iteration
$$
x_{k+1}=x_k+\alpha_k d_k
$$
where $d_k$ is a descent direction. Given fixed scalars $\beta$ and $\sigma$ with $0<\beta<1$ and $0<\sigma<1$, and scalars $s_k$ with $\inf_{k \geq 0} s_k>0$, the stepsize $\alpha_k$ is determined as follows: we set $\alpha_k=\beta^{m_k} s_k$, where $m_k$ is the first nonnegative integer $m$ for which
$$
f\left(x_k\right)-f\left(x_k+\beta^m s_k d_k\right) \geq-\sigma \beta^m s_k \nabla f\left(x_k\right)^{\prime} d_k
$$
Assume that there exist positive scalars $c_1, c_2$ such that for all $k$ we have
$$
c_1\|\nabla f(x_k)\|^2 \leq-\nabla f(x_k)^{\prime} d_k, \quad \|d_k\|^2 \leq c_2\|\nabla f(x_k)\|^2 \tag{2.72}
$$
(a) Show that the stepsize $\alpha_k$ is well-defined, i.e., that it will be determined after a finite number of reductions if $\nabla f\left(x_k\right) \neq 0$. Proof: We have for all $s>0$
$$
f\left(x_k+s d_k\right)-f\left(x_k\right)=s \nabla f\left(x_k\right)^{\prime} d_k+o(s)
$$
Thus the test for acceptance of a stepsize $s>0$ is written as
$$
s \nabla f\left(x_k\right)^{\prime} d_k+o(s) \leq \sigma s \nabla f\left(x_k\right)^{\prime} d_k,
$$
which, in view of Eq. (2.72), is satisfied whenever
$$
\frac{o(s)}{s} \leq(1-\sigma) c_1\|\nabla f(x_k)\|^2,
$$
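Since $o(s)/s \rightarrow 0$ as $s \downarrow 0$ and the right-hand side is positive when $\nabla f(x_k) \neq 0$, this inequality holds for all $s$ in some interval $(0, \bar{s}_k]$. Thus the test will be passed for all $m$ for which $\beta^m s_k \leq \bar{s}_k$.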

(b) Show that every limit point $\bar{x}$ of the generated sequence $\{x_k\}$ satisfies $\nabla f(\bar{x})=0$. Proof: Assume, to arrive at a contradiction, that there is a subsequence $\{x_k\}_{\mathcal{K}}$ that converges to some $\bar{x}$ with $\nabla f(\bar{x}) \neq 0$. Since $\{f(x_k)\}$ is monotonically nonincreasing, $\{f(x_k)\}$ either converges to a finite value or diverges to $-\infty$. Since $f$ is continuous, $f(\bar{x})$ is a limit point of $\{f(x_k)\}$, so it follows that the entire sequence $\{f(x_k)\}$ converges to $f(\bar{x})$. Hence,
$$
f\left(x_k\right)-f\left(x_{k+1}\right) \rightarrow 0
$$
By the definition of the Armijo rule and the descent property $\nabla f(x_k)^{\prime} d_k \leq 0$ of the direction $d_k$, we have
$$
f\left(x_k\right)-f\left(x_{k+1}\right) \geq-\sigma \alpha_k \nabla f\left(x_k\right)^{\prime} d_k \geq 0,
$$
so by combining the preceding two relations,
$$
\alpha_k \nabla f(x_k)^{\prime} d_k \rightarrow 0 \tag{2.73}
$$
From the left side of Eq. (2.72) and the hypothesis $\nabla f(\bar{x}) \neq 0$, it follows that
$$
\limsup_{\substack{k \rightarrow \infty \\ k \in \mathcal{K}}} \nabla f(x_k)^{\prime} d_k<0,
$$
which together with Eq. (2.73) implies that
$$
\{\alpha_k\}_{\mathcal{K}} \rightarrow 0 .
$$
Since $s_k$, the initial trial value for $\alpha_k$, is bounded away from 0, $s_k$ will be reduced at least once for all $k \in \mathcal{K}$ that are greater than some iteration index $\bar{k}$. Thus we must have for all $k \in \mathcal{K}$ with $k>\bar{k}$
$$
f\left(x_k\right)-f\left(x_k+\left(\alpha_k / \beta\right) d_k\right)<-\sigma\left(\alpha_k / \beta\right) \nabla f\left(x_k\right)^{\prime} d_k
$$
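A small Python sketch of the rule, for illustration only. `armijo_stepsize` tries $s, \beta s, \beta^2 s, \ldots$ and returns the first stepsize that passes the test. `steepest_descent_armijo` uses $d_k=-\nabla f(x_k)$, which satisfies Eq. (2.72) with $c_1=c_2=1$. The function names, the default values $s=1$, $\beta=0.5$, $\sigma=0.1$, the stopping tolerance, and the quadratic test function are all assumptions chosen for the example.

```python
import numpy as np


def armijo_stepsize(f, x, g, d, s=1.0, beta=0.5, sigma=0.1, max_reductions=60):
    """First alpha = beta**m * s (m = 0, 1, ...) with
    f(x) - f(x + alpha*d) >= -sigma * alpha * g'd."""
    slope = float(g @ d)                 # grad f(x)' d, negative for a descent direction
    if slope >= 0:
        raise ValueError("d is not a descent direction")
    fx = f(x)
    alpha = s
    for _ in range(max_reductions):
        if fx - f(x + alpha * d) >= -sigma * alpha * slope:
            return alpha
        alpha *= beta                    # reduce the trial stepsize
    raise RuntimeError("no acceptable stepsize within max_reductions")


def steepest_descent_armijo(f, grad, x0, tol=1e-6, max_iter=5000):
    """x_{k+1} = x_k + alpha_k d_k with d_k = -grad f(x_k) and Armijo stepsizes."""
    x = np.array(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:     # approximately stationary point reached
            return x, k
        d = -g                           # satisfies Eq. (2.72) with c1 = c2 = 1
        x = x + armijo_stepsize(f, x, g, d) * d
    return x, max_iter


# Hypothetical test problem: f(x) = 0.5 (x - c)'Q(x - c), minimized at x = c.
Q = np.diag([1.0, 10.0])
c = np.array([1.0, 0.1])


def f(x):
    e = x - c
    return 0.5 * e @ Q @ e


def grad(x):
    return Q @ (x - c)


x_final, iters = steepest_descent_armijo(f, grad, [10.0, 10.0])
```

For this quadratic, $x_0=(10,10)$ gives $\nabla f(x_0)=(9,99)$; the trial stepsizes $1$, $0.5$, and $0.25$ are rejected and $0.125$ is accepted, so only finitely many reductions occur, as part (a) asserts. Writing $f$ so that its optimal value is 0 keeps the test $f(x)-f(x+\alpha d)$ from being swamped by round-off near the minimum.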

Convex Optimization|Convergence of Steepest Descent to a Single Limit

Let $f: \Re^n \mapsto \Re$ be a differentiable convex function, and assume that for some $L>0$, we have
$$
\|\nabla f(x)-\nabla f(y)\| \leq L\|x-y\|, \quad \forall x, y \in \Re^n .
$$
Let $X^*$ be the set of minima of $f$, and assume that $X^*$ is nonempty. Consider the steepest descent method
$$
x_{k+1}=x_k-\alpha_k \nabla f\left(x_k\right)
$$
Show that $\{x_k\}$ converges to a minimizing point of $f$ under each of the following two stepsize rule conditions:
(i) For some $\epsilon>0$, we have
$$
\epsilon \leq \alpha_k \leq \frac{2(1-\epsilon)}{L}, \quad \forall k .
$$
(ii) $\alpha_k \rightarrow 0$ and $\sum_{k=0}^{\infty} \alpha_k=\infty$.
Notes: The original source [BGI95] also shows convergence to a single limit for a variant of the Armijo rule. This should be contrasted with a result of [Gon00], which shows that the steepest descent method with the exact line minimization rule may produce a sequence with multiple limit points (all of which are of course optimal), even for a convex cost function. There is also a "local capture" theorem that applies to gradient methods for nonconvex continuously differentiable cost functions $f$ and an isolated local minimum of $f$ (a local minimum $x^*$ that is unique within a neighborhood of $x^*$). Under mild conditions, it asserts that there is an open sphere $S_{x^*}$ centered at $x^*$ such that once the generated sequence $\{x_k\}$ enters $S_{x^*}$, it converges to $x^*$ (see [Ber82a], Prop. 1.12, or [Ber99], Prop. 1.2.5 and the references given there).

Abbreviated Proof: Consider the stepsize rule (i). From the descent inequality (Exercise 2.2), we have for all $k$
$$
f(x_{k+1}) \leq f(x_k)-\alpha_k\left(1-\frac{\alpha_k L}{2}\right)\|\nabla f(x_k)\|^2 \leq f(x_k)-\epsilon^2\|\nabla f(x_k)\|^2,
$$
so $\{f(x_k)\}$ is monotonically nonincreasing and, being bounded below by the optimal value $f(x^*)$, converges. Adding the preceding relation for all values of $k$ and taking the limit as $k \rightarrow \infty$, we obtain for all $x^* \in X^*$,
$$
f(x^*) \leq f(x_0)-\epsilon^2 \sum_{k=0}^{\infty}\|\nabla f(x_k)\|^2
$$
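The sketch below is an illustrative example, not from the source, and shows the claim on a case where $X^*$ is not a single point. It uses the hypothetical $f(x)=\frac12\|Ax-b\|^2$ with a $2\times 3$ matrix $A$ of full row rank, so $X^*=\{x : Ax=b\}$ is a line and $L=\|A\|_2^2=6$. Steepest descent is run with the constant stepsize $\alpha_k=1/L$, which satisfies rule (i) (for instance with $\epsilon=1/6$), and with $\alpha_k=1/(L\sqrt{k+1})$, which satisfies rule (ii). The helper names `grad` and `steepest_descent` are assumptions.

```python
import numpy as np

# Hypothetical example: f(x) = 0.5 * ||A x - b||^2 with a wide A, so the set
# of minima X* = {x : A x = b} is a line in R^3, not a single point.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])
L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of grad f (here 6)


def grad(x):
    return A.T @ (A @ x - b)


def steepest_descent(x0, stepsize, num_iters):
    """x_{k+1} = x_k - alpha_k * grad f(x_k), with alpha_k = stepsize(k)."""
    x = np.array(x0, dtype=float)
    for k in range(num_iters):
        x = x - stepsize(k) * grad(x)
    return x


x0 = np.array([3.0, -1.0, 2.0])
# Rule (i): constant alpha_k = 1/L, inside [eps, 2(1 - eps)/L] for eps = 1/6.
x_i = steepest_descent(x0, lambda k: 1.0 / L, 2000)
# Rule (ii): alpha_k -> 0 with sum of alpha_k = infinity.
x_ii = steepest_descent(x0, lambda k: 1.0 / (L * np.sqrt(k + 1)), 10000)

# Every iterate stays in x0 + range(A'), so the single limit should be the
# point of X* closest to x0.
x_closest = x0 + np.linalg.pinv(A) @ (b - A @ x0)
```

Here every gradient $A'(Ax-b)$ lies in the range of $A'$, so all iterates stay in $x_0+\text{range}(A')$, and both runs should converge to the same single point of $X^*$, the one closest to $x_0$. That closest-point property is specific to this least squares example; the exercise itself only asserts convergence to some minimizer.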

