### Non-hierarchical clustering

## Partitioning clustering

In contrast to hierarchical clustering, which yields successive levels of clusters through iterative fusions or divisions, non-hierarchical or partitioning clustering assigns a set of data points to $C$ clusters without any hierarchical structure. This process usually involves optimizing a criterion function, typically minimizing an objective function that represents the within-cluster variability (Xu and Wunsch, 2009). One of the best-known and most popular non-hierarchical clustering methods is c-means clustering. Another interesting partitioning method is c-medoids clustering. In the following sections, we briefly present these methods and the cluster validity criteria for determining the optimal number of clusters, which has to be pre-specified in these methods.

## c-Means clustering method

The c-means clustering method (MacQueen, 1967), also commonly known as k-means clustering, is one of the best-known and most popular clustering methods. It seeks an optimal partition of the data by minimizing the sum-of-squared-error criterion shown in Eq. (3.1) through an iterative optimization procedure that belongs to the category of hill-climbing algorithms (Xu and Wunsch, 2009). The basic procedure of c-means clustering is summarized as follows (Everitt et al., 2011; Xu and Wunsch, 2009):

1. Initialize a c-partition randomly or based on some prior knowledge, and calculate the cluster prototypes (centroids or means), that is, the mean of the observations belonging to each cluster.
2. Assign each unit in the data set to the nearest cluster, using a suitable distance measure between each unit and each centroid.
3. Recalculate the cluster prototypes (centroids or means) based on the current partition.
4. Repeat steps 2 and 3 until no cluster changes.
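The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `c_means`, the `max_iter` and `seed` parameters, the squared Euclidean distance, and the rule that an empty cluster keeps its previous centroid are all assumptions made for the example.

```python
import numpy as np

def c_means(X, C, max_iter=100, seed=0):
    """Hard c-means (k-means) sketch; returns (labels, centroids). Assumes I >= C."""
    rng = np.random.default_rng(seed)
    I = X.shape[0]
    # Step 1: random initial partition in which every cluster gets at least one unit,
    # followed by the centroid (mean) of each cluster.
    labels = rng.permutation(np.arange(I) % C)
    centroids = np.stack([X[labels == c].mean(axis=0) for c in range(C)])
    for _ in range(max_iter):
        # Step 2: assign each unit to the nearest centroid (squared Euclidean distance).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        # Step 4: stop as soon as the partition no longer changes.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: recompute the centroids; an empty cluster keeps its old centroid.
        for c in range(C):
            if np.any(labels == c):
                centroids[c] = X[labels == c].mean(axis=0)
    return labels, centroids

# Two well-separated groups of three points each (illustrative data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = c_means(X, C=2)
```

On this toy data the first three points end up in one cluster and the last three in the other. Which cluster receives index 0 depends on the random initial partition.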

Mathematically, the c-means clustering method is formalized as follows:
$$\begin{array}{l} \min : \sum_{i=1}^{I} \sum_{c=1}^{C} u_{i c} d_{i c}^{2}=\sum_{i=1}^{I} \sum_{c=1}^{C} u_{i c}\left\|\mathbf{x}_{i}-\mathbf{h}_{c}\right\|^{2} \\ \sum_{c=1}^{C} u_{i c}=1, \quad u_{i c} \geq 0, \quad u_{i c} \in\{0,1\} \end{array} \tag{3.1}$$
where $u_{i c}$ indicates the membership degree of the $i$-th unit to the $c$-th cluster; $u_{i c} \in\{0,1\}$, that is, $u_{i c}=1$ when the $i$-th unit belongs to the $c$-th cluster and $u_{i c}=0$ otherwise; $d_{i c}^{2}=\left\|\mathbf{x}_{i}-\mathbf{h}_{c}\right\|^{2}$ indicates the squared Euclidean distance between the $i$-th object and the centroid of the $c$-th cluster.
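To make the objective concrete, the following sketch evaluates it for a tiny hand-made partition. The helper name `sse_objective` and the three data points are assumptions chosen for illustration. The membership matrix `U` is one-hot, so each row sums to 1, as the constraint requires.

```python
import numpy as np

def sse_objective(X, U, H):
    """Sum over units i and clusters c of u_ic * ||x_i - h_c||^2."""
    d2 = ((X[:, None, :] - H[None, :, :]) ** 2).sum(axis=2)  # d2[i, c]
    return float((U * d2).sum())

X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]])  # I = 3 units
labels = np.array([0, 0, 1])                          # hard assignment, C = 2
U = np.eye(2)[labels]                                 # one-hot memberships u_ic
H = (U.T @ X) / U.sum(axis=0)[:, None]                # centroids: [[1, 0], [10, 0]]

objective = sse_objective(X, U, H)  # 1 + 1 + 0 = 2.0
```

The first two units each lie at squared distance 1 from their centroid $(1, 0)$, and the third coincides with its centroid, so the criterion equals 2.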

## c-Medoids clustering method

In the c-medoids clustering method, also known as partitioning around medoids (PAM) (Kaufman and Rousseeuw, 1987, 1990) and often referred to as k-medoids, units are classified into clusters that are each represented by one of the data points in the cluster. These data points are the prototypes, the so-called medoids. Each medoid synthesizes the cluster information and represents the prototypal features of the units belonging to its cluster. The c-medoids clustering method minimizes an objective function given by the sum (or, equivalently, the average) of the dissimilarities of the units to their closest representative units. The method first computes a set of representative units, the medoids. After the set of medoids has been found, each unit of the data set is assigned to the nearest medoid. The algorithm suggested by Kaufman and Rousseeuw (1990) for the c-medoids clustering method proceeds in two phases:

Phase 1 (BUILD): This phase sequentially selects $C$ “centrally located” units to be used as initial medoids.

Phase 2 (SWAP): If the objective function can be reduced by interchanging (swapping) a selected unit with an unselected unit, then the swap is carried out. This is continued until the objective function can no longer be decreased.

Denoting the set of $I$ units by $\mathbf{X}$ (the set of observations) and a subset of $\mathbf{X}$ with $C$ units by $\tilde{\mathbf{X}}$ (the set of medoids), where $C \ll I$, the model can be formalized as follows:
$$\begin{array}{l} \min : \sum_{i=1}^{I} \sum_{c=1}^{C} u_{i c} d_{i c}^{2}=\sum_{i=1}^{I} \sum_{c=1}^{C} u_{i c}\left\|\mathbf{x}_{i}-\tilde{\mathbf{x}}_{c}\right\|^{2} \\ \sum_{c=1}^{C} u_{i c}=1, \quad u_{i c} \geq 0, \quad u_{i c} \in\{0,1\} \end{array}$$
where $u_{i c}$ indicates the membership degree of the $i$-th unit to the $c$-th cluster; $u_{i c} \in\{0,1\}$, that is, $u_{i c}=1$ when the $i$-th unit belongs to the $c$-th cluster and $u_{i c}=0$ otherwise; $d_{i c}^{2}=\left\|\mathbf{x}_{i}-\tilde{\mathbf{x}}_{c}\right\|^{2}$ indicates the squared Euclidean distance between the $i$-th object and the medoid of the $c$-th cluster.
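The two phases can be sketched as follows. This is a deliberately naive version that assumes a precomputed dissimilarity matrix `D` and recomputes the full objective for every candidate, rather than using the incremental bookkeeping of the original algorithm. The names `pam` and `total_cost` are hypothetical, and the SWAP phase here performs the single best improving swap per pass.

```python
import numpy as np

def total_cost(D, medoids):
    """Sum over all units of the dissimilarity to their closest medoid."""
    return D[:, medoids].min(axis=1).sum()

def pam(D, C):
    I = D.shape[0]
    # BUILD: start with the most central unit, then greedily add the unit
    # that lowers the objective the most.
    medoids = []
    for _ in range(C):
        candidates = [h for h in range(I) if h not in medoids]
        medoids.append(min(candidates, key=lambda h: total_cost(D, medoids + [h])))
    # SWAP: exchange a medoid with a non-medoid while this reduces the objective.
    cost = total_cost(D, medoids)
    while True:
        best_cost, best_trial = cost, None
        for pos in range(C):
            for h in range(I):
                if h in medoids:
                    continue
                trial = medoids.copy()
                trial[pos] = h
                trial_cost = total_cost(D, trial)
                if trial_cost < best_cost:
                    best_cost, best_trial = trial_cost, trial
        if best_trial is None:  # no swap decreases the objective
            break
        medoids, cost = best_trial, best_cost
    labels = D[:, medoids].argmin(axis=1)  # assign each unit to its nearest medoid
    return medoids, labels, cost

# Illustrative data: two groups of three points; D holds squared Euclidean distances.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
medoids, labels, cost = pam(D, C=2)
```

Because the medoids are always actual units, the same code works with any dissimilarity matrix, not only squared Euclidean distances.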
