### Algorithms for Community Detection

## Comparing a Quality Function

Instead of comparing the output of an algorithm on networks with known community structure, one may compare the results of different algorithms by means of a quality function that scores an assignment of nodes to communities. Newman and Girvan [23] have proposed the following measure of the “modularity” of a community structure with $q$ groups:
$$Q=\sum_{s=1}^{q}\left(e_{ss}-a_{s}^{2}\right), \quad \text{with} \quad a_{s}=\sum_{r=1}^{q} e_{rs} .$$
Here, $e_{rs}$ is the fraction of all edges that connect nodes in group $r$ to nodes in group $s$, and hence $e_{ss}$ is the fraction of edges connecting the nodes of group $s$ internally. It follows that $a_{s}$ is the fraction of all edge ends attached to nodes in group $s$, and $a_{s}^{2}$ is to be interpreted as the expected fraction of links falling between nodes of group $s$ if links were distributed at random. Note the similarity of this measure to the assortativity coefficient defined earlier. It is clear that $-1<Q<1$.
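As a minimal sketch of the definition (our illustration, not code from the original text), the following function computes $Q$ for an undirected, unweighted graph given as an edge list together with a hypothetical `membership` dictionary that maps each node to a group label. Each edge end contributes $1/(2m)$ to $a_s$ of the group it attaches to, where $m$ is the number of edges, so that $\sum_s a_s = 1$.

```python
def modularity(edges, membership):
    """Newman-Girvan modularity Q = sum_s (e_ss - a_s**2).

    edges: list of (u, v) pairs of an undirected, unweighted simple graph.
    membership: dict mapping each node to its group label (hypothetical name).
    """
    m = len(edges)
    e_ss = {}  # fraction of edges with both ends inside group s
    a = {}     # fraction of edge ends attached to nodes of group s
    for u, v in edges:
        gu, gv = membership[u], membership[v]
        if gu == gv:
            e_ss[gu] = e_ss.get(gu, 0.0) + 1.0 / m
        # each endpoint carries half of the edge's weight
        a[gu] = a.get(gu, 0.0) + 0.5 / m
        a[gv] = a.get(gv, 0.0) + 0.5 / m
    return sum(e_ss.get(s, 0.0) - a_s ** 2 for s, a_s in a.items())


# Two triangles {0,1,2} and {3,4,5} joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, split))                          # ≈ 0.357 (exactly 5/14)
print(modularity(edges, {v: "all" for v in range(6)}))   # ≈ 0 (one single group)
```

Putting every node into one group always gives $Q \approx 0$, since then $e_{ss} = a_s = 1$; the split into the two triangles scores higher because few edges run between the groups.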

This modularity measure will play a central role in the following chapters, and it is a natural idea to optimize the assignment of nodes to communities directly by maximizing the modularity of the resulting partition.
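To make the idea of direct optimization concrete, here is a brute-force sketch (our own illustration, not a method described in the text) that enumerates every partition of a small node set and keeps the one with the largest $Q$. The number of partitions grows as the Bell numbers, so exhaustive search is only feasible for a handful of nodes; the helper names `set_partitions` and `best_partition` are hypothetical.

```python
def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # put `first` into each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or give it a block of its own
        yield [[first]] + part


def modularity(edges, blocks):
    """Q = sum_s (e_ss - a_s**2) for an undirected, unweighted edge list."""
    m = len(edges)
    label = {v: s for s, block in enumerate(blocks) for v in block}
    q = 0.0
    for s in range(len(blocks)):
        e_ss = sum(label[u] == label[v] == s for u, v in edges) / m
        a_s = sum((label[u] == s) + (label[v] == s) for u, v in edges) / (2 * m)
        q += e_ss - a_s ** 2
    return q


def best_partition(nodes, edges):
    """Exhaustively search all partitions of `nodes` for maximal modularity."""
    best = max(set_partitions(list(nodes)), key=lambda p: modularity(edges, p))
    return best, modularity(edges, best)


# Two triangles joined by a single edge; 6 nodes give only 203 partitions.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
blocks, q = best_partition(range(6), edges)
print(blocks, q)  # the two triangles, Q ≈ 0.357
```

For anything beyond toy sizes one would replace the exhaustive `max` by a heuristic search over partitions; the scoring function stays the same.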

## Hierarchical Algorithms

A large number of heuristic approaches to community detection have been proposed by computer scientists. These developments generally follow the lines of the algorithms developed for clustering multivariate data [24-26]. Typically, the problem is approached by a recursive min-cut technique that partitions a connected graph into two parts while minimizing the number of edges cut $[27,28]$. Such treatments, however, tend to produce highly unbalanced partitions, since the minimum cut is usually obtained by cutting off only a very small subgraph [29]. A number of penalty functions have been suggested to overcome this problem and balance the sizes of the subgraphs resulting from a cut, among them ratio cuts $[29,30]$, normalized cuts [31] and min-max cuts [32].
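The skew of the plain minimum cut, and how a balancing penalty removes it, can be seen on a toy graph. The sketch below scores every bipartition $(A, B)$ by its cut size, by a ratio cut in the common form $\mathrm{cut}/|A| + \mathrm{cut}/|B|$, and by a normalized cut $\mathrm{cut}/\mathrm{vol}(A) + \mathrm{cut}/\mathrm{vol}(B)$, where $\mathrm{vol}$ is the sum of node degrees. These exact formulas are one standard convention assumed here (the cited papers differ in details), and the graph, two 4-cliques joined by two edges plus one pendant node, is made up for illustration.

```python
from itertools import combinations


def cut_size(edges, nodes, side_a):
    """Number of edges with exactly one endpoint in side_a."""
    return sum((u in side_a) != (v in side_a) for u, v in edges)


def ratio_cut(edges, nodes, side_a):
    """cut/|A| + cut/|B|: penalizes splits with a tiny side."""
    c = cut_size(edges, nodes, side_a)
    return c / len(side_a) + c / len(nodes - side_a)


def normalized_cut(edges, nodes, side_a):
    """cut/vol(A) + cut/vol(B), vol = sum of degrees (assumes no isolated nodes)."""
    vol_a = sum((u in side_a) + (v in side_a) for u, v in edges)
    vol_b = 2 * len(edges) - vol_a
    c = cut_size(edges, nodes, side_a)
    return c / vol_a + c / vol_b


def bipartitions(nodes):
    """Yield side A of every split into two non-empty parts, each split once."""
    first, *rest = sorted(nodes)
    for k in range(len(rest)):  # side B always keeps at least one node
        for extra in combinations(rest, k):
            yield frozenset((first, *extra))


def best_split(edges, nodes, score):
    """Exhaustive search for the bipartition minimizing `score`."""
    return min(bipartitions(nodes), key=lambda a: score(edges, nodes, a))


# Toy graph: 4-cliques {0..3} and {4..7} joined by two edges, plus pendant node 8.
edges = (list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2))
         + [(3, 4), (2, 5), (0, 8)])
nodes = frozenset(range(9))
for score in (cut_size, ratio_cut, normalized_cut):
    side_a = best_split(edges, nodes, score)
    print(score.__name__, sorted(nodes - side_a))
```

The plain minimum cut (one edge) just cuts off the pendant node 8, whereas both balanced criteria prefer cutting the two bridging edges, which separates the cliques $\{0,1,2,3,8\}$ and $\{4,5,6,7\}$.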

The clustering algorithm devised by Girvan and Newman (GN) [17] was the first to introduce the problem of community detection to physicists working on complex networks. As is often the case, the paper's impact stemmed not only from the algorithm itself but also from the well-chosen example illustrating its application. GN's algorithm is based on “edge betweenness”, a concept again borrowed from sociology. Given all geodesic (shortest) paths between all pairs of nodes in the network, the betweenness of an edge is the number of such paths that run across it. Intuitively, betweenness is a measure of centrality, and since it is built from geodesic paths it also brings a notion of distance to the graph. Edges that link different communities must carry many of these paths, so they tend to have the highest betweenness. The GN algorithm calculates the edge betweenness of all edges in the graph and removes the edge with the highest betweenness; the betweenness values of all remaining edges are then recalculated. This process is repeated until the network splits into two disconnected components, and the procedure then starts over on each of the two components until only single nodes remain. The algorithm thus belongs to the class of recursive partitioning algorithms, and its output is generally depicted as a dendrogram showing the order in which the network is split.
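A minimal pure-Python sketch of the first GN split follows, assuming an unweighted, undirected simple graph stored as a dict of neighbour sets. Edge betweenness is computed with Brandes' accumulation scheme (a standard technique swapped in here for efficiency), with several shortest paths between the same pair sharing that pair's unit of weight equally. Applying `girvan_newman_split` again to each resulting component would build the full dendrogram; the function names are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness of an unweighted, undirected simple graph.

    adj maps each node to the set of its neighbours. Shortest paths between a
    pair of nodes share one unit of weight equally (Brandes-style accumulation).
    """
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s: distances, path counts, predecessors.
        dist, sigma, preds = {s: 0}, {s: 1}, {s: []}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, pushing path credit onto edges.
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += share
                delta[v] += share
    # Each unordered pair of endpoints was counted once from each side.
    return {e: b / 2 for e, b in bet.items()}


def connected_components(adj):
    """Return the connected components as a list of frozensets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, stack = {start}, [start]
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    stack.append(w)
        comps.append(frozenset(comp))
    return comps


def girvan_newman_split(adj):
    """Remove the highest-betweenness edge, recompute, and repeat until the
    graph breaks into more components than it started with."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    n_start = len(connected_components(adj))
    while len(connected_components(adj)) == n_start:
        bet = edge_betweenness(adj)
        if not bet:  # no edges left to remove
            break
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return connected_components(adj)


# Two triangles joined by the bridge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {v: set() for v in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
print(edge_betweenness(adj)[frozenset((2, 3))])  # 9.0: all 3 x 3 cross pairs use it
print(girvan_newman_split(adj))                  # the triangles {0,1,2} and {3,4,5}
```

Recomputing all betweenness values after every removal is what makes GN expensive on large graphs, but it matters: removing one edge can reroute many shortest paths onto another inter-community edge.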

Figure $2.5$ illustrates the algorithm with the example chosen by GN [17]. The network shown displays the friendships among the members of a karate club at a US university, compiled by the anthropologist Zachary [18] over a period of 2 years. During the observation period, an internal dispute between the manager (node 34) and the instructor of the club (node 1) led to the split of the club. Roughly half of the members joined the instructor in forming a new club, while the other half stayed with the manager, who hired a new instructor. It turns out that the first split induced by the GN algorithm corresponds almost exactly to the observed split among the club members. This led to the conclusion that the split could be “predicted” from the topology of the network and that the GN algorithm is able to make such predictions. As far as the definition of community is concerned, the algorithm induces a hierarchy of communities, since at any stage of the algorithm each set of connected nodes is to be understood as a community.

## Semi-hierarchical

The hierarchical methods cited so far assume a nested hierarchy of communities. One of the few methods that allow for overlapping communities is the clique percolation method of Palla et al. $[8,22]$, which was introduced earlier. Even though the method allows a node to belong to more than one community, communities resulting from $(k+1)$-clique percolation are always contained within $k$-clique communities. It is never possible for the nodes contained in the overlap of two communities to form a community of their own. Another problem of this method is its dependence on the existence of triangles in the network: nodes that are not connected to a community via triangles can never be part of it, and only nodes with at least $k-1$ links can be part of a $k$-clique at all. Also, the method may easily be misled by the addition or removal of single links, since a single link may be responsible for joining two communities into one. Clearly, this situation is unsatisfactory in the case of noisy data.
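Both the overlap property and the sensitivity to single links can be checked on a toy graph. The brute-force sketch below (our illustration; `clique_percolation` is a hypothetical name, not meant to reproduce any library's implementation) enumerates all $k$-cliques, joins any two that share $k-1$ nodes, and returns each community as the union of the cliques in one percolation cluster. Because it tests every $k$-node subset, it is only suitable for very small graphs.

```python
from itertools import combinations


def clique_percolation(edges, k):
    """Communities of k-clique percolation on a small undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Brute force: every k-subset whose members are pairwise linked is a k-clique.
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(b in adj[a] for a, b in combinations(c, 2))]
    # Union-find: two k-cliques are adjacent if they share k - 1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)
    # A community is the union of all cliques in one percolation cluster.
    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return {frozenset(c) for c in communities.values()}


# Two triangles sharing node 2, plus node 5 hanging off node 4 (in no triangle).
bowtie = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4), (4, 5)]
print(clique_percolation(bowtie, 3))             # {0,1,2} and {2,3,4}; node 2 in both
print(clique_percolation(bowtie + [(1, 3)], 3))  # one extra link merges them
```

With $k=3$ node 2 belongs to both communities, node 5 belongs to none because it sits in no triangle, and adding the single link $(1,3)$ creates the triangle $\{1,2,3\}$ that percolates the two communities into one.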

