A Novel Clustering Method for Breaking Down the Symmetric Multiple Traveling Salesman Problem

Purpose: This study proposes a new two-stage clustering method to break down the symmetric multiple traveling salesman problem (mTSP) into several single standard traveling salesman problems, each of which can then be solved separately using a heuristic optimization algorithm. Design/methodology/approach: In the initial stage, a modified form of factor analysis is used to identify clusters of cities. In the second stage, the cities are allocated to the identified clusters using an integer-programming model. A comparison with the k-means++ clustering algorithm, one of the most popular clustering algorithms, was made to evaluate the performance of the proposed method in terms of four objective criteria. Findings: Computational results and comparisons on 63 problems revealed that the proposed method is promising for producing quality clusters and thus for enhancing the performance of heuristic optimization algorithms in solving the mTSP. Originality/value: Unlike previous studies, this study tackles the issue of improving the performance of clustering-based optimization approaches in solving the mTSP by proposing a new clustering method that produces better cluster solutions rather than by proposing a new or improved version of a heuristic optimization algorithm for finding optimal routes.


The Context
The traveling salesman problem (TSP) concerns the determination of the shortest possible route connecting multiple cities such that a salesman visits each city on his route exactly once and then returns to his city of origin. A more complex version of the TSP is the multiple traveling salesman problem (mTSP), in which m salesmen must visit n cities, and each salesman must begin at and return to the same city. The mTSP is more challenging than the TSP because it requires identifying which cities must be allotted to each salesman as well as the corresponding optimal routes (Carter & Ragsdale, 2006). Several variants of the mTSP exist, depending on the number of depots involved (multiple or single); whether the paths are open or closed; whether the number of salesmen is set a priori, limited, or minimized; and whether additional constraints must be considered, such as a particular time frame for visiting each city, the maximum number of cities to be assigned to each salesman, the maximum and/or minimum allowable distance that a salesman may travel, and other issues (Bektas, 2006). The focus of this study is the single-depot symmetric mTSP. Throughout the rest of this paper, therefore, this variant is referred to as mTSP unless otherwise indicated.
In the literature, several procedures have been developed for solving the mTSP, which is recognized as an NP-hard problem in combinatorial optimization (Bektas, 2006). One type of procedure uses a transformation by which the mTSP is converted to a single standard TSP (Gorenstein, 1970). However, this type of conversion is not efficient because the resulting TSP can be highly degenerate, particularly as the number of salesmen grows (Bektas, 2006; Chandran, Narendran & Ganesh, 2006). An alternative transformation procedure that has proven more efficient is to break down the mTSP into several standard TSPs, equal in number to the number of salesmen in the problem, using a clustering algorithm; this allows each TSP to be solved individually using any heuristic optimization algorithm (Sofge, Schultz & De Jong, 2002). This two-stage procedure is referred to in this paper as a clustering-based optimization approach. A review of the studies that have proposed using this approach to solve the mTSP is presented in the following section.

Research Gaps and Justification for the Study
The performance of a clustering-based optimization approach to solving the mTSP depends on two components: (1) the performance of the clustering algorithm used to produce quality clusters; and (2) the performance of the heuristic optimization algorithm used to find optimal routes in minimal computation time. Most researchers have paid little attention to the first component, as indicated by the fact that most clustering-based optimization approaches use the k-means clustering algorithm for forming clusters together with a new or improved version of a heuristic optimization algorithm for finding optimal routes. This study fills this gap and contributes to the literature by proposing a new clustering method as a means of improving the performance of clustering-based optimization approaches to solving the mTSP, under the following assumptions:
• All salesmen must begin and finish at one common single depot.
• Each salesman is assigned to a number of cities without considering any constraints.
• Each salesman must visit each of the assigned cities once per tour.
• The Euclidean metric is used to measure the distance between every pair of cities.
The remainder of this paper is organized as follows. Section 2 is a brief review of most of the relevant literature. Section 3 details the proposed method. Section 4 presents the numerical study used to evaluate the performance of the proposed method. Finally, Section 5 concludes the paper and proposes opportunities for future study.

Clustering-Based Optimization Approaches
Many studies have proposed a two-stage solution procedure to improve the performance of heuristic optimization algorithms in solving large mTSPs. The first stage uses a clustering method to group the cities into several clusters, whereas the second stage involves finding a Hamiltonian route within each cluster using a heuristic optimization algorithm. One of the earliest studies to propose solving the mTSP using a clustering-based optimization approach was that of Sofge et al. (2002): They used the neighborhood attractor schema combined with different heuristic optimization algorithms, including a shrink-wrap algorithm and an assortment of evolutionary computation algorithms. The resulting combinations were tested on three different problems, and the encouraging results opened a new avenue of research in solving the mTSP (Sofge et al., 2002). Since then, several studies have developed clustering-based optimization approaches that combine a clustering algorithm with one or more heuristic optimization algorithms. These studies are summarized in Table 1. Two general observations can be drawn from Table 1. First, to break down the mTSP, all of the studies used either the classical version of the k-means or one of its variants: 10 of the 11 studies (91 percent) used the standard version of the k-means, and only one study (Sofge et al., 2002) used the neighborhood attractor schema, a variant of the k-means. Second, the majority of the studies (72.7%) used genetic algorithms to find the optimal routes.
In addition to the studies presented in Table 1, Chandran et al. (2006) and Ma et al. (2018) developed their own clustering algorithms to break down the mTSP. However, the performance of Chandran et al.'s (2006) algorithm was not evaluated with respect to the extent to which it improves the performance of heuristic optimization algorithms in solving the mTSP; it was evaluated only with respect to producing quality clusters, using a single measure (relative percentage deviation). Ma et al. (2018) evaluated the performance of their clustering algorithm only in terms of total travel distance. This evaluation was achieved by combining the clustering algorithm with integer linear programming to determine the optimal routes.

The k-Means Algorithm
The standard version of this algorithm, Lloyd's algorithm, is named for Stuart Lloyd, who first proposed it in 1957 (Morissette & Chartier, 2013). Because of its efficiency in grouping large data sets into good-quality clusters, the k-means is by far the most extensively used clustering algorithm (Kovács et al., 2018; Morissette & Chartier, 2013; Singh, Malik & Sharma, 2012).
In the k-means algorithm, a large number of data points are grouped into a number of clusters so as to minimize the intra-cluster variance. The first step of this algorithm is to randomly choose the locations of the initial centroids for the required number of clusters. Each data point is then assigned to its nearest centroid, which is subsequently updated based on the data points allotted to its cluster. This process continues until the values of the centroids stabilize.
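The loop described above can be sketched in a few lines of NumPy. This is a generic illustration of Lloyd's algorithm, not the implementation used in any of the cited studies:

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: random initial centroids, then
    alternating assignment and centroid-update steps until stable."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct random points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points.
        new_centroids = np.array([points[labels == u].mean(axis=0)
                                  for u in range(k)])
        if np.allclose(new_centroids, centroids):  # centroids stabilized
            break
        centroids = new_centroids
    return labels, centroids
```

Note that this minimal sketch does not guard against the empty-cluster case discussed below.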
The literature has highlighted several limitations of the k-means algorithm (see Kaur & Kaur, 2013; Singh et al., 2012). These limitations include its sensitivity to the presence of outliers, its tendency to form unbalanced or even empty clusters, and its heavy dependence on initialization (the selection of initial centroids), which means that poor initialization can lead to poor performance (Fränti & Sieranoja, 2019). The last limitation can be surmounted by rerunning the algorithm many times with different random locations for the starting centroids in order to identify the best clustering solution, or by using a better initialization technique. In this regard, several variants of the standard version of the k-means have been proposed in the literature, including Forgy's algorithm (Forgy, 1965), MacQueen's algorithm (MacQueen, 1967), the neighbourhood attractor schema (Sofge et al., 2002), the k-means++ algorithm (Arthur & Vassilvitskii, 2007), and the seeding algorithm for the k-means problem with penalties (Li, Xu, Yue, Zhang & Zhang, 2020; Li, Xu, Zhang & Zhou, 2020). These variants have proven effective to some extent. However, there is still a need to run the algorithm multiple times, with different initial centroids, and to average the results to obtain a more stable overall result (Morissette & Chartier, 2013). Unfortunately, there is no method to help determine the number of runs required to guarantee the best results. Moreover, there is no consensus on which variant works best (Fränti & Sieranoja, 2019).

The Proposed Method
As the breaking down of the mTSP can be considered a problem of dimensionality reduction in which a few independent clusters are formed from a large number of cities in order to minimize the total intra-cluster distance, it is proposed to use a two-stage method. In the initial stage, clusters of cities are identified using a modified form of factor analysis (FA). This stage is then followed by the use of a simple integer-programming model to allot the cities to the identified clusters. The selection of this method was inspired by the successful implementation of similar methods in different applications (e.g., Albadawi, Bashir & Chen, 2005; Bashir & Karaa, 2008; Hamdan & Bashir, 2015). It is worth noting that in FA terminology, clusters are called factors; therefore, these two terms are used interchangeably throughout the rest of the paper.

Stage 1: Identification of Clusters
FA is a dimensionality reduction technique developed by Spearman in 1904 to extract a few independent factors from the correlation matrix of a large number of interrelated variables using different extraction methods, including canonical factor analysis, common factor analysis, image factor analysis, and principal component analysis (PCA). In this study, PCA was selected as the extraction method because it is straightforward, quantitatively rigorous, and popular (Rummel, 1988). The factors extracted using PCA comprise uncorrelated linear combinations of the initial variables. The variances accounted for by the first few factors typically represent a high percentage of the total variance of the original variables, which means that these variables can be assigned to a few independent factors. However, because the factors obtained using PCA are commonly correlated with many original variables, rotation methods are applied to ensure that each variable is related to only one factor and that each factor is highly correlated with just a small number of variables. A more detailed description of FA can be found in Rummel (1988).
The application of the modified form of FA to identify clusters of cities involves the generation of the matrix of relative distances, the extraction of preliminary clusters, and the obtaining of the final clusters. In this sub-section, a simple hypothetical example is used to explain these steps.

Generation of the relative distance matrix
As mentioned above, the input for FA is a correlation matrix created from an original data set. In the proposed method, this matrix is replaced by one referred to as the relative distance (RD) matrix, in which the principal diagonal elements are 1 and every off-diagonal element R_ij is defined by equation (1):

R_ij = 1 − d_ij / d_max    (1)

where d_ij is the distance between any pair of cities i and j, and d_max is the maximum distance among the cities.
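As an illustration, the RD matrix can be computed as below. The exact form of equation (1) is an assumption here: R_ij = 1 − d_ij/d_max is the natural choice consistent with the stated unit diagonal, since d_ii = 0:

```python
import numpy as np

def relative_distance_matrix(coords):
    """Build the RD matrix from city coordinates: pairwise Euclidean
    distances rescaled so the diagonal is 1 (assumed form:
    R_ij = 1 - d_ij / d_max)."""
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.linalg.norm(diff, axis=2)   # d_ij, with d_ii = 0
    return 1.0 - d / d.max()           # R_ii = 1; farthest pair -> 0
```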

Extraction of the preliminary clusters
In this step, equation (2) is used to calculate the eigenvalues and associated eigenvectors (factor loadings) of the RD matrix. By matrix theory, because this matrix is real and symmetric, it has real eigenvalues, and the corresponding eigenvectors are independent.

(RD − λ_i I) Y_i = 0    (2)

where RD is the relative distance matrix of size n × n, I is the identity matrix, λ_i is the characteristic root (eigenvalue), and Y_i is the corresponding eigenvector.
The eigenvalues calculated for the RD matrix (Table 3) are presented in descending order in Table 4. The total variance explained by each cluster is provided in the first column, which is labelled "eigenvalue." The second column presents the percentage of the total variance attributable to each cluster. The last column, the cumulative percentage, shows the percentage of variance attributable to that cluster and all those preceding it in the table.
Of these nine clusters, only those with the k highest eigenvalues need to be selected to form the preliminary clusters, where k represents the desired number of clusters to be formed. For example, if three clusters are to be formed, then the clusters corresponding to the three highest eigenvalues are chosen, and so on.
For the hypothetical example, in which it was assumed that two clusters needed to be formed, the preliminary clusters are shown in Table 5. These two clusters are the eigenvectors corresponding to the two largest eigenvalues (5.413 and 1.798) of the RD matrix. The elements of the eigenvectors are called loadings, and each city was assigned to the cluster associated with its highest absolute loading.
Table 5. Preliminary clusters of the hypothetical example
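A minimal sketch of this extraction step, assuming the eigendecomposition-and-argmax procedure described above (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def preliminary_clusters(RD, k):
    """Extract the k eigenvectors of the symmetric RD matrix with the
    largest eigenvalues; each city joins the cluster (eigenvector) on
    which it has the largest absolute loading."""
    eigvals, eigvecs = np.linalg.eigh(RD)    # eigh: ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    loadings = eigvecs[:, top]               # n x k loading matrix
    labels = np.abs(loadings).argmax(axis=1)
    return labels, loadings
```

On a nearly block-diagonal RD matrix this already separates the groups; in harder cases the loadings remain ambiguous, which is exactly why the rotation step below is needed.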

Obtaining the Final Clusters
Determining which cities should be allocated to each cluster on the basis of the preliminary clusters is not easy in many cases, because a city might have a similar absolute value of loading on more than one cluster. In PCA, this problem is commonly overcome using a rotation technique such as varimax. Kaiser (1960) provides more details about this technique.
Applying the varimax technique to the matrix of preliminary clusters detailed in Table 5 produced the matrix of final clusters presented in Table 6. As shown in Table 6, Cities 1, 4, 7, and 9 had the largest loadings in cluster 1, while Cities 2, 3, 5, 6, and 8 had the largest loadings in cluster 2. Thus, the best grouping for the nine cities was to place Cities 1, 4, 7, and 9 in one cluster and Cities 2, 3, 5, 6, and 8 in another.
Table 6. Final clusters of the hypothetical example
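The varimax rotation itself can be sketched with the standard SVD-based iteration; this is the textbook formulation, not necessarily the exact routine used in the study:

```python
import numpy as np

def varimax(loadings, n_iter=100, tol=1e-8):
    """Standard SVD-based varimax: find the orthogonal rotation R that
    maximizes the variance of the squared loadings within each column,
    driving each row toward a single dominant loading."""
    L = np.asarray(loadings, dtype=float)
    n, k = L.shape
    R = np.eye(k)
    crit_old = 0.0
    for _ in range(n_iter):
        rotated = L @ R
        # Gradient-like target matrix for the varimax criterion.
        B = L.T @ (rotated**3 - rotated * (rotated**2).sum(axis=0) / n)
        u, s, vt = np.linalg.svd(B)
        R = u @ vt                      # orthogonal Procrustes update
        crit_new = s.sum()
        if crit_new - crit_old < tol:   # criterion has stabilized
            break
        crit_old = crit_new
    return L @ R
```

Because R is orthogonal, the rotation preserves the total (Frobenius) size of the loading matrix while redistributing loadings toward simple structure.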

Stage 2: Assigning Cities to Clusters
If the problem is small, then employing the preceding steps is enough to form the clusters. However, assigning the cities to clusters manually is impractical for large problems. This task can be facilitated using a simple binary integer programming model, such as the following:

Maximize
Σ_{i=1}^{n} Σ_{u=1}^{k} w_iu x_iu    (3)

Subject to

Σ_{u=1}^{k} x_iu = 1,  i = 1, …, n;  x_iu ∈ {0, 1}    (4)

where k is the number of clusters, n is the total number of cities, w_iu is the loading of city i on cluster u, and x_iu is 1 if city i is allotted to cluster u and 0 otherwise.
Realistic constraints can easily be incorporated into this model. For instance, if the maximum number of cities that can be visited by a salesman in a cluster is bounded, then the following constraint can be added:

Σ_{i=1}^{n} x_iu ≤ l_u,  u = 1, …, k    (5)

where l_u is the maximum number of cities a salesman can visit in cluster u.
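A small illustration of the assignment model: without the capacity constraint the optimum reduces to a per-city argmax over the loadings, while with a capacity bound a proper MILP solver would be used in practice. The brute-force branch below is only for tiny examples and is an illustrative stand-in, not the paper's model implementation:

```python
import itertools
import numpy as np

def assign_cities(w, capacity=None):
    """Maximize sum_i sum_u w[i, u] * x[i, u] with each city assigned
    to exactly one cluster. Without capacities, the optimum is the
    per-city argmax; with a uniform per-cluster capacity, this
    enumerates all assignments (feasible only for tiny instances)."""
    n, k = w.shape
    if capacity is None:
        return w.argmax(axis=1)
    best, best_val = None, -np.inf
    for assign in itertools.product(range(k), repeat=n):
        counts = np.bincount(assign, minlength=k)
        if (counts <= capacity).all():                 # constraint (5)
            val = w[np.arange(n), assign].sum()        # objective (3)
            if val > best_val:
                best, best_val = np.array(assign), val
    return best
```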

Performance Evaluation
This section provides a summary of the evaluation of the performance of the proposed method with regard not only to the quality of the formed clusters but also to the extent to which the method improves the performance of heuristic optimization algorithms in solving the mTSP. This evaluation was conducted by solving test problems without considering any constraints, using two combinations: the proposed method with a genetic algorithm (FA-GA) and the k-means++ with a genetic algorithm (KM-GA).
The main reason for comparing our method with the k-means++ was that it outperforms the standard k-means (Arthur & Vassilvitskii, 2007), which was used in most studies proposing clustering-based optimization approaches to solve the mTSP (see Table 1). The difference between these two algorithms lies in their initialization methods. The following steps summarize the k-means++ initialization method:
1. One centroid, c_1, is selected randomly from the set of data, X.
2. For each data point x ∈ X, the distance d(x) to the closest selected centroid is computed.
3. A new centroid c_i is selected using a weighted probability proportional to d(x)².
4. Steps 2 and 3 are repeated until the centroids for all clusters are identified.
5. The steps of the standard k-means algorithm are then followed.
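The seeding steps above can be sketched as follows (a generic illustration of k-means++ initialization, not the exact code used in the study):

```python
import numpy as np

def kmeanspp_init(X, k, seed=0):
    """k-means++ seeding: the first centroid is uniform-random; each
    subsequent centroid is drawn with probability proportional to
    d(x)^2, the squared distance to the nearest centroid so far."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]                       # step 1
    for _ in range(k - 1):
        d2 = np.min([((X - c)**2).sum(axis=1) for c in centroids],
                    axis=0)                                     # step 2
        probs = d2 / d2.sum()                                   # step 3
        centroids.append(X[rng.choice(len(X), p=probs)])
    return np.array(centroids)                                  # step 4
```

Already-chosen points have d(x)² = 0 and thus zero probability, so the k seeds are always distinct.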
To evaluate the extent to which the proposed clustering method improves the performance of heuristic optimization algorithms in solving the mTSP, it was necessary to combine the proposed method with a selected heuristic optimization algorithm and compare the performance of this combination with that of a combination of the k-means++ and the same algorithm. Because it is beyond the scope of this study to propose a heuristic optimization algorithm for determining the optimal routes, a basic genetic algorithm (an open-source MATLAB code provided by Kirk (2014)) was used. The major steps of this algorithm are as follows:
1. Create a starting population of 80 solutions.
2. Iterate the following steps:
a. Evaluate the cost of the current population of solutions;
b. Identify the best solution in the population; and
c. Modify the population by randomly putting the solutions in groups of eight and performing mutations on seven of the eight, while keeping the best of the eight to pass along to the next generation.
3. Use the best solution found.
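The listed steps can be sketched as below. This is an illustrative Python re-sketch, not Kirk's MATLAB code; in particular, the 2-opt style segment reversal used as the mutation operator is an assumption:

```python
import numpy as np

def ga_tsp(dist, pop_size=80, n_gen=500, seed=0):
    """Mutation-only GA in the spirit of the steps above: random
    permutations, split each generation into groups of 8; the best of
    each group survives and the other 7 slots get mutated copies."""
    rng = np.random.default_rng(seed)
    n = len(dist)
    pop = [rng.permutation(n) for _ in range(pop_size)]  # step 1

    def cost(tour):
        # Closed-tour length: sum of legs, including last -> first.
        return dist[tour, np.roll(tour, -1)].sum()

    def mutate(tour):
        t = tour.copy()
        i, j = sorted(rng.choice(n, 2, replace=False))
        t[i:j + 1] = t[i:j + 1][::-1]   # reverse a random segment
        return t

    for _ in range(n_gen):              # step 2
        rng.shuffle(pop)                # random groups of eight
        new_pop = []
        for g in range(0, pop_size, 8):
            group = pop[g:g + 8]
            best = min(group, key=cost)        # 2a, 2b
            new_pop.append(best)               # elite survives
            new_pop.extend(mutate(best) for _ in range(7))  # 2c
        pop = new_pop
    best = min(pop, key=cost)           # step 3
    return best, cost(best)
```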

Evaluation Criteria
Four objective criteria that are commonly utilized in the literature were chosen for performance evaluation. These are: 1) the sum of the squared error; 2) the variability of cluster size; 3) the total traveling distance; and 4) the running time. The first two criteria were used to evaluate performance with respect to the obtained clustering solutions, whereas the other two criteria were used to evaluate performance with respect to the optimal route solutions obtained.

The Sum of the Squared Error
The Sum of the Squared Error (SSE) is commonly used to measure intra-cluster variance, the most straightforward and commonly utilized criterion for measuring clustering quality. For the purposes of this study, SSE is the sum of the squared distances between the coordinates of the points (cities) and the coordinates of the centroids of the corresponding clusters (Cao, Liang & Jiang, 2009). It is calculated as:

SSE = Σ_{u=1}^{k} Σ_{i ∈ cluster u} Σ_{r=1}^{N} (c_ir − C_ur)²    (6)

where C_ur is the coordinate of the centroid of cluster u along dimension r, c_ir is the coordinate of city i along dimension r, k is the number of clusters, and N is the number of dimensions. According to equation (6), SSE decreases as the number of clusters grows, because an increase in the number of clusters leads to a decrease in cluster sizes and thus to a reduction in the SSE value.
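A direct implementation of the SSE criterion might look like this (names are illustrative):

```python
import numpy as np

def sse(points, labels, centroids):
    """Sum over clusters of squared Euclidean distances from each city
    to its cluster centroid (equation (6))."""
    return sum(((points[labels == u] - c)**2).sum()
               for u, c in enumerate(centroids))
```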

Variability of Cluster Size
As mentioned earlier, one common limitation of the k-means is the formation of unbalanced or even empty clusters (Kaur & Kaur, 2013; Morissette & Chartier, 2013; Singh et al., 2012). This limits its practicality for some applications of the mTSP, such as sales territory assignment. For instance, according to Bolaños, Echeverry and Escobar (2015), a balanced workload between clusters is critical because of its impact on salespeople's morale, satisfaction, and likelihood of achieving their targets. One simple way to evaluate the variability of cluster size (V) is the standard deviation of cluster sizes, defined by equation (7):

V = √( Σ_{u=1}^{k} (S_u − S̄)² / k )    (7)

where k is the number of clusters, S̄ is the average cluster size, and S_u is the size of cluster u (i.e., the number of cities in cluster u).
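Equation (7) translates directly into code; the population form of the standard deviation (division by k) is assumed here:

```python
import numpy as np

def size_variability(labels, k):
    """Standard deviation of cluster sizes: sqrt(sum_u (S_u - S_bar)^2 / k).
    A value of 0 means a perfectly balanced workload."""
    sizes = np.bincount(labels, minlength=k)   # S_u for each cluster
    return np.sqrt(((sizes - sizes.mean())**2).sum() / k)
```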

Total Traveling Distance
A common objective of any traveling salesman problem is the minimization of the total traveling distance (Bolaños et al., 2015), so that the total traveling cost is also minimized. Equation (8) defines the total traveling distance (TTD):

TTD = Σ_{i=1}^{n} Σ_{j=1}^{n} d_ij T_ij    (8)

where d_ij is the distance between cities i and j, n is the number of cities, and T_ij = 1 if a salesman travels directly from city i to city j in the optimal route and 0 otherwise.
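The TTD criterion, accumulated over each salesman's closed tour, can be computed as in this sketch (the tour-list representation is an assumption for illustration):

```python
import numpy as np

def total_traveling_distance(dist, tours):
    """Sum of d_ij over every leg (i -> j) actually traveled,
    accumulated over all salesmen's closed tours (equation (8))."""
    ttd = 0.0
    for tour in tours:
        tour = np.asarray(tour)
        ttd += dist[tour, np.roll(tour, -1)].sum()  # includes return leg
    return ttd
```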

Running Time
Running time, defined as the total CPU time (in seconds) spent during the execution of an algorithm, is one of the most critical performance criteria. Because it typically grows with input size and other factors, it can be considered a measure of the practicality of an algorithm. In this study, the two combinations (FA-GA and KM-GA) were compared with respect to the impacts of both problem size and the number of clusters on the running time.

Test Problems
The two combinations, FA-GA and KM-GA, were tested using seven standard instances obtained from the TSPLIB library (https://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/tsp/). As shown in Table 7, the sizes of these instances ranged from 52 to 1002 cities. An additional city representing the depot was added to each instance; the average of the coordinates of the cities in the instance was taken as the coordinates of the added depot. Each of the seven instances was solved using nine different numbers of salesmen, ranging from two to 10.

Number of Runs
One advantage of the method proposed in this study is that it does not require multiple runs. In contrast, the k-means++ algorithm begins by randomly selecting the location of the initial centroid, so selecting different starting centroids can result in different clusters being formed. The k-means++ algorithm must therefore be run several times (Morissette & Chartier, 2013). In the current study, the k-means++ algorithm was run independently 20 times for each test problem. Accordingly, the performance evaluation of the k-means++ in terms of each criterion was based on the average values of the results of the 20 runs. In total, 1,323 runs were performed in this study (1,260 runs of KM-GA and 63 runs of FA-GA).

Performance Evaluation Results
Table A-1 in the appendix presents a summary of the performance evaluation results, obtained using MATLAB 2018 on a system with an Intel Core i7 CPU.

Quality of Clusters
As a sample of the solutions, the obtained optimal route for each salesman for the KroA100 instance (k = 6) is presented in Table 8. A general observation can be made about the clusters formed for all the test problems, including KroA100, whose formed clusters are displayed in Figures 1 and 2: FA-GA generates clusters with non-crossing paths, whereas KM-GA generates clusters with crossing paths. This feature makes FA-GA more practical in applications that require avoiding path crossings, such as multi-robot task allocation and unmanned aerial vehicles. FA-GA generates such exclusive, distinctive clusters because of its mathematical basis, namely the orthogonality of the eigenvectors of the matrix of relative distances. It is also clear that the SSE of FA-GA generally decreased quickly until a particular value of k was reached; after this point, the values continued to decrease, but at a reduced rate. Conversely, the relationship between the SSE of KM-GA and k did not follow a particular pattern. One conceivable interpretation of this behavior is that each plotted value of SSE was an average of the SSEs obtained over 20 runs. It is worth noting that, as shown in Table A-1 in the appendix, these findings apply not only to the solutions obtained for KroA200 and Pr439 but also to those obtained for all the other instances.
With regard to the variability of cluster size, the results presented in Table A-1 indicate that, compared to those produced by KM-GA, the clusters produced by FA-GA had lower cluster size variability in 65 percent of the 63 solved problems. Thus, FA-GA offers the advantage of achieving a good workload balance among the salesmen.

Impact on Finding Optimal Routes
Figures A-3 and A-4 in the appendix plot the k values against the total traveling distance of the solutions produced by FA-GA and KM-GA for KroA100 and Pr1002. A visual inspection of these plots and those obtained for the other instances indicates that as the number of clusters grows, so too does the total traveling distance. This observation applies to both combinations. Furthermore, as shown in Table A-1 in the appendix, FA-GA plainly outperformed KM-GA with respect to minimizing the total traveling distance for all instances, regardless of the number of clusters.
Figures A-5 and A-6 in the appendix show plots of instance size versus the total CPU running time in seconds for the solutions produced using the two combinations for the seven instances and the nine different values of k. A visual assessment of these plots indicates that FA-GA clearly performed better than KM-GA with regard to the insensitivity of the total CPU running time to problem size. Moreover, as shown in Table A-1 in the appendix, KM-GA used less CPU running time than FA-GA in 22 percent of the solutions; however, those solutions involved longer traveling distances than the ones produced using FA-GA.

Summary and Conclusion
The mTSP, an NP-hard problem, is among the most widely discussed problems in combinatorial optimization and several procedures to solve it have been proposed in the literature. Among these procedures is the use of a clustering-based optimization approach in which a clustering algorithm is used to break the problem down into a number of TSPs equal to the number of salesmen, and then the tour for each TSP is determined using an optimization algorithm. To break down the mTSP, this study proposed a new method comprising two stages. First, the clusters are identified via a modified form of FA, and then a simple integer programming model is used for the assignment of cities to the identified clusters, considering realistic constraints such as the maximum number of cities to be assigned to each salesman.
To evaluate the performance of the proposed method, it was combined with Kirk's (2014) genetic algorithm to solve a total of 63 test problems (seven standard instances, each solved nine times by varying the number of clusters from two to 10). These solutions were then compared to those produced by combining the k-means++ and Kirk's genetic algorithm. The comparison of the results demonstrated that the combination that included the proposed method compared favorably to the other based on four evaluation criteria: 1) the sum of the squared error; 2) the variability of cluster size; 3) total traveling distance; and 4) running time.
This study makes a twofold contribution to the literature. First, unlike previous studies, it tackles the issue of improving the performance of clustering-based optimization approaches in solving the mTSP by proposing a method that produces better cluster solutions rather than by proposing a new or improved version of an optimization algorithm for finding optimal routes. Second, this is the first study to use an FA-based method to break down the mTSP. In addition to outperforming the k-means++ and producing distinct clusters, this method does not require the random selection of initial cluster centroids. Thus, an important advantage of the proposed method is its stability compared to the k-means++ and other variants of the k-means, which may produce different clusters on each run.
Although this study represents useful progress, much remains to be done. For instance, the performance of the proposed method was evaluated using 63 problems (seven instances, each of which was solved with nine different numbers of salesmen). However, there is a need to provide more evidence of the robustness of the proposed method by conducting an extensive computational study using instances of larger sizes. Moreover, combining the proposed method with Kirk's (2014) genetic algorithm to find the optimal routes was only for comparison purposes. To find optimal routes, future studies could explore combining the proposed method with new, current, or improved versions of heuristic optimization algorithms (such as those presented in Table 1).