In clustering of high-dimensional data, variable selection is commonly applied to obtain an accurate grouping of the samples. For two-class problems, this selection may be carried out by fitting a mixture distribution to each variable. We propose a hybrid method for estimating a parametric mixture of two symmetric densities. The estimator combines the method of moments with the minimum distance approach. An evaluation study, including both extensive simulations and gene expression data from acute leukemia patients, shows that the hybrid method outperforms a maximum-likelihood estimator in model-based clustering. The hybrid estimator is flexible and performs well even under imprecise model assumptions, suggesting that it is robust and suited for real problems.

Mixture distributions are used in many fields of science for modeling data drawn from different subpopulations. An important medical application is the clustering of gene expression data to discover novel subgroups of a disease. This is a high-dimensional problem, and it is common to perform a variable selection to obtain a subset of genes whose expression contributes to separating the subgroups. For two-class problems this may be carried out by fitting a univariate mixture distribution to each gene and singling out the variables for which the overlap between the component distributions is small enough [

Karl Pearson [

In this paper, we propose a hybrid approach for estimating five parameters of a mixture of two densities which are symmetric about their means. The approach combines the method of moments with a minimum distance estimator based on a quadratic measure of deviation between the fitted and empirical distribution functions. The motivation behind our approach is to develop a robust algorithm that produces accurate estimates also when the parametric shape of the mixture distribution is misspecified, which is common in practice.

The paper is organized as follows. In Section

In this section we present the novel moment-distance hybrid estimator (HM-estimator) and describe how it can be used for model-based clustering. We consider the problem where the real-valued random variable
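Since the displayed formulas did not survive in this version, we restate the model in notation that is our own reconstruction (the symbols are assumptions, chosen to match the five parameters described above): the variable is modeled by a two-component mixture of densities symmetric about their means,

```latex
f(x;\theta) \;=\; \frac{p}{\sigma_1}\, g\!\left(\frac{x-\mu_1}{\sigma_1}\right)
             \;+\; \frac{1-p}{\sigma_2}\, g\!\left(\frac{x-\mu_2}{\sigma_2}\right),
\qquad \theta = (\mu_1, \mu_2, \sigma_1, \sigma_2, p),
```

where \(g\) is a density symmetric about zero and \(p \in (0,1)\) is the mixing proportion.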

The HM-estimator, denoted

An estimate

Next we describe how the HM-estimator is obtained in practice, via a reformulation of definition (

In this subsection we describe how the HM-estimator (

The minimization of

A schematic description of how the HM-estimator is obtained. The user provides a grid with values
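Conceptually, the selection step of the scheme above can be sketched as follows: each grid point yields a candidate parameter vector (in the actual algorithm, via the moment equations), and the candidate minimizing the quadratic distance between the fitted and empirical distribution functions is returned. A minimal sketch under the normal-mixture assumption; the function names and the way candidates are constructed are ours, not the paper's:

```python
import math

def norm_cdf(x, mu, sigma):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def mixture_cdf(x, mu1, mu2, s1, s2, p):
    # CDF of a two-component normal mixture.
    return p * norm_cdf(x, mu1, s1) + (1.0 - p) * norm_cdf(x, mu2, s2)

def distance(sample, params):
    # Quadratic deviation between the fitted and the empirical
    # distribution function, evaluated at the order statistics.
    xs = sorted(sample)
    n = len(xs)
    return sum((mixture_cdf(x, *params) - (i + 0.5) / n) ** 2
               for i, x in enumerate(xs))

def hm_select(sample, candidates):
    # Return the candidate parameter vector with the smallest distance.
    return min(candidates, key=lambda theta: distance(sample, theta))
```

In this form the moment step and the distance step decouple cleanly: the grid only has to cover parameter vectors that already satisfy the moment equations, so the distance criterion acts as a tie-breaker among moment-consistent candidates.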

The mixture density

The true partition defined by

This section presents a simulation study in which the proposed hybrid method (HM) is compared with a conventional ML-estimator derived via the EM-algorithm. We investigate the methods' performance in model-based clustering and their accuracy in estimating the mixing proportion. The consequences of computing the estimators under incorrect model assumptions receive particular attention.

In the simulations, we restrict ourselves to the case where the component densities

Besides varying the family of the component densities, we consider seven configurations of the parameter vector

The configurations (i)–(vii) of the parameter vector

| Parameter | (i) | (ii) | (iii) | (iv) | (v) | (vi) | (vii) |
|---|---|---|---|---|---|---|---|
| | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | 2 | 3 | 3 | 4 | 4 | 3 | 0 |
| | 1 | 1 | 1 | 4 | 4 | 9 | 1 |
| | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | 0.50 | 0.50 | 0.25 | 0.50 | 0.25 | 0.50 | 0.10–0.50 |

The mixture distributions used in the simulations: four distribution families – mixtures of normal, logistic, Laplace, and contaminated normal distributions – and six parameter configurations (i)–(vi)

The mixture data were generated as follows: first we simulated the (true) partition vector
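The generation scheme can be sketched as follows. This is a minimal illustration: the contaminated-normal scheme shown (10% of draws from a normal with tripled scale) is an assumption of ours, since the exact contamination parameters are not reproduced here.

```python
import math
import random

def simulate_mixture(n, mu1, mu2, s1, s2, p, family="normal", seed=None):
    # First simulate the true partition vector (component labels),
    # then draw each observation from its component distribution.
    rng = random.Random(seed)

    def draw(mu, s):
        if family == "normal":
            return rng.gauss(mu, s)
        if family == "logistic":
            u = rng.random()                       # inverse-CDF sampling
            return mu + s * math.log(u / (1.0 - u))
        if family == "laplace":
            u = rng.random() - 0.5                 # inverse-CDF sampling
            return mu - s * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
        if family == "contaminated":
            # Illustrative contamination scheme (assumed, not the paper's):
            # 10% of the draws come from a normal with tripled scale.
            return rng.gauss(mu, 3.0 * s) if rng.random() < 0.1 else rng.gauss(mu, s)
        raise ValueError("unknown family: " + family)

    labels = [1 if rng.random() < p else 2 for _ in range(n)]
    data = [draw(mu1, s1) if z == 1 else draw(mu2, s2) for z in labels]
    return labels, data
```

Returning the label vector alongside the data is what makes the later evaluation possible: the fitted partitions are scored against exactly this simulated truth.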

The hybrid estimator

The maximum likelihood estimator
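The ML baseline is the standard EM algorithm for a two-component normal mixture. A minimal sketch of that baseline, with a crude quartile initialization of our own choosing (not the study's exact implementation):

```python
import math

def _npdf(x, mu, var):
    # Normal density with mean mu and variance var.
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_two_normals(data, iters=200):
    # Standard EM for a two-component normal mixture.
    n = len(data)
    xs = sorted(data)
    mu1, mu2 = xs[n // 4], xs[3 * n // 4]        # crude quartile initialization
    m = sum(data) / n
    var1 = var2 = max(sum((x - m) ** 2 for x in data) / n, 1e-6)
    p = 0.5
    for _ in range(iters):
        # E-step: posterior probability that each point belongs to component 1.
        resp = []
        for x in data:
            a = p * _npdf(x, mu1, var1)
            b = (1.0 - p) * _npdf(x, mu2, var2)
            resp.append(a / (a + b) if a + b > 0 else 0.5)
        # M-step: weighted parameter updates.
        w1 = max(sum(resp), 1e-9)
        w2 = max(n - w1, 1e-9)
        mu1 = sum(r * x for r, x in zip(resp, data)) / w1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, data)) / w2
        var1 = max(sum(r * (x - mu1) ** 2 for r, x in zip(resp, data)) / w1, 1e-6)
        var2 = max(sum((1.0 - r) * (x - mu2) ** 2 for r, x in zip(resp, data)) / w2, 1e-6)
        p = w1 / n
    return mu1, mu2, var1, var2, p
```

The E-step responsibilities are exactly the soft cluster memberships used for model-based clustering, which is why this estimator doubles as a clustering method.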

For a simulated dataset

Let

To determine if there was a significant difference between the methods’ clustering performance, we applied the

The considered scenarios corresponded to problems that were more or less difficult with respect to clustering and as part of our evaluations we quantified these difficulties. Here

We compared the methods in terms of their accuracy for estimating the mixing proportion

The following standard characteristics for evaluating an estimator
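The characteristics reported in the tables can be computed as below; that the set consists of the mean, bias, and a squared-error-type dispersion measure is partly our assumption, since the tables' mathematical column labels did not survive extraction.

```python
def estimator_summary(estimates, true_value):
    # Mean, bias and mean squared error of repeated estimates
    # of a known true value, across simulation replicates.
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return {"mean": mean, "bias": bias, "mse": mse}
```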

To determine if there was a significant difference in efficiency between the methods, we applied the

This section includes a detailed treatment of the results for sample size

The HM- and ML-estimators were evaluated mainly by their ability to cluster the samples in agreement with the true partition vector

The relative performance of the methods was evaluated by considering the mean difference in FARI (

Scenarios (i) and (vi) were hard clustering problems in the sense that the mean optimal FARI was low in all the cases:

In the case where data were generated from normal mixtures most of the observed differences were significant but of varying magnitude:

The average clustering performance of the hybrid method (HM) and the maximum likelihood (ML) method. 500 samples with 50 observations each were generated from four mixture distributions (normal, logistic, Laplace and contaminated Gaussian) with the parameter configurations (i)–(vi). The fuzzy adjusted Rand index (FARI) was obtained for each sample and estimator. The mean FARI was observed for each scenario, and the mean of the optimal FARI (opt.) obtained using the true mixture distribution serves as a reference

Mean of fuzzy adjusted Rand index

| Mixture | Config. | HM | ML | Opt. |
|---|---|---|---|---|
| Normal | (i) | 0.28 | 0.21 | 0.35 |
| | (ii) | 0.59 | 0.52 | 0.68 |
| | (iii) | 0.54 | 0.56 | 0.70 |
| | (iv) | 0.57 | 0.53 | 0.60 |
| | (v) | 0.64 | 0.65 | 0.70 |
| | (vi) | 0.29 | 0.27 | 0.30 |
| Logistic | (i) | 0.41 | 0.24 | 0.46 |
| | (ii) | 0.71 | 0.60 | 0.71 |
| | (iii) | 0.65 | 0.52 | 0.71 |
| | (iv) | 0.64 | 0.53 | 0.65 |
| | (v) | 0.67 | 0.57 | 0.73 |
| | (vi) | 0.34 | 0.26 | 0.39 |
| Laplace | (i) | 0.32 | 0.22 | 0.40 |
| | (ii) | 0.66 | 0.57 | 0.70 |
| | (iii) | 0.59 | 0.54 | 0.70 |
| | (iv) | 0.60 | 0.53 | 0.63 |
| | (v) | 0.67 | 0.64 | 0.72 |
| | (vi) | 0.30 | 0.27 | 0.33 |
| Contaminated | (i) | 0.44 | 0.26 | 0.58 |
| | (ii) | 0.78 | 0.64 | 0.80 |
| | (iii) | 0.74 | 0.65 | 0.81 |
| | (iv) | 0.73 | 0.60 | 0.71 |
| | (v) | 0.75 | 0.67 | 0.70 |
| | (vi) | 0.39 | 0.33 | 0.42 |

With the data generated from logistic mixtures, the HM-estimator outperformed the ML-estimator for all parameter configurations, and most of the observed differences were significant across the evaluation measures;

The FARI observed for the hybrid and maximum likelihood methods. 500 samples, each with 50 observations, are generated from normal mixture distributions with the parameter configurations (i)–(vi). Samples for which the hybrid method performs considerably better (worse) than the maximum likelihood estimator are in the upper (lower) shaded area. Points inside the white area mark samples that correspond to inconsiderable differences. A difference is regarded as considerable if the absolute difference in the methods' FARI exceeds 0.1

The relative clustering performance of the hybrid method (HM) and the maximum likelihood (ML) method. 500 samples, each with 50 observations, are generated from four mixture distributions (normal, logistic, Laplace, and contaminated Gaussian) with the parameter configurations (i)–(vi). The fuzzy adjusted Rand index (FARI) is observed for each sample and estimator. For each scenario we observe: the mean of the differences between the observed average FARI values for the HM- and ML-estimators (

Comparison of HM and ML for soft clustering

| Mixture | Config. | | p-value | | p-value | | p-value | | |
|---|---|---|---|---|---|---|---|---|---|
| Normal | (i) | 0.07 | 0.00 | 0.67 | 0.00 | 0.92 | 0.00 | 143 | 13 |
| | (ii) | 0.08 | 0.00 | 0.68 | 0.00 | 0.86 | 0.00 | 126 | 20 |
| | (iii) | −0.02 | 0.02 | 0.42 | 0.00 | 0.40 | 0.01 | 84 | 124 |
| | (iv) | 0.04 | 0.00 | 0.50 | 0.96 | 0.81 | 0.00 | 82 | 19 |
| | (v) | −0.01 | 0.04 | 0.35 | 0.00 | 0.47 | 0.63 | 51 | 57 |
| | (vi) | 0.02 | 0.00 | 0.55 | 0.02 | 0.83 | 0.00 | 69 | 14 |
| Logistic | (i) | 0.10 | 0.00 | 0.76 | 0.00 | 0.94 | 0.00 | 200 | 14 |
| | (ii) | 0.08 | 0.00 | 0.68 | 0.00 | 0.91 | 0.00 | 127 | 12 |
| | (iii) | 0.05 | 0.00 | 0.50 | 0.89 | 0.61 | 0.00 | 138 | 89 |
| | (iv) | 0.07 | 0.00 | 0.65 | 0.00 | 0.94 | 0.00 | 136 | 8 |
| | (v) | 0.03 | 0.00 | 0.56 | 0.01 | 0.67 | 0.00 | 111 | 55 |
| | (vi) | 0.03 | 0.00 | 0.62 | 0.00 | 0.80 | 0.00 | 73 | 18 |
| Laplace | (i) | 0.17 | 0.00 | 0.85 | 0.00 | 0.98 | 0.00 | 268 | 5 |
| | (ii) | 0.12 | 0.00 | 0.72 | 0.00 | 0.97 | 0.00 | 147 | 4 |
| | (iii) | 0.13 | 0.00 | 0.62 | 0.00 | 0.79 | 0.00 | 184 | 49 |
| | (iv) | 0.12 | 0.00 | 0.73 | 0.00 | 0.98 | 0.00 | 175 | 4 |
| | (v) | 0.10 | 0.00 | 0.67 | 0.00 | 0.81 | 0.00 | 192 | 45 |
| | (vi) | 0.08 | 0.00 | 0.78 | 0.00 | 0.89 | 0.00 | 161 | 21 |
| Contaminated | (i) | 0.17 | 0.00 | 0.81 | 0.00 | 0.96 | 0.00 | 264 | 11 |
| | (ii) | 0.14 | 0.00 | 0.71 | 0.00 | 0.93 | 0.00 | 174 | 13 |
| | (iii) | 0.08 | 0.00 | 0.55 | 0.04 | 0.72 | 0.00 | 161 | 61 |
| | (iv) | 0.12 | 0.00 | 0.75 | 0.00 | 0.94 | 0.00 | 212 | 14 |
| | (v) | 0.08 | 0.00 | 0.66 | 0.00 | 0.85 | 0.00 | 182 | 31 |
| | (vi) | 0.06 | 0.00 | 0.65 | 0.00 | 0.82 | 0.00 | 157 | 34 |

In the case where the data were simulated from a mixture of Laplace or contaminated Gaussian distributions the HM-estimator outperformed the ML-estimator and all the observed differences were significant:

Configuration (vii) defined a non-mixture distribution for which the desired result would be an average FARI value around zero and few high FARI values. Overall, both methods performed as expected and no clear differences between the methods were observed, with the exception that the ML method was more variable in the case

The FARI observed for the hybrid and maximum likelihood methods. 500 samples, each with 50 observations, are generated from logistic mixture distributions with the parameter configurations (i)–(vi). Samples for which the hybrid method performs considerably better (worse) than the maximum likelihood estimator are in the upper (lower) shaded area. Points inside the white area mark samples that correspond to inconsiderable differences. A difference is regarded as considerable if the absolute difference in the methods' FARI exceeds 0.1

The FARI observed for the hybrid and maximum likelihood methods. 500 samples, each with 50 observations, are generated from Laplace mixture distributions with the parameter configurations (i)–(vi). Samples for which the hybrid method performs considerably better (worse) than the maximum likelihood estimator are in the upper (lower) shaded area. Points inside the white area mark samples that correspond to inconsiderable differences. A difference is regarded as considerable if the absolute difference in the methods' FARI exceeds 0.1

The FARI observed for the hybrid and maximum likelihood methods. 500 samples, each with 50 observations, are generated from contaminated Gaussian mixture distributions with the parameter configurations (i)–(vi). Samples for which the hybrid method performs considerably better (worse) than the maximum likelihood estimator are in the upper (lower) shaded area. Points inside the white area mark samples that correspond to inconsiderable differences. A difference is regarded as considerable if the absolute difference in the methods' FARI exceeds 0.1

The methods' ability to estimate the mixing proportion

The HM-estimator (of the proportion parameter

The accuracy of the HM- and ML-estimators with regard to estimating the proportion parameter

Estimation of the mixing proportion

| Data | Config. | True | Mean (HM) | Mean (ML) | Bias (HM) | Bias (ML) | HM | ML | | p-val |
|---|---|---|---|---|---|---|---|---|---|---|
| Normal | (i) | 0.50 | 0.543 | 0.483 | 0.043 | −0.017 | 0.052 | 0.084 | 0.033 | 0.000 |
| | (ii) | 0.50 | 0.517 | 0.508 | 0.017 | 0.008 | 0.017 | 0.036 | 0.019 | 0.000 |
| | (iii) | 0.25 | 0.331 | 0.280 | 0.081 | 0.030 | 0.028 | 0.027 | −0.000 | 0.889 |
| | (iv) | 0.50 | 0.471 | 0.481 | −0.029 | −0.019 | 0.014 | 0.023 | 0.009 | 0.000 |
| | (v) | 0.25 | 0.238 | 0.246 | −0.012 | −0.004 | 0.011 | 0.013 | 0.002 | 0.104 |
| | (vi) | 0.50 | 0.362 | 0.423 | −0.138 | −0.077 | 0.040 | 0.041 | 0.001 | 0.617 |
| Logistic | (i) | 0.50 | 0.553 | 0.518 | 0.053 | 0.018 | 0.043 | 0.081 | 0.037 | 0.000 |
| | (ii) | 0.50 | 0.504 | 0.503 | 0.004 | 0.003 | 0.013 | 0.029 | 0.016 | 0.000 |
| | (iii) | 0.25 | 0.314 | 0.338 | 0.064 | 0.088 | 0.022 | 0.040 | 0.017 | 0.000 |
| | (iv) | 0.50 | 0.488 | 0.528 | −0.012 | 0.028 | 0.012 | 0.024 | 0.012 | 0.000 |
| | (v) | 0.25 | 0.254 | 0.297 | 0.004 | 0.047 | 0.010 | 0.020 | 0.011 | 0.000 |
| | (vi) | 0.50 | 0.400 | 0.489 | −0.100 | −0.011 | 0.034 | 0.035 | 0.001 | 0.627 |
| Laplace | (i) | 0.50 | 0.533 | 0.511 | 0.033 | 0.011 | 0.029 | 0.074 | 0.044 | 0.000 |
| | (ii) | 0.50 | 0.484 | 0.498 | −0.016 | −0.002 | 0.011 | 0.025 | 0.015 | 0.000 |
| | (iii) | 0.25 | 0.295 | 0.383 | 0.045 | 0.133 | 0.014 | 0.048 | 0.034 | 0.000 |
| | (iv) | 0.50 | 0.491 | 0.561 | −0.009 | 0.061 | 0.011 | 0.025 | 0.014 | 0.000 |
| | (v) | 0.25 | 0.280 | 0.371 | 0.030 | 0.121 | 0.012 | 0.035 | 0.024 | 0.000 |
| | (vi) | 0.50 | 0.445 | 0.536 | −0.055 | 0.036 | 0.029 | 0.050 | 0.021 | 0.000 |
| Contaminated | (i) | 0.50 | 0.517 | 0.491 | 0.017 | −0.009 | 0.046 | 0.083 | 0.037 | 0.000 |
| | (ii) | 0.50 | 0.488 | 0.502 | −0.012 | 0.002 | 0.008 | 0.025 | 0.016 | 0.000 |
| | (iii) | 0.25 | 0.276 | 0.362 | 0.026 | 0.112 | 0.011 | 0.036 | 0.024 | 0.000 |
| | (iv) | 0.50 | 0.491 | 0.563 | −0.009 | 0.063 | 0.011 | 0.024 | 0.014 | 0.000 |
| | (v) | 0.25 | 0.257 | 0.359 | 0.007 | 0.109 | 0.009 | 0.024 | 0.014 | 0.000 |
| | (vi) | 0.50 | 0.426 | 0.527 | −0.074 | 0.027 | 0.033 | 0.040 | 0.006 | 0.051 |

In [

We applied a supervised procedure to obtain a subset of the 3,571 genes that were differentially expressed with respect to the ALL/AML grouping. For each gene

We applied the HM and ML clustering methods to each of the 342 test variables to compare their ability to divide the 72 cancer samples into the ALL and AML groups. The analysis was carried out as described in Section

Independent of the method, the overall performance was rather poor; most of the clusterings had a FARI between 0 and 0.4, see Figure

The observed differences between the HM and ML methods were all significant and in favor of the hybrid method. For 213 of the 342 test genes the HM method clustered the samples more accurately than the ML method (i.e.

Clustering results for the cancer data. The fuzzy adjusted Rand indices (FARI) observed for the hybrid and maximum likelihood methods. Data were taken from a microarray experiment on gene expression levels in two types of acute leukemia: ALL and AML. 342 genes were measured across 72 samples. Genes for which the hybrid method performed considerably better (worse) than the maximum likelihood estimator are in the upper (lower) shaded area. Here a difference was defined to be considerable if the absolute difference in FARI between the methods was larger than 0.1

We consider a univariate clustering problem, arising in many applications, where the data are generated from a mixture distribution with two components and the aim is to group samples of the same type. This problem is commonly solved using the EM-algorithm under the assumption that the observations are generated by a mixture of two normal densities. Although this is a powerful method, it is sensitive to incorrectly specified distributions. Furthermore, the assumption that the data approximately follow a normal mixture is rather restrictive, which makes the EM-approach infeasible in many applications.

The use of hybrid methods in mixture problems is, to the best of our knowledge, rather unexplored. The variant we propose can be motivated as follows: the method of moments is general in the sense that the parametric family can be left unspecified (it is enough to assume that the component densities are symmetric and have finite moments), while the minimum distance method is robust against symmetric departures from the assumed normal mixture distribution.

The results suggest that the proposed HM-estimator has a considerably better ability to cluster the samples than the ML-estimator, in particular if the assumption of a normal mixture is incorrect. This result is observed for both simulated and real data, and holds independently of the sample size. A slight advantage for the HM-estimator is also observed in the case where the Gaussian mixture assumption is valid.

We also consider estimation of the mixing proportion

It should be noted that the HM-estimator can easily be adapted to any parametric mixture of symmetric densities, not just the normal mixture distribution. Furthermore, we can consider a less restrictive assumption that allows the component distributions to be of several types. For example, we may use the composite assumption that the data are generated by a mixture of two normal distributions, a mixture of two Laplace distributions, or a mixture of one normal and one Laplace distribution. In this case, parameter estimates can be obtained via the proposed hybrid method, either by extending the distance function or by deriving the HM-estimator for each assumed mixture distribution and taking the estimator with the best fit to the empirical data as the final estimator. Further studies are needed to show that this approach is reasonable.

A general drawback of the method of moments is that the estimating equations sometimes lack solutions, and our variant is no exception. However, this problem is usually overlooked and does not seem to be of practical importance; see [

We use the FARI to evaluate the performance of the clustering methods because it has a higher resolution than the ordinary adjusted Rand index, and is therefore better at separating approaches whose clustering performance is relatively similar.

We propose to robustify the hybrid estimator by using trimmed (5% removed) versions of the sample moments. This enables high performance also in the presence of outliers, which are often encountered in real datasets and are modeled here by the Laplace and contaminated Gaussian mixtures. For some of our simulations we applied the HM method without trimming; overall, the results are usually better with the 5% trimming, but there are some exceptions (data not shown). Moreover, one could argue that the ML-estimator might also perform better if some of the extreme observations were removed prior to estimation. The 5% trimming used in our simulations is merely for illustration and should not be taken as a general recommendation; how to choose the trimming level on the basis of the data is a topic for future research.
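The trimming step can be sketched as below; whether the removed 5% is split evenly between the two tails is our assumption, made for illustration.

```python
def trimmed_moment(data, k, trim=0.05):
    # k-th sample moment after removing the most extreme observations,
    # trimming trim/2 from each tail (assumed split; see text).
    xs = sorted(data)
    cut = int(len(xs) * trim / 2.0)
    kept = xs[cut:len(xs) - cut] if cut > 0 else xs
    return sum(x ** k for x in kept) / len(kept)
```

A single gross outlier can dominate an untrimmed higher-order moment, so even a small trimming fraction stabilizes the moment equations considerably.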

In most applications several variables are observed, and the common practice is to base the clustering on all, or at least several, of them. For high-dimensional genomic data this type of approach has been shown to be difficult, and non-informative variables need to be removed in order to succeed [

To conclude, the proposed moment-distance hybrid method has good clustering performance, is robust against incorrect model assumptions and can easily be applied to a wide range of problems.

If

The

The density of the

The

The material in this section is based on the paper [

Let

Let

a = the number of pairs in

b = the number of pairs in

c = the number of pairs in

d = the number of pairs in

The Rand index (RI) for
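The pair counts above translate directly into code. A minimal sketch, using the standard convention (a = pairs placed together in both partitions, d = pairs separated in both, b and c the mixed cases), since the exact wording of the definitions is truncated here:

```python
from itertools import combinations

def rand_index(u, v):
    # Rand index of two hard partitions of the same objects,
    # given as equal-length label sequences.
    a = b = c = d = 0
    for i, j in combinations(range(len(u)), 2):
        same_u, same_v = u[i] == u[j], v[i] == v[j]
        if same_u and same_v:
            a += 1          # together in both partitions
        elif same_u:
            b += 1          # together in u only
        elif same_v:
            c += 1          # together in v only
        else:
            d += 1          # separated in both partitions
    return (a + d) / (a + b + c + d)
```

Note that the index depends only on co-membership of pairs, so relabeling the clusters in either partition leaves it unchanged.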

Next we give a more formal definition of the numbers

The partitioning considered in the previous section is called

Next we introduce indexes of similarity between two fuzzy partitions

Despite the notation

The average clustering performance of the hybrid method (HM) and the maximum likelihood (ML) method. 500 samples with 100 observations each were generated from four mixture distributions (normal, logistic, Laplace and contaminated Gaussian) with the parameter configurations (i)–(vi). The fuzzy adjusted Rand index (FARI) was obtained for each sample and estimator. The mean FARI was observed for each scenario, and the mean of the optimal FARI (opt.) obtained using the true mixture distribution serves as a reference

Mean of fuzzy adjusted Rand index

| Data | Config. | HM | ML | Opt. |
|---|---|---|---|---|
| Normal | (i) | 0.27 | 0.19 | 0.35 |
| | (ii) | 0.64 | 0.61 | 0.69 |
| | (iii) | 0.56 | 0.60 | 0.71 |
| | (iv) | 0.57 | 0.55 | 0.60 |
| | (v) | 0.67 | 0.68 | 0.71 |
| | (vi) | 0.29 | 0.27 | 0.29 |
| Logistic | (i) | 0.43 | 0.19 | 0.45 |
| | (ii) | 0.73 | 0.59 | 0.70 |
| | (iii) | 0.68 | 0.47 | 0.71 |
| | (iv) | 0.65 | 0.49 | 0.65 |
| | (v) | 0.67 | 0.53 | 0.73 |
| | (vi) | 0.35 | 0.26 | 0.40 |
| Laplace | (i) | 0.33 | 0.19 | 0.39 |
| | (ii) | 0.69 | 0.62 | 0.70 |
| | (iii) | 0.65 | 0.58 | 0.71 |
| | (iv) | 0.60 | 0.52 | 0.63 |
| | (v) | 0.69 | 0.65 | 0.72 |
| | (vi) | 0.32 | 0.28 | 0.33 |
| Contaminated | (i) | 0.48 | 0.20 | 0.59 |
| | (ii) | 0.81 | 0.64 | 0.81 |
| | (iii) | 0.78 | 0.61 | 0.80 |
| | (iv) | 0.76 | 0.54 | 0.72 |
| | (v) | 0.78 | 0.64 | 0.70 |
| | (vi) | 0.40 | 0.30 | 0.42 |

The relative clustering performance of the hybrid method (HM) and the maximum likelihood (ML) method. 500 samples, each with 100 observations, are generated from four mixture distributions (normal, logistic, Laplace, and contaminated Gaussian) with the parameter configurations (i)–(vi). The fuzzy adjusted Rand index (FARI) is observed for each sample and estimator. For each scenario we observe: the mean of the differences between the observed average FARI values for the HM- and ML-estimators (

Comparison of HM and ML for soft clustering

| Dist. | Config. | | p-value | | p-value | | p-value | | |
|---|---|---|---|---|---|---|---|---|---|
| Normal | (i) | 0.09 | 0.00 | 0.74 | 0.00 | 0.95 | 0.00 | 187 | 10 |
| | (ii) | 0.03 | 0.00 | 0.67 | 0.00 | 0.77 | 0.00 | 58 | 17 |
| | (iii) | −0.04 | 0.00 | 0.36 | 0.00 | 0.32 | 0.00 | 69 | 146 |
| | (iv) | 0.02 | 0.00 | 0.53 | 0.20 | 0.98 | 0.00 | 57 | 1 |
| | (v) | −0.01 | 0.03 | 0.37 | 0.00 | 0.57 | 0.41 | 30 | 23 |
| | (vi) | 0.02 | 0.00 | 0.64 | 0.00 | 0.91 | 0.00 | 40 | 4 |
| Logistic | (i) | 0.24 | 0.00 | 0.94 | 0.00 | 0.99 | 0.00 | 349 | 2 |
| | (ii) | 0.14 | 0.00 | 0.82 | 0.00 | 0.99 | 0.00 | 144 | 1 |
| | (iii) | 0.21 | 0.00 | 0.77 | 0.00 | 0.93 | 0.00 | 246 | 20 |
| | (iv) | 0.16 | 0.00 | 0.84 | 0.00 | 1.00 | 0.00 | 258 | 0 |
| | (v) | 0.14 | 0.00 | 0.77 | 0.00 | 0.94 | 0.00 | 263 | 17 |
| | (vi) | 0.10 | 0.00 | 0.90 | 0.00 | 0.98 | 0.00 | 193 | 3 |
| Laplace | (i) | 0.14 | 0.00 | 0.83 | 0.00 | 0.95 | 0.00 | 256 | 12 |
| | (ii) | 0.08 | 0.00 | 0.73 | 0.00 | 0.93 | 0.00 | 90 | 7 |
| | (iii) | 0.07 | 0.00 | 0.54 | 0.05 | 0.66 | 0.00 | 142 | 73 |
| | (iv) | 0.07 | 0.00 | 0.71 | 0.00 | 0.96 | 0.00 | 157 | 7 |
| | (v) | 0.04 | 0.00 | 0.52 | 0.30 | 0.80 | 0.00 | 114 | 28 |
| | (vi) | 0.04 | 0.00 | 0.75 | 0.00 | 1.00 | 0.00 | 80 | 0 |
| Contaminated | (i) | 0.28 | 0.00 | 0.92 | 0.00 | 0.99 | 0.00 | 375 | 5 |
| | (ii) | 0.17 | 0.00 | 0.84 | 0.00 | 1.00 | 0.00 | 192 | 0 |
| | (iii) | 0.17 | 0.00 | 0.74 | 0.00 | 0.92 | 0.00 | 279 | 25 |
| | (iv) | 0.23 | 0.00 | 0.91 | 0.00 | 1.00 | 0.00 | 366 | 2 |
| | (v) | 0.14 | 0.00 | 0.84 | 0.00 | 0.96 | 0.00 | 277 | 13 |
| | (vi) | 0.10 | 0.00 | 0.84 | 0.00 | 0.93 | 0.00 | 174 | 14 |

The average clustering performance of the hybrid method (HM) and the maximum likelihood (ML) method. 500 samples with 500 observations each were generated from four mixture distributions (normal, logistic, Laplace and contaminated Gaussian) with the parameter configurations (i)–(vi). The fuzzy adjusted Rand index (FARI) was obtained for each sample and estimator. The mean FARI was observed for each scenario, and the mean of the optimal FARI (opt.) obtained using the true mixture distribution serves as a reference

Mean of fuzzy adjusted Rand index

| Data | Config. | HM | ML | Opt. |
|---|---|---|---|---|
| Normal | (i) | 0.27 | 0.25 | 0.35 |
| | (ii) | 0.67 | 0.67 | 0.68 |
| | (iii) | 0.54 | 0.69 | 0.71 |
| | (iv) | 0.57 | 0.59 | 0.60 |
| | (v) | 0.69 | 0.71 | 0.71 |
| | (vi) | 0.32 | 0.30 | 0.30 |
| Logistic | (i) | 0.45 | 0.07 | 0.45 |
| | (ii) | 0.75 | 0.66 | 0.70 |
| | (iii) | 0.74 | 0.34 | 0.71 |
| | (iv) | 0.64 | 0.41 | 0.65 |
| | (v) | 0.67 | 0.48 | 0.73 |
| | (vi) | 0.38 | 0.28 | 0.39 |
| Laplace | (i) | 0.33 | 0.16 | 0.39 |
| | (ii) | 0.71 | 0.70 | 0.70 |
| | (iii) | 0.65 | 0.62 | 0.71 |
| | (iv) | 0.59 | 0.52 | 0.63 |
| | (v) | 0.69 | 0.63 | 0.72 |
| | (vi) | 0.34 | 0.30 | 0.33 |
| Contaminated | (i) | 0.48 | 0.04 | 0.59 |
| | (ii) | 0.84 | 0.62 | 0.81 |
| | (iii) | 0.81 | 0.49 | 0.80 |
| | (iv) | 0.76 | 0.49 | 0.71 |
| | (v) | 0.77 | 0.62 | 0.71 |
| | (vi) | 0.45 | 0.36 | 0.42 |

The relative clustering performance of the hybrid method (HM) and the maximum likelihood (ML) method. 500 samples, each with 500 observations, are generated from four mixture distributions (normal, logistic, Laplace, and contaminated Gaussian) with the parameter configurations (i)–(vi). The fuzzy adjusted Rand index (FARI) is observed for each sample and estimator. For each scenario we observe: the mean of the differences between the observed average FARI values for the HM- and ML-estimators (

Comparison of HM and ML for soft clustering

| Data | Config. | | p-value | | p-value | | p-value | | |
|---|---|---|---|---|---|---|---|---|---|
| Normal | (i) | 0.02 | 0.00 | 0.47 | 0.14 | 0.70 | 0.00 | 112 | 48 |
| | (ii) | 0.00 | 0.01 | 0.58 | 0.00 | 0.40 | 1.00 | 2 | 3 |
| | (iii) | −0.15 | 0.00 | 0.05 | 0.00 | 0.02 | 0.00 | 7 | 364 |
| | (iv) | −0.01 | 0.00 | 0.21 | 0.00 | 0.80 | 0.38 | 4 | 1 |
| | (v) | −0.01 | 0.00 | 0.18 | 0.00 | 0.00 | 1.00 | 0 | 1 |
| | (vi) | 0.02 | 0.00 | 0.90 | 0.00 | – | – | 0 | 0 |
| Logistic | (i) | 0.38 | 0.00 | 0.99 | 0.00 | 1.00 | 0.00 | 474 | 0 |
| | (ii) | 0.09 | 0.00 | 0.85 | 0.00 | 1.00 | 0.00 | 68 | 0 |
| | (iii) | 0.40 | 0.00 | 0.91 | 0.00 | 0.99 | 0.00 | 409 | 2 |
| | (iv) | 0.22 | 0.00 | 0.97 | 0.00 | 1.00 | 0.00 | 438 | 0 |
| | (v) | 0.18 | 0.00 | 0.99 | 0.00 | 1.00 | 0.00 | 464 | 0 |
| | (vi) | 0.10 | 0.00 | 1.00 | 0.00 | 1.00 | 0.00 | 229 | 0 |
| Laplace | (i) | 0.17 | 0.00 | 0.80 | 0.00 | 0.96 | 0.00 | 314 | 12 |
| | (ii) | 0.01 | 0.00 | 0.69 | 0.00 | 0.83 | 0.22 | 5 | 1 |
| | (iii) | 0.03 | 0.00 | 0.39 | 0.00 | 0.49 | 0.90 | 122 | 125 |
| | (iv) | 0.08 | 0.00 | 0.73 | 0.00 | 0.99 | 0.00 | 206 | 1 |
| | (v) | 0.06 | 0.00 | 0.79 | 0.00 | 1.00 | 0.00 | 156 | 0 |
| | (vi) | 0.05 | 0.00 | 1.00 | 0.00 | 1.00 | 0.01 | 8 | 0 |
| Contaminated | (i) | 0.44 | 0.00 | 0.99 | 0.00 | 1.00 | 0.00 | 495 | 0 |
| | (ii) | 0.22 | 0.00 | 0.95 | 0.00 | 1.00 | 0.00 | 230 | 0 |
| | (iii) | 0.32 | 0.00 | 0.96 | 0.00 | 1.00 | 0.00 | 461 | 2 |
| | (iv) | 0.26 | 0.00 | 1.00 | 0.00 | 1.00 | 0.00 | 496 | 0 |
| | (v) | 0.15 | 0.00 | 1.00 | 0.00 | 1.00 | 0.00 | 416 | 0 |
| | (vi) | 0.09 | 0.00 | 1.00 | 0.00 | 1.00 | 0.00 | 136 | 0 |

The accuracy of the HM- and ML-estimators with regard to estimating the proportion parameter

Estimation of the mixing proportion

| Data | Config. | True | Mean (HM) | Mean (ML) | Bias (HM) | Bias (ML) | HM | ML | | p-val |
|---|---|---|---|---|---|---|---|---|---|---|
| Normal | (i) | 0.50 | 0.594 | 0.524 | 0.094 | 0.024 | 0.049 | 0.088 | 0.040 | 0.000 |
| | (ii) | 0.50 | 0.524 | 0.503 | 0.024 | 0.003 | 0.010 | 0.015 | 0.006 | 0.000 |
| | (iii) | 0.25 | 0.358 | 0.287 | 0.108 | 0.037 | 0.026 | 0.020 | −0.007 | 0.005 |
| | (iv) | 0.50 | 0.489 | 0.488 | −0.011 | −0.012 | 0.006 | 0.011 | 0.005 | 0.000 |
| | (v) | 0.25 | 0.257 | 0.258 | 0.007 | 0.008 | 0.005 | 0.007 | 0.001 | 0.041 |
| | (vi) | 0.50 | 0.365 | 0.438 | −0.135 | −0.062 | 0.029 | 0.025 | −0.004 | 0.080 |
| Logistic | (i) | 0.50 | 0.548 | 0.478 | 0.048 | −0.022 | 0.020 | 0.094 | 0.074 | 0.000 |
| | (ii) | 0.50 | 0.496 | 0.505 | −0.004 | 0.005 | 0.005 | 0.022 | 0.018 | 0.000 |
| | (iii) | 0.25 | 0.290 | 0.427 | 0.040 | 0.177 | 0.011 | 0.064 | 0.053 | 0.000 |
| | (iv) | 0.50 | 0.509 | 0.610 | 0.009 | 0.110 | 0.006 | 0.026 | 0.020 | 0.000 |
| | (v) | 0.25 | 0.284 | 0.404 | 0.034 | 0.154 | 0.008 | 0.039 | 0.031 | 0.000 |
| | (vi) | 0.50 | 0.485 | 0.592 | −0.015 | 0.092 | 0.016 | 0.042 | 0.027 | 0.000 |
| Laplace | (i) | 0.50 | 0.584 | 0.506 | 0.084 | 0.006 | 0.033 | 0.090 | 0.057 | 0.000 |
| | (ii) | 0.50 | 0.511 | 0.500 | 0.011 | −0.000 | 0.007 | 0.020 | 0.013 | 0.000 |
| | (iii) | 0.25 | 0.307 | 0.344 | 0.057 | 0.094 | 0.014 | 0.037 | 0.023 | 0.000 |
| | (iv) | 0.50 | 0.509 | 0.556 | 0.009 | 0.056 | 0.006 | 0.016 | 0.010 | 0.000 |
| | (v) | 0.25 | 0.264 | 0.308 | 0.014 | 0.058 | 0.005 | 0.014 | 0.009 | 0.000 |
| | (vi) | 0.50 | 0.412 | 0.516 | −0.088 | 0.016 | 0.020 | 0.020 | 0.000 | 0.979 |
| Contaminated | (i) | 0.50 | 0.548 | 0.505 | 0.048 | 0.005 | 0.025 | 0.108 | 0.083 | 0.000 |
| | (ii) | 0.50 | 0.494 | 0.508 | −0.006 | 0.008 | 0.004 | 0.022 | 0.018 | 0.000 |
| | (iii) | 0.25 | 0.266 | 0.398 | 0.016 | 0.148 | 0.006 | 0.038 | 0.032 | 0.000 |
| | (iv) | 0.50 | 0.499 | 0.610 | −0.001 | 0.110 | 0.005 | 0.031 | 0.026 | 0.000 |
| | (v) | 0.25 | 0.259 | 0.381 | 0.009 | 0.131 | 0.005 | 0.025 | 0.020 | 0.000 |
| | (vi) | 0.50 | 0.469 | 0.555 | −0.031 | 0.055 | 0.018 | 0.043 | 0.026 | 0.000 |

The accuracy of the HM- and ML-estimators with regard to estimating the proportion parameter

Estimation of the mixing proportion

| Data | Config. | True | Mean (HM) | Mean (ML) | Bias (HM) | Bias (ML) | HM | ML | | p-val |
|---|---|---|---|---|---|---|---|---|---|---|
| Normal | (i) | 0.50 | 0.671 | 0.499 | 0.171 | −0.001 | 0.037 | 0.047 | 0.009 | 0.001 |
| | (ii) | 0.50 | 0.530 | 0.497 | 0.030 | −0.003 | 0.003 | 0.002 | −0.001 | 0.002 |
| | (iii) | 0.25 | 0.403 | 0.260 | 0.153 | 0.010 | 0.026 | 0.004 | −0.023 | 0.000 |
| | (iv) | 0.50 | 0.524 | 0.501 | 0.024 | 0.001 | 0.002 | 0.003 | 0.001 | 0.001 |
| | (v) | 0.25 | 0.255 | 0.253 | 0.005 | 0.003 | 0.001 | 0.002 | 0.001 | 0.000 |
| | (vi) | 0.50 | 0.386 | 0.490 | −0.114 | −0.010 | 0.016 | 0.003 | −0.013 | 0.000 |
| Logistic | (i) | 0.50 | 0.579 | 0.500 | 0.079 | 0.000 | 0.009 | 0.140 | 0.130 | 0.000 |
| | (ii) | 0.50 | 0.502 | 0.491 | 0.002 | −0.009 | 0.001 | 0.012 | 0.011 | 0.000 |
| | (iii) | 0.25 | 0.267 | 0.537 | 0.017 | 0.287 | 0.002 | 0.101 | 0.099 | 0.000 |
| | (iv) | 0.50 | 0.523 | 0.659 | 0.023 | 0.159 | 0.002 | 0.029 | 0.027 | 0.000 |
| | (v) | 0.25 | 0.294 | 0.439 | 0.044 | 0.189 | 0.003 | 0.038 | 0.034 | 0.000 |
| | (vi) | 0.50 | 0.507 | 0.651 | 0.007 | 0.151 | 0.003 | 0.027 | 0.024 | 0.000 |
| Laplace | (i) | 0.50 | 0.641 | 0.505 | 0.141 | 0.005 | 0.025 | 0.098 | 0.073 | 0.000 |
| | (ii) | 0.50 | 0.520 | 0.501 | 0.020 | 0.001 | 0.001 | 0.002 | 0.000 | 0.323 |
| | (iii) | 0.25 | 0.331 | 0.335 | 0.081 | 0.085 | 0.010 | 0.028 | 0.018 | 0.000 |
| | (iv) | 0.50 | 0.535 | 0.583 | 0.035 | 0.083 | 0.002 | 0.012 | 0.010 | 0.000 |
| | (v) | 0.25 | 0.272 | 0.345 | 0.022 | 0.095 | 0.001 | 0.013 | 0.011 | 0.000 |
| | (vi) | 0.50 | 0.443 | 0.556 | −0.057 | 0.056 | 0.006 | 0.005 | −0.001 | 0.019 |
| Contaminated | (i) | 0.50 | 0.600 | 0.496 | 0.100 | −0.004 | 0.013 | 0.175 | 0.162 | 0.000 |
| | (ii) | 0.50 | 0.491 | 0.494 | −0.009 | −0.006 | 0.001 | 0.020 | 0.020 | 0.000 |
| | (iii) | 0.25 | 0.268 | 0.473 | 0.018 | 0.223 | 0.002 | 0.054 | 0.053 | 0.000 |
| | (iv) | 0.50 | 0.513 | 0.654 | 0.013 | 0.154 | 0.001 | 0.027 | 0.025 | 0.000 |
| | (v) | 0.25 | 0.278 | 0.399 | 0.028 | 0.149 | 0.002 | 0.023 | 0.022 | 0.000 |
| | (vi) | 0.50 | 0.487 | 0.580 | −0.013 | 0.080 | 0.003 | 0.016 | 0.014 | 0.000 |

The accuracy of the HM- and ML-estimators with regard to estimating the difference in mean parameter

Estimation of the difference in mean

Data | Case | True | Mean HM | Mean ML | Bias HM | Bias ML | MSE HM | MSE ML | Diff | p-val
Normal | (i) | 2 | 2.105 | 2.216 | 0.105 | 0.216 | 0.226 | 0.345 | 0.119 | 0.001
Normal | (ii) | 3 | 2.902 | 2.984 | −0.098 | −0.016 | 0.113 | 0.185 | 0.072 | 0.002
Normal | (iii) | 3 | 2.486 | 2.870 | −0.514 | −0.130 | 0.711 | 0.539 | −0.172 | 0.006
Normal | (iv) | 4 | 3.953 | 4.166 | −0.047 | 0.166 | 0.429 | 0.538 | 0.109 | 0.065
Normal | (v) | 4 | 3.645 | 4.196 | −0.355 | 0.196 | 0.816 | 0.908 | 0.092 | 0.215
Normal | (vi) | 3 | 3.889 | 3.887 | 0.889 | 0.887 | 2.191 | 2.788 | 0.596 | 0.011
Logistic | (i) | 2 | 2.099 | 2.048 | 0.099 | 0.048 | 0.248 | 0.481 | 0.234 | 0.000
Logistic | (ii) | 3 | 2.917 | 2.878 | −0.083 | −0.122 | 0.108 | 0.294 | 0.185 | 0.000
Logistic | (iii) | 3 | 2.566 | 2.561 | −0.434 | −0.439 | 0.627 | 1.004 | 0.376 | 0.000
Logistic | (iv) | 4 | 3.877 | 3.925 | −0.123 | −0.075 | 0.305 | 0.607 | 0.302 | 0.000
Logistic | (v) | 4 | 3.572 | 3.686 | −0.428 | −0.314 | 1.137 | 1.375 | 0.238 | 0.047
Logistic | (vi) | 3 | 3.521 | 3.323 | 0.521 | 0.323 | 1.679 | 2.378 | 0.699 | 0.039
Laplace | (i) | 2 | 2.075 | 1.815 | 0.075 | −0.185 | 0.309 | 0.687 | 0.378 | 0.000
Laplace | (ii) | 3 | 3.011 | 2.855 | 0.011 | −0.145 | 0.080 | 0.347 | 0.267 | 0.000
Laplace | (iii) | 3 | 2.673 | 2.342 | −0.327 | −0.658 | 0.624 | 1.332 | 0.708 | 0.000
Laplace | (iv) | 4 | 3.973 | 3.725 | −0.027 | −0.275 | 0.424 | 0.931 | 0.507 | 0.008
Laplace | (v) | 4 | 3.557 | 3.162 | −0.443 | −0.838 | 1.260 | 2.042 | 0.782 | 0.000
Laplace | (vi) | 3 | 3.245 | 3.257 | 0.245 | 0.257 | 1.300 | 3.624 | 2.325 | 0.000
Contaminated | (i) | 2 | 2.336 | 2.029 | 0.336 | 0.029 | 1.634 | 1.365 | −0.269 | 0.364
Contaminated | (ii) | 3 | 2.950 | 2.799 | −0.050 | −0.201 | 0.064 | 0.551 | 0.486 | 0.000
Contaminated | (iii) | 3 | 2.753 | 2.508 | −0.247 | −0.492 | 0.566 | 1.220 | 0.653 | 0.000
Contaminated | (iv) | 4 | 3.950 | 3.715 | −0.050 | −0.285 | 0.542 | 1.360 | 0.818 | 0.005
Contaminated | (v) | 4 | 3.744 | 3.076 | −0.256 | −0.924 | 1.158 | 1.800 | 0.643 | 0.000
Contaminated | (vi) | 3 | 3.407 | 3.145 | 0.407 | 0.145 | 2.256 | 4.548 | 2.292 | 0.012
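The Mean, Bias, and MSE columns of these tables, and the Diff column (MSE of ML minus MSE of HM), can be reproduced from a collection of simulated point estimates as sketched below. This is an illustrative reconstruction, not the study's code: the toy estimates, the seed, and the paired t statistic on per-sample squared-error differences (one plausible source of the reported p-values, not confirmed by the text) are all assumptions.

```python
import math
import random

def accuracy_summary(estimates, true_value):
    """Mean, bias, and MSE of a list of point estimates,
    matching the Mean/Bias/MSE columns of the tables."""
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return mean, bias, mse

# Toy stand-ins for the per-sample estimates of the difference in means
# produced by the two estimators (hypothetical numbers, not the study's data).
random.seed(1)
true_delta = 3.0
hm = [true_delta + random.gauss(0.0, 0.3) for _ in range(500)]
ml = [true_delta + random.gauss(0.1, 0.5) for _ in range(500)]

mean_hm, bias_hm, mse_hm = accuracy_summary(hm, true_delta)
mean_ml, bias_ml, mse_ml = accuracy_summary(ml, true_delta)
diff = mse_ml - mse_hm  # the "Diff" column: MSE(ML) - MSE(HM)

# Assumed p-value mechanism: a paired t statistic on the per-sample
# squared-error differences between the two estimators.
d = [(m - true_delta) ** 2 - (h - true_delta) ** 2 for h, m in zip(hm, ml)]
d_mean = sum(d) / len(d)
d_var = sum((x - d_mean) ** 2 for x in d) / (len(d) - 1)
t_stat = d_mean / math.sqrt(d_var / len(d))
```

A positive `diff` (and a large positive `t_stat`) corresponds to the rows where the HM-estimator has the smaller mean squared error.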

The accuracy of the HM- and ML-estimators with regard to estimating the difference in mean parameter

Estimation of the difference in mean

Data | Case | True | Mean HM | Mean ML | Bias HM | Bias ML | MSE HM | MSE ML | Diff | p-val
Normal | (i) | 2 | 2.075 | 2.116 | 0.075 | 0.116 | 0.146 | 0.267 | 0.121 | 0.001
Normal | (ii) | 3 | 2.880 | 2.962 | −0.120 | −0.038 | 0.072 | 0.095 | 0.023 | 0.076
Normal | (iii) | 3 | 2.446 | 2.854 | −0.554 | −0.146 | 0.501 | 0.414 | −0.087 | 0.056
Normal | (iv) | 4 | 3.770 | 4.033 | −0.230 | 0.033 | 0.262 | 0.280 | 0.018 | 0.684
Normal | (v) | 4 | 3.511 | 4.055 | −0.489 | 0.055 | 0.547 | 0.543 | −0.005 | 0.914
Normal | (vi) | 3 | 3.805 | 3.486 | 0.805 | 0.486 | 1.361 | 1.347 | −0.015 | 0.911
Logistic | (i) | 2 | 2.054 | 2.031 | 0.054 | 0.031 | 0.160 | 0.486 | 0.326 | 0.000
Logistic | (ii) | 3 | 2.941 | 2.935 | −0.059 | −0.065 | 0.051 | 0.164 | 0.113 | 0.000
Logistic | (iii) | 3 | 2.609 | 2.605 | −0.391 | −0.395 | 0.383 | 0.770 | 0.387 | 0.000
Logistic | (iv) | 4 | 3.837 | 3.761 | −0.163 | −0.239 | 0.210 | 0.464 | 0.255 | 0.000
Logistic | (v) | 4 | 3.510 | 3.493 | −0.490 | −0.507 | 0.680 | 1.216 | 0.536 | 0.000
Logistic | (vi) | 3 | 3.503 | 2.993 | 0.503 | −0.007 | 1.021 | 1.118 | 0.098 | 0.744
Laplace | (i) | 2 | 2.104 | 1.768 | 0.104 | −0.232 | 0.179 | 0.779 | 0.599 | 0.000
Laplace | (ii) | 3 | 2.997 | 2.803 | −0.003 | −0.197 | 0.033 | 0.373 | 0.339 | 0.000
Laplace | (iii) | 3 | 2.776 | 2.085 | −0.224 | −0.915 | 0.340 | 1.656 | 1.316 | 0.000
Laplace | (iv) | 4 | 3.901 | 3.512 | −0.099 | −0.488 | 0.159 | 0.841 | 0.681 | 0.000
Laplace | (v) | 4 | 3.563 | 2.899 | −0.437 | −1.101 | 0.828 | 2.116 | 1.288 | 0.000
Laplace | (vi) | 3 | 2.993 | 2.840 | −0.007 | −0.160 | 0.571 | 3.014 | 2.443 | 0.000
Contaminated | (i) | 2 | 2.141 | 2.094 | 0.141 | 0.094 | 0.825 | 2.103 | 1.277 | 0.000
Contaminated | (ii) | 3 | 2.993 | 2.807 | −0.007 | −0.193 | 0.029 | 0.458 | 0.428 | 0.000
Contaminated | (iii) | 3 | 2.822 | 2.194 | −0.178 | −0.806 | 0.217 | 1.208 | 0.991 | 0.000
Contaminated | (iv) | 4 | 3.948 | 3.448 | −0.052 | −0.552 | 0.136 | 1.706 | 1.569 | 0.000
Contaminated | (v) | 4 | 3.741 | 2.818 | −0.259 | −1.182 | 0.562 | 2.015 | 1.453 | 0.000
Contaminated | (vi) | 3 | 3.146 | 3.039 | 0.146 | 0.039 | 2.391 | 5.900 | 3.509 | 0.071

The accuracy of the HM- and ML-estimators with regard to estimating the difference in mean parameter

Estimation of the difference in mean

Data | Case | True | Mean HM | Mean ML | Bias HM | Bias ML | MSE HM | MSE ML | Diff | p-val
Normal | (i) | 2 | 1.962 | 2.044 | −0.038 | 0.044 | 0.016 | 0.078 | 0.063 | 0.008
Normal | (ii) | 3 | 2.876 | 3.005 | −0.124 | 0.005 | 0.027 | 0.012 | −0.015 | 0.000
Normal | (iii) | 3 | 2.231 | 2.986 | −0.769 | −0.014 | 0.633 | 0.040 | −0.593 | 0.000
Normal | (iv) | 4 | 3.657 | 3.984 | −0.343 | −0.016 | 0.161 | 0.068 | −0.093 | 0.000
Normal | (v) | 4 | 3.351 | 3.983 | −0.649 | −0.017 | 0.450 | 0.181 | −0.269 | 0.000
Normal | (vi) | 3 | 3.677 | 3.055 | 0.677 | 0.055 | 0.615 | 0.166 | −0.449 | 0.000
Logistic | (i) | 2 | 1.974 | 1.858 | −0.026 | −0.142 | 0.014 | 0.586 | 0.571 | 0.000
Logistic | (ii) | 3 | 2.915 | 3.033 | −0.085 | 0.033 | 0.015 | 0.021 | 0.006 | 0.247
Logistic | (iii) | 3 | 2.516 | 2.722 | −0.484 | −0.278 | 0.302 | 0.494 | 0.192 | 0.000
Logistic | (iv) | 4 | 3.699 | 3.609 | −0.301 | −0.391 | 0.133 | 0.320 | 0.187 | 0.000
Logistic | (v) | 4 | 3.296 | 3.154 | −0.704 | −0.846 | 0.526 | 1.051 | 0.524 | 0.000
Logistic | (vi) | 3 | 3.222 | 2.639 | 0.222 | −0.361 | 0.164 | 0.203 | 0.040 | 0.040
Laplace | (i) | 2 | 2.050 | 1.340 | 0.050 | −0.660 | 0.011 | 1.178 | 1.167 | 0.000
Laplace | (ii) | 3 | 2.979 | 2.957 | −0.021 | −0.043 | 0.007 | 0.140 | 0.134 | 0.000
Laplace | (iii) | 3 | 2.850 | 1.709 | −0.150 | −1.291 | 0.054 | 2.159 | 2.105 | 0.000
Laplace | (iv) | 4 | 3.817 | 3.185 | −0.183 | −0.815 | 0.067 | 0.793 | 0.726 | 0.000
Laplace | (v) | 4 | 3.281 | 2.431 | −0.719 | −1.569 | 0.652 | 2.577 | 1.924 | 0.000
Laplace | (vi) | 3 | 2.836 | 2.312 | −0.164 | −0.688 | 0.076 | 0.529 | 0.453 | 0.000
Contaminated | (i) | 2 | 1.956 | 1.123 | −0.044 | −0.877 | 0.012 | 2.306 | 2.294 | 0.000
Contaminated | (ii) | 3 | 2.980 | 2.748 | −0.020 | −0.252 | 0.005 | 0.269 | 0.264 | 0.000
Contaminated | (iii) | 3 | 2.799 | 1.771 | −0.201 | −1.229 | 0.087 | 1.674 | 1.587 | 0.000
Contaminated | (iv) | 4 | 3.848 | 3.171 | −0.152 | −0.829 | 0.047 | 0.888 | 0.840 | 0.000
Contaminated | (v) | 4 | 3.463 | 2.617 | −0.537 | −1.383 | 0.357 | 1.977 | 1.620 | 0.000
Contaminated | (vi) | 3 | 2.955 | 2.440 | −0.045 | −0.560 | 0.078 | 0.466 | 0.389 | 0.000
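The ML baseline compared in these tables fits a two-component mixture by maximum likelihood, which in practice is computed via the EM algorithm. The following is a minimal EM sketch for the normal-mixture special case only; the paper's model of two general symmetric densities is broader, and the quartile initialisation, fixed iteration count, and simulated data here are illustrative choices, not the authors'.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def em_two_normals(x, iters=100):
    """Minimal EM for a two-component normal mixture.
    Returns (pi, mu1, mu2, s1, s2); the difference in means is mu2 - mu1."""
    xs = sorted(x)
    n = len(xs)
    # Crude initialisation at the quartiles (an illustrative choice).
    mu1, mu2 = xs[n // 4], xs[(3 * n) // 4]
    s1 = s2 = max((xs[-1] - xs[0]) / 4.0, 1e-3)
    pi = 0.5
    for _ in range(iters):
        # E-step: posterior responsibility of component 1 for each point.
        r = []
        for xi in x:
            p1 = pi * normal_pdf(xi, mu1, s1)
            p2 = (1.0 - pi) * normal_pdf(xi, mu2, s2)
            r.append(p1 / (p1 + p2))
        # M-step: update mixing proportion, weighted means, and variances.
        n1 = sum(r)
        n2 = n - n1
        pi = n1 / n
        mu1 = sum(ri * xi for ri, xi in zip(r, x)) / n1
        mu2 = sum((1.0 - ri) * xi for ri, xi in zip(r, x)) / n2
        s1 = max(math.sqrt(sum(ri * (xi - mu1) ** 2 for ri, xi in zip(r, x)) / n1), 1e-3)
        s2 = max(math.sqrt(sum((1.0 - ri) * (xi - mu2) ** 2 for ri, xi in zip(r, x)) / n2), 1e-3)
    return pi, mu1, mu2, s1, s2

# Hypothetical data: a 50/50 mixture of N(0, 1) and N(3, 1), true difference 3.
random.seed(7)
data = [random.gauss(0.0, 1.0) if random.random() < 0.5 else random.gauss(3.0, 1.0)
        for _ in range(500)]
pi, mu1, mu2, s1, s2 = em_two_normals(data)
delta = mu2 - mu1  # ML-style estimate of the difference in means
```

Under a misspecified component shape (e.g. Laplace data fitted with a normal mixture), this likelihood-based estimate degrades, which is the pattern visible in the ML columns of the tables above.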

Clustering results in terms of the fuzzy adjusted Rand index (FARI) for data containing no information about the class labels, i.e. the mixture components coincide. For each distribution (normal, logistic, Laplace, and contaminated normal), five values of the mixing proportion were considered, and 500 samples, each with 50 observations, were generated

Clustering results in terms of the fuzzy adjusted Rand index (FARI) for data containing no information about the class labels, i.e. the mixture components coincide. For each distribution (normal, logistic, Laplace, and contaminated normal), five values of the mixing proportion were considered, and 500 samples, each with 100 observations, were generated

Clustering results in terms of the fuzzy adjusted Rand index (FARI) for data containing no information about the class labels, i.e. the mixture components coincide. For each distribution (normal, logistic, Laplace, and contaminated normal), five values of the mixing proportion were considered, and 500 samples, each with 500 observations, were generated
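The fuzzy adjusted Rand index (FARI) in the captions above generalises the adjusted Rand index (ARI) to the soft, posterior-probability memberships produced by model-based clustering. As a reference point, a self-contained sketch of the crisp ARI for hard partitions (an illustration, not the authors' FARI implementation):

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Crisp adjusted Rand index between two hard partitions of the same items."""
    n = len(labels_a)
    # Pair counts within cells of the contingency table and within each partition.
    pair_index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)   # chance-expected index
    max_index = (sum_a + sum_b) / 2.0
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (pair_index - expected) / (max_index - expected)

# Relabelling the clusters does not change the index:
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # prints 1.0
```

An ARI (or FARI) near 0 means agreement no better than chance, which is the relevant benchmark in the coinciding-components setting described by the captions.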