The area under the receiver operating characteristic curve (AUC) is a suitable measure of the quality of classification algorithms. Here we use the theory of U-statistics to derive new confidence intervals for it. The new intervals take into account that only the total sample size used to calculate the AUC can be controlled, while the numbers of members of the case group and of the control group are random. We show that the new confidence intervals can be used not only to evaluate the quality of the fitted model, but also to judge the quality of the classification algorithm itself. We also show that two popular confidence intervals for the AUC, namely DeLong’s interval and the Mann–Whitney interval due to Sen, coincide.
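The U-statistic representation referred to above is the classical Mann–Whitney form of the AUC: the proportion of (case, control) pairs that the classifier's scores order correctly, with ties counted as one half. A minimal numpy sketch of that identity (the function name and tie handling are ours, not from the paper):

```python
import numpy as np

def auc_mann_whitney(cases, controls):
    """AUC as a two-sample U-statistic: the fraction of (case, control)
    score pairs ranked correctly, counting ties as 1/2."""
    cases = np.asarray(cases, dtype=float)
    controls = np.asarray(controls, dtype=float)
    # compare every case score against every control score
    greater = (cases[:, None] > controls[None, :]).sum()
    ties = (cases[:, None] == controls[None, :]).sum()
    return (greater + 0.5 * ties) / (len(cases) * len(controls))
```

A perfect separation of the two groups yields an AUC of 1, while identically distributed scores yield 1/2, matching the usual interpretation of the AUC as a concordance probability.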
Principal Component Analysis (PCA) is a classical technique of dimension reduction for multivariate data. When the data are a mixture of subjects from different subpopulations, one may be interested in the PCA of some (or each) subpopulation separately. In this paper, estimators are considered for the principal component directions and the corresponding eigenvalues of subpopulations in the nonparametric model of mixture with varying concentrations. Consistency and asymptotic normality of the obtained estimators are proved. These results allow one to construct confidence sets for the PCA model parameters. The performance of such confidence intervals for the leading eigenvalues is investigated via simulations.
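For a single population, the objects being estimated are the eigenvalues and eigenvectors of the covariance matrix. A minimal numpy sketch of that classical PCA step (toy data and variable names are ours; the mixture model of the paper is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)
# toy data: 500 observations in R^3 with dominant variance along axis 0
X = rng.normal(size=(500, 3)) * np.array([3.0, 1.0, 0.5])

S = np.cov(X, rowvar=False)           # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)  # eigh returns ascending order
order = np.argsort(eigvals)[::-1]     # sort eigenpairs descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
# eigvecs[:, 0] is the leading PC direction, eigvals[0] its eigenvalue
```

Here the leading sample eigenvalue should be close to the true variance 9 along the first axis, and the leading eigenvector close to the first coordinate axis.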
We consider a multivariate functional errors-in-variables model $AX\approx B$, where the data matrices $A$ and $B$ are observed with errors and the matrix parameter $X$ is to be estimated. A goodness-of-fit test is constructed based on the total least squares estimator. The proposed test is asymptotically chi-squared under the null hypothesis. The power of the test under local alternatives is discussed.
We consider a multivariate functional measurement error model $AX\approx B$. The errors in $[A,B]$ are uncorrelated, row-wise independent, and have equal (unknown) variances. We study the total least squares estimator of $X$, which, in the case of normal errors, coincides with the maximum likelihood estimator. We give conditions for asymptotic normality of the estimator as the number of rows of $A$ grows. Under mild assumptions, the covariance structure of the limit Gaussian random matrix is nonsingular. For normal errors, the results can be used to construct an asymptotic confidence interval for a linear functional of $X$.
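The total least squares estimator studied in these two abstracts has a classical closed form built from the SVD of the compound matrix $[A,B]$: the estimate is read off the right singular vectors belonging to the smallest singular values. A minimal numpy sketch of that standard construction (the function name is ours, and the genericity checks a production solver would need are omitted):

```python
import numpy as np

def tls(A, B):
    """Total least squares estimate of X in A X ≈ B, via the SVD of
    the compound matrix [A, B]."""
    n = A.shape[1]
    V = np.linalg.svd(np.hstack([A, B]))[2].T  # right singular vectors
    V12 = V[:n, n:]   # top-right block of V
    V22 = V[n:, n:]   # bottom-right block, assumed invertible
    return -V12 @ np.linalg.inv(V22)
```

When the data happen to satisfy $AX_0=B$ exactly, the smallest singular values of $[A,B]$ are zero and the construction recovers $X_0$; with noisy $A$ and $B$ it yields the estimator whose asymptotic normality is the subject of the abstract above.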