1 Introduction
Directional statistics (a.k.a. spherical statistics) is a subfield of statistics in which one studies directions, rotations, and axes. A typical situation in spherical statistics is that observations are gathered on a sphere, say ${\mathbb{S}^{2}}$, and consequently the methods have to be adapted to non-Euclidean geometry. More generally, one can consider observations on more general compact Riemannian manifolds. Application areas are numerous: for example, ${\mathbb{S}^{2}}$ can represent the Earth's surface, and measurements are then observations on this surface. To name just a few application areas, see [11, 22] for navigation/control in robotics, [14, 16] for modeling wind directional changes, [6] for finding lymphoblastic leukemia cells, [3] for movement of tectonic plates, [12] for modelling of protein chains, and [9, 19, 21] for radiology applications in the context of MIMO-systems. For details on spherical statistics, we refer to the monograph [5].
One of the central problems in statistics is to estimate model parameters from observations. In the spherical context, this can mean, for example, that one assumes a parametric distribution ${P_{\theta }}$ on ${\mathbb{S}^{2}}$ and then uses the data to estimate the unknown θ. The most commonly applied distributions on the sphere are von Mises–Fisher distributions (also called von Mises distributions on $\mathbb{S}$ or Kent distributions on ${\mathbb{S}^{2}}$), which can be viewed as the equivalent of the normal distribution on the sphere. For parameter estimation for von Mises–Fisher distributions, see [23]. Another widely applied distribution on ${\mathbb{S}^{d}}$ is the projected normal distribution, that is, the distribution of $X/\| X\| $ for $X\sim N(\mu ,\Sigma )$. While the density function of the projected normal is well known (see [7]), parameter estimation is much less studied due to the complicated nature of the density. In particular, parameter estimation has been studied in the circular case ($\mathbb{S}$); see [15, 16, 20, 24].
In this article, we consider the problem of parameter estimation related to the projected normal distribution on ${\mathbb{S}^{2}}$. In contrast to the existing literature, we do not consider estimation of the parameters of the projected normal itself, which can be obtained via standard methodology as the density is completely known. Instead, our aim is to extract information on the underlying normal distribution $N(\mu ,\Sigma )$ (with support on the ambient Euclidean space ${\mathbb{R}^{3}}$) based solely on data consisting of projected points on ${\mathbb{S}^{2}}$. Obviously, we immediately run into identifiability issues if we observe only $X/\| X\| $ instead of $X\sim N(\mu ,\Sigma )$. First of all, it is clear that one cannot identify arbitrary shapes of Σ (for example, the distribution can be arbitrarily spread in the direction μ and this cannot be observed, as all points in this direction are projected to the same point). For this reason, we assume, for the sake of simplicity, the isotropic variance $\Sigma ={\sigma ^{2}}{I_{3}}$, i.e. $X\sim N(\mu ,{\sigma ^{2}}{I_{3}})$. However, even in this case, we can only estimate the quantities $\mu /\| \mu \| $ (the direction of the location) and ${\sigma ^{2}}/\| \mu {\| ^{2}}$, the reason being that the distribution of the projection $\operatorname{pr}(aX)$ does not depend on $a\gt 0$. This is indeed natural, as intuitively one can only estimate the direction $\mu /\| \mu \| $ from the projections (onto ${\mathbb{S}^{2}}$). Similarly, $N(\mu ,{\sigma ^{2}}{I_{3}})$ with larger σ located at a distant μ looks similar, when observed only on the surface of the sphere, to a normal distribution with smaller variance located closer. We also note that estimation of the direction $\mu /\| \mu \| $ is already well known, and we claim no originality in this respect. Instead, our main contribution is the estimation of $\lambda ={\sigma ^{2}}/\| \mu {\| ^{2}}$. For this, we study the covariance matrix of $X/\| X\| $ on ${\mathbb{S}^{2}}$ and, by linking it to certain special functions and analyzing their series expansions, we show that there is a bijective mapping between λ and the covariance matrix of $X/\| X\| $ on ${\mathbb{S}^{2}}$. As the latter can be estimated by the methods of spherical statistics, we obtain a consistent estimator for λ via the inverse mapping, which can be computed, e.g., via the bisection method.
The rest of the article is organized as follows. In Section 2 we present and discuss our main results. We begin with Section 2.1, where we introduce our setup and notation. After that, in Section 2.2, we discuss the convergence of sample estimators (for mean and covariance) in the context of a general Riemannian manifold. The case of the projected normal in ${\mathbb{S}^{2}}$ is then discussed in Section 2.3. All the proofs are postponed to Section 3.
2 Main results
In this section we present and discuss our main results. First, in Section 2.1, we shall introduce some notation and clarify some terminology used in the context of manifold-valued random variables. Some of the theory relies on differential geometric concepts which are briefly summarized in Appendix A. The convergence of sample covariances on a general compact manifold is discussed in Section 2.2, while the special case of ${\mathbb{S}^{2}}$ and the projected normal distribution is treated in Section 2.3.
2.1 General setting
Let $M\subset {\mathbb{R}^{k}}$ be a smooth n-dimensional manifold. Let $S\subseteq {\mathbb{R}^{k}}$ be a dense subset and let $\operatorname{pr}:S\to M$ be the projection onto M, assumed smooth everywhere on S and hence almost everywhere in ${\mathbb{R}^{k}}$. Let $(\Omega ,\mathbb{P})$ be a probability space and consider a normally distributed (multivariate) random variable
\[ X:\Omega \to {\mathbb{R}^{k}},\hspace{1em}X\stackrel{\mathrm{d}}{=}N(\mu ,\Sigma ),\]
such that
\[ \mathbb{P}(X\in S)=1.\]
Let ${x_{1}},\dots ,{x_{L}}$ be an independently drawn sample of X. Suppose now that we observe the projected sample
\[ {y_{\ell }}=\operatorname{pr}({x_{\ell }}),\hspace{1em}\ell =1,\dots ,L.\]
The question is now, can one estimate the mean and covariance of this projected sample?
In order for this question to have meaning, a notion of mean and covariance is needed for manifold-valued random variables. The intrinsic mean, a.k.a. the Fréchet mean (see [17]), of an absolutely continuous random variable $X:\Omega \to M$ is defined as follows.
Definition 2.1.
Let M be a Riemannian manifold with corresponding distance function dist and volume form ${\operatorname{dVol}_{M}}$. Moreover, suppose ${p_{X}}:M\to [0,\infty )$ is a probability density function of some absolutely continuous random variable $X:\Omega \to M$. The expected value of X is then defined by
(2.1)
\[ \mathbb{E}[X]=\underset{q\in M}{\operatorname{arginf}}{\int _{M}}\operatorname{dist}{(q,y)^{2}}{p_{X}}(y)\hspace{0.1667em}{\operatorname{dVol}_{M}}(y).\]
Remark.
The argument of the infimum in (2.1) need not exist, nor does it need to be unique. For example, consider $X\stackrel{\mathrm{d}}{=}N(0,{\sigma ^{2}}I)\in {\mathbb{R}^{n+1}}$; then $\operatorname{pr}(X)$ follows the uniform distribution on ${\mathbb{S}^{n}}$ and every point of ${\mathbb{S}^{n}}$ attains the infimum in (2.1). This is a very important difference from the case of ${\mathbb{R}^{n}}$-valued random variables. In ${\mathbb{R}^{n}}$ we know that if X is absolutely continuous, integrable and square integrable, then the point $\mu \in {\mathbb{R}^{n}}$ which minimizes the least squares integral
\[ {\int _{{\mathbb{R}^{n}}}}\| \mu -y{\| ^{2}}{p_{X}}(y)\hspace{0.1667em}\mathrm{d}y\]
is unique.
The intrinsic definition of the covariance matrix for a random variable on a Riemannian manifold is well known as well (see, e.g., [17]).
Definition 2.2.
Let M be a geodesically complete Riemannian manifold, and let ${\log _{q}}:M\to {T_{q}}M$ be the natural log map (defined a.e.). Let X be an M-valued absolutely continuous random variable with the intrinsic mean $\mu =\mathbb{E}[X]\in M$. Then the covariance matrix of X is the linear map $\operatorname{Cov}(X):{T_{\mu }}M\to {T_{\mu }}M$ defined by the integral
\[ \operatorname{Cov}(X)={\int _{M}}{\log _{\mu }}(y){\log _{\mu }}{(y)^{T}}{p_{X}}(y)\hspace{0.1667em}{\operatorname{dVol}_{M}}(y).\]
2.2 Convergence of sample estimators on a compact manifold
Let ${y_{\ell }}$, $\ell =1,\dots ,L$, be a sample of L independent measurements on a compact manifold M. In order to estimate the mean of such a sample, we utilize the discrete version of Definition 2.1, namely the empirical mean
\[ \hat{\mu }=\underset{q\in M}{\operatorname{arginf}}\frac{1}{L}{\sum \limits_{\ell =1}^{L}}\operatorname{dist}{(q,{y_{\ell }})^{2}}.\]
This definition follows that of [17].
Note that since the function ${\operatorname{dist}^{2}}$ has good regularity, one may hope to find such an infimum by solving the (nonlinear) equation
\[ {\sum \limits_{\ell =1}^{L}}{\log _{q}}({y_{\ell }})=0.\]
It was shown in [8] that if M is compact and if the ${y_{\ell }}$ are sampled from a distribution with a unique empirical mean ${x_{0}}$, then the empirical mean satisfies a central limit theorem. More precisely, if $\mu =\mathbb{E}[{y_{\ell }}]$ is the true mean of the underlying distribution and if $\hat{\mu }$ is the empirical mean, it holds that
\[ \sqrt{L}{\log _{\mu }}(\hat{\mu })\stackrel{\mathrm{d}}{\longrightarrow }N(0,V)\]
for some linear map $V\in \mathcal{L}({T_{\mu }}M,{T_{\mu }}M)$.
Definition 2.4.
Let ${\{{y_{\ell }}\}_{\ell =1}^{L}}$ be a sample of L points on a Riemannian manifold M with a unique empirical mean $\hat{\xi }$. Then the empirical covariance of the sample is the linear map $\hat{V}:{T_{\hat{\xi }}}M\to {T_{\hat{\xi }}}M$ defined by
(2.2)
\[ \hat{V}=\frac{1}{L-1}{\sum \limits_{\ell =1}^{L}}{\log _{\hat{\xi }}}({y_{\ell }}){\log _{\hat{\xi }}}{({y_{\ell }})^{T}}.\]
Since the empirical mean converges by the results of [8], it remains to verify that the empirical covariance in Equation (2.2) converges to the covariance V of the limiting distribution of $\sqrt{L}{\log _{\xi }}(\hat{\xi })$ as $L\to \infty $.
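To make the quantities of Definition 2.4 concrete, the following minimal sketch (assuming NumPy; it is not the Algorithm 1 referred to in Section 2.3) computes the empirical mean on $M={\mathbb{S}^{2}}$ by a Karcher-type fixed-point iteration and the empirical covariance (2.2), expressing the log maps in an orthonormal basis of the tangent plane at the mean.

```python
import numpy as np

def log_map(p, q):
    """Riemannian log of q at p on S^2, returned as a vector in R^3 tangent to p."""
    cos_d = np.clip(np.dot(p, q), -1.0, 1.0)
    d = np.arccos(cos_d)                       # geodesic distance dist(p, q)
    v = q - cos_d * p                          # component of q orthogonal to p
    n = np.linalg.norm(v)
    return np.zeros(3) if n < 1e-12 else (d / n) * v

def frechet_mean(ys, iters=100):
    """Empirical mean: move along the averaged log direction until it vanishes."""
    mu = ys.mean(axis=0)
    mu /= np.linalg.norm(mu)                   # initial guess: normalized Euclidean mean
    for _ in range(iters):
        g = np.mean([log_map(mu, y) for y in ys], axis=0)
        norm_g = np.linalg.norm(g)
        if norm_g < 1e-12:
            break
        mu = np.cos(norm_g) * mu + np.sin(norm_g) * g / norm_g   # exponential map
    return mu

def empirical_covariance(ys, mu):
    """Equation (2.2) as a 2x2 matrix in an orthonormal basis of the tangent plane at mu."""
    e1 = np.cross(mu, [0.0, 0.0, 1.0])
    if np.linalg.norm(e1) < 1e-8:              # mu is (anti)parallel to the z-axis
        e1 = np.cross(mu, [1.0, 0.0, 0.0])
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(mu, e1)
    logs = np.array([[log_map(mu, y) @ e1, log_map(mu, y) @ e2] for y in ys])
    return logs.T @ logs / (len(ys) - 1)
```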
The following result shows that, in the case of isotropic covariance, the empirical covariance converges. The proof is postponed to Section 3.1.
Theorem 2.5.
Let ${\{{\xi _{\ell }}\}_{\ell =1}^{L}}$ be L independent identically distributed random variables on a compact geodesically complete manifold M. Suppose further they have a unique mean $\mathbb{E}[{\xi _{\ell }}]=\mu $ and an isotropic covariance $\operatorname{Cov}({\xi _{\ell }})=vI$. Then,
\[ \frac{1}{L-1}{\sum \limits_{\ell =1}^{L}}{\log _{\hat{\mu }}}({\xi _{\ell }}){\log _{\hat{\mu }}}{({\xi _{\ell }})^{T}}\stackrel{\mathbb{P}}{\longrightarrow }vI\]
with the same rate of convergence as the empirical mean $\hat{\mu }$ of the sample ${\{{\xi _{\ell }}\}_{\ell =1}^{L}}$.
Remark.
As far as we know, the rate of convergence for the empirical mean on compact manifolds has not been completely settled. In [8] it was shown that the empirical mean has rate of convergence $\sqrt{L}$ for a large class of manifolds, but not for all compact geodesically complete manifolds. However, [2] provides rates of convergence for the empirical mean on a general class of metric spaces, from which it appears that the rate of convergence $\sqrt{L}$ does not hold in general.
2.3 Observing the projected normal in ${\mathbb{S}^{2}}$
In this section we consider $M={\mathbb{S}^{2}}$, the unit 2-sphere in ${\mathbb{R}^{3}}$. In this case, the projection map is simply
\[ \operatorname{pr}(x)=\frac{x}{\| x\| },\]
and the domain of definition for pr is $S={\mathbb{R}^{3}}\setminus \{0\}$.
Throughout, we shall use spherical coordinates, i.e. for the classical Cartesian coordinates ${(x,y,z)^{T}}$ in ${\mathbb{R}^{3}}$ we set $x=\cos (\theta )\sin (\phi )$, $y=\sin (\theta )\sin (\phi )$, and $z=\cos (\phi )$. Consider two points ${({\theta _{1}},{\phi _{1}})^{T}}$ and ${({\theta _{2}},{\phi _{2}})^{T}}$. Then it is a classical result (see, e.g., [10]) that the distance function may be written as
\[ \operatorname{dist}(({\theta _{1}},{\phi _{1}}),({\theta _{2}},{\phi _{2}}))=\operatorname{acos}\left(\cos ({\phi _{1}})\cos ({\phi _{2}})+\sin ({\phi _{1}})\sin ({\phi _{2}})\cos ({\theta _{2}}-{\theta _{1}})\right).\]
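As a quick sanity check of the coordinate formula above, the following snippet (assuming NumPy) compares it with the arccosine of the Euclidean dot product of the corresponding points on ${\mathbb{S}^{2}}$.

```python
import numpy as np

def to_cartesian(theta, phi):
    return np.array([np.cos(theta) * np.sin(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(phi)])

def dist_spherical(theta1, phi1, theta2, phi2):
    c = (np.cos(phi1) * np.cos(phi2)
         + np.sin(phi1) * np.sin(phi2) * np.cos(theta2 - theta1))
    return np.arccos(np.clip(c, -1.0, 1.0))

t1, p1, t2, p2 = 0.3, 1.1, 2.0, 0.4
d_coords = dist_spherical(t1, p1, t2, p2)
d_dot = np.arccos(np.clip(to_cartesian(t1, p1) @ to_cartesian(t2, p2), -1.0, 1.0))
assert abs(d_coords - d_dot) < 1e-12
```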
It has been shown in [7] that the projected normal, i.e. the random variable defined by $\operatorname{pr}(X)$ where $X\stackrel{\mathrm{d}}{=}N(\mu ,\Sigma )$, has the probability density function
(2.3)
\[ {p_{\operatorname{pr}(X)}}(\theta ,\phi )={\left(\frac{1}{2\pi A}\right)^{3/2}}{\left|\Sigma \right|^{-\frac{1}{2}}}\exp (C)\left(\operatorname{K}+{\operatorname{K}^{2}}\frac{\Phi (\operatorname{K})}{\varphi (\operatorname{K})}+\frac{\Phi (\operatorname{K})}{\varphi (\operatorname{K})}\right)\]
at a point $u={(\cos (\theta )\sin (\phi ),\sin (\theta )\sin (\phi ),\cos (\phi ))^{T}}$, where $A={u^{T}}{\Sigma ^{-1}}u$, $B={u^{T}}{\Sigma ^{-1}}\mu $, $C=-\frac{1}{2}{\mu ^{T}}{\Sigma ^{-1}}\mu $, and $\operatorname{K}=B{A^{-\frac{1}{2}}}$. Moreover, the functions $\varphi ,\Phi :\mathbb{R}\to \mathbb{R}$ are defined as
\[ \varphi (x)=\frac{1}{\sqrt{2\pi }}\exp (-\frac{{x^{2}}}{2})\]
and
\[ \Phi (x)={\int _{-\infty }^{x}}\varphi (t)\hspace{0.1667em}\mathrm{d}t,\]
respectively, i.e. they are the density and distribution function of the standard normal distribution.

In our context, this random variable is then observed in ${\mathbb{S}^{2}}$. As ${\mathbb{S}^{2}}$ is an isometrically embedded Riemannian manifold, Definition 2.1 applies to the projected normal. Intuitively, the expectation of a projected normal distribution ought to be $\operatorname{pr}(\mu )$. Unfortunately, for the fully general projected normal this intuition fails. However, by imposing an isotropic covariance Σ on X we shall show that this intuition is indeed correct. This is the topic of the next result, whose proof is postponed to Section 3.2.
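As a numerical sanity check of (2.3), the following sketch (assuming NumPy and SciPy) evaluates the density for an isotropic covariance $\Sigma ={\sigma ^{2}}{I_{3}}$ and verifies that it integrates to one over ${\mathbb{S}^{2}}$.

```python
import numpy as np
from scipy import integrate, stats

def projected_normal_pdf(theta, phi, mu, sigma2):
    """Density (2.3) with Sigma = sigma2 * I_3, evaluated at u(theta, phi)."""
    u = np.array([np.cos(theta) * np.sin(phi),
                  np.sin(theta) * np.sin(phi),
                  np.cos(phi)])
    A = 1.0 / sigma2
    B = (u @ mu) / sigma2
    C = -0.5 * (mu @ mu) / sigma2
    K = B / np.sqrt(A)
    ratio = stats.norm.cdf(K) / stats.norm.pdf(K)          # Phi(K) / varphi(K)
    return ((1.0 / (2.0 * np.pi * A)) ** 1.5 * sigma2 ** -1.5
            * np.exp(C) * (K + (K ** 2 + 1.0) * ratio))

mu, sigma2 = np.array([0.0, 0.0, 1.0]), 0.5
total, _ = integrate.dblquad(
    lambda phi, theta: projected_normal_pdf(theta, phi, mu, sigma2) * np.sin(phi),
    0.0, 2.0 * np.pi,    # theta range
    0.0, np.pi)          # phi range, with surface measure sin(phi) dphi dtheta
print(total)             # should be close to 1
```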
Theorem 2.6.
Let X be a normally distributed random variable with average $\mu \in {\mathbb{R}^{3}}$ and covariance matrix Σ, i.e. $X\stackrel{\mathrm{d}}{=}N(\mu ,\Sigma )\in {\mathbb{R}^{3}}$. If $\Sigma ={\sigma ^{2}}{I_{3}}$, then $\mathbb{E}[\operatorname{pr}(X)]=\operatorname{pr}(\mu )$.
Remark.
The above theorem is true whenever μ is an eigenvector of Σ. However, if μ is not an eigenvector of Σ, then the statement of Theorem 2.6 is false in general. To see this, let $\mu ={(0,0,1)^{T}}$ and
\[ \Sigma =\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}1& 0& 0\\ {} 0& 1& 0.5\\ {} 0& 0.5& 1\end{array}\right).\]
In this case it follows that the probability density function is not symmetric around μ, see Figure 1 below.
Fig. 1.
A normally distributed random variable $X\stackrel{\mathrm{d}}{=}N(\mu ,\Sigma )$, with the covariance matrix Σ such that the mean μ is not an eigenvector for Σ, for which $\mathbb{E}[\operatorname{pr}(X)]\ne \operatorname{pr}(\mu )$
In general, the tangent space of ${\mathbb{S}^{2}}$ at a point q is the set of vectors orthogonal (in the Euclidean inner product inherited from ${\mathbb{R}^{3}}$) to q. Note that the initial velocity of the geodesic connecting a point μ to q has the same direction as the component of q orthogonal to μ, i.e. $\mu \times (q\times \mu )$ (with the cross-product of ${\mathbb{R}^{3}}$). The logarithm map of q centered at a point μ is thus the normalised initial velocity of the geodesic connecting μ with q, multiplied by the length of that geodesic, see [13]. Without loss of generality, consider $\mu ={(0,0,1)^{T}}$. Then
\[ {\log _{\mu }}(q)=\left(\begin{array}{c}\cos (\theta )\sin (\phi )\\ {} \sin (\theta )\sin (\phi )\end{array}\right)\frac{\phi }{\left|\sin (\phi )\right|},\]
where the two components of the vector are expressed with respect to the basis (of ${T_{\mu }}{\mathbb{S}^{2}}$)
\[ \left\{\left(\begin{array}{c}1\\ {} 0\\ {} 0\end{array}\right),\left(\begin{array}{c}0\\ {} 1\\ {} 0\end{array}\right)\right\}.\]
Hence the intrinsic covariance of $\operatorname{pr}(X)$ can be written as
\[\begin{aligned}{}\operatorname{Cov}(\operatorname{pr}(X))=& {\int _{0}^{2\pi }}{\int _{0}^{\pi }}\left(\begin{array}{c@{\hskip10.0pt}c}{\cos ^{2}}(\theta ){\sin ^{2}}(\phi )& \cos (\theta )\sin (\theta ){\sin ^{2}}(\phi )\\ {} \cos (\theta )\sin (\theta ){\sin ^{2}}(\phi )& {\sin ^{2}}(\theta ){\sin ^{2}}(\phi )\end{array}\right)\\ {} & \times \frac{{\phi ^{2}}}{{\sin ^{2}}(\phi )}{\left(\frac{1}{2\pi }\right)^{3/2}}\exp (-\frac{1}{2{\sigma ^{2}}})\\ {} & \times \left(\operatorname{K}+{\operatorname{K}^{2}}\frac{\Phi (\operatorname{K})}{\varphi (\operatorname{K})}+\frac{\Phi (\operatorname{K})}{\varphi (\operatorname{K})}\right)\sin (\phi )\mathrm{d}\phi \mathrm{d}\theta ,\end{aligned}\]
where $\operatorname{K}=\frac{1}{\sigma }\cos (\phi )$. Or, equivalently, by simplifying and integrating w.r.t. θ,
(2.4)
\[ \begin{aligned}{}\operatorname{Cov}(\operatorname{pr}(X))=& \frac{\pi }{{(2\pi )^{3/2}}}\exp (-\frac{1}{2{\sigma ^{2}}})\left(\begin{array}{c@{\hskip10.0pt}c}1& 0\\ {} 0& 1\end{array}\right)\\ {} & \times {\int _{0}^{\pi }}{\phi ^{2}}\left(\frac{\cos (\phi )}{\sigma }+\left(\frac{{\cos ^{2}}(\phi )}{{\sigma ^{2}}}+1\right)\frac{\Phi (\frac{\cos (\phi )}{\sigma })}{\varphi (\frac{\cos (\phi )}{\sigma })}\right)\sin (\phi )\mathrm{d}\phi \\ {} =:& \left(\begin{array}{c@{\hskip10.0pt}c}1& 0\\ {} 0& 1\end{array}\right)f(\sigma ).\end{aligned}\]
Remark.
In the limit as $\sigma \to \infty $, it holds that ${p_{\operatorname{pr}(X)}}\to \frac{1}{4\pi }{𝟙_{{\mathbb{S}^{2}}}}$. The intrinsic covariance in this limit is
\[ \frac{{\pi ^{2}}-4}{4}{I_{2}}.\]
On the other hand, in the limit as $\sigma \to 0$, ${p_{\operatorname{pr}(X)}}$ converges to the point mass distribution while the covariance goes to zero.
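The scalar map $f(\sigma )$ in (2.4) has no closed form, but it is straightforward to evaluate by quadrature. The following sketch (assuming NumPy and SciPy, and written in a numerically stable form that folds $\exp (-\frac{1}{2{\sigma ^{2}}})$ into the integrand) reproduces the two limits discussed in the remark above.

```python
import numpy as np
from scipy import integrate, stats

def f(sigma):
    """Scalar intrinsic variance f(sigma) from (2.4), for X ~ N(mu, sigma^2 I_3), ||mu|| = 1."""
    def integrand(phi):
        k = np.cos(phi) / sigma
        # exp(-1/(2 sigma^2)) * Phi(k) / varphi(k), rewritten so that nothing overflows
        stable = (np.sqrt(2.0 * np.pi) * stats.norm.cdf(k)
                  * np.exp(-np.sin(phi) ** 2 / (2.0 * sigma ** 2)))
        return phi ** 2 * (np.exp(-0.5 / sigma ** 2) * k
                           + (k ** 2 + 1.0) * stable) * np.sin(phi)
    val, _ = integrate.quad(integrand, 0.0, np.pi)
    return np.pi / (2.0 * np.pi) ** 1.5 * val

print(f(0.3), f(1.0), f(100.0))   # increasing in sigma
print((np.pi ** 2 - 4.0) / 4.0)   # the sigma -> infinity limit of f
```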
Note that the above estimates concern only the intrinsic expectation and covariance. If we want to relate these to the extrinsic expectation and covariance, we end up with the obvious problem that $\operatorname{pr}(X)$ and $\operatorname{pr}(aX)$, $a\gt 0$, have the same distribution. Hence we need to choose which normal distribution in ${\mathbb{R}^{3}}$ corresponds to the intrinsic covariance and mean. By Theorem 2.6 the mean is known (up to a factor). Moreover, it turns out there is a one-to-one correspondence between the extrinsic covariance and the intrinsic covariance up to a factor $a\gt 0$, and thus one may estimate the underlying covariance parameter σ using the relation (2.4) with, e.g., the bisection method.
The degeneracy of the factor a essentially means that we can only estimate the parameter σ in the case when the projected mean is the true mean of the ${\mathbb{R}^{3}}$-valued normal random variable. More generally, if $X\stackrel{\mathrm{d}}{=}N(\mu ,{\sigma ^{2}}{I_{3}})$, then $\frac{X}{\| \mu \| }\stackrel{\mathrm{d}}{=}N\left(\frac{\mu }{\| \mu \| },\frac{{\sigma ^{2}}}{\| \mu {\| ^{2}}}{I_{3}}\right)$. Consequently, we can only estimate the quantities $\frac{\mu }{\| \mu \| }\in {\mathbb{S}^{2}}$ and $\frac{{\sigma ^{2}}}{\| \mu {\| ^{2}}}$. This means that, as expected, we can estimate the direction $\frac{\mu }{\| \mu \| }$ of the extrinsic distribution $X\stackrel{\mathrm{d}}{=}N(\mu ,{\sigma ^{2}}{I_{3}})$ but not the distance $\| \mu \| $. On the other hand, for the variance we can only estimate the quantity $\frac{{\sigma ^{2}}}{\| \mu {\| ^{2}}}$. This is expected as well, since for the distribution $N(\mu ,{\sigma ^{2}}{I_{3}})$ located far away ($\| \mu \| $ large) the projections onto ${\mathbb{S}^{2}}$ are not widely spread even if $\sigma \gt 0$ is large.
The required one-to-one correspondence between extrinsic and intrinsic covariances is formulated in the following proposition, whose proof is presented in Section 3.3. The statement is also illustrated in Figure 2.
Proposition 2.7.
Let X be a normally distributed random variable with mean $\mu \in {\mathbb{R}^{3}}$ and isotropic variance ${\sigma ^{2}}{I_{3}}$, i.e. $\frac{X}{\left\| \mu \right\| }\stackrel{\mathrm{d}}{=}N(\frac{\mu }{\left\| \mu \right\| },\frac{{\sigma ^{2}}}{{\left\| \mu \right\| ^{2}}}{I_{3}})$. Then
\[ \operatorname{Cov}(\operatorname{pr}(X))=v{I_{2}},\hspace{1em}v=f\left(\frac{{\sigma ^{2}}}{{\left\| \mu \right\| ^{2}}}\right),\]
for $0\le v\lt \frac{{\pi ^{2}}-4}{4}$ and the relation
\[ \frac{{\sigma ^{2}}}{{\left\| \mu \right\| ^{2}}}\longmapsto f\left(\frac{{\sigma ^{2}}}{{\left\| \mu \right\| ^{2}}}\right)=v\]
is a bijection.
The bijection f is crucial for observing projected normal random variables. Utilizing this f, if the scalar variance of some normal random variable $X\stackrel{\mathrm{d}}{=}N(\mu ,{\sigma ^{2}}{I_{3}})$ in ${\mathbb{R}^{3}}$ is known, then the intrinsic scalar variance of the corresponding projected normal random variable is precisely $f(\frac{{\sigma ^{2}}}{{\left\| \mu \right\| ^{2}}})$. Conversely, if some projected isotropic normal random variable has mean μ and covariance $v{I_{2}}$, then the normal random variable X such that $\operatorname{pr}(X)$ has these parameters is precisely $X\stackrel{\mathrm{d}}{=}N(\mu ,{f^{-1}}(v){I_{3}})$ (up to a positive scalar factor).
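Since f is strictly increasing, its inverse can be computed by bisection, as suggested above. A minimal sketch (assuming the function f from the previous sketch, treated as a function of σ with $\| \mu \| =1$):

```python
import numpy as np

def f_inverse(v, tol=1e-10):
    """Bisection solve of f(sqrt(lam)) = v for lam = sigma^2, assuming 0 <= v < (pi^2 - 4)/4."""
    lo, hi = 0.0, 1.0
    while f(np.sqrt(hi)) < v:      # grow the upper bracket; f is increasing
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(np.sqrt(mid)) < v:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam = 0.7
print(f_inverse(f(np.sqrt(lam))))  # recovers lam approximately
```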
As a consequence of Proposition 2.7 and Theorem 2.5 we can conclude that given measurements on the sphere, we can estimate the scalar variance of an isotropically distributed normal random vector in ${\mathbb{R}^{3}}$. This leads to the next result that can be viewed as the main theorem of the present paper. Its proof follows directly from Theorem 2.5 and Proposition 2.7. The rate of convergence is obtained immediately from noting that [8, Theorem 2] applies to ${\mathbb{S}^{2}}$, and thus the empirical mean has rate of convergence $\sqrt{L}$, and by Theorem 2.5 so does the empirical covariance.
Theorem 2.8.
Let X be a normally distributed random variable with mean $\mu \in {\mathbb{R}^{3}}$ and isotropic variance ${\sigma ^{2}}{I_{3}}$, i.e. $X\stackrel{\mathrm{d}}{=}N(\mu ,{\sigma ^{2}}{I_{3}})$. Given independent measurements $({x_{1}},{x_{2}},\dots ,{x_{L}})$ from $\operatorname{pr}(X)$, we can estimate $\lambda =\frac{{\sigma ^{2}}}{{\left\| \mu \right\| ^{2}}}$ by
\[ \hat{\lambda }={f^{-1}}\left(\frac{\operatorname{tr}(\hat{V})}{2}\right),\]
where $\hat{V}$ is the empirical covariance matrix given in Equation (2.2) and where f is the bijection given in Proposition 2.7 and defined in Equation (2.4). Moreover, it holds that
\[ \hat{\lambda }\stackrel{\mathbb{P}}{\longrightarrow }\frac{{\sigma ^{2}}}{{\left\| \mu \right\| ^{2}}}\]
as $L\to \infty $, with rate of convergence $\sqrt{L}$.
Fig. 2.
A plot of the scalar variance of $\operatorname{pr}(X)$, i.e. $f({\sigma ^{2}})$ from Proposition 2.7, where $X\stackrel{\mathrm{d}}{=}N(\mu ,{\sigma ^{2}}{I_{3}})$ and $\mu \in {\mathbb{S}^{2}}$ is arbitrary. The red line indicates the upper bound $({\pi ^{2}}-4)/4={\lim \nolimits_{\sigma \to \infty }}\frac{\operatorname{tr}(\operatorname{Cov}(\operatorname{pr}(X)))}{2}$
We conclude this section with some simulations. First, Table 1 provides an empirical verification of Theorem 2.5 for the case of the manifold ${\mathbb{S}^{2}}$. The simulations are conducted by generating L samples from $X\stackrel{\mathrm{d}}{=}N(\mu ,{\sigma ^{2}}{I_{3}})$ with $\mu ={(0,0,1)^{T}}\in {\mathbb{S}^{2}}$ and $\sigma =1$, and projecting these samples onto ${\mathbb{S}^{2}}$. The sample covariance is computed as in Algorithm 1, and its error relative to the theoretical covariance is measured in the Frobenius norm. This error is then averaged over 100 Monte Carlo repetitions.
Table 1.
The absolute error, averaged over 100 Monte Carlo repetitions, of the empirical covariance, Equation (2.2), compared to the theoretical covariance of the projected normal distribution, $\operatorname{tr}(\operatorname{Cov}(\operatorname{pr}(X)))/2$. Each simulation run uses L data points and $\sigma =1$
L | 30 | 50 | 100 | 1000 | ${10^{4}}$ | ${10^{5}}$ | ${10^{6}}$ |
error | 0.013 | 0.0080 | 0.0055 | 0.0033 | 0.0015 | 8.4e-05 | 2.9e-05 |
Second, Figure 3 illustrates the convergence result of Theorem 2.8. The simulations are done by generating L samples from $X\stackrel{\mathrm{d}}{=}N(\mu ,{\sigma ^{2}}{I_{3}})$ with $\mu ={(0,0,1)^{T}}\in {\mathbb{S}^{2}}$ and $\sigma =1$. The empirical covariance $\hat{V}$, Equation (2.2), is computed for each set of samples. Then, for each set of samples, the estimator $\hat{\lambda }$ for $\lambda ={\sigma ^{2}}$ (recall $\| \mu \| =1$), see Theorem 2.8, is computed as ${f^{-1}}(\operatorname{tr}(\hat{V})/2)$. This is repeated 1000 times.
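For completeness, a minimal end-to-end sketch of the estimator of Theorem 2.8 (assuming NumPy and the helper functions log_map, frechet_mean, empirical_covariance, and f_inverse from the earlier sketches; this is only an illustration, not the exact simulation code behind Table 1 and Figure 3):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = np.array([0.0, 0.0, 1.0]), 1.0
L = 10_000

X = rng.normal(size=(L, 3)) * sigma + mu               # X ~ N(mu, sigma^2 I_3)
Y = X / np.linalg.norm(X, axis=1, keepdims=True)       # projected sample on S^2

mu_hat = frechet_mean(Y)
V_hat = empirical_covariance(Y, mu_hat)                # Equation (2.2)
lam_hat = f_inverse(np.trace(V_hat) / 2.0)             # Theorem 2.8

print(mu_hat)    # close to mu / ||mu|| = (0, 0, 1)
print(lam_hat)   # close to sigma^2 / ||mu||^2 = 1
```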
Fig. 3.
A box plot of the inferred estimator $\hat{\lambda }$ in Theorem 2.8 using the empirical covariance, Equation (2.2), for L randomly generated projected normal random variables with ${\sigma ^{2}}=1$ and $\mu ={(0,0,1)^{T}}$. For each L this is done with 1000 repetitions. The true underlying scalar variance, $\lambda ={f^{-1}}(\operatorname{tr}(\operatorname{Cov}(\operatorname{pr}(X)))/2)=1$, is shown in red color. Note that some outliers are omitted for readability of the graph
3 Proofs
3.1 Proof of Theorem 2.5
First, note that
\[ \frac{1}{L-1}{\sum \limits_{\ell =1}^{L}}{\log _{\hat{\mu }}}({\xi _{\ell }}){\log _{\hat{\mu }}}{({\xi _{\ell }})^{T}}\]
is a linear map ${T_{\hat{\mu }}}M\to {T_{\hat{\mu }}}M$, while $\operatorname{Cov}(\xi )$ is by definition a linear map ${T_{\mu }}M\to {T_{\mu }}M$. In order to compare $\hat{V}$ and $\operatorname{Cov}(\xi )$, we parallel transport the linear map $\hat{V}:{T_{\hat{\mu }}}M\to {T_{\hat{\mu }}}M$ to a linear map ${T_{\mu }}M\to {T_{\mu }}M$. That is, we consider ${P_{\hat{\mu },\mu }}\hat{V}{P_{\mu ,\hat{\mu }}}$ and measure how far it is from vI. It follows from Equation (A.1) that ${P_{\hat{\mu },\mu }}{\log _{\hat{\mu }}}({\xi _{\ell }}){\log _{\hat{\mu }}}{({\xi _{\ell }})^{T}}{P_{\mu ,\hat{\mu }}}$ has the first order expansion
\[\begin{aligned}{}& {P_{\hat{\mu },\mu }}{\log _{\hat{\mu }}}({\xi _{\ell }}){\log _{\hat{\mu }}}{({\xi _{\ell }})^{T}}{P_{\mu ,\hat{\mu }}}\\ {} & \hspace{1em}={\log _{\mu }}({\xi _{\ell }}){\log _{\mu }}{({\xi _{\ell }})^{T}}+\left({\nabla _{{\log _{\mu }}(\hat{\mu })}}{\log _{\mu }}({\xi _{\ell }})\right){\log _{\mu }}{({\xi _{\ell }})^{T}}\\ {} & \hspace{2em}+{\log _{\mu }}({\xi _{\ell }}){\left({\nabla _{{\log _{\mu }}(\hat{\mu })}}{\log _{\mu }}({\xi _{\ell }})\right)^{T}}+\mathcal{O}(\operatorname{dist}{(\mu ,\hat{\mu })^{2}}).\end{aligned}\]
Therefore,
\[\begin{aligned}{}& \underset{L\to \infty }{\lim }\left\| \mathbb{E}\left[{P_{\hat{\mu },\mu }}\frac{1}{L-1}{\sum \limits_{\ell =1}^{L}}{\log _{\hat{\mu }}}({\xi _{\ell }}){\log _{\hat{\mu }}}{({\xi _{\ell }})^{T}}{P_{\mu ,\hat{\mu }}}-vI\right]\right\| \\ {} & \hspace{1em}\le \underset{L\to \infty }{\lim }\left\| \frac{1}{L-1}{\sum \limits_{\ell =1}^{L}}\mathbb{E}\left[{\log _{\mu }}({\xi _{\ell }}){\log _{\mu }}{({\xi _{\ell }})^{T}}\right]-vI\right\| \\ {} & \hspace{2em}+\underset{L\to \infty }{\lim }\left\| \mathbb{E}\left[\left({\nabla _{{\log _{\mu }}(\hat{\mu })}}{\log _{\mu }}({\xi _{\ell }})\right){\log _{\mu }}{({\xi _{\ell }})^{T}}+{\log _{\mu }}({\xi _{\ell }}){\left({\nabla _{{\log _{\mu }}(\hat{\mu })}}{\log _{\mu }}({\xi _{\ell }})\right)^{T}}\right]\right\| \\ {} & \hspace{2em}+\underset{L\to \infty }{\lim }\mathbb{E}\left[\mathcal{O}(\operatorname{dist}{(\hat{\mu },\mu )^{2}})\right]\\ {} & \hspace{1em}\le \left\| vI-vI\right\| +\underset{L\to \infty }{\lim }\mathbb{E}\left[\mathcal{O}(\operatorname{dist}(\hat{\mu },\mu ))\right].\end{aligned}\]
Here the middle term is of order $\operatorname{dist}(\mu ,\hat{\mu })$, since ${\nabla _{{\log _{\mu }}(\hat{\mu })}}{\log _{\mu }}({\xi _{\ell }})$ is linear in ${\log _{\mu }}(\hat{\mu })$, whose norm equals $\operatorname{dist}(\mu ,\hat{\mu })$. According to [8, Proposition 1], $\operatorname{dist}(\mu ,\hat{\mu })$ converges to zero almost surely. Since M is compact, $\operatorname{dist}(\mu ,\hat{\mu })$ is bounded, and thus it converges to 0 in expectation by the dominated convergence theorem. Finally, the fact that the empirical covariance has the same rate of convergence as the empirical mean follows from the last inequality. This completes the proof.□
3.2 Proof of Theorem 2.6
By the very definition, we need to show that
(3.1)
\[ \underset{q\in {\mathbb{S}^{2}}}{\operatorname{arginf}}{\int _{{\mathbb{S}^{2}}}}\operatorname{dist}{(q,y)^{2}}{p_{\operatorname{pr}(X)}}(y)d{\mathbb{S}^{2}}(y)=\operatorname{pr}(\mathbb{E}[X])\]
for $X\stackrel{\mathrm{d}}{=}N(\mu ,{\sigma ^{2}}{I_{3}})$. By rescaling X with a factor $\frac{1}{\left\| \mu \right\| }$ and by rotational symmetry, we may assume $\mu ={(-1,0,0)^{T}}$ without loss of generality.

We use a simple first-derivative test to find the minimum in (3.1). Let $({\theta _{1}},{\phi _{1}})$ be the spherical coordinates that are integrated over the sphere and let $({\theta _{2}},{\phi _{2}})$ be the coordinates of a minimizer of the integral. Hence we need to show that $({\theta _{2}},{\phi _{2}})=(\pi ,\pi /2)$. Writing the integral in spherical coordinates and using the fact that at a minimizer the derivative with respect to ${\phi _{2}}$ vanishes, it follows that $({\theta _{2}},{\phi _{2}})$ satisfies
\[ \frac{\partial }{\partial {\phi _{2}}}{\int _{{\mathbb{S}^{2}}}}{\operatorname{dist}^{2}}(({\theta _{1}},{\phi _{1}}),({\theta _{2}},{\phi _{2}})){p_{\operatorname{pr}(X)}}({\theta _{1}},{\phi _{1}})\sin ({\phi _{1}})d{\theta _{1}}d{\phi _{1}}=0.\]
Now by the Leibniz integral rule the derivative may be moved inside the integral, and it holds
\[\begin{aligned}{}& {\int _{0}^{2\pi }}{\int _{0}^{\pi }}2\operatorname{dist}(({\theta _{1}},{\phi _{1}}),({\theta _{2}},{\phi _{2}}))\\ {} & \hspace{1em}\times \frac{(\cos ({\phi _{1}})\sin ({\phi _{2}})-\sin ({\phi _{1}})\cos ({\phi _{2}})\cos ({\theta _{2}}-{\theta _{1}}))}{\sqrt{1-{\left(\cos ({\phi _{1}})\cos ({\phi _{2}})+\sin ({\phi _{1}})\sin ({\phi _{2}})\cos ({\theta _{2}}-{\theta _{1}})\right)^{2}}}}\\ {} & \hspace{1em}\times {p_{\operatorname{pr}(X)}}({\theta _{1}},{\phi _{1}})\sin ({\phi _{1}})\mathrm{d}{\theta _{1}}\mathrm{d}{\phi _{1}}\\ {} & \hspace{2em}=0.\end{aligned}\]
Plugging $({\theta _{2}},{\phi _{2}})=(\pi ,\pi /2)$ into the integral on the left-hand side yields an integral
\[\begin{aligned}{}& 2{\int _{0}^{2\pi }}{\int _{0}^{\pi }}\operatorname{acos}(\sin ({\phi _{1}})\cos (\pi -{\theta _{1}}))\frac{\cos ({\phi _{1}})}{\sqrt{1-{\sin ^{2}}({\phi _{1}}){\cos ^{2}}(\pi -{\theta _{1}})}}\\ {} & \hspace{1em}\times {p_{\operatorname{pr}(X)}}({\theta _{1}},{\phi _{1}})\sin ({\phi _{1}})\mathrm{d}{\theta _{1}}\mathrm{d}{\phi _{1}}.\end{aligned}\]
This integral equals zero, which can be seen by observing that the integrand is odd along ${\phi _{1}}$ around $\frac{\pi }{2}$, since ${p_{\operatorname{pr}(X)}}$ is symmetric around μ by construction. By similar arguments, we obtain that
\[ \frac{\partial }{\partial {\theta _{2}}}{\int _{{\mathbb{S}^{2}}}}{\operatorname{dist}^{2}}(({\theta _{1}},{\phi _{1}}),({\theta _{2}},{\phi _{2}})){p_{\operatorname{pr}(X)}}({\theta _{1}},{\phi _{1}})\sin ({\phi _{1}})\mathrm{d}{\phi _{1}}\mathrm{d}{\theta _{1}}\]
reduces at $({\theta _{2}},{\phi _{2}})=(\pi ,\pi /2)$ to
\[\begin{aligned}{}& 2{\int _{0}^{2\pi }}{\int _{0}^{\pi }}\operatorname{acos}(\sin ({\phi _{1}})\cos (\pi -{\theta _{1}}))\frac{{\sin ^{2}}({\phi _{1}})\sin (\pi -{\theta _{1}})}{\sqrt{1-{\sin ^{2}}({\phi _{1}}){\cos ^{2}}(\pi -{\theta _{1}})}}\\ {} & \hspace{1em}\times {p_{\operatorname{pr}(X)}}({\theta _{1}},{\phi _{1}})\mathrm{d}{\phi _{1}}\mathrm{d}{\theta _{1}}\end{aligned}\]
which equals zero by symmetry around ${\theta _{1}}=\pi $. Therefore, one can conclude that $\mu =\operatorname{pr}(\mathbb{E}[X])$ is a local extremum of the integral in Equation (3.1).

Next we argue why it is a global minimum. Note that a very similar computation shows that $-\mu $, i.e. in spherical coordinates $({\theta _{2}},{\phi _{2}})=(0,\pi /2)$, is another local extremum. In fact, μ and $-\mu $ are the only local extrema of the integral in Equation (3.1), since they are the only two points for which the distance function is rotationally symmetric around the line in ${\mathbb{R}^{3}}$ spanned by μ. By direct computation,
\[ {p_{\operatorname{pr}(X)}}\left(\pi ,\frac{\pi }{2}\right)=\frac{1}{{(2\pi )^{3/2}}}\exp (-\frac{1}{2{\sigma ^{2}}})\left(\frac{1}{\sigma }+\frac{1}{{\sigma ^{2}}}\frac{\Phi (\frac{1}{\sigma })}{\varphi (\frac{1}{\sigma })}+\frac{\Phi (\frac{1}{\sigma })}{\varphi (\frac{1}{\sigma })}\right)\]
and
\[ {p_{\operatorname{pr}(X)}}\left(0,\frac{\pi }{2}\right)=\frac{1}{{(2\pi )^{3/2}}}\exp (-\frac{1}{2{\sigma ^{2}}})\left(\frac{-1}{\sigma }+\frac{1}{{\sigma ^{2}}}\frac{\Phi (\frac{-1}{\sigma })}{\varphi (\frac{-1}{\sigma })}+\frac{\Phi (\frac{-1}{\sigma })}{\varphi (\frac{-1}{\sigma })}\right),\]
hence ${p_{\operatorname{pr}(X)}}(\mu )\gt {p_{\operatorname{pr}(X)}}(-\mu )$. More generally, a similar computation shows that ${p_{\operatorname{pr}(X)}}(y)\gt {p_{\operatorname{pr}(X)}}(Ry)$, if $\langle y,\mu \rangle \gt 0$, where R is the reflection mapping over the $y,z$-plane, i.e.
\[ R\left(\begin{array}{c}a\\ {} b\\ {} c\end{array}\right)=\left(\begin{array}{c}-a\\ {} b\\ {} c\end{array}\right).\]
Tautologically, the distance function increases the farther one goes, and points farther away contribute more to the integral. Since the density is larger on the hemisphere containing μ, it follows that $({\theta _{2}},{\phi _{2}})=(\pi ,\pi /2)$ yields the global minimum of the integral in Equation (3.1).□
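The statement of Theorem 2.6 is also easy to probe numerically. The following Monte Carlo sketch (assuming NumPy) evaluates the Fréchet functional on a coarse grid of candidate points and confirms that the minimizer is close to $\operatorname{pr}(\mu )$ in the isotropic case; the grid resolution and sample size are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = np.array([-1.0, 0.0, 0.0]), 0.8
X = rng.normal(size=(50_000, 3)) * sigma + mu
Y = X / np.linalg.norm(X, axis=1, keepdims=True)       # samples from pr(X)

def frechet_functional(q):
    d = np.arccos(np.clip(Y @ q, -1.0, 1.0))           # dist(q, y_l) for all samples
    return np.mean(d ** 2)

thetas = np.linspace(0.0, 2.0 * np.pi, 72, endpoint=False)
phis = np.linspace(0.05, np.pi - 0.05, 36)
grid = [np.array([np.cos(t) * np.sin(p), np.sin(t) * np.sin(p), np.cos(p)])
        for t in thetas for p in phis]
best = min(grid, key=frechet_functional)
print(best)       # close to pr(mu) = (-1, 0, 0)
```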
3.3 Proof of Proposition 2.7
Without loss of generality, we assume $\mu ={(0,0,1)^{T}}$. In this case the functions $A,B,C,\operatorname{K}$ inside Equation (2.3) are given by
\[\begin{array}{l}\displaystyle A=\frac{1}{{\sigma ^{2}}},\\ {} \displaystyle B=\frac{1}{{\sigma ^{2}}}\cos (\phi ),\\ {} \displaystyle C=-\frac{1}{2{\sigma ^{2}}},\end{array}\]
and
\[ \operatorname{K}=\frac{\cos (\phi )}{\sigma }.\]
By (2.4) it holds that $\operatorname{Cov}(\operatorname{pr}(X))$ is isotropic and we can write
\[\begin{aligned}{}\frac{\operatorname{tr}(\operatorname{Cov}(\operatorname{pr}(X)))}{2}& =\frac{\pi }{{(2\pi )^{3/2}}}\exp (-\frac{1}{2{\sigma ^{2}}})\\ {} & \hspace{1em}\times {\int _{0}^{\pi }}{\phi ^{2}}\left(\frac{\cos (\phi )}{\sigma }+\left(\frac{{\cos ^{2}}(\phi )}{{\sigma ^{2}}}+1\right)\frac{\Phi (\frac{\cos (\phi )}{\sigma })}{\varphi (\frac{\cos (\phi )}{\sigma })}\right)\sin (\phi )\mathrm{d}\phi .\end{aligned}\]
Dropping the positive constant $\frac{\pi }{{(2\pi )^{3/2}}}$, which does not affect monotonicity, denote
\[ f(\sigma )=\exp (-\frac{1}{2{\sigma ^{2}}}){\int _{0}^{\pi }}{\phi ^{2}}\left(\frac{\cos (\phi )}{\sigma }+\left(\frac{{\cos ^{2}}(\phi )}{{\sigma ^{2}}}+1\right)\frac{\Phi (\frac{\cos (\phi )}{\sigma })}{\varphi (\frac{\cos (\phi )}{\sigma })}\right)\sin (\phi )\mathrm{d}\phi .\]
In order to obtain the claim, it suffices to prove that $f(\sigma )$ is strictly increasing, from which it follows that $\frac{\operatorname{tr}(\operatorname{Cov}(\operatorname{pr}(X)))}{2}$ is strictly increasing in σ as well. For notational simplicity, we set $x=\frac{1}{\sigma }$ and show that $f(x)$ is strictly decreasing in x, where now $f(x)$, with a slight abuse of notation, is given by
(3.2)
\[ \begin{aligned}{}f(x)& =\exp (-\frac{{x^{2}}}{2})\\ {} & \hspace{1em}\times {\int _{0}^{\pi }}{\phi ^{2}}\left(x\cos (\phi )+\left({x^{2}}{\cos ^{2}}(\phi )+1\right)\frac{\Phi (x\cos (\phi ))}{\varphi (x\cos (\phi ))}\right)\sin (\phi )\mathrm{d}\phi .\end{aligned}\]
By differentiating in x, we get
\[\begin{aligned}{}& {f^{\prime }}(x)=\exp (\frac{-{x^{2}}}{2})({I_{1}}+{I_{2}}+{I_{3}}),\hspace{1em}\mathrm{where}\\ {} & \hspace{3.33333pt}{I_{1}}:=-x{\int _{0}^{\pi }}{\phi ^{2}}\cos (\phi )x\sin (\phi )\mathrm{d}\phi ,\\ {} & \hspace{3.33333pt}{I_{2}}:={\int _{0}^{\pi }}{\phi ^{2}}\left(-{x^{3}}{\cos ^{2}}(\phi )-x+3x{\cos ^{2}}(\phi )+{x^{3}}{\cos ^{4}}(\phi )\right)\frac{\Phi (x\cos (\phi ))}{\varphi (x\cos (\phi ))}\sin (\phi )\mathrm{d}\phi ,\\ {} & \hspace{3.33333pt}{I_{3}}:={\int _{0}^{\pi }}{\phi ^{2}}\left(2\cos (\phi )+{x^{2}}{\cos ^{3}}(\phi )\right)\sin (\phi )\mathrm{d}\phi .\end{aligned}\]
Immediately, we have that
\[ {I_{1}}=\frac{{\pi ^{2}}}{4}{x^{2}}\]
and
\[ {I_{3}}=-\frac{{\pi ^{2}}}{2}-\frac{5{\pi ^{2}}}{32}{x^{2}}.\]
In order to show ${f^{\prime }}(x)\lt 0$, we need to decipher ${I_{2}}$, which is more complicated. The main idea is to show that ${f^{\prime }}(x)\exp ({x^{2}}/2)$ is analytic for all $x\ge 0$ and then to bound it from above by a strictly negative analytic function. We begin with Lemma 3.1 showing that $\frac{\Phi }{\varphi }$ is a Dawson-like function and therefore analytic. In Lemma 3.2 the series expansion of ${I_{2}}$ is integrated term-wise. The terms of the series expression for ${I_{2}}$ are given in Lemma 3.3, which are then inserted to give explicit expressions for the terms of ${f^{\prime }}(x)\exp ({x^{2}}/2)$ in Lemma 3.4. Lemma 3.5 then gives a lower bound on the odd terms of ${f^{\prime }}(x)\exp ({x^{2}}/2)$, and Lemma 3.7 gives an upper bound. In Lemma 3.8 we show that the terms of the series expression of ${f^{\prime }}(x)\exp ({x^{2}}/2)$ are eventually decreasing, and in combination with Corollary 3.6 it is concluded that ${f^{\prime }}(x)\exp ({x^{2}}/2)$ is entire. The proof is finished by comparing ${f^{\prime }}(x)\exp ({x^{2}}/2)$ to a linear combination of the analytic functions studied in Lemma 3.9.
Lemma 3.1.
The function $\frac{\Phi }{\varphi }$ admits the everywhere convergent series expansion
\[ \frac{\Phi }{\varphi }(x)={\sum \limits_{k=0}^{\infty }}{d_{k}}{x^{k}}:={\sum \limits_{k=0}^{\infty }}\frac{1}{(2k+1)!!}{x^{2k+1}}+\frac{\sqrt{2\pi }}{2}{\sum \limits_{k=0}^{\infty }}\frac{1}{(2k)!!}{x^{2k}},\]
that is, ${d_{k}}=\frac{1}{k!!}$ for k odd and ${d_{k}}=\frac{\sqrt{2\pi }}{2}\frac{1}{k!!}$ for k even.
Proof.
Note that $\frac{\Phi }{\varphi }(x)$ is very similar to the Dawson function (originally studied in [4], see also [1] for further details) given by
\[ {D_{-}}(x)=\exp ({x^{2}}){\int _{0}^{x}}\exp (-{t^{2}})\mathrm{d}t,\]
and it has the series expansion
\[ {D_{-}}(x)={\sum \limits_{k=0}^{\infty }}\frac{{2^{k}}}{(2k+1)!!}{x^{2k+1}}.\]
Note that
\[ {\int _{0}^{x}}\exp (-\frac{{t^{2}}}{2})\mathrm{d}t=\sqrt{2}{\int _{0}^{x/\sqrt{2}}}\exp (-{u^{2}})\mathrm{d}u\]
by the variable substitution $u=t/\sqrt{2}$. Hence,
\[\begin{aligned}{}\exp (\frac{{x^{2}}}{2}){\int _{0}^{x}}\exp (-\frac{{t^{2}}}{2})\mathrm{d}t& =\sqrt{2}{D_{-}}(x/\sqrt{2})\\ {} & =\sqrt{2}{\sum \limits_{k=0}^{\infty }}\frac{{2^{k}}}{(2k+1)!!}\frac{{x^{2k+1}}}{{2^{k}}\sqrt{2}}={\sum \limits_{k=0}^{\infty }}\frac{1}{(2k+1)!!}{x^{2k+1}}.\end{aligned}\]
Moreover,
\[ {\int _{-\infty }^{0}}\exp (-\frac{{t^{2}}}{2})\mathrm{d}t=\frac{\sqrt{2\pi }}{2},\]
and hence
\[ \exp (\frac{{x^{2}}}{2}){\int _{-\infty }^{0}}\exp (-\frac{{t^{2}}}{2})\mathrm{d}t=\frac{\sqrt{2\pi }}{2}\exp (\frac{{x^{2}}}{2})\]
has the series expansion
\[ \frac{\sqrt{2\pi }}{2}{\sum \limits_{k=0}^{\infty }}\frac{1}{k!}\frac{{x^{2k}}}{{2^{k}}}=\frac{\sqrt{2\pi }}{2}{\sum \limits_{k=0}^{\infty }}\frac{1}{(2k)!!}{x^{2k}}.\]
It follows that
\[\begin{aligned}{}\frac{\Phi }{\varphi }(x)& =\exp (\frac{{x^{2}}}{2}){\int _{-\infty }^{0}}\exp (-\frac{{t^{2}}}{2})\mathrm{d}t+\exp (\frac{{x^{2}}}{2}){\int _{0}^{x}}\exp (-\frac{{t^{2}}}{2})\mathrm{d}t\\ {} & ={\sum \limits_{k=0}^{\infty }}\frac{1}{(2k+1)!!}{x^{2k+1}}+\frac{\sqrt{2\pi }}{2}{\sum \limits_{k=0}^{\infty }}\frac{1}{(2k)!!}{x^{2k}}\end{aligned}\]
proving the claimed series representation. Finally, the convergence everywhere follows from the fact that the Dawson function converges everywhere. □
Using Lemma 3.1 it follows that ${I_{2}}$ can be rewritten as
(3.3)
\[ \begin{aligned}{}{I_{2}}& ={\sum \limits_{k=0}^{\infty }}{d_{k}}{\int _{0}^{\pi }}{\phi ^{2}}\left(-{x^{3}}{\cos ^{2}}(\phi )-x+3x{\cos ^{2}}(\phi )+{x^{3}}{\cos ^{4}}(\phi )\right)\\ {} & \hspace{1em}\times {x^{k}}{\cos ^{k}}(\phi )\sin (\phi )\mathrm{d}\phi .\end{aligned}\]
Each term in Equation (3.3) involves integrals of the form
\[ {J_{m}}:={\int _{0}^{\pi }}{\phi ^{2}}{\cos ^{m}}(\phi )\sin (\phi )\mathrm{d}\phi ,\hspace{1em}m=0,1,2,\dots ,\]
which we will study next.
Lemma 3.2.
The sequence ${J_{m}}$ satisfies ${J_{0}}={\pi ^{2}}-4$, and for even $m\ne 0$
\[ {J_{m}}=\frac{1}{m+1}\left({\pi ^{2}}-4\frac{m!!}{(m+1)!!}\left(1+{\sum \limits_{j=0}^{\frac{m}{2}-1}}\frac{1}{{2^{2j+1}}(2j+3)}\left(\genfrac{}{}{0pt}{}{2j+1}{j}\right)\right)\right),\]
while for $m\in \mathbb{N}$ odd
\[ {J_{m}}=\frac{{\pi ^{2}}}{m+1}\left(\frac{m!!}{(m+1)!!}-1\right).\]
Proof.
For ${J_{0}}$ we observe immediately that
\[ {J_{0}}={\int _{0}^{\pi }}{\phi ^{2}}\sin (\phi )\mathrm{d}\phi ={\left[-{\phi ^{2}}\cos (\phi )\right]_{0}^{\pi }}+2{\int _{0}^{\pi }}\phi \cos (\phi )\mathrm{d}\phi ={\pi ^{2}}+2{\left[\phi \sin (\phi )+\cos (\phi )\right]_{0}^{\pi }}={\pi ^{2}}-4.\]
Next, let m be odd. Then integration by parts gives
\[\begin{aligned}{}{J_{m}}& ={\left[-{\phi ^{2}}\frac{{\cos ^{m+1}}(\phi )}{m+1}\right]_{0}^{\pi }}+\frac{2}{m+1}{\int _{0}^{\pi }}\phi {\cos ^{m+1}}(\phi )\mathrm{d}\phi \\ {} & =-\frac{{\pi ^{2}}}{m+1}+\frac{2}{m+1}\Bigg[\frac{m!!}{(m+1)!!}{\phi ^{2}}+\phi {\sum \limits_{j=0}^{\frac{m-1}{2}}}{\cos ^{m-2j}}(\phi )\sin (\phi )\frac{m!!}{(m-2j)!!}\\ {} & \hspace{1em}\times \frac{(m-2j-1)!!}{(m+1)!!}\Bigg]{_{0}^{\pi }}-\frac{2}{m+1}{\sum \limits_{j=0}^{\frac{m-1}{2}}}\frac{m!!}{(m-2j)!!}\frac{(m-2j-1)!!}{(m+1)!!}\\ {} & \hspace{1em}\times {\int _{0}^{\pi }}{\cos ^{m-2j}}(\phi )\sin (\phi )d\phi -\frac{2}{m+1}\frac{m!!}{(m+1)!!}{\int _{0}^{\pi }}\phi \mathrm{d}\phi \\ {} & =-\frac{{\pi ^{2}}}{m+1}+\frac{2}{m+1}\frac{m!!}{(m+1)!!}{\pi ^{2}}-\frac{1}{m+1}\frac{m!!}{(m+1)!!}{\pi ^{2}}\\ {} & =\frac{{\pi ^{2}}}{m+1}\left(\frac{m!!}{(m+1)!!}-1\right).\end{aligned}\]
Similarly, when $m\gt 0$ is even, integration by parts gives
\[\begin{aligned}{}{J_{m}}& ={\left[-{\phi ^{2}}\frac{{\cos ^{m+1}}(\phi )}{m+1}\right]_{0}^{\pi }}+\frac{2}{m+1}{\int _{0}^{\pi }}\phi {\cos ^{m+1}}(\phi )\mathrm{d}\phi \\ {} & =\frac{{\pi ^{2}}}{m+1}+\frac{2}{m+1}\Bigg[\phi \sin (\phi )\frac{m!!}{(m+1)!!}+\phi {\sum \limits_{j=0}^{\frac{m}{2}-1}}{\cos ^{m-2j}}\sin (\phi )\frac{m!!}{(m-2j)!!}\\ {} & \hspace{1em}\times \frac{(m-1-2j)!!}{(m+1)!!}\Bigg]{_{0}^{\pi }}-\frac{2}{m+1}\frac{m!!}{(m+1)!!}{\int _{0}^{\pi }}\sin (\phi )d\phi \\ {} & \hspace{1em}-\frac{2}{m+1}{\sum \limits_{j=0}^{\frac{m}{2}-1}}\frac{m!!}{(m-2j)!!}\frac{(m-1-2j)!!}{(m+1)!!}{\int _{0}^{\pi }}{\cos ^{m-2j}}(\phi )\sin (\phi )\mathrm{d}\phi \\ {} & =\frac{{\pi ^{2}}}{m+1}-\frac{4}{m+1}\frac{m!!}{(m+1)!!}\\ {} & \hspace{1em}-\frac{4}{m+1}\frac{m!!}{(m+1)!!}{\sum \limits_{j=0}^{\frac{m}{2}-1}}\frac{(m-1-2j)!!}{(m-2j)!!(m+1-2j)}\\ {} & =\frac{1}{m+1}\left({\pi ^{2}}-4\frac{m!!}{(m+1)!!}\left(1+{\sum \limits_{j=0}^{\frac{m}{2}-1}}\frac{1}{{2^{2j+1}}(2j+3)}\left(\genfrac{}{}{0pt}{}{2j+1}{j}\right)\right)\right).\end{aligned}\]
This completes the proof. □
Lemma 3.3.
The function ${I_{2}}$ has the series expansion ${I_{2}}(x)={\textstyle\sum _{k=1}^{\infty }}{a_{k}}{x^{k}}$, where ${a_{1}}=\frac{4\sqrt{2\pi }}{9}$, ${a_{2}}=-\frac{7{\pi ^{2}}}{32}$, and, for $k\ge 3$,
\[ {a_{k}}={d_{k-1}}(3{J_{k+1}}-{J_{k-1}})+{d_{k-3}}({J_{k+1}}-{J_{k-1}}).\]
Proof.
Immediately ${I_{2}}(0)=0$, so the constant term is zero. Next, we get expressions for ${a_{n}}$ in terms of ${d_{n}}$’s and ${J_{n}}$’s from Equation (3.3). Then straightforward calculations give
\[\begin{array}{l}\displaystyle {a_{1}}={d_{0}}(-{J_{0}}+3{J_{2}})=\frac{\sqrt{2\pi }}{2}\left(4-{\pi ^{2}}+\frac{3}{3}\left({\pi ^{2}}-4\frac{2}{3}\left(1+\frac{1}{6}\right)\right)\right)=\frac{4\sqrt{2\pi }}{9},\\ {} \displaystyle {a_{2}}={d_{1}}(-{J_{1}}+3{J_{3}})=\frac{{\pi ^{2}}}{4}+\frac{3{\pi ^{2}}}{4}\left(\frac{3}{8}-1\right)=-\frac{7{\pi ^{2}}}{32},\end{array}\]
and, for $k\ge 3$,
\[ {a_{k}}={d_{k-1}}(3{J_{k+1}}-{J_{k-1}})+{d_{k-3}}({J_{k+1}}-{J_{k-1}}).\]
□
Lemma 3.4.
Denote the series expansion
(3.4)
\[ {f^{\prime }}(x)\exp ({x^{2}}/2)={I_{1}}+{I_{2}}+{I_{3}}={\sum \limits_{k=0}^{\infty }}{c_{k}}{x^{k}}.\]
Then
\[ {c_{0}}=-\frac{{\pi ^{2}}}{2},\hspace{1em}{c_{1}}=\frac{4\sqrt{2\pi }}{9},\hspace{1em}{c_{2}}=-\frac{{\pi ^{2}}}{8},\]
for $k\ge 3$ odd
\[ {c_{k}}=\frac{2\sqrt{2\pi }}{(k+2)!!}\left(1+{\sum \limits_{j=0}^{\frac{k-1}{2}-1}}\frac{1}{{2^{2j+1}}(2j+3)}\left(\genfrac{}{}{0pt}{}{2j+1}{j}\right)-\frac{k+1}{{2^{k}}(k+2)}\left(\genfrac{}{}{0pt}{}{k}{\frac{k-1}{2}}\right)\right),\]
and for $k\ge 4$ even
\[ {c_{k}}=-\frac{{\pi ^{2}}}{(k+2)!!}.\]
Proof.
Considering the expressions for ${I_{1}}$, ${I_{2}}$, and ${I_{3}}$ gives
\[\begin{array}{l}\displaystyle {c_{0}}=-\frac{{\pi ^{2}}}{2},\\ {} \displaystyle {c_{1}}={a_{1}}=\frac{4\sqrt{2\pi }}{9},\\ {} \displaystyle {c_{2}}={a_{2}}+\frac{3{\pi ^{2}}}{32}=-\frac{{\pi ^{2}}}{8},\end{array}\]
and, for $k\ge 3$,
\[ {c_{k}}={a_{k}},\]
where ${a_{k}}$ is given in Lemma 3.3, since ${I_{1}}$ and ${I_{3}}$ contribute only to the terms of order at most two. It follows that for $k\ge 3$ odd we have
\[\begin{aligned}{}{c_{k}}& ={d_{k-1}}(3{J_{k+1}}-{J_{k-1}})+{d_{k-3}}({J_{k+1}}-{J_{k-1}})\\ {} & =\frac{\sqrt{2\pi }}{2}\Bigg(\frac{1}{(k-1)!!}\Bigg[\frac{3}{k+2}\Bigg({\pi ^{2}}-4\frac{(k+1)!!}{(k+2)!!}\\ {} & \hspace{1em}\times \Bigg(1+{\sum \limits_{j=0}^{\frac{k+1}{2}-1}}\frac{1}{{2^{2j+1}}(2j+3)}\left(\genfrac{}{}{0pt}{}{2j+1}{j}\right)\Bigg)\Bigg)\\ {} & \hspace{1em}-\frac{1}{k}\Bigg({\pi ^{2}}-4\frac{(k-1)!!}{k!!}\Bigg(1+{\sum \limits_{j=0}^{\frac{k-1}{2}-1}}\frac{1}{{2^{2j+1}}(2j+3)}\left(\genfrac{}{}{0pt}{}{2j+1}{j}\right)\Bigg)\Bigg)\Bigg]\\ {} & \hspace{1em}+\frac{1}{(k-3)!!}\Bigg[\frac{1}{k+2}\Bigg({\pi ^{2}}-4\frac{(k+1)!!}{(k+2)!!}\\ {} & \hspace{1em}\times \Bigg(1+{\sum \limits_{j=0}^{\frac{k+1}{2}-1}}\frac{1}{{2^{2j+1}}(2j+3)}\left(\genfrac{}{}{0pt}{}{2j+1}{j}\right)\Bigg)\Bigg)\\ {} & \hspace{1em}-\frac{1}{k}\Bigg({\pi ^{2}}-4\frac{(k-1)!!}{k!!}\Bigg(1+{\sum \limits_{j=0}^{\frac{k-1}{2}-1}}\frac{1}{{2^{2j+1}}(2j+3)}\left(\genfrac{}{}{0pt}{}{2j+1}{j}\right)\Bigg)\Bigg)\Bigg]\Bigg),\end{aligned}\]
and after simplifying we have
\[ {c_{k}}=\frac{2\sqrt{2\pi }}{(k+2)!!}\left(1+{\sum \limits_{j=0}^{\frac{k-1}{2}-1}}\frac{1}{{2^{2j+1}}(2j+3)}\left(\genfrac{}{}{0pt}{}{2j+1}{j}\right)-\frac{k+1}{{2^{k}}(k+2)}\left(\genfrac{}{}{0pt}{}{k}{\frac{k-1}{2}}\right)\right).\]
Similarly for $k\ge 4$ even we get
\[\begin{aligned}{}{c_{k}}& ={d_{k-1}}(3{J_{k+1}}-{J_{k-1}})+{d_{k-3}}({J_{k+1}}-{J_{k-1}})\\ {} & =\frac{{\pi ^{2}}}{(k-1)!!}\left[\frac{3}{k+2}\left(\frac{(k+1)!!}{(k+2)!!}-1\right)-\frac{1}{k}\left(\frac{(k-1)!!}{k!!}-1\right)\right]\\ {} & \hspace{1em}+\frac{{\pi ^{2}}}{(k-3)!!}\left[\frac{1}{k+2}\left(\frac{(k+1)!!}{(k+2)!!}-1\right)-\frac{1}{k}\left(\frac{(k-1)!!}{k!!}-1\right)\right]\\ {} =& -\frac{{\pi ^{2}}}{(k+2)!!}.\end{aligned}\]
This completes the proof. □
Lemma 3.5.
For $k\ge 3$ odd, denote
\[ S(k):=1+{\sum \limits_{j=0}^{\frac{k-1}{2}-1}}\frac{1}{{2^{2j+1}}(2j+3)}\left(\genfrac{}{}{0pt}{}{2j+1}{j}\right)-\frac{k+1}{{2^{k}}(k+2)}\left(\genfrac{}{}{0pt}{}{k}{\frac{k-1}{2}}\right),\]
so that ${c_{k}}=\frac{2\sqrt{2\pi }}{(k+2)!!}S(k)$. Then $S(k)$ is increasing in k and $2\sqrt{2\pi }S(k)\ge \pi $ for all $k\ge 3$ odd.
Proof.
First note that
\[\begin{aligned}{}2\sqrt{2\pi }S(3)& =2\sqrt{2\pi }\left(1+\frac{1}{6}\left(\genfrac{}{}{0pt}{}{1}{0}\right)-\frac{4}{5}\frac{1}{{2^{3}}}\left(\genfrac{}{}{0pt}{}{3}{1}\right)\right)\\ {} & =2\sqrt{2\pi }\left(1+\frac{1}{6}-\frac{3}{10}\right)\\ {} & =\frac{26\sqrt{2\pi }}{15}\ge \pi .\end{aligned}\]
It remains to show that $S(k)$ is an increasing function. We have
\[\begin{aligned}{}S(k+2)& =1+{\sum \limits_{j=0}^{\frac{k-1}{2}}}\frac{1}{{2^{2j+1}}(2j+3)}\left(\genfrac{}{}{0pt}{}{2j+1}{j}\right)-\frac{k+3}{{2^{k+2}}(k+4)}\left(\genfrac{}{}{0pt}{}{k+2}{\frac{k+1}{2}}\right)\\ {} & =S(k)+\frac{k+1}{{2^{k}}(k+2)}\left(\genfrac{}{}{0pt}{}{k}{\frac{k-1}{2}}\right)+\frac{1}{{2^{k}}(k+2)}\left(\genfrac{}{}{0pt}{}{k}{\frac{k-1}{2}}\right)\\ {} & \hspace{1em}-\frac{k+3}{{2^{k+2}}(k+4)}\left(\genfrac{}{}{0pt}{}{k+2}{\frac{k+1}{2}}\right)\\ {} & =S(k)+\frac{1}{{2^{k}}}\left(\genfrac{}{}{0pt}{}{k}{\frac{k-1}{2}}\right)-\frac{k+3}{{2^{k+2}}(k+4)}\left(\genfrac{}{}{0pt}{}{k+2}{\frac{k+1}{2}}\right)\\ {} & \ge S(k)+\frac{1}{{2^{k}}}\left(\genfrac{}{}{0pt}{}{k}{\frac{k-1}{2}}\right)-\frac{1}{{2^{k+2}}}\left(\genfrac{}{}{0pt}{}{k+2}{\frac{k+1}{2}}\right)\\ {} & =S(k)+\frac{(k+1)!}{{2^{k+2}}\left(\frac{k+1}{2}\right)!\left(\frac{k+3}{2}\right)!}\left(k+3-(k+2)\right)\\ {} & =S(k)+\frac{(k+1)!}{{2^{k+2}}\left(\frac{k+1}{2}\right)!\left(\frac{k+3}{2}\right)!}\end{aligned}\]
and hence $S(k+2)\ge S(k)$. This completes the proof. □
Corollary 3.6.
The coefficients ${c_{k}}$ in (3.4) satisfy ${c_{k}}\lt 0$ for k even and ${c_{k}}\gt 0$ for k odd.
Proof.
For k even we clearly have ${c_{k}}\lt 0$. On the other hand, for $k\ge 3$ odd, ${c_{k}}$ is positive since
\[ {c_{k}}=\frac{2\sqrt{2\pi }}{(k+2)!!}S(k)\ge \frac{\pi }{(k+2)!!}\gt 0\]
by Lemma 3.5, and ${c_{1}}=\frac{4\sqrt{2\pi }}{9}\gt 0$ directly. □
The following result shows that the sum in Lemma 3.5 can also be bounded from above.
Lemma 3.7.
For all $k\ge 3$ odd, it holds that $2\sqrt{2\pi }S(k)\le {\pi ^{2}}$, and consequently ${c_{k}}\le \frac{{\pi ^{2}}}{(k+2)!!}$.
Proof.
By elementary manipulations as in the proof of Lemma 3.5 we obtain
\[ S(k+2)=S(k)+\frac{2\sqrt{2\pi }(k+3)}{{2^{k+2}}(k+2)(k+4)}\left(\genfrac{}{}{0pt}{}{k+2}{\frac{k+1}{2}}\right).\]
From the Stirling approximation we infer
\[ \left(\genfrac{}{}{0pt}{}{k+2}{\frac{k+1}{2}}\right)\le \sqrt{\frac{2}{\pi }}\frac{{2^{k+2}}}{\sqrt{(k+2)}},\]
and hence
Moreover,
allows us to get the estimates
\[ \underset{k\to \infty }{\lim }2\sqrt{2\pi }S(k)\le 4{\sum \limits_{k=0}^{\infty }}\frac{1}{{(2k+1)^{3/2}}}\le {\pi ^{2}}.\]
Since S is increasing by Lemma 3.5, it follows that $2\sqrt{2\pi }S(k)\le {\pi ^{2}}$ for all $k\ge 3$ odd. This completes the proof. □
Lemma 3.8.
Let the coefficients ${c_{k}}$, $k=1,2,\dots $, be given by (3.4) and let $x\gt 0$ be fixed. Then the terms in the series
\[ {\sum \limits_{k=M}^{\infty }}{c_{k}}{x^{k}}\]
decrease monotonically in absolute value for M large enough.
Proof.
Consider first the case when k is odd. Then, using the upper bound of Lemma 3.7 yields
\[ \left|\frac{{c_{k+1}}{x^{k+1}}}{{c_{k}}{x^{k}}}\right|\le \frac{\frac{{\pi ^{2}}}{(k+3)!!}}{\frac{{\pi ^{2}}}{(k+2)!!}}x=\frac{(k+2)!!}{(k+3)!!}x\]
which is less than one for sufficiently large k (depending on x). Similarly for k even we can use the lower bound from Lemma 3.5 to obtain
which again is less than one for large enough k. This yields the claim. □
Lemma 3.9.
Denote
\[ M(x):={\sum \limits_{k=0}^{\infty }}\frac{{x^{2k}}}{(2k+2)!!},\hspace{2em}N(x):={\sum \limits_{k=0}^{\infty }}\frac{{x^{2k+1}}}{(2k+3)!!}.\]
Then $M(x)\gt N(x)$ for all $x\ge 0$.
Proof.
By differentiating, we get
\[\begin{aligned}{}{M^{\prime }}(x)& ={\sum \limits_{k=0}^{\infty }}\frac{(2k+2){x^{2k+1}}}{(2k+4)!!}={\sum \limits_{k=0}^{\infty }}\frac{{x^{2k+1}}}{(2k+3)!!}\frac{(k+1)}{{2^{2k+2}}}\left(\genfrac{}{}{0pt}{}{2k+3}{k+1}\right).\end{aligned}\]
Now using
gives
\[ {M^{\prime }}(x)\ge {\sum \limits_{k=0}^{\infty }}\frac{\sqrt{k+1}}{(2k+3)!!}{x^{2k+1}}\ge N(x).\]
Similarly, it holds that
\[ {N^{\prime }}(x)={\sum \limits_{k=0}^{\infty }}\frac{(2k+1){x^{2k}}}{(2k+3)!!}={\sum \limits_{k=0}^{\infty }}\frac{{x^{2k}}}{(2k+2)!!}\frac{(2k+1){2^{2k+2}}}{(k+2)\left(\genfrac{}{}{0pt}{}{2k+3}{k+1}\right)}.\]
In this case we can use
leading to
\[ {N^{\prime }}(x)\le {\sum \limits_{k=0}^{\infty }}\frac{{x^{2k}}}{(2k+2)!!}\frac{k+\frac{1}{2}}{(k+2)(2k+3)}\le M(x).\]
Combining the two bounds above gives us
\[\begin{aligned}{}\frac{{\mathrm{d}^{}}}{\mathrm{d}{x^{}}}\left({M^{2}}(x)-{N^{2}}(x)\right)& =2M(x){M^{\prime }}(x)-2N(x){N^{\prime }}(x)\\ {} & \ge M(x)N(x)-M(x)N(x)=0.\end{aligned}\]
Consequently, ${M^{2}}-{N^{2}}$ is an increasing function for $x\ge 0$, which leads to
\[ {M^{2}}(x)-{N^{2}}(x)\ge {M^{2}}(0)-{N^{2}}(0)=\frac{1}{4}\gt 0.\]
It follows that
\[ M(x)\gt N(x)\hspace{1em}\text{for all }x\ge 0,\]
and the proof is complete. □
We are finally in the position to prove Proposition 2.7.
Proof of Proposition 2.7.
By Corollary 3.6 and Lemma 3.8 the series expansion (3.4) for ${f^{\prime }}(x)\exp ({x^{2}}/2)$ is convergent by the Leibniz alternating series test. Since the radius of convergence is unbounded, it is analytic, and thus we may split the series into its positive part and negative part, given by
\[ {\sum \limits_{k=0}^{\infty }}{c_{2k}}{x^{2k}}=:Q(x),\hspace{2em}{\sum \limits_{k=0}^{\infty }}{c_{2k+1}}{x^{2k+1}}=:P(x).\]
By Lemma 3.4 it holds that $Q(x)=-{\pi ^{2}}M(x)$, where $M(x)$ is as in Lemma 3.9. Now Lemma 3.7, together with the direct bound ${c_{1}}=\frac{4\sqrt{2\pi }}{9}\le \frac{{\pi ^{2}}}{3!!}$, gives
\[ {c_{2k+1}}\le \frac{{\pi ^{2}}}{(2k+3)!!}\hspace{1em}\text{for all }k\ge 0,\]
and hence
\[ P(x)\le {\pi ^{2}}{\sum \limits_{k=0}^{\infty }}\frac{{x^{2k+1}}}{(2k+3)!!}={\pi ^{2}}N(x),\]
where $N(x)$ is as in Lemma 3.9. Therefore
\[ {f^{\prime }}(x)\exp ({x^{2}}/2)=Q(x)+P(x)\le {\pi ^{2}}\left(N(x)-M(x)\right).\]
Applying Lemma 3.9 to the above inequality now gives
\[ {f^{\prime }}(x)\exp ({x^{2}}/2)\lt 0,\]
and thus ${f^{\prime }}(x)\lt 0$ for all $x\ge 0$, where f is given by (3.2). The claim follows from this. □
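As a final spot check of the monotonicity just proved, the following sketch (assuming NumPy and SciPy) evaluates the function f of (3.2) on a grid in $x=1/\sigma $ and verifies that consecutive values decrease, i.e. that ${f^{\prime }}(x)\lt 0$ numerically.

```python
import numpy as np
from scipy import integrate, stats

def f_of_x(x):
    """The function f(x) of (3.2), with x = 1/sigma."""
    def integrand(phi):
        k = x * np.cos(phi)
        ratio = stats.norm.cdf(k) / stats.norm.pdf(k)   # Phi(k) / varphi(k)
        return phi ** 2 * (k + (k ** 2 + 1.0) * ratio) * np.sin(phi)
    val, _ = integrate.quad(integrand, 0.0, np.pi)
    return np.exp(-x ** 2 / 2.0) * val

xs = np.linspace(0.0, 5.0, 51)
vals = np.array([f_of_x(x) for x in xs])
print(np.all(np.diff(vals) < 0.0))   # expected: True, i.e. f is strictly decreasing in x
```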