Modern Stochastics: Theory and Applications

On parameter estimation for $N(\mu ,{\sigma ^{2}}{I_{3}})$ based on projected data into ${\mathbb{S}^{2}}$
Jordi-Lluís Figueras, Aron Persson, Lauri Viitasaari

https://doi.org/10.15559/25-VMSTA279
Pub. online: 17 June 2025 | Type: Research Article | Open Access

Received: 6 November 2024
Revised: 12 March 2025
Accepted: 16 May 2025
Published: 17 June 2025

Abstract

The projected normal distribution, with isotropic variance, on the 2-sphere is considered using intrinsic statistics. It is shown that in this case the expectation commutes with the projection, and that the covariance of the normal variable is in one-to-one correspondence with the intrinsic covariance of the projected normal distribution. This allows us to estimate, after model identification, the parameters of the underlying normal distribution that generates the data.

1 Introduction

Directional statistics (also known as spherical statistics) form a relevant subfield of statistics in which one studies directions, rotations, and axes. A typical situation in spherical statistics is that observations are gathered on a sphere, say ${\mathbb{S}^{2}}$, and consequently the methods have to be adapted to non-Euclidean geometry. More generally, one can think of observations on more general compact Riemannian manifolds. Application areas are numerous: for example, one can think of ${\mathbb{S}^{2}}$ as representing the Earth's surface, and measurements are then observations on this surface. To name just a few application areas, see [11, 22] for navigation/control in robotics, [14, 16] for modeling wind directional changes, [6] for finding lymphoblastic leukemia cells, [3] for the movement of tectonic plates, [12] for the modeling of protein chains, and [9, 19, 21] for radiology applications in the context of MIMO systems. For details on spherical statistics, we refer to the monograph [5].
One of the central problems in statistics is to estimate model parameters from observations. In the spherical context, this can mean, for example, that one assumes a parametric distribution ${P_{\theta }}$ on ${\mathbb{S}^{2}}$ and then uses the data to estimate the unknown θ. The most commonly applied distributions on the sphere are the von Mises–Fisher distributions (also called von Mises distributions on $\mathbb{S}$ or Kent distributions on ${\mathbb{S}^{2}}$), which can be viewed as the equivalent of the normal distribution on the sphere. For parameter estimation for von Mises–Fisher distributions, see [23]. Another widely applied distribution on ${\mathbb{S}^{d}}$ is the projected normal distribution, that is, the distribution of $X/\| X\| $ for $X\sim N(\mu ,\Sigma )$. While the density function of the projected normal is well known (see [7]), parameter estimation is much less studied due to the complicated nature of the density. In particular, parameter estimation has mainly been studied in the circular case ($\mathbb{S}$), see [15, 16, 20, 24].
In this article, we consider the problem of parameter estimation related to the projected normal distribution onto ${\mathbb{S}^{2}}$. In contrast to the existing literature, we do not consider estimation of the parameters of the projected normal itself, which can be obtained via standard methodology as the density is completely known. Instead, our aim is to extract information on the underlying normal distribution $N(\mu ,\Sigma )$ (with support on the ambient Euclidean space ${\mathbb{R}^{3}}$) based solely on data consisting of projected points on ${\mathbb{S}^{2}}$. Obviously, we immediately run into identifiability issues if we observe only $X/\| X\| $ instead of $X\sim N(\mu ,\Sigma )$. First of all, it is clear that one cannot identify arbitrary shapes of Σ (for example, the distribution can be arbitrarily spread in the direction μ and this cannot be observed, as all points along a ray from the origin are projected to the same point). For this reason, we assume, for the sake of simplicity, an isotropic variance $\Sigma ={\sigma ^{2}}{I_{3}}$, i.e. $X\sim N(\mu ,{\sigma ^{2}}{I_{3}})$. However, even in this case, we can only estimate the quantities $\mu /\| \mu \| $ (the direction of the location) and ${\sigma ^{2}}/\| \mu {\| ^{2}}$, the reason being that the distribution of the projection $\operatorname{pr}(aX)$ does not depend on $a\gt 0$. This is indeed natural, as intuitively one can only estimate the direction $\mu /\| \mu \| $ from the projections (onto ${\mathbb{S}^{2}}$). Similarly, $N(\mu ,{\sigma ^{2}}{I_{3}})$ with a larger σ and a distant μ looks similar, when observed only on the surface of the sphere, to a normal distribution with a smaller variance located closer to the origin. We also note that estimation of the direction $\mu /\| \mu \| $ is already well known, and we claim no originality in this respect. Instead, our main contribution is the estimation of $\lambda ={\sigma ^{2}}/\| \mu {\| ^{2}}$. For this, we study the covariance matrix of $X/\| X\| $ on ${\mathbb{S}^{2}}$ and, by linking it to certain special functions and analyzing their series expansions, we show that there is a bijective mapping between λ and the covariance matrix of $X/\| X\| $ on ${\mathbb{S}^{2}}$. As the latter can be estimated using the methods of spherical statistics, we obtain a consistent estimator for λ via the inverse mapping, which can be computed, e.g., via the bisection method.
The rest of the article is organized as follows. In Section 2 we present and discuss our main results. We begin with Section 2.1, where we introduce our setup and notation. After that, in Section 2.2, we discuss the convergence of sample estimators (for the mean and covariance) in the context of a general compact Riemannian manifold. The case of the projected normal in ${\mathbb{S}^{2}}$ is then discussed in Section 2.3. All the proofs are postponed to Section 3.

2 Main result

In this section we present and discuss our main results. First, in Section 2.1, we shall introduce some notation and clarify some terminology used in the context of manifold-valued random variables. Some of the theory relies on differential geometric concepts which are briefly summarized in Appendix A. The convergence of sample covariances on a general compact manifold is discussed in Section 2.2, while the special case of ${\mathbb{S}^{2}}$ and the projected normal distribution is treated in Section 2.3.

2.1 General setting

Let $M\subset {\mathbb{R}^{k}}$ be a smooth n-dimensional manifold. Let $S\subseteq {\mathbb{R}^{k}}$ be a dense subset and let $\operatorname{pr}:S\to M$ be the projection onto $M\subseteq {\mathbb{R}^{k}}$, assumed smooth everywhere on S and hence almost everywhere in ${\mathbb{R}^{k}}$. Let $(\Omega ,\mathbb{P})$ be a probability space and consider a normally distributed (multivariate) random variable
\[ X:\Omega \to {\mathbb{R}^{k}}\]
such that
\[ \mathbb{P}(X\notin S)=0.\]
Let ${x_{1}},\dots ,{x_{L}}$ be an independently drawn sample of X. Suppose now that we observe
\[ \operatorname{pr}({x_{1}}),\dots ,\operatorname{pr}({x_{L}}).\]
The question is now: can one estimate the mean and covariance of this projected sample?
In order for this question to have meaning, a notion of mean and covariance is needed for manifold-valued random variables. The intrinsic mean, also known as the Fréchet mean (see [17]), of an absolutely continuous random variable $X:\Omega \to M$ is defined as follows.
Definition 2.1.
Let M be a Riemannian manifold with corresponding distance function dist and volume form ${\operatorname{dVol}_{M}}$. Moreover, suppose ${p_{X}}:M\to [0,\infty )$ is a probability density function of some absolutely continuous random variable $X:\Omega \to M$. The expected value of X is then defined by
(2.1)
\[ \underset{q\in M}{\operatorname{arginf}}{\int _{M}}\operatorname{dist}{(q,y)^{2}}{p_{X}}(y){\operatorname{dVol}_{M}}(y)=\mathbb{E}[X].\]
Remark.
The argument of the infimum in (2.1) need not exist, nor does it need to be unique. For example, consider $X\stackrel{\mathrm{d}}{=}N(0,{\sigma ^{2}}I)\in {\mathbb{R}^{n+1}}$; then $\operatorname{pr}(X)$ follows the uniform distribution on ${\mathbb{S}^{n}}$ and every point of ${\mathbb{S}^{n}}$ attains the infimum in (2.1). This is a very important difference from the case of ${\mathbb{R}^{n}}$-valued random variables. In ${\mathbb{R}^{n}}$ we know that if X is absolutely continuous, integrable and square integrable, then the point $\mu \in {\mathbb{R}^{n}}$ which minimizes the least squares integral
\[ {\int _{{\mathbb{R}^{n}}}}{\left|x-\mu \right|^{2}}{p_{X}}(x)\mathrm{d}x\]
is unique.
The intrinsic definition of the covariance matrix for a random variable on a Riemannian manifold is well known as well (see, e.g., [17]).
Definition 2.2.
Let M be a geodesically complete Riemannian manifold, and let ${\log _{q}}:M\to {T_{q}}M$ be the Riemannian logarithm map (defined a.e.). Let X be an M-valued absolutely continuous random variable with the intrinsic mean $\mu =\mathbb{E}[X]\in M$. Then, the covariance matrix for X is the linear map $\operatorname{Cov}(X):{T_{\mu }}M\to {T_{\mu }}M$ defined by the integral
\[ \operatorname{Cov}(X)={\int _{M}}{\log _{\mu }}(y){\log _{\mu }}{(y)^{T}}{p_{X}}(y){\operatorname{dVol}_{M}}(y).\]

2.2 Convergence of sample estimators on a compact manifold

Let ${y_{\ell }}$ be a sample of L independent measurements on a compact manifold M. In order to estimate the mean of such a sample, we shall utilize the discrete version of Definition 2.1. This definition follows that of [17].
Definition 2.3.
Let ${\{{y_{\ell }}\}_{\ell =1}^{L}}$ be a sample of L points on a Riemannian manifold M. Then the empirical mean is defined as
\[ \underset{q\in M}{\operatorname{arginf}}{\sum \limits_{\ell =1}^{L}}{\operatorname{dist}^{2}}({y_{\ell }},q).\]
Note that since the function ${\operatorname{dist}^{2}}$ has good regularity, one may hope to find such an infimum by solving, for q, the (nonlinear) equation
\[ {\sum \limits_{\ell =1}^{L}}{\log _{q}}({y_{\ell }})=0.\]
It was shown in [8] that if M is compact and if the ${y_{\ell }}$ are sampled from a distribution with a unique intrinsic mean, then the empirical mean satisfies a central limit theorem. More precisely, if $\mu =\mathbb{E}[{y_{\ell }}]$ is the true mean of the underlying distribution and if $\hat{\mu }$ is the empirical mean, it holds that
\[ \sqrt{L}{\log _{\mu }}\left(\hat{\mu }\right)\stackrel{\text{d}}{\longrightarrow }N(0,V)\]
for some linear map $V\in \mathcal{L}({T_{\mu }}M,{T_{\mu }}M)$.
Definition 2.4.
Let ${\{{y_{\ell }}\}_{\ell =1}^{L}}$ be a sample of L points on a Riemannian manifold M with a unique empirical mean $\hat{\xi }$. Then the empirical covariance of the sample is defined by
(2.2)
\[ \hat{V}=\frac{1}{L-1}{\sum \limits_{\ell =1}^{L}}{\log _{\hat{\xi }}}({y_{\ell }}){\log _{\hat{\xi }}}{({y_{\ell }})^{T}}\]
where $\hat{V}:{T_{\hat{\xi }}}M\to {T_{\hat{\xi }}}M$ is a linear map.
Since the empirical mean converges by the results of [8], it remains to verify that the empirical covariance in Equation (2.2) converges to the covariance V of the limiting distribution of $\sqrt{L}{\log _{\mu }}\left(\hat{\mu }\right)$ as $L\to \infty $.
The following result shows that, in the case of isotropic covariance, the empirical covariance converges. The proof is postponed to Section 3.1.
Theorem 2.5.
Let ${\{{\xi _{\ell }}\}_{\ell =1}^{L}}$ be L independent identically distributed random variables on a compact geodesically complete manifold M. Suppose further they have a unique mean $\mathbb{E}[{\xi _{\ell }}]=\mu $ and an isotropic covariance $\operatorname{Cov}({\xi _{\ell }})=vI$. Then,
\[ \frac{1}{L-1}{\sum \limits_{\ell =1}^{L}}{\log _{\hat{\mu }}}({\xi _{\ell }}){\log _{\hat{\mu }}}{({\xi _{\ell }})^{T}}\stackrel{\mathbb{P}}{\longrightarrow }vI\]
with the same rate of convergence as the empirical mean $\hat{\mu }$ of the sample ${\{{\xi _{\ell }}\}_{\ell =1}^{L}}$.
Remark.
As far as we know, the rate of convergence for the empirical mean on compact manifolds has not been completely resolved. In [8] it has been shown that the empirical mean has the rate of convergence $\sqrt{L}$ for a large class of manifolds, but not for all compact geodesically complete manifolds. However, [2] provides rates of convergence for the empirical mean for a general class of metric spaces, from which it appears that the rate of convergence $\sqrt{L}$ does not always hold.
Following a method similar to that of [18], the update scheme for a sample of observations is given in Algorithm 1.
Algorithm 1
Estimating the intrinsic average and covariance for a sample ${y_{\ell }}$ on a manifold M
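Since Algorithm 1 is referenced only as a figure here, the following is a minimal sketch of one standard way to realise the two estimation steps on ${\mathbb{S}^{2}}$: a fixed-point (Karcher mean) iteration for the empirical mean of Definition 2.3, followed by the empirical covariance (2.2) expressed in an orthonormal basis of the tangent plane at the estimated mean. The iteration scheme, tolerances and function names are our own assumptions and are not taken verbatim from [18].

```python
import numpy as np

def log_map(p, q):
    """Riemannian log of q at p on the unit sphere S^2 (p, q unit 3-vectors)."""
    cos_d = np.clip(np.dot(p, q), -1.0, 1.0)
    d = np.arccos(cos_d)                      # geodesic distance
    v = q - cos_d * p                         # component of q tangential at p
    n = np.linalg.norm(v)
    return np.zeros(3) if n < 1e-15 else (d / n) * v

def exp_map(p, v):
    """Riemannian exp of a tangent vector v at p on S^2."""
    n = np.linalg.norm(v)
    return p if n < 1e-15 else np.cos(n) * p + np.sin(n) * (v / n)

def intrinsic_mean_and_cov(samples, tol=1e-10, max_iter=100):
    """Karcher-mean iteration plus the empirical covariance of Equation (2.2)."""
    mu = samples[0] / np.linalg.norm(samples[0])          # initial guess
    for _ in range(max_iter):
        grad = np.mean([log_map(mu, y) for y in samples], axis=0)
        mu = exp_map(mu, grad)
        if np.linalg.norm(grad) < tol:
            break
    # orthonormal basis of the tangent plane T_mu S^2
    e1 = np.cross(mu, [0.0, 0.0, 1.0])
    if np.linalg.norm(e1) < 1e-8:                         # mu close to a pole
        e1 = np.cross(mu, [1.0, 0.0, 0.0])
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(mu, e1)
    L = len(samples)
    logs = np.array([[np.dot(log_map(mu, y), e1),
                      np.dot(log_map(mu, y), e2)] for y in samples])
    cov = logs.T @ logs / (L - 1)                         # Equation (2.2)
    return mu, cov
```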

2.3 Observing the projected normal in ${\mathbb{S}^{2}}$

In this section we consider $M={\mathbb{S}^{2}}$, the unit 2-sphere in ${\mathbb{R}^{3}}$. In this case, the projection map is simply
\[ \operatorname{pr}(x)=\frac{x}{\left\| x\right\| }\]
and the domain of definition for pr is $S={\mathbb{R}^{3}}\setminus \{0\}$.
Throughout, we shall use spherical coordinates, i.e. for the classical Cartesian coordinates ${(x,y,z)^{T}}$ in ${\mathbb{R}^{3}}$ we set $x=\cos (\theta )\sin (\phi )$, $y=\sin (\theta )\sin (\phi )$, and $z=\cos (\phi )$. Consider two points ${({\theta _{1}},{\phi _{1}})^{T}}$ and ${({\theta _{2}},{\phi _{2}})^{T}}$. Then it is a classical result (see, e.g., [10]) that the distance function may be written as
\[\begin{aligned}{}& \operatorname{dist}({({\theta _{1}},{\phi _{1}})^{T}},{({\theta _{2}},{\phi _{2}})^{T}})\\ {} & \hspace{1em}=\operatorname{acos}\left(\cos ({\phi _{1}})\cos ({\phi _{2}})+\sin ({\phi _{1}})\sin ({\phi _{2}})\cos ({\theta _{2}}-{\theta _{1}})\right).\end{aligned}\]
It has been shown in [7] that the projected normal, i.e. the random variable defined by $\operatorname{pr}(X)$ where $X\stackrel{\mathrm{d}}{=}N(\mu ,\Sigma )$, has the probability density function
(2.3)
\[ {p_{\operatorname{pr}(X)}}(\theta ,\phi )={\left(\frac{1}{2\pi A}\right)^{3/2}}{\left|\Sigma \right|^{-\frac{1}{2}}}\exp (C)\left(\operatorname{K}+{\operatorname{K}^{2}}\frac{\Phi (\operatorname{K})}{\varphi (\operatorname{K})}+\frac{\Phi (\operatorname{K})}{\varphi (\operatorname{K})}\right)\]
at a point $u={(\cos (\theta )\sin (\phi ),\sin (\theta )\sin (\phi ),\cos (\phi ))^{T}}$, where $A={u^{T}}{\Sigma ^{-1}}u$, $B={u^{T}}{\Sigma ^{-1}}\mu $, $C=-\frac{1}{2}{\mu ^{T}}{\Sigma ^{-1}}\mu $, and $\operatorname{K}=B{A^{-\frac{1}{2}}}$. Moreover, the functions $\varphi ,\Phi :\mathbb{R}\to \mathbb{R}$ are defined as
\[ \varphi (x)=\exp (-\frac{{x^{2}}}{2})\]
and
\[ \Phi (a)={\int _{-\infty }^{a}}\varphi (x)\mathrm{d}x,\]
respectively.
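For concreteness, the density (2.3) can be evaluated numerically as in the following sketch, which is a direct transcription of the formula above with the (unnormalized) functions φ and Φ defined there; Φ is obtained from the error function. The function names and interface are our own, and Σ is assumed positive definite.

```python
import numpy as np
from math import erf, sqrt, pi, exp, cos, sin

def Phi(a):
    """Unnormalized Gaussian integral of (2.3): int_{-inf}^a exp(-t^2/2) dt."""
    return sqrt(2 * pi) * 0.5 * (1.0 + erf(a / sqrt(2)))

def phi(x):
    """Unnormalized Gaussian kernel of (2.3)."""
    return exp(-x * x / 2)

def projected_normal_pdf(theta, ph, mu, Sigma):
    """Density (2.3) of pr(X), X ~ N(mu, Sigma) in R^3, in spherical coordinates.
    mu: 3-vector (numpy array), Sigma: 3x3 positive definite matrix."""
    u = np.array([cos(theta) * sin(ph), sin(theta) * sin(ph), cos(ph)])
    Sinv = np.linalg.inv(Sigma)
    A = u @ Sinv @ u
    B = u @ Sinv @ mu
    C = -0.5 * mu @ Sinv @ mu
    K = B / sqrt(A)
    return ((1 / (2 * pi * A)) ** 1.5 * np.linalg.det(Sigma) ** -0.5
            * exp(C) * (K + (K ** 2 + 1) * Phi(K) / phi(K)))

# example call: projected_normal_pdf(0.3, 1.0, np.array([0.0, 0.0, 1.0]), np.eye(3))
```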
In our context, this random variable is then observed in ${\mathbb{S}^{2}}$. As ${\mathbb{S}^{2}}$ is an isometrically embedded Riemannian manifold, Definition 2.1 applies to the projected normal. Intuitively, the expectation of a projected normal distribution ought to be $\operatorname{pr}(\mu )$. Unfortunately, for the fully general case of the projected normal, this intuition turns out to be false. However, by imposing an isotropic covariance Σ on X we shall show that this intuition is indeed correct. This is the topic of the next result, whose proof is postponed to Section 3.2.
Theorem 2.6.
Let X be a normally distributed random variable with average $\mu \in {\mathbb{R}^{3}}$ and covariance matrix Σ, i.e. $X\stackrel{\mathrm{d}}{=}N(\mu ,\Sigma )\in {\mathbb{R}^{3}}$. If $\Sigma ={\sigma ^{2}}{I_{3}}$, then $\mathbb{E}[\operatorname{pr}(X)]=\operatorname{pr}(\mu )$.
Remark.
The above theorem is true whenever μ is an eigenvector of Σ. However, if μ is not an eigenvector of Σ, then the statement of Theorem 2.6 is false in general. To see this, let $\mu ={(0,0,1)^{T}}$ and
\[ \Sigma =\left(\begin{array}{c@{\hskip10.0pt}c@{\hskip10.0pt}c}1& 0& 0\\ {} 0& 1& 0.5\\ {} 0& 0.5& 1\end{array}\right).\]
In this case it follows that the probability density function is not symmetric around μ, see Figure 1 below.
Fig. 1.
A normally distributed random variable $X\stackrel{\mathrm{d}}{=}N(\mu ,\Sigma )$, with the covariance matrix Σ such that the mean μ is not an eigenvector for Σ, for which $\mathbb{E}[\operatorname{pr}(X)]\ne \operatorname{pr}(\mu )$
In general, the tangent space of ${\mathbb{S}^{2}}$ at a point q is the set of vectors orthogonal (in the Euclidean product inherited from ${\mathbb{R}^{3}}$) to q. Note that the initial velocity of the geodesic connecting a point μ to q has the same direction as $\mu \times (q\times \mu )$, the component of q orthogonal to μ (here × denotes the cross product of ${\mathbb{R}^{3}}$). The logarithm map of q centered at a point μ is thus the normalized starting velocity of the geodesic connecting μ with q, multiplied by the length of that geodesic, see [13]. Without loss of generality, consider $\mu ={(0,0,1)^{T}}$. Then
\[ {\log _{\mu }}(q)=\left(\begin{array}{c}\cos (\theta )\sin (\phi )\\ {} \sin (\theta )\sin (\phi )\end{array}\right)\frac{\phi }{\left|\sin (\phi )\right|},\]
where the basis for the vector is in a rewritten form of the basis (of ${T_{\mu }}{\mathbb{S}^{2}}$)
\[ \left\{\left(\begin{array}{c}1\\ {} 0\\ {} 0\end{array}\right),\left(\begin{array}{c}0\\ {} 1\\ {} 0\end{array}\right)\right\}.\]
Hence the intrinsic covariance of $\operatorname{pr}(X)$ can be written as
\[\begin{aligned}{}\operatorname{Cov}(\operatorname{pr}(X))=& {\int _{0}^{2\pi }}{\int _{0}^{\pi }}\left(\begin{array}{c@{\hskip10.0pt}c}{\cos ^{2}}(\theta ){\sin ^{2}}(\phi )& \cos (\theta )\sin (\theta ){\sin ^{2}}(\phi )\\ {} \cos (\theta )\sin (\theta ){\sin ^{2}}(\phi )& {\sin ^{2}}(\theta ){\sin ^{2}}(\phi )\end{array}\right)\\ {} & \times \frac{{\phi ^{2}}}{{\sin ^{2}}(\phi )}{\left(\frac{1}{2\pi }\right)^{3/2}}\exp (-\frac{1}{2{\sigma ^{2}}})\\ {} & \times \left(\operatorname{K}+{\operatorname{K}^{2}}\frac{\Phi (\operatorname{K})}{\varphi (\operatorname{K})}+\frac{\Phi (\operatorname{K})}{\varphi (\operatorname{K})}\right)\sin (\phi )\mathrm{d}\phi \mathrm{d}\theta ,\end{aligned}\]
where $\operatorname{K}=\frac{1}{\sigma }\cos (\phi )$. Or, equivalently, by simplifying and integrating w.r.t. θ,
(2.4)
\[ \begin{aligned}{}\operatorname{Cov}(\operatorname{pr}(X))=& \frac{\pi }{{(2\pi )^{3/2}}}\exp (-\frac{1}{2{\sigma ^{2}}})\left(\begin{array}{c@{\hskip10.0pt}c}1& 0\\ {} 0& 1\end{array}\right)\\ {} & \times {\int _{0}^{\pi }}{\phi ^{2}}\left(\frac{\cos (\phi )}{\sigma }+\left(\frac{{\cos ^{2}}(\phi )}{{\sigma ^{2}}}+1\right)\frac{\Phi (\frac{\cos (\phi )}{\sigma })}{\varphi (\frac{\cos (\phi )}{\sigma })}\right)\sin (\phi )\mathrm{d}\phi \\ {} =:& \left(\begin{array}{c@{\hskip10.0pt}c}1& 0\\ {} 0& 1\end{array}\right)f(\sigma ).\end{aligned}\]
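Relation (2.4) lends itself to direct numerical evaluation. The sketch below computes the scalar intrinsic variance f(σ) by one-dimensional quadrature, writing $\Phi (\operatorname{K})/\varphi (\operatorname{K})$ via the scaled complementary error function for numerical stability. The use of SciPy is our own choice, and for very small σ the two exponential factors should be combined analytically to avoid overflow.

```python
import numpy as np
from scipy.special import erfcx        # scaled complementary error function
from scipy.integrate import quad

def mills(k):
    """Phi(k)/phi(k) for the unnormalized phi, Phi of (2.3), stable for large |k|."""
    return np.sqrt(2 * np.pi) * 0.5 * erfcx(-k / np.sqrt(2))

def f(sigma):
    """Scalar intrinsic variance tr(Cov(pr(X)))/2 as a function of sigma, Eq. (2.4)."""
    def integrand(ph):
        k = np.cos(ph) / sigma
        return ph ** 2 * (k + (k ** 2 + 1) * mills(k)) * np.sin(ph)
    val, _ = quad(integrand, 0.0, np.pi)
    return np.pi / (2 * np.pi) ** 1.5 * np.exp(-1 / (2 * sigma ** 2)) * val

# f(sigma) -> (pi**2 - 4)/4 ~ 1.467 as sigma grows (cf. the Remark below),
# and f(sigma) -> 0 as sigma -> 0.
```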
Remark.
In the limit as $\sigma \to \infty $, it holds that ${p_{\operatorname{pr}(X)}}\to \frac{1}{4\pi }{𝟙_{{\mathbb{S}^{2}}}}$. The intrinsic covariance in this limit is
\[ \frac{{\pi ^{2}}-4}{4}\left(\begin{array}{c@{\hskip10.0pt}c}1& 0\\ {} 0& 1\end{array}\right).\]
On the other hand, in the limit as $\sigma \to 0$, ${p_{\operatorname{pr}(X)}}$ converges to the point mass distribution while the covariance goes to zero.
Note that the above expressions concern only the intrinsic expectation and covariance. If we want to relate them to the extrinsic expectation and covariance, we run into the obvious problem that $\operatorname{pr}(X)$ and $\operatorname{pr}(aX)$, $a\gt 0$, have the same distribution. Hence we need to choose which normal distribution in ${\mathbb{R}^{3}}$ corresponds to the intrinsic covariance and average. By Theorem 2.6 the average is known (up to a factor). Moreover, it turns out there is a one-to-one correspondence between the extrinsic covariance and the intrinsic covariance up to a factor $a\gt 0$, and thus one may estimate the underlying covariance parameter σ using the relation (2.4) with, e.g., the bisection method.
The degeneracy of the factor a essentially means that we can only estimate the parameter σ in the case when the projected average is the true average of the ${\mathbb{R}^{3}}$-valued normal random variable. More generally, if $X\stackrel{\mathrm{d}}{=}N(\mu ,{\sigma ^{2}}{I_{3}})$, then $\frac{X}{\| \mu \| }\stackrel{\mathrm{d}}{=}N\left(\frac{\mu }{\| \mu \| },\frac{{\sigma ^{2}}}{\| \mu {\| ^{2}}}{I_{3}}\right)$. Consequently, we can only estimate the quantities $\frac{\mu }{\| \mu \| }\in {\mathbb{S}^{2}}$ and $\frac{{\sigma ^{2}}}{\| \mu {\| ^{2}}}$. This means that, as expected, we can estimate the direction $\frac{\mu }{\| \mu \| }$ of the extrinsic distribution $X\stackrel{\mathrm{d}}{=}N(\mu ,{\sigma ^{2}}{I_{3}})$ but not the distance $\| \mu \| $. On the other hand, for the variance we can only estimate the quantity $\frac{{\sigma ^{2}}}{\| \mu {\| ^{2}}}$. This is expected as well, as for the distribution $N(\mu ,{\sigma ^{2}}{I_{3}})$ located far away ($\| \mu \| $ large) the projections onto ${\mathbb{S}^{2}}$ are not widely spread even if $\sigma \gt 0$ is large.
The required one-to-one correspondence between extrinsic and intrinsic covariances is formulated in the following proposition, whose proof is presented in Section 3.3. The statement is also illustrated in Figure 2.
Proposition 2.7.
Let X be a normally distributed random variable with mean $\mu \in {\mathbb{R}^{3}}$ and isotropic variance ${\sigma ^{2}}{I_{3}}$, i.e. $\frac{X}{\left\| \mu \right\| }\stackrel{\mathrm{d}}{=}N(\frac{\mu }{\left\| \mu \right\| },\frac{{\sigma ^{2}}}{{\left\| \mu \right\| ^{2}}}{I_{3}})$. Then
\[ \operatorname{Cov}(\operatorname{pr}(X))=v{I_{2}}\]
for some $0\le v\lt \frac{{\pi ^{2}}-4}{4}$, and the relation
\[ f\left(\frac{{\sigma ^{2}}}{{\left\| \mu \right\| ^{2}}}\right):=\frac{\operatorname{tr}(\operatorname{Cov}(\operatorname{pr}(X)))}{2}=v\]
defines a bijection between $\frac{{\sigma ^{2}}}{{\left\| \mu \right\| ^{2}}}$ and v.
The bijection f is crucial for observing projected normal random variables. Utilizing this f, if the scalar variance of some normal random variable $X\stackrel{\mathrm{d}}{=}N(\mu ,{\sigma ^{2}}{I_{3}})$ in ${\mathbb{R}^{3}}$ is known, then the intrinsic scalar variance of the corresponding projected normal random variable is precisely $f(\frac{{\sigma ^{2}}}{{\left\| \mu \right\| ^{2}}})$. Conversely, if some projected isotropic normal random variable has mean μ and covariance $v{I_{2}}$, then the normal random variable X such that $\operatorname{pr}(X)$ has these parameters is precisely $X\stackrel{\mathrm{d}}{=}N(\mu ,{f^{-1}}(v){I_{3}})$ (up to a positive scalar factor).
As a consequence of Proposition 2.7 and Theorem 2.5 we can conclude that given measurements on the sphere, we can estimate the scalar variance of an isotropically distributed normal random vector in ${\mathbb{R}^{3}}$. This leads to the next result that can be viewed as the main theorem of the present paper. Its proof follows directly from Theorem 2.5 and Proposition 2.7. The rate of convergence is obtained immediately from noting that [8, Theorem 2] applies to ${\mathbb{S}^{2}}$, and thus the empirical mean has rate of convergence $\sqrt{L}$, and by Theorem 2.5 so does the empirical covariance.
Theorem 2.8.
Let X be a normally distributed random variable with mean $\mu \in {\mathbb{R}^{3}}$ and isotropic variance ${\sigma ^{2}}{I_{3}}$, i.e. $X\stackrel{\mathrm{d}}{=}N(\mu ,{\sigma ^{2}}{I_{3}})$. Given independent measurements $({x_{1}},{x_{2}},\dots ,{x_{L}})$ from $\operatorname{pr}(X)$, we can estimate $\lambda =\frac{{\sigma ^{2}}}{{\left\| \mu \right\| ^{2}}}$ by
\[ \hat{\lambda }={f^{-1}}\left(\frac{\operatorname{tr}(\hat{V})}{2}\right)\]
where $\hat{V}$ is the empirical covariance matrix given in Equation (2.2) and where f is the bijection given in Proposition 2.7 and defined in Equation (2.4). Moreover, it holds that
\[ \hat{\lambda }\stackrel{\mathbb{P}}{\longrightarrow }\frac{{\sigma ^{2}}}{{\left\| \mu \right\| ^{2}}}\]
as $L\to \infty $, with rate of convergence $\sqrt{L}$.
Fig. 2.
A plot of the scalar variance of $\operatorname{pr}(X)$, i.e. $f({\sigma ^{2}})$ from Proposition 2.7, where $X\stackrel{\mathrm{d}}{=}N(\mu ,{\sigma ^{2}}{I_{3}})$ and $\mu \in {\mathbb{S}^{2}}$ is arbitrary. The red line indicates the upper bound $({\pi ^{2}}-4)/4={\lim \nolimits_{\sigma \to \infty }}\frac{\operatorname{tr}(\operatorname{Cov}(\operatorname{pr}(X)))}{2}$
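As a sketch of the estimator in Theorem 2.8, the bijection f of Proposition 2.7 can be inverted numerically as follows; we assume the function f(sigma) from the sketch after Equation (2.4) is in scope, and we use Brent's method (a bracketing refinement of the bisection method mentioned above) with bracket endpoints of our own choosing.

```python
import numpy as np
from scipy.optimize import brentq

V_MAX = (np.pi ** 2 - 4) / 4                 # supremum of f, cf. Proposition 2.7

def lambda_hat(V_hat, lo=0.05, hi=1e3):
    """Estimate lambda = sigma^2/||mu||^2 from the empirical covariance V_hat of (2.2)
    by numerically inverting f (the function from the sketch after Eq. (2.4))."""
    v = np.trace(V_hat) / 2
    if not 0.0 < v < V_MAX:
        raise ValueError("tr(V_hat)/2 must lie in (0, (pi^2 - 4)/4)")
    # f is strictly increasing in sigma, so a bracketing root-finder recovers the
    # (rescaled) sigma from f(sigma) = v; the bracket [lo, hi] must contain the root.
    sigma = brentq(lambda s: f(s) - v, lo, hi)
    return sigma ** 2                         # lambda = (sigma / ||mu||)^2
```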
We conclude this section with some simulations. Firstly, Table 1 illustrates an empirical verification of Theorem 2.5 for the case of the manifold ${\mathbb{S}^{2}}$. The simulations are conducted by generating L samples from $X\stackrel{\mathrm{d}}{=}N(\mu ,{\sigma ^{2}}{I_{3}})$ with $\mu ={(0,0,1)^{T}}\in {\mathbb{S}^{2}}$ and $\sigma =1$ and projecting these samples onto ${\mathbb{S}^{2}}$. The sample covariance is computed as in Algorithm 1, and its error is then compared to the theoretical covariance using the Frobenius norm. This error is then averaged over 100 Monte Carlo repetitions.
Table 1.
The absolute error, averaged over 100 Monte Carlo repetitions, of the empirical covariance, Equation (2.2), compared to the theoretical covariance of the projected normal distribution, $\operatorname{tr}(\operatorname{Cov}(\operatorname{pr}(X)))/2$. Each simulation run uses L data points and $\sigma =1$
L 30 50 100 1000 ${10^{4}}$ ${10^{5}}$ ${10^{6}}$
error 0.013 0.0080 0.0055 0.0033 0.0015 8.4e-05 2.9e-05
Secondly, Figure 3 is an illustration of the convergence result in Theorem 2.8. The simulations are done by generating L samples from $X\stackrel{\mathrm{d}}{=}N(\mu ,{\sigma ^{2}}{I_{3}})$ with $\mu ={(0,0,1)^{T}}\in {\mathbb{S}^{2}}$ and $\sigma =1$. The empirical covariance $\hat{V}$, Equation (2.2), is computed for each set of samples. Then, for each set of samples, the estimator $\hat{\lambda }$ for ${\sigma ^{2}}$, see Theorem 2.8, is computed as ${f^{-1}}(\operatorname{tr}(\hat{V})/2)$. This is repeated 1000 times for each L.
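The simulations described above can be reproduced along the following lines, reusing the hypothetical helpers intrinsic_mean_and_cov and lambda_hat from the earlier sketches; the parameters simply mirror the description in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = np.array([0.0, 0.0, 1.0]), 1.0

for L in [30, 100, 1000]:
    X = rng.normal(size=(L, 3)) * sigma + mu          # X ~ N(mu, sigma^2 I_3)
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)  # projected sample on S^2
    _, V_hat = intrinsic_mean_and_cov(Y)              # Algorithm 1 sketch above
    print(L, lambda_hat(V_hat))                       # should approach sigma^2 = 1
```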
Fig. 3.
A box plot of the inferred estimator $\hat{\lambda }$ in Theorem 2.8 using the empirical covariance, Equation (2.2), for L randomly generated projected normal random variables with ${\sigma ^{2}}=1$ and $\mu ={(0,0,1)^{T}}$. For each L this is done with 1000 repetitions. The true underlying scalar variance, $\lambda ={f^{-1}}(\operatorname{tr}(\operatorname{Cov}(\operatorname{pr}(X)))/2)=1$, is shown in red color. Note that some outliers are omitted for readability of the graph

3 Proofs

3.1 Proof of Theorem 2.5

First, note that
\[ \frac{1}{L-1}{\sum \limits_{\ell =1}^{L}}{\log _{\hat{\mu }}}({\xi _{\ell }}){\log _{\hat{\mu }}}{({\xi _{\ell }})^{T}}\]
is a linear map ${T_{\hat{\mu }}}M\to {T_{\hat{\mu }}}M$, and $\operatorname{Cov}({\xi _{\ell }})$ is by definition a linear map ${T_{\mu }}M\to {T_{\mu }}M$. In order to compare $\hat{V}$ and $\operatorname{Cov}({\xi _{\ell }})$ we parallel transport the linear map $\hat{V}:{T_{\hat{\mu }}}M\to {T_{\hat{\mu }}}M$ to a linear map ${T_{\mu }}M\to {T_{\mu }}M$. That is, we look at ${P_{\hat{\mu },\mu }}\hat{V}{P_{\mu ,\hat{\mu }}}$ and see how far away it is from $vI$. It follows from Equation (A.1) that ${P_{\hat{\mu },\mu }}{\log _{\hat{\mu }}}({\xi _{\ell }}){\log _{\hat{\mu }}}{({\xi _{\ell }})^{T}}{P_{\mu ,\hat{\mu }}}$ has the first order expansion
\[\begin{aligned}{}& {P_{\hat{\mu },\mu }}{\log _{\hat{\mu }}}({\xi _{\ell }}){\log _{\hat{\mu }}}{({\xi _{\ell }})^{T}}{P_{\mu ,\hat{\mu }}}\\ {} & \hspace{1em}={\log _{\mu }}({\xi _{\ell }}){\log _{\mu }}{({\xi _{\ell }})^{T}}+\left({\nabla _{{\log _{\mu }}(\hat{\mu })}}{\log _{\mu }}({\xi _{\ell }})\right){\log _{\mu }}{({\xi _{\ell }})^{T}}\\ {} & \hspace{2em}+{\log _{\mu }}({\xi _{\ell }}){\left({\nabla _{{\log _{\mu }}(\hat{\mu })}}{\log _{\mu }}({\xi _{\ell }})\right)^{T}}+\mathcal{O}(\operatorname{dist}{(\mu ,\hat{\mu })^{2}}).\end{aligned}\]
Therefore,
\[\begin{aligned}{}& \underset{n\to \infty }{\lim }\left\| \mathbb{E}\left[{P_{\hat{\mu },\mu }}\frac{1}{n-1}{\sum \limits_{\ell =1}^{n}}{\log _{\hat{\mu }}}({\xi _{\ell }}){\log _{\hat{\mu }}}{({\xi _{\ell }})^{T}}{P_{\mu ,\hat{\mu }}}-vI\right]\right\| \\ {} & \hspace{1em}\le \underset{n\to \infty }{\lim }\left\| \frac{1}{n-1}{\sum \limits_{\ell =1}^{n}}\mathbb{E}\left[{\log _{\mu }}({\xi _{\ell }}){\log _{\mu }}{({\xi _{\ell }})^{T}}\right]-vI\right\| \\ {} & \hspace{2em}+\underset{n\to \infty }{\lim }\left\| \mathbb{E}\left[\left({\nabla _{{\log _{\mu }}(\hat{\mu })}}{\log _{\mu }}({\xi _{\ell }})\right){\log _{\mu }}{({\xi _{\ell }})^{T}}+{\log _{\mu }}({\xi _{\ell }}){\left({\nabla _{{\log _{\mu }}(\hat{\mu })}}{\log _{\mu }}({\xi _{\ell }})\right)^{T}}\right]\right\| \\ {} & \hspace{2em}+\underset{n\to \infty }{\lim }\mathbb{E}\left[\mathcal{O}(\operatorname{dist}{(\hat{\mu },\mu )^{2}})\right]\\ {} & \hspace{1em}\le \left\| vI-vI\right\| +\underset{n\to \infty }{\lim }\mathbb{E}\left[\mathcal{O}(\operatorname{dist}(\hat{\mu },\mu ))\right]\end{aligned}\]
Here, according to [8, Proposition 1], it holds that $\operatorname{dist}(\mu ,\hat{\mu })$ converges to zero almost surely. Since M is compact, the distance function is bounded, and thus $\operatorname{dist}(\mu ,\hat{\mu })$ converges to 0 in expectation by the dominated convergence theorem. Finally, the fact that the empirical covariance has the same rate of convergence as the empirical mean follows from the last inequality. This completes the proof.
□

3.2 Proof of Theorem 2.6

By the very definition, we need to show that
(3.1)
\[ \underset{q\in {\mathbb{S}^{2}}}{\operatorname{arginf}}{\int _{{\mathbb{S}^{2}}}}\operatorname{dist}{(q,y)^{2}}{p_{\operatorname{pr}(X)}}(y)d{\mathbb{S}^{2}}(y)=\operatorname{pr}(\mathbb{E}[X])\]
for $X\stackrel{\mathrm{d}}{=}N(\mu ,{\sigma ^{2}}{I_{3}})$. By rescaling X with a factor of $\frac{1}{\left\| \mu \right\| }$ and by a rotational symmetry, we may assume $\mu ={(-1,0,0)^{T}}$ without loss of generality.
We apply a simple first-derivative test to find the minimum in (3.1). Let $({\theta _{1}},{\phi _{1}})$ be the spherical coordinates that are integrated over the sphere and let $({\theta _{2}},{\phi _{2}})$ be the coordinates of the minimizer. Hence we need to show that $({\theta _{2}},{\phi _{2}})=(\pi ,\pi /2)$. Writing the integral in spherical coordinates and using the fact that the derivative with respect to ${\phi _{2}}$ vanishes at the minimizer, it follows that $({\theta _{2}},{\phi _{2}})$ satisfies
\[ \frac{\partial }{\partial {\phi _{2}}}{\int _{{\mathbb{S}^{2}}}}{\operatorname{dist}^{2}}(({\theta _{1}},{\phi _{1}}),({\theta _{2}},{\phi _{2}})){p_{\operatorname{pr}(X)}}({\theta _{1}},{\phi _{1}})\sin ({\phi _{1}})d{\theta _{1}}d{\phi _{1}}=0.\]
Now by the Leibniz integral rule the derivative may be moved inside the integral, and it holds
\[\begin{aligned}{}& {\int _{0}^{2\pi }}{\int _{0}^{\pi }}2\operatorname{dist}(({\theta _{1}},{\phi _{1}}),({\theta _{2}},{\phi _{2}}))\\ {} & \hspace{1em}\times \frac{(\cos ({\phi _{1}})\sin ({\phi _{2}})-\sin ({\phi _{1}})\cos ({\phi _{2}})\cos ({\theta _{2}}-{\theta _{1}}))}{\sqrt{1-{\left(\cos ({\phi _{1}})\cos ({\phi _{2}})+\sin ({\phi _{1}})\sin ({\phi _{2}})\cos ({\theta _{2}}-{\theta _{1}})\right)^{2}}}}\\ {} & \hspace{1em}\times {p_{\operatorname{pr}(X)}}({\theta _{1}},{\phi _{1}})\sin ({\phi _{1}})\mathrm{d}{\theta _{1}}\mathrm{d}{\phi _{1}}\\ {} & \hspace{2em}=0.\end{aligned}\]
Plugging $({\theta _{2}},{\phi _{2}})=(\pi ,\pi /2)$ into the integral on the left-hand side yields an integral
\[\begin{aligned}{}& 2{\int _{0}^{2\pi }}{\int _{0}^{\pi }}\operatorname{acos}(\sin ({\phi _{1}})\cos (\pi -{\theta _{1}}))\frac{\cos ({\phi _{1}})}{\sqrt{1-{\sin ^{2}}({\phi _{1}}){\cos ^{2}}(\pi -{\theta _{1}})}}\\ {} & \hspace{1em}\times {p_{\operatorname{pr}(X)}}({\theta _{1}},{\phi _{1}})\sin ({\phi _{1}})\mathrm{d}{\theta _{1}}\mathrm{d}{\phi _{1}}.\end{aligned}\]
This integral equals zero, which can be seen by observing that the integrand is odd along ${\phi _{1}}$ around $\frac{\pi }{2}$, since ${p_{\operatorname{pr}(X)}}$ is symmetric around μ by construction. By similar arguments, we obtain that
\[ \frac{\partial }{\partial {\theta _{2}}}{\int _{{\mathbb{S}^{2}}}}{\operatorname{dist}^{2}}(({\theta _{1}},{\phi _{1}}),({\theta _{2}},{\phi _{2}})){p_{\operatorname{pr}(X)}}({\theta _{1}},{\phi _{1}})\sin ({\phi _{1}})\mathrm{d}{\phi _{1}}\mathrm{d}{\theta _{1}}\]
reduces at $({\theta _{2}},{\phi _{2}})=(\pi ,\pi /2)$ to
\[\begin{aligned}{}& 2{\int _{0}^{2\pi }}{\int _{0}^{\pi }}\operatorname{acos}(\sin ({\phi _{1}})\cos (\pi -{\theta _{1}}))\frac{{\sin ^{2}}({\phi _{1}})\sin (\pi -{\theta _{1}})}{\sqrt{1-{\sin ^{2}}({\phi _{1}}){\cos ^{2}}(\pi -{\theta _{1}})}}\\ {} & \hspace{1em}\times {p_{\operatorname{pr}(X)}}({\theta _{1}},{\phi _{1}})\mathrm{d}{\phi _{1}}\mathrm{d}{\theta _{1}}\end{aligned}\]
which equals zero by symmetry around ${\theta _{1}}=\pi $. Therefore, one can conclude that $\mu =\operatorname{pr}(\mathbb{E}[X])$ is a local extremum for the integral in Equation (3.1).
Next we shall argue why it is a global minimum. Note that a very similar computation shows that $-\mu $, i.e. in spherical coordinates $({\theta _{2}},{\phi _{2}})=(0,\pi /2)$, is another local extremum. In fact, μ and $-\mu $ are the only local extrema of the integral in Equation (3.1), since these are the only two points for which the distance function is rotationally symmetric around the line in ${\mathbb{R}^{3}}$ spanned by μ. By direct computation,
\[ {p_{\operatorname{pr}(X)}}\left(\pi ,\frac{\pi }{2}\right)=\frac{1}{{(2\pi )^{3/2}}}\exp (-\frac{1}{2{\sigma ^{2}}})\left(\frac{1}{\sigma }+\frac{1}{{\sigma ^{2}}}\frac{\Phi (\frac{1}{\sigma })}{\varphi (\frac{1}{\sigma })}+\frac{\Phi (\frac{1}{\sigma })}{\varphi (\frac{1}{\sigma })}\right)\]
and
\[ {p_{\operatorname{pr}(X)}}\left(0,\frac{\pi }{2}\right)=\frac{1}{{(2\pi )^{3/2}}}\exp (-\frac{1}{2{\sigma ^{2}}})\left(\frac{-1}{\sigma }+\frac{1}{{\sigma ^{2}}}\frac{\Phi (\frac{-1}{\sigma })}{\varphi (\frac{-1}{\sigma })}+\frac{\Phi (\frac{-1}{\sigma })}{\varphi (\frac{-1}{\sigma })}\right),\]
hence ${p_{\operatorname{pr}(X)}}(\mu )\gt {p_{\operatorname{pr}(X)}}(-\mu )$. More generally, a similar computation shows that ${p_{\operatorname{pr}(X)}}(y)\gt {p_{\operatorname{pr}(X)}}(Ry)$, if $\langle y,\mu \rangle \gt 0$, where R is the reflection mapping over the $y,z$-plane, i.e.
\[ R\left(\begin{array}{c}a\\ {} b\\ {} c\end{array}\right)=\left(\begin{array}{c}-a\\ {} b\\ {} c\end{array}\right).\]
The distance function increases as one moves farther away, and points farther away contribute more to the integral. Therefore $({\theta _{2}},{\phi _{2}})=(\pi ,\pi /2)$ yields a global minimum for the integral in Equation (3.1).
□

3.3 Proof of Proposition 2.7

Without loss of generality, we assume $\mu ={(0,0,1)^{T}}$. In this case the functions $A,B,C,\operatorname{K}$ inside Equation (2.3) are given by
\[\begin{array}{l}\displaystyle A=\frac{1}{{\sigma ^{2}}},\\ {} \displaystyle B=\frac{1}{{\sigma ^{2}}}\cos (\phi ),\\ {} \displaystyle C=-\frac{1}{2{\sigma ^{2}}},\end{array}\]
and
\[ \operatorname{K}=\frac{1}{\sigma }\cos (\phi ).\]
By (2.4) it holds that $\operatorname{Cov}(\operatorname{pr}(X))$ is isotropic and we can write
\[\begin{aligned}{}\frac{\operatorname{tr}(\operatorname{Cov}(\operatorname{pr}(X)))}{2}& =\frac{\pi }{{(2\pi )^{3/2}}}\exp (-\frac{1}{2{\sigma ^{2}}})\\ {} & \hspace{1em}\times {\int _{0}^{\pi }}{\phi ^{2}}\left(\frac{\cos (\phi )}{\sigma }+\left(\frac{{\cos ^{2}}(\phi )}{{\sigma ^{2}}}+1\right)\frac{\Phi (\frac{\cos (\phi )}{\sigma })}{\varphi (\frac{\cos (\phi )}{\sigma })}\right)\sin (\phi )\mathrm{d}\phi .\end{aligned}\]
Denote, with a slight abuse of notation (we drop the constant prefactor $\frac{\pi }{{(2\pi )^{3/2}}}$, which is positive and does not affect the monotonicity arguments below),
\[ f(\sigma )=\exp (-\frac{1}{2{\sigma ^{2}}}){\int _{0}^{\pi }}{\phi ^{2}}\left(\frac{\cos (\phi )}{\sigma }+\left(\frac{{\cos ^{2}}(\phi )}{{\sigma ^{2}}}+1\right)\frac{\Phi (\frac{\cos (\phi )}{\sigma })}{\varphi (\frac{\cos (\phi )}{\sigma })}\right)\sin (\phi )\mathrm{d}\phi .\]
In order to obtain the claim, it suffices to prove that $f(\sigma )$ is strictly increasing, from which it follows that $\frac{\operatorname{tr}(\operatorname{Cov}(\operatorname{pr}(X)))}{2}$ is strictly increasing in σ as well. For notational simplicity, we set $x=\frac{1}{\sigma }$ and show that $f(x)$ is strictly decreasing in x, where now $f(x)$ is given by
(3.2)
\[ \begin{aligned}{}f(x)& =\exp (-\frac{{x^{2}}}{2})\\ {} & \hspace{1em}\times {\int _{0}^{\pi }}{\phi ^{2}}\left(x\cos (\phi )+\left({x^{2}}{\cos ^{2}}(\phi )+1\right)\frac{\Phi (x\cos (\phi ))}{\varphi (x\cos (\phi ))}\right)\sin (\phi )\mathrm{d}\phi .\end{aligned}\]
By differentiating in x, we get
\[\begin{aligned}{}& {f^{\prime }}(x)=\exp (\frac{-{x^{2}}}{2})({I_{1}}+{I_{2}}+{I_{3}}),\hspace{1em}\mathrm{where}\\ {} & \hspace{3.33333pt}{I_{1}}:=-x{\int _{0}^{\pi }}{\phi ^{2}}\cos (\phi )x\sin (\phi )\mathrm{d}\phi ,\\ {} & \hspace{3.33333pt}{I_{2}}:={\int _{0}^{\pi }}{\phi ^{2}}\left(-{x^{3}}{\cos ^{2}}(\phi )-x+3x{\cos ^{2}}(\phi )+{x^{3}}{\cos ^{4}}(\phi )\right)\frac{\Phi (x\cos (\phi ))}{\varphi (x\cos (\phi ))}\sin (\phi )\mathrm{d}\phi ,\\ {} & \hspace{3.33333pt}{I_{3}}:={\int _{0}^{\pi }}{\phi ^{2}}\left(2\cos (\phi )+{x^{2}}{\cos ^{3}}(\phi )\right)\sin (\phi )\mathrm{d}\phi .\end{aligned}\]
Immediately, we have that
\[ {I_{1}}={x^{2}}\frac{{\pi ^{2}}}{4}\]
and
\[ {I_{3}}=-\frac{{\pi ^{2}}}{2}-\frac{5{\pi ^{2}}}{32}{x^{2}}.\]
In order to show ${f^{\prime }}(x)\lt 0$, we need to decipher ${I_{2}}$, which is more complicated. The main idea is to show that ${f^{\prime }}(x)$ is analytic for all $x\ge 0$ and then to show that there is a strictly negative analytic function lying between ${f^{\prime }}(x)$ and 0. We begin with Lemma 3.1, showing that $\frac{\Phi }{\varphi }$ is a Dawson-like function and therefore analytic. In Lemma 3.2 the integrals appearing in the series expansion of ${I_{2}}$ are computed term-wise. The terms of the series expression for ${I_{2}}$ are given inductively in Lemma 3.3, and these are then inserted to give explicit expressions for the terms of ${f^{\prime }}(x)\exp ({x^{2}}/2)$ in Lemma 3.4. Lemma 3.5 then gives a lower bound on the odd terms of ${f^{\prime }}(x)\exp ({x^{2}}/2)$, and Lemma 3.7 gives an upper bound. In Lemma 3.8 we show that the terms of the series expression of ${f^{\prime }}(x)\exp ({x^{2}}/2)$ are eventually decreasing in absolute value, and in combination with Corollary 3.6 it is concluded that ${f^{\prime }}(x)\exp ({x^{2}}/2)$ is entire. The proof is finished by comparing ${f^{\prime }}(x)\exp ({x^{2}}/2)$ to a linear combination of the analytic functions studied in Lemma 3.9.
Lemma 3.1.
It holds that
\[ \frac{\Phi }{\varphi }(x)={\sum \limits_{k=0}^{\infty }}\frac{1}{(2k+1)!!}{x^{2k+1}}+\frac{\sqrt{2\pi }}{2}{\sum \limits_{k=0}^{\infty }}\frac{1}{(2k)!!}{x^{2k}}=:{\sum \limits_{k=0}^{\infty }}{d_{k}}{x^{k}}\]
where the series converges everywhere.
Proof.
Note that $\frac{\Phi }{\varphi }(x)$ is very similar to the Dawson function (originally studied in [4], see also [1] for further details) given by
\[ {D_{-}}(x)=\exp ({x^{2}}){\int _{0}^{x}}\exp (-{t^{2}})\mathrm{d}t,\]
and it has the series expansion
\[ {D_{-}}(x)={\sum \limits_{k=0}^{\infty }}\frac{{2^{k}}}{(2k+1)!!}{x^{2k+1}}.\]
Note that
\[ {\int _{0}^{x}}\exp (-\frac{{t^{2}}}{2})\mathrm{d}t=\sqrt{2}{\int _{0}^{x/\sqrt{2}}}\exp (-{u^{2}})\mathrm{d}u\]
by the variable substitution $u=t/\sqrt{2}$. Hence,
\[\begin{aligned}{}\exp (\frac{{x^{2}}}{2}){\int _{0}^{x}}\exp (-\frac{{t^{2}}}{2})\mathrm{d}t& =\sqrt{2}{D_{-}}(x/\sqrt{2})\\ {} & =\sqrt{2}{\sum \limits_{k=0}^{\infty }}\frac{{2^{k}}}{(2k+1)!!}\frac{{x^{2k+1}}}{{2^{k}}\sqrt{2}}={\sum \limits_{k=0}^{\infty }}\frac{1}{(2k+1)!!}{x^{2k+1}}.\end{aligned}\]
Moreover,
\[ {\int _{-\infty }^{0}}\exp (-\frac{{t^{2}}}{2})\mathrm{d}t=\frac{\sqrt{2\pi }}{2}\]
and hence
\[ \exp (\frac{{x^{2}}}{2}){\int _{-\infty }^{0}}\exp (-\frac{{t^{2}}}{2})\mathrm{d}t\]
has the series expansion
\[ \frac{\sqrt{2\pi }}{2}{\sum \limits_{k=0}^{\infty }}\frac{1}{k!}\frac{{x^{2k}}}{{2^{k}}}=\frac{\sqrt{2\pi }}{2}{\sum \limits_{k=0}^{\infty }}\frac{1}{(2k)!!}{x^{2k}}.\]
It follows that
\[\begin{aligned}{}\frac{\Phi }{\varphi }(x)& =\exp (\frac{{x^{2}}}{2}){\int _{-\infty }^{0}}\exp (-\frac{{t^{2}}}{2})\mathrm{d}t+\exp (\frac{{x^{2}}}{2}){\int _{0}^{x}}\exp (-\frac{{t^{2}}}{2})\mathrm{d}t\\ {} & ={\sum \limits_{k=0}^{\infty }}\frac{1}{(2k+1)!!}{x^{2k+1}}+\frac{\sqrt{2\pi }}{2}{\sum \limits_{k=0}^{\infty }}\frac{1}{(2k)!!}{x^{2k}}\end{aligned}\]
proving the claimed series representation. Finally, the everywhere convergence follows from the fact that the series of the Dawson function, as well as the exponential series, converges everywhere.  □
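As a quick numerical illustration of Lemma 3.1 (our own sanity check, not part of the proof), one may compare a truncated version of the series with a direct evaluation of Φ/φ; the truncation level is arbitrary.

```python
import numpy as np
from scipy.special import erfcx

def ratio_direct(x):
    """Phi(x)/phi(x) with phi(x) = exp(-x^2/2), Phi(x) = int_{-inf}^x phi(t) dt."""
    return np.sqrt(2 * np.pi) * 0.5 * erfcx(-x / np.sqrt(2))

def ratio_series(x, K=40):
    """Truncated series of Lemma 3.1."""
    odd, even, oddfac, evenfac = 0.0, 0.0, 1.0, 1.0
    for k in range(K):
        oddfac *= 2 * k + 1                    # (2k+1)!!
        evenfac *= (2 * k) if k > 0 else 1     # (2k)!!, with 0!! = 1
        odd += x ** (2 * k + 1) / oddfac
        even += x ** (2 * k) / evenfac
    return odd + np.sqrt(2 * np.pi) / 2 * even

for x in [0.5, 1.0, 2.0, 3.0]:
    print(x, ratio_direct(x), ratio_series(x))   # the two columns should agree
```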
Using Lemma 3.1 it follows that ${I_{2}}$ can be rewritten as
(3.3)
\[ \begin{aligned}{}{I_{2}}& ={\sum \limits_{k=0}^{\infty }}{d_{k}}{\int _{0}^{\pi }}{\phi ^{2}}\left(-{x^{3}}{\cos ^{2}}(\phi )-x+3x{\cos ^{2}}(\phi )+{x^{3}}{\cos ^{4}}(\phi )\right)\\ {} & \hspace{1em}\times {x^{k}}{\cos ^{k}}(\phi )\sin (\phi )\mathrm{d}\phi .\end{aligned}\]
Each term in Equation (3.3) involves
\[ {J_{m}}={\int _{0}^{\pi }}{\phi ^{2}}{\cos ^{m}}(\phi )\sin (\phi )\mathrm{d}\phi \]
which we will study next.
Lemma 3.2.
The sequence ${J_{m}}$ satisfies ${J_{0}}={\pi ^{2}}-4$, and for even $m\ne 0$
\[ {J_{m}}=\frac{1}{m+1}\left({\pi ^{2}}-4\frac{m!!}{(m+1)!!}\left(1+{\sum \limits_{j=0}^{\frac{m}{2}-1}}\frac{1}{{2^{2j+1}}(2j+3)}\left(\genfrac{}{}{0pt}{}{2j+1}{j}\right)\right)\right),\]
while for $m\in \mathbb{N}$ odd
\[ {J_{m}}=\frac{{\pi ^{2}}}{m+1}\left(\frac{m!!}{(m+1)!!}-1\right).\]
Proof.
For ${J_{0}}$ we observe immediately that
\[ {J_{0}}={\int _{0}^{\pi }}{\phi ^{2}}\sin (\phi )\mathrm{d}\phi ={\pi ^{2}}-4.\]
Next, let m be odd. Then integration by parts gives
\[\begin{aligned}{}{J_{m}}& ={\left[-{\phi ^{2}}\frac{{\cos ^{m+1}}(\phi )}{m+1}\right]_{0}^{\pi }}+\frac{2}{m+1}{\int _{0}^{\pi }}\phi {\cos ^{m+1}}(\phi )\mathrm{d}\phi \\ {} & =-\frac{{\pi ^{2}}}{m+1}+\frac{2}{m+1}\Bigg[\frac{m!!}{(m+1)!!}{\phi ^{2}}+\phi {\sum \limits_{j=0}^{\frac{m-1}{2}}}{\cos ^{m-2j}}(\phi )\sin (\phi )\frac{m!!}{(m-2j)!!}\\ {} & \hspace{1em}\times \frac{(m-2j-1)!!}{(m+1)!!}\Bigg]{_{0}^{\pi }}-\frac{2}{m+1}{\sum \limits_{j=0}^{\frac{m-1}{2}}}\frac{m!!}{(m-2j)!!}\frac{(m-2j-1)!!}{(m+1)!!}\\ {} & \hspace{1em}\times {\int _{0}^{\pi }}{\cos ^{m-2j}}(\phi )\sin (\phi )d\phi -\frac{2}{m+1}\frac{m!!}{(m+1)!!}{\int _{0}^{\pi }}\phi \mathrm{d}\phi \\ {} & =-\frac{{\pi ^{2}}}{m+1}+\frac{2}{m+1}\frac{m!!}{(m+1)!!}{\pi ^{2}}-\frac{1}{m+1}\frac{m!!}{(m+1)!!}{\pi ^{2}}\\ {} & =\frac{{\pi ^{2}}}{m+1}\left(\frac{m!!}{(m+1)!!}-1\right).\end{aligned}\]
Similarly, when $m\gt 0$ is even, integration by parts gives
\[\begin{aligned}{}{J_{m}}& ={\left[-{\phi ^{2}}\frac{{\cos ^{m+1}}(\phi )}{m+1}\right]_{0}^{\pi }}+\frac{2}{m+1}{\int _{0}^{\pi }}\phi {\cos ^{m+1}}(\phi )\mathrm{d}\phi \\ {} & =\frac{{\pi ^{2}}}{m+1}+\frac{2}{m+1}\Bigg[\phi \sin (\phi )\frac{m!!}{(m+1)!!}+\phi {\sum \limits_{j=0}^{\frac{m}{2}-1}}{\cos ^{m-2j}}\sin (\phi )\frac{m!!}{(m-2j)!!}\\ {} & \hspace{1em}\times \frac{(m-1-2j)!!}{(m+1)!!}\Bigg]{_{0}^{\pi }}-\frac{2}{m+1}\frac{m!!}{(m+1)!!}{\int _{0}^{\pi }}\sin (\phi )d\phi \\ {} & \hspace{1em}-\frac{2}{m+1}{\sum \limits_{j=0}^{\frac{m}{2}-1}}\frac{m!!}{(m-2j)!!}\frac{(m-1-2j)!!}{(m+1)!!}{\int _{0}^{\pi }}{\cos ^{m-2j}}(\phi )\sin (\phi )\mathrm{d}\phi \\ {} & =\frac{{\pi ^{2}}}{m+1}-\frac{4}{m+1}\frac{m!!}{(m+1)!!}\\ {} & \hspace{1em}-\frac{4}{m+1}\frac{m!!}{(m+1)!!}{\sum \limits_{j=0}^{\frac{m}{2}-1}}\frac{(m-1-2j)!!}{(m-2j)!!(m+1-2j)}\\ {} & =\frac{1}{m+1}\left({\pi ^{2}}-4\frac{m!!}{(m+1)!!}\left(1+{\sum \limits_{j=0}^{\frac{m}{2}-1}}\frac{1}{{2^{2j+1}}(2j+3)}\left(\genfrac{}{}{0pt}{}{2j+1}{j}\right)\right)\right).\end{aligned}\]
This completes the proof.  □
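The closed forms of Lemma 3.2 are likewise easy to check numerically; the following comparison against direct quadrature is our own illustration and plays no role in the argument.

```python
import numpy as np
from scipy.integrate import quad
from math import comb, pi

def double_fact(n):
    """n!! with 0!! = 1."""
    out = 1
    while n > 1:
        out *= n
        n -= 2
    return out

def J_closed(m):
    """Closed form of J_m from Lemma 3.2."""
    if m == 0:
        return pi ** 2 - 4
    if m % 2 == 1:
        return pi ** 2 / (m + 1) * (double_fact(m) / double_fact(m + 1) - 1)
    s = 1 + sum(comb(2 * j + 1, j) / (2 ** (2 * j + 1) * (2 * j + 3))
                for j in range(m // 2))
    return (pi ** 2 - 4 * double_fact(m) / double_fact(m + 1) * s) / (m + 1)

def J_numeric(m):
    val, _ = quad(lambda ph: ph ** 2 * np.cos(ph) ** m * np.sin(ph), 0.0, pi)
    return val

for m in range(8):
    print(m, J_closed(m), J_numeric(m))   # the two columns should agree
```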
Lemma 3.3.
Denote
\[ {I_{2}}={\sum \limits_{n=1}^{\infty }}{a_{n}}{x^{n}}.\]
Then the coefficients ${a_{k}}$ satisfy
\[ {a_{1}}=\frac{4\sqrt{2\pi }}{9},\hspace{1em}{a_{2}}=-\frac{7{\pi ^{2}}}{32},\]
and
\[ {a_{k}}={d_{k-1}}(3{J_{k+1}}-{J_{k-1}})+{d_{k-3}}({J_{k+1}}-{J_{k-1}}),\hspace{1em}k\ge 3.\]
Proof.
Immediately ${I_{2}}(0)=0$, so the constant term is zero. Next, we get expressions for ${a_{n}}$ in terms of ${d_{n}}$’s and ${J_{n}}$’s from Equation (3.3). Then straightforward calculations give
\[\begin{array}{l}\displaystyle {a_{1}}={d_{0}}(-{J_{0}}+3{J_{2}})=\frac{\sqrt{2\pi }}{2}\left(4-{\pi ^{2}}+\frac{3}{3}\left({\pi ^{2}}-4\frac{2}{3}\left(1+\frac{1}{6}\right)\right)\right)=\frac{4\sqrt{2\pi }}{9},\\ {} \displaystyle {a_{2}}={d_{1}}(-{J_{1}}+3{J_{3}})=\frac{{\pi ^{2}}}{4}+\frac{3{\pi ^{2}}}{4}\left(\frac{3}{8}-1\right)=-\frac{7{\pi ^{2}}}{32},\end{array}\]
and, for $k\ge 3$,
\[ {a_{k}}={d_{k-1}}(3{J_{k+1}}-{J_{k-1}})+{d_{k-3}}({J_{k+1}}-{J_{k-1}}).\]
 □
Lemma 3.4.
Denote
(3.4)
\[ {f^{\prime }}(x)\exp (\frac{{x^{2}}}{2})={\sum \limits_{k=0}^{\infty }}{c_{k}}{x^{k}}.\]
Then
\[ {c_{0}}=-\frac{{\pi ^{2}}}{2},\hspace{1em}{c_{1}}=\frac{4\sqrt{2\pi }}{9},\hspace{1em}{c_{2}}=-\frac{{\pi ^{2}}}{8},\]
for $k\ge 3$ odd
\[ {c_{k}}=\frac{2\sqrt{2\pi }}{(k+2)!!}\left(1+{\sum \limits_{j=0}^{\frac{k-1}{2}-1}}\frac{1}{{2^{2j+1}}(2j+3)}\left(\genfrac{}{}{0pt}{}{2j+1}{j}\right)-\frac{k+1}{{2^{k}}(k+2)}\left(\genfrac{}{}{0pt}{}{k}{\frac{k-1}{2}}\right)\right),\]
and for $k\ge 4$ even
\[ {c_{k}}=-\frac{{\pi ^{2}}}{(k+2)!!}.\]
Proof.
Considering the expressions for ${I_{1}}$, ${I_{2}}$, and ${I_{3}}$ gives
\[\begin{array}{l}\displaystyle {c_{0}}=-\frac{{\pi ^{2}}}{2},\\ {} \displaystyle {c_{1}}={a_{1}}=\frac{4\sqrt{2\pi }}{9},\\ {} \displaystyle {c_{2}}={a_{2}}+\frac{3{\pi ^{2}}}{32}=-\frac{{\pi ^{2}}}{8},\end{array}\]
and
\[ {c_{k}}={a_{k}}\hspace{1em}\hspace{2.5pt}\text{whenever}\hspace{2.5pt}k\ge 3,\]
where ${a_{k}}$ is given in Lemma 3.3. It follows that for $k\ge 3$ odd we have
\[\begin{aligned}{}{c_{k}}& ={d_{k-1}}(3{J_{k+1}}-{J_{k-1}})+{d_{k-3}}({J_{k+1}}-{J_{k-1}})\\ {} & =\frac{\sqrt{2\pi }}{2}\Bigg(\frac{1}{(k-1)!!}\Bigg[\frac{3}{k+2}\Bigg({\pi ^{2}}-4\frac{(k+1)!!}{(k+2)!!}\\ {} & \hspace{1em}\times \Bigg(1+{\sum \limits_{j=0}^{\frac{k+1}{2}-1}}\frac{1}{{2^{2j+1}}(2j+3)}\left(\genfrac{}{}{0pt}{}{2j+1}{j}\right)\Bigg)\Bigg)\\ {} & \hspace{1em}-\frac{1}{k}\Bigg({\pi ^{2}}-4\frac{(k-1)!!}{k!!}\Bigg(1+{\sum \limits_{j=0}^{\frac{k-1}{2}-1}}\frac{1}{{2^{2j+1}}(2j+3)}\left(\genfrac{}{}{0pt}{}{2j+1}{j}\right)\Bigg)\Bigg)\Bigg]\\ {} & \hspace{1em}+\frac{1}{(k-3)!!}\Bigg[\frac{1}{k+2}\Bigg({\pi ^{2}}-4\frac{(k+1)!!}{(k+2)!!}\\ {} & \hspace{1em}\times \Bigg(1+{\sum \limits_{j=0}^{\frac{k+1}{2}-1}}\frac{1}{{2^{2j+1}}(2j+3)}\left(\genfrac{}{}{0pt}{}{2j+1}{j}\right)\Bigg)\Bigg)\\ {} & \hspace{1em}-\frac{1}{k}\Bigg({\pi ^{2}}-4\frac{(k-1)!!}{k!!}\Bigg(1+{\sum \limits_{j=0}^{\frac{k-1}{2}-1}}\frac{1}{{2^{2j+1}}(2j+3)}\left(\genfrac{}{}{0pt}{}{2j+1}{j}\right)\Bigg)\Bigg)\Bigg]\Bigg),\end{aligned}\]
and after simplifying we have
\[ {c_{k}}=\frac{2\sqrt{2\pi }}{(k+2)!!}\left(1+{\sum \limits_{j=0}^{\frac{k-1}{2}-1}}\frac{1}{{2^{2j+1}}(2j+3)}\left(\genfrac{}{}{0pt}{}{2j+1}{j}\right)-\frac{k+1}{{2^{k}}(k+2)}\left(\genfrac{}{}{0pt}{}{k}{\frac{k-1}{2}}\right)\right).\]
Similarly for $k\ge 4$ even we get
\[\begin{aligned}{}{c_{k}}& ={d_{k-1}}(3{J_{k+1}}-{J_{k-1}})+{d_{k-3}}({J_{k+1}}-{J_{k-1}})\\ {} & =\frac{{\pi ^{2}}}{(k-1)!!}\left[\frac{3}{k+2}\left(\frac{(k+1)!!}{(k+2)!!}-1\right)-\frac{1}{k}\left(\frac{(k-1)!!}{k!!}-1\right)\right]\\ {} & \hspace{1em}+\frac{{\pi ^{2}}}{(k-3)!!}\left[\frac{1}{k+2}\left(\frac{(k+1)!!}{(k+2)!!}-1\right)-\frac{1}{k}\left(\frac{(k-1)!!}{k!!}-1\right)\right]\\ {} =& -\frac{{\pi ^{2}}}{(k+2)!!}.\end{aligned}\]
This completes the proof.  □
Lemma 3.5.
For all $k\ge 3$ odd, set
(3.5)
\[ S(k)=1+{\sum \limits_{j=0}^{\frac{k-1}{2}-1}}\frac{1}{{2^{2j+1}}(2j+3)}\left(\genfrac{}{}{0pt}{}{2j+1}{j}\right)-\frac{k+1}{{2^{k}}(k+2)}\left(\genfrac{}{}{0pt}{}{k}{\frac{k-1}{2}}\right).\]
Then $S(k)$ is increasing and satisfies $2\sqrt{2\pi }S(k)\ge \pi $.
Proof.
First note that
\[\begin{aligned}{}2\sqrt{2\pi }S(3)& =2\sqrt{2\pi }\left(1+\frac{1}{6}\left(\genfrac{}{}{0pt}{}{1}{0}\right)-\frac{4}{5}\frac{1}{{2^{3}}}\left(\genfrac{}{}{0pt}{}{3}{1}\right)\right)\\ {} & =2\sqrt{2\pi }\left(1+\frac{1}{6}-\frac{3}{10}\right)\\ {} & =\frac{26\sqrt{2\pi }}{15}\ge \pi .\end{aligned}\]
It remains to show that $S(k)$ is an increasing function. We have
\[\begin{aligned}{}S(k+2)& =1+{\sum \limits_{j=0}^{\frac{k-1}{2}}}\frac{1}{{2^{2j+1}}(2j+3)}\left(\genfrac{}{}{0pt}{}{2j+1}{j}\right)-\frac{k+3}{{2^{k+2}}(k+4)}\left(\genfrac{}{}{0pt}{}{k+2}{\frac{k+1}{2}}\right)\\ {} & =S(k)+\frac{k+1}{{2^{k}}(k+2)}\left(\genfrac{}{}{0pt}{}{k}{\frac{k-1}{2}}\right)+\frac{1}{{2^{k}}(k+2)}\left(\genfrac{}{}{0pt}{}{k}{\frac{k-1}{2}}\right)\\ {} & \hspace{1em}-\frac{k+3}{{2^{k+2}}(k+4)}\left(\genfrac{}{}{0pt}{}{k+2}{\frac{k+1}{2}}\right)\\ {} & =S(k)+\frac{1}{{2^{k}}}\left(\genfrac{}{}{0pt}{}{k}{\frac{k-1}{2}}\right)-\frac{k+3}{{2^{k+2}}(k+4)}\left(\genfrac{}{}{0pt}{}{k+2}{\frac{k+1}{2}}\right)\\ {} & \ge S(k)+\frac{1}{{2^{k}}}\left(\genfrac{}{}{0pt}{}{k}{\frac{k-1}{2}}\right)-\frac{1}{{2^{k+2}}}\left(\genfrac{}{}{0pt}{}{k+2}{\frac{k+1}{2}}\right)\\ {} & =S(k)+\frac{(k+1)!}{{2^{k+2}}\left(\frac{k+1}{2}\right)!\left(\frac{k+3}{2}\right)!}\left(k+3-(k+2)\right)\\ {} & =S(k)+\frac{(k+1)!}{{2^{k+2}}\left(\frac{k+1}{2}\right)!\left(\frac{k+3}{2}\right)!}\end{aligned}\]
and hence $S(k+2)\ge S(k)$. This completes the proof.  □
Corollary 3.6.
For $x\gt 0$ the series
\[ {\sum \limits_{k=0}^{\infty }}{c_{k}}{x^{k}},\]
where the coefficients ${c_{k}}$ are determined by (3.4), is alternating.
Proof.
For k even we clearly have ${c_{k}}\lt 0$. On the other hand, for k odd ${c_{k}}$ is positive since
\[ {c_{k}}\ge \frac{\pi }{(k+2)!!}\]
by Lemma 3.5.  □
The following result shows that the sum in Lemma 3.5 can also be bounded from above.
Lemma 3.7.
For all $k\ge 3$ odd, $S(k)$ defined by (3.5) satisfies $2\sqrt{2\pi }S(k)\le {\pi ^{2}}$.
Proof.
By elementary manipulations as in the proof of Lemma 3.5 we obtain
\[ S(k+2)=S(k)+\frac{2\sqrt{2\pi }(k+3)}{{2^{k+2}}(k+2)(k+4)}\left(\genfrac{}{}{0pt}{}{k+2}{\frac{k+1}{2}}\right).\]
From the Stirling approximation we infer
\[ \left(\genfrac{}{}{0pt}{}{k+2}{\frac{k+1}{2}}\right)\le \sqrt{\frac{2}{\pi }}\frac{{2^{k+2}}}{\sqrt{(k+2)}},\]
and hence
\[ S(k+2)\le S(k)+\frac{\sqrt{2}}{\sqrt{\pi }{(k+2)^{3/2}}}.\]
Moreover,
\[ S(3)=1+\frac{1}{6}-\frac{3}{10}\le 1+\frac{1}{{3^{3/2}}}\]
allows us to get the estimates
\[ \underset{k\to \infty }{\lim }2\sqrt{2\pi }S(k)\le 4{\sum \limits_{k=0}^{\infty }}\frac{1}{{(2k+1)^{3/2}}}\le {\pi ^{2}}.\]
Since S is increasing by Lemma 3.5, it follows that $2\sqrt{2\pi }S(k)\le {\pi ^{2}}$ for all odd $k\ge 3$. This completes the proof.  □
Lemma 3.8.
Let the coefficients ${c_{k}}$, $k=1,2,\dots $, be given by (3.4) and let $x\gt 0$ be fixed. Then the absolute values of the terms in the series
\[ {\sum \limits_{k=M}^{\infty }}{c_{k}}{x^{k}}\]
decrease monotonically for M large enough.
Proof.
Consider first the case when k is odd. Then ${c_{k+1}}$ has an even index, so $|{c_{k+1}}|={\pi ^{2}}/(k+3)!!$, while Lemma 3.5 gives the lower bound $|{c_{k}}|\ge \pi /(k+2)!!$. Hence
\[ \left|\frac{{c_{k+1}}{x^{k+1}}}{{c_{k}}{x^{k}}}\right|\le \frac{\frac{{\pi ^{2}}}{(k+3)!!}}{\frac{\pi }{(k+2)!!}}x=\frac{(k+2)!!}{(k+3)!!}\pi x,\]
which is less than one for sufficiently large k (depending on x). Similarly, for k even we have $|{c_{k}}|={\pi ^{2}}/(k+2)!!$, while Lemma 3.7 gives the upper bound $|{c_{k+1}}|\le {\pi ^{2}}/(k+3)!!$, so that
\[ \left|\frac{{c_{k+1}}{x^{k+1}}}{{c_{k}}{x^{k}}}\right|\le \frac{\frac{{\pi ^{2}}}{(k+3)!!}}{\frac{{\pi ^{2}}}{(k+2)!!}}x=\frac{(k+2)!!}{(k+3)!!}x,\]
which again is less than one for large enough k. This yields the claim.  □
Lemma 3.9.
Denote
\[ M(x)={\sum \limits_{k=0}^{\infty }}\frac{{x^{2k}}}{(2k+2)!!}\]
and
\[ N(x)={\sum \limits_{k=0}^{\infty }}\frac{{x^{2k+1}}}{(2k+3)!!}.\]
Then
\[ M(x)\gt N(x)\]
for all $x\ge 0$.
Proof.
By differentiating, we get
\[\begin{aligned}{}{M^{\prime }}(x)& ={\sum \limits_{k=0}^{\infty }}\frac{(2k+2){x^{2k+1}}}{(2k+4)!!}={\sum \limits_{k=0}^{\infty }}\frac{{x^{2k+1}}}{(2k+3)!!}\frac{(k+1)}{{2^{2k+2}}}\left(\genfrac{}{}{0pt}{}{2k+3}{k+1}\right).\end{aligned}\]
Now using
\[ \left(\genfrac{}{}{0pt}{}{2k+3}{k+1}\right)\ge \frac{{2^{2k+2}}}{\sqrt{k+1}}\]
gives
\[ {M^{\prime }}(x)\ge {\sum \limits_{k=0}^{\infty }}\frac{\sqrt{k+1}}{(2k+3)!!}{x^{2k+1}}\ge N(x).\]
Similarly, it holds that
\[ {N^{\prime }}(x)={\sum \limits_{k=0}^{\infty }}\frac{(2k+1){x^{2k}}}{(2k+3)!!}={\sum \limits_{k=0}^{\infty }}\frac{{x^{2k}}}{(2k+2)!!}\frac{(2k+1){2^{2k+2}}}{(k+2)\left(\genfrac{}{}{0pt}{}{2k+3}{k+1}\right)}.\]
In this case we can use
\[ \left(\genfrac{}{}{0pt}{}{2k+3}{k+1}\right)\ge \frac{{2^{2k+3}}}{2k+3}\]
leading to
\[ {N^{\prime }}(x)\le {\sum \limits_{k=0}^{\infty }}\frac{{x^{2k}}}{(2k+2)!!}\frac{k+\frac{1}{2}}{(k+2)(2k+3)}\le M(x).\]
Combining the two bounds above gives us
\[\begin{aligned}{}\frac{{\mathrm{d}^{}}}{\mathrm{d}{x^{}}}\left({M^{2}}(x)-{N^{2}}(x)\right)& =2M(x){M^{\prime }}(x)-2N(x){N^{\prime }}(x)\\ {} & \ge M(x)N(x)-M(x)N(x)=0.\end{aligned}\]
Consequently, ${M^{2}}-{N^{2}}$ is an increasing function for $x\ge 0$, which leads to
\[ {M^{2}}(x)-{N^{2}}(x)\ge {M^{2}}(0)-{N^{2}}(0)=1.\]
It follows that
\[ M(x)\ge \sqrt{1+{N^{2}}(x)}\gt N(x),\]
and the proof is complete.  □
We are finally in the position to prove Proposition 2.7.
Proof of Proposition 2.7.
By Corollary 3.6 and Lemma 3.8 the series expansion (3.4) for ${f^{\prime }}(x)\exp ({x^{2}}/2)$ is convergent by the Leibniz alternating series test. Since the radius of convergence is unbounded, it is analytic, and thus we may split the series into its positive part and negative part, given by
\[ {\sum \limits_{k=0}^{\infty }}{c_{2k}}{x^{2k}}=:Q(x),\hspace{2em}{\sum \limits_{k=0}^{\infty }}{c_{2k+1}}{x^{2k+1}}=:P(x).\]
By Lemma 3.4 it holds that $Q(x)=-{\pi ^{2}}M(x)$ where $M(x)$ is as in Lemma 3.9. Now Lemma 3.7 gives
\[ {c_{2k+1}}\le \frac{{\pi ^{2}}}{(2k+3)!!},\]
and hence
\[ P(x)\le {\pi ^{2}}N(x),\]
where $N(x)$ is as in Lemma 3.9. Therefore
\[ {f^{\prime }}(x)\exp ({x^{2}}/2)=Q(x)+P(x)\le -{\pi ^{2}}M(x)+{\pi ^{2}}N(x).\]
Applying Lemma 3.9 to the above inequality now gives
\[ {f^{\prime }}(x)\exp (\frac{{x^{2}}}{2})\lt 0,\]
and thus ${f^{\prime }}(x)\lt 0$ for all $x\ge 0$, where f is given by (3.2). The claim follows from this.  □

A Some results from differential geometry

Here we introduce some classical concepts about manifolds and their properties used in the paper.
Let M be a smooth and compact n-dimensional manifold embedded in ${\mathbb{R}^{k}}$. At each point, the tangent space, ${T_{x}}M$, is equipped with the Euclidean inner product (Riemannian metric) inherited from the ambient space ${\mathbb{R}^{k}}$, making M an isometrically embedded Riemannian manifold. The Riemannian metric, denoted $\left\langle \cdot ,\cdot \right\rangle $, induces a notion of length along smooth curves, $\gamma :[a,b]\to M$, by the formula
\[ L(\gamma )={\int _{a}^{b}}\left\| \dot{\gamma }(t)\right\| \mathrm{d}t.\]
Given two points, or equivalently a starting point and a starting velocity, the curve which minimizes this length is called a geodesic. The map ${\exp _{x}}:{T_{x}}M\to M$ is called the Riemannian exponential map centered at $x\in M$. Given a tangent vector ${v_{x}}\in {T_{x}}M$, let γ be the unit speed geodesic such that $\gamma (0)=x$ and $\dot{\gamma }(0)=\frac{{v_{x}}}{\left\| {v_{x}}\right\| }$; then
\[ {\exp _{x}}({v_{x}})=\gamma (\left\| {v_{x}}\right\| ).\]
We shall assume that M is geodesically complete, meaning that the exponential map is defined on the whole ${T_{x}}M$ for all $x\in M$. Denote the inverse of ${\exp _{p}}$ (restricted to all points y for which there is a unique geodesic connecting y to p) by ${\log _{p}}$. The distance function, $\operatorname{dist}:M\times M\to [0,\infty )$ (which is a topological metric) is then defined as $\operatorname{dist}(x,y)=\left\| {\log _{x}}(y)\right\| $. The Riemannian metric also gives a way to measure the volume (and orientation) of a parallelotope inside the tangent space, i.e. a rescaling of the determinant from linear algebra. This generalized determinant shall be referred to as ${\operatorname{dVol}_{M}}$, or the Riemannian volume form, which locally looks like the Lebesgue measure. By a vector field, we mean a smooth assignment of a point $x\in M$ to a tangent vector in ${T_{x}}M$, and the space of vector fields is denoted by $\mathfrak{X}(M)$. For a given smooth function $f\in {C^{\infty }}(M)$, a vector field X acts as a derivation on f in the following sense. Take a smooth curve $\gamma :(-\varepsilon ,\varepsilon )\to M$ such that $\gamma (0)=p$, $\dot{\gamma }(0)=X(p)$. Then the function $X(f):M\to \mathbb{R}$ is point-wise defined by $X(f)(p)={(f\circ \gamma )^{\prime }}(0)$.
Note that for given vector fields $X,Y:M\to TM$, the expression $\left\langle X,Y\right\rangle $ is a smooth function $M\to \mathbb{R}$. In order to differentiate vector fields, a notion of connection is required. Here we shall use the Levi-Civita connection $\nabla :\mathfrak{X}(M)\times \mathfrak{X}(M)\to \mathfrak{X}(M)$, which is uniquely defined by the following identities (a small numerical illustration on ${\mathbb{S}^{2}}$ follows the list):
  • i) ${\nabla _{fX+gY}}Z=f{\nabla _{X}}Z+g{\nabla _{Y}}Z$
    for all smooth functions $f,g\in {\mathcal{C}^{\infty }}(M)$ and all smooth vector fields
    $X,Y,Z\in \mathfrak{X}(M)$;
  • ii) ${\nabla _{X}}(gY+hZ)=g{\nabla _{X}}Y+X(g)Y+h{\nabla _{X}}Z+X(h)Z$
    for all smooth functions $g,h\in {\mathcal{C}^{\infty }}(M)$ and all smooth vector fields
    $X,Y,Z\in \mathfrak{X}(M)$;
  • iii) $X(\left\langle Y,Z\right\rangle )=\left\langle {\nabla _{X}}(Y),Z\right\rangle +\left\langle Y,{\nabla _{X}}(Z)\right\rangle $
    for all smooth vector fields $X,Y,Z\in \mathfrak{X}(M)$;
  • iv) $({\nabla _{X}}Y-{\nabla _{Y}}X)(f)=X(Y(f))-Y(X(f))$
    as derivations for all smooth functions $f\in {\mathcal{C}^{\infty }}(M)$ and all smooth vector fields $X,Y\in \mathfrak{X}(M)$.
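For an isometrically embedded manifold such as ${\mathbb{S}^{2}}$, the Levi-Civita connection admits a concrete description: $({\nabla _{X}}Y)(p)$ is the orthogonal projection onto ${T_{p}}M$ of the ambient directional derivative of Y along $X(p)$. The sketch below illustrates this on ${\mathbb{S}^{2}}$ with a crude finite difference; the helper names and the step size are our own choices.
```python
import numpy as np

def proj(p):
    """Orthogonal projection of R^3 onto the tangent plane T_p S^2."""
    return np.eye(3) - np.outer(p, p)

def nabla(Y, p, Xp, h=1e-6):
    """Finite-difference approximation of (nabla_X Y)(p) on S^2: the tangential
    part of the ambient directional derivative of the vector field Y along X(p)."""
    q = p + h * Xp
    q = q / np.linalg.norm(q)      # a curve on S^2 through p with velocity Xp at t = 0
    return proj(p) @ ((Y(q) - Y(p)) / h)

# Example: Y(p) is the tangential part of a fixed ambient vector a, i.e. Y(p) = a - <p, a> p.
# A short computation gives (nabla_X Y)(p) = -<p, a> X(p), which the approximation reproduces.
a = np.array([1.0, 2.0, 3.0])
Y = lambda p: a - np.dot(p, a) * p
p = np.array([1.0, 0.0, 0.0])
Xp = np.array([0.0, 1.0, 0.0])     # a tangent vector at p
print(nabla(Y, p, Xp))             # approximately (0, -1, 0), i.e. -<p, a> Xp
```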
Lastly, let ${P_{q,\ell }}:{T_{q}}M\to {T_{\ell }}M$ denote the parallel transport from q to ℓ. That is, ${P_{q,\ell }}({v_{q}})$ is the unique tangent vector $\tau (\operatorname{dist}(q,\ell ))\in {T_{\ell }}M$, where τ satisfies the initial value problem
\[ {\nabla _{\dot{\gamma }}}\tau (t)=0,\hspace{2em}\tau (0)={v_{q}},\]
where $\gamma :[0,\operatorname{dist}(q,\ell )]\to M$ is the unit speed geodesic connecting q to ℓ and τ is a vector field along γ, i.e. $\tau (t)={\tau _{\gamma (t)}}$. It is worthwhile to consider the parallel transport of the Riemannian logarithm map
\[ q\longmapsto {P_{q,\ell }}{\log _{q}}(\nu ),\]
where $\nu ,\ell \in M$ are fixed points. This map has the Taylor expansion around $q=\ell $ given by
(A.1)
\[ {P_{q,\ell }}{\log _{q}}(\nu )={\log _{\ell }}(\nu )+{\nabla _{{\log _{\ell }}(q)}}{\log _{\ell }}(\nu )+\mathcal{O}(\operatorname{dist}{(q,\ell )^{2}}).\]
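On ${\mathbb{S}^{2}}$, parallel transport along the minimizing geodesic from q to ℓ is the rotation about the axis $q\times \ell $ by the angle $\operatorname{dist}(q,\ell )$. The sketch below (our own helper, not code from the paper) realizes ${P_{q,\ell }}$ via Rodrigues' rotation formula and checks that the transported vector is tangent at ℓ and has unchanged norm, as metric compatibility requires.
```python
import numpy as np

def parallel_transport(q, l, v):
    """Parallel transport of a tangent vector v in T_q S^2 to T_l S^2 along the
    minimizing geodesic: the rotation about q x l by the angle dist(q, l)."""
    axis = np.cross(q, l)
    s = np.linalg.norm(axis)
    if s < 1e-12:                   # q = l (nothing to do) or q = -l (transport not unique)
        return v.copy()
    n = axis / s
    theta = np.arccos(np.clip(np.dot(q, l), -1.0, 1.0))
    # Rodrigues' rotation formula applied to v.
    return (np.cos(theta) * v
            + np.sin(theta) * np.cross(n, v)
            + (1.0 - np.cos(theta)) * np.dot(n, v) * n)

# Check: the transported vector is tangent at l and has the same norm as v.
rng = np.random.default_rng(1)
q = rng.normal(size=3); q /= np.linalg.norm(q)
l = rng.normal(size=3); l /= np.linalg.norm(l)
v = rng.normal(size=3); v -= np.dot(v, q) * q       # a tangent vector at q
w = parallel_transport(q, l, v)
assert np.isclose(np.linalg.norm(w), np.linalg.norm(v))
assert abs(np.dot(w, l)) < 1e-10
```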

References

[1] 
Abramowitz, M., Stegun, I.A.: Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, vol. 55. US Government Printing Office (1968) MR0415956
[2] 
Ahidar-Coutrix, A., Le Gouic, T., Paris, Q.: Convergence rates for empirical barycenters in metric spaces: curvature, convexity and extendable geodesics. Probab. Theory Relat. Fields 177(1), 323–368 (2020) MR4095017. https://doi.org/10.1007/s00440-019-00950-0
[3] 
Chang, T.: Spherical regression and the statistics of tectonic plate reconstructions. Int. Stat. Rev., 299–316 (1993)
[4] 
Dawson, H.G.: On the numerical value of ${\textstyle\int _{0}^{h}}{e^{{x^{2}}}}dx$. Proc. Lond. Math. Soc. s1-29(1), 519–522 (1897) MR1576451. https://doi.org/10.1112/plms/s1-29.1.519
[5] 
Fisher, N.I., Lewis, T., Embleton, B.J.J.: Statistical Analysis of Spherical Data. Cambridge University Press (1993) MR1247695
[6] 
Gao, F., Chia, K., Machin, D.: On the evidence for seasonal variation in the onset of acute lymphoblastic leukemia (ALL). Leuk. Res. 31(10), 1327–1338 (2007)
[7] 
Hernandez-Stumpfhauser, D., Breidt, F.J., van der Woerd, M.J.: The general projected normal distribution of arbitrary dimension: Modeling and Bayesian inference. Bayesian Anal. 12(1), 113–133 (2017) MR3597569. https://doi.org/10.1214/15-BA989
[8] 
Hotz, T., Le, H., Wood, A.T.A.: Central limit theorem for intrinsic Fréchet means in smooth compact Riemannian manifolds. Probab. Theory Relat. Fields (2024) MR4771114. https://doi.org/10.1007/s00440-024-01291-3
[9] 
Hussien, M.T., Seddik, K.G., Gohary, R.H., Shaqfeh, M., Alnuweiri, H., Yanikomeroglu, H.: Multi-resolution broadcasting over the Grassmann and Stiefel manifolds. In: 2014 IEEE International Symposium on Information Theory, pp. 1907–1911 (2014). IEEE
[10] 
Kells, L.M., Kern, W.F., Bland, J.R.: Plane and Spherical Trigonometry. US Armed Forces Institute (1940) MR1524230. https://doi.org/10.2307/2302987
[11] 
Lang, M., Feiten, W.: MPG – fast forward reasoning on 6 DoF pose uncertainty. In: ROBOTIK 2012; 7th German Conference on Robotics, pp. 1–6 (2012). VDE
[12] 
Levitt, M.: A simplified representation of protein conformations for rapid simulation of protein folding. J. Mol. Biol. 104(1), 59–107 (1976)
[13] 
Li, K., Frisch, D., Radtke, S., Noack, B., Hanebeck, U.D.: Wavefront orientation estimation based on progressive Bingham filtering. In: 2018 Sensor Data Fusion: Trends, Solutions, Applications (SDF), pp. 1–6 (2018). IEEE
[14] 
Nuñez-Antonio, G., Ausín, M.C., Wiper, M.P.: Bayesian nonparametric models of circular variables based on Dirichlet process mixtures of normal distributions. J. Agric. Biol. Environ. Stat. 20, 47–64 (2015) MR3334466. https://doi.org/10.1007/s13253-014-0193-y
[15] 
Nuñez-Antonio, G., Gutiérrez-Peña, E.: A Bayesian analysis of directional data using the projected normal distribution. J. Appl. Stat. 32, 995–1001 (2005) MR2221902. https://doi.org/10.1080/02664760500164886
[16] 
Nuñez-Antonio, G., Gutiérrez-Peña, E., Escarela, G.: A Bayesian regression model for circular data based on the projected normal distribution. Stat. Model. 11, 185–201 (2011) MR2849683. https://doi.org/10.1177/1471082X1001100301
[17] 
Pennec, X.: Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements. J. Math. Imaging Vis. 25, 127–154 (2006) MR2254442. https://doi.org/10.1007/s10851-006-6228-4
[18] 
Pennec, X., Arsigny, V.: Exponential barycenters of the canonical Cartan connection and invariant means on Lie groups. In: Matrix Information Geometry, pp. 123–166. Springer (2012) MR2964451. https://doi.org/10.1007/978-3-642-30232-9_7
[19] 
Pitaval, R., Tirkkonen, O.: Joint Grassmann-Stiefel quantization for MIMO product codebooks. IEEE Trans. Wirel. Commun. 13(1), 210–222 (2013)
[20] 
Presnell, B., Morrison, S.P., Littell, R.C.: Projected multivariate linear models for directional data. J. Am. Stat. Assoc. 93, 1068–1077 (1998) MR1649201. https://doi.org/10.2307/2669850
[21] 
Seddik, K.G., Gohary, R.H., Hussien, M.T., Shaqfeh, M., Alnuweiri, H., Yanikomeroglu, H.: Multi-resolution multicasting over the Grassmann and Stiefel manifolds. IEEE Trans. Wirel. Commun. 16(8), 5296–5310 (2017)
[22] 
Sveier, A., Egeland, O.: Pose estimation using dual quaternions and moving horizon estimation. IFAC-PapersOnLine 51(13), 186–191 (2018)
[23] 
Tanabe, A., Fukumizu, K., Oba, S., Takenouchi, T., Ishii, S.: Parameter estimation for von Mises–Fisher distributions. Comput. Stat. 22, 145–157 (2007) MR2299252. https://doi.org/10.1007/s00180-007-0030-7
[24] 
Wang, F., Gelfand, A.E.: Directional data analysis under the general projected normal distribution. Stat. Methodol. 10(1), 113–127 (2013) MR2974815. https://doi.org/10.1016/j.stamet.2012.07.005
Copyright
© 2025 The Author(s). Published by VTeX
Open access article under the CC BY license.

Keywords
Spherical statistics, projected normal, parameter estimation

MSC2010
62H11, 62F10

Funding
This project has received partial funding from Huawei Technologies.

Algorithm 1. Estimating intrinsic average and covariance for a sample ${y_{\ell }}$ on a manifold M.
Fig. 1. A normally distributed random variable $X\stackrel{\mathrm{d}}{=}N(\mu ,\Sigma )$, with the covariance matrix Σ such that the mean μ is not an eigenvector for Σ, for which $\mathbb{E}[\operatorname{pr}(X)]\ne \operatorname{pr}(\mu )$.
Fig. 2. A plot of the scalar variance of $\operatorname{pr}(X)$, i.e. $f({\sigma ^{2}})$ from Proposition 2.7, where $X\stackrel{\mathrm{d}}{=}N(\mu ,{\sigma ^{2}}{I_{3}})$ and $\mu \in {\mathbb{S}^{2}}$ is arbitrary. The red line indicates the upper bound $({\pi ^{2}}-4)/4={\lim \nolimits_{\sigma \to \infty }}\operatorname{tr}(\operatorname{Cov}(\operatorname{pr}(X)))/2$.
Fig. 3. A box plot of the estimator $\hat{\lambda }$ in Theorem 2.8, computed from the empirical covariance, Equation (2.2), for L (ranging from 5 to 100) randomly generated projected normal random variables with ${\sigma ^{2}}=1$ and $\mu ={(0,0,1)^{T}}$; for each L this is repeated 1000 times. The true underlying scalar variance, $\lambda ={f^{-1}}(\operatorname{tr}(\operatorname{Cov}(\operatorname{pr}(X)))/2)=1$, is shown in red. Some outliers are omitted for readability of the graph.
Table 1. The absolute error of 100 repeated Monte Carlo simulations of the empirical covariance, Equation (2.2), compared to the theoretical covariance of the projected normal distribution, $\operatorname{tr}(\operatorname{Cov}(\operatorname{pr}(X)))/2$. Each simulation run uses L data points and $\sigma =1$.
L:      30      50      100     1000    ${10^{4}}$   ${10^{5}}$   ${10^{6}}$
error:  0.013   0.0080  0.0055  0.0033  0.0015       8.4e-05      2.9e-05
Theorem 2.5.
Let ${\{{\xi _{\ell }}\}_{\ell =1}^{L}}$ be L independent identically distributed random variables on a compact geodesically complete manifold M. Suppose further they have a unique mean $\mathbb{E}[{\xi _{\ell }}]=\mu $ and an isotropic covariance $\operatorname{Cov}({\xi _{\ell }})=vI$. Then,
\[ \frac{1}{L-1}{\sum \limits_{\ell =1}^{L}}{\log _{\hat{\mu }}}({\xi _{\ell }}){\log _{\hat{\mu }}}{({\xi _{\ell }})^{T}}\stackrel{\mathbb{P}}{\longrightarrow }vI\]
with the same rate of convergence as the empirical mean $\hat{\mu }$ of the sample ${\{{\xi _{\ell }}\}_{\ell =1}^{L}}$.
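On ${\mathbb{S}^{2}}$, the quantities in Theorem 2.5 can be computed with a standard fixed-point iteration for the Fréchet mean followed by the empirical covariance of the log-coordinates at the estimate, in the spirit of the paper's Algorithm 1 (whose exact steps are not reproduced here). The sketch below is our own minimal reimplementation under these assumptions, not the authors' code.
```python
import numpy as np

def sphere_exp(x, v):
    t = np.linalg.norm(v)
    return x if t < 1e-12 else np.cos(t) * x + np.sin(t) * v / t

def sphere_log(x, y):
    w = y - np.dot(x, y) * x
    nw = np.linalg.norm(w)
    if nw < 1e-12:
        return np.zeros(3)
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0)) * w / nw

def intrinsic_mean_and_cov(samples, iters=100, tol=1e-10):
    """Fréchet mean by iterating mu <- exp_mu(average of log_mu(y_l)), followed by
    the empirical covariance of the log-coordinates at the estimated mean."""
    mu = samples[0]
    for _ in range(iters):
        step = np.mean([sphere_log(mu, y) for y in samples], axis=0)
        mu = sphere_exp(mu, step)
        if np.linalg.norm(step) < tol:
            break
    logs = np.array([sphere_log(mu, y) for y in samples])
    V = logs.T @ logs / (len(samples) - 1)    # 1/(L-1) sum of log_mu(y_l) log_mu(y_l)^T
    return mu, V

# Illustration with projected N(mu, sigma^2 I_3) data, mu = (0, 0, 1), sigma = 1.
rng = np.random.default_rng(2)
X = rng.normal(loc=[0.0, 0.0, 1.0], scale=1.0, size=(1000, 3))
Y = X / np.linalg.norm(X, axis=1, keepdims=True)   # pr(X): radial projection onto S^2
mu_hat, V_hat = intrinsic_mean_and_cov(Y)
print(mu_hat, np.trace(V_hat) / 2)
```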
Theorem 2.6.
Let X be a normally distributed random variable with average $\mu \in {\mathbb{R}^{3}}$ and covariance matrix Σ, i.e. $X\stackrel{\mathrm{d}}{=}N(\mu ,\Sigma )\in {\mathbb{R}^{3}}$. If $\Sigma ={\sigma ^{2}}{I_{3}}$, then $\mathbb{E}[\operatorname{pr}(X)]=\operatorname{pr}(\mu )$.
Theorem 2.8.
Let X be a normally distributed random variable with mean $\mu \in {\mathbb{R}^{3}}$ and isotropic variance ${\sigma ^{2}}{I_{3}}$, i.e. $X\stackrel{\mathrm{d}}{=}N(\mu ,{\sigma ^{2}}{I_{3}})$. Given independent measurements $({x_{1}},{x_{2}},\dots ,{x_{L}})$ from $\operatorname{pr}(X)$, we can estimate $\lambda =\frac{{\sigma ^{2}}}{{\left\| \mu \right\| ^{2}}}$ by
\[ \hat{\lambda }={f^{-1}}\left(\frac{\operatorname{tr}(\hat{V})}{2}\right)\]
where $\hat{V}$ is the empirical covariance matrix given in Equation (2.2) and where f is the bijection given in Proposition 2.7 and defined in Equation (2.4). Moreover, it holds that
\[ \hat{\lambda }\stackrel{\mathbb{P}}{\longrightarrow }\frac{{\sigma ^{2}}}{{\left\| \mu \right\| ^{2}}}\]
as $L\to \infty $, with rate of convergence $\sqrt{L}$.
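In practice, computing $\hat{\lambda }$ only requires inverting the bijection f of Proposition 2.7 at the observed value $\operatorname{tr}(\hat{V})/2$. Since f is continuous, monotone, and bounded above by $({\pi ^{2}}-4)/4$ (cf. Fig. 2), a bisection suffices, as mentioned in the Introduction. The sketch below assumes a user-supplied callable f implementing Equation (2.4), which is not reproduced here; the assumption that f is strictly increasing with $f(0)=0$, the bracket-growing step, and the tolerances are our own choices.
```python
import numpy as np

def invert_f(f, target, lam_hi=1.0, tol=1e-12, max_iter=200):
    """Solve f(lam) = target for lam >= 0 by bisection, assuming f is continuous,
    strictly increasing, f(0) = 0 and sup f = (pi^2 - 4)/4 (cf. Fig. 2)."""
    if not 0.0 <= target < (np.pi ** 2 - 4.0) / 4.0:
        raise ValueError("target lies outside the range of f")
    while f(lam_hi) < target:      # grow the upper bracket until it encloses the target
        lam_hi *= 2.0
    lam_lo = 0.0
    for _ in range(max_iter):
        mid = 0.5 * (lam_lo + lam_hi)
        if f(mid) < target:
            lam_lo = mid
        else:
            lam_hi = mid
        if lam_hi - lam_lo < tol:
            break
    return 0.5 * (lam_lo + lam_hi)

# lambda_hat = invert_f(f, np.trace(V_hat) / 2)    # V_hat as in the previous sketch
```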
