Geometrical, Algebraic, Functional and Correlation Inequalities Applied in Support of James-Stein Estimator for Multidimensional Projections

Isoperimetric, Milman reverse, Hilbert, Widder, Fan-Taussky-Todd, Landau, and Fortuin–Kasteleyn–Ginibre (FKG) inequalities in n dimensions, used in investigations of multidimensional estimators, support the use of the James-Stein estimator over classical least squares as applied to Cumulant Analysis, Associated Random Variables, and Time Series Analysis.


Introduction
"quoting Virgil: At last they landed, where from far your eyes May view the turrets of new Carthage rise; There bought a space of ground, which Byrsa call'd, From the bull's hide they first inclos'd, and wall'd.
(Aeneid, Dryden's translation) This refers to the legend of Dido. Virgil's version has it that Dido, daughter of the king of Tyre, fled her home after her brother had killed her husband. Then she ended up on the north coast of Africa, where she bargained to buy as much land as she could enclose with an oxhide. So she cut the hide into thin strips, and then she faced, and presumably solved, the problem of enclosing the largest possible area within a given perimeter: the isoperimetric problem. But earthly factors mar the purity of the problem, for surely the clever Dido would have chosen an area by the coast so as to exploit the shore as part of the perimeter. This is essential for the mathematics as well as for the progress of the story. Virgil tells us that Aeneas, on his quest to found Rome, is shipwrecked and blown ashore at Carthage. Dido falls in love with him, but he does not return her love. He sails away and Dido kills herself. Kline concludes [23, p. 135]: 'And so an ungrateful and unreceptive man with a rigid mind caused the loss of a potential mathematician. This was the first blow to mathematics which the Romans dealt.'" [33]
From this last quote of Kline we can think, or even deduce without rigorous proof, that Dido possibly suspected some mistake, or perhaps something mysteriously unsolved, in the problem, which she could blame for her unsuccessful love.
The subject of multidimensional projections, which is equivalent to the reduction of dimensions, has a long history, and there are several difficulties in approaching it. The development of the multidimensional mathematical and statistical apparatus should not be ignored when considering a reduction in the number of dimensions to simplify a mathematical or statistical problem. A very good example is the James-Stein estimator, which depends on the number of dimensions and in more than 2 dimensions differs from the classical least squares estimator, which almost uniformly reduces every problem to a 2-dimensional (2D) consideration.
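The dependence on dimension can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes the simplest normal-means setting, X ~ N(θ, I_p) with known unit variance, where the least squares estimate of θ is X itself and the James-Stein estimate is (1 − (p − 2)/‖X‖²) X. The function names, the choice θ = (0.5, …, 0.5), and the seed are illustrative assumptions.

```python
import random

def james_stein(x):
    """James-Stein shrinkage toward the origin; for p = 2 the factor is exactly 1."""
    p = len(x)
    norm2 = sum(v * v for v in x)
    factor = 1.0 - (p - 2) / norm2
    return [factor * v for v in x]

def average_losses(theta, reps=2000, seed=0):
    """Monte Carlo averages of ||estimate - theta||^2 for the raw observation and for James-Stein."""
    rng = random.Random(seed)
    loss_raw = loss_js = 0.0
    for _ in range(reps):
        x = [t + rng.gauss(0.0, 1.0) for t in theta]
        js = james_stein(x)
        loss_raw += sum((a - t) ** 2 for a, t in zip(x, theta))
        loss_js += sum((a - t) ** 2 for a, t in zip(js, theta))
    return loss_raw / reps, loss_js / reps

# Assumed true mean: 0.5 in every coordinate (an illustrative choice).
for p in (2, 3, 10):
    raw, js = average_losses([0.5] * p)
    print(p, round(raw, 2), round(js, 2))
```

In 2 dimensions the two estimates coincide, while from 3 dimensions on the shrinkage estimate shows a smaller average squared error in this setting.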

Isoperimetric Inequalities
The isoperimetric inequality for a convex body K in $R^n$, with $B^n$ the n-dimensional ball of unit radius and S the surface area of a compact set in $R^n$, is
$\left(\frac{Vol(K)}{Vol(B^n)}\right)^{1/n} \le \left(\frac{S(K)}{S(B^n)}\right)^{1/(n-1)}$,
where the volume and surface area of the ball of radius r in $R^n$ are
$Vol(B^n(r)) = \frac{\pi^{n/2}}{\Gamma(n/2+1)}\, r^n$, $S(B^n(r)) = \frac{2\pi^{n/2}}{\Gamma(n/2)}\, r^{n-1}$.
The solution of the problem is traced back to Pythagoras (2600-2500 years ago), who first approached the problem, and then to Aristotle's (200 years later) knowledge of the maximum principle of the circle. Zenodorus (200-140 B.C.), though his monograph was lost, was the first to write a proof that the figure of maximal area with fixed perimeter in the plane is a circle. Euler, some 273 years ago, wrote a solution that is considered necessary but not sufficient, using the method that today is called Lagrange multipliers.
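As a numerical illustration, the sketch below evaluates the standard closed forms Vol(B^n(r)) = π^{n/2} r^n / Γ(n/2 + 1) and S(B^n(r)) = 2π^{n/2} r^{n-1} / Γ(n/2), and compares both sides of the inequality (Vol(K)/Vol(B^n))^{1/n} ≤ (S(K)/S(B^n))^{1/(n-1)} for the unit cube. The unit cube as test body and the function names are assumptions made for the example.

```python
import math

def ball_volume(n, r=1.0):
    """Volume of the n-dimensional ball of radius r."""
    return math.pi ** (n / 2) * r ** n / math.gamma(n / 2 + 1)

def ball_surface(n, r=1.0):
    """Surface area of the n-dimensional ball of radius r."""
    return 2 * math.pi ** (n / 2) * r ** (n - 1) / math.gamma(n / 2)

def isoperimetric_sides(vol, surf, n):
    """Left and right sides of the isoperimetric inequality for a body with given volume and surface."""
    left = (vol / ball_volume(n)) ** (1 / n)
    right = (surf / ball_surface(n)) ** (1 / (n - 1))
    return left, right

# Unit cube in R^n: volume 1, surface area 2n.
for n in range(2, 7):
    left, right = isoperimetric_sides(1.0, 2.0 * n, n)
    print(n, round(left, 4), round(right, 4))
```

For a ball of any radius the two sides coincide, which is the equality case.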
For example, the well-known Volterra function and the Cantor ternary function, defined on the interval [0, 1], are known to be continuous and rectifiable, but the arc length integral does not exist as either a proper or an improper Riemann integral. The argument uses the theorem that if $y$ and $\eta$ belong to the Euler class, then so do the variations $y + t\eta$ [3]. Gergonne, some 200 years ago, used a symmetry argument that was later used by Steiner. The first formal proof for the planar case is attributed to Steiner (170 years ago), who gave five different proofs [33][34][35][36][37].
The isoperimetric inequality can easily be derived from the Brunn-Minkowski inequality, which, however, has an inverse inequality found by Milman [48,49]. As a non-rigorous consideration, the proof for the planar case that $A \le L^2/(4\pi)$, where A denotes the area and L the length of the curve, consists of a few steps.
1. Lemma. The figure solving the isoperimetric problem is convex. The proof uses a contradiction argument.
2. Consideration of the n-polygons that consist of triangles.
3. Theorem. Only isosceles triangles should be considered.
4. Rescaling for inscribing the circle into the n-polygon.
5. According to [33], maximization of the area of two isosceles triangles with given bases a, b and base angles α, β gives sin α : sin β = a : b, which can lead to a counterexample to Zenodorus' lemma that the isosceles triangle has the greatest area and that, if the 2 isosceles triangles are not similar, the similar isosceles triangles with the same bases and with the same sum of perimeters have the greatest area. That was shown geometrically by Steiner, who says it had already been shown by Lhuilier using differential calculus.
6. In contrast to the earlier arguments by symmetry, Steiner's argument is based on the mean curve that is equally distant from the 2 original curves.
8. Another approach: for n-polygons the problem becomes to maximize the area with fixed perimeter. A year after Charles Hermite's death, Hurwitz [1] published a Fourier series proof with application of Parseval's theorem. With the parameterization $t \mapsto (x(t), y(t))$ the problem becomes
$x'^2 + y'^2 = (L/2\pi)^2$ (9)
with x, x', y, and y' as below:
$x(t) = \tfrac{1}{2} a_0 + \sum_{n \ge 1} (a_n \cos(nt) + b_n \sin(nt))$ (10)
$y(t) = \tfrac{1}{2} c_0 + \sum_{n \ge 1} (c_n \cos(nt) + d_n \sin(nt))$ (11)
$x'(t) = \sum_{n \ge 1} n\,(b_n \cos(nt) - a_n \sin(nt))$ (12)
$y'(t) = \sum_{n \ge 1} n\,(d_n \cos(nt) - c_n \sin(nt))$ (13)
After expansion, the term for n = 1 gives $a_1 = d_1$ and $b_1 = -c_1$, with the terms for n > 1 vanishing, $a_n = d_n = b_n = c_n = 0$, and the equations become the equations of a circle.
Let $b_n = a_n a_0 + a_{n-1} a_1 + \dots + a_0 a_n$, and ($\sum$ …
It was mentioned before that the isoperimetric inequality can be derived from the Brunn-Minkowski inequality, which was proposed by Hermann Brunn 10 years before Minkowski and states that, for compact sets A and B in $R^n$,
$[\mu(A + B)]^{1/n} \ge [\mu(A)]^{1/n} + [\mu(B)]^{1/n}$,
where $\mu$ denotes n-dimensional Lebesgue measure and the + on the left-hand side denotes Minkowski addition. 99 years passed before V. Milman established the reverse form: for convex bodies K and L there are volume-preserving linear maps φ and ψ from $R^n$ to itself such that, for any real numbers s, t > 0 and with a constant C,
$Vol(s\,\varphi K + t\,\psi L)^{1/n} \le C\,(s\,Vol(\varphi K)^{1/n} + t\,Vol(\psi L)^{1/n})$ (18)
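Two of the statements above can be checked numerically. The following sketch is an illustration under stated assumptions, not the construction used in the cited proofs: it computes the isoperimetric deficit L² − 4πA for regular polygons inscribed in the unit circle (the planar inequality reads L² − 4πA ≥ 0), and it compares both sides of the Brunn-Minkowski inequality Vol(A + B)^{1/n} ≥ Vol(A)^{1/n} + Vol(B)^{1/n} for axis-parallel boxes, whose Minkowski sum is again a box with added side lengths. All function names are hypothetical.

```python
import math

def polygon_area_perimeter(points):
    """Shoelace area and perimeter of a simple polygon given by its vertices in order."""
    area2 = 0.0
    perimeter = 0.0
    k = len(points)
    for i in range(k):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % k]
        area2 += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(area2) / 2.0, perimeter

def regular_polygon(k):
    """Vertices of the regular k-gon inscribed in the unit circle."""
    return [(math.cos(2 * math.pi * i / k), math.sin(2 * math.pi * i / k)) for i in range(k)]

def isoperimetric_deficit(points):
    """L^2 - 4*pi*A, which is nonnegative and vanishes only for the circle."""
    area, perimeter = polygon_area_perimeter(points)
    return perimeter ** 2 - 4 * math.pi * area

def brunn_minkowski_boxes(a, b):
    """Both sides of Brunn-Minkowski for axis-parallel boxes with side lengths a and b."""
    n = len(a)
    lhs = math.prod(x + y for x, y in zip(a, b)) ** (1 / n)
    rhs = math.prod(a) ** (1 / n) + math.prod(b) ** (1 / n)
    return lhs, rhs

for k in (3, 4, 6, 50):
    print(k, round(isoperimetric_deficit(regular_polygon(k)), 4))
print(brunn_minkowski_boxes((1.0, 2.0, 3.0), (3.0, 1.0, 2.0)))
```

The deficit shrinks as the number of sides grows, and homothetic boxes give equality in the Brunn-Minkowski comparison.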

Sampling and Functional Inequalities Related to Probability and Stochastic Inequalities
Some of the most interesting extensions are the inequalities of Cauchy and Hilbert. Some 110 years ago Hilbert, using not Cauchy's inequality but rather some more complicated methods, proved that
$\sum_m \sum_n \frac{a_m b_n}{m+n} \le C \left(\sum_m a_m^2\right)^{1/2} \left(\sum_n b_n^2\right)^{1/2}$
with $C = 2\pi$, where the summations run from 1 to +∞; the constant was later reduced by I. Schur to $C = \pi$. This inequality inspired the very famous mathematicians G. H. Hardy, J. E. Littlewood and G. Pólya so much that they not only devoted the last two chapters of their book "Inequalities" to Hilbert's double series theorem and its seemingly statistical extension, "Rearrangements", but it was also discussed by Hardy in his "Prolegomena to a chapter on inequalities," which was a Presidential Address at the annual meeting of the London Mathematical Society of 8 November 1928, one year before D. V. Widder introduced two important extensions of Hilbert's inequality. It presented 20 years of research into inequalities by the most distinguished mathematicians. "The joint author 'Hardy-Littlewood' produced 97 papers of the highest quality and was recognized as the best mathematician in the world for a decade or so. Hardy produced 279 papers of comparable quality in his own right. Littlewood produced 90 individual papers and 116 joint papers with various authors, including the 97 papers of which Hardy was a co-author." Hilbert's double series theorem possibly inspired A. Hurwitz to publish a new proof of the AM ≥ GM inequality using the sum of a function f of n real variables over all n! possible permutations of the variables. This paper, according to R. Bellman, preceded by six years his famous paper on the generation of invariants by integration over groups.
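A finite section of Hilbert's double series inequality, Σ_m Σ_n a_m b_n/(m + n) ≤ π (Σ a_m²)^{1/2} (Σ b_n²)^{1/2}, can be checked directly. The sketch below uses the test sequence a_m = b_m = m^{-1/2}, which is an assumption chosen for illustration; the function names are hypothetical.

```python
import math

def hilbert_form(a, b):
    """Finite double sum of a_m * b_n / (m + n), with indices starting at 1."""
    return sum(am * bn / (m + n)
               for m, am in enumerate(a, start=1)
               for n, bn in enumerate(b, start=1))

def hilbert_bound(a, b, C=math.pi):
    """Right-hand side C * ||a|| * ||b|| with Euclidean norms."""
    return C * math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))

N = 200
a = [1 / math.sqrt(m) for m in range(1, N + 1)]
lhs, rhs = hilbert_form(a, a), hilbert_bound(a, a)
print(round(lhs / rhs, 4))  # the ratio stays below 1 for Schur's constant pi
```

Passing `C=2 * math.pi` reproduces the weaker bound with Hilbert's original constant.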
There were some improvements to the constant C in Hilbert's inequality, such as that by H. Frazer, who showed that it can be reduced further. This later remarkable result is very important in light of the de Bruijn-Wilf type best constant for the nth finite section of Carleman's inequality,
$\sum_{k=1}^{n} (a_1 a_2 \cdots a_k)^{1/k} < C_n \sum_{k=1}^{n} a_k$, $C_n = e - \frac{2\pi^2 e}{(\ln n)^2} + O\left(\frac{1}{(\ln n)^3}\right)$,
and for Hardy's inequality. It is interesting to notice that Wilf advised de Bruijn to apply a recurrence argument to Carleman's method. The extensions of Widder, with the remark of Hardy, are inequalities in which the summations run from 0 to +∞ and the coefficient on the left-hand side is interpreted as 1/n when m = n. In this form they very much resemble the famous Landau-Kolmogorov inequality, which was introduced by E. Landau 2 years after I. Schur's improvement:
$\|f'\|^2 \le 4\, \|f\|\, \|f''\|$.
He showed this for the norm $\|f\|$ defined to be the supremum of $|f|$ over the interval; in the general form $\|f^{(k)}\| \le C(n,k)\, \|f\|^{1-k/n}\, \|f^{(n)}\|^{k/n}$ the constants C(n, k) are given in terms of zeros of polynomials. It is interesting to notice that C(n, 1) involves roots of …
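The finite sections of Carleman's inequality, Σ_{k≤n} (a_1 a_2 ⋯ a_k)^{1/k} < e Σ_{k≤n} a_k for positive a_k, can be explored numerically. The sketch below is only an illustration: the test sequence a_k = 1/k and the function name are assumptions, and the ratio it prints is a lower bound for the best finite-section constant, not the constant itself.

```python
import math

def carleman_sides(a):
    """Sum of running geometric means and plain sum for a positive sequence a_1..a_n."""
    log_sum = 0.0
    lhs = 0.0
    for k, ak in enumerate(a, start=1):
        log_sum += math.log(ak)          # log of a_1 * ... * a_k
        lhs += math.exp(log_sum / k)     # geometric mean of the first k terms
    return lhs, sum(a)

for n in (10, 100, 1000, 10000):
    lhs, rhs = carleman_sides([1.0 / k for k in range(1, n + 1)])
    print(n, round(lhs / rhs, 4))
```

The printed ratios stay below e, consistent with a best constant for finite n that is strictly smaller than e.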

PROPOSITION 1 (Governing the reduction of dimensions). As we can notice, using the supremum of a function over an interval, or a maximum or a minimum, instead of a sum of numbers completely loses the notion of multidimensionality and possibly reduces the problem to 1 dimension instead of the two-index dimensionality of a double sum.
This Proposition is a direct consequence of the above discussion.
It is quite certain that this effect appears not only in mathematical formulae but also has psychological and methodological outcomes that affect the ability to recognize multidimensionality in a problem or mathematical model as a way to solve it.
Similar notions arise in the discrete inequalities of Ky Fan, Taussky, and Todd, which date from 50 to 60 years ago.
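One commonly cited member of the Fan-Taussky-Todd family states that, for real numbers x_0 = 0, x_1, …, x_n, x_{n+1} = 0, Σ_{i=0}^{n} (x_i − x_{i+1})² ≥ 4 sin²(π/(2(n + 1))) Σ_{i=1}^{n} x_i², with equality for x_i proportional to sin(iπ/(n + 1)). The text does not say which of the inequalities it has in mind, so choosing this one is an assumption; the function name and random test vector are illustrative.

```python
import math
import random

def ftt_sides(x):
    """Both sides of the inequality for x = (x_1..x_n), with x_0 = x_{n+1} = 0 appended here."""
    n = len(x)
    padded = [0.0] + list(x) + [0.0]
    lhs = sum((padded[i] - padded[i + 1]) ** 2 for i in range(n + 1))
    rhs = 4 * math.sin(math.pi / (2 * (n + 1))) ** 2 * sum(v * v for v in x)
    return lhs, rhs

rng = random.Random(0)
random_x = [rng.uniform(-1.0, 1.0) for _ in range(8)]
extremal_x = [math.sin(i * math.pi / 9) for i in range(1, 9)]
print(ftt_sides(random_x))
print(ftt_sides(extremal_x))  # the two sides agree up to rounding
```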

Introduction of Stein's Estimators for Exponential Family [46-48]
The introduction of Stein's estimators for the exponential family of distributions is another connection between the Poisson process in dimension ≥ 3 and normal (Gaussian) distributions. If X is from the exponential family of distributions, its density is given by the functional dependence $p_\theta(x) = \exp\{\theta x - \psi(\theta)\}\, k(x)$, $x \in \mathbb{R}$. Let $t(X) = -k'(X)/k(X)$; for any absolutely continuous function g on ℝ such that $E|g'(X)| < \infty$, the following identity holds: $E[(t(X) - \theta)\, g(X)] = E[g'(X)]$. This important identity is very useful for applying the methods of linear combinations and ratios of random variables from the exponential family of distributions, to be discussed in Subsection 2.3.1, and for the future development of the theory of the Stein estimator and of different tests of hypotheses for multidimensional exponential families.
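For the normal family the identity takes a familiar form. Taking X ~ N(θ, 1), which is an assumption made for this example, the density is exp{θx − θ²/2} k(x) with k(x) = exp(−x²/2)/√(2π), so that −k'(x)/k(x) = x and the identity reads E[(X − θ) g(X)] = E[g'(X)]. The sketch below checks this by trapezoidal integration; the test function g = tanh and all names are hypothetical.

```python
import math

def normal_pdf(x, theta):
    """Density of N(theta, 1)."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2 * math.pi)

def expectation(h, theta, half_width=10.0, steps=20000):
    """Trapezoidal approximation of E[h(X)] for X ~ N(theta, 1)."""
    lo = theta - half_width
    dx = 2 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * dx
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * h(x) * normal_pdf(x, theta)
    return total * dx

def stein_identity_sides(g, g_prime, theta):
    """E[(X - theta) g(X)] and E[g'(X)], using t(x) = x for the normal case."""
    lhs = expectation(lambda x: (x - theta) * g(x), theta)
    rhs = expectation(g_prime, theta)
    return lhs, rhs

lhs, rhs = stein_identity_sides(math.tanh, lambda x: 1 / math.cosh(x) ** 2, theta=0.7)
print(round(lhs, 6), round(rhs, 6))
```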

Linear Regression Model and Stein's Estimator Compared to Least Squares Estimator for Multidimensional Exponential Families
For the regression model Y = XC + e, Y is an N x 1 vector of N observations on the variable to be explained, X is an N x K full-column-rank matrix of N observations on K fixed explanatory variables, C is a K x 1 column vector of regression coefficients, and e is an N x 1 error vector with a multivariate normal distribution with mean vector 0 and variance-covariance matrix $\sigma^2 I_N$, with $\sigma^2$ unknown. The least-squares (LS) estimator for C is $\hat{C}_{LS} = (X'X)^{-1} X'Y$. Stein's estimator for the above regression model shrinks $\hat{C}_{LS}$ by a factor that involves a scalar L. An unbiased estimator of the bias of …
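The comparison can be illustrated with a small simulation. The sketch below assumes the commonly used Stein-rule form, C_SE = (1 − a e'e / (C_LS' X'X C_LS)) C_LS with e = Y − X C_LS and a = (K − 2)/(N − K + 2), and compares average predictive losses (Ĉ − C)' X'X (Ĉ − C). This form, the design matrix, the true coefficients, and every function name are assumptions made for the example; they are not taken from the text.

```python
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [A[i][:] + [b[i]] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * k
    for i in reversed(range(k)):
        s = M[i][k] - sum(M[i][j] * x[j] for j in range(i + 1, k))
        x[i] = s / M[i][i]
    return x

def fitted(X, c):
    """Fitted values X c."""
    return [sum(xij * cj for xij, cj in zip(row, c)) for row in X]

def ls_and_stein(X, y, a):
    """Least squares estimate and the assumed Stein-rule shrinkage of it."""
    N, K = len(X), len(X[0])
    XtX = [[sum(X[i][j] * X[i][k] for i in range(N)) for k in range(K)] for j in range(K)]
    Xty = [sum(X[i][j] * y[i] for i in range(N)) for j in range(K)]
    c_ls = solve(XtX, Xty)
    fit = fitted(X, c_ls)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fit))   # e'e
    quad = sum(fi * fi for fi in fit)                      # C_LS' X'X C_LS
    shrink = 1.0 - a * rss / quad
    return c_ls, [shrink * c for c in c_ls]

def predictive_loss(X, c_hat, c_true):
    """(c_hat - c_true)' X'X (c_hat - c_true)."""
    return sum((p - q) ** 2 for p, q in zip(fitted(X, c_hat), fitted(X, c_true)))

def simulate(N=30, K=5, reps=300, seed=1):
    rng = random.Random(seed)
    c_true = [0.1] * K                                     # assumed true coefficients
    X = [[rng.gauss(0.0, 1.0) for _ in range(K)] for _ in range(N)]
    a = (K - 2) / (N - K + 2)
    mean_signal = fitted(X, c_true)
    loss_ls = loss_se = 0.0
    for _ in range(reps):
        y = [m + rng.gauss(0.0, 1.0) for m in mean_signal]
        c_ls, c_se = ls_and_stein(X, y, a)
        loss_ls += predictive_loss(X, c_ls, c_true)
        loss_se += predictive_loss(X, c_se, c_true)
    return loss_ls / reps, loss_se / reps

loss_ls, loss_se = simulate()
print(round(loss_ls, 2), round(loss_se, 2))
```

With K = 5 regressors the shrinkage estimate shows the smaller average loss in this setup; the advantage disappears when K ≤ 2, since a is then zero or negative.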

Properties of Statistics and the Uniformly Most Powerful Invariant Test
It is worth mentioning the independence of $Z = (X'X)^{1/2} \hat{C}_{LS} \sim N(\mu, \sigma^2 I_K)$, with $\mu = (X'X)^{1/2} C$, and $n s^2/\sigma^2 \sim \chi^2_n$, where $\sigma^2$ is the error variance and $s^2$ is its residual estimate with n degrees of freedom.
To test the null hypothesis $H_0$: C = 0 against the alternative hypothesis $H_1$: C ≠ 0, the uniformly most powerful invariant test statistic F has an F-distribution with K and n degrees of freedom and noncentrality parameter $D = C'X'XC/2\sigma^2$. The F-ratio based on the $\hat{C}_{SE}$ statistic is a function of F, and it is invariant to the same linear and orthogonal transformations as F is.
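The statement that the Stein-based ratio is a function of F can be made concrete under two assumptions that are not spelled out in the text: that the Stein estimator has the Stein-rule form C_SE = (1 − a e'e / (C_LS' X'X C_LS)) C_LS, and that the Stein-based ratio is formed by putting C_SE in place of C_LS in the numerator of F = (C_LS' X'X C_LS / K) / (e'e / n). Then the shrinkage factor equals 1 − (a n / K)/F and the ratio becomes F (1 − (a n / K)/F)². The function names and input values below are hypothetical.

```python
def f_statistic(quad, rss, K, n):
    """F = (quad / K) / (rss / n), with quad = C_LS' X'X C_LS and rss = e'e."""
    return (quad / K) / (rss / n)

def stein_f_statistic(quad, rss, K, n, a):
    """Assumed Stein-based ratio: C_SE replaces C_LS in the numerator of F."""
    shrink = 1.0 - a * rss / quad
    return (shrink ** 2 * quad / K) / (rss / n)

def stein_f_from_f(F, K, n, a):
    """The same ratio written as a function of F alone."""
    return F * (1.0 - (a * n / K) / F) ** 2

# Illustrative inputs: K = 5 regressors, n = 25 residual degrees of freedom.
K, n = 5, 25
a = (K - 2) / (n + 2)
F = f_statistic(12.0, 20.0, K, n)
print(F, stein_f_statistic(12.0, 20.0, K, n, a), stein_f_from_f(F, K, n, a))
```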

Simple Rules for the Reduction of Dimensions
As a result of the above discussion, which established the support for James-Stein estimators from geometrical, algebraic, functional and correlation inequalities, the following rules can be used in problems involving the reduction of dimensions:
1. A multidimensional problem should not be reduced to a problem of fewer than 3 dimensions;
2. James-Stein estimators should be preferred in place of the usual least squares estimators;
3. Other types of approaches to modeling a multidimensional problem should be explored.
As an example, below is a supporting discussion of three approaches to the Multidimensional Time Model for the Probability Distribution Function (MTM for PDF). It is concluded by a Proposition that allows a new perspective on approaching a multidimensional problem: the decomposition of a process with special properties into a composition of Brownian motion processes.

First Approach to Multidimensional Time Model Through Kramers Turnover Problem in the Theory of Velocity of Chemical Reactions
Consider first the mathematical structure of the models of Boltzmann-type kinetic equations for reacting gas mixtures. For particles undergoing inelastic interactions with reactions of bimolecular and dissociation-recombination type, this structure is very complicated because of the collisional operators, which in the full Boltzmann equations are usually expressed by 5-fold integrals. Consequently, direct numerical applications of these models present several computational difficulties. The search for a simpler solution went a long way until the introduction of the equation for Brownian motion by Albert Einstein. However, in using the theory of Brownian motion for the velocity (rate) of chemical reactions, Bohr, Kramers, and Slater used only a one-dimensional (1D) model for the Kramers turnover problem, that is, obtaining a uniform expression for the rate of escape of a particle over a barrier for any value of the external friction, until it was corrected by Grote-Hynes theory 40 years later, with new improvements following 6 years after that by Mel'nikov and Meshkov (MM). Other theories certainly followed; all of them distinguish the 1D approach from the 2D, 3D, and multidimensional approaches.
It is important and very interesting that Kramers, in his original work, allowed for the possibility that the multidimensional pattern could be related to time dimensions: basing his introduction to the theory of Brownian motion on Einstein's pattern, he considered a range of time intervals $\tau$. His discussion of the possibility of a term proportional to $\tau$ in the expression for the moments of Brownian motion $\overline{B_\tau^{\,n}}$ (n > 1) related it to the fact that "the values, which X takes at moments $t_1, t_2, \dots, t_n$ which lie sufficiently close together are no longer independent; and the moments of Brownian motion $\overline{B_\tau^{\,n}}$ (n > 1) in fact are represented by a volume integral $\int \dots \int X(t_1) X(t_2) \dots X(t_n)\, dt_1\, dt_2 \dots dt_n$ over an n-dimensional cube; the contribution to this integral due to a narrow cylinder extending along the diagonal $t_1 = t_2 = \dots = t_n$ may give a term proportional to $\tau$." [11]
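The diagonal contribution can be made explicit for n = 2. The sketch below assumes, purely for illustration, an exponentially decaying correlation E[X(t_1) X(t_2)] = exp(−|t_1 − t_2|/ε) with a short correlation time ε; this choice is not taken from Kramers' text. The integral over the square [0, τ]² then has the closed form 2ετ − 2ε²(1 − e^{−τ/ε}), which is dominated by the term 2ετ, proportional to τ, once τ is much larger than ε.

```python
import math

def second_moment_numeric(tau, eps, m=400):
    """Midpoint-rule double integral of exp(-|t1 - t2| / eps) over the square [0, tau]^2."""
    h = tau / m
    total = 0.0
    for i in range(m):
        t1 = (i + 0.5) * h
        for j in range(m):
            t2 = (j + 0.5) * h
            total += math.exp(-abs(t1 - t2) / eps)
    return total * h * h

def second_moment_exact(tau, eps):
    """Closed form of the same integral."""
    return 2 * eps * tau - 2 * eps ** 2 * (1 - math.exp(-tau / eps))

eps = 0.05  # assumed correlation time, small compared with tau
for tau in (1.0, 2.0):
    print(tau, round(second_moment_numeric(tau, eps), 5), round(second_moment_exact(tau, eps), 5))
```

Only points within a distance of order ε from the diagonal t_1 = t_2 contribute noticeably, which is the two-dimensional version of the narrow cylinder in the quotation.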

Second Approach to Multidimensional Time Model Through Cumulant Functions and Time Series Analysis (Brillinger)
To strengthen this notion, consider the properties of cumulants for time series analysis, which provide a measure of Gaussianity. If the random variable X is normal, then $cum_k\{X\} = 0$ for k > 2, where $cum_k$ denotes the joint cumulant of X with itself k times.
For simplicity, consider a sequence of i.i.d. $X_i$ with all moments finite, $E\{X_i\} = 0$ and $var\{X_i\} = 1$; then for $S_n = \sum X_i/\sqrt{n}$, $cum_k\{S_n\} = n\, cum_k\{X\}/n^{k/2}$, which tends to 0 for k > 2 as n tends to infinity, so $S_n$ has a limiting normal distribution.
For time series analysis, the moment function $E\{X(t+u_1) \dots X(t+u_{k-1}) X(t)\}$ of a stationary series does not depend on t, and on a short time interval centered at the time point t the series can be approximated by a normal distribution.
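The scaling cum_k{S_n} = n cum_k{X}/n^{k/2} can be verified exactly for a simple case. The sketch below takes X to be a standardized Bernoulli(p) variable, an assumption chosen because the sum is then binomial and its moments can be computed without simulation, and returns the third and fourth cumulants of S_n; for a mean-zero variable these are m_3 and m_4 − 3 m_2². The function name is hypothetical.

```python
import math

def standardized_binomial_cumulants(n, p):
    """Exact 3rd and 4th cumulants of S_n, the sum of n standardized Bernoulli(p) variables over sqrt(n)."""
    sd = math.sqrt(n * p * (1 - p))
    probs = [math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(n + 1)]
    values = [(j - n * p) / sd for j in range(n + 1)]
    m2 = sum(w * v ** 2 for w, v in zip(probs, values))
    m3 = sum(w * v ** 3 for w, v in zip(probs, values))
    m4 = sum(w * v ** 4 for w, v in zip(probs, values))
    return m3, m4 - 3 * m2 ** 2

k3_1, k4_1 = standardized_binomial_cumulants(1, 0.2)
for n in (1, 4, 25, 100):
    k3, k4 = standardized_binomial_cumulants(n, 0.2)
    # Compare with the predicted values cum_3{X} / sqrt(n) and cum_4{X} / n.
    print(n, round(k3, 6), round(k3_1 / math.sqrt(n), 6), round(k4, 6), round(k4_1 / n, 6))
```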

Third Approach Through Associated Random Variables [23,24]
In addition to the Brownian motion considerations in the theory of chemical reactions and the time series analysis of cumulant functions, the same results can be obtained from the consideration of associated random variables [49].
The consideration of associated random variables can be supported by the Fortuin-Kasteleyn-Ginibre (FKG) inequality, a correlation inequality which states that for a finite distributive lattice X with a nonnegative function $\mu$ on it satisfying the FKG lattice condition $\mu(x \wedge y)\, \mu(x \vee y) \ge \mu(x)\, \mu(y)$ for all $x, y \in X$, any two increasing functions f and g on X satisfy $\langle fg \rangle - \langle f \rangle \langle g \rangle \ge 0$, where $\langle \cdot \rangle$ denotes the average with respect to $\mu$. Here a finite distributive lattice has a least element O, a minimal element x ≠ O is called an atom, and the operations $\wedge$ and $\vee$ satisfy either of the following equivalent conditions for all x, y, z in X: $x \vee (y \wedge z) = (x \vee y) \wedge (x \vee z)$, $x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z)$.
Definition 1. For n > 1, the set of random variables $X = (X_1, \dots, X_n)$ is said to be associated if, for all real-valued functions $g_j$ that are increasing in each component when the other components are held fixed, the inequality $E[\prod_j g_j(X)] \ge \prod_j E[g_j(X)]$ holds or, equivalently, $Corr(g_i(X), g_j(X)) \ge 0$.
Theorem 1.
(a) A set consisting of a single random variable is a set of associated random variables.
(b) Independent random variables are associated random variables.
(c) A subset of a set of associated random variables forms a set of associated random variables.
(d) Increasing functions of associated random variables are associated random variables [24].
Proposition 2. Therefore, the process X(t) with the above properties can be represented by a composition of Brownian motion processes in a finite-dimensional time model.
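The FKG statement can be checked by brute force on a small lattice. The sketch below uses the Boolean lattice {0, 1}^4 with componentwise minimum and maximum as meet and join, and the weight μ(x) = exp(β Σ_i x_i x_{i+1}); this weight, the value β = 0.8, the two increasing test functions, and the function names are all assumptions made for illustration. For β ≥ 0 the lattice condition holds and the covariance of increasing functions is nonnegative; for β < 0 the lattice condition fails.

```python
import itertools
import math

def meet(x, y):
    """Componentwise minimum (the lattice operation x ∧ y)."""
    return tuple(min(a, b) for a, b in zip(x, y))

def join(x, y):
    """Componentwise maximum (the lattice operation x ∨ y)."""
    return tuple(max(a, b) for a, b in zip(x, y))

def mu(x, beta=0.8):
    """Assumed weight with nearest-neighbour attraction of strength beta."""
    return math.exp(beta * sum(x[i] * x[i + 1] for i in range(len(x) - 1)))

def lattice_condition_holds(points, weight):
    """Check mu(x ∧ y) mu(x ∨ y) >= mu(x) mu(y) for all pairs, up to rounding."""
    return all(weight(meet(x, y)) * weight(join(x, y)) >= weight(x) * weight(y) * (1 - 1e-12)
               for x in points for y in points)

def covariance(points, weight, f, g):
    """<fg> - <f><g> with averages taken with respect to the normalized weight."""
    z = sum(weight(x) for x in points)

    def mean(h):
        return sum(h(x) * weight(x) for x in points) / z

    return mean(lambda x: f(x) * g(x)) - mean(f) * mean(g)

points = list(itertools.product((0, 1), repeat=4))
cov = covariance(points, mu, lambda x: x[0], lambda x: sum(x))
print(lattice_condition_holds(points, mu), round(cov, 4))
```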