At the centre of statistics lies the normal distribution, known to millions of people as the bell curve, or the bell-shaped curve. This is a two-parameter family of curves that are graphs of the equation

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^{2}/(2\sigma^{2})} \qquad (1)$$
Not only is the bell curve familiar to these millions, but they also know of its main use: to describe the general, or idealized, shape of graphs of data. It has, of course, many other uses and plays as significant a role in the social sciences as differentiation does in the natural sciences.
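As a quick numerical illustration of (1), here is a minimal sketch in plain Python (standard library only; the function name normal_pdf and the sample values are my own, not taken from the article) that evaluates the density for given µ and σ.

```python
import math

def normal_pdf(x, mu, sigma):
    """The bell curve of equation (1): parameters mu (centre) and sigma (spread)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# The curve peaks at x = mu and has inflection points at mu +/- sigma.
print(normal_pdf(0.0, 0.0, 1.0))  # ~0.3989, the peak of the standard normal curve
print(normal_pdf(1.0, 0.0, 1.0))  # ~0.2420, the value one standard deviation out
```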
An approximation tool
The origins of the mathematical theory of probability are justly attributed to the famous correspondence between Fermat and Pascal, which was instigated in 1654 by the queries of the gambling Chevalier de Mere.
Among the various types of problems they considered were binomial distributions, which today would be described by:
$$\sum_{k=i}^{j} \binom{n}{k}\, p^{k} (1-p)^{n-k} \qquad (2)$$
This sum denotes the likelihood of between i and j successes in n trials with success probability p. Such a trial—now called a Bernoulli trial—is the most elementary of all random experiments. It has two outcomes, usually termed success and failure.
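A direct evaluation of the sum in (2) is a one-liner today. The sketch below uses only the Python standard library; the helper name binomial_prob and the example numbers are mine.

```python
import math

def binomial_prob(n, p, i, j):
    """The sum in (2): probability of between i and j successes (inclusive)
    in n Bernoulli trials with success probability p."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(i, j + 1))

# Example: the chance of 2 to 4 successes in 10 trials with p = 0.3.
print(binomial_prob(10, 0.3, 2, 4))  # ~0.70
```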
As the binomial examples Fermat and Pascal worked out involved only small values of n, they were not concerned with the computational challenge presented by the evaluation of general sums of this type. However, more complicated computations were not long in coming.
For example, in 1712 the Dutch mathematician Gravesande tested the hypothesis that male and female births are equally likely against the actual births in London over the 82 years 1629–1710.
A few years earlier Jacob Bernoulli had found estimates for binomial sums of the type of (2). These estimates, however, did not involve the exponential function e^x.
De Moivre began his search for such approximations in 1721. In 1733, he proved that:
$$\sum_{x=n/2-d}^{n/2+d} \binom{n}{x} \frac{1}{2^{n}} \;\approx\; \frac{2}{\sqrt{2\pi n}} \int_{n/2-d}^{n/2+d} e^{-2(x-n/2)^{2}/n}\, dx \qquad (4)$$
De Moivre also asserted that (4) could be generalized to a similar asymmetrical context, with x varying from n/2 to d + n/2. This is easily done, with the precision of the approximation clarified by De Moivre’s proof.
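The quality of the approximation in (4) is easy to check numerically. The sketch below (standard-library Python; the function names and the choice n = 100, d = 5 are mine) compares the binomial sum with the closed-form value of the integral, which reduces to an error function.

```python
import math

def exact(n, d):
    """Exact probability that the number of successes in n fair trials
    lies within d of n/2 (n even)."""
    return sum(math.comb(n, k) for k in range(n // 2 - d, n // 2 + d + 1)) / 2 ** n

def integral(n, t):
    """Integral of (2/sqrt(2*pi*n)) * exp(-2(x - n/2)^2 / n) from n/2 - t to n/2 + t,
    evaluated in closed form via the error function."""
    return math.erf(t * math.sqrt(2.0 / n))

n, d = 100, 5
print(exact(n, d))           # ~0.729
print(integral(n, d))        # ~0.683
print(integral(n, d + 0.5))  # ~0.728: stretching each limit by half a unit
                             # (the usual continuity correction) sharpens the match
```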

FIGURE 2 demonstrates how the binomial probabilities associated with 50 independent repetitions of a Bernoulli trial with probability p = 0.3 of success are approximated by such an exponential curve. De Moivre’s discovery is standard fare in all introductory statistics courses where it is called the normal approximation to the binomial and rephrased as
$$\sum_{k=i}^{j} \binom{n}{k}\, p^{k}(1-p)^{n-k} \;\approx\; \frac{1}{\sqrt{2\pi}} \int_{(i-np)/\sqrt{np(1-p)}}^{(j-np)/\sqrt{np(1-p)}} e^{-x^{2}/2}\, dx$$
Since this integral is easily evaluated by numerical methods and quite economically described by tables, it does indeed provide a very practical approximation for cumulative binomial probabilities.
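In the setting of FIGURE 2 (n = 50, p = 0.3) the approximation can be checked directly. A minimal sketch, again in standard-library Python (the function names and the interval 12 to 18 are my own choices), evaluates both sides, applying the customary continuity correction on the normal side:

```python
import math

def phi(z):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binomial_sum(n, p, i, j):
    """Exact cumulative binomial probability of i..j successes."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(i, j + 1))

def normal_approx(n, p, i, j):
    """Normal approximation with the usual continuity correction."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    return phi((j + 0.5 - mu) / sigma) - phi((i - 0.5 - mu) / sigma)

# The setting of FIGURE 2: n = 50 trials with success probability p = 0.3.
n, p = 50, 0.3
print(binomial_sum(n, p, 12, 18))   # exact probability of 12 to 18 successes
print(normal_approx(n, p, 12, 18))  # normal approximation; both values are close to 0.72
```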
The search for an error curve
Astronomy was the first science to call for accurate measurements. Consequently, it was also the first science to be troubled by measurement errors and to face the question of how to proceed in the presence of several distinct observations of the same quantity.
The first scientist to note in print that measurement errors are deserving of a systematic and scientific treatment was Galileo in his famous Dialogue Concerning the Two Chief Systems of the World—Ptolemaic and Copernican, published in 1632.
Thus, even as late as the mid-18th century doubts persisted about the value of repetition of experiments. More important, however, was Thomas Simpson's experimentation with specific error curves—probability densities that model the distribution of random errors. In two propositions, Simpson computed the probability that the error in the mean of several observations does not exceed a given bound when the individual errors take on the values −v, ..., −2, −1, 0, 1, 2, ..., v with probabilities proportional to either:
$$r^{-v},\ \dots,\ r^{-2},\ r^{-1},\ r^{0},\ r^{-1},\ r^{-2},\ \dots,\ r^{-v}$$
or
$$r^{-v},\ 2r^{-v+1},\ \dots,\ v\,r^{-1},\ (v+1)\,r^{0},\ v\,r^{-1},\ \dots,\ 2r^{-v+1},\ r^{-v}$$
Simpson’s choice of error curves may seem strange, but they were in all likelihood dictated by the state of the art of probability at that time. For r = 1 (the simplest case), these two distributions yield the two top graphs of FIGURE 4. One year later, Simpson, while effectively inventing the notion of a continuous error distribution, dealt with similar problems in the context of the error curves described at the bottom of FIGURE 4.
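For the simplest case r = 1 (errors equally likely on −v, ..., v), Simpson's quantity, the chance that the mean of several observations errs by no more than a given bound, can be computed by brute-force enumeration. The sketch below is my own illustration in standard-library Python (the parameters v = 2, m = 3, bound = 1 are arbitrary) rather than Simpson's own derivation.

```python
from itertools import product
from fractions import Fraction

def mean_error_within(v, m, bound):
    """Probability that the mean of m independent errors, each equally likely on
    {-v, ..., -1, 0, 1, ..., v} (the r = 1 case of Simpson's first distribution),
    has absolute value at most `bound`.  Brute force over all (2v+1)^m outcomes."""
    values = range(-v, v + 1)
    favourable = 0
    for errors in product(values, repeat=m):
        if abs(Fraction(sum(errors), m)) <= bound:
            favourable += 1
    return Fraction(favourable, (2 * v + 1) ** m)

# Three observations with errors in {-2, ..., 2}: chance the mean error is at most 1.
p = mean_error_within(v=2, m=3, bound=1)
print(p, float(p))  # 21/25 = 0.84
```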

In 1774, Laplace proposed the first of his error curves. Denoting this function by φ(x), he stipulated that it must be symmetric in x and monotone decreasing for x > 0. Furthermore, he proposed that
… as we have no reason to suppose a different law for the ordinates than for their differences, it follows that we must, subject to the rules of probabilities, suppose the ratio of two infinitely small consecutive differences to be equal to that of the corresponding ordinates.
Laplace’s argument can be paraphrased as follows. Aside from their being symmetrical and descending (for x > 0), we know nothing about either φ(x) or its derivative φ′(x). Hence, presumably by Occam’s razor, it must be assumed that they are proportional (the simpler assumption of equality leads to φ(x) = Ce^{|x|}, which is impossible). The resulting differential equation is easily solved and the extracted error curve is displayed in FIGURE 5. There is no indication that Laplace was in any way disturbed by this curve’s non-differentiability at x = 0. We are about to see that he was perfectly willing to entertain even more drastic singularities.
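For readers who want the omitted step spelled out: writing the constant of proportionality as −m (the symbol m is my choice, not the article's), the solution of the differential equation runs as follows.

```latex
% Proportionality of phi' and phi for x > 0, then symmetry and normalization:
\[
  \frac{\varphi'(x)}{\varphi(x)} = -m \ (x > 0)
  \;\Longrightarrow\;
  \varphi(x) = C e^{-m|x|},
  \qquad
  \int_{-\infty}^{\infty} C e^{-m|x|}\,dx = \frac{2C}{m} = 1
  \;\Longrightarrow\;
  \varphi(x) = \frac{m}{2}\, e^{-m|x|},
\]
% the double-exponential curve displayed in FIGURE 5.
```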

Laplace must have been aware of the shortcomings of his rationale, for three short years later he proposed an alternative curve [23]. Let a be the supremum of all the possible errors (in the context of a specific experiment) and let n be a positive integer.
Choose n points at random within the unit interval, thereby dividing it into n + 1 spacings. Order the spacings as:
$$d_{1} > d_{2} > \cdots > d_{n+1}, \qquad d_{1} + d_{2} + \cdots + d_{n+1} = 1.$$
Let d̄_i denote the expected value of d_i. Draw the points (i/n, d̄_i), i = 1, 2, …, n + 1, and let n become infinitely large. The limit configuration is a curve that is proportional to ln(a/x) on (0, a]. Symmetry and the requirement that the total probability must be 1 then yield Laplace’s second candidate for the error curve (FIGURE 6):

$$\varphi(x) = \frac{1}{2a} \ln\frac{a}{|x|}, \qquad -a \le x \le a.$$
This curve, with its infinite singularity at 0 and finite domain (a reversal of the properties of the error curve of FIGURE 5 and the bell-shaped curve), constitutes a step backwards in the evolutionary process, and one suspects that Laplace was seduced by the considerable mathematical intricacies of the curve’s derivation. So much so that he seemed compelled to comment on the curve’s excessive complexity and to suggest that error analyses using this curve should be carried out only in “very delicate” investigations, such as the transit of Venus across the sun.
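Laplace's spacings construction is easy to reproduce by simulation. The sketch below (standard-library Python; the function names and the choice n = 20 are mine) estimates the expected ordered spacings by Monte Carlo and compares them with the closed form E[d_i] = (1/(n+1)) (1/i + ... + 1/(n+1)), which for large n behaves like ln((n+1)/i)/(n+1), i.e. the logarithmic profile described above; that closed form is a standard fact about uniform spacings, not taken from the article.

```python
import random

def mean_ordered_spacings(n, trials=2000, seed=0):
    """Monte Carlo estimate of the expected ordered spacings d1 >= ... >= d(n+1)
    produced by n points chosen at random in the unit interval."""
    rng = random.Random(seed)
    totals = [0.0] * (n + 1)
    for _ in range(trials):
        pts = sorted(rng.random() for _ in range(n))
        spacings = [b - a for a, b in zip([0.0] + pts, pts + [1.0])]
        for i, d in enumerate(sorted(spacings, reverse=True)):
            totals[i] += d
    return [t / trials for t in totals]

def expected_spacing(n, i):
    """Closed form E[d_i] = (1/(n+1)) * sum_{j=i}^{n+1} 1/j (a standard result)."""
    return sum(1.0 / j for j in range(i, n + 2)) / (n + 1)

n = 20
estimates = mean_ordered_spacings(n)
for i in (1, 5, 10, 20):
    print(i, round(estimates[i - 1], 4), round(expected_spacing(n, i), 4))
```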
The next important development had its roots in a celestial event that occurred on January 1, 1801. On that day the Italian astronomer Giuseppe Piazzi sighted a heavenly body that he strongly suspected to be a new planet. He announced his discovery and named it Ceres. Unfortunately, six weeks later, before enough observations had been taken to make possible an accurate determination of its orbit and to ascertain that it was indeed a planet, Ceres disappeared behind the sun and was not expected to reemerge for nearly a year. Interest in this possibly new planet was widespread, and astronomers throughout Europe prepared themselves by computing their best guesses of the location where Ceres was most likely to reappear. The young Gauss, who had already made a name for himself as an extraordinary mathematician, proposed that an area of the sky be searched that was quite different from those suggested by the other astronomers, and he turned out to be right.
Gauss explained that he used the least-squares criterion to locate the orbit that best fit the observations. This criterion was justified by a theory of errors that was based on the following three assumptions:
- Small errors are more likely than large errors.
- For any real number ε, the likelihoods of errors of magnitudes ε and −ε are equal.
- In the presence of several measurements of the same quantity, the most likely value of the quantity being measured is their average.
Based on these assumptions he concluded that the probability density for the error (that is, the error curve) is:
$$\varphi(x) = \frac{h}{\sqrt{\pi}}\, e^{-h^{2}x^{2}}$$
where h is a positive constant that Gauss thought of as the “precision of the measurement process”. We recognize this as the bell curve determined by µ = 0 and σ = 1/(√2 h).
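Two quick checks of these claims, in standard-library Python (the value h = 1.7 and the sample measurements are hypothetical, chosen only for illustration): first, that Gauss's curve coincides with the bell curve of equation (1) for µ = 0 and σ = 1/(√2 h); second, that under this error curve the likelihood of a set of measurements is indeed maximized by their average, as the third assumption demands.

```python
import math

def gauss_error_density(x, h):
    """Gauss's error curve: (h / sqrt(pi)) * exp(-h^2 x^2)."""
    return h / math.sqrt(math.pi) * math.exp(-(h * x) ** 2)

def normal_pdf(x, mu, sigma):
    """The bell curve of equation (1)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

h = 1.7                                 # hypothetical precision constant
sigma = 1.0 / (math.sqrt(2.0) * h)
print(gauss_error_density(0.4, h), normal_pdf(0.4, 0.0, sigma))  # identical values

# With Gaussian errors, the likelihood of several measurements of one quantity
# is maximized when the unknown quantity is taken to be their mean.
measurements = [9.8, 10.1, 10.3, 9.9]   # hypothetical data
def log_likelihood(mu):
    return sum(math.log(gauss_error_density(x - mu, h)) for x in measurements)

grid = [round(9.5 + k * 0.001, 3) for k in range(1001)]
print(max(grid, key=log_likelihood), sum(measurements) / len(measurements))  # both ~10.025
```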
Gauss’s ingenious derivation of this error curve made use of only some basic probabilistic arguments and standard calculus facts. As it falls within the grasp of undergraduate mathematics majors with a course in calculus-based statistics, his proof is presented here with only minor modifications.



