This video shows how application 9_A_2 works in C#.
The form is composed of:
a button;
6 labels;
3 trackBars;
a comboBox;
a pictureBox.
Through the trackBars it is possible to choose the distribution parameters, and through the comboBox the type of distribution to be displayed. When the “Start” button is pressed, the corresponding distribution graph and two histograms, the final one and the one for the currently selected instant, will be shown in the pictureBox.
This video shows how application 9_A_1 works in C#.
The form is composed of:
a button;
two labels;
a trackBar;
a pictureBox.
After choosing how many points to generate, pressing the “Start” button shows the trend of the Empirical CDF and the Theoretical CDF in the pictureBox. As the number of points increases, we will see the two gradually become the same.
In mathematics, a random walk is a mathematical object, known as a stochastic or random process, that describes a path that consists of a succession of random steps on some mathematical space such as the integers.
An elementary example of a random walk is the random walk on the integer number line, Z, which starts at 0 and each step moves +1 or −1 with equal probability. Other examples include the path traced by a molecule as it travels in a liquid or a gas (see Brownian motion), the search path of a foraging animal, the price of a fluctuating stock and the financial status of a gambler: all can be approximated by random walk models, even though they may not be truly random in reality.
As illustrated by those examples, random walks have applications to engineering and many scientific fields including ecology, psychology, computer science, physics, chemistry, biology, economics, and sociology. Random walks explain the observed behaviours of many processes in these fields and thus serve as a fundamental model for the recorded stochastic activity. As a more mathematical application, the value of π can be approximated by the use of a random walk in an agent-based modelling environment. The term random walk was first introduced by Karl Pearson in 1905.
In the simplest context, the walk is in discrete time, that is, a sequence of random variables (Xt) = (X1, X2, …) indexed by the natural numbers. However, it is also possible to define random walks which take their steps at random times, and in that case, the position Xt has to be defined for all times t ∈ [0,+∞).
A random walk is a process by which randomly-moving objects wander away from where they started. The video below shows 7 black dots that start in one place randomly walking away.
How can we describe this mathematically?
The simplest random walk to understand is a 1-dimensional walk. Suppose that the black dot below is sitting on a number line. The black dot starts in the centre.
Then, it takes a step, either forward or backwards, with equal probability. It keeps taking steps either forward or backwards each time. Let’s call the 1st step a1, the second step a2, the third step a3 and so on. Each “a” is either equal to +1 (if the step is forward) or -1 (if the step is backward). The picture below shows a black dot that has taken 5 steps and ended up at -1 on the number line.
Suppose we put the black dot at 0 and then let it take N steps (where N is any number). Now we want to know how far the black dot travels after it has taken N steps. Of course, the distance travelled after N steps will vary each time we repeat the experiment, so what we want to know is, if we repeat the experiment many many times, how far the black dot will have travelled on average. Let’s call the distance that the black dot has travelled “d”. Keep in mind that d can either be positive or negative, depending on whether the black dot ends up to the right or left of 0. We know that for any one time that we repeat the experiment,
d = a1 + a2 + a3 + … + aN
Now we use the notation <d> to mean “the average of d if we repeated the experiments many times”:
<d> = <a1> + <a2> + <a3> + … + <aN>
But <a1> = 0, because if we repeated the experiment many many times, and a1 has an equal probability of being −1 or +1, we expect the average of a1 to be 0. The same holds for every other step, so then
<d> = 0
This isn’t too surprising if you think about it. After all, <d> is the average location of the black dot after N steps, and since the dot is equally likely to move forward or backwards, we expect d to be 0, on average.
Well, that was useless!
That exercise didn’t tell us anything about how far the black dot gets after N steps! We are going to have to be a little more clever if we want to find out any kind of useful information.
Here’s an idea… Even though d can be positive or negative, d² is always positive, so it can’t average out to 0. What if we found the average value of d²?
First let’s think about what <a1²> is equal to. Well, a1 can either be +1 or −1. Either way, a1² = 1. Then on average a1² is 1 and <a1²> = 1. The same applies to <a2²>, <a3²>, and all the way up to <aN²>.
Now let’s think about what <a1a2> is equal to. There are 4 possible combinations of a1 and a2, each with equal probability: (+1)(+1) = +1, (+1)(−1) = −1, (−1)(+1) = −1, and (−1)(−1) = +1.
Since a1a2 is equally likely to be +1 or −1, <a1a2> = 0. The same is true of <a1a3>, <a1aN>, <a2a3>, <a2aN> and all of the other terms containing two different steps. Then:
<d²> = <a1²> + <a2²> + … + <aN²> = N
Now we’re getting somewhere! The average of the square of the distance is equal to N. If we take the square root of this equation, we realize that sqrt(<d²>) = sqrt(N).
Since sqrt(<d²>) is something like the average positive distance away from 0 after N steps (technically, it’s called the “root-mean-squared” distance), we expect that after N steps, the black dot will be roughly sqrt(N) steps away from where it started. So for 25 steps, we expect the black dot to have moved roughly 5 total spaces from 0 in either direction. Of course, sometimes it will move more and sometimes fewer total spaces, but 5 is roughly what we might expect.
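To convince yourself numerically, here is a minimal C# console sketch (the class name and the parameter values are illustrative assumptions, not part of the course applications) that repeats an N-step walk many times and compares the root-mean-squared distance with sqrt(N):

using System;

class RandomWalkRms
{
    static void Main()
    {
        var rng = new Random();
        int N = 25;          // steps per walk
        int trials = 100000; // number of repeated experiments
        double sumSquares = 0.0;

        for (int t = 0; t < trials; t++)
        {
            int d = 0; // position on the number line
            for (int s = 0; s < N; s++)
                d += rng.Next(2) == 0 ? -1 : +1; // step forward or backward with equal probability
            sumSquares += (double)d * d;
        }

        double rms = Math.Sqrt(sumSquares / trials);
        Console.WriteLine($"RMS distance after {N} steps: {rms:F3} (sqrt(N) = {Math.Sqrt(N):F3})");
    }
}

For N = 25 the printed RMS distance should come out close to 5, matching the estimate above.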
Random walks in more than one dimension
Of course, the 1-dimensional random walk is easy to understand, but not as commonly found in nature as the 2D and 3D random walk, in which an object is free to move along a 2D plane or a 3D space instead of a 1D line (think of gas particles bouncing around in a room, able to move in 3D). We won’t work out the math here, but it turns out to be governed by the same rule as a 1D random walk — the total distance travelled from where it started is approximately sqrt(N) where N is the number of steps taken.
Above we assumed a dimensionless step size of 1. In real life, step sizes are measured in units of distance and sqrt(<d²>) = <r>·sqrt(N), where r is the size of a step, measured in units of length, and <r> is the average size of a step if the random steps vary in size (which they typically do in nature).
Now that we know a little more about random walks, let’s take another look at the video we saw before. The video shows the paths of 7 black dots undergoing a random walk in 2D. All of the dots start in the same place and then start to wander around. Notice that the steps they take are not all the same length. By the end of the video, some of the dots ended up above where they started, some below, some to the right, and some to the left. Some don’t end up very far from where they started and others manage to travel quite a distance.
Not all random walks are “random”
So far all of the random walks we have considered allowed an object to move with equal probability in any direction. But not all random walks follow this rule. A biased random walk is a random walk that is biased in one direction, leading to a net drift on average of particles in one specific direction.
There are several ways that a random walk can be biased. Think back to our 1D number line.
Suppose that instead of equal probabilities of moving left and right, there was a higher probability of moving to the right.
Or, suppose the probabilities of moving to the left or right remained equal, but whenever the dot moved to the right, it moved 2 spaces, and when it moved to the left, it only moved 1 space.
In both of these cases, we would expect a net drift to the right. In other words, the random walk is biased toward the right. The video below shows a biased random walk in 2D.
Each black dot has an equal probability of stepping towards the right or the left. But the step size is on average longer if the step is towards the right than if the step is towards the left.
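As a rough sketch of this second kind of bias (the step lengths, counts, and names below are my own illustrative choices), right steps of length 2 and left steps of length 1, taken with equal probability, produce an average drift of (2 − 1)/2 = 0.5 per step:

using System;

class BiasedWalk
{
    static void Main()
    {
        var rng = new Random();
        int N = 1000, trials = 20000;
        double sum = 0.0;

        for (int t = 0; t < trials; t++)
        {
            double d = 0.0;
            for (int s = 0; s < N; s++)
                d += rng.Next(2) == 0 ? +2.0 : -1.0; // right steps are longer than left steps
            sum += d;
        }

        // Expected drift per step is (2 - 1) / 2 = 0.5, so about 0.5 * N overall.
        Console.WriteLine($"Average final position: {sum / trials:F1} (expected ~{0.5 * N})");
    }
}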
At the centre of statistics lies the normal distribution, known to millions of people as the bell curve, or the bell-shaped curve. This is a two-parameter family of curves that are graphs of the equation
y = (1/(σ·sqrt(2π))) · e^(−(x−µ)²/(2σ²))
Not only is the bell curve familiar to these millions, but they also know of its main use: to describe the general, or idealized, shape of graphs of data. It has, of course, many other uses and plays as significant a role in the social sciences as differentiation does in the natural sciences.
An approximation tool
The origins of the mathematical theory of probability are justly attributed to the famous correspondence between Fermat and Pascal, which was instigated in 1654 by the queries of the gambling Chevalier de Mere.
Among the various types of problems they considered were binomial distributions, which today would be described by:
Σ (k = i to j) C(n, k) · p^k · (1 − p)^(n−k)   (2)
This sum denotes the likelihood of between i and j successes in n trials with success probability p. Such a trial—now called a Bernoulli trial—is the most elementary of all random experiments. It has two outcomes, usually termed success and failure.
As the binomial examples Fermat and Pascal worked out involved only small values of n, they were not concerned with the computational challenge presented by the evaluation of general sums of this type. However, more complicated computations were not long in coming.
For example, in 1712 the Dutch mathematician Gravesande tested the hypothesis that male and female births are equally likely against the actual births in London over the 82 years 1629–1710.
A few years earlier Jacob Bernoulli had found estimates for binomial sums of the type of (2). These estimates, however, did not involve the exponential function e^x.
De Moivre began his search for such approximations in 1721. In 1733, he proved that:
De Moivre also asserted that (4) could be generalized to a similar asymmetrical context, with x varying from n/2 to d + n/2. This is easily done, with the precision of the approximation clarified by De Moivre’s proof.
FIGURE 2 demonstrates how the binomial probabilities associated with 50 independent repetitions of a Bernoulli trial with probability p = 0.3 of success are approximated by such an exponential curve. De Moivre’s discovery is standard fare in all introductory statistics courses where it is called the normal approximation to the binomial and rephrased as
P(i ≤ X ≤ j) ≈ ∫ from (i−np)/sqrt(np(1−p)) to (j−np)/sqrt(np(1−p)) of (1/sqrt(2π)) · e^(−x²/2) dx
Since this integral is easily evaluated by numerical methods and quite economically described by tables, it does indeed provide a very practical approximation for cumulative binomial probabilities.
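To get a feel for how practical the approximation is, the exact binomial sum and the normal integral can be compared numerically. The following C# sketch matches FIGURE 2’s n = 50 and p = 0.3; the interval [10, 20] and the integration step are arbitrary assumptions of mine, and a simple trapezoidal rule with a continuity correction stands in for the tables:

using System;

class NormalApprox
{
    // Exact binomial probability P(X = k) for n trials with success probability p,
    // computed through logarithms to avoid overflow.
    static double Binom(int n, int k, double p)
    {
        double logC = 0.0;
        for (int i = 1; i <= k; i++)
            logC += Math.Log(n - k + i) - Math.Log(i);
        return Math.Exp(logC + k * Math.Log(p) + (n - k) * Math.Log(1 - p));
    }

    // Normal density with mean mu and standard deviation sigma.
    static double Phi(double x, double mu, double sigma) =>
        Math.Exp(-(x - mu) * (x - mu) / (2 * sigma * sigma)) / (sigma * Math.Sqrt(2 * Math.PI));

    static void Main()
    {
        int n = 50, i = 10, j = 20;
        double p = 0.3;
        double mu = n * p, sigma = Math.Sqrt(n * p * (1 - p));

        double exact = 0.0;
        for (int k = i; k <= j; k++) exact += Binom(n, k, p);

        // Trapezoidal integration of the normal density from i - 0.5 to j + 0.5
        // (the 0.5 shifts are the usual continuity correction).
        double a = i - 0.5, b = j + 0.5, h = 1e-4, approx = 0.0;
        for (double x = a; x + h <= b; x += h)
            approx += 0.5 * (Phi(x, mu, sigma) + Phi(x + h, mu, sigma)) * h;

        Console.WriteLine($"Exact binomial sum: {exact:F6}");
        Console.WriteLine($"Normal integral:    {approx:F6}");
    }
}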
The search for an error curve
Astronomy was the first science to call for accurate measurements. Consequently, it was also the first science to be troubled by measurement errors and to face the question of how to proceed in the presence of several distinct observations of the same quantity.
The first scientist to note in print that measurement errors are deserving of a systematic and scientific treatment was Galileo in his famous Dialogue Concerning the Two Chief Systems of the World—Ptolemaic and Copernican, published in 1632.
Thus, even as late as the mid-18th century, doubts persisted about the value of repeating experiments. More important, however, was Simpson’s experimentation with specific error curves, probability densities that model the distribution of random errors. In two propositions, Simpson computed the probability that the error in the mean of several observations does not exceed a given bound when the individual errors take on the values:
or
Simpson’s choice of error curves may seem strange, but they were in all likelihood dictated by the state of the art of probability at that time. For r = 1 (the simplest case), these two distributions yield the two top graphs of FIGURE 4. One year later, Simpson, while effectively inventing the notion of a continuous error distribution, dealt with similar problems in the context of the error curves described at the bottom of FIGURE 4.
In 1774, Laplace proposed the first of his error curves. Denoting this function by φ(x), he stipulated that it must be symmetric in x and monotone decreasing for x > 0. Furthermore, he proposed that
… as we have no reason to suppose a different law for the ordinates than for their differences, it follows that we must, subject to the rules of probabilities, suppose the ratio of two infinitely small consecutive differences to be equal to that of the corresponding ordinates.
Laplace’s argument can be paraphrased as follows. Aside from their being symmetric and descending (for x > 0), we know nothing about either φ(x) or its derivative φ′(x). Hence, presumably by Occam’s razor, it must be assumed that they are proportional (the simpler assumption of equality leads to φ(x) = Ce^|x|, which is impossible). The resulting differential equation is easily solved and the extracted error curve is displayed in FIGURE 5. There is no indication that Laplace was in any way disturbed by this curve’s non-differentiability at x = 0. We are about to see that he was perfectly willing to entertain even more drastic singularities.
Laplace must have been aware of the shortcomings of his rationale, for three short years later he proposed an alternative curve [23]. Let a be the supremum of all the possible errors (in the context of a specific experiment) and let n be a positive integer.
Choose n points at random within the unit interval, thereby dividing it into n + 1 spacings. Order the spacings as:
d1 > d2 > ··· > d(n+1),    d1 + d2 + ··· + d(n+1) = 1.
Let d̄i be the expected value of di. Draw the points (i/n, d̄i), i = 1, 2, …, n + 1, and let n become infinitely large. The limit configuration is a curve that is proportional to ln(a/x) on (0, a]. Symmetry and the requirement that the total probability must be 1 then yield Laplace’s second candidate for the error curve (FIGURE 6):
φ(x) = (1/(2a)) · ln(a/|x|),  −a ≤ x ≤ a
This curve, with its infinite singularity at 0 and finite domain (a reversal of the properties of the error curve of FIGURE 5 and the bell-shaped curve) constitutes a step backwards in the evolutionary process and one suspects that Laplace was seduced by the considerable mathematical intricacies of the curve’s derivation. So much so that he seemed compelled to comment on the curve’s excessive complexity and to suggest that error analyses using this curve should be carried out only in “very delicate” investigations, such as the transit of Venus across the sun.
The next important development had its roots in a celestial event that occurred on January 1, 1801. On that day the Italian astronomer Giuseppe Piazzi sighted a heavenly body that he strongly suspected to be a new planet. He announced his discovery and named it Ceres. Unfortunately, six weeks later, before enough observations had been taken to make possible an accurate determination of its orbit and to ascertain that it was indeed a planet, Ceres disappeared behind the sun and was not expected to reemerge for nearly a year. Interest in this possibly new planet was widespread, and astronomers throughout Europe prepared themselves by guessing the location where Ceres was most likely to reappear. The young Gauss, who had already made a name for himself as an extraordinary mathematician, proposed that an area of the sky be searched that was quite different from those suggested by the other astronomers, and he turned out to be right.
Gauss explained that he used the least-squares criterion to locate the orbit that best fit the observations. This criterion was justified by a theory of errors that was based on the following three assumptions:
Small errors are more likely than large errors.
For any real number ε, errors of magnitudes ε and −ε are equally likely.
In the presence of several measurements of the same quantity, the most likely value of the quantity being measured is their average.
Based on these assumptions he concluded that the probability density for the error (that is, the error curve) is:
φ(x) = (h/sqrt(π)) · e^(−h²x²)
where h is a positive constant that Gauss thought of as the “precision of the measurement process”. We recognize this as the bell curve determined by µ = 0 and σ = 1/(√2·h).
Gauss’s ingenious derivation of this error curve made use of only some basic probabilistic arguments and standard calculus facts. As it falls within the grasp of undergraduate mathematics majors with a course in calculus-based statistics, his proof is presented here with only minor modifications.
9_R. History and derivation of the normal distribution. Touch on, at least, the following three important perspectives, putting them into a historical context to understand how the idea developed:
1) as approximation of binomial (De Moivre)
2) as error curve (Gauss)
3) as limit of sum of independent r.v.’s (Laplace)
9_A_1. Create a simulation with graphics to convince yourself of the pointwise convergence of the empirical CDF to the theoretical distribution (Glivenko-Cantelli theorem). Use a simple random variable of your choice for such a demonstration.
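A minimal console version of this idea (no graphics; taking U(0,1) as the simple random variable is my own choice) draws n points and reports the largest gap between the empirical CDF and the theoretical one, which should shrink as n grows:

using System;
using System.Linq;

class EmpiricalCdf
{
    static void Main()
    {
        var rng = new Random();
        foreach (int n in new[] { 10, 100, 1000, 10000 })
        {
            // Sample n points from U(0,1) and sort them.
            double[] x = Enumerable.Range(0, n)
                                   .Select(_ => rng.NextDouble())
                                   .OrderBy(v => v)
                                   .ToArray();

            // Sup distance between the empirical CDF (steps of height 1/n)
            // and the theoretical CDF F(x) = x of U(0,1).
            double maxGap = 0.0;
            for (int i = 0; i < n; i++)
            {
                maxGap = Math.Max(maxGap, Math.Abs((i + 1.0) / n - x[i]));
                maxGap = Math.Max(maxGap, Math.Abs((double)i / n - x[i]));
            }
            Console.WriteLine($"n = {n,6}: sup |F_n - F| = {maxGap:F4}");
        }
    }
}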
7_RA. Do research about the random walk process and its properties. Compare your findings with your applications, drawing your personal conclusions. Explain, based on your exercise, the behaviour of the distribution of the stochastic process (check out “Donsker’s invariance principle”). What are, in particular, its mean and variance at time n?
This video shows how application 6_A works in C#.
The form is composed of:
a button;
nine labels;
a comboBox;
four trackBars;
a pictureBox.
In the upper part of the window we can set the parameters: the number of sequences (m), the number of points (n), the sampling instant (t), and the failure percentage (p).
Once these parameters have been set, you can choose the type of data to be shown in the histograms. Pressing the “Start” button generates m sequences of n points each, which are shown in the pictureBox.
Absolute frequency
Relative frequency
Normalized relative frequency
As we can see in the pictures, as regards the absolute frequency, as n→∞ we get a shape that tends to zero at the tails, so it is a degenerate form.
As for the relative frequency, as n→∞ the ratio ni/n (number of successes over n) tends to p (where p is the probability of success), and the other values go to 0, because a finite number divided by infinity vanishes; so we get a peak at p, and this too is a degenerate form.
The normalized relative frequency, instead, is the only shape that does not degenerate: as n→∞ it tends to the normal distribution, with the “bell” shape.
Convergence of random variables (sometimes called stochastic convergence) describes how a sequence of random quantities settles on a particular number. A sequence of numbers (which could represent cars or anything else) can converge (mathematically, this time) on a single, specific number. Certain processes, distributions, and events can result in convergence, which basically means the values will get closer and closer together. When random variables converge on a single number, they may not settle exactly on that number, but they come very, very close. In notation, xn → x tells us that a sequence of random variables (xn) converges to the value x.
In notation, that’s:
|xn − x| → 0 as n → ∞.
What happens to these variables as they converge can’t be crunched into a single definition.
Convergence of random variables can be broken down into many types. The ones you’ll most often come across are:
Convergence in probability,
Convergence in distribution,
Almost sure convergence,
Convergence in mean.
Each of these definitions is quite different from the others. However, for an infinite series of independent random variables, convergence in probability, convergence in distribution, and almost sure convergence are equivalent.
Convergence in probability
If you toss a coin n times, you would expect heads around 50% of the time. However, let’s say you toss the coin 10 times. You might get 7 heads and 3 tails (70% heads), 2 heads and 8 tails (20% heads), or a wide variety of other possible combinations. Eventually though, if you toss the coin enough times (say, 1,000), you’ll probably end up with about 50% heads. In other words, the percentage of heads will converge to the expected probability.
More formally, convergence in probability can be stated as the following formula:
lim (n→∞) P(|Xn − c| > ε) = 0
where:
P = probability;
Xn = the proportion of observed successes (e.g. heads) in n trials (e.g. tosses of the coin);
lim (n→∞) = the limit at infinity, i.e. the behaviour after an infinite number of trials (e.g. tosses of the coin);
c = the constant that the sequence of random variables converges to in probability;
ε = a positive number representing the distance between the expected value and the observed value.
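A quick C# simulation of the coin example (ε, the repetition count, and the sample sizes are arbitrary choices of mine) estimates P(|Xn − 0.5| > ε), with Xn the proportion of heads, and shows it shrinking as n grows:

using System;

class ConvergenceInProbability
{
    static void Main()
    {
        var rng = new Random();
        double eps = 0.05, pHeads = 0.5;
        int reps = 5000;

        foreach (int n in new[] { 10, 100, 1000, 10000 })
        {
            int outside = 0;
            for (int r = 0; r < reps; r++)
            {
                int heads = 0;
                for (int i = 0; i < n; i++)
                    if (rng.NextDouble() < pHeads) heads++;
                // Count the runs whose observed proportion is farther than eps from 0.5.
                if (Math.Abs((double)heads / n - pHeads) > eps) outside++;
            }
            Console.WriteLine($"n = {n,6}: P(|Xn - 0.5| > {eps}) ~ {(double)outside / reps:F4}");
        }
    }
}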
Convergence in distribution
Convergence in distribution (sometimes called convergence in law) is based on the distribution of random variables, rather than the individual variables themselves.
In more formal terms, a sequence of random variables converges in distribution if the CDFs for that sequence converge into a single CDF. Let’s say you had a series of random variables, Xn. Each of these variables X1, X2,…Xn has a CDF FXn(x), which gives us a series of CDFs {FXn(x)}. Convergence in distribution implies that the CDFs converge to a single CDF, Fx(x).
Several methods are available for proving convergence in distribution. For example, Slutsky’s Theorem and the Delta Method can both help to establish convergence.
Almost sure convergence
Almost sure convergence (also called convergence with probability one) answers the question: given a random variable X, do the outcomes of the sequence Xn converge to the outcomes of X with a probability of 1? (Mittelhammer, 2013).
As an example of this type of convergence of random variables, let’s say an entomologist is studying feeding habits for wild house mice and records the amount of food consumed per day. The amount of food consumed will vary wildly, but we can be almost sure (quite certain) that amount will eventually become zero when the animal dies. It will almost certainly stay zero after that point. We’re “almost certain” because the animal could be revived, or appear dead for a while, or a scientist could discover the secret for eternal mouse life. In life — as in probability and statistics — nothing is certain.
Almost sure convergence is defined in terms of a scalar sequence or matrix sequence:
Scalar: Xn converges almost surely to X iff: P(Xn → X) = P(lim (n→∞) Xn = X) = 1.
Matrix: Xn converges almost surely to X iff: P(Xn[i, j] → X[i, j]) = P(lim (n→∞) Xn[i, j] = X[i, j]) = 1, for all i and j.
Convergence in mean
Given a real number r ≥ 1, we say that the sequence Xn converges in the r-th mean (or in the Lr-norm) towards the random variable X, if the r-th absolute moments E(|Xn|^r) and E(|X|^r) of Xn and X exist, and
lim (n→∞) E(|Xn − X|^r) = 0
where the operator E denotes the expected value. Convergence in r-th mean tells us that the expectation of the r-th power of the difference between Xn and X converges to zero.
Law of Large Numbers
In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value and will tend to become closer to the expected value as more trials are performed.
The LLN is important because it guarantees stable long-term results for the averages of some random events.
There are two different versions of the law of large numbers that are described below. They are called the strong law of large numbers and the weak law of large numbers. Stated for the case where X1, X2, … is an infinite sequence of independent and identically distributed (i.i.d.) Lebesgue integrable random variables with expected value E(X1) = E(X2) = … = µ, both versions of the law state that, with virtual certainty, the sample average
X̄n = (1/n)·(X1 + X2 + … + Xn)
converges to the expected value:
X̄n → µ as n → ∞.
(Lebesgue integrability of Xj means that the expected value E(Xj) exists according to Lebesgue integration and is finite. It does not mean that the associated probability measure is absolutely continuous with respect to the Lebesgue measure.)
Weak Law
The weak law of large numbers (also called Khinchin’s law) states that the sample average converges in probability towards the expected value.
That is, for any positive number ε,
lim (n→∞) P(|X̄n − µ| > ε) = 0.
Interpreting this result, the weak law states that for any nonzero margin specified (ε), no matter how small, with a sufficiently large sample there will be a very high probability that the average of the observations will be close to the expected value; that is, within the margin.
The weak law applies in the case of i.i.d. random variables, but it also applies in some other cases.
Strong Law
The strong law of large numbers (also called Kolmogorov’s law) states that the sample average converges almost surely to the expected value.
That is,
P(lim (n→∞) X̄n = µ) = 1.
What this means is that the probability that, as the number of trials n goes to infinity, the average of the observations converges to the expected value is equal to one.
The proof is more complex than that of the weak law. This law justifies the intuitive interpretation of the expected value (for Lebesgue integration only) of a random variable when sampled repeatedly as the “long-term average”.
Almost sure convergence is also called strong convergence of random variables. This version is called the strong law because random variables which converge strongly (almost surely) are guaranteed to converge weakly (in probability). However, the weak law is known to hold in certain conditions where the strong law does not hold, and then the convergence is only weak (in probability).
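As a single-path illustration of the strong law (rolling a fair die is my own choice of example), the running average of the rolls settles toward the expected value 3.5:

using System;

class LawOfLargeNumbers
{
    static void Main()
    {
        var rng = new Random();
        double sum = 0.0;
        for (int n = 1; n <= 1000000; n++)
        {
            sum += rng.Next(1, 7); // one fair die roll: 1..6, expected value 3.5
            if (n == 10 || n == 100 || n == 10000 || n == 1000000)
                Console.WriteLine($"n = {n,8}: sample average = {sum / n:F4}");
        }
    }
}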
Binomial convergence to normal and Poisson distributions
Binomial to Poisson
Let’s first talk about the relation between the binomial and Poisson distributions.
At first glance, the binomial distribution and the Poisson distribution seem unrelated. But a closer look reveals a pretty interesting relationship. It turns out the Poisson distribution is just a special case of the binomial — where the number of trials is large, and the probability of success in any given one is small.
The binomial distribution works when we have a fixed number of events n, each with a constant probability of success p.
Imagine we don’t know the number of trials that will happen. Instead, we only know the average number of successes per time period. So we know the rate of successes per day, but not the number of trials n or the probability of success p that led to that rate. We are going to prove what we just stated.
Let λ be the rate of successes per day. It’s equal to np: the number of trials n, however many there are, times the chance of success p for each of those trials. Think of it like this: if the chance of success is p and we run n trials per day, we’ll observe np successes per day on average. That’s our observed success rate λ. Recall that the binomial distribution looks like this:
P(X = k) = (n! / (k!·(n − k)!)) · p^k · (1 − p)^(n−k)
We defined: λ = np.
Solving for p, we get: p = λ/n.
What we’re going to do here is substitute this expression for p into the binomial distribution above, take the limit as n goes to infinity, and try to come up with something useful. That is,
lim (n→∞) (n! / (k!·(n − k)!)) · (λ/n)^k · (1 − λ/n)^(n−k)
Pulling out the constants λ^k and 1/k!, and splitting the term on the right that’s to the power of (n − k) into a term to the power of n and one to the power of −k, we get
(λ^k / k!) · lim (n→∞) [n! / ((n − k)!·n^k)] · (1 − λ/n)^n · (1 − λ/n)^(−k)
Now let’s take the limit of this right-hand side one term at a time. We’ll do this in three steps. The first step is to find the limit of
n! / ((n − k)!·n^k)
In the numerator, we can expand n! into n terms: (n)(n − 1)(n − 2)…(1). And in the denominator, we can expand (n − k)! into n − k terms: (n − k)(n − k − 1)(n − k − 2)…(1). That is,
n! / ((n − k)!·n^k) = [(n)(n − 1)(n − 2)…(1)] / ([(n − k)(n − k − 1)…(1)]·n^k)
Written this way, it’s clear that many of the terms on the top and bottom cancel out. The (n − k)(n − k − 1)…(1) terms cancel from both the numerator and denominator, leaving the following:
[n·(n − 1)·…·(n − k + 1)] / n^k
Since we canceled out n-k terms, the numerator here is left with k terms, from n to n-k+1. So this has k terms in the numerator, and k terms in the denominator since n is to the power of k.
Expanding out the numerator and denominator, we can rewrite this as:
(n/n) · ((n − 1)/n) · ((n − 2)/n) · … · ((n − k + 1)/n)
This has k terms. Clearly, every one of these k terms approaches 1 as n approaches infinity. So we know this portion of the problem just simplifies to one. So we’re done with the first step.
The second step is to find the limit of the term in the middle of our equation, which is
(1 − λ/n)^n
Recall that the definition of e = 2.718… is given by the following:
e = lim (x→∞) (1 + 1/x)^x
Our goal here is to find a way to manipulate our expression to look more like the definition of e, which we know the limit of. Let’s define a number x as
x = −n/λ
Now let’s substitute this into our expression and take the limit as follows:
lim (n→∞) (1 − λ/n)^n = lim (x→−∞) [(1 + 1/x)^x]^(−λ) = e^(−λ)
(the definition of e holds as x → −∞ as well).
This term just simplifies to e^(−λ). So we’re done with our second step. That leaves only one more term for us to find the limit of. Our third and final step is to find the limit of the last term on the right, which is
(1 − λ/n)^(−k)
This is pretty simple. As n approaches infinity, this term becomes (1 − 0)^(−k), which is equal to one. And that takes care of our last term. Putting these three results together, we can rewrite our original limit as
(λ^k / k!) · 1 · e^(−λ) · 1
This just simplifies to the following:
P(k) = (λ^k · e^(−λ)) / k!
This is the familiar probability mass function for the Poisson distribution, which gives us the probability of k successes per period given our parameter λ.
So we’ve shown that the Poisson distribution is just a special case of the binomial, in which the number of trials n grows to infinity and the chance of success in any particular trial approaches zero. And that completes the proof.
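The convergence can also be checked numerically. The C# sketch below (λ = 3, k = 2, and the values of n are arbitrary assumptions) compares binomial probabilities with p = λ/n against the Poisson limit:

using System;

class BinomialToPoisson
{
    // Binomial P(X = k), computed through logarithms for numerical stability.
    static double Binomial(int n, int k, double p)
    {
        double logC = 0.0;
        for (int i = 1; i <= k; i++)
            logC += Math.Log(n - k + i) - Math.Log(i);
        return Math.Exp(logC + k * Math.Log(p) + (n - k) * Math.Log(1 - p));
    }

    // Poisson P(k) = lambda^k e^(-lambda) / k!.
    static double Poisson(int k, double lambda)
    {
        double logFact = 0.0;
        for (int i = 2; i <= k; i++) logFact += Math.Log(i);
        return Math.Exp(k * Math.Log(lambda) - lambda - logFact);
    }

    static void Main()
    {
        double lambda = 3.0;
        int k = 2;
        foreach (int n in new[] { 10, 100, 1000, 100000 })
            Console.WriteLine($"n = {n,6}: binomial = {Binomial(n, k, lambda / n):F6}");
        Console.WriteLine($"Poisson limit:      {Poisson(k, lambda):F6}");
    }
}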
Binomial to normal
If n is large enough, then the skew of the distribution is not too great. In this case, a reasonable approximation to B(n, p) is given by the normal distribution
N(np, np(1 − p))
and this basic approximation can be improved in a simple way by using a suitable continuity correction. The basic approximation generally improves as n increases (at least 20) and is better when p is not near to 0 or 1. Various rules of thumb may be used to decide whether n is large enough, and p is far enough from the extremes of zero or one:
One rule is that for n > 5 the normal approximation is adequate if the absolute value of the skewness is strictly less than 1/3; that is, if
|1 − 2p| / sqrt(n·p·(1 − p)) < 1/3
This can be made precise using the Berry–Esseen theorem.
A stronger rule states that the normal approximation is appropriate only if everything within 3 standard deviations of its mean is within the range of possible values; that is, only if
0 < np − 3·sqrt(np(1 − p))  and  np + 3·sqrt(np(1 − p)) < n
This 3-standard-deviation rule is equivalent to the following conditions, which also imply the first rule above:
n > 9·(1 − p)/p  and  n > 9·p/(1 − p)
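Both rules of thumb are easy to test in code; in this sketch the function names and the example (n, p) pairs are mine:

using System;

class NormalApproxRules
{
    // Skewness rule: |1 - 2p| / sqrt(n p (1 - p)) < 1/3, for n > 5.
    static bool SkewnessRule(int n, double p) =>
        n > 5 && Math.Abs(1 - 2 * p) / Math.Sqrt(n * p * (1 - p)) < 1.0 / 3.0;

    // 3-sigma rule: the interval np ± 3 sqrt(np(1 - p)) lies inside (0, n).
    static bool ThreeSigmaRule(int n, double p)
    {
        double mu = n * p, sigma = Math.Sqrt(n * p * (1 - p));
        return mu - 3 * sigma > 0 && mu + 3 * sigma < n;
    }

    static void Main()
    {
        foreach (var (n, p) in new[] { (50, 0.3), (20, 0.05), (500, 0.01) })
            Console.WriteLine($"n = {n}, p = {p}: skewness rule {SkewnessRule(n, p)}, 3-sigma rule {ThreeSigmaRule(n, p)}");
    }
}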
Central Limit Theorem (CLT)
In probability theory, the central limit theorem (CLT) establishes that, in many situations, when independent random variables are summed up, their properly normalized sum tends toward a normal distribution (informally a bell curve) even if the original variables themselves are not normally distributed. The theorem is a key concept in probability theory because it implies that probabilistic and statistical methods that work for normal distributions can be applicable to many problems involving other types of distributions. This theorem has seen many changes during the formal development of probability theory.
Let X1, X2, …, Xn be a random sample of size n, that is, a sequence of independent and identically distributed (i.i.d.) random variables drawn from a distribution with expected value μ and finite variance σ². Suppose we are interested in the sample average
X̄n = (X1 + X2 + … + Xn) / n
of these random variables. By the law of large numbers, the sample averages converge almost surely (and therefore also converge in probability) to the expected value μ as n → ∞. The classical central limit theorem describes the size and the distributional form of the stochastic fluctuations around the deterministic number μ during this convergence. More precisely, it states that as n gets larger, the distribution of the difference between the sample average X̄n and its limit μ, when multiplied by the factor √n (that is, √n·(X̄n − μ)), approximates the normal distribution with mean 0 and variance σ². For large enough n, the distribution of X̄n is close to the normal distribution with mean μ and variance σ²/n. The usefulness of the theorem is that the distribution of √n·(X̄n − μ) approaches normality regardless of the shape of the distribution of the individual Xi.
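A short simulation makes the statement concrete (summing U(0,1) variables and the chosen sizes are my own assumptions): the standardized sample average of uniform draws lands within one unit of 0 about 68.27% of the time, just as a standard normal would, even though each summand is far from normal:

using System;

class CentralLimit
{
    static void Main()
    {
        var rng = new Random();
        int n = 100, reps = 100000;
        double mu = 0.5, sigma = Math.Sqrt(1.0 / 12.0); // mean and sd of U(0,1)

        int within = 0;
        for (int r = 0; r < reps; r++)
        {
            double sum = 0.0;
            for (int i = 0; i < n; i++) sum += rng.NextDouble();
            double z = (sum / n - mu) / (sigma / Math.Sqrt(n)); // standardized sample average
            if (Math.Abs(z) < 1.0) within++;
        }
        // For a standard normal, P(|Z| < 1) is about 0.6827.
        Console.WriteLine($"Fraction within one sigma: {(double)within / reps:F4} (normal: 0.6827)");
    }
}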
Our goal is to find the median of all the elements read so far starting from the first integer till the last integer. This is also called the Median of Running Integers. The data stream can be any source of data, for example, a file, an array of integers, an input stream, etc.
But what is the median? The median can be defined as the element in the data set which separates the higher half of the data sample from the lower half. In other words, when the input size is odd, we take the middle element of the sorted data; when the input size is even, we pick the average of the middle two elements in the sorted stream.
How can we find the median using algorithms? If we keep each number in a sorted sequence, the cost of a single insertion is O(n), and so is finding the median if we search for the middle. A slight modification is to keep a pointer to the middle element and adjust it based on whether each insertion falls on its left or right side; in that case, finding the median after an insertion is O(1). But the overall cost per number remains O(n), as an insertion into the sorted sequence is necessary after each number is entered.
So which approach could we use to find the median more efficiently? There are several methods to find the median:
Method 1: Insertion Sort
If we can sort the data as it appears, we can easily locate the median element. Insertion sort is one such online algorithm that sorts the data that has appeared so far. At any instant of sorting, say after sorting the i-th element, the first i elements of the array are sorted. Insertion sort doesn’t depend on future data to sort the input received up to that point. In other words, insertion sort considers only the data sorted so far while inserting the next element. This is the key property of insertion sort that makes it an online algorithm.
However, insertion sort takes O(n²) time to sort n elements. We could use binary search within insertion sort to find the location of the next element in O(log n) time, yet we can’t do the data movement in O(log n) time. No matter how efficient the implementation is, insertion sort takes polynomial time.
Method 2: Augmented self-balanced binary search tree (AVL, RB, etc…)
At every node of the BST, maintain the number of elements in the subtree rooted at that node. We can use a node as the root of a simple binary tree whose left child is a self-balancing BST with elements less than the root and whose right child is a self-balancing BST with elements greater than the root. The root element always holds the effective median.
If the left and right subtrees contain the same number of elements, the root node holds the average of the left and right subtree root data. Otherwise, the root contains the same data as the root of the subtree which has more elements. After processing an incoming element, the left and right subtrees (BSTs) differ in size by at most 1.
Self-balancing BSTs are costly in managing the balancing factor. Moreover, they provide fully sorted data, which we don’t need; we need the median only. The next method makes use of heaps to track the median.
Method 3: Heaps
Similar to balancing BST in Method 2 above, we can use a max heap on the left side to represent elements that are less than the effective median, and a min-heap on the right side to represent elements that are greater than the effective median.
After processing an incoming element, the numbers of elements in the heaps differ by at most 1. When both heaps contain the same number of elements, we pick the average of the two heaps’ root data as the effective median. When the heaps are not balanced, we select the effective median from the root of the heap containing more elements.
Useful code:
// C# program to find the median in
// a stream of running integers
using System;
using System.Collections.Generic;

public class MedianMaintain
{
    // method to calculate the median of a stream
    public static void printMedian(int[] a)
    {
        double med = a[0];
        // max heap to store the smaller half of the elements
        List<int> smaller = new List<int>();
        // min-heap to store the greater half of the elements
        List<int> greater = new List<int>();
        smaller.Add(a[0]);
        Console.WriteLine(med);
        // reading elements of the stream one by one
        /* At any time we try to keep the heaps balanced so that
           their sizes differ by at most 1. If the heaps are
           balanced, then we declare the median as the average of
           min_heap_right.top() and max_heap_left.top().
           If the heaps are unbalanced, then the median is defined
           as the top element of the heap of larger size. */
        for (int i = 1; i < a.Length; i++)
        {
            int x = a[i];
            // case 1 (left-side heap has more elements)
            if (smaller.Count > greater.Count)
            {
                if (x < med)
                {
                    smaller.Sort();
                    smaller.Reverse();
                    greater.Add(smaller[0]);
                    smaller.RemoveAt(0);
                    smaller.Add(x);
                }
                else
                    greater.Add(x);
                smaller.Sort();
                smaller.Reverse();
                greater.Sort();
                med = (double)(smaller[0] + greater[0]) / 2;
            }
            // case 2 (both heaps are balanced)
            else if (smaller.Count == greater.Count)
            {
                if (x < med)
                {
                    smaller.Add(x);
                    smaller.Sort();
                    smaller.Reverse();
                    med = (double)smaller[0];
                }
                else
                {
                    greater.Add(x);
                    greater.Sort();
                    med = (double)greater[0];
                }
            }
            // case 3 (right-side heap has more elements)
            else
            {
                if (x > med)
                {
                    greater.Sort();
                    smaller.Add(greater[0]);
                    greater.RemoveAt(0);
                    greater.Add(x);
                }
                else
                    smaller.Add(x);
                smaller.Sort();
                smaller.Reverse();
                greater.Sort(); // keep the min-heap order before reading its root
                med = (double)(smaller[0] + greater[0]) / 2;
            }
            Console.WriteLine(med);
        }
    }

    // Driver code
    public static void Main(String[] args)
    {
        // stream of integers
        int[] arr = new int[] { 5, 15, 10, 20, 3 };
        printMedian(arr);
    }
}
// This code is contributed by Rajput-Ji
Output:
5
10
10
12.5
10
Complexity Analysis:
Time Complexity: O(n log n).
The time complexity to insert an element into a heap is O(log n), so inserting n elements is O(n log n).
Auxiliary Space: O(n).
The Space required to store the elements in Heap is O(n).
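The list-based version above re-sorts a List on every step, so a single insertion can cost O(n log n) rather than O(log n). With .NET 6 or later, true heaps are available through System.Collections.Generic.PriorityQueue; the following is an assumed sketch along those lines (the max-heap is emulated by negating priorities), not the original contributed code:

using System;
using System.Collections.Generic;

class RunningMedianHeaps
{
    public static void PrintMedian(int[] a)
    {
        // Max-heap for the lower half (negated priorities), min-heap for the upper half.
        var lower = new PriorityQueue<int, int>();
        var upper = new PriorityQueue<int, int>();

        foreach (int x in a)
        {
            if (lower.Count == 0 || x <= lower.Peek()) lower.Enqueue(x, -x);
            else upper.Enqueue(x, x);

            // Rebalance so the heap sizes differ by at most 1.
            if (lower.Count > upper.Count + 1) { int v = lower.Dequeue(); upper.Enqueue(v, v); }
            else if (upper.Count > lower.Count + 1) { int v = upper.Dequeue(); lower.Enqueue(v, -v); }

            double med = lower.Count == upper.Count
                ? (lower.Peek() + upper.Peek()) / 2.0
                : (lower.Count > upper.Count ? lower.Peek() : upper.Peek());
            Console.WriteLine(med);
        }
    }

    public static void Main()
    {
        // Same stream as above; prints the same medians 5, 10, 10, 12.5, 10.
        PrintMedian(new[] { 5, 15, 10, 20, 3 });
    }
}

Each element now costs O(log n) for the heap operations, so the whole stream is handled in O(n log n) total, with O(n) space as before.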
8_A. Exercise (also partially described in video 04)
Generate and represent m “sample paths” of n point each (m, n are program parameters), where each point represents a pair of:
time index t, and relative frequency of success f(t),
where f(t) is the sum of t Bernoulli random variables with distribution B(x, p) = p^x(1-p)^(1-x) observed at the various times up to t: j = 1, …, t.
At time n (the last time) and at one other chosen inner time t < n, represent the distribution of f(t), either with the absolute frequency n(t) or with the normalized relative frequency: f(t) / sqrt(p(1-p)/n).
Comment briefly on the result.
(courtesy: homework screenshot by Lorenzo Zara, year 2020)
(The general scheme of this exercise will also be “reused” in the next homeworks, where we will consider other, more interesting stochastic processes.)
6_RA. Do a web research about the various methods proposed to compute the running median (one-pass, online algorithms).
Store (citing all sources and attributions) the algorithm(s) that you think is (are) a good candidate, explaining briefly how it works, and possibly try a quick demo.
This video shows how application 7_A works (in C#).
The form is composed of:
a button;
a label;
a richTextBox;
a textBox;
a pictureBox;
an openFileDialog.
Pressing the “Load” button loads the data from the CSV file specified in the textBox, and the data appear in the richTextBox. The scatterplot, two histograms and the linear regression will be drawn in the pictureBox.