Marginal distribution

In probability theory and statistics, the marginal distribution of a subset of a collection of random variables is the probability distribution of the variables contained in the subset. It gives the probabilities of various values of the variables in the subset without reference to the values of the other variables. This contrasts with a conditional distribution, which gives the probabilities contingent upon the values of the other variables.

Marginal variables are those variables in the subset of variables being retained. These concepts are “marginal” because they can be found by summing values in a table along rows or columns, and writing the sum in the margins of the table. The distribution of the marginal variables (the marginal distribution) is obtained by marginalizing – that is, focusing on the sums in the margin – over the distribution of the variables being discarded, and the discarded variables are said to have been marginalized out.
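As a concrete illustration (a minimal sketch in Python using NumPy, with made-up numbers), a joint distribution of two discrete variables can be stored as a table, and each marginal is obtained by summing along the rows or the columns, exactly as one would write the sums in the margins:

    import numpy as np

    # Hypothetical joint probability table P(X = xi, Y = yj):
    # rows index the values of X, columns index the values of Y.
    joint = np.array([
        [0.10, 0.20, 0.10],
        [0.05, 0.25, 0.30],
    ])

    # Marginal of X: sum each row across the columns (the row margin).
    p_x = joint.sum(axis=1)   # [0.40, 0.60]

    # Marginal of Y: sum each column down the rows (the column margin).
    p_y = joint.sum(axis=0)   # [0.15, 0.45, 0.40]

    # A valid joint distribution sums to 1, and so does each marginal.
    assert np.isclose(joint.sum(), 1.0)
    print("P(X) =", p_x)
    print("P(Y) =", p_y)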

Joint distribution

Given random variables X, Y, … defined on a probability space, the joint probability distribution for X, Y, … is a probability distribution that gives the probability that each of X, Y, … falls in any particular range or discrete set of values specified for that variable. In the case of only two random variables this is called a bivariate distribution, but the concept generalizes to any number of random variables, giving a multivariate distribution.

The joint probability distribution can be expressed in terms of a joint cumulative distribution function, or in terms of a joint probability density function (in the case of continuous variables) or a joint probability mass function (in the case of discrete variables). These in turn can be used to find two other types of distributions: the marginal distribution, giving the probabilities for any one of the variables with no reference to any specific ranges of values for the other variables, and the conditional probability distribution, giving the probabilities for any subset of the variables conditional on particular values of the remaining variables.

Conditional distribution

In probability theory and statistics, given two jointly distributed random variables X and Y, the conditional probability distribution of Y given X is the probability distribution of Y when X is known to be a particular value. When both X and Y are categorical variables, a conditional probability table is typically used to represent the conditional probability. The conditional distribution contrasts with the marginal distribution of a random variable, which is its distribution without reference to the value of the other variable.
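For two categorical variables, such a conditional probability table can be sketched as follows (the variable values and probabilities below are hypothetical):

    # Hypothetical conditional probability table P(Y | X) for two categorical
    # variables: each inner dict is the distribution of Y for one fixed value
    # of X, so every row must sum to 1.
    cpt_y_given_x = {
        "sunny": {"walk": 0.7, "bus": 0.2, "car": 0.1},
        "rainy": {"walk": 0.1, "bus": 0.4, "car": 0.5},
    }

    for x_value, row in cpt_y_given_x.items():
        assert abs(sum(row.values()) - 1.0) < 1e-9, f"row for {x_value} must sum to 1"

    print(cpt_y_given_x["rainy"]["bus"])  # P(Y = bus | X = rainy) = 0.4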

Let’s focus now on the conditional frequency. Given two attributes X and Y, we want to compute the frequency of Xi given Yj; this means restricting the population to the subset of units whose value of Y is Yj.

To compute f(Xi|Yj), we count the statistical units that have Xi and Yj as values for X and Y respectively; dividing this count by the total number of units of observation gives the relative joint frequency f(Xi ∧ Yj), and dividing that in turn by the relative marginal frequency f(Yj) gives the conditional frequency.

In symbols:

f(Xi|Yj) = f(Xi ∧ Yj) / f(Yj) (this is an identity)

It is also true that:

f(Yj|Xi) = f(Yj ∧ Xi)/f(Xi)
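A small sketch of these two identities computed from raw counts (the data below are made up): the conditional frequency is the joint count divided by the count of the conditioning value, which equals the ratio of the corresponding relative frequencies.

    from collections import Counter

    # Hypothetical units of observation, each carrying a value for X and for Y.
    data = [("x1", "y1"), ("x1", "y2"), ("x2", "y1"),
            ("x1", "y1"), ("x2", "y2"), ("x2", "y1")]

    n = len(data)
    joint_counts = Counter(data)                 # occurrences of (Xi, Yj)
    counts_x = Counter(x for x, _ in data)       # occurrences of Xi
    counts_y = Counter(y for _, y in data)       # occurrences of Yj

    def f_x_given_y(xi, yj):
        """f(Xi | Yj) = f(Xi ∧ Yj) / f(Yj), computed from raw counts."""
        return joint_counts[(xi, yj)] / counts_y[yj]

    def f_y_given_x(yj, xi):
        """f(Yj | Xi) = f(Yj ∧ Xi) / f(Xi)."""
        return joint_counts[(xi, yj)] / counts_x[xi]

    # The ratio of the relative frequencies gives the same number.
    lhs = f_x_given_y("x1", "y1")                                 # 2 / 4 = 0.5
    rhs = (joint_counts[("x1", "y1")] / n) / (counts_y["y1"] / n)
    assert abs(lhs - rhs) < 1e-12
    print(lhs, f_y_given_x("y1", "x1"))                           # 0.5  0.666...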

Switching to the probability realm (to simplify the notation, write X for Xi and Y for Yj), this becomes:

P(X|Y) = P(X ∧ Y)/ P(Y)

But also:

P(Y|X) = P(Y ∧ X)/P(X)

Since P(X ∧ Y) = P(Y ∧ X), the two expressions above become:

P(X|Y) = P(X ∧ Y)/P(Y); multiplying both sides by P(Y), we obtain P(X|Y)*P(Y) = P(X ∧ Y)

P(Y|X) = P(X ∧ Y)/P(X); multiplying both sides by P(X), we obtain P(Y|X)*P(X) = P(X ∧ Y)

Therefore: P(X|Y)*P(Y) = P(Y|X)*P(X)

If we divide both sides by P(Y), we obtain the following:

P(X|Y) = P(Y|X)*P(X)/P(Y), which is known as Bayes’ theorem.
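A quick numerical sanity check of this derivation, with hypothetical probabilities:

    # Hypothetical probabilities for two events X and Y.
    p_x_and_y = 0.12   # P(X ∧ Y)
    p_x = 0.30         # P(X)
    p_y = 0.40         # P(Y)

    p_x_given_y = p_x_and_y / p_y      # P(X|Y) = P(X ∧ Y) / P(Y)
    p_y_given_x = p_x_and_y / p_x      # P(Y|X) = P(Y ∧ X) / P(X)

    # Both products recover the same joint probability ...
    assert abs(p_x_given_y * p_y - p_y_given_x * p_x) < 1e-12

    # ... and Bayes' theorem rewrites one conditional in terms of the other.
    assert abs(p_y_given_x * p_x / p_y - p_x_given_y) < 1e-12
    print(p_x_given_y)  # 0.3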

Given a set of statistical units with two attributes (X and Y), we can build a contingency table by computing each joint frequency. Let’s now focus on the conditional distributions along each column (the reasoning is the same for rows).

Imagine now drawing a chart for each column to represent the shape of its conditional distribution. The shape might differ from column to column: nothing ensures that every column has the same shape.

If, switching through the columns (which means changing the value of Y), the shape of the conditional distribution is exactly the same, we can conclude that the two variables X and Y are independent: we speak of perfect independence.

What we said can be formalized as:

For each i (that is, for every possible value of X): f(Xi|Yj) is the same for every j in [1; c] (where c is the number of distinct values of Y), and it is also equal to the marginal frequency f(Xi).

Expressing the definition above with formulas we obtain:

    \[\frac{n_{ij}}{n_{\bullet j}}=\frac{n_{i \bullet}}{n} \Rightarrow \frac{n_{ij}}{n_{i \bullet}}=\frac{n_{\bullet j}}{n} \Rightarrow n_{ij} n=n_{i\bullet }n_{\bullet j}\]

In terms of relative frequencies (dividing the last equality by n²):

Freq{X = Xi ∧ Y = Yj} = Freq{X = Xi} * Freq{Y = Yj}

If this expression holds for every pair (i, j), we can say that the two variables are independent. In terms of probability it is:

P(X ∧ Y) = P(X) * P(Y)
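The whole chain of criteria can be checked on a small, hypothetical contingency table: under perfect independence every column, once normalized, has the same shape as the marginal distribution of X, and n_ij · n = n_i• · n_•j holds cell by cell.

    import numpy as np

    # Hypothetical contingency table of counts n_ij: rows are values of X,
    # columns are values of Y. Every column is proportional to the same
    # profile (2 : 3), so the two variables are perfectly independent.
    table = np.array([
        [ 4,  8, 12],
        [ 6, 12, 18],
    ])

    n = table.sum()                   # grand total
    row_totals = table.sum(axis=1)    # n_i.
    col_totals = table.sum(axis=0)    # n_.j

    # Conditional distribution of X within each column Y = yj ...
    conditional_by_column = table / col_totals
    # ... compared with the marginal distribution of X.
    marginal_x = row_totals / n

    # Perfect independence: every column-conditional distribution equals the marginal.
    assert np.allclose(conditional_by_column, marginal_x[:, None])

    # Equivalent cell-by-cell criterion: n_ij * n = n_i. * n_.j.
    assert np.array_equal(table * n, np.outer(row_totals, col_totals))

    print(conditional_by_column)  # each column is [0.4, 0.6]
    print(marginal_x)             # [0.4, 0.6]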

References

Conditional probability distribution – Wikipedia

Joint probability distribution – Wikipedia

