Central Limit Theorem and its applications

The central limit theorem is one of the statistical theorems with the greatest practical impact. In this post I will not go through its mathematical proof; instead I will describe its characteristics with some practical examples. The central limit theorem states that, under fairly general conditions (independent variables with finite variance), the sum of many independent random variables, once suitably centred and scaled, tends towards a normal distribution as the number of variables increases. A normal, or Gaussian, distribution is a continuous probability distribution for a real-valued random variable. A classic normal distribution shape is the following:

and its properties are the following:

Denoting by μx and σx the mean and standard deviation of the starting distribution, the normal distribution obtained from the theorem is characterised by the following mean and standard deviation:
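For the mean of n independent values drawn from the starting distribution (the case used throughout this post), the standard results can be written as below. The symbol x̄ for the sample mean is notation introduced here for illustration.

```latex
\mu_{\bar{x}} = \mu_x, \qquad \sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}}
```

For the plain sum of the n values, rather than their mean, the corresponding values are n·μx and √n·σx.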

The most common application of the central limit theorem concerns the sample mean. For instance, take data with any starting distribution and randomly sample n values from it. Calculate the average of these values and denote it by xj. This mean is nothing more than the sum of n identically distributed variables divided by n. Repeating the sampling several times gives as many averages. According to the central limit theorem, these means are approximately normally distributed when n is large enough.
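The sampling procedure described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the exponential starting data, the values n = 30 and 2,000 repetitions, and all variable names are assumptions made for the example, not something taken from the original analysis.

```python
import random
import statistics

random.seed(0)  # fixed seed, chosen arbitrarily so the run is repeatable

# Hypothetical starting data: 20,000 skewed (exponential) values with mean 2.
population = [random.expovariate(1 / 2.0) for _ in range(20_000)]

n = 30               # values drawn in each sampling (assumed)
num_samples = 2_000  # number of repeated samplings (assumed)

# One sample mean x_j per sampling: draw n values at random and average them.
sample_means = [
    statistics.fmean(random.choices(population, k=n))
    for _ in range(num_samples)
]

mu_x = statistics.fmean(population)
sigma_x = statistics.pstdev(population)

print(f"mean of the data         : {mu_x:.3f}")
print(f"mean of the sample means : {statistics.fmean(sample_means):.3f}")
print(f"sigma_x / sqrt(n)        : {sigma_x / n ** 0.5:.3f}")
print(f"std of the sample means  : {statistics.pstdev(sample_means):.3f}")
```

If the theorem holds for this data, the first two printed values should be close to each other, and so should the last two.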

The distribution of the sample mean will be approximately normal, with the mean and standard deviation reported above. Being able to turn any distribution with finite variance into an approximately Gaussian one, with a smaller standard deviation than the original, is a very powerful tool in statistical analysis. This distribution can easily be converted into a standard normal distribution with the transformation
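Assuming the transformation meant here is the usual standardisation of the sample mean (subtract its mean, then divide by its standard deviation), it can be written as follows. The symbols x̄ for the sample mean and z for the standardised value are introduced here for illustration.

```latex
z = \frac{\bar{x} - \mu_x}{\sigma_x / \sqrt{n}}
```

With this scaling, z has mean 0 and standard deviation 1, whatever the values of μx, σx and n.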

Graphical demonstration of the central limit theorem for the sample mean

Now I will illustrate the central limit theorem graphically. Let’s start from arbitrary data, given by the following two distributions:

From each of these two distributions I draw 100 samples. Each sample consists of a single value (n = 1). The resulting values are distributed as follows:

The distributions obtained above closely reflect the starting ones, as expected, since with n = 1 each "mean" is just a single value from the original data. Let’s increase the number of values in each sample from n = 1 to n = 10:

Now both distributions begin to take on a Gaussian shape. It can also be seen that the standard deviation decreases as n increases. This is the expected result, as shown by the previous formulas. Averaging over more values lets the means take on many more distinct values, so the distribution looks more continuous and therefore more and more similar to a Gaussian. Further increasing the number of samplings makes the approximation increasingly evident.
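The shrinking spread can also be checked numerically. The sketch below repeats the experiment with 100 samplings per setting, as in the text. The two starting distributions (a uniform and an exponential one), the extra case n = 100, and the helper names are stand-ins chosen for this example, since they are not the distributions plotted above.

```python
import random
import statistics

random.seed(1)  # fixed seed, chosen arbitrarily so the run is repeatable


def draw_uniform():
    """One value from a hypothetical uniform starting distribution on [0, 10]."""
    return random.uniform(0.0, 10.0)


def draw_exponential():
    """One value from a hypothetical exponential starting distribution (mean 3)."""
    return random.expovariate(1 / 3.0)


def sample_means(draw, n, num_samples=100):
    """Return num_samples averages, each taken over n freshly drawn values."""
    return [
        statistics.fmean([draw() for _ in range(n)])
        for _ in range(num_samples)
    ]


spread = {}
for name, draw in [("uniform", draw_uniform), ("exponential", draw_exponential)]:
    for n in (1, 10, 100):
        means = sample_means(draw, n)
        spread[(name, n)] = statistics.pstdev(means)
        print(f"{name:12s} n={n:3d}  std of sample means = {spread[(name, n)]:.3f}")
```

For both starting distributions the printed standard deviation should fall by roughly a factor of √10 each time n grows tenfold. Passing each list of means to a histogram function would reproduce the kind of plots discussed above.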