Many natural phenomena and industrial processes depend on variables whose values are unpredictable, that is, random. Examples are the outcome of rolling a die or the ambient temperature measured on the same day of the year at the same time. Such variables are called random (or stochastic) variables. Building a theory of any phenomenon that depends on random variables requires the concept of a probability distribution. In this blog we will go through the most common types of distribution and analyse them and their applications with real…
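As a small illustration of the die example above, here is a minimal sketch of a discrete probability distribution in Python. The function names (`fair_die_pmf`, `empirical_pmf`) and the choice of a fair six-sided die are assumptions made for this example only.

```python
import random
from collections import Counter
from fractions import Fraction


def fair_die_pmf(sides=6):
    """Theoretical distribution of a fair die: every face has probability 1/sides."""
    return {face: Fraction(1, sides) for face in range(1, sides + 1)}


def empirical_pmf(n_rolls, sides=6, seed=0):
    """Estimate the same distribution by simulating n_rolls rolls."""
    rng = random.Random(seed)  # fixed seed so the simulation is repeatable
    counts = Counter(rng.randint(1, sides) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, sides + 1)}


pmf = fair_die_pmf()
# Expected value of a fair six-sided die: (1 + 2 + ... + 6) / 6 = 7/2
expected_value = sum(face * p for face, p in pmf.items())
print(expected_value)
print(empirical_pmf(60_000))
```

With enough simulated rolls, the observed frequencies should settle close to the theoretical value of 1/6 for each face.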


In this blog I would like to clarify, from a mathematical point of view, how linear regression works. This method plays an important role in many machine learning algorithms, so it is extremely useful to understand it deeply.

The following is a common definition of linear regression:

In statistics, linear regression is a method of estimating the conditional expectation of a dependent or endogenous variable given the values of other independent or exogenous variables.

This simply means that through linear regression, it is possible to estimate the value of a “something” that varies as a…
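To make the definition concrete, here is a minimal sketch of the one-variable case, where the conditional expectation is modelled as E[y | x] = b0 + b1 * x and the coefficients are estimated by ordinary least squares. The function name and the sample data are hypothetical and only serve the illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: y grows roughly linearly with x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

intercept, slope = fit_simple_linear_regression(xs, ys)
# Estimated conditional expectation of y at a new value of x
print(intercept + slope * 6.0)
```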


Cross-validation is a statistical method used to estimate the skill of machine learning models. It is commonly used in applied machine learning to compare and choose a model for a given predictive modelling problem because it is easy to understand, easy to implement, and yields skill estimates that generally have a lower bias than other methods. In particular, k-fold cross-validation consists of splitting the total dataset into k parts of equal size; at each step, a different one of the k parts becomes the validation dataset, while the remaining parts constitute the training dataset. Thus, for each of…
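The splitting step described above can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the function name `k_fold_indices` is hypothetical, the data is not shuffled, and when the dataset size is not divisible by k the first folds simply receive one extra sample.

```python
def k_fold_indices(n_samples, k):
    """Yield (training_indices, validation_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    # Fold sizes differ by at most one when n_samples is not divisible by k
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        training = indices[:start] + indices[start + size:]
        yield training, validation
        start += size


splits = list(k_fold_indices(10, 5))
for training, validation in splits:
    print("train:", training, "validation:", validation)
```

Each sample ends up in the validation set exactly once, and a model would be trained and scored once per fold.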


What is web scraping?

First of all, let’s see what is meant by the term “web scraping”. Web scraping is a technique for extracting information from a web page. It can be done manually or with automated tools. The second way is usually preferred, as it can be less costly and faster, and it can be done either through dedicated software or with a programming language. Today we will learn how to scrape the web with the Python programming language.
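As a first taste of the automated approach, here is a minimal sketch that extracts all link targets from a page using only Python's standard library (`html.parser`). The class name `LinkExtractor` and the hard-coded `SAMPLE_HTML` string are assumptions for this example; a real scraper would first download the page and would often rely on a third-party library instead.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href attribute of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Hypothetical page content, inlined so the example runs without a network
SAMPLE_HTML = """
<html><body>
  <a href="https://example.com/a">First</a>
  <p>No link here</p>
  <a href="/b">Second</a>
</body></html>
"""

parser = LinkExtractor()
parser.feed(SAMPLE_HTML)
links = parser.links
print(links)
```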

Why is web scraping useful?

Web scraping is used for several purposes. For example, it allows…


In one of my previous blogs I talked about the difference between machine learning and programming languages. When talking about programming languages, I included a table listing the different programming languages. You can review it below:


Part 1.

“God doesn’t play dice with the universe…”.

The origin of the Monte Carlo method is usually linked with the birth of computers, and in particular with the research carried out by Fermi, Ulam and Von Neumann after World War II on neutron diffusion processes. There is little doubt that the method, given the high number of replications required, could only establish itself with the spread of faster and cheaper computers. …


Differences and similarities.

When I first approached the data science world I felt extremely confused about some core concepts, and I still think there is a lot of confusion around them, probably because they are so closely interconnected. I am talking specifically about the concepts of machine learning and programming languages.

What is Machine Learning?

Machine learning is an application of artificial intelligence (AI) that gives systems the ability to automatically learn and improve from experience without being explicitly programmed. The main goal is to enable computers to learn automatically, without being explicitly reprogrammed.

What is a Programming Language?

Programming…


The central limit theorem is one of the statistical theorems with the greatest practical implications. In this blog I will not show the mathematical proof of the central limit theorem, but I will describe its characteristics with some practical examples. The central limit theorem states that, under fairly general conditions, the suitably normalised sum of independent random variables converges in distribution to a normal distribution as the number of variables increases. A normal or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. A classic normal distribution shape is the following:
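To see the theorem at work numerically, here is a minimal simulation sketch. It assumes the summed variables are independent uniform values on [0, 1) (my choice for the example, as are the function names) and checks one signature of a normal distribution: roughly 68% of the values lie within one standard deviation of the mean.

```python
import random
import statistics


def sample_sums(n_terms, n_samples, seed=0):
    """Draw n_samples values, each the sum of n_terms independent uniform(0, 1) variables."""
    rng = random.Random(seed)  # fixed seed so the simulation is repeatable
    return [sum(rng.random() for _ in range(n_terms)) for _ in range(n_samples)]


def fraction_within_one_sd(values):
    """Share of values lying within one standard deviation of their mean."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(abs(v - mu) <= sd for v in values) / len(values)


single = sample_sums(1, 20_000)   # one uniform variable: clearly not normal
sums = sample_sums(30, 20_000)    # sum of 30 uniforms: close to normal

print(fraction_within_one_sd(single))
print(fraction_within_one_sd(sums))
```

For a single uniform variable the fraction stays well below 68%, while for the sum of 30 variables it should be close to the normal value, even though none of the individual variables is normally distributed.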


What they are and when their use is required

I first came across the concept of synthetic data when I was dealing with an imbalanced data set. As most of you may have experienced, working with imbalanced data sets can be a real challenge. In fact, when the problem of imbalanced data sets is underestimated, the consequence is poor performance with most machine learning algorithms.

One common approach to working with imbalanced datasets is to oversample the minority class. The easiest way is to duplicate examples in the minority class; however, the additional data points don’t…
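The duplication approach mentioned above can be sketched as follows. This is a minimal illustration of random oversampling under my own assumptions (the function name `random_oversample` and the toy data are hypothetical); it copies existing minority examples and does not generate synthetic ones.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen examples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    new_samples, new_labels = list(samples), list(labels)
    for label, count in counts.items():
        # Original examples belonging to this class
        pool = [s for s, l in zip(samples, labels) if l == label]
        for _ in range(target - count):
            new_samples.append(rng.choice(pool))  # exact copy of an existing example
            new_labels.append(label)
    return new_samples, new_labels


# Hypothetical imbalanced data: six examples of class 0, two of class 1
features = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
classes = [0, 0, 0, 0, 0, 0, 1, 1]

balanced_features, balanced_classes = random_oversample(features, classes)
print(Counter(balanced_classes))
```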


Functions are a hot topic in Python. For programming beginners, it is hard to build complex functions, because a lot of practice is needed before they can be used with ease. In this article we will go through all the elements of a function, explaining first what a function is and why it is needed, and then going through the syntax, components, and types of functions.

Let’s get started!

What is a function?

A function is a set of statements that takes inputs, performs some specific task, and gives back an output. Functions are extremely useful because when…
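As a minimal sketch of that definition, the hypothetical function below takes two inputs, performs one task, and returns an output:

```python
def rectangle_area(width, height):
    """Return the area of a rectangle given its width and height."""
    area = width * height  # the task performed on the inputs
    return area            # the output handed back to the caller


# The same function can be reused with different inputs
print(rectangle_area(3, 4))   # 12
print(rectangle_area(2.5, 2)) # 5.0
```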

Soledad Musella Rubio
