How it works

One important difference between machine learning and traditional statistics is that the focus is not on causality but on prediction.

Machine learning tools are ideally suited to complex environments where the right decision may depend on a large number of variables (i.e. “wide” data).

When an important business decision requires an accurate prediction, you can rely on machine learning both to produce that prediction and to validate how reliable it is.

Separating the signal from the noise

A machine learning algorithm is based on three broad concepts: feature extraction, regularization and cross-validation.

Feature extraction is the process of figuring out which variables the model will use. You could simply drop all of the raw data straight into the model, but many machine learning techniques instead build new variables, called “features”, that cluster together important signals spread across many variables. Two common examples of feature extraction are (1) face recognition, where the “features” are actual facial features calculated from information in many different pixels of an image; and (2) text data, where underlying “topics” of discussion are extracted based on which words and phrases tend to appear together in the same documents.
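
To make this concrete, here is a minimal sketch of feature extraction on text, assuming scikit-learn is available; the toy documents and the choice of two “topics” are illustrative, not prescriptive. It compresses raw word counts into a small number of features that group words which tend to appear together.

```python
# Minimal feature-extraction sketch (illustrative data and settings).
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "shipping delay refund customer complaint",
    "refund request shipping damaged package",
    "quarterly revenue growth profit forecast",
    "profit margin revenue earnings forecast",
]

# Turn raw words into weighted counts, then compress them into 2 "topics"
# (features) that group words which tend to appear together.
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)

nmf = NMF(n_components=2, random_state=0)
doc_topics = nmf.fit_transform(X)   # each document's mix of the 2 topics
topic_words = nmf.components_       # each topic's weight on every word

print(doc_topics.round(2))
```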

Regularization determines whether the extracted features actually reflect signal rather than noise. Your model learns from both signal and noise, so you need to strike a balance between a conservative model and a flexible one. Keeping the model conservative, so that it does not make hasty judgments, is called “regularization”; letting it become too flexible, so that it absorbs too much noise, is called “over-fitting”, because the model learns patterns that won’t hold up in the future.

You can get the right balance between a flexible model and a conservative one by adding a “complexity penalty”. The penalty has two kinds of effects on a model, “selection” and “shrinkage”. “Selection” is when the algorithm focuses on only the few features that contain the best signal and discards the others; “shrinkage” is when the algorithm reduces each feature’s influence, so that a prediction is not overly reliant on any one feature. The most popular form of regularization is called “LASSO”, which stands for Least Absolute Shrinkage and Selection Operator.
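
Here is a minimal sketch of LASSO in practice, assuming scikit-learn; the synthetic data (five informative variables plus fifteen pure-noise variables) and the penalty strength are illustrative choices. Comparing ordinary regression with LASSO shows selection (noise coefficients driven to exactly zero) and shrinkage (the surviving coefficients pulled toward zero).

```python
# Minimal LASSO sketch: selection and shrinkage via a complexity penalty.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
true_coefs = np.zeros(20)
true_coefs[:5] = [3.0, -2.0, 1.5, 1.0, -0.5]   # only 5 features carry signal
y = X @ true_coefs + rng.normal(scale=1.0, size=200)

plain = LinearRegression().fit(X, y)           # no penalty: fits every feature
lasso = Lasso(alpha=0.1).fit(X, y)             # alpha is the complexity penalty

print("nonzero coefficients without penalty:", np.sum(plain.coef_ != 0))
print("nonzero coefficients with LASSO:     ", np.sum(lasso.coef_ != 0))
```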

How do you ensure the model is making good predictions? Cross-validation tests whether the model is accurate in “out of sample” cases, that is, when it makes predictions for data it has not encountered before. This matters because you will want to use the model to make new decisions, and you need to know that it can do so consistently. The best way to test prediction accuracy “out of sample” is to score the model on data that was not used to train it.
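
A minimal sketch of cross-validation follows, again assuming scikit-learn; the ridge model, the five folds and the synthetic data are illustrative choices. Each fold trains on four fifths of the data and scores on the held-out fifth, so every score is an “out of sample” measure of accuracy.

```python
# Minimal cross-validation sketch: every score comes from held-out data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))
y = X[:, 0] * 2 - X[:, 1] + rng.normal(scale=0.5, size=300)

# 5-fold cross-validation: train on 4/5 of the data, score on the other 1/5,
# rotating the held-out fifth each time.
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2")
print("out-of-sample R^2 per fold:", scores.round(3))
print("average:", scores.mean().round(3))
```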

Every dataset has a mix of signal and noise, and these concepts help you to make better predictions by identifying and separating “signal” (valuable, consistent relationships you want to learn) from “noise” (random correlations that you want to avoid because they will not occur again).
