Evaluating Performance: Regression

Rishi Kumar · Published in Nerd For Tech · 3 min read · Jun 12, 2021

Whatever we build, we need evaluation metrics to tell us whether it is doing well or badly.

I believe you all know what a regression task is. For example:

  • Attempting to predict the price of a house given its features is a regression task.
  • Attempting to predict the country a house is in given its features would be a classification task.

The most common evaluation metrics for regression:

→ Mean Absolute Error (MAE).

→ Mean Squared Error (MSE).

→ Root Mean Squared Error (RMSE).

Mean Absolute Error (MAE):

  • Mean Absolute Error is calculated by taking the mean of the absolute differences between the actual and predicted values.
Figure 1: MAE formula
  • The major drawback of MAE is that it does not punish large errors any more than small ones.
Figure 2: Anscombe’s Quartet
  • In Anscombe’s quartet, the best-fit line is the same for very different datasets. MAE does not account for outliers.
  • As a specific situation, consider the image below.
Figure 3: Outlier in prediction
  • There is a huge outlier in this image, and MAE does not give it any extra weight.
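As a quick sketch (using NumPy, with made-up house prices rather than real data), MAE can be computed like this:

```python
import numpy as np

# Made-up actual and predicted house prices (in rupees)
y_true = np.array([200_000, 250_000, 300_000, 350_000])
y_pred = np.array([210_000, 240_000, 310_000, 340_000])

# MAE = mean of |actual - predicted|
mae = np.mean(np.abs(y_true - y_pred))
print(mae)  # 10000.0
```

Here every prediction is off by 10,000 rupees, so the MAE is 10,000 rupees as well.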

Figure 4: Outlier

  • We want our error metric to take account of these large errors. That is why we move to other error metrics like MSE.


Mean Squared Error (MSE):

  • Mean Squared Error is calculated by taking the average of the squared differences between the actual and predicted values.
  • Because the errors are squared, large errors count for much more than they do with MAE, which makes MSE more popular.
Figure 4: MSE formula
  • However, MSE has a problem. Squaring the errors (which conveniently turns negative values positive) also squares the units. For example, squaring house-price errors turns rupees into rupees squared, which is difficult for us to interpret. To overcome this, we move to RMSE.
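A small illustration (again with made-up numbers) of why MSE notices a single large error much more than MAE does:

```python
import numpy as np

y_true = np.array([100.0, 100.0, 100.0, 100.0])
pred_even = np.array([102.0, 98.0, 102.0, 98.0])       # four errors of 2
pred_outlier = np.array([100.0, 100.0, 100.0, 108.0])  # one error of 8

def mae(a, b):
    return np.mean(np.abs(a - b))

def mse(a, b):
    return np.mean((a - b) ** 2)

# Both prediction sets have the same MAE...
print(mae(y_true, pred_even), mae(y_true, pred_outlier))  # 2.0 2.0
# ...but MSE punishes the single large error much more
print(mse(y_true, pred_even), mse(y_true, pred_outlier))  # 4.0 16.0
```

The outlier prediction looks just as good as the even one under MAE, but four times worse under MSE.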

Root Mean Squared Error (RMSE):

  • RMSE is calculated simply by taking the square root of the Mean Squared Error.
Figure 5: RMSE formula
  • RMSE is the most popular metric because it both punishes large errors and has the same units as the predicted value (y).
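Continuing the earlier sketch, taking the square root brings the error back to the original units (rupees rather than rupees squared):

```python
import numpy as np

y_true = np.array([200_000.0, 250_000.0, 300_000.0, 350_000.0])
y_pred = np.array([210_000.0, 240_000.0, 310_000.0, 340_000.0])

mse = np.mean((y_true - y_pred) ** 2)  # units: rupees squared
rmse = np.sqrt(mse)                    # units: rupees again
print(rmse)  # 10000.0
```

(In this particular example every error happens to be the same size, so RMSE equals MAE; with uneven errors RMSE would be larger.)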

→ Now one very common question arises:

Is this value of RMSE good?

  • It depends entirely on the business context.
  • An RMSE of $10 is fantastic for predicting the price of a house, but horrible for predicting the price of a candy bar.
  • Compare your error metric to the average value of the label in your dataset to get an intuition of overall performance.
  • Domain knowledge also plays an important role.
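One rough way to build that intuition (a sketch with made-up house prices) is to express RMSE as a fraction of the average label value:

```python
import numpy as np

y_true = np.array([250_000.0, 300_000.0, 400_000.0])
y_pred = np.array([260_000.0, 295_000.0, 390_000.0])

rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
relative_error = rmse / np.mean(y_true)
print(f"RMSE is {relative_error:.1%} of the average house price")
```

An error of a few percent of the typical price reads very differently from an error of 50%, whatever the absolute number is.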

I hope this gave you an idea of which error metrics to keep in mind.

Thanks for reading!
