Comparing Machine Learning Loss Algorithms

Introduction

This project is a research paper that I wrote and conducted, with guidance from my Computer Science teacher, to empirically test the efficiency and accuracy of various loss algorithms used in machine learning models.

The following research question was investigated: To what extent do different loss regression algorithms in machine learning affect the accuracy and efficiency of a machine learning model?

The loss algorithms tested are: Mean Squared Error, Mean Absolute Error, Huber Loss, LogCosh, Mean Absolute Percentage Error, and Mean Squared Logarithmic Error. There are many more loss algorithms, but these are among the most widely used and were therefore the most suitable for the experiment. For more information and the formulas, please refer to the attached paper.
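As a minimal sketch of what each formula computes, the plain-Python functions below implement the six losses for lists of true and predicted values. They are illustrative, not the Keras implementations used in the project: Keras adds safeguards such as clipping values away from zero, and the Huber `delta` of 1.0 is an assumption chosen to match the Keras default. The sample values are made up to show how one large error affects each loss.

```python
import math

def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean Absolute Error: average of the absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic for errors up to delta, linear beyond it."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        if err <= delta:
            total += 0.5 * err ** 2
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(y_true)

def log_cosh(y_true, y_pred):
    """LogCosh: log of the hyperbolic cosine of each error (math.cosh overflows for huge errors)."""
    return sum(math.log(math.cosh(p - t)) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent; undefined if any true value is 0."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def msle(y_true, y_pred):
    """Mean Squared Logarithmic Error; assumes all values are greater than -1."""
    return sum((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 5.0]  # the last prediction has a large error
for name, fn in [("MSE", mse), ("MAE", mae), ("Huber", huber),
                 ("LogCosh", log_cosh), ("MAPE", mape), ("MSLE", msle)]:
    print(name, round(fn(y_true, y_pred), 4))
```

Because MSE squares each error, the single large error dominates its value, while MAE, Huber, and LogCosh grow only linearly for large errors. MAPE divides by the true value and MSLE compares logarithms, so both depend on the scale of the labels.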

Background Information

Supervised machine learning is a process in which the machine learning algorithm is given a dataset containing various features and labels. Features are characteristics of an object that may correlate with the label. The label is the value to be predicted from the features provided. In supervised machine learning, a label is provided for every object in the dataset.

For example, if a machine learning algorithm were created to predict a person's maximum heart rate, some features might include age, amount of exercise, gender, or diet. The label in this scenario would simply be the person's maximum heart rate. Some features may correlate strongly with the label, while others may correlate weakly or not at all. The model multiplies each feature by a weight, a learned number that determines how much that feature contributes to the prediction. The goal of training is to find weights that make the model fit the data, so that it gives accurate predictions of the label from the features provided.
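To make the idea of a weight concrete, a linear model computes its prediction as a weighted sum of the features plus a bias. In the sketch below, the feature `exercise_hours_per_week` and the weight and bias values are hypothetical, chosen only for illustration; they are not results from this project.

```python
def predict(features, weights, bias):
    """Linear model: each feature times its weight, summed, plus a bias."""
    return bias + sum(w * x for w, x in zip(weights, features))

# Hypothetical learned values for the features [age, exercise_hours_per_week]
weights = [-0.7, 0.0]  # exercise has weight 0.0, so it has no effect here
bias = 208.0

print(predict([40.0, 5.0], weights, bias))
print(predict([40.0, 10.0], weights, bias))  # same result: that feature's weight is 0.0
```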

For the algorithm to adjust the weights to better fit the data, it needs to know how accurate the prediction was. This is where loss regression algorithms come in.

In supervised machine learning, loss algorithms are used to calculate the error between the predicted value and the actual value. The model then uses this error to adjust its weights and improve the solution. This cycle of computing the loss and updating the weights is repeated many times over the dataset until the loss reaches an acceptable value or begins to plateau. Different loss algorithms produce loss values on different scales, respond differently to outliers, and lead to different degrees of accuracy. Some loss algorithms can be more accurate, while others can be more efficient. Machine learning is often conducted on large datasets, so efficiency is an important factor to consider when choosing a loss algorithm.
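The loop described above can be sketched with plain full-batch gradient descent on MSE for a single-feature linear model. This is a simplified stand-in, assuming noise-free toy data (y = 2x + 1), a learning rate of 0.05, and 2,000 passes; the project itself used Keras with the RMSprop optimizer and mini-batches.

```python
def train_linear_mse(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by full-batch gradient descent on the MSE loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(epochs):
        preds = [w * x + b for x in xs]
        errors = [p - y for p, y in zip(preds, ys)]
        # Loss: how wrong the current weight and bias are
        losses.append(sum(e * e for e in errors) / n)
        # Gradients of the MSE with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Move the parameters against the gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, losses

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b, losses = train_linear_mse(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```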

Conducting the experiment

This project was done in Python using the TensorFlow and Keras libraries. The machine learning processes were run on an NVIDIA RTX 2060 GPU using CUDA.

Set up

The following libraries were used in the project.

The pandas library was used to read the CSV datasets into pandas DataFrames, which also display the data in a readable table.

The Keras library was used for the model and the loss algorithm implementations.

The NumPy library was used to convert the selected feature and label columns into arrays for training.

The time library's process_time function was used to measure the execution time of the algorithm.

    import pandas as pd
    from tensorflow import keras
    import numpy as np
    from time import process_time
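Because `process_time` is used for timing, it may help to note what it measures: CPU time consumed by the current process, not elapsed wall-clock time. A rough sketch comparing it with `time.perf_counter` on a plain-Python MSE calculation is below; the `time_call` helper and the synthetic data are hypothetical. Work performed on a GPU may not be fully reflected in CPU time, so wall-clock timing can be a useful cross-check.

```python
from time import perf_counter, process_time

def time_call(fn, *args):
    """Return (result, cpu_seconds, wall_seconds) for a single call of fn."""
    cpu_start, wall_start = process_time(), perf_counter()
    result = fn(*args)
    cpu_elapsed = process_time() - cpu_start
    wall_elapsed = perf_counter() - wall_start
    return result, cpu_elapsed, wall_elapsed

def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Synthetic data: every prediction is off by exactly 0.5
y_true = [float(i) for i in range(100_000)]
y_pred = [v + 0.5 for v in y_true]
loss, cpu_s, wall_s = time_call(mse, y_true, y_pred)
print(f"MSE={loss}, CPU time={cpu_s:.4f}s, wall time={wall_s:.4f}s")
```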

Dataset Selection

One of the most challenging aspects of machine learning is sanitizing datasets. Data often comes in undesirable formats, and it is up to the data scientist to sanitize the dataset, i.e. to prepare the data so that it is normalized and readable by the program. Since this was outside the scope of this project, the datasets were acquired from Kaggle, a data science community where people share datasets.
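The project avoided this step by using pre-cleaned Kaggle data, but two common sanitization steps, dropping unusable entries and min-max normalization to the range [0, 1], can be sketched in plain Python. The `clean_and_normalize` function is hypothetical and works on a single column of raw strings; in a real dataset, rows should be dropped from features and labels together so that each pair stays aligned.

```python
import math

def clean_and_normalize(raw_values):
    """Drop blank, non-numeric, or non-finite entries, then min-max scale to [0, 1]."""
    numbers = []
    for value in raw_values:
        try:
            number = float(value)
        except (TypeError, ValueError):
            continue  # skip entries such as "" or "n/a"
        if math.isfinite(number):  # skip NaN and infinity
            numbers.append(number)
    if not numbers:
        return []
    low, high = min(numbers), max(numbers)
    if high == low:
        return [0.0 for _ in numbers]  # constant column: avoid dividing by zero
    return [(x - low) / (high - low) for x in numbers]

print(clean_and_normalize(["40", "", "60", "n/a", "50"]))  # [0.0, 1.0, 0.5]
```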

Three datasets of varying sizes were chosen for this project: “Heart Failure Prediction Dataset” consisting of 1,000 entries, “California Housing Dataset” consisting of 17,000 entries, and “Uber and Lyft Dataset Boston” consisting of 600,000 entries. The datasets differ in size so that any dependence of the loss algorithms' behavior on dataset size can be observed. All three contain well-documented real-world data, which gives insight into how the different loss algorithms would function in a real-world application.

Testing Procedure

A feature and a label were chosen from each dataset. The Heart dataset will use Age vs Max Heart Rate. The Housing dataset will use Median Income vs Median House Price. The Uber and Lyft dataset, referred to here as the Temperature dataset, will use Temperature vs Apparent Temperature.

The optimizer is held constant: Root Mean Square Propagation (RMSprop).
The metric used to evaluate accuracy is Root Mean Squared Error (RMSE). Further detail on the metric is given in the Evaluating Accuracy section below.
To test different loss algorithms, only the loss parameter is changed.
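For reference, the textbook RMSprop update keeps a running average of squared gradients and divides each step by its square root, so parameters with consistently large gradients take smaller steps. The single-parameter sketch below assumes `rho=0.9` and `epsilon=1e-7` (the Keras defaults) and the `learning_rate=0.01` used in this project; implementations differ slightly in where epsilon is added, so this is not a line-for-line copy of Keras. The function `rmsprop_minimize` is a hypothetical helper.

```python
import math

def rmsprop_minimize(grad_fn, w, learning_rate=0.01, rho=0.9, epsilon=1e-7, steps=2000):
    """Minimize a one-parameter function with the textbook RMSprop update rule."""
    v = 0.0  # running average of squared gradients
    for _ in range(steps):
        g = grad_fn(w)
        v = rho * v + (1.0 - rho) * g * g
        # Scale the step by the root of the running mean square of gradients
        w -= learning_rate * g / (math.sqrt(v) + epsilon)
    return w

# Example: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
w_final = rmsprop_minimize(lambda w: 2.0 * (w - 3.0), w=0.0)
print(round(w_final, 2))
```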

Epochs and batch size are held constant when testing different loss algorithms on the same dataset, but change from one dataset to another. In machine learning, each full pass through the dataset is called an epoch, so the number of epochs is the number of times the algorithm has passed through the entire dataset. Batch size is the number of samples the model processes before updating its parameters. Increasing the batch size generally speeds up training, because fewer parameter updates are made per epoch, but it also increases memory usage.
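The relationship between dataset size, batch size, and epochs is simple arithmetic: each epoch makes one weight update per batch, with a final partial batch counted as a batch. The sketch below uses the 1,000-entry Heart dataset with the `batch_size=10` and `epochs=200` from the Code Implementation section, plus a hypothetical batch size of 100 for comparison; `weight_updates` is an illustrative helper.

```python
import math

def weight_updates(num_samples, batch_size, epochs):
    """Return (updates per epoch, total updates); a final partial batch still counts."""
    updates_per_epoch = math.ceil(num_samples / batch_size)
    return updates_per_epoch, updates_per_epoch * epochs

print(weight_updates(1000, 10, 200))   # (100, 20000)
print(weight_updates(1000, 100, 200))  # (10, 2000)
```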

Evaluating Accuracy

Evaluating the accuracy of the different loss algorithms is a bit tricky. The loss values obtained from the different algorithms are not directly comparable, as they are of varying magnitudes and have different interpretations. To solve this comparison problem, a metric is used. Metrics in machine learning are used to evaluate the performance of a model, independently of the loss being minimized. The metric used in this project, Root Mean Squared Error, is the standard used to evaluate the accuracy of every loss algorithm. An RMSE value is generated for each loss algorithm tested, giving a common value that can be used for comparison.
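A small example shows why raw loss values can mislead. The labels and predictions below are hypothetical: one model's MSE loss and another model's MAE loss cannot be compared directly, but computing RMSE on both models' predictions puts them on the same scale.

```python
import math

def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared errors (squared units)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean Absolute Error: average of the absolute errors (label units)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the MSE, in label units."""
    return math.sqrt(mse(y_true, y_pred))

# Hypothetical labels and predictions from two models
y_true = [150.0, 160.0, 170.0]
preds_mse_model = [152.0, 158.0, 171.0]  # model trained with MSE loss
preds_mae_model = [151.0, 161.0, 174.0]  # model trained with MAE loss

print(mse(y_true, preds_mse_model))  # 3.0, that model's own loss
print(mae(y_true, preds_mae_model))  # 2.0, that model's own loss
print(rmse(y_true, preds_mse_model), rmse(y_true, preds_mae_model))
```

Here the raw losses (3.0 versus 2.0) would suggest the MAE model is better, while RMSE shows the opposite (about 1.73 versus 2.45).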

Code Implementation

    # Read the dataset into a pandas DataFrame
    data = pd.read_csv(r"heart.csv")
    # Convert the chosen columns into NumPy arrays
    # (the specific feature and label change for each dataset).
    # The feature is reshaped to (n_samples, 1) to match the model's input.
    features = data['Age'].to_numpy(dtype='float32').reshape(-1, 1)
    label = data['MaxHR'].to_numpy(dtype='float32')
    # Create a Keras Sequential model with a single Dense unit,
    # i.e. a linear model: prediction = weight * feature + bias
    model = keras.models.Sequential()
    model.add(keras.Input(shape=(1,)))
    model.add(keras.layers.Dense(units=1))
    # Compile the model with the specified optimizer, loss, and metric.
    # The loss is changed to fit the current experiment:
    # keras.losses.MeanSquaredError()
    # keras.losses.MeanAbsoluteError()
    # keras.losses.Huber()
    # keras.losses.LogCosh()
    # keras.losses.MeanAbsolutePercentageError()
    # keras.losses.MeanSquaredLogarithmicError()
    model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=0.01),
                  loss=keras.losses.MeanSquaredError(),
                  metrics=[keras.metrics.RootMeanSquaredError()])
    # Start the timer immediately before training
    t1 = process_time()
    # model.fit trains the model using the specified parameters
    history = model.fit(
        features,
        label,
        batch_size=10,
        epochs=200
    )
    # Stop the timer
    t2 = process_time()
    # Print the CPU time taken by training, in seconds
    print(t2 - t1)
    # Print the learned weight and bias
    weight, bias = model.get_weights()
    print(weight)
    print(bias)

Results

From most to least efficient, the algorithms ranked: MSE, MAE, LogCosh, MSLE, MAPE, and Huber. The tables containing the results can be found in the attached paper.

Effect on Efficiency

The extent to which different loss regression algorithms affect the efficiency of a machine learning model is moderate. When training on small datasets, the time discrepancy between different algorithms is only a few seconds and has little to no statistical significance. However, when scaled up to datasets containing hundreds of thousands to millions of data points, the time discrepancy increases. Even so, some algorithms, such as MAE and LogCosh, remained consistent with the control, MSE.

Effect on Accuracy

The extent to which different loss regression algorithms affect the accuracy of a machine learning model is usually slight but can be high in certain situations. Which algorithm is best depends greatly on the dataset; however, the accuracy of the algorithms did not differ by much, except for MAPE, which was strongly affected by the scale of the dataset and, in turn, strongly affected the model's accuracy. MSE, the most widely used loss regression algorithm, proved to be both the most efficient and the most accurate. The accuracy result is partly expected: RMSE is the square root of MSE, so a model trained with MSE loss is directly minimizing the quantity the evaluation metric measures.
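One way to see MAPE's sensitivity to scale: it divides each error by the true value, so the same absolute error is a tiny percentage for large labels and a huge one for labels near zero. The values below are hypothetical and only illustrate the formula; they are not data from this project.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: average absolute error, in label units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent; undefined if any true value is 0."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Every prediction is off by exactly 2, at two different label scales
large_true = [100.0, 120.0, 140.0]
small_true = [1.0, 2.0, 4.0]
large_pred = [t + 2.0 for t in large_true]
small_pred = [t + 2.0 for t in small_true]

print(mae(large_true, large_pred), mape(large_true, large_pred))  # MAE 2.0, MAPE about 1.7%
print(mae(small_true, small_pred), mape(small_true, small_pred))  # MAE 2.0, MAPE about 116.7%
```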

Conclusion

Machine learning is a quickly evolving field and is responsible for many of today's new technologies. Autonomous vehicles, pattern recognition, and predictive analysis all rely on machine learning models, and these models must be accurate. Even a small error can lead to a wrong label, which may be the difference between detecting a pedestrian and seeing open road. Training these models can also take a long time, so the efficiency of training a model is of great importance.

Future Work

This was a simple but fun experiment to examine the effects of different loss algorithms on machine learning models. I hope to work on more, and more complex, machine learning projects in the future.

Full Paper

Click here to view the full paper