Getting Started with ML.NET for Machine Learning: Building LLMs for AI Applications

In previous articles, we've highlighted concepts surrounding Artificial Intelligence (AI) and Machine Learning (ML) as they relate to the governance, development, and infrastructure needed to build, deploy, and maintain responsible AI/ML applications and their respective models. In this article, we will highlight a simple and efficient way to get acquainted with tooling dedicated to simplifying the process of developing, training, fine-tuning, and evaluating machine learning algorithms. This also serves as a way to couple theory with practice, aligning many of the concepts and much of the terminology with hands-on experience.

The tooling and steps outlined in this article may not be applicable to your AI/ML use case. However, if you are looking for a practical guide that uses a simple dataset to outline things to consider when developing more robust AI and machine learning use cases, then this article is for you.

Prerequisites

In this section, you'll find a list of prerequisites. We'll walk through these steps in more depth below; if you want to skip ahead, make sure you have the following in place:

Download the .NET SDK
Install the ML.NET CLI
A Sample Dataset

Installation

In this section, we'll walk through the tools to install and their respective purposes. If you have already installed these tools, you may prefer to skip ahead to the Build Process section of this article.

Install .NET SDK

Download and install the .NET SDK for the appropriate operating system.

.NET is an open-source, cross-platform developer platform that enables developers to build applications written in C#, F#, and Visual Basic. It can be installed from Visual Studio Code via the extensions capability. It provides an extensive set of libraries and APIs to integrate required workflows and functionality into applications and microservices.

Note: After installation, my terminal did not automatically detect the dotnet CLI, so I had to create a symbolic link from the installation directory into a bin folder that is on the PATH. You can run the following command to list the directories on your PATH and then check the relevant bin folder for confirmation:

echo $PATH

You may alternatively use a package manager that is installable on your OS.
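As a minimal sketch of the check described in the note above, the following shell function reports whether a command can be found through the PATH. The function name check_cli is hypothetical, and the symbolic link shown in the comment assumes the macOS installer placed the SDK under /usr/local/share/dotnet; adjust both paths to match your own system.

```shell
# Report whether a command is reachable through PATH.
# check_cli is a hypothetical helper name used only for this sketch.
check_cli() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at $(command -v "$1")"
  else
    echo "$1 not found on PATH"
    return 1
  fi
}

# If dotnet is missing, a symbolic link is one possible fix.
# ASSUMPTION: the SDK lives in /usr/local/share/dotnet (adjust as needed):
#   sudo ln -s /usr/local/share/dotnet/dotnet /usr/local/bin/dotnet
check_cli dotnet || echo "hint: link or add the dotnet install directory to PATH"
```

The function only inspects the environment and never modifies it, so it is safe to re-run after each change to confirm the result.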

Install ML.NET CLI

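To build machine learning models from the command line, you'll need to install the ML.NET CLI. The following command, run in a bash shell, installs the tool on your local workstation. The package name is platform-specific; the one shown here targets macOS on Apple Silicon (arm64).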

dotnet tool install -g mlnet-osx-arm64
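If you are not on an Apple Silicon Mac, the package name will differ. The sketch below derives a package name from the operating system and CPU architecture reported by uname. The helper name mlnet_package is hypothetical, and the mlnet-<os>-<arch> naming pattern is an assumption extrapolated from the package name used above, so confirm the exact name for your platform before installing.

```shell
# Map `uname` output to an ML.NET CLI package name.
# ASSUMPTION: packages follow the mlnet-<os>-<arch> pattern seen in mlnet-osx-arm64.
mlnet_package() {
  os="$1"
  arch="$2"
  case "$os" in
    Darwin) os_part="osx" ;;
    Linux)  os_part="linux" ;;
    *) echo "unsupported OS: $os" >&2; return 1 ;;
  esac
  case "$arch" in
    arm64|aarch64) arch_part="arm64" ;;
    x86_64|amd64)  arch_part="x64" ;;
    *) echo "unsupported architecture: $arch" >&2; return 1 ;;
  esac
  echo "mlnet-$os_part-$arch_part"
}

# Print (rather than run) the install command for the current machine.
echo "dotnet tool install -g $(mlnet_package "$(uname -s)" "$(uname -m)")"
```

The sketch prints the command instead of executing it, so you can review the package name first.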

Note: The "mlnet" tool is installed in the ".dotnet/tools" folder within your user directory, so you can quickly export the path for the current session and decide on a permanent configuration later.

export PATH="$PATH:$HOME/.dotnet/tools"

Execute the "mlnet" command in your terminal to confirm installation.
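Running the export more than once appends the same directory to the PATH repeatedly. As a small sketch, the hypothetical add_to_path function below appends a directory only when it is not already listed, and then checks whether mlnet can be found. It changes the PATH for the current shell session only.

```shell
# Append a directory to PATH only if it is not already listed.
# add_to_path is a hypothetical helper name used only for this sketch.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) echo "already on PATH: $1" ;;
    *) PATH="$PATH:$1"; export PATH; echo "added to PATH: $1" ;;
  esac
}

add_to_path "$HOME/.dotnet/tools"

# Confirm whether the shell can now locate the tool.
if command -v mlnet >/dev/null 2>&1; then
  echo "mlnet is available"
else
  echo "mlnet not found yet"
fi
```

To make the change permanent, the same export line can be placed in your shell's startup file.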

Build Process

Create an app directory

Create a local working directory.

mkdir MLApp

Navigate into this directory.

cd MLApp

Load your data

Prior to handling data, it's important to have a general understanding of the techniques that are available for use with machine learning. Machine learning utilizes classical techniques for decision making, such as:

Classification - A technique used to predict which category each item in a dataset belongs to. For example, when analyzing a set of responses from customers, there may be a need to classify each one as a positive or negative experience.
Image classification - A technique used to assign the images in an image-based dataset to categories. For instance, when analyzing headshots provided by a set of employees, we might want to group the images as women or men.
Regression (numerical value prediction) - A technique used to analyze data in order to predict a numerical value. For example, predicting the cost of a specific product based on historical data.
Forecasting (future value) - A technique used to predict a future value within the scope of a time series. For example, predicting annual sales.
Recommendation - A technique used to recommend "a thing" based on historical feedback from the respective user. For instance, the recommendation algorithms in streaming apps, which draw on likes of previously watched movies and shows.

If you do not have your own data on hand, you can download an example dataset from UC Irvine's Machine Learning Repository. The download includes a readme that provides information about each dataset. Each dataset consists of comments, and each comment carries a sentiment label of "1" or "0" to represent a positive or negative experience.
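Before training, it can help to confirm how the labels are distributed. The sketch below counts the rows per label with awk. It assumes each row holds a sentence followed by a tab and then the label; the three sample rows are made up for illustration and are not taken from the dataset, and count_labels is a hypothetical helper name.

```shell
# Build a tiny stand-in file: sentence<TAB>label (rows invented for illustration).
sample="$(mktemp)"
printf 'Great phone, works well.\t1\nBattery died in a day.\t0\nLove the case.\t1\n' > "$sample"

# Count how many rows carry each label (the label is the last tab-separated field).
count_labels() {
  awk -F'\t' '{ counts[$NF]++ } END { for (k in counts) print k, counts[k] }' "$1" | sort
}

count_labels "$sample"
# prints:
# 0 1
# 1 2
```

Pointing count_labels at a downloaded file instead of the sample should show whether its classes are balanced.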

Train the Model

Our next step is to "train" the model. In machine learning, training a model simply means providing the machine learning algorithm with data so that it can identify (learn) patterns and trends, and then make predictions, forecasts, or suggestions when applied to new datasets. This process of "feeding" data to the machine learning algorithm for analysis is called "model training".

In this example, we will use the dataset labeled as "amazon_cells_labelled.txt" which includes 500 positive (1) and 500 negative (0) responses, submitted by users via a website. To begin the training process, we'll run the following command:

mlnet classification --dataset sentiment\ labelled\ sentences/amazon_cells_labelled.txt --label-col 1 --has-header false --name SentimentModel --train-time 60

Evaluate the Model

In this instance, we are using the classification command, which applies the trainers best suited to classification to the referenced text data. The CLI explores candidate models for the duration given by train-time (the length of time, in seconds, that you want the tool to spend exploring) and then prints the experiment results:

|--------------------------------------------------------------------|
|                         Experiment Results                         |
|--------------------------------------------------------------------|
|                              Summary                               |
|--------------------------------------------------------------------|
|ML Task: multiclass classification                                  |
|Dataset: /Users/MLAPP/sentiment labelled sentences/amazon_cells_labelled.txt|
|Label : col1                                                        |
|Total experiment time :    59.0000 Secs                             |
|Total number of models explored: 400                                |
|--------------------------------------------------------------------|
|                        Top 5 models explored                       |
|--------------------------------------------------------------------|
|      Trainer                             MacroAccuracy Duration    |
|--------------------------------------------------------------------|
|35    FastForestOva                       0.8059     0.3900         |
|11    FastForestOva                       0.7964     0.6040         |
|174   FastForestOva                       0.7964     0.4490         |
|194   FastForestOva                       0.7964     0.4550         |
|238   FastForestOva                       0.7964     0.5060         |
|--------------------------------------------------------------------|
[Source=AutoMLExperiment, Kind=Info] cancel training because cancellation token is invoked...

The ML task used for this dataset is multiclass classification, a supervised machine learning task that takes a set of labeled data, converts each label into a numerical value, and uses the result to create classifiers. The classifiers are then used to predict the category of new, unlabeled data. Although this dataset has only two labels (0 and 1), the classification command treats them as the classes of a multiclass problem. In this instance, we are predicting the sentiment label stored in col1.

Evaluation Metrics

We also need to gain a better understanding of what each metric means as it relates to the performance of the models. Just as your SLIs and SLOs are specific to your application and environment, the metrics in machine learning are specific to the task that the model performs.

Here, since we are using the multi-class task, we should aim to look at the following metrics:

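Micro-accuracy - aggregates the contributions of all classes to compute a single average metric, which is the fraction of all instances that were predicted correctly. This is preferable if one or more classes are potentially imbalanced.
Macro-accuracy - computes the accuracy for each class separately and then averages these values into an overall metric, so every class carries equal weight regardless of its size.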

The closer a metric is to 1.00, the better the model's performance. The top five models explored all use the FastForestOva trainer, with an average macro-accuracy of about 0.80 (0.7983 before rounding).
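To make the difference between the two metrics concrete, here is a small sketch that computes both from a list of "actual predicted" label pairs using awk. The ten pairs are made up for illustration and are not results from the experiment above, and accuracy_report is a hypothetical helper name. Per-class accuracy is taken here to mean the fraction of a class's examples that were predicted correctly.

```shell
# Read "actual predicted" pairs from stdin and report micro- and macro-accuracy.
accuracy_report() {
  awk '
    { total[$1]++; n++; if ($1 == $2) { correct[$1]++; hits++ } }
    END {
      # Macro: average of the per-class accuracies, each class weighted equally.
      for (c in total) { sum += correct[c] / total[c]; classes++ }
      # Micro: correct predictions over all examples.
      printf "micro=%.4f macro=%.4f\n", hits / n, sum / classes
    }'
}

# Imbalanced toy data: eight examples of class 0 (all correct),
# two examples of class 1 (one correct, one wrong).
printf '0 0\n0 0\n0 0\n0 0\n0 0\n0 0\n0 0\n0 0\n1 1\n1 0\n' | accuracy_report
# prints: micro=0.9000 macro=0.7500
```

Micro-accuracy is 9/10 because nine of the ten predictions are correct, while macro-accuracy is (1.0 + 0.5) / 2 because the smaller class counts just as much as the larger one.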

Generate the code

When the command completes successfully, it generates the code for a .NET console app that can be used for future training and consumption of the model. If we take a look at the Program.cs file, we'll see the following generated code:

// This file was auto-generated by ML.NET Model Builder.

using System;

namespace SentimentModel.ConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create single instance of sample data from first line of dataset for model input
            SentimentModel.ModelInput sampleData = new SentimentModel.ModelInput()
            {
                Col0 = @"So there is no way for me to plug it in here in the US unless I go by a converter.",
            };

            Console.WriteLine("Using model to make single prediction -- Comparing actual Col1 with predicted Col1 from sample data...\n\n");
            Console.WriteLine($"Col0: {@"So there is no way for me to plug it in here in the US unless I go by a converter."}");
            Console.WriteLine($"Col1: {0F}");

            // Score the sample against every label; results come back sorted by score
            var sortedScoresWithLabel = SentimentModel.PredictAllLabels(sampleData);
            Console.WriteLine($"{"Class",-40}{"Score",-20}");
            Console.WriteLine($"{"-----",-40}{"-----",-20}");

            foreach (var score in sortedScoresWithLabel)
            {
                Console.WriteLine($"{score.Key,-40}{score.Value,-20}");
            }

            Console.WriteLine("=============== End of process, hit any key to finish ===============");
            Console.ReadKey();
        }
    }
}

The program file, written in C#, takes the text in Col0 (the columns are zero-indexed) and predicts the classification in Col1 (positive or negative), so that the prediction can be compared with the label actually assigned in the dataset. We'll review the results in the next section.

Consume the Model

Once you have reviewed the code in the Program.cs file, use the following command, from the directory that contains the generated project, to run it:

dotnet run

The command produces output similar to the following:

$ dotnet run
Using model to make single prediction -- Comparing actual Col1 with predicted Col1 from sample data...

Col0: So there is no way for me to plug it in here in the US unless I go by a converter.
Col1: 0
Class                                   Score
-----                                   -----
0                                       0.9202177
1                                       0.0797823
=============== End of process, hit any key to finish ===============

In the dataset, the first line was assigned a "0" for a negative experience. Upon running the program, we can see that the model gives class "0" a score of about 0.92, so its prediction aligns with the assigned label.
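Reading the prediction off the score table amounts to picking the class with the highest score. As a small sketch, the hypothetical top_class function below does this for "class score" lines such as the two rows in the output above.

```shell
# Read "class score" lines from stdin and print the highest-scoring class.
# top_class is a hypothetical helper name used only for this sketch.
top_class() {
  awk 'NR == 1 || $2 + 0 > best { best = $2 + 0; label = $1; score = $2 }
       END { print label, score }'
}

printf '0 0.9202177\n1 0.0797823\n' | top_class
# prints: 0 0.9202177
```

The two scores in the output sum to 1, so a score of roughly 0.92 for class 0 leaves roughly 0.08 for class 1.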

Final Thoughts

The practice of machine learning is no simple feat. This is especially true if you are transitioning from other fields, or if you prefer to grasp lower-level concepts before shifting into a new domain. However, it is achievable. As organizations continue to look for new and innovative ways to develop automated tooling and manage infrastructure with AI and machine learning, hands-on practice like this helps bridge the knowledge gap.

A.M. Tech Consulting