ML.NET Series: Using Data from a Database to Train a Model

Hi friends,

A few days ago the latest features of ML.NET 1.4 were announced, and one of them is the Database Loader. As we know, companies rely heavily on relational databases, so until now, using that data for training meant manually implementing the retrieval from the database through an IEnumerable collection, as in this link.

With this release, ML.NET includes the Database Loader feature, which makes it possible to pull data directly from a database. All we need is a connection string and a SELECT query to retrieve the data.

The relational databases currently supported include SQL Server, Oracle, MySQL, PostgreSQL, IBM DB2, SQLite, Azure SQL Database, and Progress. In short, any database provider supported by System.Data works, on either .NET Framework or .NET Core.
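To illustrate the point about providers, here is a minimal sketch. It assumes that DatabaseSource accepts any System.Data.Common.DbProviderFactory, which is how the sample code later in this post passes SqlClientFactory.Instance. The DatabaseSourceFactory class and its Create method are hypothetical names introduced only for this illustration; they are not part of ML.NET.

```csharp
using System.Data.Common;
using Microsoft.ML.Data;

// Hypothetical helper (not part of ML.NET): it only shows that the provider
// factory is the one piece that changes between database engines.
public static class DatabaseSourceFactory
{
    public static DatabaseSource Create(
        DbProviderFactory providerFactory, // e.g. SqlClientFactory.Instance for SQL Server
        string connectionString,           // provider-specific connection string
        string selectQuery)                // SELECT statement that returns the training rows
    {
        return new DatabaseSource(providerFactory, connectionString, selectQuery);
    }
}
```

For SQL Server you would pass SqlClientFactory.Instance. Other ADO.NET providers expose their own factory object, but I have not tried each of them, so treat the list above as what the announcement states rather than something verified here.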

In brief, you can use this feature by following the steps below:

  1. Create an application: a console app, a Windows Forms app, or another type
  2. Install the following NuGet packages: Microsoft.ML, Microsoft.ML.Experimental, and System.Data.SqlClient. Use the latest versions, including pre-release
  3. Here is sample code that loads the data from the database:
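// Namespaces used by this snippet: System, System.Collections.Generic, System.Data.SqlClient,
// System.Diagnostics, System.IO, System.Linq, Microsoft.ML, Microsoft.ML.Data
MLContext mlContext = new MLContext();

// Build the connection string for the LocalDB file shipped with the application
string dbFilePath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Database", "Iris.mdf");
string connectionString = $"Data Source = (LocalDB)\\MSSQLLocalDB;AttachDbFilename={dbFilePath};Database=Iris;Integrated Security = True";
string commandText = "SELECT * from IrisData";

// The loader takes its schema from the IrisData class
DatabaseLoader loader = mlContext.Data.CreateDatabaseLoader<IrisData>();

// The source combines the provider factory, the connection string and the query
DatabaseSource dbSource = new DatabaseSource(SqlClientFactory.Instance,
    connectionString,
    commandText);

IDataView dataView = loader.Load(dbSource);

// Peek at the first rows to check that the data loads as expected
var pre = dataView.Preview();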

// Split the data into a training set and a test set
var trainTestData = mlContext.Data.TrainTestSplit(dataView);

// Map the text label to a key and combine the four measurements into one feature vector
var finalTransformerPipeLine = mlContext.Transforms.Conversion.MapValueToKey(inputColumnName: "class", outputColumnName: "KeyColumn")
    .Append(mlContext.Transforms.Concatenate("Features",
        nameof(IrisData.petal_length), nameof(IrisData.petal_width),
        nameof(IrisData.sepal_length), nameof(IrisData.sepal_width)));

// Apply the ML algorithm
var trainingPipeLine = finalTransformerPipeLine
    .Append(mlContext.MulticlassClassification.Trainers.LbfgsMaximumEntropy(labelColumnName: "KeyColumn", featureColumnName: "Features"))
    .Append(mlContext.Transforms.Conversion.MapKeyToValue(outputColumnName: "class", inputColumnName: "KeyColumn"));

Console.WriteLine("Training the ML model while streaming data from a SQL database...");
Stopwatch watch = new Stopwatch();
watch.Start();

var model = trainingPipeLine.Fit(trainTestData.TrainSet);

watch.Stop();
// Use TotalSeconds so that runs shorter than one second are not reported as 0
Console.WriteLine("Elapsed time for training the model = {0:F2} seconds", watch.Elapsed.TotalSeconds);

Console.WriteLine("Evaluating the model...");
Stopwatch watch2 = new Stopwatch();
watch2.Start();

var predictions = model.Transform(trainTestData.TestSet);
// Now that we have the test predictions, calculate the metrics of those predictions and output the results.
var metrics = mlContext.MulticlassClassification.Evaluate(predictions, "KeyColumn", "Score");

watch2.Stop();
// Use TotalSeconds so that runs shorter than one second are not reported as 0
Console.WriteLine("Elapsed time for evaluating the model = {0:F2} seconds", watch2.Elapsed.TotalSeconds);

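// ConsoleHelper is not part of ML.NET; it is assumed to be a helper class that comes with the full sample application
ConsoleHelper.PrintMultiClassClassificationMetrics("==== Evaluation Metrics training from a Database ====", metrics);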

Console.WriteLine("Trying a single prediction:");

var predictionEngine = mlContext.Model.CreatePredictionEngine<IrisData, DataPrediction>(model);

var sampleData1 = new IrisData()
{
    sepal_length = 6.1f,
    sepal_width = 3f,
    petal_length = 4.9f,
    petal_width = 1.8f,
    class1 = string.Empty
};

var sampleData2 = new IrisData()
{
    sepal_length = 5.1f,
    sepal_width = 3.5f,
    petal_length = 1.4f,
    petal_width = 0.2f,
    class1 = string.Empty
};

var irisPred1 = predictionEngine.Predict(sampleData1);
var irisPred2 = predictionEngine.Predict(sampleData2);

// Because MapValueToKey is applied with its default parameters, the key values
// depend on the order of occurrence in the data, which is
// "Iris-setosa", "Iris-versicolor", "Iris-virginica".
// So a Score column equal to [0.2, 0.3, 0.5] means that the score for
//   Iris-setosa is 0.2
//   Iris-versicolor is 0.3
//   Iris-virginica is 0.5.
// Map each index of the Score array to the corresponding flower name.
Dictionary<int, string> IrisFlowers = new Dictionary<int, string>();
IrisFlowers.Add(0, "Setosa");
IrisFlowers.Add(1, "versicolor");
IrisFlowers.Add(2, "virginica");

Console.WriteLine($"Predicted Label 1: {IrisFlowers[Array.IndexOf(irisPred1.Score, irisPred1.Score.Max())]} - Score:{irisPred1.Score.Max()}", Color.YellowGreen);
Console.WriteLine($"Predicted Label 2: {IrisFlowers[Array.IndexOf(irisPred2.Score, irisPred2.Score.Max())]} - Score:{irisPred2.Score.Max()}", Color.YellowGreen);
Console.WriteLine();
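
The snippet above relies on two classes it does not show, IrisData and DataPrediction. Below is a minimal sketch of what they might look like, inferred only from how the code uses them: the four measurement fields, a class1 member bound to the "class" column, and a Score array. The member types and the use of the ColumnName attribute are assumptions, and the full sample application may define these classes differently.

```csharp
using Microsoft.ML.Data;

// Assumed shape of one row returned by "SELECT * from IrisData".
// Member names are taken from the snippet above; the types are a guess.
public class IrisData
{
    public float sepal_length;
    public float sepal_width;
    public float petal_length;
    public float petal_width;

    // "class" is a reserved word in C#, so the member is called class1
    // and mapped to the "class" column by name (assumption).
    [ColumnName("class")]
    public string class1;
}

// Assumed shape of the prediction output: one score per iris class.
public class DataPrediction
{
    public float[] Score;
}
```

If the columns in your table have other names or types, these classes need to be adjusted to match.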

The complete sample application can be found here. Happy experimenting.

Greetings, makers ;D
