SciSharp Series: Binary Classification with Keras.Net

Hi fellow devs,

This time we continue with a trial case: classifying spam e-mail using the SpamBase dataset from UCI. The dataset has already been preprocessed, for example by counting how often certain words and certain characters occur, along with a few other characteristics, as described in the attribute notes below:

48 continuous real [0,100] attributes of type word_freq_WORD
= percentage of words in the e-mail that match WORD, i.e. 100 * (number of times the WORD appears in the e-mail) / total number of words in e-mail. A “word” in this case is any string of alphanumeric characters bounded by non-alphanumeric characters or end-of-string.

6 continuous real [0,100] attributes of type char_freq_CHAR
= percentage of characters in the e-mail that match CHAR, i.e. 100 * (number of CHAR occurrences) / total characters in e-mail

1 continuous real [1,…] attribute of type capital_run_length_average
= average length of uninterrupted sequences of capital letters

1 continuous integer [1,…] attribute of type capital_run_length_longest
= length of longest uninterrupted sequence of capital letters

1 continuous integer [1,…] attribute of type capital_run_length_total
= sum of length of uninterrupted sequences of capital letters
= total number of capital letters in the e-mail

1 nominal {0,1} class attribute of type spam
= denotes whether the e-mail was considered spam (1) or not (0), i.e. unsolicited commercial e-mail.
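
To make the attribute definitions above concrete, here is a minimal C# sketch that computes one word_freq value, one char_freq value and the three capital-run statistics for a single e-mail body. The helper names (WordFreq, CharFreq, CapitalRuns) and the sample text are hypothetical, and details such as case-insensitive word matching are assumptions, since the original extraction code is not part of this article.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helpers illustrating the SpamBase attribute definitions.
// Case-insensitive word matching is an assumption made for this sketch.

// word_freq_WORD: 100 * (occurrences of WORD) / (total number of words).
static double WordFreq(string text, string word)
{
    // A "word" is a run of alphanumeric characters.
    var words = Regex.Matches(text, "[A-Za-z0-9]+").Select(m => m.Value).ToList();
    if (words.Count == 0) return 0.0;
    int hits = words.Count(w => string.Equals(w, word, StringComparison.OrdinalIgnoreCase));
    return 100.0 * hits / words.Count;
}

// char_freq_CHAR: 100 * (occurrences of CHAR) / (total characters).
static double CharFreq(string text, char c) =>
    text.Length == 0 ? 0.0 : 100.0 * text.Count(ch => ch == c) / text.Length;

// Lengths of all uninterrupted runs of capital letters.
static List<int> CapitalRuns(string text) =>
    Regex.Matches(text, "[A-Z]+").Select(m => m.Length).ToList();

var mail = "FREE money!!! Click NOW to get free stuff";
var runs = CapitalRuns(mail);

Console.WriteLine($"word_freq_free             = {WordFreq(mail, "free"):F2}");
Console.WriteLine($"char_freq_!                = {CharFreq(mail, '!'):F2}");
Console.WriteLine($"capital_run_length_average = {runs.Average():F2}");
Console.WriteLine($"capital_run_length_longest = {runs.Max()}");
Console.WriteLine($"capital_run_length_total   = {runs.Sum()}");
```

Each row of spambase.csv holds 57 such feature values followed by the 0/1 class label.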

Keras.NET is a port of the Python-based Keras, so you can build deep learning models with .NET (C#).

Start Coding

Let's jump right in. Follow these steps:

  1. Install VS Code or Visual Studio from Download Visual Studio Code – Mac, Linux, Windows
  2. Then install .NET 6 if it is not installed yet; get it from Download .NET (Linux, macOS, and Windows) (microsoft.com)
  3. Create a console application by typing dotnet new console at the command prompt, or from Visual Studio: New Project > Console.
  4. Right-click the project and install the NuGet package Keras.NET (or type dotnet add package Keras.NET), then install the SliceAndDice package as well.
  5. Then type in the following code:
    
    using System;
    using System.IO;
    using Keras;
    using Keras.Layers;
    using Keras.Models;
    using Numpy;
    using Keras.Optimizers;
    using ML.Tools;
    using Keras.Utils;
    using SliceAndDice;
    using System.Linq;
    using System.Data;
    
    var datasetPath = $"{FileHelpers.AppDirectory}\\..\\..\\..\\..\\Dataset\\spambase.csv";
    Console.WriteLine(datasetPath);
    
    //load the csv data
    var data = DatasetHelper.LoadAsDataTable(datasetPath, HasHeader: false);
    
    //split into training and test data
    var (dt_train, dt_test) = data.Split();
    
    //peek at some sample rows
    data.Head();
    
    //pop the label column (y) out of the training table
    NDarray y_train = dt_train.Pop("Col57");
    //normalize the training features with z-score
    //(the table that feeds x_train, mirroring dt_test.Normalization() below)
    dt_train.Normalization();
    //features
    NDarray x_train = dt_train.ToNDArray();
    
    //build the sequential model
    var model = new Sequential();
    model.Add(new Dense(64, activation: "relu", input_shape: new Keras.Shape(dt_train.Columns.Count)));
    model.Add(new Dense(32, activation: "relu"));
    model.Add(new Dense(1, activation: "sigmoid")); //a single output node for the binary label
    
    //compile and train
    model.Compile(optimizer: new Adam(), loss: "binary_crossentropy", metrics: new string[] { "accuracy" });
    model.Fit(x_train, y_train, batch_size: 1, epochs: 10, verbose: 1, validation_split: 0.2f);
    
    //test: keep the true labels for the comparison printed below
    var jawaban = (from x in dt_test.AsEnumerable()
                  select int.Parse(x.Field<string>("Col57"))).ToList();
    
    NDarray y_test = dt_test.Pop("Col57");
    dt_test.Normalization();
    NDarray x_test = dt_test.ToNDArray();
    
    var score = model.Evaluate(x_test, y_test);
    Console.WriteLine("Test loss: {0:n2}", score[0]);
    Console.WriteLine("Test accuracy: {0:n2}", score[1]);
    
    //predict on the test set
    var res = model.Predict(x_test);
    //parse ndarray to float
    ArraySlice<float> ts = new ArraySlice<float>(res.GetData<float>());
    //slice and dice: one prediction per chunk
    var xx = ts.Chunk(1);
    var counter = 0;
    foreach (var x in xx)
    {
        //output closer to 1 than to 0 means spam
        var isSpam = x[0] > 0.5f;
        
        //printed as: actual label / predicted label
        Console.WriteLine($"test no.{counter + 1} = {(jawaban[counter] == 1 ? "Spam" : "Not Spam")} / {(isSpam ? "Spam" : "Not Spam")}");
        counter++;
    }
    
    //save model and weights
    string json = model.ToJson();
    File.WriteAllText("model.json", json);
    model.SaveWeight("model.h5");
    
    //load model and weights
    var loaded_model = Sequential.ModelFromJson(File.ReadAllText("model.json"));
    loaded_model.LoadWeight("model.h5");
  6. Run it by pressing F5.
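
The z-score step in the listing above deserves a closer look. The sketch below is plain C# with no ML.Tools dependency. The internals of Normalization() are not shown in this article, so treat this as an assumption about what a z-score helper typically does. It computes the per-column mean and standard deviation on the training rows and reuses them for the test rows, which is a common way to keep both sets on the same scale. The names ColumnStats and ZScore and the toy numbers are hypothetical.

```csharp
using System;
using System.Linq;

// Per-column mean and (population) standard deviation of a row-major matrix.
static (double[] mean, double[] std) ColumnStats(double[][] rows)
{
    int cols = rows[0].Length;
    var mean = new double[cols];
    var std = new double[cols];
    for (int j = 0; j < cols; j++)
    {
        double m = rows.Average(r => r[j]);
        mean[j] = m;
        std[j] = Math.Sqrt(rows.Average(r => (r[j] - m) * (r[j] - m)));
    }
    return (mean, std);
}

// z = (x - mean) / std; constant columns (std == 0) map to 0 to avoid dividing by zero.
static double[][] ZScore(double[][] rows, double[] mean, double[] std) =>
    rows.Select(r => r.Select((x, j) => std[j] == 0 ? 0.0 : (x - mean[j]) / std[j]).ToArray())
        .ToArray();

// Toy data with two features per row (illustrative only).
var train = new[] { new[] { 1.0, 10.0 }, new[] { 2.0, 20.0 }, new[] { 3.0, 30.0 } };
var test = new[] { new[] { 2.0, 40.0 } };

var (mu, sigma) = ColumnStats(train);
var trainZ = ZScore(train, mu, sigma);
var testZ = ZScore(test, mu, sigma); // reuse the training statistics

Console.WriteLine("train: " + string.Join(", ", trainZ.Select(r => $"[{r[0]:F3}, {r[1]:F3}]")));
Console.WriteLine("test:  " + string.Join(", ", testZ.Select(r => $"[{r[0]:F3}, {r[1]:F3}]")));
```

Whether the test set is scaled with its own statistics, as in the listing, or with the training statistics, as in this sketch, is a design choice worth checking when the two sets differ in size or distribution.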

Explanation

What sets a binary classification task apart shows up in the output layer: it uses the sigmoid activation function and needs only a single node. The network outputs a value between 0 and 1, where a value close to 1 means spam and a value close to 0 means not spam.

For training, we use binary_crossentropy as the loss function and accuracy as the metric.

Try the complete project from the following link:
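
As a small illustration of the output layer described above, the following sketch applies the sigmoid function to a few raw scores and labels each result with a 0.5 cut-off. The scores are made-up numbers, not outputs of the trained model, and the helper names Sigmoid and Label are hypothetical.

```csharp
using System;

// Sigmoid squashes any real number into the open interval (0, 1).
static double Sigmoid(double z) => 1.0 / (1.0 + Math.Exp(-z));

// Values above the threshold count as spam; 0.5 is the usual default.
static string Label(double p, double threshold = 0.5) => p > threshold ? "Spam" : "Not Spam";

double[] rawScores = { -3.0, -0.2, 0.0, 1.5, 4.0 }; // illustrative only

foreach (var z in rawScores)
{
    double p = Sigmoid(z);
    Console.WriteLine($"z = {z,5:F1} -> p = {p:F3} -> {Label(p)}");
}
```

Raising the threshold above 0.5 would flag fewer e-mails as spam, which can be useful when false positives are costly.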
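
To show what these two numbers measure, here is a minimal sketch that computes the mean binary cross-entropy and the accuracy by hand for four made-up predictions. It is not Keras.NET code. The function names, the clipping constant and the sample values are assumptions chosen for illustration.

```csharp
using System;
using System.Linq;

// Mean binary cross-entropy: -(y*log(p) + (1-y)*log(1-p)), averaged over all samples.
static double BinaryCrossEntropy(int[] yTrue, double[] yProb)
{
    const double eps = 1e-7; // clip probabilities to avoid log(0)
    return yTrue.Zip(yProb, (y, p) =>
    {
        double q = Math.Clamp(p, eps, 1 - eps);
        return -(y * Math.Log(q) + (1 - y) * Math.Log(1 - q));
    }).Average();
}

// Accuracy: share of samples whose thresholded prediction equals the label.
static double Accuracy(int[] yTrue, double[] yProb, double threshold = 0.5) =>
    yTrue.Zip(yProb, (y, p) => (p > threshold ? 1 : 0) == y ? 1.0 : 0.0).Average();

int[] labels = { 1, 0, 1, 0 };           // 1 = spam, 0 = not spam
double[] probs = { 0.9, 0.2, 0.4, 0.1 }; // illustrative sigmoid outputs

Console.WriteLine($"loss     = {BinaryCrossEntropy(labels, probs):F4}");
Console.WriteLine($"accuracy = {Accuracy(labels, probs):F2}");
```

The third sample (label 1, probability 0.4) is the only misclassification, and it also contributes the largest share of the loss, because cross-entropy penalizes confident wrong answers more than hesitant ones.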

SciSharpSeries/src/KerasNet.BinaryClassification at main · mifmasterz/SciSharpSeries (github.com)

-Regards, Developer
