🏥 Breast Cancer Classification using Support Vector Machines & Principal Component Analysis ⚕️


1. Overview

This project uses the 'Breast Cancer Wisconsin (Diagnostic) Database' as the data source for an ML model that predicts whether a tumor is Benign or Malignant, based on the features described below.

Here is some information describing the data:

1.1 Principal Component Analysis (PCA)

According to Wikipedia, PCA can be defined as:

PCA is a statistical technique for reducing the dimensionality of a dataset. This is accomplished by linearly transforming the data into a new coordinate system where (most of) the variation in the data can be described with fewer dimensions than the initial data.
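To make the definition concrete, here is a minimal sketch of PCA computed with a singular value decomposition in NumPy. The helper name `pca_project` and the toy data are assumptions made for illustration; they are not taken from the notebook, which may well rely on a library implementation instead.

```python
import numpy as np

def pca_project(X, n_components):
    """Project X onto its top principal components (hypothetical helper)."""
    # Center each feature so the components describe variation around the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = S ** 2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    return X_centered @ Vt[:n_components].T, ratio[:n_components]

# Toy data (assumption): two strongly correlated features, so a single
# direction carries almost all of the variation.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200)])

Z, ratio = pca_project(X, n_components=1)
print(Z.shape, ratio)
```

In this toy case the two original dimensions collapse to one with very little loss, which is the "fewer dimensions" idea in the definition above.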

Image source: i.stack.imgur.com

1.2 Support Vector Machines

According to Wikipedia, SVM can be defined as:

Support Vector Machines are supervised learning models with associated learning algorithms. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier.

SVM maps training examples to points in space so as to maximise the width of the gap between the two categories. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall.
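The "gap" described above can be illustrated with a small sketch using scikit-learn's `SVC` with a linear kernel. The six toy points and the large `C` value (chosen to approximate a hard margin) are assumptions for illustration only and have nothing to do with the breast cancer data.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (assumption): three points per category.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C approximates a hard-margin SVM on separable data.
clf = SVC(kernel="linear", C=1e3)
clf.fit(X, y)

# For a linear kernel the separating hyperplane is w . x + b = 0,
# and the width of the gap between the two categories is 2 / ||w||.
w = clf.coef_[0]
b = clf.intercept_[0]
margin_width = 2.0 / np.linalg.norm(w)

# New examples are assigned a category by the side of the hyperplane they fall on.
new_points = np.array([[0.5, 0.5], [3.5, 3.5]])
predictions = clf.predict(new_points)
side = np.sign(new_points @ w + b)
print(margin_width, predictions, side)
```

The sign of `w . x + b` and the predicted label agree, which is what "which side of the gap they fall" means in practice.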

Image source: Wikipedia

2. Preprocessing


3. Machine Learning Model


Part 1: Decomposing the data (PCA)

Part 2: Apply SVM Classifier
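As a rough sketch of how Parts 1 and 2 could fit together, the following chains scaling, PCA, and an SVM classifier in a scikit-learn pipeline. Everything specific here is an assumption rather than the notebook's actual configuration: the copy of the dataset bundled with scikit-learn (`load_breast_cancer`), the 80/20 stratified split, `random_state=42`, two principal components, and the RBF kernel.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# In scikit-learn's copy of the dataset, target 0 is malignant and 1 is benign.
X, y = load_breast_cancer(return_X_y=True)

# Assumed split: 20% of the 569 samples held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Part 1: scale the features, then decompose them with PCA.
# Part 2: fit an SVM classifier on the reduced features.
model = make_pipeline(StandardScaler(), PCA(n_components=2), SVC(kernel="rbf"))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(X_test.shape[0], accuracy)
```

Wrapping the steps in one pipeline means the scaler and PCA are fitted on the training data only, so no information from the test set leaks into the decomposition.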

Finally, we evaluate the model using two methods:

Diagnosis

  1. Confusion Matrix:

    • Out of 114 breast cancer samples in the test data, the model predicted 109 correctly and 5 incorrectly (0.96 accuracy). The priority is to reduce False Negative predictions (i.e., predicting that the cancer is BENIGN when it is MALIGNANT), since these carry a much greater risk of complications for the patient. In the test set, of the 5 misclassifications, 3 were False Negatives and 2 were False Positives;
  2. Classification Report:

    • Looking at the Classification Report, if we assess the model by its ability to correctly classify a Malignant cancer, we can only approve the model if we accept a minimum Recall of 96%. On the other hand, if we consider both kinds of misdiagnosis equally important, we can only approve the model if we accept an f1-score above 97%.
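The metrics discussed above all derive from the four confusion matrix counts, as the sketch below shows. The totals (114 samples, 3 False Negatives, 2 False Positives) come from the text; the split of the 109 correct predictions into True Positives and True Negatives is not stated, so the values `tp = 40` and `tn = 69` are purely hypothetical and the resulting recall, precision, and f1-score are not the notebook's reported figures.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard metrics, treating Malignant as the positive class."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # share of malignant cases that were caught
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# From the text: 114 test samples, 3 False Negatives, 2 False Positives.
total, fn, fp = 114, 3, 2
correct = total - fn - fp  # 109 correct predictions

# HYPOTHETICAL split of the 109 correct predictions (not from the notebook).
tp, tn = 40, 69
assert tp + tn == correct

metrics = classification_metrics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Only the accuracy (109 / 114, about 0.96) is independent of the hypothetical split. Recall depends on False Negatives alone among the errors, which is why it is the metric to watch when missing a malignant case is the costlier mistake.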