Classification

Predicts categorical class labels (discrete, nominal) by constructing a model from labeled training data and then using that model to classify new data → the model defines a Decision Boundary between the classes.
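As a minimal sketch, assume a single numeric feature and exactly two classes. The function names `fit_threshold` and `predict` are hypothetical. A "model" can then be as simple as a threshold placed halfway between the two class means, and that threshold is the decision boundary: new samples are labeled by which side of it they fall on.

```python
from statistics import mean


def fit_threshold(xs: list[float], ys: list[str]) -> tuple[float, str, str]:
    """Learn a 1-D decision boundary halfway between the two class means.

    Hypothetical toy model: assumes exactly two classes and one feature.
    Returns (threshold, label_below, label_above).
    """
    labels = sorted(set(ys))
    if len(labels) != 2:
        raise ValueError("expected exactly two classes")
    means = {c: mean(x for x, y in zip(xs, ys) if y == c) for c in labels}
    low, high = sorted(labels, key=lambda c: means[c])
    threshold = (means[low] + means[high]) / 2
    return threshold, low, high


def predict(model: tuple[float, str, str], x: float) -> str:
    threshold, low, high = model
    # The decision boundary is x == threshold; each side maps to one class.
    return low if x < threshold else high


# Training data: feature values with their class labels
train_x = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0]
train_y = ["small", "small", "small", "large", "large", "large"]

model = fit_threshold(train_x, train_y)
print(model)                                      # (4.0, 'small', 'large')
print(predict(model, 3.2), predict(model, 5.1))   # small large
```

Real classifiers learn far more flexible boundaries, but the pattern is the same: fit on labeled data, then classify new points by the region they land in.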

Class Imbalance

General Process

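  1. Model construction:
  • Learn from a training set whose samples carry class labels
  • Represent the learned model as classification rules, decision trees, or mathematical formulae
  2. Model usage:
  • Estimate accuracy
    • Classify a test set (independent of the training set) and compare the model's predictions with its known labels
    • Accuracy rate (hit rate): percentage of test samples correctly classified by the model
  • If the accuracy is acceptable, use the model to classify new data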
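The construction and usage steps above can be sketched in plain Python. This is an illustration under assumed names. A toy nearest-centroid classifier stands in for the rules, trees, or formulae listed above. It is built on a training set, its accuracy rate is estimated on an independent held-out test set (a 70/30 split is an arbitrary choice here), and it is only applied to new data if the rate clears a hypothetical threshold `MIN_ACCURACY`.

```python
import math
import random


def train_centroids(train):
    """Model construction: one centroid (mean feature vector) per class."""
    sums = {}
    counts = {}
    for (x1, x2), label in train:
        s = sums.setdefault(label, [0.0, 0.0])
        s[0] += x1
        s[1] += x2
        counts[label] = counts.get(label, 0) + 1
    return {c: (s[0] / counts[c], s[1] / counts[c]) for c, s in sums.items()}


def classify(model, x):
    """Assign the class whose centroid is closest (Euclidean distance)."""
    return min(model, key=lambda c: math.dist(model[c], x))


def accuracy_rate(model, test):
    """Fraction of test samples whose predicted label matches the true label."""
    hits = sum(classify(model, x) == y for x, y in test)
    return hits / len(test)


# Synthetic labeled data: two Gaussian blobs (assumed for illustration)
rng = random.Random(0)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "A") for _ in range(50)]
data += [((rng.gauss(4, 1), rng.gauss(4, 1)), "B") for _ in range(50)]
rng.shuffle(data)

# Keep the test set independent of the training set (70/30 holdout split).
split = int(0.7 * len(data))
train, test = data[:split], data[split:]

model = train_centroids(train)
acc = accuracy_rate(model, test)
print(f"accuracy rate: {acc:.2f}")

MIN_ACCURACY = 0.9  # hypothetical acceptance threshold
if acc >= MIN_ACCURACY:
    # Model usage: classify a new, unlabeled sample
    print(classify(model, (3.5, 4.2)))
```

Evaluating on the training set instead would overestimate accuracy, which is why the test set is kept separate.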

Methods

Perceptron & Adaline

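  • Both are linear classifiers, so neither can perfectly separate classes that are not linearly separable
  • Perceptron: the learning rule is only guaranteed to converge when the classes are linearly separable; otherwise the weights keep being updated until a maximum number of epochs is reached
  • Adaline: minimizes a continuous, convex cost (sum of squared errors) with gradient descent, so with a small enough learning rate the cost still converges to its minimum, but the resulting boundary misclassifies some samples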
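A minimal perceptron sketch in pure Python illustrates the convergence issue. The function name, learning rate, and epoch cap are illustrative assumptions. On the linearly separable AND function, the number of misclassifications per epoch drops to zero and training stops early. On XOR, which is not linearly separable, an error-free epoch is impossible, so training only ends because of the epoch cap.

```python
def train_perceptron(X, y, eta=1.0, max_epochs=50):
    """Rosenblatt perceptron rule; labels y must be in {-1, +1}.

    Returns the weights (bias first) and the number of
    misclassifications in each epoch. With zero initial weights,
    eta only rescales the weights and does not change predictions.
    """
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the bias
    errors_per_epoch = []
    for _ in range(max_epochs):
        errors = 0
        for xi, target in zip(X, y):
            activation = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            prediction = 1 if activation >= 0.0 else -1
            update = eta * (target - prediction)
            if update != 0.0:  # only misclassified samples change the weights
                w[0] += update
                for j, xj in enumerate(xi, start=1):
                    w[j] += update * xj
                errors += 1
        errors_per_epoch.append(errors)
        if errors == 0:  # every sample classified correctly: converged
            break
    return w, errors_per_epoch


X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y_and = [-1, -1, -1, 1]  # linearly separable
y_xor = [-1, 1, 1, -1]   # not linearly separable

_, and_errors = train_perceptron(X, y_and)
_, xor_errors = train_perceptron(X, y_xor)
print(and_errors[-1])                     # 0: converged before the epoch cap
print(len(xor_errors), min(xor_errors))   # runs all 50 epochs; errors never reach 0
```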

SVM vs. Logistic Regression

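  • Logistic regression is more prone to outliers: it maximizes the conditional likelihood of all training samples, so every point influences the fit. An SVM mostly cares about the points closest to the decision boundary (the support vectors), so points far from the boundary have little effect on it.
  • Logistic regression is a simpler model and can be updated more easily as new samples arrive, which makes it attractive for streaming data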
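Both points can be sketched for a linear model with labels in {-1, +1}. This is pure Python, and all names are illustrative assumptions rather than a library API. The first part compares how strongly a single training point pulls on the weights as a function of its margin y·(w·x + b). Under the SVM hinge loss, the pull is exactly zero once a point lies beyond the margin, so only points on or inside the margin shape the fit. Under the logistic (log) loss, every point keeps a nonzero pull. The second part is an online logistic regression that takes one stochastic gradient step per incoming sample, so the model can be updated from a stream without retraining from scratch.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss_pull(margin: float) -> float:
    """|d/d margin| of the log loss log(1 + exp(-margin)); never exactly 0."""
    return sigmoid(-margin)


def hinge_loss_pull(margin: float) -> float:
    """|d/d margin| of the hinge loss max(0, 1 - margin); subgradient 0 chosen at margin == 1."""
    return 1.0 if margin < 1.0 else 0.0


for m in (-2.0, 0.5, 3.0, 10.0):
    print(f"margin={m:5.1f}  log-loss pull={log_loss_pull(m):.5f}  hinge pull={hinge_loss_pull(m):.1f}")


class OnlineLogisticRegression:
    """Logistic regression trained by one SGD step per streamed sample (y in {-1, +1})."""

    def __init__(self, n_features: int, lr: float = 0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def decision(self, x) -> float:
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def partial_fit(self, x, y: int) -> None:
        # Gradient step on log(1 + exp(-y * z)) for this one sample; no stored history needed.
        step = self.lr * y * sigmoid(-y * self.decision(x))
        self.w = [wi + step * xi for wi, xi in zip(self.w, x)]
        self.b += step

    def predict(self, x) -> int:
        return 1 if self.decision(x) >= 0.0 else -1


model = OnlineLogisticRegression(n_features=2)
stream = [((2.0, 1.0), 1), ((-1.0, -2.0), -1), ((1.5, 2.5), 1), ((-2.0, -0.5), -1)] * 25
for x, y in stream:  # samples arrive one at a time
    model.partial_fit(x, y)
print(model.predict((3.0, 2.0)), model.predict((-3.0, -1.0)))
```

A kernel SVM, by contrast, is usually retrained on the full (or a carefully managed) training set when new data arrives, which is part of why logistic regression is the more convenient choice for streams.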

Also see: Performance Evaluation Metrics