Guidelines

· Share screenshots in your response

· Share the code and the plots

· Put your name and id number

· Clearly mark each question number

· Upload a Word document

· Insert a cover page listing the questions attempted

HW04 Cover Sheet

Identify all questions that you attempted in this template

Q1 Chapter 04 Classification Examples

Part 1 Review logistic regression in Chapter 4 – Classification

https://github.com/JWarmenhoven/ISLR-python

Use the examples to review Section 4.3 (Logistic Regression) of the ISLR text

a. Plot Figure 4.1

b. Plot Figure 4.2

c. Table 4.1, 4.2, 4.3

d. Plot Figure 4.3

Hint: use https://nbviewer.jupyter.org/github/JWarmenhoven/ISL-python/blob/master/Notebooks/Chapter%204.ipynb#4.3-Logistic-Regression

Part 2 Application to Caravan Insurance Data

Use Caravan.csv to apply KNN and Logistic Regression to the Caravan data

Hint: use https://nbviewer.jupyter.org/github/JWarmenhoven/ISL-python/blob/master/Notebooks/Chapter%204.ipynb#4.6.5-K-Nearest-Neighbors

Q2. Classification Textbook Examples

Using the Boston data set, fit classification models to predict whether a given suburb has a crime rate above or below the median. Explore logistic regression and KNN models using various subsets of the predictors. Describe your findings.
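The binary response for Q2 is not a column in the data; it has to be derived from the crime rate. A minimal sketch of that step follows, assuming the crime rates have already been read into a list. The values and the names `crime_rates` and `above_median` are hypothetical placeholders, not taken from the Boston data.

```python
from statistics import median

# Hypothetical per-suburb crime rates (illustration only); the real values
# come from the crime-rate column of the Boston data set.
crime_rates = [0.02, 0.09, 0.25, 3.70, 8.20, 0.61]

# The median splits the suburbs into two groups of roughly equal size.
threshold = median(crime_rates)

# 1 = crime rate above the median, 0 = at or below the median.
above_median = [1 if rate > threshold else 0 for rate in crime_rates]

print("median:", threshold)
print("labels:", above_median)
```

Because the threshold is the median, the two classes are close to balanced, so plain accuracy is a reasonable first metric when comparing the models.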

Hint: use https://botlnec.github.io/islp/sols/chapter4/exercise13/

Q3 Iris Data Set and Classification (iris.csv)

The Iris dataset was used in R.A. Fisher’s classic 1936 paper. It includes three iris species with 50 samples each, along with several measurements of each flower. One species is linearly separable from the other two, but the other two are not linearly separable from each other. The columns in this dataset are:

· Id

· Sepal Length Cm

· Sepal Width Cm

· Petal Length Cm

· Petal Width Cm

· Species

a. Plot the iris dataset i) Sepal Length vs Sepal Width ii) Petal Length vs Petal Width

Split the data into training and test sets, then:

b. Apply Naïve Bayes Classifier to classify species with the decision boundaries

c. Apply logistic regression to classify species with the decision boundaries

d. Apply KNN algorithm to classify species with the decision boundaries

e. Compare the confusion (truth) matrix and accuracy of the three algorithms

Algorithm           | TP | TN | FP | FN | Accuracy
Naïve Bayes         |    |    |    |    |
Logistic Regression |    |    |    |    |
KNN                 |    |    |    |    |
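With three species, TP, TN, FP, and FN are not single numbers for a model; they are usually counted per class, treating one species as "positive" and the other two as "negative" (one-vs-rest). The sketch below shows one way to do that counting by hand. The label lists are hypothetical and are not real model output, and the helper names `one_vs_rest_counts` and `accuracy` are introduced here only for illustration.

```python
def one_vs_rest_counts(y_true, y_pred, positive):
    """Count TP, TN, FP, FN treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred != positive:
            tn += 1
        elif truth != positive and pred == positive:
            fp += 1
        else:  # truth is positive but the prediction is not
            fn += 1
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels, for illustration only (not real model output).
y_true = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_pred = ["setosa", "setosa", "versicolor", "virginica", "virginica", "versicolor"]

for species in ("setosa", "versicolor", "virginica"):
    print(species, one_vs_rest_counts(y_true, y_pred, species))
print("accuracy:", accuracy(y_true, y_pred))
```

In practice the same counts can be read off the matrix returned by scikit-learn's `confusion_matrix`; the hand-written version is only meant to make the definitions concrete.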

Hint

Naïve Bayes – https://xavierbourretsicotte.github.io/Naive_Bayes_Classifier.html

Logistic Regression

https://scikit-learn.org/stable/auto_examples/linear_model/plot_iris_logistic.html

https://www.datacamp.com/community/tutorials/understanding-logistic-regression-python

KNN Algorithm

https://www.ritchieng.com/machine-learning-k-nearest-neighbors-knn/

Q4 Titanic Data Set and Classification (titanic.zip, already split into train and test sets)

a. Perform Exploratory Data Analysis

b. Do Feature Engineering

c. Apply logistic regression

d. Apply KNN algorithm

Hint

https://www.kaggle.com/angps95/basic-classification-methods-for-titanic

Q5. How do k-fold cross-validation and grid search work together on the Social Ads Network data?

Use the references to explain how the two work together to evaluate a model

https://scikit-learn.org/stable/auto_examples/model_selection/plot_grid_search_digits.html

https://sebastianraschka.com/faq/docs/evaluate-a-model.html
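The interaction between the two ideas in Q5 can be sketched from scratch: grid search proposes candidate hyperparameter values, and k-fold cross-validation scores each candidate by averaging its accuracy over the held-out folds. The example below is a minimal illustration using a hand-written KNN on hypothetical synthetic data, not the Social Ads Network data. All function names and the candidate grid are assumptions made for this sketch.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Predict the majority label among the k nearest training points."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle the indices once and deal them into n_folds disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[f::n_folds] for f in range(n_folds)]


def cross_val_accuracy(X, y, k, folds):
    """Mean validation accuracy of KNN with k neighbors over the folds."""
    scores = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        hits = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / len(scores)


# Hypothetical two-feature, two-class data (a stand-in for the real data set).
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
X += [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(40)]
y = [0] * 40 + [1] * 40

folds = k_fold_indices(len(X), n_folds=5)

# Grid search: score every candidate on the same folds, then keep the best.
param_grid = [1, 3, 5, 7]
cv_scores = {k: cross_val_accuracy(X, y, k, folds) for k in param_grid}
best_k = max(cv_scores, key=cv_scores.get)

print(cv_scores)
print("best k:", best_k)
```

scikit-learn's `GridSearchCV` automates the same loop. In either case, the cross-validated score is used only to choose the hyperparameter; a separate test set that played no part in the search is still needed for the final performance estimate.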