# Creating Classifier Data using Logistic Regression

Question

The director of the MBA program at Salterdine University wants to develop a procedure to determine which applicants to admit to the MBA program. The director believes that an applicant’s undergraduate grade point average (GPA) and score on the GMAT exam are helpful in predicting which applicants will be good students. To assist in this endeavor, the director asked a committee of faculty members to classify 70 of the recent students in the MBA program into two groups: (1) good students and (2) weak students. The file MBAStudents.xlsm summarizes these ratings, along with the GPA and GMAT scores for 70 students.

Use Minitab or any other statistics software that you have available to answer the following questions.

• Use discriminant analysis to create a classifier for this data. How accurate (percentage) is this procedure?
• Use logistic regression to create a classifier for this data. How accurate is this procedure?

Minitab does not report the accuracy percentage automatically, so you have to calculate it. First, obtain the predicted “event probabilities”: select “Storage” in the binary logistic regression dialog box and check “Event Probability”.

To determine whether an observation was classified into the correct group, compare the stored probability (column EPRO1) with the Rating column. If the probability is less than 0.5, the model predicts that the observation belongs to group 1; if it is greater than 0.5, the model predicts group 2. From this comparison you can determine the number and percentage of misclassified observations in each group. You need to complete the following table:

| Put into Group | True Group 1 | True Group 2 |
|---|---|---|
| 1 | | |
| 2 | | |
| Total N | 35 | 35 |
| N correct | | |
| Proportion | | |

Note that this is the same table that the discriminant analysis report produces automatically, so once you have the table for each procedure you can determine which one is more accurate, as required in part d.
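The comparison described above can also be done outside Minitab. The following is a minimal sketch in Python, not part of the assignment: it assumes the modeled event is membership in group 2 (as the 0.5 rule above implies), and the function name `classification_table` and the six sample values are hypothetical placeholders, not values from MBAStudents.xlsm.

```python
def classification_table(ratings, event_probs, threshold=0.5):
    """Cross-tabulate the true group against the predicted group.

    Assumption: the stored event probability is the probability of
    group 2, so a value above the threshold predicts group 2.
    A value exactly equal to the threshold is sent to group 1,
    which is an arbitrary choice made for this sketch.
    """
    groups = (1, 2)
    # counts[(put_into_group, true_group)] -> number of observations
    counts = {(p, t): 0 for p in groups for t in groups}
    for true_group, prob in zip(ratings, event_probs):
        predicted = 2 if prob > threshold else 1
        counts[(predicted, true_group)] += 1

    # Per-group totals, correct counts, and proportions (one table column each)
    summary = {}
    for t in groups:
        total = counts[(1, t)] + counts[(2, t)]
        correct = counts[(t, t)]
        summary[t] = {
            "total_n": total,
            "n_correct": correct,
            "proportion": correct / total if total else float("nan"),
        }

    n_all = sum(s["total_n"] for s in summary.values())
    n_correct_all = sum(s["n_correct"] for s in summary.values())
    overall = n_correct_all / n_all if n_all else float("nan")
    return counts, summary, overall


# Hypothetical values, standing in for the Rating and EPRO1 columns.
ratings = [1, 1, 1, 2, 2, 2]
event_probs = [0.10, 0.35, 0.62, 0.81, 0.55, 0.40]

counts, summary, overall = classification_table(ratings, event_probs)
for t in (1, 2):
    print(f"True group {t}: {summary[t]}")
print(f"Overall proportion correct: {overall:.3f}")
```

With the real data, `ratings` and `event_probs` would hold the 70 values of the Rating and EPRO1 columns, and each column's `total_n` should come out as 35.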

• Interpret the coefficients of the logistic regression.

Here is part of Minitab’s logistic regression report:

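Logistic Regression Table

| Predictor | Coef | SE Coef | Z | P | Odds Ratio | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|
| Constant | 19.5430 | 4.73543 | 4.13 | 0.000 | | | |
| GPA | -3.03903 | 0.880393 | -3.45 | 0.001 | 0.05 | 0.01 | 0.27 |
| GMAT | -0.0189050 | 0.0062161 | -3.04 | 0.002 | 0.98 | 0.97 | 0.99 |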
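Written as an equation, the “Coef” column corresponds to the following fitted model. Here $p$ denotes the event probability that Minitab stores; which group counts as the event depends on how the response was set up in Minitab, and the 0.5 rule described earlier suggests it is group 2, but that is an assumption rather than something the report excerpt states.

$$
\ln\!\left(\frac{p}{1-p}\right) = 19.5430 - 3.03903\,\mathrm{GPA} - 0.0189050\,\mathrm{GMAT}
$$

$$
p = \frac{1}{1 + e^{-\left(19.5430 \,-\, 3.03903\,\mathrm{GPA} \,-\, 0.0189050\,\mathrm{GMAT}\right)}}
$$

Because both slopes are negative, a higher GPA or GMAT score lowers the log-odds, and therefore the probability, of the event.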

The coefficients that you are required to interpret are those in the “Coef” column. In other words, you need to determine how the variables GPA and GMAT affect the odds of belonging to each group. The Week 8 folder contains the “logistic regression slides”, which explain how to interpret the coefficients; see slides 22 and 23 in particular.
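One common way to read a logistic regression coefficient is through its odds ratio, which is the exponential of the coefficient. The sketch below, offered as an illustration rather than as the method in the course slides, recomputes the “Odds Ratio” and “95% CI” columns from the reported coefficients and standard errors. The helper name `odds_ratio_with_ci` is hypothetical, and the interval assumes the usual normal approximation with a critical value of 1.96.

```python
import math

# Coefficients and standard errors copied from the Minitab report.
coefficients = {
    "GPA": (-3.03903, 0.880393),
    "GMAT": (-0.0189050, 0.0062161),
}

Z_95 = 1.96  # normal critical value for a 95% interval (assumed here)


def odds_ratio_with_ci(coef, se, z=Z_95):
    """Return (odds ratio, lower bound, upper bound) for one predictor."""
    return (
        math.exp(coef),
        math.exp(coef - z * se),
        math.exp(coef + z * se),
    )


for name, (coef, se) in coefficients.items():
    ratio, lower, upper = odds_ratio_with_ci(coef, se)
    print(f"{name}: odds ratio {ratio:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")

# A one-point GMAT change is small, so a 10-point change is often easier to read.
gmat_10_points = math.exp(10 * coefficients["GMAT"][0])
print(f"GMAT, 10-point increase: odds multiplied by {gmat_10_points:.2f}")
```

Rounded to two decimals, the first two printed lines match the report: 0.05 (0.01, 0.27) for GPA and 0.98 (0.97, 0.99) for GMAT. An odds ratio below 1 means that a one-unit increase in the predictor, holding the other fixed, multiplies the odds of the modeled event by that factor and so reduces them.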

• Compare the discriminant analysis and the logistic regression. Which one is more accurate?

Summary

This question belongs to statistics and discusses logistic regression and discriminant analysis.
