For this assignment you will analyze the Weekly data frame in the ISLR package, which contains 9 variables for 1089 weeks of S&P 500 stock index returns from 1990 to 2010. The goal of this assignment is to become more familiar with linear and logistic regression. All analyses must be performed in R using tidyverse and other packages discussed in class. Provide your responses (with R code pasted as text) in the designated spaces in this Word document, then save it as a PDF and upload it to Canvas.
Question 1. Fit a linear regression model to predict the current week's percentage return (Today) from the percentage returns for the five previous weeks (Lag1 through Lag5). Print a summary of the model output to the console. Which feature is most statistically significant in the model? What happens to the predicted percentage return for the current week as you increase the value of this feature? How well does the model fit the training data? Explain your reasoning for all three answers.
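As a starting point for the model fit described above, here is a minimal sketch in base R. It assumes the ISLR package is installed and that the columns are named Today and Lag1 through Lag5, as in the Weekly data frame; the object name lm_fit is only illustrative.

```r
library(ISLR)  # provides the Weekly data frame

# Regress the current week's return on the five lagged weekly returns
lm_fit <- lm(Today ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5, data = Weekly)

# Coefficient estimates, standard errors, p-values, and R-squared
summary(lm_fit)
```

The coefficient table and the R-squared line of the summary are the parts most relevant to the three questions.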
Question 2. Use the linear regression model from question 1 to predict, with 95% confidence, the current week's percentage return when the percentage return for each of the five previous weeks is -10%. What is the mean predicted percentage return for the current week? Are you able to predict whether the current week's percentage return will be positive or negative with 95% confidence, and how do you know? Finally, based on your findings from questions 1 and 2, would you trust this model for predicting S&P 500 stock returns with your own money? Why or why not?
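One way to obtain an interval around a prediction is the interval argument of predict(). The sketch below assumes the ISLR package is installed and refits the model so the block stands alone. Returns in Weekly are stored in percentage units, so -10% is entered as -10. Both interval types are shown because the question does not say which one is intended; new_week, pred_int, and conf_int are illustrative names.

```r
library(ISLR)  # provides the Weekly data frame

lm_fit <- lm(Today ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5, data = Weekly)

# Hypothetical new observation: each of the five previous weeks returned -10%
new_week <- data.frame(Lag1 = -10, Lag2 = -10, Lag3 = -10, Lag4 = -10, Lag5 = -10)

# 95% prediction interval for a single future week's return
pred_int <- predict(lm_fit, newdata = new_week, interval = "prediction", level = 0.95)
pred_int

# 95% confidence interval for the mean return at these predictor values
conf_int <- predict(lm_fit, newdata = new_week, interval = "confidence", level = 0.95)
conf_int
```

In both results the fit column is the point prediction, and lwr and upr are the interval bounds. Whether the interval includes zero bears on the question about predicting the sign of the return.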
Question 3. Fit a logistic regression model to predict whether the current week's percentage return will be positive or negative (Direction) from the percentage returns for the five previous weeks. Make sure to recode the response classes so that true and predicted classes can be easily compared in the next question. Print a summary of the model output to the console. Which feature is most statistically significant in the model? How well does the model fit the training data? Explain your reasoning for both answers.
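The recoding and model fit described above could be sketched as follows. This assumes the ISLR package is installed and that Direction takes the values "Up" and "Down", as in the Weekly data frame. Coding "Up" as 1 and "Down" as 0 is one possible choice, not the only valid one, and the names weekly, Up, and glm_fit are illustrative. Base R is used here to keep dependencies minimal; dplyr::mutate() would do the same recoding.

```r
library(ISLR)  # provides the Weekly data frame

# Recode the response: 1 for an "Up" week, 0 for a "Down" week (assumed coding)
weekly <- transform(Weekly, Up = as.integer(Direction == "Up"))

# Logistic regression of the recoded response on the five lagged returns
glm_fit <- glm(Up ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5,
               data = weekly, family = binomial)

# Coefficient table plus null and residual deviance
summary(glm_fit)
```

Comparing the residual deviance to the null deviance in the summary is one way to judge how much the predictors improve on an intercept-only model.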
Question 4. Use the logistic regression model from question 3 to predict classes for the training data. Print a confusion matrix and an estimate of the training classification accuracy to the console. Use these findings to explain whether the model produces balanced classifications. Also, based on your findings from questions 3 and 4, would you trust this model for predicting S&P 500 stock returns with your own money? Why or why not?
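A minimal sketch of the classification step follows, again assuming the ISLR package is installed and refitting the model so the block stands alone. The 0.5 probability cutoff is an assumption, since the question does not specify a threshold, and names such as prob_up, pred_up, and conf_mat are illustrative.

```r
library(ISLR)  # provides the Weekly data frame

# Same assumed recoding as before: 1 = "Up", 0 = "Down"
weekly <- transform(Weekly, Up = as.integer(Direction == "Up"))
glm_fit <- glm(Up ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5,
               data = weekly, family = binomial)

# Fitted probabilities of an "Up" week for the training data
prob_up <- predict(glm_fit, type = "response")

# Convert probabilities to classes using an assumed cutoff of 0.5
pred_up <- as.integer(prob_up > 0.5)

# Confusion matrix: rows are predicted classes, columns are true classes
conf_mat <- table(Predicted = factor(pred_up, levels = c(0, 1)),
                  Actual = factor(weekly$Up, levels = c(0, 1)))
conf_mat

# Share of training weeks classified correctly
accuracy <- mean(pred_up == weekly$Up)
accuracy
```

Comparing the row totals of the confusion matrix with its column totals shows whether the predictions are spread across both classes or concentrated in one.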