This homework serves three goals
Type: Individual
Tasks:
For this homework you will be using RStudio on a local machine, either your own personal computer or a school computer. Note that RStudio should be available on most computers across campus.
All this time, we have been working with R Markdown files and knitting them to pdf documents on the Cloud. Whenever you click File > New File > R Markdown, it offers three output formats: HTML, PDF, and Word. I normally use HTML and teach with HTML. However, we have been using PDF all this time because Gradescope only accepts pdf file uploads!
For this homework we will be using HTML, and once the homework is completed we will convert it to a pdf. The reason is that knitting to pdf requires extra installation on your computer. However, if you are interested in making these installations, such as installing TinyTeX, you are free to work with pdf. You can find instructions here on how to download TinyTeX.
Make sure to watch the video on how to create R projects.
Homework Questions
csv format. Once downloaded, you will need to load the data set and save it as baseball. More information on the data can be found here. We are primarily interested in the association between the number of doubles a player hits (Doubles) and their salary in thousands of dollars (Salary).
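As a minimal sketch of the loading step, the code below reads a csv file into a data frame named baseball. The file written here is a tiny made-up stand-in so the example runs on its own; the path, the three rows, and their values are assumptions, not the real data.

```r
# Hypothetical stand-in for the downloaded file: write a tiny csv to a
# temporary location so this example is self-contained. The values are made up.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Salary,Doubles",
             "3300,26",
             "2600,22",
             "2500,19"), csv_path)

# Load the csv and save it as `baseball`. With the real data, replace
# csv_path with the path to the downloaded file inside your project folder.
baseball <- read.csv(csv_path)

str(baseball)   # check column names and types
head(baseball)  # look at the first few rows
```

Keeping the csv inside the R project folder lets you refer to it by a short relative path.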
Make a scatter plot between the response and explanatory variable and describe the association you see.
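One way to draw such a scatter plot in base R is sketched below. The data frame is simulated so the code runs without the real file (an assumption for illustration only); with the real data you would skip the simulation and use your own baseball object.

```r
# Simulated stand-in data (NOT the real baseball data).
set.seed(1)
baseball <- data.frame(Doubles = rpois(50, lambda = 15))
baseball$Salary <- 200 + 60 * baseball$Doubles + rnorm(50, sd = 300)

# Response (Salary) on the y-axis, explanatory variable (Doubles) on the x-axis.
plot(Salary ~ Doubles, data = baseball,
     xlab = "Doubles",
     ylab = "Salary (thousands of dollars)",
     main = "Salary vs. doubles")
```

When describing the association, it helps to comment on direction, form, strength, and any unusual points.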
Create a linear regression model that regresses Salary on Doubles. Print out the summary.
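A simple linear regression can be fit with lm() and inspected with summary(). The sketch below uses simulated data, and the object name model_simple is a hypothetical choice.

```r
# Simulated stand-in data (NOT the real baseball data).
set.seed(1)
baseball <- data.frame(Doubles = rpois(50, lambda = 15))
baseball$Salary <- 200 + 60 * baseball$Doubles + rnorm(50, sd = 300)

# Regress Salary (response) on Doubles (explanatory variable).
model_simple <- lm(Salary ~ Doubles, data = baseball)

# Coefficient estimates, standard errors, t statistics, p-values, R-squared.
summary(model_simple)

# The two estimated coefficients: intercept and slope.
coef(model_simple)
```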
Interpret the two \(\hat{\beta}\) values.
Create a linear regression model that regresses Salary on Doubles, as well as BattingAvg, Runs, Walks, Strikeouts, Errors, and FreeAgent9192.
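Additional explanatory variables are added to the lm() formula with +. The sketch below simulates every column so it can run on its own; the distributions, the 0/1 coding of FreeAgent9192, and the name model_full are all assumptions for illustration.

```r
# Simulated stand-in data (NOT the real baseball data).
set.seed(2)
n <- 100
baseball <- data.frame(
  Doubles       = rpois(n, 15),
  BattingAvg    = runif(n, 0.200, 0.330),
  Runs          = rpois(n, 50),
  Walks         = rpois(n, 35),
  Strikeouts    = rpois(n, 60),
  Errors        = rpois(n, 7),
  FreeAgent9192 = rbinom(n, 1, 0.2)  # assumed to be a 0/1 indicator
)
baseball$Salary <- 100 + 40 * baseball$Doubles + 10 * baseball$Runs +
  rnorm(n, sd = 300)

# Multiple regression: Salary on Doubles plus the other six variables.
model_full <- lm(Salary ~ Doubles + BattingAvg + Runs + Walks +
                   Strikeouts + Errors + FreeAgent9192,
                 data = baseball)
summary(model_full)
```

Each slope in this summary is estimated while holding the other variables in the model constant.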
Interpret the coefficient for Doubles again. How is this interpretation different from the one you gave in part (c)?
Add the residuals and predicted values to the data frame. Create residual plots and a qqplot. Comment on whether or not the conditions are met to use the model you found in part (d).
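The residuals and predicted values can be pulled from a fitted model with resid() and fitted(). The sketch below uses a smaller two-variable model on simulated data as a stand-in; the column names predicted and residual are hypothetical choices.

```r
# Simulated stand-in data and a smaller stand-in model (NOT the real data).
set.seed(3)
baseball <- data.frame(Doubles = rpois(80, 15), Runs = rpois(80, 50))
baseball$Salary <- 100 + 40 * baseball$Doubles + 10 * baseball$Runs +
  rnorm(80, sd = 300)
model <- lm(Salary ~ Doubles + Runs, data = baseball)

# Add predicted values and residuals to the data frame.
baseball$predicted <- fitted(model)
baseball$residual  <- resid(model)

# Residuals vs. predicted values: look for curvature or changing spread.
plot(residual ~ predicted, data = baseball,
     xlab = "Predicted salary", ylab = "Residual")
abline(h = 0, lty = 2)

# Normal Q-Q plot of the residuals: points should follow the line.
qqnorm(baseball$residual)
qqline(baseball$residual)
```

The same calls apply to whichever model object you fit with the real data.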
Obtain a 95% confidence interval for the coefficient on Doubles, and interpret it in the context of the problem.
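A confidence interval for a single coefficient is available through confint(). The sketch below uses a simple stand-in model on simulated data; with the real data you would pass your own fitted model instead.

```r
# Simulated stand-in data and model (NOT the real baseball data).
set.seed(1)
baseball <- data.frame(Doubles = rpois(50, lambda = 15))
baseball$Salary <- 200 + 60 * baseball$Doubles + rnorm(50, sd = 300)
model <- lm(Salary ~ Doubles, data = baseball)

# 95% confidence interval for the Doubles coefficient only.
ci_doubles <- confint(model, parm = "Doubles", level = 0.95)
ci_doubles
```

The interval is in the units of the response, here thousands of dollars of salary per additional double.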
Perform a hypothesis test on the coefficient for Doubles to test whether it has a linear relationship with Salary. State your hypotheses, draw your conclusion, provide evidence for the conclusion, and interpret your conclusion in the context of the problem.
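The usual test here is a t-test of the null hypothesis that the Doubles coefficient is 0 against the two-sided alternative that it is not 0. As a sketch on simulated data (the 0.05 significance level is an assumption), the test statistic and p-value can be read from the coefficient table:

```r
# Simulated stand-in data and model (NOT the real baseball data).
set.seed(1)
baseball <- data.frame(Doubles = rpois(50, lambda = 15))
baseball$Salary <- 200 + 60 * baseball$Doubles + rnorm(50, sd = 300)
model <- lm(Salary ~ Doubles, data = baseball)

# Coefficient table: estimate, standard error, t value, p-value.
coef_table <- coef(summary(model))
coef_table["Doubles", ]

# H0: slope for Doubles = 0   vs.   Ha: slope for Doubles != 0
t_stat  <- coef_table["Doubles", "t value"]
p_value <- coef_table["Doubles", "Pr(>|t|)"]

# The same two-sided p-value computed by hand from the t distribution.
p_manual <- 2 * pt(abs(t_stat), df = df.residual(model), lower.tail = FALSE)

# Reject H0 at the (assumed) 0.05 level if the p-value is below 0.05.
p_value < 0.05
```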
Scoring: 25 points
In your project folder, you should see at least 4 things
Gradescope expects each answer to be on a separate page, so you will have to add page breaks to your document. Between each of your responses, add the following code: <p style="page-break-before: always">. For example, this code should come after your response to Question 1a and before your response to Question 1b. See this Rmd example file for further assistance. Note that the HTML file itself will not show page breaks; they appear only once you save the file as a pdf. To do that, open the HTML file in Google Chrome (or any web browser) and open the print dialog (without actually printing) by pressing Ctrl + P (Windows) or Cmd + P (Mac). For Destination, instead of choosing a printer, choose Save as PDF and save the file.
If you have been using pdf files all along, then you will have to use \pagebreak instead of the aforementioned code to add page breaks. See this Rmd example.
Upload the pdf file to Gradescope.