For this project, you will work in groups to apply what you have learned by analyzing a dataset of your choice.
There are four assignments for the project. Their due dates are:
The overall goal of the project is to apply supervised statistical learning methods to a dataset of your choice to answer a question.
You may use any dataset of your choice, so long as it contains at minimum 500 observations. This dataset might be relevant to research outside of this course, another field, or some other interest of yours. If you have any questions about whether your data is appropriate, do not hesitate to ask. If you plan to use data from another endeavor of yours, such as a research project, be sure to gain permission from the controlling authority first.
The two most common sources of data used by students:
The final product of this project will be a written report of your analysis. It should contain the following sections:
Details of what is expected in each section will be discussed in the template document that will be provided. (This is something new we are trying this semester. Details in class as well.)
Your goal is not to use as many methods as possible. Your task is to use appropriate methods to find a good model that can perform the desired statistical learning task. Most importantly, you should motivate and discuss why that task is being completed, and how well it is being completed.
For this project, you must work in groups of at least three students and at most four students. A portion of your grade will come from your ability to work in a group setting. You may pick your group members if you like.
If you choose your group, a roster of your group members is due by Friday, November 10, 11:59 PM. Send a single email with the list of group members to David (dalpiaz2@illinois.edu). Include all participants’ University emails in the CC line as a means to verify that all agree to the group. Also include full names and NetIDs in the body of the email.
If you would like to be assigned to a group, send an email to David (dalpiaz2@illinois.edu) and simply state in the body of the email that you would like to be assigned to a group. You must do so by Friday, November 10, 11:59 PM, but if you do so earlier, you may be assigned to a group earlier.
Groups of “one” may be considered if, and only if, you are willing to sacrifice the 2.5% of your total course grade that comes from peer evaluation. (You will still self-evaluate.) The ability to work in a group is an important skill. If you would like to be a group of one, send an email to David (dalpiaz2@illinois.edu) and simply state in the body of the email that you would like to work as a group of one. You must do so by Friday, November 10, 11:59 PM.
A proposal of your intended project is due by Friday, December 1, 11:59 PM. It should be submitted online via Compass by a single group member.
After review of the proposal, it will be evaluated in one of two ways:
A proposal of your intended project should include the following:
Load the data into R, and print the first few values of the response variable as evidence.
Why this data?
In R, fit either lm() (regression) or glm() (classification), then call predict() on the results and return the first few values. You may need to perform some data cleaning before this step.
As a group, you will submit a .zip file as you would for homework that contains an .html and .Rmd file, as well as the data if it cannot be linked online. If your data is too large to submit, and cannot be linked, please let us know and we will find an alternative.
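The two R checks requested in the proposal (loading the data and showing the response, then fitting a model and calling predict()) might look roughly like the sketch below. It is only an illustration: the built-in mtcars data, the name project_data, and the chosen predictors are assumptions standing in for your own dataset and variables, and mtcars itself is far too small to satisfy the 500-observation requirement.

```r
# Stand-in data (assumption): the built-in mtcars dataset.
# It has only 32 rows, so it would NOT qualify as a real project dataset.
project_data <- mtcars

# Evidence that the data loads: first few values of the response variable.
head(project_data$mpg)

# Regression case: fit lm(), call predict(), and show the first few values.
reg_fit  <- lm(mpg ~ wt + hp, data = project_data)
reg_pred <- predict(reg_fit)
head(reg_pred)

# Classification case: fit glm() on a binary response (here, am),
# then return the first few predicted probabilities.
clf_fit  <- glm(am ~ wt + hp, data = project_data, family = binomial)
clf_prob <- predict(clf_fit, type = "response")
head(clf_prob)
```

Only one of the two fits is needed for a given project, depending on whether the response is numeric or categorical.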
The final report of your analysis is due by Thursday, December 21, 10:00 PM. It should be submitted online via Compass by a single group member.
As a group, you will submit a .zip file as you would for homework which contains a .pdf and .Rmd file, as well as the data if it cannot be linked to online. Be sure to follow the suggested formatting in the template document.
A peer evaluation of the group members is due by Thursday, December 21, 10:00 PM. It should be submitted online via Compass by each group member.
Individually, you will write a short review of each of your group members, including yourself. For each member, comment on:
Individually, you will submit a single file (.pdf preferred) that contains your reviews.
Grading for the group choice is all-or-nothing based on making a group selection before the deadline.
You will be graded on formatting, motivation, appropriateness of data, etc.
A breakdown of the points for the final report:
R: 10
rmarkdown: 10
rmarkdown?
It is more important that you honestly review your team than give each member good remarks. You will be graded on how well you review your group members. If you simply give each of your team members equally good marks, you will likely receive fewer points for the portion of the grade dedicated to evaluating your peers.
This section will likely be updated as we progress through the remainder of the semester.
How long should the report be?
Isn’t this a lot to do at the end of the course while we have other things going on in the course? And it’s due during finals week?