Abstract

Even though it is the first thing to appear in the report, the abstract should be the last thing that you write. The abstract should serve as a summary of the entire report. Reading only the abstract, the reader should have a good idea about what to expect from the rest of the document. Abstracts vary widely in length, but a good heuristic is to use one sentence for each of the main sections of the IMRD structure: Introduction, Methods, Results, and Discussion.

Introduction

The introduction should discuss the “why” of your analysis and the “what” of your data. Essentially, you need to motivate why the analysis that you’re about to do should be done. In particular you should state a clear problem of interest. Why does this analysis need to be done? What is the goal of this analysis? The introduction should also provide enough background on the subject area for a reader to understand your analysis. Do not assume your reader knows anything about the subject area that your data comes from. If the reader does not understand your data, there is no way the reader will understand your motivation.

Since you did not collect this data, you can create any reasonable scenario that you would like. (In the real world, you would often have some input before collecting data.)

You do not need to provide a complete data dictionary in the introduction, but you should include one in the appendix. Often the data would be introduced in the Methods section, but here the data should be very closely linked to the motivation of the analysis.

Consider performing some exploratory data analysis here, and include in the report whatever parts of it help present the data to the reader.

Methods

The methods section should discuss what you did. The methods that you are using are those learned in class. This section should contain the bulk of your “work.” This section will contain most of the R code that is used to generate the results. Your R code is not expected to be perfect idiomatic R, but it is expected to be understood by a reader without too much effort. The majority of your code should be suppressed from the final report, but consider displaying code if it is concise and helps explain what you did. (If you use rmarkdown you can set echo = FALSE to suppress code.)
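As a minimal sketch of the echo = FALSE option mentioned above, assuming the report is written in R Markdown and knit with the knitr package, the line below could be placed in a setup chunk near the top of the .Rmd file. The placement is a suggestion, not a requirement of the assignment.

```r
# Placed in a setup chunk near the top of the .Rmd file (assumed layout).
# Sets the default for every later chunk: code is hidden, output is still shown.
knitr::opts_chunk$set(echo = FALSE)
```

An individual chunk whose code is concise and worth displaying can override the default by setting echo = TRUE in its own chunk header.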

Consider adding subsections in this section. One potential set of subsections could be data and modeling. (Here we use modeling to mean fitting probability distributions.) The data section would describe your data. How will it be used in performing your analysis? What preprocessing, if any, have you done to it? The modeling section would describe the modeling methods that you will consider, as well as strategies for comparing them.
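To illustrate what a modeling subsection might contain, here is a rough sketch that fits several candidate distributions and compares them. It assumes maximum likelihood fitting with fitdistr() from the MASS package and comparison by AIC. Both are illustrative choices rather than the methods prescribed in class, and the data are simulated purely as a stand-in for a real variable.

```r
library(MASS)  # provides fitdistr(); MASS ships with standard R installations

set.seed(42)
# Hypothetical data: simulated positive values standing in for a real variable.
x <- rgamma(200, shape = 2, rate = 0.5)

# Fit each candidate distribution by maximum likelihood.
fits <- list(
  exponential = fitdistr(x, "exponential"),
  gamma       = fitdistr(x, "gamma"),
  lognormal   = fitdistr(x, "lognormal")
)

# Compare the candidates by AIC (lower is better) and sort the table.
aic <- sapply(fits, AIC)
comparison <- data.frame(model = names(aic), aic = unname(aic))
comparison <- comparison[order(comparison$aic), ]
print(comparison)
```

A table like comparison is the kind of summary that could be carried forward into the results section, with the reasoning behind the final choice left for the discussion.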

Your goal is not to use as many methods as possible. Your task is to use appropriate methods to find a good model to help answer a question about your data.

Results

The results section should contain numerical or graphical summaries of your results. What are the results of applying your chosen methods? Consider reporting “final” or “best” models you have chosen. There is not necessarily one, singular correct model, but certainly some methods and models are better than others in certain situations. The results section is about reporting your results.

Discussion

The discussion section should contain discussion of your results. That is, the discussion section is used for commenting on your results. This should also frame your results in the context of the data. What do your results mean? What other data do you wish had been collected? What interesting observations arose from your analysis? Results are often just numbers or graphics; here you need to explain what they tell you about the analysis you are performing. The results section tells the reader what the results are. The discussion section tells the reader why those results matter.

Any concluding remarks should be placed here.

Appendix

The appendix section should contain any additional code, tables, and graphics that are not explicitly referenced in the narrative of the report. (If you use rmarkdown and supply the .Rmd file that contains the suppressed code, this is not necessary.) The appendix must contain a data dictionary. Appropriate citations should be placed here.

Rubrics

Report

The 45 points for the report will be assigned as follows:

  • Introduction
    • [2] Analysis is clearly motivated.
      • The why of the analysis is made clear to the reader.
    • [4] Analysis has a clear goal.
      • Reader should be able to clearly identify the problem of interest.
      • Problem of interest can be stated in terms of a statistical problem.
    • [4] Data is clearly explained to the reader.
      • Reader should understand what the data is, and how it can be used to achieve the goal.
      • Only the most relevant information should be placed in the introduction.
      • A full data dictionary should be included in an appendix.
    • [2] Exploratory data analysis
      • Only the most relevant EDA should be placed in the introduction.
      • Additional EDA may be placed in the appendix.
  • Methods
    • [3] Appropriate methods from class are used.
    • [3] Methods are used correctly.
  • Results
    • [2] Results are clearly organized either visually or as a table.
    • [2] Correct and useful metrics are used.
  • Discussion
    • [3] Correct conclusions are drawn from the results.
    • [6] How the results relate to the goal and motivation is discussed.
  • Abstract
    • [2] Abstract appropriately summarizes the analysis performed.
  • Code
    • [2] R code is provided
      • Either via an Rmd file that generates the report, or in the appendix.
    • [3] R is used appropriately.
      • Does your code perform the desired tasks?
      • Is your code readable?
      • Is your style consistent?
  • General
    • [5] Narrative text is well written.
      • Text is free of spelling errors.
      • Text is written with clarity. (You will not be held to an overly strict grammar standard, but your writing must be understood.)
      • Text is written in a manner such that a reader does not already need to be familiar with the data. (Minimal familiarity with statistical learning is assumed.)
    • [2] Directions are followed.
      • Report has a title.
      • Group number and names are included in the report.

Presentation

The 45 points for the presentation will be assigned as follows:

Same as report:

  • Introduction
    • [2] Analysis is clearly motivated.
    • [4] Analysis has a clear goal.
    • [4] Data is clearly explained to the reader.
    • [2] Exploratory data analysis
  • Methods
    • [3] Appropriate methods from class are used.
    • [3] Methods are used correctly.
  • Results
    • [2] Results are clearly organized either visually or as a table.
    • [2] Correct and useful metrics are used.
  • Discussion
    • [3] Correct conclusions are drawn from the results.
    • [6] How the results relate to the goal and motivation is discussed.

Unique to in-class presentation:

  • In-Class Presentation
    • [2] All group members participate.
    • [5] Organization and presentation of slides
      • Slides are easy to read. (Not overly cluttered with words)
      • Flow of presentation is well organized.
    • [2] Directions are followed.
      • Report is received via email by date and time specified.
      • Group number and names are included in the report.
    • [5] TA Score
      • The TAs will both provide a score from 0 to 5 that summarizes how well they feel your group did on all preceding items. The average of their scores will be reported here.